Menu

Tech blog (34)

Moving Up from SVN: Experiences in Migrating to Git and Mercurial

(Note: this text is a result of a couple of months of research… Only when I finished the migration and got to the point where things are running smoothly I got around to finishing this post. Some of this information may be a bit outdated, but from what I’ve seen things haven’t moved much in the meantime. Anyway, I added comments in the text where the situation might have changed).

Now that I think about it, there was one crucial reason to abandon SVN and upgrade to something more powerful: merging. My company now has a lot of parallel development (several customized instances of an application), and SVN’s support for this is not as good as it should be – that is, not as good as the competition’s. In SVN you can separate a development branch and keep it parallel to the main one, and merge-reintegrate it back. SVN will remember what revision it merged so that it doesn’t try to do it again (which would otherwise produce merge conflicts). But that seems to be the limit of its capabilities: a bit more complicated structure and merge produces so much conflicts that it looks the same as when everything is done manually. Contrast that to a more recent VCS like Git or Mercurial, where the essential feature is the ability to manipulate changesets, stack them one upon another or restructure their connections: if you can merge a set of changes from one branch to another, then continue developing both of them, do a back-merge (even though you shouldn’t), and the system doesn’t produce an enormous amount of conflicts, it gives you more power.

Of course, there are additional advantages that both Git and Mercurial give over SVN, but in our case that was just a bonus, not a necessity. I believe that many developers think likewise. SVN is good enough at what it does, the problem is when you need something that it wasn’t designed to do.

So which one to choose? It seems that Git, Mercurial and Bazaar are the most popular. People say that of the three Git is the most powerful because it was written by Linux kernel developers who do complicated merges every day. Ok, so I seemed natural to choose that one.

Git

The first impression that I got is that Linux kernel developers are guys who talk between themselves in hexadecimal and to the rest of the world in mnemonics. Git had some documentation but not nearly enough explanations on how things work. The error messages the tools produced were cryptic – and not only that, they were formatted to be viewed in the console so when they are displayed by GUI tools they tend to get messed up. Ok, so Linux kernel developers think GUIs are lame… While we’re at it, error messages are also misleading: I spent a lot of time trying to diagnose what’s wrong with my repository because the clone operation kept producing a warning about the remote head pointing to a non-existent revision or branch (and yeah, it was a fatal warning – a new term, at least for me - because the command didn’t do anything but gave only a warning). It turned out at the end that I was using a wrong URL for the repository – that is, I was trying to clone a nonexistent repository. I agree, in a nonexistent repository the nonexistent head points to a nonexistent revision, but it’s a bit of a roundabout way to say it, isn’t it?

But these are the things that can – and most probably will – be addressed as the tools mature. Since there are so much Git users, it’s surely reliable enough. I don’t mind some rough edges, I’ve been there with SVN and it was worth it (it was much better to cope with occasional bugs in SVN than to start with CVS and then migrate to SVN).

Ok, so how do we get Git to work on a Windows server? The answer that seemed complete enough hinted that I should simulate a Linux environment, install the shell, something like a SSH daemon and everything else. In other words, it’s not supposed to run on Windows but it can. Ok, I tried – there are several variations to the theme and each one had a glossed-over part that needs to be researched (something like “at this point you need to import the created certificate into Putty” – well, Putty doesn’t want to read this file format). And it didn’t help that Git itself doesn’t always tell you the real reason something doesn’t work, as I already mentioned. Moreover, it was for some reason allergic to empty repositories – unless I commit something locally, it won’t work. And the repository got easily screwed up while experimenting – that is, it got into a state where I didn’t know what to do with it and it was easier to create a new one.

At this point it was clear that the client tooling also leaves a lot to be desired – there was TortoiseGit that locked occasionaly on clone/pull (and it never displayed the progress of the operation so you never really knew if it was doing something), there was Git GUI that was a bit more stable, and there was Git Bash that was the most reliable. (One interesting sidenote is that at one point I managed to get three different error messages for doing the same – clone - operation with these three tools). One thing, though, the Bash in Git Bash is probably the best part of the package, I had almost forgotten the comfort of working in a Unix shell. Command prompt is light years behind it.

I did get the server to work, though, after a week of running around in circles and repeating the same stuff with slight variations. At the end I was able to create a new repository, clone it, make changes, commit, push, pull… Everything worked until I restarted the server. Or so it seems – when I tried to clone the same repository afterwards, it started again producing weird error messages. I didn’t know what else changed except for the restart (which shouldn’t have affected it – and probably didn’t, but what else? It’s an isolated environment). If I didn’t have the cloned repository with revision history I would have doubted I actually succeeded in doing it… Ok, so it’s also fragile.

Then I tried to find an alternative: there’s a couple of PHP scripts that wrap Git server functionality, but they don’t seem to work on Windows (I tried them in Apache). There’s Bonobo Git server that is written in .Net – well, I never looked at it seriously, how’s a .Net Git server going to work when their own true-to-the-original-idea SSH configuration doesn’t? But it does work – it also needs a bit of tinkering (you have to Google-translate a page in Chinese to get the info on how to really do it, WebDAV etc.) but the installation is amazingly painless: it takes a couple of hours, which is nothing compared to a week wasted for the previous experiment.

So, on to the next step: migration. I migrated a test folder from SVN with no trouble. Tried a bigger one, something under a 1000 revisions – well, it finished in a couple of days, I suppose it’s tolerable. Finally, tried to convert the central project – and when after two weeks of non-stop import it managed only 2000 of the repository’s 5000 revisions, I gave up. Back to Google: why is it so slow, how to speed it up? Turns out that the Git client is developed on Windows XP and that it should probably work well there. As it did: it managed to get all 5000 revisions in a couple of hours. Ok, now this I didn’t like. How can a serious tool not work on newer Windows versions? They said, it’s slow because of the UAC (introduced on Windows Vista), the UAC slows everything down. Well, it’s not like Vista was released yesterday. If this problem exists for years, should I expect a solution ever to appear? More research hinted that Linux kernel programmers think Windows users are lame. So – Git was slow on Windows newer than XP. TortoiseGit seems to execute the same Git shell commands, so it’s the same. I found Git Extensions in the meantime, which is supposed to be independent – but it didn’t even handle HTTP repositories.

In the meantime, I tried cloning the big repository I converted – big as in 200 megabytes big – and it was, as expected, slow. But, I don’t really know which one was to blame here – seems like Bonobo server choked on the large repository since it nailed the CPU at 100% and produced around 40 bytes per second to the client (possibly it was just some kind of sync messages and no data at all). Ok – Bonobo is open source and was built around GitSharp (or something like that) Git library written in .Net. What if I tried myself to update the library and compile the server? Well – GitSharp is discontinued at version 0.3. They’ve all gone to some new library written in C.

Ok, that was enough. After three weeks completely wasted, I gave up on Git.

(Update: Bonobo was since upgraded to version 1.1. I looked at the change log hoping to see a note about moving away from GitSharp, but it didn’t seem to happen. So as far as I know, this performance issue may still be present – nevertheless Bonobo seems the most promising solution for a Windows Git server).

Mercurial

So? Should I look at Mercurial? The Mercurial supporters seem a bit shy – that is, compared to Git fanatics who shout GIT! GIT! GIT! GIT! at each possible occasion, there occasionally appears one that says “or, you could try Mercurial”.

Well, the first look revealed the difference: I can download everything I need from a single web page. Mercurial basic installation – right there, server support included (as a script that is installed in a web server). TortoiseHg, right below it – even it has an ad-hoc server built in! Python extensions – do I need this? So I thought – no, this is too easy. Let’s try something unheard of in Git – RhodeCode. It’s a server implementation that is in fact a full-featured web application. Seems very nice, but due to some incompatibility in a crypto library, very hard to get installed on Windows: it took a lot of workarounds, I ended up installing Visual C++ Express 2008 (it has to be 2008, 2010 doesn’t cut it) and another kind of simulated shell environment (MingW32) to try to get the installer to compile it from source but it was impossible. That is: impossible on Windows 2008 R2, 64-bit. The RhodeCode developers say they’re working on making it more compatible with Windows (and for one thing changing the crypto library), and I found that I believe them, so I’ll be coming back to it. (In the meantime, they’ve released a couple of versions with windows-specific bugfixes, it might be worth it to check it out again).

In the vanilla Mercurial instalation there’s a script called hgweb.cgi that contains the full functionality needed to get a server running. A bit of tinkering is needed to make it run inside IIS – and there are a couple of slightly outdated tutorials on how to do this. I found out that the best combination is to download Mercurial for Python - so, no Mercurial or TortoiseHG on the server. This download says in its name for which version of Python it was written, and that version of Python is the second thing needed. Once both are installed, it is sufficient to put the hgweb.cgi in a virtual server in the IIS, add a hgweb.config and a web.config file, configure the IIS (basic authentication and what have ya), and set permissions on the folders – including the repositories. It took less than one day, research included, to get it up and running.

The client tooling seems better than SVN. TortoiseSVN (in fact, TortoiseCVS) was a breakthrough idea – a VCS integrated into windows, allowing you to version-control any folder on your disk. Well, TortoiseHG went one step further and actually improved the user experience. It has its bugs – more than SVN – but also has a lot more features. The whole thing was written in Python and seems to have a good API because a lot of plugins have been written for it, and TortoiseHG includes the most important ones. At this point I had to install TortoiseHG on the server because that’s the only way to get the subversion converter plug-in. The other way would be to install the Python module, but it cannot be done: first of all, they’ll tell you to install Subversion for Python (which is quite simple, there’s a pre-packaged downloadable setup), but when you do and get an error from the convert extension, you’ll find out that you don’t need that package but something called SWiG. But SWiG doesn’t have anything to do with Subversion – you have to download the Subversion package from Subversion site which moved to Apache leaving the Python part behind and the best you can do is find a source somewhere and compile it, but nobody says how it’s done.

On to converting the repositories – for one thing, it’s speed is normal, on any Windows. As fast as Git on XP, maybe even faster. So I was encouraged to do the thing I never even got around to thinking about with Git – and that is splitting the repositories into individual projects. It did take a week – the details of it will be a subject of a future post – but in the end it produced much less pain then Git.

Conclusion

Looking at it now, I think that Mercurial is unfairly underrated, and this seems to be due to the loud chanting of Git proponents. They say Git is blazingly fast – well, if you calculate the average performance on all operating systems, I think that on the average it’s either very slow or Windows users don’t use it at all. On Windows XP, Mercurial is at least as fast as Git, and on newer Windows versions Git is usable only for small repositories. Git is largely badly documented (which is getting better but Mercurial is way ahead – suffice it to say that I understood some Git concepts only when I read the Mercurial documentation). Git tooling, generally speaking, sucks – it was designed to be used in a Unix shell and nowhere else – and on Windows it is total crap. On the other hand, Mercurial tooling on Windows is, after this experience with Git, impressive. My conclusion is that for Git I would have to require special skills when employing programmers - “C# and Bash knowledge required”, how sane is that? Ok, I’m joking but it’s not far from truth: there has to be at least one Git specialist on the team when ordinary developers get stuck. With Mercurial, the usual SVN-like skill level should be enough for all because it’s not that easy to get stuck. And, after all this, I’m inclined to think that the story about Git being so great should be taken with a grain of salt since everything I’ve seen so far from it seems to tell exactly the opposite.

How to fix comment character encoding in TortoiseHG

I imported a couple of repositories from SVN into Mercurial and discovered that characters not present in the standard ASCII table have become mangled in the comments… Or at least they looked mangled in the console output as well as in TortoiseHG – now, the console is not that important, but how to fix this in Tortoise?

I tried searching for a solution on how to modify the import process and found nothing. Tried to add a new comment to the repository with a non-ASCII character and got a Python error (“expected string, QString found”). Some said that I should change my Windows’ default system encoding (which is English(US)), and that solved the problem but I would have liked a simpler solution, since changing the default encoding used to cause other problems in the past. I managed to find a couple of workarounds that solve the problem of console display and involve setting environment variables… Would it work for Tortoise? Actually: it does. The solution is simple: go to (this is on Windows 7) Control Panel – System – Advanced System Settings – Environment Variables, add a new user variable called HGENCODING and set it’s value to either “utf-8” or your code page (mine is “cp1250”). TortoiseHg respects this. There’s a slight difference in the two values, though, because the diff viewer doesn’t really like “utf-8”, it prefers the concrete code page. There may be other components that behave like this, so I suppose that setting the code page is the optimal solution.

Repository moved temporarily to '/viewvc/nhibernate/trunk/'; please relocate

When trying to switch my local Subversion copy of the NHibernate source to a different tag (from 3.1GA to trunk, in this case), I got this error:

Repository moved temporarily to '/viewvc/nhibernate/trunk/'; please relocate

The frustrating thing was that I was trying to relocate to exactly this url. And if I tried others, it said that I should relocate to them… I searched the net in vain for the solution, the only information I got is that I should re-configure my apache server (thanks a bunch!)

The problem is, in fact, simple: the URL is wrong. I thought I could just copy the repository’s URL from my web browser, like I do with other sites. Not here: there’s a separate entry for direct SVN access. So instead of using this url:

http://nhibernate.svn.sourceforge.net/viewvc/nhibernate/

use this one:

https://nhibernate.svn.sourceforge.net/svnroot/nhibernate

It does seem like a simple problem but the solution wasn’t so easy to find.

Visual Studio 2010 cannot reference ManagedDTS dll from SQL Server 2005

A C# project that worked with Visual Studio 2008, when converted to Visual Studio 2010, starts complaining about not being able to find classes defined in Microsoft.SQLServer.ManagedDTS.dll and others. These dlls are contained in the SQL Server 2005. If you try to remove the reference and add it again, the errors disappear in the editor, but appear again when you compile the solution. At the end of the jumble of compiler errors there is a small one that betrays the cause:

warning MSB3258: The primary reference "Microsoft.SQLServer.ManagedDTS, Version=9.0.242.0, Culture=neutral, PublicKeyToken=89845dcd8080cc91, processorArchitecture=MSIL" could not be resolved because it has an indirect dependency on the .NET Framework assembly "mscorlib, Version=2.0.3600.0, Culture=neutral, PublicKeyToken=b77a5c561934e089" which has a higher version "2.0.3600.0" than the version "2.0.0.0" in the current target framework.

The problem lies in the Microsoft.SQLServer.msxml6_interop.dll that references the beta version of the .Net framework 2.0. Yes, even after installing three service packs – and worse still, even if you install SQL Server 2008 it will remain there. Why? Apparently, there’s a newer msxml6_interop dll with this reference fixed but unfortunately it has the same version as the old one so it doesn’t replace it in the GAC. Talk about eliminating DLL hell.

But that’s not all, you cannot simply find the new dll and replace it in the GAC. The old one cannot be removed because it’s referenced by the Windows Installer. You have to use brute force, something like this: open the command prompt and try to find the real path to the assembly on the disk. (From Windows Explorer you cannot do this because it replaces the real GAC folder structure with a conceptual, flat view). So, CD to c:\Windows\Assembly and find the folder called Microsoft.SqlServer.msxml6_interop. In it, there will be another folder called something like 6.0.0.0__89845dcd8080cc91, and in it the dll we’ve been looking for. On my computer, the full path is

c:\windows\assembly\GAC_MSIL\Microsoft.SqlServer.msxml6_interop\6.0.0.0__89845dcd8080cc91

Ok, now you should be able to manipulate the dll directly and replace it with the new one. What I like to do in these cases is SUBST the folder and make it accessible from Windows Explorer. Type something like this -

SUBST x: c:\windows\assembly\GAC_MSIL\Microsoft.SqlServer.msxml6_interop\6.0.0.0__89845dcd8080cc91

- and you will be able to see the folder in Windows Explorer as a separate volume X:. From here you can delete the existing file and copy over the newer one. You can find the new one only if you have a machine where SQL Server 2008 is installed first – it’s in the same (or similar) place in the GAC. I used again the command prompt trick to get the file. (Note that I did everything as administrator, you might have to employ additional tricks to work around security).

Here’s a more detailed description with other possible solutions:

http://blogs.msdn.com/b/jason_howell/archive/2010/08/18/visual-studio-2010-solution-build-process-give-a-warning-about-indirect-dependency-on-the-net-framework-assembly-due-to-ssis-references.aspx

How to fix CAB to support dependencies across class hierarchy

The Composite UI Application Block’s Object Builder doesn’t support dependencies for same-named properties at different levels in the class hierarchy. If you add a dependency property which has the same name as a property in a base or derived class, only one of them will be initialized.

The reason for this is probably that the mechanism is based on the Type.GetProperties() method. This method doesn’t return all of the properties the class (and the base classes) contain – rather, it employs a “hide by name and signature” convention and gives only the topmost properties. So the first step we have to do is eliminate the GetProperties method. We do this by modifying the GetMembers() method of the PropertyReflectionStrategy (located in ObjectBuilder/Strategies/Property). It should look like this:

protected override IEnumerable<IReflectionMemberInfo<PropertyInfo>> GetMembers(IBuilderContext context, Type typeToBuild, object existing, string idToBuild)
{
    foreach (PropertyInfo propInfo in GetPropertiesFlattened(typeToBuild))
        yield return new PropertyReflectionMemberInfo(propInfo);
}

private IEnumerable<PropertyInfo> GetPropertiesFlattened(Type typeToBuild)
{
    for (Type t = typeToBuild; t != null; t = t.BaseType)
    {
        foreach (var pi in t.GetProperties(BindingFlags.Public | BindingFlags.Instance | BindingFlags.DeclaredOnly)) // get only properties in this class
        {
            yield return pi;
        }
    }
}

The next problem arises because the PropertyReflectionStrategy keeps a dictionary of existing properties. It’s indexed by property name, which would eliminate our duplicate properties. We have to change it to use the full path - property name prefixed by class name and namespace. I did this by adding a property called FullName to the IReflectionMemberInfo and ReflectionMemberInfo (found in ObjectBuilder/Strategies).

In IReflectionMemberInfo, add:

string FullName { get; }

In ReflectionMemberInfo, add:

public string FullName
{
    get { return memberInfo.DeclaringType.FullName + "." + memberInfo.Name; }
}

There’s a PropertyReflectionMemberInfo class embedded in the PropertyReflectionStrategy, we have to add a similar property to it:

public string FullName
{
    get { return prop.DeclaringType.FullName + "." + prop.Name; }
}

Ok – next, in the PropertyReflectionStrategy we rewire the dictionary to use this new property. Go to AddParametersToPolicy method and change this -

if (!result.Properties.ContainsKey(member.Name))
    result.Properties.Add(member.Name, new PropertySetterInfo(member.MemberInfo, parameter));

- to this -

if (!result.Properties.ContainsKey(member.FullName))
    result.Properties.Add(member.FullName, new PropertySetterInfo(member.MemberInfo, parameter));

One last glitch to fix: go to CompositeUI/WorkItem class, and in the BuildUp() method change this -

propPolicy.Properties.Add("Parent", new PropertySetterInfo("Parent", new ValueParameter(typeof(WorkItem), null)));

- to this -

propPolicy.Properties.Add("Microsoft.Practices.CompositeUI.WorkItem.Parent", new PropertySetterInfo("Parent", new ValueParameter(typeof(WorkItem), null)));

Without this modification, the root WorkItem would have its Parent property reference itself, and it would not be recognized as root WorkItem because it’s Parent property is not null. As a consequence, some initialization methods would not get called and almost nothing would work.

How to make LINQ to NHibernate eager-load joined properties like the Criteria API

In terms of “lazyness” of a property, there are currently three different ways in which it can be mapped in NHibernate:

  • lazy=”false” means that it’s not lazy at all – the property’s content will be loaded along with its owner object. This means additional data is always loaded when you load an object, and may mean additional sql queries, too.
  • lazy=”proxy” means that the object contained in the property is loaded when any of its public members is accessed. This property will contain an instance of a proxy, which is an object that knows how to initialize itself on-access. It performs the initialization not by loading its properties but by loading an instance of a real object and redirecting its properties and methods to it. This is why everything on the class that is to be proxied needs to be virtual: proxy is an instance of a class derived from it, which is dynamically generated and has every public member overridden to support lazy-loading.
  • lazy=”no-proxy” means that the property is lazy-loaded, but without a proxy. From what I’ve seen, here the lazy property iself is manipulated on the owner object so that it facilitates on-access loading. In any case, there’s no proxy and no duplicate instances. This feature is currently (in v3.0.0) buggy and it seems to work the same as the first option (lazy=”false”), just as it did in 2.0 when it was unsupported.

Each of the options has its bad sides: with proxies, you get duplicate objects and must make everything virtual on your data classes. In the non-lazy option, for each eager property a join is usually added in the sql query so that the property’s values are loaded at the same time, and the same goes for the property’s properties etc. Even worse, HQL queries don’t respect this joining method, they load the main table in one SQL query and then execute an additional query or two (or dozen) for each record to collect its related data – this is called the “N+1 selects” problem. Needles to say, using HQL for such queries is madness: in these cases, it is best to switch to Criteria API which does the joins properly.

And the bad sides of the no-proxy option? It doesn’t work… Other than that, it seems the perfect solution: no joins, no duplicate objects. If you ask me, I don’t want my data to be loaded on-demand at all. I don’t want the application to decide when it will load its data: if I fill a datagrid with one hundred objects and then the grid triggers lazy-loading on each of the objects in turn, this will create chaos. No, I want the application to break if it accesses data that was not explicitly loaded. But with the current implementation I have no choice: it’s either eager or proxy, and I’m choosing eager, for better or worse.

How about LINQ queries? In 2.x it was implemented over the Criteria API which means it knew how to join-load additional records. Not so in 3.x: now it behaves like HQL, N+1 selects all over the place.

So, what is there to do? It seems the only option left is to write all queries with explicit fetch statements for every non-lazy property… This would definitely solve the execution performance issue, but development performance would suffer: if I add a new non-lazy property, I’d have to rewrite all queries where it appears.

Ok, but if we’re using LINQ, it’s a dynamical query, right? It can be modified to include all required fetches. After some research, it turns out that there’s a solution that (at least on the outside) looks even elegant: use an extension method to do this. So, you would do something like:

session.Query<Person>().Where(…).EagerFetchAllNonLazyProperties()

or:

(from p in session.Query<Person> where … select …).EagerFetchAllNonLazyProperties()

Here’s one way this method could be implemented. Note that this is a somewhat hacked implementation and that there are probably some unsupported cases – one thing that is suspicious to me is that Criteria API joined the eager properties recursively while only the first level is covered here, so be careful. But it’s a good start... Preliminary tests were very promising ;).

public static IQueryable<TOriginating> EagerFetchAll<TOriginating>
  (this IQueryable<TOriginating> query)
{
  // hack the session reference out of the provider - or is
  // there a better way to do this?
  ISession session = (ISession)typeof(NhQueryProvider)
    .GetField("_session", System.Reflection.BindingFlags.Instance
      | System.Reflection.BindingFlags.NonPublic)
    .GetValue(query.Provider);

  IClassMetadata metaData = session.SessionFactory
    .GetClassMetadata(typeof(TOriginating));

  for(int i = 0; i < metaData.PropertyNames.Length; i++)
  {
    global::NHibernate.Type.IType propType = metaData.PropertyTypes[i];

    // get eagerly mapped associations to other entities
    if (propType.IsAssociationType && propType.IsEntityType
      && !metaData.PropertyLaziness[i])
    {
      ParameterExpression par = Expression.Parameter(typeof(TOriginating), "p");

      Expression propExp = Expression.Property(par, metaData.PropertyNames[i]);

      Expression callExpr = Expression.Call(null,
        typeof(EagerFetchingExtensionMethods).GetMethod("Fetch")
          .MakeGenericMethod(typeof(TOriginating), propType.ReturnedClass),
        // first parameter is the query, second is property access expression
        query.Expression, Expression.Lambda(propExp, par)
      );

      LambdaExpression expr = Expression.Lambda(callExpr, par);

      Type fetchGenericType = typeof(NhFetchRequest<,>)
        .MakeGenericType(typeof(TOriginating), propType.ReturnedClass);
      query = (IQueryable<TOriginating>)Activator.CreateInstance
        (fetchGenericType, query.Provider, callExpr);
    }
  }

  return query;
}

Lazy developer exception: “Operation is not valid due to the current state of the object.”

If you ever encountered this exception and wondered what it means, here’s the answer: it doesn’t mean anything. This is the default message text of the InvalidOperationException. If an invalid operation exception contains this text, someone was too lazy to supply a real error message when throwing an exception and did just “throw new InvalidOperationException()”.

I wasted hours trying to figure out why the Sync Framework’s FileSyncProvider keeps complaining about the state of an object, which object’s state this is etc. only to find out that someone at Microsoft was loath to spend one minute to at least write a meaningful message. Hopefully, this post will save someone else’s time.

Powershell script to zip/rar files in subfolders and delete old ones

This is a slight change from the usual “pure programming” stuff, but I’ve been looking for a complete solution for this one and was unable to find it, so why not. A change of pace is good sometimes.

The problem is this: I have a script that backs up my database server. It creates one folder for each database and puts a new file into it every day. I want to rar each file (with a password), delete the original and delete all files in the folder except the two newest.

Here’s the script – you may recognize parts of it from other scripts, but unfortunately I can’t remember the URL’s where I picked these pieces up, sorry… Google around for solutions to this problem and you’ll probably find them.

#Powershell Script to recurse input path looking for .bak files, move them into a rar archive
# and delete all archives in each folder except the newest two. 

$InputPath = $args[0] 

if($InputPath.Length -lt 2)
{
    Write-Host "Please supply a path name as your first argument" -foregroundcolor Red
    return
}
if(-not (Test-Path $InputPath))
{
    Write-Host "Path does not appear to be valid" -foregroundcolor Red
    return
} 

$BakFiles = Get-ChildItem $InputPath -Include *.bak -recurse
Foreach ($Bak in $BakFiles)
{
  $ZipFile = $Bak.FullName -replace ".bak", ".rar"
  if (Test-Path $ZipFile)
  {
      Write-Host "$ZipFile exists already, aborted." -foregroundcolor Red
  }
  else
  {
    & "C:\Program Files\WinRAR\winrar.exe" m -m1 -pyourpasswordhere "$ZipFile" "$Bak" | Out-Null 

    if(Test-Path $ZipFile)
    {
      # Keep two newest files in the directory based on creation time
      $path = split-path $ZipFile -Parent
      $total= (ls $path).count - 2 # Change number 2 to whatever number of files you want to keep
      $path = $path + "\*.rar";
      ls $path |sort-object -Property {$_.CreationTime} | Select-Object -first $total | Remove-Item -force
    }
  }
}
Subscribe to this RSS feed