Moving Up from SVN: Experiences in Migrating to Git and Mercurial
- Published in Development Environment
- Written by Boris Drajer
(Note: this text is a result of a couple of months of research… Only when I finished the migration and got to the point where things are running smoothly I got around to finishing this post. Some of this information may be a bit outdated, but from what I’ve seen things haven’t moved much in the meantime. Anyway, I added comments in the text where the situation might have changed).
Now that I think about it, there was one crucial reason to abandon SVN and upgrade to something more powerful: merging. My company now has a lot of parallel development (several customized instances of an application), and SVN’s support for this is not as good as it should be – that is, not as good as the competition’s. In SVN you can separate a development branch and keep it parallel to the main one, and merge-reintegrate it back. SVN will remember what revision it merged so that it doesn’t try to do it again (which would otherwise produce merge conflicts). But that seems to be the limit of its capabilities: a bit more complicated structure and merge produces so much conflicts that it looks the same as when everything is done manually. Contrast that to a more recent VCS like Git or Mercurial, where the essential feature is the ability to manipulate changesets, stack them one upon another or restructure their connections: if you can merge a set of changes from one branch to another, then continue developing both of them, do a back-merge (even though you shouldn’t), and the system doesn’t produce an enormous amount of conflicts, it gives you more power.
Of course, there are additional advantages that both Git and Mercurial give over SVN, but in our case that was just a bonus, not a necessity. I believe that many developers think likewise. SVN is good enough at what it does, the problem is when you need something that it wasn’t designed to do.
So which one to choose? It seems that Git, Mercurial and Bazaar are the most popular. People say that of the three Git is the most powerful because it was written by Linux kernel developers who do complicated merges every day. Ok, so I seemed natural to choose that one.
Git
The first impression that I got is that Linux kernel developers are guys who talk between themselves in hexadecimal and to the rest of the world in mnemonics. Git had some documentation but not nearly enough explanations on how things work. The error messages the tools produced were cryptic – and not only that, they were formatted to be viewed in the console so when they are displayed by GUI tools they tend to get messed up. Ok, so Linux kernel developers think GUIs are lame… While we’re at it, error messages are also misleading: I spent a lot of time trying to diagnose what’s wrong with my repository because the clone operation kept producing a warning about the remote head pointing to a non-existent revision or branch (and yeah, it was a fatal warning – a new term, at least for me - because the command didn’t do anything but gave only a warning). It turned out at the end that I was using a wrong URL for the repository – that is, I was trying to clone a nonexistent repository. I agree, in a nonexistent repository the nonexistent head points to a nonexistent revision, but it’s a bit of a roundabout way to say it, isn’t it?
But these are the things that can – and most probably will – be addressed as the tools mature. Since there are so much Git users, it’s surely reliable enough. I don’t mind some rough edges, I’ve been there with SVN and it was worth it (it was much better to cope with occasional bugs in SVN than to start with CVS and then migrate to SVN).
Ok, so how do we get Git to work on a Windows server? The answer that seemed complete enough hinted that I should simulate a Linux environment, install the shell, something like a SSH daemon and everything else. In other words, it’s not supposed to run on Windows but it can. Ok, I tried – there are several variations to the theme and each one had a glossed-over part that needs to be researched (something like “at this point you need to import the created certificate into Putty” – well, Putty doesn’t want to read this file format). And it didn’t help that Git itself doesn’t always tell you the real reason something doesn’t work, as I already mentioned. Moreover, it was for some reason allergic to empty repositories – unless I commit something locally, it won’t work. And the repository got easily screwed up while experimenting – that is, it got into a state where I didn’t know what to do with it and it was easier to create a new one.
At this point it was clear that the client tooling also leaves a lot to be desired – there was TortoiseGit that locked occasionaly on clone/pull (and it never displayed the progress of the operation so you never really knew if it was doing something), there was Git GUI that was a bit more stable, and there was Git Bash that was the most reliable. (One interesting sidenote is that at one point I managed to get three different error messages for doing the same – clone - operation with these three tools). One thing, though, the Bash in Git Bash is probably the best part of the package, I had almost forgotten the comfort of working in a Unix shell. Command prompt is light years behind it.
I did get the server to work, though, after a week of running around in circles and repeating the same stuff with slight variations. At the end I was able to create a new repository, clone it, make changes, commit, push, pull… Everything worked until I restarted the server. Or so it seems – when I tried to clone the same repository afterwards, it started again producing weird error messages. I didn’t know what else changed except for the restart (which shouldn’t have affected it – and probably didn’t, but what else? It’s an isolated environment). If I didn’t have the cloned repository with revision history I would have doubted I actually succeeded in doing it… Ok, so it’s also fragile.
Then I tried to find an alternative: there’s a couple of PHP scripts that wrap Git server functionality, but they don’t seem to work on Windows (I tried them in Apache). There’s Bonobo Git server that is written in .Net – well, I never looked at it seriously, how’s a .Net Git server going to work when their own true-to-the-original-idea SSH configuration doesn’t? But it does work – it also needs a bit of tinkering (you have to Google-translate a page in Chinese to get the info on how to really do it, WebDAV etc.) but the installation is amazingly painless: it takes a couple of hours, which is nothing compared to a week wasted for the previous experiment.
So, on to the next step: migration. I migrated a test folder from SVN with no trouble. Tried a bigger one, something under a 1000 revisions – well, it finished in a couple of days, I suppose it’s tolerable. Finally, tried to convert the central project – and when after two weeks of non-stop import it managed only 2000 of the repository’s 5000 revisions, I gave up. Back to Google: why is it so slow, how to speed it up? Turns out that the Git client is developed on Windows XP and that it should probably work well there. As it did: it managed to get all 5000 revisions in a couple of hours. Ok, now this I didn’t like. How can a serious tool not work on newer Windows versions? They said, it’s slow because of the UAC (introduced on Windows Vista), the UAC slows everything down. Well, it’s not like Vista was released yesterday. If this problem exists for years, should I expect a solution ever to appear? More research hinted that Linux kernel programmers think Windows users are lame. So – Git was slow on Windows newer than XP. TortoiseGit seems to execute the same Git shell commands, so it’s the same. I found Git Extensions in the meantime, which is supposed to be independent – but it didn’t even handle HTTP repositories.
In the meantime, I tried cloning the big repository I converted – big as in 200 megabytes big – and it was, as expected, slow. But, I don’t really know which one was to blame here – seems like Bonobo server choked on the large repository since it nailed the CPU at 100% and produced around 40 bytes per second to the client (possibly it was just some kind of sync messages and no data at all). Ok – Bonobo is open source and was built around GitSharp (or something like that) Git library written in .Net. What if I tried myself to update the library and compile the server? Well – GitSharp is discontinued at version 0.3. They’ve all gone to some new library written in C.
Ok, that was enough. After three weeks completely wasted, I gave up on Git.
(Update: Bonobo was since upgraded to version 1.1. I looked at the change log hoping to see a note about moving away from GitSharp, but it didn’t seem to happen. So as far as I know, this performance issue may still be present – nevertheless Bonobo seems the most promising solution for a Windows Git server).
Mercurial
So? Should I look at Mercurial? The Mercurial supporters seem a bit shy – that is, compared to Git fanatics who shout GIT! GIT! GIT! GIT! at each possible occasion, there occasionally appears one that says “or, you could try Mercurial”.
Well, the first look revealed the difference: I can download everything I need from a single web page. Mercurial basic installation – right there, server support included (as a script that is installed in a web server). TortoiseHg, right below it – even it has an ad-hoc server built in! Python extensions – do I need this? So I thought – no, this is too easy. Let’s try something unheard of in Git – RhodeCode. It’s a server implementation that is in fact a full-featured web application. Seems very nice, but due to some incompatibility in a crypto library, very hard to get installed on Windows: it took a lot of workarounds, I ended up installing Visual C++ Express 2008 (it has to be 2008, 2010 doesn’t cut it) and another kind of simulated shell environment (MingW32) to try to get the installer to compile it from source but it was impossible. That is: impossible on Windows 2008 R2, 64-bit. The RhodeCode developers say they’re working on making it more compatible with Windows (and for one thing changing the crypto library), and I found that I believe them, so I’ll be coming back to it. (In the meantime, they’ve released a couple of versions with windows-specific bugfixes, it might be worth it to check it out again).
In the vanilla Mercurial instalation there’s a script called hgweb.cgi that contains the full functionality needed to get a server running. A bit of tinkering is needed to make it run inside IIS – and there are a couple of slightly outdated tutorials on how to do this. I found out that the best combination is to download Mercurial for Python - so, no Mercurial or TortoiseHG on the server. This download says in its name for which version of Python it was written, and that version of Python is the second thing needed. Once both are installed, it is sufficient to put the hgweb.cgi in a virtual server in the IIS, add a hgweb.config and a web.config file, configure the IIS (basic authentication and what have ya), and set permissions on the folders – including the repositories. It took less than one day, research included, to get it up and running.
The client tooling seems better than SVN. TortoiseSVN (in fact, TortoiseCVS) was a breakthrough idea – a VCS integrated into windows, allowing you to version-control any folder on your disk. Well, TortoiseHG went one step further and actually improved the user experience. It has its bugs – more than SVN – but also has a lot more features. The whole thing was written in Python and seems to have a good API because a lot of plugins have been written for it, and TortoiseHG includes the most important ones. At this point I had to install TortoiseHG on the server because that’s the only way to get the subversion converter plug-in. The other way would be to install the Python module, but it cannot be done: first of all, they’ll tell you to install Subversion for Python (which is quite simple, there’s a pre-packaged downloadable setup), but when you do and get an error from the convert extension, you’ll find out that you don’t need that package but something called SWiG. But SWiG doesn’t have anything to do with Subversion – you have to download the Subversion package from Subversion site which moved to Apache leaving the Python part behind and the best you can do is find a source somewhere and compile it, but nobody says how it’s done.
On to converting the repositories – for one thing, it’s speed is normal, on any Windows. As fast as Git on XP, maybe even faster. So I was encouraged to do the thing I never even got around to thinking about with Git – and that is splitting the repositories into individual projects. It did take a week – the details of it will be a subject of a future post – but in the end it produced much less pain then Git.
Conclusion
Looking at it now, I think that Mercurial is unfairly underrated, and this seems to be due to the loud chanting of Git proponents. They say Git is blazingly fast – well, if you calculate the average performance on all operating systems, I think that on the average it’s either very slow or Windows users don’t use it at all. On Windows XP, Mercurial is at least as fast as Git, and on newer Windows versions Git is usable only for small repositories. Git is largely badly documented (which is getting better but Mercurial is way ahead – suffice it to say that I understood some Git concepts only when I read the Mercurial documentation). Git tooling, generally speaking, sucks – it was designed to be used in a Unix shell and nowhere else – and on Windows it is total crap. On the other hand, Mercurial tooling on Windows is, after this experience with Git, impressive. My conclusion is that for Git I would have to require special skills when employing programmers - “C# and Bash knowledge required”, how sane is that? Ok, I’m joking but it’s not far from truth: there has to be at least one Git specialist on the team when ordinary developers get stuck. With Mercurial, the usual SVN-like skill level should be enough for all because it’s not that easy to get stuck. And, after all this, I’m inclined to think that the story about Git being so great should be taken with a grain of salt since everything I’ve seen so far from it seems to tell exactly the opposite.