Menu

Development Environment (5)

CruiseControl.Net Missing Xml node (sourceControls) for required member (ThoughtWorks . CruiseControl . Core . Sourcecontrol . MultiSourceControl . SourceControls).

It’s a silly error but the solution is not very obvious or logical… I modified my CruiseControl.Net configuration to include multiple source control nodes, but it started complaining that the XML was malformed. The error was something like

[CCNet Server] ERROR CruiseControl.NET [(null)] - Exception: 
  Unable to instantiate CruiseControl projects from configuration document.
  Configuration document is likely missing Xml nodes required for properly populating CruiseControl configuration.
  Missing Xml node (sourceControls) for required member (ThoughtWorks.CruiseControl.Core.Sourcecontrol.MultiSourceControl.SourceControls).
  Xml: <sourcecontrol><sourcecontrols><hg><executable>C:\Program Files\TortoiseHg\hg.exe</executable> […]

It complains of a missing sourceControls node for required member SourceControls – but it’s present in the xml. The problem is the capital “C”: 1. XML is case-sensitive. 2. CruiseControl config tags are inconsistent in that sourcecontrol is not camel-cased but sourceControls is. That’s what confused me - hopefully this post will help someone else.

Lessons learned: migrating a complicated repository from subversion to mercurial

Migrating from SVN to Mercurial is a simple process only if the SVN repository has a straight-and-square structure - that is, there are trunk, branches and tags folders in the root and nothing else, not in its present state or ever before. If you used your SVN repository in a way that was convenient in SVN but not in Mercurial – for example, you created branches in various subdirectories, it still shouldn’t be too hard to migrate. But if you, like me, decided late in the game to create the mentioned folders in SVN and then moved an renamed your folders, you will need to invest serious time if you don’t want to lose parts of your history. You need to plot your migration very thoroughly and do a lot of test runs.

The reason for this is that Mercurial’s ConvertExtension is somewhat of a low-level tool. (In other words, although reliable it is not too bright). Browsing the internet you may get the impression that it’s an automated conversion system: it isn’t. It does fully automated migration only for straight SVN repositories, but for the rest it’s more like something to use in your migration script. It seems to do its primary purpose – converting revisions from one repository format to another – quite well but the rest of the tool is not so intelligent and it needs help. So, lesson number one: if you have a complex repository, don’t take the migration lightly.

A small disclaimer is in order: this post is not intended to be a complete step-by-step guide to migration. Rather, it’s something to fill in the blanks left by what little is available on the internet. I’ve done a complex migration and I want to do a brain dump for my future reference or for “whomeverother it may concern”.

Between the two alternatives I perceived as most promising, hgsubversion and the convert extension, i chose the latter. Hgsubversion was claimed by some to be the better tool for this job, but it was somewhat troublesome. The problem with hgsubversion was that it had a memory leak and broke easily in the middle of the conversion (note that this happened a couple of months ago: things may have changed in the meantime). The solution, they say, was to do hg pull repeatedly until it finishes. I wanted to do a hg clone with a filemap, but when the import broke I was in trouble because hg pull doesn’t accept filemaps. (It could be that the filemap was cached somewhere inside of the new repository and my worries were unfounded, I don’t really know). I may try that in the future. One other way around it would be to do a straight clone of SVN – no branches or anything – into an intermediate mercurial repository and then split that into separate final repositories. In that case, hgsubversion could be a viable solution, maybe even better than the conversion extension. I had more success with the conversion extension so this is what we’ll talk about here.

Moving Up from SVN: Experiences in Migrating to Git and Mercurial

(Note: this text is a result of a couple of months of research… Only when I finished the migration and got to the point where things are running smoothly I got around to finishing this post. Some of this information may be a bit outdated, but from what I’ve seen things haven’t moved much in the meantime. Anyway, I added comments in the text where the situation might have changed).

Now that I think about it, there was one crucial reason to abandon SVN and upgrade to something more powerful: merging. My company now has a lot of parallel development (several customized instances of an application), and SVN’s support for this is not as good as it should be – that is, not as good as the competition’s. In SVN you can separate a development branch and keep it parallel to the main one, and merge-reintegrate it back. SVN will remember what revision it merged so that it doesn’t try to do it again (which would otherwise produce merge conflicts). But that seems to be the limit of its capabilities: a bit more complicated structure and merge produces so much conflicts that it looks the same as when everything is done manually. Contrast that to a more recent VCS like Git or Mercurial, where the essential feature is the ability to manipulate changesets, stack them one upon another or restructure their connections: if you can merge a set of changes from one branch to another, then continue developing both of them, do a back-merge (even though you shouldn’t), and the system doesn’t produce an enormous amount of conflicts, it gives you more power.

Of course, there are additional advantages that both Git and Mercurial give over SVN, but in our case that was just a bonus, not a necessity. I believe that many developers think likewise. SVN is good enough at what it does, the problem is when you need something that it wasn’t designed to do.

So which one to choose? It seems that Git, Mercurial and Bazaar are the most popular. People say that of the three Git is the most powerful because it was written by Linux kernel developers who do complicated merges every day. Ok, so I seemed natural to choose that one.

Git

The first impression that I got is that Linux kernel developers are guys who talk between themselves in hexadecimal and to the rest of the world in mnemonics. Git had some documentation but not nearly enough explanations on how things work. The error messages the tools produced were cryptic – and not only that, they were formatted to be viewed in the console so when they are displayed by GUI tools they tend to get messed up. Ok, so Linux kernel developers think GUIs are lame… While we’re at it, error messages are also misleading: I spent a lot of time trying to diagnose what’s wrong with my repository because the clone operation kept producing a warning about the remote head pointing to a non-existent revision or branch (and yeah, it was a fatal warning – a new term, at least for me - because the command didn’t do anything but gave only a warning). It turned out at the end that I was using a wrong URL for the repository – that is, I was trying to clone a nonexistent repository. I agree, in a nonexistent repository the nonexistent head points to a nonexistent revision, but it’s a bit of a roundabout way to say it, isn’t it?

But these are the things that can – and most probably will – be addressed as the tools mature. Since there are so much Git users, it’s surely reliable enough. I don’t mind some rough edges, I’ve been there with SVN and it was worth it (it was much better to cope with occasional bugs in SVN than to start with CVS and then migrate to SVN).

Ok, so how do we get Git to work on a Windows server? The answer that seemed complete enough hinted that I should simulate a Linux environment, install the shell, something like a SSH daemon and everything else. In other words, it’s not supposed to run on Windows but it can. Ok, I tried – there are several variations to the theme and each one had a glossed-over part that needs to be researched (something like “at this point you need to import the created certificate into Putty” – well, Putty doesn’t want to read this file format). And it didn’t help that Git itself doesn’t always tell you the real reason something doesn’t work, as I already mentioned. Moreover, it was for some reason allergic to empty repositories – unless I commit something locally, it won’t work. And the repository got easily screwed up while experimenting – that is, it got into a state where I didn’t know what to do with it and it was easier to create a new one.

At this point it was clear that the client tooling also leaves a lot to be desired – there was TortoiseGit that locked occasionaly on clone/pull (and it never displayed the progress of the operation so you never really knew if it was doing something), there was Git GUI that was a bit more stable, and there was Git Bash that was the most reliable. (One interesting sidenote is that at one point I managed to get three different error messages for doing the same – clone - operation with these three tools). One thing, though, the Bash in Git Bash is probably the best part of the package, I had almost forgotten the comfort of working in a Unix shell. Command prompt is light years behind it.

I did get the server to work, though, after a week of running around in circles and repeating the same stuff with slight variations. At the end I was able to create a new repository, clone it, make changes, commit, push, pull… Everything worked until I restarted the server. Or so it seems – when I tried to clone the same repository afterwards, it started again producing weird error messages. I didn’t know what else changed except for the restart (which shouldn’t have affected it – and probably didn’t, but what else? It’s an isolated environment). If I didn’t have the cloned repository with revision history I would have doubted I actually succeeded in doing it… Ok, so it’s also fragile.

Then I tried to find an alternative: there’s a couple of PHP scripts that wrap Git server functionality, but they don’t seem to work on Windows (I tried them in Apache). There’s Bonobo Git server that is written in .Net – well, I never looked at it seriously, how’s a .Net Git server going to work when their own true-to-the-original-idea SSH configuration doesn’t? But it does work – it also needs a bit of tinkering (you have to Google-translate a page in Chinese to get the info on how to really do it, WebDAV etc.) but the installation is amazingly painless: it takes a couple of hours, which is nothing compared to a week wasted for the previous experiment.

So, on to the next step: migration. I migrated a test folder from SVN with no trouble. Tried a bigger one, something under a 1000 revisions – well, it finished in a couple of days, I suppose it’s tolerable. Finally, tried to convert the central project – and when after two weeks of non-stop import it managed only 2000 of the repository’s 5000 revisions, I gave up. Back to Google: why is it so slow, how to speed it up? Turns out that the Git client is developed on Windows XP and that it should probably work well there. As it did: it managed to get all 5000 revisions in a couple of hours. Ok, now this I didn’t like. How can a serious tool not work on newer Windows versions? They said, it’s slow because of the UAC (introduced on Windows Vista), the UAC slows everything down. Well, it’s not like Vista was released yesterday. If this problem exists for years, should I expect a solution ever to appear? More research hinted that Linux kernel programmers think Windows users are lame. So – Git was slow on Windows newer than XP. TortoiseGit seems to execute the same Git shell commands, so it’s the same. I found Git Extensions in the meantime, which is supposed to be independent – but it didn’t even handle HTTP repositories.

In the meantime, I tried cloning the big repository I converted – big as in 200 megabytes big – and it was, as expected, slow. But, I don’t really know which one was to blame here – seems like Bonobo server choked on the large repository since it nailed the CPU at 100% and produced around 40 bytes per second to the client (possibly it was just some kind of sync messages and no data at all). Ok – Bonobo is open source and was built around GitSharp (or something like that) Git library written in .Net. What if I tried myself to update the library and compile the server? Well – GitSharp is discontinued at version 0.3. They’ve all gone to some new library written in C.

Ok, that was enough. After three weeks completely wasted, I gave up on Git.

(Update: Bonobo was since upgraded to version 1.1. I looked at the change log hoping to see a note about moving away from GitSharp, but it didn’t seem to happen. So as far as I know, this performance issue may still be present – nevertheless Bonobo seems the most promising solution for a Windows Git server).

Mercurial

So? Should I look at Mercurial? The Mercurial supporters seem a bit shy – that is, compared to Git fanatics who shout GIT! GIT! GIT! GIT! at each possible occasion, there occasionally appears one that says “or, you could try Mercurial”.

Well, the first look revealed the difference: I can download everything I need from a single web page. Mercurial basic installation – right there, server support included (as a script that is installed in a web server). TortoiseHg, right below it – even it has an ad-hoc server built in! Python extensions – do I need this? So I thought – no, this is too easy. Let’s try something unheard of in Git – RhodeCode. It’s a server implementation that is in fact a full-featured web application. Seems very nice, but due to some incompatibility in a crypto library, very hard to get installed on Windows: it took a lot of workarounds, I ended up installing Visual C++ Express 2008 (it has to be 2008, 2010 doesn’t cut it) and another kind of simulated shell environment (MingW32) to try to get the installer to compile it from source but it was impossible. That is: impossible on Windows 2008 R2, 64-bit. The RhodeCode developers say they’re working on making it more compatible with Windows (and for one thing changing the crypto library), and I found that I believe them, so I’ll be coming back to it. (In the meantime, they’ve released a couple of versions with windows-specific bugfixes, it might be worth it to check it out again).

In the vanilla Mercurial instalation there’s a script called hgweb.cgi that contains the full functionality needed to get a server running. A bit of tinkering is needed to make it run inside IIS – and there are a couple of slightly outdated tutorials on how to do this. I found out that the best combination is to download Mercurial for Python - so, no Mercurial or TortoiseHG on the server. This download says in its name for which version of Python it was written, and that version of Python is the second thing needed. Once both are installed, it is sufficient to put the hgweb.cgi in a virtual server in the IIS, add a hgweb.config and a web.config file, configure the IIS (basic authentication and what have ya), and set permissions on the folders – including the repositories. It took less than one day, research included, to get it up and running.

The client tooling seems better than SVN. TortoiseSVN (in fact, TortoiseCVS) was a breakthrough idea – a VCS integrated into windows, allowing you to version-control any folder on your disk. Well, TortoiseHG went one step further and actually improved the user experience. It has its bugs – more than SVN – but also has a lot more features. The whole thing was written in Python and seems to have a good API because a lot of plugins have been written for it, and TortoiseHG includes the most important ones. At this point I had to install TortoiseHG on the server because that’s the only way to get the subversion converter plug-in. The other way would be to install the Python module, but it cannot be done: first of all, they’ll tell you to install Subversion for Python (which is quite simple, there’s a pre-packaged downloadable setup), but when you do and get an error from the convert extension, you’ll find out that you don’t need that package but something called SWiG. But SWiG doesn’t have anything to do with Subversion – you have to download the Subversion package from Subversion site which moved to Apache leaving the Python part behind and the best you can do is find a source somewhere and compile it, but nobody says how it’s done.

On to converting the repositories – for one thing, it’s speed is normal, on any Windows. As fast as Git on XP, maybe even faster. So I was encouraged to do the thing I never even got around to thinking about with Git – and that is splitting the repositories into individual projects. It did take a week – the details of it will be a subject of a future post – but in the end it produced much less pain then Git.

Conclusion

Looking at it now, I think that Mercurial is unfairly underrated, and this seems to be due to the loud chanting of Git proponents. They say Git is blazingly fast – well, if you calculate the average performance on all operating systems, I think that on the average it’s either very slow or Windows users don’t use it at all. On Windows XP, Mercurial is at least as fast as Git, and on newer Windows versions Git is usable only for small repositories. Git is largely badly documented (which is getting better but Mercurial is way ahead – suffice it to say that I understood some Git concepts only when I read the Mercurial documentation). Git tooling, generally speaking, sucks – it was designed to be used in a Unix shell and nowhere else – and on Windows it is total crap. On the other hand, Mercurial tooling on Windows is, after this experience with Git, impressive. My conclusion is that for Git I would have to require special skills when employing programmers - “C# and Bash knowledge required”, how sane is that? Ok, I’m joking but it’s not far from truth: there has to be at least one Git specialist on the team when ordinary developers get stuck. With Mercurial, the usual SVN-like skill level should be enough for all because it’s not that easy to get stuck. And, after all this, I’m inclined to think that the story about Git being so great should be taken with a grain of salt since everything I’ve seen so far from it seems to tell exactly the opposite.

How to fix comment character encoding in TortoiseHG

I imported a couple of repositories from SVN into Mercurial and discovered that characters not present in the standard ASCII table have become mangled in the comments… Or at least they looked mangled in the console output as well as in TortoiseHG – now, the console is not that important, but how to fix this in Tortoise?

I tried searching for a solution on how to modify the import process and found nothing. Tried to add a new comment to the repository with a non-ASCII character and got a Python error (“expected string, QString found”). Some said that I should change my Windows’ default system encoding (which is English(US)), and that solved the problem but I would have liked a simpler solution, since changing the default encoding used to cause other problems in the past. I managed to find a couple of workarounds that solve the problem of console display and involve setting environment variables… Would it work for Tortoise? Actually: it does. The solution is simple: go to (this is on Windows 7) Control Panel – System – Advanced System Settings – Environment Variables, add a new user variable called HGENCODING and set it’s value to either “utf-8” or your code page (mine is “cp1250”). TortoiseHg respects this. There’s a slight difference in the two values, though, because the diff viewer doesn’t really like “utf-8”, it prefers the concrete code page. There may be other components that behave like this, so I suppose that setting the code page is the optimal solution.

A macro to find missing files in Visual Studio Solutions

Or: how to solve the setup project message “ERROR: An error occurred while validating.  HRESULT = '80004005'” It seems that for a large part of features in Visual Studio .Net, the development stops at the point where they are mostly usable and effectively demo-able. The Microsoft people are very eager to show you how easy it is to solve a trivial problem with a couple of clicks, but they are very reserved once something really serious has to be done. A case in point: the Setup/Deployment projects in Visual Studio. (Yeah, the ones, Zero Click Deployment - they make it sound like it reads your thoughts - and the like). If you have a missing file in your solution, and if the file is of a non-critical type (e.g. a resource) so that the solution compiles, you have a big problem because the setup won’t. It will fail with a moronic message “ERROR: An error occurred while validating.  HRESULT = '80004005'”. Which file is missing? Well, if you really really care – go through all files in your projects and check (don’t forget to expand the controls, some RESX child file may be the culprit!) Another thing that can happen is that a reference in one of the projects is bad. For example, you had the project reference another project and you moved the other project out of the solution… The first project doesn’t use anything from it so that the build passes but the setup is not as forgiving. In this case, you need to check all references. If, like me, you have a solution that consists of thirty project and thousands of files, this would mean a major headache. One way to solve it would be to create a new, temporary setup project and add to it one by one all outputs from the original setup. Build after each step and when the error appears you’ll know which project is at fault. Now, if the first is the case (a missing source file), one possible solution is to automate the manual search by using a macro. Below you’ll find one, modified from the example on the excellent MZ Tools site. As for the missing reference, it is somewhat easier to find since there are considerably less references in a solution than project files (you can detect these using the one-by-one method described above). I’m leaving it to you as a TODO: improve this script to detect missing references, post it on your blog and let me know so I can add a link to it.

Option Strict Off
Option Explicit Off
Imports System
Imports EnvDTE
Imports EnvDTE80
Imports EnvDTE90
Imports System.Diagnostics
Imports System.Windows.Forms

Public Module Module2

    Sub FindMissingFiles()

        Dim objProject As EnvDTE.Project

        Try
            If Not DTE.Solution.IsOpen Then
                MessageBox.Show("Please load or create a solution")
            Else
                For Each objProject In DTE.Solution.Projects
                    NavigateProject(objProject)
                Next
            End If
        Catch objException As System.Exception
            MessageBox.Show(objException.ToString)
        End Try

    End Sub

    Private Sub NavigateProject(ByVal objProject As Project)

        Dim objParentProjectItem As ProjectItem

        Try
            objParentProjectItem = objProject.ParentProjectItem
        Catch
        End Try

        NavigateProjectItems(objProject.ProjectItems)

    End Sub

    Private Sub NavigateProjectItems(ByVal colProjectItems As ProjectItems)

        Dim objProjectItem As EnvDTE.ProjectItem

        If Not (colProjectItems Is Nothing) Then
            For Each objProjectItem In colProjectItems
                For i As Integer = 1 To objProjectItem.FileCount
                    Dim fileName As String = objProjectItem.FileNames(i)

                    If fileName <> "" And Not System.IO.File.Exists(fileName) _
                            And Not System.IO.Directory.Exists(fileName) Then
                        MessageBox.Show("File missing: " + fileName)
                    End If
                Next

                If Not (objProjectItem.SubProject Is Nothing) Then
                    ' We navigate recursively because it can be:
                    ' - An Enterprise project in Visual Studio .NET 2002/2003
                    ' - A solution folder in VS 2005
                    NavigateProject(objProjectItem.SubProject)
                Else
                    ' We navigate recursively because it can be:
                    ' - An folder inside a project
                    ' - A project item with nested project items (code-behind files, etc.)
                    NavigateProjectItems(objProjectItem.ProjectItems)
                End If
            Next
        End If

    End Sub

End Module
Subscribe to this RSS feed