
Microsoft Sync Framework: where will it end?

It seems that the Microsoft Sync Framework is being developed in a hurry. It is quite a big task - or at least it should become one if we're to have a serious data distribution framework - so it probably merits some patience, but the first thing sacrificed in such cases is documentation. What we have right now is a piece of software for which it is not easy to figure out how it works - and once you do, there are cases where you're left to your own powers of deduction to figure out how it's supposed to work.

I haven't dug into the framework deeply enough, but I'm digging. I intend to document my findings, even if that means just sketching everything in short sentences. Many things aren't obvious until you start disassembling (and Reflector/ILSpy-ing, of course) the innards of the DLLs - and even then you have to keep notes, because it's not a simple system. I intend to come back to this subject in the posts to come (and get way more technical), so be sure to check back if you're interested.

I cannot say for sure, but given the complexity of the problem the Sync Framework set out to solve, it is commendably (somewhat bravely, even) comprehensive, well thought out - and quite stable for a Microsoft V1. There's a V2 in the air right now, but it's a technology preview and it mostly contains a more mature version of the stuff we've already seen. But even V2 or V3 would only be a small first step: what it currently does - copying database rows back and forth between PCs - is not a mechanism that will one day let us easily build distributed systems. Even the Sync Framework people themselves acknowledge that the biggest obstacle is replication conflicts - irregularities that occur when the same piece of data is changed in multiple locations at the same time. Microsoft cannot help but give us a simplified solution in the form of record-by-record detection and resolution, and this is because the framework is in its very early stages: I don't even know if (or when) it will grow smart enough to handle more serious conflict resolution.
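To make the record-by-record model concrete, here is a minimal sketch of what conflict resolution looks like in code. I'm assuming the V2-preview SqlSyncProvider API from memory, and a scope that has already been provisioned on both databases; the scope name and connection strings are made up.

    using System.Data.SqlClient;
    using Microsoft.Synchronization;
    using Microsoft.Synchronization.Data;
    using Microsoft.Synchronization.Data.SqlServer;

    // Hypothetical connection strings; the "InvoicesScope" scope is assumed
    // to be provisioned on both databases already.
    var localConnection  = new SqlConnection(@"Data Source=.\SQLEXPRESS;Initial Catalog=Shop;Integrated Security=True");
    var remoteConnection = new SqlConnection(@"Data Source=HQ;Initial Catalog=Shop;Integrated Security=True");

    var local  = new SqlSyncProvider("InvoicesScope", localConnection);
    var remote = new SqlSyncProvider("InvoicesScope", remoteConnection);

    // Record-by-record resolution: when the same row was updated on both
    // sides, force the local version through ("last writer wins"). Note
    // that no business rule gets a say here - the decision is made per row.
    local.ApplyChangeFailed += (sender, e) =>
    {
        if (e.Conflict.Type == DbConflictType.LocalUpdateRemoteUpdate)
            e.Action = ApplyAction.RetryWithForceWrite;
    };

    var orchestrator = new SyncOrchestrator
    {
        LocalProvider = local,
        RemoteProvider = remote,
        Direction = SyncDirectionOrder.UploadAndDownload
    };
    orchestrator.Synchronize();

Everything the resolver sees is in e.Conflict - two versions of a single row - which is precisely the limitation discussed next.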

The thing is, record-by-record resolution cannot help you enforce business rules: for example, if your business logic depends on an invoice not containing the same product multiple times, how do you prevent this from happening in a distributed system? Two users working with two different databases can each add a record for a single item, but when the records replicate you get two of them. This really needs to be detected, and not in such a way that would require programming a separate copy of the validation logic just for synchronization issues (which, when you think of it, would have to validate data that was already successfully written to the database… I shudder to think of it). The synchronization framework would really need to somehow integrate with the validation logic: in this respect (and this is probably the biggest issue, but only one of several), the Microsoft Sync Framework is much closer to the start line than to the finish. But at least it's moving...

The IT industry has so far mostly moved on step by step, by implementing better solutions than the existing ones. This is where the Sync Framework will be of most use: it will finally help us start thinking in terms of distributed data. Once there's a working system for data distribution, most people will be interested in having it - and once they do, it will be much easier to persuade them that they need to structure their data and/or applications differently. Hopefully we'll move towards an application design philosophy in which distributed data is a "good thing", just like object-orientedness, layered structure etc. are "good things" today.

There's a good chance CRUD will be one of the things we'll start getting rid of. Because, once you look at it, storing only the current state of the data (which is the essence of the CRUD - Create, Read, Update, Delete - philosophy) is the major factor in causing replication conflicts. If databases stored operations - that is, changes to the data - besides the data itself, it would be much easier to resolve conflicts, many of them automatically. The logic would know what the two users mentioned above did - added the same product to the invoice - and could act on this knowledge. In this concrete example the situation for conflict resolution would be much clearer: the system could replay the operations so that the second one gets a chance to detect that a record is already present and act accordingly - be it to add the second quantity to the first or to raise an error. Note that a common validation logic could now be employed for the operation... This is light years away from getting duplicated rows as a fait accompli and having to play Sherlock Holmes to discover what happened. Of course, it is also light years from where we currently are, but when you think of it, database servers are way overdue for serious feature upgrades - and in any case, they already store something resembling this in their transaction logs.
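To make the idea more concrete, here's a purely hypothetical sketch - none of these types exist in the Sync Framework or anywhere else - of replaying an operation log through the same validation logic that ordinary clients would use:

    using System;
    using System.Collections.Generic;

    // A merged operation log from two sites: both users added product 42
    // to invoice 1 in their own local databases.
    var replicatedOperationLog = new[]
    {
        new AddInvoiceLine(InvoiceId: 1, ProductId: 42, Quantity: 3),  // user A
        new AddInvoiceLine(InvoiceId: 1, ProductId: 42, Quantity: 2),  // user B
    };

    var invoice = new Invoice();
    foreach (var op in replicatedOperationLog)
        invoice.Apply(op);  // replaying the second operation sees the duplicate

    record AddInvoiceLine(int InvoiceId, int ProductId, int Quantity);

    class Invoice
    {
        // productId -> quantity
        private readonly Dictionary<int, int> lines = new Dictionary<int, int>();

        public void Apply(AddInvoiceLine op)
        {
            // The business rule runs during replay, so a duplicate coming from
            // another site is detected instead of silently becoming a second row.
            if (lines.ContainsKey(op.ProductId))
                lines[op.ProductId] += op.Quantity;  // or raise a conflict for a human to resolve
            else
                lines[op.ProductId] = op.Quantity;
        }
    }

The point is that the same Apply method serves both normal data entry and replication, so there's no second copy of the validation logic to maintain.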

So, it seems we're making the first step in the right general direction, even if we're not sure what precise direction we should move in. Trying to wrap our heads around the distributed data philosophy is good – and seeing this practice widely deployed will be even better.

Intro

Welcome to the 8bit blog. This is the first in what we hope will be a series of posts on various subjects related to IT, e-business and application development. We will cover a broad range of topics, some of them highly technical, but for the most part we will try to provide information that is otherwise hard to find, at least in a readable and concise form.

Moving Up from SVN: Experiences in Migrating to Git and Mercurial

(Note: this text is the result of a couple of months of research… Only when I had finished the migration and got to the point where things were running smoothly did I get around to finishing this post. Some of this information may be a bit outdated, but from what I've seen, things haven't moved much in the meantime. Anyway, I added comments in the text where the situation might have changed.)

Now that I think about it, there was one crucial reason to abandon SVN and upgrade to something more powerful: merging. My company now has a lot of parallel development (several customized instances of an application), and SVN's support for this is not as good as it should be – that is, not as good as the competition's. In SVN you can split off a development branch, keep it parallel to the main one, and merge-reintegrate it back. SVN will remember which revisions it merged so that it doesn't try to merge them again (which would otherwise produce merge conflicts). But that seems to be the limit of its capabilities: make the structure a bit more complicated and a merge produces so many conflicts that you might as well do everything manually. Contrast that with a more recent VCS like Git or Mercurial, where the essential feature is the ability to manipulate changesets - stack them one upon another or restructure their connections. If you can merge a set of changes from one branch to another, continue developing both branches, then do a back-merge (even though you shouldn't) without the system producing an enormous number of conflicts, it gives you more power.
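To illustrate the difference with the commands themselves (branch names are made up, and I'm assuming the SVN 1.5/1.6-era merge tracking that was current at the time):

    # SVN: split off a branch, reintegrate it once
    svn copy ^/trunk ^/branches/feature -m "Branch for feature work"
    # ...develop on the branch, then, from a trunk working copy:
    svn merge --reintegrate ^/branches/feature
    svn commit -m "Reintegrate feature"
    # Reusing the branch after reintegration is where the trouble starts.

    # Git: merge, keep developing both branches, merge again - only the
    # changesets added since the previous merge are considered:
    git checkout master
    git merge feature
    # ...more commits on both master and feature...
    git merge feature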

Of course, there are additional advantages that both Git and Mercurial have over SVN, but in our case those were just a bonus, not a necessity. I believe many developers think likewise. SVN is good enough at what it does; the problem arises when you need something it wasn't designed to do.

So which one to choose? It seems that Git, Mercurial and Bazaar are the most popular. People say that of the three, Git is the most powerful because it was written by Linux kernel developers who do complicated merges every day. Ok, so it seemed natural to choose that one.

Git

The first impression I got is that Linux kernel developers are guys who talk among themselves in hexadecimal and to the rest of the world in mnemonics. Git had some documentation, but not nearly enough explanation of how things work. The error messages the tools produced were cryptic – and not only that, they were formatted to be viewed in the console, so when displayed by GUI tools they tend to get messed up. Ok, so Linux kernel developers think GUIs are lame… While we're at it, the error messages are also misleading: I spent a lot of time trying to diagnose what was wrong with my repository because the clone operation kept producing a warning about the remote head pointing to a non-existent revision or branch (and yeah, it was a fatal warning – a new term, at least for me – because the command didn't do anything, yet gave only a warning). It turned out in the end that I was using the wrong URL for the repository – that is, I was trying to clone a nonexistent repository. I agree, in a nonexistent repository the nonexistent head does point to a nonexistent revision, but it's a bit of a roundabout way to say it, isn't it?

But these are things that can – and most probably will – be addressed as the tools mature. Since there are so many Git users, it's surely reliable enough. I don't mind some rough edges; I've been there with SVN and it was worth it (it was much better to cope with occasional bugs in SVN than to start with CVS and then migrate to SVN).

Ok, so how do we get Git to work on a Windows server? The answer that seemed complete enough hinted that I should simulate a Linux environment: install the shell, something like an SSH daemon and everything else. In other words, it's not supposed to run on Windows, but it can. Ok, I tried – there are several variations on the theme, and each one had a glossed-over part that needed to be researched (something like "at this point you need to import the created certificate into PuTTY" – well, PuTTY doesn't want to read this file format). And it didn't help that Git itself doesn't always tell you the real reason something doesn't work, as I already mentioned. Moreover, it was for some reason allergic to empty repositories – unless I committed something locally, it wouldn't work. And the repository easily got screwed up while experimenting – that is, it got into a state where I didn't know what to do with it and it was easier to create a new one.

At this point it was clear that the client tooling also leaves a lot to be desired – there was TortoiseGit, which occasionally locked up on clone/pull (and never displayed the progress of the operation, so you never really knew if it was doing anything), there was Git GUI, which was a bit more stable, and there was Git Bash, which was the most reliable. (One interesting side note: at one point I managed to get three different error messages for doing the same operation – clone – with these three tools.) One thing, though: the Bash in Git Bash is probably the best part of the package. I had almost forgotten the comfort of working in a Unix shell; the Windows command prompt is light years behind it.

I did get the server to work, though, after a week of running around in circles and repeating the same stuff with slight variations. In the end I was able to create a new repository, clone it, make changes, commit, push, pull… Everything worked until I restarted the server. Or so it seemed – when I tried to clone the same repository afterwards, it again started producing weird error messages. I don't know what changed apart from the restart (which shouldn't have affected it – and probably didn't, but what else? It's an isolated environment). If I didn't have the cloned repository with its revision history, I would have doubted I had ever actually succeeded… Ok, so it's also fragile.

Then I tried to find an alternative: there are a couple of PHP scripts that wrap Git server functionality, but they don't seem to work on Windows (I tried them in Apache). There's Bonobo Git Server, which is written in .Net – well, I never looked at it seriously: how is a .Net Git server going to work when the true-to-the-original-idea SSH configuration doesn't? But it does work – it also needs a bit of tinkering (you have to Google-translate a page in Chinese to get the info on how to really do it, WebDAV etc.), but the installation is amazingly painless: it takes a couple of hours, which is nothing compared to the week wasted on the previous experiment.

So, on to the next step: migration. I migrated a test folder from SVN with no trouble. I tried a bigger one, something under 1000 revisions – well, it finished in a couple of days; I suppose that's tolerable. Finally, I tried to convert the central project – and when, after two weeks of non-stop importing, it had managed only 2000 of the repository's 5000 revisions, I gave up. Back to Google: why is it so slow, and how do I speed it up? It turns out that the Git client is developed on Windows XP and should probably work well there. As it did: it converted all 5000 revisions in a couple of hours. Ok, now this I didn't like. How can a serious tool not work on newer Windows versions? They said it's slow because of UAC (introduced in Windows Vista): UAC slows everything down. Well, it's not like Vista was released yesterday. If this problem has existed for years, should I ever expect a solution to appear? More research hinted that Linux kernel programmers think Windows users are lame. So – Git was slow on any Windows newer than XP. TortoiseGit seems to execute the same Git shell commands, so it's the same. I found Git Extensions in the meantime, which is supposed to be independent – but it didn't even handle HTTP repositories.
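For reference, the conversion itself goes through Git's Subversion bridge (I'm assuming git-svn here, since that's the standard tool; the URL is made up). A typical one-shot invocation looks something like this:

    # Convert an SVN repository with the standard trunk/branches/tags layout
    git svn clone --stdlayout http://svnserver/svn/centralproject centralproject-git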

In the meantime, I tried cloning the big repository I had converted – big as in 200 megabytes big – and it was, as expected, slow. But I don't really know which side was to blame here – it seems the Bonobo server choked on the large repository, since it nailed the CPU at 100% and produced around 40 bytes per second for the client (possibly just some kind of sync messages and no data at all). Ok – Bonobo is open source and was built around GitSharp (or something like that), a Git library written in .Net. What if I tried to update the library and compile the server myself? Well – GitSharp is discontinued at version 0.3. They've all gone over to some new library written in C.

Ok, that was enough. After three weeks completely wasted, I gave up on Git.

(Update: Bonobo has since been upgraded to version 1.1. I looked at the change log hoping to see a note about moving away from GitSharp, but it doesn't seem to have happened. So as far as I know, this performance issue may still be present – nevertheless, Bonobo seems the most promising solution for a Windows Git server.)

Mercurial

So? Should I look at Mercurial? The Mercurial supporters seem a bit shy – that is, compared to the Git fanatics who shout GIT! GIT! GIT! GIT! at every possible occasion, there occasionally appears one who says "or, you could try Mercurial".

Well, the first look revealed the difference: I could download everything I needed from a single web page. The basic Mercurial installation – right there, server support included (as a script that is installed in a web server). TortoiseHg, right below it – it even has an ad-hoc server built in! Python extensions – do I need those? So I thought – no, this is too easy. Let's try something unheard of in the Git world – RhodeCode. It's a server implementation that is in fact a full-featured web application. It seems very nice but, due to an incompatibility in a crypto library, very hard to get installed on Windows. It took a lot of workarounds: I ended up installing Visual C++ Express 2008 (it has to be 2008, 2010 doesn't cut it) and another kind of simulated shell environment (MinGW32) to try to get the installer to compile it from source, but it was impossible. That is: impossible on Windows 2008 R2, 64-bit. The RhodeCode developers say they're working on making it more compatible with Windows (and, for one thing, changing the crypto library), and I found that I believe them, so I'll be coming back to it. (In the meantime, they've released a couple of versions with Windows-specific bugfixes; it might be worth checking it out again.)

In the vanilla Mercurial installation there's a script called hgweb.cgi that contains the full functionality needed to get a server running. A bit of tinkering is needed to make it run inside IIS – and there are a couple of slightly outdated tutorials on how to do this. I found out that the best combination is to download Mercurial for Python – so, no standalone Mercurial or TortoiseHg on the server. The download's name states which version of Python it was built for, and that version of Python is the second thing needed. Once both are installed, it is sufficient to put hgweb.cgi in a virtual server in IIS, add an hgweb.config and a web.config file, configure IIS (basic authentication and what have you), and set permissions on the folders – including the repositories. It took less than a day, research included, to get it up and running.
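For illustration, the hgweb.config can be as small as this (the repository folder is made up, and I'm assuming IIS handles the authentication, which is why pushing over plain HTTP is allowed):

    [paths]
    # virtual path -> folder that holds the repositories
    / = C:\Repositories\*

    [web]
    allow_push = *
    push_ssl = false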

The client tooling seems better than SVN's. TortoiseSVN (in fact, TortoiseCVS) was a breakthrough idea – a VCS integrated into Windows, allowing you to version-control any folder on your disk. Well, TortoiseHg went a step further and actually improved the user experience. It has its bugs – more than SVN – but also a lot more features. The whole thing was written in Python and seems to have a good API, because a lot of plugins have been written for it, and TortoiseHg includes the most important ones. At this point I had to install TortoiseHg on the server, because that's the only way to get the Subversion converter plug-in. The other way would be to install the Python module, but that cannot be done: first they'll tell you to install Subversion for Python (which is quite simple, there's a pre-packaged downloadable setup), but when you do and then get an error from the convert extension, you'll find out that you don't need that package but something called SWIG. But SWIG doesn't have anything to do with Subversion – you have to download the Subversion package from the Subversion site, which moved to Apache leaving the Python part behind, so the best you can do is find the source somewhere and compile it, but nobody says how that's done.

On to converting the repositories – for one thing, its speed is normal, on any Windows. As fast as Git on XP, maybe even faster. So I was encouraged to do the thing I never even got around to considering with Git – splitting the repositories into individual projects. It did take a week – the details will be the subject of a future post – but in the end it produced much less pain than Git.
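The splitting is done during conversion, with the bundled convert extension and a filemap; a sketch of what that looks like (repository and project names are made up):

    # Enable the bundled convert extension (in Mercurial.ini / .hgrc):
    #   [extensions]
    #   convert =
    #
    # filemap.txt keeps one project and makes it the new repository root:
    #   include ProjectA
    #   rename ProjectA .

    hg convert --filemap filemap.txt http://svnserver/svn/bigrepo projecta-hg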

Conclusion

Looking at it now, I think that Mercurial is unfairly underrated, and this seems to be due to the loud chanting of the Git proponents. They say Git is blazingly fast – well, if you calculate its average performance across all operating systems, I think that on average it's either very slow or Windows users don't use it at all. On Windows XP, Mercurial is at least as fast as Git, and on newer Windows versions Git is usable only for small repositories. Git is largely badly documented (this is getting better, but Mercurial is way ahead – suffice it to say that I understood some Git concepts only when I read the Mercurial documentation). Git tooling, generally speaking, sucks – it was designed to be used in a Unix shell and nowhere else – and on Windows it is total crap. On the other hand, Mercurial tooling on Windows is, after this experience with Git, impressive. My conclusion is that for Git I would have to require special skills when employing programmers – "C# and Bash knowledge required", how sane is that? Ok, I'm joking, but it's not far from the truth: there has to be at least one Git specialist on the team for when ordinary developers get stuck. With Mercurial, the usual SVN-like skill level should be enough for everyone, because it's not that easy to get stuck. And, after all this, I'm inclined to think that the story about Git being so great should be taken with a grain of salt, since everything I've seen from it so far seems to tell exactly the opposite.

Repository moved temporarily to '/viewvc/nhibernate/trunk/'; please relocate

When trying to switch my local Subversion copy of the NHibernate source (from the 3.1GA tag to trunk, in this case), I got this error:

Repository moved temporarily to '/viewvc/nhibernate/trunk/'; please relocate

The frustrating thing was that I was trying to relocate to exactly this URL. And if I tried others, it said I should relocate to them… I searched the net in vain for the solution; the only information I found was that I should re-configure my Apache server (thanks a bunch!).

The problem is, in fact, simple: the URL is wrong. I thought I could just copy the repository's URL from my web browser, like I do with other sites. Not here: there's a separate URL for direct SVN access. So instead of using this URL:

http://nhibernate.svn.sourceforge.net/viewvc/nhibernate/

use this one:

https://nhibernate.svn.sourceforge.net/svnroot/nhibernate
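Assuming the working copy was originally checked out from the viewvc URL, the relocation itself is then a single command (this is the pre-1.7 syntax, which matches the clients of the time):

    svn switch --relocate http://nhibernate.svn.sourceforge.net/viewvc/nhibernate/ \
                          https://nhibernate.svn.sourceforge.net/svnroot/nhibernate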

It does seem like a simple problem, but the solution wasn't so easy to find.
