Microsoft Sync Framework: where will it end?
- Written by Boris Drajer
- Published in Synchronization Framework
It seems that the Microsoft Sync Framework is being developed in a hurry. It is a quite a big task - or at least it should become big if we're to have a serious data distribution framework - therefore it probably merits some patience, but the first thing that is sacrificed in similar cases is documentation. So what we now have is a piece of software for which it is not easy to figure out how it works, and once you do, there are cases when you’re left to your own power of deduction to figure out how it’s supposed to work.
I haven't dug into the framework deeply enough, but I'm digging. And I have the intention to document the findings, even if it means just sketching everything in short sentences. There are many things not obvious until you start disassembling (and Reflector Ilspy-ing, of course) the innards of the dlls. And even in that case, you have to keep notes because it's not a simple system. I intend to come back to this subject in the posts to come (and get way more technical), be sure to check back if you're interested.
I cannot say for sure, but given the complexity of problem the Sync Framework set out to solve, it is commendably (somewhat bravely, even) comprehensive, well thought out - and quite stable for a Microsoft V1. There's a V2 "on the air" right now, but it's a technology preview and it mostly contains a more mature version of the stuff we've already seen before. But even V2 or V3 would only be the small first step: what it currently does, copying database rows back and forth between PCs is not a mechanism that will allow us to one day easily build distributed systems. Even Sync Framework guys themselves acknowledge that the biggest obstacle is replication conflicts - irregularities that occur when the same piece of data is changed in multiple locations at the same time. Microsoft cannot help but give us a simplified solution in the form of record-by-record detection and resolution, and this is because the framework is in its very early stages: I don't know even if (or when) it will grow smart enough to handle more serious conflict resolution.
The thing is, record-by-record resolution cannot help you enforce business rules: for example, if your business logic depends on an invoice not containing the same product multiple times, how do you prevent this from happening in a distributed system? Two users working with two different databases can each add a record for a single item, but when the records replicate you get two of them. This really needs to be detected, and not in such way that would require programming a separate copy of validation logic for synchronization issues (which, when you think of it, should validate data that was already succesfully written to the database… I shudder to think of it). The synchronization framework would really need to somehow integrate with validation logic: in this aspect (and this is probably the biggest issue but only one of the issues present), Microsoft Sync Framework is much closer to the start line than to the finish. But at least it's moving...
The IT industry has so far moved on mostly in a step by step fashion, by implementing better solutions than the existing ones. This is where the Sync Framework will be of most use, to finally help us start thinking in terms of distributed data. Also, once there's a working system for data distribution, most will be interested in having it. And once they do, it will be much easier to persuade them that they need to structure their data and/or applications differently. Hopefully we'll be moving onto an application design philosophy in which it is a "good thing" to have distributed data just like it currently is a "good thing" to have object-orientedness, layered structure etc. There’s a good chance CRUD will be the one of the things that we'll start getting rid of. Because, once you look at it, storing the current state of data (which is the essence of CRUD - Create, Read, Update, Delete - philosophy) is the major factor in causing replication conflicts. If the databases stored operations - that is, changes to the data – besides data itself, it would be much easier to resolve conflicts, many of them automatically. The logic would know what the two mentioned users did - added the same product to the invoice - and act with this knowledge. In this concrete example there would be a much clearer situation for conflict resolution, the system could replay the operations so that the second one gets a chance to detect there already is a record present and act accordingly - be it to add the second quantity to the first or raise an error. Note that now a common validation logic for the operation could be employed... This is light years away from getting "fait accompli" duplicated rows and having to do a Sherlock Holmes to discover what has happened. Of course, this is also light years from where we currently are, but when you think of it, the database servers are way overdue for serious feature upgrades – and in any case, they already store something that resembles this in transactional logs.
So, it seems we're making the first step in the right general direction, even if we're not sure what precise direction we should move in. Trying to wrap our heads around distributed data philosphy is good – and seeing this practice widely deployed will be even better.