Microsoft .Net (24)

NHibernate 2.1 updates schema metadata without being asked to

In NHibernate 2.1, the session factory is set up to access the database immediately when you build it. This is done by an Hbm2ddl component to update something called SchemaMetaData: I’m not sure what this is all about, but I am certain that such behaviour is not nice. The previous version of NHibernate didn’t do it, so I expect the new one to behave likewise unless I explicitly ask for the change.

The solution for this is to add a line to your hibernate.cfg.xml file that says:

<property name="hbm2ddl.keywords">none</property>

Note that completely omitting this setting will actually enable the feature… Did I already mention I don’t like it? I dislike it so much that I decided not to change the config files but to hard-code the setting as disabled. I use one global method to load the NHibernate configuration, so this is easy. The code looks something like this:

_configuration = new global::NHibernate.Cfg.Configuration();
_configuration.SetProperty("hbm2ddl.keywords", "none");

Sync Framework 2 CTP2 – no SqlExpressClientSyncProvider yet

Ok, I’ve finally gotten around to installing the CTP2 of the Sync framework. Let's see what new and interesting stuff I got with it:

1. A headache.
2. Erm... anything else?

All witticism aside, a lot of details have probably changed, but the main gripe I had still stands: there is no support for hub-and-spoke replication between two SQL servers etc. (Oh, and the designer is still unusable… Two gripes).

As for the first thing, I hoped I was finally going to get rid of my debugged SqlExpressClientSyncProvider demo, but no such luck. It turns out that nothing of the sort is (yet?) included in the sync framework. Judging by a forum post, v2 is due to be released soon, but any questions regarding the Sql Express provider are met with dead silence. It doesn’t seem it will be included this time (9200 downloads of the SqlExpressClientSyncProvider demo are obviously not significant for these guys). You almost literally have to read between the lines: the only information about this CTP was a very sparse announcement, and a somewhat misleading one at that (and since this is the only information you get, any ambiguity can lead you in the wrong direction).

The CTP2 release announcement said:

# New database providers (SqlSyncProvider and SqlCeSyncProvider)
Enable hub-and-spoke and peer-to-peer synchronization for SQL Server, SQL Server Express, and SQL Server Compact.

So, does SqlSyncProvider work in hub-and-spoke scenario? I thought yes. How would you interpret the above sentence?

The truth is that SqlSyncProvider cannot be used as the local sync provider in a SyncAgent (that is, in a hub-and-spoke scenario as I know it) because it is not derived from ClientSyncProvider. The SyncAgent explicitly rejects it and throws a very unhelpful exception whose message is, essentially, “ClientSyncProvider”… Translated, this means: “use the Reflector to see what has happened”, which I did. The code in the SyncAgent looks like this:

public SyncProvider LocalProvider
{
    get { return this._localProvider; }
    set
    {
        ClientSyncProvider provider = value as ClientSyncProvider;
        if ((value != null) && (provider == null))
            throw new InvalidCastException(typeof(ClientSyncProvider).ToString());
        this._localProvider = provider;
    }
}

(Someone was too lazy to write a meaningful error message… How much effort does it take? I know I wouldn’t tolerate this kind of behavior in my company.)
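To make the complaint concrete, here is a minimal sketch of the same check with a message that names both the expected and the actual type. The classes below are simplified stand-ins written for this illustration (SyncAgentSketch and LocalProviderDemo are hypothetical names); they only mirror the inheritance relationship described above and are not the Sync Framework's own code.

```csharp
using System;

// Simplified stand-ins for the Sync Framework types (assumption: only the
// inheritance relationship described in the post is modelled here).
public abstract class SyncProvider { }
public abstract class ClientSyncProvider : SyncProvider { }
public class SqlCeClientSyncProvider : ClientSyncProvider { }
public class SqlSyncProvider : SyncProvider { }   // not a ClientSyncProvider

// Hypothetical class that repeats the decompiled setter's check.
public class SyncAgentSketch
{
    private ClientSyncProvider _localProvider;

    public SyncProvider LocalProvider
    {
        get { return _localProvider; }
        set
        {
            ClientSyncProvider provider = value as ClientSyncProvider;
            if (value != null && provider == null)
            {
                // Same condition as before, but the message explains the problem.
                throw new InvalidCastException(string.Format(
                    "LocalProvider must derive from {0}, but got {1}.",
                    typeof(ClientSyncProvider).Name, value.GetType().Name));
            }
            _localProvider = provider;
        }
    }
}

public static class LocalProviderDemo
{
    public static void Main()
    {
        var agent = new SyncAgentSketch();
        agent.LocalProvider = new SqlCeClientSyncProvider();   // accepted
        try
        {
            agent.LocalProvider = new SqlSyncProvider();       // rejected
        }
        catch (InvalidCastException ex)
        {
            Console.WriteLine(ex.Message);
        }
    }
}
```

With a message like this, the failed assignment would at least tell you that the provider type is the problem, without a trip through a decompiler.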

So there’s no chance for it to work (or I’m somehow using an old version of the SyncAgent). Is there any other way to do a hub-and-spoke sync, without a SyncAgent? I don’t know of one. But the docs for the CTP2 say:

SqlSyncProvider and SqlCeSyncProvider can be used for client-server, peer-to-peer, and mixed topologies, whereas DbServerSyncProvider and SqlCeClientSyncProvider are appropriate only for client-server topologies.

I thought that client-server was the same as hub-and-spoke; now I’m not so sure… In the end, after hours spent researching, I still don't know what to think.

NHibernate queued adds on lazy-load collections

I don’t know if this behaviour is documented (well, yeah, the (N)Hibernate documentation is pretty thin, but it’s improving). I wasn’t fully aware of it, and this caused a bug… I’m posting this in the hope that it may save someone else the time I lost today :).

In our software we use a custom NHibernate collection type that is derived from AbstractPersistentCollection and keeps track of “back references”: that is, references to the record that owns the collection. It does this automatically when an object is added to the collection.

Now, on collections mapped as lazy-loading, Add() operations are allowed even if the collection is not initialized (i.e. the collection just acts as a proxy). When adding an object to a collection in this state, a QueueAdd() method is called that stores the added object in a secondary collection. Once a lazy initialization is performed, this secondary collection is merged into the main one (I believe it’s the DelayedAddAll() method that does this). This can be hard to debug because lazy load is transparently triggered if you just touch the collection with the debugger (provided the session is connected at that moment), and everything gets initialized properly.

Our back reference was initialized at the moment the object was really added into the main collection. But this is not enough: we had to support queued adds, that is, the cases when QueueAdd returns true. The alternative is to disable delayed adds by commenting out the places where QueueAdd is called. I don’t know if this is possible, though there seems to be some code that supports it. We decided to support delayed adds, and it seems to work. The modification looks something like this (this is the PersistentXyz class):

int IList.Add(object value)
{
    if (!QueueAdd(value))
        return ((IList) bag).Add(value);

    // if the add was queued, we must set the back reference explicitly
    if (BackReferenceController != null)
    {
        // ... set the back reference on value here ...
    }
    return -1;
}
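The queued-add mechanism can be modelled outside NHibernate in a few lines. This is only a sketch under simplifying assumptions: Item, LazyBagSketch and QueuedAddDemo are hypothetical names, and the class imitates the behaviour described above rather than reproducing AbstractPersistentCollection. It shows the point of the fix: the back reference is set on both paths, whether the add goes to the main collection or to the queue.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical child entity that points back at the record owning the collection.
public class Item
{
    public object Owner;
}

// Simplified model of a lazy collection; not NHibernate's actual implementation.
public class LazyBagSketch
{
    private readonly object _owner;
    private readonly List<Item> _items = new List<Item>();    // the main collection
    private readonly List<Item> _queued = new List<Item>();   // adds made before initialization
    private bool _initialized;

    public LazyBagSketch(object owner) { _owner = owner; }

    public bool IsInitialized { get { return _initialized; } }
    public int Count { get { return _items.Count; } }

    public void Add(Item item)
    {
        // Set the back reference on every path, queued or not.
        item.Owner = _owner;
        if (_initialized)
            _items.Add(item);
        else
            _queued.Add(item);   // the case in which QueueAdd would return true
    }

    // Simulates lazy initialization: load the stored rows, then merge the queued adds.
    public void Initialize(IEnumerable<Item> loadedFromDatabase)
    {
        foreach (Item item in loadedFromDatabase)
        {
            item.Owner = _owner;
            _items.Add(item);
        }
        _items.AddRange(_queued);
        _queued.Clear();
        _initialized = true;
    }
}

public static class QueuedAddDemo
{
    public static void Main()
    {
        object invoice = new object();
        var bag = new LazyBagSketch(invoice);

        var added = new Item();
        bag.Add(added);   // the collection is still an uninitialized proxy here
        Console.WriteLine(ReferenceEquals(added.Owner, invoice));   // True, even before initialization

        bag.Initialize(new[] { new Item() });
        Console.WriteLine(bag.Count);   // 2: one loaded row plus the queued add
    }
}
```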

Debugging SQL Express Client Sync Provider

How to finally get the SQL Express Client Sync Provider to work correctly? It’s been almost a year since it was released, and it still has documented bugs. One was detected by Microsoft more than a month after release and documented on the forum, but the fix was never included in the released version. We could analyze this kind of shameless negligence in the context of Microsoft's overall quality policies, but it’s a broad (and also well documented) topic, so we’ll leave it at that. It wouldn’t be such a problem if there were no people interested in using it, but there are, very much so. So, what else is there to do than to try to fix what we can ourselves…

You can find the source for the class here. To use it, you may also want to download (if you don’t already have it) the original sql express provider source, which has the solution and project files that I didn’t include. (UPDATE: the original source seems to have been removed from the MSDN site, and my code was updated - see the comments for this post to download the latest version).

The first (and solved, albeit only on the forum) problem was that the provider was reversing the sync direction. This happens because the client provider basically simulates client behavior by internally using a server provider. In hub-and-spoke replication, the distinction between client and server is important since only the client databases keep track of synchronization anchors (that is, remember what was replicated and when). I also incorporated the support for datetime anchors that I proposed in the mentioned forum post, which wasn’t present in the original source.

But that is not all that’s wrong with the provider: it seems that it also swaps client and server anchors, and that is a very serious blunder because it’s very hard to detect. It effectively uses client time/timestamps to detect changes on the server and vice versa. I tested it using datetime anchors, and this is the most dangerous situation because if the client and server clocks aren’t perfectly synchronized, data can be lost. (It might behave differently with timestamps, but I doubt it). The obvious solution for anchors is to also swap them before and after running synchronization. This can be done by modifying the ApplyChanges method like this:

// swap the anchors before applying changes
foreach (SyncTableMetadata metaTable in groupMetadata.TablesMetadata)
{
    SyncAnchor temp = metaTable.LastReceivedAnchor;
    metaTable.LastReceivedAnchor = metaTable.LastSentAnchor;
    metaTable.LastSentAnchor = temp;
}

// this is the original line
SyncContext syncContext = _dbSyncProvider.ApplyChanges(groupMetadata, dataSet, syncSession);

// swap the anchors back afterwards
foreach (SyncTableMetadata metaTable in groupMetadata.TablesMetadata)
{
    SyncAnchor temp = metaTable.LastReceivedAnchor;
    metaTable.LastReceivedAnchor = metaTable.LastSentAnchor;
    metaTable.LastSentAnchor = temp;
}

This seems to correct the anchor confusion, but for some reason the @sync_new_received_anchor parameter still receives an invalid value in the update/insert/delete stage, so it shouldn’t be used. The reason for this could be that both the client and server use the same sync metadata and that the server sync provider posing as client probably doesn’t think it is required to leave valid anchor values after it’s finished. I promise to post some more information in the future that I gathered while poking around the sync framework innards.

Note that this version is by no means fully tested nor without issues, but its basic functionality seems correct. You have to be careful to use @sync_new_anchor only in queries that select changes (either that or modify the provider further to correct this behaviour: I think this can be done by storing and restoring anchors in the metadata during ApplyChanges, but I’m not sure whether this is compatible with the provider's internal logic). Another minor issue I found was that the trace log reports both client and server providers as servers.

If you find and/or fix another issue with the provider, please post a comment here so that we can one day have a fully functional provider.
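Why swapped datetime anchors can lose data is easiest to see with numbers. The following is a small worked example under assumed conditions: the client clock runs five minutes ahead of the server, and changes are selected with a plain "modified after the anchor" comparison. Row, ChangedSince and AnchorSkewDemo are hypothetical names for this sketch, not parts of the provider.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row with a last-modified column, as used with datetime anchors.
public class Row
{
    public string Key;
    public DateTime LastModified;
}

public static class AnchorSkewDemo
{
    // Rows modified after the anchor are the ones selected for download.
    public static List<string> ChangedSince(IEnumerable<Row> rows, DateTime anchor)
    {
        return rows.Where(r => r.LastModified > anchor).Select(r => r.Key).ToList();
    }

    public static void Main()
    {
        // Assumption: the client clock is 5 minutes ahead of the server clock.
        DateTime serverTimeAtLastSync = new DateTime(2009, 1, 1, 12, 0, 0);
        DateTime clientTimeAtLastSync = serverTimeAtLastSync + TimeSpan.FromMinutes(5);

        // A server row modified 2 minutes (server time) after the last sync.
        var serverRows = new List<Row>
        {
            new Row { Key = "invoice-1", LastModified = serverTimeAtLastSync.AddMinutes(2) }
        };

        // Correct anchor (server time): the change is picked up.
        Console.WriteLine(ChangedSince(serverRows, serverTimeAtLastSync).Count);   // 1

        // Swapped anchor (client time): the change is silently skipped.
        Console.WriteLine(ChangedSince(serverRows, clientTimeAtLastSync).Count);   // 0
    }
}
```

Nothing fails in the second case; the row simply never reaches the client, which is what makes this kind of bug so hard to detect.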

Who cares about C# interfaces?

I remember that when the first version of Java was released one of the frequent questions was: why doesn’t it have multiple inheritance? This also happened later with C#. The answer in both cases was: you can use interfaces instead. It is not the same, you don’t inherit the logic, but you can at least simulate it by implementing multiple interfaces.

Well, this was obviously forgotten by the people designing C# classes: even worse, they seem not to have been aware of it because most of the classes in v1.0 onward don’t have equivalent interfaces that would allow this.

There’s a real-life example (or should I say everyday example? Because I’ve encountered it a million times, you probably did too) that perfectly illustrates the point: why isn’t there an IControl interface that represents a (windows, web, whichever) control type? The logical answer (and it was probably the real reason the framework designers came up with) would be: well, it would be too difficult to implement. Anything that is supposed to work as a control should really have to be derived from Control. True: but the purpose of interfaces is not only to implement them.

Let’s say I want to create my control type that implements some interface, IMyInterface. How do I reference this component, using a variable of Control type or IMyInterface? Whichever I use, it isn’t complete: I have to cast it into the other one and I lose compile-time type safety. I could create a ControlWithMyInterface base class that implements the interface and use this instead of both, but then I’d have to inherit everything from it, which in most cases would not be possible. No, the only solution would be to support the “interfaces instead of multiple inheritance” principle and derive IMyInterface from IControl, but IControl doesn’t exist. It wouldn’t be too difficult to implement, though: one could just put all of Control’s public members into the interface.
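A compact sketch of the problem and of the wished-for fix follows. Everything in it is hypothetical: Control is a one-property stand-in for the framework class, and IControl, IMyControl, MyControl and MyControl2 are names made up for the illustration.

```csharp
using System;

// Stand-in for a framework base class such as Control (simplified, hypothetical).
public class Control
{
    public string Name { get; set; }
}

public interface IMyInterface
{
    string Describe();
}

// Today's situation: the type is both a Control and an IMyInterface,
// but no single reference type expresses that.
public class MyControl : Control, IMyInterface
{
    public string Describe() { return "MyControl named " + Name; }
}

// The wished-for interface mirroring Control's public members.
// It does not exist in the framework; it is declared here for illustration.
public interface IControl
{
    string Name { get; set; }
}

public interface IMyControl : IControl, IMyInterface { }

// Control's inherited Name property satisfies IControl.Name.
public class MyControl2 : Control, IMyControl
{
    public string Describe() { return "MyControl2 named " + Name; }
}

public static class InterfaceDemo
{
    public static void Main()
    {
        // Without IControl: a cast that is only checked at run time.
        Control control = new MyControl { Name = "grid" };
        IMyInterface viaCast = (IMyInterface)control;
        Console.WriteLine(viaCast.Describe());

        // With the hypothetical IControl: one reference, both sets of members, no cast.
        IMyControl both = new MyControl2 { Name = "grid" };
        Console.WriteLine(both.Name + ": " + both.Describe());
    }
}
```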

There could really be a programming rule enforcing the existence of an equivalent interface for every class, even something that generates them on the fly from its public members. I wonder if there is a pre-processor that can help with this?

Supporting triggers that occasionally generate values with NHibernate

NHibernate supports fields that are generated by the database, but in a limited way. You can mark a field as generated on insert, on update, or both. In this case, NHibernate doesn’t write the field’s value to the database, but creates a select statement that retrieves its value after update or insert.

Ok, but what if you have a trigger that updates this field in some cases and sometimes doesn’t? For example, you may have a document number that is generated for some types of documents, and set by the user for other types. You cannot do this with NHibernate in a regular way – but there is a workaround…

It is possible to map multiple properties to the same database column. So, if you make a non-generated property that is writable, and a generated read-only property, this works. You have to be careful, though, because the non-generated property’s value won’t be refreshed after database writes.

A more secure solution would be to make one of the properties non-public and implement the other one to support both functionalities. Like this:

// This field is used only to send a value to the database trigger:
// the value set here will be written to the database table and can be consumed by
// the trigger. But it will not be refreshed if the value was changed by the trigger.
private int? setDocumentNumber;	

private int? _documentnumber;

// The public property that works as expected, generated but not read-only
public int? DocumentNumber
{
    get { return _documentnumber; }
    set
    {
        // NHibernate is indifferent to this property's value (it will not
        // be written to the database), so we have to update the setDocumentNumber
        // field which is regularly mapped
        _documentnumber = value;
        setDocumentNumber = value;
    }
}

Here’s the NHibernate mapping for these two:

<property name="DocumentNumber" generated="always" insert="false" update="false"/>
<property name="setDocumentNumber" column="DocumentNumber" access="field"/>

Running Composite UI Application Block inside a windows service

This was a brain-twister: not a lot of work but hard to figure out. How does one use CAB in a windows service?

Is this a reasonable requirement, a Composite UI framework in an application with no UI? Well, it is, since CAB is not only about UI… If you have a framework of components that use CAB services and need to run them in unattended mode, it would be much easier to implement CAB support in the service instead of modifying everything to run with as well as without CAB.

I’m going to present one solution that worked for me, but I believe there are other variations. Since your requirements may vary, I’ll describe the general idea so you can modify or improve it.

Microsoft Sync Framework: where will it end?

It seems that the Microsoft Sync Framework is being developed in a hurry. It is quite a big task - or at least it should become big if we're to have a serious data distribution framework - therefore it probably merits some patience, but the first thing that is sacrificed in similar cases is documentation. So what we now have is a piece of software for which it is not easy to figure out how it works, and once you do, there are cases when you’re left to your own power of deduction to figure out how it’s supposed to work.

I haven't dug into the framework deeply enough, but I'm digging. And I intend to document the findings, even if it means just sketching everything in short sentences. There are many things that are not obvious until you start disassembling (and Reflector- or ILSpy-ing, of course) the innards of the dlls. And even then, you have to keep notes because it's not a simple system. I intend to come back to this subject in the posts to come (and get way more technical), so be sure to check back if you're interested.

I cannot say for sure, but given the complexity of the problem the Sync Framework set out to solve, it is commendably (somewhat bravely, even) comprehensive, well thought out - and quite stable for a Microsoft V1. There's a V2 "on the air" right now, but it's a technology preview and it mostly contains a more mature version of the stuff we've already seen before. But even V2 or V3 would only be the small first step: what it currently does, copying database rows back and forth between PCs, is not a mechanism that will allow us to one day easily build distributed systems. Even the Sync Framework guys themselves acknowledge that the biggest obstacle is replication conflicts - irregularities that occur when the same piece of data is changed in multiple locations at the same time. Microsoft cannot help but give us a simplified solution in the form of record-by-record detection and resolution, and this is because the framework is in its very early stages: I don't even know if (or when) it will grow smart enough to handle more serious conflict resolution.

The thing is, record-by-record resolution cannot help you enforce business rules: for example, if your business logic depends on an invoice not containing the same product multiple times, how do you prevent this from happening in a distributed system? Two users working with two different databases can each add a record for a single item, but when the records replicate you get two of them. This really needs to be detected, and not in a way that would require programming a separate copy of validation logic for synchronization issues (which, when you think of it, would have to validate data that was already successfully written to the database… I shudder to think of it). The synchronization framework would really need to somehow integrate with validation logic: in this aspect (and this is probably the biggest issue, but only one of the issues present), Microsoft Sync Framework is much closer to the start line than to the finish. But at least it's moving...

The IT industry has so far moved on mostly in a step by step fashion, by implementing better solutions than the existing ones. This is where the Sync Framework will be of most use, to finally help us start thinking in terms of distributed data. Also, once there's a working system for data distribution, most will be interested in having it. And once they do, it will be much easier to persuade them that they need to structure their data and/or applications differently. Hopefully we'll be moving onto an application design philosophy in which it is a "good thing" to have distributed data just like it currently is a "good thing" to have object-orientedness, layered structure etc.

There’s a good chance CRUD will be one of the things that we'll start getting rid of. Because, once you look at it, storing the current state of data (which is the essence of the CRUD - Create, Read, Update, Delete - philosophy) is the major factor in causing replication conflicts. If the databases stored operations - that is, changes to the data - besides the data itself, it would be much easier to resolve conflicts, many of them automatically. The logic would know what the two mentioned users did - added the same product to the invoice - and act with this knowledge. In this concrete example there would be a much clearer situation for conflict resolution: the system could replay the operations so that the second one gets a chance to detect that there already is a record present and act accordingly - be it to add the second quantity to the first or raise an error. Note that now a common validation logic for the operation could be employed... This is light years away from getting "fait accompli" duplicated rows and having to do a Sherlock Holmes to discover what has happened.

Of course, this is also light years from where we currently are, but when you think of it, the database servers are way overdue for serious feature upgrades - and in any case, they already store something that resembles this in transactional logs.
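The difference between replicating rows and replaying operations can be sketched with the invoice example. This is only an illustration under assumptions: InvoiceLine, AddProductOperation, Invoice and ReplayDemo are hypothetical types, and adding the quantities together is just one of the two policies mentioned above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class InvoiceLine
{
    public string Product;
    public int Quantity;
}

// A recorded operation rather than a resulting row (hypothetical shape).
public class AddProductOperation
{
    public string Product;
    public int Quantity;
}

public class Invoice
{
    public readonly List<InvoiceLine> Lines = new List<InvoiceLine>();

    // One piece of validation logic, shared by local edits and by replay.
    public void Apply(AddProductOperation op)
    {
        InvoiceLine existing = Lines.FirstOrDefault(l => l.Product == op.Product);
        if (existing != null)
            existing.Quantity += op.Quantity;   // one possible policy; raising an error is another
        else
            Lines.Add(new InvoiceLine { Product = op.Product, Quantity = op.Quantity });
    }
}

public static class ReplayDemo
{
    public static void Main()
    {
        // State-based replication: each database ships its finished rows.
        var rowsFromA = new List<InvoiceLine> { new InvoiceLine { Product = "widget", Quantity = 2 } };
        var rowsFromB = new List<InvoiceLine> { new InvoiceLine { Product = "widget", Quantity = 3 } };
        List<InvoiceLine> mergedRows = rowsFromA.Concat(rowsFromB).ToList();
        Console.WriteLine(mergedRows.Count);   // 2: the same product appears twice

        // Operation-based replication: replay what each user did.
        var invoice = new Invoice();
        invoice.Apply(new AddProductOperation { Product = "widget", Quantity = 2 });
        invoice.Apply(new AddProductOperation { Product = "widget", Quantity = 3 });
        Console.WriteLine(invoice.Lines.Count);         // 1
        Console.WriteLine(invoice.Lines[0].Quantity);   // 5
    }
}
```

In the second half the business rule is enforced by the same Apply method a local edit would use, so no separate copy of the validation logic is needed for synchronization.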

So, it seems we're making the first step in the right general direction, even if we're not sure what precise direction we should move in. Trying to wrap our heads around distributed data philosophy is good - and seeing this practice widely deployed will be even better.
