
Visual Studio 2010 cannot reference ManagedDTS dll from SQL Server 2005

A C# project that worked in Visual Studio 2008 starts, once converted to Visual Studio 2010, complaining that it cannot find classes defined in Microsoft.SQLServer.ManagedDTS.dll and other assemblies that ship with SQL Server 2005. If you remove the reference and add it again, the errors disappear in the editor, but they come back when you compile the solution. At the end of the jumble of compiler errors there is a small warning that betrays the cause:

warning MSB3258: The primary reference "Microsoft.SQLServer.ManagedDTS, Version=9.0.242.0, Culture=neutral, PublicKeyToken=89845dcd8080cc91, processorArchitecture=MSIL" could not be resolved because it has an indirect dependency on the .NET Framework assembly "mscorlib, Version=2.0.3600.0, Culture=neutral, PublicKeyToken=b77a5c561934e089" which has a higher version "2.0.3600.0" than the version "2.0.0.0" in the current target framework.

The problem lies in Microsoft.SQLServer.msxml6_interop.dll, which references the beta version of the .Net framework 2.0. Yes, even after installing three service packs – and worse still, it stays there even if you install SQL Server 2008. Why? Apparently, there’s a newer msxml6_interop dll with this reference fixed, but unfortunately it has the same version as the old one, so it doesn’t replace it in the GAC. Talk about eliminating DLL hell.

But that’s not all: you cannot simply find the new dll and replace it in the GAC. The old one cannot be removed because it’s referenced by the Windows Installer. You have to use brute force, something like this: open the command prompt and find the real path to the assembly on disk. (You cannot do this from Windows Explorer, because it replaces the real GAC folder structure with a conceptual, flat view.) So, CD to c:\Windows\Assembly and find the folder called Microsoft.SqlServer.msxml6_interop. In it, there will be another folder called something like 6.0.0.0__89845dcd8080cc91, and in it the dll we’ve been looking for. On my computer, the full path is

c:\windows\assembly\GAC_MSIL\Microsoft.SqlServer.msxml6_interop\6.0.0.0__89845dcd8080cc91

Ok, now you should be able to manipulate the dll directly and replace it with the new one. What I like to do in these cases is SUBST the folder to make it accessible from Windows Explorer. Type something like this -

SUBST x: c:\windows\assembly\GAC_MSIL\Microsoft.SqlServer.msxml6_interop\6.0.0.0__89845dcd8080cc91

- and you will be able to see the folder in Windows Explorer as a separate volume X:. From here you can delete the existing file and copy over the newer one. You can find the new one only if you have a machine where SQL Server 2008 was installed first – it’s in the same (or a similar) place in the GAC. I used the same command prompt trick to get to the file. (Note that I did everything as administrator; you might have to employ additional tricks to work around security.)

Here’s a more detailed description with other possible solutions:

http://blogs.msdn.com/b/jason_howell/archive/2010/08/18/visual-studio-2010-solution-build-process-give-a-warning-about-indirect-dependency-on-the-net-framework-assembly-due-to-ssis-references.aspx

How to fix CAB to support dependencies across class hierarchy

The Composite UI Application Block’s Object Builder doesn’t support dependencies for same-named properties at different levels in the class hierarchy. If you add a dependency property which has the same name as a property in a base or derived class, only one of them will be initialized.
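To make the failure concrete, here is a hypothetical illustration (the class names are mine, and I’m assuming the usual CAB [ServiceDependency] attribute): two levels of a view hierarchy declare a dependency property with the same name, and only one of the two setters ever gets injected.

using Microsoft.Practices.CompositeUI;

// Hypothetical example of the unsupported case: both levels declare "Context",
// but only one of these setters will actually be called by ObjectBuilder.
public class BaseView
{
    [ServiceDependency]
    public WorkItem Context { get; set; }
}

public class DetailView : BaseView
{
    [ServiceDependency]
    public new WorkItem Context { get; set; }   // hides BaseView.Context
}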

The reason for this is probably that the mechanism is based on the Type.GetProperties() method. This method doesn’t return all of the properties the class (and its base classes) contain – rather, it employs a “hide by name and signature” convention and returns only the topmost properties. So the first step is to eliminate the GetProperties call. We do this by modifying the GetMembers() method of the PropertyReflectionStrategy (located in ObjectBuilder/Strategies/Property). It should look like this:

protected override IEnumerable<IReflectionMemberInfo<PropertyInfo>> GetMembers(IBuilderContext context, Type typeToBuild, object existing, string idToBuild)
{
    foreach (PropertyInfo propInfo in GetPropertiesFlattened(typeToBuild))
        yield return new PropertyReflectionMemberInfo(propInfo);
}

private IEnumerable<PropertyInfo> GetPropertiesFlattened(Type typeToBuild)
{
    for (Type t = typeToBuild; t != null; t = t.BaseType)
    {
        foreach (var pi in t.GetProperties(BindingFlags.Public | BindingFlags.Instance | BindingFlags.DeclaredOnly)) // get only properties in this class
        {
            yield return pi;
        }
    }
}

The next problem arises because the PropertyReflectionStrategy keeps a dictionary of existing properties. It’s indexed by property name, which would eliminate our duplicate properties. We have to change it to use the full path - property name prefixed by class name and namespace. I did this by adding a property called FullName to the IReflectionMemberInfo and ReflectionMemberInfo (found in ObjectBuilder/Strategies).

In IReflectionMemberInfo, add:

string FullName { get; }

In ReflectionMemberInfo, add:

public string FullName
{
    get { return memberInfo.DeclaringType.FullName + "." + memberInfo.Name; }
}

There’s a PropertyReflectionMemberInfo class embedded in the PropertyReflectionStrategy; we have to add a similar property to it:

public string FullName
{
    get { return prop.DeclaringType.FullName + "." + prop.Name; }
}

Ok – next, in the PropertyReflectionStrategy we rewire the dictionary to use this new property. Go to the AddParametersToPolicy method and change this -

if (!result.Properties.ContainsKey(member.Name))
    result.Properties.Add(member.Name, new PropertySetterInfo(member.MemberInfo, parameter));

- to this -

if (!result.Properties.ContainsKey(member.FullName))
    result.Properties.Add(member.FullName, new PropertySetterInfo(member.MemberInfo, parameter));

One last glitch to fix: go to CompositeUI/WorkItem class, and in the BuildUp() method change this -

propPolicy.Properties.Add("Parent", new PropertySetterInfo("Parent", new ValueParameter(typeof(WorkItem), null)));

- to this -

propPolicy.Properties.Add("Microsoft.Practices.CompositeUI.WorkItem.Parent", new PropertySetterInfo("Parent", new ValueParameter(typeof(WorkItem), null)));

Without this modification, the root WorkItem would have its Parent property reference itself, and it would not be recognized as the root WorkItem because its Parent property is not null. As a consequence, some initialization methods would not get called and almost nothing would work.

How to make LINQ to NHibernate eager-load joined properties like the Criteria API

In terms of the “laziness” of a property, there are currently three different ways in which it can be mapped in NHibernate:

  • lazy=”false” means that it’s not lazy at all – the property’s content will be loaded along with its owner object. This means additional data is always loaded when you load an object, and may mean additional SQL queries, too.
  • lazy=”proxy” means that the object contained in the property is loaded when any of its public members is accessed. The property will contain an instance of a proxy, which is an object that knows how to initialize itself on access. It performs the initialization not by loading its own properties but by loading an instance of the real object and redirecting its properties and methods to it. This is why everything on a class that is to be proxied needs to be virtual: the proxy is an instance of a dynamically generated class derived from it, with every public member overridden to support lazy loading.
  • lazy=”no-proxy” means that the property is lazy-loaded, but without a proxy. From what I’ve seen, here the lazy property itself is manipulated on the owner object so that it facilitates on-access loading. In any case, there’s no proxy and no duplicate instances. This feature is currently (in v3.0.0) buggy and seems to work the same as the first option (lazy=”false”), just as it did in 2.0 when it was unsupported.

Each of the options has its bad sides: with proxies, you get duplicate objects and must make everything virtual on your data classes. With the non-lazy option, for each eager property a join is usually added to the SQL query so that the property’s values are loaded at the same time, and the same goes for the property’s properties, etc. Even worse, HQL queries don’t respect this joining method: they load the main table in one SQL query and then execute an additional query or two (or dozen) for each record to collect its related data – this is called the “N+1 selects” problem. Needless to say, using HQL for such queries is madness: in these cases, it is best to switch to the Criteria API, which does the joins properly.
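For reference, join-fetching with the Criteria API looks roughly like this (a sketch – Person and Address are illustrative names, not taken from any particular model):

// eager-load the Address association in the same SQL query as the Person rows
var people = session.CreateCriteria<Person>()
    .SetFetchMode("Address", NHibernate.FetchMode.Join)
    .Add(NHibernate.Criterion.Restrictions.Like("LastName", "A%"))
    .List<Person>();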

And the bad sides of the no-proxy option? It doesn’t work… Other than that, it seems the perfect solution: no joins, no duplicate objects. If you ask me, I don’t want my data to be loaded on-demand at all. I don’t want the application to decide when it will load its data: if I fill a datagrid with one hundred objects and then the grid triggers lazy-loading on each of the objects in turn, this will create chaos. No, I want the application to break if it accesses data that was not explicitly loaded. But with the current implementation I have no choice: it’s either eager or proxy, and I’m choosing eager, for better or worse.

How about LINQ queries? In 2.x it was implemented over the Criteria API which means it knew how to join-load additional records. Not so in 3.x: now it behaves like HQL, N+1 selects all over the place.

So, what is there to do? It seems the only option left is to write all queries with explicit fetch statements for every non-lazy property… This would definitely solve the execution performance issue, but development performance would suffer: if I add a new non-lazy property, I’d have to rewrite all queries where it appears.
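With the NHibernate 3 LINQ provider, those explicit fetch statements look roughly like this (a sketch – Person, Address and Company are again illustrative names):

using NHibernate.Linq;  // Query<T>() and the Fetch()/ThenFetch() extensions

var people = session.Query<Person>()
    .Where(p => p.LastName.StartsWith("A"))
    .Fetch(p => p.Address)     // join-load Address together with the Person rows
    .Fetch(p => p.Company)     // ...and so on, by hand, for every eager property
    .ToList();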

Ok, but if we’re using LINQ, it’s a dynamic query, right? It can be modified to include all required fetches. After some research, it turns out that there’s a solution that (at least on the outside) even looks elegant: use an extension method to do this. So, you would do something like:

session.Query<Person>().Where(…).EagerFetchAll()

or:

(from p in session.Query<Person>() where … select …).EagerFetchAll()

Here’s one way this method could be implemented. Note that this is a somewhat hacked implementation and that there are probably some unsupported cases – one thing that is suspicious to me is that the Criteria API joined the eager properties recursively while only the first level is covered here, so be careful. But it’s a good start... Preliminary tests were very promising ;).

public static IQueryable<TOriginating> EagerFetchAll<TOriginating>
  (this IQueryable<TOriginating> query)
{
  // hack the session reference out of the provider - or is
  // there a better way to do this?
  ISession session = (ISession)typeof(NhQueryProvider)
    .GetField("_session", System.Reflection.BindingFlags.Instance
      | System.Reflection.BindingFlags.NonPublic)
    .GetValue(query.Provider);

  IClassMetadata metaData = session.SessionFactory
    .GetClassMetadata(typeof(TOriginating));

  for(int i = 0; i < metaData.PropertyNames.Length; i++)
  {
    global::NHibernate.Type.IType propType = metaData.PropertyTypes[i];

    // get eagerly mapped associations to other entities
    if (propType.IsAssociationType && propType.IsEntityType
      && !metaData.PropertyLaziness[i])
    {
      ParameterExpression par = Expression.Parameter(typeof(TOriginating), "p");

      Expression propExp = Expression.Property(par, metaData.PropertyNames[i]);

      Expression callExpr = Expression.Call(null,
        typeof(EagerFetchingExtensionMethods).GetMethod("Fetch")
          .MakeGenericMethod(typeof(TOriginating), propType.ReturnedClass),
        // first parameter is the query, second is property access expression
        query.Expression, Expression.Lambda(propExp, par)
      );

      LambdaExpression expr = Expression.Lambda(callExpr, par);

      Type fetchGenericType = typeof(NhFetchRequest<,>)
        .MakeGenericType(typeof(TOriginating), propType.ReturnedClass);
      query = (IQueryable<TOriginating>)Activator.CreateInstance
        (fetchGenericType, query.Provider, callExpr);
    }
  }

  return query;
}

Lazy developer exception: “Operation is not valid due to the current state of the object.”

If you ever encountered this exception and wondered what it means, here’s the answer: it doesn’t mean anything. This is the default message text of the InvalidOperationException. If an invalid operation exception contains this text, someone was too lazy to supply a real error message when throwing an exception and did just “throw new InvalidOperationException()”.
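For illustration, the difference is just one constructor argument (the message below is invented, not taken from any real library):

// the lazy version - the caller only gets the generic default text,
// "Operation is not valid due to the current state of the object."
throw new InvalidOperationException();

// one string literal later, the exception actually says what went wrong
throw new InvalidOperationException(
    "Cannot apply changes: the file sync provider has not been initialized.");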

I wasted hours trying to figure out why the Sync Framework’s FileSyncProvider keeps complaining about the state of an object, which object’s state this is etc. only to find out that someone at Microsoft was loath to spend one minute to at least write a meaningful message. Hopefully, this post will save someone else’s time.

An update to “Remote File Sync using WCF and MSF”

This is a follow-up to Bryant Likes’ post where he gave a prototype solution for file synchronization over WCF. I converted the code to Microsoft Sync Framework 2.0, so now it compiles and seems to run well enough. But keep in mind that it isn’t a complete example (it wasn’t in the original code either): it only does upload sync, it doesn’t have any conflict resolution logic, etc. In my opinion this is not really worth pursuing any further, because one would need to develop two complete custom providers – and all that (just for copying files?) when there’s an existing FileSyncProvider in the framework which knows how to cooperate with other providers and so should be able to communicate over WCF… On the other hand, if you do pull off the heroic act of completing this code, please let me know, because I’m (obviously) very interested in it. I tried to keep the modified code as close to the original as possible so that a simple diff (e.g. WinMerge) can show what I’ve done, because I’m not sure I got it all right (as, I’m afraid, Bryant wasn’t either).

Here’s the complete solution: http://www.8bit.rs/download/samples/RemoteSync converted to MSF 2.0.zip

Rewriting LINQ expressions

In this post I’m going to show a real-life example of analyzing LINQ expressions and converting them into other LINQ expressions on the fly. The real-life example will be making an expression that retrieves individual elements from a path expression: for example, given an expression like “a.PropertyB.PropertyC.PropertyD”, we will create another expression that retrieves all objects on the path, that is, a, a.PropertyB, a.PropertyB.PropertyC, etc. To be more specific, the concrete application for this would be handling a PropertyChanged event on a path: let’s say that, given the “a” value from the above example, we need to be notified when the “PropertyD” property is changed on the object pointed to by the “a.PropertyB.PropertyC” expression. But at any given moment, any of the objects along the path can be null or can be replaced: in that case, we need to wait until one of the properties gets changed (that is, subscribe to the PropertyChanged event on its parent in the path) and then re-attach our PropertyChanged event handler(s). We could solve this problem by describing the path with a string and using reflection, but LINQ gives us not only compile-time safety (if we get something wrong in the path expression, we get a compile error instead of a runtime exception) but also the ability to compile the lambda expression and make it run faster…

Ok, the first step is a brain-twister: we need to construct a LINQ expression. What would it look like? We need to convert

a.PropertyB.PropertyC.PropertyD

into something returning non-null elements in the path so we can attach to their PropertyChanged handlers. Now, one thing is sure here: we don’t have to build the whole expression in LINQ. It is perfectly sensible to make our own utility methods – this way, at least that part gets compiled by the C# compiler, which would probably make it run faster and reduce the amount of work done by the runtime LINQ compiler.

The troublesome detail here is that we cannot call a.PropertyB.PropertyC if a.PropertyB is null. That one should be skipped – but we cannot (yet?) write procedural code inside LINQ expressions, so the best we can do is use the IIF construct – i.e. the “?” and “:” operators in C#. If we could generate an array of values like this -

a, (a == null) ? null : a.PropertyB, (a == null) ? null : ((a.PropertyB == null) ? null : a.PropertyB.PropertyC), …

- then we could possibly feed it to some hardcoded method to extract non-null values and do the rest of the processing. It is easy to call a method from LINQ, we can just use a Call expression. Here’s a snippet of code that would produce something of this sort:

static void Test1()
{
	// example of the input expression
	Expression<Func<ClassA, object>> fn = (a => a.PropA.PropB);

	// example of the output expression
	Expression<Func<ClassA, IList<object>>> res =
	(
		a => Process(a, ((a == null) ? null : a.PropA), 
		((a == null) ? null : ((a.PropA == null) ? null : a.PropA.PropB)))
	);


	// arguments to our method call
	List<Expression> callArgs = new List<Expression>();

	// the first argument will be the parameter: but we also need to use
	// this parameter on the main expression itself
	ParameterExpression paramExpr1 = Expression.Parameter(typeof(ClassA), 
		"a");
	callArgs.Add(paramExpr1);

	// this is the conditional expression
	Expression conditionalExpr1 =
		Expression.Condition
		(
			Expression.Equal(paramExpr1, 
				Expression.Constant(null, typeof(ClassA))),
			Expression.Constant(null, typeof(ClassB)),
			Expression.Property(paramExpr1, "PropA")
		);
		
	callArgs.Add(conditionalExpr1);

	// method call expression: the method signature is below in the source
	Expression callExpr = Expression.Call(null, 
		typeof(Program).GetMethod("Process"), 
		Expression.NewArrayInit(typeof(object), callArgs));

	// and our final expression
	Expression<Func<ClassA, IList<object>>> final =
		Expression.Lambda<Func<ClassA, IList<object>>>
		(callExpr, paramExpr1);

	// expression can be compiled to run faster
	Func<ClassA, IList<object>> compiledFunc = final.Compile();

	// a couple of examples of usage
	ClassA a1 = new ClassA();
	ClassA a2 = new ClassA() { PropA = new ClassB() };

	object ret1 = compiledFunc(a1);
	object ret2 = compiledFunc(a2);

	// note that the same thing can be done with a non-generic lambda, but 
	// that may mislead you into using DynamicInvoke, which is slower
	LambdaExpression lambda = Expression.Lambda(callExpr, paramExpr1);

	// The slow version can also be compiled, produces a delegate - 
	// which is the same as compiledFunc but has to be cast into Func<> 
	Delegate del = lambda.Compile();

	// this is slower than directly calling Func, although it's calling 
	// the same compiled code
	object ret3 = del.DynamicInvoke(a2);

	// it will be faster to cast it to Func<> and then call it directly
	Func<ClassA, IList<object>> castDelegate = (Func<ClassA,
		IList<object>>)del;

	// this is as fast as compiledFunc
	object ret4 = castDelegate(a2);
}

public static IList<object> Process(params object[] objs)
{
	return new List<object>(objs);
}

This source is very sketchy - deliberately so, since its idea is just to illustrate the principle (I think that way it is more useful if you need to do something similar but slightly different). Moreover, if you don't really need this exact solution, the best way to continue is to write an example expression, compile it and then decompile it with Reflector or ILSpy: the compiler produces code for your expression that is exactly the same as the code required to build it dynamically. For the reverse operation, analyzing an existing expression, a very useful tool is the Expression Tree Visualizer - it's somewhere in the Visual Studio/Samples folder and needs to be compiled. Once you copy it to the My Documents\Visual Studio whatever\Visualizers folder, you can view expression trees inside Quick Watch.

One word about performance: I ran a couple of ad-hoc tests using a simple property accessor expression – not a really serious test, but it does show the orders of magnitude we’re dealing with. The speed of the compiled Func is comparable to the speed of compiled C# code. When I try the same operation using reflection, it gets around ten thousand times slower (note that this includes calling GetType().GetProperty() each time; optimizing this away improves the speed by about 20%). DynamicInvoke has similar performance – but this is because there’s only one operation in the expression itself; it would be safe to expect that the overhead of DynamicInvoke doesn’t increase with expression complexity, while the overhead of using reflection would.

The biggest resource hog here is expression compilation, which is one million times slower than compiled execution – that means a hundred times slower than reflection. Not that any of the tests were noticeably slow – it is a simple expression, and even so it performs a thousand compiles in less than a second, which is decidedly not bad.
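For the curious, the ad-hoc test was roughly of this shape (a sketch that reuses the compiledFunc, del and a2 variables from the listing above; the iteration count is arbitrary):

var sw = System.Diagnostics.Stopwatch.StartNew();
for (int i = 0; i < 1000000; i++)
	compiledFunc(a2);          // compiled expression - comparable to plain C#
Console.WriteLine("compiled Func<>: " + sw.ElapsedMilliseconds + " ms");

sw.Restart();
for (int i = 0; i < 1000000; i++)
	del.DynamicInvoke(a2);     // same compiled code, invoked dynamically
Console.WriteLine("DynamicInvoke: " + sw.ElapsedMilliseconds + " ms");

sw.Restart();
var prop = typeof(ClassA).GetProperty("PropA");
for (int i = 0; i < 1000000; i++)
	prop.GetValue(a2, null);   // reflection, with the PropertyInfo lookup hoisted out
Console.WriteLine("reflection: " + sw.ElapsedMilliseconds + " ms");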

 

So we now have an idea how to build the output expression: the next step is to analyze the input expression. This is not so simple because the LINQ expression tree elements don’t have anything resembling a tidy class hierarchy (even the “mostly decent” DOM API is a space shuttle compared to it): because of this it seems that any expression type that can possibly appear in the expression tree should be special-cased in our logic. Luckily, we limit our ambition to property-referencing expressions only. In LINQ expression speak, this means we have a series of chained MemberExpressions pointing backwards to a ParameterExpression. So, an expression like “a => a.PropB.PropC” would have an expression tree like this:

MemberExpression
(
	Member = {PropertyInfo pointing to the PropC property}
	Expression = MemberExpression
	(
		Member = {PropertyInfo pointing to the PropB property}
		Expression = {ParameterExpression for parameter a}
	)
)

This should be fairly simple, all we need to do is get the MemberExpression contained in the Body of the root expression, then recursively run through all chained MemberExpressions and stop when we reach the ParameterExpression – the parameter we can copy into our own rewritten expression.

There is at least one small catch here – this is the only one I discovered, there may be more: the compiler may insert a conversion expression at the root of the expression tree if we use nullable types (for what reason, I can’t say, possibly value boxing?) It is represented as a UnaryExpression. In this example, we’ll simply skip over it (just use its Operand property which is a MemberExpression), but I’m quite sure this example is oversimplified and there could be more special cases that need to be handled. (Like, for example, casting, which could be quite legal – even necessary – in expressions like these).
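A quick way to reproduce such a node (a sketch; SomeNumber stands for any value-type property and is an invented name): declare the lambda as returning object and end the path with a value-type property – the compiler then wraps the member access in a boxing conversion.

// the declared return type is object and the last member is a value type,
// so the compiler puts a Convert (boxing) node at the root of the tree
Expression<Func<ClassA, object>> expr = a => a.PropA.SomeNumber;

Console.WriteLine(expr.Body.NodeType);                             // Convert
Console.WriteLine(((UnaryExpression)expr.Body).Operand.NodeType);  // MemberAccess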

Ok, on to the example... This is an excerpt from working code where PathExpression is the LINQ expression we want to process.

 

Stack<MemberExpression> expressionStack = new Stack<MemberExpression>();

Expression exp = PathExpression.Body; 

while(true)
{
	if (exp is MemberExpression)
	{
		expressionStack.Push((MemberExpression)exp);
		exp = ((MemberExpression)exp).Expression;
	}
	else if (exp is UnaryExpression 
		&& ((UnaryExpression)exp).NodeType == ExpressionType.Convert)
	{
		// skip convert nodes (there could be one at the beginning of the 
		// expression for some reason if we use nullable properties)
		exp = ((UnaryExpression)exp).Operand;
	}
	else if (exp == null || exp is ParameterExpression)
	{
		break;
	}
	else // exp != null but it's not a member or parameter expression
	{
		throw new InvalidOperationException("Unsupported expression type: " 
		+ exp.NodeType + ". Only member access expressions are supported.");
	}
}

ParameterExpression inputParamExpression = null;
Expression previousExpression = null;

// arguments to the method call
List<Expression> callArgs = new List<Expression>();

// the first one should point to the parameter

MemberExpression firstMe = expressionStack.Peek();
if (!(firstMe.Expression is ParameterExpression))
{
	throw new InvalidOperationException("The first expression element 
	doesn't reference an input parameter. The expression should be like 
	'x.PropA.PropB.PropC' where x is an input parameter.");
}

inputParamExpression = (ParameterExpression)firstMe.Expression;
callArgs.Add(inputParamExpression);
previousExpression = inputParamExpression;

List<string> propertyNames = new List<string>();

// now unwrap the expression: we want to build an expression like
//Expression<Func<ClassA, IList<object>>> res =
//(
//    x => Process(x, ((x == null) ? null : x.PropA), ((x == null) ? null 
//	: ((x.PropA == null) ? null : x.PropA.PropB)))
//);
while(expressionStack.Count > 0)
{
	MemberExpression me = expressionStack.Pop(); 

	// skip the last property in the expression: 
	// we don't need its value because
	// we won't attach to its PropertyChanged event
	if (expressionStack.Count >= 1)
	{
		Expression conditionalExpression =
			Expression.Condition
			(
			// in each step we reference the previous expression
				Expression.Equal(previousExpression, 
					Expression.Constant(null, 
						previousExpression.Type)),
				Expression.Constant(null, 
					((PropertyInfo)me.Member).PropertyType),
				Expression.Property(previousExpression, 
					(PropertyInfo)me.Member)
			);

		callArgs.Add(conditionalExpression);
		previousExpression = conditionalExpression;
	}

	propertyNames.Add(((PropertyInfo)me.Member).Name);
}

ParameterExpression thisExpression = 
  Expression.Parameter(typeof(PropertyChangedOnPathWrapper<T>), "this");

Expression callExpr = Expression.Call(thisExpression,
  typeof(PropertyChangedOnPathWrapper<T>).GetMethod
  ("Process", BindingFlags.Instance | BindingFlags.NonPublic),
	Expression.NewArrayInit(typeof(object), callArgs));

Expression<Action<T, PropertyChangedOnPathWrapper<T>>> finalExpr = 
  Expression.Lambda<Action<T, PropertyChangedOnPathWrapper<T>>>
  (callExpr, inputParamExpression, thisExpression);

ExtractionExpression = finalExpr.Compile();

Collection owner not associated with session? Not quite.

I hate when this happens. I upgraded to NHibernate 2.0 and then quickly afterwards to 2.1.0 (you guessed it: because of LINQ). I had to change a couple of things to support it in my company’s application framework and it all seemed to work well – until I discovered that deleting any entity that has a one-to-many relation with cascade=”all-delete-orphan” stopped working. It died with the cryptic error message “collection owner not associated with session”… If I changed the cascade to ”all” it worked, but that is not the point; it wasn’t broken earlier. Of course, I tried looking all over the web and, apart from a page in Spanish (which wouldn’t have been helpful even if it had been in English), came up blank. I also tried moving to NHibernate 2.1.2 – which is not that simple, since we’re using a slightly modified version of NHibernate (one more reason to suspect that the solution to this problem would be hard to find). So here’s a short post for anyone stumbling upon a similar problem.

In the end, I traced it to this behaviour: the collection owner is not found in the session because NHibernate tries to find it using ID = 0, while its original ID was 48. The logic is somewhat strange here, because the method receives the original collection owner (which is in the session), retrieves its ID (which was for some reason reset to 0) and then tries to find it using this wrong ID. Moreover, there’s commented-out code that says “// TODO NH Different behavior” that would seem to do things properly (I checked, it’s still standing in the NHibernate trunk as is). But the real reason why this happened is that blasted zero in the ID: further debugging (thankfully, the full source for NHibernate is available) revealed that it was reset because “use_identifier_rollback” was turned on in the configuration. Well… I probably set this to experiment with it and forgot. Turning it off solved the problem for me… Luckily, I didn’t really need this rollback functionality – which is not exactly what it seems to be, anyway: it doesn’t roll back identifiers when the transaction is rolled back, it rolls them back when entities are deleted! Why the second feature made more sense to implement than the first one is a mystery to me...
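If you want to check the setting in your own configuration, it is the plain “use_identifier_rollback” property – here is a minimal sketch of forcing it off in code (the same key can, of course, live in hibernate.cfg.xml or app.config instead):

var cfg = new NHibernate.Cfg.Configuration();
cfg.Configure();                                       // read hibernate.cfg.xml / app.config as usual
cfg.SetProperty("use_identifier_rollback", "false");   // make sure identifier rollback stays off
var sessionFactory = cfg.BuildSessionFactory();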

Replicating self-referencing tables and circular foreign keys with Microsoft Sync Framework

Self-referencing tables – or, at least, circular foreign key references between tables – are probably a common thing in all but the simplest database designs. Yet Microsoft Sync Framework doesn’t have a clear strategy for replicating such data. I found various suggestions on the net: order the rows so that parent records come before children – this is usable for self-referencing tables (although not endorsed by Microsoft, because the framework doesn’t guarantee it will respect this order), but not nearly good enough for circular references – if you have two rows in two tables pointing at each other, ordering them cannot solve the problem. On an MSDN forum there was a suggestion to temporarily disable foreign key constraints: this I cannot take seriously, because it opens my database to corruption – all it takes is one faulty write while the constraint is down and I have invalid data in the database (unless I lock the tables before synchronization, and I’m not sure how to do that from within the Sync Framework).

So, when all else fails, you have to sit and think: what would be the general principle for solving this, Sync Framework notwithstanding? Exactly – do it in two passes. The problem is present only when inserting rows: if a row contains a reference to another row that wasn’t yet created, we get a foreign key violation… Our strategy could be to insert all rows without setting the foreign key field values, then do another pass just to connect the foreign keys. If we do this after all tables have finished their first pass (inserts, updates, deletes and all), we also support circular references, because the required rows are by then present in all tables. Ok, that was fairly easy to figure out (not much harder to implement either, but more on that later). There is another issue here that is not so obvious: deleting rows… There may be other rows referencing the one we are deleting that haven’t yet been replicated. Since the Sync Framework applies the deletes first, we can be fairly certain that the referencing rows are yet to be replicated – they will either be deleted or updated to reference something else. So we can put a null value in all fields that reference our row. (Note that this will probably mark the other rows as modified and cause them to be replicated back – this is an issue I won’t go into in this post, but I’m quite certain there needs to be a global mechanism for disabling change tracking while we’re writing replicated data. I currently use a temporary “secret handshake” solution: I send a special value – the birth date of Humphrey Bogart – in the row’s creation/last update date fields, which disables the change tracking trigger.)

Ok, on to the code. I won’t give you a working example here, just sample lines with comments. You’ve probably figured out by now that it will be necessary to write the SQL commands for the sync adapter by hand. I don’t know about you, but I’m no longer surprised by this: many of the tools and components we get in the .Net framework packages solve just the simplest problems and provide nice demos – if you need anything clever, you code it by hand. My solution was to create my own designer/code generator, and now I’m free to support any feature I need (also, I am able to do it much faster than Microsoft, for whatever reason: it took me a couple of days to add this feature… It may be that I’m standing on the shoulders of giants, but the giants could really have spared a couple of days to do this themselves). For simplicity, I’ll show how to replicate a self-reference: there’s an Item table that has an ID, a Name, and a ParentID referencing the same table. For replication, I split the table into two SyncAdapters: Item, which inserts only ID and Name and has a special delete command to eliminate foreign references beforehand, and Item2ndPass, which has only an insert command – but the only thing that insert command does is wire up the ParentIDs, it does not insert anything. I’ve deleted all the usual command creation and parameter addition code; the point is only to show the SQL statements, since they hold the key to the solution.

[Serializable]
public partial class ItemSyncAdapter : Microsoft.Synchronization.Data.Server.SyncAdapter
{
	partial void OnInitialized();

	public ItemSyncAdapter()
	{
		this.InitializeCommands();
		this.InitializeAdapterProperties();
		this.OnInitialized();
	}

	private void InitializeCommands()
	{
		// InsertCommand
		// 1899-12-25 00:00:00.000 is a 'Humphrey Bogart' special value telling
		// the change tracking trigger to skip this row
		this.InsertCommand.CommandText =  @"SET IDENTITY_INSERT Item ON
INSERT INTO Item ([ID], [Name], [CreatedDate], [LastUpdatedDate]) VALUES (@ID, @Name,
@sync_last_received_anchor, '1899-12-25 00:00:00.000') SET @sync_row_count = @@rowcount
SET IDENTITY_INSERT Item OFF";

		// UpdateCommand
		this.UpdateCommand.CommandText = @"UPDATE Item SET [Name] = @Name,
CreatedDate='1899-12-25 00:00:00.000', LastUpdatedDate=@sync_last_received_anchor WHERE
([ID] = @ID) AND (@sync_force_write = 1 OR ([LastUpdatedDate] IS NULL OR [LastUpdatedDate]
<= @sync_last_received_anchor)) SET @sync_row_count = @@rowcount";		

		// DeleteCommand
		this.DeleteCommand.CommandText = @"UPDATE Item SET [ParentID] = NULL
WHERE [ParentID] = @ID DELETE FROM Item WHERE ([ID] = @ID) AND (@sync_force_write = 1 OR
([LastUpdatedDate] <= @sync_last_received_anchor OR [LastUpdatedDate] IS NULL))
SET @sync_row_count = @@rowcount";

		// SelectConflictUpdatedRowsCommand, SelectConflictDeletedRowsCommand
		// skipped because they are not relevant

		// SelectIncrementalInsertsCommand
		this.SelectIncrementalInsertsCommand.CommandText = @"SELECT  [ID],
[Name], [ParentID], [CreatedDate], [LastUpdatedDate] FROM Item WHERE ([CreatedDate] >
@sync_last_received_anchor AND [CreatedDate] <= @sync_new_received_anchor)";

		// SelectIncrementalUpdatesCommand
		this.SelectIncrementalUpdatesCommand.CommandText = @"SELECT  [ID],
[Name], [ParentID], [CreatedDate], [LastUpdatedDate] FROM Item WHERE ([LastUpdatedDate] >
@sync_last_received_anchor AND [LastUpdatedDate] <= @sync_new_received_anchor AND
[CreatedDate] <= @sync_last_received_anchor)";

		// SelectIncrementalDeletesCommand
		this.SelectIncrementalDeletesCommand.CommandText = @"SELECT FirstID
AS ID FROM sys_ReplicationTombstone WHERE NameOfTable = 'Item' AND DeletionDate >
@sync_last_received_anchor AND DeletionDate <= @sync_new_received_anchor";
	}

	private void InitializeAdapterProperties()
	{
		this.TableName = "Item";
	}

} // end ItemSyncAdapter 
[Serializable]
public partial class Item2ndPassSyncAdapter : Microsoft.Synchronization.Data.Server.SyncAdapter
{
	partial void OnInitialized();

	public Item2ndPassSyncAdapter()
	{
		this.InitializeCommands();
		this.InitializeAdapterProperties();
		this.OnInitialized();
	}

	private void InitializeCommands()
	{
		// InsertCommand
		this.InsertCommand.CommandText =  @"UPDATE Item SET [ParentID] = @ParentID,
CreatedDate='1899-12-25 00:00:00.000', LastUpdatedDate=@sync_last_received_anchor WHERE ([ID] =
@ID) AND (@sync_force_write = 1 OR ([LastUpdatedDate] IS NULL OR [LastUpdatedDate] <=
@sync_last_received_anchor)) SET @sync_row_count = @@rowcount";

		// SelectIncrementalInsertsCommand
		this.SelectIncrementalInsertsCommand.CommandText = @"SELECT  [ID],
[ParentID] FROM Item WHERE ([CreatedDate] > @sync_last_received_anchor AND [CreatedDate] <=
@sync_new_received_anchor)";

	}

	private void InitializeAdapterProperties()
	{
		this.TableName = "Item2ndPass";
	}

} // end Item2ndPassSyncAdapter

In this case, it would be enough to set up the second-pass sync adapter to execute after the first one. For circular references, I put all second-pass adapters at the end, after all first-pass adapters, as sketched below. Notice that the commands for selecting incremental inserts and updates read all columns – this is probably suboptimal because some fields will not be used, but it’s much more convenient to have all field values handy than to rework the whole code generator template for each minor adjustment.
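For illustration, the server-side wiring could look roughly like this (a sketch; serverConnectionString and the other adapters are placeholders, and I’m assuming the usual DbServerSyncProvider setup):

var serverProvider = new Microsoft.Synchronization.Data.Server.DbServerSyncProvider();
serverProvider.Connection = new System.Data.SqlClient.SqlConnection(serverConnectionString);

// first pass: the regular adapters - Item's insert command leaves ParentID alone
serverProvider.SyncAdapters.Add(new ItemSyncAdapter());
// ...adapters for the other tables go here...

// second pass, added after every first-pass adapter: only wires up ParentID
serverProvider.SyncAdapters.Add(new Item2ndPassSyncAdapter());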

UPDATE (24.11.2010): I’ve added a source file with an illustration of this solution. I haven’t tested it (although it does compile and may well work) but it could be useful in showing the overall picture. It was created by extracting one generated sync adapter from my application, hacking away most of our specific code and then making it compile. http://www.8bit.rs/download/samples/ItemSyncAgentSample.cs Note that the file contains a couple of things that stray from the standard implementation, like using one table for all tombstone records, using the sample SQL Express client sync provider, etc. Just ignore these. One thing, though, may be of interest (I may even do a blog post about it one day): the insert command knows how to restore the autoincrement ID after replication, so that different autoincrement ranges can exist in different replicated databases (no GUIDs are used for primary keys) and identity insert is possible. This is necessary because SQL Server automatically sets the current autoincrement seed to the largest value inserted into the autoincrement column. Keep in mind that this (as well as the whole class, for that matter) may not be the best way to do things – but I’ve been using it for some time now and haven’t had any problems.
