Replicating self-referencing tables and circular foreign keys with Microsoft Sync Framework

Self-referencing tables – or, more generally, circular foreign key references between tables – are probably a common thing in all but the simplest database designs. Yet Microsoft Sync Framework doesn’t have a clear strategy for replicating such data. I found various suggestions on the net. One is to order rows so that parent records come before children: this is usable for self-referencing tables (although not endorsed by Microsoft, because the framework doesn’t guarantee it will respect this order), but nowhere near good enough for circular references – if you have two rows in two tables pointing at each other, no ordering can solve the problem. On an MSDN forum there was a suggestion to temporarily disable foreign key constraints. This I cannot take seriously because it opens my database to corruption: all it takes is one faulty write while the constraint is down and I have invalid data in the database (unless I lock the tables before synchronization, and I’m not sure how to do that from within the Sync Framework).

So, when all else fails, you have to sit and think: what would be the general principle for solving this, Sync Framework notwithstanding? Exactly – do it in two passes. The problem only arises when inserting rows: if a row contains a reference to another row that hasn’t yet been created, we get a foreign key violation. Our strategy, then, is to insert all rows without setting the foreign key fields, and then do a second pass that just connects the foreign keys. If we run the second pass after all tables have finished their first pass (inserts, updates, deletes and all), we also support circular references, because by then the required rows are present in all tables. Ok, that was fairly easy to figure out (not much harder to implement either, but more on that later). There is another, less obvious issue here: deleting rows. There may be other rows referencing the one we are deleting that haven’t yet been replicated. Since the Sync Framework applies the deletes first, we can be fairly certain that the referencing rows are yet to be replicated – they will either be deleted or updated to reference something else. So we can put a null value in all fields that reference our row. (Note that this will probably mark the other rows as modified and cause them to be replicated back – an issue I won’t go into in this post, but I’m quite certain there needs to be a global mechanism for disabling change tracking while we’re writing replicated data. I currently use a temporary “secret handshake” solution: I send a special value – the birth date of Humphrey Bogart – in the row’s creation/last update date fields, which tells the change tracking trigger to skip the row.)
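
To make the handshake a bit more concrete: a change tracking trigger could check for the sentinel along these lines. This is a hypothetical sketch, not my actual trigger – the real one also maintains tombstone records and more – but it shows the skip logic.

CREATE TRIGGER trg_Item_TrackChanges ON Item AFTER UPDATE
AS
BEGIN
	-- the 'Humphrey Bogart' handshake: rows written by replication carry
	-- the sentinel date in one of the date fields, so we leave them alone
	-- (simplified: a real trigger would handle mixed batches row by row)
	IF EXISTS (SELECT * FROM inserted
	           WHERE CreatedDate = '1899-12-25 00:00:00.000'
	              OR LastUpdatedDate = '1899-12-25 00:00:00.000')
		RETURN

	-- normal change tracking: stamp the row so the SelectIncremental***
	-- commands pick it up during the next synchronization
	UPDATE Item SET LastUpdatedDate = GETDATE()
	WHERE ID IN (SELECT ID FROM inserted)
END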

Ok, on to the code. I won’t give you a working example here, just sample lines with comments. You’ve probably figured out by now that you have to write the SQL commands for the sync adapter by hand. I don’t know about you, but I’m no longer surprised by this: many of the tools and components we get in the .Net framework packages solve just the simplest problems and provide nice demos – if you need anything clever, you code it by hand. My solution was to create my own designer/code generator, so now I’m free to support any feature I need (I’m also able to do it much faster than Microsoft, for whatever reason: it took me a couple of days to add this feature… It may be that I’m standing on the shoulders of giants, but the giants could really have spared a couple of days to do this themselves). For simplicity, I’ll show how to replicate a circular reference on a single table: an Item table that has an ID, a Name, and a ParentID referencing the same table. For replication, I split the table into two SyncAdapters: Item, which inserts only ID and Name and has a special delete command that eliminates foreign references beforehand, and Item2ndPass, which has only an insert command – and the only thing that insert command does is wire up the ParentIDs; it does not insert anything. I’ve deleted all the usual command creation and parameter addition code; the point is only to show the SQLs, since they hold the key to the solution.

[Serializable]
public partial class ItemSyncAdapter : Microsoft.Synchronization.Data.Server.SyncAdapter
{
	partial void OnInitialized();

	public ItemSyncAdapter()
	{
		this.InitializeCommands();
		this.InitializeAdapterProperties();
		this.OnInitialized();
	}

	private void InitializeCommands()
	{
		// InsertCommand
		// 1899-12-25 00:00:00.000 is a 'Humphrey Bogart' special value telling
		// the change tracking trigger to skip this row
		this.InsertCommand.CommandText = @"SET IDENTITY_INSERT Item ON
INSERT INTO Item ([ID], [Name], [CreatedDate], [LastUpdatedDate])
VALUES (@ID, @Name, @sync_last_received_anchor, '1899-12-25 00:00:00.000')
SET @sync_row_count = @@rowcount
SET IDENTITY_INSERT Item OFF";

		// UpdateCommand
		this.UpdateCommand.CommandText = @"UPDATE Item SET [Name] = @Name,
[CreatedDate] = '1899-12-25 00:00:00.000', [LastUpdatedDate] = @sync_last_received_anchor
WHERE ([ID] = @ID) AND (@sync_force_write = 1
OR ([LastUpdatedDate] IS NULL OR [LastUpdatedDate] <= @sync_last_received_anchor))
SET @sync_row_count = @@rowcount";

		// DeleteCommand
		// first null out any references to the row being deleted (the referencing
		// rows are yet to be replicated), then delete the row itself
		this.DeleteCommand.CommandText = @"UPDATE Item SET [ParentID] = NULL
WHERE [ParentID] = @ID
DELETE FROM Item WHERE ([ID] = @ID) AND (@sync_force_write = 1
OR ([LastUpdatedDate] <= @sync_last_received_anchor OR [LastUpdatedDate] IS NULL))
SET @sync_row_count = @@rowcount";

		// SelectConflictUpdatedRowsCommand, SelectConflictDeletedRowsCommand
		// skipped because they are not relevant

		// SelectIncrementalInsertsCommand
		this.SelectIncrementalInsertsCommand.CommandText = @"SELECT [ID], [Name],
[ParentID], [CreatedDate], [LastUpdatedDate] FROM Item
WHERE ([CreatedDate] > @sync_last_received_anchor
AND [CreatedDate] <= @sync_new_received_anchor)";

		// SelectIncrementalUpdatesCommand
		this.SelectIncrementalUpdatesCommand.CommandText = @"SELECT [ID], [Name],
[ParentID], [CreatedDate], [LastUpdatedDate] FROM Item
WHERE ([LastUpdatedDate] > @sync_last_received_anchor
AND [LastUpdatedDate] <= @sync_new_received_anchor
AND [CreatedDate] <= @sync_last_received_anchor)";

		// SelectIncrementalDeletesCommand
		// sys_ReplicationTombstone is a common tombstone table for the whole
		// database, holding the table name, deletion date and primary key
		// value(s) of each deleted row
		this.SelectIncrementalDeletesCommand.CommandText = @"SELECT FirstID AS ID
FROM sys_ReplicationTombstone WHERE NameOfTable = 'Item'
AND DeletionDate > @sync_last_received_anchor
AND DeletionDate <= @sync_new_received_anchor";
	}

	private void InitializeAdapterProperties()
	{
		this.TableName = "Item";
	}

} // end ItemSyncAdapter 
[Serializable]
public partial class Item2ndPassSyncAdapter : Microsoft.Synchronization.Data.Server.SyncAdapter
{
	partial void OnInitialized();

	public Item2ndPassSyncAdapter()
	{
		this.InitializeCommands();
		this.InitializeAdapterProperties();
		this.OnInitialized();
	}

	private void InitializeCommands()
	{
		// InsertCommand
		// the second-pass "insert" doesn't insert anything: it only wires up
		// the ParentID references that the first pass left unset
		this.InsertCommand.CommandText = @"UPDATE Item SET [ParentID] = @ParentID,
[CreatedDate] = '1899-12-25 00:00:00.000', [LastUpdatedDate] = @sync_last_received_anchor
WHERE ([ID] = @ID) AND (@sync_force_write = 1
OR ([LastUpdatedDate] IS NULL OR [LastUpdatedDate] <= @sync_last_received_anchor))
SET @sync_row_count = @@rowcount";

		// SelectIncrementalInsertsCommand
		this.SelectIncrementalInsertsCommand.CommandText = @"SELECT [ID], [ParentID]
FROM Item WHERE ([CreatedDate] > @sync_last_received_anchor
AND [CreatedDate] <= @sync_new_received_anchor)";

	}

	private void InitializeAdapterProperties()
	{
		this.TableName = "Item2ndPass";
	}

} // end Item2ndPassSyncAdapter

In this case, it would be enough to set up the second-pass sync adapter to execute after the first one. For circular references, I put all second-pass adapters at the end, after all first-pass adapters (see the sketch below). Notice that the commands for selecting incremental inserts and updates read all columns – this is probably suboptimal because some fields will not be used, but it's much more convenient to have all field values handy than to rework the whole code generator template for each minor adjustment.
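
For illustration, registering the adapters could look roughly like this – a minimal sketch assuming the standard DbServerSyncProvider from Microsoft.Synchronization.Data.Server; the Order* adapters are hypothetical stand-ins for a second table involved in a circular reference:

// Changes are applied in the order the adapters were added to the
// collection, so every first-pass adapter is registered before any
// second-pass adapter.
DbServerSyncProvider serverProvider = new DbServerSyncProvider();
serverProvider.SyncAdapters.Add(new ItemSyncAdapter());         // 1st pass
serverProvider.SyncAdapters.Add(new OrderSyncAdapter());        // 1st pass (hypothetical)
// ...first-pass adapters for the remaining tables...
serverProvider.SyncAdapters.Add(new Item2ndPassSyncAdapter());  // 2nd pass
serverProvider.SyncAdapters.Add(new Order2ndPassSyncAdapter()); // 2nd pass (hypothetical)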

UPDATE (24.11.2010): I’ve added a source file illustrating this solution. I haven’t tested it (although it compiles and may well work), but it could be useful in showing the overall picture. It was created by extracting one generated sync adapter from my application, hacking away most of our specific code and then making it compile: http://www.8bit.rs/download/samples/ItemSyncAgentSample.cs Note that the file contains a couple of things that stray from the standard implementation, like using one table for all tombstone records, using the sample SQL Express client sync provider etc. Just ignore these. One thing, though, may be of interest (I may even do a blog post about it one day): the insert command knows how to restore the autoincrement seed after replication, so that different autoincrement ranges can exist in different replicated databases (no GUIDs are used for primary keys) and identity insert is possible. This is necessary because SQL Server automatically sets the current autoincrement seed to the largest value inserted into the autoincrement column. Keep in mind that this (as well as the whole class, for that matter) may not be the best way to do things – but I’ve been using it for some time now and haven’t had any problems.
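
For illustration, the reseeding idea boils down to something like the SQL below. This is a sketch only, not the actual code from the sample file (which is more involved); @OriginalSeed is an illustrative variable name.

-- capture the current identity seed before the identity insert
DECLARE @OriginalSeed bigint
SET @OriginalSeed = IDENT_CURRENT('Item')
SET IDENTITY_INSERT Item ON
INSERT INTO Item ([ID], [Name], [CreatedDate], [LastUpdatedDate])
VALUES (@ID, @Name, @sync_last_received_anchor, '1899-12-25 00:00:00.000')
SET IDENTITY_INSERT Item OFF
-- SQL Server raises the seed if @ID was larger than the current one;
-- reseed so this database keeps its own autoincrement range
IF IDENT_CURRENT('Item') <> @OriginalSeed
	DBCC CHECKIDENT ('Item', RESEED, @OriginalSeed)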

9 comments

  • Guy

    Hello bdrajer,

    Thanks very much for your clear and concise reply, it has really alleviated my fears, as I wasn't sure if there was an automatic process in the sync framework that tagged an item version on the first step, making it in-sync for the next step. I assume we have to manage this manually.

    The design of my sync config file is that I have the MSSQL tables and columns imported, but then I create "table definitions". A table definition is simply a named entry (a prefix plus the table name) containing a table and a selection of its columns that I want to synchronise. I can then attach these to sync groups: maybe one called "first pass" containing definitions of all tables (some excluding the weak FK columns, which will be inserted as null), then a second group called "second pass" that contains just the weak FK columns, to complete the transaction. In my tool I can add sync groups to sync sets in sequence, so the developer can control the order in which the groups are executed, while the tables within a group are executed in automatic dependency order.

    Note my tool automatically loads table information and dependencies from MSSQL; when table definitions are added to groups they are automatically ordered into strong-to-weak dependency order. Unfortunately this does not deal with deletions, so maybe a third pass in reverse dependency order is required for that, as we do not use any cascading deletes. Note I assign a dependency level to each MSSQL table: all tables without compulsory FK columns are set to level 0, then all tables that only have dependencies on level 0 tables are set to level 1, and so on. This obviously excludes self-references too.

    Just wanted to share this with others, as I have found this design to be really flexible.

    Thanks again,

    Guy

    posted by Guy, Wednesday, 30 May 2012 12:04
  • bdrajer

    Hi Guy,

    If I understood your problem correctly, your assumption that the sync framework will consider the strong entity up-to-date after step one may be wrong. It is up to us to construct the SelectIncremental*** commands to tell the framework what has changed, and all of the select commands are executed *before* any of the updates take place (either 1st or 2nd pass).

    If my memory of how the sync framework does things is still valid, this is what happens... The sync framework collects all changes from all adapters, both 1st pass and 2nd pass. To do this, it executes all SelectIncremental***Commands. It puts all the retrieved data in a DataSet, transports it to the other database and only then starts writing any data anywhere. So the second pass is only a means for us to get the data written in a specific order; otherwise it doesn't change the replication process: the changes are always completely retrieved before any inserts or updates are done.

    (Note that I'm talking about a one-way sync since it makes the example simpler. I don't suppose there could be a two-way border case which couldn't be handled by this algorithm but I may be wrong.)

    The important thing to remember for the insert/update part is that all inserts are executed first, one for each sync adapter and in the order in which they were added to the SyncAdapters collection. Here I take care to put all first-step adapters (that is, the ones not containing the circular references) first, and the second-step adapters after them. Also, parent entities come before children: of course, if both entities are strong, it is up to me to decide which of the two is the parent and which is the child. This determines which of the two relations is the "backward" relation that goes into the second step. In your case there is no dilemma: the strong entity should come before the weak one, and you would put the strong -> weak relation in the second-step adapter.

    The outcome of the first-pass insert is that we now have all of the records present in the database. Since parent entities are inserted before children, "forward" relations can be inserted along with them. This should also mean that weak entities can be inserted with no problems. The second-pass insert actually does an SQL update to reconnect the missing "backward" references from parents to children.

    I hope this helped at least a bit... A synchronization adapter is quite a complex thing to set up; there are a lot of things to consider and many small details that can easily create a bug which would then be insanely hard to track down. I'm not sure there isn't a hidden glitch somewhere inside my implementation, but I use the sync framework only in a limited scope so I can't say. My opinion is that the sync framework is only 10% of a real solution: it contains an idea, but for real-world implementation we're still mostly left to our own devices.

    posted by bdrajer, Friday, 25 May 2012 13:31
  • Guy

    Hi,

    Great minds think alike – I came up with almost the same solution and am trying to implement it at the moment. The issue I have is that we have circular relations between two tables as opposed to self-referencing tables. Your solution above is elegant for self-referencing tables, but obviously we have a 3-step process between two tables that needs to run in sequence to work.

    i.e.

    1. Strong Entity Table: Create Parent of 1..n relation with child 0..n relation set to null

    2. Weak Entity Table: Create child of 1..n relation

    3. Strong Entity Table: update 0..n relation from null to child entity ID.

    My concern here is how do I do this: as I understand the sync framework, it will assume the Strong Entity Table is up-to-date after step 1, therefore step 3 will not occur. Am I correct, or have I misunderstood the way the metadata works on versioning? If I haven't, how can I work around this issue?

    I have written a tool that sorts the sync table steps into strong entity relations and can create multiple definitions of the same tables, i.e. different columns in different definitions; this meets the requirements for steps 1 and 3 above. I can then put the steps into groups that are automatically ordered based on dependency, and I can then order the groups. This means that the steps are run in group order, followed by table dependency order.

    Given this, I am still having trouble getting my head around how the sync process will deal with the change data for the same row version in steps that are separated by a child table step.

    Maybe I'm just misunderstanding something here???

    Guy

    posted by Guy, Friday, 25 May 2012 09:03
  • bdrajer

    Hi Nic,

    The solution was implemented with Sync Framework 2.1. The sys_ReplicationTombstone is a common tombstone table for the whole database. It has columns for table name, deletion date and primary key value (or values) for the deleted row.
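
    Roughly, its shape is something like this (a sketch from memory – the second key column is only there to show how composite keys could be handled):

    CREATE TABLE sys_ReplicationTombstone (
        NameOfTable sysname NOT NULL,   -- table the deleted row belonged to
        DeletionDate datetime NOT NULL, -- when the row was deleted
        FirstID int NOT NULL,           -- first (or only) primary key value
        SecondID int NULL               -- additional key column for composite keys
    )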

    As for the link to the file, it was the wrong link, sorry (don't know what happened to it, I'm sure I checked it when I posted it). Please find the updated link above.

    I can't say if there's a better solution for this problem since I haven't been very active lately as far as Sync Framework is concerned. The whole thing is starting to look like an open source project that died, so I'm running with what I have and looking for alternatives - of which a centralized database looks like the most promising one :).

    Best regards!

    posted by bdrajer, Wednesday, 21 March 2012 16:50
  • Nic

    Hi

    I seem to have run into the same issue with the Sync Framework on self-referencing tables, and this article is basically the only resource available with a solution to the problem (besides the ones on the Microsoft forums – but I have the same concerns as you describe in the first paragraph).

    I have attempted to implement your solution (changing the sync adapter) but there are a few things I'm not sure of:
    1. What sync framework version is the solution in this article for?
    2. What is sys_ReplicationTombstone?
    3. Just to make sure: are the example in the article and the linked client sync provider two separate solutions (I don't quite understand what you have done in the client provider, though)?
    4. Is there an official updated solution to this problem (I haven't found one on the Microsoft forums yet)?

    posted by Nic, Monday, 19 March 2012 16:26
  • bdrajer

    Hi Mickaël,

    Sorry for the delay. It took some time, but I managed to slap together an example source file. It's somewhat different from the code shown above (a bit more complicated), but hopefully it will help you find your way around. See the update at the end of the post.

    posted by bdrajer, Wednesday, 24 November 2010 15:22
  • Mickaël

    Hi,

    I'm trying to create a proof of concept of your idea. On the second adapter, you only have two commands?

    Thanks

    posted by Mickaël, Thursday, 18 November 2010 17:21
  • bdrajer

    Hi,

    I don't have a standalone example for this, it's an excerpt from a big implementation, sorry... But if you can tell me what exactly the problem is, I may be able to help.

    Here's an idea that could be useful: the code I have shown above was written in the same manner as what is generated by the Local Database Cache designer (which is built into Visual Studio). If you add a Local Database Cache item to a project, point it to a database and select a couple of tables from it, it will generate (in its *.Designer.cs file) code that is very similar to the one in my post. But where the designer produces a single sync adapter per table (and only for one-way replication), my solution uses two – one "normal" and one for the second-pass replication. Look at the designer-generated InitializeSyncAdapters() method; it is responsible for creating adapters and adding them to the internal adapters collection. Modify this code so that it instantiates our "normal" adapter instead of the generated one, and add the second-pass one at the end. This way, the normal adapter will replicate all fields except the circular-referencing ones, and the second-pass adapter will come at the end and update what's missing. (Of course, if you modify generated code and then use the designer, you'll lose your changes – so copy the code somewhere else; you may have to reconstruct the generated class a bit to be able to do this.)
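
    For illustration, the modified method could end up looking something like this (a sketch only – the exact member names in your generated file may differ):

    private void InitializeSyncAdapters()
    {
        // our "normal" first-pass adapter replaces the designer-generated one
        this.SyncAdapters.Add(new ItemSyncAdapter());
        // ...first-pass adapters for the other tables go here...
        // the second-pass adapter comes last, after all first-pass adapters
        this.SyncAdapters.Add(new Item2ndPassSyncAdapter());
    }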

    Hope this makes sense... It's not a trivial task because there may be a lot of code to write, but the code itself is not too complicated (you could modify the generated code or write your own generator for it), so it's not too hard either.

    posted by bdrajer, Thursday, 18 November 2010 12:28
  • Mickaël

    I have the same problem, but I don't understand how to integrate your solution into my development. Can you share a sample, please?

    Thanks

    posted by Mickaël, Thursday, 18 November 2010 11:17
