Implementing Equals() and GetHashCode() in ORM classes with autoincrement keys

The requirements for implementing Equals() and GetHashCode() in .Net are very hard to satisfy, and in some areas nearly impossible. There are situations where an object's identity inevitably changes, and saving an ORM-mapped object with an autogenerated key is a case in point. Making its hash code conform to the requirements would take an inordinate amount of code.

The plot goes something like this: loading ORM objects on the same session (in NHibernate speak) or context (Entity Framework) guarantees that only one object instance is created per record. But if you use multiple sessions/contexts, no such guarantee exists. (And you want to do this if objects have different lifespans: for example, you fill a combo box once and then bind it to different records... Obviously, I'm talking about WinForms here, but the principle applies to server-side logic, although it's probably less frequent there.) With multiple instances, .Net components don't know they represent the same record unless you override Equals() and have them compared by primary key values. In WinForms, for example, this means that a combo box won't know which record in the dropdown is equal to the one in the bound property, and won't select it.

Ok, so we override Equals(): usually, the record has an autoincrement key called, say, ID. We implement it so that it compares the IDs of the two objects (and their types, obviously)... And now we run into the .Net requirement which says that two objects that are equal must have the same hash code, and that the hash code must never change for an object. We can override GetHashCode() to return the hash of the ID, but if the object gets saved to the database, the ID - and therefore the hash - will change.
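A minimal sketch of this naive approach (the Customer class and its Id property are hypothetical - any ORM-mapped entity with an autoincrement key will do):

public class Customer
{
    public virtual int Id { get; set; }   // autoincrement key; 0 means "not saved yet"

    public override bool Equals(object obj)
    {
        var other = obj as Customer;
        if (other == null)
            return false;
        return Id == other.Id;            // compare by primary key
    }

    public override int GetHashCode()
    {
        return Id.GetHashCode();          // changes when the database assigns the real ID
    }
}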

Here's an example of how it would work: create a new ORM object instance; its ID is NULL or zero. Use it as an index in a dictionary: the dictionary retrieves the index's hash code and stores the data in a bucket for that hash. Save the record - the ID changes. If the hash code changes now, you won't be able to retrieve your data from the dictionary anymore. But if you load this record on a different session/context, it will have a different hash code unless we somehow notify it to use the already generated one... Which would probably mean using a static instance of a component that tracks all objects. Way too much work to get a hash code right, isn't it...
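A toy demonstration of the failure, using the hypothetical Customer class sketched above (the save is simulated by assigning the ID directly):

var customer = new Customer();                       // unsaved, Id == 0
var notes = new Dictionary<Customer, string>();
notes[customer] = "some data";                       // stored in the bucket for hash(0)

customer.Id = 42;                                    // "save" - the database assigns the key
string text;
bool found = notes.TryGetValue(customer, out text);  // false: the lookup now targets the wrong bucket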

A couple of details that could lead us closer to a solution:

  • On a new object instance (unsaved - with ID null or zero or whatever), we cannot use the ID in our overrides. Two objects with an empty ID are not equal, nor will they ever be: if they both get saved into the database, this will create two separate records, and their IDs will acquire new values that were never used before... An unsaved object is only equal to itself. We could generate the same hash code for all unsaved objects, but this wouldn't resolve our problem if a saved object gets its hash from the ID - it would still be different.
  • While we're at it, it's useful knowing how the Dictionary works: it calls GetHashCode() on the given index and stores the entry in a bucket tagged with this hash code. Of course, there may be multiple objects with the same hash, and a bucket may contain multiple entries. Therefore, when retrieving data, the dictionary also calls Equals() on indexes in the bucket to see which of the entries is the right one. This means we have to get both Equals() and GetHashCode() right for the dictionary to work: Equals() should be OK in its simplest form if we always use the same instance for the index - basically, Equals() must be able to recognise the object as equal to itself.
  • Other components, like grids and combo boxes, also use hash codes to efficiently identify instances, so a dictionary isn't the only thing we're supporting with this.

One part of the solution seems mandatory: we need to remember the generated hash on the object. This is mandatory only for an object whose ID may change in the future: if a saved object gets a permanent ID (as they usually do), caching is not necessary. If an unsaved object gets saved, we keep using the hash we generated from the empty ID. We do this on demand: when GetHashCode() is called, the hash is generated and remembered. This is probably the only meaningful way to do it, but it's worth pointing out one detail: if the object isn't used in a dictionary, its hash won't be generated and won't change when it's saved. Thus, we have narrowed the problem down to the places where this feature is actually used.
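A sketch of this cached-hash approach, assuming a hypothetical entity base class with an int Id where zero means "unsaved":

public abstract class Entity
{
    private int? cachedHashCode;

    public virtual int Id { get; set; }

    public override bool Equals(object obj)
    {
        if (ReferenceEquals(this, obj))
            return true;                      // an object is always equal to itself
        var other = obj as Entity;
        if (other == null || other.GetType() != GetType())
            return false;
        if (Id == 0 || other.Id == 0)
            return false;                     // unsaved objects are only equal to themselves
        return Id == other.Id;
    }

    public override int GetHashCode()
    {
        // Generate the hash on demand and remember it, so it stays stable
        // even if the database assigns the ID later.
        if (cachedHashCode == null)
            cachedHashCode = Id.GetHashCode();
        return cachedHashCode.Value;
    }
}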

But there's still the possibility of having two objects that are equal (the same record loaded on different sessions) but with different hash codes (the first one saved into the database and then the same record loaded afterwards). I'm not sure what problems this would create, but one is obvious: we wouldn't be able to use both of them interchangeably in a dictionary (or, possibly, a grid). This is not entirely unnatural, at least to me: if I use an object as an index in a dictionary, I regard the stored entry as related to the object instance and not to the record that sits behind it. I'm unaware of other consequences - please comment if you know any.

Note that we can also avoid the dictionary problem by using the IDs themselves as indexes instead of the objects... But the problem would still remain for grids and elsewhere. I'm also not sure whether it could be resolved by not using the data layer (ORM) objects in dictionaries and grids but copying the data into business layer object instances: if we did this, we'd still need a component that tracks duplicates, only it would track business objects instead of data objects.

Can we narrow this further down? A rather important point is that the ID gets changed only when saving data - and ORMs usually save the whole session as one transaction. If we discarded the saved session and started afresh, we'd get correct hash codes and unchanged IDs. We'd only have a brief period of possible irregularity in the time after the old data is saved and before new data is loaded, and only if we load data on different sessions and use it in a dictionary or some other similar component. In client-side applications, this is a risky period anyway because different components get the new data at different times, and care must be taken not to mix it. At least some kind of freeze should be imposed on the components - suspending layout, disabling databinding etc. Also, reloading data is natural if you have logic that runs in the database (usually triggers) that may perform additional changes to data after we saved ours (and it may do this on related records, not just the ones we saved)... But that is a different, as they say, can of worms: it's just that these worms often link up to better plot our ruin.

Running 64-bit Ace OLEDB driver with 32-bit Office

When 32-bit Microsoft Office is installed on 64-bit Windows, there is a problem connecting to OLEDB sources using the Microsoft Jet provider from .Net applications (and probably others). A .Net application, unless otherwise instructed, runs as 64-bit on a 64-bit OS and expects a 64-bit OLEDB provider for Jet. But, since Office is 32-bit, there is no 64-bit provider, and it complains that the provider is "not registered on the local machine". Actually, a 64-bit Jet provider doesn't seem to exist at all; the recommendation is to use the replacement provider, called ACE, which is backwards compatible with Jet. You get it by installing the Microsoft Access Database Engine Redistributable (one version available here) - but: the 32-bit installation doesn't solve the problem, and the 64-bit installation refuses to install because you don't have 64-bit Office.

There are many possible solutions and workarounds on the net (the most frequent one being to degrade your application to running 32-bit only), but the real solution is not easy to find. You need to force the 64-bit Access engine installer to run by calling it with the "/passive" argument. Call it from the command prompt like so:

AccessDatabaseEngine_X64.exe /passive

Be careful, though, not to install the engine of the same version as your version of Office. To be more precise, if you install the 64-bit Access 2010 engine when 32-bit Office 2010 is present on the system, your Office applications may start complaining. On my laptop, Microsoft Excel started showing a dialog that said "One of your object libraries (|) is missing or damaged" and then tried to install/repair some components, ending with an error saying that it doesn't have the rights to install fonts (?!). This was easily resolved by uninstalling the 64-bit Access engine and installing the 32-bit one, but afterwards my ACE driver was gone again. The winning combination was to upgrade to Office 2013 (still 32-bit) and then install the 64-bit Access 2010 engine. This also seems to work with the Office 2016 / Engine 2010 combination, but not with Office 2010 / Engine 2013... It seems that the newer engine versions are smarter and don't fall for the "/passive" trick, but I haven't tried that many combinations to be sure.

As the last step, you need to change your connection string to use ACE instead of Jet and you're done. What I usually do is have a utility component that detects the presence of drivers and uses ACE as a fallback to Jet. For Excel files, it looks something like this:

/// <summary>
/// True if ACE oleDb driver is supported. False if not. Null if not checked yet.
/// </summary>
private bool? AceOleDbSupported = null;

/// <summary>
/// Name of the Excel file to be imported
/// </summary>
public string FileName { get; set; }

/// <summary>
/// True if the excel file's first row counts as a header
/// </summary>
public bool HasHeaderRow { get; set; }

public string GetConnectionString()
{
    if (AceOleDbSupported == null)
    {
        OleDbEnumerator e = new OleDbEnumerator();
        AceOleDbSupported = e.GetElements().Rows.Cast<DataRow>().Any(dr => dr["SOURCES_NAME"] as string == "Microsoft.ACE.OLEDB.12.0");
    }
    if (AceOleDbSupported.Value)
    {
        return "Provider=Microsoft.ACE.OLEDB.12.0;"
            + "Data Source=" + FileName
            + @";Extended Properties=""Excel 8.0;IMEX=1;"
            + "HDR=" + (HasHeaderRow ? "YES" : "NO") + @";""";
    }
    else
    {
        return "Provider=Microsoft.Jet.OLEDB.4.0;"
            + "Data Source=" + FileName
            + @";Extended Properties=""Excel 8.0;"
            + "HDR=" + (HasHeaderRow ? "YES" : "NO") + @";""";
    }
}
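For completeness, a usage sketch under a couple of assumptions: the members above live in a class I'll call ExcelImporter, and the file contains a worksheet named Sheet1 (real sheet names can be listed via OleDbConnection.GetOleDbSchemaTable). The types come from System.Data and System.Data.OleDb.

var importer = new ExcelImporter { FileName = @"C:\Temp\data.xls", HasHeaderRow = true };

using (var connection = new OleDbConnection(importer.GetConnectionString()))
using (var adapter = new OleDbDataAdapter("SELECT * FROM [Sheet1$]", connection))
{
    var table = new DataTable();
    adapter.Fill(table);   // Fill opens and closes the connection by itself
    // table now contains the worksheet's rows
}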

Fluent API vs. named arguments for readability

Fluent APIs are great for making code readable - should I say, at the expense of verbosity? Although verbosity is, actually, one of the positive traits of being fluent. It is very useful in complex and hard-to-follow scenarios like data transformation (think LINQ), configuration, testing etc., leading the developer towards the solution but also documenting what's going on.

One example: I have a LINQ-like method that processes an array. It takes as parameters a criterion for determining duplicates and separate operations for new and repeated entries. It could look like this:

list.DuplicateAwareSelect
(
    r => r.Name.Trim(),
    x => new Entry() { Name = x.Name.Trim(), Date = x.Date },
    (a, b) => { b.Date = a.Date > b.Date ? b.Date : a.Date; }
);

But it isn't obvious what's going on here... The code isn't descriptive at all - and only partially due to the fact that the variable naming scheme stinks. If we make it fluent, the purpose is much clearer:

list.DuplicateAwareSelect()
    .WithKey(r => r.Name.Trim())
    .SelectFirstAppearance(x => new Entry() { Name = x.Name.Trim(), Date = x.Date })
    .ProcessRepeatedAppearance((a, b) => { b.Date = a.Date > b.Date ? b.Date : a.Date; });

But what is the real difference here? In the second example we introduced methods just to describe the parameters from the first example. Those parameters already had names, though, and in this respect the biggest flaw of the first example is that the names were invisible. But we can make them visible using named arguments. Like this:

list.DuplicateAwareSelect
(
    key: r => r.Name.Trim(),
    selectFirstAppearance: x => new Entry() { Name = x.Name.Trim(), Date = x.Date },
    processRepeatedAppearance: (a, b) => { b.Date = a.Date > b.Date ? b.Date : a.Date; }
);

Arguably, this makes the code as readable as the fluent example - aesthetic concerns aside. Of course, fluent can be much more powerful than this and allow more flexibility by providing different methods for different scenarios. But, when describing parameters is the primary concern, named arguments may work just as well.

Naturally, standard rules for code readability also apply, and the first one is naming. If we give descriptive names to arguments, we get something like this:

list.DuplicateAwareSelect
(
    key: sourceRow => sourceRow.Name.Trim(),
    selectFirstAppearance: sourceRow => new Entry() { Name = sourceRow.Name.Trim(), Date = sourceRow.Date },
    processRepeatedAppearance: (sourceRow, previousEntry) => { previousEntry.Date = sourceRow.Date > previousEntry.Date ? previousEntry.Date : sourceRow.Date; }
);

This could be enough for someone familiar with the API to understand the intention.
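For illustration, the named-arguments version could be declared as an extension method along these lines (the signature and body are my guess - the original DuplicateAwareSelect implementation isn't shown here):

public static IEnumerable<TEntry> DuplicateAwareSelect<TSource, TKey, TEntry>(
    this IEnumerable<TSource> source,
    Func<TSource, TKey> key,
    Func<TSource, TEntry> selectFirstAppearance,
    Action<TSource, TEntry> processRepeatedAppearance)
{
    var seen = new Dictionary<TKey, TEntry>();
    foreach (var sourceRow in source)
    {
        TKey k = key(sourceRow);
        TEntry existing;
        if (seen.TryGetValue(k, out existing))
        {
            processRepeatedAppearance(sourceRow, existing);   // repeated key: merge into the existing entry
        }
        else
        {
            existing = selectFirstAppearance(sourceRow);      // first appearance: create a new entry
            seen.Add(k, existing);
            yield return existing;
        }
    }
}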

Get content from WPF DataGridCell in one line of code (hack)

How do you get the text displayed in a WPF DataGridCell? It should be simple, but incredibly it doesn’t seem to be: all the solutions on the ‘net contain at least a page of code (I suppose the grid designers didn’t think anyone would want to get the value from a grid cell). But when you quick-view a DataGridCell in the debugger, it routinely shows the required value in the “value” column. It does this by calling a GetPlainText() method, which, unfortunately, isn’t public. We can hack it using reflection – and, absurdly, this solution seems more elegant than any other I’ve seen.

DataGridCell cell = something;

var value = typeof(DataGridCell)
    .GetMethod("GetPlainText", System.Reflection.BindingFlags.NonPublic | System.Reflection.BindingFlags.Instance)
    .Invoke(cell, null);
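The same hack, wrapped in an extension method so it reads as one line at the call site (GetCellText is a name I made up):

using System.Reflection;
using System.Windows.Controls;

public static class DataGridCellExtensions
{
    public static string GetCellText(this DataGridCell cell)
    {
        // GetPlainText() isn't public, so we have to go through reflection.
        var method = typeof(DataGridCell).GetMethod("GetPlainText",
            BindingFlags.NonPublic | BindingFlags.Instance);
        return method == null ? null : method.Invoke(cell, null) as string;
    }
}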

Workaround for HQL SELECT TOP 1 in a subquery

HQL doesn’t seem to support clauses like “SELECT TOP N…”, which can cause headaches when, for example, you need to get the data for the newest record in a table. One way to resolve this would be to write something like “SELECT * FROM X WHERE ID IN (SELECT ID FROM X WHERE Date IN (SELECT MAX(Date) FROM X))”, a doubly nested query which looks complicated even in this simple example and gets out of control when the query conditions need to be more complex.

What is the alternative? Use EXISTS – as in “a newer record doesn’t exist”. It still looks a bit ugly but at least it’s manageable. The above query would then look like this: “SELECT * FROM X AS X1 WHERE NOT EXISTS(SELECT * FROM X AS X2 WHERE X2.Date > X1.Date)”
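In NHibernate, the EXISTS variant might be issued roughly like this (X and its Date property are the placeholders from the queries above):

var newest = session.CreateQuery(
        "from X as x1 where not exists (from X as x2 where x2.Date > x1.Date)")
    .List<X>();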

Note that this works only for “SELECT TOP 1”. For a greater number there doesn’t seem to be a solution at all.

Repository moved temporarily to '/viewvc/nhibernate/trunk/'; please relocate

When trying to switch my local Subversion copy of the NHibernate source to a different tag (from 3.1GA to trunk, in this case), I got this error:

Repository moved temporarily to '/viewvc/nhibernate/trunk/'; please relocate

The frustrating thing was that I was trying to relocate to exactly this URL. And if I tried others, it said that I should relocate to them… I searched the net in vain for a solution; the only information I found was that I should re-configure my Apache server (thanks a bunch!)

The problem is, in fact, simple: the URL is wrong. I thought I could just copy the repository’s URL from my web browser, like I do with other sites. Not here: there’s a separate entry for direct SVN access. So instead of using this url:

http://nhibernate.svn.sourceforge.net/viewvc/nhibernate/

use this one:

https://nhibernate.svn.sourceforge.net/svnroot/nhibernate

It does seem like a simple problem but the solution wasn’t so easy to find.
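From the command line, the relocation itself would then look something like this (older Subversion clients use svn switch --relocate, newer ones have a separate svn relocate command):

svn switch --relocate http://nhibernate.svn.sourceforge.net/viewvc/nhibernate/ https://nhibernate.svn.sourceforge.net/svnroot/nhibernate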

Visual Studio 2010 cannot reference ManagedDTS dll from SQL Server 2005

A C# project that worked in Visual Studio 2008, when converted to Visual Studio 2010, starts complaining that it cannot find classes defined in Microsoft.SQLServer.ManagedDTS.dll and others. These DLLs ship with SQL Server 2005. If you remove the reference and add it again, the errors disappear in the editor, but reappear when you compile the solution. At the end of the jumble of compiler errors there is a small one that betrays the cause:

warning MSB3258: The primary reference "Microsoft.SQLServer.ManagedDTS, Version=9.0.242.0, Culture=neutral, PublicKeyToken=89845dcd8080cc91, processorArchitecture=MSIL" could not be resolved because it has an indirect dependency on the .NET Framework assembly "mscorlib, Version=2.0.3600.0, Culture=neutral, PublicKeyToken=b77a5c561934e089" which has a higher version "2.0.3600.0" than the version "2.0.0.0" in the current target framework.

The problem lies in the Microsoft.SQLServer.msxml6_interop.dll that references the beta version of the .Net framework 2.0. Yes, even after installing three service packs – and worse still, even if you install SQL Server 2008 it will remain there. Why? Apparently, there’s a newer msxml6_interop dll with this reference fixed but unfortunately it has the same version as the old one so it doesn’t replace it in the GAC. Talk about eliminating DLL hell.

But that’s not all: you cannot simply find the new dll and replace it in the GAC. The old one cannot be removed because it’s referenced by the Windows Installer. You have to use brute force, something like this: open the command prompt and find the real path to the assembly on the disk. (You cannot do this from Windows Explorer because it replaces the real GAC folder structure with a conceptual, flat view.) So, CD to c:\Windows\Assembly and find the folder called Microsoft.SqlServer.msxml6_interop. In it, there will be another folder called something like 6.0.0.0__89845dcd8080cc91, and in it the dll we’ve been looking for. On my computer, the full path is

c:\windows\assembly\GAC_MSIL\Microsoft.SqlServer.msxml6_interop\6.0.0.0__89845dcd8080cc91

Ok, now you should be able to manipulate the dll directly and replace it with the new one. What I like to do in these cases is SUBST the folder and make it accessible from Windows Explorer. Type something like this -

SUBST x: c:\windows\assembly\GAC_MSIL\Microsoft.SqlServer.msxml6_interop\6.0.0.0__89845dcd8080cc91

- and you will be able to see the folder in Windows Explorer as a separate volume X:. From there you can delete the existing file and copy over the newer one. You can only find the new one on a machine where SQL Server 2008 was installed first – it’s in the same (or a similar) place in the GAC. I used the command prompt trick again to get the file. (Note that I did everything as administrator; you might have to employ additional tricks to work around security.)

Here’s a more detailed description with other possible solutions:

http://blogs.msdn.com/b/jason_howell/archive/2010/08/18/visual-studio-2010-solution-build-process-give-a-warning-about-indirect-dependency-on-the-net-framework-assembly-due-to-ssis-references.aspx

How to fix CAB to support dependencies across class hierarchy

The Composite UI Application Block’s Object Builder doesn’t support dependencies for same-named properties at different levels in the class hierarchy. If you add a dependency property which has the same name as a property in a base or derived class, only one of them will be initialized.
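For illustration, the scenario looks roughly like this (the classes are hypothetical; ServiceDependency is CAB's property injection attribute):

public class BasePresenter
{
    [ServiceDependency]
    public IMyService Service { get; set; }       // declared in the base class
}

public class CustomerPresenter : BasePresenter
{
    [ServiceDependency]
    public new IMyService Service { get; set; }   // same name, redeclared in the derived class
}

// With the stock ObjectBuilder, only one of these two properties gets injected,
// because Type.GetProperties() hides one of them by name and signature.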

The reason for this is probably that the mechanism is based on the Type.GetProperties() method. This method doesn’t return all of the properties the class (and its base classes) contain – rather, it employs a “hide by name and signature” convention and returns only the topmost properties. So the first step is to eliminate the GetProperties() call. We do this by modifying the GetMembers() method of the PropertyReflectionStrategy (located in ObjectBuilder/Strategies/Property). It should look like this:

protected override IEnumerable<IReflectionMemberInfo<PropertyInfo>> GetMembers(IBuilderContext context, Type typeToBuild, object existing, string idToBuild)
{
    foreach (PropertyInfo propInfo in GetPropertiesFlattened(typeToBuild))
        yield return new PropertyReflectionMemberInfo(propInfo);
}

private IEnumerable<PropertyInfo> GetPropertiesFlattened(Type typeToBuild)
{
    for (Type t = typeToBuild; t != null; t = t.BaseType)
    {
        foreach (var pi in t.GetProperties(BindingFlags.Public | BindingFlags.Instance | BindingFlags.DeclaredOnly)) // get only properties in this class
        {
            yield return pi;
        }
    }
}

The next problem arises because the PropertyReflectionStrategy keeps a dictionary of existing properties. It’s indexed by property name, which would eliminate our duplicate properties. We have to change it to use the full path - property name prefixed by class name and namespace. I did this by adding a property called FullName to the IReflectionMemberInfo and ReflectionMemberInfo (found in ObjectBuilder/Strategies).

In IReflectionMemberInfo, add:

string FullName { get; }

In ReflectionMemberInfo, add:

public string FullName
{
    get { return memberInfo.DeclaringType.FullName + "." + memberInfo.Name; }
}

There’s a PropertyReflectionMemberInfo class embedded in the PropertyReflectionStrategy, we have to add a similar property to it:

public string FullName
{
    get { return prop.DeclaringType.FullName + "." + prop.Name; }
}

Ok – next, in the PropertyReflectionStrategy we rewire the dictionary to use this new property. Go to AddParametersToPolicy method and change this -

if (!result.Properties.ContainsKey(member.Name))
    result.Properties.Add(member.Name, new PropertySetterInfo(member.MemberInfo, parameter));

- to this -

if (!result.Properties.ContainsKey(member.FullName))
    result.Properties.Add(member.FullName, new PropertySetterInfo(member.MemberInfo, parameter));

One last glitch to fix: go to CompositeUI/WorkItem class, and in the BuildUp() method change this -

propPolicy.Properties.Add("Parent", new PropertySetterInfo("Parent", new ValueParameter(typeof(WorkItem), null)));

- to this -

propPolicy.Properties.Add("Microsoft.Practices.CompositeUI.WorkItem.Parent", new PropertySetterInfo("Parent", new ValueParameter(typeof(WorkItem), null)));

Without this modification, the root WorkItem would have its Parent property reference itself, and it would not be recognized as the root WorkItem because its Parent property is not null. As a consequence, some initialization methods would not get called and almost nothing would work.
