Pre-fetching data with LINQ to SQL?

(Yes, I know I’m behind the times - “LINQ to SQL? Who needs it when there’s the newest preview/alpha/beta of the Entity Framework?" Well, I did start this application in EF v1 and ran away when I saw “unsupported” stickers plastered all over it. So, no thanks, I’m waiting for the proverbial “Microsoft v3.11” (or 3.51, whatever they call it)).

Looking superficially, one would say that all ORMs are alike. Moreover, as one of the newest to come into the world, LINQ to SQL would be expected to have it’s philosophy and design done according to previously accumulated knowledge. Erm, yes, it’s a polite way of saying that I expected it to be a rip-off of NHibernate…

This similarity may exist in general, but there are some areas in which the two are completely separate worlds. The example that I encountered is performance optimization. Coming from the NHibernate background I was surprised to discover that there are not much optimization topics in common with the two. In some aspects, NHibernate has already solved (at least for me) issues that LINQ to SQL has not yet stumbled upon, but in others, LINQ to SQL focuses on performance issues that don’t even exist as topics in NHibernate.

The search started with the (for an NHibernate guy like me common) “N+1” record problem. Well, either I don’t know how to search for “N+1” in Google or this subject is non-existent in the LINQ world. The “N+1” problem happens if you need to read additional data (say, related records) for each record fetched and do this by executing one SQL command for each record. You fetch 100 records in one query and then execute additional 100 queries for 100 records. So, in total you have 101 SQL commands executed for 100 records read.

Of course, this happens only if you want to load joined data – that is, read multiple joined records and the ORM doesn’t know how to do this in one go. NHibernate sometimes join-loads data by itself but LINQ to SQL doesn’t, so you have to explicitly instruct it to do this. For this, it you can use Eager loading – you instruct the ORM (using DataLoadOptions.LoadWith() method) to fetch the related record along with the original one in a single query. The trouble with this is that if you specify too complicated a structure for eager fetching (for ex. multiple eager-fetch collections on a single record), the ORM won’t be able to read everything in a single query and will execute N+1 SQL commands by itself. Both NHibernate and LINQ to SQL actually do this in a similar fashion. But in NHibernate, you can solve this by pre-fetching additional records in a separate query: you separate the data structure in portions and join-fetch each portion in a single query (this way, each of the multiple mentioned collections gets a separate query and a separate join-fetch). All records get cached in the NHibernate session and each query adds a portion of the missing structure. At the end, you have, say, 3 queries executed to get the full structure.

In LINQ to SQL this cannot be done because you must use DataLoadingOptions to say what gets loaded with what, and once you set the DataLoadingOptions, they cannot be changed. Because of this, you cannot load one set of records at a time: either set DataLoadingOptions to load all or you’re left with partial data. Also, you cannot explicitly load joined objects like in NHibernate (e.g. the statement “load Client along with its associated Orders” would in NHibernate sound something like “from Client c join fetch c.Orders”), here joins are used solely for filtering data.

Obviously, if we want to load two different child collections on a single object we cannot do this in one go. But we also cannot do two consecutive queries with different fetching strategies because we are obliged to use DataLoadingOptions which we cannot change. There may be some secret handshake that can allow me to pre-fetch data separately, but I don’t know about it. Probably the EF will come up with a wider and more mature feature set. As I mentioned, last time I checked EF it was pretty much infantile. But testing it paid off because I learned one important thing: using LINQ for all queries allowed me to rapidly move to LINQ to SQL, and this is where I am now with my N+1 problem. How do you think I’m going to solve it? Yes! LINQ to NHibernate, here we come.

To conclude on the note with which we started, the difference between the two engines… To me it seems that nobody even thought of the N+1 problem in LINQ to SQL: here the biggest performance optimization seems to be on the client side as everyone mentioning query optimization talks about compiling queries… I find it strange that optimizing parsing of a LINQ query could give a performance gain bigger than reducing the number of SQL commands executed, but I never checked so I’ll leave it at that. In NHibernate, on the other hand, there are such things as cached and named queries, but if their impact on performance is proportional to the amount of space given to them on NHibernate blogs and in docs, they are not on top of the list. Could it be that the two technologies are so much different that they have to be optimized in different ways? I don’t think so. It is evident that NHibernate has significantly more mileage than LINQ, but NHibernate tech guys are notorious for not documenting basic stuff, so if something is not mentioned in the docs it doesn’t mean it’s unimportant (ah, the irony of it…) On the other hand, Microsoft technologies receive the most attention in the pre-release stages, when most of the information available is extremely superficial (and I must add that the bloggers behave like a flock of parrots, mindlessly replicating the content from Microsoft CTP announcements ad infinitum and flooding search engines). Because of this, most of the high-tech LINQ information has yet to come. Of course, after LINQ to SQL was discontinued and they started to build yet another ORM – the Entity Framework – from scratch, the wait is naturally going to get even longer. I don’t expect that EF will have a radically different philosophy in the end, but it must at one point implement a way to solve the N+1 problem. Maybe it will do this by itself? It could, and so could NHibernate…

Leave a comment

Make sure you enter the (*) required information where indicated. HTML code is not allowed.

Na vrh