Menu

Rewriting LINQ expressions

In this post I’m going to show a real-life example of analyzing LINQ expressions and converting them into other LINQ expression on-the-fly. The real-life example will be making an expression that retrieves individual elements from a path expression: for example, given an expression like “a.PropertyB.PropertyC.PropertyD”, we will create another expression that retrieves all objects on the path, that is, a, a.PropertyB, a.PropertyB.PropertyC, etc. To be more specific, the concrete application for this would be handling a PropertyChanged event on a path: let’s say that, given the “a” value from the above example, we need to be notified when the “PropertyD” property is changed on the object pointed by “a.PropertyB.PropertyC” expression. But at any given moment, either of the objects along the path can be null or it can be replaced: in that case, we need to wait until any of the properties gets changed (that is, subscribe to the PropertyChanged event on its parent in the path), and re-attach our PropertyChanged event handler(s). We could solve this problem by describing the path with a string and utilizing reflection, but LINQ gives us not only compile-time safety (if we miss something within the path expression, we would get a compile error instead of a runtime exception) but the ability to compile the lambda expression and get it to run faster…

Ok, the first step is a brain-twister: we need to construct a LINQ expression. What would it look like? We need to convert

a.PropertyB.PropertyC.PropertyD

into something returning non-null elements in the path so we can attach to their PropertyChanged handlers. Now, one thing is sure here: we don’t have to build the whole expression in LINQ. It is perfectly sensible that we make our own utility methods – this way, we will get at least that part compiled by C# compiler which would probably make it run faster and reduce the amount of work done by the runtime LINQ compiler.

The troublesome detail here is that we cannot call a.PropertyB.PropertyC if a.PropertyB is null. That one should be skipped – but we cannot (yet?) write procedural code inside LINQ expressions, so the best we can do is use the IIF construct – i.e. the “?” and “:” operators in C#. If we could generate an array of values like this -

a, (a == null) ? null : a.PropertyB, (a == null) ? null : ((a.PropertyB == null) ? null : a.PropertyB.PropertyC), …

- then we could possibly feed it to some hardcoded method to extract non-null values and do the rest of the processing. It is easy to call a method from LINQ, we can just use a Call expression. Here’s a snippet of code that would produce something of this sort:

static void Test1()
{
	// example of the input expression
	Expression<Func<ClassA, object>> fn = (a => a.PropB.PropC);

	// example of the output expression
	Expression<Func<ClassA, IList<object>>> res =
	(
		a => Process(a, ((a == null) ? null : a.PropB), 
		((a == null) ? null : ((a.PropB == null) ? null : a.PropB.PropC)))
	);


	// arguments to our method call
	List<Expression> callArgs = new List<Expression>();

	// the first argument will be the parameter: but we also need to use
	// this parameter on the main expression itself
	ParameterExpression paramExpr1 = Expression.Parameter(typeof(ClassA), 
		"a");
	callArgs.Add(paramExpr1);

	// this is the conditional expression
	Expression conditionalExpr1 =
		Expression.Condition
		(
			Expression.Equal(paramExpr1, 
				Expression.Constant(null, typeof(ClassA))),
			Expression.Constant(null, typeof(ClassB)),
			Expression.Property(paramExpr1, "PropA")
		);
		
	callArgs.Add(conditionalExpr1);

	// method call expression: the method signature is below in the source
	Expression callExpr = Expression.Call(null, 
		typeof(Program).GetMethod("Process"), 
		Expression.NewArrayInit(typeof(object), callArgs));

	// and our final expression
	Expression<Func<ClassA, IList<object>> final =
		Expression.Lambda<Func<ClassA, IList<object>>>
		(callExpr, paramExpr1);

	// expression can be compiled to run faster
	Func compiledFunc = final.Compile();

	// a couple of examples of usage
	ClassA a1 = new ClassA();
	ClassA a2 = new ClassA() { PropA = new ClassB() };

	object ret1 = compiledFunc(a1);
	object ret2 = compiledFunc(a2);

	// note that the same thing can be done in a similar way but 
	// may mislead you to do DynamicInvoke which is slower
	LambdaExpression lambda = Expression.Lambda(callExpr, paramExpr1);

	// The slow version can also be compiled, produces a delegate - 
	// which is the same as compiledFunc but has to be cast into Func<> 
	Delegate del = lambda.Compile();

	// this is slower than directly calling Func, although it's calling 
	// the same compiled code
	object ret3 = del.DynamicInvoke(a2);

	// it will be faster to cast it to Func<> and then call it directly
	Func<classa , IList<object>> castDelegate = (Func<ClassA,
		IList<object>>)del;

	// this is as fast as compiledFunc
	object ret4 = castDelegate(a2);
}

public static IList<object> Process(params object[] objs)
{
	return new List<object>(objs);
}

This source is very sketchy - deliberately so since it's idea is just to illustrate the principle (I think that way it would be more useful if you need to do something similar but slightly different). Moreover, if you don't really need this exact solution, the best way to continue is to write an example expression, compile it and then decompile it with reflector ilspy: the compiler produces code for your expression that is exactly the same as the one required to dynamically build it. For the reverse operation, analyzing an existing expression, a very useful tool is the Expression Tree visualizer - it's somewhere in the Visual Studio/Samples folder and needs to be compiled. Once you copy it to My Documents\Visual Studio whatever\Visualizers folder, you can view expression trees inside Quick Watch.

One word about performance: I ran a couple of ad-hoc tests using a simple property accessor expression – not a really serious test but it does show the orders of magnitude we’re dealing with. The speed of the compiled Func is comparable to the speed of compiled C# code. When I try the same operation using reflection, it gets around ten thousand times slower (note that this includes calling GetType().GetProperty() each time, but optimizing this increases its speed for about 20%). DynamicInvoke has similar performance – but this is because there’s only one operation in the expression itself, it would be safe to expect that the overhead of DynamicInvoke doesn’t increase with expression complexity, while the overhead of using reflection would.

The biggest resource hog here is expression compilation, it is one million times slower than compiled execution – that means a hundred times slower than reflection. Not that any of the tests were noticeably slow – it is a simple expression, but even so it performs a thousand compiles for less than a second, which is decidedly not bad.

 

So we now have an idea how to build the output expression: the next step is to analyze the input expression. This is not so simple because the LINQ expression tree elements don’t have anything resembling a tidy class hierarchy (even the “mostly decent” DOM API is a space shuttle compared to it): because of this it seems that any expression type that can possibly appear in the expression tree should be special-cased in our logic. Luckily, we limit our ambition to property-referencing expressions only. In LINQ expression speak, this means we have a series of chained MemberExpressions pointing backwards to a ParameterExpression. So, an expression like “a => a.PropB.PropC” would have an expression tree like this:

MemberExpression
(
	Member = {PropertyInfo pointing to the PropC property}
	Expression = MemberExpression
	(
		Member = {PropertyInfo pointing to the PropB property}
		Expression = {ParameterExpression for parameter a}
	)
)

This should be fairly simple, all we need to do is get the MemberExpression contained in the Body of the root expression, then recursively run through all chained MemberExpressions and stop when we reach the ParameterExpression – the parameter we can copy into our own rewritten expression.

There is at least one small catch here – this is the only one I discovered, there may be more: the compiler may insert a conversion expression at the root of the expression tree if we use nullable types (for what reason, I can’t say, possibly value boxing?) It is represented as a UnaryExpression. In this example, we’ll simply skip over it (just use its Operand property which is a MemberExpression), but I’m quite sure this example is oversimplified and there could be more special cases that need to be handled. (Like, for example, casting, which could be quite legal – even necessary – in expressions like these).

Ok, on to the example... This is an excerpt from working code where PathExpression is the LINQ expressions we want to process.

 

Stack<MemberExpression> expressionStack = new Stack<MemberExpression>();

Expression exp = PathExpression.Body; 

while(true)
{
	if (exp is MemberExpression)
	{
		expressionStack.Push((MemberExpression)exp);
		exp = ((MemberExpression)exp).Expression;
	}
	else if (exp is UnaryExpression 
		&& ((UnaryExpression)exp).NodeType == ExpressionType.Convert)
	{
		// skip convert nodes (there could be one at the beginning of the 
		// expression for some reason if we use nullable properties
		exp = ((UnaryExpression)exp).Operand;
	}
	else if (exp == null || exp is ParameterExpression)
	{
		break;
	}
	else // exp.Expression != null but it’s not a member nor parameter expression
	{
		throw new InvalidOperationException("Unsupported expression type: " 
		+ exp.NodeType + ". Only member access expressions are supported.");
	}
}

ParameterExpression inputParamExpression = null;
Expression previousExpression = null;

// arguments to the method call
List<Expression> callArgs = new List<Expression>();

// the first one should point to the parameter

MemberExpression firstMe = expressionStack.Peek();
if (!(firstMe.Expression is ParameterExpression))
{
	throw new InvalidOperationException("The first expression element 
	doesn't reference an input parameter. The expression should be like 
	'x.PropA.PropB.PropC' where x is an input parameter.");
}

inputParamExpression = (ParameterExpression)firstMe.Expression;
callArgs.Add(inputParamExpression);
previousExpression = inputParamExpression;

List<string> propertyNames = new List<string>();

// now unwrap the expression: we want to build an expression like
//Expression<Func<ClassA, IList<object>>> res =
//(
//    x => Process(x, ((x == null) ? null : x.PropA), ((x == null) ? null 
//	: ((x.PropA == null) ? null : x.PropA.PropB)))
//);
while(expressionStack.Count > 0)
{
	MemberExpression me = expressionStack.Pop(); 

	// skip the last property in the expression: 
	// we don't need its value because
	// we won't attach to its PropertyChanged event
	if (expressionStack.Count >= 1)
	{
		Expression conditionalExpression =
			Expression.Condition
			(
			// in each step we reference the previous expression
				Expression.Equal(previousExpression, 
					Expression.Constant(null, 
						previousExpression.Type)),
				Expression.Constant(null, 
					((PropertyInfo)me.Member).PropertyType),
				Expression.Property(previousExpression, 
					(PropertyInfo)me.Member)
			);

		callArgs.Add(conditionalExpression);
		previousExpression = conditionalExpression;
	}

	propertyNames.Add(((PropertyInfo)me.Member).Name);
}

ParameterExpression thisExpression = 
  Expression.Parameter(typeof(PropertyChangedOnPathWrapper<T>), "this");

Expression callExpr = Expression.Call(thisExpression,
  typeof(PropertyChangedOnPathWrapper<T>).GetMethod
  ("Process", BindingFlags.Instance | BindingFlags.NonPublic),
	Expression.NewArrayInit(typeof(object), callArgs));

Expression<Action<T, PropertyChangedOnPathWrapper<T>>> finalExpr = 
  Expression.Lambda<Action<T, PropertyChangedOnPathWrapper<T>>>
  (callExpr, inputParamExpression, thisExpression);

ExtractionExpression = finalExpr.Compile();

Leave a comment

Make sure you enter the (*) required information where indicated. HTML code is not allowed.

Na vrh