Note: this article was (largely) written by Marc Gravell. The classes mentioned as a solution are part of MiscUtil (usage page). Marc and I are collaborating on a "push" model for LINQ, which will be described when fully implemented - Jon.
.NET 2.0 introduced generics into the .NET world, which opened the door for many elegant solutions to existing problems. Generic constraints can be used to restrict the type-arguments to known interfaces etc., to ensure access to functionality - or, for simple equality/inequality tests, the Comparer<T>.Default and EqualityComparer<T>.Default singletons implement IComparer<T> and IEqualityComparer<T> respectively (allowing us to sort elements, for instance, without having to know anything about the "T" in question).
With all this, though, there is still a big gap when it comes to operators.
Because operators are declared as static methods, there is no IMath<T>
or similar equivalent interface that all the numeric types implement; and indeed,
the flexibility of operators would make this very hard to do in a meaningful way.
Worse: many of the operators on the primitive types don't even exist as operator methods; instead they map directly to dedicated IL instructions. To make the situation even more complex,
Nullable<> demands the concept of "lifted operators", where the inner "T"
describes the operators applicable to the nullable type - but this is implemented
as a language feature, and is not provided by the runtime (making reflection even more fun).
A question seen very frequently on forums etc is "how can I do maths with generics?".
Usually people just want a simple sum, average etc. of their data, but once generics are involved this becomes an almost impassable barrier. There are some existing solutions that involve passing helper classes around (usually via some bespoke interface on T), but this involves a lot of extra coding, and pollutes the API slightly by having to include this extra passenger. Compare this to the common use of Comparer<T>.Default for finding a minimum/maximum: there is no extra work required there, so why should there be for sum/average?
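For instance, finding a generic maximum needs nothing beyond Comparer<T>.Default; the Max helper below is an illustrative sketch of that point, not MiscUtil's implementation:

    using System;
    using System.Collections.Generic;

    static class SequenceHelpers
    {
        // No extra "passenger" needed: Comparer<T>.Default supplies the
        // comparison for any T with a default ordering.
        public static T Max<T>(IEnumerable<T> source)
        {
            IComparer<T> comparer = Comparer<T>.Default;
            using (IEnumerator<T> iter = source.GetEnumerator())
            {
                if (!iter.MoveNext())
                {
                    throw new InvalidOperationException("Sequence was empty");
                }
                T max = iter.Current;
                while (iter.MoveNext())
                {
                    if (comparer.Compare(iter.Current, max) > 0)
                    {
                        max = iter.Current;
                    }
                }
                return max;
            }
        }
    }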
Consider the following simple overloads:
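For example, three Add methods along these lines, for int, decimal and decimal? respectively (the original sample may differ in detail):

    public static int Add(int x, int y)
    {
        return x + y; // compiles to the "add" IL instruction
    }
    public static decimal Add(decimal x, decimal y)
    {
        return x + y; // compiles to a call to decimal.op_Addition
    }
    public static decimal? Add(decimal? x, decimal? y)
    {
        return x + y; // "lifted" operator: null checks, then decimal.op_Addition
    }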
Conceptually they are very similar - but the implementations are very different: the first simply uses the "add" IL instruction; the second uses the op_Addition static method on System.Decimal; and the third needs to check that both arguments have values (with HasValue), and if so uses System.Decimal.op_Addition on the inner values (GetValueOrDefault()), finally wrapping the resulting decimal back into a Nullable<decimal>.
The different ways in which operators can be implemented make them very awkward to wrap (or worse, to hand-write IL for); there would be a lot of special cases. Personally I find it very strange that Microsoft didn't provide the standard op_Addition etc. on the primitive types (Int32, Single, etc.); sure, the compiler doesn't need them, but they would have made generic operators far easier to implement (you would simply look for op_Addition etc., with Nullable<T> as the one special case).
OK; we've seen that operators are tricky; so what else can we do? As it happens, .NET 3.5 included some new classes for expressing simple operations in a very abstract way; it also included code to compile these operations into regular code at runtime. This is the Expression class (and associated subclasses), introduced for LINQ, but with many other powerful applications. Maybe we could use these to do our maths?
The following demonstrates building an Expression for adding two numbers (based on generics), and then compiling this Expression into a Func<T,T,T> (i.e. a function that accepts two arguments of type T, and returns a T):
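A sketch along these lines (the BuildAdd name and class are illustrative, not the exact sample from the article):

    using System;
    using System.Linq.Expressions;

    static class GenericMath
    {
        public static Func<T, T, T> BuildAdd<T>()
        {
            // Build an expression tree representing (x, y) => x + y for T...
            ParameterExpression x = Expression.Parameter(typeof(T), "x"),
                                y = Expression.Parameter(typeof(T), "y");
            BinaryExpression body = Expression.Add(x, y);
            // ...and compile it into a delegate at runtime.
            // Note: no caching here - the delegate is rebuilt on every call.
            return Expression.Lambda<Func<T, T, T>>(body, x, y).Compile();
        }
    }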
As the comment indicates, this still leaves a lot to be desired with regards to caching the compiled function, but it shows the general approach. The actual code makes use of static classes and static constructors to cache the operators efficiently, and uses some shared code in ExpressionUtil to simplify the construction of the various Add, Subtract etc. operators - but the theory is the same.
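For illustration, the caching pattern looks something like this (a simplified sketch of the idea, not MiscUtil's exact code):

    using System;
    using System.Linq.Expressions;

    static class Operator<T>
    {
        // The static field is initialised exactly once per closed type
        // (Operator<int>, Operator<decimal>, ...), so each signature is
        // only compiled a single time.
        public static readonly Func<T, T, T> Add;

        static Operator()
        {
            ParameterExpression x = Expression.Parameter(typeof(T), "x"),
                                y = Expression.Parameter(typeof(T), "y");
            Add = Expression.Lambda<Func<T, T, T>>(
                Expression.Add(x, y), x, y).Compile();
        }
    }

With this in place, Operator<int>.Add(3, 4) simply invokes the already-compiled delegate.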
Is this fast? Well, compiling the operators isn't trivial, but the static constructor ensures that we only do it once for each signature; and once we have the function (Func<T,T,T>), the JIT-compiler is remarkably good at inlining the simple code that invokes it - enough that runtime performance is within a few percent of regular compiled code. To be honest this surprised me too when I saw it, so scepticism is allowed; but I urge you to test it and see for yourself. An interface-based approach wouldn't have been inlineable (due to the virtual call), and a simple object-based/GetType()/switch approach would have introduced boxing and branching (and masses of code), so would have been much slower (and harder to implement).
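A rough way to test for yourself, using the Operator<T> sketch above (a quick sanity check rather than a rigorous benchmark; numbers will vary by machine):

    using System;
    using System.Diagnostics;

    static class QuickPerfCheck
    {
        static void Main()
        {
            const int Iterations = 100000000;
            Func<int, int, int> add = Operator<int>.Add; // cached, compiled delegate
            int total1 = 0, total2 = 0;

            Stopwatch direct = Stopwatch.StartNew();
            for (int i = 0; i < Iterations; i++)
            {
                total1 += i; // regular compiled addition
            }
            direct.Stop();

            Stopwatch generic = Stopwatch.StartNew();
            for (int i = 0; i < Iterations; i++)
            {
                total2 = add(total2, i); // addition via the compiled delegate
            }
            generic.Stop();

            Console.WriteLine("Direct: {0}ms; via delegate: {1}ms (totals: {2}/{3})",
                direct.ElapsedMilliseconds, generic.ElapsedMilliseconds, total1, total2);
        }
    }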
Does this work for your own types too? Interestingly, yes! Because we haven't restricted ourselves to any knowledge of the type, we can write code that operates on any type. Perhaps you have a ComplexInt32, or a DecimalMatrix, etc. As long as you have declared suitable operators, the Expression approach will find your operator and use it.
This is a credit to the Microsoft team responsible for the Expression implementation; well done!
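For example, a hypothetical ComplexInt32 with its own + operator works unchanged with the Operator<T> sketch above:

    public struct ComplexInt32
    {
        public readonly int Real, Imaginary;
        public ComplexInt32(int real, int imaginary)
        {
            Real = real;
            Imaginary = imaginary;
        }
        // A user-defined operator; Expression.Add finds and uses this
        // op_Addition just as it does for decimal.
        public static ComplexInt32 operator +(ComplexInt32 a, ComplexInt32 b)
        {
            return new ComplexInt32(a.Real + b.Real, a.Imaginary + b.Imaginary);
        }
    }

Operator<ComplexInt32>.Add(new ComplexInt32(1, 2), new ComplexInt32(3, 4)) then yields (4, 6), without ComplexInt32 having to implement any special interface.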
As a direct example: regular LINQ provides (via System.Linq.Enumerable) a range of overloads for Sum() and Average() to cover the different data types - 20 of each in fact (half of those are for "selectors"). Of course, overloads don't help much when you only have a T... Using the generic operators, it is possible to cover all of these - and anything else you care to throw at them - with just 2 methods (1 with and 1 without a "selector"). And in fact we've implemented those 2 methods so that one simply calls the other, so we've really only had to write a single method. Now that is code re-use.
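A sketch of the shape of those two methods, again leaning on the Operator<T> sketch above (simplified; MiscUtil's real extension methods handle more detail):

    using System;
    using System.Collections.Generic;

    public static class GenericEnumerable
    {
        // One method does the work, for any T with a suitable + operator...
        public static T Sum<TSource, T>(this IEnumerable<TSource> source,
                                        Func<TSource, T> selector)
        {
            T sum = default(T);
            foreach (TSource item in source)
            {
                sum = Operator<T>.Add(sum, selector(item));
            }
            return sum;
        }

        // ...and the selector-less overload simply delegates to it.
        public static T Sum<T>(this IEnumerable<T> source)
        {
            return Sum<T, T>(source, x => x);
        }
    }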
But the compiler can't verify that "T" actually supports the operators we want! Yes; good point. Put simply: it can't. In reality this is unlikely to be a problem; if you are throwing a type at a reusable block of generic code to do a "sum" or similar, it is likely that you already know that "add" makes sense for that type. It is no different to EqualityComparer<T>.Default: that doesn't provide compile-time safety either, but that doesn't mean it isn't useful in almost every case where you would use it.
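To illustrate, with the Operator<T> sketch above an unsupported type only fails when the expression is first built at runtime:

    try
    {
        // DateTime has no DateTime + DateTime operator, so the static
        // constructor of Operator<DateTime> throws when it tries to
        // build the Add expression.
        DateTime oops = Operator<DateTime>.Add(DateTime.Now, DateTime.Now);
    }
    catch (TypeInitializationException ex)
    {
        // The original InvalidOperationException from Expression.Add
        // is wrapped as the InnerException.
        Console.WriteLine(ex.InnerException.Message);
    }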
Marc is not the first person to tackle this problem, although his approach is the most practical that I've seen. There are plenty of thoughts around the web about this, including a CodeProject page by Rüdiger Klaehn, which is the most comprehensive resource I've seen on the topic (partly because it references plenty of other approaches).