Archive

Posts Tagged ‘IEqualityComparer’

Using IEqualityComparer on Custom Types with Except()

April 12th, 2010 Christian Etter No comments

Recently I was writing about an alternative way of using Linq Distinct() on custom types which does not involve writing a custom IEqualityComparer derivate.

Today there was a similar requirement, using Linq Except() for determining all elements of an IEnumerable which do not intersect with the elements of another IEnumerable. Again, there is a one line solution to it, which is slow on large input data:

byte[][] hashes_old = /* an array of byte arrays containing a hash value */;
byte[][] hashes_new = /* another array of byte arrays containing a hash value */;
byte[][] hashes_obsolete = ( from x in hashes_old where hashes_new.Any( y => y.SequenceEqual( x ) ) == false select x ).ToArray();

We are using two arrays of 16 byte long hashes and determine which elements do not intersect. It is a more or less elegant one liner that does not require any other comparison code. Yet when we test this with larger amounts of data, it runs slow since SequenceEqual() has to be called for every single comparison:

hashes_old: 18830 hashes_new: 8210 hashes_obsolete: 12228 time: 19564 ms

Since the Except() method is extensively using the GetHashCode() override, a lot of time can be saved by properly implementing a hash function within an IEqualityComparer derivate.

byte[][] hashes_old = /* an array of byte arrays containing a hash value */;
byte[][] hashes_new = /* another array of byte arrays containing a hash value */;
byte[][] hashes_obsolete = hashes_old.Except( hashes_new, new ByteArrayComparer() ).ToArray();
 
/* .... */
public class ByteArrayComparer : IEqualityComparer<byte[]>
{
    public bool Equals( byte[] a, byte[] b )
    {
        if ( a == null || b == null )
            return a == b;
        return a.SequenceEqual( b );
    }
    public int GetHashCode( byte[] x )
    {
        if ( x == null )
            throw new ArgumentNullException();
        int iHash = 0;
        for ( int i = 0; i < x.Length; ++i )
            iHash ^= ( x[ i ] << ( ( 0x03 & i ) << 3 ) );
        return iHash;
    }
}

hashes_old: 18830 hashes_new: 8210 hashes_obsolete: 12228 time: 14 ms

Same result, but instead of 19 seconds we only need 16 milliseconds, that is 1400 times faster!

What happens is that the result of GetHashCode() is used for each comparison. When two arrays have the same hash code, Linq calls the Equals function to ensure both are really equal (there has been no hash collision). So the main speedup is realized by writing a low-collision hashing function.

Using Linq Distinct() without IEqualityComparer

January 18th, 2010 Christian Etter No comments

Given an IEnumerable class, such as a generic list or array, it is only possible to use the Distinct() method when working with simple data types. As soon as we are operating on a list of objects though, we are forced to write your own class implementing IEqualityComparer, which is a bit bothersome in many cases. At tehe first glance, it seems that Microsoft has simply forgotten to implement Lambda Expressions for Distinct() and similar functions. Another reason might be that these functions immensely benefit from the use of a hash based comparison algorithm, and basically that is what the IEqualityComparer is all about. See my other blog post about this subject.

For those who are just looking for a simple solution, the following one-liner might be useful:

SomeObject[] array_1 = new SomeObject[] { ... }
SomeObject[] array_2 = array_1.GroupBy( x => x.SomePropertyOrMethod ).Select( x => x.First() ).ToArray();

A standard implementation using IEqualityComparer could look like this:

byte[][] hash_distinct = hash_duplicate.Distinct( new ByteArrayComparer() ).ToArray();
/* .... */
public class ByteArrayComparer : IEqualityComparer<byte[]>
{
    public bool Equals( byte[] a, byte[] b )
    {
        if ( a == null || b == null )
            return a == b;
        return a.SequenceEqual( b );
    }
    public int GetHashCode( byte[] x )
    {
        if ( x == null )
            throw new ArgumentNullException();
        int iHash = 0;
        for ( int i = 0; i < x.Length; ++i )
            iHash ^= ( x[ i ] << ( ( 0x03 & i ) << 3 ) );
        return iHash;
    }
}

In my tests the performance gain by using an IEqualityComparer implementation instead of the above solution is about 100% when working on an array of 18000 elements.