Using Linq Distinct() without IEqualityComparer

January 18th, 2010, Christian Etter

Given an IEnumerable<T>, such as a generic list or an array, calling Distinct() without arguments only gives the expected result for simple data types (or for types that override Equals() and GetHashCode()), because it relies on the default equality comparer; for ordinary objects that comparer falls back to reference equality. As soon as we want to compare objects by one of their properties, we are forced to write our own class implementing IEqualityComparer<T>, which is a bit bothersome in many cases. At first glance, it seems that Microsoft simply forgot to provide lambda expression overloads for Distinct() and similar functions. Another reason might be that these functions benefit immensely from a hash-based comparison algorithm, and that is basically what IEqualityComparer is all about. See my other blog post about this subject.
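A minimal sketch of this behavior, assuming C# 9 or later (top-level statements) and made-up sample values: strings override Equals() and GetHashCode(), so Distinct() removes duplicate words, whereas arrays keep the reference equality inherited from object, so three byte arrays with identical contents all survive.

```csharp
using System;
using System.Linq;

// Strings override Equals()/GetHashCode(), so Distinct() compares them by value.
string[] words = { "alpha", "beta", "alpha" };
int distinctWords = words.Distinct().Count();

// Arrays do not override Equals()/GetHashCode(): the default comparer compares
// references, so three arrays with the same contents are all kept.
byte[][] blobs =
{
    new byte[] { 1, 2, 3 },
    new byte[] { 1, 2, 3 },
    new byte[] { 1, 2, 3 },
};
int distinctBlobs = blobs.Distinct().Count();

Console.WriteLine($"{distinctWords} distinct words, {distinctBlobs} distinct blobs");
```

Running it should print "2 distinct words, 3 distinct blobs".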

For those who are just looking for a simple solution, the following one-liner might be useful:

SomeObject[] array_1 = new SomeObject[] { ... };
SomeObject[] array_2 = array_1.GroupBy( x => x.SomePropertyOrMethod ).Select( x => x.First() ).ToArray();
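Wrapped in a small generic helper, the one-liner might look like the following sketch (C# 9 or later). The helper name DistinctByKey and the (Name, City) tuples are hypothetical stand-ins for SomeObject and SomePropertyOrMethod. GroupBy() yields its groups in the order in which each key first appears, so First() returns the earliest element for every key.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper wrapping the GroupBy()/First() one-liner: keeps the first
// element for every distinct key returned by keySelector.
static T[] DistinctByKey<T, TKey>(IEnumerable<T> source, Func<T, TKey> keySelector) =>
    source.GroupBy(keySelector).Select(g => g.First()).ToArray();

// Sample data standing in for SomeObject; the tuples are de-duplicated by City only.
(string Name, string City)[] people =
{
    ("Anna", "Bern"),
    ("Ben", "Zurich"),
    ("Carla", "Bern"),
};

var onePerCity = DistinctByKey(people, p => p.City);
foreach (var p in onePerCity)
    Console.WriteLine($"{p.Name} ({p.City})");
```

This prints "Anna (Bern)" followed by "Ben (Zurich)": Carla is dropped because the key Bern has already been seen.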

A standard implementation using IEqualityComparer<T>, here for removing duplicate byte arrays, could look like this:

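using System;
using System.Collections.Generic;
using System.Linq;

// hash_duplicate is a byte[][] that may contain arrays with identical contents
byte[][] hash_distinct = hash_duplicate.Distinct( new ByteArrayComparer() ).ToArray();
/* .... */
public class ByteArrayComparer : IEqualityComparer<byte[]>
{
    // Two arrays are equal if both are null or if they contain the same bytes in the same order
    public bool Equals( byte[] a, byte[] b )
    {
        if ( a == null || b == null )
            return a == b;
        return a.SequenceEqual( b );
    }
    // XORs every byte into the int, shifted by 0, 8, 16 or 24 bits depending on i mod 4,
    // so arrays with equal contents always produce equal hash codes
    public int GetHashCode( byte[] x )
    {
        if ( x == null )
            throw new ArgumentNullException( nameof( x ) );
        int iHash = 0;
        for ( int i = 0; i < x.Length; ++i )
            iHash ^= ( x[ i ] << ( ( 0x03 & i ) << 3 ) );
        return iHash;
    }
}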

In my tests, using an IEqualityComparer implementation instead of the GroupBy() one-liner above was about twice as fast (a performance gain of roughly 100%) on an array of 18,000 elements.
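Part of the difference is probably that GroupBy() buffers every element of the source into a group before First() is applied, whereas Distinct() only keeps a hash set of the items it has already seen. The sketch below, assuming C# 9 or later, keeps the lambda key selector of the one-liner but tracks the keys in a HashSet<TKey> instead; the helper name DistinctByKeyHashed is hypothetical and it has not been benchmarked against the numbers above. Since .NET 6 the framework also ships Enumerable.DistinctBy(), which is built on the same idea.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: streams the source and yields an element only the first time
// its key is seen. The HashSet<TKey> provides the same hash-based lookup that
// Distinct() gets from an IEqualityComparer, without a comparer class per type.
static IEnumerable<T> DistinctByKeyHashed<T, TKey>(IEnumerable<T> source, Func<T, TKey> keySelector)
{
    var seen = new HashSet<TKey>();
    foreach (T item in source)
    {
        // Add() returns false if the key is already in the set
        if (seen.Add(keySelector(item)))
            yield return item;
    }
}

// Example: de-duplicate byte arrays by content, using a Base64 string as the key.
byte[][] hashes =
{
    new byte[] { 0x01, 0x02 },
    new byte[] { 0x03, 0x04 },
    new byte[] { 0x01, 0x02 },
};
byte[][] unique = DistinctByKeyHashed(hashes, b => Convert.ToBase64String(b)).ToArray();
Console.WriteLine(unique.Length);
```

Using a Base64 string as the key allocates one string per element, so for large inputs the dedicated ByteArrayComparer above may still be the better choice.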