Archive

Posts Tagged ‘Array’

Using IEqualityComparer on Custom Types with Except()

April 12th, 2010 No comments

Recently I was writing about an alternative way of using Linq Distinct() on custom types which does not involve writing a custom IEqualityComparer derivate.

Today there was a similar requirement, using Linq Except() for determining all elements of an IEnumerable which do not intersect with the elements of another IEnumerable. Again, there is a one line solution to it, which is slow on large input data:

byte[][] hashes_old = /* an array of byte arrays containing a hash value */;
byte[][] hashes_new = /* another array of byte arrays containing a hash value */;
byte[][] hashes_obsolete = ( from x in hashes_old where hashes_new.Any( y => y.SequenceEqual( x ) ) == false select x ).ToArray();

We are using two arrays of 16 byte long hashes and determine which elements do not intersect. It is a more or less elegant one liner that does not require any other comparison code. Yet when we test this with larger amounts of data, it runs slow since SequenceEqual() has to be called for every single comparison:

hashes_old: 18830 hashes_new: 8210 hashes_obsolete: 12228 time: 19564 ms

Since the Except() method is extensively using the GetHashCode() override, a lot of time can be saved by properly implementing a hash function within an IEqualityComparer derivate.

byte[][] hashes_old = /* an array of byte arrays containing a hash value */;
byte[][] hashes_new = /* another array of byte arrays containing a hash value */;
byte[][] hashes_obsolete = hashes_old.Except( hashes_new, new ByteArrayComparer() ).ToArray();
 
/* .... */
public class ByteArrayComparer : IEqualityComparer<byte[]>
{
    public bool Equals( byte[] a, byte[] b )
    {
        if ( a == null || b == null )
            return a == b;
        return a.SequenceEqual( b );
    }
    public int GetHashCode( byte[] x )
    {
        if ( x == null )
            throw new ArgumentNullException();
        int iHash = 0;
        for ( int i = 0; i < x.Length; ++i )
            iHash ^= ( x[ i ] << ( ( 0x03 & i ) << 3 ) );
        return iHash;
    }
}

hashes_old: 18830 hashes_new: 8210 hashes_obsolete: 12228 time: 14 ms

Same result, but instead of 19 seconds we only need 16 milliseconds, that is 1400 times faster!

What happens is that the result of GetHashCode() is used for each comparison. When two arrays have the same hash code, Linq calls the Equals function to ensure both are really equal (there has been no hash collision). So the main speedup is realized by writing a low-collision hashing function.

Byte Array to Hex String in C# and Linq

February 18th, 2009 2 comments

Converting a byte array into a hexadecimal string is straight forward to do using a loop and a few lines of code. With Linq, it can be done more elegantly in a single line:

string s = String.Join( String.Empty, byte_array.Select( x => x.ToString( "X2" ) ).ToArray() );

Another one-liner without using Linq (also works on the framework 2.0):

string s = String.Join( String.Empty, Array.ConvertAll<byte, string>( byte_array, delegate( byte b ) { return b.ToString( "X2" ); } ) );

There is no Array.ConvertAll<>() in the .NET Compact Framework 2.0 however. But with a little help from the BitConverter class, a one-liner is still possible:

string s = BitConverter.ToString( byte_array ).Replace( "-", String.Empty );

Which one of the above is the most efficient? Most likely it does not matter, unless you are converting megabytes of data at once. Obviously, using Linq is the slowest option, it takes more than twice as long as Array.ConvertAll<>().
Using BitConverter.ToString() is the most efficient option, being about 8 times (!) as fast as Linq.

Finally, if you are interested in rolling your own function, here is a really fast one which can compete with the BitConverter class:

static readonly char[] hex = "0123456789ABCDEF".ToCharArray();
 
static string ByteToHexString( byte[] array )
{
    if ( array == null || array.Length < 1 )
        return String.Empty;
    char[] c = new char[ array.Length << 1 ];
    for ( int i = 0; i < array.Length; ++i )
    {
        byte b = array[ i ];
        c[ i << 1 ] = hex[ b >> 4 ];
        c[ ( i << 1 ) | 0x01 ] = hex[ ( b & 0x0F ) ];
    }
    return new string( c );
}