Archive

Posts Tagged ‘Optimizing’

Using IEqualityComparer on Custom Types with Except()

April 12th, 2010 Christian Etter No comments

Recently I was writing about an alternative way of using Linq Distinct() on custom types which does not involve writing a custom IEqualityComparer derivate.

Today there was a similar requirement, using Linq Except() for determining all elements of an IEnumerable which do not intersect with the elements of another IEnumerable. Again, there is a one line solution to it, which is slow on large input data:

byte[][] hashes_old = /* an array of byte arrays containing a hash value */;
byte[][] hashes_new = /* another array of byte arrays containing a hash value */;
byte[][] hashes_obsolete = ( from x in hashes_old where hashes_new.Any( y => y.SequenceEqual( x ) ) == false select x ).ToArray();

We are using two arrays of 16 byte long hashes and determine which elements do not intersect. It is a more or less elegant one liner that does not require any other comparison code. Yet when we test this with larger amounts of data, it runs slow since SequenceEqual() has to be called for every single comparison:

hashes_old: 18830 hashes_new: 8210 hashes_obsolete: 12228 time: 19564 ms

Since the Except() method is extensively using the GetHashCode() override, a lot of time can be saved by properly implementing a hash function within an IEqualityComparer derivate.

byte[][] hashes_old = /* an array of byte arrays containing a hash value */;
byte[][] hashes_new = /* another array of byte arrays containing a hash value */;
byte[][] hashes_obsolete = hashes_old.Except( hashes_new, new ByteArrayComparer() ).ToArray();
 
/* .... */
public class ByteArrayComparer : IEqualityComparer<byte[]>
{
    public bool Equals( byte[] a, byte[] b )
    {
        if ( a == null || b == null )
            return a == b;
        return a.SequenceEqual( b );
    }
    public int GetHashCode( byte[] x )
    {
        if ( x == null )
            throw new ArgumentNullException();
        int iHash = 0;
        for ( int i = 0; i < x.Length; ++i )
            iHash ^= ( x[ i ] << ( ( 0x03 & i ) << 3 ) );
        return iHash;
    }
}

hashes_old: 18830 hashes_new: 8210 hashes_obsolete: 12228 time: 14 ms

Same result, but instead of 19 seconds we only need 16 milliseconds, that is 1400 times faster!

What happens is that the result of GetHashCode() is used for each comparison. When two arrays have the same hash code, Linq calls the Equals function to ensure both are really equal (there has been no hash collision). So the main speedup is realized by writing a low-collision hashing function.

Efficient import of large data sources into SQL Server 2008

January 20th, 2010 Christian Etter No comments

When importing large amounts of data into SQL Server 2008 there are several choices. Most people working with SQL Server are aware of the BULK INSERT statement and the bcp utility. In the most simple case, we can specify a text file that contains information in CSV format and import it with a single SQL statement.
In many cases this might work well, yet there are also a lot of circumstances under which this approach is bound to fail. Especially when you are dealing with data that is delimited according to a different convention, such as escaped strings, single quotes instead of double quotes, binary data and the like.

Already simple tasks like skipping a certain column in output or parsing text in a different encoding on one column can be challenging. There is a simple way of doing the latter by using XML format files that can specify more complex import rules.
Here is an example:

<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
 <RECORD>
	 <FIELD ID="1" xsi:type="CharTerm" TERMINATOR=";" COLLATION="Latin1_General_CI_AS"/>
	 <FIELD ID="2" xsi:type="CharTerm" TERMINATOR=";" COLLATION="Latin1_General_CI_AS"/>
	 <FIELD ID="3" xsi:type="CharTerm" TERMINATOR=";" COLLATION="Latin1_General_CI_AS"/>
	 <FIELD ID="4" xsi:type="CharTerm" TERMINATOR=";" COLLATION="Latin1_General_CI_AS"/>
	 <FIELD ID="5" xsi:type="CharTerm" TERMINATOR=";" COLLATION="Latin1_General_CI_AS"/>
	 <FIELD ID="6" xsi:type="CharTerm" TERMINATOR=";" COLLATION="Latin1_General_CI_AS"/>
	 <FIELD ID="7" xsi:type="CharTerm" TERMINATOR=";" COLLATION="Latin1_General_CI_AS"/>
	 <FIELD ID="8" xsi:type="CharTerm" TERMINATOR=";" COLLATION="Latin1_General_CI_AS"/>
	 <FIELD ID="9" xsi:type="CharTerm" TERMINATOR=";" COLLATION="Latin1_General_CI_AS"/>
	 <FIELD ID="10" xsi:type="CharTerm" TERMINATOR=";" COLLATION="Latin1_General_CI_AS"/>
	 <FIELD ID="11" xsi:type="CharTerm" TERMINATOR=";" COLLATION="Latin1_General_CI_AS"/>
	 <FIELD ID="12" xsi:type="CharTerm" TERMINATOR=";" COLLATION="Latin1_General_CI_AS"/>
	 <FIELD ID="13" xsi:type="CharTerm" TERMINATOR=";" COLLATION="Latin1_General_CI_AS"/>
	 <FIELD ID="14" xsi:type="CharTerm" TERMINATOR=";" COLLATION="Latin1_General_CI_AS"/>
	 <FIELD ID="15" xsi:type="CharTerm" TERMINATOR=";" COLLATION="Latin1_General_CI_AS"/>
	 <FIELD ID="16" xsi:type="CharTerm" TERMINATOR="\n" MAX_LENGTH="2000" COLLATION="Latin1_General_CI_AS"/>
 </RECORD>
 <ROW>
  <COLUMN SOURCE="1" NAME="WarenwirtschaftsID" xsi:type="SQLINT"/>
  <COLUMN SOURCE="2" NAME="Bezeichnung" xsi:type="SQLNVARCHAR"/>
  <COLUMN SOURCE="3" NAME="Beschreibung" xsi:type="SQLNVARCHAR"/>
  <COLUMN SOURCE="4" NAME="Vorname" xsi:type="SQLNVARCHAR"/>
  <COLUMN SOURCE="5" NAME="Ort" xsi:type="SQLNVARCHAR"/>
  <COLUMN SOURCE="6" NAME="PLZ" xsi:type="SQLNVARCHAR"/>
  <COLUMN SOURCE="7" NAME="Land" xsi:type="SQLNVARCHAR"/>
  <COLUMN SOURCE="8" NAME="Strasse" xsi:type="SQLNVARCHAR"/>
  <COLUMN SOURCE="11" NAME="Telefon" xsi:type="SQLNVARCHAR"/>
  <COLUMN SOURCE="14" NAME="Fax" xsi:type="SQLNVARCHAR"/>
  <COLUMN SOURCE="15" NAME="Email" xsi:type="SQLVARYCHAR"/>
 </ROW>
</BCPFORMAT>

The above serves as a format description for the following ETL script:

INSERT INTO Kunden ( Bezeichnung, Strasse, PLZ, Ort, Land, Telefon, Fax, Email, WarenwirtschaftsID, 
    Beschreibung, HotlineVertragVorhanden, ZeiterfassungsID, GeloeschtAm, Ansprechpartner )
    SELECT Bezeichnung, Strasse, PLZ, Ort, Land, Telefon, Fax, Email, WarenwirtschaftsID, 
        Beschreibung, 0 AS HotlineVertragVorhanden, NULL AS ZeiterfassungsID, NULL AS GeloeschtAm, 
        CASE WHEN Vorname IS NOT NULL THEN Vorname + ' ' + Bezeichnung ELSE NULL END AS Ansprechpartner
        FROM  OPENROWSET(BULK  '.....\KUNDEN.csv' , 
        FORMATFILE = '.....\Kunden.xml',
        FIRSTROW = 2, CODEPAGE = '1252'
    ) AS T ;

The neat thing about the BULK INSERT command is that you can do it in pure SQL (left aside the XML file) and therefore it can be easily automated as a task/job that runs on a regular basis.

If you are looking for more transformations, you have to resort to a programming language that allows for greater flexibility, such as C#.

In traditional transactional processing you would read each row of data from your data source, transform it and then insert it into the SQL Server data base. Due to the nature of transactional processing, this can be a very inefficient and slow process.

For large amounts of data, it makes sense to use a bulk load mechanism encapsulated in the SqlBulkCopy class. The following method uses a compressed file as an input stream, decompresses it and uses SqlBulkCopy to mass insert all data.

private static void ProcessLocations( ZipInputStream oZip )
{
    using ( SqlConnection connection = new SqlConnection( SQL_SERVER_CONNECTION ) )
    {
        connection.Open();
        SqlCommand cmd = connection.CreateCommand();
 
        cmd.CommandText =
            "IF EXISTS ( SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[T_Location]') AND type in (N'U')) " +
            "TRUNCATE TABLE [dbo].[T_Location];" +
            "ELSE " +
            "CREATE TABLE T_Location ( LocationID INT PRIMARY KEY CLUSTERED, Country CHAR(2) NOT NULL, Region CHAR(2), City NVARCHAR(50), PostalCoce VARCHAR(50), Latitude REAL, Longitude REAL )";
        cmd.CommandTimeout = 60;
        cmd.ExecuteNonQuery();
 
        /// open stream with ISO-8859-1 (Latin-1) encoding, do not close streamreader
        StreamReader re = new StreamReader( oZip, Encoding.GetEncoding( 28591 ) );
 
        using ( SqlBulkCopy oBulk = new SqlBulkCopy( connection ) )
        {
            oBulk.BatchSize =
            oBulk.NotifyAfter = 50000;
            oBulk.DestinationTableName = "T_Location";
            oBulk.SqlRowsCopied += new SqlRowsCopiedEventHandler( delegate( object sender, SqlRowsCopiedEventArgs e ) { Output( String.Format( "{0:0,0} Rows imported.", e.RowsCopied ) ); } );
            /// need to add a column mapping because we are only storing 7 out of 9 columns.
            for ( int i = 0; i < 7; ++i )
                oBulk.ColumnMappings.Add( i, i );
 
            LocationCSVReader oBlocks = new LocationCSVReader( re, Output );
            oBulk.WriteToServer( oBlocks );
        }
    }
}

How are files parsed and columns extracted? As you may have guessed, the LocationCSVReader:

 
private class LocationCSVReader : IDataReader
{
    private StreamReader oReader = null;
    private object[] oCurrentValues = new object[ 9 ];
    private int iCurrentRow = 0;
    private OutputEventHandler Output = null;
 
    public delegate void OutputEventHandler( string s );
 
    public LocationCSVReader( StreamReader oReader ) : this( oReader, null ) { }
 
    public LocationCSVReader( StreamReader oReader, OutputEventHandler o )
    {
        this.oReader = oReader;
        this.Output = o;
        Output( "Skipping header\r\n" + this.oReader.ReadLine() + "\r\n" + this.oReader.ReadLine() );
        this.iCurrentRow = 2;
    }
 
    #region IDataReader Members
 
    public bool Read()
    {
        string input;
        while ( ( input = this.oReader.ReadLine() ) != null )
        {
            ++this.iCurrentRow;
            input = input.Replace( "\"", "" );
            string[] ss = input.Split( new char[] { ',' } );
            if ( ss.Length != this.oCurrentValues.Length )
            {
                if ( this.Output != null )
                    this.Output( String.Format( "Invalid number of columns on line {0:0,0} - {1}", this.iCurrentRow, String.Join( " - ", ss ) ) );
                continue;
            }
            for ( int i = 0; i < this.oCurrentValues.Length; ++i )
                this.oCurrentValues[ i ] = ss[ i ];
            this.oCurrentValues[ 0 ] = (Int32)( Int64.Parse( ss[ 0 ] ) - Int32.MaxValue );
            this.oCurrentValues[ 5 ] = Double.Parse( ss[ 5 ], System.Globalization.NumberStyles.Float, CultureInfo.InvariantCulture );
            this.oCurrentValues[ 6 ] = Double.Parse( ss[ 6 ], System.Globalization.NumberStyles.Float, CultureInfo.InvariantCulture );
            return true;
        }
        return false;
    }
 
    // ...
    #endregion
 
    #region IDisposable Members
    // ...
    #endregion
 
    #region IDataRecord Members
    // ...
    #endregion
}

One important thing to note: Consider that your code will be executed on a system with a different locale than yours. That’s frequently the point when things start to fall apart: E.g. when testing on a German server, things seem alright. Although when published on an English production server, “1,002″ suddenly becomes equivalent to “1.000″ and vice versa. Same problem exists with dates and times. Therefore whenever possible specify a specific culture or CultureInfo.InvariantCulture when parsing text values!

Optimized Image Matching with SSE2

August 8th, 2009 Christian Etter No comments

When it comes to matching a large numer of images according to certain visual criteria, the number of iterations that need to be performed are likely to increase exponentially. So developing a fast matching algorithm is essential.

Depending on your choice of algorithm, making use of your processor’s SIMD extensions can be worthwhile.

In this scenario I was benchmarking a modified Euclidean matching algorithm against a set of 85.730 images of various sizes. Due to the exponential number of iterations, there are a total of 3,674,773,585 comparisons to be performed, as every image is matched against all remaining images. No need to explain why a fast matching algorithm is essential.

A standard C implementation would take about 20,000 clock cycles per iteration on my machine.

Using SSE intrinsics, this could be optimized to take less than 800 clock cycles per iteration on a smaller number of images.  Once I tried to process more than a few thousand pictures at once, the same code would suddenly perform several times slower than before, taking around 1,000 cycles per iteration. Where was the catch? After changing allocation APIs to ensure that all data was aligned on 32 byte boundaries there was still no signifficant speed increase. Using SSE prefetching would not work either.  The reason for the slowdown was in fact that the RGB data was scattered in memory, combined with other image specific data. This would result in a very inefficient use of the CPU’s internal caching, making it necessary to pull information from slow main memory more often than necessary.

Solution: Copy all the data used in the inner loop in a contiguous memory location. Then use the same algorithm to run against the contiguous data. Performance increased by 580%. After unrolling the loop for a 4x4x3 matrix, we were down to an average of 58 ticks per iteration, comparing 85K images within just 66 seconds, that is roughly 5.4 GiB/second!

inline static DOUBLE CalcDifferenceSSE2Unrolled4x4x3( DATA* p1, DATA* p2 )
{
    __m128i* pp1 = (__m128i*)p1->rgb_small;
    __m128i* pp2 = (__m128i*)p2->rgb_small;
    const DWORD RGB_BYTES = sizeof( p1->rgb_small ) / sizeof( *p1->rgb_small );
    ATLASSERT( ( RGB_BYTES &; 15 ) == 0 );   // number of elements must be divisible by 16
    ATLASSERT( ( (DWORD)pp1 & 15 ) == 0 ); // pointer must point to address multiple of 16 (cache line)
    ATLASSERT( ( (DWORD)pp2 & 15 ) == 0 ); // pointer must point to address multiple of 16
 
    __m128i b, c, d;
    b = _mm_sad_epu8( pp1[ 0 ], pp2[ 0 ] );
    c = _mm_sad_epu8( pp1[ 1 ], pp2[ 1 ] );
    d = _mm_sad_epu8( pp1[ 2 ], pp2[ 2 ] );
 
    __m128i a = _mm_setzero_si128();
    a = _mm_add_epi64( a, b );
    a = _mm_add_epi64( a, c );
    a = _mm_add_epi64( a, d );
 
    b = _mm_srli_si128 ( a, 8 );
    DWORD dw = _mm_cvtsi128_si32( a ) + _mm_cvtsi128_si32( b );
    return (DOUBLE)dw / ( RGB_BYTES * 0xFF );
}

When searching for duplicate images, alternate approaches should be used to eliminate potential duplicates before entering a one by one comparison like the above. Potential preselection criteria could be aspect ratio, dimensions, or colorspace.