<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>CE&#039;s Blog</title>
	<atom:link href="http://www.christian-etter.de/?feed=rss2" rel="self" type="application/rss+xml" />
	<link>http://www.christian-etter.de</link>
	<description>Stuff I found noteworthy.</description>
	<lastBuildDate>Sun, 25 Jul 2010 08:59:43 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>Mobile Development Missing in Visual Studio 2010</title>
		<link>http://www.christian-etter.de/?p=511</link>
		<comments>http://www.christian-etter.de/?p=511#comments</comments>
		<pubDate>Sun, 13 Jun 2010 18:34:37 +0000</pubDate>
		<dc:creator>Christian Etter</dc:creator>
				<category><![CDATA[Mobile Development]]></category>
		<category><![CDATA[Software Development]]></category>
		<category><![CDATA[MSDN]]></category>
		<category><![CDATA[Smart Device]]></category>
		<category><![CDATA[Visual Studio 2010]]></category>
		<category><![CDATA[Windows CE]]></category>
		<category><![CDATA[Windows Mobile]]></category>

		<guid isPermaLink="false">http://www.christian-etter.de/?p=511</guid>
		<description><![CDATA[VS 2010 has been out for some time now. Of course it brings some useful new features. Sure, it needs more resources and does not run as quickly as the 2008 version; that&#8217;s what everyone would expect. However, it holds quite a bad surprise for mobile developers&#8230; Lo and behold, support for smart device [...]]]></description>
			<content:encoded><![CDATA[<p>VS 2010 has been out for some time now. Of course it brings some useful new features. Sure, it needs more resources and does not run as quickly as the 2008 version; that&#8217;s what everyone would expect. However, it holds quite a bad surprise for mobile developers&#8230; </p>
<p>Lo and behold, support for smart device development is gone!<br />
MSDN is quite clear about it: &#8220;<a href="http://msdn.microsoft.com/en-us/library/sa69he4t.aspx">Visual Studio 2010 does not support mobile application development for versions of Windows Phone prior to Windows Phone OS 7.0.</a>&#8221;</p>
<p>That&#8217;s quite a bummer. Hard to believe&#8230; why would I have to license a legacy IDE to develop software for state-of-the-art mobile operating systems? Perhaps even license two IDEs?</p>
<p>With this in mind, it is no surprise that Android is gaining market share&#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.christian-etter.de/?feed=rss2&amp;p=511</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>OutputDebugString with Variable Arguments</title>
		<link>http://www.christian-etter.de/?p=506</link>
		<comments>http://www.christian-etter.de/?p=506#comments</comments>
		<pubDate>Tue, 11 May 2010 17:49:42 +0000</pubDate>
		<dc:creator>Christian Etter</dc:creator>
				<category><![CDATA[C++]]></category>
		<category><![CDATA[Software Development]]></category>
		<category><![CDATA[OutputDebugString]]></category>
		<category><![CDATA[Variable Arguments]]></category>
		<category><![CDATA[va_list]]></category>
		<category><![CDATA[va_start]]></category>
		<category><![CDATA[Win32]]></category>

		<guid isPermaLink="false">http://www.christian-etter.de/?p=506</guid>
		<description><![CDATA[I was just working with the new Visual Studio 2010, integrating Win32 console-based code into a windowed app. The old code used fprintf(&#8230;) for debug output. Since stdout/stderr are not available in a Windows GUI app, the fprintf-based debug output had to be changed to print to the Visual Studio output window. This can be accomplished [...]]]></description>
			<content:encoded><![CDATA[<p>I was just working with the new Visual Studio 2010, integrating Win32 console-based code into a windowed app. The old code used <em>fprintf(&#8230;)</em> for debug output.<br />
Since stdout/stderr are not available in a Windows GUI app, the fprintf-based debug output had to be changed to print to the Visual Studio output window.</p>
<p>This can be accomplished via the universal <em>OutputDebugString()</em> API. Its only shortcoming is that it does not accept variable arguments in <em>sprintf(&#8230;)</em> style. </p>
<p>Here is a workaround:</p>

<div class="wp_syntax"><div class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #339900;">#include &lt;windows.h&gt;</span>
<span style="color: #339900;">#include &lt;tchar.h&gt;</span>
<span style="color: #339900;">#include &lt;strsafe.h&gt;</span>
<span style="color: #666666;">// ...</span>
<span style="color: #0000ff;">void</span> MyOutputDebugString<span style="color: #008000;">&#40;</span> LPCTSTR sFormat, ... <span style="color: #008000;">&#41;</span>
<span style="color: #008000;">&#123;</span>
    TCHAR buffer<span style="color: #008000;">&#91;</span> <span style="color: #0000dd;">2000</span> <span style="color: #008000;">&#93;</span><span style="color: #008080;">;</span>
    <span style="color: #0000ff;">va_list</span> argptr<span style="color: #008080;">;</span>
    <span style="color: #0000dd;">va_start</span><span style="color: #008000;">&#40;</span> argptr, sFormat <span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
    <span style="color: #666666;">// the second argument is the destination size in bytes, hence sizeof</span>
    HRESULT hr <span style="color: #000080;">=</span> StringCbVPrintf<span style="color: #008000;">&#40;</span> buffer, <span style="color: #0000dd;">sizeof</span><span style="color: #008000;">&#40;</span> buffer <span style="color: #008000;">&#41;</span>, sFormat, argptr <span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
    <span style="color: #0000dd;">va_end</span><span style="color: #008000;">&#40;</span> argptr <span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
    <span style="color: #666666;">// a truncated result is still null-terminated, so it is safe to print</span>
    <span style="color: #0000ff;">if</span> <span style="color: #008000;">&#40;</span> STRSAFE_E_INSUFFICIENT_BUFFER <span style="color: #000080;">==</span> hr <span style="color: #000040;">||</span> S_OK <span style="color: #000080;">==</span> hr <span style="color: #008000;">&#41;</span>
        OutputDebugString<span style="color: #008000;">&#40;</span> buffer <span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
    <span style="color: #0000ff;">else</span>
        OutputDebugString<span style="color: #008000;">&#40;</span> _T<span style="color: #008000;">&#40;</span><span style="color: #FF0000;">&quot;StringCbVPrintf error.&quot;</span><span style="color: #008000;">&#41;</span> <span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
<span style="color: #008000;">&#125;</span></pre></div></div>

<p>I am using strsafe.h in this case to protect against buffer overruns. If the internal buffer is too small for the formatted string, <em>StringCbVPrintf</em> returns STRSAFE_E_INSUFFICIENT_BUFFER and the output is safely truncated, still ending with \0. If you cannot use strsafe.h, here is an old-style solution:</p>

<div class="wp_syntax"><div class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #339900;">#include &lt;windows.h&gt;</span>
<span style="color: #339900;">#include &lt;tchar.h&gt;</span>
<span style="color: #339900;">#include &lt;stdio.h&gt;</span>
<span style="color: #666666;">// ...</span>
<span style="color: #0000ff;">void</span> MyOutputDebugString<span style="color: #008000;">&#40;</span> LPCTSTR sFormat, ... <span style="color: #008000;">&#41;</span>
<span style="color: #008000;">&#123;</span>
    TCHAR buffer<span style="color: #008000;">&#91;</span> <span style="color: #0000dd;">2000</span> <span style="color: #008000;">&#93;</span><span style="color: #008080;">;</span>
    <span style="color: #0000ff;">const</span> <span style="color: #0000ff;">size_t</span> count <span style="color: #000080;">=</span> <span style="color: #0000dd;">sizeof</span><span style="color: #008000;">&#40;</span> buffer <span style="color: #008000;">&#41;</span> <span style="color: #000040;">/</span> <span style="color: #0000dd;">sizeof</span><span style="color: #008000;">&#40;</span> <span style="color: #000040;">*</span>buffer <span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span> <span style="color: #666666;">// size in characters, not bytes</span>
    <span style="color: #0000ff;">va_list</span> argptr<span style="color: #008080;">;</span>
    <span style="color: #0000dd;">va_start</span><span style="color: #008000;">&#40;</span> argptr, sFormat <span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
    <span style="color: #666666;">// _vsntprintf takes a character count and does not null-terminate a truncated result</span>
    _vsntprintf<span style="color: #008000;">&#40;</span> buffer, count <span style="color: #000040;">-</span> <span style="color: #0000dd;">1</span>, sFormat, argptr <span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
    <span style="color: #0000dd;">va_end</span><span style="color: #008000;">&#40;</span> argptr <span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
    buffer<span style="color: #008000;">&#91;</span> count <span style="color: #000040;">-</span> <span style="color: #0000dd;">1</span> <span style="color: #008000;">&#93;</span> <span style="color: #000080;">=</span> <span style="color: #FF0000;">'<span style="color: #006699; font-weight: bold;">\0</span>'</span><span style="color: #008080;">;</span>
    OutputDebugString<span style="color: #008000;">&#40;</span> buffer <span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
<span style="color: #008000;">&#125;</span></pre></div></div>

]]></content:encoded>
			<wfw:commentRss>http://www.christian-etter.de/?feed=rss2&amp;p=506</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>SQL Server 2005/2008 CLR support by example: crypt() UDF in C#</title>
		<link>http://www.christian-etter.de/?p=486</link>
		<comments>http://www.christian-etter.de/?p=486#comments</comments>
		<pubDate>Mon, 03 May 2010 21:51:57 +0000</pubDate>
		<dc:creator>Christian Etter</dc:creator>
				<category><![CDATA[C#]]></category>
		<category><![CDATA[SQL]]></category>
		<category><![CDATA[Software Development]]></category>
		<category><![CDATA[authentication]]></category>
		<category><![CDATA[C++]]></category>
		<category><![CDATA[CLR]]></category>
		<category><![CDATA[crypt]]></category>
		<category><![CDATA[hash]]></category>
		<category><![CDATA[Password]]></category>
		<category><![CDATA[PayPal]]></category>
		<category><![CDATA[SQL Server]]></category>
		<category><![CDATA[Stored Procedure]]></category>
		<category><![CDATA[UDF]]></category>

		<guid isPermaLink="false">http://www.christian-etter.de/?p=486</guid>
		<description><![CDATA[.NET programmability support has been one of the big novelties of SQL Server 2005. Sometimes I had the impression that CLR support was also one of the most misunderstood features of the new platform. Some people considered moving their existing pre-2005 T-SQL code to .NET in an attempt to gain more efficiency by using a [...]]]></description>
			<content:encoded><![CDATA[<p>.NET programmability support has been one of the big novelties of SQL Server 2005. Sometimes I had the impression that CLR support was also one of the most misunderstood features of the new platform. Some people considered moving their existing pre-2005 T-SQL code to .NET in an attempt to gain more efficiency by using a &#8216;compiled&#8217; programming language.</p>
<p>This was a common misconception, since T-SQL stored procedures and functions are also optimized and compiled into execution plans prior to execution. Performance gains from using .NET code instead of T-SQL are usually limited to highly iterative or procedural code; they do not apply to anything that can easily be done in native, set-based SQL.</p>
<p>Let us have a look at a simple user-defined function written in C#. This function takes a clear text string and a two-character salt parameter and calculates a <em>Unix crypt</em> hash based on both.</p>

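<div class="wp_syntax"><div class="code"><pre class="csharp" style="font-family:monospace;"><span style="color: #0600FF;">using</span> System.<span style="color: #0000FF;">Data</span>.<span style="color: #0000FF;">SqlTypes</span><span style="color: #008000;">;</span>
&nbsp;
<span style="color: #0600FF;">public</span> <span style="color: #0600FF;">partial</span> <span style="color: #FF0000;">class</span> UserDefinedFunctions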
<span style="color: #000000;">&#123;</span>
    <span style="color: #000000;">&#91;</span>Microsoft.<span style="color: #0000FF;">SqlServer</span>.<span style="color: #0000FF;">Server</span>.<span style="color: #0000FF;">SqlFunction</span><span style="color: #000000;">&#40;</span> IsDeterministic<span style="color: #008000;">=</span><span style="color: #0600FF;">true</span> <span style="color: #000000;">&#41;</span><span style="color: #000000;">&#93;</span>
    <span style="color: #0600FF;">public</span> <span style="color: #0600FF;">static</span> SqlString Crypt<span style="color: #000000;">&#40;</span> SqlString sPlainText, SqlChars cSalt <span style="color: #000000;">&#41;</span>
    <span style="color: #000000;">&#123;</span>
        <span style="color: #0600FF;">if</span> <span style="color: #000000;">&#40;</span> sPlainText.<span style="color: #0000FF;">IsNull</span> <span style="color: #008000;">||</span> cSalt.<span style="color: #0000FF;">IsNull</span> <span style="color: #008000;">||</span> cSalt.<span style="color: #0000FF;">Length</span> <span style="color: #008000;">&lt;</span> <span style="color: #FF0000;">2</span> <span style="color: #000000;">&#41;</span>
            <span style="color: #0600FF;">return</span> SqlString.<span style="color: #0000FF;">Null</span><span style="color: #008000;">;</span>
        <span style="color: #FF0000;">string</span> sSalt <span style="color: #008000;">=</span> <span style="color: #008000;">new</span> <span style="color: #FF0000;">string</span><span style="color: #000000;">&#40;</span> cSalt.<span style="color: #0000FF;">Value</span> <span style="color: #000000;">&#41;</span><span style="color: #008000;">;</span>
        sSalt <span style="color: #008000;">=</span> sSalt.<span style="color: #0000FF;">Substring</span><span style="color: #000000;">&#40;</span> <span style="color: #FF0000;">0</span>, <span style="color: #FF0000;">2</span> <span style="color: #000000;">&#41;</span><span style="color: #008000;">;</span>
        <span style="color: #0600FF;">return</span> <span style="color: #008000;">new</span> SqlString<span style="color: #000000;">&#40;</span> UnixCrypt.<span style="color: #0000FF;">Crypt</span><span style="color: #000000;">&#40;</span> sSalt, sPlainText.<span style="color: #0000FF;">Value</span> <span style="color: #000000;">&#41;</span> <span style="color: #000000;">&#41;</span><span style="color: #008000;">;</span>
    <span style="color: #000000;">&#125;</span>
<span style="color: #000000;">&#125;</span></pre></div></div>

<p>We just declare a class with a static function marked with the <em>SqlFunction</em> attribute that accepts and returns classes/structs of the <em>System.Data.SqlTypes</em> namespace. The <em>UnixCrypt.Crypt</em> method contains the actual hashing code, which I am omitting here. If you are interested in building this example, you can find a working implementation of the crypt class in C# at <a href="http://www.codeproject.com/KB/cs/unixcrypt.aspx">The Code Project</a>. </p>
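<p>The UDF only checks that at least two salt characters are present. Traditional DES-based crypt draws its two salt characters from the 64-character alphabet [a-zA-Z0-9./], so a stricter guard could also reject anything outside it. The following is a minimal C++ sketch of such a check; ExtractCryptSalt and IsCryptSaltChar are hypothetical helpers that are not part of the C# code above or of the Code Project implementation, and how UnixCrypt.Crypt itself treats other characters is not covered here.</p>

```cpp
#include <string>

// Hypothetical helper: true if c belongs to the 64-character alphabet
// that traditional DES-based crypt uses for its salt: [a-zA-Z0-9./]
bool IsCryptSaltChar( char c )
{
    return ( c >= 'a' && c <= 'z' ) || ( c >= 'A' && c <= 'Z' ) ||
           ( c >= '0' && c <= '9' ) || c == '.' || c == '/';
}

// Mirrors the UDF's guard (at least two characters, only the first two are used)
// and additionally rejects salt characters outside the crypt alphabet.
// Returns an empty string if no valid salt can be extracted.
std::string ExtractCryptSalt( const std::string& salt )
{
    if ( salt.size() < 2 || !IsCryptSaltChar( salt[ 0 ] ) || !IsCryptSaltChar( salt[ 1 ] ) )
        return std::string();
    return salt.substr( 0, 2 );
}
```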
<p>After compiling the above code into a DLL, you can load it into SQL Server and run it with the following commands:</p>

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;"><span style="color: #808080; font-style: italic;">-- CLR support is disabled by default</span>
EXEC sp_configure <span style="color: #ff0000;">'clr enabled'</span> <span style="color: #66cc66;">,</span> <span style="color: #ff0000;">'1'</span>
GO
reconfigure;
GO
<span style="color: #808080; font-style: italic;">-- load dll and create assembly</span>
<span style="color: #993333; font-weight: bold;">CREATE</span> ASSEMBLY asmYourAssembly <span style="color: #993333; font-weight: bold;">FROM</span> <span style="color: #ff0000;">'D:\Your\Path\to\CryptUDF.dll'</span> <span style="color: #993333; font-weight: bold;">WITH</span> PERMISSION_SET <span style="color: #66cc66;">=</span> SAFE
GO
<span style="color: #808080; font-style: italic;">-- add a function that references the CLR UDF</span>
<span style="color: #993333; font-weight: bold;">CREATE</span> <span style="color: #993333; font-weight: bold;">FUNCTION</span> dbo<span style="color: #66cc66;">.</span>Crypt<span style="color: #66cc66;">&#40;</span> @plain <span style="color: #993333; font-weight: bold;">AS</span> NVARCHAR<span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">200</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">,</span> @salt <span style="color: #993333; font-weight: bold;">AS</span> NCHAR<span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">2</span><span style="color: #66cc66;">&#41;</span> <span style="color: #66cc66;">&#41;</span>
    RETURNS NVARCHAR<span style="color: #66cc66;">&#40;</span><span style="color: #cc66cc;">200</span><span style="color: #66cc66;">&#41;</span> <span style="color: #993333; font-weight: bold;">AS</span> EXTERNAL NAME asmYourAssembly<span style="color: #66cc66;">.</span>UserDefinedFunctions<span style="color: #66cc66;">.</span>Crypt
GO
<span style="color: #993333; font-weight: bold;">SELECT</span> dbo<span style="color: #66cc66;">.</span>Crypt<span style="color: #66cc66;">&#40;</span> <span style="color: #ff0000;">'password'</span><span style="color: #66cc66;">,</span> <span style="color: #ff0000;">'xy'</span> <span style="color: #66cc66;">&#41;</span>;</pre></div></div>

<p>Note <em>PERMISSION_SET = SAFE</em>: it is the most restrictive permission set and does not allow access to external resources such as files, the network, or the registry. Since all our calculations happen within the library itself, we need nothing beyond that, so the assembly can run under the safest permission set. You can learn more about the CLR integration permission sets on <a href="http://msdn.microsoft.com/en-us/library/ms189524.aspx">MSDN</a>.</p>
<p>When doing authentication, you could use the UDF as follows:</p>

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;"><span style="color: #993333; font-weight: bold;">SELECT</span> @UserID<span style="color: #66cc66;">=</span><span style="color: #66cc66;">&#91;</span>UserID<span style="color: #66cc66;">&#93;</span> <span style="color: #993333; font-weight: bold;">FROM</span> dbo<span style="color: #66cc66;">.</span>tblAccounts 
    <span style="color: #993333; font-weight: bold;">WHERE</span> <span style="color: #66cc66;">&#91;</span>UserName<span style="color: #66cc66;">&#93;</span><span style="color: #66cc66;">=</span>@Username COLLATE Latin1_General_CI_AS
    <span style="color: #993333; font-weight: bold;">AND</span> dbo<span style="color: #66cc66;">.</span>Crypt<span style="color: #66cc66;">&#40;</span> @Password<span style="color: #66cc66;">,</span> <span style="color: #66cc66;">&#91;</span>Password<span style="color: #66cc66;">&#93;</span> <span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">=</span><span style="color: #66cc66;">&#91;</span>Password<span style="color: #66cc66;">&#93;</span> COLLATE Latin1_General_CS_AS</pre></div></div>

<p>Note that we are doing a case-insensitive compare on the UserName column and a case-sensitive compare between the freshly computed crypt hash and the stored hash in the Password column.</p>
<p>Why does it make sense to do authentication like this inside the database instead of calculating the password hash outside of the database and then just pass it to a <em>WHERE Username=@Username AND Password=@Password</em> query?</p>
<p>The reason lies in the nature of the crypt function: every hash it creates begins with the two-character salt it was generated with, and the candidate password has to be hashed with that same salt before it can be compared with the original password hash. This is why the query can pass the stored [Password] value as the salt argument: only its first two characters are used. Since the whole check can be encapsulated within a stored procedure, no hashes or salts need to be returned to the calling application to verify the password.</p>
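<p>The comparison the query performs can be isolated in a small C++ sketch. CryptFunction is a hypothetical stand-in for UnixCrypt.Crypt( salt, plainText ); the sketch reuses the first two characters of the stored hash as the salt, as the UDF effectively does when it is handed the [Password] column, and compares the result exactly, like the case-sensitive collation in the query.</p>

```cpp
#include <functional>
#include <string>

// Hypothetical signature standing in for UnixCrypt.Crypt( salt, plainText ).
using CryptFunction =
    std::function<std::string( const std::string& salt, const std::string& plainText )>;

// The stored hash starts with the salt it was generated with, so its first two
// characters are reused to hash the candidate. The comparison is exact
// (case-sensitive), matching COLLATE Latin1_General_CS_AS in the query.
bool VerifyCryptPassword( const std::string& candidate, const std::string& storedHash,
                          const CryptFunction& cryptFn )
{
    if ( storedHash.size() < 2 )
        return false;  // no salt available, cannot match
    return cryptFn( storedHash.substr( 0, 2 ), candidate ) == storedHash;
}
```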
<p>Where is this used today? While Unix and Linux user authentication is the classic use of the crypt function, some internet services also rely on it. For example, the <strong>PayPal IPN</strong> feature for selling subscription accounts transmits passwords as crypt hashes only.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.christian-etter.de/?feed=rss2&amp;p=486</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Optimized Reading of Meta-Data using ExifTool (Unicode-Proof!)</title>
		<link>http://www.christian-etter.de/?p=476</link>
		<comments>http://www.christian-etter.de/?p=476#comments</comments>
		<pubDate>Thu, 22 Apr 2010 09:41:52 +0000</pubDate>
		<dc:creator>Christian Etter</dc:creator>
				<category><![CDATA[C#]]></category>
		<category><![CDATA[Software Development]]></category>
		<category><![CDATA[C++]]></category>
		<category><![CDATA[ExifTool]]></category>
		<category><![CDATA[Internationalization]]></category>
		<category><![CDATA[Json]]></category>
		<category><![CDATA[Linq]]></category>
		<category><![CDATA[Unicode]]></category>
		<category><![CDATA[UTF-8]]></category>

		<guid isPermaLink="false">http://www.christian-etter.de/?p=476</guid>
		<description><![CDATA[Today we are going to look at how to work around the lack of Unicode support in ExifTool. In my last post, I already discussed a safe way of handling Unicode file/path names, which unfortunately was rather slow. In this post I would like to elaborate on how to combine this with [...]]]></description>
			<content:encoded><![CDATA[<p>Today we are going to look at how to work around the lack of Unicode support in ExifTool. </p>
<p>In my last post, I already discussed a safe way of handling Unicode file and path names, which unfortunately was rather slow. In this post, I would like to show how to combine it with a fast reading approach using .NET. </p>
<p>I have chosen to give examples in C# in this series, since it lets me demonstrate my ideas very compactly. However, the general approach works with many programming languages and is not a .NET-only solution.</p>
<p>Basically, we combine a batch read using ExifTool with single-file reads for incompatible file names. In the best case, i.e. when all file names are convertible, this method is as fast as ExifTool itself. The worst case is reading all files one by one, which carries a much bigger performance penalty.</p>
<p>Prior to processing any files, we have to divide all file names into compatible and incompatible ones. After splitting them up, we start the actual reading.</p>

<div class="wp_syntax"><div class="code"><pre class="csharp" style="font-family:monospace;"><span style="color: #0600FF;">public</span> ExifFileJson<span style="color: #000000;">&#91;</span><span style="color: #000000;">&#93;</span> GetOriginalDateExifToolUnicode<span style="color: #000000;">&#40;</span> <span style="color: #FF0000;">string</span><span style="color: #000000;">&#91;</span><span style="color: #000000;">&#93;</span> files <span style="color: #000000;">&#41;</span>
<span style="color: #000000;">&#123;</span>
    <span style="color: #008080; font-style: italic;">// first, single out all files with incompatible file names, since they cannot be handled in a batch</span>
    var tmp <span style="color: #008000;">=</span> <span style="color: #000000;">&#40;</span> from x <span style="color: #0600FF;">in</span> files select <span style="color: #008000;">new</span> <span style="color: #000000;">&#123;</span> OriginalName <span style="color: #008000;">=</span> x, ConvertedName <span style="color: #008000;">=</span> Encoding.<span style="color: #0000FF;">ASCII</span>.<span style="color: #0000FF;">GetString</span><span style="color: #000000;">&#40;</span> Encoding.<span style="color: #0000FF;">UTF8</span>.<span style="color: #0000FF;">GetBytes</span><span style="color: #000000;">&#40;</span> x <span style="color: #000000;">&#41;</span> <span style="color: #000000;">&#41;</span> <span style="color: #000000;">&#125;</span> <span style="color: #000000;">&#41;</span>.<span style="color: #0000FF;">ToArray</span><span style="color: #000000;">&#40;</span><span style="color: #000000;">&#41;</span><span style="color: #008000;">;</span>
    <span style="color: #FF0000;">string</span><span style="color: #000000;">&#91;</span><span style="color: #000000;">&#93;</span> batch <span style="color: #008000;">=</span> tmp.<span style="color: #0000FF;">Where</span><span style="color: #000000;">&#40;</span> x <span style="color: #008000;">=&gt;</span> x.<span style="color: #0000FF;">OriginalName</span>.<span style="color: #0000FF;">Equals</span><span style="color: #000000;">&#40;</span> x.<span style="color: #0000FF;">ConvertedName</span> <span style="color: #000000;">&#41;</span> <span style="color: #000000;">&#41;</span>.<span style="color: #0000FF;">Select</span><span style="color: #000000;">&#40;</span> x <span style="color: #008000;">=&gt;</span> x.<span style="color: #0000FF;">OriginalName</span> <span style="color: #000000;">&#41;</span>.<span style="color: #0000FF;">ToArray</span><span style="color: #000000;">&#40;</span><span style="color: #000000;">&#41;</span><span style="color: #008000;">;</span>
    <span style="color: #FF0000;">string</span><span style="color: #000000;">&#91;</span><span style="color: #000000;">&#93;</span> nobatch <span style="color: #008000;">=</span> tmp.<span style="color: #0000FF;">Where</span><span style="color: #000000;">&#40;</span> x <span style="color: #008000;">=&gt;</span> <span style="color: #008000;">!</span>x.<span style="color: #0000FF;">OriginalName</span>.<span style="color: #0000FF;">Equals</span><span style="color: #000000;">&#40;</span> x.<span style="color: #0000FF;">ConvertedName</span> <span style="color: #000000;">&#41;</span> <span style="color: #000000;">&#41;</span>.<span style="color: #0000FF;">Select</span><span style="color: #000000;">&#40;</span> x <span style="color: #008000;">=&gt;</span> x.<span style="color: #0000FF;">OriginalName</span> <span style="color: #000000;">&#41;</span>.<span style="color: #0000FF;">ToArray</span><span style="color: #000000;">&#40;</span><span style="color: #000000;">&#41;</span><span style="color: #008000;">;</span>
&nbsp;
    List<span style="color: #008000;">&lt;</span>ExifFileJson<span style="color: #008000;">&gt;</span> exiffiles <span style="color: #008000;">=</span> <span style="color: #008000;">new</span> List<span style="color: #008000;">&lt;</span>ExifFileJson<span style="color: #008000;">&gt;</span><span style="color: #000000;">&#40;</span><span style="color: #000000;">&#41;</span><span style="color: #008000;">;</span>
    exiffiles.<span style="color: #0000FF;">AddRange</span><span style="color: #000000;">&#40;</span> GetOriginalDateExifToolBatch<span style="color: #000000;">&#40;</span> batch <span style="color: #000000;">&#41;</span> <span style="color: #000000;">&#41;</span><span style="color: #008000;">;</span>
    <span style="color: #0600FF;">foreach</span> <span style="color: #000000;">&#40;</span> <span style="color: #FF0000;">string</span> s <span style="color: #0600FF;">in</span> nobatch <span style="color: #000000;">&#41;</span>
        exiffiles.<span style="color: #0000FF;">Add</span><span style="color: #000000;">&#40;</span> GetExifImageExifTool<span style="color: #000000;">&#40;</span> s <span style="color: #000000;">&#41;</span> <span style="color: #000000;">&#41;</span><span style="color: #008000;">;</span>
    <span style="color: #0600FF;">if</span> <span style="color: #000000;">&#40;</span> files.<span style="color: #0000FF;">Length</span> <span style="color: #008000;">!=</span> exiffiles.<span style="color: #0000FF;">Count</span><span style="color: #000000;">&#40;</span><span style="color: #000000;">&#41;</span> <span style="color: #000000;">&#41;</span>
        <span style="color: #0600FF;">throw</span> <span style="color: #008000;">new</span> Exception<span style="color: #000000;">&#40;</span> <span style="color: #666666;">&quot;Could not open all files. Missing: &quot;</span> <span style="color: #008000;">+</span> <span style="color: #FF0000;">String</span>.<span style="color: #0000FF;">Join</span><span style="color: #000000;">&#40;</span> <span style="color: #666666;">&quot;, &quot;</span>, files.<span style="color: #0000FF;">Except</span><span style="color: #000000;">&#40;</span> exiffiles.<span style="color: #0000FF;">Select</span><span style="color: #000000;">&#40;</span> x <span style="color: #008000;">=&gt;</span> x.<span style="color: #0000FF;">SourceFile</span> <span style="color: #000000;">&#41;</span> <span style="color: #000000;">&#41;</span>.<span style="color: #0000FF;">ToArray</span><span style="color: #000000;">&#40;</span><span style="color: #000000;">&#41;</span> <span style="color: #000000;">&#41;</span> <span style="color: #000000;">&#41;</span><span style="color: #008000;">;</span>
    <span style="color: #0600FF;">return</span> exiffiles.<span style="color: #0000FF;">ToArray</span><span style="color: #000000;">&#40;</span><span style="color: #000000;">&#41;</span><span style="color: #008000;">;</span>
<span style="color: #000000;">&#125;</span></pre></div></div>

<p>The next method runs ExifTool once for the whole batch and parses its output in JSON format.</p>

<div class="wp_syntax"><div class="code"><pre class="csharp" style="font-family:monospace;"><span style="color: #0600FF;">private</span> <span style="color: #0600FF;">static</span> ExifFileJson<span style="color: #000000;">&#91;</span><span style="color: #000000;">&#93;</span> GetOriginalDateExifToolBatch<span style="color: #000000;">&#40;</span> <span style="color: #FF0000;">string</span><span style="color: #000000;">&#91;</span><span style="color: #000000;">&#93;</span> files <span style="color: #000000;">&#41;</span>
<span style="color: #000000;">&#123;</span>
    Process oP <span style="color: #008000;">=</span> <span style="color: #008000;">new</span> Process<span style="color: #000000;">&#40;</span><span style="color: #000000;">&#41;</span><span style="color: #008000;">;</span>
    oP.<span style="color: #0000FF;">EnableRaisingEvents</span> <span style="color: #008000;">=</span> false<span style="color: #008000;">;</span>
    oP.<span style="color: #0000FF;">StartInfo</span>.<span style="color: #0000FF;">CreateNoWindow</span> <span style="color: #008000;">=</span> true<span style="color: #008000;">;</span>
    oP.<span style="color: #0000FF;">StartInfo</span>.<span style="color: #0000FF;">LoadUserProfile</span> <span style="color: #008000;">=</span> false<span style="color: #008000;">;</span>
    oP.<span style="color: #0000FF;">StartInfo</span>.<span style="color: #0000FF;">RedirectStandardError</span> <span style="color: #008000;">=</span> false<span style="color: #008000;">;</span>
    oP.<span style="color: #0000FF;">StartInfo</span>.<span style="color: #0000FF;">RedirectStandardOutput</span> <span style="color: #008000;">=</span> true<span style="color: #008000;">;</span>
    oP.<span style="color: #0000FF;">StartInfo</span>.<span style="color: #0000FF;">RedirectStandardInput</span> <span style="color: #008000;">=</span> true<span style="color: #008000;">;</span>
    oP.<span style="color: #0000FF;">StartInfo</span>.<span style="color: #0000FF;">StandardErrorEncoding</span> <span style="color: #008000;">=</span> null<span style="color: #008000;">;</span>
    oP.<span style="color: #0000FF;">StartInfo</span>.<span style="color: #0000FF;">StandardOutputEncoding</span> <span style="color: #008000;">=</span> Encoding.<span style="color: #0000FF;">UTF8</span><span style="color: #008000;">;</span>
    oP.<span style="color: #0000FF;">StartInfo</span>.<span style="color: #0000FF;">UseShellExecute</span> <span style="color: #008000;">=</span> false<span style="color: #008000;">;</span>
    oP.<span style="color: #0000FF;">StartInfo</span>.<span style="color: #0000FF;">WindowStyle</span> <span style="color: #008000;">=</span> ProcessWindowStyle.<span style="color: #0000FF;">Hidden</span><span style="color: #008000;">;</span>
    oP.<span style="color: #0000FF;">StartInfo</span>.<span style="color: #0000FF;">FileName</span> <span style="color: #008000;">=</span> <span style="color: #666666;">@&quot;exiftool.exe&quot;</span><span style="color: #008000;">;</span>
    oP.<span style="color: #0000FF;">StartInfo</span>.<span style="color: #0000FF;">Arguments</span> <span style="color: #008000;">=</span> <span style="color: #666666;">&quot;-EXIF:ModifyDate -EXIF:DateTimeOriginal -EXIF:CreateDate -j -d <span style="color: #008080; font-weight: bold;">\&quot;</span>%Y-%m-%d %H:%M:%S<span style="color: #008080; font-weight: bold;">\&quot;</span> -@ -&quot;</span><span style="color: #008000;">;</span>
    oP.<span style="color: #0000FF;">Start</span><span style="color: #000000;">&#40;</span><span style="color: #000000;">&#41;</span><span style="color: #008000;">;</span>
&nbsp;
    <span style="color: #008080; font-style: italic;">// Pass all file names in an argfile which is piped to the process (no temporary file)</span>
    <span style="color: #FF0000;">byte</span><span style="color: #000000;">&#91;</span><span style="color: #000000;">&#93;</span> data <span style="color: #008000;">=</span> Encoding.<span style="color: #0000FF;">UTF8</span>.<span style="color: #0000FF;">GetBytes</span><span style="color: #000000;">&#40;</span> <span style="color: #FF0000;">String</span>.<span style="color: #0000FF;">Join</span><span style="color: #000000;">&#40;</span> <span style="color: #666666;">&quot;<span style="color: #008080; font-weight: bold;">\r</span><span style="color: #008080; font-weight: bold;">\n</span>&quot;</span>, files <span style="color: #000000;">&#41;</span> <span style="color: #000000;">&#41;</span><span style="color: #008000;">;</span>
    oP.<span style="color: #0000FF;">StandardInput</span>.<span style="color: #0000FF;">BaseStream</span>.<span style="color: #0000FF;">Write</span><span style="color: #000000;">&#40;</span> data, <span style="color: #FF0000;">0</span>, data.<span style="color: #0000FF;">Length</span> <span style="color: #000000;">&#41;</span><span style="color: #008000;">;</span>
    oP.<span style="color: #0000FF;">StandardInput</span>.<span style="color: #0000FF;">BaseStream</span>.<span style="color: #0000FF;">Close</span><span style="color: #000000;">&#40;</span><span style="color: #000000;">&#41;</span><span style="color: #008000;">;</span>
&nbsp;
    DataContractJsonSerializer deserializer <span style="color: #008000;">=</span> <span style="color: #008000;">new</span> DataContractJsonSerializer<span style="color: #000000;">&#40;</span> <span style="color: #008000;">typeof</span><span style="color: #000000;">&#40;</span> ExifFileJson<span style="color: #000000;">&#91;</span><span style="color: #000000;">&#93;</span> <span style="color: #000000;">&#41;</span> <span style="color: #000000;">&#41;</span><span style="color: #008000;">;</span>
    ExifFileJson<span style="color: #000000;">&#91;</span><span style="color: #000000;">&#93;</span> exif <span style="color: #008000;">=</span> deserializer.<span style="color: #0000FF;">ReadObject</span><span style="color: #000000;">&#40;</span> oP.<span style="color: #0000FF;">StandardOutput</span>.<span style="color: #0000FF;">BaseStream</span> <span style="color: #000000;">&#41;</span> <span style="color: #0600FF;">as</span> ExifFileJson<span style="color: #000000;">&#91;</span><span style="color: #000000;">&#93;</span><span style="color: #008000;">;</span>
&nbsp;
    oP.<span style="color: #0000FF;">WaitForExit</span><span style="color: #000000;">&#40;</span><span style="color: #000000;">&#41;</span><span style="color: #008000;">;</span>
    <span style="color: #0600FF;">return</span> exif<span style="color: #008000;">;</span>
<span style="color: #000000;">&#125;</span></pre></div></div>

<p>The following Unicode-safe approach does not rely on the Perl file API; instead, it pipes the image to ExifTool&#8217;s stdin. To avoid out-of-memory conditions with large files, it might be advisable to read the image file in small chunks using a stream rather than loading it all at once. Do not forget to set the file name in the ExifFileJson object before returning it, since ExifTool does not know the original file name.</p>

<div class="wp_syntax"><div class="code"><pre class="csharp" style="font-family:monospace;"><span style="color: #0600FF;">private</span> <span style="color: #0600FF;">static</span> ExifFileJson GetExifImageExifTool<span style="color: #000000;">&#40;</span> <span style="color: #FF0000;">string</span> sFile <span style="color: #000000;">&#41;</span>
<span style="color: #000000;">&#123;</span>
    Process oP <span style="color: #008000;">=</span> <span style="color: #008000;">new</span> Process<span style="color: #000000;">&#40;</span><span style="color: #000000;">&#41;</span><span style="color: #008000;">;</span>
    oP.<span style="color: #0000FF;">EnableRaisingEvents</span> <span style="color: #008000;">=</span> false<span style="color: #008000;">;</span>
    oP.<span style="color: #0000FF;">StartInfo</span>.<span style="color: #0000FF;">CreateNoWindow</span> <span style="color: #008000;">=</span> true<span style="color: #008000;">;</span>
    oP.<span style="color: #0000FF;">StartInfo</span>.<span style="color: #0000FF;">LoadUserProfile</span> <span style="color: #008000;">=</span> false<span style="color: #008000;">;</span>
    oP.<span style="color: #0000FF;">StartInfo</span>.<span style="color: #0000FF;">RedirectStandardError</span> <span style="color: #008000;">=</span> false<span style="color: #008000;">;</span>
    oP.<span style="color: #0000FF;">StartInfo</span>.<span style="color: #0000FF;">RedirectStandardOutput</span> <span style="color: #008000;">=</span> true<span style="color: #008000;">;</span>
    oP.<span style="color: #0000FF;">StartInfo</span>.<span style="color: #0000FF;">RedirectStandardInput</span> <span style="color: #008000;">=</span> true<span style="color: #008000;">;</span>
    oP.<span style="color: #0000FF;">StartInfo</span>.<span style="color: #0000FF;">StandardErrorEncoding</span> <span style="color: #008000;">=</span> null<span style="color: #008000;">;</span>
    oP.<span style="color: #0000FF;">StartInfo</span>.<span style="color: #0000FF;">StandardOutputEncoding</span> <span style="color: #008000;">=</span> Encoding.<span style="color: #0000FF;">UTF8</span><span style="color: #008000;">;</span>
    oP.<span style="color: #0000FF;">StartInfo</span>.<span style="color: #0000FF;">UseShellExecute</span> <span style="color: #008000;">=</span> false<span style="color: #008000;">;</span>
    oP.<span style="color: #0000FF;">StartInfo</span>.<span style="color: #0000FF;">WindowStyle</span> <span style="color: #008000;">=</span> ProcessWindowStyle.<span style="color: #0000FF;">Hidden</span><span style="color: #008000;">;</span>
    oP.<span style="color: #0000FF;">StartInfo</span>.<span style="color: #0000FF;">FileName</span> <span style="color: #008000;">=</span> <span style="color: #666666;">@&quot;exiftool.exe&quot;</span><span style="color: #008000;">;</span>
    oP.<span style="color: #0000FF;">StartInfo</span>.<span style="color: #0000FF;">Arguments</span> <span style="color: #008000;">=</span> <span style="color: #666666;">&quot;-j -EXIF:ModifyDate -EXIF:DateTimeOriginal -EXIF:CreateDate -d <span style="color: #008080; font-weight: bold;">\&quot;</span>%Y-%m-%d %H:%M:%S<span style="color: #008080; font-weight: bold;">\&quot;</span> -&quot;</span><span style="color: #008000;">;</span>
    oP.<span style="color: #0000FF;">Start</span><span style="color: #000000;">&#40;</span><span style="color: #000000;">&#41;</span><span style="color: #008000;">;</span>
&nbsp;
    <span style="color: #FF0000;">byte</span><span style="color: #000000;">&#91;</span><span style="color: #000000;">&#93;</span> image <span style="color: #008000;">=</span> File.<span style="color: #0000FF;">ReadAllBytes</span><span style="color: #000000;">&#40;</span> sFile <span style="color: #000000;">&#41;</span><span style="color: #008000;">;</span>
    oP.<span style="color: #0000FF;">StandardInput</span>.<span style="color: #0000FF;">BaseStream</span>.<span style="color: #0000FF;">Write</span><span style="color: #000000;">&#40;</span> image, <span style="color: #FF0000;">0</span>, image.<span style="color: #0000FF;">Length</span> <span style="color: #000000;">&#41;</span><span style="color: #008000;">;</span>
    oP.<span style="color: #0000FF;">StandardInput</span>.<span style="color: #0000FF;">BaseStream</span>.<span style="color: #0000FF;">Close</span><span style="color: #000000;">&#40;</span><span style="color: #000000;">&#41;</span><span style="color: #008000;">;</span>
&nbsp;
    DataContractJsonSerializer deserializer <span style="color: #008000;">=</span> <span style="color: #008000;">new</span> DataContractJsonSerializer<span style="color: #000000;">&#40;</span> <span style="color: #008000;">typeof</span><span style="color: #000000;">&#40;</span> ExifFileJson<span style="color: #000000;">&#91;</span><span style="color: #000000;">&#93;</span> <span style="color: #000000;">&#41;</span> <span style="color: #000000;">&#41;</span><span style="color: #008000;">;</span>
    ExifFileJson<span style="color: #000000;">&#91;</span><span style="color: #000000;">&#93;</span> exif <span style="color: #008000;">=</span> deserializer.<span style="color: #0000FF;">ReadObject</span><span style="color: #000000;">&#40;</span> oP.<span style="color: #0000FF;">StandardOutput</span>.<span style="color: #0000FF;">BaseStream</span> <span style="color: #000000;">&#41;</span> <span style="color: #0600FF;">as</span> ExifFileJson<span style="color: #000000;">&#91;</span><span style="color: #000000;">&#93;</span><span style="color: #008000;">;</span>
&nbsp;
    oP.<span style="color: #0000FF;">WaitForExit</span><span style="color: #000000;">&#40;</span><span style="color: #000000;">&#41;</span><span style="color: #008000;">;</span>
    <span style="color: #0600FF;">if</span> <span style="color: #000000;">&#40;</span> exif.<span style="color: #0000FF;">Length</span> <span style="color: #008000;">&gt;</span> <span style="color: #FF0000;">0</span> <span style="color: #000000;">&#41;</span>
        exif<span style="color: #000000;">&#91;</span> <span style="color: #FF0000;">0</span> <span style="color: #000000;">&#93;</span>.<span style="color: #0000FF;">SourceFile</span> <span style="color: #008000;">=</span> sFile<span style="color: #008000;">;</span>
    <span style="color: #0600FF;">return</span> exif.<span style="color: #0000FF;">FirstOrDefault</span><span style="color: #000000;">&#40;</span><span style="color: #000000;">&#41;</span><span style="color: #008000;">;</span>
<span style="color: #000000;">&#125;</span></pre></div></div>

<p>In case you are wondering about the Json class we use for deserializing the output:</p>

<div class="wp_syntax"><div class="code"><pre class="csharp" style="font-family:monospace;"><span style="color: #000000;">&#91;</span>DataContract<span style="color: #000000;">&#93;</span>
<span style="color: #0600FF;">public</span> <span style="color: #FF0000;">class</span> ExifFileJson
<span style="color: #000000;">&#123;</span>
    <span style="color: #000000;">&#91;</span>DataMember<span style="color: #000000;">&#40;</span> IsRequired <span style="color: #008000;">=</span> <span style="color: #0600FF;">true</span>, Name <span style="color: #008000;">=</span> <span style="color: #666666;">&quot;SourceFile&quot;</span> <span style="color: #000000;">&#41;</span><span style="color: #000000;">&#93;</span>
    <span style="color: #0600FF;">public</span> <span style="color: #FF0000;">string</span> SourceFile<span style="color: #008000;">;</span>
    <span style="color: #000000;">&#91;</span>OnDeserializedAttribute<span style="color: #000000;">&#40;</span><span style="color: #000000;">&#41;</span><span style="color: #000000;">&#93;</span>
    <span style="color: #0600FF;">internal</span> <span style="color: #0600FF;">void</span> ReplaceBackSlashes<span style="color: #000000;">&#40;</span> StreamingContext context <span style="color: #000000;">&#41;</span> <span style="color: #000000;">&#123;</span> <span style="color: #0600FF;">this</span>.<span style="color: #0000FF;">SourceFile</span> <span style="color: #008000;">=</span> <span style="color: #0600FF;">this</span>.<span style="color: #0000FF;">SourceFile</span>.<span style="color: #0000FF;">Replace</span><span style="color: #000000;">&#40;</span> <span style="color: #666666;">'/'</span>, <span style="color: #666666;">'<span style="color: #008080; font-weight: bold;">\\</span>'</span> <span style="color: #000000;">&#41;</span><span style="color: #008000;">;</span> <span style="color: #000000;">&#125;</span>
&nbsp;
    <span style="color: #000000;">&#91;</span>DataMember<span style="color: #000000;">&#40;</span> IsRequired <span style="color: #008000;">=</span> <span style="color: #0600FF;">false</span> <span style="color: #000000;">&#41;</span><span style="color: #000000;">&#93;</span>
    <span style="color: #0600FF;">public</span> <span style="color: #FF0000;">string</span> DateTimeOriginal<span style="color: #008000;">;</span>
    <span style="color: #000000;">&#91;</span>DataMember<span style="color: #000000;">&#40;</span> IsRequired <span style="color: #008000;">=</span> <span style="color: #0600FF;">false</span> <span style="color: #000000;">&#41;</span><span style="color: #000000;">&#93;</span>
    <span style="color: #0600FF;">public</span> <span style="color: #FF0000;">string</span> CreateDate<span style="color: #008000;">;</span>
    <span style="color: #000000;">&#91;</span>DataMember<span style="color: #000000;">&#40;</span> IsRequired <span style="color: #008000;">=</span> <span style="color: #0600FF;">false</span> <span style="color: #000000;">&#41;</span><span style="color: #000000;">&#93;</span>
    <span style="color: #0600FF;">public</span> <span style="color: #FF0000;">string</span> ModifyDate<span style="color: #008000;">;</span>
<span style="color: #000000;">&#125;</span></pre></div></div>

<p>Basically, we declare required and optional attributes, plus a name mapping where necessary. Remember to replace forward slashes with backslashes in the file names, since ExifTool returns them in Unix style. It is probably not a good idea to deserialize the dates directly into DateTime? nullables, since some images may contain unparsable dates, which will result in a parsing exception. If you would still like to do it, remember to decorate the dates in Json format in ExifTool: <em>-d &#8220;/Date(%Y-%m-%d %H:%M:%S)/&#8221;</em>. Note, however, that DataContractJsonSerializer expects the number of milliseconds since 1970-01-01 UTC inside /Date(&#8230;)/ (with the slashes escaped as \/ in the Json text), so a formatted date string may still fail to deserialize.</p>
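<p>If you prefer to keep the dates as strings in <em>ExifFileJson</em> and convert them afterwards, a tolerant conversion could look like the following sketch. It assumes the dates were formatted with <em>-d &#8220;%Y-%m-%d %H:%M:%S&#8221;</em> as in the calls above; the class name <em>ExifDateParser</em> is hypothetical. Unparsable values become null instead of causing an exception.</p>

```csharp
using System;
using System.Globalization;

public static class ExifDateParser
{
    // Parses a date string produced by ExifTool with -d "%Y-%m-%d %H:%M:%S".
    // Returns null instead of throwing when the value is missing or malformed.
    public static DateTime? Parse( string value )
    {
        if ( string.IsNullOrEmpty( value ) )
            return null;

        DateTime d;
        if ( DateTime.TryParseExact( value.Trim(), "yyyy-MM-dd HH:mm:ss",
                CultureInfo.InvariantCulture, DateTimeStyles.AssumeLocal, out d ) )
            return d;

        return null;
    }
}
```

<p>For example, <em>DateTime? taken = ExifDateParser.Parse( exif.DateTimeOriginal );</em> yields null for a missing or malformed tag, so a single broken image does not abort processing of the whole batch.</p>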
<p><span class="header">Other Options</span></p>
<p>We could certainly achieve better performance with less coding overhead if we had a way of batch processing files regardless of their names and paths. If you have a Perl environment on your machine with the <a href="http://kobesearch.cpan.org/htdocs/Win32-API/Win32/API.html">Win32::API</a> module, you could rewrite the above code as a Perl script and get much better performance, even when reading files with Unicode names.</p>
<p>There is also another option: it is possible to add Unicode file name support to the Perl interpreter for Windows. I recently did a proof-of-concept which shows that ExifTool (or any UTF-8 supporting Perl app) could use Unicode file names on Windows without changing a line of code, as long as it is executed with a Unicode-supporting interpreter. However, the Perl source code is pretty big, and I am afraid I won&#8217;t be able to invest enough time to do a bullet-proof implementation.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.christian-etter.de/?feed=rss2&amp;p=476</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>ExifTool Performance Benchmark</title>
		<link>http://www.christian-etter.de/?p=458</link>
		<comments>http://www.christian-etter.de/?p=458#comments</comments>
		<pubDate>Thu, 15 Apr 2010 19:45:14 +0000</pubDate>
		<dc:creator>Christian Etter</dc:creator>
				<category><![CDATA[C#]]></category>
		<category><![CDATA[Software Development]]></category>
		<category><![CDATA[.NET]]></category>
		<category><![CDATA[C++]]></category>
		<category><![CDATA[ExifTool]]></category>
		<category><![CDATA[Performance]]></category>
		<category><![CDATA[Unicode]]></category>

		<guid isPermaLink="false">http://www.christian-etter.de/?p=458</guid>
		<description><![CDATA[I had some time to look into the performance of reading image meta data using ExifTool and the GDI+ API. If you are planning to use Exiftool in your own software project, you might find the following information useful: Test Environment Tests have been conducted on a Core i7 720QM system with 4 GB RAM [...]]]></description>
			<content:encoded><![CDATA[<p>I had some time to look into the performance of reading image meta data using ExifTool and the GDI+ API. If you are planning to use Exiftool in your own software project, you might find the following information useful:</p>
<p><span class="header">Test Environment</span><br />
Tests were conducted on a Core i7 720QM system with 4 GB RAM and a 7200 rpm hard disk, running Windows 7 64-bit. The test application was written in C# using .NET Framework 3.5 and Linq.</p>
<p><span class="header">Preparation</span></p>
<p>We benchmark different approaches to reading a total of 1000 JPEG files, measuring the total time taken to read three EXIF dates from each file (if present). To avoid misleading results caused by some files already being in the file system cache and others not, we read all files once before performing the tests.</p>
<p><span class="header">Reading files without processing</span></p>
<p>We read every file into memory after it has been loaded into the file system cache, so this is more or less an in-memory operation. It gives us a baseline for the file access overhead.</p>
<p><strong>Duration for 1000 files: 226 ms.</strong></p>
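<p>A minimal sketch of how such a measurement could be set up, assuming a hypothetical <em>Benchmark</em> helper (the downloadable test app may be structured differently): all files are read once to warm the file system cache, then a per-file action is timed with <em>Stopwatch</em>.</p>

```csharp
using System;
using System.Diagnostics;
using System.IO;

public static class Benchmark
{
    // Reads every file once so that its contents end up in the file system cache.
    public static void WarmUp( string[] files )
    {
        foreach ( string f in files )
            File.ReadAllBytes( f );
    }

    // Runs the given action once per file and returns the elapsed wall-clock time in ms.
    public static long Measure( string[] files, Action<string> readOne )
    {
        Stopwatch sw = Stopwatch.StartNew();
        foreach ( string f in files )
            readOne( f );
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }
}
```

<p>The baseline above would correspond to something like <em>Benchmark.WarmUp( files ); long ms = Benchmark.Measure( files, f =&gt; File.ReadAllBytes( f ) );</em>. For the batch mode tests, where a single ExifTool call handles all files, you would time that one call instead.</p>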
<p><span class="header">Reading meta data with GDI+</span></p>
<p>Using the built-in .NET Image class, reading and parsing three attributes is rather fast, taking less than 2 seconds.</p>
<p><strong>Duration for 1000 files: 1632 ms.</strong></p>
<p><span class="header">Reading meta data using ExifTool</span></p>
<p>We call ExifTool for every file and parse the return values as DateTime. In order to support all file names (including names which can only be represented in Unicode), we read the file into memory first and then pipe it to ExifTool&#8217;s stdin. This is the slowest of all reading modes, taking about 200 times as long as GDI+. As we will see soon, the actual performance hit is the startup time of the Perl interpreter, not ExifTool itself.</p>
<p><strong>Duration for 1000 files: 337000 ms.</strong></p>
<p><span class="header">Reading meta data using ExifTool (-fast)</span></p>
<p>Same as above, but using the -fast option in ExifTool, which stops reading from stdin early once a sufficient amount of data has been found. This should increase performance especially when reading files directly over a slow network, but as we can see, it does not make a big difference in our case: throughput increases by less than 10 percent.</p>
<p><strong>Duration for 1000 files: 317000 ms.</strong></p>
<p><span class="header">Reading meta data using ExifTool (-fast2)</span></p>
<p>In addition to the -fast option, using -fast2 should allow for faster processing by omitting maker notes. The effect is small in our case, which might also be because not all images in the test set contain maker notes. As we will see later, the actual work performed by ExifTool is very small compared to the Perl startup time. Based on the evidence gathered in these tests, Perl consumes as much as 98.4% of the total execution time, whereas reading and parsing the JPEG file takes only 1.6%.</p>
<p><strong>Duration for 1000 files: 311000 ms.</strong></p>
<p><span class="header">Reading meta data using ExifTool (without extraction)</span></p>
<p>Since exiftool.exe unpacks a payload of 967 files in 60 folders (8.67 MB), one approach was to skip the extraction process and call ExifTool directly in the temporary folder with all the PAR environment variables set. Unfortunately, this did not deliver any improvement worth the effort, so we can conclude that the time spent on extraction is hardly relevant.</p>
<p><strong>Duration for 1000 files: 314000 ms.</strong></p>
<p><span class="header">Reading meta data using ExifTool (multiple threads)</span></p>
<p>Since the entire operation is clearly CPU-bound, the obvious optimization on a multi-core / hyper-threading system is to execute ExifTool in parallel. For our test case, we run 8 instances of ExifTool at the same time, using 8 threads (the test system has 8 virtual cores). This results in a significant speed-up, more than 3 times as fast as serialized execution:</p>
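<p>Since .NET 3.5 has no built-in Parallel class, a fixed number of worker threads is one way to do this. The following is a sketch under that assumption, not the code of the test app; <em>ParallelRunner</em> is a hypothetical name. Each thread claims the next unprocessed index via <em>Interlocked.Increment</em>, so every file is processed exactly once, and the results keep the order of the input.</p>

```csharp
using System;
using System.Threading;

public static class ParallelRunner
{
    // Processes all items with a fixed number of worker threads.
    // Each worker atomically claims the next index, so no item is processed twice.
    public static TResult[] Run<TItem, TResult>( TItem[] items, Func<TItem, TResult> work, int threadCount )
    {
        if ( threadCount < 1 )
            throw new ArgumentOutOfRangeException( "threadCount" );

        TResult[] results = new TResult[ items.Length ];
        int next = -1; // last claimed index, shared by all workers

        Thread[] threads = new Thread[ threadCount ];
        for ( int t = 0; t < threadCount; t++ )
        {
            threads[ t ] = new Thread( () =>
            {
                int i;
                while ( ( i = Interlocked.Increment( ref next ) ) < items.Length )
                    results[ i ] = work( items[ i ] ); // distinct index per thread, no lock needed
            } );
            threads[ t ].Start();
        }

        // Join waits for all workers and makes their writes visible to this thread
        foreach ( Thread th in threads )
            th.Join();

        return results;
    }
}
```

<p>With a per-file function such as <em>GetExifImageExifTool</em>, a call could look like <em>ParallelRunner.Run( files, f =&gt; GetExifImageExifTool( f ), 8 );</em>. The work function should catch its own exceptions, since an unhandled exception on a worker thread terminates the process.</p>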
<p><strong>Duration for 1000 files: 99000 ms.</strong></p>
<p><span class="header">Reading meta data using ExifTool (batch mode)</span></p>
<p>The end of the road? With the multi-threaded approach we have reached the maximum optimization possible when calling ExifTool separately for each file. This technique has the advantage of handling file names in arbitrary character sets and of allowing detailed progress and error reporting.</p>
<p>The only way of getting more (much more) throughput is to use ExifTool in batch mode, thus minimizing the actual impact of the Perl startup time.</p>
<p>As before, ExifTool has been set to read three date tags from each file and format the text output. This time we are using Json as the output format, since it allows us to easily parse the result for each file. To avoid exceeding the maximum command line length, we write all file names into an &#8216;argfile&#8217;, which is piped to ExifTool&#8217;s stdin using <em>-@ -</em>.</p>
<p>This approach is F A S T! We are suddenly doing the same job within 1.6% of the previous time!</p>
<p><strong>Duration for 1000 files: 5669 ms.</strong></p>
<p><span class="header">Reading meta data using ExifTool (batch mode, -fast2)</span></p>
<p>No big improvement here; we end up with almost the same time. This might be because only some of the tested files contain maker notes, so results may vary.</p>
<p><strong>Duration for 1000 files: 5422 ms.</strong></p>
<p><span class="header"><strong>Conclusion</strong></span></p>
<p>The above tests show that processing large numbers of files one by one with ExifTool is extremely slow. The performance impact is caused by the overhead of starting the Perl interpreter, and is therefore difficult to reduce when calling ExifTool once for each file. Depending on the kind of data needed, it is therefore worthwhile to look into other solutions such as batch processing or a different API.</p>
<p>When using ExifTool in batch mode, it is strongly recommended to check whether each file name and path can be represented in the current system&#8217;s code page; otherwise such files will not be read. It is not safe to call ExifTool with MS-DOS &#8216;8dot3&#8217; file names, since these are not converted to ANSI in the case of East Asian characters. Also, short names are not guaranteed to exist for every file on an NTFS file system.</p>
<p>For an ExifTool-only solution reading meta data, the following approach is recommended when reading larger amounts of files:</p>
<ol>
<li>Create a list of image files to be processed.</li>
<li>Convert each file name into a UTF-8 byte sequence and decode it using the current system ANSI code page. Then compare the resulting string to the original path of the file. If the two differ, you must not include this file in batch processing (ExifTool cannot open it) and should read it in a separate call instead. Why convert to UTF-8 and then to ANSI? This might be confusing at first, but it mimics exactly what happens behind the scenes. You need to pass the file names to ExifTool as UTF-8, since they appear again in the output (which should be UTF-8).</li>
<li>Write the list of UTF-8 encoded file names to ExifTool&#8217;s stdin. Do not pass them on the command line, since you might exceed the maximum command line length and unknowingly truncate the list.</li>
<li>Call ExifTool, preferably using the -j option to generate output in Json format. Compared to XML, Json output is more efficient to parse.</li>
<li>Parse the result and ensure you get a result for each file. Many libraries handle Json parsing; in .NET it takes only a few lines of code. If you had passed file names in ANSI encoding in the previous step, you would run into errors here, because the output would contain a mix of encodings.</li>
<li>All files which did not pass the UTF-8 to ANSI encoding roundtrip need to be processed one by one.</li>
</ol>
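<p>Steps 2 and 6 can be sketched as follows. This is a minimal illustration of the roundtrip check exactly as described above; the class name <em>BatchFilter</em> is hypothetical, and the ANSI encoding is passed in as a parameter so the check can be tested independently of the machine&#8217;s code page.</p>

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class BatchFilter
{
    // Step 2: encode the path as UTF-8, decode those bytes with the ANSI code page
    // and compare with the original. Only unchanged paths are safe for batch mode.
    public static bool SurvivesAnsiRoundtrip( string path, Encoding ansi )
    {
        string decoded = ansi.GetString( Encoding.UTF8.GetBytes( path ) );
        return string.Equals( decoded, path, StringComparison.Ordinal );
    }

    // Splits the file list into files for the batch call and files
    // that have to be processed one by one (step 6).
    public static void Partition( IEnumerable<string> files, Encoding ansi,
                                  List<string> batch, List<string> single )
    {
        foreach ( string f in files )
        {
            if ( SurvivesAnsiRoundtrip( f, ansi ) )
                batch.Add( f );
            else
                single.Add( f );
        }
    }
}
```

<p>On the .NET Framework, <em>Encoding.Default</em> returns the system&#8217;s current ANSI code page and can be passed as <em>ansi</em>; on .NET Core and later, <em>Encoding.Default</em> is UTF-8, which would let every path pass. Note that with a single-byte code page such as Windows-1252, any non-ASCII character becomes two or more bytes in UTF-8, each decoded as a separate character, so in practice only pure ASCII paths end up in the batch list.</p>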
<p><span class="header">Source Code</span></p>
<p>The source code of the test app is available for <a href="http://www.christian-etter.de/wp-content/uploads/EXIF.zip">download here</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.christian-etter.de/?feed=rss2&amp;p=458</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Retrieving Image Meta-Data using GDI+ and ExifTool</title>
		<link>http://www.christian-etter.de/?p=448</link>
		<comments>http://www.christian-etter.de/?p=448#comments</comments>
		<pubDate>Wed, 14 Apr 2010 09:58:37 +0000</pubDate>
		<dc:creator>Christian Etter</dc:creator>
				<category><![CDATA[C#]]></category>
		<category><![CDATA[Software Development]]></category>
		<category><![CDATA[C++]]></category>
		<category><![CDATA[EXIF]]></category>
		<category><![CDATA[ExifTool]]></category>
		<category><![CDATA[GDI]]></category>
		<category><![CDATA[Linq]]></category>
		<category><![CDATA[Process]]></category>
		<category><![CDATA[Unicode]]></category>

		<guid isPermaLink="false">http://www.christian-etter.de/?p=448</guid>
		<description><![CDATA[How to read image meta data in .NET? Here we illustrate two techniques: First, for the sake of speed and simplicity, we chose the GDI+ builtin capabilities of the Image class: /// &#60;summary&#62;Much faster than using Exiftool. In case GDI+ cannot decode the date string we use Exiftool.&#60;/summary&#62; private DateTime? GetOriginalDate&#40; string sFileName &#41; &#123; [...]]]></description>
			<content:encoded><![CDATA[<p>How to read image meta data in .NET? Here we illustrate two techniques:</p>
<p>First, for the sake of speed and simplicity, we use the built-in <em>GDI+</em> capabilities of the <em>Image</em> class:</p>

<div class="wp_syntax"><div class="code"><pre class="csharp" style="font-family:monospace;"><span style="color: #008080; font-style: italic;">/// &lt;summary&gt;Much faster than using Exiftool. In case GDI+ cannot decode the date string we use Exiftool.&lt;/summary&gt;</span>
<span style="color: #0600FF;">private</span> DateTime<span style="color: #008000;">?</span> GetOriginalDate<span style="color: #000000;">&#40;</span> <span style="color: #FF0000;">string</span> sFileName <span style="color: #000000;">&#41;</span>
<span style="color: #000000;">&#123;</span>
    <span style="color: #0600FF;">using</span> <span style="color: #000000;">&#40;</span> FileStream stream <span style="color: #008000;">=</span> <span style="color: #008000;">new</span> FileStream<span style="color: #000000;">&#40;</span> sFileName, FileMode.<span style="color: #0000FF;">Open</span>, FileAccess.<span style="color: #0000FF;">Read</span> <span style="color: #000000;">&#41;</span> <span style="color: #000000;">&#41;</span>
    <span style="color: #000000;">&#123;</span>
        <span style="color: #0600FF;">using</span> <span style="color: #000000;">&#40;</span> Image img <span style="color: #008000;">=</span> Image.<span style="color: #0000FF;">FromStream</span><span style="color: #000000;">&#40;</span> stream, <span style="color: #0600FF;">false</span>, <span style="color: #0600FF;">false</span> <span style="color: #000000;">&#41;</span> <span style="color: #000000;">&#41;</span>
        <span style="color: #000000;">&#123;</span>
            <span style="color: #FF0000;">int</span><span style="color: #000000;">&#91;</span><span style="color: #000000;">&#93;</span> date_tags <span style="color: #008000;">=</span> <span style="color: #008000;">new</span> <span style="color: #FF0000;">int</span><span style="color: #000000;">&#91;</span><span style="color: #000000;">&#93;</span> <span style="color: #000000;">&#123;</span> <span style="color: #FF0000;">36867</span>, <span style="color: #FF0000;">36868</span>, <span style="color: #FF0000;">306</span> <span style="color: #000000;">&#125;</span><span style="color: #008000;">;</span> <span style="color: #008080; font-style: italic;">// tag numbers with dates</span>
            <span style="color: #FF0000;">string</span><span style="color: #000000;">&#91;</span><span style="color: #000000;">&#93;</span> s1 <span style="color: #008000;">=</span> <span style="color: #000000;">&#40;</span> from x <span style="color: #0600FF;">in</span> date_tags where img.<span style="color: #0000FF;">PropertyIdList</span>.<span style="color: #0000FF;">Contains</span><span style="color: #000000;">&#40;</span> x <span style="color: #000000;">&#41;</span> select Encoding.<span style="color: #0000FF;">ASCII</span>.<span style="color: #0000FF;">GetString</span><span style="color: #000000;">&#40;</span> img.<span style="color: #0000FF;">GetPropertyItem</span><span style="color: #000000;">&#40;</span> x <span style="color: #000000;">&#41;</span>.<span style="color: #0000FF;">Value</span> <span style="color: #000000;">&#41;</span>.<span style="color: #0000FF;">Replace</span><span style="color: #000000;">&#40;</span> <span style="color: #666666;">&quot;<span style="color: #008080; font-weight: bold;">\0</span>&quot;</span>, <span style="color: #666666;">&quot;&quot;</span> <span style="color: #000000;">&#41;</span><span style="color: #000000;">&#41;</span>.<span style="color: #0000FF;">ToArray</span><span style="color: #000000;">&#40;</span><span style="color: #000000;">&#41;</span><span style="color: #008000;">;</span> <span style="color: #008080; font-style: italic;">// get date as string without trailing \0</span>
            DateTime d<span style="color: #008000;">;</span>
            DateTime<span style="color: #008000;">?</span><span style="color: #000000;">&#91;</span><span style="color: #000000;">&#93;</span> dd <span style="color: #008000;">=</span> <span style="color: #000000;">&#40;</span> from x <span style="color: #0600FF;">in</span> s1 where x.<span style="color: #0000FF;">Trim</span><span style="color: #000000;">&#40;</span><span style="color: #000000;">&#41;</span>.<span style="color: #0000FF;">Length</span> <span style="color: #008000;">&gt;</span> <span style="color: #FF0000;">0</span>
                select DateTime.<span style="color: #0000FF;">TryParseExact</span><span style="color: #000000;">&#40;</span> x, <span style="color: #008000;">new</span> <span style="color: #FF0000;">string</span><span style="color: #000000;">&#91;</span><span style="color: #000000;">&#93;</span> <span style="color: #000000;">&#123;</span> <span style="color: #666666;">&quot;yyyy:MM:dd HH:mm:ss&quot;</span>, <span style="color: #666666;">&quot;yyyy-MM-dd HH:mm:ss&quot;</span>, <span style="color: #666666;">&quot;MM/dd/yyyy HH:mm:ss&quot;</span>, <span style="color: #666666;">&quot;yyyy-MM-dd'T'HH:mm:sszzz&quot;</span> <span style="color: #000000;">&#125;</span>, CultureInfo.<span style="color: #0000FF;">InvariantCulture</span>, DateTimeStyles.<span style="color: #0000FF;">AllowWhiteSpaces</span> <span style="color: #008000;">|</span> DateTimeStyles.<span style="color: #0000FF;">AssumeLocal</span>, <span style="color: #0600FF;">out</span> d <span style="color: #000000;">&#41;</span> <span style="color: #008000;">?</span> d <span style="color: #0600FF;">as</span> DateTime<span style="color: #008000;">?</span> <span style="color: #008000;">:</span> 
                <span style="color: #0600FF;">null</span> <span style="color: #000000;">&#41;</span>.<span style="color: #0000FF;">ToArray</span><span style="color: #000000;">&#40;</span><span style="color: #000000;">&#41;</span><span style="color: #008000;">;</span> <span style="color: #008080; font-style: italic;">// we see if we can parse all the date attributes found</span>
            <span style="color: #0600FF;">if</span> <span style="color: #000000;">&#40;</span> dd.<span style="color: #0000FF;">Any</span><span style="color: #000000;">&#40;</span> x <span style="color: #008000;">=&gt;</span> <span style="color: #008000;">!</span>x.<span style="color: #0000FF;">HasValue</span> <span style="color: #000000;">&#41;</span> <span style="color: #000000;">&#41;</span>
                <span style="color: #0600FF;">return</span> GetOldestExifDateExifTool<span style="color: #000000;">&#40;</span> sFileName <span style="color: #000000;">&#41;</span><span style="color: #008000;">;</span> <span style="color: #008080; font-style: italic;">// if there is something in the date attribute we cannot parse we ask exiftool.</span>
            <span style="color: #0600FF;">else</span>
                <span style="color: #0600FF;">return</span> <span style="color: #000000;">&#40;</span> from x <span style="color: #0600FF;">in</span> dd where x.<span style="color: #0000FF;">Value</span> <span style="color: #008000;">&gt;</span> <span style="color: #008000;">new</span> DateTime<span style="color: #000000;">&#40;</span> <span style="color: #FF0000;">1990</span>, 01, 01 <span style="color: #000000;">&#41;</span> <span style="color: #008000;">&amp;&amp;</span> x.<span style="color: #0000FF;">Value</span> <span style="color: #008000;">&lt;</span> DateTime.<span style="color: #0000FF;">UtcNow</span> select x <span style="color: #000000;">&#41;</span>.<span style="color: #0000FF;">Min</span><span style="color: #000000;">&#40;</span><span style="color: #000000;">&#41;</span><span style="color: #008000;">;</span> <span style="color: #008080; font-style: italic;">// make sure we use a valid date range</span>
        <span style="color: #000000;">&#125;</span>
    <span style="color: #000000;">&#125;</span>
<span style="color: #000000;">&#125;</span></pre></div></div>

<p>Since the EXIF standard defines date values to be stored as text, we sometimes find non-standard date formats: dates stored with milliseconds, with different separator characters, or with an additional UTC offset. ExifTool does a decent job of interpreting all of these values as dates, and it may also be able to read some non-standard or broken metadata that GDI+ cannot.</p>
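<p>To make these variants concrete, here is a minimal C# sketch of a more tolerant parser that accepts some of them directly with <em>TryParseExact()</em>. The class name <em>ExifDateParsing</em> and the exact format list are assumptions for illustration, not part of either approach in this post; any variant not in the list still yields <em>null</em>, which is the point where ExifTool would take over.</p>

```csharp
using System;
using System.Globalization;

// Hypothetical helper: the format list below is an assumption, extend as needed.
public static class ExifDateParsing
{
    // "FFF" = up to three fractional-second digits, "zzz" = UTC offset like +02:00.
    // With InvariantCulture, ':' in a format string matches a literal colon.
    private static readonly string[] Formats =
    {
        "yyyy:MM:dd HH:mm:ss",
        "yyyy:MM:dd HH:mm:ss.FFF",
        "yyyy:MM:dd HH:mm:sszzz",
        "yyyy:MM:dd HH:mm:ss.FFFzzz",
        "yyyy-MM-dd HH:mm:ss",
        "yyyy-MM-dd'T'HH:mm:sszzz",
    };

    // Returns null for empty input, EXIF placeholders such as "0000:00:00 00:00:00",
    // or any format not listed above.
    public static DateTime? TryParseExifDate(string value)
    {
        if (string.IsNullOrWhiteSpace(value))
            return null;
        DateTime d;
        return DateTime.TryParseExact(value.Trim(), Formats, CultureInfo.InvariantCulture,
            DateTimeStyles.AllowWhiteSpaces | DateTimeStyles.AssumeLocal, out d)
            ? d
            : (DateTime?)null;
    }
}
```

<p>Values with a UTC offset are converted to local time by <em>TryParseExact()</em>, which matches the <em>AssumeLocal</em> handling used in the code above.</p>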
<p>Here is the second approach:</p>

<div class="wp_syntax"><div class="code"><pre class="csharp" style="font-family:monospace;"><span style="color: #008080; font-style: italic;">/// &lt;summary&gt;Extracts the oldest EXIF date using ExifTool. Very slow: processes about 3 files per second, so 90,000 files take about 8 hours.&lt;/summary&gt;</span>
<span style="color: #0600FF;">private</span> <span style="color: #0600FF;">static</span> DateTime<span style="color: #008000;">?</span> GetOldestExifDateExifTool<span style="color: #000000;">&#40;</span> <span style="color: #FF0000;">string</span> sFile <span style="color: #000000;">&#41;</span>
<span style="color: #000000;">&#123;</span>
    Process oP <span style="color: #008000;">=</span> <span style="color: #008000;">new</span> Process<span style="color: #000000;">&#40;</span><span style="color: #000000;">&#41;</span><span style="color: #008000;">;</span>
    oP.<span style="color: #0000FF;">EnableRaisingEvents</span> <span style="color: #008000;">=</span> false<span style="color: #008000;">;</span>
    oP.<span style="color: #0000FF;">StartInfo</span>.<span style="color: #0000FF;">CreateNoWindow</span> <span style="color: #008000;">=</span> true<span style="color: #008000;">;</span>
    oP.<span style="color: #0000FF;">StartInfo</span>.<span style="color: #0000FF;">LoadUserProfile</span> <span style="color: #008000;">=</span> false<span style="color: #008000;">;</span>
    oP.<span style="color: #0000FF;">StartInfo</span>.<span style="color: #0000FF;">RedirectStandardError</span> <span style="color: #008000;">=</span> false<span style="color: #008000;">;</span>
    oP.<span style="color: #0000FF;">StartInfo</span>.<span style="color: #0000FF;">RedirectStandardOutput</span> <span style="color: #008000;">=</span> true<span style="color: #008000;">;</span>
    oP.<span style="color: #0000FF;">StartInfo</span>.<span style="color: #0000FF;">RedirectStandardInput</span> <span style="color: #008000;">=</span> true<span style="color: #008000;">;</span>
    oP.<span style="color: #0000FF;">StartInfo</span>.<span style="color: #0000FF;">StandardErrorEncoding</span> <span style="color: #008000;">=</span> null<span style="color: #008000;">;</span>
    oP.<span style="color: #0000FF;">StartInfo</span>.<span style="color: #0000FF;">StandardOutputEncoding</span> <span style="color: #008000;">=</span> Encoding.<span style="color: #0000FF;">UTF8</span><span style="color: #008000;">;</span>
    oP.<span style="color: #0000FF;">StartInfo</span>.<span style="color: #0000FF;">UseShellExecute</span> <span style="color: #008000;">=</span> false<span style="color: #008000;">;</span>
    oP.<span style="color: #0000FF;">StartInfo</span>.<span style="color: #0000FF;">WindowStyle</span> <span style="color: #008000;">=</span> ProcessWindowStyle.<span style="color: #0000FF;">Hidden</span><span style="color: #008000;">;</span>
    oP.<span style="color: #0000FF;">StartInfo</span>.<span style="color: #0000FF;">FileName</span> <span style="color: #008000;">=</span> <span style="color: #666666;">@&quot;exiftool.exe&quot;</span><span style="color: #008000;">;</span>
    oP.<span style="color: #0000FF;">StartInfo</span>.<span style="color: #0000FF;">Arguments</span> <span style="color: #008000;">=</span> <span style="color: #666666;">&quot;-s -s -EXIF:ModifyDate -EXIF:DateTimeOriginal -EXIF:CreateDate -d <span style="color: #008080; font-weight: bold;">\&quot;</span>%Y-%m-%d %H:%M:%S<span style="color: #008080; font-weight: bold;">\&quot;</span> -&quot;</span><span style="color: #008000;">;</span>
    oP.<span style="color: #0000FF;">Start</span><span style="color: #000000;">&#40;</span><span style="color: #000000;">&#41;</span><span style="color: #008000;">;</span>
&nbsp;
    <span style="color: #FF0000;">byte</span><span style="color: #000000;">&#91;</span><span style="color: #000000;">&#93;</span> image <span style="color: #008000;">=</span> File.<span style="color: #0000FF;">ReadAllBytes</span><span style="color: #000000;">&#40;</span> sFile <span style="color: #000000;">&#41;</span><span style="color: #008000;">;</span>
    oP.<span style="color: #0000FF;">StandardInput</span>.<span style="color: #0000FF;">BaseStream</span>.<span style="color: #0000FF;">Write</span><span style="color: #000000;">&#40;</span> image, <span style="color: #FF0000;">0</span>, image.<span style="color: #0000FF;">Length</span> <span style="color: #000000;">&#41;</span><span style="color: #008000;">;</span>
    oP.<span style="color: #0000FF;">StandardInput</span>.<span style="color: #0000FF;">BaseStream</span>.<span style="color: #0000FF;">Flush</span><span style="color: #000000;">&#40;</span><span style="color: #000000;">&#41;</span><span style="color: #008000;">;</span>
    oP.<span style="color: #0000FF;">StandardInput</span>.<span style="color: #0000FF;">BaseStream</span>.<span style="color: #0000FF;">Close</span><span style="color: #000000;">&#40;</span><span style="color: #000000;">&#41;</span><span style="color: #008000;">;</span>
    <span style="color: #FF0000;">string</span> sStdOut <span style="color: #008000;">=</span> oP.<span style="color: #0000FF;">StandardOutput</span>.<span style="color: #0000FF;">ReadToEnd</span><span style="color: #000000;">&#40;</span><span style="color: #000000;">&#41;</span><span style="color: #008000;">;</span>
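    oP.<span style="color: #0000FF;">WaitForExit</span><span style="color: #000000;">&#40;</span><span style="color: #000000;">&#41;</span><span style="color: #008000;">;</span>
    oP.<span style="color: #0000FF;">Dispose</span><span style="color: #000000;">&#40;</span><span style="color: #000000;">&#41;</span><span style="color: #008000;">;</span> <span style="color: #008080; font-style: italic;">// release the process handle, since this method runs once per file</span>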
&nbsp;
    <span style="color: #FF0000;">string</span><span style="color: #000000;">&#91;</span><span style="color: #000000;">&#93;</span> datetags <span style="color: #008000;">=</span> <span style="color: #008000;">new</span> <span style="color: #FF0000;">string</span><span style="color: #000000;">&#91;</span><span style="color: #000000;">&#93;</span> <span style="color: #000000;">&#123;</span> <span style="color: #666666;">&quot;DateTimeOriginal&quot;</span>, <span style="color: #666666;">&quot;CreateDate&quot;</span>, <span style="color: #666666;">&quot;ModifyDate&quot;</span> <span style="color: #000000;">&#125;</span><span style="color: #008000;">;</span>
    <span style="color: #FF0000;">string</span><span style="color: #000000;">&#91;</span><span style="color: #000000;">&#93;</span> res1 <span style="color: #008000;">=</span> sStdOut.<span style="color: #0000FF;">Split</span><span style="color: #000000;">&#40;</span> <span style="color: #008000;">new</span> <span style="color: #FF0000;">string</span><span style="color: #000000;">&#91;</span><span style="color: #000000;">&#93;</span> <span style="color: #000000;">&#123;</span> <span style="color: #666666;">&quot;<span style="color: #008080; font-weight: bold;">\r</span><span style="color: #008080; font-weight: bold;">\n</span>&quot;</span> <span style="color: #000000;">&#125;</span>, StringSplitOptions.<span style="color: #0000FF;">RemoveEmptyEntries</span> <span style="color: #000000;">&#41;</span><span style="color: #008000;">;</span> <span style="color: #008080; font-style: italic;">// split lines</span>
    <span style="color: #FF0000;">string</span><span style="color: #000000;">&#91;</span><span style="color: #000000;">&#93;</span><span style="color: #000000;">&#91;</span><span style="color: #000000;">&#93;</span> res2 <span style="color: #008000;">=</span> <span style="color: #000000;">&#40;</span> from x <span style="color: #0600FF;">in</span> res1 select x.<span style="color: #0000FF;">Split</span><span style="color: #000000;">&#40;</span> <span style="color: #008000;">new</span> <span style="color: #FF0000;">char</span><span style="color: #000000;">&#91;</span><span style="color: #000000;">&#93;</span> <span style="color: #000000;">&#123;</span> <span style="color: #666666;">':'</span> <span style="color: #000000;">&#125;</span>, <span style="color: #FF0000;">2</span> <span style="color: #000000;">&#41;</span> <span style="color: #000000;">&#41;</span>.<span style="color: #0000FF;">ToArray</span><span style="color: #000000;">&#40;</span><span style="color: #000000;">&#41;</span><span style="color: #008000;">;</span> <span style="color: #008080; font-style: italic;">// split after colon to separate attributes and values</span>
    <span style="color: #FF0000;">string</span><span style="color: #000000;">&#91;</span><span style="color: #000000;">&#93;</span> res3 <span style="color: #008000;">=</span> <span style="color: #000000;">&#40;</span> from x <span style="color: #0600FF;">in</span> res2 where x.<span style="color: #0000FF;">Length</span> <span style="color: #008000;">==</span> <span style="color: #FF0000;">2</span> <span style="color: #008000;">&amp;&amp;</span> datetags.<span style="color: #0000FF;">Contains</span><span style="color: #000000;">&#40;</span> x<span style="color: #000000;">&#91;</span> <span style="color: #FF0000;">0</span> <span style="color: #000000;">&#93;</span>, StringComparer.<span style="color: #0000FF;">InvariantCultureIgnoreCase</span> <span style="color: #000000;">&#41;</span> select x<span style="color: #000000;">&#91;</span> <span style="color: #FF0000;">1</span> <span style="color: #000000;">&#93;</span> <span style="color: #000000;">&#41;</span>.<span style="color: #0000FF;">ToArray</span><span style="color: #000000;">&#40;</span><span style="color: #000000;">&#41;</span><span style="color: #008000;">;</span> <span style="color: #008080; font-style: italic;">// keep only the lines of date attributes</span>
    DateTime d<span style="color: #008000;">;</span>
    DateTime<span style="color: #008000;">?</span><span style="color: #000000;">&#91;</span><span style="color: #000000;">&#93;</span> dd <span style="color: #008000;">=</span> <span style="color: #000000;">&#40;</span> from x <span style="color: #0600FF;">in</span> res3 select DateTime.<span style="color: #0000FF;">TryParseExact</span><span style="color: #000000;">&#40;</span> x, <span style="color: #666666;">&quot;yyyy-MM-dd HH:mm:ss&quot;</span>, CultureInfo.<span style="color: #0000FF;">InvariantCulture</span>, DateTimeStyles.<span style="color: #0000FF;">AllowWhiteSpaces</span> <span style="color: #008000;">|</span> DateTimeStyles.<span style="color: #0000FF;">AssumeLocal</span>, <span style="color: #0600FF;">out</span> d <span style="color: #000000;">&#41;</span> <span style="color: #008000;">?</span> d <span style="color: #0600FF;">as</span> DateTime<span style="color: #008000;">?</span> <span style="color: #008000;">:</span> <span style="color: #0600FF;">null</span> <span style="color: #000000;">&#41;</span>.<span style="color: #0000FF;">ToArray</span><span style="color: #000000;">&#40;</span><span style="color: #000000;">&#41;</span><span style="color: #008000;">;</span>
    DateTime<span style="color: #008000;">?</span> oDate <span style="color: #008000;">=</span> <span style="color: #000000;">&#40;</span> from x <span style="color: #0600FF;">in</span> dd where x.<span style="color: #0000FF;">HasValue</span> <span style="color: #008000;">&amp;&amp;</span> x <span style="color: #008000;">&gt;</span> <span style="color: #008000;">new</span> DateTime<span style="color: #000000;">&#40;</span> <span style="color: #FF0000;">1990</span>, 01, 01 <span style="color: #000000;">&#41;</span> <span style="color: #008000;">&amp;&amp;</span> x <span style="color: #008000;">&lt;</span> DateTime.<span style="color: #0000FF;">UtcNow</span> select x <span style="color: #000000;">&#41;</span>.<span style="color: #0000FF;">Min</span><span style="color: #000000;">&#40;</span><span style="color: #000000;">&#41;</span><span style="color: #008000;">;</span>
    <span style="color: #0600FF;">return</span> oDate<span style="color: #008000;">;</span>
<span style="color: #000000;">&#125;</span></pre></div></div>

<p>Used from within another program, ExifTool has three shortcomings:</p>
<ol>
<li><strong>It is not available as a library</strong>. Therefore we have to run it as a separate process using the Process API.</li>
<li><strong>It has a long startup time</strong>. This is because it is written in Perl and packed into a single self-extracting executable wrapper. As a result, we can only process about 3 files per second, even on a fast computer; GDI+ might be a hundred times faster. Processing several files per ExifTool call might work around this, but would require a bigger change in our program logic.</li>
<li><strong>It does not support the Unicode file system API</strong>, so files whose names cannot be represented in the current ANSI code page cannot be opened. To work around this limitation, we read the file into memory first and pipe it into ExifTool via stdin.</li>
</ol>
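<p>For the startup cost in item 2, one option is to pass several file names to a single ExifTool call. The following C# sketch groups the combined output by file. It assumes that ExifTool prints a <code>======== &lt;file name&gt;</code> header line before each file&#8217;s tags and <code>Tag: value</code> lines as produced by <code>-s -s</code>; please verify this against your ExifTool version. The class name <em>ExifToolBatchOutput</em> is hypothetical.</p>

```csharp
using System;
using System.Collections.Generic;

// Hypothetical parser for batched ExifTool output (format assumptions in the lead-in).
public static class ExifToolBatchOutput
{
    // Returns file name -> (tag name -> raw value).
    public static Dictionary<string, Dictionary<string, string>> Parse(string stdout)
    {
        var result = new Dictionary<string, Dictionary<string, string>>(StringComparer.OrdinalIgnoreCase);
        Dictionary<string, string> current = null;
        string[] lines = stdout.Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);
        foreach (string line in lines)
        {
            // Assumed per-file header: "======== " followed by the file name.
            if (line.StartsWith("======== ", StringComparison.Ordinal))
            {
                current = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
                result[line.Substring(9)] = current;
                continue;
            }
            if (current == null)
                continue; // output before the first header is ignored
            int colon = line.IndexOf(':');
            if (colon <= 0)
                continue; // summary lines such as "    2 image files read"
            // Split at the first colon only; date values contain colons themselves.
            current[line.Substring(0, colon).Trim()] = line.Substring(colon + 1).Trim();
        }
        return result;
    }
}
```

<p>Keep in mind that passing file names on the command line brings back the ANSI file name limitation from item 3. ExifTool&#8217;s <code>-stay_open</code> option, which keeps one ExifTool process running and reads its arguments from a file or pipe, may also be worth a look in the ExifTool documentation.</p>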
<p>Note: the Process API lets you redirect both <em>stdout</em> and <em>stderr</em> at the same time, which allows for more detailed error handling and messages. However, you <em>must</em> then read the two streams on different threads (for example, one of them asynchronously). Otherwise the child process blocks as soon as the pipe buffer of the stream you are not reading fills up, while your program keeps waiting on the other one: a deadlock. For the sake of simplicity, I have omitted error handling here.</p>
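<p>A minimal C# sketch of this pattern: <em>stdout</em> is read synchronously while <em>stderr</em> is collected asynchronously through <em>ErrorDataReceived</em> and <em>BeginErrorReadLine()</em>. The class and method names are hypothetical; the parameterless <em>WaitForExit()</em> is used on purpose, because it also waits until the asynchronous <em>stderr</em> reader has finished.</p>

```csharp
using System;
using System.Diagnostics;
using System.Text;

// Hypothetical helper: captures stdout and stderr of a child process without deadlocking.
public static class ProcessCapture
{
    public static int Run(string fileName, string arguments, out string stdOut, out string stdErr)
    {
        var err = new StringBuilder();
        using (var p = new Process())
        {
            p.StartInfo.FileName = fileName;
            p.StartInfo.Arguments = arguments;
            p.StartInfo.UseShellExecute = false;
            p.StartInfo.CreateNoWindow = true;
            p.StartInfo.RedirectStandardOutput = true;
            p.StartInfo.RedirectStandardError = true;

            // stderr lines arrive on a background thread; e.Data is null at end of stream.
            p.ErrorDataReceived += (sender, e) =>
            {
                if (e.Data != null)
                {
                    lock (err) { err.AppendLine(e.Data); }
                }
            };

            p.Start();
            p.BeginErrorReadLine();                 // start the asynchronous stderr reader
            stdOut = p.StandardOutput.ReadToEnd();  // read stdout on this thread
            p.WaitForExit();                        // no timeout: also waits for the stderr reader
            stdErr = err.ToString();
            return p.ExitCode;
        }
    }
}
```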
]]></content:encoded>
			<wfw:commentRss>http://www.christian-etter.de/?feed=rss2&amp;p=448</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Extending SQLite functionality using .NET (UDF)</title>
		<link>http://www.christian-etter.de/?p=439</link>
		<comments>http://www.christian-etter.de/?p=439#comments</comments>
		<pubDate>Tue, 13 Apr 2010 09:05:17 +0000</pubDate>
		<dc:creator>Christian Etter</dc:creator>
				<category><![CDATA[Software Development]]></category>
		<category><![CDATA[.NET]]></category>
		<category><![CDATA[C++]]></category>
		<category><![CDATA[Callback]]></category>
		<category><![CDATA[Domain]]></category>
		<category><![CDATA[Reverse]]></category>
		<category><![CDATA[Sorting]]></category>
		<category><![CDATA[SQLite]]></category>
		<category><![CDATA[SQLiteFunction]]></category>
		<category><![CDATA[UDF]]></category>
		<category><![CDATA[Uri]]></category>
		<category><![CDATA[UriBuilder]]></category>
		<category><![CDATA[Url]]></category>

		<guid isPermaLink="false">http://www.christian-etter.de/?p=439</guid>
		<description><![CDATA[Although SQLite has no built-in procedural language (such as Oracle&#8217;s PL/SQL) for writing user-defined functions (UDFs), it offers a very fast and convenient way to extend its functionality in your programming language of choice. Let me illustrate this with an example: a string transformation that allows sorting a [...]]]></description>
			<content:encoded><![CDATA[<p>Although SQLite has no built-in procedural language (such as Oracle&#8217;s PL/SQL) for writing user-defined functions (UDFs), it offers a very fast and convenient way to extend its functionality in your programming language of choice.</p>
<p>Let me illustrate this with an example: a string transformation that allows sorting URL strings by top-level domain, then domain name, then host name. For example, <strong>http://www.wikipedia.de/favicon.ico</strong> should be sorted as if it were <strong>http://de.wikipedia.www/favicon.ico</strong>.</p>
<p>With SQL&#8217;s limited string processing functions, this would take a lot of code. In .NET, it&#8217;s easy to write and surprisingly fast.</p>

<div class="wp_syntax"><div class="code"><pre class="csharp" style="font-family:monospace;"><span style="color: #000000;">&#91;</span>SQLiteFunction<span style="color: #000000;">&#40;</span> Name <span style="color: #008000;">=</span> <span style="color: #666666;">&quot;ReverseDomain&quot;</span>, Arguments <span style="color: #008000;">=</span> <span style="color: #FF0000;">1</span>, FuncType <span style="color: #008000;">=</span> FunctionType.<span style="color: #0000FF;">Scalar</span> <span style="color: #000000;">&#41;</span><span style="color: #000000;">&#93;</span>
<span style="color: #0600FF;">public</span> <span style="color: #FF0000;">class</span> ReverseDomain<span style="color: #008000;">:</span> SQLiteFunction
<span style="color: #000000;">&#123;</span>
    <span style="color: #0600FF;">public</span> <span style="color: #0600FF;">override</span> <span style="color: #FF0000;">object</span> Invoke<span style="color: #000000;">&#40;</span> <span style="color: #FF0000;">object</span><span style="color: #000000;">&#91;</span><span style="color: #000000;">&#93;</span> args <span style="color: #000000;">&#41;</span>
    <span style="color: #000000;">&#123;</span>
        <span style="color: #0600FF;">if</span> <span style="color: #000000;">&#40;</span> args <span style="color: #008000;">==</span> <span style="color: #0600FF;">null</span> <span style="color: #008000;">||</span> args.<span style="color: #0000FF;">Length</span> <span style="color: #008000;">&lt;</span> <span style="color: #FF0000;">1</span> <span style="color: #008000;">||</span> args<span style="color: #000000;">&#91;</span> <span style="color: #FF0000;">0</span> <span style="color: #000000;">&#93;</span> <span style="color: #008000;">==</span> <span style="color: #0600FF;">null</span> <span style="color: #000000;">&#41;</span>
            <span style="color: #0600FF;">return</span> null<span style="color: #008000;">;</span>
        <span style="color: #FF0000;">string</span><span style="color: #000000;">&#91;</span><span style="color: #000000;">&#93;</span> ss <span style="color: #008000;">=</span> args<span style="color: #000000;">&#91;</span> <span style="color: #FF0000;">0</span> <span style="color: #000000;">&#93;</span>.<span style="color: #0000FF;">ToString</span><span style="color: #000000;">&#40;</span><span style="color: #000000;">&#41;</span>.<span style="color: #0000FF;">Split</span><span style="color: #000000;">&#40;</span> <span style="color: #008000;">new</span> <span style="color: #FF0000;">string</span><span style="color: #000000;">&#91;</span><span style="color: #000000;">&#93;</span> <span style="color: #000000;">&#123;</span> <span style="color: #666666;">&quot;://&quot;</span>, <span style="color: #666666;">&quot;/&quot;</span> <span style="color: #000000;">&#125;</span>, <span style="color: #FF0000;">3</span>, StringSplitOptions.<span style="color: #0000FF;">None</span> <span style="color: #000000;">&#41;</span><span style="color: #008000;">;</span>
        <span style="color: #0600FF;">if</span> <span style="color: #000000;">&#40;</span> ss.<span style="color: #0000FF;">Length</span> <span style="color: #008000;">&gt;</span> <span style="color: #FF0000;">1</span> <span style="color: #000000;">&#41;</span>
        <span style="color: #000000;">&#123;</span>
            <span style="color: #FF0000;">string</span><span style="color: #000000;">&#91;</span><span style="color: #000000;">&#93;</span> sss <span style="color: #008000;">=</span> ss<span style="color: #000000;">&#91;</span> <span style="color: #FF0000;">1</span> <span style="color: #000000;">&#93;</span>.<span style="color: #0000FF;">Split</span><span style="color: #000000;">&#40;</span> <span style="color: #008000;">new</span> <span style="color: #FF0000;">char</span><span style="color: #000000;">&#91;</span><span style="color: #000000;">&#93;</span> <span style="color: #000000;">&#123;</span> <span style="color: #666666;">'.'</span> <span style="color: #000000;">&#125;</span> <span style="color: #000000;">&#41;</span><span style="color: #008000;">;</span>
            Array.<span style="color: #0000FF;">Reverse</span><span style="color: #000000;">&#40;</span> sss <span style="color: #000000;">&#41;</span><span style="color: #008000;">;</span>
            <span style="color: #0600FF;">return</span> ss<span style="color: #000000;">&#91;</span> <span style="color: #FF0000;">0</span> <span style="color: #000000;">&#93;</span> <span style="color: #008000;">+</span> <span style="color: #666666;">&quot;://&quot;</span> <span style="color: #008000;">+</span> <span style="color: #FF0000;">String</span>.<span style="color: #0000FF;">Join</span><span style="color: #000000;">&#40;</span> <span style="color: #666666;">&quot;.&quot;</span>, sss <span style="color: #000000;">&#41;</span> <span style="color: #008000;">+</span> <span style="color: #666666;">&quot;/&quot;</span> <span style="color: #008000;">+</span> <span style="color: #000000;">&#40;</span> ss.<span style="color: #0000FF;">Length</span> <span style="color: #008000;">==</span> <span style="color: #FF0000;">3</span> <span style="color: #008000;">?</span> ss<span style="color: #000000;">&#91;</span> <span style="color: #FF0000;">2</span> <span style="color: #000000;">&#93;</span> <span style="color: #008000;">:</span> <span style="color: #FF0000;">String</span>.<span style="color: #0000FF;">Empty</span> <span style="color: #000000;">&#41;</span><span style="color: #008000;">;</span>
        <span style="color: #000000;">&#125;</span>
        <span style="color: #0600FF;">return</span> args<span style="color: #000000;">&#91;</span> <span style="color: #FF0000;">0</span> <span style="color: #000000;">&#93;</span><span style="color: #008000;">;</span>
    <span style="color: #000000;">&#125;</span>
<span style="color: #000000;">&#125;</span></pre></div></div>

<p>Alternatively, we could use the <strong>UriBuilder</strong> class:</p>

<div class="wp_syntax"><div class="code"><pre class="csharp" style="font-family:monospace;"><span style="color: #000000;">&#91;</span>SQLiteFunction<span style="color: #000000;">&#40;</span> Name <span style="color: #008000;">=</span> <span style="color: #666666;">&quot;ReverseDomain&quot;</span>, Arguments <span style="color: #008000;">=</span> <span style="color: #FF0000;">1</span>, FuncType <span style="color: #008000;">=</span> FunctionType.<span style="color: #0000FF;">Scalar</span> <span style="color: #000000;">&#41;</span><span style="color: #000000;">&#93;</span>
<span style="color: #0600FF;">public</span> <span style="color: #FF0000;">class</span> ReverseDomain<span style="color: #008000;">:</span> SQLiteFunction
<span style="color: #000000;">&#123;</span>
    <span style="color: #0600FF;">public</span> <span style="color: #0600FF;">override</span> <span style="color: #FF0000;">object</span> Invoke<span style="color: #000000;">&#40;</span> <span style="color: #FF0000;">object</span><span style="color: #000000;">&#91;</span><span style="color: #000000;">&#93;</span> args <span style="color: #000000;">&#41;</span>
    <span style="color: #000000;">&#123;</span>
        <span style="color: #0600FF;">if</span> <span style="color: #000000;">&#40;</span> args <span style="color: #008000;">==</span> <span style="color: #0600FF;">null</span> <span style="color: #008000;">||</span> args.<span style="color: #0000FF;">Length</span> <span style="color: #008000;">&lt;</span> <span style="color: #FF0000;">1</span> <span style="color: #008000;">||</span> args<span style="color: #000000;">&#91;</span> <span style="color: #FF0000;">0</span> <span style="color: #000000;">&#93;</span> <span style="color: #008000;">==</span> <span style="color: #0600FF;">null</span> <span style="color: #000000;">&#41;</span>
            <span style="color: #0600FF;">return</span> null<span style="color: #008000;">;</span>
        UriBuilder u <span style="color: #008000;">=</span> <span style="color: #008000;">new</span> UriBuilder<span style="color: #000000;">&#40;</span> args<span style="color: #000000;">&#91;</span> <span style="color: #FF0000;">0</span> <span style="color: #000000;">&#93;</span> <span style="color: #0600FF;">as</span> <span style="color: #FF0000;">string</span> <span style="color: #000000;">&#41;</span><span style="color: #008000;">;</span>
        <span style="color: #FF0000;">string</span><span style="color: #000000;">&#91;</span><span style="color: #000000;">&#93;</span> ss <span style="color: #008000;">=</span> u.<span style="color: #0000FF;">Host</span>.<span style="color: #0000FF;">Split</span><span style="color: #000000;">&#40;</span> <span style="color: #008000;">new</span> <span style="color: #FF0000;">char</span><span style="color: #000000;">&#91;</span><span style="color: #000000;">&#93;</span> <span style="color: #000000;">&#123;</span> <span style="color: #666666;">'.'</span> <span style="color: #000000;">&#125;</span> <span style="color: #000000;">&#41;</span><span style="color: #008000;">;</span>
        Array.<span style="color: #0000FF;">Reverse</span><span style="color: #000000;">&#40;</span> ss <span style="color: #000000;">&#41;</span><span style="color: #008000;">;</span>
        u.<span style="color: #0000FF;">Host</span> <span style="color: #008000;">=</span> <span style="color: #FF0000;">String</span>.<span style="color: #0000FF;">Join</span><span style="color: #000000;">&#40;</span> <span style="color: #666666;">&quot;.&quot;</span>, ss <span style="color: #000000;">&#41;</span><span style="color: #008000;">;</span>
        <span style="color: #0600FF;">return</span> u.<span style="color: #0000FF;">ToString</span><span style="color: #000000;">&#40;</span><span style="color: #000000;">&#41;</span><span style="color: #008000;">;</span>
    <span style="color: #000000;">&#125;</span>
<span style="color: #000000;">&#125;</span></pre></div></div>

<p>What is the catch with <em>UriBuilder</em>? It is about five times slower than the custom string-splitting solution: on my machine it transforms only about 65,000 URLs per second, while the string-splitting method converts about 350,000 URLs per second.</p>
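<p>If you want to reproduce such a comparison, here is a rough C# timing sketch. It extracts the logic of both <em>Invoke()</em> methods above into plain static methods (the names are hypothetical) so they can be timed without SQLite; absolute numbers will depend on the machine and the URL mix. Note that the two variants do not always return identical strings, since <em>UriBuilder</em> normalizes parts of the URL (for example, it lowercases the host name).</p>

```csharp
using System;
using System.Diagnostics;

// Hypothetical benchmark harness; not part of System.Data.SQLite.
public static class UrlBenchmark
{
    // Same logic as the string-splitting ReverseDomain.Invoke().
    public static string ReverseBySplit(string url)
    {
        string[] ss = url.Split(new[] { "://", "/" }, 3, StringSplitOptions.None);
        if (ss.Length < 2)
            return url;
        string[] host = ss[1].Split('.');
        Array.Reverse(host);
        return ss[0] + "://" + string.Join(".", host) + "/" + (ss.Length == 3 ? ss[2] : string.Empty);
    }

    // Same logic as the UriBuilder-based ReverseDomain.Invoke().
    public static string ReverseByUriBuilder(string url)
    {
        var u = new UriBuilder(url);
        string[] host = u.Host.Split('.');
        Array.Reverse(host);
        u.Host = string.Join(".", host);
        return u.ToString();
    }

    // Approximate conversions per second for the given transform.
    public static double UrlsPerSecond(Func<string, string> transform, string[] urls, int rounds)
    {
        transform(urls[0]); // warm-up so JIT compilation is not measured
        var sw = Stopwatch.StartNew();
        for (int r = 0; r < rounds; r++)
            foreach (string url in urls)
                transform(url);
        sw.Stop();
        return urls.Length * (double)rounds / Math.Max(sw.Elapsed.TotalSeconds, 1e-9);
    }
}
```

<p>Calling <em>UrlsPerSecond()</em> once with each method on the same URL array gives the ratio between the two approaches.</p>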
<p>How is this new UDF called from SQL?</p>

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;"><span style="color: #993333; font-weight: bold;">SELECT</span> <span style="color: #66cc66;">*</span> <span style="color: #993333; font-weight: bold;">FROM</span> UrlTable <span style="color: #993333; font-weight: bold;">ORDER</span> <span style="color: #993333; font-weight: bold;">BY</span> ReverseDomain<span style="color: #66cc66;">&#40;</span> Url <span style="color: #66cc66;">&#41;</span></pre></div></div>

]]></content:encoded>
			<wfw:commentRss>http://www.christian-etter.de/?feed=rss2&amp;p=439</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Using IEqualityComparer on Custom Types with Except()</title>
		<link>http://www.christian-etter.de/?p=429</link>
		<comments>http://www.christian-etter.de/?p=429#comments</comments>
		<pubDate>Mon, 12 Apr 2010 12:45:05 +0000</pubDate>
		<dc:creator>Christian Etter</dc:creator>
				<category><![CDATA[C#]]></category>
		<category><![CDATA[Software Development]]></category>
		<category><![CDATA[.NET]]></category>
		<category><![CDATA[Array]]></category>
		<category><![CDATA[C++]]></category>
		<category><![CDATA[Except]]></category>
		<category><![CDATA[IEnumerable]]></category>
		<category><![CDATA[IEqualityComparer]]></category>
		<category><![CDATA[Linq]]></category>
		<category><![CDATA[Optimizing]]></category>
		<category><![CDATA[SequenceEqual]]></category>

		<guid isPermaLink="false">http://www.christian-etter.de/?p=429</guid>
		<description><![CDATA[Recently I wrote about an alternative way of using Linq Distinct() on custom types that does not involve writing a custom IEqualityComparer implementation. Today I had a similar requirement: using Linq Except() to determine all elements of an IEnumerable that are not contained in another IEnumerable. Again, there is a one [...]]]></description>
			<content:encoded><![CDATA[<p>Recently I wrote about an alternative way of using Linq <em>Distinct()</em> on custom types that does not involve writing a custom <em>IEqualityComparer</em> implementation.</p>
<p>Today I had a similar requirement: using Linq <em>Except()</em> to determine all elements of an <em>IEnumerable</em> that are not contained in another <em>IEnumerable</em>. Again, there is a one-line solution, but it is slow on large input data:</p>

<div class="wp_syntax"><div class="code"><pre class="csharp" style="font-family:monospace;"><span style="color: #FF0000;">byte</span><span style="color: #000000;">&#91;</span><span style="color: #000000;">&#93;</span><span style="color: #000000;">&#91;</span><span style="color: #000000;">&#93;</span> hashes_old <span style="color: #008000;">=</span> <span style="color: #008080; font-style: italic;">/* an array of byte arrays containing a hash value */</span><span style="color: #008000;">;</span>
<span style="color: #FF0000;">byte</span><span style="color: #000000;">&#91;</span><span style="color: #000000;">&#93;</span><span style="color: #000000;">&#91;</span><span style="color: #000000;">&#93;</span> hashes_new <span style="color: #008000;">=</span> <span style="color: #008080; font-style: italic;">/* another array of byte arrays containing a hash value */</span><span style="color: #008000;">;</span>
<span style="color: #FF0000;">byte</span><span style="color: #000000;">&#91;</span><span style="color: #000000;">&#93;</span><span style="color: #000000;">&#91;</span><span style="color: #000000;">&#93;</span> hashes_obsolete <span style="color: #008000;">=</span> <span style="color: #000000;">&#40;</span> from x <span style="color: #0600FF;">in</span> hashes_old where hashes_new.<span style="color: #0000FF;">Any</span><span style="color: #000000;">&#40;</span> y <span style="color: #008000;">=&gt;</span> y.<span style="color: #0000FF;">SequenceEqual</span><span style="color: #000000;">&#40;</span> x <span style="color: #000000;">&#41;</span> <span style="color: #000000;">&#41;</span> <span style="color: #008000;">==</span> <span style="color: #0600FF;">false</span> select x <span style="color: #000000;">&#41;</span>.<span style="color: #0000FF;">ToArray</span><span style="color: #000000;">&#40;</span><span style="color: #000000;">&#41;</span><span style="color: #008000;">;</span></pre></div></div>

<p>Both arrays hold 16-byte hashes, and we want the elements of <em>hashes_old</em> that have no equal counterpart in <em>hashes_new</em>. The one-liner is reasonably elegant and needs no additional comparison code. With larger amounts of data, however, it becomes slow: for every element of <em>hashes_old</em>, <em>Any()</em> may call <em>SequenceEqual()</em> on every element of <em>hashes_new</em>, so the number of comparisons grows with the product of both array sizes:</p>
<p><strong>hashes_old: 18830 hashes_new: 8210 hashes_obsolete: 12228 time: 19564 ms</strong></p>
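<p>Without a comparer, <em>Except()</em> would compare <em>byte[]</em> instances by reference, so a custom <em>IEqualityComparer&lt;byte[]&gt;</em> is needed anyway. Since <em>Except()</em> relies heavily on the comparer&#8217;s <em>GetHashCode()</em>, a lot of time can be saved by implementing a good hash function in that comparer.</p>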

<div class="wp_syntax"><div class="code"><pre class="csharp" style="font-family:monospace;"><span style="color: #FF0000;">byte</span><span style="color: #000000;">&#91;</span><span style="color: #000000;">&#93;</span><span style="color: #000000;">&#91;</span><span style="color: #000000;">&#93;</span> hashes_old <span style="color: #008000;">=</span> <span style="color: #008080; font-style: italic;">/* an array of byte arrays containing a hash value */</span><span style="color: #008000;">;</span>
<span style="color: #FF0000;">byte</span><span style="color: #000000;">&#91;</span><span style="color: #000000;">&#93;</span><span style="color: #000000;">&#91;</span><span style="color: #000000;">&#93;</span> hashes_new <span style="color: #008000;">=</span> <span style="color: #008080; font-style: italic;">/* another array of byte arrays containing a hash value */</span><span style="color: #008000;">;</span>
<span style="color: #FF0000;">byte</span><span style="color: #000000;">&#91;</span><span style="color: #000000;">&#93;</span><span style="color: #000000;">&#91;</span><span style="color: #000000;">&#93;</span> hashes_obsolete <span style="color: #008000;">=</span> hashes_old.<span style="color: #0000FF;">Except</span><span style="color: #000000;">&#40;</span> hashes_new, <span style="color: #008000;">new</span> ByteArrayComparer<span style="color: #000000;">&#40;</span><span style="color: #000000;">&#41;</span> <span style="color: #000000;">&#41;</span>.<span style="color: #0000FF;">ToArray</span><span style="color: #000000;">&#40;</span><span style="color: #000000;">&#41;</span><span style="color: #008000;">;</span>
&nbsp;
<span style="color: #008080; font-style: italic;">/* .... */</span>
<span style="color: #0600FF;">public</span> <span style="color: #FF0000;">class</span> ByteArrayComparer <span style="color: #008000;">:</span> IEqualityComparer<span style="color: #008000;">&lt;</span><span style="color: #FF0000;">byte</span><span style="color: #000000;">&#91;</span><span style="color: #000000;">&#93;</span><span style="color: #008000;">&gt;</span>
<span style="color: #000000;">&#123;</span>
    <span style="color: #0600FF;">public</span> <span style="color: #FF0000;">bool</span> Equals<span style="color: #000000;">&#40;</span> <span style="color: #FF0000;">byte</span><span style="color: #000000;">&#91;</span><span style="color: #000000;">&#93;</span> a, <span style="color: #FF0000;">byte</span><span style="color: #000000;">&#91;</span><span style="color: #000000;">&#93;</span> b <span style="color: #000000;">&#41;</span>
    <span style="color: #000000;">&#123;</span>
        <span style="color: #0600FF;">if</span> <span style="color: #000000;">&#40;</span> a <span style="color: #008000;">==</span> <span style="color: #0600FF;">null</span> <span style="color: #008000;">||</span> b <span style="color: #008000;">==</span> <span style="color: #0600FF;">null</span> <span style="color: #000000;">&#41;</span>
            <span style="color: #0600FF;">return</span> a <span style="color: #008000;">==</span> b<span style="color: #008000;">;</span>
        <span style="color: #0600FF;">return</span> a.<span style="color: #0000FF;">SequenceEqual</span><span style="color: #000000;">&#40;</span> b <span style="color: #000000;">&#41;</span><span style="color: #008000;">;</span>
    <span style="color: #000000;">&#125;</span>
    <span style="color: #0600FF;">public</span> <span style="color: #FF0000;">int</span> GetHashCode<span style="color: #000000;">&#40;</span> <span style="color: #FF0000;">byte</span><span style="color: #000000;">&#91;</span><span style="color: #000000;">&#93;</span> x <span style="color: #000000;">&#41;</span>
    <span style="color: #000000;">&#123;</span>
        <span style="color: #0600FF;">if</span> <span style="color: #000000;">&#40;</span> x <span style="color: #008000;">==</span> <span style="color: #0600FF;">null</span> <span style="color: #000000;">&#41;</span>
            <span style="color: #0600FF;">throw</span> <span style="color: #008000;">new</span> ArgumentNullException<span style="color: #000000;">&#40;</span><span style="color: #000000;">&#41;</span><span style="color: #008000;">;</span>
        <span style="color: #FF0000;">int</span> iHash <span style="color: #008000;">=</span> <span style="color: #FF0000;">0</span><span style="color: #008000;">;</span>
        <span style="color: #0600FF;">for</span> <span style="color: #000000;">&#40;</span> <span style="color: #FF0000;">int</span> i <span style="color: #008000;">=</span> <span style="color: #FF0000;">0</span><span style="color: #008000;">;</span> i <span style="color: #008000;">&lt;</span> x.<span style="color: #0000FF;">Length</span><span style="color: #008000;">;</span> <span style="color: #008000;">++</span>i <span style="color: #000000;">&#41;</span>
            iHash <span style="color: #008000;">^=</span> <span style="color: #000000;">&#40;</span> x<span style="color: #000000;">&#91;</span> i <span style="color: #000000;">&#93;</span> <span style="color: #008000;">&lt;&lt;</span> <span style="color: #000000;">&#40;</span> <span style="color: #000000;">&#40;</span> 0x03 <span style="color: #008000;">&amp;</span> i <span style="color: #000000;">&#41;</span> <span style="color: #008000;">&lt;&lt;</span> <span style="color: #FF0000;">3</span> <span style="color: #000000;">&#41;</span> <span style="color: #000000;">&#41;</span><span style="color: #008000;">;</span>
        <span style="color: #0600FF;">return</span> iHash<span style="color: #008000;">;</span>
    <span style="color: #000000;">&#125;</span>
<span style="color: #000000;">&#125;</span></pre></div></div>

<p><strong>hashes_old: 18830 hashes_new: 8210 hashes_obsolete: 12228 time: 14 ms</strong></p>
<p>Same result, but instead of 19 seconds we only need 14 milliseconds, roughly 1400 times faster!</p>
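<p>What happens is that <em>Except()</em> first puts all elements of <em>hashes_new</em> into a hash-based set, using <em>GetHashCode()</em> to distribute them. Each element of <em>hashes_old</em> is then looked up by its hash code, and <em>Equals()</em> is only called for candidates with a matching hash code, to make sure both arrays are really equal and this is not just a hash collision. Instead of comparing every pair, each lookup only touches a few candidates; the main speedup comes from this hash lookup, and a low-collision hash function keeps the number of candidates small. Note that <em>Except()</em> has set semantics: an element that occurs several times in <em>hashes_old</em> is returned only once.</p>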
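<p>To make the mechanism concrete outside of .NET, here is a minimal C++ sketch of the same idea, assuming the hashes are stored as <em>std::vector&lt;std::uint8_t&gt;</em>. The names <em>ByteArrayHash</em> and <em>ExceptBytes</em> are hypothetical; the hash functor reuses the XOR/shift scheme of <em>ByteArrayComparer.GetHashCode()</em>, and <em>std::unordered_set</em> takes the role of the hash set used by <em>Except()</em>. It illustrates the technique and is not the LINQ implementation itself.</p>

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical C++ counterpart of ByteArrayComparer.
using Bytes = std::vector<std::uint8_t>;

struct ByteArrayHash {
    std::size_t operator()(const Bytes& x) const {
        // Same scheme as GetHashCode() above: XOR each byte into one of the
        // four byte lanes of a 32-bit value (shift by 0, 8, 16 or 24 bits).
        std::uint32_t h = 0;
        for (std::size_t i = 0; i < x.size(); ++i)
            h ^= static_cast<std::uint32_t>(x[i]) << ((i & 0x03) << 3);
        return h;
    }
};

// std::vector's element-wise operator== plays the role of SequenceEqual().
using ByteArraySet = std::unordered_set<Bytes, ByteArrayHash>;

// Elements of oldHashes that do not occur in newHashes, without duplicates,
// in the order of their first occurrence (set semantics like Except()).
std::vector<Bytes> ExceptBytes(const std::vector<Bytes>& oldHashes,
                               const std::vector<Bytes>& newHashes) {
    ByteArraySet seen(newHashes.begin(), newHashes.end()); // one pass over newHashes
    std::vector<Bytes> result;
    for (const Bytes& x : oldHashes) {
        // insert() fails if x is in newHashes or has already been emitted
        if (seen.insert(x).second)
            result.push_back(x);
    }
    return result;
}
```

<p>Building the set costs one pass over <em>hashes_new</em>, and each element of <em>hashes_old</em> then needs a single lookup on average instead of a scan over all of <em>hashes_new</em>. Using the same set both for exclusion and for remembering emitted elements reproduces the duplicate removal of <em>Except()</em>.</p>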
]]></content:encoded>
			<wfw:commentRss>http://www.christian-etter.de/?feed=rss2&amp;p=429</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Limiting ASP.NET GridView Text</title>
		<link>http://www.christian-etter.de/?p=424</link>
		<comments>http://www.christian-etter.de/?p=424#comments</comments>
		<pubDate>Thu, 11 Mar 2010 21:58:52 +0000</pubDate>
		<dc:creator>Christian Etter</dc:creator>
				<category><![CDATA[C#]]></category>
		<category><![CDATA[Software Development]]></category>
		<category><![CDATA[AJAX]]></category>
		<category><![CDATA[ASP.NET]]></category>
		<category><![CDATA[C++]]></category>
		<category><![CDATA[CSS]]></category>
		<category><![CDATA[Custom Control]]></category>
		<category><![CDATA[Ellipses]]></category>
		<category><![CDATA[Extension]]></category>

		<guid isPermaLink="false">http://www.christian-etter.de/?p=424</guid>
		<description><![CDATA[The ASP.NET GridView control offers a large number of parameters for customization. Yet in some cases we need to go further to realize specific goals. One such feature was needed by a customer who had to display text in a grid view that could easily exceed the available column space. What are the possible [...]]]></description>
			<content:encoded><![CDATA[<p>The ASP.NET GridView control offers a large number of parameters for customization. In some cases, however, specific goals require going further.<br />
One such feature was needed by a customer who had to display text in a GridView that could easily exceed the available column width. What are the possible solutions?</p>
<ol>
<li>The most basic and least flexible option is to truncate any text exceeding a set number of characters by applying <strong>SUBSTRING</strong> in the database query.</li>
<li>Another option is a <strong>TemplateField</strong>, which results in a lot of coding overhead, especially if you also have to provide markup for inserting and updating.</li>
<li>Depending on your needs, the CSS property <strong>text-overflow: ellipsis</strong> (in combination with <em>overflow: hidden</em> and <em>white-space: nowrap</em>) could be a compact solution; however, at the time of writing it was not supported by Firefox (some workarounds exist).</li>
<li>Perhaps the most elegant approach is to subclass the existing <strong>BoundField</strong>, which already supports inserting and updating. Although only the first few characters of the column text are displayed, the user can still view the whole text by hovering the cursor over the truncated text.
</li>
</ol>
<p>The class <strong>EllipsisTextField</strong> extends <strong>BoundField</strong> by one property, <em>MaxChars</em>, which holds the maximum number of characters to be displayed. If it is left empty, the control behaves identically to the BoundField class, so it is backwards compatible.</p>

<div class="wp_syntax"><div class="code"><pre class="csharp" style="font-family:monospace;"><span style="color: #0600FF;">public</span> <span style="color: #FF0000;">class</span> EllipsisTextField <span style="color: #008000;">:</span> BoundField
<span style="color: #000000;">&#123;</span>
    <span style="color: #0600FF;">public</span> <span style="color: #FF0000;">int</span><span style="color: #008000;">?</span> MaxChars
    <span style="color: #000000;">&#123;</span>
        get <span style="color: #000000;">&#123;</span> <span style="color: #0600FF;">return</span> <span style="color: #0600FF;">this</span>.<span style="color: #0000FF;">ViewState</span><span style="color: #000000;">&#91;</span> <span style="color: #666666;">&quot;MaxChars&quot;</span> <span style="color: #000000;">&#93;</span> <span style="color: #0600FF;">as</span> <span style="color: #FF0000;">int</span><span style="color: #008000;">?;</span> <span style="color: #000000;">&#125;</span>
        set <span style="color: #000000;">&#123;</span> <span style="color: #0600FF;">this</span>.<span style="color: #0000FF;">ViewState</span><span style="color: #000000;">&#91;</span> <span style="color: #666666;">&quot;MaxChars&quot;</span> <span style="color: #000000;">&#93;</span> <span style="color: #008000;">=</span> value<span style="color: #008000;">;</span> <span style="color: #000000;">&#125;</span>
    <span style="color: #000000;">&#125;</span>
&nbsp;
    <span style="color: #0600FF;">protected</span> <span style="color: #0600FF;">override</span> <span style="color: #FF0000;">string</span> FormatDataValue<span style="color: #000000;">&#40;</span> <span style="color: #FF0000;">object</span> dataValue, <span style="color: #FF0000;">bool</span> encode <span style="color: #000000;">&#41;</span>
    <span style="color: #000000;">&#123;</span>
        <span style="color: #FF0000;">string</span> sLong <span style="color: #008000;">=</span> dataValue <span style="color: #0600FF;">as</span> string<span style="color: #008000;">;</span>
        <span style="color: #0600FF;">if</span> <span style="color: #000000;">&#40;</span> <span style="color: #0600FF;">this</span>.<span style="color: #0000FF;">MaxChars</span>.<span style="color: #0000FF;">HasValue</span> <span style="color: #008000;">&amp;&amp;</span> <span style="color: #008000;">!</span><span style="color: #FF0000;">String</span>.<span style="color: #0000FF;">IsNullOrEmpty</span><span style="color: #000000;">&#40;</span> sLong <span style="color: #000000;">&#41;</span><span style="color: #000000;">&#41;</span>
        <span style="color: #000000;">&#123;</span>
            <span style="color: #FF0000;">string</span> sShort <span style="color: #008000;">=</span> sLong<span style="color: #008000;">;</span>
            <span style="color: #0600FF;">if</span> <span style="color: #000000;">&#40;</span> sLong.<span style="color: #0000FF;">Length</span> <span style="color: #008000;">&gt;</span> <span style="color: #0600FF;">this</span>.<span style="color: #0000FF;">MaxChars</span>.<span style="color: #0000FF;">Value</span> <span style="color: #000000;">&#41;</span>
            <span style="color: #000000;">&#123;</span>
                sShort <span style="color: #008000;">=</span> sLong.<span style="color: #0000FF;">Substring</span><span style="color: #000000;">&#40;</span> <span style="color: #FF0000;">0</span>, <span style="color: #0600FF;">this</span>.<span style="color: #0000FF;">MaxChars</span>.<span style="color: #0000FF;">Value</span> <span style="color: #000000;">&#41;</span> <span style="color: #008000;">+</span> <span style="color: #666666;">&quot;...&quot;</span><span style="color: #008000;">;</span>
                sLong <span style="color: #008000;">=</span> HttpUtility.<span style="color: #0000FF;">HtmlEncode</span><span style="color: #000000;">&#40;</span> sLong <span style="color: #000000;">&#41;</span><span style="color: #008000;">;</span>
                sShort <span style="color: #008000;">=</span> HttpUtility.<span style="color: #0000FF;">HtmlEncode</span><span style="color: #000000;">&#40;</span> sShort <span style="color: #000000;">&#41;</span><span style="color: #008000;">;</span>
&nbsp;
                dataValue <span style="color: #008000;">=</span> <span style="color: #666666;">&quot;&lt;div title=<span style="color: #008080; font-weight: bold;">\&quot;</span>&quot;</span> <span style="color: #008000;">+</span> sLong <span style="color: #008000;">+</span> <span style="color: #666666;">&quot;<span style="color: #008080; font-weight: bold;">\&quot;</span> style=<span style="color: #008080; font-weight: bold;">\&quot;</span>white-space: nowrap;<span style="color: #008080; font-weight: bold;">\&quot;</span>&gt;&quot;</span> <span style="color: #008000;">+</span> sShort <span style="color: #008000;">+</span> <span style="color: #666666;">&quot;&lt;/div&gt;&quot;</span><span style="color: #008000;">;</span>
                <span style="color: #0600FF;">return</span> <span style="color: #0600FF;">base</span>.<span style="color: #0000FF;">FormatDataValue</span><span style="color: #000000;">&#40;</span> dataValue, <span style="color: #0600FF;">false</span> <span style="color: #000000;">&#41;</span><span style="color: #008000;">;</span>
            <span style="color: #000000;">&#125;</span>
        <span style="color: #000000;">&#125;</span>
        <span style="color: #0600FF;">return</span> <span style="color: #0600FF;">base</span>.<span style="color: #0000FF;">FormatDataValue</span><span style="color: #000000;">&#40;</span> dataValue, encode <span style="color: #000000;">&#41;</span><span style="color: #008000;">;</span>
    <span style="color: #000000;">&#125;</span>
<span style="color: #000000;">&#125;</span></pre></div></div>

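<p>We override the <em>FormatDataValue</em> method in order to execute our own rendering code. Inside, we truncate the original string if necessary, append an ellipsis, HTML-encode both the short and the long version and put them into a DIV: the short text as its content, the full text as its <em>title</em> attribute. Since browsers show the title as a tooltip, hovering the mouse over the cell displays the full text. Because both strings are already encoded, the base implementation is called with <em>encode</em> set to <em>false</em>.</p>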
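<p>The truncation logic itself is independent of ASP.NET. The following C++ sketch mirrors the markup that <em>FormatDataValue</em> produces, under a few assumptions: <em>HtmlEncode</em> and <em>FormatEllipsis</em> are hypothetical helpers, the encoder only covers the characters relevant for element content and double-quoted attributes (it is not a replacement for <em>HttpUtility.HtmlEncode</em>), the field&#8217;s <em>HtmlEncode</em> setting is assumed to be on (the BoundField default), and truncation counts bytes, which is only correct for single-byte text.</p>

```cpp
#include <cstddef>
#include <optional>
#include <string>

// Simplified HTML encoder (hypothetical helper): escapes the characters that
// matter inside element content and double-quoted attribute values.
std::string HtmlEncode(const std::string& s) {
    std::string out;
    out.reserve(s.size());
    for (char c : s) {
        switch (c) {
            case '&':  out += "&amp;";  break;
            case '<':  out += "&lt;";   break;
            case '>':  out += "&gt;";   break;
            case '"':  out += "&quot;"; break;
            case '\'': out += "&#39;";  break;
            default:   out += c;
        }
    }
    return out;
}

// Mirrors EllipsisTextField.FormatDataValue(): without a limit, or if the text
// fits, the encoded text is returned unchanged; otherwise a DIV whose content
// is the truncated text plus "..." and whose title holds the full text.
// Note: counts bytes, not characters (only safe for single-byte encodings).
std::string FormatEllipsis(const std::string& text, std::optional<std::size_t> maxChars) {
    if (!maxChars || text.size() <= *maxChars)
        return HtmlEncode(text);
    const std::string shortText = text.substr(0, *maxChars) + "...";
    return "<div title=\"" + HtmlEncode(text) + "\" style=\"white-space: nowrap;\">" +
           HtmlEncode(shortText) + "</div>";
}
```

<p>Encoding the title value matters as much as encoding the visible text: an unescaped double quote in the data would otherwise terminate the attribute early.</p>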
]]></content:encoded>
			<wfw:commentRss>http://www.christian-etter.de/?feed=rss2&amp;p=424</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>How to optimize reading of large amounts of files</title>
		<link>http://www.christian-etter.de/?p=372</link>
		<comments>http://www.christian-etter.de/?p=372#comments</comments>
		<pubDate>Sun, 21 Feb 2010 19:10:16 +0000</pubDate>
		<dc:creator>Christian Etter</dc:creator>
				<category><![CDATA[C++]]></category>
		<category><![CDATA[Software Development]]></category>
		<category><![CDATA[DeviceIoControl]]></category>
		<category><![CDATA[FAT]]></category>
		<category><![CDATA[FAT32]]></category>
		<category><![CDATA[File System]]></category>
		<category><![CDATA[FSCTL_GET_RETRIEVAL_POINTERS]]></category>
		<category><![CDATA[NTFS]]></category>
		<category><![CDATA[Performance]]></category>
		<category><![CDATA[Win32]]></category>

		<guid isPermaLink="false">http://www.christian-etter.de/?p=372</guid>
		<description><![CDATA[Conventional hard drives keep getting faster. While 10 years ago an average transfer speed of 40 MiB/sec was considered &#8216;fast&#8217;, today there are mainstream hard drives which deliver average transfer speeds beyond 100 MiB/sec. So far, so good. What hasn&#8217;t increased much is the average access time. Especially when reading large numbers of relatively small [...]]]></description>
			<content:encoded><![CDATA[<p>Conventional hard drives keep getting faster. While 10 years ago an average transfer rate of 40 MiB/sec was considered &#8216;fast&#8217;, today there are mainstream hard drives with average transfer rates beyond 100 MiB/sec.<br />
So far, so good. What has not improved much is the average access time. Especially when reading large numbers of relatively small files, a lot of time is spent just waiting for the hard disk to position its read/write heads over the relevant data, much as it was a decade ago.</p>
<p>Note: If you are developing software that runs exclusively on solid-state drives (SSDs), you may stop reading now&#8230;</p>
<p>The following idea was developed for an image indexing engine that is supposed to scan a large number of images and extract information in the least possible time. The entire indexing operation was disk-bound. For the main test case, a hard disk with roughly 100,000 image files was chosen.</p>
<p>To minimize the time spent waiting for each file to be read, it is worthwhile to read all files in the physical order in which they reside on the hard disk. So the key information we need is the physical location of each file on the disk.</p>
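<p>For this purpose, the <em>FSCTL_GET_RETRIEVAL_POINTERS</em> control code can be sent to a file handle via <em>DeviceIoControl</em>. It returns the file&#8217;s extents, i.e. the mapping of its virtual cluster numbers (VCN) to logical cluster numbers (LCN) on the volume; the LCN of the first extent is the file&#8217;s starting cluster:</p>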

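<div class="wp_syntax"><div class="code"><pre class="cpp" style="font-family:monospace;"><span style="color: #339900;">#include &lt;windows.h&gt;</span>
<span style="color: #339900;">#include &lt;winioctl.h&gt;</span>
<span style="color: #339900;">#include &lt;tchar.h&gt;</span>
<span style="color: #339900;">#include &lt;stdio.h&gt;</span>
&nbsp;
<span style="color: #0000ff;">struct</span> MyFile <span style="color: #008000;">&#123;</span> PCTSTR sFilename<span style="color: #008080;">;</span> DWORD64 Lcn<span style="color: #008080;">;</span> DWORD dwFragments<span style="color: #008080;">;</span> <span style="color: #008000;">&#125;</span><span style="color: #008080;">;</span>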
&nbsp;
<span style="color: #0000ff;">void</span> GetStartCluster<span style="color: #008000;">&#40;</span> PCTSTR sFilename, MyFile<span style="color: #000040;">&amp;</span> f <span style="color: #008000;">&#41;</span>
<span style="color: #008000;">&#123;</span>
    f.<span style="color: #007788;">sFilename</span> <span style="color: #000080;">=</span> sFilename<span style="color: #008080;">;</span>
    f.<span style="color: #007788;">dwFragments</span> <span style="color: #000080;">=</span> f.<span style="color: #007788;">Lcn</span> <span style="color: #000080;">=</span> <span style="color: #0000dd;">0</span><span style="color: #008080;">;</span>
&nbsp;
    HANDLE hFile <span style="color: #000080;">=</span> CreateFile<span style="color: #008000;">&#40;</span> sFilename, GENERIC_READ, FILE_SHARE_READ, <span style="color: #0000ff;">NULL</span>, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, <span style="color: #0000ff;">NULL</span> <span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
    <span style="color: #0000ff;">if</span> <span style="color: #008000;">&#40;</span> INVALID_HANDLE_VALUE <span style="color: #000040;">!</span><span style="color: #000080;">=</span> hFile <span style="color: #008000;">&#41;</span>
    <span style="color: #008000;">&#123;</span>
        <span style="color: #0000ff;">const</span> DWORD MAX_CLUSTERS <span style="color: #000080;">=</span> <span style="color: #0000dd;">100</span><span style="color: #008080;">;</span> <span style="color: #666666;">// actually we only care about the first cluster number</span>
        <span style="color: #0000ff;">const</span> DWORD RETRIEVAL_POINTERS_BUFFER_SIZE <span style="color: #000080;">=</span> <span style="color: #0000dd;">sizeof</span><span style="color: #008000;">&#40;</span> RETRIEVAL_POINTERS_BUFFER <span style="color: #008000;">&#41;</span> <span style="color: #000040;">+</span> <span style="color: #008000;">&#40;</span> <span style="color: #0000dd;">2</span> <span style="color: #000040;">*</span> <span style="color: #008000;">&#40;</span> MAX_CLUSTERS <span style="color: #000040;">-</span> <span style="color: #0000dd;">1</span> <span style="color: #008000;">&#41;</span> <span style="color: #000040;">*</span> <span style="color: #0000dd;">sizeof</span><span style="color: #008000;">&#40;</span> LARGE_INTEGER <span style="color: #008000;">&#41;</span> <span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
        BYTE output<span style="color: #008000;">&#91;</span> RETRIEVAL_POINTERS_BUFFER_SIZE <span style="color: #008000;">&#93;</span><span style="color: #008080;">;</span>
&nbsp;
        STARTING_VCN_INPUT_BUFFER input<span style="color: #008080;">;</span>
        input.<span style="color: #007788;">StartingVcn</span>.<span style="color: #007788;">QuadPart</span> <span style="color: #000080;">=</span> <span style="color: #0000dd;">0</span><span style="color: #008080;">;</span>
&nbsp;
        DWORD dwBytesReturned<span style="color: #008080;">;</span>
        <span style="color: #666666;">// ERROR_MORE_DATA: the file has more than MAX_CLUSTERS extents, but the buffer still holds the first ones</span>
        <span style="color: #0000ff;">if</span> <span style="color: #008000;">&#40;</span> DeviceIoControl<span style="color: #008000;">&#40;</span> hFile, FSCTL_GET_RETRIEVAL_POINTERS, <span style="color: #000040;">&amp;</span>input, <span style="color: #0000dd;">sizeof</span><span style="color: #008000;">&#40;</span> input <span style="color: #008000;">&#41;</span>, <span style="color: #000040;">&amp;</span>output, <span style="color: #0000dd;">sizeof</span><span style="color: #008000;">&#40;</span> output <span style="color: #008000;">&#41;</span>, <span style="color: #000040;">&amp;</span>dwBytesReturned, <span style="color: #0000ff;">NULL</span> <span style="color: #008000;">&#41;</span> <span style="color: #000040;">||</span> ERROR_MORE_DATA <span style="color: #000080;">==</span> GetLastError<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span> <span style="color: #008000;">&#41;</span>
        <span style="color: #008000;">&#123;</span>
            RETRIEVAL_POINTERS_BUFFER<span style="color: #000040;">*</span> p <span style="color: #000080;">=</span> <span style="color: #008000;">&#40;</span>RETRIEVAL_POINTERS_BUFFER<span style="color: #000040;">*</span><span style="color: #008000;">&#41;</span><span style="color: #000040;">&amp;</span>output<span style="color: #008080;">;</span>
            <span style="color: #0000ff;">if</span> <span style="color: #008000;">&#40;</span> p<span style="color: #000040;">-</span><span style="color: #000080;">&gt;</span>ExtentCount <span style="color: #000080;">&gt;</span> <span style="color: #0000dd;">0</span> <span style="color: #000040;">&amp;&amp;</span> p<span style="color: #000040;">-</span><span style="color: #000080;">&gt;</span>StartingVcn.<span style="color: #007788;">QuadPart</span> <span style="color: #000080;">==</span> <span style="color: #0000dd;">0</span> <span style="color: #008000;">&#41;</span>
            <span style="color: #008000;">&#123;</span>
                f.<span style="color: #007788;">dwFragments</span> <span style="color: #000080;">=</span> p<span style="color: #000040;">-</span><span style="color: #000080;">&gt;</span>ExtentCount<span style="color: #008080;">;</span>
                f.<span style="color: #007788;">Lcn</span> <span style="color: #000080;">=</span> p<span style="color: #000040;">-</span><span style="color: #000080;">&gt;</span>Extents<span style="color: #008000;">&#91;</span> <span style="color: #0000dd;">0</span> <span style="color: #008000;">&#93;</span>.<span style="color: #007788;">Lcn</span>.<span style="color: #007788;">QuadPart</span><span style="color: #008080;">;</span>
            <span style="color: #008000;">&#125;</span>
            <span style="color: #0000ff;">else</span>
                _tprintf<span style="color: #008000;">&#40;</span> _T<span style="color: #008000;">&#40;</span><span style="color: #FF0000;">&quot;Error: ExtentCount = %d StartingVcn = %I64d<span style="color: #000099; font-weight: bold;">\r</span><span style="color: #000099; font-weight: bold;">\n</span>&quot;</span><span style="color: #008000;">&#41;</span>, p<span style="color: #000040;">-</span><span style="color: #000080;">&gt;</span>ExtentCount, p<span style="color: #000040;">-</span><span style="color: #000080;">&gt;</span>StartingVcn.<span style="color: #007788;">QuadPart</span> <span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
        <span style="color: #008000;">&#125;</span>
        <span style="color: #0000ff;">else</span>
            _tprintf<span style="color: #008000;">&#40;</span> _T<span style="color: #008000;">&#40;</span><span style="color: #FF0000;">&quot;Error: DeviceIoControl = 0x%08X<span style="color: #000099; font-weight: bold;">\r</span><span style="color: #000099; font-weight: bold;">\n</span>&quot;</span><span style="color: #008000;">&#41;</span>, GetLastError<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span> <span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span>
        CloseHandle<span style="color: #008000;">&#40;</span> hFile <span style="color: #008000;">&#41;</span><span style="color: #008080;">;</span> 
    <span style="color: #008000;">&#125;</span>
<span style="color: #008000;">&#125;</span></pre></div></div>

<p>First, the starting cluster number and the number of extents (fragments) are retrieved for every file. In a second step, all files are sorted by their number of fragments and their cluster number. They are then processed in exactly this order, which minimizes disk &#8216;thrashing&#8217;.<br />
Retrieving the file locations with DeviceIoControl is relatively fast, since it only reads file system metadata rather than the file&#8217;s data. This step is typically CPU-bound and takes less than 5% of the total indexing time.</p>
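<p>The sorting step could look like the following C++ sketch. It uses a portable stand-in for <em>MyFile</em> (the hypothetical <em>FileEntry</em>, with a <em>std::string</em> name) so that it compiles without the Windows headers, and it assumes one possible reading of the sort order described above: files are ordered by number of fragments first and by starting cluster second.</p>

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical portable stand-in for the MyFile struct.
struct FileEntry {
    std::string name;
    std::uint64_t lcn;        // starting logical cluster number (0 if the lookup failed)
    std::uint32_t fragments;  // number of extents (0 if the lookup failed)
};

// Assumed interpretation: order by fragment count, then by starting cluster,
// so that files within each group are read in ascending disk position.
void SortByDiskPosition(std::vector<FileEntry>& files) {
    std::sort(files.begin(), files.end(),
              [](const FileEntry& a, const FileEntry& b) {
                  return std::tie(a.fragments, a.lcn) < std::tie(b.fragments, b.lcn);
              });
}
```

<p>With this order, single-extent files are read in ascending cluster order, and entries whose lookup failed (left at zero fragments by <em>GetStartCluster</em>) end up at the front. Sorting by starting cluster alone would be an equally plausible variant.</p>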
<p>Taking into account the time needed to retrieve position data and sort the files, this approach can save between 20 and 50% of read time, depending on average file size, fragmentation and the spatial distribution of the data. There is no benefit from this method if the majority of files are fragmented. However, fragmentation is more likely to occur with larger files, and the average file size in this case was only 149 KiB. Generally speaking, the performance gain of this method increases as file sizes decrease.</p>
<h2>Benchmarking</h2>
<ul>
<li>Hard disk: 1 TB 7200 rpm SATA II, avg. access time: 17 ms</li>
<li>Windows XP Professional 32 bit booted in protected mode</li>
<li>Number of files scanned: <strong>87,418</strong></li>
<li>Total file size: <strong>12.46 GiB</strong></li>
<li>Average file size: <strong>149 KiB</strong></li>
<li>Time for reading files in filesystem order: <strong>506 seconds</strong></li>
<li>Average throughput reading files in filesystem order: <strong>25.24 MiB/s</strong></li>
<li>Time for reading file position data: <strong>12 seconds</strong></li>
<li>Time for sorting files according to their position: <strong>54 ms</strong></li>
<li>Time for reading files in sorted order: <strong>347 seconds</strong></li>
<li>Total time using new approach: <strong>359 seconds</strong></li>
<li>Total throughput using new approach: <strong>35.58 MiB/s</strong></li>
<li>Performance increase: <strong>147 seconds</strong> or <strong>29 %</strong></li>
<li>Reading files in filesystem order takes <strong>1.41 times as long</strong> as a pre-sorted read (including the time for reading position data)</li>
</ul>
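<p>The derived figures in the list follow from the raw measurements. The small C++ helper below (the hypothetical <em>Recompute</em>) recomputes them; like the listed total of 359 seconds, it ignores the 54 ms sorting time. The throughput values it yields differ from the listed ones in the second decimal place, presumably because the listed inputs are themselves rounded.</p>

```cpp
// Derived benchmark figures, computed from the raw values in the list above.
struct Derived {
    double savedSeconds;  // filesystem-order time minus total time of the new approach
    double savedPercent;  // saving relative to the filesystem-order time
    double speedRatio;    // filesystem-order time / total time of the new approach
    double mibPerSecOld;  // throughput when reading in filesystem order
    double mibPerSecNew;  // throughput of the new approach, including position lookup
};

Derived Recompute(double totalGiB, double oldSeconds,
                  double lookupSeconds, double sortedReadSeconds) {
    const double newSeconds = lookupSeconds + sortedReadSeconds; // sorting time neglected
    const double totalMiB = totalGiB * 1024.0;
    return {oldSeconds - newSeconds,
            (oldSeconds - newSeconds) / oldSeconds * 100.0,
            oldSeconds / newSeconds,
            totalMiB / oldSeconds,
            totalMiB / newSeconds};
}
```

<p>For example, <em>Recompute(12.46, 506, 12, 347)</em> gives a saving of 147 seconds (about 29 %) and a ratio of about 1.41 between the two total times.</p>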
]]></content:encoded>
			<wfw:commentRss>http://www.christian-etter.de/?feed=rss2&amp;p=372</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
