Retrieving Image Meta-Data using GDI+ and ExifTool

April 14th, 2010, Christian Etter

How can you read image metadata in .NET? Here we illustrate two techniques.

First, for the sake of speed and simplicity, we use the built-in GDI+ capabilities of the System.Drawing.Image class:

using System;
using System.Drawing;
using System.Globalization;
using System.IO;
using System.Linq;
using System.Text;

/// <summary>Much faster than using ExifTool. If GDI+ returns a date string we cannot parse, we fall back to ExifTool.</summary>
private DateTime? GetOriginalDate( string sFileName )
{
    using ( FileStream stream = new FileStream( sFileName, FileMode.Open, FileAccess.Read ) )
    // validateImageData = false skips validation of the image data, which makes loading much faster
    using ( Image img = Image.FromStream( stream, false, false ) )
    {
        // EXIF tags containing dates: 36867 = DateTimeOriginal (0x9003), 36868 = DateTimeDigitized (0x9004), 306 = DateTime (0x0132)
        int[] date_tags = new int[] { 36867, 36868, 306 };
        // read every date tag that is present as ASCII text, without the trailing \0
        string[] s1 = ( from x in date_tags
                        where img.PropertyIdList.Contains( x )
                        select Encoding.ASCII.GetString( img.GetPropertyItem( x ).Value ).Replace( "\0", "" ) ).ToArray();
        string[] formats = { "yyyy:MM:dd HH:mm:ss", "yyyy-MM-dd HH:mm:ss", "MM/dd/yyyy HH:mm:ss", "yyyy-MM-dd'T'HH:mm:sszzz" };
        DateTime d;
        // try to parse all non-empty date attributes; null marks a value we could not parse
        DateTime?[] dd = ( from x in s1
                           where x.Trim().Length > 0
                           select DateTime.TryParseExact( x, formats, CultureInfo.InvariantCulture,
                                      DateTimeStyles.AllowWhiteSpaces | DateTimeStyles.AssumeLocal, out d ) ? d as DateTime? : null ).ToArray();
        if ( dd.Any( x => !x.HasValue ) )
            return GetOldestExifDateExifTool( sFileName ); // a date attribute we cannot parse: ask ExifTool
        // only accept a plausible range (after 1990, not in the future); returns the oldest date, or null if there is none
        return ( from x in dd where x.Value > new DateTime( 1990, 01, 01 ) && x.Value < DateTime.Now select x ).Min();
    }
}

Because the EXIF standard stores dates as text (in the form "YYYY:MM:DD HH:MM:SS"), we sometimes encounter non-standard date formats: dates with milliseconds appended, dates using different separator characters, or dates with an additional UTC offset. ExifTool does a decent job of interpreting all of these values as dates, and it may also be able to read some non-standard or broken metadata that GDI+ cannot.
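Instead of listing every variant as a separate TryParseExact format, the date string could also be normalized first. The following is a minimal sketch of that idea, not something the original code does: the class ExifDateParser is hypothetical, and its regular expression only covers the variants mentioned above (fractional seconds, ':', '-', '/' or '.' as date separators, and an optional trailing UTC offset or "Z"). Anything else still returns false, so the ExifTool fallback remains useful.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original code: tolerant parsing of EXIF-style date strings.
public static class ExifDateParser
{
    // yyyy?MM?dd hh:mm:ss with ':', '-', '/' or '.' as date separator and ' ' or 'T' before the time,
    // optionally followed by fractional seconds and a UTC offset ("Z", "+02:00" or "+0200")
    private static readonly Regex Pattern = new Regex(
        @"^\s*([0-9]{4})[:\-/.]([0-9]{2})[:\-/.]([0-9]{2})[ T]([0-9]{2}):([0-9]{2}):([0-9]{2})(?:\.([0-9]+))?\s*(Z|[+\-][0-9]{2}:?[0-9]{2})?\s*$",
        RegexOptions.CultureInvariant );

    /// <summary>Returns a local time; a date with an explicit UTC offset is converted to local time.</summary>
    public static bool TryParse( string s, out DateTime result )
    {
        result = default( DateTime );
        Match m = Pattern.Match( ( s ?? "" ).Replace( "\0", "" ) ); // strings read via GDI+ may end with \0
        if ( !m.Success )
            return false;

        // rebuild a canonical string and let TryParseExact reject invalid values (month 13, "0000:00:00", ...)
        string canonical = string.Format( CultureInfo.InvariantCulture, "{0}-{1}-{2} {3}:{4}:{5}",
            m.Groups[ 1 ].Value, m.Groups[ 2 ].Value, m.Groups[ 3 ].Value,
            m.Groups[ 4 ].Value, m.Groups[ 5 ].Value, m.Groups[ 6 ].Value );
        DateTime value;
        if ( !DateTime.TryParseExact( canonical, "yyyy-MM-dd HH:mm:ss", CultureInfo.InvariantCulture, DateTimeStyles.None, out value ) )
            return false;

        if ( m.Groups[ 7 ].Success ) // fractional seconds, padded or truncated to 7 digits = 100 ns ticks
            value = value.AddTicks( long.Parse( ( m.Groups[ 7 ].Value + "000000" ).Substring( 0, 7 ), CultureInfo.InvariantCulture ) );

        if ( !m.Groups[ 8 ].Success )
        {
            result = DateTime.SpecifyKind( value, DateTimeKind.Local ); // same idea as DateTimeStyles.AssumeLocal
            return true;
        }

        string z = m.Groups[ 8 ].Value;
        TimeSpan offset = TimeSpan.Zero;
        if ( z != "Z" )
        {
            string hhmm = z.Substring( 1 ).Replace( ":", "" );
            offset = new TimeSpan( int.Parse( hhmm.Substring( 0, 2 ), CultureInfo.InvariantCulture ),
                                   int.Parse( hhmm.Substring( 2, 2 ), CultureInfo.InvariantCulture ), 0 );
            if ( z[ 0 ] == '-' )
                offset = offset.Negate();
        }
        try
        {
            result = new DateTimeOffset( value, offset ).LocalDateTime;
            return true;
        }
        catch ( ArgumentOutOfRangeException )
        {
            return false; // offset beyond +/-14 h, or the UTC time falls outside the DateTime range
        }
    }
}
```

In GetOriginalDate, ExifDateParser.TryParse( x, out d ) could be tried whenever the TryParseExact call fails, before falling back to ExifTool.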

Here is the second approach:

using System;
using System.Diagnostics;
using System.Globalization;
using System.IO;
using System.Linq;
using System.Text;

/// <summary>Extracts the oldest plausible EXIF date. Very slow: about 3 files per second, i.e. roughly 8 hours for 90,000 files.</summary>
private static DateTime? GetOldestExifDateExifTool( string sFile )
{
    string sStdOut;
    using ( Process oP = new Process() )
    {
        oP.StartInfo.FileName = @"exiftool.exe";
        // -s -s: "TagName: value" lines without padding; -d: fixed date format; "-": read the image from stdin
        oP.StartInfo.Arguments = "-s -s -EXIF:ModifyDate -EXIF:DateTimeOriginal -EXIF:CreateDate -d \"%Y-%m-%d %H:%M:%S\" -";
        oP.StartInfo.UseShellExecute = false; // required for redirecting the standard streams
        oP.StartInfo.CreateNoWindow = true;
        oP.StartInfo.RedirectStandardInput = true;
        oP.StartInfo.RedirectStandardOutput = true;
        oP.StartInfo.StandardOutputEncoding = Encoding.UTF8;
        oP.Start();

        // pipe the file contents instead of passing the file name, which ExifTool may not be able to open
        byte[] image = File.ReadAllBytes( sFile );
        oP.StandardInput.BaseStream.Write( image, 0, image.Length );
        oP.StandardInput.BaseStream.Flush();
        oP.StandardInput.BaseStream.Close(); // end of input
        sStdOut = oP.StandardOutput.ReadToEnd();
        oP.WaitForExit();
    }

    string[] datetags = new string[] { "DateTimeOriginal", "CreateDate", "ModifyDate" };
    string[] res1 = sStdOut.Split( new char[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries ); // split lines
    // split at the first colon to separate attribute names and values
    string[][] res2 = ( from x in res1 select x.Split( new char[] { ':' }, 2 ) ).ToArray();
    // keep only the values of the date attributes
    string[] res3 = ( from x in res2
                      where x.Length == 2 && datetags.Contains( x[ 0 ].Trim(), StringComparer.InvariantCultureIgnoreCase )
                      select x[ 1 ] ).ToArray();
    DateTime d;
    DateTime?[] dd = ( from x in res3
                       select DateTime.TryParseExact( x, "yyyy-MM-dd HH:mm:ss", CultureInfo.InvariantCulture,
                                  DateTimeStyles.AllowWhiteSpaces | DateTimeStyles.AssumeLocal, out d ) ? d as DateTime? : null ).ToArray();
    // oldest date in a plausible range (after 1990, not in the future), or null
    return ( from x in dd where x.HasValue && x > new DateTime( 1990, 01, 01 ) && x < DateTime.Now select x ).Min();
}

ExifTool has three shortcomings when used from within another program:

  1. It is not available as a .NET library, so we have to run it as an external process via the Process API.
  2. It has a long startup time, because it is written in Perl and packed into a single self-extracting executable. As a result, we can only process about 3 files per second on a fast computer, whereas GDI+ may be around a hundred times faster. Processing several files in one batch could work around this, but it would require bigger changes to our program logic.
  3. It does not support the Unicode file system API, so file names that cannot be represented in the current ANSI code page cannot be opened. To work around this limitation, we read the file into memory first and then pipe it into ExifTool via stdin.
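The batch idea from point 2 could look roughly like this: pass many file names to a single ExifTool call (for example exiftool -s -s -EXIF:DateTimeOriginal -EXIF:CreateDate -EXIF:ModifyDate -d "%Y-%m-%d %H:%M:%S" a.jpg b.jpg), so the startup cost is paid only once, and then split the combined output per file. This is a sketch under two assumptions: the class name ExifToolBatchOutput is hypothetical, and it relies on ExifTool starting each file's block with a "======== <file name>" line when it reads more than one file. Because the file names are passed on the command line again, this does not solve the file name problem from point 3.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the original code: splits the stdout of one ExifTool run over
// several files ("-s -s" output format) into one tag dictionary per file.
// Assumption: each file's block starts with a "======== <file name>" line.
public static class ExifToolBatchOutput
{
    public static Dictionary<string, Dictionary<string, string>> Parse( string stdout )
    {
        var result = new Dictionary<string, Dictionary<string, string>>();
        Dictionary<string, string> current = null;
        foreach ( string line in stdout.Split( new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries ) )
        {
            if ( line.StartsWith( "======== ", StringComparison.Ordinal ) )
            {
                // start of the next file's block; the rest of the line is the file name
                current = new Dictionary<string, string>( StringComparer.OrdinalIgnoreCase );
                result[ line.Substring( 9 ) ] = current;
                continue;
            }
            // "TagName: value"; split at the first colon only, since values contain colons too
            string[] parts = line.Split( new[] { ':' }, 2 );
            if ( current != null && parts.Length == 2 )
                current[ parts[ 0 ].Trim() ] = parts[ 1 ].Trim();
            // other lines, e.g. a summary such as "2 image files read", contain no colon and are skipped
        }
        return result;
    }
}
```

Each value can then be parsed with the same TryParseExact call as in GetOldestExifDateExifTool.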

Note: the Process API lets you redirect stdout and stderr at the same time, which would allow for more detailed error handling and messages. However, you *must* then read stdout and stderr concurrently (for example on separate threads): if you block on one stream while the child process fills the pipe buffer of the other, both processes wait for each other and deadlock. For the sake of simplicity, I have omitted error handling here.
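The concurrent reading described in the note can be sketched as follows, assuming .NET Framework 4.5 or later for StreamReader.ReadToEndAsync; the class and method name ProcessRunner.Run are hypothetical. Both output streams start draining asynchronously before any input is written, so no pipe can fill up and block the child process, and a broken pipe (the child exiting before it has read all of its input) is tolerated.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Text;
using System.Threading.Tasks;

// Hypothetical helper, not part of the original code: runs a process, pipes bytes into its stdin
// and captures stdout and stderr without deadlocking.
public static class ProcessRunner
{
    public static int Run( string fileName, string arguments, byte[] input, out string stdout, out string stderr )
    {
        ProcessStartInfo psi = new ProcessStartInfo( fileName, arguments )
        {
            UseShellExecute = false, // required for redirection
            CreateNoWindow = true,
            RedirectStandardInput = true,
            RedirectStandardOutput = true,
            RedirectStandardError = true,
            StandardOutputEncoding = Encoding.UTF8,
            StandardErrorEncoding = Encoding.UTF8
        };
        using ( Process p = Process.Start( psi ) )
        {
            // drain both output pipes concurrently, starting before any input is written
            Task<string> outTask = p.StandardOutput.ReadToEndAsync();
            Task<string> errTask = p.StandardError.ReadToEndAsync();
            try
            {
                using ( Stream stdin = p.StandardInput.BaseStream )
                    stdin.Write( input, 0, input.Length ); // disposing closes stdin, i.e. signals end of input
            }
            catch ( IOException )
            {
                // the child exited before reading all of its input (broken pipe); its output is still collected
            }
            stdout = outTask.Result;
            stderr = errTask.Result;
            p.WaitForExit();
            return p.ExitCode;
        }
    }
}
```

For ExifTool this would be called as ProcessRunner.Run( "exiftool.exe", arguments, File.ReadAllBytes( sFile ), out sStdOut, out sStdErr ) with the same arguments as above, and a non-empty sStdErr could then be logged.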