Retrieving Image Meta-Data using GDI+ and ExifTool
How do you read image meta-data in .NET? Here we illustrate two techniques:
First, for the sake of speed and simplicity, we chose the built-in GDI+ capabilities of the Image class:
    using System;
    using System.Drawing;
    using System.Globalization;
    using System.IO;
    using System.Linq;
    using System.Text;

    /// <summary>Much faster than using ExifTool. In case GDI+ cannot decode the date string we use ExifTool.</summary>
    private DateTime? GetOriginalDate( string sFileName )
    {
        using ( FileStream stream = new FileStream( sFileName, FileMode.Open, FileAccess.Read ) )
        using ( Image img = Image.FromStream( stream, false, false ) )
        {
            // EXIF tag numbers with dates: DateTimeOriginal, DateTimeDigitized, DateTime
            int[] date_tags = new int[] { 36867, 36868, 306 };

            // get each date as a string without the trailing \0
            string[] s1 = ( from x in date_tags
                            where img.PropertyIdList.Contains( x )
                            select Encoding.ASCII.GetString( img.GetPropertyItem( x ).Value ).Replace( "\0", "" ) ).ToArray();

            // see if we can parse all the date attributes found
            string[] formats = new string[] { "yyyy:MM:dd HH:mm:ss", "yyyy-MM-dd HH:mm:ss", "MM/dd/yyyy HH:mm:ss", "yyyy-MM-dd'T'HH:mm:sszzz" };
            DateTime d;
            DateTime?[] dd = ( from x in s1
                               where x.Trim().Length > 0
                               select DateTime.TryParseExact( x, formats, CultureInfo.InvariantCulture,
                                          DateTimeStyles.AllowWhiteSpaces | DateTimeStyles.AssumeLocal, out d )
                                      ? (DateTime?) d : null ).ToArray();

            // if there is something in a date attribute we cannot parse, we ask ExifTool
            if ( dd.Any( x => !x.HasValue ) )
                return GetOldestExifDateExifTool( sFileName );

            // make sure we use a valid date range; parsed values are local time, so compare against Now
            return ( from x in dd
                     where x.Value > new DateTime( 1990, 01, 01 ) && x.Value < DateTime.Now
                     select x ).Min();
        }
    }
Since the EXIF standard defines date values to be stored as text, we sometimes find non-standard date formats. These include dates stored with added milliseconds, with different separator characters, or with an additional UTC offset. ExifTool does a pretty decent job of interpreting all those values as dates, and it may also be able to read certain off-standard or broken meta-data which GDI+ cannot.
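To illustrate the kind of variation described above, here is a minimal sketch of a slightly more tolerant parser. The class name `ExifDateParser` and the exact list of formats are assumptions made for this example; they are not taken from the EXIF specification or from ExifTool, so extend the list with whatever your own files contain.

```csharp
using System;
using System.Globalization;

public static class ExifDateParser
{
    // Illustrative format list (an assumption, not an exhaustive catalogue).
    private static readonly string[] Formats =
    {
        "yyyy:MM:dd HH:mm:ss",       // the standard EXIF layout
        "yyyy:MM:dd HH:mm:ss.fff",   // milliseconds added
        "yyyy-MM-dd HH:mm:ss",       // different separator characters
        "yyyy/MM/dd HH:mm:ss",
        "yyyy:MM:dd HH:mm:sszzz",    // additional UTC offset, e.g. +02:00
        "yyyy-MM-dd'T'HH:mm:sszzz",
    };

    // Returns the parsed date, or null if none of the formats match.
    public static DateTime? Parse(string raw)
    {
        if (raw == null) return null;

        // EXIF strings are NUL-terminated, so strip trailing \0 and blanks first.
        string text = raw.Trim('\0', ' ');

        DateTime d;
        bool ok = DateTime.TryParseExact(
            text, Formats, CultureInfo.InvariantCulture,
            DateTimeStyles.AllowWhiteSpaces | DateTimeStyles.AssumeLocal, out d);
        return ok ? d : (DateTime?)null;
    }
}
```

Values that carry a UTC offset are converted to local time by `TryParseExact`, so the result depends on the time zone of the machine doing the parsing.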
Here is the second approach:
    using System;
    using System.Diagnostics;
    using System.Globalization;
    using System.IO;
    using System.Linq;
    using System.Text;

    /// <summary>Extracts the oldest possible EXIF date. Can process 3 files per second, very slow, will need 8 hours for 90,000 files.</summary>
    private static DateTime? GetOldestExifDateExifTool( string sFile )
    {
        string sStdOut;
        using ( Process oP = new Process() )
        {
            oP.EnableRaisingEvents = false;
            oP.StartInfo.CreateNoWindow = true;
            oP.StartInfo.LoadUserProfile = false;
            oP.StartInfo.RedirectStandardError = false;
            oP.StartInfo.RedirectStandardOutput = true;
            oP.StartInfo.RedirectStandardInput = true;
            oP.StartInfo.StandardOutputEncoding = Encoding.UTF8;
            oP.StartInfo.UseShellExecute = false;
            oP.StartInfo.WindowStyle = ProcessWindowStyle.Hidden;
            oP.StartInfo.FileName = @"exiftool.exe";
            // the trailing "-" tells ExifTool to read the image from stdin
            oP.StartInfo.Arguments = "-s -s -EXIF:ModifyDate -EXIF:DateTimeOriginal -EXIF:CreateDate -d \"%Y-%m-%d %H:%M:%S\" -";
            oP.Start();

            // pipe the file into ExifTool, then close stdin so it sees end-of-file
            byte[] image = File.ReadAllBytes( sFile );
            oP.StandardInput.BaseStream.Write( image, 0, image.Length );
            oP.StandardInput.BaseStream.Flush();
            oP.StandardInput.BaseStream.Close();

            sStdOut = oP.StandardOutput.ReadToEnd();
            oP.WaitForExit();
        }

        string[] datetags = new string[] { "DateTimeOriginal", "CreateDate", "ModifyDate" };

        // split lines
        string[] res1 = sStdOut.Split( new string[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries );
        // split after the first colon to separate attributes and values
        string[][] res2 = ( from x in res1 select x.Split( new char[] { ':' }, 2 ) ).ToArray();
        // only choose lines of date attributes
        string[] res3 = ( from x in res2
                          where x.Length == 2 && datetags.Contains( x[ 0 ], StringComparer.InvariantCultureIgnoreCase )
                          select x[ 1 ] ).ToArray();

        DateTime d;
        DateTime?[] dd = ( from x in res3
                           select DateTime.TryParseExact( x, "yyyy-MM-dd HH:mm:ss", CultureInfo.InvariantCulture,
                                      DateTimeStyles.AllowWhiteSpaces | DateTimeStyles.AssumeLocal, out d )
                                  ? (DateTime?) d : null ).ToArray();

        // keep only dates from a valid range; parsed values are local time, so compare against Now
        DateTime? oDate = ( from x in dd
                            where x.HasValue && x > new DateTime( 1990, 01, 01 ) && x < DateTime.Now
                            select x ).Min();
        return oDate;
    }
ExifTool has three shortcomings when used within another program:
- It is not available as a library. Therefore we need to make use of the Process API.
- It has a long startup time. This is because it is written in Perl and packed into a single self-extracting exe wrapper. As a result, we can only process about 3 files per second on a fast computer, while GDI+ might be a hundred times faster. We might be able to work around this by processing several files in a batch, but that would require a bigger change in our program logic.
- It does not support the Unicode filesystem API, so files whose names cannot be represented in the current ANSI encoding cannot be opened. To work around this limitation, we read the file into memory first and then pipe it into ExifTool.
Note: when using the Process API, you have the option to redirect both stdout and stderr at the same time, which allows for more detailed error handling and messages. However, you *must* then always read stdout and stderr in different threads to avoid a deadlock. For the sake of simplicity, I have omitted error handling in this case.
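As a rough sketch of what reading both streams safely could look like: instead of creating threads by hand, the example below starts an asynchronous read on each stream and waits for both, which serves the same purpose of never blocking on one pipe while the other fills up. The names `ProcessOutput`, `DrainAsync` and `Run` are made up for this illustration, and the tuple syntax assumes C# 7 or later.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Threading.Tasks;

public static class ProcessOutput
{
    // Reads both readers to the end concurrently, so a full stderr pipe
    // cannot stall the child while we are still waiting on stdout (or vice versa).
    public static async Task<(string StdOut, string StdErr)> DrainAsync(
        TextReader stdout, TextReader stderr)
    {
        Task<string> outTask = stdout.ReadToEndAsync();
        Task<string> errTask = stderr.ReadToEndAsync();
        await Task.WhenAll(outTask, errTask).ConfigureAwait(false);
        return (await outTask, await errTask);
    }

    // Hypothetical helper: runs a program with both streams redirected
    // and returns everything it wrote plus its exit code.
    public static (string StdOut, string StdErr, int ExitCode) Run(
        string fileName, string arguments)
    {
        var psi = new ProcessStartInfo(fileName, arguments)
        {
            UseShellExecute = false,
            CreateNoWindow = true,
            RedirectStandardOutput = true,
            RedirectStandardError = true,
        };

        using (Process p = Process.Start(psi))
        {
            var output = DrainAsync(p.StandardOutput, p.StandardError)
                .GetAwaiter().GetResult();
            p.WaitForExit();
            return (output.StdOut, output.StdErr, p.ExitCode);
        }
    }
}
```

With this in place, a non-empty `StdErr` or a non-zero `ExitCode` could be turned into a proper error message instead of being silently dropped.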
Good post.
In the ExifTool example you are eating the overhead of re-compiling exiftool for each processed image. You could speed things up if you could pass all the file names to exiftool at once. Also, adding the -fast option may help.
- Phil
Appreciate your feedback Phil.
I am going to do some more research on how to speed things up using batch processing; my guess is that it would work best when parsing the JSON output.
The two main reasons why I am not working with batch processing are:
- It would break on my system, since I am using a lot of Unicode-only file names. Using 8.3 file names is not 100% reliable.
- It requires a major rewrite of the program logic from one-by-one to all-at-once processing which makes it difficult to show progress and handle errors. Perhaps it could be worthwhile to process large result sets in several smaller batches.
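To make the idea of several smaller batches more concrete, here is an untested sketch. It assumes that ExifTool is called with `-j` (JSON output) once per batch, that each JSON entry carries a `SourceFile` key plus plain tag names as keys, and that `System.Text.Json` is available. The class `ExifBatch` and its methods are hypothetical names. Passing file names on the command line still runs into the Unicode limitation from the first point, and the simple quoting does not handle names that contain quotes.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text.Json;

public static class ExifBatch
{
    private static readonly string[] DateTags = { "DateTimeOriginal", "CreateDate", "ModifyDate" };

    // Splits the file list into batches of at most batchSize entries,
    // so progress can be reported and errors handled per batch.
    public static IEnumerable<List<string>> InBatches(IEnumerable<string> files, int batchSize)
    {
        var batch = new List<string>();
        foreach (string f in files)
        {
            batch.Add(f);
            if (batch.Count == batchSize)
            {
                yield return batch;
                batch = new List<string>();
            }
        }
        if (batch.Count > 0) yield return batch;
    }

    // Builds the ExifTool command line for one batch (JSON output, quoted file names).
    public static string BuildArguments(IEnumerable<string> files)
    {
        return "-j -fast -EXIF:ModifyDate -EXIF:DateTimeOriginal -EXIF:CreateDate "
             + "-d \"%Y-%m-%d %H:%M:%S\" "
             + string.Join(" ", files.Select(f => "\"" + f + "\""));
    }

    // Maps each SourceFile in the JSON output to its oldest parseable date (or null).
    public static Dictionary<string, DateTime?> ParseOldestDates(string json)
    {
        var result = new Dictionary<string, DateTime?>();
        using (JsonDocument doc = JsonDocument.Parse(json))
        {
            foreach (JsonElement entry in doc.RootElement.EnumerateArray())
            {
                string file = entry.GetProperty("SourceFile").GetString();
                DateTime? oldest = null;
                foreach (string tag in DateTags)
                {
                    if (entry.TryGetProperty(tag, out JsonElement value)
                        && value.ValueKind == JsonValueKind.String
                        && DateTime.TryParseExact(value.GetString(), "yyyy-MM-dd HH:mm:ss",
                               CultureInfo.InvariantCulture,
                               DateTimeStyles.AllowWhiteSpaces | DateTimeStyles.AssumeLocal,
                               out DateTime d)
                        && (oldest == null || d < oldest))
                    {
                        oldest = d;
                    }
                }
                result[file] = oldest;
            }
        }
        return result;
    }
}
```

Each batch would then be one ExifTool call, which spreads the startup cost over many files while still allowing a progress update after every batch. The date range check from the one-by-one version is left out here for brevity.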
Yes, I understand how a simple one-call-per-file could be a lot easier. I regret the Unicode problems, but I haven’t found a reasonable work-around for this. Also I should have specified -fast2 instead of -fast since it seems you are not interested in the makernotes tags. I notice that exiftool runs more and more slowly for each new makernote tag that I decode, but you can avoid this whole slowdown by ignoring the makernote information. On my system here the -fast2 option gives me more than a 2x speed increase for JPEG images from Canon cameras (when batch processing).
- Phil