Archive

Posts Tagged ‘Json’

Optimized Reading of Meta-Data using ExifTool (Unicode-Proof!)

April 22nd, 2010 Christian Etter No comments

Today we are going to look at how to work around the lack of Unicode support in ExifTool.

In my last post, I have already been talking about a safe way of handling Unicode file/path names, which was rather slow unfortunately. In this post I would like to elaborate on how to combine this with a fast reading approach using .NET.

I have chosen to give examples using C# code in these series, since it allows me to demonstrate my ideas in a very compact way. However the general approach is compatible with many programming languages and therefore not a .NET only solution.

Basically we are combining a batch read using ExifTool with a single file read operation for incompatible file names. In optimal circumstances, i.e. when all file names are convertible, this method performs as fast as ExifTool can be. Worst case would be reading all files one by one, which has a bigger performance penalty.

Prior to processing any files, we have to divide all file names into compatible and incompatible ones. After splitting them up, we start the actual reading.

public ExifFileJson[] GetOriginalDateExifToolUnicode( string[] files )
{
    // first, single out all files with incompatible file names, since they cannot be handled in a batch
    var tmp = ( from x in files select new { OriginalName = x, ConvertedName = Encoding.ASCII.GetString( Encoding.UTF8.GetBytes( x ) ) } ).ToArray();
    string[] batch = tmp.Where( x => x.OriginalName.Equals( x.ConvertedName ) ).Select( x => x.OriginalName ).ToArray();
    string[] nobatch = tmp.Where( x => !x.OriginalName.Equals( x.ConvertedName ) ).Select( x => x.OriginalName ).ToArray();
 
    List<ExifFileJson> exiffiles = new List<ExifFileJson>();
    exiffiles.AddRange( GetOriginalDateExifToolBatch( batch ) );
    foreach ( string s in nobatch )
        exiffiles.Add( GetExifImageExifTool( s ) );
    if ( files.Length != exiffiles.Count() )
        throw new Exception( "Could not open all files. Missing: " + String.Join( ", ", files.Except( exiffiles.Select( x => x.SourceFile ) ).ToArray() ) );
    return exiffiles.ToArray();
}

The next method basically runs ExifTool and parses the output in Json format.

private static ExifFileJson[] GetOriginalDateExifToolBatch( string[] files )
{
    Process oP = new Process();
    oP.EnableRaisingEvents = false;
    oP.StartInfo.CreateNoWindow = true;
    oP.StartInfo.LoadUserProfile = false;
    oP.StartInfo.RedirectStandardError = false;
    oP.StartInfo.RedirectStandardOutput = true;
    oP.StartInfo.RedirectStandardInput = true;
    oP.StartInfo.StandardErrorEncoding = null;
    oP.StartInfo.StandardOutputEncoding = Encoding.UTF8;
    oP.StartInfo.UseShellExecute = false;
    oP.StartInfo.WindowStyle = ProcessWindowStyle.Hidden;
    oP.StartInfo.FileName = @"exiftool.exe";
    oP.StartInfo.Arguments = "-EXIF:ModifyDate -EXIF:DateTimeOriginal -EXIF:CreateDate -j -d \"%Y-%m-%d %H:%M:%S\" -@ -";
    oP.Start();
 
    /// Pass all file names in an arg file which is piped to the process (no temporary file)
    byte[] data = Encoding.UTF8.GetBytes( String.Join( "\r\n", files ) );
    oP.StandardInput.BaseStream.Write( data, 0, data.Length );
    oP.StandardInput.BaseStream.Close();
 
    DataContractJsonSerializer deserializer = new DataContractJsonSerializer( typeof( ExifFileJson[] ) );
    ExifFileJson[] exif = deserializer.ReadObject( oP.StandardOutput.BaseStream ) as ExifFileJson[];
 
    oP.WaitForExit();
    return exif;
}

The following Unicode-safe way does not rely on the Perl file API, but instead pipes the image to stdin. To avoid out of memory conditions, it might be advisable to read the image file in small chunks using a stream. Do not forget to set the file name in the ExifFileJson object before returning it (ExifTool does not know about the file name).

private static ExifFileJson GetExifImageExifTool( string sFile )
{
    Process oP = new Process();
    oP.EnableRaisingEvents = false;
    oP.StartInfo.CreateNoWindow = true;
    oP.StartInfo.LoadUserProfile = false;
    oP.StartInfo.RedirectStandardError = false;
    oP.StartInfo.RedirectStandardOutput = true;
    oP.StartInfo.RedirectStandardInput = true;
    oP.StartInfo.StandardErrorEncoding = null;
    oP.StartInfo.StandardOutputEncoding = Encoding.UTF8;
    oP.StartInfo.UseShellExecute = false;
    oP.StartInfo.WindowStyle = ProcessWindowStyle.Hidden;
    oP.StartInfo.FileName = @"exiftool.exe";
    oP.StartInfo.Arguments = "-j -EXIF:ModifyDate -EXIF:DateTimeOriginal -EXIF:CreateDate -d \"%Y-%m-%d %H:%M:%S\" -";;
    oP.Start();
 
    byte[] image = File.ReadAllBytes( sFile );
    oP.StandardInput.BaseStream.Write( image, 0, image.Length );
    oP.StandardInput.BaseStream.Close();
 
    DataContractJsonSerializer deserializer = new DataContractJsonSerializer( typeof( ExifFileJson[] ) );
    ExifFileJson[] exif = deserializer.ReadObject( oP.StandardOutput.BaseStream ) as ExifFileJson[];
 
    oP.WaitForExit();
    if ( exif.Length > 0 )
        exif[ 0 ].SourceFile = sFile;
    return exif.FirstOrDefault();
}

In case you wonder about the Json class we use for deserializing output:

[DataContract]
public class ExifFileJson
{
    [DataMember( IsRequired = true, Name = "SourceFile" )]
    public string SourceFile;
    [OnDeserializedAttribute()]
    internal void ReplaceBackSlashes( StreamingContext context ) { this.SourceFile = this.SourceFile.Replace( '/', '\\' ); }
 
    [DataMember( IsRequired = false )]
    public string DateTimeOriginal;
    [DataMember( IsRequired = false )]
    public string CreateDate;
    [DataMember( IsRequired = false )]
    public string ModifyDate;
}

Basically we declare required and optional attributes and a name mapping if necessary. Remember to replace forward slashes to backslashes for the file names, since these are returned in Unix style. It is probably not a good idea to parse dates as DateTime? nullables, since there could be some images with unparsable dates, which will result in a parsing exception. If you would still want to do it, remember to decorate the dates in Json format in ExifTool: -d “/Date(%Y-%m-%d %H:%M:%S)/”.

Other Options

Certainly we could reach a better performance and less coding overhead if we had a way of batch processing files independent of their name and path. If you have a perl environment on your machine with the Win32::API module, you could rewrite the above code within a Perl script and therefore get much better performance even when reading Unicode files.

There is also another option: It is possible to add Unicode file name support into the Perl interpreter for Windows. I recently did a a proof-of-concept which shows that ExifTool (or any UTF-8 supporting Perl app) could be using Unicode file names in Windows without changing a line of code, as long as it is executed with a Unicode supporting interpreter. The source code of Perl is pretty big however, and I am afraid I won’t be able to invest enough time to do a bullet-proof implementation.