Posts Tagged ‘ExifTool’

Optimized Reading of Meta-Data using ExifTool (Unicode-Proof!)

April 22nd, 2010 Christian Etter No comments

Today we are going to look at how to work around the lack of Unicode support in ExifTool.

In my last post I described a safe way of handling Unicode file and path names, which unfortunately was rather slow. In this post I would like to show how to combine it with a fast reading approach using .NET.

I have chosen to give the examples in this series in C#, since it allows me to demonstrate my ideas in a very compact way. However, the general approach works in many programming languages and is therefore not a .NET-only solution.

Basically we are combining a batch read using ExifTool with a single-file read operation for incompatible file names. Under optimal circumstances, i.e. when all file names are convertible, this method is as fast as the batch mode of ExifTool. The worst case is reading all files one by one, which carries a much bigger performance penalty.

Prior to processing any files, we have to divide all file names into compatible and incompatible ones. After splitting them up, we start the actual reading.

public ExifFileJson[] GetOriginalDateExifToolUnicode( string[] files )
{
    // first, single out all files with incompatible file names, since they cannot be handled in a batch
    var tmp = ( from x in files select new { OriginalName = x, ConvertedName = Encoding.ASCII.GetString( Encoding.UTF8.GetBytes( x ) ) } ).ToArray();
    string[] batch = tmp.Where( x => x.OriginalName.Equals( x.ConvertedName ) ).Select( x => x.OriginalName ).ToArray();
    string[] nobatch = tmp.Where( x => !x.OriginalName.Equals( x.ConvertedName ) ).Select( x => x.OriginalName ).ToArray();
 
    List<ExifFileJson> exiffiles = new List<ExifFileJson>();
    exiffiles.AddRange( GetOriginalDateExifToolBatch( batch ) );
    foreach ( string s in nobatch )
        exiffiles.Add( GetExifImageExifTool( s ) );
    if ( files.Length != exiffiles.Count() )
        throw new Exception( "Could not open all files. Missing: " + String.Join( ", ", files.Except( exiffiles.Select( x => x.SourceFile ) ).ToArray() ) );
    return exiffiles.ToArray();
}

The next method runs ExifTool in batch mode and parses its JSON output.

private static ExifFileJson[] GetOriginalDateExifToolBatch( string[] files )
{
    Process oP = new Process();
    oP.EnableRaisingEvents = false;
    oP.StartInfo.CreateNoWindow = true;
    oP.StartInfo.LoadUserProfile = false;
    oP.StartInfo.RedirectStandardError = false;
    oP.StartInfo.RedirectStandardOutput = true;
    oP.StartInfo.RedirectStandardInput = true;
    oP.StartInfo.StandardErrorEncoding = null;
    oP.StartInfo.StandardOutputEncoding = Encoding.UTF8;
    oP.StartInfo.UseShellExecute = false;
    oP.StartInfo.WindowStyle = ProcessWindowStyle.Hidden;
    oP.StartInfo.FileName = @"exiftool.exe";
    oP.StartInfo.Arguments = "-EXIF:ModifyDate -EXIF:DateTimeOriginal -EXIF:CreateDate -j -d \"%Y-%m-%d %H:%M:%S\" -@ -";
    oP.Start();
 
    // Pass all file names in an arg file which is piped to the process (no temporary file)
    byte[] data = Encoding.UTF8.GetBytes( String.Join( "\r\n", files ) );
    oP.StandardInput.BaseStream.Write( data, 0, data.Length );
    oP.StandardInput.BaseStream.Close();
 
    DataContractJsonSerializer deserializer = new DataContractJsonSerializer( typeof( ExifFileJson[] ) );
    ExifFileJson[] exif = deserializer.ReadObject( oP.StandardOutput.BaseStream ) as ExifFileJson[];
 
    oP.WaitForExit();
    return exif;
}

The following Unicode-safe way does not rely on the Perl file API, but instead pipes the image to stdin. To avoid out of memory conditions, it might be advisable to read the image file in small chunks using a stream. Do not forget to set the file name in the ExifFileJson object before returning it (ExifTool does not know about the file name).

private static ExifFileJson GetExifImageExifTool( string sFile )
{
    Process oP = new Process();
    oP.EnableRaisingEvents = false;
    oP.StartInfo.CreateNoWindow = true;
    oP.StartInfo.LoadUserProfile = false;
    oP.StartInfo.RedirectStandardError = false;
    oP.StartInfo.RedirectStandardOutput = true;
    oP.StartInfo.RedirectStandardInput = true;
    oP.StartInfo.StandardErrorEncoding = null;
    oP.StartInfo.StandardOutputEncoding = Encoding.UTF8;
    oP.StartInfo.UseShellExecute = false;
    oP.StartInfo.WindowStyle = ProcessWindowStyle.Hidden;
    oP.StartInfo.FileName = @"exiftool.exe";
    oP.StartInfo.Arguments = "-j -EXIF:ModifyDate -EXIF:DateTimeOriginal -EXIF:CreateDate -d \"%Y-%m-%d %H:%M:%S\" -";
    oP.Start();
 
    byte[] image = File.ReadAllBytes( sFile );
    oP.StandardInput.BaseStream.Write( image, 0, image.Length );
    oP.StandardInput.BaseStream.Close();
 
    DataContractJsonSerializer deserializer = new DataContractJsonSerializer( typeof( ExifFileJson[] ) );
    ExifFileJson[] exif = deserializer.ReadObject( oP.StandardOutput.BaseStream ) as ExifFileJson[];
 
    oP.WaitForExit();
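    if ( exif == null || exif.Length == 0 )
        return null; // ExifTool did not return any data for this file
    exif[ 0 ].SourceFile = sFile; // ExifTool only saw stdin, so we set the name ourselves
    return exif[ 0 ];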
}
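
The chunked reading mentioned above could look roughly like the following sketch. StreamPiping and CopyInChunks are hypothetical names that are not part of the original test app, and the chunk size is an arbitrary choice.

```csharp
using System.IO;

static class StreamPiping
{
    // Hypothetical helper: copies 'source' to 'target' in fixed-size chunks,
    // so the whole image never has to be held in memory at once.
    // Returns the total number of bytes copied.
    public static long CopyInChunks( Stream source, Stream target, int chunkSize )
    {
        byte[] buffer = new byte[ chunkSize ];
        long total = 0;
        int read;
        while ( ( read = source.Read( buffer, 0, buffer.Length ) ) > 0 )
        {
            target.Write( buffer, 0, read );
            total += read;
        }
        return total;
    }
}
```

In GetExifImageExifTool, the File.ReadAllBytes call and the following Write could then be replaced by opening the file with File.OpenRead and passing that stream together with oP.StandardInput.BaseStream to CopyInChunks. Be aware that the write may fail if ExifTool stops reading stdin early, so a real implementation would probably want to catch the resulting IOException.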

In case you wonder about the class we use for deserializing the JSON output:

[DataContract]
public class ExifFileJson
{
    [DataMember( IsRequired = true, Name = "SourceFile" )]
    public string SourceFile;
    [OnDeserializedAttribute()]
    internal void ReplaceBackSlashes( StreamingContext context ) { this.SourceFile = this.SourceFile.Replace( '/', '\\' ); }
 
    [DataMember( IsRequired = false )]
    public string DateTimeOriginal;
    [DataMember( IsRequired = false )]
    public string CreateDate;
    [DataMember( IsRequired = false )]
    public string ModifyDate;
}

Basically we declare required and optional attributes and a name mapping if necessary. Remember to replace forward slashes with backslashes in the file names, since ExifTool returns them in Unix style. It is probably not a good idea to declare the dates as nullable DateTime? members, since some images contain unparsable dates, which will result in a parsing exception. If you still want to do it, remember to make ExifTool decorate the dates in JSON style: -d "/Date(%Y-%m-%d %H:%M:%S)/".
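
Keeping the dates as strings does not prevent us from converting them later in a tolerant way. The following is a minimal sketch, assuming the -d format string used in the calls above; ExifDates and TryParseExifDate are hypothetical names.

```csharp
using System;
using System.Globalization;

static class ExifDates
{
    // Hypothetical helper: returns null instead of throwing when the text
    // is missing or is not a date in the "yyyy-MM-dd HH:mm:ss" format.
    public static DateTime? TryParseExifDate( string value )
    {
        if ( string.IsNullOrEmpty( value ) )
            return null;
        DateTime d;
        bool ok = DateTime.TryParseExact( value, "yyyy-MM-dd HH:mm:ss", CultureInfo.InvariantCulture,
            DateTimeStyles.AllowWhiteSpaces | DateTimeStyles.AssumeLocal, out d );
        return ok ? d : (DateTime?)null;
    }
}
```

A caller could then write ExifDates.TryParseExifDate( file.DateTimeOriginal ) and simply skip the files for which null is returned.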

Other Options

Certainly we could achieve better performance with less coding overhead if we had a way of batch processing files independent of their name and path. If you have a Perl environment with the Win32::API module on your machine, you could rewrite the above code as a Perl script and get much better performance even when reading files with Unicode names.

There is also another option: it is possible to add Unicode file name support to the Perl interpreter for Windows. I recently did a proof-of-concept which shows that ExifTool (or any UTF-8 supporting Perl app) could use Unicode file names on Windows without changing a line of code, as long as it is executed with a Unicode-supporting interpreter. The source code of Perl is pretty big however, and I am afraid I won’t be able to invest enough time to do a bullet-proof implementation.

ExifTool Performance Benchmark

April 15th, 2010 Christian Etter 3 comments

I had some time to look into the performance of reading image meta data using ExifTool and the GDI+ API. If you are planning to use ExifTool in your own software project, you might find the following information useful:

Test Environment
Tests have been conducted on a Core i7 720QM system with 4 GB RAM and a 7200 rpm HD, running Windows 7 64 bit. The testing application was written in C# using .NET Framework 3.5 and Linq.

Preparation

We are benchmarking different approaches reading a total of 1000 jpg files and measuring the total time taken to read 3 EXIF dates from each file (if present). In order to minimize misleading results when reading files which have previously been cached, we are reading all files once before performing the test.

Reading files without processing

We read every file into memory after it has been cached in the file system cache, so this is more or less an in-memory operation. This gives us a clue regarding the file access overhead.

Duration for 1000 files: 226 ms.

Reading meta data with GDI+

Using the .NET builtin Image class, reading and parsing of 3 attributes is rather fast, taking less than 2 seconds.

Duration for 1000 files: 1632 ms.

Reading meta data using ExifTool

We call ExifTool for every file and parse the return values as DateTime. In order to support all file names (including names which can only be represented in Unicode), we read the file into memory first and then pipe it to stdin of ExifTool. This is the slowest of all reading modes, taking about 200 times as long as GDI+. As we will see soon, the actual performance hit is the startup time of the Perl interpreter, and not ExifTool itself.

Duration for 1000 files: 337000 ms.

Reading meta data using ExifTool (-fast)

Same as above, but using the -fast option in ExifTool, which will prematurely cancel reading from stdin, once a sufficient amount of data has been found. This should increase performance especially when reading files directly over a slow network, but as we can see, it does not make a big difference in our case. Throughput increases by less than 10 percent.

Duration for 1000 files: 317000 ms.

Reading meta data using ExifTool (-fast2)

In addition to the -fast option, using -fast2 should allow for faster processing by omitting maker notes. The effect is small in our case, which might also be because not all images in the test set contain maker notes. As we will see later, the actual work performed by ExifTool is very small compared to the Perl startup time. Using the evidence gathered in this test, Perl consumes as much as 98.4% of the total execution time, whereas reading and parsing of the jpeg file only takes 1.6%.

Duration for 1000 files: 311000 ms.

Reading meta data using ExifTool (without extraction)

Since exiftool.exe unpacks a payload of 967 files in 60 folders (8.67 MB), one approach was to skip the extraction process and call ExifTool directly in the temporary folder with all the PAR environment variables set. Unfortunately this could not deliver any improvement worth the effort. Therefore we can say that the amount of time used for extraction is hardly relevant.

Duration for 1000 files: 314000 ms.

Reading meta data using ExifTool (multiple threads)

Since the entire operation is clearly CPU bound, the obvious optimization on a multi core / hyper threading system would be to execute ExifTool in parallel. For our test case, we are executing 8 instances of ExifTool at the same time, using 8 threads (the test system has 8 virtual cores). This results in a significant speed up, more than 3 times as fast as a serialized execution:

Duration for 1000 files: 99000 ms.
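
The test app itself is not reproduced here, but a fixed number of worker threads pulling items from a shared index could be sketched as follows. ParallelRunner and RunInParallel are hypothetical names; the sketch avoids anything newer than .NET 3.5 and leaves out error handling, so an exception inside a worker would take down the process.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

static class ParallelRunner
{
    // Hypothetical helper: runs 'work' for every item on 'threads' worker threads.
    // Each result is stored at the same index as its input item.
    public static TResult[] RunInParallel<T, TResult>( IList<T> items, int threads, Func<T, TResult> work )
    {
        TResult[] results = new TResult[ items.Count ];
        int next = -1; // index of the last item handed out to a worker
        Thread[] pool = new Thread[ threads ];
        for ( int t = 0; t < threads; t++ )
        {
            pool[ t ] = new Thread( () =>
            {
                int i;
                // Interlocked.Increment hands out every index exactly once
                while ( ( i = Interlocked.Increment( ref next ) ) < items.Count )
                    results[ i ] = work( items[ i ] );
            } );
            pool[ t ].Start();
        }
        foreach ( Thread thread in pool )
            thread.Join();
        return results;
    }
}
```

With such a helper, the per-file method from the previous post could be called as ParallelRunner.RunInParallel( files, 8, f => GetOldestExifDateExifTool( f ) ).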

Reading meta data using ExifTool (batch mode)

The end of the road? With the multi threaded approach we have reached the maximum optimization possible when calling ExifTool separately for each file. This technique gave us the advantage of being able to handle file names with arbitrary character sets, along with detailed progress and error reporting.

The only way of getting more (much more) throughput is to use ExifTool in batch mode, thus minimizing the actual impact of the Perl startup time.

As before, ExifTool has been set to read three date tags from each file and format the text output. This time we are using JSON as the output format, since it allows us to easily parse the result of each file. In order to avoid getting a command line which is too long, we are writing all file names into an ‘argfile’, which is then written to stdin of ExifTool using -@ -.

This approach is F A S T! We are suddenly doing the same job in less than 2% of the time needed when calling ExifTool once per file!

Duration for 1000 files: 5669 ms.

Reading meta data using ExifTool (batch mode, -fast2)

No big improvements here, we are ending up with almost the same amount of time. This might be due to the fact that only part of the tested files contain maker notes, so results may vary.

Duration for 1000 files: 5422 ms.

Conclusion

In the above test we have learned that iterating file by file is extremely slow when processing large amounts of files using ExifTool. The performance impact is caused by the overhead of calling the Perl interpreter, and is therefore difficult to minimize when calling ExifTool once for each file. Depending on the kind of data needed, it is therefore worthwhile to look into other solutions such as batch processing or using a different API.

When using ExifTool in batch mode, it is strongly recommended to check whether or not a file name and path can be represented using the current system’s code page. Otherwise such files will not be read. It is not safe to call ExifTool with MS-DOS ‘8dot3’ file names, since these are not converted to ANSI in the case of East Asian characters. Also, these names are not guaranteed to be present for each file on the NTFS file system.

For an ExifTool-only solution reading meta data, the following approach is recommended when reading larger amounts of files:

  1. Create a list of image files to be processed.
  2. Convert each file name into a UTF-8 byte sequence and decode using the current system ANSI code page. Then compare the resulting string to the original path of the file. If both are not equal, you must not use this file in batch processing (ExifTool cannot open it) and instead read it in a separate call. Why should you convert to UTF-8 and then to ANSI? This might be confusing at first, but it mimics exactly what happens behind the scenes. You need to pass the file names as UTF-8 to ExifTool, since they appear again in the output (which should be UTF-8).
  3. Write a list of UTF-8 encoded file names to stdin of ExifTool, do not pass them on the command line, since you might hit the maximum length of the command line and truncate it unknowingly.
  4. Call ExifTool, preferably using the -j option for generating output in JSON format. JSON allows for more efficient parsing of the result (compared to XML).
  5. Parse the result and ensure you are getting a result for each file. There are many libraries to handle Json parsing, in .NET it is done in a few lines of code. If you had entered file names in ANSI encoding in the previous step, you would run into errors here because your output would contain a mix of encodings.
  6. All files which did not pass the UTF-8 to ANSI encoding roundtrip need to be processed one by one.
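
Steps 2 and 6 can be sketched as follows. BatchSplitter and its methods are hypothetical names, and the ANSI code page is passed in explicitly. On the .NET Framework, Encoding.Default returns the system ANSI code page and would be the natural argument. On newer .NET versions Encoding.Default is UTF-8, which would make the check meaningless.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class BatchSplitter
{
    // True if the UTF-8 bytes of 'path', decoded with the given ANSI code page,
    // yield exactly the original string again.
    public static bool SurvivesRoundTrip( string path, Encoding ansi )
    {
        string decoded = ansi.GetString( Encoding.UTF8.GetBytes( path ) );
        return string.Equals( path, decoded, StringComparison.Ordinal );
    }

    // Sorts each file name into the batch list or into the list of files
    // that have to be read one by one.
    public static void Split( IEnumerable<string> files, Encoding ansi, List<string> batch, List<string> single )
    {
        foreach ( string f in files )
            ( SurvivesRoundTrip( f, ansi ) ? batch : single ).Add( f );
    }
}
```

Note that every character outside ASCII is encoded as at least two bytes in UTF-8, so with a single-byte ANSI code page only pure ASCII names end up in the batch list.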

Source Code

The source code of the test app is available for download here.

Retrieving Image Meta-Data using GDI+ and ExifTool

April 14th, 2010 Christian Etter 3 comments

How to read image meta data in .NET? Here we illustrate two techniques:

First, for the sake of speed and simplicity, we chose the GDI+ builtin capabilities of the Image class:

/// <summary>Much faster than using Exiftool. In case GDI+ cannot decode the date string we use Exiftool.</summary>
private DateTime? GetOriginalDate( string sFileName )
{
    using ( FileStream stream = new FileStream( sFileName, FileMode.Open, FileAccess.Read ) )
    {
        using ( Image img = Image.FromStream( stream, false, false ) )
        {
            int[] date_tags = new int[] { 36867, 36868, 306 }; // tag numbers with dates
            string[] s1 = ( from x in date_tags where img.PropertyIdList.Contains( x ) select Encoding.ASCII.GetString( img.GetPropertyItem( x ).Value ).Replace( "\0", "" )).ToArray(); // get date as string without trailing \0
            DateTime d;
            DateTime?[] dd = ( from x in s1 where x.Trim().Length > 0
                select DateTime.TryParseExact( x, new string[] { "yyyy:MM:dd HH:mm:ss", "yyyy-MM-dd HH:mm:ss", "MM/dd/yyyy HH:mm:ss", "yyyy-MM-dd'T'HH:mm:sszzz" }, CultureInfo.InvariantCulture, DateTimeStyles.AllowWhiteSpaces | DateTimeStyles.AssumeLocal, out d ) ? d as DateTime? : 
                null ).ToArray(); // we see if we can parse all the date attributes found
            if ( dd.Where( x => !x.HasValue ).Count() > 0 )
                return GetOldestExifDateExifTool( sFileName ); // if there is something in the date attribute we cannot parse we ask exiftool.
            else
                return ( from x in dd where x.Value > new DateTime( 1990, 01, 01 ) && x.Value < DateTime.UtcNow select x ).Min(); // make sure we use a valid date range
        }
    }
}

Since the EXIF standard defines date values to be stored as text data, we sometimes find non-standard date formats. This includes dates being stored with milliseconds added, using different separator characters or including an additional UTC offset. ExifTool does a pretty decent job interpreting all those values as a date, plus it might be capable of reading certain off-standard or broken meta-data which GDI+ cannot read.

Here is the second approach:

/// <summary>Extracts the oldest possible EXIF date. Can process 3 files per second, very slow, will need 8 hours for 90,000 files.</summary>
private static DateTime? GetOldestExifDateExifTool( string sFile )
{
    Process oP = new Process();
    oP.EnableRaisingEvents = false;
    oP.StartInfo.CreateNoWindow = true;
    oP.StartInfo.LoadUserProfile = false;
    oP.StartInfo.RedirectStandardError = false;
    oP.StartInfo.RedirectStandardOutput = true;
    oP.StartInfo.RedirectStandardInput = true;
    oP.StartInfo.StandardErrorEncoding = null;
    oP.StartInfo.StandardOutputEncoding = Encoding.UTF8;
    oP.StartInfo.UseShellExecute = false;
    oP.StartInfo.WindowStyle = ProcessWindowStyle.Hidden;
    oP.StartInfo.FileName = @"exiftool.exe";
    oP.StartInfo.Arguments = "-s -s -EXIF:ModifyDate -EXIF:DateTimeOriginal -EXIF:CreateDate -d \"%Y-%m-%d %H:%M:%S\" -";
    oP.Start();
 
    byte[] image = File.ReadAllBytes( sFile );
    oP.StandardInput.BaseStream.Write( image, 0, image.Length );
    oP.StandardInput.BaseStream.Flush();
    oP.StandardInput.BaseStream.Close();
    string sStdOut = oP.StandardOutput.ReadToEnd();
    oP.WaitForExit();
 
    string[] datetags = new string[] { "DateTimeOriginal", "CreateDate", "ModifyDate" };
    string[] res1 = sStdOut.Split( new string[] { "\r\n" }, StringSplitOptions.RemoveEmptyEntries ); // split lines
    string[][] res2 = ( from x in res1 select x.Split( new char[] { ':' }, 2 ) ).ToArray(); // split after colon to separate attributes and values
    string[] res3 = ( from x in res2 where x.Length == 2 && datetags.Contains( x[ 0 ], StringComparer.InvariantCultureIgnoreCase ) select x[ 1 ] ).ToArray(); // only choose lines of date attributes
    DateTime d;
    DateTime?[] dd = ( from x in res3 select DateTime.TryParseExact( x, "yyyy-MM-dd HH:mm:ss", CultureInfo.InvariantCulture, DateTimeStyles.AllowWhiteSpaces | DateTimeStyles.AssumeLocal, out d ) ? d as DateTime? : null ).ToArray();
    DateTime? oDate = ( from x in dd where x.HasValue && x > new DateTime( 1990, 01, 01 ) && x < DateTime.UtcNow select x ).Min();
    return oDate;
}

Basically ExifTool has three shortcomings when used within another program:

  1. It is not available as a library. Therefore we need to make use of the Process API.
  2. It has a long startup time. This is because it has been written in Perl, which is packed into a single self-expanding exe wrapper. As a result, we can only process about 3 files per second on a fast computer. GDI+ might be a hundred times faster. We might be able to work around this somehow by processing several files in a batch, which would require a bigger change in our program logic.
  3. It does not support the Unicode file system API, so file names which are not compatible with the current ANSI encoding cannot be opened. To work around this limitation, we read the file into memory first and then pipe it into ExifTool.

Note: when using the Process API, you are given the option to redirect both stdout and stderr at the same time, which could allow for more detailed error handling/messages. However you *must* always read stdout and stderr in different threads to avoid a deadlock situation. For the sake of simplicity, I have omitted error handling in this case.
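
One way to read both streams without a deadlock is to let the Process class collect stderr asynchronously through its ErrorDataReceived event while stdout is read on the calling thread. The following is a minimal sketch, not the code of the test app; ProcessRunner and Run are hypothetical names.

```csharp
using System.Diagnostics;
using System.Text;

static class ProcessRunner
{
    // Hypothetical helper: runs a program, captures stdout and stderr
    // separately and returns the exit code.
    public static int Run( string fileName, string arguments, out string stdout, out string stderr )
    {
        StringBuilder err = new StringBuilder();
        using ( Process p = new Process() )
        {
            p.StartInfo.FileName = fileName;
            p.StartInfo.Arguments = arguments;
            p.StartInfo.UseShellExecute = false;
            p.StartInfo.CreateNoWindow = true;
            p.StartInfo.RedirectStandardOutput = true;
            p.StartInfo.RedirectStandardError = true;
            // stderr lines arrive on a background thread; a null line marks the end of the stream
            p.ErrorDataReceived += ( sender, e ) => { if ( e.Data != null ) err.AppendLine( e.Data ); };
            p.Start();
            p.BeginErrorReadLine();
            stdout = p.StandardOutput.ReadToEnd(); // stdout is read on the calling thread
            p.WaitForExit();                       // also waits for the pending stderr events
            stderr = err.ToString();
            return p.ExitCode;
        }
    }
}
```

If the image is piped to stdin as in the method above, RedirectStandardInput has to be enabled as well and the data written before ReadToEnd is called.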

Resizing JPEG Thumbnails with ExifTool and ImageMagick

September 14th, 2009 Christian Etter No comments

Most image editors do not offer settings for the preview image and thumbnail which are embedded into a JPEG file along with the actual image data.
Unfortunately, the size and quality of the embedded thumbnail is often not adequate for viewing. Using ExifTool and the ImageMagick tool ‘convert’ we can replace any thumbnail with a different JPEG image of an arbitrary size/resolution.

convert.exe a.jpg -thumbnail 160x160 -strip -quality 90 jpg:- | exiftool.exe  "-ThumbnailImage<=-" -m -overwrite_original a.jpg

In case you are using the above within a .BAT or .CMD file, remember to replace each percent sign by a double percent sign. The -thumbnail parameter will create a minimalistic JPEG file without additional meta data. The -strip option even removes any embedded color profile to save more space.
Optionally you could add a check whether the internal thumbnail image is of a certain size or dimension before replacing it.
You might want to do a similar exchange of the PreviewImage (if present):

convert.exe a.jpg -thumbnail 570x570 -strip -quality 50 jpg:- | exiftool.exe  "-PreviewImage<=-" -m -overwrite_original a.jpg

In this case I am using a 570×570 target square, which matches the preview dimensions created by a Nikon D80 SLR. The aspect ratio will be preserved.

Following is a complete script for conditional replacement of the thumbnail image.

@ECHO OFF
SETLOCAL EnableDelayedExpansion
 
SET EXTENSION=.JPG
SET JPGDIR=.
SET THUMBNAIL_MINSIZE=3800
SET THUMBNAIL_SIZE=160x160
SET THUMBNAIL_QUALITY=85
 
SET ERRORFILE=%TEMP%\exiftool.rebuild.error.txt
SET TEMPFILE=%TEMP%\exiftool_tmp.jpg
 
ECHO Processing %JPGDIR%
IF EXIST "%ERRORFILE%" DEL "%ERRORFILE%"
 
IF %THUMBNAIL_MINSIZE%==0 (
	FOR %%A IN ("%JPGDIR%\*%EXTENSION%") DO (
		ECHO Replacing thumbnail in %%~nA%%~xA
		convert.exe "%%A" -thumbnail %THUMBNAIL_SIZE% -quality %THUMBNAIL_QUALITY% jpg:- | exiftool.exe "-ThumbnailImage<=-" -m -overwrite_original "%%A" 2>>"%ERRORFILE%"
	)
) ELSE (
	FOR %%A IN ("%JPGDIR%\*%EXTENSION%") DO (
		exiftool.exe -b -ThumbnailImage "%%A" > "%TEMPFILE%"
 
		FOR %%R IN ("%TEMPFILE%") DO IF %%~zR LSS %THUMBNAIL_MINSIZE% (
			IF %%~zR GTR 0 (
				ECHO Size of thumbnail is %%~zR, which is less than %THUMBNAIL_MINSIZE%, replacing %%~nxA...
				convert.exe "%%A" -thumbnail %THUMBNAIL_SIZE% -quality %THUMBNAIL_QUALITY% jpg:- | exiftool.exe "-ThumbnailImage<=-" -m -overwrite_original "%%A" 2>>"%ERRORFILE%"
				exiftool.exe -b -ThumbnailImage "%%A" > "%TEMPFILE%"
				FOR %%S IN ("%TEMPFILE%") DO ECHO New size is %%~zS
			) ELSE (
				ECHO File %%~nA%%~xA does not contain thumbnail, so we don't add one
			)
		) ELSE (
			ECHO Size of thumbnail is %%~zR, no need to replace %%~nA%%~xA.
		)
		IF EXIST "%TEMPFILE%" DEL "%TEMPFILE%"
	)	
)

More information on ImageMagick: http://www.imagemagick.org/script/command-line-processing.php

Editing Metadata using Exiftool and Unicode

June 2nd, 2009 Christian Etter 2 comments

When it comes to editing image metadata, no program or API gets close to ExifTool in terms of robustness, feature count and support. ExifTool has been written in Perl and is provided within a .exe wrapper for the Windows platform. This executable can be controlled through command line parameters and a parameter (arg) file.

Editing metadata by passing command line parameters and attribute values generally works well, as long as certain restrictions are not violated. All parameters must be correctly escaped and may only contain ANSI characters that are supported by the current user’s codepage. The maximum length of the command line is limited, and therefore entering long text might be difficult in some cases.

Unfortunately it is impossible to enter any characters that are stored in Unicode encodings such as UTF-8, UTF-16 or UCS-2. This is due to a restriction in the Perl interpreter binary, which only accepts and forwards 8 bit character sets. Technically, UTF-8 could be passed as an 8 bit character set, however due to the command line string handling of Windows, only text encoded with the current System ANSI codepage can be passed using the CreateProcess() API.

Fortunately there are a few workarounds for passing Unicode data to ExifTool:

  • Use the -E option to write special characters as HTML character entities. Examples: &auml; – ä Umlaut, &#10; – line break, &#13;&#10; – line break (Windows), &#31435; – Chinese character 立, etc. In case the text already contains some HTML entities, you would have to escape them first. With this approach you will be restricted to the maximum length of the command line, which is between 2047 and 8191 characters (MSDN). When using the Win32 API CreateProcess(), the maximum length of the command line is 32,767 characters (MSDN).
  • Write the data directly to stdin and specify on the command line which attribute is supposed to store the data. This is a very fast approach if you need to write only a single attribute in any encoding, including binary data. No escaping is necessary in this case. Multi-value attributes and line breaks are supported.
  • Write the text or binary data for each attribute into a separate temporary file before calling ExifTool. Make sure you remove every file after ExifTool finishes processing. Similar to the above, you can write any kind of data without escaping, and as an additional benefit you can write parameters in different encodings at the same time. To pass a file as a parameter, use the “attribute<=filename” syntax (needs to be escaped with double quotes on the command line).
  • Write all attributes of the same encoding into an arg file. Then call ExifTool using the -@ option. You could also add processing instructions to the file, as well as the names of the files to be processed. Another benefit is that size restrictions of the command line do not apply, and therefore an arbitrary number of files and attributes can be processed. All you have to make sure is that you only use a single encoding (probably UTF-8) for all attributes in the file. Since every assignment has to fit into a single line, multi-value attributes and text with line breaks have to be escaped. In case you have to write any text that contains line breaks, you have to escape them with a $/ and change the = operator to <=. Also note that you have to escape all $ characters by $$.
  • Use the argfile approach from above, but do not use a temporary file. Instead, write the contents of that file directly to stdin. You can use the -@ - syntax for this. The main advantage compared to using an arg file is that you do not need to worry about cleaning up.
  • Combine any of the above ways of passing data.
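
The first workaround, encoding special characters as HTML character entities for the -E option, can be sketched as follows. HtmlEntityEscaper is a hypothetical helper. It only produces numeric entities and escapes existing ampersands; quoting the value for the command line is a separate step that is not shown here.

```csharp
using System.Globalization;
using System.Text;

static class HtmlEntityEscaper
{
    // Hypothetical helper: keeps printable ASCII as it is, escapes '&' and writes
    // everything else (including line breaks) as a numeric character entity.
    public static string Escape( string text )
    {
        StringBuilder sb = new StringBuilder( text.Length );
        for ( int i = 0; i < text.Length; i++ )
        {
            char c = text[ i ];
            if ( c == '&' )
                sb.Append( "&amp;" ); // protect entities already contained in the text
            else if ( c >= 32 && c < 127 )
                sb.Append( c );
            else
            {
                int codePoint = c;
                // combine a surrogate pair into a single code point
                if ( char.IsHighSurrogate( c ) && i + 1 < text.Length && char.IsLowSurrogate( text[ i + 1 ] ) )
                {
                    codePoint = char.ConvertToUtf32( c, text[ i + 1 ] );
                    i++;
                }
                sb.Append( "&#" ).Append( codePoint.ToString( CultureInfo.InvariantCulture ) ).Append( ';' );
            }
        }
        return sb.ToString();
    }
}
```

For the Chinese character from the example above, HtmlEntityEscaper.Escape( "立" ) returns &#31435;.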

Background

Image metadata is frequently stored within an EXIF header. Unfortunately, EXIF has a limited concept of character sets used to encode information. According to the specification, only ASCII encoded text can be stored in text attributes. In practice, many applications also read and write text in other 8-bit encodings, with Latin1 being a very common denominator. For any text that is written in the current ANSI encoding of the user’s system, the user will most likely be able to retrieve the saved text without any loss of information. This however is only the case as long as the meta data is viewed on a system with the same code page. Depending on the characters used, a system with a different default ANSI code page is likely to show wrong characters or even complete nonsense. Except for a few attributes such as UserComment, EXIF does not foresee the usage of Unicode.

IPTC fortunately does offer storage of Unicode text. It is possible to flag all IPTC text as UTF-8. Therefore all Unicode characters can be safely stored within IPTC, as long as the reading and writing applications adhere to the specification. Storing UTF-8 encoded text is supported by ExifTool.

XMP supports UTF-8 by default, so whenever writing XMP data, one of the above techniques should be used.

Link to ExifTool: http://www.sno.phy.queensu.ca/~phil/exiftool/

ExifTool forum at CPAN: http://www.cpanforum.com/dist/Image-ExifTool

Testing for possible Unicode – ANSI code page compatibility

May 13th, 2009 Christian Etter 9 comments

When dealing with a recent ExifTool remoting task, there was a question whether or not a given Unicode file name could be safely represented in the system ANSI code page. Only if the file name was fully convertible it could be passed to the application directly.

In case the file name cannot be converted to the current code page, an application which does not utilize the CreateFileW() API will not be able to open the file with this name. In case the file system supports old style DOS 8.3 filenames, the application should resort to using those instead.

BOOL IsConvertibleText( PCWSTR sFile )
{
    BOOL bRet = FALSE;
    if ( sFile )
    {
        int iBuffer = WideCharToMultiByte( CP_ACP, 0, sFile, -1, NULL, 0, NULL, NULL );
        if ( iBuffer != 0 )
        {
            iBuffer += 1;
            PSTR a = (PSTR)HeapAlloc( GetProcessHeap(), 0, iBuffer );
            if ( a )
            {
                if ( WideCharToMultiByte( CP_ACP, 0, sFile, -1, a, iBuffer, NULL, NULL ) )
                {
                    int cchWide = MultiByteToWideChar( CP_ACP, 0, a, -1, NULL, 0 );
                    if ( cchWide != 0 )
                    {
                        // cchWide is a character count which already includes the terminator
                        PWSTR w = (PWSTR)HeapAlloc( GetProcessHeap(), 0, cchWide * sizeof(WCHAR) );
                        if ( w )
                        {
                            // the buffer size has to be passed in characters, not in bytes
                            if ( MultiByteToWideChar( CP_ACP, 0, a, -1, w, cchWide ) )
                                if ( CompareStringW( LOCALE_SYSTEM_DEFAULT, 0, sFile, -1, w, -1 ) == CSTR_EQUAL )
                                    bRet = TRUE;
                            HeapFree( GetProcessHeap(), 0, w );
                        }
                    }
                }
                HeapFree( GetProcessHeap(), 0, a );
            }
        }
    }
    return bRet;
}

For those using C#:

bool IsConvertibleText( string sFile )
{
    byte[] b = Encoding.Default.GetBytes( sFile );
    string s = Encoding.Default.GetString( b );
    return sFile.Equals( s, StringComparison.Ordinal ); // exact comparison, no culture-specific equivalences
}

See also: Post in CPAN::Forum
Win32API::File Unicode support bug