
ExifTool Performance Benchmark

I had some time to look into the performance of reading image metadata using ExifTool and the GDI+ API. If you are planning to use ExifTool in your own software project, you might find the following information useful:

Test Environment
Tests were conducted on a Core i7 720QM system with 4 GB RAM and a 7200 rpm hard disk, running Windows 7 64-bit. The testing application was written in C# using .NET Framework 3.5 and LINQ.

Preparation

We are benchmarking different approaches to reading a total of 1000 JPEG files, measuring the total time taken to read 3 EXIF dates from each file (if present). To ensure all tests run against a uniformly warm file system cache, we read every file once before performing the test.

Reading files without processing

We read every file into memory after it has been cached by the file system, so this is more or less an in-memory operation. This gives us a baseline for the file access overhead.

Duration for 1000 files: 226 ms.
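The warm-up and baseline passes can be sketched as follows. This is a simplified reconstruction, not the original benchmark code, and the test folder path is a placeholder:

```csharp
using System;
using System.Diagnostics;
using System.IO;

class Baseline
{
    // Warm-up pass plus timed pass over the same files. The warm-up puts
    // everything into the OS file cache, so the timed pass measures little
    // more than file access overhead.
    static long TimeReadAll(string[] files)
    {
        foreach (string f in files)          // warm-up: populate the cache
            File.ReadAllBytes(f);

        Stopwatch sw = Stopwatch.StartNew();
        foreach (string f in files)          // timed pass: pure reading, no processing
            File.ReadAllBytes(f);
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }

    static void Main()
    {
        // Hypothetical test folder; point this at your own image set.
        string[] files = Directory.GetFiles(@"C:\TestImages", "*.jpg");
        Console.WriteLine("Duration for {0} files: {1} ms",
            files.Length, TimeReadAll(files));
    }
}
```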

Reading metadata with GDI+

Using the built-in .NET Image class, reading and parsing the 3 attributes is rather fast, taking less than 2 seconds.

Duration for 1000 files: 1632 ms.
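A minimal sketch of the GDI+ approach, assuming the three dates of interest are the standard EXIF date tags (0x0132 ModifyDate, 0x9003 DateTimeOriginal, 0x9004 CreateDate); the original test code may differ in details:

```csharp
using System;
using System.Drawing;
using System.Globalization;
using System.Text;

class GdiPlusExif
{
    // The three EXIF date tags (IDs from the EXIF specification).
    static readonly int[] DateTags = { 0x0132, 0x9003, 0x9004 };

    static DateTime? ReadExifDate(Image img, int tagId)
    {
        if (Array.IndexOf(img.PropertyIdList, tagId) < 0)
            return null; // tag not present in this file

        // EXIF stores dates as the ASCII string "yyyy:MM:dd HH:mm:ss\0".
        string raw = Encoding.ASCII.GetString(
            img.GetPropertyItem(tagId).Value).TrimEnd('\0');
        DateTime dt;
        if (DateTime.TryParseExact(raw, "yyyy:MM:dd HH:mm:ss",
                CultureInfo.InvariantCulture, DateTimeStyles.None, out dt))
            return dt;
        return null; // malformed value
    }

    static void Main(string[] args)
    {
        using (Image img = Image.FromFile(args[0]))
            foreach (int tag in DateTags)
                Console.WriteLine("0x{0:X4}: {1}", tag, ReadExifDate(img, tag));
    }
}
```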

Reading metadata using ExifTool

We call ExifTool once for every file and parse the returned values as DateTime. In order to support all file names (including names that can only be represented in Unicode), we read the file into memory first and then pipe it to stdin of ExifTool. This is the slowest of all reading modes, taking about 200 times as long as GDI+. As we will see shortly, the actual performance hit is the startup time of the Perl interpreter, not ExifTool itself.

Duration for 1000 files: 337000 ms.
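A sketch of the per-file call, piping the image bytes to ExifTool's stdin (a file argument of "-" makes ExifTool read the image from stdin). The option and tag names are real ExifTool options; the exact tag selection and the exe path are assumptions:

```csharp
using System;
using System.Diagnostics;
using System.IO;

class ExifToolSingle
{
    // Runs one exiftool process per file. Because the image bytes travel
    // through stdin, this works for file names that cannot be represented
    // in the system ANSI code page.
    static string ReadDates(string path)
    {
        ProcessStartInfo psi = new ProcessStartInfo("exiftool.exe",
            "-S -DateTimeOriginal -CreateDate -ModifyDate -");
        psi.UseShellExecute = false;
        psi.RedirectStandardInput = true;
        psi.RedirectStandardOutput = true;

        using (Process p = Process.Start(psi))
        {
            byte[] data = File.ReadAllBytes(path);  // any Unicode file name works here
            p.StandardInput.BaseStream.Write(data, 0, data.Length);
            p.StandardInput.Close();                // signal EOF to exiftool
            string output = p.StandardOutput.ReadToEnd();
            p.WaitForExit();
            // Output lines look like "DateTimeOriginal: 2010:04:16 17:11:00"
            // and can be parsed with DateTime.ParseExact("yyyy:MM:dd HH:mm:ss").
            return output;
        }
    }

    static void Main(string[] args)
    {
        Console.Write(ReadDates(args[0]));
    }
}
```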

Reading metadata using ExifTool (-fast)

Same as above, but using ExifTool's -fast option, which stops reading from stdin early once a sufficient amount of metadata has been found. This should increase performance especially when reading files directly over a slow network, but as we can see, it does not make a big difference in our case. Throughput increases by less than 10 percent.

Duration for 1000 files: 317000 ms.

Reading metadata using ExifTool (-fast2)

In addition to the -fast option, using -fast2 should allow for faster processing by omitting maker notes. The effect is small in our case, which might also be because not all images in the test set contain maker notes. As we will see later, the actual work performed by ExifTool is very small compared to the Perl startup time. Based on the evidence gathered in this test, Perl consumes as much as 98.4% of the total execution time, whereas reading and parsing the JPEG file takes only 1.6%.

Duration for 1000 files: 311000 ms.

Reading metadata using ExifTool (without extraction)

Since exiftool.exe unpacks a payload of 967 files in 60 folders (8.67 MB), one approach was to skip the extraction process and call ExifTool directly in the temporary folder with all the PAR environment variables set. Unfortunately, this did not deliver any improvement worth the effort, so we can say that the time spent on extraction is hardly relevant.

Duration for 1000 files: 314000 ms.

Reading metadata using ExifTool (multiple threads)

Since the entire operation is clearly CPU bound, the obvious optimization on a multi-core / hyper-threaded system is to execute ExifTool in parallel. For our test case, we execute 8 instances of ExifTool at the same time, using 8 threads (the test system has 8 virtual cores). This results in a significant speed-up, more than 3 times as fast as serialized execution:

Duration for 1000 files: 99000 ms.
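One way to sketch the 8-thread fan-out, using plain threads since .NET 3.5 has no Parallel.ForEach yet. The processFile delegate stands in for the per-file ExifTool call; the striped work split is an illustrative choice, not necessarily the original one:

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

class ParallelExifTool
{
    // Runs processFile over the file list on threadCount worker threads.
    // Thread t handles files t, t + threadCount, t + 2 * threadCount, ...
    static void RunParallel(List<string> files, int threadCount,
                            Action<string> processFile)
    {
        Thread[] threads = new Thread[threadCount];
        for (int t = 0; t < threadCount; t++)
        {
            int start = t; // capture the loop variable for the closure
            threads[t] = new Thread(delegate()
            {
                for (int i = start; i < files.Count; i += threadCount)
                    processFile(files[i]);
            });
            threads[t].Start();
        }
        foreach (Thread th in threads)
            th.Join(); // wait until all workers are done
    }

    static void Main()
    {
        List<string> files = new List<string> { "a.jpg", "b.jpg", "c.jpg" };
        RunParallel(files, 8, delegate(string f)
        {
            Console.WriteLine("processed " + f); // replace with the exiftool call
        });
    }
}
```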

Reading metadata using ExifTool (batch mode)

The end of the road? With the multi-threaded approach we have reached the maximum optimization possible when calling ExifTool separately for each file. This technique gave us the advantage of being able to handle file names with arbitrary character sets, along with detailed progress and error reporting.

The only way to get more (much more) throughput is to use ExifTool in batch mode, thus minimizing the impact of the Perl startup time.

As before, ExifTool has been set to read three date tags from each file and format the text output. This time we are using JSON as the output format, since it allows us to easily parse the result for each file. To avoid building a command line that is too long, we write all file names into an 'argfile', which is piped to stdin of ExifTool using -@ -.

This approach is F A S T! We are suddenly doing the same job within 1.6% of the previous time!

Duration for 1000 files: 5669 ms.
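A sketch of the batch call: the file names are written as a UTF-8 argfile to ExifTool's stdin (-@ - means "read the argument file from stdin"), and -j requests JSON output. The option names are real ExifTool options; error handling and JSON parsing are omitted:

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Text;

class ExifToolBatch
{
    // One exiftool process for the whole file list. Per the article, every
    // path passed this way must survive the UTF-8/ANSI roundtrip check.
    static string ReadBatch(string[] files)
    {
        ProcessStartInfo psi = new ProcessStartInfo("exiftool.exe",
            "-j -DateTimeOriginal -CreateDate -ModifyDate -@ -");
        psi.UseShellExecute = false;
        psi.RedirectStandardInput = true;
        psi.RedirectStandardOutput = true;
        psi.StandardOutputEncoding = Encoding.UTF8;

        using (Process p = Process.Start(psi))
        {
            StreamWriter w = new StreamWriter(
                p.StandardInput.BaseStream, new UTF8Encoding(false));
            foreach (string f in files)
                w.WriteLine(f);  // one argument per line in the argfile
            w.Close();           // EOF ends the argument list
            // Note: for very large result sets, read stdout on a background
            // thread instead, to avoid pipe-buffer deadlocks.
            string json = p.StandardOutput.ReadToEnd();
            p.WaitForExit();
            return json;         // JSON array: one object per readable file
        }
    }

    static void Main(string[] args)
    {
        Console.Write(ReadBatch(args));
    }
}
```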

Reading metadata using ExifTool (batch mode, -fast2)

No big improvement here; we end up with almost the same time. This might be because only part of the tested files contain maker notes, so results may vary.

Duration for 1000 files: 5422 ms.

Conclusion

In the test above we have learned that iterating file by file is extremely slow when processing large numbers of files with ExifTool. The performance impact is caused by the overhead of starting the Perl interpreter, and is therefore difficult to minimize when calling ExifTool once per file. Depending on the kind of data needed, it is worthwhile to look into other solutions such as batch processing or a different API.

When using ExifTool in batch mode, it is strongly recommended to check whether a file's name and path can be represented in the current system's code page; otherwise such files will not be read. It is not safe to call ExifTool with MS-DOS '8dot3' file names, since these are not converted to ANSI in the case of East Asian characters. Also, these names are not guaranteed to be present for every file on an NTFS file system.

For an ExifTool-only solution, the following approach is recommended when reading larger numbers of files:

  1. Create a list of image files to be processed.
  2. Convert each file name into a UTF-8 byte sequence and decode it using the current system's ANSI code page. Then compare the resulting string to the original path of the file. If the two are not equal, you must not use this file in batch processing (ExifTool cannot open it) and should instead read it in a separate call. Why convert to UTF-8 and then to ANSI? This might be confusing at first, but it mimics exactly what happens behind the scenes: you need to pass the file names to ExifTool as UTF-8, since they appear again in the output (which should be UTF-8), while the files themselves are opened using the ANSI interpretation of those bytes.
  3. Write the list of UTF-8 encoded file names to stdin of ExifTool; do not pass them on the command line, since you might hit the maximum command-line length and truncate the list unknowingly.
  4. Call ExifTool, preferably with the -j option to generate output in JSON format. JSON allows for more efficient parsing of the result (compared to XML).
  5. Parse the result and ensure you get a result for each file. There are many libraries for JSON parsing; in .NET it is done in a few lines of code. If you had passed file names in ANSI encoding in the previous step, you would run into errors here, because your output would contain a mix of encodings.
  6. All files which did not pass the UTF-8 to ANSI encoding roundtrip need to be processed one by one.
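Step 2's encoding roundtrip check can be sketched like this. Note that Encoding.Default is the system ANSI code page on .NET Framework (on newer runtimes it is UTF-8, so this check only behaves as described on Framework/Windows):

```csharp
using System;
using System.Text;

class AnsiRoundtrip
{
    // A path is safe for the batch only if its UTF-8 byte sequence, when
    // re-interpreted in the system ANSI code page, still yields the original
    // path. In practice this restricts the batch to ANSI-safe names;
    // everything else must be read in a separate per-file call.
    static bool IsBatchSafe(string path)
    {
        byte[] utf8 = Encoding.UTF8.GetBytes(path);          // what we send
        string ansiView = Encoding.Default.GetString(utf8);  // what exiftool sees
        return ansiView == path;
    }

    static void Main()
    {
        Console.WriteLine(IsBatchSafe(@"C:\images\photo.jpg"));
    }
}
```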

Source Code

The source code of the test app is available for download here.

  1. Phil Harvey
    April 16th, 2010 at 17:11 | #1

    A very detailed post and a useful reference. Thanks. I hadn’t done measurements myself of the Perl compilation overhead but I knew it would be significant.

    The startup cost of extracting files from the exiftool package is very significant too, but only occurs the first time exiftool is run. The temporary files are not deleted when exiftool exits and they are not re-extracted if they already exist the next time exiftool is run. So one wouldn’t expect there to be any difference when you ran exiftool directly from the PAR directory, as you determined.

    - Phil

  2. Thomas Laimer
    April 22nd, 2010 at 08:33 | #2

    Hi Christian, I’d be happy if I could take a look into the sources. Unfortunately, the link is broken :/

  3. April 22nd, 2010 at 09:25 | #3

    Post has been updated. Thanks Thomas.

  4. Doug
    January 2nd, 2011 at 17:16 | #4

    Just discovered this myself and googled to see who else had. The solution... exiv2. It doesn't read as many types of tags, and the syntax and manpage are a bit less clear, but it is MANY times faster than exiftool. I haven't done the measurements, but exiftool is, yup, about 3 per second, and exiv2 by eye looks more like 40 per second (give or take). And with exiv2 I'm reading ALL tags and grepping a couple of them at that speed.

  5. Doug
    January 3rd, 2011 at 03:04 | #5

    Post got deleted first time.. odd. I find this same problem with very similar speeds from a bash prompt. The solution was exiv2. It seems to have a little less functionality but it is at least ten times faster. Batch processing is not an option for my needs as scripts load images in memory and do multiple actions based on conditional criteria. Maybe exiv2 isn’t working in windows yet, I don’t know. For file serving, linux is a good part of the solution anyway if you’re not using it (and I like windows for other things, not a flamer). Maybe the post was deleted because I’m using linux and a bash interface? The code might still be found useful by someone and can be used from a prompt or from libraries obviously. It’s opensource. Also many linux users might find themselves reading this since exiftool is also opensource.

  6. January 4th, 2011 at 00:19 | #6

    Hello Doug,
    your first post did not get deleted, it was just invisible since posts only show up after they have been approved.
    Regarding exiv2, I have had the same experience you describe. It is extremely fast compared to exiftool. This is because exiv2 is written in C++ and hence does not have the Perl overhead of exiftool. The strength of ExifTool is not speed, but reliable reading AND writing of a huge range of tags and file formats. Especially the writing of metadata is tricky at times, with ambiguous format definitions, character set problems and so on. Plus, ExifTool is very actively maintained by its author Phil and a relatively big community. Although there are several techniques and features for optimizing speed when using ExifTool, it will never get close to a natively compiled program – so when top speed is needed, other (open source) alternatives need to be considered.

  7. November 18th, 2011 at 12:49 | #7

    If you use exiftool's -stay_open option, then exiftool is really, really fast! It's a bit tricky, but well worth the effort if you're writing an exiftool-ing app that may need multiple calls to exiftool and performance is important.

    Rob

