In late 2007 I was met with harsh criticism from within the industry. During a panel discussion on vision software, I had proposed working toward making the performance of vision software comparable for end users. Following the example of the EMVA 1288 standard for cameras, a binding definition of performance criteria and a precise description of benchmark scenarios could be used to compare the various software solutions for the different vision tasks. The resulting data could help users choose suitable tools and products for their applications.
Back then, some of the software suppliers did not receive this proposal well. They pointed to the difficulty of drawing up a clear specification. They also argued that suppliers need to set themselves apart from the competition precisely through the differences in their software, which would make direct comparison impossible.
A couple of weeks ago, however, Dr. Wolfgang Eckstein of MVTec publicly put forward a concrete proposal for how such a benchmark could be designed and carried out. He, too, suggests defining performance criteria for typical vision applications such as barcode identification or pattern matching, establishing a uniform, fixed framework for the benchmark, and then carrying out the comparison on selected, standardized, and well-documented image data.
Incidentally, a very similar approach is already established in other industries. The British Home Office Scientific Development Branch (HOSDB), for example, provides video analytics companies with libraries of video footage (i-LIDS) for benchmarking their software in applications such as people tracking across multiple cameras, perimeter surveillance, and detection of abandoned luggage. This gives suppliers the opportunity to test the capabilities of their products in a well-defined scenario with relevant input data, while potential customers can compare the performance of the different software packages.
In both cases, the challenge of course lies in choosing sensible, application-relevant image data, and this becomes all the more demanding the more complex the vision task is. While it may be relatively straightforward to settle on images and test scenarios for determining barcode identification performance, doing the same for surface inspection quickly becomes almost unmanageably complex.
But there is another aspect that a pure performance benchmark of algorithms does not cover, yet it matters just as much when users judge the value of a product: the usability of the software. How laborious is it to set up the tool's parameters, how transparent is the extraction of results, and how coherent are the system and result messages and reports? The answers to these questions can decide the success or failure of an application, regardless of how well the underlying algorithms perform. If the goal is to create transparency and give users impartial decision criteria, this aspect of vision software performance must be included in any benchmark.
All things considered, benchmarking vision software is by no means a small feat, but it is certainly worth the effort. One can only hope that other players in the industry will take up Dr. Eckstein's initiative. The real benefit will only be achieved if the performance criteria, the benchmark scenarios, and the image database are defined on the basis of the input, know-how, and experience of many experts, and if many of the suppliers then support both the procedure and its results.
Gabriele Jansen
Posted by gabrielejansen