We do math
Good as today's vision libraries are, they aren't the answer to every problem. Sometimes a tailor-made solution is the only route to success. That's why we like to solve unique vision problems with math.
Why worry about math?
To answer the question in a few words: math is at the very heart of all your vision software. Because vision software is so hard to program, a number of good libraries have been developed over the years: OpenCV, VXL, IVT, LTI-Lib, and CVIPtools, to name a few. MATLAB also offers various vision operations out of the box. All of them try to make real-time computer vision programming easier.
Generally speaking, everything goes well if one stays within the boundaries of the library. Performing a specific task usually requires a series of different steps or operations on the image.
Serious problems occur when a required operation is not available in the library. For instance, OpenCV does not offer fuzzy C-means clustering. Nor is the twin curve fitter used in the broken bone detection something one finds in a standard computer vision library. The next logical step seems to be writing your own additions to the library. However, these libraries are designed to be used as collections of high-level routines; fetching the underlying arrays via low-level manipulations comes with performance penalties.
This brings us to a second point. To fully exploit the libraries, you have to know pretty well how they work behind the curtains. Python, as an interpreted language, is always slower than C++, a compiled language. If speed is important, choose a C++ library; if you're going for convenience, Python is your choice. OpenCV is mainly a static build, which can force the operating system to keep many instances of an image in memory. Even worse, a subroutine that was not designed by an experienced programmer can allocate and deallocate memory on every call, reducing performance. That is not a major issue on a modern, fast computer with plenty of memory, but on an embedded platform it can be a show stopper.
Needless to say, embedded software requires even more mathematical insight. For example, which variable type can be used where? Byte, integer, or float: a question every programmer must answer time and again.
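A minimal sketch of why that choice matters: the same image stored in three common types differs by a factor of eight in memory footprint. (The resolution below is just an illustrative example, not one taken from a specific project.)

```python
import numpy as np

# A Full HD grayscale image stored in three common element types.
shape = (1080, 1920)
for dtype in (np.uint8, np.int32, np.float64):
    img = np.zeros(shape, dtype=dtype)
    # nbytes counts the raw pixel buffer: 1, 4, or 8 bytes per pixel.
    print(dtype.__name__, img.nbytes // 1024, "KiB")
```

On an embedded target with a few megabytes of RAM, picking `float64` where a byte would do can be the difference between fitting in memory and not.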
In daily use, computer vision algorithms depend on parameters and variables. Without any understanding of the underlying math, programming easily degenerates into guessing values and then declaring them 'divine'. It works, so hands off!
All of this explains why more than sixty percent of our requests for assistance come from companies with well-trained programmers.
Kubelka-Munk theory
In color theory, the Kubelka-Munk theory describes the spectrum that results from an ink layer of a given thickness. With a semi-infinite ink thickness the formula is relatively simple; for a thin layer, or a stack of different colored inks, it becomes more complicated. To determine the absorption coefficient K and the scattering coefficient S, an evolutionary algorithm (NSGA-II) is used. Thousands of different color patches, printed in several thicknesses, feed the algorithm. After each iteration a selection is made from a population of 40 solutions: the 20 best are used to synthesize the remaining 20 with genetic cross-over and mutation operators. That, by the way, is why it is called an evolutionary algorithm. A two-dimensional error space ranks the solutions. One dimension is the average error found in the solid colors (C, M, Y on the x-axis in the video); the other is calculated from the mixed colors (CM, CY, MY on the y-axis). As the iterations progress, the errors decline and the characteristic Pareto frontier becomes visible. The video shows three cases.
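The evolutionary loop described above can be sketched roughly as follows. The toy objective function and the simple rank-by-total-error selection are stand-ins: the real application evaluates Kubelka-Munk color errors over thousands of patches, and full NSGA-II uses non-dominated sorting to keep the whole Pareto frontier rather than a single scalar ranking.

```python
import random

POP, KEEP = 40, 20  # population of 40, keep the 20 best each iteration

def errors(candidate):
    # Hypothetical stand-in for the real two error dimensions: the mean
    # error of the solid patches (C, M, Y) and of the mixed patches
    # (CM, CY, MY) for a given (K, S) candidate.
    k, s = candidate
    return (abs(k - 0.3), abs(s - 0.7))

def crossover(a, b):
    # Simple averaging cross-over of two parents.
    return tuple((x + y) / 2 for x, y in zip(a, b))

def mutate(c, sigma=0.1):
    # Gaussian mutation around the candidate.
    return tuple(x + random.gauss(0, sigma) for x in c)

random.seed(0)
pop = [(random.random(), random.random()) for _ in range(POP)]
for generation in range(100):
    # Simplified selection: rank by the sum of both errors.
    pop.sort(key=lambda c: sum(errors(c)))
    parents = pop[:KEEP]
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(POP - KEEP)]
    pop = parents + children

best = min(pop, key=lambda c: sum(errors(c)))
```

Because the 20 elites survive unchanged, the best solution can only improve from one generation to the next, which is what makes the errors in the video decline monotonically.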
Principal component analysis
Spectra of light are usually measured in small bands of 5 nm. For visible light (400-780 nm) this results in 77 values, or, to put it another way, a vector with 77 elements. Apart from laser or colored LED light with their narrow spectral peaks, most natural sources have a gradual course. This makes principal component analysis possible, a technique that reduces dimensionality. The method works with a small set of representative vectors.
A weighted sum of these so-called eigenvectors approximates the original vector. The more eigenvectors are used, the better the approximation and the smaller the residual error. With a set of 8 to 10, the results are more than acceptable. This principal component analysis scheme forms the backbone of an application that predicts RGB values of colored surfaces illuminated by different light sources.
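The scheme can be sketched like this. The training spectra below are synthetic (smooth mixtures of broad Gaussians standing in for measured reflectance curves), but the mechanics, eigenvectors from the training set, then a weighted sum of 8 of them to approximate a new spectrum, are the same.

```python
import numpy as np

rng = np.random.default_rng(42)
wavelengths = np.arange(400, 781, 5)   # 400-780 nm in 5 nm bands: 77 values
n_bands = wavelengths.size

def random_spectrum():
    # Hypothetical smooth spectrum: a sum of three broad Gaussian bumps.
    centers = rng.uniform(400, 780, size=3)
    widths = rng.uniform(40, 120, size=3)
    amps = rng.uniform(0.1, 1.0, size=3)
    return sum(a * np.exp(-((wavelengths - c) / w) ** 2)
               for a, c, w in zip(amps, centers, widths))

training = np.array([random_spectrum() for _ in range(500)])

# Eigenvectors of the training set via SVD of the mean-centered data.
mean = training.mean(axis=0)
_, _, vt = np.linalg.svd(training - mean, full_matrices=False)
eigenvectors = vt[:8]                  # keep the first 8 components

# Approximate a new spectrum as the mean plus a weighted sum of eigenvectors.
target = random_spectrum()
weights = eigenvectors @ (target - mean)
approx = mean + weights @ eigenvectors
residual = np.linalg.norm(target - approx) / np.linalg.norm(target)
```

Because smooth spectra concentrate almost all their variance in the first few components, the relative residual with 8 eigenvectors is already small, exactly the effect the slideshow animates.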
The slideshow animates the increasing number of used vectors. The thin black line is the measured spectrum of the selected color; the thick colored line represents the principal component analysis outcome. The Windows app used can be downloaded here.
Wavelet transformation
A wavelet transformation is a powerful tool in computer vision. It can be used for feature recognition, noise reduction, image compression and enlargement, as well as in compressive sensing techniques. In this case, a wavelet is used to compress images for deep learning.
Just as sines and cosines are used in the Fourier transformation, here signals are decomposed by a specific short-term waveform. Every time the signal correlates with the wavelet, it is marked. Using a scheme of subtraction and downscaling, the original signal or image is anatomized into different low and high subbands, as can be seen in the third slide. When applied to natural images, these high subbands contain surprisingly little energy. Thresholding all small numbers to zero results in a high-quality compression. Even if 99.6% of all information is lost, the image remains acceptable, especially compared to a JPEG compression of about the same ratio. Although the images are not suitable for high-quality desktop publishing, they work well in an image dataset for deep learning. The wavelet app used can be downloaded here.
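The subband scheme can be illustrated with the simplest wavelet of all, the Haar wavelet; the averaging/differencing below is a minimal sketch, not the particular wavelet used in the app. One level splits an image into a low subband (LL) and three high subbands (LH, HL, HH); on smooth images the high subbands are small, so zeroing their small coefficients barely changes the reconstruction.

```python
import numpy as np

def haar2d(img):
    # One level of the 2D Haar transform: average/difference pixel
    # pairs along rows, then along columns.
    a = (img[:, 0::2] + img[:, 1::2]) / 2   # row low-pass
    d = (img[:, 0::2] - img[:, 1::2]) / 2   # row high-pass
    ll = (a[0::2] + a[1::2]) / 2            # low-low subband
    lh = (a[0::2] - a[1::2]) / 2
    hl = (d[0::2] + d[1::2]) / 2
    hh = (d[0::2] - d[1::2]) / 2
    return ll, lh, hl, hh

def inv_haar2d(ll, lh, hl, hh):
    # Exact inverse: sum/difference undoes the average/difference pairs.
    a = np.empty((ll.shape[0] * 2, ll.shape[1]))
    a[0::2], a[1::2] = ll + lh, ll - lh
    d = np.empty_like(a)
    d[0::2], d[1::2] = hl + hh, hl - hh
    img = np.empty((a.shape[0], a.shape[1] * 2))
    img[:, 0::2], img[:, 1::2] = a + d, a - d
    return img

# A smooth synthetic "image": most of its energy lands in the LL subband.
x, y = np.meshgrid(np.linspace(0, 1, 64), np.linspace(0, 1, 64))
img = np.sin(3 * x) * np.cos(2 * y)

ll, lh, hl, hh = haar2d(img)
# Compression step: threshold small high-band coefficients to zero.
for band in (lh, hl, hh):
    band[np.abs(band) < 0.01] = 0.0

restored = inv_haar2d(ll, lh, hl, hh)
err = np.abs(restored - img).max()      # stays small despite the zeroing
```

Repeating the same split on the LL subband gives the multi-level pyramid of low and high subbands shown in the third slide.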