Pattern recognition algorithm student with one time (One-Shot learning)

Introduction

I want to present you the result of my experiments with pattern recognition algorithms with learning from the first time (the so-called One-Shot Learning). As a result of the experiments, certain approaches to image structuring were developed and, as a result, they were embodied in several interconnected algorithms and a test application on Android that can be used to test the quality and performance of the algorithms.

My goal was to create an algorithm with a clear principle of operation that can find abstract dependencies in the picture the first time (learn) and show an acceptable recognition quality (search for such abstract dependencies) on subsequent recognition cycles. In this case, the decision-making logic should be transparent, amenable to analysis, closer to the linear algorithm. On a conventional scale, where at one end is the brain and at the other is a CNC machine, it is much closer to the machine than the neural network.

Why not neural networks?

At the moment, neural networks reign in recognition tasks, in particular, CNN is a kind of standard for pattern recognition. However, in my opinion, their application is not unlimited and it is necessary to look for other approaches.

I will cite several reasons against neural networks:

Requires large datasets for training, which may simply not be available
Great power for learning and a great time learning each picture.
The opacity of the algorithm, the inability to debug and direct impact on the result. It is very difficult if not impossible to understand the logic of the distribution of weights. This is both strength and weakness.

How does it work

The basic idea is this: the sample image must be structured, i.e. the information in it should be reduced to the necessary minimum, but so that the meaning is not lost. For example, artists paint sketches - in just a few precise lines, an artist can depict a person’s face or an object and the viewer will understand what is depicted. A photo contains a matrix of N * M pixels, each pixel contains some bit of color information, and if you present it all in the form of line parameters, the amount of information decreases dramatically and the processing of such information is much simpler. Approximately the same should do the algorithm. He should highlight the main details in the frame - that which carries the basic information and discard all unnecessary.

The algorithm finds the structure of vectors along the boundaries of objects in the sample and the same structure in the recognizable image.

In order to get a vector image goes through several stages of processing:

Translated into monochrome using a simple formula (Red + Green + Blue) / 3
Calculates the gradient for each point of the matrix.
The most significant gradient areas are found.
Chains of vectors covering these areas are searched.
Next, there is a looping of steps to obtain as a result of the minimum number of vectors carrying maximum information.

In the analyzed algorithm, the same thing happens. Next, the resulting arrays of vectors are compared:

At first, the algorithm tries to cling to some similar parts (local clusters). For example, he may find an eyebrow similar to the eyebrow in a sample, and then find a nose that looks like a nose.
And then we look for a similar relationship between local clusters. For example eyebrow + nose + another eyebrow. Already it turns out more complex cluster.
Etc. until you get a picture of the relationship between the clusters that will bring together all or almost all the vectors of the image. Those. for example from eyebrows, eyes, nose, etc. face fail.

Thus, small details are included in the overall picture and an avalanche pattern recognition occurs .
The classification itself is based on the principle of finding the most similar image from the stored ones. The most similar is the one with the greatest number of coinciding vectors with the smallest deviations relative to the total volume of the vectors in the sample.

The general scheme of the algorithms:

Training in several stages

Despite the fact that the algorithm can work efficiently from one sample, it is possible to improve the accuracy of recognition by analyzing several samples. This is not implemented in the demo version, so I just tell you about this feature, it is very simple. The principle of learning on multiple samples is to drop extra vectors. The extraneous ones are those that are not included in the mutually found cluster of vectors. For example, there may be a shadow on the sample that is recognized as a border, and it may not be on the next sample.

Thus, if a vector is included in the cluster, it is found in the saved sample and in the analyzed one it gets a +1 point, and if not, it will receive nothing. After some training, vectors that scored few points are removed from the stored sample and are no longer used for analysis.

You can also make a visual editor that simply allows you to remove unnecessary vectors from the frame after the first training.

What can be used

To be honest, I concentrated all efforts on the algorithm itself. Although I work with the environment of business solutions and industrial automation, I see one application - product recognition in warehouses and production lines - there are no large datasets here - then the sample should be shown 1 time and then recognized. Like binding barcodes only without barcodes. Well, in general, the application is the same as that of any other recognition algorithm. Application due to the capabilities and limitations of the algorithm.

Work test application

The application works with a matrix of 100 * 100 pixels, converts the image to a monochrome matrix of this size. The algorithm does not matter at what angle the sample is and its size is in some limits, too.

The result of the selection of significant areas of the current image and the matching vectors in it (in green) are shown on the left, and the structures of the vectors found and most suitable from the saved ones and the similar vectors in the saved structure are highlighted in red on the right. Thus, the structures of vectors are highlighted in red and green color, which the algorithm considers similar.

You can save multiple samples. And showing a new image, the algorithm will find the most suitable of them and show similar parts.

Source: https://habr.com/ru/post/414425/

All Articles