ConvNets. Project prototyping with Mask R-CNN

Hi, Habr! We finally have another installment in the series of materials from Cyril Danilyuk, a graduate of our Big Data Specialist and Deep Learning programs, about using the currently popular Mask R-CNN neural network as part of an image classification system, namely assessing the quality of a cooked dish from a set of sensor data.

Having considered a toy dataset of road sign images in the previous article, we can now move on to the problem I faced in real life: “Is it possible to implement a Deep Learning algorithm that could distinguish high-quality dishes from bad ones from photos alone?” In short, the business wanted this:

What a business imagines when it thinks about machine learning:

This is an example of an ill-posed task: in this case it is impossible to determine whether a solution exists, is unique and is stable. Moreover, the very formulation of the problem is quite vague, not to mention the implementation of its solution. Of course, this article is not about the effectiveness of communication or project management; however, it is important to note: never touch projects in which the final result is not defined and fixed in the terms of reference (ToR). One of the most reliable ways to deal with such uncertainty is to first build a prototype and then, using the new knowledge gained, structure the rest of the task. That is what we did.

Formulation of the problem


In my prototype, I focused on one dish from the menu, the omelet, and built a scalable pipeline that assesses the quality of a cooked omelet. In more detail, this can be described as follows:



Input Images

The main goal of the pipeline is to learn to combine several types of signals (for example, images from different angles, a heat map, etc.) by obtaining a compressed representation of each of them and passing these features through a neural network classifier for the final prediction. This way, we can both build our prototype and keep it practically applicable in further work. Below are some of the signals used in the prototype:


General view of pipeline


Note that I will have to skip a few important steps, such as exploratory data analysis, building a baseline classifier, and active labeling (a term I proposed for semi-automatic object annotation, inspired by the Polygon-RNN demo video) for the Mask R-CNN pipeline (more on this in the following posts).

Take a look at the pipeline as a whole:


In this article, we are interested in the stages of Mask R-CNN and classification within the pipeline.

Next, we will look at three stages: 1) using Mask R-CNN to build masks of omelet ingredients; 2) ConvNet classifier based on Keras; 3) visualization of results using t-SNE.

Stage 1: Mask R-CNN and Mask Building


Mask R-CNN (MRCNN) is currently at the peak of its popularity. From the original Facebook article to the Data Science Bowl 2018 on Kaggle, Mask R-CNN has established itself as a powerful architecture for instance segmentation (i.e., not only per-pixel image segmentation, but also the separation of several objects belonging to the same class). In addition, Matterport's Keras implementation of MRCNN is a pleasure to work with: the code is well structured, has good documentation and works right out of the box, albeit more slowly than expected.

In practice, especially when developing a prototype, it is crucial to have a pre-trained convolutional neural network. In most cases, a data scientist's set of labeled data is very limited or nonexistent, while a ConvNet requires a large amount of labeled data to converge (for example, the ImageNet dataset contains 1.2 million labeled images). Here transfer learning comes to the rescue: we can freeze the weights of the convolutional layers and train only the classifier. Freezing the convolutional layers is important for small datasets, since this technique prevents overfitting.

Here is what I got after the first training epoch:


Object segmentation result: all key ingredients are recognized.

In the next phase of the pipeline ( Process Inferenced Data for Classifier ), you need to cut out the part of the image that contains the plate and extract a two-dimensional binary mask for each ingredient on that plate:


Image crops with key ingredients in the form of binary masks
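This cropping step can be sketched with plain numpy. The `result` dict below follows the output format of Matterport's MRCNN (`model.detect([image])[0]` returns `'rois'` as `(y1, x1, y2, x2)` boxes, `'masks'` as an `(H, W, N)` boolean array, and `'class_ids'`); the toy detection data and the plate class id are my own assumptions for illustration.

```python
import numpy as np

def crop_ingredient_masks(result, plate_class_id=1):
    """Crop every ingredient mask to the bounding box of the plate.

    `result` follows the Matterport MRCNN output format: 'rois' is (N, 4)
    as (y1, x1, y2, x2), 'masks' is (H, W, N) bool, 'class_ids' is (N,).
    Returns a list of (class_id, cropped_binary_mask) pairs.
    """
    rois, masks, class_ids = result["rois"], result["masks"], result["class_ids"]
    # Use the plate's bounding box as the crop window.
    plate_idx = int(np.where(class_ids == plate_class_id)[0][0])
    y1, x1, y2, x2 = rois[plate_idx]
    crops = []
    for i in range(masks.shape[-1]):
        if i == plate_idx:
            continue  # skip the plate itself
        crops.append((int(class_ids[i]), masks[y1:y2, x1:x2, i]))
    return crops

# Toy data standing in for real MRCNN inference output.
H, W = 128, 128
masks = np.zeros((H, W, 3), dtype=bool)
masks[10:110, 10:110, 0] = True   # plate
masks[30:50, 30:50, 1] = True     # ingredient of class 2
masks[60:80, 40:70, 2] = True     # ingredient of class 3
result = {
    "rois": np.array([[10, 10, 110, 110], [30, 30, 50, 50], [60, 40, 80, 70]]),
    "masks": masks,
    "class_ids": np.array([1, 2, 3]),
}
crops = crop_ingredient_masks(result)  # two ingredient masks, each 100x100
```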

These binary masks are then combined into an 8-channel image (since I defined 8 mask classes for MRCNN), and we get Signal #1 :


Signal #1 : an 8-channel image consisting of binary masks (shown in color for better visualization)
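The merge into an 8-channel image might look like this sketch, assuming class ids run from 1 to 8 and instances of the same class are OR-ed into a single channel (that last detail is my assumption, not stated in the article):

```python
import numpy as np

def masks_to_signal(crops, n_classes=8):
    """Merge (class_id, binary_mask) pairs into an (H, W, n_classes) image.

    One channel per mask class; several instances of the same class
    are OR-ed into the same channel.
    """
    h, w = crops[0][1].shape
    signal = np.zeros((h, w, n_classes), dtype=bool)
    for class_id, mask in crops:
        signal[:, :, class_id - 1] |= mask
    return signal

# Two toy ingredient masks of classes 2 and 3.
m2 = np.zeros((100, 100), dtype=bool); m2[10:30, 10:30] = True
m3 = np.zeros((100, 100), dtype=bool); m3[50:70, 40:60] = True
signal_1 = masks_to_signal([(2, m2), (3, m3)])  # shape (100, 100, 8)
```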

To get Signal #2 , I counted how many times each ingredient occurs in the crop of the plate and obtained a set of feature vectors, one per crop.

Stage 2: The ConvNet Classifier in Keras


The CNN classifier was implemented from scratch in Keras. I wanted to combine several signals ( Signal #1 and Signal #2 , with the possibility of adding more data in the future) and let the neural network predict the quality of the dish. The following architecture is a first attempt and far from ideal:



A few words about the classifier architecture:


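A two-input classifier of this kind could be sketched with the Keras functional API as below. This is not the article's exact architecture (the figure above defines it); the branch sizes, input resolution and the binary "quality" output are my assumptions for illustration.

```python
from tensorflow import keras
from tensorflow.keras import layers

N_CLASSES = 8  # mask classes defined for MRCNN

# Signal #1: the 8-channel binary-mask image goes through a conv branch.
img_in = keras.Input(shape=(100, 100, N_CLASSES), name="signal_1")
x = layers.Conv2D(16, 3, activation="relu")(img_in)
x = layers.MaxPooling2D()(x)
x = layers.Conv2D(32, 3, activation="relu")(x)
x = layers.GlobalAveragePooling2D()(x)

# Signal #2: the ingredient-count vector goes through a dense branch.
vec_in = keras.Input(shape=(N_CLASSES,), name="signal_2")
y = layers.Dense(16, activation="relu")(vec_in)

# Merge both signals and predict dish quality (good vs. bad).
merged = layers.concatenate([x, y])
z = layers.Dense(32, activation="relu")(merged)
out = layers.Dense(1, activation="sigmoid", name="quality")(z)

model = keras.Model(inputs=[img_in, vec_in], outputs=out)
model.compile(optimizer="adam", loss="binary_crossentropy")
```

The functional API is what makes the "add another signal later" requirement cheap: a new input branch can be concatenated into `merged` without touching the rest of the graph.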
Stage 3: Visualize Results with t-SNE


To visualize the classifier's results on the test data, I used t-SNE, an algorithm that maps the original data into a lower-dimensional space (to understand how the algorithm works, I recommend reading the original article; it is extremely informative and well written).

Before visualization, I took the test images, extracted the activations of the classifier's logit layer and applied the t-SNE algorithm to this dataset. Although I have not tried different values of the perplexity parameter, the result still looks quite good:


Result of t-SNE on the test data with classifier predictions
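The projection step itself is a few lines with scikit-learn. The random matrix below merely stands in for the logit-layer activations extracted from the test set; the sample count and dimensionality are invented for the example.

```python
import numpy as np
from sklearn.manifold import TSNE

# Stand-in for logit-layer activations on the test set:
# 50 samples, 32-dimensional.
rng = np.random.default_rng(0)
logits = rng.normal(size=(50, 32))

# Project to 2-D; perplexity must be smaller than the number of samples.
embedding = TSNE(n_components=2, perplexity=10,
                 random_state=0).fit_transform(logits)
```

The resulting `(50, 2)` array can be scattered directly, colored by the classifier's predicted label, to produce a plot like the one above.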

Of course, this approach is not ideal, but it works. Still, there are quite a few possible improvements:


Conclusion

It is necessary, finally, to recognize that a business often has neither data nor explanations, much less a clearly defined task that needs to be solved. And this is good (otherwise, why would they need you?), because your job is to use various tools, multi-core processors, pre-trained models and a mix of technical and business expertise to create additional value for the company.

Start small: a working prototype can be assembled from a few toy blocks of code, and it will significantly boost the productivity of further conversations with company management. It is the data scientist's job to offer new approaches and ideas to the business.



On September 20, 2018, the “Big Data Specialist 9.0” program starts, where, among other things, you will learn to visualize data and understand the business logic behind a given task, which will help you present the results of your work to colleagues and management more effectively.

Source: https://habr.com/ru/post/412523/

