Machine learning and mobile development

As a rule, data scientists have only a vague idea of mobile development, and mobile developers do not do machine learning. Andrei Volodin, an AI engineer at Prisma, lives at the junction of these two worlds and told the hosts of the Podlodka podcast what that is like.

Taking advantage of the moment, Stas Tsyganov (Tutu.ru) and Gleb Novik (Tinkoff Bank) first of all established, once and for all, that nobody trains neural networks on mobile devices. They also figured out that machine learning, unfortunately, is not magic, and discussed modern techniques such as deep learning, reinforcement learning, and capsule networks.

And since Podlodka is an audio show about mobile development, they eventually got to mobile and learned how all of this works on mobile devices.

Below is the text version of this conversation; the podcast recording is here.

About Andrei Volodin, cocos2d and Fiber2d


GLEB: Tell us a little about yourself, please. What do you do?

ANDREY: I'm a mobile developer, but I do very little classical iOS development. My responsibilities hardly include working with UIKit. I am the main developer of the Cocos2d game engine, which is quite popular on GitHub. At the moment I hold the position of GPU engineer at Prisma. My responsibilities include integrating neural networks on video cards and working with augmented reality, in particular with ARKit.

GLEB: Cool! Cocos2d is especially interesting. As far as I know, this framework appeared quite a long time ago.

ANDREY: Yes, around 2009.

GLEB: Have you been involved with it from the very beginning?

ANDREY: No. I became the main developer only in 2015. Before that I was a core contributor. Apportable, the company that funded the development, went bankrupt, the people who were paid to work on it left, and I became the lead. Now I am an administrator on the forum, I help new users with their problems, and the last few releases were made by me. That is, I am the main maintainer at the moment.

GLEB: But is cocos2d still alive?

ANDREY: By now, rather not, primarily because it is written in Objective-C and there is a lot of legacy in it. For example, I maintain my old games written with it, and other developers maintain their legacy projects. Among current engines you may have heard of Fiber2d. That is also my project.

Fiber2d is the first Swift game engine that was ported to Android. We launched a game written entirely in Swift on both iOS and Android. You can read about it on GitHub as well. It is the next milestone in the development of the cocos2d community.

About machine learning, in simple terms


GLEB: Let's gradually move toward today's topic. Today we will talk about machine learning and everything around it, related and unrelated to mobile. To begin with, let's figure out what machine learning actually is. We will try to explain it as simply as possible, because not all mobile developers are familiar with it. Can you tell us what it is?

ANDREY: If we follow the classical definition, machine learning is the search for patterns in a data set. A classic example is neural networks, which are very popular right now. Among them are networks that do classification. A simple classification task is determining what is drawn in a picture: there is some image, and we want to understand what is in it: a dog, a cat, or something else.

Writing this with ordinary code is very hard, because it is not clear how to do it. So mathematical models are used, which are collectively called machine learning. They are based on the idea that certain patterns are extracted from a large number of examples, and then, using these patterns, predictions can be made with some accuracy on new examples that were not in the original data set. That is it, in a nutshell.

GLEB: So training is the story of changing a model with the help of a training dataset?

ANDREY: During training the model, as a rule, stays the same. That is, you choose some architecture and train it. If we take neural networks as an example (machine learning is not limited to them), initially, roughly speaking, all the weights are zeros or other identical values. As we feed our data into the learning framework, the weights change slightly with each new example, and in the end they add up to a trained model.

STAS: And the end goal of this model is to quickly get a result when you feed it data that was not in the training sample?

ANDREY: Yes, but it is not only about speed. Some tasks simply could not be solved any other way: the classification example is very non-trivial. Before classification networks took off, there was essentially no way to understand what is shown in a picture. So in some areas it is a revolutionary technology.

About manual labor and machine learning


STAS: I recently explained to my grandmother what machine learning is. She initially thought that machine learning is when a machine teaches someone. I explained to her that it is actually the other way around: we are trying to teach the machine to perform some task.

I described the problems that machine learning solves. Most of them, before machine learning took off, were done by people. Moreover, it was considered not exactly low-skilled work, but not very high-tech either, so to speak: fairly simple operations that in many cases any person can perform. Can you put it that way?

ANDREY: You could say that. In fact, such work is still needed now, but only to prepare datasets for machine learning. Indeed, in some areas, for example in medicine, machine learning makes it possible to smooth out routine tasks and ease the process somewhat. But not always. I would not say that machine learning is only about easing dull work. Sometimes it does fairly intellectual work.

STAS: Can you give an example of such intellectual work?

ANDREY: For example, our Prisma application, which surely many people have used (this is not advertising!). It is not exactly intellectual work, but people rarely redraw a photo as a painting, and the neural network does it: you give it an ordinary picture and get something new. You can then argue about whether it is beautiful or not, but the very fact that it does something a person either cannot do or would need a tremendous amount of time for is undeniable.

About history


GLEB: Yes, I think that is a great example. It is probably worth turning to history for a moment. How long has this field been developing? It seems to me it goes back almost to the very beginning of programming, in any case a very, very long time.

ANDREY: Yes, in general most of the concepts that are applied now were already developed in the 90s. Naturally, new algorithms have appeared since then, and the quality of the old ones has improved. And although it feels as if the sudden interest in machine learning came out of nowhere, people have actually been interested in it for a long time.

In the early stages progress was determined by the fact that these are mostly mathematical models, and the mathematics had long since settled in terms of discoveries.

The current explosion is connected solely with the fact that the hardware around us has become much more powerful, primarily thanks to video cards. Because today we can do huge amounts of parallel computation, new technologies have appeared: machine learning, cryptocurrencies, and so on.

For the most part, the current interest and the current wave exist simply because this has become possible. These computations could have been done before, but they took catastrophically long. Now they take quite reasonable time, and so everyone has started using them.

About hardware


STAS: I am taking a course right now, and as part of it I need to train all sorts of models. I train some of them on my work MacBook. Yes, in some cases you have to wait, maybe 5 minutes, and the models are not the best, with average accuracy around 85%, but most importantly, they work. It is clear that in production you would want that percentage to be higher, and this probably would not quite cut it.

ANDREY: Yes, such models are probably not very interesting. Most likely these are the simplest predictions and so on. In reality our training sample can weigh, say, 90 GB, and training on it can take a week. Companies like Nvidia boast that they have released a new special Tesla video card and you can now train Inception V3 in 24 hours! That is considered a real breakthrough, because it used to take several weeks.

The bigger the dataset and the more complex the model, the more time training takes. But the performance problem is not only that. In principle, if you really need to, you can wait a month. The problem is inference, that is, how you then apply this neural network. While it is being used, it also has to show good performance.

STAS: Because, among other things, we want everything to work on mobile devices, and to work quickly.

ANDREY: I do not think it initially evolved with mobile applications in mind. The boom started somewhere around 2011, and back then these were all desktop solutions. But now the community's genuine interest is fueled by the fact that it has become possible to run networks that work in real time on iPhones, among other devices.

GLEB: Stas, you said that the final result depends on how powerful your video card and your machine in general are. Does it not work otherwise?

ANDREY: It works, but I am not sure a model can be trained on a low-powered machine.

GLEB: By the way, I remember that 5 years ago, when neural networks were just starting to boom, our professors said that everything new is just the well-forgotten old. All of this already existed in the 70s and 80s, and it will not work now, since it did not work out back then. Apparently, they were wrong after all.

ANDREY: Yes. For some tasks machine learning has now really taken off. Objectively, we can say that it works.

About deep learning


GLEB: There is a fashionable term: deep learning. How does it differ from what we have been talking about so far?

ANDREY: I would not say there is a difference. There are simply subsets of machine learning, and a huge number of them. You have to understand that what is called deep learning is the part of machine learning commonly referred to as neural networks. It is called deep because neural networks have many layers, and the more layers, the deeper the network. That is where the name came from.

But there are other kinds of machine learning. For example, machine learning built on trees is still successfully used for face tracking, because it is much faster than neural networks. It is also used for ranking, advertising and other things.

So deep learning is not something separate. It is a subset of machine learning, which itself includes a lot of things. Deep learning has simply become the most popular today.

About the theory of neural networks


STAS: I wanted to talk a little about the theory of neural networks, as simply as I can. You said they have many layers. In theory, if we have one layer and some objects located on a plane, with that one layer we can essentially divide the plane into two parts, right?

ANDREY: No, not really.

STAS: Then what does a large number of layers give us, in simple terms?

ANDREY: What is a neural network? Let's break it down. It is just a mathematical function that takes a set of numbers as input and gives a set of numbers as output, and that is all.

What is inside? The most popular today are convolutional networks, inside which convolutions happen: essentially a lot of matrix multiplications whose results are added up, and these operations are performed in every layer. Plus, between the layers there is a so-called activation, which is exactly what allows neural networks to be deep.

Since a combination of linear transformations is itself a linear transformation, ten linear layers stacked together can still be represented as a single linear layer. To keep the layers from collapsing, there are certain mathematical operations between them that make the function non-linear. This is needed to increase the number of parameters.

Roughly speaking, a neural network is simply a huge array of numbers, which is then applied in a certain way to our data, for example to a picture. But a picture is also just a set of numbers, a grid of pixels. When we train the network, we are adjusting, for example, 15 million parameters (each number is a separate parameter), each of which can be nudged slightly one way or the other using certain heuristics. It is thanks to this huge number of parameters that such impressive results are obtained.

Deep learning is needed precisely so that there are many of these parameters and everything does not collapse into a single layer.
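
To make the point about activations concrete, here is a tiny sketch in plain NumPy (not tied to any framework, purely for illustration): two linear layers without an activation collapse into one, while a ReLU between them breaks the collapse.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.standard_normal((4, 3))   # "layer 1" weights
W2 = rng.standard_normal((2, 4))   # "layer 2" weights
x = rng.standard_normal(3)         # input vector

# Without an activation the stack collapses: W2 @ (W1 @ x) == (W2 @ W1) @ x.
two_layers = W2 @ (W1 @ x)
one_layer = (W2 @ W1) @ x
print(np.allclose(two_layers, one_layer))   # True

# A non-linear activation (ReLU) between the layers prevents the collapse.
relu = lambda v: np.maximum(v, 0.0)
deep = W2 @ relu(W1 @ x)
print(np.allclose(deep, one_layer))         # False in general
```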

GLEB: It seems more or less clear.

ANDREY: Deep learning is a subset of machine learning. But for some reason a lot of hype has built up around the topic: especially some time ago, it seemed you could hear about deep learning from every corner. I do not know whether that is justified or not.

GLEB: I think that such popularity is due to the fact that it gives impressive results.

About tasks


STAS: Most machine learning tasks can be solved with neural networks, right?

ANDREY: Yes.

STAS: Let's talk then about what tasks can be solved using machine learning methods.

ANDREY: Actually, this is a sensitive topic, because you really need to stop idealizing and romanticizing what is happening. As I said, there is no artificial intelligence here. It is a purely mathematical model, a mathematical function that multiplies things together, and so on.

From the outside it looks as if machine learning has now settled on certain categories of tasks. These are, for example, classification (the example we talked about at the very beginning), object tracking and segmentation. The latter is in our Sticky AI application: it cuts out the person and removes the background. There is also biomedical segmentation, for example detecting cancer cells. There are generative networks that are trained so that they can then create something new from a set of random numbers. There are style transfer tasks and others.

But at the moment there is no convenient platform and infrastructure for applying machine learning. For example, you have some problem that you, as a person, solve easily, but as a programmer you cannot solve, because of its complexity and because you cannot simply write an imperative algorithm. At the same time you cannot train a neural network for it either, primarily because of a lack of data. To train a network we need large datasets with a multitude of examples, and very strongly formalized ones at that, described according to certain rules, etc. Plus, we need the architecture of this neural network.

That is, you first need to formalize the input data as numbers, build the model architecture itself, then formalize the output data as numbers and somehow interpret them. For this you need a fairly powerful mathematical toolkit and an overall understanding of how it all works. That is why, it seems to me, the use of neural networks outside specialized companies like ours sags a little.

Neural networks have learned to solve some previously unsolved problems very well. But it is not the case that they came along and solved the whole range of unsolved problems.

GLEB: In what areas do you see global tasks for which neural networks are not suitable at all?

ANDREY: It is hard to answer off the top of my head. We do meet tasks that we work on where it is impossible to train a neural network. For example, the games industry is now very interested in machine learning, and there are even some networks that act as artificial intelligence. But in AAA games, for example, this is not used yet, because at this point it is still impossible to train the artificial intelligence of an abstract soldier to behave like a person so that it looks natural. It is complicated.

About Dota


STAS: Have you heard that artificial intelligence is already winning at Dota?

ANDREY: Yes, but that is still a somewhat different thing. Dota is a fairly mathematical game, it can be described. I do not want to offend anyone, but essentially, like checkers, it is the same game every time. There are certain rules, and you simply play within them.

But for now there are difficulties in creating natural-looking behaviour, connected primarily with a small amount of data and a small number of engineers who know how to do it.

For example, at Google engineers are trying to teach a 3D model of a person to walk using neural networks, just to make it move. It always looks awful; people do not walk like that.

About TensorFlow


STAS: You said that right now there is essentially no easy and cheap way to solve machine learning problems without understanding machine learning at all. One way or another, you have to know the subject. I would like to ask about TensorFlow. It seems Google is trying to make it so that even people who do not really understand all of this and do not have much background can solve some simple tasks. Tell us what TensorFlow is, and do you think that is possible?

ANDREY: Let's take it in order. TensorFlow is actually not the simplest thing of all. It is one of the most popular so-called learning frameworks, a general-purpose one, not only for neural networks. It is not the highest-level framework out there. There is, for example, Keras, which is a higher-level abstraction on top of TensorFlow. There you can do the same thing with much less code.

Typical tasks are solved quite simply, in particular because GitHub is already full of examples and repositories. For example, if your company is building search by pictures for a bookstore, then you are basically fine. You go to GitHub, there are examples of how to extract features from a picture, you write a search over those features, and everything is ready!
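
As a rough sketch of that search-by-features idea (assuming a pretrained Keras backbone such as MobileNetV2; the file paths and the catalog structure here are made up for illustration):

```python
import numpy as np
from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.applications.mobilenet_v2 import preprocess_input
from tensorflow.keras.preprocessing import image

# Pretrained network with the classifier head removed: its output is a
# fixed-length feature vector describing the picture.
backbone = MobileNetV2(weights="imagenet", include_top=False, pooling="avg")

def features(path):
    img = image.load_img(path, target_size=(224, 224))
    batch = preprocess_input(np.expand_dims(image.img_to_array(img), 0))
    return backbone.predict(batch)[0]

def most_similar(query_path, catalog):
    """catalog: dict mapping filename -> precomputed feature vector."""
    q = features(query_path)
    cosine = lambda a, b: float(np.dot(a, b) /
                                (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))
    return max(catalog, key=lambda name: cosine(q, catalog[name]))
```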

This is actually the case for a large number of tasks. If you can map your problem onto problems that have already been solved in machine learning in standard ways, then you are fine. But if you have something truly unique, something that cannot simply be coded up, and you also need what is called artificial intelligence, then I would not say that TensorFlow is the simplest path. Understanding it takes a decent amount of time.

STAS: I see. And on Google's side, do they offer computing power along with the framework?

ANDREY: Yes, they are developing very actively now, and in general they want everyone in the community to get into neural networks. They have a platform where, if you are willing to share your results, they will give you servers and so on. The only condition is that you have to share everything with everyone, that is, the platform is not intended for commercial use.

About problems


GLEB: Let's go over the problems that currently exist in machine learning. What is most pressing? First of all, the data.

ANDREY: Yes. Recently I even saw a meme, a sad picture with the caption "When you found a cool architecture but there is no dataset to train it on". That problem really exists.

GLEB: OK, but suppose we have a large dataset and the model trains on it for a week. How do you deal with that time? Roughly speaking, for us it is like compilation, only compilation is at worst 15 minutes on some heavy iOS project, and usually faster. Here it is the same situation, only it is two weeks. If you need to tweak something, all that time is lost.

ANDREY: Two weeks is probably a reality that has already passed. Now it is a day, two, three. But your fears are justified. Indeed, it happens that R&D work goes like this: they think "Oh, let's try this!", they change something, start training, and two days later they find out that the metrics dropped by 0.5%: "OK, that did not work, let's try this!" And they wait again. That problem does exist, and there is no getting away from it.

STAS: I would like to return to the story about my poor laptop, to the idea that the quality of a model and the success of its predictions are probably related to training time. Is there such a correlation? Let me give an example. You can first make sure that your model more or less solves your problem, say at 70%. Then you realize that everything will work better if you add more features. The more features, the longer the training takes. Say it runs for a day, but you are almost sure the model will work better.

ANDREY: Yes, that kind of optimization works, but rather in the early stages. When you are improving an existing model, it does not work that way. Just to squeeze out some gain, we sometimes have to retrain the model from scratch. That is a really time-consuming process.

STAS: I can feel that all my examples are at about a second-grade level.

About the work of developers


GLEB: That reminded me of another well-known picture about compilation, only here it is "the network is training".

STAS: If training takes a whole day, what do developers do during that time? You come in in the morning, make yourself a coffee, kick off the training, go home, and see each other tomorrow?

ANDREY: In general, work related to machine learning itself is not much like classical development. I do not practice it myself, but I can tell from my colleagues' experience.

First of all, these are people with a very strong mathematical background. Mostly they graduated from applied mathematics departments, not computer science. Their professional trait is that they constantly read scientific papers and are constantly testing new hypotheses.

In reality this work leans more towards science. To program even a rather complex neural network you need to write about 500 lines of Python, and fairly repetitive ones at that. These people usually do not feel like classic coders. They have no repositories with branches and all that. They operate in somewhat different categories, and their work is different. They often do very hardcore things, and they always have something to do while the network is training.

That is what I see from the outside. I am still more of a developer: I write more code and integrate the results of their work into mobile applications. But I can say with confidence that it differs quite significantly from the classical work of a programmer.

STAS: Does that mean that in the end the model trained in Python is rewritten into something faster, for example C++? To optimize the code so that it runs faster.

ANDREY: No, not quite. Usually a model is trained in some learning framework, say TensorFlow, and the result is a model in the TensorFlow format. Then there are several options for how to run it.

The first option is to run it wherever TensorFlow itself can run. For that you need to compile the TensorFlow core, a static library of about 1 GB, put it where we want, and run it.

Naturally, this option has limitations. On iOS, for example, it is difficult and slow. So most often there are so-called converters that can take models from different learning frameworks, whether Caffe, Torch, TensorFlow or anything else, and produce weights that you can then apply and use in your production code.

Although this is already history too, literally a year and a half ago the work of developers like me looked as follows. R&D trained some network and got a model. They extracted the weights from it and wrote them down simply as buffers of real numbers. Then they handed them to the developer (me) and said: "Code up this network!" And you wrote out all the layers, loaded the weights from the buffers into them and ran the whole thing. But nobody rewrites the neural network itself into C++.

That is, these are two separate stages: training the model in a learning framework, and inference, that is, running the trained model in production.
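
As a rough illustration of the weight hand-off described above (purely a sketch: it assumes the trained model is a Keras model, and the file naming is invented), the export side might look like this:

```python
import os
import numpy as np

def export_weights(model, out_dir="weights"):
    """Dump every layer's weights as flat float32 buffers that a mobile
    developer can load into hand-written layers on the device side."""
    os.makedirs(out_dir, exist_ok=True)
    for i, layer in enumerate(model.layers):
        for j, w in enumerate(layer.get_weights()):
            w.astype(np.float32).tofile(os.path.join(out_dir, f"layer{i}_{j}.bin"))
```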


About getting datasets


GLEB: Let's get back to the current problems. We talked about the data problem and started talking about engineers.

STAS: Yes, tell us, how is the problem of getting datasets solved? You are not going to sit and label a million examples yourself, right?

ANDREY: Unfortunately, that is exactly how it works. The only solution, and no other exists or will exist for a very long time, is manual labour. Usually it looks like this: there is an army of freelancers who are given a million photos; they determine visually "this photo is a cat, this one is a dog", this is recorded in some format and handed over to companies. There are special marketplaces, for example Yandex.Toloka, where you can post a data labelling task for a certain fee. Amazon also has its own marketplace, where the work is mostly done by workers in India. It is a pretty big industry.

Right now this is solved only by manual labour and investment. To collect a dataset, you need a decent amount of money to pay for the manual labour and then use the labelled data.

STAS: That reminds me of the story about Google and their captcha, where they had users recognize photos for Google Street View. So that comes from the same place.

ANDREY: Yes, sometimes companies are sly and use you as those freelancers in one way or another.

About the profession Data Scientist


GLEB: And what about the engineers? I want to understand how the market is doing in general: are there too few of them, too many, where do you get them?

STAS: My bet is that there are not enough.

ANDREY: Yes, there is a certain shortage, but not even of engineers, rather of data scientists. There are not enough people who can do something really cool. It is a hyped field now. At HSE, for instance, all the machine learning tracks are overcrowded, everyone wants to study it, every second thesis involves machine learning. It is everywhere, even in economics already!

But there are not enough hardcore people; they get snapped up immediately. If you show even minimal skills, you can leave for Mountain View right away, because they are waiting for you there. That is really true.

STAS: You have just thrown more coal into the hype train.

ANDREY: It will probably fade away sooner or later, but right now the data scientist specialty is consistently popular. If you do cool stuff, companies are willing to pay serious money.

GLEB: There are certain stereotypes about how programmers look. Is there any difference for data scientists?

ANDREY: I would not say so. But they are not like us. Programmers are mostly very meticulous; they like everything neatly organized, the Git repository perfect and all that. A data scientist is still more of a mathematician; the main thing for them is to find a solution to the problem. They do not have strange things like code review or unit tests; they do not bother with any of that. They use programming only as a tool, and their main activity is more intellectual.

STAS: I was just about to say that they do not have the tool cult that many developers have.

ANDREY: Yes, and they switch frameworks almost every week, because as soon as something new appears, say Caffe2 or PyTorch, they try it! They do not have this attitude of "I am a TensorFlow data scientist."

About GPU engineers


The main problem is actually with developers like me. It seems to me it is hard to find a company that would be interested in a developer who writes Swift but works not with UIKit but with hardcore things. Honestly, even off the top of my head I do not know who could in principle have such vacancies.

It is clear that these are some technology startups, but startups cannot hire everyone. In that sense I really value my job, even if at times it is not the most profitable. There are few engineers who can do this, but the demand is also very small. So there is a trade-off between interesting projects and enterprise work.

If you are interested in working on something unique, something more complex than laying out screens, you should definitely look for such vacancies. I think there will be more of them over the years, because the market is growing. But I still have little idea of where you could go.

ANDREY: In this sense I really value my position, because such work is quite hard to find. GPU development is usually needed mostly by gaming companies, after all. To be a relatively classic developer but deal with hardcore things: the labour market here is very small and narrow. I do not think you can get hired straight away as a junior GPU engineer or a trainee. You need to figure things out on your own first, gain experience, and come to a company already able to solve problems independently.

About hardware shortage


STAS: Time to talk about hardware, or rather the shortage of hardware.

GLEB: Because it has all been bought up for bitcoins, you mean?

ANDREY: Yes, SberTech recently stated that the entire shortage on the video card market was because of them.

GLEB: And what do they need them for: machine learning or mining?

ANDREY: I cannot say for sure, but I believe it is still for training.

GLEB: In general, is there a problem that not everyone can afford the necessary hardware?

ANDREY: Usually nobody does this at home anyway. Most often it is a rented server, say on Amazon, with elastic billing depending on how many resources you use. In reality you can do machine learning from a MacBook Air: you simply start the computation on the server through the terminal, somewhere far away it runs on hot video cards, and then you download the result, that is all.

In reality nobody trains anything on laptops, and hardly anyone keeps an Nvidia Titan at home either, because it does not pay off. New video cards come out every year, so mostly everything lives on servers.

About testing


GLEB: How are things with testing? I have a friend who works at Nvidia and tests the performance of programs that train networks. That is all I have heard about testing in machine learning. What processes exist in this sense?

ANDREY: Right now it mostly looks like the analogue of debugging with a pile of print statements from our world, that is, you do a lot of it by hand. But recently there was the NIPS conference, where they said that some folks at Stanford have built a repository for networks and models that collects metrics from several different iterations and variations. You can compare them and see how and what changes. So something exists now, but the infrastructure is still rather raw: there are a lot of heterogeneous tools that do not work well with each other.

But there is progress. The ONNX standard is being rolled out now to describe neural networks in a unified style, and many companies have already adopted it. Still, the infrastructure is raw. As far as I can tell, a lot is done manually. That is, to test, you run it and look at whether it works or not. Sure, some metrics can be computed, but sometimes there is subjective testing by eye. That still happens.

About benchmarks


STAS: I was thinking about what could serve as benchmarks. Do I understand correctly that when you train a model on a specific dataset and specific hardware, you can judge from that how long it will take to run?

ANDREY: No, that does not affect it at all. An untrained and a trained model will run in the same amount of time. A trained model differs from an untrained one only in the numbers inside, but there are just as many of them. If we speak specifically in terms of neural networks, the speed depends directly on how thick the model is: how many layers there are, how thick those layers are, and so on.

There are lightweight networks that hold fewer parameters. There are heavy ones that hold more. It is always a compromise: it is not the case that you take twice as many layers and your result is twice as good. The correlation is non-linear. You always have to find a compromise, so that for a given amount of data the network is not too thin and learns enough features, and learns them correctly, but at the same time not too thick, because there will simply not be enough data to make use of all of it.

About reinforcement learning and the game of Go


GLEB: So far we have talked about the most general aspects of machine learning. But there have been breakthroughs, especially in the last couple of years, for example AlphaGo. I thought that was deep learning, but it turns out it is properly called reinforcement learning.

ANDREY: Reinforcement learning is learning with reinforcement. It is a kind of training that works a little differently. Suppose you are teaching a model to, say, find a way out of a maze or play checkers. You take your algorithm, your structure, put it into this environment and define the set of actions that can be performed in it. The model tries out options, conditionally going left or right. Reinforcement learning is the idea that every time the machine performs an action, you either punish it or tell it: "That's right!"
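
That reward-or-punish loop can be sketched with toy tabular Q-learning. This is a hedged, minimal example on an invented one-dimensional "corridor", not how AlphaGo itself works:

```python
import random

# States 0..4 in a corridor; the goal is state 4. Actions: step left or right.
# Reward +1 for reaching the goal ("that's right!"), 0 otherwise.
n_states, actions = 5, (-1, +1)
Q = {(s, a): 0.0 for s in range(n_states) for a in actions}
alpha, gamma, eps = 0.5, 0.9, 0.2

for _ in range(500):                       # training episodes
    s = 0
    while s != n_states - 1:
        a = random.choice(actions) if random.random() < eps \
            else max(actions, key=lambda act: Q[(s, act)])
        s_next = min(max(s + a, 0), n_states - 1)
        r = 1.0 if s_next == n_states - 1 else 0.0
        best_next = max(Q[(s_next, b)] for b in actions)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s_next

# The learned policy: in every state the best action is to step right (+1).
print([max(actions, key=lambda act: Q[(s, act)]) for s in range(n_states - 1)])
```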

Indeed, there was the AlphaGo algorithm. For those who do not know, there is a board game called Go. For a long time it was the only such game in which a machine could not beat a human: there are so many combinations that enumerating them all takes an enormous amount of time. A person does not literally enumerate the options, so a human can find moves faster. AlphaGo, an algorithm based on a neural network trained with reinforcement learning, became the first one able to beat a professional Go player.

Of course, reinforcement learning is not being developed just so that a machine can win at Go; there are more important use cases. It is a trending topic in machine learning and is developing actively right now.

GLEB: If I am not mixing anything up, the number of combinations is on the order of 10^170, an astronomical number. I watched the very first match, AlphaGo versus Lee Sedol. He was not the top player in the world, but one of the strongest. When he was beaten, the whole community, of course, wilted. There was a feeling: damn, they have gotten to us after all!

When they played again a year later, the program had become something else entirely: it learned by playing against itself, without the initial data, without any initial training. On top of that, it became many times more powerful than the first version and can now run on a single computer, whereas previously a distributed network of machines was required. In short, we have been well and truly beaten!

About genetic algorithms


STAS: I want to ask about genetic algorithms. From the description it sounds like genetic algorithms could also be classified as learning with reinforcement. The way I picture them: there is a generation, we take each individual in that generation, it performs some task, we score its actions, and then, based on these scores, we select the best ones. Then we cross their specific properties, create a new generation, add a little mutation, and now we have a new generation. We repeat these operations, trying to increase the overall fitness of each member of the generation. It sounds similar in spirit. Does this count as reinforcement learning or not?

ANDREY: No, genetic algorithms are still a somewhat different thing.

STAS: Do they belong to machine learning?

ANDREY: I would not say so. I will not claim it with certainty, but we covered genetic algorithms at university like everyone else, and it seems to me this thing is somewhat simpler and more ad hoc, or, in short, imperative. That is, we know in advance what the input will be and what the output will be. In machine learning things are somewhat different: there is some probability, a prediction accuracy, and everything in that spirit.

Perhaps people who understand the terminology better than I do will correct me, but off the top of my head I would say no.

STAS: So it turns out that genetic algorithms are not used for solving most real-world problems?

ANDREY: Yes, they are mostly more algorithmic, and I have rarely met them in practice.
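
For reference, the loop Stas described above (evaluate, select, cross over, mutate) in a toy, framework-free form; the fitness function here is invented purely for illustration: count the ones in a bit string.

```python
import random

def evolve(length=20, pop_size=30, generations=50, mutation=0.02):
    # Fitness = number of ones; the optimum is a string of all ones.
    pop = [[random.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=sum, reverse=True)           # evaluate and rank
        parents = pop[: pop_size // 2]            # selection
        children = []
        while len(children) < pop_size:
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, length)     # one-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ (random.random() < mutation) for bit in child]  # mutation
            children.append(child)
        pop = children
    return max(pop, key=sum)

print(sum(evolve()))   # close to 20 after a few dozen generations
```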

About capsule networks


GLEB: There is another subset of machine learning, the so-called capsule networks. Again, we will not go too deep. Tell us literally in two words what it is and why it is a trend now.

ANDREY: This is a brand-new topic, only a few months old. Geoffrey Hinton published a paper saying that current convolutional networks are a road to nowhere, and here is a new vision of how this should evolve. The community took that statement ambiguously and split into two camps: some say it is overhyped, others say it is a big deal, and so on.

But to explain it simply: how do convolutional networks work? Take, for example, the networks that work with images. There is a convolution: a small stack of matrices that slides over the picture with a certain stride, as if scanning it. At each step this kernel is applied to the patch underneath it, and each application produces a new conditional "pixel", only with a much higher dimensionality; this operation is repeated across the whole grid.
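
The sliding-window operation described above, in naive NumPy form (real frameworks implement it far more efficiently, and a real layer has many kernels and channels):

```python
import numpy as np

def conv2d(image, kernel):
    """Slide a small kernel over the image; at each position multiply it
    element-wise with the patch underneath and sum into one output value."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

img = np.random.rand(8, 8)
vertical_edges = np.array([[1, 0, -1],
                           [1, 0, -1],
                           [1, 0, -1]])
print(conv2d(img, vertical_edges).shape)   # (6, 6)
```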

But the problem with convolutional networks is that all the data arriving at the first layer makes it to the very end, maybe not in full, but it all has influence and all reaches the final stage. Roughly speaking, if you need to identify some part of the image, for example a single cat, you do not need to scan the whole image. It is enough to localize at some point the zone where the cat most likely is and process only that, the way a person does.

That is how capsule networks work. I will not claim to explain their internals expertly, but from what I have understood: inside capsule networks there are certain trees, and each subsequent capsule accepts only relevant data as input. That is, not everything we originally took as input passes through them; with each new layer (I am not sure how to put it in capsule-network terminology) only the data that is really needed, only the important pieces, is processed. That is the key difference between convolutional and capsule networks.

GLEB: It sounds interesting, but I do not quite understand: is this only about images?

ANDREY: No, it is about everything. I used images just to explain. The key idea is this: let's not drag all the data and all the features along, but only those that are relevant to the next layer.

More about games


STAS: I heard that after AlphaGo the same people are going to beat everyone at StarCraft?

ANDREY: I have to disappoint you: I do not really follow that. It is not that esports is of much interest to me, but it is already clear that this is where things are heading. For example, there are already startups that teach you how to play Dota. Like a personal coach, they analyse how you play and tell you where you fall short; their models are trained on esports data. There are betting startups that predict who will win, and so on.

A lot of people work in this area now, primarily because a lot of money is circulating in it. But for me personally it is simply of no interest, so unfortunately I do not follow the news and trends.

STAS: What do you think is the difficulty in creating good artificial intelligence for strategy games? Do I understand correctly that it is basically the very large number of options?

ANDREY: Yes. In fact, we already touched on this when I explained why artificial intelligence like this is still not used in AAA games, although it does exist in AlphaGo and possibly somewhere else.

For all its complexity, Go comes down to placing a stone each turn in order to surround the opponent's stones, whereas StarCraft is a very complex thing. There you can send your units along a practically unlimited number of trajectories, build different sets of structures, and so on. All of these are parameters.

Plus, the difficulty is that neural networks do not always think like a person. When we, for example, build a unit, we remember it. But many networks are run from scratch every time. Of course, there are recurrent networks that can remember their past achievements. They are used in particular for translation and textual data, where, as the sentence is generated, the network uses more and more of the context.

The enormous difficulty here is that all the information and options need to be formalized, that is, you need to find a training dataset such that the model still responds somewhat adequately to the actions of your opponent, of which there can also be millions, unlike in Go or chess.

STAS: It is clear - a lot of parameters.

GLEB: But here is what I do not understand: it is clear that Dota has fewer parameters, but it is still roughly the same in the sense that units can be sent anywhere, and so on.

STAS: Andrei's point was that, first, you have one unit there, so the number of options is much smaller.

ANDREY: To be honest, I have never played Dota 2, but in the original, as far as I know, it is a super deterministic game. There are 3 lanes and towers that need to be destroyed.

GLEB: Yes, but in StarCraft, although I do not play it at all, there are also a few paths and the same kind of units. You say there are many of them, but most likely they are always moved around in groups. So roughly the same thing happens.

STAS: You still need to position each unit correctly during a battle. The moment they are not herded around in groups but are being positioned individually, the number of parameters immediately grows.

ANDREY: Your problem is that you think in categories like "place a unit", but you keep forgetting that a neural network is just matrices, numbers being multiplied. You have to formalize things like objectives there. Say there is a StarCraft map and there is some objective on it; whether it is to defeat a player or something else does not matter. All of this has to be represented as mathematical primitives, and that is the hardest part.

If this really were artificial intelligence, the gap between Dota and StarCraft would be minimal. StarCraft is maybe a bit more complicated mechanically, but still roughly the same. But because we operate with numbers, it is harder to formalize.

About networks learning from each other


STAS: I have one last question I want to ask before we move on to mobile. I do not know what it is properly called, but there is an approach where one neural network essentially watches another and tries to find patterns.

ANDREY: I will not undertake to explain how it works, but I know for sure that there are super-cool algorithms, which I sometimes hear about at work, where two neural networks learn from each other. That area of expertise is completely beyond me, but it all sounds cool. As far as I know, it is used for generative networks. More than that, unfortunately, I cannot say.

STAS: Fine. You have given the most important keywords; the rest Gleb and the readers can easily google.

About mobile phones (Apple)


GLEB: Let's move on to mobile, which we have been working towards for a long time. First of all, what can we do on mobile devices when we talk about machine learning?

ANDREY: By the way, is your podcast for iOS developers?

GLEB: We are not an iOS-only podcast. Right, Stas?

STAS: Yes, it is for mobile developers. Why do you ask?

ANDREY: Because the situation differs a lot between platforms. Apple, which has always been good at integrating software and hardware and is famous for it, hopped onto the machine learning hype train very elegantly.

In 2014 Apple introduced the Metal API. Things like computer-vision detectors and so on were built into it. With the arrival of iOS 10, this made it possible to add a lot of layers, activations and other operators from neural networks, convolutional ones in particular, to the Metal Performance Shaders framework.

That gave a huge boost, because computations on a video card are, as a rule, several times faster than on the CPU. When Apple made it possible to compute on mobile video cards, and to do it fast, without having to write your own mathematical operators and so on, that was a very strong move. A year later they released CoreML (we will talk about it a little later).

Apple had a very good foundation. I do not know whether they planned it that way or it just happened, but right now they are objectively the leaders in machine learning on mobile devices.

About mobile phones (Android)


What works relatively cool and in real time on iOS unfortunately does not work as well on Android. That is not only because Android sucks. There are other factors, first of all the fact that Android has a very diverse ecosystem: there are weak devices and there are strong ones, and you cannot cover everything.

While Metal is supported on all iOS devices, on Android it is more complicated: one device supports one version of OpenGL, another supports a different one or none at all. Some have Vulkan, some do not. Every manufacturer has its own drivers, which of course are not optimized in any way and merely support the bare minimum of the standard. It even happens that you run a neural network on an Android GPU and it is no faster than on the CPU, because working with shared memory is very inefficient, and so on.

Things are not good on Android right now. That is rather surprising, because Google is one of the leaders in the field, yet it sags a little here. Android plainly lacks a quality implementation of the capabilities of modern machine learning.

For us, for example, not even all the features in the application work the same. What is fast on iOS is slower on Android, even on flagship devices of comparable power. In this sense, Android as a platform sags at the moment.

About CoreML


STAS: Since we have brought up CoreML, it would probably be right to also mention TensorFlow Lite.

ANDREY: CoreML is, in fact, a dark horse. When it came out last year, everyone first said "Wow, cool!" But then it became clear that it was just a thin wrapper over Metal. Companies that are seriously engaged in machine learning, ours included, have long had their own solutions. For example, our solutions showed better results than CoreML in tests, in terms of speed and other parameters.

But the main problem with CoreML was that it could not be customized. Sometimes you need a complex layer in a neural network that, say, Metal does not provide, and you have to write it yourself. In CoreML it was impossible to plug in your own layers, so you had to drop down to the lower level, to Metal, and write everything yourself.

Recently this has been added to CoreML, and now the framework has become more interesting. If you are a developer at a company whose application has nothing to do with machine learning at all, you can launch a neural network in literally two lines and quickly run it on the GPU. The results that performance tests show for CoreML are comparable to custom solutions and bare Metal.

That is, CoreML works quite well. It is a little raw, it has bugs, but it gets better every month. Apple rolls out updates actively, not in the way we are used to, where Apple framework updates arrive once a year alongside major iOS versions. CoreML is updated actively, and in this sense everything is great.

TensorFlow Lite provides a converter to CoreML, and CatBoost also supports converting to CoreML. In short, Apple once again did everything right. They released an open-source converter and said: "Let's all write converters to CoreML", and many learning frameworks supported it.
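
A rough sketch of what such a conversion looks like from the Python side. The model path and layer names here are invented, and the API shown is roughly that of the coremltools Keras converter of that period, so treat it as an assumption rather than a recipe:

```python
import coremltools

# Hypothetical: convert an already-trained Keras model into a .mlmodel
# file that an iOS app can load through CoreML.
coreml_model = coremltools.converters.keras.convert(
    "style_transfer.h5",            # assumed path to the trained Keras model
    input_names=["image"],
    image_input_names=["image"],    # treat the input as an image, not a raw tensor
    output_names=["stylized"],
)
coreml_model.save("StyleTransfer.mlmodel")
```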

There was some skepticism about CoreML at first; at the last WWDC the most frequent question to the CoreML developers was: "Why don't you allow downloading models from the internet? Why don't you allow encrypting them?" The models could be extracted from an app, which means intellectual property could be stolen.

Now all of that has been fixed, functionality has been added, and at the moment CoreML is definitely the leading platform in this sense.

STAS: Can you elaborate on that? So now you no longer have to bundle the model, you can simply download it from somewhere?

ANDREY: Yes, that is possible now. Previously, when we asked about this, the developers smiled and said: "Just look at the headers." There really were initializers to which you could pass files and everything would be assembled.

CoreML models are put together in quite an interesting way. They are actually ordinary binaries that store the weights, but on top of that Swift files are generated from them, which declare implicit classes. You use these classes in your application, and the compiler compiles the model into a set of files.

Now, using certain hacks and approaches, you can make the model portable. You can protect your intellectual property with encryption and reduce the size of the application.

In general, CoreML is now moving in the right direction. Not everything can be done legitimately from the App Review point of view, not everything can be done easily and without hacks, but it is noticeable how the developers are improving the framework.

STAS: Cool! I wanted to add that CoreML looks like a solution for typical tasks. Relatively speaking, it is convenient when you want to do something simple with machine learning in your application. It seems that for a typical task Apple has tried to make the whole path as simple as possible: you find a ready-made model, data and so on. It is precisely a story about typical tasks, because for those, presumably, everything is already prepared.

ANDREY: For typical tasks it is generally super! Without exaggeration, you really need two lines of code to run a model. In that sense it is very cool, especially for indie developers or companies that have no R&D department but still want to add something cool.

But that is not the most interesting part, because typical tasks were already solved on GitHub with Metal: you could just copy that code and compile it, albeit with a bit more effort.

What matters is that this framework is now moving not only towards classic everyday tasks, but also towards complex solutions. That is really cool!

About mobile training


GLEB: You said that after Metal appeared it became possible to train models on mobile phones?

ANDREY: No, it has never been possible to train on mobile phones; it makes no sense, you can only run models there. If I said that, I misspoke. Nobody trains on mobile phones, of course.

STAS: I have not heard anything about training on a phone either.

GLEB: Neither have I, but I was thinking about it. Intuitively it does seem like a strange thing to do. But are there really no interesting problems where it would be relevant?

ANDREY: It is hard to imagine them. If there is anything of the sort, it is only distributed learning. There are even scientific papers on how to do this, but as I understand it, you are asking about training on the data collected on that same phone? Even if you collected enough of it (which will not happen), training would take so long that it would never finish, and nobody is going to port training code to mobile platforms, because why would they? Training always happens on servers, and inference on devices.

STAS: But in the end it does work out that way. If you are a company and you want something like this, you need data, and you can collect it from your users, that is, periodically upload it to yourself.

ANDREY: Yes, but it works a little differently. You collect data from all users in one place on your server, train there, then ship the finished model back. It is not the case that each phone trains something itself.

STAS: On the other hand, the phone would heat up, which would be handy in winter, though it would probably take a very long time.

About mobile phones and the future


GLEB: Are there any other interesting things about applying machine learning on mobile devices? We have talked about what we already have. It would be interesting to look a little into the future: what would we like to get on our mobile platforms, what super products or super solutions?

ANDREY: Oddly enough, performance is the bottleneck right now, because iPhones simply cannot handle a lot of what we would like to run. We just have to wait a while longer before it becomes possible to run more complex tasks.

There are some problems with real time. For example, even in our flagship application, video streaming with style transfer does not work with all styles, because it is too slow and heavy. There are bottlenecks tied to the current level of hardware.

CoreML, in fact, is developing very strongly; I think it will be fine going forward. Most of all I want the industry to calm down and standardization to begin: more common formats, converters, conventions, more things that work equally well on Android and iOS, because for business this is very important. It often happens that we cannot ship a cool feature simply because we cannot roll it out on iOS only or Android only.

It would be great, and good for everyone, if active, healthy competition began, so that everything worked great everywhere, on both Android and iOS, and so that GitHub stopped being flooded with endless learning frameworks. Right now it is a kind of obscurantism: even Uber has its own framework, Horovod. Apple has its own framework; everyone has their own, and some have several. It seems to me this only raises the entry threshold and the difficulty of converting models, including for mobile. So for the future I want steady improvement and development of everything across the board.

I do not think there will be a revolution in the near future. I am not an expert and perhaps have no right to say such things, but from what I can see, nothing fundamentally new is on the horizon. I just want steady improvement of what exists now.

About learning machine learning


GLEB: What do you advise people who are not particularly into the subject but want to try it? What should they read or watch, how should they get into the topic? Stas has already mentioned the courses from Yandex and MIPT.

ANDREY: If you asked my colleagues, they would tell you to read Bishop (Christopher M. Bishop. Pattern Recognition and Machine Learning. Springer, 2006) and other more complicated books. But because I have a mathematical background and worked for a long time with 3D graphics, where there is also a lot of linear algebra, I am more or less savvy. In any case, this is quite a comprehensive subject. You are very lucky if you are doing it at university: go to those lectures, listen, and you will not regret it.

But if it so happened that this moment has already been missed, or you entered a different department or faculty, then I definitely recommend self-study. There are already courses that have become the de facto standard, for example Andrew Ng's machine learning course on Coursera. There is no point even giving a link to it, because it comes first in every list.

You definitely need to go through a few such courses in order to understand, at least at the level of intuition, how it works inside, and to try out your own simple models: start at least with digit recognition on the MNIST dataset. It is like Hello World, only in the world of machine learning.
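
For a sense of scale, the MNIST "Hello World" mentioned above fits into a dozen lines with a modern high-level framework. A minimal sketch assuming TensorFlow with Keras:

```python
import tensorflow as tf

# The classic "Hello World" of machine learning: classify handwritten digits.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=3)
print(model.evaluate(x_test, y_test))   # roughly 97-98% test accuracy
```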

This is probably not an area where you can create a project, jump in, poke around and see what happens. You do have to approach it more fundamentally, master the set of knowledge needed as a foundation, and then build up expertise.

STAS: And after the courses?

ANDREY: After the courses there are advanced courses from Andrew Ng! There is, for example, the Kaggle platform, with competitions you can take part in to practise training models. Once you are a little savvy and can train classical architectures, from that point on you should read semi-scientific or scientific papers and dig into the subtleties, if you see yourself in the role of a data scientist.

If you are a mobile developer who just wants to touch this, that level will probably be enough for you. I have mastered roughly that level and do not go further: I do not need to get into the R&D process itself, I do exactly what a developer does. But I still needed the background. At first the team taught me and ran seminars on the basics for programmers, so that we could get up to speed quickly.

But getting to a minimum level of knowledge is fairly simple. Go through a few courses, try yourself on Kaggle, do a bit more, and in principle you will be ready to solve 90% of the existing problems.

Results


GLEB: Let's wrap up then. It seems to me we discussed quite concisely, yet clearly and interestingly, what is going on around machine learning, at least as far as our programmer hands have reached.


Many thanks to Andrei Volodin for telling us all this.

By the way, Andrei is planning to give a detailed talk on this topic at AppsConf 2018, which will be held on October 8 and 9 in Moscow.

The program committee has already received more than 80 submissions, but the Call for Papers is still open: submit yours until August 3. As a reminder, we are looking for hardcore, applied and occasionally hype-worthy talks, and we run a rigorous selection.

Source: https://habr.com/ru/post/416477/

