Sberbank AI Laboratory researcher on the tasks of Data Science and R&D

Neural networks are not just entertainment apps like Prisma and FindFace. Today, machine learning and Big Data can solve real business problems. Dmitry Babaev, former head of Data Science at MTS and developer of the query auto-completion algorithm in the Yandex search engine, knows about the new technologies in the B2B sector.

He now works as a researcher at Sberbank's artificial intelligence laboratory. Unfortunately, most of the bank's developments are a trade secret, but he readily shared everything he was permitted to.

Interviewer: Daria Kozlova
Respondent: Dmitry Babaev

Which Russian companies have their own AI and Big Data laboratories?

Few companies in Russia have such laboratories, since this is largely an academic matter. Yandex definitely has a research unit. Foreign companies such as Google, Microsoft, and Facebook have them too. Large and some medium-sized companies have Data Science units, but these usually pay little attention to theoretical research.

What tasks do you think they are trying to solve?

Data Science units solve the problems the business needs solved. Before taking on a task, they estimate the economic effect of implementing it and decide on that basis whether to proceed. In research units this is simpler: the potential benefit of solving research problems is already assumed to be quite high.

Any examples?

A typical Data Science task can be illustrated with an example from telecom: finding people who would be interested in a new tariff. Another example is optimizing the product assortment in retail chains. Companies often save very large sums simply by getting the right assortment to the right stores, for example by sending expensive goods to the store where they sell rather than the one where they will gather dust on the shelves.

Research tasks may look different. One example is understanding why a neural network gives a particular prediction for given input data. But in general, R&D tasks are very diverse.

How much is invested in development related to AI and machine learning?

It depends on the company. Large corporations are willing to spend more on this, smaller ones less. In telecom, by my observations, a large project with data volumes in the hundreds of terabytes requires tens to hundreds of millions of rubles. On the other hand, there is no limit to perfection (smiles - ed.).

At Yandex, you developed a mechanism for completing search queries. Is it a neural network technology? How does the algorithm work?

No, it was a classic machine learning (ML) approach based on search query statistics. Given the beginning of the query the user had typed and typical user queries, the most suitable completions were selected from a database of the most frequent search queries. This was before the rise of neural networks, when everyone considered them a dead-end technology. Back then they were still inferior to classic ML algorithms.
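The idea of picking completions from the most frequent queries can be sketched in a few lines. This is only an illustration under stated assumptions, not the Yandex algorithm: the function names, the toy query log, and the plain prefix match are all hypothetical.

```python
from collections import Counter

def build_query_counts(query_log):
    """Count how often each normalized query appears in the log."""
    return Counter(q.strip().lower() for q in query_log if q.strip())

def complete(prefix, query_counts, limit=3):
    """Return up to `limit` most frequent queries that start with `prefix`."""
    prefix = prefix.strip().lower()
    matches = [(q, n) for q, n in query_counts.items() if q.startswith(prefix)]
    # Most frequent first; ties are broken alphabetically for stable output.
    matches.sort(key=lambda item: (-item[1], item[0]))
    return [q for q, _ in matches[:limit]]

# Hypothetical toy log; a real system would use a large query history.
log = [
    "weather moscow", "weather moscow", "weather moscow",
    "weather today", "weather today",
    "web design", "news",
]
counts = build_query_counts(log)
print(complete("we", counts))
# prints ['weather moscow', 'weather today', 'web design']
```

A production system would presumably use an index such as a trie instead of scanning every query, and would weigh more signals than raw frequency.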

Tell us about the most significant Russian achievements in the field of AI.

The most famous example is Prisma. The company is not registered in Russia, but the backbone of its staff is nevertheless Russian specialists. By the way, the scientists who developed the image processing method used in Prisma are also from Russia (Viktor Lempitsky's group).

At Yandex, AI algorithms are at the heart of search result ranking. The algorithm that uses a neural network to rank by the proximity of the query text and the site is called "Palekh".

Another famous example is NTechLab's FindFace. It is a demonstration of the company's facial recognition algorithm, which it sells as a commercial product.

Russian companies working in voice technologies, such as “Center for Speech Technologies”, are also known on the world market.

At Sberbank, by the way, facial biometrics technology from one of the Russian companies has been in use for several years. It is used to combat identity theft in retail lending. People who took part in that project work at the laboratory.

The transition to new technologies requires replacing equipment and software, which end users experience as failures and errors in the network. How can the transition be made as seamless as possible for the client?

In fact, this is a classic development task, and people have long known how to solve it. One of the methods is testing. Before a new version is introduced, it is tested for a long time: boundary cases are checked, as is whether the software will withstand the required load. Then the new version is opened to a small group of users. That way, if something goes wrong, only a small percentage of users is affected.
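Opening a new version to a small group of users is often done by assigning each user to a stable bucket. The sketch below is one common way to do it, assuming hash-based bucketing; the function names, the user ID format, and the 5% figure are hypothetical and not taken from the interview.

```python
import hashlib

def in_rollout(user_id: str, percent: float) -> bool:
    """Deterministically place a user in the first `percent` of 10,000 buckets."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 10_000
    return bucket < percent * 100

def pick_version(user_id: str, percent: float) -> str:
    """Route the user to the new version only if they fall inside the rollout."""
    return "new" if in_rollout(user_id, percent) else "stable"

# Hypothetical user IDs: roughly 5% of them should see the new version.
users = [f"user-{i}" for i in range(10_000)]
share = sum(in_rollout(u, 5.0) for u in users) / len(users)
print(f"share on new version: {share:.3f}")
```

Because the bucket depends only on the user ID, the same user keeps seeing the same version, and raising the percentage only adds users to the group without removing anyone from it.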

Is AI able to correct errors and crashes in the system?

There are algorithms designed precisely for such situations. But their task is not to detect or fix a failure; it is to predict that one will happen soon by spotting anomalous patterns in the system. A complex system usually has many indicators of its current state. Having detected an anomalous pattern, the AI can send a message to administrators: something is going wrong, take a look, something needs to be done. For example, the load has increased, and spare capacity needs to be added so that the system can withstand it.

At present, AI plays an observational role: its function is to detect problems in advance (for example, two hours ahead). Fixing the problem still requires a person.
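Spotting an anomalous pattern in a state indicator can be illustrated with a very simple check. The sketch below uses a trailing-window z-score, which is a basic statistical detector rather than a forecasting model and is not the method used at any particular company; the window size, the threshold, and the load values are hypothetical.

```python
from collections import deque
from statistics import mean, stdev

def find_anomalies(values, window=10, threshold=3.0):
    """Return indices whose value deviates from the trailing window mean
    by more than `threshold` standard deviations."""
    history = deque(maxlen=window)
    anomalies = []
    for i, value in enumerate(values):
        # Only judge a point once a full window of earlier points exists.
        if len(history) == window:
            mu = mean(history)
            sigma = stdev(history)
            if sigma > 0 and abs(value - mu) > threshold * sigma:
                anomalies.append(i)
        history.append(value)
    return anomalies

# Hypothetical load readings with one sudden spike.
load = [50, 52, 49, 51, 50, 48, 52, 51, 49, 50, 51, 50, 95, 52]
for i in find_anomalies(load):
    print(f"alert: unusual load {load[i]} at step {i}")
# prints: alert: unusual load 95 at step 12
```

The alert only tells an administrator where to look; as the answer above notes, deciding what to do about it is still a human job.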

What is the Sberbank AI laboratory currently working on?

The Sberbank AI Laboratory was created to develop AI competence within the organization. Now, with the advent of effective methods for training deep neural networks, the field has advanced greatly. Large companies need people versed in new AI technologies to keep up with the rapid progress. It is also important to understand which areas of AI are worth investing effort and money in. The laboratory will help find that out.

Another important mission of the laboratory is its own research in this field, as well as creating new technologies that will benefit the bank. Of course, we do research that can be applied at the bank, but we also try to make our results useful beyond finance. For example, we are interested in time series analysis; banking has a great deal of data with this structure (transactional and other). Among work with a larger scientific component, there are studies on new methods for interpreting a neural network's results.

Who does Sberbank cooperate with?

The bank cooperates with several universities: MIPT, the HSE Faculty of Computer Science (FKN HSE), and Moscow State University. iPavlov, a joint project with the Moscow Institute of Physics and Technology, is the one everyone is hearing about now. It is a project to develop dialogue systems for communicating with a computer in natural language. There are also very interesting activities with other universities, from solving complex optimization problems to fundamental research on improving deep learning algorithms. There are also many educational and outreach activities, for example lectures on AI for schoolchildren.

What is specific about Sberbank's AI algorithms?

There is a classic banking part. One example is scoring, that is, a client's credit rating. In all countries it is regulated by central banks and is therefore largely based on easily interpretable methods: logistic regression and decision trees. These classic methods are reliable and stable. In the future, we hope, the regulator will allow more complex methods. For that, it must be proven that the new methods are sufficiently reliable.

At Sberbank's AI laboratory, we work with more complex methods, mainly deep neural networks, and with AI algorithms for data typical of banks. A typical kind of data for banks is a time series: for example, the prices of some goods (yesterday's price, the price the day before yesterday, and so on).
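Why logistic regression counts as easy to interpret can be shown with a small sketch: each feature adds a visible, signed amount to the score. The feature names, weights, and bias below are hypothetical and purely illustrative; they are not taken from any real scorecard.

```python
import math

# Hypothetical weights for illustration only.
WEIGHTS = {"income_thousands": 0.03, "years_employed": 0.2, "past_defaults": -1.5}
BIAS = -1.0

def contributions(client):
    """Per-feature contribution to the score: weight times feature value."""
    return {name: WEIGHTS[name] * client[name] for name in WEIGHTS}

def repayment_probability(client):
    """Logistic function applied to the bias plus the summed contributions."""
    z = BIAS + sum(contributions(client).values())
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical client record.
client = {"income_thousands": 60, "years_employed": 4, "past_defaults": 1}
for name, value in contributions(client).items():
    print(f"{name}: {value:+.2f}")
print(f"estimated repayment probability: {repayment_probability(client):.2f}")
```

Each printed line shows how much one feature raised or lowered the score, which is the kind of explanation that is much harder to extract from a deep neural network.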

On April 19, you will speak at AI Conference. What will you tell visitors?

In the past few years, deep learning methods have demonstrated tremendous success. They are already solving problems that people were previously afraid to approach. For example, they have reached human-level quality in speech-to-text, text-to-speech, and image recognition. But they have limitations that prevent them from moving further. I want to talk about these, and also about approaches that may make it possible to get around them. In many ways this is still an area of research rather than a set of ready-made solutions for everyday work. However, there are reasonable hopes for even more effective AI technologies in the future.

