DeepMind taught an AI to play games by watching YouTube videos


Levels of Montezuma's Revenge on the Atari

DeepMind has demonstrated a way of training an AI (in its weak, narrow form) to complete Atari games. The system was trained by showing it videos of game playthroughs from YouTube. Many human players use the same approach when, for one reason or another, they cannot get through a game on their own.

Problems like this are usually tackled with reinforcement learning. The technique is popular because it lets you train bots to perform a variety of specific tasks. Each time the system achieves some result, it receives a small reward.
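To make the idea of learning from rewards concrete, here is a minimal sketch of tabular Q-learning, one common reinforcement learning algorithm, on a toy corridor. It is an illustration only and not the setup DeepMind used; the corridor environment, the function name train_corridor and all parameter values are assumptions made for the example.

```python
import random

def train_corridor(length=5, episodes=200, alpha=0.5, gamma=0.9,
                   epsilon=0.2, seed=0):
    """Tabular Q-learning on a toy corridor (hypothetical environment).

    States are 0..length-1. The agent starts at 0 and receives a reward
    of 1.0 only when it reaches the rightmost state.
    """
    rng = random.Random(seed)
    moves = (-1, +1)  # action 0: step left, action 1: step right
    q = [[0.0, 0.0] for _ in range(length)]
    for _ in range(episodes):
        state = 0
        while state != length - 1:
            # Explore with probability epsilon (or on ties), else act greedily.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            nxt = min(max(state + moves[action], 0), length - 1)
            reward = 1.0 if nxt == length - 1 else 0.0
            # Move the estimate towards reward + discounted best next value.
            target = reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
    return q

if __name__ == "__main__":
    for state, (left, right) in enumerate(train_corridor()):
        print(state, round(left, 3), round(right, 3))
```

In this toy case the reward is only a few steps away, so random exploration finds it quickly and the learned values soon favor stepping right in every state.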

Developers create algorithms and models that can assess the game environment, including the possible rewards for progress (points, bonuses, and so on). Such systems study the game step by step, gradually moving towards the end.

The new method developed at DeepMind differs from the usual approach. The company's researchers taught an AI to play Atari games such as Montezuma's Revenge, Pitfall and Private Eye. The emphasis was not on points and prizes: the training was based on playthrough videos from YouTube. This made it possible to achieve results that are unusual for AI.

The fact is that games like Montezuma's Revenge are difficult for machines to “understand”. There is no clear objective, and it is not obvious where to go, which items to collect or what to do with them later. The machine simply gets lost: it receives no rewards as it advances, so reinforcement learning becomes useless or almost useless here.

In this game you control a character named Panama Joe, who must reach the treasury in an old temple. According to legend, the treasure belongs to Montezuma. First you need to find the first critically important item for completing the game, the golden key. Reaching it takes about 100 steps, but only if you know what to do. If you do not, the number of possibilities is huge: about 100^18 possible sequences of initial actions. That is too much for any AI built so far. And no reward is given along the way; everything here is very, very specific.

One way to let a computer know what to do is to show it demonstrations of the game being played. After all, it is not only machines but also people who learn to perform various tasks from examples. Dancing, painting, cooking: it is better to see any of these once than to hear a hundred times how it is done.

DeepMind concluded that demonstration is the best way to show a computer how to perform a task whose goal is implicit. The technology its researchers created really did help. Two methods were used for learning from examples: TDC (temporal distance classification) and CMC (cross-modal temporal distance classification).

In the first case, the AI is trained to determine the distance between two different frames of the game and to notice the difference between them. In this way the AI also “understands” what to do in order to move from one place to another. For training, frames are taken from the YouTube videos in random order.
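The frame-distance task can be sketched as follows: pick two frames from the same video and label the pair with a class that says roughly how far apart they are in time. This is only an illustration of how such training pairs might be built. The bucket boundaries in DISTANCE_BINS and the function name sample_tdc_pair are assumptions made for the example, and the classifier that would be trained on these pairs is not shown.

```python
import random

# Hypothetical distance buckets, in frames. The exact boundaries are an
# assumption made for this illustration.
DISTANCE_BINS = [(0, 0), (1, 1), (2, 2), (3, 4), (5, 20), (21, 200)]

def sample_tdc_pair(num_frames, rng):
    """Return (first_index, second_index, label) for one training pair.

    Both indices point into the same video. The label is the index of the
    bucket that contains the temporal distance between the two frames.
    """
    if num_frames <= DISTANCE_BINS[-1][0]:
        raise ValueError("video is too short for the largest distance bucket")
    # Pick the class first so that every bucket is equally likely.
    label = rng.randrange(len(DISTANCE_BINS))
    low, high = DISTANCE_BINS[label]
    high = min(high, num_frames - 1)
    distance = rng.randint(low, high)
    first = rng.randrange(num_frames - distance)
    return first, first + distance, label

if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(3):
        print(sample_tdc_pair(num_frames=300, rng=rng))
```

A network trained to predict the label from the two images has to learn which visual changes correspond to progress through the game, which is the kind of “understanding” described above.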

In the second case, an “understanding” of the soundtrack is added as well. In almost all games, sounds correspond to particular actions, such as jumping or picking up items. The computer is therefore trained to treat sounds as important elements of the game. Combining video and sound lets the computer find its way through the game quite well.

Here are the actions of the trained AI in Montezuma's Revenge. Playthroughs of the other two games mentioned earlier are here.


However, rewards could not be dropped entirely: for now the AI still depends on points. But the usual training method used earlier did not even let the system reach the golden key, for which the first hundred points are awarded. The AI poked around in all directions like a blind kitten, not knowing what to do. That said, the reinforcement scheme itself has also been modified.

During play, every 16th frame of the AI's own playthrough is compared with frames from videos of people playing the game. If the comparison shows a high degree of similarity, the AI gets a reward. Over time, the AI learns to perform the same sequence of actions as the human player in order to reach a similar frame.
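The frame-comparison reward can be sketched roughly as below. This is a simplified illustration under several assumptions, not DeepMind's implementation: frames are represented by small embedding vectors that would in practice come from a trained network, similarity is measured with cosine similarity, and the threshold, the bonus value and the rule that only demonstration frames beyond the last match are considered are all choices made for the example. The class name ImitationReward is hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

class ImitationReward:
    """Rewards the agent when one of its frames resembles a demonstration frame.

    demo_embeddings: frame embeddings of the human playthrough, in order.
    stride: only every stride-th agent frame is checked (16 in the article).
    threshold, bonus: illustrative values, not taken from the source.
    """

    def __init__(self, demo_embeddings, stride=16, threshold=0.9, bonus=1.0):
        self.demo = list(demo_embeddings)
        self.stride = stride
        self.threshold = threshold
        self.bonus = bonus
        self.step = 0       # number of agent frames seen so far
        self.progress = 0   # first demonstration frame not yet matched

    def __call__(self, agent_embedding):
        self.step += 1
        if self.step % self.stride != 0:
            return 0.0
        # Assumption: look only at demonstration frames after the last match,
        # so the agent is rewarded for progress rather than for repetition.
        for index in range(self.progress, len(self.demo)):
            if cosine_similarity(agent_embedding, self.demo[index]) >= self.threshold:
                self.progress = index + 1
                return self.bonus
        return 0.0
```

In use, the value returned for each agent frame would be added to whatever reward the game itself provides, so the agent is nudged along the path shown in the human video even where the game awards no points.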

Moreover, in many cases the AI achieves better results than human players or other game-playing algorithms, including Rainbow, ApeX and DQfD.



All of this is impressive, but so far the practical benefits of DeepMind's achievement are unclear. Can the training method proposed by the company be used anywhere other than completing old games? Still, given DeepMind's track record in AI, there is little doubt that it will find practical uses one way or another: the researchers would hardly have taken on the problem just for fun.

Source: https://habr.com/ru/post/413071/

