Life is full of moments worth capturing in slow motion: a child's first steps, a first trip to the sea, a beloved dog's trick. A modern smartphone can shoot at 240 frames per second or higher, but you cannot record in that mode all the time, since memory fills up and the battery drains quickly. The neural network created by Nvidia works with already-captured video, turning it into slow motion.
Researchers at Nvidia built a deep-learning system that converts 30-frame-per-second video into slow motion. They used the PyTorch deep-learning library and Nvidia Tesla V100 GPUs. The system was trained on 11,000 videos of everyday and sports activity shot at 240 frames per second, and as a result it learned to predict intermediate frames. To test the accuracy of the technology, the researchers used a separate set of videos.
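The training objective is simple to state: given two frames taken from a high-frame-rate clip, the network must reproduce the frames that lie between them. Below is a minimal sketch of how such training examples could be assembled; the frame gap and tensor sizes are illustrative assumptions, not values from the paper.

```python
import torch

def make_training_sample(clip: torch.Tensor, start: int, gap: int = 8):
    """Cut one training example out of a high-frame-rate clip.

    clip  -- tensor of shape (num_frames, 3, H, W)
    start -- index of the first input frame
    gap   -- distance between the two input frames (illustrative value)

    Returns the two input frames plus the ground-truth intermediate
    frames the network must learn to reconstruct.
    """
    frame_0 = clip[start]                    # first input frame
    frame_1 = clip[start + gap]              # second input frame
    targets = clip[start + 1:start + gap]    # intermediate frames to predict
    return frame_0, frame_1, targets

# Example: a random stand-in "clip" of 32 frames at 128x128 resolution.
clip = torch.rand(32, 3, 128, 128)
f0, f1, targets = make_training_sample(clip, start=0, gap=8)
print(f0.shape, f1.shape, targets.shape)  # (3,128,128) (3,128,128) (7,3,128,128)
```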
The technology makes videos much smoother and less blurry than ordinary speed reduction, raising the frame rate to 480 frames per second. To demonstrate the results, the team compared slow-motion videos made by The Slow Mo Guys video bloggers with the same videos slowed down using the new method.
The first neural network analyzes the video stream: the structure of motion, objects, surfaces, and edges in the scene. It does this for the two input frames in both directions along the timeline, forward and backward. The system then predicts how pixels move from one frame to the next, producing 2D vectors of these movements.
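Conceptually, this first stage can be pictured as a flow network that takes the two input frames and returns dense 2D motion vectors in both directions. The sketch below uses a tiny placeholder network standing in for the real flow estimator, and combines the two flows linearly to approximate motion toward an intermediate time t, following the general approach described in the paper; treat it as a schematic, not the authors' code.

```python
import torch
import torch.nn as nn

class TinyFlowNet(nn.Module):
    """Stand-in for the flow-estimation network (illustrative only)."""
    def __init__(self):
        super().__init__()
        # Two RGB frames stacked along channels in, a 2-channel flow field out.
        self.net = nn.Sequential(
            nn.Conv2d(6, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 2, 3, padding=1),
        )

    def forward(self, frame_a, frame_b):
        return self.net(torch.cat([frame_a, frame_b], dim=1))

flow_net = TinyFlowNet()
i0 = torch.rand(1, 3, 128, 128)   # first input frame
i1 = torch.rand(1, 3, 128, 128)   # second input frame

# Dense motion vectors in both directions along the timeline.
flow_0_to_1 = flow_net(i0, i1)
flow_1_to_0 = flow_net(i1, i0)

# Approximate the flow from an intermediate time t toward each input frame
# by linearly combining the two estimated flows.
t = 0.5
flow_t_to_0 = -(1 - t) * t * flow_0_to_1 + t * t * flow_1_to_0
flow_t_to_1 = (1 - t) ** 2 * flow_0_to_1 - t * (1 - t) * flow_1_to_0
```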
Then a second neural network predicts a visibility map, which excludes the pixels that are about to be covered by moving objects and thereby removes artifacts. Using all of this data, the system warps new intermediate frames between the two input frames to produce a smooth transition.
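Putting the pieces together, each input frame is warped toward the intermediate time along the predicted flow, and the visibility maps decide, pixel by pixel, how much each warped frame contributes to the result. The sketch below implements backward warping with torch.nn.functional.grid_sample and a weighted blend; the inputs, flows, and visibility maps are random placeholders with plausible shapes, so this only illustrates the mechanics, not the trained system.

```python
import torch
import torch.nn.functional as F

def backward_warp(frame, flow):
    """Sample `frame` along `flow` so every output pixel pulls its colour
    from the location the flow points to (backward warping).
    Assumes flow channel order (dx, dy) in pixels."""
    n, _, h, w = frame.shape
    # Base pixel grid in absolute coordinates.
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack([xs, ys], dim=-1).float().unsqueeze(0).expand(n, -1, -1, -1)
    # Shift the grid by the flow and normalise to [-1, 1] as grid_sample expects.
    coords = grid + flow.permute(0, 2, 3, 1)
    coords[..., 0] = 2.0 * coords[..., 0] / (w - 1) - 1.0
    coords[..., 1] = 2.0 * coords[..., 1] / (h - 1) - 1.0
    return F.grid_sample(frame, coords, align_corners=True)

# Placeholder inputs (same shapes as in the previous sketch).
i0 = torch.rand(1, 3, 128, 128)
i1 = torch.rand(1, 3, 128, 128)
flow_t_to_0 = torch.zeros(1, 2, 128, 128)
flow_t_to_1 = torch.zeros(1, 2, 128, 128)
t = 0.5

# Warp each input frame toward the intermediate time t.
warped_0 = backward_warp(i0, flow_t_to_0)
warped_1 = backward_warp(i1, flow_t_to_1)

# Visibility maps in [0, 1]; in the real system a second network predicts them,
# here they are random stand-ins of the right shape.
vis_0 = torch.rand(1, 1, 128, 128)
vis_1 = 1.0 - vis_0

# Blend: pixels marked as occluded (low visibility) contribute little.
eps = 1e-8
frame_t = ((1 - t) * vis_0 * warped_0 + t * vis_1 * warped_1) / \
          ((1 - t) * vis_0 + t * vis_1 + eps)
```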
The results can be compared in the video. Of course, there are differences between the artificially created slow-motion footage and the original shot at a high frame rate; this is especially noticeable in the comparison with The Slow Mo Guys' balloon jump into the pool at the 54-second mark. But without the original to compare against, it would be hard to tell the real video from the "fake" one.
The team does not yet know how to commercialize the development. In their view, it is still far from ideal and demands a lot of resources, including time. Even if the technology does become a product, it will likely not run on the user's device; the calculations will be performed in the cloud.

In April, Nvidia experts showed another technology that adds new fragments to an image: photo reconstruction. The method lets you remove an object from an image, after which the system fills the empty region with a realistic background, and can also restore eyes and other parts of a face after they have been erased from a photo.
To prepare for training the neural network, the researchers created more than 55,000 masks made of random strips and holes of various sizes. Another 25,000 masks were used to verify the accuracy of the results after training. During training, the masks were superimposed on images to help the neural network learn to reconstruct the missing pixels.
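A rough idea of how such training masks can be produced is sketched below: random rectangular holes and thin strips are stamped into a binary mask, which is then multiplied with the image so the network sees only the unmasked pixels. The sizes, counts, and shapes here are illustrative assumptions, not the values used by the researchers.

```python
import numpy as np

def random_mask(height, width, n_holes=4, n_strips=4, rng=None):
    """Binary mask: 1 = visible pixel, 0 = pixel hidden from the network."""
    if rng is None:
        rng = np.random.default_rng()
    mask = np.ones((height, width), dtype=np.float32)
    # Rectangular holes of random size and position.
    for _ in range(n_holes):
        h = rng.integers(8, height // 4)
        w = rng.integers(8, width // 4)
        y = rng.integers(0, height - h)
        x = rng.integers(0, width - w)
        mask[y:y + h, x:x + w] = 0.0
    # Thin horizontal or vertical strips.
    for _ in range(n_strips):
        thickness = rng.integers(2, 6)
        if rng.random() < 0.5:
            y = rng.integers(0, height - thickness)
            mask[y:y + thickness, :] = 0.0
        else:
            x = rng.integers(0, width - thickness)
            mask[:, x:x + thickness] = 0.0
    return mask

# Superimpose the mask on an image: the network must reconstruct the zeroed pixels.
image = np.random.rand(256, 256, 3).astype(np.float32)
mask = random_mask(256, 256)
masked_image = image * mask[..., None]
```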
The paper, "Super SloMo: High Quality Estimation of Multiple Intermediate Frames for Video Interpolation," is published on the arXiv.org preprint server: arXiv:1712.00080.