Codec 2 + neural network = whole podcast on one floppy

In the previous article, we discussed the Opus codec, which performs well at very low bitrates. Another codec, Codec 2, reaches even lower bitrates.

Codec 2 is designed to encode speech only. Its bitrate is impressive, but it does not sound as good as Opus, as the audio samples show. In combination with a neural network (WaveNet), however, the codec delivers impressive results.

WaveNet Neural Layers

Introduction

Codec 2 is open source and is intended for speech coding. It targets bitrates from 700 to 3200 bps.

The developer is David Rowe, an electronics engineer currently living in South Australia. He began the project in September 2009 with the aim of improving low-cost radio communications for people in remote areas of the world. To this end, he set out to develop a codec that would significantly reduce file sizes and the bandwidth required for streaming.

Another motivation, according to David, was to create a codec free from patent encumbrance as an alternative to proprietary codecs, which, in his opinion, “require the design of expensive and clumsy licenses and stifle innovation.” He believes that it is possible to do without patented codecs, so he distributes all of his work under a free license.

Potential application

The author cites various applications for the codec, among them VoIP, voice communication over narrow-band digital HF/UHF radio (especially for amateur radio, to avoid the problems of using proprietary codecs), and communications in developing countries and remote regions, including for the army, police and rescue services.

We at Auphonic are interested in the potential use of the codec for better compression of podcasts, presentations, and audiobooks, which would reduce the storage they take up and minimize the effect of bad network connections.

How it works

To reduce the bitrate, speech has to be reduced to the minimum amount of information needed to reproduce it, that is, as little redundant data as possible should be transmitted.

For this, Codec 2 uses harmonic sinusoidal speech coding. It divides speech into segments of 10 to 30 ms, which are called frames. Each frame is then analyzed for its fundamental frequency (pitch) and the number of harmonics that fit into the 4 kHz bandwidth. Then, for each harmonic in the 4 kHz range, the amplitude and phase are recorded.
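To make the harmonic model more concrete, here is a minimal sketch of what "pitch plus per-harmonic amplitude and phase" means. It is not Codec 2's actual implementation: the 8 kHz sample rate, the 20 ms frame length, and the function names `harmonic_count` and `synthesize_frame` are assumptions chosen for illustration.

```python
import math

SAMPLE_RATE = 8000            # Hz; assumed, it yields the 4 kHz bandwidth
BANDWIDTH = SAMPLE_RATE / 2   # 4 kHz


def harmonic_count(pitch_hz):
    """Number of harmonics of pitch_hz that lie at or below 4 kHz."""
    return int(BANDWIDTH // pitch_hz)


def synthesize_frame(pitch_hz, amplitudes, phases, frame_ms=20):
    """Rebuild one frame as a sum of harmonically related cosines.

    amplitudes[k-1] and phases[k-1] describe the k-th harmonic,
    whose frequency is k * pitch_hz.
    """
    n_samples = SAMPLE_RATE * frame_ms // 1000
    frame = []
    for n in range(n_samples):
        t = n / SAMPLE_RATE
        sample = sum(
            a * math.cos(2 * math.pi * k * pitch_hz * t + p)
            for k, (a, p) in enumerate(zip(amplitudes, phases), start=1)
        )
        frame.append(sample)
    return frame


# A voice pitched at 150 Hz has this many harmonics to describe per frame.
print(harmonic_count(150))

# One 20 ms frame built from three harmonics of a 200 Hz pitch.
frame = synthesize_frame(200, [1.0, 0.5, 0.25], [0.0, 0.0, 0.0])
print(len(frame))
```

The point of the sketch is that a frame is described by a handful of parameters rather than by its individual samples, which is where the bitrate savings come from.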

This information is then encoded, and the decoder reconstructs the sound from this data.

Codec 2 block diagrams: encoder (left) and decoder (right). Illustration: Rowetel
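A quick piece of arithmetic shows how tight the encoding has to be: the bits available per frame are simply the bitrate multiplied by the frame duration. The sketch below uses frame lengths of 10, 20 and 30 ms taken from the range mentioned above; which frame length each Codec 2 mode really uses is not stated here, so treat the combinations as hypothetical.

```python
def bits_per_frame(bitrate_bps, frame_ms):
    """Bit budget for one frame at the given bitrate and frame duration."""
    return bitrate_bps * frame_ms / 1000


for bitrate in (3200, 2400, 1600, 1200, 700):
    budget = ", ".join(
        f"{frame_ms} ms: {bits_per_frame(bitrate, frame_ms):g} bits"
        for frame_ms in (10, 20, 30)   # illustrative frame lengths
    )
    print(f"{bitrate:>4} bps -> {budget}")
```

Even at the highest bitrate, pitch, amplitudes and phases for a whole frame have to fit into a few dozen bits, so they must be quantized very coarsely.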

Audio examples and comparison with other codecs

This all sounds great in theory, but what about in practice? Let's listen. Here is a short WAV file:

intro-orig.wav - 1.3 MB

Let's apply Codec 2 (without the WaveNet decoder) at the available bitrates: 3200 bps, 2400 bps, 1600 bps, 1200 bps and 700 bps.

These examples show a significant reduction in file size.

Let's look at how much storage one hour of audio requires at each bitrate (sizes in binary megabytes, 1 MB = 1,048,576 bytes):

3200 bps - 1.37 MB/h (fits on one old 3½-inch floppy disk!)
2400 bps - 1.03 MB/h
1600 bps - 0.69 MB/h (or about two hours of audio on a single floppy disk!)
1200 bps - 0.51 MB/h
700 bps - 0.30 MB/h
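The figures in this list can be reproduced with a few lines of arithmetic. The sketch below assumes a raw bitstream with no container overhead and a high-density 3½-inch floppy disk of 1,474,560 bytes; the function name `bytes_per_hour` is chosen for illustration.

```python
MIB = 1024 * 1024
FLOPPY_BYTES = 1_474_560   # assumed: 3.5-inch HD floppy, 1440 KiB


def bytes_per_hour(bitrate_bps):
    """Size of one hour of audio at a constant bitrate, in bytes."""
    return bitrate_bps * 3600 // 8


for bitrate in (3200, 2400, 1600, 1200, 700):
    size = bytes_per_hour(bitrate)
    hours_on_floppy = FLOPPY_BYTES / size
    print(f"{bitrate:>4} bps: {size / MIB:.2f} MiB/h, "
          f"{hours_on_floppy:.1f} h per floppy")
```

The same function works for the other codecs in the comparison: call it with 8000 for the 8 kbps MP3 or 6000 for 6 kbps Opus.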

The compression is very strong, but the result clearly sounds unnatural.

For comparison, here is the same audio as an MP3 at 8 kbps.

The file is significantly larger than with Codec 2, and the quality is probably still unacceptable. You can clearly hear what is sometimes called sizzle: the strange metallic sounds typical of low-quality MP3s.

There is also a more recent codec to compare with: Opus. It seems to combine the best of both worlds, offering acceptable quality at a low bitrate.

Thanks to its convincing performance at low bitrates, Auphonic already offers users Opus encoding down to 6 kbps, the lowest bitrate the codec supports.

At 6 kbps, Opus sounds much better than the 8 kbps MP3. The voice is a bit muffled, but still sounds natural.

Returning to Codec 2, purely out of curiosity, let's hear how it handles music! (Keep in mind that Codec 2 is not intended for encoding music, only speech.)

Original file
8kbps MP3

Personally, I cannot listen to MP3 at such a bitrate, so let's move on to the results of Codec 2: 3200 bps, 2400 bps, 1600 bps, 1200 bps, 700 bps.

Clearly, it is not suited to this purpose!

Codec 2 and WaveNet

As we have heard, despite the impressive compression, the result does not sound very natural.

This is where it gets more interesting, if you look at the work of W. Bastiaan Kleijn and his co-authors, published via the Cornell University Library (arXiv). They used Codec 2 at 2400 bps for encoding, but replaced the Codec 2 decoder with WaveNet, a generative deep learning model (for more information, see “Wavenet based low rate speech coding”).

Here are some examples from the authors:

Male voice
Original file
Codec 2
With WaveNet decoder

Female voice
Original file
Codec 2
With WaveNet decoder

Compared to plain Codec 2, we hear a significant improvement in quality, and compared to the original, there is no significant loss of quality.

David Rowe himself said that he considers the result to be a “dramatic improvement in speech coding at low bit rates” and “a good wideband speech codec of 8000 bps.”

Conclusion

Although the original Codec 2 is a very interesting piece of work, its scope is limited and the end result is not suitable for podcasting. The audio examples also make it clear that it can only be used to compress voice, not music.

Nevertheless, Codec 2 combined with the WaveNet decoder sounds significantly better, and its low bitrate (2400 bps) is extremely interesting for distributing podcasts and audiobooks: only 1.03 MB of space is required for one hour of audio!

Auphonic will add Codec 2 support for output files once the WaveNet decoder is available in an easy-to-use form. So far, we have added Codec 2 support for input files only.

Source: https://habr.com/ru/post/415557/
