In the previous article, we discussed the Opus codec, which works at very low bitrates. Another codec, however, achieves even lower bitrates: Codec 2.
Codec 2 is designed to encode speech only. Although the bitrate is impressive, the sound quality is not as good as that of Opus, as the audio samples show. However, in combination with a neural network (WaveNet), the codec delivers impressive results.
WaveNet neural layers
Introduction
Codec 2 is open source and is intended for speech coding. It targets bitrates from 700 to 3200 bps.
The developer is David Rowe, an electronics engineer currently living in South Australia. He began the project in September 2009 with the aim of improving low-cost radio communications for people in remote areas of the world. To this end, he set out to develop a codec that would significantly reduce file size and the bandwidth required for streaming.
Another motivation, according to David, was to create a codec free from patent encumbrance as an alternative to proprietary codecs, which, in his opinion, “require the design of expensive and clumsy licenses and stifle innovation.” He believes that patented codecs are not necessary, so he distributes all his work under a free license.
Potential application
The author cites various applications for the codec, among them VoIP, voice communication over narrowband digital HF/UHF radio (especially for amateur radio, to avoid the problems of using proprietary codecs), and communications in developing countries and remote regions, including for the military, police and rescue services.
We at Auphonic are interested in the potential use of the codec for better compression of podcasts, presentations, and audiobooks, which would allow us to reduce storage space and minimize the effect of bad network connections.
How it works
To reduce the bitrate, speech has to be reduced to the minimum information needed to represent it, that is, as little redundant information as possible should be transmitted.
For this, Codec 2 uses harmonic sinusoidal speech coding. It divides speech into segments of 10−30 ms, called frames. Each frame is analyzed for its fundamental frequency (pitch) and the number of harmonics that fit into the 4 kHz bandwidth. Then, for each harmonic in the 4 kHz range, the amplitude and phase are recorded.
This information is then encoded, and the decoder recovers the sound based on this data.
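The harmonic model described above can be illustrated with a small sketch. This is not the Codec 2 implementation: the function names, the 8 kHz sample rate (assumed because it gives the 4 kHz bandwidth mentioned above) and the 20 ms frame length are assumptions made for illustration, and the sketch skips quantization entirely. It only shows how a pitch plus per-harmonic amplitudes and phases are enough to rebuild one frame as a sum of sinusoids.

```python
import math

SAMPLE_RATE = 8000   # Hz; assumed, corresponds to a 4 kHz audio bandwidth
BANDWIDTH = 4000.0   # Hz; the bandwidth mentioned in the text


def harmonic_count(pitch_hz, bandwidth_hz=BANDWIDTH):
    """Number of harmonics of the pitch that fit into the bandwidth."""
    return int(bandwidth_hz // pitch_hz)


def synthesize_frame(pitch_hz, amplitudes, phases,
                     frame_ms=20.0, sample_rate=SAMPLE_RATE):
    """Rebuild one frame as a sum of sinusoids at multiples of the pitch.

    amplitudes[k-1] and phases[k-1] belong to the k-th harmonic.
    The frame length (20 ms) is an illustrative value from the 10-30 ms range.
    """
    n_samples = int(sample_rate * frame_ms / 1000)
    frame = []
    for n in range(n_samples):
        t = n / sample_rate
        sample = sum(
            a * math.cos(2 * math.pi * k * pitch_hz * t + p)
            for k, (a, p) in enumerate(zip(amplitudes, phases), start=1)
        )
        frame.append(sample)
    return frame


# Hypothetical frame: 150 Hz pitch, flat amplitudes, zero phases.
pitch = 150.0
count = harmonic_count(pitch)
frame = synthesize_frame(pitch, [1.0] * count, [0.0] * count)
print(count, len(frame))
```

A lower pitch means more harmonics fit below 4 kHz, so more amplitude and phase values describe the frame; a real codec then has to quantize these parameters very coarsely to reach bitrates of a few thousand bits per second.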
Codec 2 block diagrams: encoder (left) and decoder (right). Rowetel illustration
Audio examples and comparison with other codecs
That all sounds great in theory, but what about reality? Let's listen. Here is a short WAV file:
intro-orig.wav - 1.3 MB
Let's apply Codec 2 (without the WaveNet decoder) at the various available bitrates:
3200 bps, 2400 bps, 1600 bps, 1200 bps and 700 bps.
These examples show a significant reduction in file size.
Let's look at the storage required for one hour of audio at each bitrate:
- At 3200 bps, one hour of audio requires only 1.37 MB (it fits on one old 3½-inch floppy disk!)
- 2400 bps corresponds to 1.03 MB/h
- 1600 bps corresponds to 0.68 MB/h (or about two hours of audio on a single floppy disk!)
- 1200 bps corresponds to 0.51 MB/h
- 700 bps corresponds to 0.3 MB/h
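The figures in this list follow directly from the bitrate. As a quick check, here is a minimal sketch (the function name is made up for illustration). It assumes that "MB" in the list means 2^20 bytes, which is the reading under which the numbers match to within rounding of the last digit, and it ignores any container or header overhead.

```python
def megabytes_per_hour(bitrate_bps):
    """Storage for one hour of audio at a constant bitrate.

    Assumes 1 MB = 2**20 bytes and no container overhead.
    """
    bytes_per_hour = bitrate_bps * 3600 / 8   # bits per hour -> bytes
    return bytes_per_hour / 2**20


for bitrate in (3200, 2400, 1600, 1200, 700):
    print(f"{bitrate} bps: {megabytes_per_hour(bitrate):.2f} MB/h")
```

At 3200 bps one hour comes to 1,440,000 bytes, which is just under the 1,474,560 bytes of a 3½-inch high-density floppy disk, so the floppy comparison holds.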
The compression is very strong, but the result clearly sounds unnatural.
For comparison, here is the same audio as MP3 at 8 kbps.
The file is significantly larger than with Codec 2, and the quality is arguably still unacceptable. You can clearly hear what is sometimes called sizzle: the strange metallic sounds typical of low-quality MP3s.
There is also a more recent codec to compare with, Opus, which seems to combine the best of both worlds: acceptable quality at a low bitrate.
Thanks to its convincing performance at low bitrates, Auphonic already offers users Opus encoding down to 6 kbps, the lowest bitrate that the codec supports.
At 6 kbps, Opus sounds much better than the 8 kbps MP3. The voice is a bit muffled, but still sounds natural.
Returning to Codec 2, purely out of interest, let's hear how it copes with encoding music! (Keep in mind that Codec 2 is intended only for speech, not for music.)
Original file
8 kbps MP3
Personally, I cannot listen to MP3 at such a bitrate, so let's look at the Codec 2 results instead:
3200 bps, 2400 bps, 1600 bps, 1200 bps, 700 bps.
Clearly, it is not suited to this purpose!
Codec 2 and WaveNet
As we have heard, despite the impressive compression, the result does not sound very natural.
Things get more interesting with the work of W. Bastiaan Kleijn and colleagues, published via the Cornell University Library (arXiv). They used Codec 2 at 2400 bps for encoding, but replaced the Codec 2 decoder with WaveNet, a generative deep learning model (for more information, see “Wavenet based low rate speech coding”).
Here are some examples from the authors:
Male voice: original file, Codec 2, with WaveNet decoder
Female voice: original file, Codec 2, with WaveNet decoder
Compared to Codec 2, we hear a significant improvement in quality, and compared with the original there is no significant loss of quality.
David Rowe himself said that he considers the result a “dramatic improvement in speech coding at low bit rates” and “a good wideband speech codec of 8000 bps.”
Conclusion
Although the (original) Codec 2 is a very interesting piece of work, its scope is limited and the end result is not suitable for podcasting. The audio examples also make clear that it can compress only speech, not music.
Nevertheless, combining Codec 2 with the WaveNet decoder significantly improves the quality, and the low bitrate (2400 bps) would be extremely interesting for distributing podcasts and audiobooks: only 1.03 MB of space is required for one hour of audio!
Auphonic will add Codec 2 support for output files once the WaveNet decoder is available in an easy-to-use form. So far we have added Codec 2 support for input files only.