The author of the article, Polish programmer Tomek Rękawek, works at Adobe on Apache Jackrabbit Oak, a project of the Apache Software Foundation. The article was published on the author's personal blog on February 24, 2016.
Polish Radio 3 (the so-called Troika) is famous for its good music and intelligent presenters. On the other hand, it suffers from loud and annoying ad blocks, which usually advertise some kind of electronics or medicine. I listen to Troika almost constantly, at work and at home, so I wondered: how could I get rid of the ads? It seems I managed to find a solution.
Digital signal processing
My goal is to create an application that mutes ads. Each commercial block starts and ends with a jingle, so the program must recognize these particular sounds and mute everything between them.
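The muting rule itself is a two-state machine. Below is a minimal Java sketch of it. It assumes the detector reports jingle 0 for "ads start" and jingle 1 for "ads end". The class name AdMuter and its methods are hypothetical and are not the author's code.

```java
/** Hypothetical sketch: remembers whether we are currently inside an ad block. */
final class AdMuter {
    static final int START_JINGLE = 0; // assumption: id of the "ads start" pattern
    static final int END_JINGLE = 1;   // assumption: id of the "ads end" pattern

    private boolean muted = false;

    /** Called whenever the detector recognizes a jingle. */
    void onJingle(int jingleId) {
        if (jingleId == START_JINGLE) {
            muted = true;
        } else if (jingleId == END_JINGLE) {
            muted = false;
        }
    }

    boolean isMuted() {
        return muted;
    }

    /** Returns the samples unchanged, or an equally long block of silence while muted. */
    short[] process(short[] samples) {
        return muted ? new short[samples.length] : samples;
    }
}
```

Keeping the state separate from detection means the detector only has to report "which jingle, when". It never has to touch the audio itself.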
I knew that this area of mathematics and computer science is called digital signal processing (DSP), but DSP always seemed like magic to me. Well, a great opportunity to learn something new. I spent a day or two trying to figure out which technique to use for analyzing the audio stream, and in the end I found what I needed: cross-correlation.
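To make the idea concrete, here is a direct, unoptimized cross-correlation in Java. It slides the jingle along the audio and, for each offset k, sums the products of the overlapping samples. The offset with the largest sum is where the jingle most likely sits. This is an illustrative sketch only, and the class and method names are hypothetical. Computed this way the cost grows with the product of the two lengths, which is why the FFT-based route described later matters for real audio.

```java
/** Direct cross-correlation, written out to show what xcorr computes. */
final class NaiveXcorr {

    /**
     * Returns c[k] = sum over n of audio[n + k] * jingle[n], for every offset k at which
     * the jingle fits entirely inside the audio.
     */
    static double[] correlate(short[] audio, short[] jingle) {
        int lags = audio.length - jingle.length + 1;
        if (lags <= 0) {
            throw new IllegalArgumentException("audio must be at least as long as the jingle");
        }
        double[] c = new double[lags];
        for (int k = 0; k < lags; k++) {
            double sum = 0;
            for (int n = 0; n < jingle.length; n++) {
                sum += (double) audio[n + k] * jingle[n];
            }
            c[k] = sum;
        }
        return c;
    }

    /** Index of the largest value: the sample offset where the jingle matches best. */
    static int peakIndex(double[] c) {
        int best = 0;
        for (int k = 1; k < c.length; k++) {
            if (c[k] > c[best]) {
                best = k;
            }
        }
        return best;
    }
}
```

Assuming 44.1 kHz audio (the rate in the test file names used later), a peak at offset k means the jingle starts k / 44100 seconds into the audio. The raw sum is not normalized, so a loud passage can also score high. A practical detector therefore needs some threshold, which this sketch leaves out.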
Octave
Most resources refer to MATLAB implementations. But MATLAB is an expensive application that simplifies complex mathematical operations, including DSP. Fortunately, there is a free alternative called Octave. It turns out that running a cross-correlation on two audio files in Octave is easy. You just need to run the following commands:
pkg load signal
% in recent Octave versions, use audioread instead of wavread
jingle = wavread('jingle.wav')(:,1);   % keep only the first channel
audio = wavread('audio.wav')(:,1);
[R, lag] = xcorr(jingle, audio);
plot(R);
This produces the following plot:

The clearly visible peak marks the position of jingle.wav within audio.wav. What surprised me was the simplicity of the method: xcorr() does all the work, and the rest of the code only reads the files and displays the result.
I wanted to implement the same algorithm in Java to build a tool that:
- reads an audio stream from standard input (for example, from ffmpeg),
- analyzes it, searching for jingles,
- passes the same stream through to stdout, muting it between the jingles.
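These three steps can be sketched as a small Java filter. This is only the I/O skeleton, under stated assumptions. Raw s16le PCM arrives on stdin and is copied to stdout chunk by chunk. Each chunk is zeroed, which is digital silence for signed 16-bit PCM, while a mute flag is set. The mute decision is a hypothetical BooleanSupplier hook where a jingle detector would plug in. It is not the author's API.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Arrays;
import java.util.function.BooleanSupplier;

/** I/O skeleton of the filter: PCM from stdin to stdout, silenced while muted. */
final class PcmPassThrough {
    private static final int CHUNK_BYTES = 4096; // a multiple of the 4-byte stereo frame

    /** Copies in to out; while muted.getAsBoolean() is true, the chunk is zeroed first. */
    static void pump(InputStream in, OutputStream out, BooleanSupplier muted) throws IOException {
        byte[] chunk = new byte[CHUNK_BYTES];
        int n;
        while ((n = in.read(chunk)) != -1) {
            if (muted.getAsBoolean()) {
                Arrays.fill(chunk, 0, n, (byte) 0);
            }
            out.write(chunk, 0, n);
        }
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical placeholder: a real analyzer would drive this flag from its detector.
        pump(System.in, System.out, () -> false);
    }
}
```

The filter writes exactly as many bytes as it reads, so a downstream player stays in sync even while the ads are silenced.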
Using stdin and stdout makes it easy to connect the analyzer to other applications responsible for fetching the broadcast and playing back the result.
Reading sound files
First of all, the Java program must read the jingle (saved as a .wav file) into an array. The file contains some additional information, such as headers and metadata, but we only need the sound. The suitable format is called PCM: it is just a list of numbers representing the sound samples. ffmpeg can convert WAV to PCM:
ffmpeg -i input.wav -f s16le -acodec pcm_s16le output.raw
Here each sample is stored as a 16-bit little-endian number. In Java this type is called short, and the ByteBuffer class can convert the raw bytes into short values:
ByteBuffer buf = ByteBuffer.allocate(4);   // one stereo frame: 2 channels x 2 bytes
buf.order(ByteOrder.LITTLE_ENDIAN);
buf.put(bytes);                            // bytes: 4 raw bytes read from the stream
buf.flip();                                // switch the buffer from writing to reading
short leftChannel = buf.getShort();
short rightChannel = buf.getShort();
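The same ByteBuffer approach can be extended from one frame to a whole file. The following sketch loads a raw s16le file produced by the ffmpeg command above and keeps only the left channel, as Octave's (:,1) does. It assumes two interleaved channels (4 bytes per frame). RawPcmReader is a hypothetical name.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.file.Files;
import java.nio.file.Path;

/** Loads raw little-endian 16-bit stereo PCM into a short[] of left-channel samples. */
final class RawPcmReader {
    static final int BYTES_PER_FRAME = 4; // assumption: 2 channels x 16-bit samples

    /** Decodes interleaved frames; a trailing incomplete frame is ignored. */
    static short[] leftChannel(byte[] raw) {
        ByteBuffer buf = ByteBuffer.wrap(raw).order(ByteOrder.LITTLE_ENDIAN);
        short[] left = new short[raw.length / BYTES_PER_FRAME];
        for (int i = 0; i < left.length; i++) {
            left[i] = buf.getShort(); // left sample
            buf.getShort();           // skip the right sample
        }
        return left;
    }

    static short[] readLeftChannel(Path file) throws IOException {
        return leftChannel(Files.readAllBytes(file));
    }
}
```

ByteBuffer.wrap avoids the put/flip dance because the wrapped buffer is already positioned for reading.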
Reverse Engineering xcorr
To implement the xcorr() function in Java, I studied the Octave source code. Without changing the final result, I was able to replace the xcorr() call with the following lines, which then had to be rewritten in Java:
N = length(audio);
M = 2 ^ nextpow2(2 * N - 1);
pre = fft(postpad(prepad(jingle(:), length(jingle) + N - 1), M));
post = fft(postpad(audio(:), M));
cor = ifft(pre .* conj(post));
R = real(cor(1:2 * N));
It looks scary, but most of these functions are trivial array operations. The core of the cross-correlation is the fast Fourier transform (FFT), applied to both sound signals.
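The FFT lines rely on the cross-correlation theorem. Write x for the padded jingle passed to fft in the pre line, y for the padded audio passed to fft in the post line, and M for the FFT length. The standard identity is:

$$
R[k] = \operatorname{IFFT}\!\left(\operatorname{FFT}(x)\cdot\overline{\operatorname{FFT}(y)}\right)[k] = \sum_{n=0}^{M-1} x\big[(n+k) \bmod M\big]\, y[n], \qquad k = 0, \dots, M-1
$$

Because y is real, its complex conjugate drops out of the sum. The jingle is shifted right by N - 1 zeros and M is a power of two no smaller than 2N - 1. As a result, the wrapped-around indices only ever meet zero padding for the lags kept by cor(1:2 * N), so those values match an ordinary linear cross-correlation. For long signals they are also far cheaper to compute this way than with the direct sum.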
Fast Fourier Transform
Having no experience with DSP, I simply treat the FFT as a function that takes an array of sound samples and returns an array of complex numbers representing frequencies. This minimalist approach worked well: I used the FFT implementation from the JTransforms library and got the same results as in Octave. I suspect this is partly cargo cult programming, but damn, it works!
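Below is a sketch of the Octave lines ported to Java. Instead of JTransforms it uses a textbook radix-2 Cooley-Tukey FFT written out by hand, purely so the example runs without dependencies. That FFT is a swapped-in implementation, not the author's code, and the class name FftXcorr is hypothetical.

```java
import java.util.Arrays;

/** Java port of the Octave xcorr replacement, with a hand-written FFT instead of JTransforms. */
final class FftXcorr {

    /** In-place iterative radix-2 FFT; the array length must be a power of two. */
    static void fft(double[] re, double[] im, boolean inverse) {
        int n = re.length;
        // Reorder the input into bit-reversed index order.
        for (int i = 1, j = 0; i < n; i++) {
            int bit = n >> 1;
            for (; (j & bit) != 0; bit >>= 1) {
                j ^= bit;
            }
            j ^= bit;
            if (i < j) {
                double t = re[i]; re[i] = re[j]; re[j] = t;
                t = im[i]; im[i] = im[j]; im[j] = t;
            }
        }
        // Combine butterflies of growing size: 2, 4, 8, ...
        for (int len = 2; len <= n; len <<= 1) {
            double ang = (inverse ? 2 : -2) * Math.PI / len;
            double wRe = Math.cos(ang), wIm = Math.sin(ang);
            for (int start = 0; start < n; start += len) {
                double curRe = 1, curIm = 0;
                for (int k = 0; k < len / 2; k++) {
                    int a = start + k, b = a + len / 2;
                    double vRe = re[b] * curRe - im[b] * curIm;
                    double vIm = re[b] * curIm + im[b] * curRe;
                    re[b] = re[a] - vRe;
                    im[b] = im[a] - vIm;
                    re[a] += vRe;
                    im[a] += vIm;
                    double nextRe = curRe * wRe - curIm * wIm;
                    curIm = curRe * wIm + curIm * wRe;
                    curRe = nextRe;
                }
            }
        }
        if (inverse) { // like Octave's ifft, scale by 1/n
            for (int i = 0; i < n; i++) {
                re[i] /= n;
                im[i] /= n;
            }
        }
    }

    /** Returns R = real(cor(1:2 * N)) with N = audio.length, as in the Octave lines. */
    static double[] xcorr(short[] jingle, short[] audio) {
        int n = audio.length;
        if (n == 0 || jingle.length > n) {
            throw new IllegalArgumentException("audio must be non-empty and at least as long as the jingle");
        }
        int m = Integer.highestOneBit(2 * n - 1);
        if (m < 2 * n - 1) {
            m <<= 1; // M = 2 ^ nextpow2(2 * N - 1)
        }
        double[] preRe = new double[m], preIm = new double[m];
        double[] postRe = new double[m], postIm = new double[m];
        // prepad(jingle, length(jingle) + N - 1): N - 1 leading zeros; postpad fills up to M.
        for (int i = 0; i < jingle.length; i++) {
            preRe[n - 1 + i] = jingle[i];
        }
        for (int i = 0; i < n; i++) {
            postRe[i] = audio[i]; // postpad(audio(:), M)
        }
        fft(preRe, preIm, false);
        fft(postRe, postIm, false);
        // pre .* conj(post): (a + bi)(c - di) = (ac + bd) + (bc - ad)i
        for (int i = 0; i < m; i++) {
            double re = preRe[i] * postRe[i] + preIm[i] * postIm[i];
            double im = preIm[i] * postRe[i] - preRe[i] * postIm[i];
            preRe[i] = re;
            preIm[i] = im;
        }
        fft(preRe, preIm, true);            // cor = ifft(...)
        return Arrays.copyOf(preRe, 2 * n); // real part of the first 2 * N values
    }
}
```

Suppose the audio has N samples and contains the jingle starting at zero-based sample d. Because pre shifts the jingle right by N - 1 zeros, the peak of the returned array lands at index N - 1 - d. In a real build, a library FFT such as JTransforms' DoubleFFT_1D would replace the hand-written fft(), with the data packed into that library's array layout.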
Running xcorr on a stream
The algorithm above assumes that audio is an array in which we look for the jingle. This is not quite suitable for a radio broadcast, where we have a continuous stream of sound. To run the analysis, I created a circular buffer slightly longer than the jingle to be recognized. The incoming stream fills the buffer, and as soon as it is full, the cross-correlation test is run. If nothing is found, the oldest part of the buffer is discarded and we wait for it to fill again.
I experimented a bit with the buffer length and got the best results with a buffer 1.5 times as long as the jingle.
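A sketch of this buffering scheme in Java follows. For readability it shifts a plain array with System.arraycopy instead of implementing a true ring buffer. The window is 1.5 times the jingle length, as in the article.

How much to discard on a miss is an assumption here. The sketch keeps one jingle length of the newest samples, which guarantees that every jingle lies fully inside some window. The detector is a hypothetical Predicate hook, for example the FFT cross-correlation plus a threshold.

```java
import java.util.function.Predicate;

/** Streaming window: fill, test, and on a miss drop the oldest samples and refill. */
final class SlidingWindow {
    private final short[] window;
    private final int keep;                    // samples retained after a miss
    private final Predicate<short[]> detector; // hypothetical hook, e.g. xcorr + threshold
    private int filled = 0;

    SlidingWindow(int jingleLength, Predicate<short[]> detector) {
        if (jingleLength < 2) {
            throw new IllegalArgumentException("jingleLength must be at least 2");
        }
        this.window = new short[jingleLength * 3 / 2]; // 1.5 x the jingle length
        // Assumption: keep one jingle length, so consecutive windows overlap enough
        // that a jingle straddling the cut is still seen whole in the next window.
        this.keep = jingleLength;
        this.detector = detector;
    }

    /** Feeds one sample; returns true when the detector fires on a full window. */
    boolean push(short sample) {
        window[filled++] = sample;
        if (filled < window.length) {
            return false;
        }
        boolean hit = detector.test(window);
        if (hit) {
            filled = 0; // start from an empty window after a detection
        } else {
            // Drop the oldest samples: move the newest `keep` samples to the front.
            System.arraycopy(window, window.length - keep, window, 0, keep);
            filled = keep;
        }
        return hit;
    }
}
```

Each miss advances the window by only half a jingle. The detector therefore runs about twice per jingle duration, which is the price of never cutting a jingle in two.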
Putting it all together
Getting a stream in PCM format is easy with the aforementioned ffmpeg. The command below pipes the stream into the standard input of the Java program, which prints Got jingle 0 or Got jingle 1 whenever the corresponding pattern is found in the stream:
ffmpeg -loglevel -8 \
  -i http://stream3.polskieradio.pl:8904/\;stream \
  -f s16le -acodec pcm_s16le - \
  | java -jar target/analyzer-1.0.0-SNAPSHOT-jar-with-dependencies.jar \
  2 \
  src/test/resources/commercial-start-44.1k.raw 500 \
  src/test/resources/commercial-end-44.1k.raw 700
Standalone version
I also prepared a simple standalone version of the analyzer, which connects to the Troika stream by itself (without an external ffmpeg) and plays the result using javax.sound. Everything fits into one JAR file with a basic user interface containing Start and Stop buttons. It can be downloaded here. If you don't like running other people's JARs on your machine (which is absolutely right), all the sources are on GitHub.
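For the playback side, javax.sound.sampled can play raw PCM directly. The following is a minimal sketch that assumes 44.1 kHz, signed 16-bit, little-endian stereo. That matches the ffmpeg s16le output, with the sample rate taken from the test file names. The sketch only plays PCM that is already decoded; fetching and decoding the stream, as the standalone JAR does, is out of scope. PcmPlayer is a hypothetical name.

```java
import java.io.IOException;
import java.io.InputStream;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.LineUnavailableException;
import javax.sound.sampled.SourceDataLine;

/** Plays already-decoded PCM through the default audio output. */
final class PcmPlayer {

    static AudioFormat pcmFormat() {
        // AudioFormat(sampleRate, sampleSizeInBits, channels, signed, bigEndian)
        return new AudioFormat(44100f, 16, 2, true, false);
    }

    /** Streams in to the default output device until EOF; requires a working sound device. */
    static void play(InputStream in) throws IOException, LineUnavailableException {
        AudioFormat format = pcmFormat();
        int frameSize = format.getFrameSize(); // 4 bytes: 2 channels x 2 bytes
        SourceDataLine line = AudioSystem.getSourceDataLine(format);
        line.open(format);
        line.start();
        try {
            byte[] buf = new byte[4096];
            int pending = 0; // bytes of an incomplete frame carried to the next read
            int n;
            while ((n = in.read(buf, pending, buf.length - pending)) != -1) {
                int total = pending + n;
                int whole = total - total % frameSize; // write() accepts whole frames only
                line.write(buf, 0, whole);
                pending = total - whole;
                System.arraycopy(buf, whole, buf, 0, pending);
            }
            line.drain(); // let the buffered audio finish playing
        } finally {
            line.close();
        }
    }
}
```

Network reads can end in the middle of a frame, so the loop carries any leftover bytes forward instead of passing a partial frame to SourceDataLine.write.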

It seems that everything works as it should :)
Further work
The ultimate goal is to mute the ads at the level of a hardware amplifier receiving a “real” FM signal, rather than an Internet stream. This is covered in the next article.
Update (June 2018)