The new generation AV1 codec: the Constrained Directional Enhancement Filter (CDEF)



Author: Monty (monty@xiph.org, cmontgomery@mozilla.com). Published June 28, 2018.

For anyone who has not read the previous article: AV1 is a new general-purpose video codec developed by the Alliance for Open Media. The Alliance took Google's VPx codecs, Cisco's Thor, and Mozilla/Xiph.Org's Daala as its starting points. AV1 outperforms VP9 and HEVC, which makes it a codec not for tomorrow but for the day after tomorrow. The AV1 format is royalty-free and will always remain so under its permissive license.

This article is the second in a series describing in detail the functionality of AV1 and the new technologies underlying it that are used in production for the first time. The previous article on Xiph.org explained the Chroma from Luma (CfL) prediction feature. Today we talk about the Constrained Directional Enhancement Filter (CDEF). If you have always wondered what it takes to write a codec, fasten your seat belt and get ready for an education!

Filters in AV1


Virtually all video codecs apply enhancement filters to improve the subjective quality of their output.

By "enhancement filters" I mean techniques that do not necessarily encode image information or improve objective coding efficiency, but that improve the output in some way. Enhancement filters must be used carefully, because they typically lose some information, and for that reason they are sometimes dismissed as a cheap trick for making the output look better than it really is.

But that is not fair. Enhancement filters are designed to work around or eliminate specific artifacts that objective metrics miss but that are obvious to the human eye. And even if you regard filtering as a form of cheating, a good video codec needs every practical, effective cheat it can get.

Filters fall into several categories. First, a filter can be normative or non-normative. A normative filter is a mandatory part of the codec: without it, video cannot be decoded correctly. A non-normative filter is optional.

Second, filters differ in where they are applied. Preprocessing filters are applied to the input before coding begins; postprocessing filters are applied to the output after decoding is complete; and in-loop filters are integrated into the coding process itself. Preprocessing and postprocessing filters are usually non-normative and are not part of the codec. In-loop filters are normative almost by definition: they are part of the codec, they are used inside the coding optimization process, and they are applied to the reference frames used for interframe coding.



AV1 uses three normative enhancement filters. The first is the deblocking filter, which removes "blockiness": the obvious artifacts along the edges of coded blocks. Although the DCT is comparatively well suited to compacting the energy of natural images, it still tends to concentrate error at block edges. If you remember, eliminating these artifacts was the main reason a lapped transform was used in Daala. AV1, however, is a more traditional codec with hard block boundaries, so it needs a traditional deblocking filter to smooth away the artifacts at block edges.


An example of artifacts at block boundaries in a traditional DCT-based block codec. Such errors are especially noticeable to the eye.

The last of the three filters is the Loop Restoration filter. It is built from two configurable, switchable filters: a Wiener filter and a self-guided filter. These are convolving filters that try to construct a kernel that partially restores the lost quality of the original input image. They are typically used for denoising and/or edge enhancement; in AV1 they mostly perform general denoising, removing residual DCT basis noise with an adjustable blur.

Between those two sits the Constrained Directional Enhancement Filter (CDEF), our subject here. Like the loop restoration filter, CDEF removes artifacts and basis noise around sharp edges, but unlike it, CDEF is directional. Rather than filtering everything indiscriminately, it finds edges and follows them. That is what makes CDEF especially interesting: it is the first practical, useful directional filter applied in video coding.

The long and winding road


The history of CDEF has not been simple: it is a long road with twists, side paths, and dead ends. CDEF draws together several lines of research, each of which contributed an idea or an inspiration to the final filter in AV1.

The whole point of transforming blocks of pixel data with the DCT and DCT-like transforms is to represent a block of pixels with as few numbers as possible. The DCT compacts energy fairly well in most visual images; that is, it tends to gather spread-out pixel patterns into just a few important output coefficients.

But there are exceptions to the DCT's compaction efficiency. For example, the DCT does not handle directional edges or patterns well at all. If you look at the DCT output of a sharp diagonal edge, the output coefficients also form... a sharp diagonal! It is different after the transform, but it is still there in the coefficients, usually in a more complex form than it began with. Compaction defeated!


Sharp edges are a traditional problem for DCT-based codecs because they do not compact well, if at all. Here we see a sharp edge (left) and its DCT coefficients (right). The energy of the original edge spreads through the DCT output in a pattern of directional ripples.

Over the past two decades, video codec research has looked more and more at transforms, filters, and prediction methods that are inherently directional, seeking a better way to handle directional edges and patterns and so correct this fundamental limitation of the DCT.

Classic Directional Predictors


Directional intra prediction is probably the best known directional technique in modern video codecs. Everyone is familiar with the directional prediction modes of h.264 and VP9, in which the codec extrapolates a pattern into a new block from the surrounding pixels of already decoded blocks. The goal is to remove (or greatly reduce) the energy of hard directional edges before the block is transformed. By predicting and removing features that do not compact well, we improve the overall efficiency of the codec.


Illustration of the directional prediction modes in AVC/H.264 for 4×4 blocks. The predictor extends values from a single-pixel strip of neighboring pixels into the predicted block along one of eight directions, plus an averaging mode for simple DC prediction.
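To make the extension process concrete, here is a minimal sketch in C of one such mode under simplifying assumptions: a diagonal down-right predictor for a 4×4 block that copies each neighbor pixel along the 45° direction. Real codecs such as h.264 also smooth and interpolate the neighbor strip; this toy omits that, so treat it as illustrative rather than the normative predictor.

    /* A simplified diagonal down-right intra predictor for a 4x4 block.
     * top[0] is the corner pixel above-left of the block; top[1..8] are the
     * pixels above it. Real codecs also smooth/interpolate these neighbors. */
    #include <stdio.h>

    static void predict_diag_down_right(const unsigned char top[9],
                                        const unsigned char left[4],
                                        unsigned char out[4][4]) {
        for (int i = 0; i < 4; i++) {       /* row */
            for (int j = 0; j < 4; j++) {   /* column */
                int d = j - i;              /* position along the 45-degree diagonal */
                if (d > 0)      out[i][j] = top[d];        /* from the row above */
                else if (d < 0) out[i][j] = left[-d - 1];  /* from the left column */
                else            out[i][j] = top[0];        /* the corner pixel */
            }
        }
    }

    int main(void) {
        const unsigned char top[9]  = {100, 110, 120, 130, 140, 150, 160, 170, 180};
        const unsigned char left[4] = {90, 80, 70, 60};
        unsigned char out[4][4];
        predict_diag_down_right(top, left, out);
        for (int i = 0; i < 4; i++) {
            for (int j = 0; j < 4; j++) printf("%4d", out[i][j]);
            printf("\n");
        }
        return 0;
    }

Each predicted pixel simply takes the value of whichever decoded neighbor the chosen direction points back to; the eight modes differ only in that direction.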

An even older idea is motion compensation, which is also a form of directional prediction, though we rarely think of it that way: it shifts blocks in a specific direction, again to predict and remove energy before running the DCT. The shift is directional and filtered; like directional intra prediction, it applies carefully constructed resampling filters when the offset is not a whole number of pixels.

Directional Filters


As noted earlier, video codecs lean heavily on filters to remove blocking artifacts and noise. Although these filters are applied to a 2D plane, the filters themselves are usually 1D, run horizontally and vertically in separate passes.

Directional filtering runs the filters along directions other than horizontal and vertical. The technique is already common in still-image processing, where denoising and special-effect filters often take edges and their direction into account. But those directional filters are usually built on filtering the output of a directional transform; for example, the [now somewhat dated] denoising filters I once wrote are based on the dual-tree complex wavelet transform.

For video coding, we are most interested in directional filters that apply directly to the pixels along a given direction, rather than filtering the frequency domain at the output of a directional transform. As soon as you try to build such an animal, the Big Question appears: how do you formalize a direction other than horizontal or vertical, when the filter taps no longer land on the pixel grid?

One answer is the classic approach of high-quality image processing: transform the filter kernel and resample the pixel space as needed. You could even argue this is the only "correct" or "complete" answer. It is what sub-pixel motion compensation does, where good results are impossible without good resampling, and what directional prediction does, usually with a fast approximation.

However, even a fast approximation is computationally expensive when applied everywhere, so it is worth avoiding resampling wherever possible. That cost in speed is one reason directional filters had not been used in video coding before.

Directional transforms


Directional transforms attempt to fix the DCT's edge problems at the level of the transform itself.

Experiments here fall into two categories. First, there are transforms with inherently directional bases, such as directional wavelets. These tend to be overcomplete: they produce more output data than they take in, usually much more. That seems like working backwards, since the goal is to reduce the amount of data, not increase it! But these transforms still compact energy, and the encoder still codes only a small subset of the outputs, so in practice this is not so different from ordinary lossy DCT coding. Still, overcomplete transforms usually demand excessive memory and computation, and they have not found their way into mainstream video codecs.

The second category of directional transforms takes an ordinary non-directional transform, such as the DCT, and modifies it by operating on its input or output. The modification can take the form of resampling, a matrix multiplication (which can be viewed as a specialized form of resampling), or a shuffling of the order of the input data.

This last idea is the most powerful, because it is fast: a simple reordering of numbers requires no mathematics at all.
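Here is a minimal sketch in C of the reordering idea: shearing the rows of an 8×8 block so that a 45° diagonal lines up as a vertical column. No arithmetic touches the pixel values, and the printout makes the non-square-input problem discussed below immediately visible.

    /* Shear an 8x8 block so its 45-degree anti-diagonal becomes a vertical
     * column: row i is shifted right by i places. The values are untouched;
     * only their positions change, and the result is no longer square. */
    #include <stdio.h>

    int main(void) {
        int block[8][8], sheared[8][15];
        /* Paint a hard 45-degree edge: 1s on and below the anti-diagonal. */
        for (int i = 0; i < 8; i++)
            for (int j = 0; j < 8; j++)
                block[i][j] = (i + j >= 7);
        for (int i = 0; i < 8; i++)
            for (int j = 0; j < 15; j++)
                sheared[i][j] = -1;   /* -1 marks "no pixel here" */
        for (int i = 0; i < 8; i++)
            for (int j = 0; j < 8; j++)
                sheared[i][j + i] = block[i][j];  /* the shear: a pure permutation */
        /* Each column of the sheared array would now get its own 1-D DCT,
         * but the columns have ragged lengths from 1 to 8 pixels. */
        for (int i = 0; i < 8; i++) {
            for (int j = 0; j < 15; j++) {
                if (sheared[i][j] < 0) printf(" .");
                else printf(" %d", sheared[i][j]);
            }
            printf("\n");
        }
        return 0;
    }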


Two examples of transforms in different directions built by permuting the input pixels and output coefficients rather than by a resampling filter. The example is taken from "An Overview of Directional Transforms in Image Coding" by Jizheng Xu, Bing Zeng, and Feng Wu.

Implementation runs into several practical difficulties. Reorienting a square block so that a diagonal edge becomes mostly vertical or horizontal produces a non-square array of values as input. Conceptually that is not a problem: since the rows and columns can be transformed independently, we simply use a different 1D DCT length for each row and column, as in the figure above. In practice, though, it means we would need a different DCT factorization for every possible column length, and as soon as the hardware design team works that out, you will be thrown out of the window.

There are other ways to handle the non-square region left by the permutation. One can devise resampling schemes that keep the input square, or that operate only on the output; most of the directional-transform papers listed at the end of this article propose some such scheme.

And that is essentially where the story of directional transforms ends. Once you get past their various complications and build a real codec, directional transforms simply do not help in a modern codec, for an unexpected reason: they compete with variable block sizes. In a codec with a single fixed block size, adding directional transforms gives an impressive gain in efficiency. Variable block sizes alone give an even bigger gain. But the combination of the two does worse than either technique by itself: variable block sizes had already eliminated the redundancy that directional transforms exploit, and had done so more efficiently.

During the development of Daala, Nathan Egge and I experimented heavily with directional transforms. I approached the problem from both the input and the output side, using sparse matrix multiplications to rotate diagonal edges in the output into a vertical/horizontal orientation. Nathan tested the known input-side approaches, reordering the data before the transform. We came to the same conclusion: the extra complexity brought no objective or subjective benefit.

Pursuing directional transforms in Daala (and other codecs) may have been a dead end, but the research answered the question raised earlier: how do you filter along an edge quickly, without costly resampling? Don't resample. Approximate the angle by stepping along whole pixels. Approximate the transformed kernel by shuffling pixels, literally or conceptually. This approach introduces some aliasing, but it works well enough, and it is fast.

Directional Predictors, Part 2: The Daala Chronicles


The story of CDEF in Daala began with an attempt at something else entirely: ordinary, boring directional intra prediction. Or at least, what passes for ordinary in the Daala codec.

I wrote about Daala's frequency-domain intra prediction scheme when we first began working on it. The math checks out; no cause for concern there. But a naive implementation requires an enormous number of matrix multiplications, far too expensive for a production codec. We hoped to cut the computational load by an order of magnitude through sparsification: dropping the matrix elements that contribute little to the prediction.

Sparsification did not work as hoped; at least as we implemented it, it lost too much information to remain practical.

Of course, Daala still needed some form of intra prediction, and Jean-Marc Valin had an idea: a standalone prediction codec operating in the spatial domain, running alongside Daala's frequency-domain codec. A kind of symbiote that works in tandem with Daala without depending on it, and so is not bound by Daala's frequency-domain requirements. Thus Intra Paint was born.


An example of the Intra Paint prediction algorithm on a photograph of Sydney Harbour. The output is clearly directional, hugs the block edges and image features well, and produces a pleasing (if slightly strange) result with crisp boundaries.

Intra Paint worked in a novel way: it coded one-dimensional vectors only along the block edges, then swept the pattern across the block in a chosen direction. It is like splattering paint along the boundaries and then smearing it in different directions across the open areas.

Intra Paint looked promising and produced strikingly beautiful results on its own, but once again it was not efficient enough to serve as a standard intra predictor: it simply did not earn back the bits needed to code its own information.


The difference between the original photograph of Sydney Harbour and the Intra Paint output. For all its visually pleasing output, Intra Paint is objectively not a very accurate predictor. The difference is significant even along many of the edges that appear to be reproduced well.

Intra Paint's "failure" suggested a different idea. Although the painting is objectively not a very good predictor, subjectively it looks good most of the time. Could the paint-smearing technique serve instead as a post-processing filter to improve subjective visual quality? Intra Paint follows sharp edges very well, so it could potentially remove the basis noise that accumulates along the sharpest edges. From that idea came the original paint-dering filter in Daala, which ultimately led to CDEF itself.

One more directional-prediction effort deserves a mention, even though for now it is a dead end in video coding. David Schleef implemented an interesting edge- and direction-aware resampling filter called Edge-Directed Interpolation (EDI). Other codecs (the VPx family and, for a time, AV1) have experimented with coding a downsampled reference frame to save bits and then restoring it to full resolution. We hoped that EDI's much-improved interpolation would make this technique good enough to be useful. We also hoped to use EDI as a better sub-pixel interpolation filter for motion compensation. Unfortunately, those ideas remained a dream.

Filling holes, merging branches


At this point I have described nearly all the prerequisites behind the approach taken by CDEF, but in reality we kept wandering in the desert. Intra Paint spawned Daala's original paint-dering filter, which used the Intra Paint algorithm as a post-filter to remove ringing artifacts. It was far too slow to use in a real codec.

So we took the lessons of Intra Paint and abandoned that line of experimentation. Daala borrowed Thor's CLPF for a while, and then Jean-Marc built a second, much faster deringing filter for Daala. It combined the Intra Paint direction search (which was fast and worked well) with a Conditional Replacement Filter (CRF). The CRF builds somewhat on the ideas of a median filter and produces results similar to a bilateral filter, but it is structured to vectorize well and therefore runs much faster.


Comparison of a 7-tap linear filter and a conditional replacement filter on a noisy one-dimensional signal, where the noise simulates the effect of quantizing the original signal.

Daala's final deringing filter used two of these one-dimensional CRFs: a 7-tap filter run along the direction of the edge and a weaker 5-tap filter run across it. Both operate on whole pixels only, with no resampling. At this point the Daala deringing filter already looks very much like what we now know as CDEF.
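To make the CRF idea concrete, here is a minimal one-dimensional sketch in C. The 7-tap kernel weights and the threshold are illustrative stand-ins rather than Daala's actual values; the essential trick is the replacement rule, where any tap that differs from the center pixel by more than the threshold is replaced by the center value before the linear filter is applied.

    /* A 1-D conditional replacement filter: taps that differ from the center
     * pixel by more than a threshold are replaced by the center value, so the
     * filter smooths noise without smearing a strong edge. Weights and the
     * threshold below are illustrative, not Daala's exact values. */
    #include <stdio.h>
    #include <stdlib.h>

    static int crf7(const int *x, int n, int pos, int threshold) {
        static const int w[7] = {1, 2, 3, 4, 3, 2, 1};  /* weights sum to 16 */
        int center = x[pos], sum = 0;
        for (int k = -3; k <= 3; k++) {
            int idx = pos + k;
            int tap = (idx >= 0 && idx < n) ? x[idx] : center;  /* clamp borders */
            if (abs(tap - center) >= threshold) tap = center;   /* conditional replacement */
            sum += w[k + 3] * tap;
        }
        return (sum + 8) >> 4;  /* divide by 16 with rounding */
    }

    int main(void) {
        /* A noisy step edge: the CRF smooths the noise but keeps the step. */
        int x[16] = {10, 12, 9, 11, 10, 12, 11, 10, 90, 92, 89, 91, 90, 92, 91, 90};
        for (int i = 0; i < 16; i++) printf("%d ", crf7(x, 16, i, 20));
        printf("\n");
        return 0;
    }

Unlike a plain linear kernel, the taps on the far side of the step never contribute, so the edge survives the smoothing intact.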

When Daala was offered to AOM as an input codec, this intermediate filter became the daala_dering experiment in AV1. Cisco also brought in its own deringing filter, the Constrained Low-Pass Filter (CLPF) from the Thor codec. For a while the two filters lived side by side in the AV1 experimental build, where they could be enabled separately or even together. That made it possible to notice useful synergies in how they worked, as well as further similarities in several of their stages.

And so we finally arrive at CDEF: the merger of Cisco's CLPF and the second version of Daala's deringing filter into a single high-performance, direction-aware deringing filter.

Modern CDEF


The CDEF filter is simple and very much like our earlier filters. It consists of three pieces (a direction search, a constrained replacement/low-pass filter, and the placement of the filter taps), all of which we had used before. Given the long backstory, you might look at the finished CDEF and ask: "Is that all? Where's the rest?" CDEF is an example of getting a useful effect by doing the pieces correctly rather than by piling on complexity. A simple, effective filter, exactly as it should be.

Direction Search


CDEF operates along a specific direction, and so it must first determine that direction. The algorithm is the same one used in Intra Paint and paint-dering. There are eight possible directions.


The eight possible directions of the CDEF filter. The numbered lines in each direction block correspond to the parameter k in the direction search.

We determine the filter direction by making "directional" variants of the input block, one for each direction, in which all the pixels along a line in the given direction are forced to the same value. Then we pick the direction whose variant most closely matches the original block. That is, for each direction d, we first find the average value of the pixels in each line k, and then sum, along each line, the squared error between each pixel value and the average of that pixel's line.


An example of choosing the direction d that best matches the input block. First we determine the average pixel value for each line of operation k in each direction; above, this is illustrated by setting every pixel of a given line k to that average. Then we sum the error for each direction, pixel by pixel, subtracting the input value from the average. The direction with the lowest error/variance is chosen as the best direction.

Although this looks like an expensive brute-force search, it is not: it requires no resampling, since the lines in each direction always step along whole pixels. Nor do we need to literally build each directional block as in the illustration; choosing the direction with the lowest error is equivalent to choosing the direction whose lines have the lowest variance, and the error expression collapses into a simple calculation:

E_d^2 = \sum_p x_p^2 - \sum_k \frac{1}{N_{d,k}} \Bigl( \sum_{p \in P_{d,k}} x_p \Bigr)^2

...where E is the error, p is a pixel, x_p is the value of pixel p, k is one of the numbered lines in the direction diagram above, and N_{d,k} is the cardinality of (the number of pixels in) line k of direction d. The first term does not depend on the direction, so it is enough to find the direction d that maximizes the second term. In the AV1 implementation, the complete CDEF direction search costs 5.875 additions and 1.9375 multiplications per pixel, and in SIMD-optimized form it takes roughly as long as an 8×8 DCT.
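The search is easy to express in code. Below is a brute-force C sketch for an 8×8 block that computes the second term above for each direction and keeps the largest. The line-index mapping mirrors the shape of the tables in the Daala/AV1 sources, but the exact direction numbering here should be treated as illustrative, and the real implementation replaces the divisions with precomputed multipliers.

    /* Brute-force CDEF-style direction search on an 8x8 block: for each of
     * the eight directions, accumulate per-line sums and counts, then score
     * the direction by sum(line_sum^2 / line_length), the term the formula
     * above tells us to maximize. */
    #include <stdio.h>

    static int line_index(int d, int i, int j) {
        switch (d) {
            case 0: return i + j;             /* one 45-degree diagonal */
            case 1: return i + (j >> 1);
            case 2: return i;                 /* horizontal */
            case 3: return 3 + i - (j >> 1);
            case 4: return 7 + i - j;         /* the other 45-degree diagonal */
            case 5: return 3 - (i >> 1) + j;
            case 6: return j;                 /* vertical */
            default: return (i >> 1) + j;
        }
    }

    static int find_dir(const int x[8][8]) {
        int best_d = 0;
        long best_s = -1;
        for (int d = 0; d < 8; d++) {
            long sum[16] = {0}, count[16] = {0}, s = 0;
            for (int i = 0; i < 8; i++)
                for (int j = 0; j < 8; j++) {
                    int k = line_index(d, i, j);
                    sum[k] += x[i][j];
                    count[k]++;
                }
            for (int k = 0; k < 16; k++)
                if (count[k]) s += sum[k] * sum[k] / count[k];
            if (s > best_s) { best_s = s; best_d = d; }
        }
        return best_d;
    }

    int main(void) {
        int x[8][8];
        for (int i = 0; i < 8; i++)          /* a hard 45-degree edge */
            for (int j = 0; j < 8; j++)
                x[i][j] = (i + j >= 7) ? 200 : 20;
        printf("best direction: %d\n", find_dir(x));
        return 0;
    }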

Filter taps


CDEF filters each pixel of the block. The selected direction d determines which directional filter is used; each filter is defined by a set of taps (the positions of the input pixels relative to the pixel being filtered) and their weights.

CDEF conceptually builds its directional filter out of two one-dimensional filters. The primary filter runs along the selected direction, as in Daala's deringing filter. The secondary filter runs twice in a cross pattern, at 45° angles to the primary filter, as in Thor's CLPF.


The primary and secondary filter directions relative to the selected direction d. The primary filter runs along the selected direction; the secondary filters run at 45° to it.

A filter running at an angle would normally want pixel values that fall between whole pixels. As usual, we perform no resampling; the filters are constructed so that every tap lands on a real pixel.

The filter taps are also weighted. The value at each tap is first passed through a constraint function, described below, and the constrained results are then combined in a weighted sum.


The placement of the filter taps and their weights w relative to the direction d. For even filter strengths a = 2 and b = 4; for odd strengths a = 3 and b = 3.

The taps and their weights determine where the filter looks; the strength and damping parameters, described next, determine how strongly it filters.


CDEF uses a constrained low-pass filter, in which the value of each tap is first passed through a constraint function parameterized by the difference d between the tap value and the pixel being filtered, the filter strength S, and the damping parameter D:

f(d, S, D) = \mathrm{sign}(d) \cdot \min\bigl(|d|,\ \max\bigl(0,\ S - \lfloor |d| / 2^{\,D - \lfloor \log_2 S \rfloor} \rfloor\bigr)\bigr)

The constraint function is designed to deemphasize or outright reject taps that differ too much from the pixel being filtered. Differences within the strength S are taken into account fully. Differences between S and the damping value D are attenuated. Finally, differences beyond the damping value D are ignored entirely.


An illustration of the constraint function. The difference between the tap and the pixel being filtered (d) runs along the x axis; the output of the constraint function runs along the y axis. Small differences pass through fully, up to the strength (S); larger differences are attenuated and then rejected entirely, as controlled by the damping parameter (D).
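In code, the constraint function is only a few operations. The C sketch below follows the shape of the constrain() routine in the AV1 sources; the strength and damping values in the demo loop are arbitrary choices made to show the rise, roll-off, and rejection regions.

    /* The constraint function: full response for small differences,
     * attenuation beyond the strength S, and outright rejection once the
     * damped magnitude exceeds S. */
    #include <stdio.h>
    #include <stdlib.h>

    static int floor_log2(int x) {
        int r = 0;
        while (x > 1) { x >>= 1; r++; }
        return r;
    }

    /* diff: tap value minus the pixel being filtered; S: strength; D: damping. */
    static int constrain(int diff, int S, int D) {
        if (S == 0) return 0;
        int shift = D - floor_log2(S);
        if (shift < 0) shift = 0;
        int mag = abs(diff);
        int v = S - (mag >> shift);
        if (v < 0) v = 0;
        if (v > mag) v = mag;
        return diff < 0 ? -v : v;
    }

    int main(void) {
        /* Strength 4, damping 4 (arbitrary demo values): watch the response
         * rise with the difference, roll off, and finally drop to zero. */
        for (int d = 0; d <= 16; d += 2)
            printf("f(%2d, 4, 4) = %d\n", d, constrain(d, 4, 4));
        return 0;
    }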

Each constrained difference is then multiplied by its tap weight, the results are summed, and the (rounded) total is added to the center pixel to form the filtered output. In effect, the complete filter for the pixel at position (i, j) is:

y(i, j) = x(i, j) + \mathrm{round}\Bigl( \sum_k w_k^{(p)} f\bigl(x_k^{(p)} - x(i,j),\, S^{(p)},\, D\bigr) + \sum_k w_k^{(s)} f\bigl(x_k^{(s)} - x(i,j),\, S^{(s)},\, D\bigr) \Bigr)

...where the (p) and (s) marks denote the primary and secondary taps, weights, and strengths, respectively.
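Putting the pieces together, here is a hedged C sketch of the complete filter applied to a single pixel. The tap offsets are written out only for a 45° direction, and the strengths, damping, weights, and test image are illustrative; the real AV1 code tabulates offsets for all eight directions and clamps the result, among other details.

    /* The complete CDEF filter for one pixel: primary taps along the chosen
     * direction, secondary taps 45 degrees off it, all fed through the
     * constraint function and combined in a weighted sum. */
    #include <stdio.h>
    #include <stdlib.h>

    static int constrain(int diff, int S, int D) {  /* same function as above */
        if (S == 0) return 0;
        int mag = abs(diff), log2S = 0, t = S;
        while (t > 1) { t >>= 1; log2S++; }
        int shift = D > log2S ? D - log2S : 0;
        int v = S - (mag >> shift);
        if (v < 0) v = 0;
        if (v > mag) v = mag;
        return diff < 0 ? -v : v;
    }

    int main(void) {
        int img[12][12];
        for (int i = 0; i < 12; i++)        /* a diagonal edge plus mild noise */
            for (int j = 0; j < 12; j++) {
                img[i][j] = (i + j >= 11) ? 180 : 60;
                if (i + j == 10 || i + j == 12)
                    img[i][j] += ((i * 7 + j * 3) % 5) - 2;
            }

        const int i = 5, j = 6, x = img[i][j];
        const int Sp = 8, Ss = 4, D = 5;    /* primary/secondary strength, damping */

        /* Primary taps: 1 and 2 steps each way along the 45-degree direction. */
        static const int pri_off[4][2] = {{1, -1}, {-1, 1}, {2, -2}, {-2, 2}};
        static const int pri_w[4] = {4, 4, 2, 2};
        /* Secondary taps: the two directions 45 degrees off the primary one,
         * which for a diagonal primary are simply horizontal and vertical. */
        static const int sec_off[8][2] = {{0, 1}, {0, -1}, {0, 2}, {0, -2},
                                          {1, 0}, {-1, 0}, {2, 0}, {-2, 0}};
        static const int sec_w[8] = {2, 2, 1, 1, 2, 2, 1, 1};

        int sum = 0;
        for (int k = 0; k < 4; k++)
            sum += pri_w[k] * constrain(img[i + pri_off[k][0]][j + pri_off[k][1]] - x, Sp, D);
        for (int k = 0; k < 8; k++)
            sum += sec_w[k] * constrain(img[i + sec_off[k][0]][j + sec_off[k][1]] - x, Ss, D);

        /* Weights are in sixteenths: divide the correction by 16 with rounding
         * (assumes an arithmetic right shift, as the AV1 code does). */
        int y = x + ((8 + sum - (sum < 0)) >> 4);
        printf("pixel %d -> filtered %d\n", x, y);
        return 0;
    }

Note how the constraint function does the heavy lifting: taps that cross the edge produce huge differences, get rejected, and contribute nothing, so the edge itself is never blurred.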

And that is the whole of CDEF. Beyond this conceptual description, the remaining details, exceptions, and exact constants are covered in the CDEF paper and, of course, in the CDEF code in the AV1 source.

Results


CDEF is meant to remove ringing and basis noise around sharp edges without blurring or damaging the edges themselves. As used in AV1 today the effect is fairly subtle, but it is easy to see in a side-by-side comparison with CDEF switched off and on.


A fragment of the "Fruits" image before CDEF and, on the right, after CDEF.

Of course, pretty pictures are not the point; the numbers matter. As expected for a subjective filter, CDEF shows only modest gains on objective metrics such as PSNR and SSIM.

For that reason, we ran multiple rounds of subjective testing, both during CDEF's development (when it was still Daala's dering and Thor's CLPF) and on the finished CDEF. In the final round we compared AV1 with CDEF against AV1 without CDEF.


Results of A/B subjective testing of AV1 with CDEF against AV1 without CDEF.

Viewers showed a statistically significant (p < .05) preference for the clips coded with CDEF. A subjective win of this size corresponds to a bitrate reduction of roughly 5-10% at the same visual quality, which is a big gain for one small filter.

Objective metrics improve far less, by about 1%, as we expected. But objective metrics are only a proxy for visual quality; in the end it is the subjective result that counts.

Finally, some might dismiss CDEF as just another "cheat" that makes the output look better than it is. But that is exactly the job CDEF was designed to do, and it does it cheaply by the standards of AV1's new coding tools. CDEF is a small, fast, clever filter, and a good fit for AV1: it adds only about 3% to 10% to the total AV1 decode time, depending on configuration.

Additional resources


  1. The Xiph.Org 'derf' video test media collection: media.xiph.org
  2. The automated metrics and testing harness used in Daala and AV1 development: "Are We Compressed Yet?"
  3. The AV1 Constrained Directional Enhancement Filter (CDEF). Steinar Midtskogen, Jean-Marc Valin, 2017
  4. The CDEF presentation from ICASSP 2018. Steinar Midtskogen, Jean-Marc Valin
  5. The Daala directional deringing filter demo. Jean-Marc Valin. Describes the deringing filter in Daala that grew into CDEF in AV1.
  6. Daala intra paint demo. Jean-Marc Valin. Describes the Intra Paint algorithm in Daala and its path toward CDEF.
  7. Intra Paint Deringing Filter. Jean-Marc Valin, 2015. Notes describing the use of Intra Paint as a deringing filter in Daala.
  8. . , , , 2013
  9. Direction-Adaptive Discrete Wavelet Transform for Image Compression. Chuo-Ling Chang, Bernd Girod. IEEE Transactions on Image Processing, vol. 16, no. 5, 2007
  10. Direction-Adaptive Transforms for Image Coding. Chuo-Ling Chang, PhD thesis, Stanford University, 2009
  11. Direction-Adaptive Partitioned Block Transform for Color Image Coding. Chuo-Ling Chang, Mina Makar, Sam S. Tsai, Bernd Girod. IEEE Transactions on Image Processing, vol. 19, no. 7, 2010
  12. DCT DC . , . IEEE
  13. Proceedings of the 2010 IEEE 17th International Conference on Image Processing (ICIP), 26-29 September 2010, Hong Kong
  14. - . , 2008. IEEE
  15. - . , , . IEEE Transactions on Image Processing, 21, 2, 2012
  16. . , , , . IEEE.
  17. . , , .
  18. . , , , IEEE Transaction in Image Processing, 19, 11, 2010. IEEE
  19. . O. . , O. . , , 2008
  20. . O. . , . , A. , 2011
  21. . , , , , 2009. IEEE
  22. . , , , , IEEE Transactions on Image Processing, 2008
  23. Directional Discrete Cosine Transforms: A New Framework for Image Coding. Bing Zeng, Jingjing Fu. IEEE Transactions on Circuits and Systems for Video Technology, 2008
  24. The Dual-Tree Complex Wavelet Transform. Ivan W. Selesnick, Richard G. Baraniuk, Nick G. Kingsbury. IEEE Signal Processing Magazine, 2005

Source: https://habr.com/ru/post/416049/

