The modified discrete cosine transform (MDCT) is a transform based on the type-IV discrete cosine transform (DCT-IV), with the additional property of being lapped: it is designed to be performed on consecutive blocks of a larger dataset, where subsequent blocks are overlapped so that the last half of one block coincides with the first half of the next block. This overlapping, in addition to the energy-compaction qualities of the DCT, makes the MDCT especially attractive for signal compression applications, since it helps to avoid artifacts stemming from the block boundaries. As a result of these advantages, the MDCT is employed in most modern lossy audio formats, including MP3, AC-3, Vorbis, Windows Media Audio, ATRAC, Cook, and AAC.
The MDCT was proposed by Princen, Johnson, and Bradley^{[1]} in 1987, following earlier (1986) work by Princen and Bradley^{[2]} to develop the MDCT's underlying principle of time-domain aliasing cancellation (TDAC), described below. (There also exists an analogous transform, the MDST, based on the discrete sine transform, as well as other, rarely used, forms of the MDCT based on different types of DCT or DCT/DST combinations.)
In MP3, the MDCT is not applied to the audio signal directly, but rather to the output of a 32-band polyphase quadrature filter (PQF) bank. The output of this MDCT is postprocessed by an alias reduction formula to reduce the typical aliasing of the PQF filter bank. Such a combination of a filter bank with an MDCT is called a hybrid filter bank or a subband MDCT. AAC, on the other hand, normally uses a pure MDCT; only the (rarely used) MPEG-4 AAC-SSR variant (by Sony) uses a four-band PQF bank followed by an MDCT. Similar to MP3, ATRAC uses stacked quadrature mirror filters (QMF) followed by an MDCT.
Definition
As a lapped transform, the MDCT is a bit unusual compared to other Fourier-related transforms in that it has half as many outputs as inputs (instead of the same number). In particular, it is a linear function F\colon \mathbf{R}^{2N} \to \mathbf{R}^N (where R denotes the set of real numbers). The 2N real numbers x_{0}, ..., x_{2N-1} are transformed into the N real numbers X_{0}, ..., X_{N-1} according to the formula:

X_k = \sum_{n=0}^{2N-1} x_n \cos \left[\frac{\pi}{N} \left(n+\frac{1}{2}+\frac{N}{2}\right) \left(k+\frac{1}{2}\right) \right]
(The normalization coefficient in front of this transform, here unity, is an arbitrary convention and differs between treatments. Only the product of the normalizations of the MDCT and the IMDCT, below, is constrained.)
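As a minimal sketch, the defining sum can be evaluated directly in Python. The function name mdct and the sample block are illustrative assumptions rather than part of any library, and the unit normalization used above is assumed.

```python
import math

def mdct(x):
    """Direct O(N^2) evaluation of the MDCT sum (unit normalization).

    x holds 2N real samples; the result holds N real coefficients.
    """
    if len(x) % 2:
        raise ValueError("expected an even number of inputs (2N)")
    n_out = len(x) // 2  # N in the formula
    return [
        sum(
            x[n] * math.cos(math.pi / n_out * (n + 0.5 + n_out / 2) * (k + 0.5))
            for n in range(2 * n_out)
        )
        for k in range(n_out)
    ]

block = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]  # 2N = 8 samples (made-up data)
coeffs = mdct(block)
print(len(block), "inputs ->", len(coeffs), "coefficients")
```

The halving of the output count is visible directly: eight samples go in and four coefficients come out.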
Inverse transform
The inverse MDCT is known as the IMDCT. Because there are different numbers of inputs and outputs, at first glance it might seem that the MDCT should not be invertible. However, perfect invertibility is achieved by adding the overlapped IMDCTs of subsequent overlapping blocks, causing the errors to cancel and the original data to be retrieved; this technique is known as time-domain aliasing cancellation (TDAC).
The IMDCT transforms N real numbers X_{0}, ..., X_{N-1} into 2N real numbers y_{0}, ..., y_{2N-1} according to the formula:

y_n = \frac{1}{N} \sum_{k=0}^{N-1} X_k \cos \left[\frac{\pi}{N} \left(n+\frac{1}{2}+\frac{N}{2}\right) \left(k+\frac{1}{2}\right) \right]
(As for the DCT-IV, an orthogonal transform, the inverse has the same form as the forward transform.)
In the case of a windowed MDCT with the usual window normalization (see below), the normalization coefficient in front of the IMDCT should be multiplied by 2 (i.e., becoming 2/N).
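The following sketch illustrates the overlap-add reconstruction for the unwindowed case with the 1/N normalization given above. The helper names (mdct, imdct, overlap_add_roundtrip) and the test signal are assumptions made for illustration. A single block is not recovered by the IMDCT alone, but samples covered by two overlapping blocks are.

```python
import math

def mdct(x):
    """Direct MDCT of 2N samples (unit normalization)."""
    n_out = len(x) // 2
    return [sum(x[n] * math.cos(math.pi / n_out * (n + 0.5 + n_out / 2) * (k + 0.5))
                for n in range(2 * n_out))
            for k in range(n_out)]

def imdct(X):
    """Direct IMDCT of N coefficients with the 1/N normalization."""
    n_out = len(X)
    return [sum(X[k] * math.cos(math.pi / n_out * (n + 0.5 + n_out / 2) * (k + 0.5))
                for k in range(n_out)) / n_out
            for n in range(2 * n_out)]

def overlap_add_roundtrip(signal, n_out):
    """MDCT then IMDCT every 50%-overlapped block of length 2N and sum the outputs."""
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * n_out + 1, n_out):
        y = imdct(mdct(signal[start:start + 2 * n_out]))
        for i, value in enumerate(y):
            out[start + i] += value
    return out

N = 4
signal = [math.sin(0.3 * t) + 0.1 * t for t in range(5 * N)]  # made-up test data
rebuilt = overlap_add_roundtrip(signal, N)

# Only samples N .. len(signal)-N-1 are covered by two blocks.
max_err = max(abs(s - r) for s, r in zip(signal[N:-N], rebuilt[N:-N]))
print("largest interior reconstruction error:", max_err)
```

The first and last N samples are covered by only one block each and therefore still contain aliasing; practical codecs handle the signal edges separately, for example by padding.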
Computation
Although the direct application of the MDCT formula would require O(N^{2}) operations, it is possible to compute the same thing with only O(N log N) complexity by recursively factorizing the computation, as in the fast Fourier transform (FFT). One can also compute MDCTs via other transforms, typically a DFT (FFT) or a DCT, combined with O(N) pre- and post-processing steps. Also, as described below, any algorithm for the DCT-IV immediately provides a method to compute the MDCT and IMDCT of even size.
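One way to realize the DFT-based route is sketched below. It assumes the block length 2N is a power of two, uses a simple textbook radix-2 FFT, and is not one of the optimized algorithms from the literature (which typically need only an N/2-point complex FFT). Writing the cosine as the real part of a complex exponential splits the phase into a per-sample factor, a 2N-point DFT kernel, and a per-coefficient factor. The names fft and mdct_fft are illustrative.

```python
import cmath

def fft(v):
    """Recursive radix-2 FFT; len(v) must be a power of two."""
    n = len(v)
    if n == 1:
        return list(v)
    even = fft(v[0::2])
    odd = fft(v[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def mdct_fft(x):
    """MDCT of 2N real samples via one 2N-point FFT plus O(N) twiddles."""
    two_n = len(x)
    if two_n < 2 or two_n & (two_n - 1):
        raise ValueError("this sketch assumes a power-of-two block length")
    n_out = two_n // 2
    n0 = 0.5 + n_out / 2  # the constant shift in the MDCT definition
    # Pre-twiddle: multiply x_n by exp(-i*pi*n / (2N)).
    pre = [x[n] * cmath.exp(-1j * cmath.pi * n / two_n) for n in range(two_n)]
    spec = fft(pre)
    # Post-twiddle: multiply bin k by exp(-i*pi*n0*(k + 1/2) / N), keep the real part.
    return [(cmath.exp(-1j * cmath.pi * n0 * (k + 0.5) / n_out) * spec[k]).real
            for k in range(n_out)]

samples = [0.5, -1.0, 2.0, 3.5, -0.25, 1.0, 0.0, 4.0]  # made-up data, 2N = 8
print(mdct_fft(samples))
```

Only the first N bins of the 2N-point FFT are used, so roughly half of the FFT work is wasted; that is the price of keeping the sketch short.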
Window functions
In typical signal-compression applications, the transform properties are further improved by using a window function w_{n} (n = 0, ..., 2N-1) that is multiplied with x_{n} and y_{n} in the MDCT and IMDCT formulas, above, in order to avoid discontinuities at the n = 0 and 2N boundaries by making the function go smoothly to zero at those points. (That is, we window the data before the MDCT and after the IMDCT.) In principle, x and y could have different window functions, and the window function could also change from one block to the next (especially for the case where data blocks of different sizes are combined), but for simplicity we consider the common case of identical window functions for equal-sized blocks.
The transform remains invertible (that is, TDAC works), for a symmetric window w_{n} = w_{2N-1-n}, as long as w satisfies the Princen-Bradley condition:

w_n^2 + w_{n + N}^2 = 1.
Various window functions are used. A window that produces a form known as a modulated lapped transform^{[3]}^{[4]} is given by

w_n = \sin \left[\frac{\pi}{2N} \left(n+\frac{1}{2}\right) \right]
and is used for MP3 and MPEG-2 AAC, and

w_n = \sin \left( \frac{\pi}{2} \sin^2 \left[\frac{\pi}{2N} \left(n+\frac{1}{2}\right) \right] \right)
for Vorbis. AC-3 uses a Kaiser-Bessel derived (KBD) window, and MPEG-4 AAC can also use a KBD window.
Note that windows applied to the MDCT are different from windows used for some other types of signal analysis, since they must fulfill the Princen-Bradley condition. One of the reasons for this difference is that MDCT windows are applied twice, for both the MDCT (analysis) and the IMDCT (synthesis).
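The symmetry and Princen-Bradley conditions are easy to check numerically. The sketch below builds the sine and Vorbis windows from the formulas above and, for contrast, a Hann-style window (sine squared), which is an example chosen here and not taken from any codec. All function names are illustrative.

```python
import math

def sine_window(n_out):
    """w_n = sin[pi/(2N) (n + 1/2)], n = 0 .. 2N-1."""
    return [math.sin(math.pi / (2 * n_out) * (n + 0.5)) for n in range(2 * n_out)]

def vorbis_window(n_out):
    """w_n = sin(pi/2 * sin^2[pi/(2N) (n + 1/2)]), n = 0 .. 2N-1."""
    return [math.sin(math.pi / 2 * math.sin(math.pi / (2 * n_out) * (n + 0.5)) ** 2)
            for n in range(2 * n_out)]

def hann_style_window(n_out):
    """sin^2 window, common in spectral analysis; used here only as a counterexample."""
    return [math.sin(math.pi / (2 * n_out) * (n + 0.5)) ** 2 for n in range(2 * n_out)]

def princen_bradley_error(w):
    """Largest deviation of w[n]^2 + w[n+N]^2 from 1 over n = 0 .. N-1."""
    n_out = len(w) // 2
    return max(abs(w[n] ** 2 + w[n + n_out] ** 2 - 1.0) for n in range(n_out))

def is_symmetric(w, tol=1e-12):
    """True if w[n] == w[2N-1-n] within tol."""
    return all(abs(w[n] - w[-1 - n]) <= tol for n in range(len(w)))

N = 8
for name, make in [("sine", sine_window), ("vorbis", vorbis_window),
                   ("hann-style", hann_style_window)]:
    w = make(N)
    print(name, "symmetric:", is_symmetric(w),
          "Princen-Bradley error:", princen_bradley_error(w))
```

The sine and Vorbis windows satisfy the condition up to rounding error, while the Hann-style window is symmetric but violates it, which illustrates why analysis windows from other contexts cannot simply be reused.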
Relationship to DCT-IV and Origin of TDAC
As can be seen by inspection of the definitions, for even N the MDCT is essentially equivalent to a DCT-IV, where the input is shifted by N/2 and two N-blocks of data are transformed at once. By examining this equivalence more carefully, important properties like TDAC can be easily derived.
In order to define the precise relationship to the DCT-IV, one must realize that the DCT-IV corresponds to alternating even/odd boundary conditions: even at its left boundary (around n=−1/2), odd at its right boundary (around n=N−1/2), and so on (instead of periodic boundaries as for a DFT). This follows from the identities \cos\left[\frac{\pi}{N} \left(-n-1+\frac{1}{2}\right) \left(k+\frac{1}{2}\right)\right] = \cos\left[\frac{\pi}{N} \left(n+\frac{1}{2}\right) \left(k+\frac{1}{2}\right)\right] and \cos\left[\frac{\pi}{N} \left(2N-n-1+\frac{1}{2}\right) \left(k+\frac{1}{2}\right)\right] = -\cos\left[\frac{\pi}{N} \left(n+\frac{1}{2}\right) \left(k+\frac{1}{2}\right)\right]. Thus, if its inputs are an array x of length N, we can imagine extending this array to (x, −x_{R}, −x, x_{R}, ...) and so on, where x_{R} denotes x in reverse order.
Consider an MDCT with 2N inputs and N outputs, where we divide the inputs into four blocks (a, b, c, d) each of size N/2. If we shift these to the right by N/2 (from the +N/2 term in the MDCT definition), then (b, c, d) extend past the end of the N DCT-IV inputs, so we must "fold" them back according to the boundary conditions described above.

Thus, the MDCT of 2N inputs (a, b, c, d) is exactly equivalent to a DCT-IV of the N inputs: (−c_{R}−d, a−b_{R}), where R denotes reversal as above.
(In this way, any algorithm to compute the DCT-IV can be trivially applied to the MDCT.)
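The folding step can be sketched as follows, assuming even N. The helper names fold and dct4 are illustrative, and the DCT-IV here is an unnormalized direct sum that stands in for whatever fast DCT-IV routine would be used in practice.

```python
import math

def mdct(x):
    """Direct MDCT of 2N samples (unit normalization), used as the reference."""
    n_out = len(x) // 2
    return [sum(x[n] * math.cos(math.pi / n_out * (n + 0.5 + n_out / 2) * (k + 0.5))
                for n in range(2 * n_out))
            for k in range(n_out)]

def dct4(u):
    """Unnormalized DCT-IV: sum_j u_j cos[pi/N (j + 1/2)(k + 1/2)]."""
    n = len(u)
    return [sum(u[j] * math.cos(math.pi / n * (j + 0.5) * (k + 0.5)) for j in range(n))
            for k in range(n)]

def fold(x):
    """Map 2N inputs (a, b, c, d) to the N values (-c_R - d, a - b_R); N must be even."""
    n_out = len(x) // 2
    if n_out % 2:
        raise ValueError("folding into a DCT-IV requires even N")
    h = n_out // 2
    a, b, c, d = x[:h], x[h:2 * h], x[2 * h:3 * h], x[3 * h:]
    first = [-cr - dv for cr, dv in zip(reversed(c), d)]   # -c_R - d
    second = [av - br for av, br in zip(a, reversed(b))]   # a - b_R
    return first + second

block = [3.0, -1.0, 4.0, 1.0, -5.0, 9.0, 2.0, 6.0]  # made-up data, N = 4
via_dct4 = dct4(fold(block))
reference = mdct(block)
max_diff = max(abs(p - q) for p, q in zip(via_dct4, reference))
print("largest difference between the two routes:", max_diff)
```

Replacing dct4 with an O(N log N) DCT-IV implementation would give a fast MDCT with only O(N) extra work for the fold.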
Similarly, the IMDCT formula above is precisely 1/2 of the DCT-IV (which is its own inverse), where the output is extended (via the boundary conditions) to a length 2N and shifted back to the left by N/2. The inverse DCT-IV would simply give back the inputs (−c_{R}−d, a−b_{R}) from above. When this is extended via the boundary conditions and shifted, one obtains:

IMDCT(MDCT(a, b, c, d)) = (a−b_{R}, b−a_{R}, c+d_{R}, d+c_{R}) / 2.
Half of the IMDCT outputs are thus redundant, as b−a_{R} = −(a−b_{R})_{R}, and likewise for the last two terms. If we group the input into bigger blocks A,B of size N, where A=(a, b) and B=(c, d), we can write this result in a simpler way:

IMDCT(MDCT(A, B)) = (A−A_{R}, B+B_{R}) / 2
One can now understand how TDAC works. Suppose that one computes the MDCT of the subsequent, 50% overlapped, 2N block (B, C). The IMDCT will then yield, analogous to the above: (B−B_{R}, C+C_{R}) / 2. When this is added with the previous IMDCT result in the overlapping half, the reversed terms cancel and one obtains simply B, recovering the original data.
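The aliasing formula can be checked numerically for a single block. The sketch below assumes the unit-normalized MDCT and the 1/N-normalized IMDCT defined earlier; predicted_aliasing is a hypothetical helper that builds (A−A_R, B+B_R) / 2 directly from the input.

```python
import math

def mdct(x):
    """Direct MDCT of 2N samples (unit normalization)."""
    n_out = len(x) // 2
    return [sum(x[n] * math.cos(math.pi / n_out * (n + 0.5 + n_out / 2) * (k + 0.5))
                for n in range(2 * n_out))
            for k in range(n_out)]

def imdct(X):
    """Direct IMDCT of N coefficients with the 1/N normalization."""
    n_out = len(X)
    return [sum(X[k] * math.cos(math.pi / n_out * (n + 0.5 + n_out / 2) * (k + 0.5))
                for k in range(n_out)) / n_out
            for n in range(2 * n_out)]

def predicted_aliasing(block):
    """Return (A - A_R, B + B_R) / 2 for a 2N-sample block (A, B)."""
    n_out = len(block) // 2
    A, B = block[:n_out], block[n_out:]
    first = [(v - vr) / 2 for v, vr in zip(A, reversed(A))]
    second = [(v + vr) / 2 for v, vr in zip(B, reversed(B))]
    return first + second

block = [3.0, -1.0, 4.0, 1.0, -5.0, 9.0, 2.0, 6.0]  # made-up data, N = 4
actual = imdct(mdct(block))
expected = predicted_aliasing(block)
max_diff = max(abs(p - q) for p, q in zip(actual, expected))
print("largest deviation from (A - A_R, B + B_R) / 2:", max_diff)
```

The first half of the output is antisymmetric and the second half symmetric, which is exactly the redundancy that lets the overlapping neighbor cancel the reversed terms.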
Origin of TDAC
The origin of the term "time-domain aliasing cancellation" is now clear. The use of input data that extend beyond the boundaries of the logical DCT-IV causes the data to be aliased in the same way that frequencies beyond the Nyquist frequency are aliased to lower frequencies, except that this aliasing occurs in the time domain instead of the frequency domain: we cannot distinguish the contributions of a and of b_{R} to the MDCT of (a, b, c, d), or equivalently, to the result of IMDCT(MDCT(a, b, c, d)) = (a−b_{R}, b−a_{R}, c+d_{R}, d+c_{R}) / 2. The combinations c+d_{R} (from this block) and c−d_{R} (from the next, overlapping block), and so on, have precisely the right signs for the aliased terms to cancel when they are added.
For odd N (which are rarely used in practice), N/2 is not an integer so the MDCT is not simply a shift permutation of a DCT-IV. In this case, the additional shift by half a sample means that the MDCT/IMDCT becomes equivalent to the DCT-III/II, and the analysis is analogous to the above.
Smoothness and discontinuities
We have seen above that the MDCT of 2N inputs (a, b, c, d) is equivalent to a DCT-IV of the N inputs (−c_{R}−d, a−b_{R}). The DCT-IV is designed for the case where the function at the right boundary is odd, and therefore the values near the right boundary are close to 0. If the input signal is smooth, this is the case: the rightmost components of a and b_{R} are consecutive in the input sequence (a, b, c, d), and therefore their difference is small. Let us look at the middle of the interval: if we rewrite the above expression as (−c_{R}−d, a−b_{R}) = (−d, a)−(b,c)_{R}, the second term, (b,c)_{R}, gives a smooth transition in the middle. However, in the first term, (−d, a), there is a potential discontinuity where the right end of −d meets the left end of a. This is the reason for using a window function that reduces the components near the boundaries of the input sequence (a, b, c, d) towards 0.
TDAC for the windowed MDCT
Above, the TDAC property was proved for the ordinary MDCT, showing that adding IMDCTs of subsequent blocks in their overlapping half recovers the original data. The derivation of this inverse property for the windowed MDCT is only slightly more complicated.
Consider two overlapping consecutive sets of 2N inputs (A,B) and (B,C), for blocks A,B,C of size N. Recall from above that when (A,B) and (B,C) are MDCTed, IMDCTed, and added in their overlapping half, we obtain (B+B_R) / 2 + (B-B_R) / 2 = B, the original data.
Now we suppose that we multiply both the MDCT inputs and the IMDCT outputs by a window function of length 2N. As above, we assume a symmetric window function, which is therefore of the form (W,W_R) where W is a length-N vector and R denotes reversal as before. Then the Princen-Bradley condition can be written as W^2 + W_R^2 = (1,1,\ldots), with the squares and additions performed elementwise.
Therefore, instead of MDCTing (A,B), we now MDCT (WA,W_R B) (with all multiplications performed elementwise). When this is IMDCTed and multiplied again (elementwise) by the window function, the last-N half becomes:

W_R \cdot (W_R B+(W_R B)_R) =W_R \cdot (W_R B+W B_R) = W_R^2 B+WW_R B_R.
(Note that we no longer have the multiplication by 1/2, because the IMDCT normalization differs by a factor of 2 in the windowed case.)
Similarly, the windowed MDCT and IMDCT of (B,C) yields, in its first-N half:

W \cdot (WB - W_R B_R) = W^2 B - W W_R B_R.
When we add these two halves together, we obtain:

(W_R^2 B+WW_R B_R) + (W^2 B - W W_R B_R)= \left(W_R^2 + W^2\right)B = B,
recovering the original data.
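A numerical sketch of the windowed case follows, assuming the sine window and the 2/N IMDCT normalization mentioned earlier. The function names and the test signal are illustrative assumptions.

```python
import math

def sine_window(n_out):
    """Symmetric window of length 2N satisfying the Princen-Bradley condition."""
    return [math.sin(math.pi / (2 * n_out) * (n + 0.5)) for n in range(2 * n_out)]

def windowed_mdct(x, w):
    """Window the 2N inputs, then apply the unit-normalized MDCT."""
    n_out = len(x) // 2
    return [sum(w[n] * x[n] * math.cos(math.pi / n_out * (n + 0.5 + n_out / 2) * (k + 0.5))
                for n in range(2 * n_out))
            for k in range(n_out)]

def windowed_imdct(X, w):
    """IMDCT with the 2/N normalization, then window the 2N outputs."""
    n_out = len(X)
    return [w[n] * (2.0 / n_out)
            * sum(X[k] * math.cos(math.pi / n_out * (n + 0.5 + n_out / 2) * (k + 0.5))
                  for k in range(n_out))
            for n in range(2 * n_out)]

N = 8
w = sine_window(N)
signal = [math.cos(0.2 * t) + 0.05 * t for t in range(5 * N)]  # made-up test data

# Overlap-add the windowed IMDCT outputs of all 50%-overlapped blocks.
rebuilt = [0.0] * len(signal)
for start in range(0, len(signal) - 2 * N + 1, N):
    y = windowed_imdct(windowed_mdct(signal[start:start + 2 * N], w), w)
    for i, value in enumerate(y):
        rebuilt[start + i] += value

# Samples covered by two blocks should be recovered up to rounding error.
max_err = max(abs(s - r) for s, r in zip(signal[N:-N], rebuilt[N:-N]))
print("largest interior reconstruction error:", max_err)
```

Swapping in a window that violates the Princen-Bradley condition leaves the interior error well above rounding level, since the squared window halves no longer sum to one.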
See also
Other overlapping windowed Fourier transforms include:
References

[1] J. P. Princen, A. W. Johnson, and A. B. Bradley: Subband/transform coding using filter bank designs based on time domain aliasing cancellation, IEEE Proc. Intl. Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2161–2164, 1987. Initial description of what is now called the MDCT.

[2] John P. Princen, Alan B. Bradley: Analysis/synthesis filter bank design based on time domain aliasing cancellation, IEEE Trans. Acoust. Speech Signal Processing, ASSP-34 (5), 1153–1161, 1986. Described a precursor to the MDCT using a combination of discrete cosine and sine transforms.

[3] H. S. Malvar, "Lapped Transforms for Efficient Transform/Subband Coding", IEEE Trans. on Acoustics, Speech, and Signal Processing, vol. 38, no. 6, pp. 969–978 (Equation 22), June 1990.

[4] H. S. Malvar, "Modulated QMF Filter Banks with Perfect Reconstruction", Electronics Letters, vol. 26, no. 13, pp. 906–907 (Equation 13), June 1990.

Henrique S. Malvar, Signal Processing with Lapped Transforms (Artech House: Norwood MA, 1992).

A. W. Johnson and A. B. Bradley, "Adaptive transform coding incorporating time domain aliasing cancellation," Speech Comm. 6, 299–308 (1987).

For algorithms, see e.g.:

Chi-Min Liu and Wen-Chieh Lee, "A unified fast algorithm for cosine modulated filterbanks in current audio standards", J. Audio Engineering 47 (12), 1061–1075 (1999).

V. Britanak and K. R. Rao, "A new fast algorithm for the unified forward and inverse MDCT/MDST computation," Signal Processing 82, 433–459 (2002).

Vladimir Nikolajevic and Gerhard Fettweis, "Computation of forward and inverse MDCT using Clenshaw's recurrence formula," IEEE Trans. Sig. Proc. 51 (5), 1439–1444 (2003).

Che-Hong Chen, Bin-Da Liu, and Jar-Ferr Yang, "Recursive architectures for realizing modified discrete cosine transform and its inverse," IEEE Trans. Circuits Syst. II: Analog Dig. Sig. Proc. 50 (1), 38–45 (2003).

J.S. Wu, H.Z. Shu, L. Senhadji, and L.M. Luo, "Mixed-radix algorithm for the computation of forward and inverse MDCTs," IEEE Trans. Circuits Syst. I: Reg. Papers 56 (4), 784–794 (2009).

V. Britanak, "A survey of efficient MDCT implementations in MP3 audio coding standard: retrospective and state-of-the-art," Signal. Process. 91 (4), 624–672 (2011).

...and references therein.