In the theory of stochastic processes, the Karhunen–Loève theorem (named after Kari Karhunen and Michel Loève), also known as the Kosambi–Karhunen–Loève theorem^{[1]}^{[2]} is a representation of a stochastic process as an infinite linear combination of orthogonal functions, analogous to a Fourier series representation of a function on a bounded interval. Stochastic processes given by infinite series of this form were first^{[3]} considered by Damodar Dharmananda Kosambi.^{[4]} There exist many such expansions of a stochastic process: if the process is indexed over [a, b], any orthonormal basis of L^{2}([a, b]) yields an expansion thereof in that form. The importance of the Karhunen–Loève theorem is that it yields the best such basis in the sense that it minimizes the total mean squared error.
In contrast to a Fourier series where the coefficients are fixed numbers and the expansion basis consists of sinusoidal functions (that is, sine and cosine functions), the coefficients in the Karhunen–Loève theorem are random variables and the expansion basis depends on the process. In fact, the orthogonal basis functions used in this representation are determined by the covariance function of the process. One can think that the Karhunen–Loève transform adapts to the process in order to produce the best possible basis for its expansion.
In the case of a centered stochastic process {X_{t}}_{t ∈ [a, b]} (centered means E[X_{t}] = 0 for all t ∈ [a, b]) satisfying a technical continuity condition, X_{t} admits a decomposition

X_t = \sum_{k=1}^\infty Z_k e_k(t)
where Z_{k} are pairwise uncorrelated random variables and the functions e_{k} are continuous realvalued functions on [a, b] that are pairwise orthogonal in L^{2}([a, b]). It is therefore sometimes said that the expansion is biorthogonal since the random coefficients Z_{k} are orthogonal in the probability space while the deterministic functions e_{k} are orthogonal in the time domain. The general case of a process X_{t} that is not centered can be brought back to the case of a centered process by considering X_{t} − E[X_{t}] which is a centered process.
Moreover, if the process is Gaussian, then the random variables Z_{k} are Gaussian and stochastically independent. This result generalizes the Karhunen–Loève transform. An important example of a centered real stochastic process on [0, 1] is the Wiener process; the Karhunen–Loève theorem can be used to provide a canonical orthogonal representation for it. In this case the expansion consists of sinusoidal functions.
The above expansion into uncorrelated random variables is also known as the Karhunen–Loève expansion or Karhunen–Loève decomposition. The empirical version (i.e., with the coefficients computed from a sample) is known as the Karhunen–Loève transform (KLT), principal component analysis, proper orthogonal decomposition (POD), Empirical orthogonal functions (a term used in meteorology and geophysics), or the Hotelling transform.
Contents

Formulation 1

Statement of the theorem 2

Proof 3

Properties of the Karhunen–Loève transform 4

Special case: Gaussian distribution 4.1

The Karhunen–Loève transform decorrelates the process 4.2

The Karhunen–Loève expansion minimizes the total mean square error 4.3

Explained variance 4.4

The Karhunen–Loève expansion has the minimum representation entropy property 4.5

Linear KarhunenLoeve Approximations 5

NonLinear Approximation in Bases 6

Nonoptimality of KarhunenLoéve Bases 6.1

Principal component analysis 7

Covariance matrix 7.1

Principal component transform 7.2

Examples 8

The Wiener process 8.1

The Brownian bridge 8.2

Applications 9

Applications in signal estimation and detection 9.1

Detection of a known continuous signal S(t) 9.1.1

Signal detection in white noise 9.1.2

Signal detection in colored noise 9.1.3

How to find k(t) 9.1.3.1

Test threshold for Neyman–Pearson detector 9.1.3.2

Prewhitening 9.1.3.3

Detection of a gaussian random signal in Additive white Gaussian noise (AWGN) 9.1.4

See also 10

Notes 11

References 12

External links 13
Formulation

Throughout this article, we will consider a squareintegrable zeromean random process X_{t} defined over a probability space (Ω, F, P) and indexed over a closed interval [a, b], with covariance function K_{X}(s, t). We thus have:


\forall t\in [a,b] \qquad X_t\in L^2(\Omega, F,\mathbf{P}),

\forall t\in [a,b] \qquad \mathbf{E}[X_t]=0,

\forall t,s \in [a,b] \qquad K_X(s,t)=\mathbf{E}[X_s X_t].

We associate to K_{X} a linear operator T_{KX} defined in the following way:


\begin{cases} T_{K_X}: L^2([a,b]) \to L^2([a,b])\\ f \mapsto \int_a^b K_X(s,\cdot) f(s) ds \end{cases}

Since T_{KX} is a linear operator, it makes sense to talk about its eigenvalues λ_{k} and eigenfunctions e_{k}, which are found solving the homogeneous Fredholm integral equation of the second kind

\int_a^b K_X(s,t) e_k(s)\,ds=\lambda_k e_k(t)
Statement of the theorem
Theorem. Let X_{t} be a zeromean squareintegrable stochastic process defined over a probability space (Ω, F, P) and indexed over a closed and bounded interval [a, b], with continuous covariance function K_{X}(s, t).
Then K_{X}(s,t) is a Mercer kernel and letting e_{k} be an orthonormal basis of L^{2}([a, b]) formed by the eigenfunctions of T_{KX} with respective eigenvalues λ_{k}, X_{t} admits the following representation

X_t=\sum_{k=1}^\infty Z_k e_k(t)
where the convergence is in L^{2}, uniform in t and

Z_k=\int_a^b X_t e_k(t)\, dt
Furthermore, the random variables Z_{k} have zeromean, are uncorrelated and have variance λ_{k}

\mathbf{E}[Z_k]=0,~\forall k\in\mathbb{N} \qquad \mbox{and}\qquad \mathbf{E}[Z_i Z_j]=\delta_{ij} \lambda_j,~\forall i,j\in \mathbb{N}
Note that by generalizations of Mercer's theorem we can replace the interval [a, b] with other compact spaces C and the Lebesgue measure on [a, b] with a Borel measure whose support is C.
Proof

The covariance function K_{X} satisfies the definition of a Mercer kernel. By Mercer's theorem, there consequently exists a set {λ_{k}, e_{k}(t)} of eigenvalues and eigenfunctions of T_{KX} forming an orthonormal basis of L^{2}([a,b]), and K_{X} can be expressed as


K_X(s,t)=\sum_{k=1}^\infty \lambda_k e_k(s) e_k(t)

The process X_{t} can be expanded in terms of the eigenfunctions e_{k} as:


X_t=\sum_{k=1}^\infty Z_k e_k(t)

where the coefficients (random variables) Z_{k} are given by the projection of X_{t} on the respective eigenfunctions

Z_k=\int_a^b X_t e_k(t) \,dt


\begin{align} \mathbf{E}[Z_k] &=\mathbf{E}\left[\int_a^b X_t e_k(t) \,dt\right]=\int_a^b \mathbf{E}[X_t] e_k(t) dt=0 \\ [8pt] \mathbf{E}[Z_i Z_j]&=\mathbf{E}\left[ \int_a^b \int_a^b X_t X_s e_j(t)e_i(s) dt\, ds\right]\\ &=\int_a^b \int_a^b \mathbf{E}\left[X_t X_s\right] e_j(t)e_i(s) dt\, ds\\ &=\int_a^b \int_a^b K_X(s,t) e_j(t)e_i(s) dt \, ds\\ &=\int_a^b e_i(s)\left(\int_a^b K_X(s,t) e_j(t) dt\right) ds\\ &=\lambda_j \int_a^b e_i(s) e_j(s) ds\\ &=\delta_{ij}\lambda_j \end{align}

where we have used the fact that the e_{k} are eigenfunctions of T_{KX} and are orthonormal.

Let us now show that the convergence is in L^{2}. Let


S_N=\sum_{k=1}^N Z_k e_k(t).

Then:

\begin{align} \mathbf{E} \left [\left X_tS_N \right ^2 \right ]&=\mathbf{E} \left [X_t^2 \right ]+\mathbf{E} \left [S_N^2 \right ]  2\mathbf{E} \left [X_t S_N \right ]\\ &=K_X(t,t)+\mathbf{E}\left[\sum_{k=1}^N \sum_{l=1}^N Z_k Z_l e_k(t)e_l(t) \right] 2\mathbf{E}\left[X_t\sum_{k=1}^N Z_k e_k(t)\right]\\ &=K_X(t,t)+\sum_{k=1}^N \lambda_k e_k(t)^2 2\mathbf{E}\left[\sum_{k=1}^N \int_a^b X_t X_s e_k(s) e_k(t) ds\right]\\ &=K_X(t,t)\sum_{k=1}^N \lambda_k e_k(t)^2 \end{align}

which goes to 0 by Mercer's theorem.
Properties of the Karhunen–Loève transform
Special case: Gaussian distribution
Since the limit in the mean of jointly Gaussian random variables is jointly Gaussian, and jointly Gaussian random (centered) variables are independent if and only if they are orthogonal, we can also conclude:
Theorem. The variables Z_{i} have a joint Gaussian distribution and are stochastically independent if the original process {X_{t}}_{t} is Gaussian.
In the Gaussian case, since the variables Z_{i} are independent, we can say more:

\lim_{N \to \infty} \sum_{i=1}^N e_i(t) Z_i(\omega) = X_t(\omega)
almost surely.
The Karhunen–Loève transform decorrelates the process
This is a consequence of the independence of the Z_{k}.
The Karhunen–Loève expansion minimizes the total mean square error
In the introduction, we mentioned that the truncated Karhunen–Loeve expansion was the best approximation of the original process in the sense that it reduces the total meansquare error resulting of its truncation. Because of this property, it is often said that the KL transform optimally compacts the energy.
More specifically, given any orthonormal basis {f_{k}} of L^{2}([a, b]), we may decompose the process X_{t} as:

X_t(\omega)=\sum_{k=1}^\infty A_k(\omega) f_k(t)
where

A_k(\omega)=\int_a^b X_t(\omega) f_k(t)\,dt
and we may approximate X_{t} by the finite sum

\hat{X}_t(\omega)=\sum_{k=1}^N A_k(\omega) f_k(t)
for some integer N.
Claim. Of all such approximations, the KL approximation is the one that minimizes the total mean square error (provided we have arranged the eigenvalues in decreasing order).
[Proof]
Consider the error resulting from the truncation at the Nth term in the following orthonormal expansion:

\varepsilon_N(t)=\sum_{k=N+1}^\infty A_k(\omega) f_k(t)
The meansquare error ε_{N}^{2}(t) can be written as:

\begin{align} \varepsilon_N^2(t)&=\mathbf{E} \left[\sum_{i=N+1}^\infty \sum_{j=N+1}^\infty A_i(\omega) A_j(\omega) f_i(t) f_j(t)\right]\\ &=\sum_{i=N+1}^\infty \sum_{j=N+1}^\infty \mathbf{E}\left[\int_a^b \int_a^b X_t X_s f_i(t)f_j(s) ds\, dt\right] f_i(t) f_j(t)\\ &=\sum_{i=N+1}^\infty \sum_{j=N+1}^\infty f_i(t) f_j(t) \int_a^b \int_a^b K_X(s,t) f_i(t)f_j(s) ds\, dt \end{align}
We then integrate this last equality over [a, b]. The orthonormality of the f_{k} yields:

\int_a^b \varepsilon_N^2(t) dt=\sum_{k=N+1}^\infty \int_a^b \int_a^b K_X(s,t) f_k(t)f_k(s) ds\, dt
The problem of minimizing the total meansquare error thus comes down to minimizing the right hand side of this equality subject to the constraint that the f_{k} be normalized. We hence introduce β_{k}, the Lagrangian multipliers associated with these constraints, and aim at minimizing the following function:

Er[f_k(t),k\in\{N+1,\ldots\}]=\sum_{k=N+1}^\infty \int_a^b \int_a^b K_X(s,t) f_k(t)f_k(s) ds dt\beta_k \left(\int_a^b f_k(t) f_k(t) dt 1\right)
Differentiating with respect to f_{i}(t) and setting the derivative to 0 yields:

\frac{\partial Er}{\partial f_i(t)}=\int_a^b \left(\int_a^b K_X(s,t) f_i(s) ds \beta_i f_i(t)\right)dt=0
which is satisfied in particular when

\int_a^b K_X(s,t) f_i(s) \,ds =\beta_i f_i(t).
In other words when the f_{k} are chosen to be the eigenfunctions of T_{KX}, hence resulting in the KL expansion.
Explained variance
An important observation is that since the random coefficients Z_{k} of the KL expansion are uncorrelated, the Bienaymé formula asserts that the variance of X_{t} is simply the sum of the variances of the individual components of the sum:

\mbox{Var}[X_t]=\sum_{k=0}^\infty e_k(t)^2 \mbox{Var}[Z_k]=\sum_{k=1}^\infty \lambda_k e_k(t)^2
Integrating over [a, b] and using the orthonormality of the e_{k}, we obtain that the total variance of the process is:

\int_a^b \mbox{Var}[X_t] dt=\sum_{k=1}^\infty \lambda_k
In particular, the total variance of the Ntruncated approximation is

\sum_{k=1}^N \lambda_k.
As a result, the Ntruncated expansion explains

\frac{\sum_{k=1}^N \lambda_k}{\sum_{k=1}^\infty \lambda_k}
of the variance; and if we are content with an approximation that explains, say, 95% of the variance, then we just have to determine an N\in\mathbb{N} such that

\frac{\sum_{k=1}^N \lambda_k}{\sum_{k=1}^\infty \lambda_k} \geq 0.95.
The Karhunen–Loève expansion has the minimum representation entropy property
Linear KarhunenLoeve Approximations
Let us consider a whole class of signals we want to approximate over the first M vectors of a basis. These signals are modeled as realizations of a random vector Y[n] of size N. To optimize the approximation we design a basis that minimizes the average approximation error. This section proves that optimal bases are karhunenloeve bases that diagonalize the covariance matrix of Y. The random vector Y can be decomposed in an orthogonal basis

\left\{ g_m \right\}_{0\le m\le N}
as follows:

Y=\sum_{m=0}^{N1} \left\langle Y, g_m \right\rangle g_m,
where each

\left\langle Y, g_m \right\rangle =\sum_{n=0}^{N1}{Y[n]} g_m^* [n]
is a random variable. The approximation from the first M ≤ N vectors of the basis is

Y_M=\sum_{m=0}^{M1} \left\langle Y, g_m \right\rangle g_m
The energy conservation in an orthogonal basis implies

\varepsilon[M]= \mathbf{E} \left\{ \left\ Y Y_M \right\^2 \right\} =\sum_{m=M}^{N1} \mathbf{E}\left\{ \left \left\langle Y, g_m \right\rangle \right^2 \right\}
This error is related to the covariance of Y defined by

R[ n,m]=\mathbf{E} \left\{ Y[n] Y^*[m] \right\}
For any vector x[n] we denote by K the covariance operator represented by this matrix,

\mathbf{E}\left\{\left\langle Y,x \rangle\right^2\right\}=\langle Kx,x \rangle =\sum_{n=0}^{N1}\sum_{m=0}^{N1}R[n,m]x[n]x^*[m]
The error ε[M] is therefore a sum of the last N − M coefficients of the covariance operator

\varepsilon [M]=\sum_{m=M}^{N1}{\left\langle K g_m, g_m \right\rangle }
The covariance operator K is Hermitian and Positive and is thus diagonalized in an orthogonal basis called a KarhunenLoeve basis. The following theorem states that a KarhunenLoeve basis is optimal for linear approximations.
Theorem (Optimality of KarhunenLoeve Basis). Let K be acovariance operator. For all M ≥ 1, the approximation error

\varepsilon [M]=\sum_{m=M}^{N1}\left\langle K g_m, g_m \right\rangle
is minimum if and only if

\left\{ g_m \right\}_{0\le m
is a KarhunenLoeve basis ordered by decreasing eigenvalues.

\left\langle K g_m, g_m \right\rangle \ge \left\langle Kg_{m+1}, g_{m+1} \right\rangle, \qquad 0\le m
NonLinear Approximation in Bases
Linear approximations project the signal on M vectors a priori. The approximation can be made more precise by choosing the M orthogonal vectors depending on the signal properties. This section analyzes the general performance of these nonlinear approximations. A signal f\in \Eta is approximated with M vectors selected adaptively in an orthonormal basis for \Eta

\Beta =\left\{ g_m \right\}_{m\in \mathbb{N}}
Let f_M be the projection of f over M vectors whose indices are in I_{M}:

f_M=\sum_{m\in I_M} \left\langle f, g_m \right\rangle g_m
The approximation error is the sum of the remaining coefficients

\varepsilon [M]=\left\{ \left\ f f_M \right\^2 \right\}=\sum_{m\notin I_M}^{N1} \left\{ \left \left\langle f, g_m \right\rangle \right^2 \right\}
To minimize this error, the indices in I_{M} must correspond to the M vectors having the largest inner product amplitude

\left \left\langle f, g_m \right\rangle \right.
These are the vectors that best correlate f. They can thus be interpreted as the main features of f. The resulting error is necessarily smaller than the error of a linear approximation which selects the M approximation vectors independently of f. Let us sort

\left\{ \left \left\langle f, g_m \right\rangle \right \right\}_{m\in \mathbb{N}}
in decreasing order

\left \left \langle f, g_{m_k} \right \rangle \right\ge \left \left \langle f, g_{m_{k+1}} \right \rangle \right.
The best nonlinear approximation is

f_M=\sum_{k=1}^M \left\langle f, g_{m_k} \right\rangle g_{m_k}
It can also be written as inner product thresholding:

f_M=\sum_{m=0}^{\infty} \theta_T \left( \left\langle f, g_m \right\rangle \right) g_m
with

T=\left\left\langle f, g_{m_M} \right \rangle\right, \qquad \theta_T(x)= \begin{cases} x & x\ge T \\ 0 & x < T \end{cases}
The nonlinear error is

\varepsilon [M]=\left\{ \left\ f f_M \right\^2 \right\}=\sum_{k=M+1}^{\infty} \left\{ \left \left\langle f, g_{m_k} \right\rangle \right^2 \right\}
this error goes quickly to zero as M increases, if the sorted values of \left \left\langle f, g_{m_k} \right\rangle \right have a fast decay as k increases. This decay is quantified by computing the \Iota^\Rho norm of the signal inner products in B:

\ f \_{\Beta, p} =\left( \sum_{m=0}^{\infty} \left \left\langle f, g_m \right\rangle \right^p \right)^{\frac{1}{p}}
The following theorem relates the decay of ε[M] to \ f\_{\Beta, p}
Theorem (decay of error). If \ f\_{\Beta ,p}<\infty with p < 2 then

\varepsilon [M]\le \frac{\f\_{\Beta ,p}^2}{\frac{2}{p}1} M^{1\frac{2}{p}}
and

\varepsilon [M]=o\left( M^{1\frac{2}{p}} \right).
Conversely, if \varepsilon [M]=o\left( M^{1\frac{2}{p}} \right) then
\ f\_{\Beta ,q}<\infty for any q > p.
Nonoptimality of KarhunenLoéve Bases
To further illustrate the differences between linear and nonlinear approximations, we study the decomposition of a simple nonGaussian random vector in a KarhunenLoéve basis. Processes whose realizations have a random translation are stationary. The KarhunenLoéve basis is then a Fourier basis and we study its performance. To simplify the analysis, consider a random vector Y[n] of size N that is random shift modulo N of a deterministic signal f[n] of zero mean

\sum_{n=0}^{N1}f[n]=0

Y[n]=f [ ( np)\bmod N ]
The random shift P is uniformly distributed on [0,N1]:

\Pr ( P=p )=\frac{1}{N}, \qquad 0\le p
Clearly

\mathbf{E}\{ Y[n]\}=\frac{1}{N} \sum_{p=0}^{N1} f[( np)\bmod N]=0
and

R[ n,k]=\mathbf{E} \{ Y[n]Y[k] \}=\frac{1}{N}\sum_{p=0}^{N1} f[( np)\bmod N] f [(kp)\bmod N ] =\frac{1}{N}f\Theta \bar{f}[ nk], \quad \bar{f}[n]=f[ n]
Hence

R[ n,k]=R_Y[nk], \qquad R_Y[k]=\frac{1}{N}f \Theta \bar{f}[k]
Since R_{Y} is N periodic, Y is a circular stationary random vector. The covariance operator is a circular convolution with R_{Y} and is therefore diagonalized in the discrete Fourier KarhunenLoéve basis

\left\{ \frac{1}{\sqrt{N}} e^{\frac{i2\pi mn}{N}} \right\}_{0\le m
The power spectrum is Fourier Transform of R_{Y}:

P_Y[m]\hat{R}_Y[m]=\frac{1}{N} \left \hat{f}[m] \right^2
Example: Consider an extreme case where f[n]=\delta [n]\delta [n1]. A theorem stated above guarantees that the Fourier KarhunenLoéve basis produces a smaller expected approximation error than a canonical basis of Diracs \left\{g_m[n]=\delta[ nm] \right\}_{0\le m. Indeed we do not know a priori the abscissa of the nonzero coefficients of Y, so there is no particular Dirac that is better adapted to perform the approximation. But the Fourier vectors cover the whole support of Y and thus absorb a part of the signal energy.

\mathbf{E} \left\{ \left \left\langle Y[n],\frac{1}{\sqrt{N}} e^{\frac{i2\pi mn}{N}} \right\rangle \right^2 \right\}=P_Y[m] = \frac{4}{N}\sin^2 \left(\frac{\pi k}{N} \right)
Selecting higher frequency Fourier coefficients yields a better meansquare approximation than choosing a priori a few Dirac vectors to perform the approximation. The situation is totally different for nonlinear approximations. If f[n]=\delta[n]\delta[n1] then the discrete Fourier basis is extremely inefficient because f and hence Y have an energy that is almost uniformly spread among all Fourier vectors. In contrast, since f has only two nonzero coefficients in the Dirac basis, a nonlinear approximation of Y with M ≥ 2 gives zero error.^{[5]}
Principal component analysis
We have established the Karhunen–Loève theorem and derived a few properties thereof. We also noted that one hurdle in its application was the numerical cost of determining the eigenvalues and eigenfunctions of its covariance operator through the Fredholm integral equation of the second kind

\int_a^b K_X(s,t) e_k(s)\,ds=\lambda_k e_k(t).
However, when applied to a discrete and finite process \left(X_n\right)_{n\in\{1,\ldots,N\}}, the problem takes a much simpler form and standard algebra can be used to carry out the calculations.
Note that a continuous process can also be sampled at N points in time in order to reduce the problem to a finite version.
We henceforth consider a random Ndimensional vector X=\left(X_1~X_2~\ldots~X_N\right)^T. As mentioned above, X could contain N samples of a signal but it can hold many more representations depending on the field of application. For instance it could be the answers to a survey or economic data in an econometrics analysis.
As in the continuous version, we assume that X is centered, otherwise we can let X:=X\mu_X (where \mu_X is the mean vector of X) which is centered.
Let us adapt the procedure to the discrete case.
Covariance matrix
Recall that the main implication and difficulty of the KL transformation is computing the eigenvectors of the linear operator associated to the covariance function, which are given by the solutions to the integral equation written above.
Define Σ, the covariance matrix of X, as an N × N matrix whose elements are given by:

\Sigma_{ij}= \mathbf{E}[X_i X_j],\qquad \forall i,j \in \{1,\ldots,N\}
Rewriting the above integral equation to suit the discrete case, we observe that it turns into:

\sum_{i=1}^N \Sigma_{ij} e_j=\lambda e_i \quad \Leftrightarrow \quad \Sigma e=\lambda e
where e=(e_1~e_2~\ldots~e_N)^T is an Ndimensional vector.
The integral equation thus reduces to a simple matrix eigenvalue problem, which explains why the PCA has such a broad domain of applications.
Since Σ is a positive definite symmetric matrix, it possesses a set of orthonormal eigenvectors forming a basis of \R^N, and we write \{\lambda_i,\phi_i\}_{i\in\{1,\ldots,N\}} this set of eigenvalues and corresponding eigenvectors, listed in decreasing values of λ_{i}. Let also Φ be the orthonormal matrix consisting of these eigenvectors:

\begin{align} \Phi &:=\left(\phi_1~\phi_2~\ldots~\phi_N\right)^T\\ \Phi^T \Phi &=I \end{align}
Principal component transform
It remains to perform the actual KL transformation, called the principal component transform in this case. Recall that the transform was found by expanding the process with respect to the basis spanned by the eigenvectors of the covariance function. In this case, we hence have:

X =\sum_{i=1}^N \langle \phi_i,X\rangle \phi_i =\sum_{i=1}^N \phi_i^T X \phi_i
In a more compact form, the principal component transform of X is defined by:

\begin{cases} Y=\Phi^T X \\ X=\Phi Y \end{cases}
The ith component of Y is Y_i=\phi_i^T X, the projection of X on \phi_i and the inverse transform X = ΦY yields the expansion of X on the space spanned by the \phi_i:

X=\sum_{i=1}^N Y_i \phi_i=\sum_{i=1}^N \langle \phi_i,X\rangle \phi_i
As in the continuous case, we may reduce the dimensionality of the problem by truncating the sum at some K\in\{1,\ldots,N\} such that

\frac{\sum_{i=1}^K \lambda_i}{\sum_{i=1}^N \lambda_i}\geq \alpha
where α is the explained variance threshold we wish to set.
We can also reduce the dimensionality through the use of multilevel dominant eigenvector estimation (MDEE).^{[6]}
Examples
The Wiener process
There are numerous equivalent characterizations of the Wiener process which is a mathematical formalization of Brownian motion. Here we regard it as the centered standard Gaussian process W_{t} with covariance function

K_{W}(t,s) = \operatorname{Cov}(W_t,W_s) = \min (s,t).
We restrict the time domain to [a, b]=[0,1] without loss of generality.
The eigenvectors of the covariance kernel are easily determined. These are

e_k(t) = \sqrt{2} \sin \left( \left(k  \tfrac{1}{2}\right) \pi t \right)
and the corresponding eigenvalues are

\lambda_k = \frac{1}{(k \frac{1}{2})^2 \pi^2}.
[Proof]
In order to find the eigenvalues and eigenvectors, we need to solve the integral equation:

\begin{align} \int_a^b K_W(s,t) e(s) ds &=\lambda e(t)\qquad \forall t, 0\leq t\leq 1\\ \int_0^1 \min(s,t) e(s) ds &=\lambda e(t)\qquad \forall t, 0\leq t\leq 1 \\ \int_0^t s e(s) ds + t \int_t^1 e(s) ds &= \lambda e(t) \qquad \forall t, 0\leq t\leq 1 \end{align}
differentiating once with respect to t yields:

\int_{t}^1 e(s) ds=\lambda e'(t)
a second differentiation produces the following differential equation:

e(t)=\lambda e''(t)
The general solution of which has the form:

e(t)=A\sin\left(\frac{t}{\sqrt{\lambda}}\right)+B\cos\left(\frac{t}{\sqrt{\lambda}}\right)
where A and B are two constants to be determined with the boundary conditions. Setting t=0 in the initial integral equation gives e(0)=0 which implies that B=0 and similarly, setting t=1 in the first differentiation yields e' (1)=0, whence:

\cos\left(\frac{1}{\sqrt{\lambda}}\right)=0
which in turn implies that eigenvalues of T_{KX} are:

\lambda_k=\left(\frac{1}{(k\frac{1}{2})\pi}\right)^2,\qquad k\geq 1
The corresponding eigenfunctions are thus of the form:

e_k(t)=A \sin\left((k\frac{1}{2})\pi t\right),\qquad k\geq 1
A is then chosen so as to normalize e_{k}:

\int_0^1 e_k^2(t) dt=1\quad \implies\quad A=\sqrt{2}
This gives the following representation of the Wiener process:
Theorem. There is a sequence {Z_{i}}_{i} of independent Gaussian random variables with mean zero and variance 1 such that

W_t = \sqrt{2} \sum_{k=1}^\infty Z_k \frac{\sin \left(\left(k  \frac{1}{2}\right) \pi t\right)}{ \left(k  \frac{1}{2}\right) \pi}.
Note that this representation is only valid for t\in[0,1]. On larger intervals, the increments are not independent. As stated in the theorem, convergence is in the L^{2} norm and uniform in t.
The Brownian bridge
Similarly the Brownian bridge B_t=W_ttW_1 which is a stochastic process with covariance function

K_B(t,s)=\min(t,s)ts
can be represented as the series

B_t = \sum_{k=1}^\infty Z_k \frac{\sqrt{2} \sin(k \pi t)}{k \pi}
Applications
Adaptive optics systems sometimes use K–L functions to reconstruct wavefront phase information (Dai 1996, JOSA A). Karhunen–Loève expansion is closely related to the Singular Value Decomposition. The latter has myriad applications in image processing, radar, seismology, and the like. If one has independent vector observations from a vector valued stochastic process then the left singular vectors are maximum likelihood estimates of the ensemble KL expansion.
Applications in signal estimation and detection
Detection of a known continuous signal S(t)
In communication, we usually have to decide whether a signal from a noisy channel contains valuable information. The following hypothesis testing is used for detecting continuous signal s(t) from channel output X(t), N(t) is the channel noise, which is usually assumed zero mean gaussian process with correlation function R_{N} (t, s) = E[N(t)N(s)]

H: X(t) = N(t),

K: X(t) = N(t)+s(t), \quad t\in(0,T)
Signal detection in white noise
When the channel noise is white, its correlation function is

R_{N}(t) = \tfrac{1}{2} N_0 \delta (t),
and it has constant power spectrum density. In physically practical channel, the noise power is finite, so:

S_{N}(f) = \begin{cases} \frac{N_0}{2} &fw \end{cases}
Then the noise correlation function is sinc function with zeros at \frac{n}{2\omega}, n \in \mathbf{Z}. Since are uncorrelated and gaussian, they are independent. Thus we can take samples from X(t) with time spacing

\Delta t = \frac{n}{2\omega} within (0,T).
Let X_i = X(i\Delta t). We have a total of n = \frac{T}{\Delta t} = T(2\omega) = 2\omega T i.i.d samples \{X_1, X_2,...,X_n\} to develop the likelihoodratio test. Define signal S_i = S(i\Delta t), the problem becomes,

H: X_i = N_i,

K: X_i = N_i + S_i, i = 1,2...n.
The loglikelihood ratio

\mathcal{L}(\underline{x}) = \log\frac{\sum^n_{i=1} (2S_i x_i  S_i^2)}{2\sigma^2} \Leftrightarrow \Delta t \sum^n_{i = 1} S_i x_i = \sum^n_{i=1} S(i\Delta t)x(i\Delta t)\Delta t \gtrless \lambda_2.
As t → 0, let:

G = \int^T_0 S(t)x(t)dt.
Then G is the test statistics and the Neyman–Pearson optimum detector is

G(\underline{x}) > G_0 \Rightarrow K < G_0 \Rightarrow H.
As G is gaussian, we can characterize it by finding its mean and variances. Then we get

H: G \sim N \left (0,\tfrac{1}{2}N_0E \right )

K: G \sim N \left (E,\tfrac{1}{2}N_0E \right )
where

\mathbf{E} = \int^T_{0} S^2(t)dt
is the signal energy.
The false alarm error

\alpha = \int^{\infty}_{G_0} N \left (0, \tfrac{1}{2}N_0E \right )dG \Rightarrow G_0 = \sqrt{\tfrac{1}{2}N_0E} \Phi^{1}(1\alpha)
And the probability of detection:

\beta = \int^{\infty}_{G_0} N \left (E, \tfrac{1}{2}N_0E \right )dG = 1\Phi \left (\frac{G_0  E}{\sqrt{\tfrac{1}{2}N_0 E}} \right ) = \Phi \left (\sqrt{\frac{2E}{N_0}}  \Phi^{1}(1\alpha) \right ),
where Φ is the cdf of standard normal gaussian variable.
Signal detection in colored noise
When N(t) is colored (correlated in time) gaussian noise with zero mean and covariance function R_N(t,s) = E[X(t)X(s)], we cannot sample independent discrete observations by evenly spacing the time. Instead, we can use K–L expansion to uncorrelate the noise process and get independent gaussian observation 'samples'. The K–L expansion of N(t):

N(t) = \sum^{\infty}_{i=1} N_i \Phi_i(t), \quad 0,
where N_i =\int N(t)\Phi_i(t)dt and the orthonormal bases \{\Phi_i{t}\} are generated by kernel R_N(t,s), i.e., solution to

\int ^T _0 R_N(t,s)\Phi_i(s)ds = \lambda_i \Phi_i(t), var[N_i] = \lambda_i.
Do the expansion:

S(t) = \sum^{\infty}_{i = 1}S_i\Phi_i(t),
where S_i = \int^T _0 S(t)\Phi_i(t)dt, 0, then

X_i = \int^T _0 X(t)\Phi_i(t) dt = N_i
under H and N_i + S_i under K. Let \overline{X} = \{X_1,X_2,\dots\}, we have

N_i are independent gaussian r.v's with variance \lambda_i

under H: \{X_i\} are independent gaussian r.v's.

f_H[x(t)0

under K: \{X_i  S_i\} are independent gaussian r.v's.

f_K[x(t)0
Hence, the logLR is given by

\mathcal{L}(\underline{x}) = \sum^{\infty}_{i=1} \frac{2S_i x_i  S_i^2}{2\lambda_i}
and the optimum detector is

G = \sum^{\infty}_{i=1} S_i x_i \lambda_i > G_0 \Rightarrow K, < G_0 \Rightarrow H.
Define

k(t) = \sum^{\infty}_{i=1} \lambda_i S_i \Phi_i(t), 0
then G = \int^T _0 k(t)x(t)dt.
How to find k(t)
Since

\int^T_0 R_N(t,s)k(s)ds = \sum^{\infty}_{i=1} \lambda_i S_i \int^T _0 R_N(t,s)\Phi_i (s) ds = \sum^{\infty}_{i=1} S_i \Phi_i(t) = S(t),
k(t) is the solution to

\int^T_0 R_N(t,s)k(s)ds = S(t).
If N(t)is widesense stationary,

\int^T_0 R_N(ts)k(s)ds = S(t) ,
which is known as the Wiener–Hopf equation. The equation can be solved by taking fourier transform, but not practically realizable since infinite spectrum needs spatial factorization. A special case which is easy to calculate k(t) is white gaussian noise.

\int^T_0 \frac{N_0}{2}\delta(ts)k(s)ds = S(t) \Rightarrow k(t) = C S(t), 0.
The corresponding impulse response is h(t) = k(Tt) = C S(Tt). Let C = 1, this is just the result we arrived at in previous section for detecting of signal in white noise.
Test threshold for Neyman–Pearson detector
Since X(t) is a gaussian process,

G = \int^T_0 k(t)x(t)dt,
is a gaussian random variable that can be characterized by its mean and variance.

\begin{align} \mathbf{E}[GH] &= \int^T_0 k(t)\mathbf{E}[x(t)H]dt = 0 \\ \mathbf{E}[GK] &= \int^T_0 k(t)\mathbf{E}[x(t)K]dt = \int^T_0 k(t)S(t)dt \equiv \rho \\ \mathbf{E}[G^2H] &= \int^T_0 \int^T_0 k(t)k(s) R_N(t,s)dtds = \int^T_0 k(t) \left (\int^T_0 k(s)R_N(t,s)ds \right )=\int^T_0 k(t)S(t)dt = \rho \\ \text{Var}[GH] &= \mathbf{E}[G^2H]  (\mathbf{E}[GH])^2 = \rho \\ \mathbf{E}[G^2K] &=\int^T_0\int^T_0k(t)k(s) \mathbf{E}[x(t)x(s)]dtds = \int^T_0\int^T_0k(t)k(s)(R_N(t,s) +S(t)S(s))dtds = \rho + \rho^2\\ \text{Var}[GK] &= \mathbf{E}[G^2K]  (\mathbf{E}[GK])^2 = \rho + \rho^2 \rho^2 = \rho \end{align}
Hence, we obtain the distributions of H and K:

H: G \sim N(0,\rho)

K: G \sim N(\rho, \rho)
The false alarm error is

\alpha = \int^{\infty}_{G_0} N(0,\rho)dG = 1  \Phi \left (\frac{G_0}{\sqrt{\rho}} \right ).
So the test threshold for the Neyman–Pearson optimum detector is

G_0 = \sqrt{\rho} \Phi^{1} (1\alpha).
Its power of detection is

\beta = \int^{\infty}_{G_0} N(\rho, \rho)dG = \Phi \left (\sqrt{\rho}  \Phi^{1}(1  \alpha) \right )
When the noise is white gaussian process, the signal power is

\rho = \int^T_0 k(t)S(t)dt = \int^T_0 S(t)^2 dt = E.
Prewhitening
For some type of colored noise, a typical practise is to add a prewhitening filter before the matched filter to transform the colored noise into white noise. For example, N(t) is a widesense stationary colored noise with correlation function

R_N(\tau) = \frac{B N_0}{4} e^{B\tau}

S_N(f) = \frac{N_0}{2(1+(\frac{w}{B})^2)}
The transfer function of prewhitening filter is

H(f) = 1 + j \frac{w}{B}.
When the signal we want to detect from the noisy channel is also random, for example, a white gaussian process X(t), we can still implement K–L expansion to get independent sequence of observation. In this case, the detection problem is described as follows:

H_0 : Y(t) = N(t)

H_1 : Y(t) = N(t) + X(t), 0
X(t) is a random process with correlation function R_X(t,s) = E\{X[t]X[s]\}
The K–L expansion of X(t) is

X(t) = \sum^{\infty}_{i=1} X_i \Phi_i(t),
where

X_i =\int^T_0 X(t) \Phi_i(t). \Phi(t)
are solutions to

\int^T_0 R_X(t,s)\Phi_i(s)ds= \lambda_i \Phi_i(t).
So X_i's are independent sequence of r.v's with zero mean and variance \lambda_i. Expanding Y(t) and N(t) by \Phi_i(t), we get

Y_i = \int^T_0 Y(t)\Phi_i(t)dt = \int^T_0 [N(t) + X(t)]\Phi_i(t) = N_i + X_i,
where

N_i = \int^T_0 N(t)\Phi_i(t)dt.
As N(t) is gaussian white noise, N_i's are i.i.d sequence of r.v with zero mean and variance \tfrac{1}{2}N_0, then the problem is simplified as follows,

H_0: Y_i = N_i

H_1: Y_i = N_i + X_i
The Neyman–Pearson optimal test:

\Lambda = \frac{f_YH_1}{f_YH_0} = Ce^{\sum^{\infty}_{i=1} \frac{y_i^2}{2} \frac{\lambda_i}{\tfrac{1}{2} N_0 (\tfrac{1}{2}N_0 + \lambda_i)} },
so the loglikelihood ratio

\mathcal{L} = \ln(\Lambda) = K \sum^{\infty}_{i=1}\tfrac{1}{2}y_i^2 \frac{\lambda_i}{\frac{N_0}{2}(\frac{N_0}{2} + \lambda_i)}.
Since

\hat{X_i} = \frac{\lambda_i}{\frac{N_0}{2}(\frac{N_0}{2} + \lambda_i)}
is just the minimummeansquare estimate of X_i given Y_i's,

\mathcal{L} = K + \frac{1}{N_0} \sum^{\infty}_{i=1} Y_i \hat{X_i}.
K–L expansion has the following property: If

f(t) = \sum f_i \Phi_i(t), g(t) = \sum g_i \Phi_i(t),
where

f_i = \int_0^T f(t) \Phi_i(t), g_i = \int_0^T g(t)\Phi_i(t).,
then

\sum^{\infty}_{i=1} f_i g_i = \int^T_0 g(t)f(t)dt.
So let

\hat{X(tT)} = \sum^{\infty}_{i=1} \hat{X_i}\Phi_i(t), \quad \mathcal{L} = K + \frac{1}{N_0} \int^T_0 Y(t) \hat{X(tT)}dt.
Noncausal filter Q(t, s) can be used to get the estimate through

\hat{X(tT)} = \int^T_0 Q(t,s)Y(s)ds.
By orthogonality principle, Q(t,s) satisfies

\int^T_0 Q(t,s)R_X(s,t)ds + \tfrac{N_0}{2} Q(t, \lambda) = R_X(t, \lambda), 0 < \lambda < T, 0 .
However for practical reason, it's necessary to further derive the causal filter h(t, s), where h(t, s) = 0 for s > t, to get estimate \hat{X(tt)}. Specifically,

Q(t,s) = h(t,s) + h(s, t)  \int^T_0 h(\lambda, t)h(s, \lambda)d\lambda.
See also
Notes

^ Sapatnekar, Sachin (2011), "Overcoming variations in nanometerscale technologies", IEEE Journal on Emerging and Selected Topics in Circuits and Systems 1 (1): 5–18,

^ Ghoman, Satyajit; Wang, Zhicun; Chen, PC; Kapania, Rakesh (2012), A PODbased Reduced Order Design Scheme for Shape Optimization of Air Vehicles

^ Raju, C.K. (2009), "Kosambi the Mathematician", Economic and Political Weekly 44 (20): 33–45

^ Kosambi, D. D. (1943), "Statistics in Function Space", Journal of the Indian Mathematical Society 7: 76–88, .

^ A wavelet tour of signal processingStéphane Mallat

^ X. Tang, “Texture information in runlength matrices,” IEEE Transactions on Image Processing, vol. 7, No. 11, pp. 1602 1609, Nov. 1998
References

Stark, Henry; Woods, John W. (1986). Probability, Random Processes, and Estimation Theory for Engineers. PrenticeHall, Inc.

Ghanem, Roger; Spanos, Pol (1991). Stochastic finite elements: a spectral approach. SpringerVerlag.

Guikhman, I.; Skorokhod, A. (1977). Introduction a la Théorie des Processus Aléatoires. Éditions MIR.

Simon, B. (1979). Functional Integration and Quantum Physics. Academic Press.

Karhunen, Kari (1947). "Über lineare Methoden in der Wahrscheinlichkeitsrechnung". Ann. Acad. Sci. Fennicae. Ser. A. I. Math.Phys. 37: 1–79.

Loève, M. (1978). Probability theory. Vol. II, 4th ed. Graduate Texts in Mathematics 46. SpringerVerlag.

Dai, G. (1996). "Modal wavefront reconstruction with Zernike polynomials and Karhunen–Loeve functions". JOSA A 13 (6): 1218.

Wu B., Zhu J., Najm F.(2005) "A Nonparametric Approach for Dynamic Range Estimation of Nonlinear Systems". In Proceedings of Design Automation Conference(841844) 2005

Wu B., Zhu J., Najm F.(2006) "Dynamic Range Estimation". IEEE Transactions on ComputerAided Design of Integrated Circuits and Systems, Vol. 25 Issue:9 (16181636) 2006

Jorgensen, Palle E. T.; Song, MyungSin (2007). "Entropy Encoding, Hilbert Space and Karhunen–Loeve Transforms".
External links

Mathematica KarhunenLoeveDecomposition function.

E161: Computer Image Processing and Analysis notes by Pr. Ruye Wang at Harvey Mudd College [1]
This article was sourced from Creative Commons AttributionShareAlike License; additional terms may apply. World Heritage Encyclopedia content is assembled from numerous content providers, Open Access Publishing, and in compliance with The Fair Access to Science and Technology Research Act (FASTR), Wikimedia Foundation, Inc., Public Library of Science, The Encyclopedia of Life, Open Book Publishers (OBP), PubMed, U.S. National Library of Medicine, National Center for Biotechnology Information, U.S. National Library of Medicine, National Institutes of Health (NIH), U.S. Department of Health & Human Services, and USA.gov, which sources content from all federal, state, local, tribal, and territorial government publication portals (.gov, .mil, .edu). Funding for USA.gov and content contributors is made possible from the U.S. Congress, EGovernment Act of 2002.
Crowd sourced content that is contributed to World Heritage Encyclopedia is peer reviewed and edited by our editorial staff to ensure quality scholarly research articles.
By using this site, you agree to the Terms of Use and Privacy Policy. World Heritage Encyclopedia™ is a registered trademark of the World Public Library Association, a nonprofit organization.