In econometrics, the generalized method of moments (GMM) is a generic method for estimating parameters in statistical models. Usually it is applied in the context of semiparametric models, where the parameter of interest is finite-dimensional, whereas the full shape of the distribution function of the data may not be known, and therefore maximum likelihood estimation is not applicable.
The method requires that a certain number of moment conditions be specified for the model. These moment conditions are functions of the model parameters and the data, such that their expectation is zero at the true values of the parameters. The GMM method then minimizes a certain norm of the sample averages of the moment conditions.
The GMM estimators are known to be consistent, asymptotically normal, and efficient in the class of all estimators that don’t use any extra information aside from that contained in the moment conditions.
GMM was developed by Lars Peter Hansen in 1982 as a generalization of the method of moments which was introduced by Karl Pearson in 1894. Hansen shared the 2013 Nobel Prize in Economics in part for this work.
Contents

1 Description

2 Properties

2.1 Consistency

2.2 Asymptotic normality

2.3 Efficiency

3 Implementation

4 J-test

5 Scope

6 Implementations

7 See also

8 References
Description
Suppose the available data consists of T observations {Y_{t}}_{t = 1,...,T}, where each observation Y_{t} is an n-dimensional multivariate random variable. We assume that the data come from a certain statistical model, defined up to an unknown parameter θ ∈ Θ. The goal of the estimation problem is to find the “true” value of this parameter, θ_{0}, or at least a reasonably close estimate.
A general assumption of GMM is that the data Y_{t} be generated by a weakly stationary ergodic stochastic process. (The case of independent and identically distributed (iid) variables Y_{t} is a special case of this condition.)
In order to apply GMM, we need to have "moment conditions", i.e. we need to know a vector-valued function g(Y,θ) such that

m(\theta_0) \equiv \operatorname{E}[\,g(Y_t,\theta_0)\,]=0,
where E denotes expectation, and Y_{t} is a generic observation. Moreover, the function m(θ) must differ from zero for θ ≠ θ_{0}, or otherwise the parameter θ will not be point-identified.
The basic idea behind GMM is to replace the theoretical expected value E[⋅] with its empirical analog, the sample average:

\hat{m}(\theta) \equiv \frac{1}{T}\sum_{t=1}^T g(Y_t,\theta)
and then to minimize the norm of this expression with respect to θ. The minimizing value of θ is our estimate for θ_{0}.
By the law of large numbers, \hat{m}(\theta) \approx \operatorname{E}[g(Y_t,\theta)] = m(\theta) for large values of T, and thus we expect that \hat{m}(\theta_0) \approx m(\theta_0) = 0. The generalized method of moments looks for a number \hat\theta which would make \hat{m}(\hat\theta) as close to zero as possible. Mathematically, this is equivalent to minimizing a certain norm of \hat{m}(\theta) (the norm of m, denoted \|m\|, measures the distance between m and zero). The properties of the resulting estimator will depend on the particular choice of the norm function, and therefore the theory of GMM considers an entire family of norms, defined as

\| \hat{m}(\theta) \|^2_{W} = \hat{m}(\theta)'\,W\,\hat{m}(\theta),
where W is a positive-definite weighting matrix, and m′ denotes the transpose of m. In practice, the weighting matrix is computed from the available data set, and the resulting estimate is denoted \hat{W}. Thus, the GMM estimator can be written as

\hat\theta = \operatorname{arg}\min_{\theta\in\Theta} \bigg(\frac{1}{T}\sum_{t=1}^T g(Y_t,\theta)\bigg)' \hat{W} \bigg(\frac{1}{T}\sum_{t=1}^T g(Y_t,\theta)\bigg)
Under suitable conditions this estimator is consistent, asymptotically normal, and, with the right choice of weighting matrix \hat{W}, also asymptotically efficient.
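The definition above can be sketched in a few lines of code. The example below is only an illustration under stated assumptions: the exponential model with mean θ, its two moment conditions, the helper names (g, m_hat, objective, argmin_1d) and the simulated data are all hypothetical choices, not part of the method itself, and the one-dimensional search routine stands in for whatever optimizer one would actually use.

```python
import random

def g(y, theta):
    """Assumed moment function for an exponential model with mean theta:
    E[Y - theta] = 0 and E[Y^2 - 2*theta^2] = 0 at the true theta."""
    return (y - theta, y * y - 2.0 * theta * theta)

def m_hat(data, theta):
    """Sample average of g(Y_t, theta) over all observations."""
    T = len(data)
    s0 = s1 = 0.0
    for y in data:
        g0, g1 = g(y, theta)
        s0 += g0
        s1 += g1
    return (s0 / T, s1 / T)

def objective(data, theta, W):
    """Quadratic form m_hat' W m_hat for a 2x2 weighting matrix W."""
    m0, m1 = m_hat(data, theta)
    return (m0 * (W[0][0] * m0 + W[0][1] * m1)
            + m1 * (W[1][0] * m0 + W[1][1] * m1))

def argmin_1d(f, lo, hi, grid=200, iters=60):
    """Coarse grid search, then golden-section refinement near the best grid point."""
    step = (hi - lo) / grid
    best = min(range(grid + 1), key=lambda i: f(lo + i * step))
    a = max(lo, lo + (best - 1) * step)
    b = min(hi, lo + (best + 1) * step)
    r = (5 ** 0.5 - 1) / 2
    for _ in range(iters):
        c, d = b - r * (b - a), a + r * (b - a)
        if f(c) < f(d):
            b = d
        else:
            a = c
    return (a + b) / 2

rng = random.Random(0)
theta0 = 2.0                                   # value used to simulate the data
data = [rng.expovariate(1.0 / theta0) for _ in range(2000)]
I2 = ((1.0, 0.0), (0.0, 1.0))                  # W = identity matrix
theta_hat = argmin_1d(lambda th: objective(data, th, I2), 0.1, 10.0)
print(theta_hat)
```

With two moment conditions and one parameter the sample moments cannot in general both be set to zero, which is why the estimate is defined through a minimization rather than by solving an equation.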
Properties
Consistency
Consistency is a statistical property of an estimator stating that, having a sufficient number of observations, the estimator will get arbitrarily close to the true value of parameter:

\hat\theta \xrightarrow{p} \theta_0\ \text{as}\ T\to\infty
(see Convergence in probability). Sufficient conditions for a GMM estimator to be consistent are as follows:

1. \hat{W}_T \xrightarrow{p} W, where W is a positive semidefinite matrix,

2. W\operatorname{E}[\,g(Y_t,\theta)\,]=0 only for \theta=\theta_0,

3. The set of possible parameters \Theta \subset \mathbb{R}^{k} is compact,

4. g(Y,\theta) is continuous at each θ with probability one,

5. \operatorname{E}[\,\textstyle\sup_{\theta\in\Theta} \lVert g(Y,\theta)\rVert\,]<\infty.
The second condition here (the so-called global identification condition) is often particularly hard to verify. There exist simpler necessary but not sufficient conditions, which may be used to detect a non-identification problem:

Order condition. The dimension of moment function m(θ) should be at least as large as the dimension of parameter vector θ.

Local identification. If g(Y,θ) is continuously differentiable in a neighborhood of \theta_0, then matrix W\operatorname{E}[\nabla_\theta g(Y_t,\theta_0)] must have full column rank.
In practice applied econometricians often simply assume that global identification holds, without actually proving it.^{[1]}
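As an illustration of how these conditions specialize, consider the linear instrumental-variables moment function g(Y_t,β) = z_t(y_t − x_t'β). The symbols q (number of instruments in z_t) and p (number of regressors in x_t) are introduced here only for this sketch, and W is assumed to be positive definite.

```latex
\begin{aligned}
m(\beta) &= \operatorname{E}[z_t y_t] - \operatorname{E}[z_t x_t']\,\beta,
\qquad \operatorname{E}[\nabla_{\!\beta}\, g(Y_t,\beta)] = -\operatorname{E}[z_t x_t'] \\
\text{order condition:}\quad & q \ge p \\
\text{local identification:}\quad & \operatorname{rank}\,\operatorname{E}[z_t x_t'] = p \\
m(\beta) - m(\beta_0) &= -\operatorname{E}[z_t x_t']\,(\beta - \beta_0)
\end{aligned}
```

Because m is linear in β in this example, the last line shows that the rank condition already rules out m(β) = 0 for any β ≠ β_0, so local and global identification coincide. For a nonlinear g this need not be the case.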
Asymptotic normality
Asymptotic normality is a useful property, as it allows us to construct confidence bands for the estimator, and conduct different tests. Before we can make a statement about the asymptotic distribution of the GMM estimator, we need to define two auxiliary matrices:

G = \operatorname{E}[\,\nabla_{\!\theta}\,g(Y_t,\theta_0)\,], \qquad \Omega = \operatorname{E}[\,g(Y_t,\theta_0)g(Y_t,\theta_0)'\,]
Then under conditions 1–6 listed below, the GMM estimator will be asymptotically normal with limiting distribution

\sqrt{T}\big(\hat\theta - \theta_0\big)\ \xrightarrow{d}\ \mathcal{N}\big[0, (G'WG)^{-1}G'W\Omega W'G(G'W'G)^{-1}\big]
(see Convergence in distribution). Conditions:

1. \hat\theta is consistent (see previous section),

2. The set of possible parameters \Theta \subset \mathbb{R}^{k} is compact,

3. g(Y,\theta) is continuously differentiable in some neighborhood N of \theta_0 with probability one,

4. \operatorname{E}[\,\lVert g(Y_t,\theta) \rVert^2\,]<\infty,

5. \operatorname{E}[\,\textstyle\sup_{\theta\in N}\lVert \nabla_\theta g(Y_t,\theta) \rVert\,]<\infty,

6. the matrix G'WG is nonsingular.
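A special case may help in reading the sandwich formula. Assume, for this sketch only, that the model is exactly identified (as many moment conditions as parameters), so that G is square and nonsingular, and that W is nonsingular. The weighting matrix then cancels out of the asymptotic variance:

```latex
\begin{aligned}
(G'WG)^{-1}G'W\Omega W'G(G'W'G)^{-1}
&= G^{-1}W^{-1}(G')^{-1}\,G'W\,\Omega\,W'G\,G^{-1}(W')^{-1}(G')^{-1} \\
&= G^{-1}\,\Omega\,(G')^{-1}
\end{aligned}
```

Under these assumptions the choice of W does not affect the limiting distribution, so the weighting matrix matters only when there are more moment conditions than parameters.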
Efficiency
So far we have said nothing about the choice of matrix W, except that it must be positive semidefinite. In fact, any such matrix will produce a consistent and asymptotically normal GMM estimator; the only difference will be in the asymptotic variance of that estimator. It can be shown that taking

W \propto\ \Omega^{-1}
will result in the most efficient estimator in the class of all GMM estimators based on the same moment conditions. Efficiency in this case means that such an estimator will have the smallest possible asymptotic variance (we say that matrix A is smaller than matrix B if B − A is positive semidefinite).
In this case the formula for the asymptotic distribution of the GMM estimator simplifies to

\sqrt{T}\big(\hat\theta - \theta_0\big)\ \xrightarrow{d}\ \mathcal{N}\big[0, (G'\,\Omega^{-1}G)^{-1}\big]
The proof that such a choice of weighting matrix is indeed optimal is often adopted with slight modifications when establishing efficiency of other estimators. As a rule of thumb, a weighting matrix is optimal whenever it makes the “sandwich formula” for variance collapse into a simpler expression.
Proof. We will consider the difference between the asymptotic variance with arbitrary W and the asymptotic variance with W=\Omega^{-1}. If we can factor this difference into a symmetric product of the form CC' for some matrix C, then it will guarantee that this difference is nonnegative-definite, and thus W=\Omega^{-1} will be optimal by definition.

\,V(W)-V(\Omega^{-1})

\,=(G'WG)^{-1}G'W\Omega WG(G'WG)^{-1} - (G'\Omega^{-1}G)^{-1}

\,=(G'WG)^{-1}\Big(G'W\Omega WG - G'WG(G'\Omega^{-1}G)^{-1}G'WG\Big)(G'WG)^{-1}

\,=(G'WG)^{-1}G'W\Omega^{1/2}\Big(I - \Omega^{-1/2}G(G'\Omega^{-1}G)^{-1}G'\Omega^{-1/2}\Big)\Omega^{1/2}WG(G'WG)^{-1}

\,=A(I-B)A',

where we introduced the matrices A = (G'WG)^{-1}G'W\Omega^{1/2} and B = \Omega^{-1/2}G(G'\Omega^{-1}G)^{-1}G'\Omega^{-1/2} in order to slightly simplify notation; I is an identity matrix. We can see that matrix B here is symmetric and idempotent: B^2=B. This means I − B is symmetric and idempotent as well: I-B=(I-B)(I-B)'. Thus we can continue to factor the previous expression as

\,=A(I-B)(I-B)'A' = \Big(A(I-B)\Big)\Big(A(I-B)\Big)' \geq 0
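The inequality can also be checked numerically. The sketch below assumes, purely for illustration, an exponential model with mean θ_0 = 2 and the moment function g(Y,θ) = (Y − θ, Y² − 2θ²)'. With E[Y^n] = n!·θ_0^n this gives G = (−1, −4θ_0)' and Ω with entries θ_0², 4θ_0³ and 20θ_0⁴. The helper names are hypothetical, and W is taken to be symmetric, as in the proof.

```python
import random

def matvec(M, v):
    """Product of a 2x2 matrix with a 2-vector."""
    return (M[0][0] * v[0] + M[0][1] * v[1],
            M[1][0] * v[0] + M[1][1] * v[1])

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

def inv2(M):
    """Inverse of a 2x2 matrix."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return ((M[1][1] / det, -M[0][1] / det),
            (-M[1][0] / det, M[0][0] / det))

def asy_var(G, Omega, W):
    """(G'WG)^{-1} G'W Omega W G (G'WG)^{-1} for a scalar parameter and symmetric W."""
    WG = matvec(W, G)
    bread = dot(G, WG)                  # G'WG, a scalar here
    meat = dot(WG, matvec(Omega, WG))   # G'W Omega W G
    return meat / (bread * bread)

theta0 = 2.0                            # assumed true parameter
G = (-1.0, -4.0 * theta0)
Omega = ((theta0 ** 2, 4.0 * theta0 ** 3),
         (4.0 * theta0 ** 3, 20.0 * theta0 ** 4))

v_opt = asy_var(G, Omega, inv2(Omega))                     # W = Omega^{-1}
v_identity = asy_var(G, Omega, ((1.0, 0.0), (0.0, 1.0)))   # W = I

# Random positive-definite W = L L' built from a lower-triangular factor L.
rng = random.Random(1)
gaps = []
for _ in range(1000):
    l11, l21, l22 = rng.uniform(0.1, 3), rng.uniform(-3, 3), rng.uniform(0.1, 3)
    W = ((l11 * l11, l11 * l21), (l11 * l21, l21 * l21 + l22 * l22))
    gaps.append(asy_var(G, Omega, W) - v_opt)
print(v_opt, v_identity, min(gaps))
```

In this example the efficient variance equals θ_0² = 4, while identity weighting gives 20996/4225, about 4.97. None of the randomly drawn weighting matrices falls below the efficient value, apart from rounding error.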

Implementation
One difficulty with implementing the outlined method is that we cannot take W = Ω^{−1} because, by the definition of matrix Ω, we need to know the value of θ_{0} in order to compute this matrix, and θ_{0} is precisely the quantity we don’t know and are trying to estimate in the first place.
Several approaches exist to deal with this issue, the first one being the most popular:

Two-step feasible GMM:

Step 1: Take W = I (the identity matrix), and compute a preliminary GMM estimate \hat\theta_{(1)}. This estimator is consistent for θ_{0}, although not efficient.

Step 2: Take

\hat{W} = \bigg(\frac{1}{T}\sum_{t=1}^T g(Y_t,\hat\theta_{(1)})g(Y_t,\hat\theta_{(1)})'\bigg)^{-1},
where we have plugged in our first-step preliminary estimate \hat\theta_{(1)}. This matrix converges in probability to Ω^{−1} and therefore if we compute \hat\theta with this weighting matrix, the estimator will be asymptotically efficient.

Iterated GMM. Essentially the same procedure as two-step GMM, except that the matrix \hat{W}_T is recalculated several times. That is, the estimate obtained in step 2 is used to calculate the weighting matrix for step 3, and so on. Such an estimator, denoted \hat\theta_{(i)}, is equivalent to solving the following system of equations:^{[2]}

\bigg(\frac{1}{T}\sum_{t=1}^T \frac{\partial g}{\partial\theta'}(Y_t,\hat\theta_{(i)})\bigg)' \bigg(\frac{1}{T}\sum_{t=1}^T g(Y_t,\hat\theta_{(i)})g(Y_t,\hat\theta_{(i)})'\bigg)^{\!-1} \bigg(\frac{1}{T}\sum_{t=1}^T g(Y_t,\hat\theta_{(i)})\bigg) = 0
Asymptotically no improvement can be achieved through such iterations, although certain Monte Carlo experiments suggest that finite-sample properties of this estimator are slightly better.

Continuously updating GMM (CU-GMM, or CUE). Estimates \hat\theta simultaneously with estimating the weighting matrix W:

\hat\theta = \operatorname{arg}\min_{\theta\in\Theta} \bigg(\frac{1}{T}\sum_{t=1}^T g(Y_t,\theta)\bigg)' \bigg(\frac{1}{T}\sum_{t=1}^T g(Y_t,\theta)g(Y_t,\theta)'\bigg)^{\!-1} \bigg(\frac{1}{T}\sum_{t=1}^T g(Y_t,\theta)\bigg)
In Monte Carlo experiments this method demonstrated a better performance than the traditional two-step GMM: the estimator has smaller median bias (although fatter tails), and the J-test for overidentifying restrictions in many cases was more reliable.^{[3]}
Another important issue in implementing the minimization procedure is that the optimizer has to search through the (possibly high-dimensional) parameter space Θ and find the value of θ which minimizes the objective function. No generic recommendation for such a procedure exists; it is the subject of its own field, numerical optimization.
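The two-step and continuously updating procedures described above might be sketched as follows. Everything specific here is an assumption made for illustration: the exponential model with mean θ and its two moment conditions, the simulated data, the function names, and the simple grid-plus-golden-section search, which is only adequate because the parameter is one-dimensional. The weighting matrix is estimated as in the formulas above, without any correction for serial correlation.

```python
import random

def g(y, theta):
    # Assumed moments for an exponential model with mean theta.
    return (y - theta, y * y - 2.0 * theta * theta)

def m_hat(data, theta):
    """Sample moment vector (1/T) sum_t g(Y_t, theta)."""
    gs = [g(y, theta) for y in data]
    T = len(gs)
    return (sum(v[0] for v in gs) / T, sum(v[1] for v in gs) / T)

def omega_hat(data, theta):
    """(1/T) sum_t g(Y_t, theta) g(Y_t, theta)' as a 2x2 matrix."""
    a = b = c = 0.0
    for y in data:
        g0, g1 = g(y, theta)
        a, b, c = a + g0 * g0, b + g0 * g1, c + g1 * g1
    T = len(data)
    return ((a / T, b / T), (b / T, c / T))

def inv2(M):
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return ((M[1][1] / det, -M[0][1] / det),
            (-M[1][0] / det, M[0][0] / det))

def quad(m, W):
    """Quadratic form m' W m."""
    return (m[0] * (W[0][0] * m[0] + W[0][1] * m[1])
            + m[1] * (W[1][0] * m[0] + W[1][1] * m[1]))

def argmin_1d(f, lo, hi, grid=200, iters=60):
    """Coarse grid search, then golden-section refinement near the best grid point."""
    step = (hi - lo) / grid
    best = min(range(grid + 1), key=lambda i: f(lo + i * step))
    a, b = max(lo, lo + (best - 1) * step), min(hi, lo + (best + 1) * step)
    r = (5 ** 0.5 - 1) / 2
    for _ in range(iters):
        c, d = b - r * (b - a), a + r * (b - a)
        if f(c) < f(d):
            b = d
        else:
            a = c
    return (a + b) / 2

def two_step_gmm(data, lo, hi):
    identity = ((1.0, 0.0), (0.0, 1.0))
    # Step 1: W = I gives a consistent but inefficient preliminary estimate.
    theta1 = argmin_1d(lambda th: quad(m_hat(data, th), identity), lo, hi)
    # Step 2: re-estimate with the inverse of the estimated Omega as weights.
    W_hat = inv2(omega_hat(data, theta1))
    theta2 = argmin_1d(lambda th: quad(m_hat(data, th), W_hat), lo, hi)
    return theta1, theta2, W_hat

def cue(data, lo, hi):
    # Continuously updating: the weighting matrix is recomputed at every theta.
    return argmin_1d(
        lambda th: quad(m_hat(data, th), inv2(omega_hat(data, th))), lo, hi)

rng = random.Random(0)
data = [rng.expovariate(0.5) for _ in range(2000)]   # simulated, true mean 2.0
theta1, theta2, W_hat = two_step_gmm(data, 0.1, 10.0)
theta_cue = cue(data, 0.1, 10.0)
print(theta1, theta2, theta_cue)
```

An iterated version would repeat step 2, recomputing W_hat from the latest estimate until the estimate stops changing. For parameter vectors of higher dimension the search routine would need to be replaced by a general-purpose numerical optimizer.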
J-test
When the number of moment conditions is greater than the dimension of the parameter vector θ, the model is said to be overidentified. Overidentification allows us to check whether the model's moment conditions match the data well or not.
Conceptually we can check whether \hat{m}(\hat\theta) is sufficiently close to zero to suggest that the model fits the data well. The GMM method has replaced the problem of solving the equation \hat{m}(\theta)=0, which chooses \theta to match the restrictions exactly, by a minimization calculation. The minimization can always be conducted, even when no \theta_0 exists such that m(\theta_0)=0, so whether the minimized objective is small enough has to be tested. This is what the J-test does. The J-test is also called a test for overidentifying restrictions.
Formally we consider two hypotheses:

H_0:\ m(\theta_0)=0 (the null hypothesis that the model is “valid”), and

H_1:\ m(\theta)\neq 0,\ \forall \theta\in\Theta (the alternative hypothesis that model is “invalid”; the data do not come close to meeting the restrictions)
Under hypothesis H_0, the following so-called J-statistic is asymptotically chi-squared with k − ℓ degrees of freedom. Define J to be:

J \equiv T \cdot \bigg(\frac{1}{T}\sum_{t=1}^T g(Y_t,\hat\theta)\bigg)' \hat{W}_T \bigg(\frac{1}{T}\sum_{t=1}^T g(Y_t,\hat\theta)\bigg)\ \xrightarrow{d}\ \chi^2_{k-\ell} under H_0,
where \hat\theta is the GMM estimator of the parameter \theta_0, k is the number of moment conditions (dimension of vector g), and ℓ is the number of estimated parameters (dimension of vector θ). Matrix \hat{W}_T must converge in probability to \Omega^{-1}, the efficient weighting matrix (note that previously we only required that W be proportional to \Omega^{-1} for the estimator to be efficient; however, in order to conduct the J-test, W must be exactly equal to \Omega^{-1}, not simply proportional).
Under the alternative hypothesis H_1, the J-statistic is asymptotically unbounded:

J\ \xrightarrow{p}\ \infty under H_1
To conduct the test we compute the value of J from the data. It is a nonnegative number. We compare it with (for example) the 0.95 quantile of the \chi^2_{k-\ell} distribution:

H_0 is rejected at the 5% significance level if J > q_{0.95}^{\chi^2_{k-\ell}}

H_0 cannot be rejected at the 5% significance level if J < q_{0.95}^{\chi^2_{k-\ell}}
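The test procedure can be sketched as below. The setting is again an assumed illustration rather than a general implementation: an exponential model with mean θ and two moment conditions, so that there is one overidentifying restriction and the chi-squared distribution with one degree of freedom applies. The function names, the simulated data sets and the one-dimensional search routine are hypothetical choices.

```python
import math
import random

def g(y, theta):
    # Assumed moments for an exponential model with mean theta.
    return (y - theta, y * y - 2.0 * theta * theta)

def m_hat(data, theta):
    gs = [g(y, theta) for y in data]
    T = len(gs)
    return (sum(v[0] for v in gs) / T, sum(v[1] for v in gs) / T)

def omega_hat(data, theta):
    """(1/T) sum_t g g' evaluated at theta."""
    a = b = c = 0.0
    for y in data:
        g0, g1 = g(y, theta)
        a, b, c = a + g0 * g0, b + g0 * g1, c + g1 * g1
    T = len(data)
    return ((a / T, b / T), (b / T, c / T))

def inv2(M):
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return ((M[1][1] / det, -M[0][1] / det),
            (-M[1][0] / det, M[0][0] / det))

def quad(m, W):
    return (m[0] * (W[0][0] * m[0] + W[0][1] * m[1])
            + m[1] * (W[1][0] * m[0] + W[1][1] * m[1]))

def argmin_1d(f, lo, hi, grid=200, iters=60):
    """Coarse grid search, then golden-section refinement near the best grid point."""
    step = (hi - lo) / grid
    best = min(range(grid + 1), key=lambda i: f(lo + i * step))
    a, b = max(lo, lo + (best - 1) * step), min(hi, lo + (best + 1) * step)
    r = (5 ** 0.5 - 1) / 2
    for _ in range(iters):
        c, d = b - r * (b - a), a + r * (b - a)
        if f(c) < f(d):
            b = d
        else:
            a = c
    return (a + b) / 2

def j_test(data, lo=0.1, hi=10.0):
    """Two-step GMM followed by the J-statistic; here k - l = 2 - 1 = 1."""
    T = len(data)
    identity = ((1.0, 0.0), (0.0, 1.0))
    theta1 = argmin_1d(lambda th: quad(m_hat(data, th), identity), lo, hi)
    W_hat = inv2(omega_hat(data, theta1))
    theta2 = argmin_1d(lambda th: quad(m_hat(data, th), W_hat), lo, hi)
    J = max(T * quad(m_hat(data, theta2), W_hat), 0.0)
    p_value = math.erfc(math.sqrt(J / 2.0))   # P(chi-squared with 1 df > J)
    return J, p_value

CRIT_95_DF1 = 3.841   # 0.95 quantile of the chi-squared distribution, 1 df

rng = random.Random(0)
good = [rng.expovariate(0.5) for _ in range(2000)]   # both moments hold at theta = 2
bad = [rng.uniform(0.0, 4.0) for _ in range(2000)]   # E[Y^2] differs from 2 * E[Y]^2
J_good, p_good = j_test(good)
J_bad, p_bad = j_test(bad)
print(J_good, p_good, J_good > CRIT_95_DF1)
print(J_bad, p_bad, J_bad > CRIT_95_DF1)
```

For the uniform data no value of θ can satisfy both moment conditions, so the J-statistic grows with the sample size and the null hypothesis is rejected. For the exponential data the statistic behaves like a draw from the chi-squared distribution, so a rejection would occur in roughly 5% of samples.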
Scope
Many other popular estimation techniques can be cast in terms of GMM optimization:

Ordinary least squares (OLS) is equivalent to GMM with moment conditions:

\operatorname{E}[\,x_t(y_t - x_t'\beta)\,]=0

Generalized least squares (GLS):

\operatorname{E}[\,x_t(y_t - x_t'\beta)/\sigma^2(x_t)\,]=0

Instrumental variables regression (IV):

\operatorname{E}[\,z_t(y_t - x_t'\beta)\,]=0

Nonlinear least squares (NLLS):

\operatorname{E}[\,\nabla_{\!\beta}\, g(x_t,\beta)\cdot(y_t - g(x_t,\beta))\,]=0

Maximum likelihood estimation (MLE):

\operatorname{E}[\,\nabla_{\!\theta} \ln f(x_t,\theta) \,]=0
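The first item in this list can be verified directly: setting the sample analog of the OLS moment condition to zero yields the normal equations. The sketch below uses simulated data with an intercept and one regressor. The data-generating process, coefficient values and variable names are assumptions made for the example.

```python
import random

rng = random.Random(0)
T = 500
xs = [rng.uniform(0.0, 10.0) for _ in range(T)]
ys = [1.0 + 2.0 * x + rng.gauss(0.0, 1.0) for x in xs]   # simulated data

# Regressor vector x_t = (1, x); accumulate sum x_t x_t' and sum x_t y_t.
sxx = [[0.0, 0.0], [0.0, 0.0]]
sxy = [0.0, 0.0]
for x, y in zip(xs, ys):
    xt = (1.0, x)
    for i in range(2):
        sxy[i] += xt[i] * y
        for j in range(2):
            sxx[i][j] += xt[i] * xt[j]

# Solve the 2x2 normal equations (sum x x') beta = sum x y by Cramer's rule.
det = sxx[0][0] * sxx[1][1] - sxx[0][1] * sxx[1][0]
beta = ((sxy[0] * sxx[1][1] - sxx[0][1] * sxy[1]) / det,
        (sxx[0][0] * sxy[1] - sxx[1][0] * sxy[0]) / det)

def m_hat(b):
    """Sample moments (1/T) sum_t x_t (y_t - x_t' b)."""
    m0 = m1 = 0.0
    for x, y in zip(xs, ys):
        resid = y - (b[0] + b[1] * x)
        m0 += resid
        m1 += x * resid
    return (m0 / T, m1 / T)

print(beta, m_hat(beta))
```

Since there are as many moment conditions as parameters, the sample moments are zero (up to rounding) at the OLS estimate, so the GMM objective attains its minimum of zero there for any positive-definite weighting matrix.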
Implementations

R Programming wikibook, Method of Moments

R

Stata

EViews

SAS
See also
References

^ Newey & McFadden (1994, p. 2127)

^ Imbens, Spady & Johnson (1998, p. 336)

^ Hansen, Heaton & Yaron (1996)

Kirby Faciane (2006): Statistics for Empirical and Quantitative Finance. H.C. Baird: Philadelphia. ISBN 0978820894.

Alastair R. Hall (2005). Generalized Method of Moments (Advanced Texts in Econometrics). Oxford University Press. ISBN 0198775202.


Lars Peter Hansen (2002): Method of Moments, in International Encyclopedia of the Social and Behavioral Sciences, N. J. Smelser and P. B. Baltes (editors), Pergamon: Oxford.

Hansen, Lars Peter; Heaton, John; Yaron, Amir (1996). "Finite-sample properties of some alternative GMM estimators". Journal of Business & Economic Statistics 14 (3): 262–280.

Imbens, Guido W.; Spady, Richard H.; Johnson, Phillip (1998). "Information theoretic approaches to inference in moment condition models". Econometrica 66 (2): 333–357.

Newey W., McFadden D. (1994). Large sample estimation and hypothesis testing, in Handbook of Econometrics, Ch.36. Elsevier Science.

Special issues of Journal of Business and Economic Statistics: vol. 14, no. 3 and vol. 20, no. 4.