A population moment \(\gamma\) can be defined as the expectation of some continuous function \(g\) of a random variable \(x\): \[
\gamma={\mathrm{E}} [g(x)]
\]
On the other hand, a sample moment is the sample analog of the population moment, computed from a particular sample \(x_1, \dots, x_n\): \[
\hat \gamma=\frac{1}{n} \sum_{i=1}^{n} g(x_i)
\]
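As a minimal illustration of the two definitions, the sketch below takes \(g(x)=x^2\) and a standard normal \(x\), so the population moment is \(\gamma=1\). The choice of \(g\), the distribution, the seed and the sample size are assumptions made only for this example.

```r
# Population moment: gamma = E[g(x)] with g(x) = x^2 and x ~ N(0, 1), so gamma = 1.
g <- function(x) x^2

set.seed(1)                # arbitrary seed, for reproducibility only
x <- rnorm(100000)         # simulated sample of size n = 100000

# Sample moment: (1/n) * sum of g(x_i)
gamma_hat <- mean(g(x))
gamma_hat                  # should be close to the population value 1
```

With a larger sample, the sample moment should move closer to the population moment.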
5.2 OLS as a moment problem
Consider the simple linear regression \[
\bf y=X\beta+u, \quad u \sim IID(0, \sigma^2).
\]
If the model is correctly specified, then \[
\rm E (\bf X'u)=0.
\]
The MOM principle suggests that we replace the left-hand side with its sample analog \(\frac{1}{n} \bf X'(y-X\beta)\).
Since the true \(\bf \beta\) sets the population moment to zero, a natural choice of \(\bf \hat \beta\) is one that sets the sample moment to zero. The MOM procedure therefore suggests an estimate of \(\bf \beta\) that solves \[
\frac{1}{n} \bf X'(y-X \hat \beta)=0.
\]
The MOM estimator is \[
\bf \hat \beta=(X'X)^{-1}X'y,
\] which is the same as the OLS estimator.
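The equivalence can be checked numerically. The following is a sketch on a hypothetical simulated data set (the data generating process, seed and variable names are assumptions, not part of the original example): it solves the sample moment condition directly and compares the result with `lm()`.

```r
set.seed(2)
n <- 500
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)      # hypothetical DGP: intercept 1, slope 2

X <- cbind(1, x)               # design matrix with a constant column

# Solve the sample moment condition X'(y - X b) = 0, i.e. (X'X) b = X'y
beta_mom <- drop(solve(crossprod(X), crossprod(X, y)))

# Sample moments evaluated at the solution: zero up to rounding error
moments <- drop(crossprod(X, y - X %*% beta_mom)) / n

# OLS coefficients from lm() for comparison
beta_ols <- unname(coef(lm(y ~ x)))
```

Both `beta_mom` and `beta_ols` should agree up to numerical precision, since they solve the same system of equations.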
5.3 IV as a moment problem
Consider the simple linear regression \[
\bf y=X\beta+u, \quad u \sim IID(0, \sigma^2).
\]
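If the model is mis-specified so that \(\bf X\) is correlated with the error term, then \[
\rm E (\bf X'u)\neq 0.
\]
We then need an instrumental variable \(\bf Z\) that is correlated with \(\bf X\) and satisfies \[
\rm E (\bf Z'u)= 0,
\] or, equivalently, \[
\rm E (\bf Z'(y-X\beta))= 0.
\]
The sample analog of this condition is \[
\frac{1}{n} \bf Z'(y-X \hat \beta)=0.
\]
If \(\bf Z\) has as many columns as \(\bf X\) and \(\bf Z'X\) is invertible, solving for \(\bf \hat \beta\) gives the IV estimator \[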
\bf \hat \beta=(Z'X)^{-1}Z'y.
\]
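The IV formula can be sketched in a few lines of base R. The data below are a hypothetical simulation (an omitted variable `z`, an instrument `w`, and a true coefficient of 1 on `x` are all assumptions for illustration), chosen so that OLS is inconsistent while the IV moment condition holds.

```r
set.seed(3)
n <- 5000
w <- rnorm(n)                                 # instrument (hypothetical)
z <- rnorm(n)                                 # omitted variable, ends up in the error
x <- 0.8 * w + 0.6 * z + rnorm(n, sd = 0.5)   # regressor correlated with w and z
y <- x + z + rnorm(n)                         # true coefficient on x is 1

X <- cbind(1, x)                              # regressors, including a constant
Z <- cbind(1, w)                              # instruments, including a constant

beta_iv  <- drop(solve(crossprod(Z, X), crossprod(Z, y)))   # (Z'X)^{-1} Z'y
beta_ols <- drop(solve(crossprod(X), crossprod(X, y)))      # (X'X)^{-1} X'y

# Sample IV moments at the solution: zero up to rounding error
iv_moments <- drop(crossprod(Z, y - X %*% beta_iv)) / n
```

In this setup the OLS slope should be biased upward (the omitted `z` is positively correlated with `x`), while the IV slope should be close to 1.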
5.4 The Generalized Method of Moments
The expectation \({\rm E}(Y^r)\) for any \(r=1,2, \dots\) is called the \(r^{th}\) (raw) moment of \(Y\). The expectation \({\rm E} [(Y-{\rm
E}(Y))^r]\) is called the \(r^{th}\) centered moment of \(Y\).
The mean is the first raw moment.
The variance is the second centered moment.
The third centered moment measures the skewness of the distribution.
The fourth centered moment measures the kurtosis of the distribution, which is interpreted as a measure of the “fatness of the tails”.
The standardized kurtosis is \[
k=\frac{E[(Y-E(Y))^4]}{E[(Y-E(Y))^2]^2}.
\]
For a normal distribution, \(k=3.\)
For a \(t\) distribution with \(v \geq 5\) degrees of freedom, \(k=3+6/(v-4) > 3\), i.e., the \(t\) distribution has fatter tails than a normal distribution.
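The standardized kurtosis is straightforward to compute from a sample. The sketch below defines two hypothetical helper functions, `kurtosis()` for the sample version and `kurtosis_t()` for the theoretical value of a \(t\) distribution; neither name comes from a package.

```r
# Standardized kurtosis: fourth centered moment over squared second centered moment
kurtosis <- function(y) {
  d <- y - mean(y)
  mean(d^4) / mean(d^2)^2
}

# Theoretical value for a t distribution with v > 4 degrees of freedom
kurtosis_t <- function(v) 3 + 6 / (v - 4)

set.seed(4)
k_normal <- kurtosis(rnorm(100000))   # should be near 3
k_t5     <- kurtosis_t(5)             # 3 + 6/1 = 9
k_t10    <- kurtosis_t(10)            # 3 + 6/6 = 4
```

As \(v\) grows, `kurtosis_t(v)` approaches 3, in line with the \(t\) distribution approaching the normal.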
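The distribution function of a random variable captures all information about the random variable. It can be shown that, under suitable regularity conditions, the full set of moments also captures all of this information.
ML works with the full distribution, whereas GMM uses only a chosen set of moments. This distinction underlies the relative strengths and weaknesses of ML and GMM.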
5.5 GMM
The statistical model takes the general form \[
E[m(Y_i; \theta_0)]=0
\] where
- \(Y_1, \cdots, Y_n\) are random variables from which the sample \(y_1, \cdots, y_n\) is drawn,
- \(m(Y, \theta)\) is a function specifying the model,
- \(\theta_0\) is the “true value” of the parameter.
\(E[m(Y_i; \theta_0)]=0\) are called the population moment conditions.
Two ideas behind GMM:
Replace the population mean \(E[.]\) with the sample mean calculated from the observed sample \(y_1, \cdots, y_n\).
Since \(E[m(Y_i; \theta_0)]=0\), choose \(\hat \theta_{GMM}\) to make \(\frac{1}{n}\sum_{i=1}^{n}m(y_i; \hat \theta_{GMM})\) as close to zero as possible.
Define the notation \[
\bar m(\theta)=\frac{1}{n} \sum_{i=1}^n m(y_i; \theta).
\]
\(\hat \theta_{GMM}\) is chosen to make \(\bar m(\theta)'\bar m(\theta)\) as close to zero as possible.
More generally, \(\hat \theta_{GMM}\) is chosen to minimize \(\bar m(\theta)'W \bar m(\theta)\) for some positive definite weighting matrix \(W\).
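To make the minimization concrete, here is a minimal sketch in base R. It assumes a hypothetical model not taken from the text: \(Y\) is exponential with mean \(\theta\), which gives the two moment conditions \(E[Y-\theta]=0\) and \(E[Y^2-2\theta^2]=0\) for a single parameter. The function names and the identity weighting matrix are illustrative choices only.

```r
set.seed(5)
y <- rexp(2000, rate = 1 / 2)     # simulated sample; true mean theta0 = 2

# Moment function m(y_i; theta): an n x 2 matrix, one column per moment condition.
# For an exponential with mean theta: E[Y] = theta and E[Y^2] = 2 * theta^2.
m <- function(theta, y) cbind(y - theta, y^2 - 2 * theta^2)

# Sample moments m_bar(theta) and the GMM objective m_bar' W m_bar
m_bar <- function(theta, y) colMeans(m(theta, y))
gmm_objective <- function(theta, y, W) {
  mb <- m_bar(theta, y)
  drop(t(mb) %*% W %*% mb)
}

W <- diag(2)                      # identity weighting matrix (one possible choice)
fit <- optimize(gmm_objective, interval = c(0.01, 10), y = y, W = W)
theta_hat <- fit$minimum          # GMM estimate of theta
```

With two moment conditions and one parameter, the objective cannot in general be driven exactly to zero, which is why the choice of \(W\) matters in overidentified models.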
5.5.1 An example
Let’s see an example with GMM, using the same simulated data as before. As before, \(X\) is endogenous, and we compute the GMM version of 2SLS.
Here I use R’s “gmm” library, which makes things easy. Its gmm() function expects two arguments, “g” and “x”, which correspond to \(u\) and \(W\) here: “g” is the regression formula that defines the residual \(u\), and “x” holds the instrument \(W\) (not to be confused with the weighting matrix above). The moment condition is \(E(W u) = 0\) in this example, where \(u = y - \beta_0 - \beta_1 X\) is the error in the regression of \(y\) on the endogenous \(X\).
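library(MASS)   # mvrnorm()
library(ivreg)  # ivreg()
library(gmm)    # gmm()
## DGP: data$y <- data$x + data$z + data$u
set.seed(66)
nobs = 10000
nDim = 3
sdxx = 1
sdww = 1
sdzz = 1
## Here we have three variables: x, z, w.
## z is the omitted variable; x and z are correlated; w is the instrument,
## which is correlated with x but not with z. u is independent of everything else.
crxz = .6
crzw = 0
crxw = .8
covarMat = matrix(c(sdxx^2, crxz, crxw,
                    crxz, sdzz^2, crzw,
                    crxw, crzw, sdww^2),
                  nrow = nDim, ncol = nDim)
covarMat
data = data.frame(mvrnorm(n = nobs, mu = rep(0, nDim), Sigma = covarMat))
names(data) <- c('x', 'z', 'w')
data$u <- rnorm(nobs, 0, 1)
# dgp
data$y <- data$x + data$z + data$u
gmm.fit = gmm(data$y ~ data$x, data$w)
summary(gmm.fit)
# It returns the same estimates as the 2sls results.
tsls.model <- ivreg(y ~ x | w, data = data)
summary(tsls.model)
# In the OLS case, the moment condition would be E(X u) = 0.
gmm.ols = gmm(data$y ~ data$x, data$x)
summary(gmm.ols)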
Call:
gmm(g = data$y ~ data$x, x = data$x)
Method: twoStep
Kernel: Quadratic Spectral
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -9.7566e-04 1.2624e-02 -7.7286e-02 9.3840e-01
data$x 1.6041e+00 1.3143e-02 1.2205e+02 0.0000e+00
J-Test: degrees of freedom is 0
J-test P-value
Test E(g)=0: 6.09365424169492e-26 *******
# It returns the same estimates as the OLS results.
ols <- lm(y ~ x, data = data)
summary(ols)
Call:
lm(formula = y ~ x, data = data)
Residuals:
Min 1Q Median 3Q Max
-4.5738 -0.8574 -0.0150 0.8508 5.5521
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.0009757 0.0128451 -0.076 0.939
x 1.6040701 0.0131145 122.312 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.284 on 9998 degrees of freedom
Multiple R-squared: 0.5994, Adjusted R-squared: 0.5994
F-statistic: 1.496e+04 on 1 and 9998 DF, p-value: < 2.2e-16
5.6 A few concepts of conditioning
5.6.1 Independence
If \(X\) and \(Y\) are independent then \[
f(x,y)=f(x)f(y)
\] and hence \[
f(y|x)=f(y).
\]
If \(X\) and \(Y\) are independent then \[
E[g(X)h(Y)]=E[g(X)]\cdot E[h(Y)]
\] and hence \[
Cov[g(X),h(Y)]=0.
\] i.e., any function of \(X\) is uncorrelated with any function of \(Y\).
5.6.2 Law of Iterated Expectations
\[
E[Y]=E[E(Y|X)].
\]
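The law has an exact sample analog when \(X\) is discrete: the overall sample mean equals the average of the group means, weighted by the group frequencies. The sketch below uses hypothetical simulated data with three groups to illustrate this.

```r
set.seed(7)
x <- sample(c("a", "b", "c"), size = 1000, replace = TRUE)   # discrete conditioning variable
y <- rnorm(1000, mean = ifelse(x == "a", 0, ifelse(x == "b", 1, 3)))

cond_means <- tapply(y, x, mean)        # sample analog of E[Y | X = x]
weights    <- table(x) / length(x)      # sample analog of P(X = x)

lhs <- mean(y)                          # sample analog of E[Y]
rhs <- sum(as.numeric(cond_means) *
           as.numeric(weights[names(cond_means)]))   # sample analog of E[E(Y | X)]
```

Here `lhs` and `rhs` should coincide up to floating-point error.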
5.6.3 Dependence Concepts
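\(X\), \(Y\) independent: for all functions \(g\) and \(h\), \[
Cov[g(X),h(Y)]=0
\]
\(X\), \(Y\) uncorrelated: \[
Cov[X,Y]=0
\]
\(E[Y|X]=0\): for all functions \(g\), \[
Cov[g(X),Y]=0
\]
Independence is the strongest of these conditions and uncorrelatedness the weakest; the conditional mean restriction lies in between.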
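A short simulation can show why uncorrelatedness is weaker than independence. The example below is a sketch under assumed choices (\(X\) standard normal and \(Y=X^2-1\)), where \(Y\) is a deterministic function of \(X\) and yet \(Cov[X,Y]=E[X^3]=0\).

```r
set.seed(6)
x <- rnorm(100000)
y <- x^2 - 1              # Y is a function of X, so X and Y are clearly dependent

cov_xy   <- cov(x, y)     # near 0: X and Y are uncorrelated, since E[X^3] = 0
cov_x2_y <- cov(x^2, y)   # near Var(X^2) = 2: g(X) = X^2 is correlated with Y
```

Because \(Cov[g(X),Y]\neq 0\) for \(g(X)=X^2\), this pair is uncorrelated but not independent.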
5.6.4 Regression
A regression model is a model of \(E[Y_i|X_i]\). For example, \[
Y_i=\beta_0+\beta_1X_i+u_i
\] where \(E[u_i|X_i]=0\).
5.6.5 GMM regression
The regression model \[
Y_i=\beta_0+\beta_1X_i+u_i, \quad E[u_i|X_i]=0
\] implies the moment conditions \[
E[u_i]=0 \quad \mbox{and} \quad E[X_i u_i]=0
\]
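The step from the conditional mean restriction \(E[u_i|X_i]=0\) to these two unconditional conditions follows from the law of iterated expectations: \[
E[u_i]=E\big[E(u_i|X_i)\big]=0, \qquad E[X_i u_i]=E\big[E(X_i u_i|X_i)\big]=E\big[X_i\, E(u_i|X_i)\big]=0.
\]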
That is, \[
E[Y_i-\beta_0-\beta_1X_i]=0
\]\[
E[X_i(Y_i-\beta_0-\beta_1X_i)]=0
\]
The sample moment conditions are \[
\frac{1}{n}\sum_{i=1}^{n}(y_i-\hat \beta_0-\hat \beta_1x_i)=0
\]\[
\frac{1}{n}\sum_{i=1}^{n}x_i(y_i-\hat \beta_0-\hat \beta_1x_i)=0
\]
These are just the normal equations for OLS.
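Solving the two sample moment conditions makes this explicit. Writing \(\bar x\) and \(\bar y\) for the sample means (notation introduced here for convenience), the first condition gives \[
\hat \beta_0=\bar y-\hat \beta_1 \bar x,
\] and substituting this into the second condition gives \[
\hat \beta_1=\frac{\sum_{i=1}^{n}(x_i-\bar x)(y_i-\bar y)}{\sum_{i=1}^{n}(x_i-\bar x)^2},
\] which are the familiar OLS formulas for the intercept and slope.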
A characteristic of GMM is that the specification of the model generates the estimator: only \(E[Y_i|X_i]=\beta_0+\beta_1 X_i\) is assumed.
Note that there are no assumptions that \(u_i\) is homoscedastic, serially uncorrelated, or normally distributed. Such properties affect the statistical properties of the GMM estimator, not its definition.