7  Discrete and Limited Dependent Variables

7.1 Binary Response Models

7.1.1 Probit and Logit

Let \(y_t\) be a binary dependent variable, and let \(P_t\) denote the probability that \(y_t=1\) conditional on the information set \(\Omega_t\), which consists of exogenous and predetermined variables. A binary response model is a model of this conditional probability. Since \(y_t\) takes only the values 0 and 1, \(P_t\) is also the expectation of \(y_t\) conditional on \(\Omega_t\): \[ P_t \equiv \mbox{Pr}(y_t=1 | \Omega_t)=\mbox{E}(y_t | \Omega_t). \]

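Because \(0 \le P_t \le 1\), a linear specification \(P_t = {\bf X}_t \beta\), where \({\bf X}_t\) is a row vector of regressors, is unsatisfactory: nothing forces the fitted values \({\bf X}_t \hat \beta\) to lie in this interval.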

We ensure that \(0 \le P_t \le 1\) by specifying that \[ P_t \equiv \mbox{Pr}(y_t=1 | \Omega_t)=F({\bf X}_t \beta). \] Here \(F(x)\) is a transformation function with the properties of the CDF of a probability distribution: it is nondecreasing, \(F(-\infty)=0\), and \(F(\infty)=1\).

Two popular choices of \(F(x)\) are the standard normal CDF \(\Phi(x)\), which gives the probit model, and the logistic function \(\Lambda(x)\), which gives the logit model.

The less familiar logistic function is \[ \Lambda(x)=\frac{e^x}{1+e^x} \]

The logit model is most easily derived by assuming that \[ \log \left( \frac{P_t}{1-P_t} \right)={\bf X}_t \beta, \] which says that the logarithm of the odds (the ratio of the two probabilities) is equal to \({\bf X}_t \beta\). Solving for \(P_t\) gives \[ P_t =\frac{\exp({\bf X}_t \beta)}{1+\exp({\bf X}_t \beta)}=\Lambda({\bf X}_t \beta). \]
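The two transformation functions can be compared numerically. The following is a minimal sketch using only the Python standard library; the function names `logistic_cdf`, `normal_cdf`, and `log_odds` are chosen here for illustration and do not come from the text. It also checks that taking the log odds of \(\Lambda(x)\) recovers the index \(x\).

```python
import math

def logistic_cdf(x):
    """Logistic function Lambda(x) = e^x / (1 + e^x), arranged to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    ex = math.exp(x)
    return ex / (1.0 + ex)

def normal_cdf(x):
    """Standard normal CDF Phi(x), computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def log_odds(p):
    """Logarithm of the odds p / (1 - p)."""
    return math.log(p / (1.0 - p))

# Illustrative values of the index X_t beta (assumed, not from the text).
for index in (-2.0, 0.0, 1.5):
    p_logit = logistic_cdf(index)
    p_probit = normal_cdf(index)
    # The last column should reproduce the index itself.
    print(index, p_logit, p_probit, log_odds(p_logit))
```

Both functions map any real index into the unit interval, which is exactly the property the linear specification lacks.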

7.1.2 MLE for binary data

The likelihood for an observation \(t\) is the probability that \(y_t=1\) if \(y_t=1\), or the probability that \(y_t=0\) if \(y_t=0\). The logarithm of the appropriate probability is then the contribution to the loglikelihood made by observation \(t\). Therefore, if \({\bf y}\) is an \(n\)-vector with typical element \(y_t\), the loglikelihood function for \({\bf y}\) can be written as \[ \ell({\bf y}, \beta)=\sum_{t=1}^n \Big( y_t \log F({\bf X}_t \beta)+(1-y_t) \log \big(1- F({\bf X}_t \beta)\big) \Big). \]

For the logit and probit models, this function is globally concave with respect to \(\beta\). This implies that the first-order conditions, or likelihood equations, uniquely define the ML estimator \(\hat \beta\), except in special cases where no finite maximum exists. Writing \(f(x)\) for the derivative of \(F(x)\), these likelihood equations are \[ \sum_{t=1}^n \frac{(y_t-F({\bf X}_t \beta))f({\bf X}_t \beta)x_{ti}}{F({\bf X}_t \beta)(1- F({\bf X}_t \beta))}=0, \ i=1, \dots, k. \]
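As a worked special case, consider the logit model. The symbol \(\lambda\) is notation introduced here for the derivative of \(\Lambda\). Differentiating \(\Lambda(x)=e^x/(1+e^x)\) gives \[ \lambda(x) \equiv \Lambda'(x) = \frac{e^x}{(1+e^x)^2} = \Lambda(x)(1-\Lambda(x)), \] so the derivative of the transformation function cancels against the denominator of each first-order condition, and the logit likelihood equations reduce to \[ \sum_{t=1}^n (y_t-\Lambda({\bf X}_t \beta))x_{ti}=0, \ i=1, \dots, k. \]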

Newton’s Method can be used to find \(\hat \beta\).
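A minimal sketch of Newton's method for the logit model follows, using only the Python standard library. It relies on the fact that, for the logit, the gradient of the loglikelihood has typical element \(\sum_t (y_t-\Lambda({\bf X}_t \beta))x_{ti}\) and the negative Hessian has typical element \(\sum_t \Lambda({\bf X}_t \beta)(1-\Lambda({\bf X}_t \beta))x_{ti}x_{tj}\). The function names, the tolerance, the starting value of zero, and the small data set are all assumptions made for illustration. There is no step-size control, so this is not a production routine.

```python
import math

def logistic_cdf(x):
    """Logistic function Lambda(x), arranged to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    ex = math.exp(x)
    return ex / (1.0 + ex)

def solve_linear(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    k = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, k):
            factor = M[r][col] / M[col][col]
            for c in range(col, k + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * k
    for i in reversed(range(k)):
        s = M[i][k] - sum(M[i][j] * z[j] for j in range(i + 1, k))
        z[i] = s / M[i][i]
    return z

def logit_gradient(beta, X, y):
    """Gradient of the logit loglikelihood: sum_t (y_t - Lambda_t) x_ti."""
    k = len(beta)
    p = [logistic_cdf(sum(x * b for x, b in zip(row, beta))) for row in X]
    return [sum((y_t - p_t) * row[i] for row, y_t, p_t in zip(X, y, p))
            for i in range(k)]

def logit_newton(X, y, tol=1e-10, max_iter=50):
    """Newton iterations beta <- beta + (-H)^{-1} g, starting from zero."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        p = [logistic_cdf(sum(x * b for x, b in zip(row, beta))) for row in X]
        grad = logit_gradient(beta, X, y)
        # Negative Hessian: sum_t Lambda_t (1 - Lambda_t) x_ti x_tj.
        neg_hess = [[sum(p_t * (1.0 - p_t) * row[i] * row[j]
                         for row, p_t in zip(X, p))
                     for j in range(k)] for i in range(k)]
        step = solve_linear(neg_hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta

# Hypothetical data: a constant and one regressor. The outcomes overlap,
# so the data are not perfectly separable and a finite maximum exists.
X = [[1.0, -2.0], [1.0, -1.0], [1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [0, 0, 1, 0, 1, 1]

beta_hat = logit_newton(X, y)
print("beta_hat:", beta_hat)
print("gradient at beta_hat:", logit_gradient(beta_hat, X, y))
```

Because the loglikelihood is globally concave, the negative Hessian is positive definite whenever the regressors are linearly independent, so each Newton step is well defined. The printed gradient should be numerically zero, which confirms that the likelihood equations hold at the reported estimate.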

7.2 Models for More than Two Discrete Responses

7.2.1 The Ordered Probit

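The ordered probit model is easily derived from a latent variable model: \[ y_t^0={\bf X}_t \beta+u_t, \ u_t \sim NID(0,1). \]

The latent variable \(y_t^0\) is not observed. Instead we observe \(y_t\), which here takes one of three values according to the thresholds \(\gamma_1 < \gamma_2\):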

\[ \begin{cases} y_t = 0 & \mbox{if} \ y_t^0 < \gamma_1; \\ y_t = 1 & \mbox{if} \ \gamma_1 \leq y_t^0 < \gamma_2; \\ y_t = 2 & \mbox{if} \ y_t^0 \geq \gamma_2. \end{cases} \]

Therefore, \[ \begin{aligned} \mbox{Pr}(y_t=0) &= \mbox{Pr}(y_t^0 < \gamma_1)=\mbox{Pr}({\bf X}_t \beta + u_t < \gamma_1) \\ &= \mbox{Pr}(u_t < \gamma_1 - {\bf X}_t \beta)=\Phi ( \gamma_1 - {\bf X}_t \beta) \end{aligned} \]

Similarly, \[ \mbox{Pr}(y_t=2) = 1 - \Phi ( \gamma_2 - {\bf X}_t \beta) \] \[ \mbox{Pr}(y_t=1) = \Phi ( \gamma_2- {\bf X}_t \beta ) - \Phi ( \gamma_1- {\bf X}_t \beta ) \]

These probabilities depend solely on the value of the index function and on the two threshold parameters.

The loglikelihood function is \[ \ell (\beta, \gamma_1, \gamma_2) = \sum_{y_t=0} \log(\Phi (\gamma_1 - {\bf X}_t \beta)) + \sum_{y_t=2} \log(\Phi ( {\bf X}_t \beta - \gamma_2)) + \sum_{y_t=1} \log(\Phi ( \gamma_2 - {\bf X}_t \beta ) -\Phi ( \gamma_1 - {\bf X}_t \beta )) \]
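The three probabilities and the loglikelihood can be evaluated directly. The sketch below uses only the Python standard library; the function names and the small data set are assumptions made for illustration. The regressors contain no constant term, since a constant could not be identified separately from the two thresholds.

```python
import math

def normal_cdf(x):
    """Standard normal CDF Phi(x), computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def ordered_probit_probs(index, gamma1, gamma2):
    """Return (Pr(y=0), Pr(y=1), Pr(y=2)) for a given index X_t beta.

    Requires gamma1 < gamma2 so that the middle probability is positive.
    """
    p0 = normal_cdf(gamma1 - index)
    p1 = normal_cdf(gamma2 - index) - normal_cdf(gamma1 - index)
    p2 = normal_cdf(index - gamma2)  # equals 1 - Phi(gamma2 - index)
    return p0, p1, p2

def ordered_probit_loglik(beta, gamma1, gamma2, X, y):
    """Sum over observations of the log probability of the observed outcome."""
    total = 0.0
    for row, y_t in zip(X, y):
        index = sum(x * b for x, b in zip(row, beta))
        total += math.log(ordered_probit_probs(index, gamma1, gamma2)[y_t])
    return total

# Hypothetical data and parameter values, for illustration only.
X = [[-1.0], [0.2], [1.5], [0.7]]
y = [0, 1, 2, 1]
beta, gamma1, gamma2 = [1.0], -0.5, 0.8

probs = ordered_probit_probs(0.3, gamma1, gamma2)
loglik = ordered_probit_loglik(beta, gamma1, gamma2, X, y)
print("probabilities at index 0.3:", probs, "sum:", sum(probs))
print("loglikelihood:", loglik)
```

In practice this function would be passed to a numerical optimizer to maximize over \(\beta\), \(\gamma_1\), and \(\gamma_2\) jointly, subject to \(\gamma_1 < \gamma_2\).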

7.2.2 The Multinomial Logit

When the responses are unordered, a popular choice is the multinomial logit model.

Suppose there are \(J+1\) responses, for \(J \geq 1\).

\[ \mbox{Pr}(y_t=l)=\frac{\exp({\bf W}_{tl} \beta^l)}{\sum_{j=0}^J\exp({\bf W}_{tj} \beta^j)} \quad \mbox{for} \ l=0, \dots, J. \]

Here \({\bf W}_{tj}\) is a row vector of length \(k_j\) of observations on variables that belong to the information set of interest, and \(\beta^j\) is a \(k_j\)-vector of parameters, usually different for each \(j=0, \dots, J.\)
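The multinomial logit probabilities can be computed as in the following sketch, which uses only the Python standard library. The function name and the numerical values are assumptions made for illustration. In the example every response uses the same regressors and \(\beta^0\) is set to zero, a common normalization in that case that the text above does not itself impose.

```python
import math

def multinomial_logit_probs(W_t, betas):
    """Return [Pr(y_t = 0), ..., Pr(y_t = J)].

    W_t[j] is the row vector W_tj and betas[j] is the parameter vector beta^j.
    """
    indices = [sum(w * b for w, b in zip(W_tj, beta_j))
               for W_tj, beta_j in zip(W_t, betas)]
    # Subtracting the largest index leaves every ratio unchanged
    # and guards against overflow in exp().
    shift = max(indices)
    weights = [math.exp(v - shift) for v in indices]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical example with J = 2, so three responses.
W_t = [[1.0, 0.5], [1.0, 0.5], [1.0, 0.5]]
betas = [[0.0, 0.0], [0.4, -1.0], [-0.2, 0.6]]

probs = multinomial_logit_probs(W_t, betas)
print("probabilities:", probs, "sum:", sum(probs))
```

With \(J=1\) and \(\beta^0=0\), the probability of response 1 is \(\Lambda({\bf W}_{t1} \beta^1)\), so the ordinary logit model is a special case.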