Chapter 17. Limited Dependent Variable Models and Sample Selection#


import statsmodels.api as sm
import statsmodels.formula.api as smf
from statsmodels.iolib.summary2 import summary_col

from wooldridge import dataWoo

Example 17.1. Married Women’s Labor Force Participation#

df = dataWoo('mroz')
print(smf.ols('inlf ~ nwifeinc + educ + exper + expersq + age + kidslt6 + kidsge6', data = df).fit().summary())
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                   inlf   R-squared:                       0.264
Model:                            OLS   Adj. R-squared:                  0.257
Method:                 Least Squares   F-statistic:                     38.22
Date:                Mon, 11 Dec 2023   Prob (F-statistic):           6.90e-46
Time:                        18:38:16   Log-Likelihood:                -423.89
No. Observations:                 753   AIC:                             863.8
Df Residuals:                     745   BIC:                             900.8
Df Model:                           7                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      0.5855      0.154      3.798      0.000       0.283       0.888
nwifeinc      -0.0034      0.001     -2.351      0.019      -0.006      -0.001
educ           0.0380      0.007      5.151      0.000       0.024       0.052
exper          0.0395      0.006      6.962      0.000       0.028       0.051
expersq       -0.0006      0.000     -3.227      0.001      -0.001      -0.000
age           -0.0161      0.002     -6.476      0.000      -0.021      -0.011
kidslt6       -0.2618      0.034     -7.814      0.000      -0.328      -0.196
kidsge6        0.0130      0.013      0.986      0.324      -0.013       0.039
==============================================================================
Omnibus:                      169.137   Durbin-Watson:                   0.494
Prob(Omnibus):                  0.000   Jarque-Bera (JB):               36.741
Skew:                          -0.196   Prob(JB):                     1.05e-08
Kurtosis:                       1.991   Cond. No.                     3.06e+03
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 3.06e+03. This might indicate that there are
strong multicollinearity or other numerical problems.
mLogit = sm.Logit.from_formula('inlf ~ nwifeinc + educ + exper + expersq + age + kidslt6 + kidsge6', data = df).fit()
print(mLogit.summary())
print(mLogit.get_margeff().summary())
Optimization terminated successfully.
         Current function value: 0.533553
         Iterations 6
                           Logit Regression Results                           
==============================================================================
Dep. Variable:                   inlf   No. Observations:                  753
Model:                          Logit   Df Residuals:                      745
Method:                           MLE   Df Model:                            7
Date:                Mon, 11 Dec 2023   Pseudo R-squ.:                  0.2197
Time:                        18:38:16   Log-Likelihood:                -401.77
converged:                       True   LL-Null:                       -514.87
Covariance Type:            nonrobust   LLR p-value:                 3.159e-45
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      0.4255      0.860      0.494      0.621      -1.261       2.112
nwifeinc      -0.0213      0.008     -2.535      0.011      -0.038      -0.005
educ           0.2212      0.043      5.091      0.000       0.136       0.306
exper          0.2059      0.032      6.422      0.000       0.143       0.269
expersq       -0.0032      0.001     -3.104      0.002      -0.005      -0.001
age           -0.0880      0.015     -6.040      0.000      -0.117      -0.059
kidslt6       -1.4434      0.204     -7.090      0.000      -1.842      -1.044
kidsge6        0.0601      0.075      0.804      0.422      -0.086       0.207
==============================================================================
        Logit Marginal Effects       
=====================================
Dep. Variable:                   inlf
Method:                          dydx
At:                           overall
==============================================================================
                dy/dx    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
nwifeinc      -0.0038      0.001     -2.571      0.010      -0.007      -0.001
educ           0.0395      0.007      5.414      0.000       0.025       0.054
exper          0.0368      0.005      7.139      0.000       0.027       0.047
expersq       -0.0006      0.000     -3.176      0.001      -0.001      -0.000
age           -0.0157      0.002     -6.603      0.000      -0.020      -0.011
kidslt6       -0.2578      0.032     -8.070      0.000      -0.320      -0.195
kidsge6        0.0107      0.013      0.805      0.421      -0.015       0.037
==============================================================================
mProbit = sm.Probit.from_formula('inlf ~ nwifeinc + educ + exper + expersq + age + kidslt6 + kidsge6', data = df).fit()
print(mProbit.summary())
print(mProbit.get_margeff().summary())
Optimization terminated successfully.
         Current function value: 0.532938
         Iterations 5
                          Probit Regression Results                           
==============================================================================
Dep. Variable:                   inlf   No. Observations:                  753
Model:                         Probit   Df Residuals:                      745
Method:                           MLE   Df Model:                            7
Date:                Mon, 11 Dec 2023   Pseudo R-squ.:                  0.2206
Time:                        18:38:16   Log-Likelihood:                -401.30
converged:                       True   LL-Null:                       -514.87
Covariance Type:            nonrobust   LLR p-value:                 2.009e-45
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      0.2701      0.509      0.531      0.595      -0.727       1.267
nwifeinc      -0.0120      0.005     -2.484      0.013      -0.022      -0.003
educ           0.1309      0.025      5.183      0.000       0.081       0.180
exper          0.1233      0.019      6.590      0.000       0.087       0.160
expersq       -0.0019      0.001     -3.145      0.002      -0.003      -0.001
age           -0.0529      0.008     -6.235      0.000      -0.069      -0.036
kidslt6       -0.8683      0.119     -7.326      0.000      -1.101      -0.636
kidsge6        0.0360      0.043      0.828      0.408      -0.049       0.121
==============================================================================
       Probit Marginal Effects       
=====================================
Dep. Variable:                   inlf
Method:                          dydx
At:                           overall
==============================================================================
                dy/dx    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
nwifeinc      -0.0036      0.001     -2.509      0.012      -0.006      -0.001
educ           0.0394      0.007      5.452      0.000       0.025       0.054
exper          0.0371      0.005      7.200      0.000       0.027       0.047
expersq       -0.0006      0.000     -3.205      0.001      -0.001      -0.000
age           -0.0159      0.002     -6.739      0.000      -0.021      -0.011
kidslt6       -0.2612      0.032     -8.197      0.000      -0.324      -0.199
kidsge6        0.0108      0.013      0.829      0.407      -0.015       0.036
==============================================================================
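summary_col was imported at the top of the chapter but not used; it is convenient for placing the LPM, logit, and probit estimates side by side, as the textbook's Table 17.1 does. A sketch on simulated binary data (with the mroz data you would pass the three results fitted above instead; the simulated names and coefficients are our own):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.iolib.summary2 import summary_col

rng = np.random.default_rng(1)
n = 800
sim = pd.DataFrame({'x1': rng.normal(size=n), 'x2': rng.normal(size=n)})
prob = 1 / (1 + np.exp(-(0.3 + 0.8 * sim['x1'] - 0.5 * sim['x2'])))
sim['y'] = (rng.uniform(size=n) < prob).astype(int)

# Fit the same specification three ways and tabulate the coefficients
lpm = smf.ols('y ~ x1 + x2', data=sim).fit()
logit = smf.logit('y ~ x1 + x2', data=sim).fit(disp=0)
probit = smf.probit('y ~ x1 + x2', data=sim).fit(disp=0)
table = summary_col([lpm, logit, probit],
                    model_names=['LPM', 'Logit', 'Probit'])
print(table)
```

As in the example above, the raw logit and probit coefficients differ in scale from the LPM, while their average partial effects (from get_margeff) line up closely with the LPM slopes.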

Example 17.2. Married Women’s Annual Labor Supply#

print(smf.ols('hours ~ nwifeinc + educ + exper + expersq + age + kidslt6 + kidsge6', data = df).fit().summary())
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                  hours   R-squared:                       0.266
Model:                            OLS   Adj. R-squared:                  0.259
Method:                 Least Squares   F-statistic:                     38.50
Date:                Mon, 11 Dec 2023   Prob (F-statistic):           3.42e-46
Time:                        18:38:16   Log-Likelihood:                -6049.5
No. Observations:                 753   AIC:                         1.212e+04
Df Residuals:                     745   BIC:                         1.215e+04
Df Model:                           7                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept   1330.4824    270.785      4.913      0.000     798.891    1862.074
nwifeinc      -3.4466      2.544     -1.355      0.176      -8.441       1.548
educ          28.7611     12.955      2.220      0.027       3.329      54.193
exper         65.6725      9.963      6.592      0.000      46.114      85.231
expersq       -0.7005      0.325     -2.158      0.031      -1.338      -0.063
age          -30.5116      4.364     -6.992      0.000     -39.079     -21.945
kidslt6     -442.0899     58.847     -7.513      0.000    -557.615    -326.565
kidsge6      -32.7792     23.176     -1.414      0.158     -78.278      12.719
==============================================================================
Omnibus:                       79.794   Durbin-Watson:                   1.371
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              112.876
Skew:                           0.779   Prob(JB):                     3.08e-25
Kurtosis:                       4.083   Cond. No.                     3.06e+03
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 3.06e+03. This might indicate that there are
strong multicollinearity or other numerical problems.

Tobit Model#

statsmodels has no built-in Tobit estimator, so the censored regression for hours (Example 17.2 in the text) cannot be run with a one-line call; the cell below only assembles the design matrix. Note that hours is never missing in mroz, and calling dropna() here would drop the nonparticipants, which are exactly the censored observations a Tobit needs, so the full sample is kept.

df = dataWoo("mroz")
X = df[['nwifeinc', 'educ', 'exper', 'expersq', 'age', 'kidslt6', 'kidsge6']]
X = sm.add_constant(X)
y = df[['hours']]

Example 17.3. Poisson Regression for Number of Arrests#

df = dataWoo("crime1")
print(smf.ols('narr86 ~ pcnv + avgsen + tottime + ptime86 + qemp86 + inc86 + black + hispan + born60', data=df).fit().summary())
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                 narr86   R-squared:                       0.072
Model:                            OLS   Adj. R-squared:                  0.069
Method:                 Least Squares   F-statistic:                     23.57
Date:                Mon, 11 Dec 2023   Prob (F-statistic):           3.72e-39
Time:                        18:38:16   Log-Likelihood:                -3349.7
No. Observations:                2725   AIC:                             6719.
Df Residuals:                    2715   BIC:                             6778.
Df Model:                           9                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      0.5766      0.038     15.215      0.000       0.502       0.651
pcnv          -0.1319      0.040     -3.264      0.001      -0.211      -0.053
avgsen        -0.0113      0.012     -0.926      0.355      -0.035       0.013
tottime        0.0121      0.009      1.279      0.201      -0.006       0.031
ptime86       -0.0409      0.009     -4.638      0.000      -0.058      -0.024
qemp86        -0.0513      0.014     -3.542      0.000      -0.080      -0.023
inc86         -0.0015      0.000     -4.261      0.000      -0.002      -0.001
black          0.3270      0.045      7.199      0.000       0.238       0.416
hispan         0.1938      0.040      4.880      0.000       0.116       0.272
born60        -0.0225      0.033     -0.675      0.500      -0.088       0.043
==============================================================================
Omnibus:                     2395.492   Durbin-Watson:                   1.840
Prob(Omnibus):                  0.000   Jarque-Bera (JB):           111865.117
Skew:                           3.982   Prob(JB):                         0.00
Kurtosis:                      33.362   Cond. No.                         291.
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
from statsmodels.genmod.generalized_estimating_equations import GEE
from statsmodels.genmod.cov_struct import Independence
from statsmodels.genmod.families import Poisson

print(GEE.from_formula('narr86 ~ pcnv + avgsen + tottime + ptime86 + qemp86 + inc86 + black + hispan + born60', 'black', data=df, cov_struct=Independence(), family=Poisson()).fit().summary())
                               GEE Regression Results                              
===================================================================================
Dep. Variable:                      narr86   No. Observations:                 2725
Model:                                 GEE   No. clusters:                        2
Method:                        Generalized   Min. cluster size:                 439
                      Estimating Equations   Max. cluster size:                2286
Family:                            Poisson   Mean cluster size:              1362.5
Dependence structure:         Independence   Num. iterations:                     2
Date:                     Mon, 11 Dec 2023   Scale:                           1.000
Covariance type:                    robust   Time:                         18:38:16
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     -0.5996      0.007    -81.445      0.000      -0.614      -0.585
pcnv          -0.4016      0.090     -4.457      0.000      -0.578      -0.225
avgsen        -0.0238      0.015     -1.584      0.113      -0.053       0.006
tottime        0.0245      0.005      4.621      0.000       0.014       0.035
ptime86       -0.0986      0.024     -4.184      0.000      -0.145      -0.052
qemp86        -0.0380      0.022     -1.757      0.079      -0.080       0.004
inc86         -0.0081      0.000    -25.035      0.000      -0.009      -0.007
black          0.6608      0.012     56.763      0.000       0.638       0.684
hispan         0.4998      0.009     57.224      0.000       0.483       0.517
born60        -0.0510      0.047     -1.090      0.276      -0.143       0.041
==============================================================================
Skew:                          3.9776   Kurtosis:                      30.5163
Centered skew:                 3.9776   Centered kurtosis:             30.5163
==============================================================================
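The GEE machinery above is one route to Poisson estimates with robust standard errors; a more direct alternative is smf.poisson with a heteroskedasticity-robust (sandwich) covariance, which is the Poisson QMLE correction discussed in the text. A sketch on simulated count data (names and coefficients are our own illustration; with crime1 you would use the narr86 formula above):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 1000
d = pd.DataFrame({'x': rng.normal(size=n)})
# Poisson counts with log-mean 0.2 + 0.5x
d['y'] = rng.poisson(np.exp(0.2 + 0.5 * d['x']))

# Poisson MLE with sandwich (HC0) standard errors, robust to
# misspecification of the Poisson variance assumption
pois = smf.poisson('y ~ x', data=d).fit(disp=0, cov_type='HC0')
print(pois.summary())
```

The coefficients are identical to the plain Poisson MLE; only the standard errors change.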

Example 17.5. Wage Offer Equation for Married Women#

df = dataWoo("mroz")
print(smf.ols('lwage ~ educ + exper + expersq', data=df).fit().summary())
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                  lwage   R-squared:                       0.157
Model:                            OLS   Adj. R-squared:                  0.151
Method:                 Least Squares   F-statistic:                     26.29
Date:                Mon, 11 Dec 2023   Prob (F-statistic):           1.30e-15
Time:                        18:38:16   Log-Likelihood:                -431.60
No. Observations:                 428   AIC:                             871.2
Df Residuals:                     424   BIC:                             887.4
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     -0.5220      0.199     -2.628      0.009      -0.912      -0.132
educ           0.1075      0.014      7.598      0.000       0.080       0.135
exper          0.0416      0.013      3.155      0.002       0.016       0.067
expersq       -0.0008      0.000     -2.063      0.040      -0.002   -3.82e-05
==============================================================================
Omnibus:                       77.792   Durbin-Watson:                   1.961
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              300.917
Skew:                          -0.753   Prob(JB):                     4.54e-66
Kurtosis:                       6.822   Cond. No.                     2.21e+03
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 2.21e+03. This might indicate that there are
strong multicollinearity or other numerical problems.