Chapter 17. Limited Dependent Variable Models and Sample Selection
import statsmodels
import statsmodels.api as sm
import statsmodels.formula.api as smf
from statsmodels.iolib.summary2 import summary_col
from wooldridge import dataWoo  # loader for the textbook data sets
Example 17.1. Married Women’s Labor Force Participation
df = dataWoo('mroz')
print(smf.ols('inlf ~ nwifeinc + educ + exper + expersq + age + kidslt6 + kidsge6', data = df).fit().summary())
OLS Regression Results
Dep. Variable: inlf R-squared: 0.264
Model: OLS Adj. R-squared: 0.257
Method: Least Squares F-statistic: 38.22
Date: Mon, 11 Dec 2023 Prob (F-statistic): 6.90e-46
Time: 18:38:16 Log-Likelihood: -423.89
No. Observations: 753 AIC: 863.8
Df Residuals: 745 BIC: 900.8
Df Model: 7
Covariance Type: nonrobust
coef std err t P>|t| [0.025 0.975]
Intercept 0.5855 0.154 3.798 0.000 0.283 0.888
nwifeinc -0.0034 0.001 -2.351 0.019 -0.006 -0.001
educ 0.0380 0.007 5.151 0.000 0.024 0.052
exper 0.0395 0.006 6.962 0.000 0.028 0.051
expersq -0.0006 0.000 -3.227 0.001 -0.001 -0.000
age -0.0161 0.002 -6.476 0.000 -0.021 -0.011
kidslt6 -0.2618 0.034 -7.814 0.000 -0.328 -0.196
kidsge6 0.0130 0.013 0.986 0.324 -0.013 0.039
Omnibus: 169.137 Durbin-Watson: 0.494
Prob(Omnibus): 0.000 Jarque-Bera (JB): 36.741
Skew: -0.196 Prob(JB): 1.05e-08
Kurtosis: 1.991 Cond. No. 3.06e+03
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 3.06e+03. This might indicate that there are
strong multicollinearity or other numerical problems.
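mLogit = sm.Logit.from_formula('inlf ~ nwifeinc + educ + exper + expersq + age + kidslt6 + kidsge6', data = df).fit()
print(mLogit.summary())
print(mLogit.get_margeff().summary())  # average marginal effects (At: overall)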
Optimization terminated successfully.
Current function value: 0.533553
Iterations 6
Logit Regression Results
Dep. Variable: inlf No. Observations: 753
Model: Logit Df Residuals: 745
Method: MLE Df Model: 7
Date: Mon, 11 Dec 2023 Pseudo R-squ.: 0.2197
Time: 18:38:16 Log-Likelihood: -401.77
converged: True LL-Null: -514.87
Covariance Type: nonrobust LLR p-value: 3.159e-45
coef std err z P>|z| [0.025 0.975]
Intercept 0.4255 0.860 0.494 0.621 -1.261 2.112
nwifeinc -0.0213 0.008 -2.535 0.011 -0.038 -0.005
educ 0.2212 0.043 5.091 0.000 0.136 0.306
exper 0.2059 0.032 6.422 0.000 0.143 0.269
expersq -0.0032 0.001 -3.104 0.002 -0.005 -0.001
age -0.0880 0.015 -6.040 0.000 -0.117 -0.059
kidslt6 -1.4434 0.204 -7.090 0.000 -1.842 -1.044
kidsge6 0.0601 0.075 0.804 0.422 -0.086 0.207
Logit Marginal Effects
Dep. Variable: inlf
Method: dydx
At: overall
dy/dx std err z P>|z| [0.025 0.975]
nwifeinc -0.0038 0.001 -2.571 0.010 -0.007 -0.001
educ 0.0395 0.007 5.414 0.000 0.025 0.054
exper 0.0368 0.005 7.139 0.000 0.027 0.047
expersq -0.0006 0.000 -3.176 0.001 -0.001 -0.000
age -0.0157 0.002 -6.603 0.000 -0.020 -0.011
kidslt6 -0.2578 0.032 -8.070 0.000 -0.320 -0.195
kidsge6 0.0107 0.013 0.805 0.421 -0.015 0.037
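mProbit = sm.Probit.from_formula('inlf ~ nwifeinc + educ + exper + expersq + age + kidslt6 + kidsge6', data = df).fit()
print(mProbit.summary())
print(mProbit.get_margeff().summary())  # average marginal effects (At: overall)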
Optimization terminated successfully.
Current function value: 0.532938
Iterations 5
Probit Regression Results
Dep. Variable: inlf No. Observations: 753
Model: Probit Df Residuals: 745
Method: MLE Df Model: 7
Date: Mon, 11 Dec 2023 Pseudo R-squ.: 0.2206
Time: 18:38:16 Log-Likelihood: -401.30
converged: True LL-Null: -514.87
Covariance Type: nonrobust LLR p-value: 2.009e-45
coef std err z P>|z| [0.025 0.975]
Intercept 0.2701 0.509 0.531 0.595 -0.727 1.267
nwifeinc -0.0120 0.005 -2.484 0.013 -0.022 -0.003
educ 0.1309 0.025 5.183 0.000 0.081 0.180
exper 0.1233 0.019 6.590 0.000 0.087 0.160
expersq -0.0019 0.001 -3.145 0.002 -0.003 -0.001
age -0.0529 0.008 -6.235 0.000 -0.069 -0.036
kidslt6 -0.8683 0.119 -7.326 0.000 -1.101 -0.636
kidsge6 0.0360 0.043 0.828 0.408 -0.049 0.121
Probit Marginal Effects
Dep. Variable: inlf
Method: dydx
At: overall
dy/dx std err z P>|z| [0.025 0.975]
nwifeinc -0.0036 0.001 -2.509 0.012 -0.006 -0.001
educ 0.0394 0.007 5.452 0.000 0.025 0.054
exper 0.0371 0.005 7.200 0.000 0.027 0.047
expersq -0.0006 0.000 -3.205 0.001 -0.001 -0.000
age -0.0159 0.002 -6.739 0.000 -0.021 -0.011
kidslt6 -0.2612 0.032 -8.197 0.000 -0.324 -0.199
kidsge6 0.0108 0.013 0.829 0.407 -0.015 0.036
Example 17.2. Married Women’s Annual Labor Supply
print(smf.ols('hours ~ nwifeinc + educ + exper + expersq + age + kidslt6 + kidsge6', data = df).fit().summary())
OLS Regression Results
Dep. Variable: hours R-squared: 0.266
Model: OLS Adj. R-squared: 0.259
Method: Least Squares F-statistic: 38.50
Date: Mon, 11 Dec 2023 Prob (F-statistic): 3.42e-46
Time: 18:38:16 Log-Likelihood: -6049.5
No. Observations: 753 AIC: 1.212e+04
Df Residuals: 745 BIC: 1.215e+04
Df Model: 7
Covariance Type: nonrobust
coef std err t P>|t| [0.025 0.975]
Intercept 1330.4824 270.785 4.913 0.000 798.891 1862.074
nwifeinc -3.4466 2.544 -1.355 0.176 -8.441 1.548
educ 28.7611 12.955 2.220 0.027 3.329 54.193
exper 65.6725 9.963 6.592 0.000 46.114 85.231
expersq -0.7005 0.325 -2.158 0.031 -1.338 -0.063
age -30.5116 4.364 -6.992 0.000 -39.079 -21.945
kidslt6 -442.0899 58.847 -7.513 0.000 -557.615 -326.565
kidsge6 -32.7792 23.176 -1.414 0.158 -78.278 12.719
Omnibus: 79.794 Durbin-Watson: 1.371
Prob(Omnibus): 0.000 Jarque-Bera (JB): 112.876
Skew: 0.779 Prob(JB): 3.08e-25
Kurtosis: 4.083 Cond. No. 3.06e+03
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 3.06e+03. This might indicate that there are
strong multicollinearity or other numerical problems.
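Tobit model
# Keep every row: the women with hours == 0 are the censored observations a Tobit needs.
# A blanket dropna() would discard them, because wage and lwage are missing for nonworkers.
df = dataWoo("mroz")
X = df[['nwifeinc', 'educ', 'exper', 'expersq', 'age', 'kidslt6', 'kidsge6']]
X = sm.add_constant(X)  # adds the intercept column
y = df['hours']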
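statsmodels does not appear to ship a built-in Tobit estimator, which is probably why this section stops after building `X` and `y`. The sketch below is one possible way to fill the gap, not the author's method: it maximizes the standard Tobit log-likelihood (left-censoring at zero) with `scipy.optimize.minimize`. The helper names `tobit_negloglik` and `fit_tobit` are hypothetical, and the demonstration runs on simulated data so that it executes without the wooldridge package.

```python
import numpy as np
from scipy import optimize, stats

def tobit_negloglik(params, y, X):
    """Negative log-likelihood of a Tobit model left-censored at zero."""
    beta, log_sigma = params[:-1], params[-1]
    sigma = np.exp(log_sigma)  # optimizing log(sigma) keeps sigma positive
    xb = X @ beta
    ll = np.where(
        y <= 0,
        stats.norm.logcdf(-xb / sigma),                    # censored: P(y* <= 0)
        stats.norm.logpdf((y - xb) / sigma) - log_sigma,   # uncensored: normal density
    )
    return -ll.sum()

def fit_tobit(y, X):
    """Hypothetical helper: returns (beta_hat, sigma_hat, optimizer result)."""
    y = np.asarray(y, dtype=float).ravel()
    X = np.asarray(X, dtype=float)
    beta0 = np.linalg.lstsq(X, y, rcond=None)[0]   # OLS starting values
    start = np.append(beta0, np.log((y - X @ beta0).std()))
    res = optimize.minimize(tobit_negloglik, start, args=(y, X), method="BFGS")
    return res.x[:-1], np.exp(res.x[-1]), res

# Simulated data (an assumption for illustration): y* = 1 + 2x + e, y = max(0, y*)
rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)
X_demo = np.column_stack([np.ones(n), x])
y_demo = np.maximum(0.0, 1.0 + 2.0 * x + rng.normal(size=n))

beta_hat, sigma_hat, result = fit_tobit(y_demo, X_demo)
print(beta_hat, sigma_hat)
```

With the `X` and `y` built above, `fit_tobit(y, X)` would be the analogous call, provided the zero-hours observations are kept in the sample. Because `hours` is measured in the thousands, dividing `y` by 1000 first may help the optimizer converge.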
Example 17.3. Poisson Regression for Number of Arrests
df = dataWoo("crime1")
print(smf.ols('narr86 ~ pcnv + avgsen + tottime + ptime86 + qemp86 + inc86 + black + hispan + born60', data=df).fit().summary())
OLS Regression Results
Dep. Variable: narr86 R-squared: 0.072
Model: OLS Adj. R-squared: 0.069
Method: Least Squares F-statistic: 23.57
Date: Mon, 11 Dec 2023 Prob (F-statistic): 3.72e-39
Time: 18:38:16 Log-Likelihood: -3349.7
No. Observations: 2725 AIC: 6719.
Df Residuals: 2715 BIC: 6778.
Df Model: 9
Covariance Type: nonrobust
coef std err t P>|t| [0.025 0.975]
Intercept 0.5766 0.038 15.215 0.000 0.502 0.651
pcnv -0.1319 0.040 -3.264 0.001 -0.211 -0.053
avgsen -0.0113 0.012 -0.926 0.355 -0.035 0.013
tottime 0.0121 0.009 1.279 0.201 -0.006 0.031
ptime86 -0.0409 0.009 -4.638 0.000 -0.058 -0.024
qemp86 -0.0513 0.014 -3.542 0.000 -0.080 -0.023
inc86 -0.0015 0.000 -4.261 0.000 -0.002 -0.001
black 0.3270 0.045 7.199 0.000 0.238 0.416
hispan 0.1938 0.040 4.880 0.000 0.116 0.272
born60 -0.0225 0.033 -0.675 0.500 -0.088 0.043
Omnibus: 2395.492 Durbin-Watson: 1.840
Prob(Omnibus): 0.000 Jarque-Bera (JB): 111865.117
Skew: 3.982 Prob(JB): 0.00
Kurtosis: 33.362 Cond. No. 291.
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
from statsmodels.genmod.generalized_estimating_equations import GEE
from statsmodels.genmod.cov_struct import Independence
from statsmodels.genmod.families import Poisson
# GEE with an independence working correlation and a Poisson family.
# groups='black' defines only two clusters, which drives the robust standard errors below.
print(GEE.from_formula('narr86 ~ pcnv + avgsen + tottime + ptime86 + qemp86 + inc86 + black + hispan + born60', groups='black', data=df, cov_struct=Independence(), family=Poisson()).fit().summary())
GEE Regression Results
Dep. Variable: narr86 No. Observations: 2725
Model: GEE No. clusters: 2
Method: Generalized Min. cluster size: 439
Estimating Equations Max. cluster size: 2286
Family: Poisson Mean cluster size: 1362.5
Dependence structure: Independence Num. iterations: 2
Date: Mon, 11 Dec 2023 Scale: 1.000
Covariance type: robust Time: 18:38:16
coef std err z P>|z| [0.025 0.975]
Intercept -0.5996 0.007 -81.445 0.000 -0.614 -0.585
pcnv -0.4016 0.090 -4.457 0.000 -0.578 -0.225
avgsen -0.0238 0.015 -1.584 0.113 -0.053 0.006
tottime 0.0245 0.005 4.621 0.000 0.014 0.035
ptime86 -0.0986 0.024 -4.184 0.000 -0.145 -0.052
qemp86 -0.0380 0.022 -1.757 0.079 -0.080 0.004
inc86 -0.0081 0.000 -25.035 0.000 -0.009 -0.007
black 0.6608 0.012 56.763 0.000 0.638 0.684
hispan 0.4998 0.009 57.224 0.000 0.483 0.517
born60 -0.0510 0.047 -1.090 0.276 -0.143 0.041
Skew: 3.9776 Kurtosis: 30.5163
Centered skew: 3.9776 Centered kurtosis: 30.5163
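The GEE table above reports "No. clusters: 2", so its robust standard errors rest on just two groups and should be read with caution. A more direct route to a Poisson regression is `smf.poisson` with a heteroskedasticity-robust covariance, sketched here as an alternative rather than as the method used above. The data frame `demo` and its coefficients are simulated assumptions so the block runs on its own.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated count data (an assumption for illustration): E[y | x] = exp(0.5 - 0.4 x)
rng = np.random.default_rng(0)
n = 5000
demo = pd.DataFrame({"x": rng.normal(size=n)})
demo["y"] = rng.poisson(np.exp(0.5 - 0.4 * demo["x"].to_numpy()))

# Poisson (quasi-)MLE with robust (sandwich) standard errors
pois = smf.poisson("y ~ x", data=demo).fit(cov_type="HC0", disp=0)
print(pois.params)
print(pois.bse)

# Approximate percentage change in the expected count per unit change in x
pct_change = 100 * (np.exp(pois.params["x"]) - 1)
print(round(pct_change, 1))
```

On the `crime1` frame loaded above, the same call with the `narr86 ~ pcnv + ... + born60` formula should reproduce the coefficient column of the GEE table, since an independence working correlation leads to the same estimating equations as Poisson maximum likelihood. Only the standard errors would be expected to differ.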
Example 17.5. Wage Offer Equation for Married Women
df = dataWoo("mroz")
print(smf.ols('lwage ~ educ + exper + expersq', data=df).fit().summary())
OLS Regression Results
Dep. Variable: lwage R-squared: 0.157
Model: OLS Adj. R-squared: 0.151
Method: Least Squares F-statistic: 26.29
Date: Mon, 11 Dec 2023 Prob (F-statistic): 1.30e-15
Time: 18:38:16 Log-Likelihood: -431.60
No. Observations: 428 AIC: 871.2
Df Residuals: 424 BIC: 887.4
Df Model: 3
Covariance Type: nonrobust
coef std err t P>|t| [0.025 0.975]
Intercept -0.5220 0.199 -2.628 0.009 -0.912 -0.132
educ 0.1075 0.014 7.598 0.000 0.080 0.135
exper 0.0416 0.013 3.155 0.002 0.016 0.067
expersq -0.0008 0.000 -2.063 0.040 -0.002 -3.82e-05
Omnibus: 77.792 Durbin-Watson: 1.961
Prob(Omnibus): 0.000 Jarque-Bera (JB): 300.917
Skew: -0.753 Prob(JB): 4.54e-66
Kurtosis: 6.822 Cond. No. 2.21e+03
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 2.21e+03. This might indicate that there are
strong multicollinearity or other numerical problems.
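The OLS fit above uses only the 428 women with an observed wage. A sample selection correction for the same equation can be sketched with the two-step (Heckit) procedure: a probit for selection on the full sample, then OLS on the selected sample with the inverse Mills ratio added as a regressor. The helper `heckit_two_step` is hypothetical, the data are simulated so the block is self-contained, and the second-step standard errors are the plain OLS ones, which do not account for the estimated first stage.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from scipy.stats import norm

def heckit_two_step(data, selection_formula, outcome_formula):
    """Hypothetical helper. Assumes no missing values in the selection variables."""
    # Step 1: probit for the selection indicator, using every observation
    probit = smf.probit(selection_formula, data=data).fit(disp=0)
    index = np.asarray(probit.model.exog) @ np.asarray(probit.params)
    # Inverse Mills ratio evaluated at the estimated probit index
    data = data.assign(imr=norm.pdf(index) / norm.cdf(index))
    # Step 2: OLS on the selected sample, with the inverse Mills ratio added
    selected = data[np.asarray(probit.model.endog) == 1]
    ols = smf.ols(outcome_formula + " + imr", data=selected).fit()
    return probit, ols

# Simulated data (an assumption for illustration): outcome and selection errors
# are correlated, and z enters only the selection equation.
rng = np.random.default_rng(0)
n = 5000
x, z, u = rng.normal(size=n), rng.normal(size=n), rng.normal(size=n)
e = 0.5 * u + np.sqrt(0.75) * rng.normal(size=n)
s = (0.5 + x + z + u > 0).astype(int)
y = np.where(s == 1, 1.0 + 2.0 * x + e, np.nan)  # outcome observed only if s == 1
demo = pd.DataFrame({"y": y, "x": x, "z": z, "s": s})

probit_fit, heckit_fit = heckit_two_step(demo, "s ~ x + z", "y ~ x")
print(heckit_fit.params)
```

For the `mroz` data, the analogous call would presumably use `'inlf ~ nwifeinc + educ + exper + expersq + age + kidslt6 + kidsge6'` as the selection formula and `'lwage ~ educ + exper + expersq'` as the outcome formula. A small and insignificant coefficient on `imr` would suggest little evidence of selection bias in the OLS estimates.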