Chapter 5. Multiple Regression Analysis: OLS Asymptotics
import statsmodels.formula.api as smf
from statsmodels.iolib.summary2 import summary_col
from wooldridge import *
Example 5.2 Birth weight equation, standard errors
df = dataWoo('bwght')
half = df['cigs'].count() / 2  # number of observations in half of the sample
half
694.0
df2 = df.iloc[:int(half)]  # first half of the observations
bwght_ols_half = smf.ols(formula='lbwght ~ cigs + lfaminc + 1', data=df2).fit()
print(bwght_ols_half.summary())
OLS Regression Results
==============================================================================
Dep. Variable: lbwght R-squared: 0.030
Model: OLS Adj. R-squared: 0.027
Method: Least Squares F-statistic: 10.52
Date: Mon, 11 Dec 2023 Prob (F-statistic): 3.16e-05
Time: 18:36:37 Log-Likelihood: 147.30
No. Observations: 694 AIC: -288.6
Df Residuals: 691 BIC: -275.0
Df Model: 2
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
Intercept 4.7056 0.027 173.939 0.000 4.652 4.759
cigs -0.0046 0.001 -3.481 0.001 -0.007 -0.002
lfaminc 0.0194 0.008 2.370 0.018 0.003 0.035
==============================================================================
Omnibus: 384.000 Durbin-Watson: 1.859
Prob(Omnibus): 0.000 Jarque-Bera (JB): 5273.755
Skew: -2.170 Prob(JB): 0.00
Kurtosis: 15.788 Cond. No. 22.8
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
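The Jarque-Bera test above (p-value 0.00, skewness -2.17, kurtosis 15.8) strongly rejects normal residuals. The t statistics in this table are therefore justified by large-sample (asymptotic) normality, not by exact normal-error theory. The NumPy Monte Carlo sketch below illustrates that idea under an assumed, hypothetical data-generating process: one normal regressor and skewed, centered-exponential errors. It checks whether a nominal 5% two-sided t test of a true null rejects at roughly a 5% rate when n is moderately large. The helper names `ols_t_slope` and `rejection_rate` are illustrative only.

```python
import numpy as np


def ols_t_slope(y, x):
    """Return the usual (nonrobust) OLS t statistic for the slope on x."""
    n = y.shape[0]
    X = np.column_stack([np.ones(n), x])          # intercept + one regressor
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # OLS coefficients
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - X.shape[1])     # unbiased error-variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)         # classical OLS covariance matrix
    return beta[1] / np.sqrt(cov[1, 1])


def rejection_rate(n, reps=2000, crit=1.96, seed=0):
    """Share of replications with |t| > crit even though the true slope is 0."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(reps):
        x = rng.normal(size=n)
        u = rng.exponential(scale=1.0, size=n) - 1.0  # skewed errors with mean 0
        y = 1.0 + 0.0 * x + u                         # H0: slope = 0 is true
        if abs(ols_t_slope(y, x)) > crit:
            rejections += 1
    return rejections / reps


rate = rejection_rate(n=500)
print(f"empirical size at n=500: {rate:.3f}")  # expected to be near the nominal 0.05
```

Using 1.96 rather than an exact t critical value is deliberate. The point is that the standard normal approximation works even though the errors are clearly not normal.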
bwght_ols = smf.ols(formula='lbwght ~ cigs + lfaminc + 1', data=df).fit()
print(bwght_ols.summary())
OLS Regression Results
==============================================================================
Dep. Variable: lbwght R-squared: 0.026
Model: OLS Adj. R-squared: 0.024
Method: Least Squares F-statistic: 18.31
Date: Mon, 11 Dec 2023 Prob (F-statistic): 1.42e-08
Time: 18:36:37 Log-Likelihood: 349.39
No. Observations: 1388 AIC: -692.8
Df Residuals: 1385 BIC: -677.1
Df Model: 2
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
Intercept 4.7186 0.018 258.631 0.000 4.683 4.754
cigs -0.0041 0.001 -4.756 0.000 -0.006 -0.002
lfaminc 0.0163 0.006 2.913 0.004 0.005 0.027
==============================================================================
Omnibus: 610.862 Durbin-Watson: 1.927
Prob(Omnibus): 0.000 Jarque-Bera (JB): 5956.668
Skew: -1.786 Prob(JB): 0.00
Kurtosis: 12.499 Cond. No. 24.1
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
print(summary_col([bwght_ols_half, bwght_ols], stars=True, float_format='%0.3f',
                  model_names=['bwght_ols_half', 'bwght_ols'],
                  info_dict={'N': lambda x: "{0:d}".format(int(x.nobs)),
                             'R2': lambda x: "{:.3f}".format(x.rsquared)}))
=======================================
               bwght_ols_half bwght_ols
---------------------------------------
Intercept      4.706***       4.719***
               (0.027)        (0.018)
cigs           -0.005***      -0.004***
               (0.001)        (0.001)
lfaminc        0.019**        0.016***
               (0.008)        (0.006)
R-squared      0.030          0.026
R-squared Adj. 0.027          0.024
N              694            1388
R2             0.030          0.026
=======================================
Standard errors in parentheses.
* p<.1, ** p<.05, ***p<.01
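Doubling the sample from 694 to 1,388 observations should, by the asymptotic argument, shrink standard errors by roughly a factor of sqrt(694/1388) ≈ 0.707. The table rounds the standard errors on cigs to 0.001, so the sketch below backs them out of the printed coefficients and t statistics (se = coef / t). It then compares the observed ratio with that approximation. The inputs are rounded printed values, so the implied standard errors are approximate.

```python
import math

n_half, n_full = 694, 1388
# cigs coefficient and t statistic as printed in the two regression summaries
coef_half, t_half = -0.0046, -3.481
coef_full, t_full = -0.0041, -4.756

se_half = coef_half / t_half  # implied standard error, first half of the sample
se_full = coef_full / t_full  # implied standard error, full sample

predicted_ratio = math.sqrt(n_half / n_full)  # se shrinks roughly like 1/sqrt(n)
observed_ratio = se_full / se_half

print(f"se(cigs): half = {se_half:.5f}, full = {se_full:.5f}")
print(f"observed ratio = {observed_ratio:.3f}, sqrt(n_half/n_full) = {predicted_ratio:.3f}")
```

The observed ratio (about 0.65) is in the neighborhood of 0.71. The approximation also assumes that the error variance and the sample variation in cigs are similar in the two halves, which need not hold exactly.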
Example 5.3 Economic model of crime
df = dataWoo('crime1')
crime_ols = smf.ols(formula='narr86 ~ pcnv + avgsen + tottime + ptime86 + qemp86 + 1',
                    data=df).fit()
print(crime_ols.summary())
OLS Regression Results
==============================================================================
Dep. Variable: narr86 R-squared: 0.043
Model: OLS Adj. R-squared: 0.041
Method: Least Squares F-statistic: 24.29
Date: Mon, 11 Dec 2023 Prob (F-statistic): 5.43e-24
Time: 18:36:37 Log-Likelihood: -3392.7
No. Observations: 2725 AIC: 6797.
Df Residuals: 2719 BIC: 6833.
Df Model: 5
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
Intercept 0.7061 0.033 21.297 0.000 0.641 0.771
pcnv -0.1512 0.041 -3.701 0.000 -0.231 -0.071
avgsen -0.0070 0.012 -0.568 0.570 -0.031 0.017
tottime 0.0121 0.010 1.263 0.207 -0.007 0.031
ptime86 -0.0393 0.009 -4.403 0.000 -0.057 -0.022
qemp86 -0.1031 0.010 -9.915 0.000 -0.123 -0.083
==============================================================================
Omnibus: 2395.326 Durbin-Watson: 1.837
Prob(Omnibus): 0.000 Jarque-Bera (JB): 106869.684
Skew: 4.001 Prob(JB): 0.00
Kurtosis: 32.618 Cond. No. 16.3
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
crime_ols_r = smf.ols(formula='narr86 ~ pcnv + ptime86 + qemp86 + 1', data=df).fit()
df['resid'] = crime_ols_r.resid  # residuals of the restricted model (avgsen, tottime dropped)
print(crime_ols_r.summary())
OLS Regression Results
==============================================================================
Dep. Variable: narr86 R-squared: 0.041
Model: OLS Adj. R-squared: 0.040
Method: Least Squares F-statistic: 39.10
Date: Mon, 11 Dec 2023 Prob (F-statistic): 9.91e-25
Time: 18:36:37 Log-Likelihood: -3394.7
No. Observations: 2725 AIC: 6797.
Df Residuals: 2721 BIC: 6821.
Df Model: 3
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
Intercept 0.7118 0.033 21.565 0.000 0.647 0.776
pcnv -0.1499 0.041 -3.669 0.000 -0.230 -0.070
ptime86 -0.0344 0.009 -4.007 0.000 -0.051 -0.018
qemp86 -0.1041 0.010 -10.023 0.000 -0.124 -0.084
==============================================================================
Omnibus: 2394.860 Durbin-Watson: 1.836
Prob(Omnibus): 0.000 Jarque-Bera (JB): 106169.153
Skew: 4.002 Prob(JB): 0.00
Kurtosis: 32.513 Cond. No. 8.27
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
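# LM test of H0: avgsen = 0, tottime = 0 using the statsmodels helper:
# print(" (LM, p, df) = ", crime_ols.compare_lm_test(crime_ols_r))
# Alternatively, regress the restricted-model residuals on all regressors
# and compute LM = n * R-squared from this auxiliary regression:
crime_resid = smf.ols(formula='resid ~ pcnv + avgsen + tottime + ptime86 + qemp86 + 1',
                      data=df).fit()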
print(crime_resid.summary())
OLS Regression Results
==============================================================================
Dep. Variable: resid R-squared: 0.001
Model: OLS Adj. R-squared: -0.000
Method: Least Squares F-statistic: 0.8136
Date: Mon, 11 Dec 2023 Prob (F-statistic): 0.540
Time: 18:36:38 Log-Likelihood: -3392.7
No. Observations: 2725 AIC: 6797.
Df Residuals: 2719 BIC: 6833.
Df Model: 5
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
Intercept -0.0057 0.033 -0.172 0.863 -0.071 0.059
pcnv -0.0013 0.041 -0.032 0.975 -0.081 0.079
avgsen -0.0070 0.012 -0.568 0.570 -0.031 0.017
tottime 0.0121 0.010 1.263 0.207 -0.007 0.031
ptime86 -0.0048 0.009 -0.543 0.587 -0.022 0.013
qemp86 0.0010 0.010 0.098 0.922 -0.019 0.021
==============================================================================
Omnibus: 2395.326 Durbin-Watson: 1.837
Prob(Omnibus): 0.000 Jarque-Bera (JB): 106869.684
Skew: 4.001 Prob(JB): 0.00
Kurtosis: 32.618 Cond. No. 16.3
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
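The auxiliary-regression route follows three steps. First, estimate the restricted model. Second, regress its residuals on all regressors. Third, form LM = n·R². These steps can be sketched with NumPy alone. The function names `r_squared` and `lm_test` and the simulated data are hypothetical, for illustration only. The p-value uses the closed form exp(-LM/2), which is the upper tail of a chi-square distribution with exactly q = 2 degrees of freedom. That matches the two restrictions in this example (avgsen and tottime) but does not hold for other q.

```python
import numpy as np


def r_squared(y, X):
    """Centered R-squared from an OLS regression of y on X (X includes a constant)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)


def lm_test(y, X_restricted, X_full):
    """n*R-squared LM statistic; the p-value formula assumes q = 2 restrictions."""
    n = y.shape[0]
    beta_r, *_ = np.linalg.lstsq(X_restricted, y, rcond=None)
    u_tilde = y - X_restricted @ beta_r   # residuals from the restricted model
    lm = n * r_squared(u_tilde, X_full)   # auxiliary regression on all regressors
    p_value = float(np.exp(-lm / 2.0))    # chi-square(2) upper tail
    return lm, p_value, u_tilde


rng = np.random.default_rng(1)
n = 2000
x1, x2, x3 = rng.normal(size=(3, n))
const = np.ones(n)
X_r = np.column_stack([const, x1])          # restricted: drops x2 and x3
X_f = np.column_stack([const, x1, x2, x3])  # unrestricted: all regressors

y_null = 1.0 + 0.5 * x1 + rng.normal(size=n)            # H0 true: x2, x3 irrelevant
y_alt = 1.0 + 0.5 * x1 + 0.3 * x2 + rng.normal(size=n)  # H0 false: x2 matters

lm0, p0, u0 = lm_test(y_null, X_r, X_f)
lm1, p1, _ = lm_test(y_alt, X_r, X_f)
print(f"H0 true:  LM = {lm0:.2f}, p = {p0:.3f}")
print(f"H0 false: LM = {lm1:.2f}, p = {p1:.2e}")
```

The restricted model includes a constant, so its residuals have mean zero. The centered and uncentered R-squared of the auxiliary regression therefore coincide.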
LM = 2725 * 0.0015  # LM = n * R-squared of the auxiliary regression (R-squared ≈ 0.0015)
LM
4.0875
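To finish the test, LM is compared with a chi-square distribution with q = 2 degrees of freedom, one for each exclusion restriction (avgsen and tottime). For two degrees of freedom the upper-tail probability is exp(-x/2), so both the p-value and the 10% critical value can be computed with the standard library. The sketch also backs R-squared out of the auxiliary regression's printed F statistic (0.8136 with 5 and 2719 degrees of freedom), which is slightly more precise than the rounded 0.0015. Because the inputs are rounded printed values, the results are approximate.

```python
import math

n, k, df_resid = 2725, 5, 2719  # from the auxiliary regression output
f_stat = 0.8136                 # its overall F statistic
q = 2                           # restrictions: avgsen = 0 and tottime = 0
assert q == 2, "the closed forms below hold only for two degrees of freedom"

# R^2 / (1 - R^2) = F * k / df_resid  =>  R^2 = F*k / (df_resid + F*k)
r2_aux = f_stat * k / (df_resid + f_stat * k)
lm = n * r2_aux

p_value = math.exp(-lm / 2)    # chi-square(2) upper tail: exp(-x/2)
crit_10 = -2 * math.log(0.10)  # 10% critical value of chi-square(2)

print(f"R2 = {r2_aux:.5f}, LM = {lm:.3f}, p-value = {p_value:.3f}, "
      f"10% critical value = {crit_10:.2f}")
```

Either way, LM ≈ 4.1 is below the 10% critical value of about 4.61, and the p-value is about 0.13. We fail to reject that avgsen and tottime are jointly insignificant at the 10% level.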