Python for Introductory Econometrics

Chapter 5. Multiple Regression Analysis: OLS Asymptotics

Example 5.2 Birth weight equation, Standard Errors

https://www.solomonegash.com/

In [1]:
import numpy as np
import pandas as pd
import scipy as sp

import statsmodels
import statsmodels.api as sm
import statsmodels.formula.api as smf
from statsmodels.iolib.summary2 import summary_col

from wooldridge import dataWoo  # loader for the Wooldridge textbook data sets
In [2]:
df = dataWoo('bwght')
half = df['cigs'].count() / 2  # half of the full sample size
half
Out[2]:
694.0
In [3]:
df2 = df[:694]  # first half of the sample (694 observations)
bwght_ols_half = smf.ols(formula='lbwght ~ cigs + lfaminc + 1', data=df2).fit()
print(bwght_ols_half.summary())
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                 lbwght   R-squared:                       0.030
Model:                            OLS   Adj. R-squared:                  0.027
Method:                 Least Squares   F-statistic:                     10.52
Date:                Thu, 09 Apr 2020   Prob (F-statistic):           3.16e-05
Time:                        19:38:33   Log-Likelihood:                 147.30
No. Observations:                 694   AIC:                            -288.6
Df Residuals:                     691   BIC:                            -275.0
Df Model:                           2                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      4.7056      0.027    173.939      0.000       4.652       4.759
cigs          -0.0046      0.001     -3.481      0.001      -0.007      -0.002
lfaminc        0.0194      0.008      2.370      0.018       0.003       0.035
==============================================================================
Omnibus:                      384.000   Durbin-Watson:                   1.859
Prob(Omnibus):                  0.000   Jarque-Bera (JB):             5273.755
Skew:                          -2.170   Prob(JB):                         0.00
Kurtosis:                      15.788   Cond. No.                         22.8
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
In [4]:
bwght_ols = smf.ols(formula='lbwght  ~ cigs  + lfaminc + 1', data=df).fit()
print(bwght_ols.summary())
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                 lbwght   R-squared:                       0.026
Model:                            OLS   Adj. R-squared:                  0.024
Method:                 Least Squares   F-statistic:                     18.31
Date:                Thu, 09 Apr 2020   Prob (F-statistic):           1.42e-08
Time:                        19:38:33   Log-Likelihood:                 349.39
No. Observations:                1388   AIC:                            -692.8
Df Residuals:                    1385   BIC:                            -677.1
Df Model:                           2                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      4.7186      0.018    258.631      0.000       4.683       4.754
cigs          -0.0041      0.001     -4.756      0.000      -0.006      -0.002
lfaminc        0.0163      0.006      2.913      0.004       0.005       0.027
==============================================================================
Omnibus:                      610.862   Durbin-Watson:                   1.927
Prob(Omnibus):                  0.000   Jarque-Bera (JB):             5956.668
Skew:                          -1.786   Prob(JB):                         0.00
Kurtosis:                      12.499   Cond. No.                         24.1
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
In [5]:
print(summary_col([bwght_ols_half, bwght_ols], stars=True,float_format='%0.3f',
                  model_names=['bwght_ols_half','bwght_ols'],
                  info_dict={'N':lambda x: "{0:d}".format(int(x.nobs)),
                             'R2':lambda x: "{:.3f}".format(x.rsquared)}))
==================================
          bwght_ols_half bwght_ols
----------------------------------
Intercept 4.706***       4.719*** 
          (0.027)        (0.018)  
cigs      -0.005***      -0.004***
          (0.001)        (0.001)  
lfaminc   0.019**        0.016*** 
          (0.008)        (0.006)  
N         694            1388     
R2        0.030          0.026    
==================================
Standard errors in parentheses.
* p<.1, ** p<.05, ***p<.01
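
The table above shows the standard errors shrinking as the sample doubles. Asymptotic theory suggests standard errors fall at roughly the rate 1/sqrt(n), so the full-sample standard errors should be about sqrt(694/1388) times the half-sample ones. The sketch below checks this using only the rounded standard errors printed in the table, so the observed ratios are approximate. The dictionary names are illustrative, and cigs is left out because both of its standard errors round to 0.001.

```python
import math

# Standard errors as printed (rounded to 3 decimals) in the table above.
se_half = {"Intercept": 0.027, "lfaminc": 0.008}   # n = 694
se_full = {"Intercept": 0.018, "lfaminc": 0.006}   # n = 1388

# Ratio implied by the 1/sqrt(n) rate: se(full) / se(half) ~ sqrt(694 / 1388).
expected_ratio = math.sqrt(694 / 1388)

# Observed full-sample / half-sample ratios (approximate because of rounding).
observed_ratio = {name: se_full[name] / se_half[name] for name in se_half}

print(f"expected: {expected_ratio:.3f}")
for name, ratio in observed_ratio.items():
    print(f"{name}: {ratio:.3f}")
```

With the fitted models in memory, the unrounded values in bwght_ols.bse and bwght_ols_half.bse would give a sharper comparison.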

Example 5.3 Economic model of crime

In [6]:
df = dataWoo('crime1')
crime_ols = smf.ols(formula='narr86  ~ pcnv  + avgsen + tottime + ptime86 + qemp86 + 1', data=df).fit()
print(crime_ols.summary())
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                 narr86   R-squared:                       0.043
Model:                            OLS   Adj. R-squared:                  0.041
Method:                 Least Squares   F-statistic:                     24.29
Date:                Thu, 09 Apr 2020   Prob (F-statistic):           5.43e-24
Time:                        19:38:33   Log-Likelihood:                -3392.7
No. Observations:                2725   AIC:                             6797.
Df Residuals:                    2719   BIC:                             6833.
Df Model:                           5                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      0.7061      0.033     21.297      0.000       0.641       0.771
pcnv          -0.1512      0.041     -3.701      0.000      -0.231      -0.071
avgsen        -0.0070      0.012     -0.568      0.570      -0.031       0.017
tottime        0.0121      0.010      1.263      0.207      -0.007       0.031
ptime86       -0.0393      0.009     -4.403      0.000      -0.057      -0.022
qemp86        -0.1031      0.010     -9.915      0.000      -0.123      -0.083
==============================================================================
Omnibus:                     2395.326   Durbin-Watson:                   1.837
Prob(Omnibus):                  0.000   Jarque-Bera (JB):           106869.684
Skew:                           4.001   Prob(JB):                         0.00
Kurtosis:                      32.618   Cond. No.                         16.3
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
In [7]:
crime_ols_r = smf.ols(formula='narr86  ~ pcnv + ptime86 + qemp86 + 1', data=df).fit()
# Residuals from the restricted model; the formula in In [9] looks up `resid` in this namespace.
resid = df.narr86 - crime_ols_r.predict()
print(crime_ols_r.summary())
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                 narr86   R-squared:                       0.041
Model:                            OLS   Adj. R-squared:                  0.040
Method:                 Least Squares   F-statistic:                     39.10
Date:                Thu, 09 Apr 2020   Prob (F-statistic):           9.91e-25
Time:                        19:38:34   Log-Likelihood:                -3394.7
No. Observations:                2725   AIC:                             6797.
Df Residuals:                    2721   BIC:                             6821.
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      0.7118      0.033     21.565      0.000       0.647       0.776
pcnv          -0.1499      0.041     -3.669      0.000      -0.230      -0.070
ptime86       -0.0344      0.009     -4.007      0.000      -0.051      -0.018
qemp86        -0.1041      0.010    -10.023      0.000      -0.124      -0.084
==============================================================================
Omnibus:                     2394.860   Durbin-Watson:                   1.836
Prob(Omnibus):                  0.000   Jarque-Bera (JB):           106169.153
Skew:                           4.002   Prob(JB):                         0.00
Kurtosis:                      32.513   Cond. No.                         8.27
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
In [8]:
print(" (LM, p, df) = ", crime_ols.compare_lm_test(crime_ols_r))
 (LM, p, df) =  (4.070729461071197, 0.13063282803269777, 2.0)
In [9]:
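# Alternatively, regress the restricted-model residuals on all regressors
# and compute LM = N * R-squared from this auxiliary regression.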
crime_resid = smf.ols(formula='resid  ~ pcnv  + avgsen + tottime + ptime86 + qemp86 + 1', data=df).fit()
print(crime_resid.summary())
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                  resid   R-squared:                       0.001
Model:                            OLS   Adj. R-squared:                 -0.000
Method:                 Least Squares   F-statistic:                    0.8136
Date:                Thu, 09 Apr 2020   Prob (F-statistic):              0.540
Time:                        19:38:34   Log-Likelihood:                -3392.7
No. Observations:                2725   AIC:                             6797.
Df Residuals:                    2719   BIC:                             6833.
Df Model:                           5                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     -0.0057      0.033     -0.172      0.863      -0.071       0.059
pcnv          -0.0013      0.041     -0.032      0.975      -0.081       0.079
avgsen        -0.0070      0.012     -0.568      0.570      -0.031       0.017
tottime        0.0121      0.010      1.263      0.207      -0.007       0.031
ptime86       -0.0048      0.009     -0.543      0.587      -0.022       0.013
qemp86         0.0010      0.010      0.098      0.922      -0.019       0.021
==============================================================================
Omnibus:                     2395.326   Durbin-Watson:                   1.837
Prob(Omnibus):                  0.000   Jarque-Bera (JB):           106869.684
Skew:                           4.001   Prob(JB):                         0.00
Kurtosis:                      32.618   Cond. No.                         16.3
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
In [10]:
LM = 2725 * 0.0015  # N * R-squared of the auxiliary regression (shown rounded to 0.001 above)
LM
Out[10]:
4.0875
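
To turn the LM statistic into a test decision, it is compared with a chi-square distribution whose degrees of freedom equal the number of excluded variables (here 2: avgsen and tottime). The following is a minimal sketch that relies on the fact that, for exactly 2 degrees of freedom, the chi-square upper-tail probability has the closed form exp(-x/2). The helper name chi2_sf_df2 is hypothetical, and the shortcut does not carry over to other degrees of freedom.

```python
import math

def chi2_sf_df2(x):
    """Upper-tail probability of a chi-square variable with 2 degrees of freedom."""
    return math.exp(-x / 2)

n, r2_aux = 2725, 0.0015       # values used in In [10]
lm = n * r2_aux                # LM = N * R-squared

p_value = chi2_sf_df2(lm)
crit_10 = -2 * math.log(0.10)  # 10% critical value for 2 degrees of freedom

print(f"LM = {lm:.4f}, p-value = {p_value:.4f}, 10% critical value = {crit_10:.3f}")
print("reject H0 at 10%" if lm > crit_10 else "fail to reject H0 at 10%")
```

Applying the same function to the unrounded statistic from compare_lm_test (4.0707) gives a p-value of about 0.1306, in line with the value printed after In [8]. The small gap between 4.0707 and 4.0875 comes from rounding the R-squared to 0.0015.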