Chapter 7. Multiple Regression Analysis with Qualitative Information#
import numpy as np
import statsmodels.formula.api as smf
from wooldridge import dataWoo
Example 7.1. Hourly wage equation#
df = dataWoo('wage1')
wage_female = smf.ols(formula='wage ~ female + educ + exper + tenure + 1', data=df).fit()
print(wage_female.summary())
# The coefficient on female measures the average wage difference between men and women with the same educ, exper, and tenure.
OLS Regression Results
==============================================================================
Dep. Variable: wage R-squared: 0.364
Model: OLS Adj. R-squared: 0.359
Method: Least Squares F-statistic: 74.40
Date: Mon, 11 Dec 2023 Prob (F-statistic): 7.30e-50
Time: 18:36:51 Log-Likelihood: -1314.2
No. Observations: 526 AIC: 2638.
Df Residuals: 521 BIC: 2660.
Df Model: 4
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
Intercept -1.5679 0.725 -2.164 0.031 -2.991 -0.145
female -1.8109 0.265 -6.838 0.000 -2.331 -1.291
educ 0.5715 0.049 11.584 0.000 0.475 0.668
exper 0.0254 0.012 2.195 0.029 0.003 0.048
tenure 0.1410 0.021 6.663 0.000 0.099 0.183
==============================================================================
Omnibus: 185.864 Durbin-Watson: 1.794
Prob(Omnibus): 0.000 Jarque-Bera (JB): 715.580
Skew: 1.589 Prob(JB): 4.11e-156
Kurtosis: 7.749 Cond. No. 141.
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
wage_female = smf.ols(formula='wage ~ female + 1', data=df).fit()
print(wage_female.summary())
# b0 is the average wage for men in the sample.
OLS Regression Results
==============================================================================
Dep. Variable: wage R-squared: 0.116
Model: OLS Adj. R-squared: 0.114
Method: Least Squares F-statistic: 68.54
Date: Mon, 11 Dec 2023 Prob (F-statistic): 1.04e-15
Time: 18:36:51 Log-Likelihood: -1400.7
No. Observations: 526 AIC: 2805.
Df Residuals: 524 BIC: 2814.
Df Model: 1
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
Intercept 7.0995 0.210 33.806 0.000 6.687 7.512
female -2.5118 0.303 -8.279 0.000 -3.108 -1.916
==============================================================================
Omnibus: 223.488 Durbin-Watson: 1.818
Prob(Omnibus): 0.000 Jarque-Bera (JB): 929.998
Skew: 1.928 Prob(JB): 1.13e-202
Kurtosis: 8.250 Cond. No. 2.57
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
# Mean wage, educ, exper, and tenure by gender
df.groupby('female')[['wage', 'educ', 'exper', 'tenure']].mean()

| female | wage | educ | exper | tenure |
|---|---|---|---|---|
| 0 | 7.099489 | 12.788321 | 17.558394 | 6.474453 |
| 1 | 4.587659 | 12.317460 | 16.428571 | 3.615079 |
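The table makes the mechanics concrete: the intercept of the simple regression is the male mean and the coefficient on female is the difference in group means (7.0995 − 4.5877 ≈ 2.5118). A minimal sketch on synthetic data (so it runs without the wooldridge package) verifies this identity:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for the wage1 sample (illustrative numbers only)
rng = np.random.default_rng(0)
d = pd.DataFrame({'female': rng.integers(0, 2, 200)})
d['wage'] = 7 - 2.5 * d['female'] + rng.normal(0, 1, 200)

fit = smf.ols('wage ~ female', data=d).fit()

# OLS on a single dummy: intercept = mean of the 0-group,
# slope = difference in group means
gap = d.loc[d['female'] == 1, 'wage'].mean() - d.loc[d['female'] == 0, 'wage'].mean()
print(fit.params['female'], gap)  # identical up to floating-point error
```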
Example 7.2. Effect of computer ownership on college GPA#
df = dataWoo('gpa1')
colGPA_ur = smf.ols(formula='colGPA ~ PC + hsGPA + ACT + 1', data=df).fit()
print(colGPA_ur.summary())
OLS Regression Results
==============================================================================
Dep. Variable: colGPA R-squared: 0.219
Model: OLS Adj. R-squared: 0.202
Method: Least Squares F-statistic: 12.83
Date: Mon, 11 Dec 2023 Prob (F-statistic): 1.93e-07
Time: 18:36:51 Log-Likelihood: -42.796
No. Observations: 141 AIC: 93.59
Df Residuals: 137 BIC: 105.4
Df Model: 3
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
Intercept 1.2635 0.333 3.793 0.000 0.605 1.922
PC 0.1573 0.057 2.746 0.007 0.044 0.271
hsGPA 0.4472 0.094 4.776 0.000 0.262 0.632
ACT 0.0087 0.011 0.822 0.413 -0.012 0.029
==============================================================================
Omnibus: 2.770 Durbin-Watson: 1.870
Prob(Omnibus): 0.250 Jarque-Bera (JB): 1.863
Skew: 0.016 Prob(JB): 0.394
Kurtosis: 2.438 Cond. No. 298.
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
colGPA_r = smf.ols(formula='colGPA ~ PC + 1', data=df).fit()
print(colGPA_r.summary())
OLS Regression Results
==============================================================================
Dep. Variable: colGPA R-squared: 0.050
Model: OLS Adj. R-squared: 0.043
Method: Least Squares F-statistic: 7.314
Date: Mon, 11 Dec 2023 Prob (F-statistic): 0.00770
Time: 18:36:51 Log-Likelihood: -56.641
No. Observations: 141 AIC: 117.3
Df Residuals: 139 BIC: 123.2
Df Model: 1
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
Intercept 2.9894 0.040 75.678 0.000 2.911 3.068
PC 0.1695 0.063 2.704 0.008 0.046 0.293
==============================================================================
Omnibus: 2.136 Durbin-Watson: 1.941
Prob(Omnibus): 0.344 Jarque-Bera (JB): 1.852
Skew: 0.160 Prob(JB): 0.396
Kurtosis: 2.539 Cond. No. 2.45
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
# Mean colGPA by PC ownership
df.groupby('PC')['colGPA'].mean()

| PC | colGPA |
|---|---|
| 0 | 2.989412 |
| 1 | 3.158929 |
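Because the model is linear in PC, flipping only the dummy in a prediction reproduces its coefficient exactly. A sketch on synthetic data (hypothetical numbers, not the gpa1 sample):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for the gpa1 data (illustrative only)
rng = np.random.default_rng(4)
n = 150
d = pd.DataFrame({'PC': rng.integers(0, 2, n),
                  'hsGPA': rng.normal(3.2, 0.4, n),
                  'ACT': rng.normal(24, 3, n)})
d['colGPA'] = (1.3 + 0.16 * d['PC'] + 0.45 * d['hsGPA']
               + 0.01 * d['ACT'] + rng.normal(0, 0.3, n))

fit = smf.ols('colGPA ~ PC + hsGPA + ACT', data=d).fit()

# Two otherwise-identical students, one with and one without a PC
new = pd.DataFrame({'PC': [0, 1], 'hsGPA': [3.4, 3.4], 'ACT': [24, 24]})
preds = fit.predict(new)
print(preds[1] - preds[0], fit.params['PC'])  # identical
```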
Example 7.3. Effects of training grants on hours of training#
df = dataWoo('jtrain')
df = df[(df['year']==1988)]
jobb_reg = smf.ols(formula='hrsemp ~ grant + lsales + lemploy + 1', data=df).fit()
print(jobb_reg.summary())
OLS Regression Results
==============================================================================
Dep. Variable: hrsemp R-squared: 0.237
Model: OLS Adj. R-squared: 0.214
Method: Least Squares F-statistic: 10.44
Date: Mon, 11 Dec 2023 Prob (F-statistic): 4.80e-06
Time: 18:36:51 Log-Likelihood: -482.29
No. Observations: 105 AIC: 972.6
Df Residuals: 101 BIC: 983.2
Df Model: 3
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
Intercept 46.6651 43.412 1.075 0.285 -39.453 132.783
grant 26.2545 5.592 4.695 0.000 15.162 37.347
lsales -0.9846 3.540 -0.278 0.781 -8.007 6.038
lemploy -6.0699 3.883 -1.563 0.121 -13.772 1.633
==============================================================================
Omnibus: 66.291 Durbin-Watson: 2.182
Prob(Omnibus): 0.000 Jarque-Bera (JB): 284.382
Skew: 2.191 Prob(JB): 1.77e-62
Kurtosis: 9.767 Cond. No. 285.
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
Example 7.4. Housing price regression#
df = dataWoo('hprice1')
hprice_reg = smf.ols(formula='lprice ~ llotsize + lsqrft + bdrms + colonial + 1', data=df).fit()
print(hprice_reg.summary())
OLS Regression Results
==============================================================================
Dep. Variable: lprice R-squared: 0.649
Model: OLS Adj. R-squared: 0.632
Method: Least Squares F-statistic: 38.38
Date: Mon, 11 Dec 2023 Prob (F-statistic): 3.74e-18
Time: 18:36:51 Log-Likelihood: 26.619
No. Observations: 88 AIC: -43.24
Df Residuals: 83 BIC: -30.85
Df Model: 4
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
Intercept -1.3496 0.651 -2.073 0.041 -2.644 -0.055
llotsize 0.1678 0.038 4.395 0.000 0.092 0.244
lsqrft 0.7072 0.093 7.620 0.000 0.523 0.892
bdrms 0.0268 0.029 0.934 0.353 -0.030 0.084
colonial 0.0538 0.045 1.202 0.233 -0.035 0.143
==============================================================================
Omnibus: 13.728 Durbin-Watson: 2.077
Prob(Omnibus): 0.001 Jarque-Bera (JB): 50.828
Skew: -0.053 Prob(JB): 9.18e-12
Kurtosis: 6.722 Cond. No. 411.
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
Example 7.5. Hourly wage equation#
df = dataWoo('wage1')
wage_reg = smf.ols(formula='lwage ~ female + educ + exper + expersq + tenure + tenursq + 1', data=df).fit()
print(wage_reg.summary())
OLS Regression Results
==============================================================================
Dep. Variable: lwage R-squared: 0.441
Model: OLS Adj. R-squared: 0.434
Method: Least Squares F-statistic: 68.18
Date: Mon, 11 Dec 2023 Prob (F-statistic): 2.11e-62
Time: 18:36:51 Log-Likelihood: -260.59
No. Observations: 526 AIC: 535.2
Df Residuals: 519 BIC: 565.0
Df Model: 6
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
Intercept 0.4167 0.099 4.212 0.000 0.222 0.611
female -0.2965 0.036 -8.281 0.000 -0.367 -0.226
educ 0.0802 0.007 11.868 0.000 0.067 0.093
exper 0.0294 0.005 5.916 0.000 0.020 0.039
expersq -0.0006 0.000 -5.431 0.000 -0.001 -0.000
tenure 0.0317 0.007 4.633 0.000 0.018 0.045
tenursq -0.0006 0.000 -2.493 0.013 -0.001 -0.000
==============================================================================
Omnibus: 13.111 Durbin-Watson: 1.796
Prob(Omnibus): 0.001 Jarque-Bera (JB): 24.212
Skew: -0.085 Prob(JB): 5.53e-06
Kurtosis: 4.037 Cond. No. 4.49e+03
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 4.49e+03. This might indicate that there are
strong multicollinearity or other numerical problems.
# Exact proportionate wage differential implied by the log-wage coefficient on female
level = np.exp(-.2965) - 1
level
-0.2565843727358956
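The same exact-percentage conversion can be wrapped in a small helper (a convenience of ours, not from the text): for a dummy coefficient b in a log(y) model, the exact percentage difference is 100·(e^b − 1).

```python
import numpy as np

# Hypothetical helper: exact percentage change implied by a dummy
# coefficient b in a log(y) regression
def exact_percent(b):
    return 100 * (np.exp(b) - 1)

print(exact_percent(-0.2965))  # about -25.7%, vs the approximate -29.65%
```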
Example 7.6. Hourly wage equation#
df = dataWoo('wage1')
# female * married expands to female + married + female:married
wage_reg = smf.ols('lwage ~ female * married + educ + exper + expersq + tenure + tenursq + 1', data=df).fit()
print(wage_reg.summary())
OLS Regression Results
==============================================================================
Dep. Variable: lwage R-squared: 0.461
Model: OLS Adj. R-squared: 0.460
Method: Least Squares F-statistic: 842.1
Date: Mon, 11 Dec 2023 Prob (F-statistic): 0.00
Time: 18:36:51 Log-Likelihood: -3764.3
No. Observations: 7890 AIC: 7547.
Df Residuals: 7881 BIC: 7609.
Df Model: 8
Covariance Type: nonrobust
==================================================================================
coef std err t P>|t| [0.025 0.975]
----------------------------------------------------------------------------------
Intercept 0.3214 0.026 12.547 0.000 0.271 0.372
female -0.1104 0.014 -7.729 0.000 -0.138 -0.082
married 0.2127 0.014 15.000 0.000 0.185 0.240
female:married -0.3006 0.018 -16.353 0.000 -0.337 -0.265
educ 0.0789 0.002 46.022 0.000 0.076 0.082
exper 0.0268 0.001 19.958 0.000 0.024 0.029
expersq -0.0005 2.83e-05 -18.925 0.000 -0.001 -0.000
tenure 0.0291 0.002 16.795 0.000 0.026 0.032
tenursq -0.0005 5.92e-05 -9.002 0.000 -0.001 -0.000
==============================================================================
Omnibus: 203.516 Durbin-Watson: 1.786
Prob(Omnibus): 0.000 Jarque-Bera (JB): 482.736
Skew: -0.062 Prob(JB): 1.50e-105
Kurtosis: 4.205 Cond. No. 5.06e+03
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 5.06e+03. This might indicate that there are
strong multicollinearity or other numerical problems.
Example 7.7. Effects of physical attractiveness on wage#
df = dataWoo('beauty')
df = df[(df['female']==0)]
wage_reg = smf.ols('lwage ~ belavg + abvavg + educ + exper + expersq + union + married + black + south + goodhlth + 1',
data=df).fit()
print(wage_reg.summary())
OLS Regression Results
==============================================================================
Dep. Variable: lwage R-squared: 0.255
Model: OLS Adj. R-squared: 0.246
Method: Least Squares F-statistic: 27.82
Date: Mon, 11 Dec 2023 Prob (F-statistic): 5.64e-46
Time: 18:36:51 Log-Likelihood: -540.02
No. Observations: 824 AIC: 1102.
Df Residuals: 813 BIC: 1154.
Df Model: 10
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
Intercept 0.4778 0.120 3.970 0.000 0.242 0.714
belavg -0.1647 0.053 -3.113 0.002 -0.269 -0.061
abvavg -0.0250 0.038 -0.662 0.508 -0.099 0.049
educ 0.0607 0.007 9.218 0.000 0.048 0.074
exper 0.0464 0.006 8.019 0.000 0.035 0.058
expersq -0.0007 0.000 -5.883 0.000 -0.001 -0.000
union 0.1485 0.036 4.099 0.000 0.077 0.220
married 0.0644 0.044 1.454 0.146 -0.022 0.151
black -0.2570 0.076 -3.397 0.001 -0.405 -0.108
south 0.0853 0.043 1.991 0.047 0.001 0.169
goodhlth 0.0011 0.070 0.016 0.987 -0.137 0.139
==============================================================================
Omnibus: 29.805 Durbin-Watson: 1.789
Prob(Omnibus): 0.000 Jarque-Bera (JB): 64.071
Skew: 0.180 Prob(JB): 1.22e-14
Kurtosis: 4.318 Cond. No. 6.27e+03
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 6.27e+03. This might indicate that there are
strong multicollinearity or other numerical problems.
df = dataWoo('beauty')
df = df[(df['female']==1)]
beauty_reg = smf.ols('lwage ~ belavg + abvavg + educ + exper + expersq + union + married + black + south + goodhlth + 1',
data=df).fit()
print(beauty_reg.summary())
OLS Regression Results
==============================================================================
Dep. Variable: lwage R-squared: 0.278
Model: OLS Adj. R-squared: 0.261
Method: Least Squares F-statistic: 16.40
Date: Mon, 11 Dec 2023 Prob (F-statistic): 4.31e-25
Time: 18:36:51 Log-Likelihood: -265.56
No. Observations: 436 AIC: 553.1
Df Residuals: 425 BIC: 598.0
Df Model: 10
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
Intercept -0.0773 0.144 -0.536 0.592 -0.361 0.206
belavg -0.1142 0.066 -1.718 0.087 -0.245 0.016
abvavg 0.0686 0.049 1.398 0.163 -0.028 0.165
educ 0.0757 0.009 8.502 0.000 0.058 0.093
exper 0.0295 0.007 4.109 0.000 0.015 0.044
expersq -0.0005 0.000 -2.814 0.005 -0.001 -0.000
union 0.2927 0.054 5.431 0.000 0.187 0.399
married -0.0623 0.044 -1.404 0.161 -0.149 0.025
black 0.1438 0.069 2.087 0.038 0.008 0.279
south 0.0085 0.060 0.142 0.887 -0.109 0.126
goodhlth 0.1131 0.081 1.391 0.165 -0.047 0.273
==============================================================================
Omnibus: 33.272 Durbin-Watson: 1.868
Prob(Omnibus): 0.000 Jarque-Bera (JB): 111.518
Skew: 0.238 Prob(JB): 6.08e-25
Kurtosis: 5.432 Cond. No. 3.70e+03
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 3.7e+03. This might indicate that there are
strong multicollinearity or other numerical problems.
Example 7.8. Effects of law school rankings on starting salaries#
df = dataWoo('lawsch85')
df['r61_100'] = 0
df.loc[(df['rank'] > 60) & (df['rank'] <= 100), 'r61_100'] = 1
r_reg = smf.ols(formula =
'lsalary ~ top10 + r11_25 + r26_40 + r41_60 + r61_100 + LSAT + GPA + llibvol + lcost + 1',
data=df).fit()
print(r_reg.summary())
OLS Regression Results
==============================================================================
Dep. Variable: lsalary R-squared: 0.911
Model: OLS Adj. R-squared: 0.905
Method: Least Squares F-statistic: 143.2
Date: Mon, 11 Dec 2023 Prob (F-statistic): 9.45e-62
Time: 18:36:52 Log-Likelihood: 146.45
No. Observations: 136 AIC: -272.9
Df Residuals: 126 BIC: -243.8
Df Model: 9
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
Intercept 9.1653 0.411 22.277 0.000 8.351 9.979
top10 0.6996 0.053 13.078 0.000 0.594 0.805
r11_25 0.5935 0.039 15.049 0.000 0.515 0.672
r26_40 0.3751 0.034 11.005 0.000 0.308 0.443
r41_60 0.2628 0.028 9.399 0.000 0.207 0.318
r61_100 0.1316 0.021 6.254 0.000 0.090 0.173
LSAT 0.0057 0.003 1.858 0.066 -0.000 0.012
GPA 0.0137 0.074 0.185 0.854 -0.133 0.161
llibvol 0.0364 0.026 1.398 0.165 -0.015 0.088
lcost 0.0008 0.025 0.033 0.973 -0.049 0.051
==============================================================================
Omnibus: 9.419 Durbin-Watson: 1.926
Prob(Omnibus): 0.009 Jarque-Bera (JB): 20.478
Skew: 0.100 Prob(JB): 3.57e-05
Kurtosis: 4.890 Cond. No. 8.98e+03
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 8.98e+03. This might indicate that there are
strong multicollinearity or other numerical problems.
hypotheses = '(LSAT = GPA = llibvol = lcost = 0)'
f_test = r_reg.f_test(hypotheses)
print(f_test)
<F test: F=2.3853161323547907, p=0.054704376456822586, df_denom=126, df_num=4>
top10_premium = np.exp(0.6996) - 1
top10_premium
1.0129473674662273
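The r61_100 dummy above was built with `.loc`; pandas' `cut` plus `get_dummies` offers an equivalent route. The ranks below are made up for illustration, and the band labels simply mirror the lawsch85 variable names:

```python
import pandas as pd

# Illustrative ranks (made up); labels mirror the lawsch85 dummy names
ranks = pd.DataFrame({'rank': [5, 18, 33, 55, 80, 150]})
bands = pd.cut(ranks['rank'],
               bins=[0, 10, 25, 40, 60, 100, float('inf')],
               labels=['top10', 'r11_25', 'r26_40', 'r41_60', 'r61_100', 'below100'])
dummies = pd.get_dummies(bands)  # one indicator column per rank band
print(dummies)
```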
Example 7.9. Effects of computer usage on wages#
Data not available#
Example 7.10. Log hourly wage equation#
df=dataWoo('wage1')
wage_reg = smf.ols('lwage ~ female*educ + exper + expersq + tenure + tenursq + 1', data=df).fit()
print(wage_reg.summary())
OLS Regression Results
==============================================================================
Dep. Variable: lwage R-squared: 0.441
Model: OLS Adj. R-squared: 0.433
Method: Least Squares F-statistic: 58.37
Date: Mon, 11 Dec 2023 Prob (F-statistic): 1.67e-61
Time: 18:36:52 Log-Likelihood: -260.49
No. Observations: 526 AIC: 537.0
Df Residuals: 518 BIC: 571.1
Df Model: 7
Covariance Type: nonrobust
===============================================================================
coef std err t P>|t| [0.025 0.975]
-------------------------------------------------------------------------------
Intercept 0.3888 0.119 3.276 0.001 0.156 0.622
female -0.2268 0.168 -1.354 0.176 -0.556 0.102
educ 0.0824 0.008 9.725 0.000 0.066 0.099
female:educ -0.0056 0.013 -0.426 0.670 -0.031 0.020
exper 0.0293 0.005 5.886 0.000 0.020 0.039
expersq -0.0006 0.000 -5.398 0.000 -0.001 -0.000
tenure 0.0319 0.007 4.647 0.000 0.018 0.045
tenursq -0.0006 0.000 -2.509 0.012 -0.001 -0.000
==============================================================================
Omnibus: 13.302 Durbin-Watson: 1.795
Prob(Omnibus): 0.001 Jarque-Bera (JB): 24.887
Skew: -0.080 Prob(JB): 3.94e-06
Kurtosis: 4.053 Cond. No. 8.41e+03
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 8.41e+03. This might indicate that there are
strong multicollinearity or other numerical problems.
hypotheses = '(female:educ = female = 0)'
f_test = wage_reg.f_test(hypotheses)
print(f_test)
<F test: F=34.32554911448261, p=1.00234395722165e-14, df_denom=518, df_num=2>
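With the interaction included, the estimated gender gap varies with education: gap(educ) = −0.2268 − 0.0056·educ, using the rounded coefficients on female and female:educ from the summary above. A quick arithmetic check at roughly average education:

```python
# Estimated gender gap as a function of education, using the rounded
# coefficients on female and female:educ from the summary above
def gap(educ):
    return -0.2268 - 0.0056 * educ

print(gap(12.5))  # about -0.297, close to the -0.2965 of Example 7.5
```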
Example 7.11. Effects of race on baseball player salaries#
df=dataWoo('mlb1')
df = df[(df['percblck']!=0)]
mlb1_reg = smf.ols('lsalary ~ years + gamesyr + bavg + hrunsyr + rbisyr + runsyr + fldperc + allstar + black + hispan + black:percblck + hispan:perchisp + 1', data=df).fit()
print(mlb1_reg.summary())
OLS Regression Results
==============================================================================
Dep. Variable: lsalary R-squared: 0.638
Model: OLS Adj. R-squared: 0.624
Method: Least Squares F-statistic: 46.48
Date: Mon, 11 Dec 2023 Prob (F-statistic): 1.31e-62
Time: 18:36:52 Log-Likelihood: -350.12
No. Observations: 330 AIC: 726.2
Df Residuals: 317 BIC: 775.6
Df Model: 12
Covariance Type: nonrobust
===================================================================================
coef std err t P>|t| [0.025 0.975]
-----------------------------------------------------------------------------------
Intercept 10.3437 2.183 4.739 0.000 6.050 14.638
years 0.0673 0.013 5.224 0.000 0.042 0.093
gamesyr 0.0089 0.003 2.624 0.009 0.002 0.016
bavg 0.0009 0.002 0.625 0.533 -0.002 0.004
hrunsyr 0.0146 0.016 0.889 0.375 -0.018 0.047
rbisyr 0.0045 0.008 0.593 0.553 -0.010 0.019
runsyr 0.0072 0.005 1.577 0.116 -0.002 0.016
fldperc 0.0011 0.002 0.513 0.609 -0.003 0.005
allstar 0.0075 0.003 2.621 0.009 0.002 0.013
black -0.1980 0.125 -1.578 0.116 -0.445 0.049
hispan -0.1900 0.153 -1.241 0.215 -0.491 0.111
black:percblck 0.0125 0.005 2.509 0.013 0.003 0.022
hispan:perchisp 0.0201 0.010 2.051 0.041 0.001 0.039
==============================================================================
Omnibus: 1.828 Durbin-Watson: 1.341
Prob(Omnibus): 0.401 Jarque-Bera (JB): 1.709
Skew: 0.012 Prob(JB): 0.425
Kurtosis: 3.352 Cond. No. 5.65e+04
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 5.65e+04. This might indicate that there are
strong multicollinearity or other numerical problems.
hypotheses = '(black = hispan = black:percblck = hispan:perchisp = 0)'
f_test = mlb1_reg.f_test(hypotheses)
print(f_test)
<F test: F=2.647888304563965, p=0.03347614735675849, df_denom=317, df_num=4>
df=dataWoo('mlb1')
df = df[(df['percblck']!=0)]
mlb1_reg_r = smf.ols('lsalary ~ years + gamesyr + bavg + hrunsyr + rbisyr + runsyr + fldperc + allstar + 1', data=df).fit()
print(mlb1_reg_r.summary())
OLS Regression Results
==============================================================================
Dep. Variable: lsalary R-squared: 0.644
Model: OLS Adj. R-squared: 0.636
Method: Least Squares F-statistic: 77.94
Date: Mon, 11 Dec 2023 Prob (F-statistic): 1.35e-72
Time: 18:36:52 Log-Likelihood: -377.03
No. Observations: 353 AIC: 772.1
Df Residuals: 344 BIC: 806.9
Df Model: 8
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
Intercept 10.3277 2.002 5.159 0.000 6.390 14.265
years 0.0629 0.012 5.146 0.000 0.039 0.087
gamesyr 0.0092 0.003 2.848 0.005 0.003 0.016
bavg 0.0004 0.001 0.345 0.730 -0.002 0.003
hrunsyr 0.0195 0.016 1.230 0.220 -0.012 0.051
rbisyr 0.0027 0.007 0.365 0.716 -0.012 0.017
runsyr 0.0096 0.004 2.229 0.026 0.001 0.018
fldperc 0.0012 0.002 0.600 0.549 -0.003 0.005
allstar 0.0069 0.003 2.468 0.014 0.001 0.012
==============================================================================
Omnibus: 2.170 Durbin-Watson: 1.300
Prob(Omnibus): 0.338 Jarque-Bera (JB): 2.071
Skew: 0.068 Prob(JB): 0.355
Kurtosis: 3.350 Cond. No. 5.36e+04
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 5.36e+04. This might indicate that there are
strong multicollinearity or other numerical problems.
Equation [7.22] (Page 222)#
df=dataWoo('gpa3')
df = df[(df['spring']==1)]
gpa3_reg = smf.ols('cumgpa ~ female*sat + hsperc + female:hsperc + tothrs + female:tothrs + 1', data=df).fit()
print(gpa3_reg.summary())
OLS Regression Results
==============================================================================
Dep. Variable: cumgpa R-squared: 0.406
Model: OLS Adj. R-squared: 0.394
Method: Least Squares F-statistic: 34.95
Date: Mon, 11 Dec 2023 Prob (F-statistic): 4.66e-37
Time: 18:36:52 Log-Likelihood: -237.26
No. Observations: 366 AIC: 490.5
Df Residuals: 358 BIC: 521.7
Df Model: 7
Covariance Type: nonrobust
=================================================================================
coef std err t P>|t| [0.025 0.975]
---------------------------------------------------------------------------------
Intercept 1.4808 0.207 7.142 0.000 1.073 1.889
female -0.3535 0.411 -0.861 0.390 -1.161 0.454
sat 0.0011 0.000 5.807 0.000 0.001 0.001
female:sat 0.0008 0.000 1.949 0.052 -6.88e-06 0.002
hsperc -0.0085 0.001 -6.167 0.000 -0.011 -0.006
female:hsperc -0.0005 0.003 -0.174 0.862 -0.007 0.006
tothrs 0.0023 0.001 2.718 0.007 0.001 0.004
female:tothrs -0.0001 0.002 -0.071 0.943 -0.003 0.003
==============================================================================
Omnibus: 6.858 Durbin-Watson: 1.832
Prob(Omnibus): 0.032 Jarque-Bera (JB): 9.054
Skew: -0.146 Prob(JB): 0.0108
Kurtosis: 3.713 Cond. No. 1.66e+04
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 1.66e+04. This might indicate that there are
strong multicollinearity or other numerical problems.
hypotheses = '(female:tothrs = female:sat = female:hsperc = 0)'
f_test = gpa3_reg.f_test(hypotheses)
print(f_test)
<F test: F=1.5338978108616452, p=0.20537335628140385, df_denom=358, df_num=3>
Equation [7.25] (Page 224)#
df=dataWoo('gpa3')
df = df[(df['spring']==1)]
gpa3_reg_r = smf.ols('cumgpa ~ female + sat + hsperc + tothrs + 1', data=df).fit()
print(gpa3_reg_r.summary())
OLS Regression Results
==============================================================================
Dep. Variable: cumgpa R-squared: 0.398
Model: OLS Adj. R-squared: 0.392
Method: Least Squares F-statistic: 59.74
Date: Mon, 11 Dec 2023 Prob (F-statistic): 1.10e-38
Time: 18:36:52 Log-Likelihood: -239.59
No. Observations: 366 AIC: 489.2
Df Residuals: 361 BIC: 508.7
Df Model: 4
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
Intercept 1.3285 0.180 7.388 0.000 0.975 1.682
female 0.3101 0.059 5.291 0.000 0.195 0.425
sat 0.0012 0.000 7.635 0.000 0.001 0.002
hsperc -0.0084 0.001 -6.839 0.000 -0.011 -0.006
tothrs 0.0025 0.001 3.379 0.001 0.001 0.004
==============================================================================
Omnibus: 5.200 Durbin-Watson: 1.860
Prob(Omnibus): 0.074 Jarque-Bera (JB): 6.107
Skew: -0.139 Prob(JB): 0.0472
Kurtosis: 3.568 Cond. No. 6.74e+03
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 6.74e+03. This might indicate that there are
strong multicollinearity or other numerical problems.
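The exclusion F statistic above can also be obtained by handing statsmodels the two fitted models directly via `compare_f_test`, rather than writing out the hypothesis string. A self-contained sketch on synthetic data (illustrative names and numbers, not the gpa3 sample):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for the gpa3 data (illustrative only)
rng = np.random.default_rng(1)
n = 300
d = pd.DataFrame({'female': rng.integers(0, 2, n),
                  'sat': rng.normal(1000, 100, n)})
d['cumgpa'] = 1.5 + 0.001 * d['sat'] + 0.3 * d['female'] + rng.normal(0, 0.3, n)

unrestricted = smf.ols('cumgpa ~ female * sat', data=d).fit()
restricted = smf.ols('cumgpa ~ female + sat', data=d).fit()

# compare_f_test returns (F statistic, p-value, number of restrictions)
f_val, p_val, df_diff = unrestricted.compare_f_test(restricted)
print(f_val, p_val, df_diff)
```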
Equation [7.29] (Page 225)#
df = dataWoo('mroz')
mroz_reg = smf.ols('inlf ~ nwifeinc + educ + exper + expersq + age + kidslt6 + kidsge6 + 1', data=df).fit()
print(mroz_reg.summary())
OLS Regression Results
==============================================================================
Dep. Variable: inlf R-squared: 0.264
Model: OLS Adj. R-squared: 0.257
Method: Least Squares F-statistic: 38.22
Date: Mon, 11 Dec 2023 Prob (F-statistic): 6.90e-46
Time: 18:36:52 Log-Likelihood: -423.89
No. Observations: 753 AIC: 863.8
Df Residuals: 745 BIC: 900.8
Df Model: 7
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
Intercept 0.5855 0.154 3.798 0.000 0.283 0.888
nwifeinc -0.0034 0.001 -2.351 0.019 -0.006 -0.001
educ 0.0380 0.007 5.151 0.000 0.024 0.052
exper 0.0395 0.006 6.962 0.000 0.028 0.051
expersq -0.0006 0.000 -3.227 0.001 -0.001 -0.000
age -0.0161 0.002 -6.476 0.000 -0.021 -0.011
kidslt6 -0.2618 0.034 -7.814 0.000 -0.328 -0.196
kidsge6 0.0130 0.013 0.986 0.324 -0.013 0.039
==============================================================================
Omnibus: 169.137 Durbin-Watson: 0.494
Prob(Omnibus): 0.000 Jarque-Bera (JB): 36.741
Skew: -0.196 Prob(JB): 1.05e-08
Kurtosis: 1.991 Cond. No. 3.06e+03
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 3.06e+03. This might indicate that there are
strong multicollinearity or other numerical problems.
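A well-known caveat of the linear probability model is that fitted values are not constrained to lie in [0, 1]. A synthetic illustration (not the mroz sample):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic binary outcome (illustrative only)
rng = np.random.default_rng(2)
n = 500
d = pd.DataFrame({'x': rng.normal(0, 2, n)})
d['y'] = (0.5 + 0.3 * d['x'] + rng.normal(0, 0.3, n) > 0.5).astype(int)

lpm = smf.ols('y ~ x', data=d).fit()

# Share of fitted "probabilities" falling outside the unit interval
outside = ((lpm.fittedvalues < 0) | (lpm.fittedvalues > 1)).mean()
print(f"share of fitted values outside [0, 1]: {outside:.3f}")
```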
Example 7.12. A linear probability model of arrest#
df = dataWoo('crime1')
df['arr86'] = 0
df.loc[(df['narr86'] > 0), 'arr86'] = 1
crime_reg = smf.ols('arr86 ~ pcnv + avgsen + tottime + ptime86 + qemp86 + 1', data=df).fit()
print(crime_reg.summary())
OLS Regression Results
==============================================================================
Dep. Variable: arr86 R-squared: 0.047
Model: OLS Adj. R-squared: 0.046
Method: Least Squares F-statistic: 27.03
Date: Mon, 11 Dec 2023 Prob (F-statistic): 9.09e-27
Time: 18:36:52 Log-Likelihood: -1609.7
No. Observations: 2725 AIC: 3231.
Df Residuals: 2719 BIC: 3267.
Df Model: 5
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
Intercept 0.4406 0.017 25.568 0.000 0.407 0.474
pcnv -0.1624 0.021 -7.649 0.000 -0.204 -0.121
avgsen 0.0061 0.006 0.947 0.344 -0.007 0.019
tottime -0.0023 0.005 -0.454 0.650 -0.012 0.007
ptime86 -0.0220 0.005 -4.739 0.000 -0.031 -0.013
qemp86 -0.0428 0.005 -7.925 0.000 -0.053 -0.032
==============================================================================
Omnibus: 766.502 Durbin-Watson: 1.581
Prob(Omnibus): 0.000 Jarque-Bera (JB): 474.581
Skew: 0.903 Prob(JB): 8.83e-104
Kurtosis: 2.043 Cond. No. 16.3
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
hypotheses = '(avgsen = tottime = 0)'
f_test = crime_reg.f_test(hypotheses)
print(f_test)
<F test: F=1.0597004440363245, p=0.346702695391494, df_denom=2.72e+03, df_num=2>
Equation [7.32] (Page 228)#
crime_reg_2 = smf.ols('arr86 ~ pcnv + avgsen + tottime + ptime86 + qemp86 + black + hispan + 1', data=df).fit()
print(crime_reg_2.summary())
OLS Regression Results
==============================================================================
Dep. Variable: arr86 R-squared: 0.068
Model: OLS Adj. R-squared: 0.066
Method: Least Squares F-statistic: 28.41
Date: Mon, 11 Dec 2023 Prob (F-statistic): 5.46e-38
Time: 18:36:52 Log-Likelihood: -1579.6
No. Observations: 2725 AIC: 3175.
Df Residuals: 2717 BIC: 3222.
Df Model: 7
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
Intercept 0.3804 0.019 20.314 0.000 0.344 0.417
pcnv -0.1521 0.021 -7.219 0.000 -0.193 -0.111
avgsen 0.0046 0.006 0.723 0.470 -0.008 0.017
tottime -0.0026 0.005 -0.520 0.603 -0.012 0.007
ptime86 -0.0237 0.005 -5.157 0.000 -0.033 -0.015
qemp86 -0.0385 0.005 -7.123 0.000 -0.049 -0.028
black 0.1698 0.024 7.171 0.000 0.123 0.216
hispan 0.0962 0.021 4.644 0.000 0.056 0.137
==============================================================================
Omnibus: 631.338 Durbin-Watson: 1.571
Prob(Omnibus): 0.000 Jarque-Bera (JB): 443.423
Skew: 0.880 Prob(JB): 5.15e-97
Kurtosis: 2.102 Cond. No. 19.5
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
Equation [7.33] (Page 229)#
df = dataWoo('jtrain')
df = df[(df['year']==1988)]
jtrain_reg = smf.ols(' lscrap ~ grant + lsales + lemploy + 1', data=df).fit()
print(jtrain_reg.summary())
# grant = 1 if the firm received a job training grant; its coefficient measures the grant's effect on log(scrap), controlling for firm size.
OLS Regression Results
==============================================================================
Dep. Variable: lscrap R-squared: 0.072
Model: OLS Adj. R-squared: 0.011
Method: Least Squares F-statistic: 1.182
Date: Mon, 11 Dec 2023 Prob (F-statistic): 0.327
Time: 18:36:52 Log-Likelihood: -85.161
No. Observations: 50 AIC: 178.3
Df Residuals: 46 BIC: 186.0
Df Model: 3
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
Intercept 4.9868 4.656 1.071 0.290 -4.384 14.358
grant -0.0518 0.431 -0.120 0.905 -0.920 0.816
lsales -0.4548 0.373 -1.218 0.229 -1.206 0.297
lemploy 0.6394 0.365 1.751 0.087 -0.096 1.374
==============================================================================
Omnibus: 0.246 Durbin-Watson: 1.779
Prob(Omnibus): 0.884 Jarque-Bera (JB): 0.163
Skew: -0.132 Prob(JB): 0.922
Kurtosis: 2.905 Cond. No. 380.
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
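Because the dependent variable is log(scrap), the coefficient on the grant dummy has only an approximate percentage interpretation; the exact percentage effect of a dummy in a log model is 100·[exp(β) − 1]. A quick check using the coefficient reported above:

```python
import numpy as np

b_grant = -0.0518  # coefficient on grant from the lscrap regression above
approx_pct = 100 * b_grant               # approximate percentage effect
exact_pct = 100 * (np.exp(b_grant) - 1)  # exact percentage effect for a dummy
print(approx_pct, exact_pct)  # -5.18 vs about -5.05 percent
```

Here the two are close because the coefficient is small; the gap widens for larger coefficients.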
Equation [7.35] (Page 231)#
df = dataWoo('fertil2')
fert_reg = smf.ols(' children ~ age + educ + 1', data=df).fit()
print(fert_reg.summary())
OLS Regression Results
==============================================================================
Dep. Variable: children R-squared: 0.560
Model: OLS Adj. R-squared: 0.559
Method: Least Squares F-statistic: 2768.
Date: Mon, 11 Dec 2023 Prob (F-statistic): 0.00
Time: 18:36:52 Log-Likelihood: -7881.7
No. Observations: 4361 AIC: 1.577e+04
Df Residuals: 4358 BIC: 1.579e+04
Df Model: 2
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
Intercept -1.9967 0.094 -21.249 0.000 -2.181 -1.813
age 0.1748 0.003 64.631 0.000 0.170 0.180
educ -0.0899 0.006 -15.034 0.000 -0.102 -0.078
==============================================================================
Omnibus: 224.819 Durbin-Watson: 1.884
Prob(Omnibus): 0.000 Jarque-Bera (JB): 840.653
Skew: -0.066 Prob(JB): 2.85e-183
Kurtosis: 5.147 Cond. No. 123.
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
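The coefficient on educ has a convenient aggregate reading: holding age fixed, each additional year of education predicts about 0.09 fewer children, so across a group of 100 women one more year of schooling each predicts roughly nine fewer children in total:

```python
b_educ = -0.0899  # coefficient on educ from Eq. [7.35] above
# predicted change in total children if 100 women each get one more year of education
print(100 * b_educ)  # about -9
```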
Equation [7.37] (Page 232)#
df = dataWoo('fertil2')
fert_reg_2 = smf.ols(' children ~ age + educ + electric + 1', data=df).fit()
print(fert_reg_2.summary())
# electric = 1 if the woman lives in a home with electricity; holding age and educ fixed, such women have about 0.36 fewer children on average.
OLS Regression Results
==============================================================================
Dep. Variable: children R-squared: 0.562
Model: OLS Adj. R-squared: 0.562
Method: Least Squares F-statistic: 1863.
Date: Mon, 11 Dec 2023 Prob (F-statistic): 0.00
Time: 18:36:52 Log-Likelihood: -7863.3
No. Observations: 4358 AIC: 1.573e+04
Df Residuals: 4354 BIC: 1.576e+04
Df Model: 3
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
Intercept -2.0711 0.095 -21.861 0.000 -2.257 -1.885
age 0.1770 0.003 64.855 0.000 0.172 0.182
educ -0.0788 0.006 -12.462 0.000 -0.091 -0.066
electric -0.3618 0.068 -5.317 0.000 -0.495 -0.228
==============================================================================
Omnibus: 224.823 Durbin-Watson: 1.890
Prob(Omnibus): 0.000 Jarque-Bera (JB): 841.796
Skew: -0.065 Prob(JB): 1.61e-183
Kurtosis: 5.149 Cond. No. 126.
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
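To see that the coefficient on electric is exactly the ceteris paribus gap between otherwise-identical women, compare predicted fertility with and without electricity at fixed values of age and educ (the values 30 and 7 below are arbitrary illustrations):

```python
# coefficients from Eq. [7.37] above
b0, b_age, b_educ, b_electric = -2.0711, 0.1770, -0.0788, -0.3618

age, educ = 30, 7  # arbitrary illustrative values
yhat_electric = b0 + b_age * age + b_educ * educ + b_electric * 1
yhat_no_electric = b0 + b_age * age + b_educ * educ
print(yhat_no_electric - yhat_electric)  # 0.3618, whatever age and educ are
```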