# Chapter 6. Multiple Regression Analysis: Further Analysis#

import numpy as np
import pandas as pd
import scipy.stats as ss

import statsmodels.formula.api as smf
from statsmodels.iolib.summary2 import summary_col

from wooldridge import *


## Table 6.1 Effects of Data Scaling on OLS Statistics (birth-weight equations)#

df = dataWoo('bwght')

# Same birth-weight model estimated under three scalings of the data:
#   (1) bwght (ounces) on cigs (cigarettes/day) and faminc
#   (2) bwghtlbs (pounds) on cigs and faminc
#   (3) bwght on packs (= cigs/20) and faminc
bwght_ols1 = smf.ols('bwght  ~ cigs  + faminc + 1', data=df).fit()
bwght_ols2 = smf.ols('bwghtlbs  ~ cigs  + faminc + 1', data=df).fit()
bwght_ols3 = smf.ols('bwght  ~ packs  + faminc + 1', data=df).fit()

# Side-by-side table with stars, N and R-squared for each column.
scaling_table = summary_col(
    [bwght_ols1, bwght_ols2, bwght_ols3],
    stars=True,
    float_format='%0.3f',
    model_names=['bwght_ols1', 'bwght_ols2', 'bwght_ols3'],
    info_dict={
        'N': lambda x: "{0:d}".format(int(x.nobs)),
        'R2': lambda x: "{:.3f}".format(x.rsquared),
    },
)
print(scaling_table)

===============================================
bwght_ols1 bwght_ols2 bwght_ols3
-----------------------------------------------
Intercept      116.974*** 7.311***   116.974***
(1.049)    (0.066)    (1.049)
R-squared      0.030      0.030      0.030
cigs           -0.463***  -0.029***
(0.092)    (0.006)
faminc         0.093***   0.006***   0.093***
(0.029)    (0.002)    (0.029)
packs                                -9.268***
(1.832)
N              1388       1388       1388
R2             0.030      0.030      0.030
===============================================
Standard errors in parentheses.
* p<.1, ** p<.05, ***p<.01


## Example 6.1. Effects of pollution on housing prices#

df = dataWoo('hprice2')
df1 = df[['price', 'nox', 'crime', 'rooms', 'dist', 'stratio']]

# Standardize every column in a single pass instead of building six
# one-column DataFrames and concatenating them. apply(ss.zscore) keeps the
# column layout, and add_prefix yields the same zprice..zstratio names the
# formula below expects.
df2 = df1.apply(ss.zscore).add_prefix('z')

# Beta coefficients: regression on standardized variables, so slopes are
# effects in standard-deviation units.
hprice_std = smf.ols(formula='zprice ~ znox + zcrime + zrooms + zdist + zstratio + 1', data=df2).fit()
print(hprice_std.summary())

                            OLS Regression Results
==============================================================================
Dep. Variable:                 zprice   R-squared:                       0.636
Method:                 Least Squares   F-statistic:                     174.5
Date:                Mon, 11 Dec 2023   Prob (F-statistic):          3.61e-107
Time:                        18:36:44   Log-Likelihood:                -462.53
No. Observations:                 506   AIC:                             937.1
Df Residuals:                     500   BIC:                             962.4
Df Model:                           5
Covariance Type:            nonrobust
==============================================================================
coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept   8.674e-17      0.027   3.21e-15      1.000      -0.053       0.053
znox          -0.3404      0.045     -7.643      0.000      -0.428      -0.253
zcrime        -0.1433      0.031     -4.665      0.000      -0.204      -0.083
zrooms         0.5139      0.030     17.112      0.000       0.455       0.573
zdist         -0.2348      0.043     -5.459      0.000      -0.319      -0.150
zstratio      -0.2703      0.030     -9.018      0.000      -0.329      -0.211
==============================================================================
Omnibus:                      272.145   Durbin-Watson:                   0.865
Prob(Omnibus):                  0.000   Jarque-Bera (JB):             2647.578
Skew:                           2.150   Prob(JB):                         0.00
Kurtosis:                      13.348   Cond. No.                         3.33
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

# NOTE(review): this re-runs exactly the same standardized regression as the
# block above (identical formula and data), so the printed summary repeats.
hprice_std = smf.ols(formula='zprice ~ znox + zcrime + zrooms + zdist + zstratio + 1', data=df2).fit()
print(hprice_std.summary())

                            OLS Regression Results
==============================================================================
Dep. Variable:                 zprice   R-squared:                       0.636
Method:                 Least Squares   F-statistic:                     174.5
Date:                Mon, 11 Dec 2023   Prob (F-statistic):          3.61e-107
Time:                        18:36:44   Log-Likelihood:                -462.53
No. Observations:                 506   AIC:                             937.1
Df Residuals:                     500   BIC:                             962.4
Df Model:                           5
Covariance Type:            nonrobust
==============================================================================
coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept   8.674e-17      0.027   3.21e-15      1.000      -0.053       0.053
znox          -0.3404      0.045     -7.643      0.000      -0.428      -0.253
zcrime        -0.1433      0.031     -4.665      0.000      -0.204      -0.083
zrooms         0.5139      0.030     17.112      0.000       0.455       0.573
zdist         -0.2348      0.043     -5.459      0.000      -0.319      -0.150
zstratio      -0.2703      0.030     -9.018      0.000      -0.329      -0.211
==============================================================================
Omnibus:                      272.145   Durbin-Watson:                   0.865
Prob(Omnibus):                  0.000   Jarque-Bera (JB):             2647.578
Skew:                           2.150   Prob(JB):                         0.00
Kurtosis:                      13.348   Cond. No.                         3.33
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.


## Compare the result in Example 4.5.#

# Fix: dropped an unused `import math` — np.log does the work below.
# Log of distance to employment centers, used as a regressor.
df['ldist'] = np.log(df['dist'])

# Log-log housing-price model for comparison with Example 4.5.
hprice_log = smf.ols(formula='lprice ~ lnox + ldist + rooms + stratio + 1', data=df).fit()
print(hprice_log.summary())

                            OLS Regression Results
==============================================================================
Dep. Variable:                 lprice   R-squared:                       0.584
Method:                 Least Squares   F-statistic:                     175.9
Date:                Mon, 11 Dec 2023   Prob (F-statistic):           5.53e-94
Time:                        18:36:44   Log-Likelihood:                -43.495
No. Observations:                 506   AIC:                             96.99
Df Residuals:                     501   BIC:                             118.1
Df Model:                           4
Covariance Type:            nonrobust
==============================================================================
coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     11.0839      0.318     34.843      0.000      10.459      11.709
lnox          -0.9535      0.117     -8.168      0.000      -1.183      -0.724
ldist         -0.1343      0.043     -3.117      0.002      -0.219      -0.050
rooms          0.2545      0.019     13.736      0.000       0.218       0.291
stratio       -0.0525      0.006     -8.894      0.000      -0.064      -0.041
==============================================================================
Omnibus:                       61.317   Durbin-Watson:                   0.682
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              480.143
Skew:                           0.051   Prob(JB):                    5.47e-105
Kurtosis:                       7.771   Cond. No.                         560.
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.


## Equation (6.7)#

# Equation (6.7): a smaller model with rooms as the only house characteristic.
model_6_7 = smf.ols(formula='lprice ~ lnox + rooms + 1', data=df)
hprice_eq6_7 = model_6_7.fit()
print(hprice_eq6_7.summary())

                            OLS Regression Results
==============================================================================
Dep. Variable:                 lprice   R-squared:                       0.514
Method:                 Least Squares   F-statistic:                     265.7
Date:                Mon, 11 Dec 2023   Prob (F-statistic):           1.79e-79
Time:                        18:36:44   Log-Likelihood:                -83.009
No. Observations:                 506   AIC:                             172.0
Df Residuals:                     503   BIC:                             184.7
Df Model:                           2
Covariance Type:            nonrobust
==============================================================================
coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      9.2337      0.188     49.184      0.000       8.865       9.603
lnox          -0.7177      0.066    -10.818      0.000      -0.848      -0.587
rooms          0.3059      0.019     16.086      0.000       0.269       0.343
==============================================================================
Omnibus:                       52.327   Durbin-Watson:                   0.603
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              327.013
Skew:                           0.042   Prob(JB):                     9.77e-72
Kurtosis:                       6.937   Cond. No.                         102.
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.


## Equation (6.12)#

df = dataWoo('wage1')

# Equation (6.12): wage as a quadratic in experience
# (expersq is already present in the wage1 dataset).
wage_model = smf.ols(formula='wage ~ exper + expersq + 1', data=df)
wage_exp = wage_model.fit()
print(wage_exp.summary())

                            OLS Regression Results
==============================================================================
Dep. Variable:                   wage   R-squared:                       0.093
Method:                 Least Squares   F-statistic:                     26.74
Date:                Mon, 11 Dec 2023   Prob (F-statistic):           8.77e-12
Time:                        18:36:44   Log-Likelihood:                -1407.5
No. Observations:                 526   AIC:                             2821.
Df Residuals:                     523   BIC:                             2834.
Df Model:                           2
Covariance Type:            nonrobust
==============================================================================
coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      3.7254      0.346     10.769      0.000       3.046       4.405
exper          0.2981      0.041      7.277      0.000       0.218       0.379
expersq       -0.0061      0.001     -6.792      0.000      -0.008      -0.004
==============================================================================
Omnibus:                      203.746   Durbin-Watson:                   1.802
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              721.819
Skew:                           1.806   Prob(JB):                    1.82e-157
Kurtosis:                       7.460   Cond. No.                     1.76e+03
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 1.76e+03. This might indicate that there are
strong multicollinearity or other numerical problems.


## Example 6.2. Effects of pollution on housing prices#

df = dataWoo('hprice2')

# Derived regressors: log distance and a quadratic in rooms.
df['ldist'] = np.log(df['dist'])
df['roomsq'] = df['rooms'] ** 2

hprice_roomsq = smf.ols(
    formula='lprice ~ lnox + ldist + rooms + roomsq + stratio + 1',
    data=df,
).fit()
print(hprice_roomsq.summary())

                            OLS Regression Results
==============================================================================
Dep. Variable:                 lprice   R-squared:                       0.603
Method:                 Least Squares   F-statistic:                     151.8
Date:                Mon, 11 Dec 2023   Prob (F-statistic):           7.89e-98
Time:                        18:36:44   Log-Likelihood:                -31.806
No. Observations:                 506   AIC:                             75.61
Df Residuals:                     500   BIC:                             101.0
Df Model:                           5
Covariance Type:            nonrobust
==============================================================================
coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     13.3855      0.566     23.630      0.000      12.273      14.498
lnox          -0.9017      0.115     -7.862      0.000      -1.127      -0.676
ldist         -0.0868      0.043     -2.005      0.045      -0.172      -0.002
rooms         -0.5451      0.165     -3.295      0.001      -0.870      -0.220
roomsq         0.0623      0.013      4.862      0.000       0.037       0.087
stratio       -0.0476      0.006     -8.129      0.000      -0.059      -0.036
==============================================================================
Omnibus:                       56.649   Durbin-Watson:                   0.691
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              384.168
Skew:                          -0.100   Prob(JB):                     3.79e-84
Kurtosis:                       7.264   Cond. No.                     2.30e+03
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 2.3e+03. This might indicate that there are
strong multicollinearity or other numerical problems.


## Example 6.3. Effects of attendance on final exam performance#

df = dataWoo('attend')

# Quadratics in prior GPA and ACT plus an attendance-by-priGPA interaction.
df['priGPAsq'] = df['priGPA'] ** 2
df['ACTsq'] = df['ACT'] ** 2
df['priGPA_atndrte'] = df['atndrte'] * df['priGPA']

# (variable name kept as in the original script)
attned_perf = smf.ols(
    formula='stndfnl ~ atndrte + priGPA + ACT + priGPAsq + ACTsq + priGPA_atndrte + 1',
    data=df,
).fit()
print(attned_perf.summary())

                            OLS Regression Results
==============================================================================
Dep. Variable:                stndfnl   R-squared:                       0.229
Method:                 Least Squares   F-statistic:                     33.25
Date:                Mon, 11 Dec 2023   Prob (F-statistic):           3.49e-35
Time:                        18:36:44   Log-Likelihood:                -868.90
No. Observations:                 680   AIC:                             1752.
Df Residuals:                     673   BIC:                             1783.
Df Model:                           6
Covariance Type:            nonrobust
==================================================================================
coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------------------------------------------
Intercept          2.0503      1.360      1.507      0.132      -0.621       4.721
atndrte           -0.0067      0.010     -0.656      0.512      -0.027       0.013
priGPA            -1.6285      0.481     -3.386      0.001      -2.573      -0.684
ACT               -0.1280      0.098     -1.300      0.194      -0.321       0.065
priGPAsq           0.2959      0.101      2.928      0.004       0.097       0.494
ACTsq              0.0045      0.002      2.083      0.038       0.000       0.009
priGPA_atndrte     0.0056      0.004      1.294      0.196      -0.003       0.014
==============================================================================
Omnibus:                        2.581   Durbin-Watson:                   2.279
Prob(Omnibus):                  0.275   Jarque-Bera (JB):                2.474
Skew:                          -0.095   Prob(JB):                        0.290
Kurtosis:                       3.226   Cond. No.                     2.43e+04
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 2.43e+04. This might indicate that there are
strong multicollinearity or other numerical problems.


## Example 6.4. CEO compensation and firm performance#

df = dataWoo('ceosal1')

# Level and log specifications of the CEO salary equation, side by side.
salary_lin = smf.ols(formula='salary ~ sales + roe + 1', data=df).fit()
salary_log = smf.ols(formula='lsalary ~ lsales + roe + 1', data=df).fit()

comparison = summary_col(
    [salary_lin, salary_log],
    stars=True,
    float_format='%0.3f',
    model_names=['salary_lin', 'salary_log'],
    info_dict={
        'N': lambda x: "{0:d}".format(int(x.nobs)),
        'R2': lambda x: "{:.3f}".format(x.rsquared),
    },
)
print(comparison)

====================================
salary_lin salary_log
------------------------------------
Intercept      830.631*** 4.362***
(223.905)  (0.294)
R-squared      0.029      0.282
lsales                    0.275***
(0.033)
roe            19.631*    0.018***
(11.077)   (0.004)
sales          0.016*
(0.009)
N              209        209
R2             0.029      0.282
====================================
Standard errors in parentheses.
* p<.1, ** p<.05, ***p<.01


## Example 6.5. Confidence interval for predicted college GPA#

df = dataWoo('gpa2')

# Quadratic in high-school graduating-class size.
df['hsizesq'] = df['hsize'] ** 2

# College GPA on SAT, high-school percentile and class size (with square).
gpa_model = smf.ols(formula='colgpa ~ sat + hsperc + hsize + hsizesq + 1', data=df)
gpa_lin = gpa_model.fit()
print(gpa_lin.summary())

                            OLS Regression Results
==============================================================================
Dep. Variable:                 colgpa   R-squared:                       0.278
Method:                 Least Squares   F-statistic:                     398.0
Date:                Mon, 11 Dec 2023   Prob (F-statistic):          2.13e-290
Time:                        18:36:44   Log-Likelihood:                -3467.9
No. Observations:                4137   AIC:                             6946.
Df Residuals:                    4132   BIC:                             6978.
Df Model:                           4
Covariance Type:            nonrobust
==============================================================================
coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      1.4927      0.075     19.812      0.000       1.345       1.640
sat            0.0015   6.52e-05     22.886      0.000       0.001       0.002
hsperc        -0.0139      0.001    -24.698      0.000      -0.015      -0.013
hsize         -0.0609      0.017     -3.690      0.000      -0.093      -0.029
hsizesq        0.0055      0.002      2.406      0.016       0.001       0.010
==============================================================================
Omnibus:                      194.990   Durbin-Watson:                   1.879
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              239.365
Skew:                          -0.497   Prob(JB):                     1.05e-52
Kurtosis:                       3.633   Cond. No.                     9.02e+03
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 9.02e+03. This might indicate that there are
strong multicollinearity or other numerical problems.

# Re-center the regressors at the prediction point (sat=1200, hsize=5,
# hsperc=30) so the intercept of the refit equals the predicted GPA there,
# with a directly usable standard error.
df['sat0'] = df['sat'] - 1200
df['hsize0'] = df['hsize'] - 5
df['hsperc0'] = df['hsperc'] - 30
df['hsize0sq'] = df['hsize0'] ** 2

gpa_predict = smf.ols(formula='colgpa ~ sat0 + hsperc0 + hsize0 + hsize0sq + 1', data=df).fit()
print(gpa_predict.summary())

                            OLS Regression Results
==============================================================================
Dep. Variable:                 colgpa   R-squared:                       0.278
Method:                 Least Squares   F-statistic:                     398.0
Date:                Mon, 11 Dec 2023   Prob (F-statistic):          2.13e-290
Time:                        18:36:44   Log-Likelihood:                -3467.9
No. Observations:                4137   AIC:                             6946.
Df Residuals:                    4132   BIC:                             6978.
Df Model:                           4
Covariance Type:            nonrobust
==============================================================================
coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      2.7001      0.020    135.833      0.000       2.661       2.739
sat0           0.0015   6.52e-05     22.886      0.000       0.001       0.002
hsperc0       -0.0139      0.001    -24.698      0.000      -0.015      -0.013
hsize0        -0.0063      0.009     -0.730      0.465      -0.023       0.011
hsize0sq       0.0055      0.002      2.406      0.016       0.001       0.010
==============================================================================
Omnibus:                      194.990   Durbin-Watson:                   1.879
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              239.365
Skew:                          -0.497   Prob(JB):                     1.05e-52
Kurtosis:                       3.633   Cond. No.                         503.
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

# Fix: the first column holds gpa_lin but was labelled 'salary_lin'
# (copy-paste from Example 6.4); label corrected to 'gpa_lin'.
print(summary_col([gpa_lin, gpa_predict], stars=True, float_format='%0.3f',
                  model_names=['gpa_lin', 'gpa_predict'],
                  info_dict={'N': lambda x: "{0:d}".format(int(x.nobs)),
                             'R2': lambda x: "{:.3f}".format(x.rsquared)}))

=====================================
salary_lin gpa_predict
-------------------------------------
Intercept      1.493***   2.700***
(0.075)    (0.020)
R-squared      0.278      0.278
hsize          -0.061***
(0.017)
hsize0                    -0.006
(0.009)
hsize0sq                  0.005**
(0.002)
hsizesq        0.005**
(0.002)
hsperc         -0.014***
(0.001)
hsperc0                   -0.014***
(0.001)
sat            0.001***
(0.000)
sat0                      0.001***
(0.000)
N              4137       4137
R2             0.278      0.278
=====================================
Standard errors in parentheses.
* p<.1, ** p<.05, ***p<.01


## Example 6.6. Confidence Interval for Future College GPA#

# NOTE(review): identical refit of the gpa_lin model from Example 6.5 —
# same formula and data, so the summary below repeats the earlier one.
gpa_lin = smf.ols(formula='colgpa ~ sat + hsperc + hsize + hsizesq + 1', data=df).fit()
print(gpa_lin.summary())

                            OLS Regression Results
==============================================================================
Dep. Variable:                 colgpa   R-squared:                       0.278
Method:                 Least Squares   F-statistic:                     398.0
Date:                Mon, 11 Dec 2023   Prob (F-statistic):          2.13e-290
Time:                        18:36:45   Log-Likelihood:                -3467.9
No. Observations:                4137   AIC:                             6946.
Df Residuals:                    4132   BIC:                             6978.
Df Model:                           4
Covariance Type:            nonrobust
==============================================================================
coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      1.4927      0.075     19.812      0.000       1.345       1.640
sat            0.0015   6.52e-05     22.886      0.000       0.001       0.002
hsperc        -0.0139      0.001    -24.698      0.000      -0.015      -0.013
hsize         -0.0609      0.017     -3.690      0.000      -0.093      -0.029
hsizesq        0.0055      0.002      2.406      0.016       0.001       0.010
==============================================================================
Omnibus:                      194.990   Durbin-Watson:                   1.879
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              239.365
Skew:                          -0.497   Prob(JB):                     1.05e-52
Kurtosis:                       3.633   Cond. No.                     9.02e+03
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 9.02e+03. This might indicate that there are
strong multicollinearity or other numerical problems.

# Point prediction at sat=1200, hsperc=30, hsize=5 using the (rounded)
# coefficients printed above; matches the intercept of the re-centered
# regression in Example 6.5 (about 2.70).
predicted_value = (
    1.4927
    + 0.0015 * 1200
    - 0.0139 * 30
    - 0.0609 * 5
    + 0.0055 * 5 * 5
)
predicted_value

2.7087


## Example 6.7. Predicting CEO log(salary)#

df = dataWoo('ceosal2')

# Step 1: log-salary regression whose fitted values are later rescaled
# back to salary levels.
ceo_model = smf.ols(formula='lsalary ~ lsales + lmktval + ceoten + 1', data=df)
ceo_step1 = ceo_model.fit()
print(ceo_step1.summary())

                            OLS Regression Results
==============================================================================
Dep. Variable:                lsalary   R-squared:                       0.318
Method:                 Least Squares   F-statistic:                     26.91
Date:                Mon, 11 Dec 2023   Prob (F-statistic):           2.47e-14
Time:                        18:36:45   Log-Likelihood:                -128.12
No. Observations:                 177   AIC:                             264.2
Df Residuals:                     173   BIC:                             276.9
Df Model:                           3
Covariance Type:            nonrobust
==============================================================================
coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      4.5038      0.257     17.509      0.000       3.996       5.012
lsales         0.1629      0.039      4.150      0.000       0.085       0.240
lmktval        0.1092      0.050      2.203      0.029       0.011       0.207
ceoten         0.0117      0.005      2.198      0.029       0.001       0.022
==============================================================================
Omnibus:                       25.596   Durbin-Watson:                   2.044
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              123.522
Skew:                          -0.291   Prob(JB):                     1.51e-27
Kurtosis:                       7.051   Cond. No.                         95.3
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

# Residuals of the log model (equivalent to lsalary - predicted values);
# the mean of exp(uhat) is the smearing factor used to undo the log.
uhat = ceo_step1.resid
ehat = np.exp(uhat)
ehat.mean()

1.135661326663072

# Step 2: exponentiate the fitted log-salaries, then regress salary on that
# scale variable through the origin; the slope is the rescaling coefficient.
mhat = np.exp(ceo_step1.predict())
ceo_step2 = smf.ols(formula='salary ~ mhat + 0', data=df).fit()
ceo_step2.params

mhat    1.116857
dtype: float64

# NOTE(review): refits exactly the same model as ceo_step1 (same formula and
# data); kept only to mirror the worked example's step numbering.
ceo_step3= smf.ols(formula='lsalary ~ lsales + lmktval + ceoten + 1', data=df).fit()
ceo_step3.params

Intercept    4.503795
lsales       0.162854
lmktval      0.109243
ceoten       0.011705
dtype: float64

# Predicted log(salary) at sales=5000, mktval=10000, ceoten=10, plugging
# the rounded step-1 coefficients into the fitted equation.
ceo_step3_pred = (
    4.5038
    + 0.1629 * np.log(5000)
    + 0.1092 * np.log(10000)
    + 0.0117 * 10
)
ceo_step3_pred

7.014019939501504

# NOTE(review): same through-the-origin regression as ceo_step2; its slope
# on mhat is reused below to rescale the level prediction.
ceo_step4 = smf.ols(formula='salary ~ mhat + 0', data=df).fit()
ceo_step4.params

mhat    1.116857
dtype: float64

# Level-scale salary prediction: slope from step 4 times exp(predicted log
# salary) — about 1241 in the dataset's salary units.
ceo_step4_pred = np.exp(7.013) * 1.117
ceo_step4_pred

1240.9674054171805


## Example 6.8. Predicting CEO salary#

# Correlation between actual salary and the exp(fitted log-salary) variable;
# pearsonr returns the correlation coefficient and its p-value.
corr = ss.pearsonr(df.salary, mhat)
print(corr)  # correlation coefficient and p-value

PearsonRResult(statistic=0.49303222976474653, pvalue=3.1363578178118826e-12)

# Linear (level) salary model, for comparison with the log specification.
ceo_sal_model = smf.ols(formula='salary ~ sales + mktval + ceoten + 1', data=df)
ceo_sal = ceo_sal_model.fit()
print(ceo_sal.summary())

                            OLS Regression Results
==============================================================================
Dep. Variable:                 salary   R-squared:                       0.201
Method:                 Least Squares   F-statistic:                     14.53
Date:                Mon, 11 Dec 2023   Prob (F-statistic):           1.74e-08
Time:                        18:36:45   Log-Likelihood:                -1359.3
No. Observations:                 177   AIC:                             2727.
Df Residuals:                     173   BIC:                             2739.
Df Model:                           3
Covariance Type:            nonrobust
==============================================================================
coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept    613.4361     65.237      9.403      0.000     484.673     742.199
sales          0.0190      0.010      1.891      0.060      -0.001       0.039
mktval         0.0234      0.009      2.468      0.015       0.005       0.042
ceoten        12.7034      5.618      2.261      0.025       1.615      23.792
==============================================================================
Omnibus:                      187.324   Durbin-Watson:                   2.166
Prob(Omnibus):                  0.000   Jarque-Bera (JB):             6405.956
Skew:                           3.916   Prob(JB):                         0.00
Kurtosis:                      31.412   Cond. No.                     1.59e+04
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 1.59e+04. This might indicate that there are
strong multicollinearity or other numerical problems.