Chapter 02 - The Simple Regression Model

import stata_setup
stata_setup.config("C:/Program Files/Stata18/", "se", splash=False)

Problem 2.1. Papke (1995), 401k

a. Average prate & mrate

%%stata
use 401K.dta, clear 
d, short
mean pra mra
. use 401K.dta, clear 

. d, short

Contains data from 401K.dta
 Observations:         1,534                  
    Variables:             8                  9 Jun 1998 08:20
Sorted by: 

. mean pra mra
Mean estimation                          Number of obs = 1,534

--------------------------------------------------------------
             |       Mean   Std. err.     [95% conf. interval]
-------------+------------------------------------------------
       prate |   87.36291   .4268091      86.52572     88.2001
       mrate |   .7315124   .0199033      .6924718     .770553
--------------------------------------------------------------
. 

b & c. Regress prate on mrate; interpret the intercept & the slope coefficient.

%%stata
reg prate mrate
      Source |       SS           df       MS      Number of obs   =     1,534
-------------+----------------------------------   F(1, 1532)      =    123.68
       Model |  32001.7271         1  32001.7271   Prob > F        =    0.0000
    Residual |  396383.812     1,532   258.73617   R-squared       =    0.0747
-------------+----------------------------------   Adj R-squared   =    0.0741
       Total |  428385.539     1,533  279.442622   Root MSE        =    16.085

------------------------------------------------------------------------------
       prate | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
       mrate |   5.861079   .5270107    11.12   0.000      4.82734    6.894818
       _cons |   83.07546   .5632844   147.48   0.000     81.97057    84.18035
------------------------------------------------------------------------------

d. Predicted prate at mrate = 3.5

%%stata
display _b[_cons] + _b[mrate]*3.5 
103.58923
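As a quick cross-check outside Stata, the same prediction can be reproduced in plain Python from the coefficients reported in the regression table. This is only a sketch: the numbers are copied by hand from the output, and the names `b0`, `b1`, and `predict_prate` are illustrative, not part of the original notebook. Because prate is a participation rate in percent, a fitted value above 100 cannot be taken literally.

```python
# Coefficients copied by hand from the `reg prate mrate` output.
b0 = 83.07546   # intercept (_cons)
b1 = 5.861079   # slope on mrate

def predict_prate(mrate):
    """Fitted participation rate (in percent) for a given match rate."""
    return b0 + b1 * mrate

pred = predict_prate(3.5)
print(round(pred, 4))
# The fitted value lies outside the feasible 0-100 range for a percentage.
print(pred > 100)
```

The sample mean of mrate is about 0.73, so mrate = 3.5 is far from the bulk of the data, which is one reason the linear fit gives an impossible value there.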

e. How much of the variation in prate is explained by mrate? Is it a lot?

%%stata
display " R-squared  = " e(r2)
 R-squared  = .0747031
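The R-squared that Stata stores in `e(r2)` is the explained sum of squares divided by the total sum of squares. A minimal sketch in Python, using the two sums of squares copied by hand from the ANOVA block of the regression output (variable names are illustrative):

```python
# Sums of squares copied from the `reg prate mrate` output.
ss_model = 32001.7271    # explained (Model) sum of squares
ss_total = 428385.539    # total sum of squares

r2 = ss_model / ss_total
print(f"R-squared = {r2:.4f}")
print(f"share of variation left unexplained = {1 - r2:.4f}")
```

Roughly 7.5% of the variation in prate is explained by mrate, so most of the variation is left to other factors.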

Problem 2.2.

a. Average log salary (lsalary) & average tenure (ceoten, comten)

%%stata
use ceosal2.dta , clear
d, short
mean lsalary ceoten comten  
. use ceosal2.dta , clear

. d, short

Contains data from ceosal2.dta
 Observations:           177                  
    Variables:            15                  17 Aug 1999 23:14
Sorted by: 

. mean lsalary ceoten comten  
Mean estimation                            Number of obs = 177

--------------------------------------------------------------
             |       Mean   Std. err.     [95% conf. interval]
-------------+------------------------------------------------
     lsalary |   6.582848   .0455542      6.492945     6.67275
      ceoten |   7.954802    .537489      6.894049    9.015555
      comten |   22.50282   .9241289      20.67902    24.32662
--------------------------------------------------------------

. 

b. Number of CEOs in their first year (ceoten = 0) and the longest tenure as CEO

%%stata
count if ceoten==0  
sum ceoten 
display r(max)
. count if ceoten==0  
  5

. sum ceoten 

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
      ceoten |        177    7.954802    7.150826          0         37

. display r(max)
37

. 

c. OLS of lsalary on ceoten, …

%%stata
reg lsalary ceoten
      Source |       SS           df       MS      Number of obs   =       177
-------------+----------------------------------   F(1, 175)       =      2.33
       Model |  .850907024         1  .850907024   Prob > F        =    0.1284
    Residual |   63.795306       175  .364544606   R-squared       =    0.0132
-------------+----------------------------------   Adj R-squared   =    0.0075
       Total |  64.6462131       176  .367308029   Root MSE        =    .60378

------------------------------------------------------------------------------
     lsalary | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
      ceoten |   .0097236   .0063645     1.53   0.128    -.0028374    .0222846
       _cons |   6.505498   .0679911    95.68   0.000      6.37131    6.639686
------------------------------------------------------------------------------
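Since the dependent variable is in logs, the slope on ceoten is usually read as an approximate percentage change in salary per extra year as CEO. The sketch below compares that approximation with the exact percentage change implied by a log-level model; the coefficient is copied by hand from the table and the variable names are illustrative.

```python
import math

# Slope copied from the `reg lsalary ceoten` output.
b_ceoten = 0.0097236

# Log-level model: 100*b is the approximate % change in salary per extra year.
approx_pct = 100 * b_ceoten
# Exact % change implied by the same coefficient.
exact_pct = 100 * (math.exp(b_ceoten) - 1)

print(f"approximate: {approx_pct:.3f}%  exact: {exact_pct:.3f}%")
```

For a coefficient this small the two numbers are nearly identical, a bit under 1% per year.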

Problem 2.3. sleep75.dta (Biddle & Hamermesh, 1990)

a. OLS of sleep on totwrk; report the results in equation form. Interpret the intercept.

%%stata
use sleep75.dta , clear
d sleep totwrk, short
reg sleep totwrk
. use sleep75.dta , clear

. d sleep totwrk, short

Variable      Storage   Display    Value
    name         type    format    label      Variable label
-------------------------------------------------------------------------------
sleep           int     %9.0g                 mins sleep at night, per wk
totwrk          int     %9.0g                 mins worked per week

. reg sleep totwrk

      Source |       SS           df       MS      Number of obs   =       706
-------------+----------------------------------   F(1, 704)       =     81.09
       Model |  14381717.2         1  14381717.2   Prob > F        =    0.0000
    Residual |   124858119       704  177355.282   R-squared       =    0.1033
-------------+----------------------------------   Adj R-squared   =    0.1020
       Total |   139239836       705  197503.313   Root MSE        =    421.14

------------------------------------------------------------------------------
       sleep | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
      totwrk |  -.1507458   .0167403    -9.00   0.000    -.1836126    -.117879
       _cons |   3586.377   38.91243    92.17   0.000     3509.979    3662.775
------------------------------------------------------------------------------
. 

b. If totwrk increases by 2 hours, by how much is sleep estimated to fall?

%%stata
display _b[totwrk]*2*60
-18.089499
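Both sleep and totwrk are measured in minutes per week, so the two extra hours have to be converted to 120 minutes before multiplying by the slope. A small Python sketch of the same arithmetic, with the coefficient copied by hand from the table (variable names are illustrative):

```python
# Slope copied from the `reg sleep totwrk` output (minutes of sleep per minute worked).
b_totwrk = -0.1507458

extra_work_minutes = 2 * 60                    # two more hours of work per week
delta_sleep_week = b_totwrk * extra_work_minutes
delta_sleep_night = delta_sleep_week / 7       # spread evenly over seven nights

print(f"change in weekly sleep:  {delta_sleep_week:.2f} minutes")
print(f"change in nightly sleep: {delta_sleep_night:.2f} minutes")
```

The estimated fall is about 18 minutes per week, which is under 3 minutes per night.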

Problem 2.4. wage2: OLS of wage on IQ

a. Average wage, average IQ, and sample SD of IQ

%%stata
use wage2.dta, clear
sum wage IQ
. use wage2.dta, clear

. sum wage IQ

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
        wage |        935    957.9455    404.3608        115       3078
          IQ |        935    101.2824    15.05264         50        145

. 

b. Effect of a 15-point increase in IQ on wage (constant dollar amount)

%%stata
reg wage IQ
display "wage= " %5.3f _b[_cons] "+" %5.3f _b[IQ] "IQ; N=" _N ",Rsq=" %5.4f e(r2) 
display _b[IQ]*15
. reg wage IQ

      Source |       SS           df       MS      Number of obs   =       935
-------------+----------------------------------   F(1, 933)       =     98.55
       Model |  14589782.6         1  14589782.6   Prob > F        =    0.0000
    Residual |   138126386       933  148045.429   R-squared       =    0.0955
-------------+----------------------------------   Adj R-squared   =    0.0946
       Total |   152716168       934  163507.675   Root MSE        =    384.77

------------------------------------------------------------------------------
        wage | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
          IQ |   8.303064   .8363951     9.93   0.000     6.661631    9.944498
       _cons |   116.9916   85.64153     1.37   0.172    -51.08078    285.0639
------------------------------------------------------------------------------

. display "wage= " %5.3f _b[_cons] "+" %5.3f _b[IQ] "IQ; N=" _N ",Rsq=" %5.4f e
> (r2) 
wage= 116.992+8.303IQ; N=935,Rsq=0.0955

. display _b[IQ]*15
124.54596

. 

c. Effect of a 15-point increase in IQ on wage (percentage)

%%stata
reg lwage IQ
display "lwage= " %5.3f _b[_cons] "+" %5.3f _b[IQ] "IQ; N=" _N ",Rsq=" %5.4f e(r2) 
display "0" _b[IQ]*15
. reg lwage IQ

      Source |       SS           df       MS      Number of obs   =       935
-------------+----------------------------------   F(1, 933)       =    102.62
       Model |  16.4150939         1  16.4150939   Prob > F        =    0.0000
    Residual |  149.241189       933  .159958402   R-squared       =    0.0991
-------------+----------------------------------   Adj R-squared   =    0.0981
       Total |  165.656283       934  .177362188   Root MSE        =    .39995

------------------------------------------------------------------------------
       lwage | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
          IQ |   .0088072   .0008694    10.13   0.000      .007101    .0105134
       _cons |   5.886994   .0890206    66.13   0.000     5.712291    6.061698
------------------------------------------------------------------------------

. display "lwage= " %5.3f _b[_cons] "+" %5.3f _b[IQ] "IQ; N=" _N ",Rsq=" %5.4f 
> e(r2) 
lwage= 5.887+0.009IQ; N=935,Rsq=0.0991

. display "0" _b[IQ]*15
0.13210734

. 

Problem 2.5 rdchem: R&D on sales

a. Model for elasticity?

\(\log(rd)=\beta_0 +\beta_1 \log(sales) + u\); \(\beta_1\) is the elasticity of rd with respect to sales.

b. Estimate \(\beta_1\)

%%stata
use rdchem.dta , clear
reg lrd lsale  
. use rdchem.dta , clear

. reg lrd lsale  

      Source |       SS           df       MS      Number of obs   =        32
-------------+----------------------------------   F(1, 30)        =    302.72
       Model |  84.8395785         1  84.8395785   Prob > F        =    0.0000
    Residual |  8.40768588        30  .280256196   R-squared       =    0.9098
-------------+----------------------------------   Adj R-squared   =    0.9068
       Total |  93.2472644        31  3.00797627   Root MSE        =    .52939
------------------------------------------------------------------------------
         lrd | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
      lsales |   1.075731   .0618275    17.40   0.000     .9494619    1.201999
       _cons |  -4.104722   .4527678    -9.07   0.000    -5.029398   -3.180047
------------------------------------------------------------------------------

. 
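The estimated elasticity of about 1.08 means that a 1% increase in sales is associated with roughly a 1.08% increase in R&D spending. As a hedged illustration, the sketch below applies the estimate to a hypothetical 10% increase in sales (the 10% figure is an assumption chosen for the example, not part of the problem) and compares the linear approximation with the exact constant-elasticity calculation.

```python
# Elasticity copied from the `reg lrd lsale` output.
elasticity = 1.075731
pct_sales = 10   # hypothetical % increase in sales, chosen for illustration

approx_pct_rd = elasticity * pct_sales
# Exact % change under a constant-elasticity (log-log) model.
exact_pct_rd = 100 * ((1 + pct_sales / 100) ** elasticity - 1)

print(f"approximate: {approx_pct_rd:.2f}%  exact: {exact_pct_rd:.2f}%")
```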

Problem 2.6 meap93: math pass rate (math10) & spending per student (expend)

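a. Diminishing effect

b. \(math10 = \beta_0 + \beta_1 \ln(expend) + u\). Differentiating, \(\frac{d\,math10}{d\,expend} = \frac{d\,math10}{d\ln(expend)}\cdot\frac{d\ln(expend)}{d\,expend} = \frac{\beta_1}{expend}\), so \(\Delta math10 \approx \beta_1\,\frac{\Delta expend}{expend} = \frac{\beta_1}{100}\,(\%\Delta expend)\). A 10% increase in expend therefore changes math10 by about \(\beta_1/10\) percentage points.

c. OLS of \(math10\) on \(lexpend\)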

%%stata
use meap93.dta, clear
reg math10 lexpend
display "math10= " %5.3f _b[_cons] "+" %5.3f _b[lexpend] "log(expend); N=" _N ",Rsq=" %5.4f e(r2) 
. use meap93.dta, clear

. reg math10 lexpend

      Source |       SS           df       MS      Number of obs   =       408
-------------+----------------------------------   F(1, 406)       =     12.41
       Model |  1329.42517         1  1329.42517   Prob > F        =    0.0005
    Residual |  43487.7553       406  107.112698   R-squared       =    0.0297
-------------+----------------------------------   Adj R-squared   =    0.0273
       Total |  44817.1805       407  110.115923   Root MSE        =     10.35
------------------------------------------------------------------------------
      math10 | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
     lexpend |   11.16439   3.169011     3.52   0.000     4.934677    17.39411
       _cons |   -69.3411   26.53013    -2.61   0.009    -121.4947   -17.18753
------------------------------------------------------------------------------

. display "math10= " %5.3f _b[_cons] "+" %5.3f _b[lexpend] "log(expend); N=" _N
>  ",Rsq=" %5.4f e(r2) 
math10= -69.341+11.164log(expend); N=408,Rsq=0.0297

. 

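d. How big is the estimated effect? If spending increases by 10%, math10 changes by about \(\hat\beta_1/10\) percentage points (the % sign in the output stands for percentage points).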

%%stata
display  _b[lexpend]/10 "%"
1.1164395%
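The division by 10 relies on the approximation that a 10% increase in expend raises log(expend) by 0.10. A short sketch comparing it with the exact change in log(expend), which is log(1.10); the coefficient is copied by hand from the table and the variable names are illustrative.

```python
import math

# Slope copied from the `reg math10 lexpend` output.
b_lexpend = 11.16439

# Level-log model: (b/100) * %change is the approximate change in math10.
approx_points = b_lexpend / 100 * 10
# Exact change in the fitted value when expend is multiplied by 1.10.
exact_points = b_lexpend * math.log(1.10)

print(f"approximate: {approx_points:.3f} points  exact: {exact_points:.3f} points")
```

Either way, the estimated effect of a 10% spending increase is close to one percentage point in the pass rate.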

e. Why is "math10>100" not much of a worry in this data set?

Problem 2.7 charity: gifts and mailings; imported from R (wooldridge package)

a & b. Average gift, share of people with no gift, and mailings per year (mean, minimum, maximum)

%%stata
use charity.dta, clear
sum gift mails
count if gift==0
display 100*r(N)/4268 "%"
. use charity.dta, clear
(Written by R.              )

. sum gift mails

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
        gift |      4,268     7.44447    15.06256          0        250
   mailsyear |      4,268    2.049555      .66758        .25        3.5

. count if gift==0
  2,561

. display 100*r(N)/4268 "%"
60.004686%

. 

c. Regress gift on mailings per year (mailsyear)

%%stata
reg gift mails
display "gift= " %5.3f _b[_cons] "+" %5.3f _b[mails] "mails; N=" _N ",Rsq=" %5.4f e(r2) 
. reg gift mails

      Source |       SS           df       MS      Number of obs   =     4,268
-------------+----------------------------------   F(1, 4266)      =     59.65
       Model |  13349.7251         1  13349.7251   Prob > F        =    0.0000
    Residual |  954750.114     4,266  223.804528   R-squared       =    0.0138
-------------+----------------------------------   Adj R-squared   =    0.0136
       Total |   968099.84     4,267  226.880675   Root MSE        =     14.96

------------------------------------------------------------------------------
        gift | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
   mailsyear |   2.649546   .3430598     7.72   0.000     1.976971    3.322122
       _cons |    2.01408   .7394696     2.72   0.006     .5643347    3.463825
------------------------------------------------------------------------------

. display "gift= " %5.3f _b[_cons] "+" %5.3f _b[mails] "mails; N=" _N ",Rsq=" %
> 5.4f e(r2) 
gift= 2.014+2.650mails; N=4268,Rsq=0.0138

. 

d. Does the charity make profit if per unit cost of mailing is one guilder?

%%stata
display  _b[mails] - 1
1.6495464

e. The smallest predicted gift: the intercept, i.e. the fitted value at mailsyear = 0 (note that the sample minimum of mailsyear is .25)

%%stata
margins, at(mail=0)
Adjusted predictions                                     Number of obs = 4,268
Model VCE: OLS

Expression: Linear prediction, predict()
At: mailsyear = 0

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
       _cons |    2.01408   .7394696     2.72   0.006     .5643347    3.463825
------------------------------------------------------------------------------
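The `margins` call evaluates the fitted line at mailsyear = 0, which lies outside the sample range reported by `sum` (minimum .25, maximum 3.5). As a complementary sketch, the fitted values at the sample minimum and maximum can be computed in Python from the coefficients copied by hand from the regression table (names are illustrative):

```python
# Coefficients copied from the `reg gift mails` output.
b0 = 2.01408     # intercept (_cons)
b1 = 2.649546    # slope on mailsyear

def predict_gift(mailsyear):
    """Fitted gift (in guilders) for a given number of mailings per year."""
    return b0 + b1 * mailsyear

gift_at_zero = predict_gift(0)       # what `margins, at(mail=0)` reports
gift_at_min = predict_gift(0.25)     # smallest mailsyear observed in the sample
gift_at_max = predict_gift(3.5)      # largest mailsyear observed in the sample

print(round(gift_at_zero, 3), round(gift_at_min, 3), round(gift_at_max, 3))
```

Because both coefficients are positive, the fitted gift is never below about 2 guilders for any non-negative mailsyear, so the model cannot predict a zero gift even though about 60% of the sample gave nothing.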

Problem 2.8

a. Generate 500 observations on x, uniform on [0, 10).

%%stata
clear
set obs 500
g x_ = uniform()
g x = x_ *10
sum x
. clear

. set obs 500
Number of observations (_N) was 0, now 500.

. g x_ = uniform()

. g x = x_ *10

. sum x

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
           x |        500    5.077128    2.924345    .013631   9.997507

. 

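b. Generate the errors u as 6 times a uniform draw. Note that this makes u uniform on [0, 6), so E(u) = 3 rather than 0.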

%%stata
g u_ = runiform() 
g u = u_ *6 
sum u
. g u_ = runiform() 

. g u = u_ *6 

. sum u

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
           u |        500    2.893636    1.738422   .0011084   5.987668

. 

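c. Generate y = 1 + 2x + u and regress y on x. Because E(u) = 3 in this design, the intercept estimate is close to 1 + 3 = 4; the slope estimate is close to 2.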

%%stata
g y = 1 + 2*x + u 
reg y x
. g y = 1 + 2*x + u 

. reg y x

      Source |       SS           df       MS      Number of obs   =       500
-------------+----------------------------------   F(1, 498)       =   5664.28
       Model |  17151.3257         1  17151.3257   Prob > F        =    0.0000
    Residual |  1507.93458       498  3.02798109   R-squared       =    0.9192
-------------+----------------------------------   Adj R-squared   =    0.9190
       Total |  18659.2603       499  37.3933072   Root MSE        =    1.7401

------------------------------------------------------------------------------
           y | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
           x |   2.004795   .0266378    75.26   0.000     1.952459    2.057131
       _cons |   3.869292   .1560343    24.80   0.000     3.562726    4.175859
------------------------------------------------------------------------------

. 

d. Compute the OLS residuals uh and check that uh and x*uh both average to zero.

%%stata
qui reg y x
predict uh, residual
g xuh=x*uh 
//verify if E(uh)=E(x'uh)=0 ; compare results with E(u)=E(x'u)=0. Discuss. 
sum xuh uh u
. qui reg y x

. predict uh, residual

. g xuh=x*uh 

. //verify if E(uh)=E(x'uh)=0 ; compare results with E(u)=E(x'u)=0. Discuss. 
. sum xuh uh u

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
         xuh |        500    2.68e-09    10.04505  -28.13847   26.71127
          uh |        500   -9.37e-10    1.738365  -2.895379   3.101357
           u |        500    2.893636    1.738422   .0011084   5.987668

. 

e. Repeat the check with the actual errors u in place of the residuals.

%%stata
g xu = x * u
sum xu xuh uh u 
. g xu = x * u

. sum xu xuh uh u 

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
          xu |        500    14.73229    13.00821   .0034057   53.14323
         xuh |        500    2.68e-09    10.04505  -28.13847   26.71127
          uh |        500   -9.37e-10    1.738365  -2.895379   3.101357
           u |        500    2.893636    1.738422   .0011084   5.987668

. 

f. Rerun 2 or 3 times and compare results and conclude!
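One way to rerun the experiment several times is to script it. The sketch below repeats the design of parts a to e in plain Python with three seeds; it is an illustration under stated assumptions, not a port of the Stata session. The seeds and all names (`simulate`, `results`) are hypothetical, and Python's random number generator differs from Stata's, so the numbers will not match the Stata output.

```python
import random

def simulate(seed, n=500):
    """One replication of the design used here: y = 1 + 2x + u."""
    rng = random.Random(seed)
    x = [10 * rng.random() for _ in range(n)]   # uniform on [0, 10)
    u = [6 * rng.random() for _ in range(n)]    # uniform on [0, 6), so E(u) = 3
    y = [1 + 2 * xi + ui for xi, ui in zip(x, u)]

    # OLS slope and intercept from the simple-regression formulas.
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar

    uh = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]   # OLS residuals
    return {
        "b0": b0,
        "b1": b1,
        "mean_uh": sum(uh) / n,
        "mean_x_uh": sum(xi * ei for xi, ei in zip(x, uh)) / n,
        "mean_u": sum(u) / n,
        "mean_x_u": sum(xi * ui for xi, ui in zip(x, u)) / n,
    }

results = [simulate(seed) for seed in (1, 2, 3)]   # hypothetical seeds
for r in results:
    print({k: round(v, 4) for k, v in r.items()})
```

In every replication the residual averages are zero up to rounding error, because the OLS first-order conditions force them to be. The averages of u and x*u vary from draw to draw and are not zero here, since the errors have mean 3.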

Problem 2.9 countymurders: 1996 only

a. How many counties had zero murders in 1996? How many had at least one execution, and what is the largest number of executions?

%%stata
use countymurders.dta, clear
keep if year==1996
count if murder==0 //counties with zero murder
count if execs>0 //counties with at least one execution
sum execs if murder>0
display r(max)
. use countymurders.dta, clear
(Written by R.              )
. keep if year==1996
(35,152 observations deleted)

. count if murder==0 //counties with zero murder
  1,051

. count if execs>0 //counties with at least one execution
  31

. sum execs if murder>0

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
       execs |      1,146    .0296684    .1937704          0          3

. display r(max)
3

. 

b. OLS of murders on execs; report the results in the usual way, including N and R²

%%stata
reg murders execs
display "murders= " %5.2f _b[_cons] "+" %5.2f _b[execs] "execs; N= " _N ",Rsq=" %5.4f e(r2)
. reg murders execs

      Source |       SS           df       MS      Number of obs   =     2,197
-------------+----------------------------------   F(1, 2195)      =    100.77
       Model |  152381.693         1  152381.693   Prob > F        =    0.0000
    Residual |  3319359.01     2,195  1512.23645   R-squared       =    0.0439
-------------+----------------------------------   Adj R-squared   =    0.0435
       Total |   3471740.7     2,196  1580.93839   Root MSE        =    38.887
------------------------------------------------------------------------------
     murders | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
       execs |   58.55548   5.833255    10.04   0.000      47.1162    69.99476
       _cons |   5.457241    .834838     6.54   0.000     3.820086    7.094396
------------------------------------------------------------------------------

. display "murders= " %5.2f _b[_cons] "+" %5.2f _b[execs] "execs; N= " _N ",Rsq
> =" %5.4f e(r2)
murders=  5.46+58.56execs; N= 2197,Rsq=0.0439

. 

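c. Interpret the slope coefficient.

d. The smallest number of murders the model can predict is the intercept, which is the fitted value when execs = 0. For a county with zero murders and zero executions, the residual is therefore minus the intercept.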

%%stata
display _b[_cons] + _b[execs]*0 
predict u, residual
sum u if murder==0 & execs==0 
. display _b[_cons] + _b[execs]*0 
5.4572409

. predict u, residual

. sum u if murder==0 & execs==0 

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
           u |      1,050   -5.457241           0  -5.457241  -5.457241

. 
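The residual is the actual value minus the fitted value, so every county with zero murders and zero executions gets the same residual. A minimal sketch of that arithmetic in Python, with the coefficients copied by hand from the regression table (names are illustrative):

```python
# Coefficients copied from the `reg murders execs` output.
b0 = 5.457241    # intercept (_cons)
b1 = 58.55548    # slope on execs

def fitted_murders(execs):
    """Fitted number of murders for a given number of executions."""
    return b0 + b1 * execs

# County with zero murders and zero executions: residual = actual - fitted.
residual_zero = 0 - fitted_murders(0)
print(round(residual_zero, 6))

# Fitted murders for a county with one execution.
print(round(fitted_murders(1), 2))
```

The model overpredicts murders by about 5.46 for each of these counties, which matches the constant residual in the `sum u` output.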

e. Why is OLS not suitable? Endogeneity issues: omitted variables, measurement error, simultaneity.

Problem 2.10

a. Sample size, mean & SD of math12 & read12.

%%stata
use catholic.dta, clear
sum math12 read12
. use catholic.dta, clear
(Written by R.              )

. sum math12 read12

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
      math12 |      7,430    52.13362    9.459117       29.5      71.37
      read12 |      7,430     51.7724    9.407761      29.15      68.09

. 

b. OLS of math12 on read12.

%%stata
reg math12 read12
display "math12= " %5.2f _b[_cons] "+" %5.2f _b[read12] "read12; N= " _N ",Rsq=" %5.4f e(r2) 
. reg math12 read12

      Source |       SS           df       MS      Number of obs   =     7,430
-------------+----------------------------------   F(1, 7428)      =   7568.58
       Model |  335470.113         1  335470.113   Prob > F        =    0.0000
    Residual |   329238.93     7,428  44.3240347   R-squared       =    0.5047
-------------+----------------------------------   Adj R-squared   =    0.5046
       Total |  664709.043     7,429  89.4749015   Root MSE        =    6.6576

------------------------------------------------------------------------------
      math12 | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
      read12 |   .7142915   .0082105    87.00   0.000     .6981966    .7303863
       _cons |   15.15304    .432036    35.07   0.000     14.30612    15.99995
------------------------------------------------------------------------------

. display "math12= " %5.2f _b[_cons] "+" %5.2f _b[read12] "read12; N= " _N ",Rs
> q=" %5.4f e(r2) 
math12= 15.15+ 0.71read12; N= 7430,Rsq=0.5047

. 

c. Interpret the intercept.

d. Are you surprised by the b1 that you found? What about R2?

e. I would run the reverse regression to refute the comment.

%%stata
reg  read12 math12
      Source |       SS           df       MS      Number of obs   =     7,430
-------------+----------------------------------   F(1, 7428)      =   7568.58
       Model |  331837.266         1  331837.266   Prob > F        =    0.0000
    Residual |  325673.561     7,428  43.8440443   R-squared       =    0.5047
-------------+----------------------------------   Adj R-squared   =    0.5046
       Total |  657510.828     7,429  88.5059668   Root MSE        =    6.6215

------------------------------------------------------------------------------
      read12 | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
      math12 |   .7065563   .0081216    87.00   0.000     .6906358    .7224769
       _cons |   14.93706   .4303184    34.71   0.000     14.09352    15.78061
------------------------------------------------------------------------------

Spurious correlation or causality?
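In simple regression the two directions are tied together mechanically: the product of the two slopes equals the squared sample correlation, which is also the R-squared of either regression. The sketch below checks this with the numbers copied by hand from the two tables (variable names are illustrative).

```python
import math

# Slopes copied from the two regression outputs.
b_math_on_read = 0.7142915   # reg math12 read12
b_read_on_math = 0.7065563   # reg read12 math12

# Sums of squares copied from the `reg math12 read12` output.
ss_model = 335470.113
ss_total = 664709.043

slope_product = b_math_on_read * b_read_on_math
r2 = ss_model / ss_total
corr = math.sqrt(r2)   # sample correlation (positive, since the slopes are positive)

print(f"product of slopes = {slope_product:.4f}")
print(f"R-squared         = {r2:.4f}")
print(f"correlation       = {corr:.4f}")
```

Both regressions fit equally well and have the same t statistic on the slope, so the regression output by itself cannot say which score, if either, causes the other.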