Chapter 2 - Simple Regression - Computer Exercises

-------------------------------------------------------------------------------------
      name:  SN
       log:  ~Wooldridge\intro-econx\iproblem2.smcl
  log type:  smcl
 opened on:  27 Jan 2019, 01:15:29

. **********************************************
. * Solomon Negash - Solutions to Computer Exercises
. * Wooldridge (2016). Introductory Econometrics: A Modern Approach. 6th ed.  
. * STATA Program, version 15.1. 

. * Chapter 2  - The Simple Regression Model
. * Computer Exercises (Problems)
. ******************** SETUP *********************

. *Problem 2.1. Papke1995 (401k) 
. use 401K.dta, clear 
. d, short
Contains data from 401K.dta
  obs:         1,534                          
 vars:             8                          9 Jun 1998 08:20
 size:        39,884                          
Sorted by: 

. //a. Average prate & mrate
. mean pra mra
Mean estimation                   Number of obs   =      1,534
--------------------------------------------------------------
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
       prate |   87.36291   .4268091      86.52572     88.2001
       mrate |   .7315124   .0199033      .6924718     .770553
--------------------------------------------------------------

. //b&c. Regress prate on mrate; interpret the intercept & slope coef.
. reg prate mrate
      Source |       SS           df       MS      Number of obs   =     1,534
-------------+----------------------------------   F(1, 1532)      =    123.68
       Model |  32001.7271         1  32001.7271   Prob > F        =    0.0000
    Residual |  396383.812     1,532   258.73617   R-squared       =    0.0747
-------------+----------------------------------   Adj R-squared   =    0.0741
       Total |  428385.539     1,533  279.442622   Root MSE        =    16.085
------------------------------------------------------------------------------
       prate |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       mrate |   5.861079   .5270107    11.12   0.000      4.82734    6.894818
       _cons |   83.07546   .5632844   147.48   0.000     81.97057    84.18035
------------------------------------------------------------------------------

. //d. predict at mrate=3.5
. display _b[_cons] + _b[mrate]*3.5 
103.58923
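. * Sketch (not part of the original run; output omitted): lincom reproduces the
. * prediction at mrate=3.5 from the regression above and adds a standard error
. * and confidence interval. Since prate is a percentage, a prediction above 100
. * is worth flagging when answering part (d).
. lincom _cons + 3.5*mrate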

. //e. How much of the variation in prate is explained by mrate? Is it a lot?
. display " R-squared  = " e(r2)
 R-squared  = .0747031


. *Problem 2.2. ceosal2.dta
. use ceosal2.dta , clear
. d, short
Contains data from ceosal2.dta
  obs:           177                          
 vars:            15                          17 Aug 1999 23:14
 size:         6,549                          
Sorted by: 

. //a. Average log salary & average tenure (as CEO and with the company)
. mean lsalary ceoten comten  
Mean estimation                   Number of obs   =        177
--------------------------------------------------------------
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
     lsalary |   6.582848   .0455542      6.492945     6.67275
      ceoten |   7.954802    .537489      6.894049    9.015555
      comten |   22.50282   .9241289      20.67902    24.32662
--------------------------------------------------------------
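. * Sketch (not part of the original run; output omitted): part (a) asks about
. * salary in levels. This assumes ceosal2.dta also holds a variable named salary,
. * the level behind lsalary.
. summarize salary ceoten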

. //b. CEOs in their first year (ceoten==0) and the longest tenure as CEO
. count if ceoten==0  
  5
. sum ceoten 
    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
      ceoten |        177    7.954802    7.150826          0         37
. display r(max)
37

. //c. ols lsalary on ceoten, ...
. reg lsalary ceoten
      Source |       SS           df       MS      Number of obs   =       177
-------------+----------------------------------   F(1, 175)       =      2.33
       Model |  .850907024         1  .850907024   Prob > F        =    0.1284
    Residual |   63.795306       175  .364544606   R-squared       =    0.0132
-------------+----------------------------------   Adj R-squared   =    0.0075
       Total |  64.6462131       176  .367308029   Root MSE        =    .60378
------------------------------------------------------------------------------
     lsalary |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      ceoten |   .0097236   .0063645     1.53   0.128    -.0028374    .0222846
       _cons |   6.505498   .0679911    95.68   0.000      6.37131    6.639686
------------------------------------------------------------------------------
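. * Sketch (not part of the original run; output omitted): in a log-level model
. * the approximate percentage change in salary for one more year as CEO is
. * 100*b1; the exact change implied by the fitted line is 100*(exp(b1)-1).
. display "approx. % change = " 100*_b[ceoten]
. display "exact   % change = " 100*(exp(_b[ceoten])-1)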


. *Problem 2.3. sleep75.dta (Biddle&Hamermesh1990)
. use sleep75.dta , clear
. d sleep totwrk, short
              storage   display    value
variable name   type    format     label      variable label
-------------------------------------------------------------------------------------
sleep           int     %9.0g                 mins sleep at night, per wk
totwrk          int     %9.0g                 mins worked per week

. //a. OLS of sleep on totwrk; report in equation form. Interpret the intercept.
. reg sleep totwrk
      Source |       SS           df       MS      Number of obs   =       706
-------------+----------------------------------   F(1, 704)       =     81.09
       Model |  14381717.2         1  14381717.2   Prob > F        =    0.0000
    Residual |   124858119       704  177355.282   R-squared       =    0.1033
-------------+----------------------------------   Adj R-squared   =    0.1020
       Total |   139239836       705  197503.313   Root MSE        =    421.14
------------------------------------------------------------------------------
       sleep |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      totwrk |  -.1507458   .0167403    -9.00   0.000    -.1836126    -.117879
       _cons |   3586.377   38.91243    92.17   0.000     3509.979    3662.775
------------------------------------------------------------------------------

. //b. If totwrk increases by 2 hours, by how much is sleep estimated to fall? 
. display _b[totwrk]*2*60
-18.089499


. *Problem 2.4. wage2.dta: OLS of wage on IQ
. use wage2.dta, clear

. //a. Average wage, average IQ, and sample SD of IQ
. sum wage IQ
    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
        wage |        935    957.9455    404.3608        115       3078
          IQ |        935    101.2824    15.05264         50        145

. //b. Effect of a 15-point increase in IQ on wage (constant dollars)
. reg wage IQ
      Source |       SS           df       MS      Number of obs   =       935
-------------+----------------------------------   F(1, 933)       =     98.55
       Model |  14589782.6         1  14589782.6   Prob > F        =    0.0000
    Residual |   138126386       933  148045.429   R-squared       =    0.0955
-------------+----------------------------------   Adj R-squared   =    0.0946
       Total |   152716168       934  163507.675   Root MSE        =    384.77
------------------------------------------------------------------------------
        wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          IQ |   8.303064   .8363951     9.93   0.000     6.661631    9.944498
       _cons |   116.9916   85.64153     1.37   0.172    -51.08078    285.0639
------------------------------------------------------------------------------
. display "wage= " %5.3f _b[_cons] "+" %5.3f _b[IQ] "IQ; N=" _N ",Rsq=" %5.4f e(r2) 
wage= 116.992+8.303IQ; N=935,Rsq=0.0955
. display _b[IQ]*15
124.54596
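. * Sketch (not part of the original run; output omitted): _N counts every row in
. * memory, while e(N) is the estimation sample actually used by regress. The two
. * agree here, but e(N) stays correct if rows are dropped for missing values.
. display "wage= " %5.3f _b[_cons] "+" %5.3f _b[IQ] "IQ; N=" e(N) ",Rsq=" %5.4f e(r2)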

. //c. Effect of a 15-point increase in IQ on wage (proportional change; x100 for %)
. reg lwage IQ
      Source |       SS           df       MS      Number of obs   =       935
-------------+----------------------------------   F(1, 933)       =    102.62
       Model |  16.4150939         1  16.4150939   Prob > F        =    0.0000
    Residual |  149.241189       933  .159958402   R-squared       =    0.0991
-------------+----------------------------------   Adj R-squared   =    0.0981
       Total |  165.656283       934  .177362188   Root MSE        =    .39995
------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          IQ |   .0088072   .0008694    10.13   0.000      .007101    .0105134
       _cons |   5.886994   .0890206    66.13   0.000     5.712291    6.061698
------------------------------------------------------------------------------
. display "lwage= " %5.3f _b[_cons] "+" %5.3f _b[IQ] "IQ; N=" _N ",Rsq=" %5.4f e(r2) 
lwage= 5.887+0.009IQ; N=935,Rsq=0.0991
. display "0" _b[IQ]*15
0.13210734


. *Problem 2.5. rdchem: r&d on sales
. use rdchem.dta , clear

. //a. Model for elasticity?
. *log(rd) = b0 + b1*log(sales) + u ; b1 is the elasticity of rd with respect to sales

. //b. Estimate b1? 
. reg lrd lsales
      Source |       SS           df       MS      Number of obs   =        32
-------------+----------------------------------   F(1, 30)        =    302.72
       Model |  84.8395785         1  84.8395785   Prob > F        =    0.0000
    Residual |  8.40768588        30  .280256196   R-squared       =    0.9098
-------------+----------------------------------   Adj R-squared   =    0.9068
       Total |  93.2472644        31  3.00797627   Root MSE        =    .52939
------------------------------------------------------------------------------
         lrd |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      lsales |   1.075731   .0618275    17.40   0.000     .9494619    1.201999
       _cons |  -4.104722   .4527678    -9.07   0.000    -5.029398   -3.180047
------------------------------------------------------------------------------
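. * Sketch (not part of the original run; output omitted): a Wald test of whether
. * the estimated elasticity differs from one. The 95% interval reported above
. * already contains 1, so this only makes that comparison explicit.
. test lsales = 1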


. *Problem 2.6. meap93: math pass rate (math10) & spending per student (expend)
. use meap93.dta, clear
. //a. Diminishing effect   
. //b. math10 = b0 + b1*log(expend) + u
. * d(math10) = b1*d(log(expend)) ~ (b1/100)*(% change in expend),
. * so a 10% increase in expend changes math10 by about b1/10 percentage points.
. //c. ols math10 on lexpend, 
. reg math10 lexpend
      Source |       SS           df       MS      Number of obs   =       408
-------------+----------------------------------   F(1, 406)       =     12.41
       Model |  1329.42517         1  1329.42517   Prob > F        =    0.0005
    Residual |  43487.7553       406  107.112698   R-squared       =    0.0297
-------------+----------------------------------   Adj R-squared   =    0.0273
       Total |  44817.1805       407  110.115923   Root MSE        =     10.35
------------------------------------------------------------------------------
      math10 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     lexpend |   11.16439   3.169011     3.52   0.000     4.934677    17.39411
       _cons |   -69.3411   26.53013    -2.61   0.009    -121.4947   -17.18753
------------------------------------------------------------------------------
. display "math10= " %5.3f _b[_cons] "+" %5.3f _b[lexpend] "log(expend); ///
> N=" _N ",Rsq=" %5.4f e(r2) 
math10= -69.341+11.164log(expend); N=408,Rsq=0.0297

. //d. Effect of a 10% increase in spending: about b1/10 percentage points of math10
. display  _b[lexpend]/10 "%"
1.1164395%
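. * Sketch (not part of the original run; output omitted): b1/10 is a first-order
. * approximation. A 10% increase in expend raises log(expend) by ln(1.10), so the
. * fitted change in math10, in percentage points, is b1*ln(1.10).
. display _b[lexpend]*ln(1.10)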

. //e. Why is "math10>100" not much of a worry in this data set?
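. * Sketch (not part of the original run; output omitted): comparing the range of
. * fitted values with the range of math10 shows how far the predictions stay
. * from 100. The name math10hat is an arbitrary choice.
. predict math10hat, xb
. summarize math10hat math10 expend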


. *Problem 2.7. charity: gifts and mailings; imported from R (wooldridge package)
. use charity.dta, clear

. //a. & b.
. sum gift mails
    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
        gift |      4,268     7.44447    15.06256          0        250
   mailsyear |      4,268    2.049555      .66758        .25        3.5
. count if gift==0
  2,561
. display 100*r(N)/4268 "%"
60.004686%

. //c. Regress gift on mailings per year
. reg gift mails
      Source |       SS           df       MS      Number of obs   =     4,268
-------------+----------------------------------   F(1, 4266)      =     59.65
       Model |  13349.7251         1  13349.7251   Prob > F        =    0.0000
    Residual |  954750.114     4,266  223.804528   R-squared       =    0.0138
-------------+----------------------------------   Adj R-squared   =    0.0136
       Total |   968099.84     4,267  226.880675   Root MSE        =     14.96
------------------------------------------------------------------------------
        gift |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   mailsyear |   2.649546   .3430598     7.72   0.000     1.976971    3.322122
       _cons |    2.01408   .7394696     2.72   0.006     .5643347    3.463825
------------------------------------------------------------------------------
. display "gift= " %5.3f _b[_cons] "+" %5.3f _b[mails] "mails; N=" _N ",Rsq=" %5.4f e(r2)
gift= 2.014+2.650mails; N=4268,Rsq=0.0138

. //d. Does the charity make profit if per unit cost of mailing is one guilder?
. display  _b[mails] - 1
1.6495464

. //e. The smallest predicted gift (i.e., mailsyear=0)
. margins, at(mail=0)
Adjusted predictions                            Number of obs     =      4,268
Model VCE    : OLS
Expression   : Linear prediction, predict()
at           : mailsyear       =           0
------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |    2.01408   .7394696     2.72   0.006     .5643347    3.463825
------------------------------------------------------------------------------


. *Problem 2.8.
. clear

. //a.
. set obs 500
number of observations (_N) was 0, now 500
. g x_ = uniform()
. g x = x_ *10
. sum x
    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
           x |        500    4.872077      2.9848   .0033314   9.980742

. //b.
. g u_ = runiform() 
. g u = u_ *6 
. sum u
    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
           u |        500    3.053768    1.768815   .0115764   5.998999

. //c.
. g y = 1 + 2*x + u 
. reg y x
      Source |       SS           df       MS      Number of obs   =       500
-------------+----------------------------------   F(1, 498)       =   5806.78
       Model |  18178.7235         1  18178.7235   Prob > F        =    0.0000
    Residual |  1559.04137       498  3.13060515   R-squared       =    0.9210
-------------+----------------------------------   Adj R-squared   =    0.9209
       Total |  19737.7649       499   39.554639   Root MSE        =    1.7694
------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   2.022163   .0265368    76.20   0.000     1.970025    2.074301
       _cons |   3.945788   .1515815    26.03   0.000      3.64797    4.243606
------------------------------------------------------------------------------

. //d.
. qui reg y x
. predict uh, residual
. g xuh=x*uh 
. //verify if E(uh)=E(x'uh)=0 ; compare results with E(u)=E(x'u)=0. Discuss. 
. sum xuh uh u
    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
         xuh |        500   -1.27e-08    10.14709  -28.35288   26.61722
          uh |        500   -1.04e-09    1.767578  -3.085353   3.014317
           u |        500    3.053768    1.768815   .0115764   5.998999

. //e.
. g xu = x * u
. sum xu xuh uh u 
    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
          xu |        500    15.07525    13.89241   .0110519   56.60181
         xuh |        500   -1.27e-08    10.14709  -28.35288   26.61722
          uh |        500   -1.04e-09    1.767578  -3.085353   3.014317
           u |        500    3.053768    1.768815   .0115764   5.998999

. //f. Rerun 2 or 3 times and compare results and conclude!
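. * Sketch (not part of the original run; output omitted): a reproducible variant
. * of parts (a)-(c). The seed value and the names x2, u2, y2 are arbitrary.
. * It assumes the exercise intends mean-zero errors, drawn here as Normal with
. * sd 6; the run above uses uniform errors on [0,6), whose mean of about 3 is
. * absorbed by the intercept (3.95 rather than 1).
. set seed 12345
. generate x2 = 10*runiform()
. generate u2 = rnormal(0, 6)
. generate y2 = 1 + 2*x2 + u2
. regress y2 x2
. * Changing the seed and rerunning gives a fresh sample for part (f).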


. *Problem 2.9. CountyMurders only 1996
. use countymurders.dta, clear
. keep if year==1996
(35,152 observations deleted)

. //a. how many counties had zero murders in 1996?
. count if murder==0 //counties with zero murder
  1,051
. count if execs>0 //counties with at least one execution
  31
. sum execs if murder>0
    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
       execs |      1,146    .0296684    .1937704          0          3
. display r(max)
3

. //b. ols murder = f (execs); report results the usual way with N & R^2 included
. reg murders execs
      Source |       SS           df       MS      Number of obs   =     2,197
-------------+----------------------------------   F(1, 2195)      =    100.77
       Model |  152381.693         1  152381.693   Prob > F        =    0.0000
    Residual |  3319359.01     2,195  1512.23645   R-squared       =    0.0439
-------------+----------------------------------   Adj R-squared   =    0.0435
       Total |   3471740.7     2,196  1580.93839   Root MSE        =    38.887
------------------------------------------------------------------------------
     murders |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       execs |   58.55548   5.833255    10.04   0.000      47.1162    69.99476
       _cons |   5.457241    .834838     6.54   0.000     3.820086    7.094396
------------------------------------------------------------------------------
. display "murders= " %5.2f _b[_cons] "+" %5.2f _b[execs] "execs; N= " _N ",Rsq=" %5.4f e(r2)
murders=  5.46+58.56execs; N= 2197,Rsq=0.0439

. //c. Interpret the slope coef.
. //d. The smallest number of murders the model can predict is at execs = 0.
. display _b[_cons] + _b[execs]*0 
5.4572409
. predict u, residual
. sum u if murder==0 & execs==0 
    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
           u |      1,050   -5.457241           0  -5.457241  -5.457241

. //e. Why is OLS not suitable? Endogeneity issues: omitted variables,
. //   measurement error, simultaneity.


. *Problem 2.10.
. use catholic.dta, clear

. //a. Sample size, mean & SD of math12 & read12. 
. sum math12 read12
    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
      math12 |      7,430    52.13362    9.459117       29.5      71.37
      read12 |      7,430     51.7724    9.407761      29.15      68.09

. //b. Ols math12 on read12. 
. reg math12 read12
      Source |       SS           df       MS      Number of obs   =     7,430
-------------+----------------------------------   F(1, 7428)      =   7568.58
       Model |  335470.113         1  335470.113   Prob > F        =    0.0000
    Residual |   329238.93     7,428  44.3240347   R-squared       =    0.5047
-------------+----------------------------------   Adj R-squared   =    0.5046
       Total |  664709.043     7,429  89.4749015   Root MSE        =    6.6576
------------------------------------------------------------------------------
      math12 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      read12 |   .7142915   .0082105    87.00   0.000     .6981966    .7303863
       _cons |   15.15304    .432036    35.07   0.000     14.30612    15.99995
------------------------------------------------------------------------------
. display "math12= " %5.2f _b[_cons] "+" %5.2f _b[read12] "read12; N= " _N ",Rsq=" %5.4f e(r2)
math12= 15.15+ 0.71read12; N= 7430,Rsq=0.5047

. //c. Interpret the intercept.
. //d. Comment on the values of b1 and R^2. 
. //e. I would run the reverse regression to refute the comment.
. reg  read12 math12
      Source |       SS           df       MS      Number of obs   =     7,430
-------------+----------------------------------   F(1, 7428)      =   7568.58
       Model |  331837.266         1  331837.266   Prob > F        =    0.0000
    Residual |  325673.561     7,428  43.8440443   R-squared       =    0.5047
-------------+----------------------------------   Adj R-squared   =    0.5046
       Total |  657510.828     7,429  88.5059668   Root MSE        =    6.6215
------------------------------------------------------------------------------
      read12 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      math12 |   .7065563   .0081216    87.00   0.000     .6906358    .7224769
       _cons |   14.93706   .4303184    34.71   0.000     14.09352    15.78061
------------------------------------------------------------------------------
. *Spurious correlation or causality?
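. * Sketch (not part of the original run; output omitted): in simple regression
. * the slopes of y on x and of x on y multiply to the R-squared, which is the
. * squared sample correlation. The fit is symmetric, so neither regression by
. * itself indicates a direction of causality.
. correlate math12 read12
. display "squared correlation = " r(rho)^2
. display "product of slopes   = " .7142915*.7065563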

. log close
      name:  SN
       log:  ~Wooldridge\intro-econx\iproblem2.smcl
  log type:  smcl
 closed on:  27 Jan 2019, 01:15:29
-------------------------------------------------------------------------------------