Chapter 2 - Simple Regression - Computer Exercises
-------------------------------------------------------------------------------------
name: SN
log: ~Wooldridge\intro-econx\iproblem2.smcl
log type: smcl
opened on: 27 Jan 2019, 01:15:29
. **********************************************
. * Solomon Negash - Solutions to Computer Exercises
. * Wooldridge (2016). Introductory Econometrics: A Modern Approach. 6th ed.
. * STATA Program, version 15.1.
. * Chapter 2 - The Simple Regression Model
. * Computer Exercises (Problems)
. ******************** SETUP *********************
. *Problem 2.1. Papke1995 (401k)
. use 401K.dta, clear
. d, short
Contains data from 401K.dta
obs: 1,534
vars: 8 9 Jun 1998 08:20
size: 39,884
Sorted by:
. //a. Average prate & mrate
. mean pra mra
Mean estimation Number of obs = 1,534
--------------------------------------------------------------
| Mean Std. Err. [95% Conf. Interval]
-------------+------------------------------------------------
prate | 87.36291 .4268091 86.52572 88.2001
mrate | .7315124 .0199033 .6924718 .770553
--------------------------------------------------------------
. //b&c. Regress prate on mrate; interpret the intercept & slope coef.
. reg prate mrate
Source | SS df MS Number of obs = 1,534
-------------+---------------------------------- F(1, 1532) = 123.68
Model | 32001.7271 1 32001.7271 Prob > F = 0.0000
Residual | 396383.812 1,532 258.73617 R-squared = 0.0747
-------------+---------------------------------- Adj R-squared = 0.0741
Total | 428385.539 1,533 279.442622 Root MSE = 16.085
------------------------------------------------------------------------------
prate | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
mrate | 5.861079 .5270107 11.12 0.000 4.82734 6.894818
_cons | 83.07546 .5632844 147.48 0.000 81.97057 84.18035
------------------------------------------------------------------------------
. //d. predict at mrate=3.5
. display _b[_cons] + _b[mrate]*3.5
103.58923
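The prediction in part (d) can be reproduced from the printed coefficients alone. The sketch below hard-codes the estimates from the table above (the scalar names are illustrative and not part of the original do-file). It also flags that the fitted value is above 100, which a participation rate cannot be: a linear fit is not bounded, and mrate = 3.5 is far above the sample mean of about 0.73.

```stata
* Coefficients copied from the -reg prate mrate- table above
scalar b0 = 83.07546
scalar b1 = 5.861079
* Fitted participation rate at mrate = 3.5
scalar prate_hat = scalar(b0) + scalar(b1)*3.5
display "predicted prate at mrate = 3.5: " %7.3f scalar(prate_hat)
* Prints 1 if the prediction is above the logical ceiling of 100 percent
display "above 100? " (scalar(prate_hat) > 100)
```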
. //e. How much of the variation in prate is explained by mrate? Is it a lot?
. display " R-squared = " e(r2)
R-squared = .0747031
. *Problem 2.2. ceosal2.dta
. use ceosal2.dta , clear
. d, short
Contains data from ceosal2.dta
obs: 177
vars: 15 17 Aug 1999 23:14
size: 6,549
Sorted by:
. //a. Average log(salary), CEO tenure & company tenure (lsalary is in logs; -mean salary- gives the level)
. mean lsalary ceoten comten
Mean estimation Number of obs = 177
--------------------------------------------------------------
| Mean Std. Err. [95% Conf. Interval]
-------------+------------------------------------------------
lsalary | 6.582848 .0455542 6.492945 6.67275
ceoten | 7.954802 .537489 6.894049 9.015555
comten | 22.50282 .9241289 20.67902 24.32662
--------------------------------------------------------------
. //b. CEOs in their first year (ceoten==0) & the longest tenure as CEO
. count if ceoten==0
5
. sum ceoten
Variable | Obs Mean Std. Dev. Min Max
-------------+---------------------------------------------------------
ceoten | 177 7.954802 7.150826 0 37
. display r(max)
37
. //c. ols lsalary on ceoten, ...
. reg lsalary ceoten
Source | SS df MS Number of obs = 177
-------------+---------------------------------- F(1, 175) = 2.33
Model | .850907024 1 .850907024 Prob > F = 0.1284
Residual | 63.795306 175 .364544606 R-squared = 0.0132
-------------+---------------------------------- Adj R-squared = 0.0075
Total | 64.6462131 176 .367308029 Root MSE = .60378
------------------------------------------------------------------------------
lsalary | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
ceoten | .0097236 .0063645 1.53 0.128 -.0028374 .0222846
_cons | 6.505498 .0679911 95.68 0.000 6.37131 6.639686
------------------------------------------------------------------------------
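Because the dependent variable is in logs, the slope reads as a proportional change. As a rough sketch using the coefficient printed above (the scalar name is illustrative), one more year as CEO is associated with roughly a 1% higher salary. The exact figure from the exponential is almost identical because the coefficient is small.

```stata
* Slope copied from the -reg lsalary ceoten- table above
scalar b_ceoten = .0097236
* Log-level model: 100*b approximates the % change in salary per extra year
display "approximate % change: " %6.3f 100*scalar(b_ceoten)
* Exact % change implied by the log specification
display "exact % change:       " %6.3f 100*(exp(scalar(b_ceoten)) - 1)
```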
. *Problem 2.3. sleep75.dta (Biddle&Hamermesh1990)
. use sleep75.dta , clear
. d sleep totwrk, short
storage display value
variable name type format label variable label
-------------------------------------------------------------------------------------
sleep int %9.0g mins sleep at night, per wk
totwrk int %9.0g mins worked per week
. //a. OLS sleep on totwrk & report in equation form. Interpret the intercept.
. reg sleep totwrk
Source | SS df MS Number of obs = 706
-------------+---------------------------------- F(1, 704) = 81.09
Model | 14381717.2 1 14381717.2 Prob > F = 0.0000
Residual | 124858119 704 177355.282 R-squared = 0.1033
-------------+---------------------------------- Adj R-squared = 0.1020
Total | 139239836 705 197503.313 Root MSE = 421.14
------------------------------------------------------------------------------
sleep | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
totwrk | -.1507458 .0167403 -9.00 0.000 -.1836126 -.117879
_cons | 3586.377 38.91243 92.17 0.000 3509.979 3662.775
------------------------------------------------------------------------------
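The intercept is the predicted weekly sleep, in minutes, for someone who does not work (totwrk = 0). Converting it to hours makes it easier to judge; the following is only a unit conversion of the printed estimate, with an illustrative scalar name.

```stata
* Intercept copied from the -reg sleep totwrk- table above (minutes per week)
scalar b0 = 3586.377
display "hours per week:  " %6.2f scalar(b0)/60
display "hours per night: " %6.2f scalar(b0)/(60*7)
```

That is a little under 60 hours a week, or about eight and a half hours a night.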
. //b. If totwrk increases by 2 hours, by how much is sleep estimated to fall?
. display _b[totwrk]*2*60
-18.089499
. *Problem 2.4. wage2.dta: OLS wage on IQ
. use wage2.dta, clear
. //a. Average wage, average IQ & sample SD of IQ
. sum wage IQ
Variable | Obs Mean Std. Dev. Min Max
-------------+---------------------------------------------------------
wage | 935 957.9455 404.3608 115 3078
IQ | 935 101.2824 15.05264 50 145
. //b. Effect of a 15-point increase in IQ on wage (constant dollars)
. reg wage IQ
Source | SS df MS Number of obs = 935
-------------+---------------------------------- F(1, 933) = 98.55
Model | 14589782.6 1 14589782.6 Prob > F = 0.0000
Residual | 138126386 933 148045.429 R-squared = 0.0955
-------------+---------------------------------- Adj R-squared = 0.0946
Total | 152716168 934 163507.675 Root MSE = 384.77
------------------------------------------------------------------------------
wage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
IQ | 8.303064 .8363951 9.93 0.000 6.661631 9.944498
_cons | 116.9916 85.64153 1.37 0.172 -51.08078 285.0639
------------------------------------------------------------------------------
. display "wage= " %5.3f _b[_cons] "+" %5.3f _b[IQ] "IQ; N=" _N ",Rsq=" %5.4f e(r2)
wage= 116.992+8.303IQ; N=935,Rsq=0.0955
. display _b[IQ]*15
124.54596
. //c. Effect of a 15-point increase in IQ on wage (percentage)
. reg lwage IQ
Source | SS df MS Number of obs = 935
-------------+---------------------------------- F(1, 933) = 102.62
Model | 16.4150939 1 16.4150939 Prob > F = 0.0000
Residual | 149.241189 933 .159958402 R-squared = 0.0991
-------------+---------------------------------- Adj R-squared = 0.0981
Total | 165.656283 934 .177362188 Root MSE = .39995
------------------------------------------------------------------------------
lwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
IQ | .0088072 .0008694 10.13 0.000 .007101 .0105134
_cons | 5.886994 .0890206 66.13 0.000 5.712291 6.061698
------------------------------------------------------------------------------
. display "lwage= " %5.3f _b[_cons] "+" %5.3f _b[IQ] "IQ; N=" _N ",Rsq=" %5.4f e(r2)
lwage= 5.887+0.009IQ; N=935,Rsq=0.0991
. display "0" _b[IQ]*15
0.13210734
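The value 0.132 is a change in log(wage), so it has to be multiplied by 100 to read as a percentage. A minimal sketch, using only numbers printed above (scalar names are illustrative), compares the usual approximation with the exact change and with the level model from part (b) expressed relative to the mean wage:

```stata
* Slope copied from the -reg lwage IQ- table above
scalar b_iq = .0088072
scalar dlog = 15*scalar(b_iq)
* Approximation: 100 * change in log(wage)
display "approximate % change: " %6.2f 100*scalar(dlog)
* Exact change implied by the log model
display "exact % change:       " %6.2f 100*(exp(scalar(dlog)) - 1)
* Level model from (b): 124.55 dollars relative to the mean wage of 957.95
display "level model, % of mean wage: " %6.2f 100*124.54596/957.9455
```

The approximation and the exact figure differ by about a percentage point here because a 15-point change in IQ is not a small change.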
. *Problem 2.5. rdchem: r&d on sales
. use rdchem.dta , clear
. //a. Model for elasticity?
. *log(rd) = b0 + b1*log(sales) + u ; b1 is the elasticity of rd with respect to sales
. //b. Estimate b1?
. reg lrd lsale
Source | SS df MS Number of obs = 32
-------------+---------------------------------- F(1, 30) = 302.72
Model | 84.8395785 1 84.8395785 Prob > F = 0.0000
Residual | 8.40768588 30 .280256196 R-squared = 0.9098
-------------+---------------------------------- Adj R-squared = 0.9068
Total | 93.2472644 31 3.00797627 Root MSE = .52939
------------------------------------------------------------------------------
lrd | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
lsales | 1.075731 .0618275 17.40 0.000 .9494619 1.201999
_cons | -4.104722 .4527678 -9.07 0.000 -5.029398 -3.180047
------------------------------------------------------------------------------
. *Problem 2.6. meap93: math pass rate (math10) & spending per student (expend)
. use meap93.dta, clear
. //a. Diminishing effect
. //b. math10 = b0 + b1*log(expend) + u:
. *d(math10) = b1*d(log(expend)) ~ (b1/100)*(% change in expend), so a 10% rise in expend changes math10 by about b1/10 points
. //c. ols math10 on lexpend,
. reg math10 lexpend
Source | SS df MS Number of obs = 408
-------------+---------------------------------- F(1, 406) = 12.41
Model | 1329.42517 1 1329.42517 Prob > F = 0.0005
Residual | 43487.7553 406 107.112698 R-squared = 0.0297
-------------+---------------------------------- Adj R-squared = 0.0273
Total | 44817.1805 407 110.115923 Root MSE = 10.35
------------------------------------------------------------------------------
math10 | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
lexpend | 11.16439 3.169011 3.52 0.000 4.934677 17.39411
_cons | -69.3411 26.53013 -2.61 0.009 -121.4947 -17.18753
------------------------------------------------------------------------------
. display "math10= " %5.3f _b[_cons] "+" %5.3f _b[lexpend] "log(expend); N=" _N ",Rsq=" %5.4f e(r2)
math10= -69.341+11.164log(expend); N=408,Rsq=0.0297
. //d. How big is the effect of a 10% increase in spending? (b1/10, in percentage points of math10)
. display _b[lexpend]/10 "%"
1.1164395%
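The figure b1/10 rests on the approximation that a 10% increase in expend raises log(expend) by 0.10. As a quick check, assuming only the slope printed above (the scalar name is illustrative), the exact change in log(expend) is ln(1.10), which gives a slightly smaller effect:

```stata
* Slope copied from the -reg math10 lexpend- table above
scalar b1 = 11.16439
* Approximation: a 10% rise in expend adds 0.10 to log(expend)
display "approximation b1/10:  " %6.3f scalar(b1)/10
* Exact: a 10% rise in expend adds ln(1.10) to log(expend)
display "exact b1*ln(1.10):    " %6.3f scalar(b1)*ln(1.10)
```

Either way the effect is about one percentage point on the pass rate, not one percent of it.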
. //e. Why is "math10>100" not much of a worry in this data set?
. *Problem 2.7. charity: gifts and mailings; imported from R (wooldridge package)
. use charity.dta, clear
. //a. & b.
. sum gift mails
Variable | Obs Mean Std. Dev. Min Max
-------------+---------------------------------------------------------
gift | 4,268 7.44447 15.06256 0 250
mailsyear | 4,268 2.049555 .66758 .25 3.5
. count if gift==0
2,561
. display 100*r(N)/4268 "%"
60.004686%
. //c. Regress gift on mailings per year
. reg gift mails
Source | SS df MS Number of obs = 4,268
-------------+---------------------------------- F(1, 4266) = 59.65
Model | 13349.7251 1 13349.7251 Prob > F = 0.0000
Residual | 954750.114 4,266 223.804528 R-squared = 0.0138
-------------+---------------------------------- Adj R-squared = 0.0136
Total | 968099.84 4,267 226.880675 Root MSE = 14.96
------------------------------------------------------------------------------
gift | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
mailsyear | 2.649546 .3430598 7.72 0.000 1.976971 3.322122
_cons | 2.01408 .7394696 2.72 0.006 .5643347 3.463825
------------------------------------------------------------------------------
. display "gift= " %5.3f _b[_cons] "+" %5.3f _b[mails] "mails; N=" _N ",Rsq=" %5.4f e(r2)
gift= 2.014+2.650mails; N=4268,Rsq=0.0138
. //d. Does the charity make a net gain on each mailing if a mailing costs one guilder?
. display _b[mails] - 1
1.6495464
. //e. Predicted gift at mailsyear=0 (the intercept); the sample minimum of mailsyear is .25
. margins, at(mail=0)
Adjusted predictions Number of obs = 4,268
Model VCE : OLS
Expression : Linear prediction, predict()
at : mailsyear = 0
------------------------------------------------------------------------------
| Delta-method
| Margin Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_cons | 2.01408 .7394696 2.72 0.006 .5643347 3.463825
------------------------------------------------------------------------------
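The margin at mailsyear = 0 is the intercept, but the summary statistics above show that no one in the sample has fewer than .25 mailings per year. A small sketch with the printed coefficients (scalar names are illustrative) gives the smallest prediction that actually occurs in the sample:

```stata
* Coefficients copied from the -reg gift mails- table above
scalar b0 = 2.01408
scalar b1 = 2.649546
* Smallest in-sample value of mailsyear is .25 (see -sum gift mails-)
scalar gift_min = scalar(b0) + scalar(b1)*.25
display "smallest in-sample prediction: " %6.3f scalar(gift_min)
```

Since both coefficients are positive and mailsyear cannot be negative, this fitted line can never predict a gift of zero, even though about 60% of the sample gave nothing.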
. *Problem 2.8.
. clear
. //a.
. set obs 500
number of observations (_N) was 0, now 500
. g x_ = runiform()
. g x = x_ *10
. sum x
Variable | Obs Mean Std. Dev. Min Max
-------------+---------------------------------------------------------
x | 500 4.872077 2.9848 .0033314 9.980742
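. //b. Note: 6*runiform() is Uniform(0,6) with mean 3, not a mean-zero error; the intercept in (c) therefore estimates 1+3=4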
. g u_ = runiform()
. g u = u_ *6
. sum u
Variable | Obs Mean Std. Dev. Min Max
-------------+---------------------------------------------------------
u | 500 3.053768 1.768815 .0115764 5.998999
. //c.
. g y = 1 + 2*x + u
. reg y x
Source | SS df MS Number of obs = 500
-------------+---------------------------------- F(1, 498) = 5806.78
Model | 18178.7235 1 18178.7235 Prob > F = 0.0000
Residual | 1559.04137 498 3.13060515 R-squared = 0.9210
-------------+---------------------------------- Adj R-squared = 0.9209
Total | 19737.7649 499 39.554639 Root MSE = 1.7694
------------------------------------------------------------------------------
y | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
x | 2.022163 .0265368 76.20 0.000 1.970025 2.074301
_cons | 3.945788 .1515815 26.03 0.000 3.64797 4.243606
------------------------------------------------------------------------------
. //d.
. qui reg y x
. predict uh, residual
. g xuh=x*uh
. //verify if E(uh)=E(x'uh)=0 ; compare results with E(u)=E(x'u)=0. Discuss.
. sum xuh uh u
Variable | Obs Mean Std. Dev. Min Max
-------------+---------------------------------------------------------
xuh | 500 -1.27e-08 10.14709 -28.35288 26.61722
uh | 500 -1.04e-09 1.767578 -3.085353 3.014317
u | 500 3.053768 1.768815 .0115764 5.998999
. //e.
. g xu = x * u
. sum xu xuh uh u
Variable | Obs Mean Std. Dev. Min Max
-------------+---------------------------------------------------------
xu | 500 15.07525 13.89241 .0110519 56.60181
xuh | 500 -1.27e-08 10.14709 -28.35288 26.61722
uh | 500 -1.04e-09 1.767578 -3.085353 3.014317
u | 500 3.053768 1.768815 .0115764 5.998999
. //f. Rerun 2 or 3 times and compare results and conclude!
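The errors drawn above have mean 3, which is why the estimated intercept is close to 4 rather than 1. If a mean-zero error with variance 36 is intended, one possible variant is sketched below. The seed and the use of rnormal(0, 6) are assumptions for illustration, not part of the original session, and the numbers will differ from the log above.

```stata
* Sketch only: seed and variable names are illustrative choices
clear
set seed 12345
set obs 500
generate x = 10*runiform()      // x ~ Uniform(0,10)
generate u = rnormal(0, 6)      // u ~ Normal(0, 36): sd 6, variance 36
generate y = 1 + 2*x + u
regress y x
predict uh, residuals
generate xuh = x*uh             // residual moments: zero by construction
generate xu  = x*u              // error moments: zero only in expectation
summarize uh xuh u xu
```

The sample means of uh and xuh are zero up to rounding in every run because OLS imposes them as first-order conditions. The means of u and xu are only close to zero and change with the seed, which is the comparison parts (d) to (f) ask for.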
. *Problem 2.9. CountyMurders only 1996
. use countymurders.dta, clear
. keep if year==1996
(35,152 observations deleted)
. //a. how many counties had zero murders in 1996?
. count if murder==0 //counties with zero murder
1,051
. count if execs>0 //counties with at least one execution
31
. sum execs if murder>0
Variable | Obs Mean Std. Dev. Min Max
-------------+---------------------------------------------------------
execs | 1,146 .0296684 .1937704 0 3
. display r(max)
3
. //b. ols murder = f (execs); report results the usual way with N & R^2 included
. reg murders execs
Source | SS df MS Number of obs = 2,197
-------------+---------------------------------- F(1, 2195) = 100.77
Model | 152381.693 1 152381.693 Prob > F = 0.0000
Residual | 3319359.01 2,195 1512.23645 R-squared = 0.0439
-------------+---------------------------------- Adj R-squared = 0.0435
Total | 3471740.7 2,196 1580.93839 Root MSE = 38.887
------------------------------------------------------------------------------
murders | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
execs | 58.55548 5.833255 10.04 0.000 47.1162 69.99476
_cons | 5.457241 .834838 6.54 0.000 3.820086 7.094396
------------------------------------------------------------------------------
. display "murders= " %5.2f _b[_cons] "+" %5.2f _b[execs] "execs; N= " _N ",Rsq=" %5.4f e(r2)
murders= 5.46+58.56execs; N= 2197,Rsq=0.0439
. //c. Interpret the slope coef.
. //d. The smallest number of murders this model can predict occurs when executions are zero.
. display _b[_cons] + _b[execs]*0
5.4572409
. predict u, residual
. sum u if murder==0 & execs==0
Variable | Obs Mean Std. Dev. Min Max
-------------+---------------------------------------------------------
u | 1,050 -5.457241 0 -5.457241 -5.457241
. //e. Why is OLS not suitable? Endogeneity issues: omitted variables, measurement error, simultaneity.
. *Problem 2.10.
. use catholic.dta, clear
. //a. Sample size, mean & SD of math12 & read12.
. sum math12 read12
Variable | Obs Mean Std. Dev. Min Max
-------------+---------------------------------------------------------
math12 | 7,430 52.13362 9.459117 29.5 71.37
read12 | 7,430 51.7724 9.407761 29.15 68.09
. //b. Ols math12 on read12.
. reg math12 read12
Source | SS df MS Number of obs = 7,430
-------------+---------------------------------- F(1, 7428) = 7568.58
Model | 335470.113 1 335470.113 Prob > F = 0.0000
Residual | 329238.93 7,428 44.3240347 R-squared = 0.5047
-------------+---------------------------------- Adj R-squared = 0.5046
Total | 664709.043 7,429 89.4749015 Root MSE = 6.6576
------------------------------------------------------------------------------
math12 | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
read12 | .7142915 .0082105 87.00 0.000 .6981966 .7303863
_cons | 15.15304 .432036 35.07 0.000 14.30612 15.99995
------------------------------------------------------------------------------
. display "math12= " %5.2f _b[_cons] "+" %5.2f _b[read12] "read12; N= " _N ",Rsq=" %5.4f e(r2)
math12= 15.15+ 0.71read12; N= 7430,Rsq=0.5047
. //c. Interpret the intercept.
. //d. Comment on the values of b1 and R^2.
. //e. I would run the reverse regression to refute the comment.
. reg read12 math12
Source | SS df MS Number of obs = 7,430
-------------+---------------------------------- F(1, 7428) = 7568.58
Model | 331837.266 1 331837.266 Prob > F = 0.0000
Residual | 325673.561 7,428 43.8440443 R-squared = 0.5047
-------------+---------------------------------- Adj R-squared = 0.5046
Total | 657510.828 7,429 88.5059668 Root MSE = 6.6215
------------------------------------------------------------------------------
read12 | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
math12 | .7065563 .0081216 87.00 0.000 .6906358 .7224769
_cons | 14.93706 .4303184 34.71 0.000 14.09352 15.78061
------------------------------------------------------------------------------
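The two regressions share the same R-squared, but the reverse slope is not the reciprocal of the original slope. In a simple regression the product of the two slopes equals R-squared. The check below uses the printed coefficients (scalar names are illustrative):

```stata
* Slopes copied from the two regression tables above
scalar b_mr = .7142915          // math12 on read12
scalar b_rm = .7065563          // read12 on math12
* Product of the two slopes: compare with R-squared = 0.5047
display "product of slopes:       " %6.4f scalar(b_mr)*scalar(b_rm)
* Reciprocal of the first slope: compare with the reverse slope .7066
display "1/slope(math12|read12):  " %6.4f 1/scalar(b_mr)
```

Neither regression direction is favored by the fit, so the estimates alone cannot say which score causes the other.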
. *Spurious correlation or causality?
. log close
name: SN
log: ~Wooldridge\intro-econx\iproblem2.smcl
log type: smcl
closed on: 27 Jan 2019, 01:15:29
-------------------------------------------------------------------------------------