## Chapter 20 - Stratified Sampling and Cluster Sampling

### Examples
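
Most comparisons in the log below turn on how the `cluster()` option changes the standard errors while leaving the coefficients untouched. The NumPy code below is a minimal sketch of what that option computes for pooled OLS (the `POLS`/`POLSr` pair). It forms the cluster-robust sandwich (X'X)^-1 (Σ_g X_g' û_g û_g' X_g) (X'X)^-1 and scales it by G/(G-1) · (N-1)/(N-K), the small-sample factor Stata documents for `regress` with clustered errors. The function names and the simulated data are hypothetical illustrations, not part of the original log. The sketch does not attempt the `xtreg, re` or `xtreg, fe` variants.

```python
import numpy as np


def ols_cluster_robust(X, y, clusters):
    """Pooled OLS coefficients with CR1 cluster-robust standard errors.

    X should already include a column of ones if an intercept is wanted.
    Returns (beta, standard errors, number of clusters).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    clusters = np.asarray(clusters)
    n, k = X.shape

    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ (X.T @ y)
    resid = y - X @ beta

    # "Meat" of the sandwich: sum over clusters of s_g s_g', where
    # s_g = X_g' u_g adds up the scores of every observation in cluster g.
    groups = np.unique(clusters)
    meat = np.zeros((k, k))
    for g in groups:
        in_g = clusters == g
        s_g = X[in_g].T @ resid[in_g]
        meat += np.outer(s_g, s_g)

    # Small-sample factor G/(G-1) * (N-1)/(N-K).
    n_clusters = len(groups)
    qc = n_clusters / (n_clusters - 1) * (n - 1) / (n - k)
    vcov = qc * (xtx_inv @ meat @ xtx_inv)
    return beta, np.sqrt(np.diag(vcov)), n_clusters


def ols_conventional_se(X, y):
    """Usual OLS standard errors, assuming homoskedastic, uncorrelated errors."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    resid = y - X @ (xtx_inv @ (X.T @ y))
    s2 = resid @ resid / (n - k)
    return np.sqrt(np.diag(s2 * xtx_inv))


# Hypothetical data: 200 clusters of 5 observations in which both the
# regressor and the error share a cluster-level component.
rng = np.random.default_rng(0)
n_groups, size = 200, 5
cluster_id = np.repeat(np.arange(n_groups), size)
x = rng.normal(size=n_groups)[cluster_id] + rng.normal(size=n_groups * size)
u = rng.normal(size=n_groups)[cluster_id] + rng.normal(size=n_groups * size)
y = 1.0 + 0.5 * x + u
X = np.column_stack([np.ones_like(x), x])

beta, se_cluster, G = ols_cluster_robust(X, y, cluster_id)
print("coefficients:      ", beta)
print("conventional s.e.: ", ols_conventional_se(X, y))
print("clustered s.e.:    ", se_cluster, f"({G} clusters)")
```

With one observation per cluster, the factor reduces to N/(N-K) and the estimate equals the HC1 heteroskedasticity-robust one, which makes a convenient sanity check. When the regressor and the error both carry a cluster-level component, as in the simulated data, the clustered standard errors will typically exceed the conventional ones. The log shows the same pattern for `bs` when moving from `POLS` to `POLSr`. For clustered `regress`, Stata uses G-1 denominator degrees of freedom, which is why `POLSr` reports F(4, 536) with 537 district clusters.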

```
------------------------------------------------------------------------------------------
      name:  SN
       log:  \iiexample20.smcl
  log type:  smcl
 opened on:  12 May 2020, 20:45:32
. **********************************************
.  * Solomon Negash - Examples
.  * Wooldridge (2010). Econometric Analysis of Cross Section and Panel Data. 2nd ed.
.  * Stata program, version 16.1.

.  * Chapter 20 - Stratified Sampling and Cluster Sampling
.  ***********************************************

. // Example 20.3 (Cluster Correlation in Teacher Compensation)

. u "Wooldridge_2E\benefits", clear

. eststo POLS: reg lavgsal bs lstaff lenroll lunch

      Source |       SS           df       MS      Number of obs   =     1,848
-------------+----------------------------------   F(4, 1843)      =    429.78
       Model |  48.3485452         4  12.0871363   Prob > F        =    0.0000
    Residual |  51.8328336     1,843  .028124164   R-squared       =    0.4826
-------------+----------------------------------   Adj R-squared   =    0.4815
       Total |  100.181379     1,847  .054240054   Root MSE        =     .1677

------------------------------------------------------------------------------
     lavgsal |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          bs |  -.1774396   .1219691    -1.45   0.146    -.4166518    .0617725
      lstaff |  -.6907025   .0184598   -37.42   0.000    -.7269068   -.6544981
     lenroll |  -.0292406   .0084997    -3.44   0.001    -.0459107   -.0125705
       lunch |  -.0008471   .0001625    -5.21   0.000    -.0011658   -.0005284
       _cons |   13.72361   .1121095   122.41   0.000     13.50374    13.94349
------------------------------------------------------------------------------

. eststo POLSr: reg lavgsal bs lstaff lenroll lunch, cluster(distid)

Linear regression                               Number of obs     =      1,848
                                                F(4, 536)         =     134.77
                                                Prob > F          =     0.0000
                                                R-squared         =     0.4826
                                                Root MSE          =      .1677

                               (Std. Err. adjusted for 537 clusters in distid)
------------------------------------------------------------------------------
             |               Robust
     lavgsal |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          bs |  -.1774396   .2596214    -0.68   0.495    -.6874398    .3325605
      lstaff |  -.6907025   .0352962   -19.57   0.000    -.7600383   -.6213666
     lenroll |  -.0292406   .0257414    -1.14   0.256     -.079807    .0213258
       lunch |  -.0008471   .0005709    -1.48   0.138    -.0019686    .0002744
       _cons |   13.72361   .2562909    53.55   0.000     13.22016    14.22707
------------------------------------------------------------------------------

. eststo RE: xtreg lavgsal bs lstaff lenroll lunch, re

Random-effects GLS regression                   Number of obs     =      1,848
Group variable: distid                          Number of groups  =        537

R-sq:                                           Obs per group:
     within  = 0.5453                                         min =          1
     between = 0.3852                                         avg =        3.4
     overall = 0.4671                                         max =        162

                                                Wald chi2(4)      =    1890.56
corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000

------------------------------------------------------------------------------
     lavgsal |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          bs |  -.3812698   .1118678    -3.41   0.001    -.6005267    -.162013
      lstaff |  -.6174177   .0153587   -40.20   0.000    -.6475202   -.5873151
     lenroll |  -.0249189   .0075532    -3.30   0.001    -.0397228   -.0101149
       lunch |   .0002995   .0001794     1.67   0.095    -.0000521    .0006511
       _cons |   13.36682   .0975734   136.99   0.000     13.17558    13.55806
-------------+----------------------------------------------------------------
     sigma_u |  .12627558
     sigma_e |  .09996638
         rho |  .61473634   (fraction of variance due to u_i)
------------------------------------------------------------------------------

. eststo REr: xtreg lavgsal bs lstaff lenroll lunch, re cluster(distid)

Random-effects GLS regression                   Number of obs     =      1,848
Group variable: distid                          Number of groups  =        537

R-sq:                                           Obs per group:
     within  = 0.5453                                         min =          1
     between = 0.3852                                         avg =        3.4
     overall = 0.4671                                         max =        162

                                                Wald chi2(4)      =     316.91
corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000

                               (Std. Err. adjusted for 537 clusters in distid)
------------------------------------------------------------------------------
             |               Robust
     lavgsal |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          bs |  -.3812698   .1504893    -2.53   0.011    -.6762235   -.0863162
      lstaff |  -.6174177   .0363789   -16.97   0.000     -.688719   -.5461163
     lenroll |  -.0249189   .0115371    -2.16   0.031    -.0475312   -.0023065
       lunch |   .0002995   .0001963     1.53   0.127    -.0000852    .0006841
       _cons |   13.36682   .1968713    67.90   0.000     12.98096    13.75268
-------------+----------------------------------------------------------------
     sigma_u |  .12627558
     sigma_e |  .09996638
         rho |  .61473634   (fraction of variance due to u_i)
------------------------------------------------------------------------------

. eststo FEr: xtreg lavgsal bs lstaff lenroll lunch, fe cluster(distid)

Fixed-effects (within) regression               Number of obs     =      1,848
Group variable: distid                          Number of groups  =        537

R-sq:                                           Obs per group:
     within  = 0.5486                                         min =          1
     between = 0.3544                                         avg =        3.4
     overall = 0.4567                                         max =        162

                                                F(4,536)          =      57.84
corr(u_i, Xb)  = 0.1433                         Prob > F          =     0.0000

                               (Std. Err. adjusted for 537 clusters in distid)
------------------------------------------------------------------------------
             |               Robust
     lavgsal |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          bs |  -.4948449   .1937316    -2.55   0.011    -.8754112   -.1142785
      lstaff |  -.6218901   .0431812   -14.40   0.000    -.7067152   -.5370649
     lenroll |  -.0515063   .0130887    -3.94   0.000    -.0772178   -.0257948
       lunch |   .0005138   .0002127     2.42   0.016     .0000959    .0009317
       _cons |   13.61783   .2413169    56.43   0.000     13.14379    14.09187
-------------+----------------------------------------------------------------
     sigma_u |  .15491886
     sigma_e |  .09996638
         rho |  .70602068   (fraction of variance due to u_i)
------------------------------------------------------------------------------

. eststo FE: xtreg lavgsal bs lstaff lenroll lunch, fe

Fixed-effects (within) regression               Number of obs     =      1,848
Group variable: distid                          Number of groups  =        537

R-sq:                                           Obs per group:
     within  = 0.5486                                         min =          1
     between = 0.3544                                         avg =        3.4
     overall = 0.4567                                         max =        162

                                                F(4,1307)         =     397.05
corr(u_i, Xb)  = 0.1433                         Prob > F          =     0.0000

------------------------------------------------------------------------------
     lavgsal |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          bs |  -.4948449    .133039    -3.72   0.000    -.7558382   -.2338515
      lstaff |  -.6218901   .0167565   -37.11   0.000    -.6547627   -.5890175
     lenroll |  -.0515063   .0094004    -5.48   0.000    -.0699478   -.0330648
       lunch |   .0005138   .0002088     2.46   0.014     .0001042    .0009234
       _cons |   13.61783   .1133406   120.15   0.000     13.39548    13.84018
-------------+----------------------------------------------------------------
     sigma_u |  .15491886
     sigma_e |  .09996638
         rho |  .70602068   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(536, 1307) = 7.24                   Prob > F = 0.0000

. estout POLS POLSr RE REr FE FEr, cells(b(nostar fmt(3)) se(par fmt(3))) /*
*/ ti("Table 20.1 Salary-Benefits Trade-off for Michigan Teachers")

Table 20.1 Salary-Benefits Trade-off for Michigan Teachers
------------------------------------------------------------------------------------------
                     POLS        POLSr           RE          REr          FEr           FE
                     b/se         b/se         b/se         b/se         b/se         b/se
------------------------------------------------------------------------------------------
bs                 -0.177       -0.177       -0.381       -0.381       -0.495       -0.495
                  (0.122)      (0.260)      (0.112)      (0.150)      (0.194)      (0.133)
lstaff             -0.691       -0.691       -0.617       -0.617       -0.622       -0.622
                  (0.018)      (0.035)      (0.015)      (0.036)      (0.043)      (0.017)
lenroll            -0.029       -0.029       -0.025       -0.025       -0.052       -0.052
                  (0.008)      (0.026)      (0.008)      (0.012)      (0.013)      (0.009)
lunch              -0.001       -0.001        0.000        0.000        0.001        0.001
                  (0.000)      (0.001)      (0.000)      (0.000)      (0.000)      (0.000)
_cons              13.724       13.724       13.367       13.367       13.618       13.618
                  (0.112)      (0.256)      (0.098)      (0.197)      (0.241)      (0.113)
------------------------------------------------------------------------------------------

. // Example 20.4 (Effects of Spending on School Performance)

. u "Wooldridge_2E\meap94_98", clear

. eststo FE: xtreg math4 lavgrexp lunch lenrol y95 y96 y97 y98, fe

Fixed-effects (within) regression               Number of obs     =      7,150
Group variable: schid                           Number of groups  =      1,683

R-sq:                                           Obs per group:
     within  = 0.3602                                         min =          3
     between = 0.0292                                         avg =        4.2
     overall = 0.1514                                         max =          5

                                                F(7,5460)         =     439.11
corr(u_i, Xb)  = 0.0073                         Prob > F          =     0.0000

------------------------------------------------------------------------------
       math4 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    lavgrexp |   6.288376   2.098685     3.00   0.003     2.174117    10.40264
       lunch |  -.0215072   .0312185    -0.69   0.491     -.082708    .0396935
      lenrol |  -2.038461   1.791604    -1.14   0.255    -5.550718    1.473797
         y95 |    11.6192   .5545233    20.95   0.000     10.53212    12.70629
         y96 |   13.05561   .6630948    19.69   0.000     11.75568    14.35554
         y97 |   10.14771   .7024067    14.45   0.000     8.770713    11.52471
         y98 |   23.41404   .7187237    32.58   0.000     22.00506    24.82303
       _cons |   11.84422   22.81097     0.52   0.604    -32.87436     56.5628
-------------+----------------------------------------------------------------
     sigma_u |   15.84958
     sigma_e |  11.325028
         rho |  .66200804   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(1682, 5460) = 4.82                  Prob > F = 0.0000

. eststo FEr_sch: xtreg math4 lavgrexp lunch lenrol y95 y96 y97 y98, fe cluster(schid)

Fixed-effects (within) regression               Number of obs     =      7,150
Group variable: schid                           Number of groups  =      1,683

R-sq:                                           Obs per group:
     within  = 0.3602                                         min =          3
     between = 0.0292                                         avg =        4.2
     overall = 0.1514                                         max =          5

                                                F(7,1682)         =     431.08
corr(u_i, Xb)  = 0.0073                         Prob > F          =     0.0000

                              (Std. Err. adjusted for 1,683 clusters in schid)
------------------------------------------------------------------------------
             |               Robust
       math4 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    lavgrexp |   6.288376   2.431317     2.59   0.010     1.519651     11.0571
       lunch |  -.0215072   .0390732    -0.55   0.582    -.0981445      .05513
      lenrol |  -2.038461   1.789094    -1.14   0.255    -5.547545    1.470623
         y95 |    11.6192   .5358469    21.68   0.000     10.56821     12.6702
         y96 |   13.05561   .6910815    18.89   0.000     11.70014    14.41108
         y97 |   10.14771   .7326314    13.85   0.000     8.710745    11.58468
         y98 |   23.41404   .7669553    30.53   0.000     21.90975    24.91833
       _cons |   11.84422   25.16643     0.47   0.638    -37.51659    61.20503
-------------+----------------------------------------------------------------
     sigma_u |   15.84958
     sigma_e |  11.325028
         rho |  .66200804   (fraction of variance due to u_i)
------------------------------------------------------------------------------

. eststo FEr_dist: xtreg math4 lavgrexp lunch lenrol y95 y96 y97 y98, fe cluster(distid)

Fixed-effects (within) regression               Number of obs      =      7150
Group variable: schid                           Number of groups   =      1683

R-sq:  within  = 0.3602                         Obs per group: min =         3
       between = 0.0292                                        avg =       4.2
       overall = 0.1514                                        max =         5

                                                F(7,466)           =    259.90
corr(u_i, Xb)  = 0.0073                         Prob > F           =    0.0000

                               (Std. Err. adjusted for 467 clusters in distid)
------------------------------------------------------------------------------
             |               Robust
       math4 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    lavgrexp |   6.288376   3.132334     2.01   0.045     .1331271    12.44363
       lunch |  -.0215072   .0399206    -0.54   0.590    -.0999539    .0569395
      lenrol |  -2.038461   2.098607    -0.97   0.332    -6.162365    2.085443
         y95 |    11.6192   .7210398    16.11   0.000     10.20231     13.0361
         y96 |   13.05561   .9326851    14.00   0.000     11.22282     14.8884
         y97 |   10.14771   .9576417    10.60   0.000      8.26588    12.02954
         y98 |   23.41404   1.027313    22.79   0.000      21.3953    25.43278
       _cons |   11.84422   32.68429     0.36   0.717    -52.38262    76.07107
-------------+----------------------------------------------------------------
     sigma_u |   15.84958
     sigma_e |  11.325028
         rho |  .66200804   (fraction of variance due to u_i)
------------------------------------------------------------------------------

. estout FE FEr_sch FEr_dist, cells(b(nostar fmt(2)) se(par fmt(2))) /*
*/ ti("Table 20.2 Fixed Effects Estimation of Spending on Test Pass Rates")

Table 20.2 Fixed Effects Estimation of Spending on Test Pass Rates
---------------------------------------------------
                       FE      FEr_sch     FEr_dist
                     b/se         b/se         b/se
---------------------------------------------------
lavgrexp             6.29         6.29         6.29
                   (2.10)       (2.43)       (3.13)
lunch               -0.02        -0.02        -0.02
                   (0.03)       (0.04)       (0.04)
lenrol              -2.04        -2.04        -2.04
                   (1.79)       (1.79)       (2.10)
y95                 11.62        11.62        11.62
                   (0.55)       (0.54)       (0.72)
y96                 13.06        13.06        13.06
                   (0.66)       (0.69)       (0.93)
y97                 10.15        10.15        10.15
                   (0.70)       (0.73)       (0.96)
y98                 23.41        23.41        23.41
                   (0.72)       (0.77)       (1.03)
_cons               11.84        11.84        11.84
                  (22.81)      (25.17)      (32.68)
---------------------------------------------------

. log close
      name:  SN
       log:  iiexample20.smcl
  log type:  smcl
 closed on:  12 May 2020, 20:45:33
------------------------------------------------------------------------------------------
```
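
Every `xtreg` fit in the log also reports `rho`, the share of the composite error variance attributed to the unobserved district or school effect, ρ = σ_u² / (σ_u² + σ_e²). The short Python check below recomputes it from the printed `sigma_u` and `sigma_e`. The helper name `rho` is illustrative, and the three input triples are copied from the log above.

```python
def rho(sigma_u, sigma_e):
    """Share of the composite error variance due to the unobserved effect u_i."""
    return sigma_u**2 / (sigma_u**2 + sigma_e**2)


# (sigma_u, sigma_e, rho) as printed by xtreg in the log above.
reported = {
    "Example 20.3, RE (distid)": (0.12627558, 0.09996638, 0.61473634),
    "Example 20.3, FE (distid)": (0.15491886, 0.09996638, 0.70602068),
    "Example 20.4, FE (schid)": (15.84958, 11.325028, 0.66200804),
}

for label, (s_u, s_e, rho_log) in reported.items():
    print(f"{label}: computed {rho(s_u, s_e):.6f}, log {rho_log:.6f}")
```

Because `cluster()` changes only the standard errors, the clustered and unclustered fits print identical `sigma_u`, `sigma_e` and `rho`. This holds for `RE`/`REr`, for `FE`/`FEr`, and for the three Example 20.4 fits.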