11. CHAPTER 11. STATISTICAL INFERENCE FOR MULTIPLE REGRESSION#
SET UP
library(foreign) # to open stata.dta files
library(psych) # for better sammary of descriptive statistics
library(repr) # to combine graphs with adjustable plot dimensions
options(repr.plot.width = 12, repr.plot.height = 6) # Plot dimensions (in inches)
options(width = 150) # To increase character width of printed output
11.1. 11.1 - 11.4 EXAMPLE: HOUSE PRICE AND CHARACTERISTICS#
df = read.dta(file = "Dataset/AED_HOUSE.DTA")
attach(df)
print(describe(df))
vars n mean sd median trimmed mad min max range skew kurtosis se
price 1 29 253910.34 37390.71 244000 249296.00 22239.00 204000 375000 171000 1.48 2.23 6943.28
size 2 29 1882.76 398.27 1800 1836.00 296.52 1400 3300 1900 1.64 3.29 73.96
bedrooms 3 29 3.79 0.68 4 3.72 0.00 3 6 3 0.92 1.89 0.13
bathrooms 4 29 2.21 0.34 2 2.16 0.00 2 3 1 1.28 0.23 0.06
lotsize 5 29 2.14 0.69 2 2.16 0.00 1 3 2 -0.17 -1.00 0.13
age 6 29 36.41 7.12 35 36.20 5.93 23 51 28 0.44 -0.75 1.32
monthsold 7 29 5.97 1.68 6 6.04 1.48 3 8 5 -0.47 -1.03 0.31
list 8 29 257824.14 40860.26 245000 252364.00 23721.60 199900 386000 186100 1.65 2.70 7587.56
11.1.1. Table 11.2#
Multiple regression
library(jtools)
summ(ols <- lm(price ~ size+bedrooms+bathrooms+lotsize+age+monthsold))
MODEL INFO:
Observations: 29
Dependent Variable: price
Type: OLS linear regression
MODEL FIT:
F(6,22) = 6.83, p = 0.00
R² = 0.65
Adj. R² = 0.56
Standard errors:OLS
--------------------------------------------------------
Est. S.E. t val. p
----------------- ----------- ---------- -------- ------
(Intercept) 137791.07 61464.95 2.24 0.04
size 68.37 15.39 4.44 0.00
bedrooms 2685.32 9192.53 0.29 0.77
bathrooms 6832.88 15721.19 0.43 0.67
lotsize 2303.22 7226.54 0.32 0.75
age -833.04 719.33 -1.16 0.26
monthsold -2088.50 3520.90 -0.59 0.56
--------------------------------------------------------
Confidence intervals
summ(ols, digits=3, confint = TRUE)
MODEL INFO:
Observations: 29
Dependent Variable: price
Type: OLS linear regression
MODEL FIT:
F(6,22) = 6.826, p = 0.000
R² = 0.651
Adj. R² = 0.555
Standard errors:OLS
-------------------------------------------------------------------------
Est. 2.5% 97.5% t val. p
----------------- ------------ ------------ ------------ -------- -------
(Intercept) 137791.066 10320.557 265261.574 2.242 0.035
size 68.369 36.454 100.285 4.443 0.000
bedrooms 2685.315 -16378.816 21749.447 0.292 0.773
bathrooms 6832.880 -25770.876 39436.636 0.435 0.668
lotsize 2303.221 -12683.695 17290.138 0.319 0.753
age -833.039 -2324.847 658.770 -1.158 0.259
monthsold -2088.504 -9390.399 5213.392 -0.593 0.559
-------------------------------------------------------------------------
Confidence interval manually for size
coef=summary(ols)$coefficients["size",1]
sterror=summary(ols)$coefficients["size",2]
df = summary(ols)$df[2]
N = length(price)
cbind(coef,sterror,df)
coef + c(-1,1)*sterror*qt(0.975, df)
coef | sterror | df |
---|---|---|
68.36942 | 15.38947 | 22 |
- 36.4536078247386
- 100.28523012857
Test that beta_size = 50 against beta_size not = 50
coef=summary(ols)$coefficients["size",1]
sterror=summary(ols)$coefficients["size",2]
cbind(coef,sterror)
N = length(size)
t = (coef - 50)/sterror
p = 2*(1-pt(abs(t),N-2))
tcrit = qt(.975,N-2)
print("t statistic, p-value, critical value")
cbind(t, p, tcrit)
coef | sterror |
---|---|
68.36942 | 15.38947 |
[1] "t statistic, p-value, critical value"
t | p | tcrit |
---|---|---|
1.193635 | 0.2430033 | 2.051831 |
11.2. 11.5 JOINT HYPOTHESIS TESTS#
Joint significance of all coefficients
Included in complete OLS output
ols.full = lm(price ~ size+bedrooms+bathrooms+lotsize+age+monthsold)
sum.full = summary(ols.full)
F = sum.full$fstatistic
k = sum.full$df[1]
df = sum.full$df[2]
critval = qf(.95,k-1,df)
pval = 1 - pf(F,k-1,df)
cbind(F[1], pval[1], critval)
critval | |||
---|---|---|---|
value | 6.826098 | 0.0003424253 | 2.549061 |
Joint test of zero coeffs for bedrooms bathrooms lotsize age monthsold
library(car)
print(
linearHypothesis(ols.full,c("bedrooms=0", "bathrooms=0", "lotsize",
"age=0", "monthsold=0"))
)
Loading required package: carData
Attaching package: 'car'
The following object is masked from 'package:psych':
logit
Linear hypothesis test:
bedrooms = 0
bathrooms = 0
lotsize = 0
age = 0
monthsold = 0
Model 1: restricted model
Model 2: price ~ size + bedrooms + bathrooms + lotsize + age + monthsold
Res.Df RSS Df Sum of Sq F Pr(>F)
1 27 1.4975e+10
2 22 1.3679e+10 5 1295703800 0.4168 0.832
11.3. 11.6 F STATISTIC UNDER ASSUMPTIONS 1-4#
Summary after lm saves a number of quantitites
sigma is the root mean squared error (RMSE) = ResSS/(n-k)
r.squared is R2, adj.r.squared is adjusted R2, $fstatistic is F statisic
df saves three things: (k, N-k, k*) where k is effective number of parameters and k*>=k is the number of regressors including possibly nonidentified regressors if there is perfect collinearity
so k = \(df[1] and df = N-k = \)df[2]
sum.full = summary(ols.full)
Compute adjusted R-squared
k = sum.full$df[1]
df = sum.full$df[2]
rsq = sum.full$r.squared
s = sum.full$sigma
print(cbind(k,df,rsq,s))
k df rsq s
[1,] 7 22 0.6505528 24935.73
ResSS = df*s^2 # Since s^2 = ResSS/(N-k) we have ResSS = (N-k)*s^2
TotSS = ResSS / (1 - rsq) # Since Rsq = 1 - (ResSS / TotSS) use TotSS = ResSS / (1 - Rsq)
ExpSS = TotSS - ResSS # And ExpSS = TotSS - ResSS
print(cbind(ResSS, ExpSS, TotSS))
ResSS ExpSS TotSS
[1,] 13679397855 25466429042 39145826897
Foverall = (ExpSS/(k-1)) / (ResSS/df)
critval = qf(.95,k-1,df)
pval = 1 - pf(Foverall[1],k-1,df)
print(cbind(Foverall[1], pval[1], critval))
critval
[1,] 6.826098 0.0003424253 2.549061
Subset test where drop bedrooms bathrooms lotsize age monthsold
Unrestricted model
ols.unrest = lm(price ~ size+bedrooms+bathrooms+lotsize+age+monthsold)
sum.unrest = summary(ols.unrest)
k_u = sum.unrest$df[1]
df_u = sum.unrest$df[2]
s_u = sum.unrest$sigma
ResSS_u = df_u*s_u^2
Restricted model
ols.rest = lm(price ~ size)
sum.rest = summary(ols.rest)
k_r = sum.rest$df[1]
df_r = sum.rest$df[2]
s_r = sum.rest$sigma
ResSS_r = df_r*s_r^2
q = k_u - k_r
Fstat = ((ResSS_r-ResSS_u)/q) / (ResSS_u/df_u)
critval = qf(.95,q,df_u)
pval = 1 - pf(Fstat[1],q,df_u)
cbind(Fstat[1], pval[1], critval)
critval | ||
---|---|---|
0.4167652 | 0.8319759 | 2.661274 |
11.4. 11.7 PRESENTATION OF REGRESSION RESULTS#
The following combines output from two models in a single table It uses packages jtools which also needs huxtable for tables
library(huxtable)
# The two models
ols.full = lm(price ~ size+bedrooms+bathrooms+lotsize+age+monthsold)
summ(ols.full)
ols.small = lm(price ~ size)
summ(ols.small)
MODEL INFO:
Observations: 29
Dependent Variable: price
Type: OLS linear regression
MODEL FIT:
F(6,22) = 6.83, p = 0.00
R² = 0.65
Adj. R² = 0.56
Standard errors:OLS
--------------------------------------------------------
Est. S.E. t val. p
----------------- ----------- ---------- -------- ------
(Intercept) 137791.07 61464.95 2.24 0.04
size 68.37 15.39 4.44 0.00
bedrooms 2685.32 9192.53 0.29 0.77
bathrooms 6832.88 15721.19 0.43 0.67
lotsize 2303.22 7226.54 0.32 0.75
age -833.04 719.33 -1.16 0.26
monthsold -2088.50 3520.90 -0.59 0.56
--------------------------------------------------------
MODEL INFO:
Observations: 29
Dependent Variable: price
Type: OLS linear regression
MODEL FIT:
F(1,27) = 43.58, p = 0.00
R² = 0.62
Adj. R² = 0.60
Standard errors:OLS
--------------------------------------------------------
Est. S.E. t val. p
----------------- ----------- ---------- -------- ------
(Intercept) 115017.28 21489.36 5.35 0.00
size 73.77 11.17 6.60 0.00
--------------------------------------------------------
The following uses default standard errors. For heteroskedastic-robust standard errors add , robust = “HC1”. Default includes standard errors
export_summs(ols.full, ols.small, scale = TRUE)
names | Model 1 | Model 2 | |
---|---|---|---|
<chr> | <chr> | <chr> | |
Model 1 | Model 2 | ||
1 | (Intercept) | 253910.344827586 *** | 253910.344827586 *** |
2 | (4630.44948878858) | (4373.24701673169) | |
3 | size | 27229.6341330319 *** | 29380.9493926936 *** |
4 | (6129.19772656264) | (4450.65562831248) | |
5 | bedrooms | 1812.66731922359 | |
6 | (6205.2273627909) | ||
7 | bathrooms | 2330.99683705654 | |
8 | (5363.19204846579) | ||
9 | lotsize | 1596.20979166114 | |
10 | (5008.2316865653) | ||
11 | age | -5930.38085788559 | |
12 | (5120.92452909128) | ||
13 | monthsold | -3507.31651374698 | |
14 | (5912.79950609896) | ||
1.1 | N | 29 | 29 |
2.1 | R2 | 0.650552844601402 | 0.61745343394899 |
.1 | All continuous predictors are mean-centered and scaled by 1 standard deviation. The outcome variable is in its original units. *** p < 0.001; ** p < 0.01; * p < 0.05. | All continuous predictors are mean-centered and scaled by 1 standard deviation. The outcome variable is in its original units. *** p < 0.001; ** p < 0.01; * p < 0.05. | All continuous predictors are mean-centered and scaled by 1 standard deviation. The outcome variable is in its original units. *** p < 0.001; ** p < 0.01; * p < 0.05. |
This gives t-statistics
export_summs(ols.full, ols.small, scale = TRUE,
error_format = "({statistic})")
names | Model 1 | Model 2 | |
---|---|---|---|
<chr> | <chr> | <chr> | |
Model 1 | Model 2 | ||
1 | (Intercept) | 253910.344827586 *** | 253910.344827586 *** |
2 | (54.8349237892268) | (58.0599138022951) | |
3 | size | 27229.6341330319 *** | 29380.9493926936 *** |
4 | (4.4426098402772) | (6.60148792591122) | |
5 | bedrooms | 1812.66731922359 | |
6 | (0.29211940405167) | ||
7 | bathrooms | 2330.99683705654 | |
8 | (0.434628634587746) | ||
9 | lotsize | 1596.20979166114 | |
10 | (0.318717242244006) | ||
11 | age | -5930.38085788559 | |
12 | (-1.15806839647722) | ||
13 | monthsold | -3507.31651374698 | |
14 | (-0.59317359063659) | ||
1.1 | N | 29 | 29 |
2.1 | R2 | 0.650552844601402 | 0.61745343394899 |
.1 | All continuous predictors are mean-centered and scaled by 1 standard deviation. The outcome variable is in its original units. *** p < 0.001; ** p < 0.01; * p < 0.05. | All continuous predictors are mean-centered and scaled by 1 standard deviation. The outcome variable is in its original units. *** p < 0.001; ** p < 0.01; * p < 0.05. | All continuous predictors are mean-centered and scaled by 1 standard deviation. The outcome variable is in its original units. *** p < 0.001; ** p < 0.01; * p < 0.05. |
This gives t-statistics and p-values
export_summs(ols.full, ols.small, scale = TRUE,
error_format = "({statistic}, p = {p.value})")
names | Model 1 | Model 2 | |
---|---|---|---|
<chr> | <chr> | <chr> | |
Model 1 | Model 2 | ||
1 | (Intercept) | 253910.344827586 *** | 253910.344827586 *** |
2 | (54.8349237892268, p = 0) | (58.0599138022951, p = 0) | |
3 | size | 27229.6341330319 *** | 29380.9493926936 *** |
4 | (4.4426098402772, p = 0.000204637210605776) | (6.60148792591122, p = 4.40875159537941e-07) | |
5 | bedrooms | 1812.66731922359 | |
6 | (0.29211940405167, p = 0.772932358924004) | ||
7 | bathrooms | 2330.99683705654 | |
8 | (0.434628634587746, p = 0.668065373738364) | ||
9 | lotsize | 1596.20979166114 | |
10 | (0.318717242244006, p = 0.752947112842582) | ||
11 | age | -5930.38085788559 | |
12 | (-1.15806839647722, p = 0.259254037652819) | ||
13 | monthsold | -3507.31651374698 | |
14 | (-0.59317359063659, p = 0.55911430746301) | ||
1.1 | N | 29 | 29 |
2.1 | R2 | 0.650552844601402 | 0.61745343394899 |
.1 | All continuous predictors are mean-centered and scaled by 1 standard deviation. The outcome variable is in its original units. *** p < 0.001; ** p < 0.01; * p < 0.05. | All continuous predictors are mean-centered and scaled by 1 standard deviation. The outcome variable is in its original units. *** p < 0.001; ** p < 0.01; * p < 0.05. | All continuous predictors are mean-centered and scaled by 1 standard deviation. The outcome variable is in its original units. *** p < 0.001; ** p < 0.01; * p < 0.05. |
This gives 95% confidence interval
export_summs(ols.full, ols.small, scale = TRUE,
error_format = "[{conf.low}, {conf.high}]")
names | Model 1 | Model 2 | |
---|---|---|---|
<chr> | <chr> | <chr> | |
Model 1 | Model 2 | ||
1 | (Intercept) | 253910.344827586 *** | 253910.344827586 *** |
2 | [244307.380340498, 263513.309314675] | [244937.18314255, 262883.506512623] | |
3 | size | 27229.6341330319 *** | 29380.9493926936 *** |
4 | [14518.456040055, 39940.8122260087] | [20248.9583561773, 38512.9404292099] | |
5 | bedrooms | 1812.66731922359 | |
6 | [-11056.1865886896, 14681.5212271367] | ||
7 | bathrooms | 2330.99683705654 | |
8 | [-8791.58271025368, 13453.5763843668] | ||
9 | lotsize | 1596.20979166114 | |
10 | [-8790.2270209302, 11982.6466042525] | ||
11 | age | -5930.38085788559 | |
12 | [-16550.5283215371, 4689.76660576591] | ||
13 | monthsold | -3507.31651374698 | |
14 | [-15769.7121653618, 8755.07913786788] | ||
1.1 | N | 29 | 29 |
2.1 | R2 | 0.650552844601402 | 0.61745343394899 |
.1 | All continuous predictors are mean-centered and scaled by 1 standard deviation. The outcome variable is in its original units. *** p < 0.001; ** p < 0.01; * p < 0.05. | All continuous predictors are mean-centered and scaled by 1 standard deviation. The outcome variable is in its original units. *** p < 0.001; ** p < 0.01; * p < 0.05. | All continuous predictors are mean-centered and scaled by 1 standard deviation. The outcome variable is in its original units. *** p < 0.001; ** p < 0.01; * p < 0.05. |