Proofs of some formulas in Anderson's Statistics for Business and Economics
- (9.7) Sample Size for a One-Tailed Hypothesis Test about a Population Mean
- (10.7) Degrees of Freedom: t Distribution with Two Independent Random Samples
- (13.12) Test Statistic for the Equality of k Population Means
- (13.14) SST=SSTR+SSE
- (13.16) Fisher's LSD
- (14.11) Relationship among SST, SSR, and SSE
- (14.13) Sample Correlation Coefficient
- (14.15) Mean Square Error (Estimate of σ²)
- (14.19) Test Statistic for t Test for Significance in Simple Linear Regression
- (14.21) Test Statistic for F Test for Significance in Simple Linear Regression
- (14.27) Prediction Interval for y*
- (14.30) Standard Deviation of the ith Residual
- (14.33) Leverage of Observation i
- (15.9) Adjusted Multiple Coefficient of Determination
- (18.8) Spearman Rank-Correlation Coefficient
- References
(9.7) Sample Size for a One-Tailed Hypothesis Test about a Population Mean
$$n = \frac{(z_\alpha + z_\beta)^2 \sigma^2}{(\mu_0 - \mu_a)^2}$$
Proof: Note that even for a two-tailed hypothesis test, $\beta$ (and hence $z_\beta$) does not change; only $z_\alpha$ is replaced by $z_{\alpha/2}$.
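As a quick numerical check of (9.7), here is a minimal sketch with illustrative values (not taken from the textbook): $H_0\colon \mu \ge 12$ vs $H_a\colon \mu < 12$, $\alpha = \beta = 0.05$, $\sigma = 3.2$, $\mu_a = 10$.

```python
from statistics import NormalDist

# Illustrative values, not from the book.
alpha, beta = 0.05, 0.05
z_alpha = NormalDist().inv_cdf(1 - alpha)  # z_{alpha} ~ 1.645
z_beta = NormalDist().inv_cdf(1 - beta)    # z_{beta}  ~ 1.645
sigma, mu0, mua = 3.2, 12.0, 10.0

# Formula (9.7); in practice round n up to the next whole observation.
n = (z_alpha + z_beta) ** 2 * sigma ** 2 / (mu0 - mua) ** 2
print(n)
```

With these inputs $n \approx 27.7$, so one would take $n = 28$.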
(10.7) Degree of Freedom: t Distribution with Two Independent Random Samples
$$\mathrm{d.f.} = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{1}{n_1-1}\left(\dfrac{s_1^2}{n_1}\right)^2 + \dfrac{1}{n_2-1}\left(\dfrac{s_2^2}{n_2}\right)^2}$$
Proof: See [Casella, p.409, exe.8.42].
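The formula transcribes directly into code. A sanity check: with equal sample variances and equal sample sizes it should reduce to $2(n-1)$, the pooled degrees of freedom.

```python
def welch_df(s1_sq, n1, s2_sq, n2):
    """Welch-Satterthwaite degrees of freedom, formula (10.7)."""
    a, b = s1_sq / n1, s2_sq / n2
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))

# Equal variances and equal n: reduces to 2(n - 1) = 18.
print(welch_df(4.0, 10, 4.0, 10))  # -> 18.0
```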
(13.12) Test Statistic for the Equality of k Population Means
$$F = \frac{\mathrm{MSTR}}{\mathrm{MSE}}$$
Proof: This proof is from [Hogg, PSI, sec.9.3]. Recall the data layout
$$\begin{array}{ccccc}
\text{treatment }1 & \text{treatment }2 & \cdots & \text{treatment }k & \\
x_{11} & x_{12} & \cdots & x_{1k} & \\
x_{21} & x_{22} & \cdots & x_{2k} & \\
\vdots & \vdots & \ddots & \vdots & \\
x_{n_1 1} & x_{n_2 2} & \cdots & x_{n_k k} & \\
\downarrow & \downarrow & \cdots & \downarrow & \\
\bar{x}_1 & \bar{x}_2 & \cdots & \bar{x}_k & \searrow\ \bar{\bar{x}}
\end{array}$$
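As a sketch of how (13.12) is assembled, a pure-Python computation on made-up data (three treatments of three observations each; treatments are the inner lists):

```python
# Made-up balanced design, not from the book.
treatments = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [4.0, 5.0, 6.0]]

n_T = sum(len(t) for t in treatments)          # total sample size
k = len(treatments)                            # number of treatments
grand = sum(x for t in treatments for x in t) / n_T

# Between-treatment and within-treatment sums of squares.
sstr = sum(len(t) * (sum(t) / len(t) - grand) ** 2 for t in treatments)
sse = sum((x - sum(t) / len(t)) ** 2 for t in treatments for x in t)

mstr = sstr / (k - 1)       # numerator df: k - 1
mse = sse / (n_T - k)       # denominator df: n_T - k
F = mstr / mse
print(F)
```

For this data SSTR = 14 and SSE = 6, so $F = 7/1 = 7$.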
(13.14) SST=SSTR+SSE
$$\mathrm{SST} = \mathrm{SSTR} + \mathrm{SSE}$$
Proof: Recall the data layout
$$\begin{array}{ccccc}
\text{treatment }1 & \text{treatment }2 & \cdots & \text{treatment }k & \\
x_{11} & x_{12} & \cdots & x_{1k} & \\
x_{21} & x_{22} & \cdots & x_{2k} & \\
\vdots & \vdots & \ddots & \vdots & \\
x_{n_1 1} & x_{n_2 2} & \cdots & x_{n_k k} & \\
\downarrow & \downarrow & \cdots & \downarrow & \\
\bar{x}_1 & \bar{x}_2 & \cdots & \bar{x}_k & \searrow\ \bar{\bar{x}}
\end{array}$$
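The partition can be verified numerically; the sketch below checks the identity on random, unbalanced data (group sizes 3 through 6), so the exact values don't matter:

```python
import random

# Random unbalanced design; only the identity itself is being checked.
random.seed(0)
treatments = [[random.gauss(j, 1.0) for _ in range(3 + j)] for j in range(4)]

all_x = [x for t in treatments for x in t]
grand = sum(all_x) / len(all_x)

sst = sum((x - grand) ** 2 for x in all_x)
sstr = sum(len(t) * (sum(t) / len(t) - grand) ** 2 for t in treatments)
sse = sum((x - sum(t) / len(t)) ** 2 for t in treatments for x in t)

assert abs(sst - (sstr + sse)) < 1e-9
print("SST = SSTR + SSE holds")
```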
(13.16) Fisher's LSD
$$t = \frac{\bar{x}_i - \bar{x}_j}{\sqrt{\mathrm{MSE}\left(\dfrac{1}{n_i} + \dfrac{1}{n_j}\right)}}$$
Proof: See [Casella, sec.11.2].
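The LSD statistic is a direct transcription; the numbers below are illustrative, not from the book:

```python
import math

def lsd_t(xbar_i, xbar_j, mse, n_i, n_j):
    """Fisher's LSD test statistic, formula (13.16)."""
    return (xbar_i - xbar_j) / math.sqrt(mse * (1 / n_i + 1 / n_j))

# Illustrative: two treatment means 79 and 74, MSE = 28, five obs each.
print(lsd_t(79.0, 74.0, 28.0, 5, 5))
```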
(14.11) Relationship among SST, SSR, and SSE
$$\mathrm{SST} = \mathrm{SSR} + \mathrm{SSE}$$
Proof: Recall that
$$\mathrm{SSE} \overset{(14.8)}{=} \sum(y_i - \hat{y}_i)^2,\qquad \mathrm{SST} \overset{(14.9)}{=} \sum(y_i - \bar{y})^2,\qquad \mathrm{SSR} \overset{(14.10)}{=} \sum(\hat{y}_i - \bar{y})^2.$$
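The partition can be checked on a small least-squares fit done by hand (made-up data):

```python
# Made-up data; the identity should hold for any least-squares line.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 3.0, 5.0, 4.0, 6.0]
n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n

# Least-squares slope and intercept.
b1 = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
      / sum((x - xbar) ** 2 for x in xs))
b0 = ybar - b1 * xbar
yhat = [b0 + b1 * x for x in xs]

sst = sum((y - ybar) ** 2 for y in ys)
ssr = sum((yh - ybar) ** 2 for yh in yhat)
sse = sum((y - yh) ** 2 for y, yh in zip(ys, yhat))
assert abs(sst - (ssr + sse)) < 1e-9
print(sst, ssr, sse)
```

Here SST = 10, SSR = 8.1, SSE = 1.9.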
(14.13) Sample Correlation Coefficient
$$r_{xy} = (\text{sign of } b_1)\sqrt{\text{coefficient of determination}} = (\text{sign of } b_1)\sqrt{r^2}$$
Proof: Recall the following formulas
$$s_x \overset{(3.8),(3.9)}{=} \sqrt{\frac{\sum(x_i-\bar{x})^2}{n-1}},\qquad s_y \overset{(3.8),(3.9)}{=} \sqrt{\frac{\sum(y_i-\bar{y})^2}{n-1}},\qquad s_{xy} \overset{(3.13)}{=} \frac{\sum(x_i-\bar{x})(y_i-\bar{y})}{n-1},$$
$$r_{xy} \overset{(3.15)}{=} \frac{s_{xy}}{s_x s_y},\qquad \mathrm{SST} \overset{(14.9)}{=} \sum(y_i-\bar{y})^2,\qquad \mathrm{SSR} \overset{(14.10)}{=} \sum(\hat{y}_i-\bar{y})^2,\qquad r^2 \overset{(14.12)}{=} \frac{\mathrm{SSR}}{\mathrm{SST}}.$$
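A numerical check that the direct correlation (3.15) agrees with $(\text{sign of } b_1)\sqrt{r^2}$, using made-up, negatively associated data so the sign matters:

```python
import math

# Made-up data with b1 < 0, so the sign recovery is actually exercised.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [6.0, 5.0, 4.0, 4.0, 2.0]
n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n

sxx = sum((x - xbar) ** 2 for x in xs)
syy = sum((y - ybar) ** 2 for y in ys)
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))

r_direct = sxy / math.sqrt(sxx * syy)   # (3.15); the n-1 factors cancel

b1 = sxy / sxx
yhat = [ybar + b1 * (x - xbar) for x in xs]
r2 = sum((yh - ybar) ** 2 for yh in yhat) / syy   # SSR / SST
r_from_r2 = math.copysign(math.sqrt(r2), b1)

assert abs(r_direct - r_from_r2) < 1e-9
print(r_direct)
```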
(14.15) Mean Square Error (Estimate of σ²)
$$s^2 = \mathrm{MSE} = \frac{\mathrm{SSE}}{n-2}$$
Proof: See [Casella, p.552, (11.3.29)].
(14.19) Test Statistic for t Test for Significance in Simple Linear Regression
$$t = \frac{b_1}{\hat{\sigma}_{b_1}}$$
Proof: By [Casella, p.553, thm.11.3.3], we have $b_1 \sim n\!\left(\beta_1, \dfrac{\sigma^2}{\sum(x_i-\bar{x})^2}\right)$ and $\dfrac{\mathrm{SSE}}{\sigma^2} \sim \chi^2(n-2)$.
(14.21) Test Statistic for F Test for Significance in Simple Linear Regression
$$F = \frac{\mathrm{MSR}}{\mathrm{MSE}}$$
Proof: By [Casella, p.553, thm.11.3.3], we have $b_1 \sim n\!\left(\beta_1, \dfrac{\sigma^2}{\sum(x_i-\bar{x})^2}\right)$ and $\dfrac{\mathrm{SSE}}{\sigma^2} \sim \chi^2(n-2)$.
(14.27) Prediction Interval for y*
$$\hat{y}^* \pm t_{\alpha/2}\, s_{\mathrm{pred}} = \hat{y}^* \pm t_{\alpha/2}\, s\sqrt{1 + \frac{1}{n} + \frac{(x^*-\bar{x})^2}{\sum(x_i-\bar{x})^2}} = \hat{y}^* \pm t_{\alpha/2}\sqrt{\frac{\mathrm{SSE}}{n-2}}\sqrt{1 + \frac{1}{n} + \frac{(x^*-\bar{x})^2}{\sum(x_i-\bar{x})^2}}$$
Proof: See [Casella, p.559, (11.3.41)].
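A sketch of the spread term $s_{\mathrm{pred}}$ on made-up data; the $t_{\alpha/2}$ quantile (df $= n-2$) is left to a t table since the standard library does not provide one:

```python
import math

# Made-up data and prediction point.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 3.0, 5.0, 4.0, 6.0]
n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n

sxx = sum((x - xbar) ** 2 for x in xs)
b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
b0 = ybar - b1 * xbar

sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
s = math.sqrt(sse / (n - 2))                       # (14.16)

x_star = 4.0
s_pred = s * math.sqrt(1 + 1 / n + (x_star - xbar) ** 2 / sxx)
# The interval is (b0 + b1*x_star) +/- t_{alpha/2} * s_pred.
print(b0 + b1 * x_star, s_pred)
```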
(14.30) Standard Deviation of the ith Residual
$$s_{y_i-\hat{y}_i} = s\sqrt{1-h_i},\qquad s \overset{(14.16)}{=} \sqrt{\frac{\mathrm{SSE}}{n-2}} = \sqrt{\frac{\sum(y_i-\hat{y}_i)^2}{n-2}},\qquad h_i = \frac{1}{n} + \frac{(x_i-\bar{x})^2}{\sum(x_i-\bar{x})^2}$$
Proof: By [Casella, p.552, (11.3.28)],
$$\operatorname{Var}(y_i - \hat{y}_i) = \left[\frac{n-2}{n} + \frac{1}{S_{xx}}\left(\frac{1}{n}\sum_{j=1}^n x_j^2 + x_i^2 - 2(x_i-\bar{x})^2 - 2x_i\bar{x}\right)\right]\sigma^2,$$
which simplifies to $(1-h_i)\sigma^2$.
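The simplification can be verified numerically: the bracketed factor equals $1 - h_i$ for every observation (random data, since only the algebraic identity is at stake):

```python
import random

# Random x-values; the identity should hold regardless of the data.
random.seed(1)
xs = [random.uniform(0, 10) for _ in range(8)]
n = len(xs)
xbar = sum(xs) / n
sxx = sum((x - xbar) ** 2 for x in xs)

for xi in xs:
    bracket = (n - 2) / n + (sum(x * x for x in xs) / n
                             + xi * xi
                             - 2 * (xi - xbar) ** 2
                             - 2 * xi * xbar) / sxx
    h_i = 1 / n + (xi - xbar) ** 2 / sxx
    assert abs(bracket - (1 - h_i)) < 1e-9
print("bracket == 1 - h_i for all i")
```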
(14.33) Leverage of Observation i
$$h_i = \frac{1}{n} + \frac{(x_i-\bar{x})^2}{\sum(x_i-\bar{x})^2}$$
Proof: See [Casella, subsec.11.3.5].
(15.9) Adjusted Multiple Coefficient of Determination
$$R_a^2 = 1 - (1-R^2)\frac{n-1}{n-p-1}$$
Proof: This formula can also be written as
$$R_a^2 = 1 - (1-R^2)\frac{n-1}{n-p-1} = 1 - \left(1 - \frac{\mathrm{SSR}}{\mathrm{SST}}\right)\frac{n-1}{n-p-1} = 1 - \frac{\mathrm{SSE}}{\mathrm{SST}} \times \frac{n-1}{n-p-1}.$$
The symbol $d$ that Hastie uses here is our $p$, and his RSS is our SSE.
My interpretation: if we add more independent variables, $p$ increases and SSE decreases, and there are two cases:
- If SSE decreases only slightly, the change in $p$ dominates its effect on $R_a^2$, so $R_a^2$ decreases, which indicates that adding these independent variables is not worthwhile;
- If SSE decreases a lot, the change in SSE dominates its effect on $R_a^2$, so $R_a^2$ increases, which indicates that adding these independent variables is worthwhile.
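The two cases above can be sketched numerically. With SST fixed, compare a baseline model ($p = 2$) against two hypothetical outcomes of adding one variable ($p = 3$); the SSE values are made up for illustration:

```python
def adj_r2(sse, sst, n, p):
    """Adjusted R^2, formula (15.9), written via the SSE/SST form."""
    return 1 - (sse / sst) * (n - 1) / (n - p - 1)

n, sst = 30, 100.0
base = adj_r2(40.0, sst, n, p=2)         # baseline: SSE = 40, two variables

tiny_gain = adj_r2(39.5, sst, n, p=3)    # SSE barely drops: R^2_a falls
big_gain = adj_r2(25.0, sst, n, p=3)     # SSE drops a lot: R^2_a rises
print(base, tiny_gain, big_gain)
assert tiny_gain < base < big_gain
```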
(18.8) Spearman Rank-Correlation Coefficient
$$r_s = 1 - \frac{6\sum_{i=1}^n d_i^2}{n(n^2-1)}$$
The textbook has a typo here: it writes $n^2+1$ instead of $n^2-1$.
Proof: See [Hogg, IMS, p.634, subsec.10.8.2]. It gives another, more instructive expression:
$$r_S = \frac{\sum\left[R(X_i) - \frac{n+1}{2}\right]\left[R(Y_i) - \frac{n+1}{2}\right]}{n(n^2-1)/12}.$$
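A numerical check that (18.8) and the rank-product form agree when there are no ties (the ranks below are made up):

```python
def spearman_d(rx, ry):
    """Formula (18.8); note the n^2 - 1 in the denominator."""
    n = len(rx)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

def spearman_product(rx, ry):
    """The rank-product form from [Hogg, IMS]."""
    n = len(rx)
    c = (n + 1) / 2
    return (sum((a - c) * (b - c) for a, b in zip(rx, ry))
            / (n * (n ** 2 - 1) / 12))

rx = [1, 2, 3, 4, 5, 6]
ry = [2, 1, 4, 3, 6, 5]
assert abs(spearman_d(rx, ry) - spearman_product(rx, ry)) < 1e-12
print(spearman_d(rx, ry))
```

For these ranks each $d_i^2 = 1$, giving $r_s = 1 - 36/210 = 29/35$.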
References
- [Casella] Casella and Berger's Statistical Inference
- [Hogg, PSI] Hogg and Tanis's Probability and Statistical Inference
- [Hogg, IMS] Hogg, McKean and Craig's Introduction to Mathematical Statistics