Proofs of some formulas in Anderson's Statistics for Business and Economics


(9.7) Sample Size for a One-Tailed Hypothesis Test about a Population Mean

$$n=\frac{(z_\alpha+z_\beta)^2\sigma^2}{(\mu_0-\mu_a)^2}$$

In a two-tailed hypothesis test, use (9.7) with $z_{\alpha/2}$ replacing $z_\alpha$.

Proof: Note that we do not have to change $z_\beta$ in a two-tailed test: the type II error probability is still evaluated at the single alternative value $\mu_a$, which lies in only one of the two tails.
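
A quick numeric illustration of (9.7); all values ($\mu_0$, $\mu_a$, $\sigma$, $\alpha$, $\beta$) are made up for the sketch:

```python
# Sample size for a one-tailed test of H0: mu = 10 vs Ha: mu < 10,
# detecting mu_a = 9 with sigma = 5, alpha = 0.05, beta = 0.10.
from scipy.stats import norm

mu0, mua, sigma = 10.0, 9.0, 5.0
alpha, beta = 0.05, 0.10

z_alpha = norm.ppf(1 - alpha)  # z value cutting off alpha in the upper tail
z_beta = norm.ppf(1 - beta)

n = (z_alpha + z_beta) ** 2 * sigma ** 2 / (mu0 - mua) ** 2
print(n)  # round up to the next integer in practice
```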

(10.7) Degrees of Freedom: t Distribution with Two Independent Random Samples

$$\mathrm{d.f.}=\frac{\left(\dfrac{s_1^2}{n_1}+\dfrac{s_2^2}{n_2}\right)^2}{\dfrac{1}{n_1-1}\left(\dfrac{s_1^2}{n_1}\right)^2+\dfrac{1}{n_2-1}\left(\dfrac{s_2^2}{n_2}\right)^2}$$

Proof: See [Casella, p.409, exe.8.42].
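
A small numeric check of (10.7); the summary statistics are made up:

```python
# Welch-Satterthwaite degrees of freedom from assumed sample variances and sizes.
s1_sq, n1 = 4.0, 12  # sample variance and size, sample 1 (assumed)
s2_sq, n2 = 9.0, 15  # sample 2 (assumed)

num = (s1_sq / n1 + s2_sq / n2) ** 2
den = (s1_sq / n1) ** 2 / (n1 - 1) + (s2_sq / n2) ** 2 / (n2 - 1)
df = num / den
print(df)  # generally not an integer; the textbook rounds down
```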

(13.12) Test Statistic for the Equality of k Population Means

$$F=\frac{\mathrm{MSTR}}{\mathrm{MSE}}$$

Proof: This proof is from [Hogg, PSI, sec.9.3]. Recall that the data are laid out as
$$\begin{array}{cccc}
\text{treatment }1 & \text{treatment }2 & \cdots & \text{treatment }k\\
x_{11} & x_{12} & \cdots & x_{1k}\\
x_{21} & x_{22} & \cdots & x_{2k}\\
\vdots & \vdots & & \vdots\\
x_{n_1 1} & x_{n_2 2} & \cdots & x_{n_k k}\\
\bar{x}_1 & \bar{x}_2 & \cdots & \bar{x}_k
\end{array}$$
where $\bar{x}_j$ is the sample mean of treatment $j$ and $\bar{\bar{x}}$ is the overall sample mean.

Recall the following formulas:
$$\begin{aligned}
\mathrm{SSE}&\overset{(13.11)}{=}\sum_{j=1}^k\sum_{i=1}^{n_j}(x_{ij}-\bar{x}_j)^2=\sum_{j=1}^k(n_j-1)s_j^2\\
\mathrm{SSTR}&\overset{(13.8)}{=}\sum_{j=1}^k n_j(\bar{x}_j-\bar{\bar{x}})^2\\
\mathrm{SST}&\overset{(13.13)}{=}\sum_{j=1}^k\sum_{i=1}^{n_j}(x_{ij}-\bar{\bar{x}})^2\\
\mathrm{MSE}&\overset{(13.10)}{=}\frac{\mathrm{SSE}}{n-k}\\
\mathrm{MSTR}&\overset{(13.7)}{=}\frac{\mathrm{SSTR}}{k-1}
\end{aligned}$$
First,
$$\frac{\mathrm{SST}}{n-1}=\frac{\sum_{j=1}^k\sum_{i=1}^{n_j}(x_{ij}-\bar{\bar{x}})^2}{n-1}=S^2,$$
the sample variance computed from all $n$ observations together, so by [Casella, p.218, thm.5.3.1.(c)],
$$\frac{\mathrm{SST}}{\sigma^2}=\frac{(n-1)S^2}{\sigma^2}\sim\chi^2_{n-1}.$$
Next, for each treatment $j$,
$$\frac{\sum_{i=1}^{n_j}(x_{ij}-\bar{x}_j)^2}{n_j-1}=s_j^2,$$
the sample variance computed from column $j$ alone, so by [Casella, p.218, thm.5.3.1.(c)],
$$\frac{\sum_{i=1}^{n_j}(x_{ij}-\bar{x}_j)^2}{\sigma^2}=\frac{(n_j-1)s_j^2}{\sigma^2}\sim\chi^2_{n_j-1}.$$
By [Casella, p.219, lem.5.3.2.(b)],
$$\frac{\mathrm{SSE}}{\sigma^2}=\sum_{j=1}^k\frac{\sum_{i=1}^{n_j}(x_{ij}-\bar{x}_j)^2}{\sigma^2}\sim\chi^2_{(n_1-1)+(n_2-1)+\cdots+(n_k-1)}=\chi^2_{n-k}.$$
By [Casella, p.155, thm.4.2.12],
$$\frac{\mathrm{SSTR}}{\sigma^2}=\frac{\mathrm{SST}}{\sigma^2}-\frac{\mathrm{SSE}}{\sigma^2}\sim\chi^2_{k-1}.$$
Therefore,
$$\frac{\mathrm{MSTR}}{\mathrm{MSE}}=\frac{\mathrm{SSTR}/(k-1)}{\mathrm{SSE}/(n-k)}=\frac{(\mathrm{SSTR}/\sigma^2)/(k-1)}{(\mathrm{SSE}/\sigma^2)/(n-k)}\sim F(k-1,\,n-k).$$
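
The statistic can be checked numerically against `scipy.stats.f_oneway`; the three treatment samples below are made up:

```python
# Compute F = MSTR/MSE by hand and compare with scipy's one-way ANOVA.
import numpy as np
from scipy.stats import f_oneway

samples = [np.array([3.0, 5.0, 6.0, 4.0]),
           np.array([7.0, 8.0, 6.0, 9.0, 10.0]),
           np.array([2.0, 3.0, 4.0])]
k = len(samples)
n = sum(len(s) for s in samples)
grand_mean = np.concatenate(samples).mean()

sstr = sum(len(s) * (s.mean() - grand_mean) ** 2 for s in samples)
sse = sum(((s - s.mean()) ** 2).sum() for s in samples)
F = (sstr / (k - 1)) / (sse / (n - k))

print(F, f_oneway(*samples).statistic)  # the two values agree
```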

(13.14) SST=SSTR+SSE

$$\mathrm{SST}=\mathrm{SSTR}+\mathrm{SSE}$$

Proof: Recall that the data are laid out as
$$\begin{array}{cccc}
\text{treatment }1 & \text{treatment }2 & \cdots & \text{treatment }k\\
x_{11} & x_{12} & \cdots & x_{1k}\\
x_{21} & x_{22} & \cdots & x_{2k}\\
\vdots & \vdots & & \vdots\\
x_{n_1 1} & x_{n_2 2} & \cdots & x_{n_k k}\\
\bar{x}_1 & \bar{x}_2 & \cdots & \bar{x}_k
\end{array}$$
where $\bar{x}_j$ is the sample mean of treatment $j$ and $\bar{\bar{x}}$ is the overall sample mean.

Recall the following formulas:
$$\begin{aligned}
\mathrm{SSE}&\overset{(13.11)}{=}\sum_{j=1}^k\sum_{i=1}^{n_j}(x_{ij}-\bar{x}_j)^2=\sum_{j=1}^k(n_j-1)s_j^2\\
\mathrm{SSTR}&\overset{(13.8)}{=}\sum_{j=1}^k n_j(\bar{x}_j-\bar{\bar{x}})^2\\
\mathrm{SST}&\overset{(13.13)}{=}\sum_{j=1}^k\sum_{i=1}^{n_j}(x_{ij}-\bar{\bar{x}})^2
\end{aligned}$$
Then
$$\begin{aligned}
\mathrm{SST}&=\sum_{j=1}^k\sum_{i=1}^{n_j}(x_{ij}-\bar{\bar{x}})^2=\sum_{j=1}^k\sum_{i=1}^{n_j}(x_{ij}-\bar{x}_j+\bar{x}_j-\bar{\bar{x}})^2\\
&=\sum_{j=1}^k\sum_{i=1}^{n_j}\left[(x_{ij}-\bar{x}_j)^2+2(x_{ij}-\bar{x}_j)(\bar{x}_j-\bar{\bar{x}})+(\bar{x}_j-\bar{\bar{x}})^2\right]\\
&=\sum_{j=1}^k\sum_{i=1}^{n_j}(x_{ij}-\bar{x}_j)^2+2\sum_{j=1}^k\sum_{i=1}^{n_j}(x_{ij}-\bar{x}_j)(\bar{x}_j-\bar{\bar{x}})+\sum_{j=1}^k n_j(\bar{x}_j-\bar{\bar{x}})^2\\
&=\mathrm{SSE}+2\sum_{j=1}^k\sum_{i=1}^{n_j}(x_{ij}-\bar{x}_j)(\bar{x}_j-\bar{\bar{x}})+\mathrm{SSTR}.
\end{aligned}$$
We show that the middle term is zero. Indeed, for each $j$,
$$\sum_{i=1}^{n_j}(x_{ij}-\bar{x}_j)(\bar{x}_j-\bar{\bar{x}})=\sum_{i=1}^{n_j}x_{ij}\bar{x}_j-\sum_{i=1}^{n_j}\bar{x}_j^2-\sum_{i=1}^{n_j}x_{ij}\bar{\bar{x}}+\sum_{i=1}^{n_j}\bar{x}_j\bar{\bar{x}}=n_j\bar{x}_j^2-n_j\bar{x}_j^2-n_j\bar{x}_j\bar{\bar{x}}+n_j\bar{x}_j\bar{\bar{x}}=0.$$
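
The decomposition is easy to confirm numerically; the treatment samples below are made up:

```python
# Verify SST = SSTR + SSE on arbitrary unequal-sized treatment groups.
import numpy as np

samples = [np.array([1.0, 2.0, 4.0]),
           np.array([5.0, 7.0, 6.0, 8.0]),
           np.array([3.0, 3.0])]
grand_mean = np.concatenate(samples).mean()

sst = sum(((s - grand_mean) ** 2).sum() for s in samples)
sstr = sum(len(s) * (s.mean() - grand_mean) ** 2 for s in samples)
sse = sum(((s - s.mean()) ** 2).sum() for s in samples)

print(sst, sstr + sse)  # equal up to floating-point error
```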

(13.16) Fisher's LSD

$$t=\frac{\bar{x}_i-\bar{x}_j}{\sqrt{\mathrm{MSE}\left(\dfrac{1}{n_i}+\dfrac{1}{n_j}\right)}}$$

Proof: See [Casella, sec.11.2].
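
A sketch of the LSD $t$ statistic for comparing treatments $i$ and $j$; the values of $\bar{x}_i$, $\bar{x}_j$, MSE, $n_i$, $n_j$ are all assumed:

```python
# Fisher's LSD t statistic (13.16) with made-up summary statistics.
import math

xbar_i, xbar_j = 7.5, 5.0  # treatment sample means (assumed)
mse, n_i, n_j = 2.4, 5, 4  # MSE from the ANOVA table and group sizes (assumed)

t = (xbar_i - xbar_j) / math.sqrt(mse * (1 / n_i + 1 / n_j))
print(t)  # compare with t_{alpha/2} on n - k degrees of freedom
```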

(14.11) Relationship among SST, SSR, and SSE

$$\mathrm{SST}=\mathrm{SSR}+\mathrm{SSE}$$

Proof: Recall that
$$\mathrm{SSE}\overset{(14.8)}{=}\sum(y_i-\hat{y}_i)^2,\qquad \mathrm{SST}\overset{(14.9)}{=}\sum(y_i-\bar{y})^2,\qquad \mathrm{SSR}\overset{(14.10)}{=}\sum(\hat{y}_i-\bar{y})^2.$$

Note that
$$\begin{aligned}
\mathrm{SST}&=\sum(y_i-\bar{y})^2=\sum(y_i-\hat{y}_i+\hat{y}_i-\bar{y})^2\\
&=\sum\left[(y_i-\hat{y}_i)^2+2(y_i-\hat{y}_i)(\hat{y}_i-\bar{y})+(\hat{y}_i-\bar{y})^2\right]\\
&=\sum(y_i-\hat{y}_i)^2+2\sum(y_i-\hat{y}_i)(\hat{y}_i-\bar{y})+\sum(\hat{y}_i-\bar{y})^2\\
&=\mathrm{SSE}+2\sum(y_i-\hat{y}_i)(\hat{y}_i-\bar{y})+\mathrm{SSR}.
\end{aligned}$$
It suffices to show that $\sum(y_i-\hat{y}_i)(\hat{y}_i-\bar{y})=0$.
Recall the following formulas:
$$\hat{y}_i\overset{(14.3)}{=}b_0+b_1x_i,\qquad b_1\overset{(14.6)}{=}\frac{\sum(x_i-\bar{x})(y_i-\bar{y})}{\sum(x_i-\bar{x})^2},\qquad b_0\overset{(14.7)}{=}\bar{y}-b_1\bar{x}.$$
Thus,
$$\begin{aligned}
(*)\qquad \sum(y_i-\hat{y}_i)(\hat{y}_i-\bar{y})&\overset{(14.3)}{=}\sum(y_i-b_0-b_1x_i)(b_1x_i+b_0-\bar{y})\\
&\overset{(14.7)}{=}\sum(y_i-\bar{y}+b_1\bar{x}-b_1x_i)(b_1x_i+\bar{y}-b_1\bar{x}-\bar{y})\\
&=\sum\left[(y_i-\bar{y})-b_1(x_i-\bar{x})\right]\left[b_1(x_i-\bar{x})\right]\\
&=b_1\sum(x_i-\bar{x})(y_i-\bar{y})-b_1^2\sum(x_i-\bar{x})^2\\
&\overset{(14.6)}{=}0.
\end{aligned}$$
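
The identity and the vanishing cross term can both be confirmed numerically; the $(x, y)$ data are made up:

```python
# Verify SST = SSR + SSE for the least-squares line on arbitrary data.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 2.5, 4.0, 4.5, 6.0])

b1 = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

sst = ((y - y.mean()) ** 2).sum()
ssr = ((y_hat - y.mean()) ** 2).sum()
sse = ((y - y_hat) ** 2).sum()
print(sst, ssr + sse)  # equal up to floating-point error
```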

(14.13) Sample Correlation Coefficient

$$r_{xy}=(\text{sign of }b_1)\sqrt{\text{coefficient of determination}}=(\text{sign of }b_1)\sqrt{r^2}$$

Proof: Recall the following formulas:
$$\begin{gathered}
s_x\overset{(3.8),\,(3.9)}{=}\sqrt{\frac{\sum(x_i-\bar{x})^2}{n-1}},\qquad s_y\overset{(3.8),\,(3.9)}{=}\sqrt{\frac{\sum(y_i-\bar{y})^2}{n-1}},\qquad s_{xy}\overset{(3.13)}{=}\frac{\sum(x_i-\bar{x})(y_i-\bar{y})}{n-1},\qquad r_{xy}\overset{(3.15)}{=}\frac{s_{xy}}{s_xs_y},\\
\mathrm{SST}\overset{(14.9)}{=}\sum(y_i-\bar{y})^2,\qquad \mathrm{SSR}\overset{(14.10)}{=}\sum(\hat{y}_i-\bar{y})^2,\qquad r^2\overset{(14.12)}{=}\frac{\mathrm{SSR}}{\mathrm{SST}}.
\end{gathered}$$

We express SSR and SST in terms of $s_x$, $s_y$, and $s_{xy}$. It is easy to see that $\mathrm{SST}=(n-1)s_y^2$.
In addition,
$$\begin{aligned}
\mathrm{SSR}&=\sum(\hat{y}_i-\bar{y})^2\\
&=\sum(\hat{y}_i-\bar{y})^2+\sum(y_i-\hat{y}_i)(\hat{y}_i-\bar{y})\qquad\text{by }(*)\text{ in the proof of (14.11)}\\
&=\sum(\hat{y}_i-\bar{y})\left[(\hat{y}_i-\bar{y})+(y_i-\hat{y}_i)\right]\\
&=\sum(\hat{y}_i-\bar{y})(y_i-\bar{y})\\
&\overset{(14.3)}{=}\sum(b_0+b_1x_i-\bar{y})(y_i-\bar{y})\\
&\overset{(14.7)}{=}\sum(b_1x_i-b_1\bar{x})(y_i-\bar{y})\\
&=b_1\sum(x_i-\bar{x})(y_i-\bar{y})\qquad(**)\\
&\overset{(14.6)}{=}\frac{\left[\sum(x_i-\bar{x})(y_i-\bar{y})\right]^2}{\sum(x_i-\bar{x})^2}=\frac{(n-1)^2s_{xy}^2}{(n-1)s_x^2}.
\end{aligned}$$
Therefore,
$$r^2=\frac{\mathrm{SSR}}{\mathrm{SST}}=\frac{(n-1)^2s_{xy}^2}{(n-1)s_x^2\,(n-1)s_y^2}=\frac{s_{xy}^2}{s_x^2s_y^2}=r_{xy}^2.$$
It follows that $r_{xy}=\pm\sqrt{r^2}$.
Note that
$$b_1=\frac{\sum(x_i-\bar{x})(y_i-\bar{y})}{\sum(x_i-\bar{x})^2}=\frac{(n-1)s_{xy}}{(n-1)s_x^2}=\frac{s_{xy}}{s_x^2},$$
which means that $b_1$ and $s_{xy}$ have the same sign. On the other hand, $s_x$ and $s_y$ are nonnegative, so $\frac{s_{xy}}{s_xs_y}$ and $b_1$ have the same sign. Then we have $r_{xy}=(\text{sign of }b_1)\sqrt{r^2}$.
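
A numeric check of (14.13) on made-up data with a negative slope, so that the sign actually matters:

```python
# Compare the sample correlation with (sign of b1) * sqrt(r^2).
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([9.0, 7.5, 6.0, 5.5, 3.0])  # decreasing, so b1 < 0

b1 = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

r_sq = ((y_hat - y.mean()) ** 2).sum() / ((y - y.mean()) ** 2).sum()
r_xy = np.corrcoef(x, y)[0, 1]
print(r_xy, np.sign(b1) * np.sqrt(r_sq))  # the two values agree
```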

(14.15) Mean Square Error (Estimate of σ2)

$$s^2=\mathrm{MSE}=\frac{\mathrm{SSE}}{n-2}$$

Proof: See [Casella, p.552, (11.3.29)].

(14.19) Test Statistic for t Test for Significance in Simple Linear Regression

$$t=\frac{b_1}{\hat{\sigma}_{b_1}}$$

Proof: By [Casella, p.553, thm.11.3.3], we have
$$b_1\sim n\!\left(\beta_1,\ \frac{\sigma^2}{\sum(x_i-\bar{x})^2}\right)\quad\text{and}\quad\frac{\mathrm{SSE}}{\sigma^2}\sim\chi^2(n-2).$$

It follows that
$$\frac{b_1-\beta_1}{\sigma/\sqrt{\sum(x_i-\bar{x})^2}}\sim n(0,1),$$
and, with $\hat{\sigma}=\sqrt{\dfrac{\mathrm{SSE}}{n-2}}$,
$$\frac{\dfrac{b_1-\beta_1}{\sigma/\sqrt{\sum(x_i-\bar{x})^2}}}{\sqrt{\dfrac{\mathrm{SSE}/\sigma^2}{n-2}}}=\frac{b_1-\beta_1}{\hat{\sigma}/\sqrt{\sum(x_i-\bar{x})^2}}=\frac{b_1-\beta_1}{\hat{\sigma}_{b_1}}\sim t(n-2).$$
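
The statistic can be checked against `scipy.stats.linregress`, whose `stderr` field is the standard error of the slope, i.e. $\hat{\sigma}_{b_1}$; the data are made up:

```python
# Compute t = b1 / sigma_hat_b1 by hand and via linregress.
import numpy as np
from scipy.stats import linregress

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.2, 2.1, 2.8, 4.1, 4.9, 6.2])
n = len(x)

res = linregress(x, y)
y_hat = res.intercept + res.slope * x
sse = ((y - y_hat) ** 2).sum()
sigma_hat = np.sqrt(sse / (n - 2))
se_b1 = sigma_hat / np.sqrt(((x - x.mean()) ** 2).sum())

print(res.slope / se_b1, res.slope / res.stderr)  # same t statistic
```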

(14.21) Test Statistic for F Test for Significance in Simple Linear Regression

$$F=\frac{\mathrm{MSR}}{\mathrm{MSE}}$$

Proof: By [Casella, p.553, thm.11.3.3], we have
$$b_1\sim n\!\left(\beta_1,\ \frac{\sigma^2}{\sum(x_i-\bar{x})^2}\right)\quad\text{and}\quad\frac{\mathrm{SSE}}{\sigma^2}\sim\chi^2(n-2).$$

Thus,
$$\frac{b_1-\beta_1}{\sigma/\sqrt{\sum(x_i-\bar{x})^2}}=\frac{(b_1-\beta_1)\sqrt{\sum(x_i-\bar{x})^2}}{\sigma}\sim n(0,1),$$
so by [Casella, p.53, exa.2.1.9],
$$\frac{(b_1-\beta_1)^2\sum(x_i-\bar{x})^2}{\sigma^2}\sim\chi^2(1);$$
under $H_0:\ \beta_1=0$,
$$\frac{b_1^2\sum(x_i-\bar{x})^2}{\sigma^2}\sim\chi^2(1);$$
by (14.6),
$$\frac{b_1\sum(x_i-\bar{x})(y_i-\bar{y})}{\sigma^2}\sim\chi^2(1);$$
and, using $\mathrm{SSR}=b_1\sum(x_i-\bar{x})(y_i-\bar{y})$ from the proof of (14.13),
$$\frac{\mathrm{SSR}}{\sigma^2}\sim\chi^2(1).$$
Therefore,
$$\frac{\mathrm{MSR}}{\mathrm{MSE}}=\frac{\mathrm{SSR}/1}{\mathrm{SSE}/(n-2)}=\frac{(\mathrm{SSR}/\sigma^2)/1}{(\mathrm{SSE}/\sigma^2)/(n-2)}\sim F(1,\,n-2).$$
We can also use the fact that the square of a $t$-distributed random variable has an $F$ distribution; see [Casella, p.225, thm.5.3.8]. That is, under $H_0:\ \beta_1=0$,
$$\frac{\mathrm{MSR}}{\mathrm{MSE}}=\frac{\mathrm{SSR}/1}{\mathrm{SSE}/(n-2)}=\frac{(\mathrm{SSR}/\sigma^2)/1}{(\mathrm{SSE}/\sigma^2)/(n-2)}=\frac{b_1^2\sum(x_i-\bar{x})^2/\sigma^2}{\dfrac{\mathrm{SSE}/\sigma^2}{n-2}}=\left(\frac{\dfrac{b_1-\beta_1}{\sigma/\sqrt{\sum(x_i-\bar{x})^2}}}{\sqrt{\dfrac{\mathrm{SSE}/\sigma^2}{n-2}}}\right)^2\overset{(14.19)}{\sim}\left[t(n-2)\right]^2=F(1,\,n-2).$$
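
The relation $F=t^2$ is easy to confirm numerically; the data are made up:

```python
# Check that MSR/MSE equals the square of the t statistic b1 / se(b1).
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 1.8, 3.2, 3.9, 5.1])
n = len(x)

b1 = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

ssr = ((y_hat - y.mean()) ** 2).sum()
sse = ((y - y_hat) ** 2).sum()
F = (ssr / 1) / (sse / (n - 2))

se_b1 = np.sqrt(sse / (n - 2)) / np.sqrt(((x - x.mean()) ** 2).sum())
t = b1 / se_b1
print(F, t ** 2)  # equal up to floating-point error
```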

(14.27) Prediction Interval for y*

$$\hat{y}^*\pm t_{\alpha/2}s_{\mathrm{pred}}=\hat{y}^*\pm t_{\alpha/2}\,s\sqrt{1+\frac{1}{n}+\frac{(x^*-\bar{x})^2}{\sum(x_i-\bar{x})^2}}=\hat{y}^*\pm t_{\alpha/2}\sqrt{\frac{\mathrm{SSE}}{n-2}}\sqrt{1+\frac{1}{n}+\frac{(x^*-\bar{x})^2}{\sum(x_i-\bar{x})^2}}$$

where the confidence coefficient is $1-\alpha$ and $t_{\alpha/2}$ is based on the $t$ distribution with $n-2$ degrees of freedom.

Proof: See [Casella, p.559, (11.3.41)].
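
A sketch of (14.27) in code; the data, $x^*$, and $\alpha$ are all assumed:

```python
# Prediction interval for y* at a new point x_star, alpha = 0.05.
import numpy as np
from scipy.stats import t as t_dist

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])
n = len(x)
alpha = 0.05
x_star = 3.5  # new value of the independent variable (assumed)

b1 = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
b0 = y.mean() - b1 * x.mean()
y_hat_star = b0 + b1 * x_star

s = np.sqrt(((y - (b0 + b1 * x)) ** 2).sum() / (n - 2))
s_pred = s * np.sqrt(1 + 1 / n + (x_star - x.mean()) ** 2
                     / ((x - x.mean()) ** 2).sum())
t_crit = t_dist.ppf(1 - alpha / 2, df=n - 2)

lower = y_hat_star - t_crit * s_pred
upper = y_hat_star + t_crit * s_pred
print(lower, upper)
```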

(14.30) Standard Deviation of the ith Residual

$$s_{y_i-\hat{y}_i}=s\sqrt{1-h_i},\qquad s\overset{(14.16)}{=}\sqrt{\frac{\mathrm{SSE}}{n-2}}=\sqrt{\frac{\sum(y_i-\hat{y}_i)^2}{n-2}},\qquad h_i=\frac{1}{n}+\frac{(x_i-\bar{x})^2}{\sum(x_i-\bar{x})^2}$$

Proof: By [Casella, p.552, (11.3.28)],
$$\mathrm{Var}(y_i-\hat{y}_i)=\left[\frac{n-2}{n}+\frac{1}{S_{xx}}\left(\frac{1}{n}\sum_{j=1}^n x_j^2+x_i^2-2(x_i-\bar{x})^2-2x_i\bar{x}\right)\right]\sigma^2,$$
where $S_{xx}=\sum_{i=1}^n(x_i-\bar{x})^2$; see [Casella, p.541, (11.3.6)]. By [Anderson, p.667, (14.16)], $s$ is an estimator of $\sigma$, so we substitute $s$ for $\sigma$. Then we have
$$s_{y_i-\hat{y}_i}^2=\left[\frac{n-2}{n}+\frac{1}{S_{xx}}\left(\frac{1}{n}\sum_{j=1}^n x_j^2+x_i^2-2(x_i-\bar{x})^2-2x_i\bar{x}\right)\right]s^2.$$
To prove $s_{y_i-\hat{y}_i}=s\sqrt{1-h_i}$, it suffices to check that
$$1-h_i=1-\frac{1}{n}-\frac{(x_i-\bar{x})^2}{\sum(x_i-\bar{x})^2}=\frac{n-2}{n}+\frac{1}{S_{xx}}\left(\frac{1}{n}\sum_{j=1}^n x_j^2+x_i^2-2(x_i-\bar{x})^2-2x_i\bar{x}\right),$$
which follows from $\frac{1}{n}\sum_{j=1}^n x_j^2=\frac{S_{xx}}{n}+\bar{x}^2$: with this substitution the parenthesized term equals $\frac{S_{xx}}{n}-(x_i-\bar{x})^2$, and both sides reduce to $\frac{n-1}{n}-\frac{(x_i-\bar{x})^2}{S_{xx}}$.
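
The identity at the end of this proof can be confirmed numerically; the $x$ values are made up:

```python
# Check 1 - h_i against the expanded Casella expression, and that
# the leverages sum to 2 (one for the intercept, one for the slope).
import numpy as np

x = np.array([1.0, 2.0, 4.0, 5.0, 8.0])
n = len(x)
sxx = ((x - x.mean()) ** 2).sum()
h = 1 / n + (x - x.mean()) ** 2 / sxx

lhs = 1 - h
rhs = (n - 2) / n + ((x ** 2).sum() / n + x ** 2
                     - 2 * (x - x.mean()) ** 2 - 2 * x * x.mean()) / sxx

print(np.allclose(lhs, rhs), h.sum())
```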

(14.33) Leverage of Observation i

$$h_i=\frac{1}{n}+\frac{(x_i-\bar{x})^2}{\sum(x_i-\bar{x})^2}$$

Proof: See [Casella, subsec.11.3.5].

(15.9) Adjusted Multiple Coefficient of Determination

$$R_a^2=1-(1-R^2)\frac{n-1}{n-p-1}$$

Proof: This formula can also be written as
$$R_a^2=1-(1-R^2)\frac{n-1}{n-p-1}=1-\left(1-\frac{\mathrm{SSR}}{\mathrm{SST}}\right)\frac{n-1}{n-p-1}=1-\frac{\mathrm{SSE}}{\mathrm{SST}}\times\frac{n-1}{n-p-1}.$$

Hastie's explanation in An Introduction to Statistical Learning (p.212, last line) is: "The intuition behind the adjusted $R^2$ is that once all of the correct variables have been included in the model, adding additional noise variables will lead to only a very small decrease in RSS. Since adding noise variables leads to an increase in $d$, such variables will lead to an increase in $\mathrm{RSS}/(n-d-1)$, and consequently a decrease in the adjusted $R^2$. Therefore, in theory, the model with the largest adjusted $R^2$ will have only correct variables and no noise variables. Unlike the $R^2$ statistic, the adjusted $R^2$ statistic pays a price for the inclusion of unnecessary variables in the model."

Hastie's symbol $d$ here is our $p$, and his RSS is our SSE.

My interpretation: if more independent variables are added to the model, then $p$ increases and SSE decreases, and there are two cases:

  • if SSE decreases only slightly, the change in $p$ has the larger effect on $R_a^2$, so $R_a^2$ decreases, indicating that adding these independent variables is not worthwhile;
  • if SSE decreases a lot, the change in SSE has the larger effect on $R_a^2$, so $R_a^2$ increases, indicating that adding these independent variables is worthwhile.
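
This trade-off can be illustrated in code; the data, the noise predictor, and the random seed are all made up:

```python
# Adding a predictor can only raise R^2, but the adjusted R^2
# penalizes the extra parameter and may fall for a noise variable.
import numpy as np

rng = np.random.default_rng(0)
n = 30
x1 = rng.normal(size=n)
noise = rng.normal(size=n)               # irrelevant predictor (assumed)
y = 2.0 + 1.5 * x1 + rng.normal(size=n)  # y depends on x1 only

def r_squared(cols, y):
    X = np.column_stack([np.ones(len(y))] + list(cols))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - (resid ** 2).sum() / ((y - y.mean()) ** 2).sum()

def adj_r_squared(r2, n, p):
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

r2_small = r_squared([x1], y)
r2_big = r_squared([x1, noise], y)
print(r2_small, r2_big)                                    # r2_big >= r2_small
print(adj_r_squared(r2_small, n, 1), adj_r_squared(r2_big, n, 2))
```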

(18.8) Spearman Rank-Correlation Coefficient

$$r_s=1-\frac{6\sum_{i=1}^n d_i^2}{n(n^2-1)}$$

The textbook has a typo here: it writes $n^2+1$ where it should be $n^2-1$.

Proof: See [Hogg, IMS, p.634, subsec.10.8.2]. It gives another, more instructive expression:
$$r_S=\frac{\sum\left[R(X_i)-\frac{n+1}{2}\right]\left[R(Y_i)-\frac{n+1}{2}\right]}{n(n^2-1)/12}.$$

Note that $\frac{n+1}{2}$ and $\frac{n^2-1}{12}$ are the mean and variance of the discrete uniform distribution on $\{1,\dots,n\}$, respectively. Use
$$\sum R(X_i)^2=\sum R(Y_i)^2=\frac{n(n+1)(2n+1)}{6}$$
and
$$\sum R(X_i)=\sum R(Y_i)=\frac{n(n+1)}{2}$$
to show that
$$1-\frac{6\sum d_i^2}{n(n^2-1)}=1-\frac{6\sum\left[R(X_i)-R(Y_i)\right]^2}{n(n^2-1)}$$
and
$$\frac{\sum\left[R(X_i)-\frac{n+1}{2}\right]\left[R(Y_i)-\frac{n+1}{2}\right]}{n(n^2-1)/12}$$
are both equal to
$$\frac{12\sum R(X_i)R(Y_i)-3n(n+1)^2}{n(n^2-1)}.$$
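
Formula (18.8) can be checked against `scipy.stats.spearmanr`; the data below are made up and have no ties, which is when the $d_i^2$ formula is exact:

```python
# Compute r_s from the rank differences and compare with scipy.
import numpy as np
from scipy.stats import spearmanr, rankdata

x = np.array([3.0, 1.0, 4.0, 1.5, 9.0, 2.6, 5.0])
y = np.array([2.0, 0.5, 5.0, 1.0, 8.0, 3.5, 4.0])
n = len(x)

d = rankdata(x) - rankdata(y)  # d_i = R(X_i) - R(Y_i)
r_s = 1 - 6 * (d ** 2).sum() / (n * (n ** 2 - 1))
print(r_s, spearmanr(x, y)[0])  # the two values agree
```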

References

  • [Casella] Casella and Berger's Statistical Inference
  • [Hogg, PSI] Hogg and Tanis's Probability and Statistical Inference
  • [Hogg, IMS] Hogg, McKean and Craig's Introduction to Mathematical Statistics
