- p.34, (2.7):
$E(y_0 - \hat{f}(x_0))^2 = \mathrm{Var}(\hat{f}(x_0)) + [\mathrm{Bias}(\hat{f}(x_0))]^2 + \mathrm{Var}(\epsilon)$.
Proof: I don't know why. The authors don't prove it in the book; the same decomposition appears in The Elements of Statistical Learning, (3.22). -
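A sketch of the standard argument (not given in ISL), assuming $y_0 = f(x_0) + \epsilon$ with $E[\epsilon] = 0$, $\mathrm{Var}(\epsilon) = \sigma^2$, and $\epsilon$ independent of the training data (hence of $\hat{f}(x_0)$):
$$
E(y_0 - \hat{f}(x_0))^2 = E(f(x_0) - \hat{f}(x_0))^2 + 2\,E\big[(f(x_0) - \hat{f}(x_0))\,\epsilon\big] + E(\epsilon^2) = E(f(x_0) - \hat{f}(x_0))^2 + \mathrm{Var}(\epsilon),
$$
since the cross term vanishes by independence. Because $f(x_0)$ is non-random, $E(f(x_0) - \hat{f}(x_0))^2 = \mathrm{Var}(\hat{f}(x_0)) + [E\hat{f}(x_0) - f(x_0)]^2 = \mathrm{Var}(\hat{f}(x_0)) + [\mathrm{Bias}(\hat{f}(x_0))]^2$, which gives (2.7).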
p.79, line -14:
Recall that in simple regression, $R^2$ is the square of the correlation of the response and the variable. In multiple linear regression, it turns out that it equals $\mathrm{Cor}(Y, \hat{Y})^2$,
Proof: I don't know why. -
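A quick numerical check of the claim (my own sketch with made-up data, not a proof):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 3
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=n)

# Fit multiple linear regression with an intercept column.
X1 = np.column_stack([np.ones(n), X])
beta_hat, *_ = np.linalg.lstsq(X1, y, rcond=None)
y_hat = X1 @ beta_hat

# R^2 = 1 - RSS/TSS, as defined in the book.
rss = np.sum((y - y_hat) ** 2)
tss = np.sum((y - y.mean()) ** 2)
r_squared = 1 - rss / tss

# Squared correlation between the response and the fitted values.
cor_sq = np.corrcoef(y, y_hat)[0, 1] ** 2

print(r_squared, cor_sq)  # the two numbers agree up to rounding
```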
p.98, (3.37):
$h_i = \frac{1}{n} + \frac{(x_i - \bar{x})^2}{\sum_{i'=1}^{n} (x_{i'} - \bar{x})^2}$.
Proof: See Anderson, p.707, (14.33) and Casella, p.557, subsec. 11.3.5. -
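A numerical sanity check (my own sketch): for simple regression with an intercept, (3.37) matches the diagonal of the hat matrix $H = X(X^TX)^{-1}X^T$.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=20)

# Design matrix with an intercept column.
X = np.column_stack([np.ones_like(x), x])

# Diagonal of the hat matrix H = X (X^T X)^{-1} X^T.
H = X @ np.linalg.inv(X.T @ X) @ X.T
leverage_from_hat = np.diag(H)

# Formula (3.37).
leverage_formula = 1 / len(x) + (x - x.mean()) ** 2 / np.sum((x - x.mean()) ** 2)

print(np.allclose(leverage_from_hat, leverage_formula))  # True
```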
p.133, sec.4.3.2:
Estimating the regression coefficients $\beta_0$ and $\beta_1$ in $p(X) = \frac{e^{\beta_0 + \beta_1 X}}{1 + e^{\beta_0 + \beta_1 X}}$
Proof: See Casella, p.593, subsec.12.3.2. -
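The coefficients are fitted by maximum likelihood. A minimal sketch of one way to do this, plain gradient ascent on the Bernoulli log-likelihood (the data, true coefficients, learning rate, and iteration count below are all made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=200)
p_true = 1 / (1 + np.exp(-(-1.0 + 2.0 * x)))   # true beta0 = -1, beta1 = 2
y = rng.binomial(1, p_true)

beta0, beta1 = 0.0, 0.0
lr = 0.1
for _ in range(5000):
    p = 1 / (1 + np.exp(-(beta0 + beta1 * x)))
    # Gradient of the log-likelihood sum_i [y_i log p_i + (1 - y_i) log(1 - p_i)].
    g0 = np.sum(y - p)
    g1 = np.sum((y - p) * x)
    beta0 += lr * g0 / len(x)
    beta1 += lr * g1 / len(x)

print(beta0, beta1)  # roughly recovers (-1, 2)
```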
p.138, sec.4.4:
Linear Discriminant Analysis
Proof: The Elements of Statistical Learning, p.116, subsec.4.3.3 gives an intuitive explanation. -
p.140, (4.13):
$\delta_k(x) = x \cdot \frac{\mu_k}{\sigma^2} - \frac{\mu_k^2}{2\sigma^2} + \log(\pi_k)$
Proof: In (4.12), $p_k(x) = \frac{\pi_k \frac{1}{\sqrt{2\pi}\sigma} \exp\left(-\frac{1}{2\sigma^2}(x - \mu_k)^2\right)}{\sum_{l=1}^{K} \pi_l \frac{1}{\sqrt{2\pi}\sigma} \exp\left(-\frac{1}{2\sigma^2}(x - \mu_l)^2\right)}$, the value $x$ is held fixed while $k$ varies. The denominator does not change with $k$, so it can be ignored, and the factor $\frac{1}{\sqrt{2\pi}\sigma}$ in the numerator can be ignored for the same reason. After expanding $(x - \mu_k)^2$ in the numerator, the term $-\frac{x^2}{2\sigma^2}$ can likewise be dropped. -
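Making that step explicit (my own wording): the log of the $k$-dependent part of the numerator is
$$
\log\!\Big(\pi_k \exp\!\big(-\tfrac{1}{2\sigma^2}(x-\mu_k)^2\big)\Big) = \log \pi_k - \frac{x^2}{2\sigma^2} + \frac{x\,\mu_k}{\sigma^2} - \frac{\mu_k^2}{2\sigma^2},
$$
and dropping $-\frac{x^2}{2\sigma^2}$, which is the same for every $k$, leaves exactly $\delta_k(x)$ in (4.13).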
p.143, (4.19):
$\delta_k(x) = x^T \Sigma^{-1} \mu_k - \frac{1}{2}\mu_k^T \Sigma^{-1} \mu_k + \log \pi_k$
Proof: Similar to the proof of (4.13); the one difference here is that, since $x^T \Sigma^{-1} \mu_k$ is a scalar, $\mu_k^T \Sigma^{-1} x = x^T \Sigma^{-1} \mu_k$. -
p.148, fig.4.8:
The true positive rate is the sensitivity: the fraction of defaulters that are correctly identified, using a given threshold value. The false positive rate is 1 − specificity: the fraction of non-defaulters that we classify incorrectly as defaulters, using that same threshold value.
Proof: Thinking of the everyday phrase "false positive" makes this passage easier to understand. -
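A small sketch (my own toy example) of how one point on the ROC curve is computed from predicted probabilities and a threshold:

```python
import numpy as np

# Made-up predicted default probabilities and true labels (1 = defaulter).
p_hat = np.array([0.9, 0.8, 0.35, 0.2, 0.7, 0.1, 0.6, 0.05])
y     = np.array([1,   1,   0,    0,   1,   0,   0,   0])

threshold = 0.5
y_pred = (p_hat >= threshold).astype(int)

# True positive rate (sensitivity): fraction of defaulters correctly identified.
tpr = np.sum((y_pred == 1) & (y == 1)) / np.sum(y == 1)
# False positive rate (1 - specificity): fraction of non-defaulters wrongly flagged.
fpr = np.sum((y_pred == 1) & (y == 0)) / np.sum(y == 0)

print(tpr, fpr)  # varying the threshold traces out the ROC curve
```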
p.151, (4.24):
$\log\left(\frac{p_1(x)}{1 - p_1(x)}\right) = \log\left(\frac{p_1(x)}{p_2(x)}\right) = c_0 + c_1 x$
Proof: $c_0 = \log\frac{\pi_1}{\pi_2} - \frac{\mu_1^2 - \mu_2^2}{2\sigma^2}$, $c_1 = \frac{\mu_1 - \mu_2}{\sigma^2}$. -
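Where these constants come from (my sketch): with two classes, $1 - p_1(x) = p_2(x)$, and the factors common to both classes in (4.12) cancel in the ratio, so by (4.13)
$$
\log\frac{p_1(x)}{p_2(x)} = \delta_1(x) - \delta_2(x) = \Big(\log\frac{\pi_1}{\pi_2} - \frac{\mu_1^2 - \mu_2^2}{2\sigma^2}\Big) + \frac{\mu_1 - \mu_2}{\sigma^2}\,x = c_0 + c_1 x.
$$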
p.187, (5.6):
$\alpha = \frac{\sigma_Y^2 - \sigma_{XY}}{\sigma_X^2 + \sigma_Y^2 - 2\sigma_{XY}}$
Proof: By [Casella, p.171, thm.4.5.6], $\mathrm{Var}(\alpha X + (1-\alpha)Y) = \alpha^2 \mathrm{Var}(X) + (1-\alpha)^2 \mathrm{Var}(Y) + 2\alpha(1-\alpha)\mathrm{Cov}(X, Y)$. Written as a function of $\alpha$, this is $f(\alpha) = [\mathrm{Var}(X) + \mathrm{Var}(Y) - 2\mathrm{Cov}(X, Y)]\alpha^2 + [-2\mathrm{Var}(Y) + 2\mathrm{Cov}(X, Y)]\alpha + \mathrm{Var}(Y)$. Solving $f'(\alpha) = 0$ gives $\alpha = \frac{\sigma_Y^2 - \sigma_{XY}}{\sigma_X^2 + \sigma_Y^2 - 2\sigma_{XY}}$. -
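A quick numerical check of the minimizer (the variances and covariance below are made up):

```python
import numpy as np

# Made-up variances and covariance for X and Y.
var_x, var_y, cov_xy = 2.0, 3.0, 0.5

# Closed-form minimizer from (5.6).
alpha_star = (var_y - cov_xy) / (var_x + var_y - 2 * cov_xy)

# Brute-force check: evaluate Var(alpha X + (1 - alpha) Y) over a grid of alpha values.
alphas = np.linspace(0, 1, 10001)
f = alphas**2 * var_x + (1 - alphas) ** 2 * var_y + 2 * alphas * (1 - alphas) * cov_xy

print(alpha_star, alphas[np.argmin(f)])  # the two values agree closely
```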
p.213, line 20
As an alternative to the approaches just discussed, we can directly estimate the test error using the validation set and cross-validation methods discussed in Chapter 5.
Proof: The book does not explain clearly how to use cross-validation here; the procedure is spelled out below, based mainly on p.275, line 4 (a Python sketch follows the list).
K-fold cross-validation
In a model M there is a value n to be decided, with candidate values n1, n2, ....
- In sec.6.1 this is the number of predictors, p = 1 or 2 or 3 or ⋯
- In sec.7.4 it is the number of knots, k = 1 or 2 or 3 or ⋯ (the book uses a capital K, but to avoid reusing the symbol, since K is already the number of folds, lowercase k is used here).
- Suppose n = n1, and split the data into K folds g1, g2, g3, ..., gK.
- Set g1 aside, train the model M on g2, g3, ..., gK (remember, the model here assumes n = n1), then test on g1 to obtain the error e1.
- Set g2 aside, train the model M on g1, g3, ..., gK (again assuming n = n1), then test on g2 to obtain the error e2.
- Repeat for the remaining folds.
- Suppose n = n2, and split the data into K folds g1, g2, g3, ..., gK.
- Set g1 aside, train the model M on g2, g3, ..., gK (this time assuming n = n2), then test on g1 to obtain the error e1.
- Set g2 aside, train the model M on g1, g3, ..., gK (again assuming n = n2), then test on g2 to obtain the error e2.
- Repeat for the remaining folds.
- Repeat the whole procedure for the remaining candidate values, and choose the n whose average test error over the folds is smallest.
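A minimal Python sketch of this procedure, assuming a toy setup where the model M is a polynomial fit and the value n to decide is the polynomial degree (the data and all names below are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(-2, 2, size=120)
y = np.sin(x) + rng.normal(scale=0.3, size=x.size)

K = 5                                   # number of folds
candidates = [1, 2, 3, 4, 5, 6]         # candidate values of n (here: polynomial degree)
folds = np.array_split(rng.permutation(x.size), K)

cv_error = {}
for n in candidates:
    errors = []
    for k in range(K):
        test_idx = folds[k]
        train_idx = np.concatenate([folds[j] for j in range(K) if j != k])
        # Train the model M (a degree-n polynomial fit) on the K-1 training folds.
        coeffs = np.polyfit(x[train_idx], y[train_idx], deg=n)
        # Test on the held-out fold g_k.
        pred = np.polyval(coeffs, x[test_idx])
        errors.append(np.mean((y[test_idx] - pred) ** 2))
    cv_error[n] = np.mean(errors)       # average CV error for this candidate value

best_n = min(cv_error, key=cv_error.get)
print(cv_error, best_n)
```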
-
p.214, line -1
It may not be immediately obvious why such a constraint should improve the fit, but it turns out that shrinking the coefficient estimates can significantly reduce their variance.
Proof: See Bishop's Pattern Recognition and Machine Learning, p.8, table 1.1: when there are M = 9 predictors, the estimated coefficients become very large, so p.5, (1.2) $E(\mathbf{w}) = \frac{1}{2}\sum_{n=1}^{N}\{y(x_n, \mathbf{w}) - t_n\}^2$ is changed to p.10, (1.4) $\tilde{E}(\mathbf{w}) = \frac{1}{2}\sum_{n=1}^{N}\{y(x_n, \mathbf{w}) - t_n\}^2 + \frac{\lambda}{2}\|\mathbf{w}\|^2$, which constrains the size of those estimated coefficients. -
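A small illustration of the same effect (my own made-up example): with two highly correlated predictors, the ordinary least squares coefficients tend to blow up, while the ridge solution is shrunken.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 50
# Two highly correlated predictors make the OLS coefficients unstable.
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)
X = np.column_stack([x1, x2])
y = x1 + rng.normal(scale=0.5, size=n)

lam = 1.0
# Ordinary least squares: beta = (X^T X)^{-1} X^T y
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
# Ridge regression: beta = (X^T X + lambda I)^{-1} X^T y
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(2), X.T @ y)

print(beta_ols)    # typically large coefficients with opposite signs
print(beta_ridge)  # smaller in norm than the OLS solution
```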
p.220, (6.8)
One can show that the lasso and ridge regression coefficient estimates solve the problems $\underset{\beta}{\text{minimize}} \left\{ \sum_{i=1}^{n}\left(y_i - \beta_0 - \sum_{j=1}^{p}\beta_j x_{ij}\right)^2 \right\}$ subject to $\sum_{j=1}^{p}|\beta_j| \le s$
Proof: I don't know why. -
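A sketch of the usual argument (not from the book): the penalized form of the lasso is
$$
\underset{\beta}{\text{minimize}}\; \sum_{i=1}^{n}\Big(y_i - \beta_0 - \sum_{j=1}^{p}\beta_j x_{ij}\Big)^2 + \lambda \sum_{j=1}^{p}|\beta_j|.
$$
If $\hat{\beta}_\lambda$ solves this penalized problem, it also solves the constrained problem with budget $s = \sum_j |\hat{\beta}_{\lambda,j}|$: any $\beta$ with $\sum_j|\beta_j| \le s$ has a penalty no larger than $\hat{\beta}_\lambda$'s, so its RSS cannot be smaller. The converse correspondence between $s$ and $\lambda$ is standard Lagrangian duality for convex problems (see, e.g., Boyd and Vandenberghe's Convex Optimization). The ridge case is the same with $\sum_j \beta_j^2$ in place of $\sum_j |\beta_j|$.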
p.231, line 24.
$\mathrm{Var}\big(\phi_{11} \times (\mathrm{pop} - \overline{\mathrm{pop}}) + \phi_{21} \times (\mathrm{ad} - \overline{\mathrm{ad}})\big)$
Proof: See Jolliffe's Principal Component Analysis, sec. 1.1, p.5. Note that Hastie considers the covariance matrix of $X - \mu = \begin{pmatrix} \mathrm{pop} - \mu_{\mathrm{pop}} \\ \mathrm{ad} - \mu_{\mathrm{ad}} \end{pmatrix}$ rather than the covariance matrix of $X$. This makes no difference, because the two are the same; see Hogg, IMS, p.141, (2.6.13) and p.143, thm.2.6.3, (2.6.15): $\mathrm{Cov}(X - \mu) = E((X-\mu)(X-\mu)^T) = E((X-\mu)(X^T-\mu^T)) = E(XX^T - \mu X^T - X\mu^T + \mu\mu^T) = E(XX^T) - \mu E(X^T) - E(X)\mu^T + \mu\mu^T = E(XX^T) - \mu\mu^T = \mathrm{Cov}(X)$. -
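A numerical illustration (my own made-up data, not the Advertising data): the variance of the first principal component score equals the largest eigenvalue of the covariance matrix.

```python
import numpy as np

rng = np.random.default_rng(5)
pop = rng.normal(10, 2, size=300)
ad = 0.8 * pop + rng.normal(0, 1, size=300)
X = np.column_stack([pop, ad])

# Sample covariance matrix; note Cov(X) and Cov(X - mu) are the same matrix.
S = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(S)
phi1 = eigvecs[:, -1]          # loading vector of the first principal component

# Variance of phi_11 (pop - mean(pop)) + phi_21 (ad - mean(ad)).
scores = (X - X.mean(axis=0)) @ phi1
print(np.var(scores, ddof=1), eigvals[-1])   # both equal the largest eigenvalue
```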
p.232, fig.6.15, left panel.
Proof: For how the line in the figure is obtained, see Casella, p.581, subsec.12.2.2, or Jolliffe's Principal Component Analysis, p.34, prop. G3. -
p.267, line -5
What is the variance of the fit, i.e. $\mathrm{Var}(\hat{f}(x_0))$?
Proof: See Montgomery's Introduction to Linear Regression Analysis, ch.3. -
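For a least squares fit this is the standard result (my note, not spelled out in ISL): writing the fit at $x_0$ as $\hat{f}(x_0) = x_0^T\hat{\beta}$, with $x_0$ including a leading 1 for the intercept and $\mathrm{Var}(\epsilon) = \sigma^2$,
$$
\mathrm{Var}(\hat{f}(x_0)) = x_0^T\,\mathrm{Var}(\hat{\beta})\,x_0 = \sigma^2\, x_0^T (X^T X)^{-1} x_0.
$$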
p.278, subsec.7.5.2
Choosing the Smoothing Parameter λ
Proof: See Wang's Smoothing Splines Methods and Applications, ch.3. -
Reading notes on Hastie and Tibshirani's An Introduction to Statistical Learning