Why is the standard deviation defined this way?

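The standard deviation we learn is defined as \(\sqrt{\frac{\sum_{i=1}^{n}(x_i-\mu)^2}{n}}\), and the textbook says this value characterizes how spread out the data are. But if the goal is to characterize spread, isn't \(\frac{\sum_{i=1}^{n}|x_i-\mu|}{n}\) more intuitive? I puzzled over this for a long time; the answer below is the one I currently find most acceptable.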

In fact, each of the two formulas has its own name.

Standard Deviation: \(\sqrt{\frac{\sum_{i=1}^{n}(x_i-\mu)^2}{n}}\)
Mean Absolute Deviation (or mean deviation): \(\frac{\sum_{i=1}^{n}|x_i-\mu|}{n}\)

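The reason the former is designated the "standard" deviation is that, under several commonly used distributions, the formula for the Standard Deviation is neat, while the formula for the Mean Absolute Deviation is more complicated. The tables below list them side by side.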

| Discrete | P.M.F. | Explanation | Standard Deviation | Mean Absolute Deviation |
| --- | --- | --- | --- | --- |
| Uniform | \(f(x)=\frac{1}{m}\), \(x=1, 2, \ldots, m\) | Probability of drawing ball number \(x\) from \(m\) balls | \(\sqrt{\frac{m^2-1}{12}}\) | \(\left\{\begin{array}{ll}\frac{m}{4}&\text{for }m\text{ even}\\\frac{(m-1)(m+1)}{4m}&\text{for }m\text{ odd}\\\end{array}\right.\) |
| Bernoulli | \(f(x)=p^x(1-p)^{1-x}\), \(x=0, 1\) | Probability of \(x\) heads in \(1\) coin toss | \(\sqrt{p(1-p)}\) | \(2p(1-p)\) |
| Binomial | \(f(x)={n\choose x}p^x(1-p)^{n-x}\), \(x=0, 1, 2, \ldots, n\) | Probability of \(x\) heads in \(n\) coin tosses | \(\sqrt{np(1-p)}\) | \(2(1-p)^{n-\lfloor np\rfloor}p^{\lfloor np\rfloor+1}(\lfloor np\rfloor+1){n\choose \lfloor np\rfloor+1}\) |
| Geometric | \(f(x)=(1-p)^{x-1}p\), \(x=1, 2, 3, \ldots\) | Probability that the first success comes on trial \(x\), after \(x-1\) failures | \(\sqrt{\frac{1-p}{p^2}}\) | \(2(1-p)^{\lfloor 1/p \rfloor}\lfloor \frac{1}{p} \rfloor\) |
| Poisson | \(f(x)=\frac{\lambda^x e^{-\lambda}}{x!}\), \(x=0, 1, 2, \ldots\) | Probability of \(x\) incoming phone calls during a time interval of length \(L\); \(\lambda\) is the average number of calls during time \(L\), \(\lambda>0\) | \(\sqrt{\lambda}\) | \(\frac{2e^{-\lambda}\lambda^{\lfloor \lambda \rfloor +1}}{\lfloor \lambda \rfloor!}\) |
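The closed forms in the Mean Absolute Deviation column are easy to mistype, so it may help to compare two of them against a direct summation of \(\sum_x |x-\mu| f(x)\). The following is a minimal sketch, not part of the cited references: the function names, the parameter values (\(n=10\), \(p=0.25\), \(\lambda=2.5\)) and the truncation of the Poisson support at 100 terms are all assumptions chosen for illustration.

```python
from math import comb, exp, factorial, floor


def mad_from_pmf(pmf, support):
    """Mean absolute deviation E|X - mu|, summed directly over the support."""
    mu = sum(x * pmf(x) for x in support)
    return sum(abs(x - mu) * pmf(x) for x in support)


def binomial_mad_closed_form(n, p):
    """Closed form from the Binomial row of the table."""
    k = floor(n * p)
    return 2 * (1 - p) ** (n - k) * p ** (k + 1) * (k + 1) * comb(n, k + 1)


def poisson_mad_closed_form(lam):
    """Closed form from the Poisson row of the table."""
    k = floor(lam)
    return 2 * exp(-lam) * lam ** (k + 1) / factorial(k)


# Illustrative parameters (assumptions, not from the original post).
n, p = 10, 0.25
binomial_direct = mad_from_pmf(
    lambda x: comb(n, x) * p**x * (1 - p) ** (n - x), range(n + 1)
)

lam = 2.5
# The Poisson support is infinite; 100 terms leaves a negligible tail here.
poisson_direct = mad_from_pmf(
    lambda x: lam**x * exp(-lam) / factorial(x), range(100)
)

print(binomial_direct, binomial_mad_closed_form(n, p))
print(poisson_direct, poisson_mad_closed_form(lam))
```

For these parameters the direct sum and the closed form should agree up to floating-point error in each pair.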

| Continuous | P.D.F. | Explanation | Standard Deviation | Mean Absolute Deviation |
| --- | --- | --- | --- | --- |
| Uniform | \(f(x)=\frac{1}{b-a}\), \(a\leq x\leq b\) | Density for picking a point in the interval \([a, b]\). Intuitively the probability of picking any single point should be \(0\); the density is actually obtained by differentiating the C.D.F. | \(\frac{b-a}{\sqrt{12}}\) | \(\frac{1}{4}(b-a)\) |
| Exponential | \(f(x)=\frac{1}{\theta}e^{-x/\theta}\), \(0\leq x<\infty\) | Density of the waiting time \(x\) until the \(1\)st call. \(\lambda\) is the average number of calls per unit time; note that this differs from the meaning of \(\lambda\) in the Poisson row. \(\theta=\frac{1}{\lambda}\) | \(\theta\) | \(\frac{2\theta}{e}\) |
| Gamma | \(f(x)=\frac{1}{\Gamma(\alpha)\theta^{\alpha}}x^{\alpha-1}e^{-x/\theta}\), \(0<x<\infty\) | Density of the waiting time \(x\) until the \(\alpha\)-th call. \(\lambda\) is the average number of calls per unit time. \(\theta=\frac{1}{\lambda}\) | \(\sqrt{\alpha}\theta\) | ??? |
| Chi-Square | \(f(x)=\frac{1}{\Gamma(r/2)2^{r/2}}x^{r/2-1}e^{-x/2}\), \(0<x<\infty\) | Gamma distribution with \(\theta=2, \alpha=\frac{r}{2}\); \(r=1, 2, \ldots\) | \(\sqrt{2r}\) | ??? |
| Normal | \(f(x)=\frac{1}{\sigma \sqrt{2\pi}}e^{-(x-\mu)^2/(2\sigma^2)}\), \(-\infty<x<\infty\) | | \(\sigma\) | \(\sqrt{\frac{2}{\pi}}\sigma\) |
| Beta | \(f(x)=\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}x^{\alpha-1}(1-x)^{\beta-1}\), \(0<x<1\) | | \(\sqrt{\frac{\alpha\beta}{(\alpha+\beta+1)(\alpha+\beta)^2}}\) | ??? |
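As a rough check on one row, the exponential entry \(\frac{2\theta}{e}\) can be derived by hand. This is a sketch of my own rather than something taken from the cited references. It relies on \(E(X-\theta)=0\), so the deviations below the mean carry the same total weight as those above it, and the substitution \(u=x/\theta\) is introduced here only for convenience.

\[
E|X-\theta|
= 2\int_0^{\theta}(\theta-x)\,\frac{1}{\theta}e^{-x/\theta}\,dx
= 2\theta\int_0^{1}(1-u)e^{-u}\,du
= 2\theta\left(\Big[-(1-u)e^{-u}\Big]_0^{1}-\int_0^{1}e^{-u}\,du\right)
= 2\theta\left(1-\big(1-e^{-1}\big)\right)
= \frac{2\theta}{e}
\]

So for the exponential distribution the mean absolute deviation is about \(0.736\) times the standard deviation \(\theta\).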

The Mean Absolute Deviation formulas are taken from Wolfram MathWorld. The other formulas mainly follow Hogg and Tanis, Probability and Statistical Inference.

Let's look at a couple of simple examples.

| Quantity | Data: \(x_1, x_2, \ldots, x_n\) | Example 1: \(-3, 0, 3\) | Example 2: \(0, 0, 0\) |
| --- | --- | --- | --- |
| Arithmetic mean \(\mu\) | \(\frac{x_1+x_2+\cdots+x_n}{n}\) | \(\frac{-3+0+3}{3}=0\) | \(\frac{0+0+0}{3}=0\) |
| Mean absolute deviation (or mean deviation) | \(\frac{\lvert x_1-\mu\rvert+\lvert x_2-\mu\rvert+\cdots+\lvert x_n-\mu\rvert}{n}\) | \(\frac{\lvert -3-0\rvert+\lvert 0-0\rvert+\lvert 3-0\rvert}{3}=2\) | \(\frac{\lvert 0-0\rvert+\lvert 0-0\rvert+\lvert 0-0\rvert}{3}=0\) |
| Standard deviation \(\sigma\) | \(\sqrt{\frac{(x_1-\mu)^2+(x_2-\mu)^2+\cdots+(x_n-\mu)^2}{n}}\) | \(\sqrt{\frac{(-3-0)^2+(0-0)^2+(3-0)^2}{3}}=\sqrt{6}\approx 2.45\) | \(\sqrt{\frac{(0-0)^2+(0-0)^2+(0-0)^2}{3}}=0\) |
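The two worked examples can be reproduced with a few lines of Python. This is a minimal sketch: the function names are my own, and both functions divide by \(n\) (the population form used throughout this post), not by \(n-1\).

```python
from math import sqrt


def mean(xs):
    """Arithmetic mean of a list of numbers."""
    return sum(xs) / len(xs)


def mean_absolute_deviation(xs):
    """Average absolute distance from the mean (divides by n)."""
    mu = mean(xs)
    return sum(abs(x - mu) for x in xs) / len(xs)


def standard_deviation(xs):
    """Population standard deviation (divides by n, not n - 1)."""
    mu = mean(xs)
    return sqrt(sum((x - mu) ** 2 for x in xs) / len(xs))


for data in ([-3, 0, 3], [0, 0, 0]):
    print(data, mean(data), mean_absolute_deviation(data), standard_deviation(data))
```

For \(-3, 0, 3\) the mean absolute deviation comes out as 2 and the standard deviation as \(\sqrt{6}\); for \(0, 0, 0\) both are 0.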

For why the standard deviation is used rather than the mean absolute deviation, see Kenney's Mathematics of Statistics, Part One:

The absolute value of a variable \(x'\), denoted by the symbol \(|x'|\), is not very tractable in mathematical operations. Therefore the mean deviation is not favored by mathematician since it is unwieldy in the more theoretical and mathematical discussions. Its chief use is in experimental work where occasional large and erratic deviations are liable to occur. In such cases the standard deviation would tend to emphasize these deviations.
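Kenney's remark that the standard deviation tends to emphasize occasional large deviations can be seen on a small made-up data set. The sketch below is only an illustration: the two data sets and the helper names are assumptions of mine, not taken from the book.

```python
from math import sqrt


def mean_absolute_deviation(xs):
    """Average absolute distance from the mean (divides by n)."""
    mu = sum(xs) / len(xs)
    return sum(abs(x - mu) for x in xs) / len(xs)


def standard_deviation(xs):
    """Population standard deviation (divides by n)."""
    mu = sum(xs) / len(xs)
    return sqrt(sum((x - mu) ** 2 for x in xs) / len(xs))


def sd_to_md_ratio(xs):
    """How much larger the standard deviation is than the mean deviation."""
    return standard_deviation(xs) / mean_absolute_deviation(xs)


# Hypothetical data: ten well-behaved readings, then the same set with
# two of them replaced by large erratic values.
clean = [-1, 1] * 5
erratic = [-1, 1] * 4 + [-10, 10]

print(sd_to_md_ratio(clean))
print(sd_to_md_ratio(erratic))
```

In the clean set every deviation has the same size, so the two measures coincide. Once the two erratic values are introduced, squaring gives them much more weight, and the standard deviation grows noticeably faster than the mean deviation.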

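Kenney's point in the passage quoted above is that the standard deviation has theoretical advantages. I used to worry that the standard deviation and the mean absolute deviation might differ so much that using the standard deviation would distort the picture (where the "true" picture means the mean absolute deviation). However, the following passage from the same book explains that in most cases the two differ only by a constant factor, so the standard deviation also faithfully reflects the spread (that is, the mean absolute deviation).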

For a common type of distribution, the standard deviation is approximately twenty-five percent greater than the mean deviation. Speaking more accurately, this is true of a normal distribution (to be considered in Chapter VI) for which the relation is \(\text{MD}=\frac{4}{5}\sigma\) (approximately).
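The "four fifths" in the quotation matches the Normal row of the table above, where the mean deviation is \(\sqrt{\frac{2}{\pi}}\sigma\). The following sketch compares that exact factor with a simulated one. The sample size, the seed, and the variable names are arbitrary choices of mine.

```python
import random
from math import pi, sqrt

# Exact ratio MD / sigma for a normal distribution, from the table.
exact_ratio = sqrt(2 / pi)

# Simulated ratio from standard normal samples (seed fixed for repeatability).
rng = random.Random(0)
samples = [rng.gauss(0.0, 1.0) for _ in range(100_000)]
mu = sum(samples) / len(samples)
md = sum(abs(x - mu) for x in samples) / len(samples)
sd = sqrt(sum((x - mu) ** 2 for x in samples) / len(samples))
simulated_ratio = md / sd

print(exact_ratio, simulated_ratio)
print(1 / exact_ratio)  # how much larger sigma is than MD
```

The exact factor is about 0.798, which rounds to \(\frac{4}{5}\), and its reciprocal is about 1.253, which is the "approximately twenty-five percent greater" in the quotation.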

More reasons
