A Proof of Pearson's Chi-Square Goodness-of-Fit Test
- [Hogg, PSI, p.416] To generalize, we let an experiment have \(k\) (instead of only two) mutually exclusive and exhaustive outcomes, say, \(A_1, A_2, ..., A_k\). Let \(p_i=P(A_i)\), and thus \(\sum_{i=1}^{k}p_i=1\). The experiment is repeated \(n\) independent times, and we let \(Y_i\) represent the number of times the experiment results in \(A_i\), \(i=1, 2, ..., k\). This joint distribution of \(Y_1, Y_2, ..., Y_{k-1}\) is a straightforward generalization of the binomial distribution, as follows.
- [Hogg, PSI, p.416] In considering the joint pmf, we see that \[ f(y_1, y_2, ..., y_{k-1})=P(Y_1=y_1, Y_2=y_2, ..., Y_{k-1}=y_{k-1}), \] where \(y_1, y_2, ..., y_{k-1}\) are nonnegative integers such that \(y_1+y_2+\cdots+y_{k-1}\leq n\). Note that we do not need to consider \(Y_k\), since, once the other \(k-1\) random variables are observed to equal \(y_1, y_2, ..., y_{k-1}\), respectively, we know that \[ Y_k=n-y_1-y_2-\cdots-y_{k-1}=y_k, \text{ say}. \] From the independence of the trials, the probability of each particular arrangement of \(y_1\) \(A_1\)s, \(y_2\) \(A_2\)s, ..., \(y_k\) \(A_k\)s is \[ p_1^{y_1}p_2^{y_2}\cdots p_k^{y_k}. \] The number of such arrangements is the multinomial coefficient \[ {n\choose y_1, y_2, ..., y_k}=\frac{n!}{y_1! y_2!\cdots y_k!}. \] Hence, the product of these two expressions gives the joint pmf of \(Y_1, Y_2, ..., Y_{k-1}\): \[ f(y_1, y_2, ..., y_{k-1})=\frac{n!}{y_1! y_2!\cdots y_k!}p_1^{y_1}p_2^{y_2}\cdots p_k^{y_k}. \] (Recall that \(y_k=n-y_1-y_2-\cdots-y_{k-1}\).)
- [Hogg, PSI, p.416] Pearson then constructed an expression similar to \(Q_1\) (Equation 9.1-1), which involves \(Y_1\) and \(Y_2=n-Y_1\), that we denote by \(Q_{k-1}\), which involves \(Y_1, Y_2, ..., Y_{k-1}\), and \(Y_k=n-Y_1-Y_2-\cdots -Y_{k-1}\), namely, \[ Q_{k-1}=\sum_{i=1}^{k}\frac{(Y_i-np_i)^2}{np_i}. \] He argued that \(Q_{k-1}\) has an approximate chi-square distribution with \(k-1\) degrees of freedom in much the same way we argued that \(Q_1\) is approximately \(\chi^2(1)\). We accept Pearson's conclusion, as the proof is beyond the level of this text.
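As a concrete illustration of the statistic \(Q_{k-1}\), here is a minimal Python sketch. The function name `pearson_q` and the sample counts are hypothetical, chosen only for illustration; `Fraction` is used so the arithmetic is exact.

```python
from fractions import Fraction

def pearson_q(counts, probs):
    """Return Q_{k-1} = sum_i (y_i - n p_i)^2 / (n p_i) for observed counts y_i."""
    if len(counts) != len(probs):
        raise ValueError("counts and probs must have the same length")
    n = sum(counts)  # total number of trials
    return sum((y - n * p) ** 2 / (n * p) for y, p in zip(counts, probs))

# Hypothetical data: n = 60 trials, k = 3 equally likely outcomes.
counts = [18, 22, 20]
probs = [Fraction(1, 3)] * 3
q = pearson_q(counts, probs)
print(q)  # 2/5, since each expected count is 20 and (4 + 4 + 0) / 20 = 2/5
```

With \(k=3\), this value would be compared against a \(\chi^2(2)\) distribution.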
Part I
- We need some preliminaries.
- [Hogg, IMS, p.140] In Section 2.5 we discussed the covariance between two random variables. In this section we want to extend this discussion to the \(n\)-variate case. Let \(\mathbf{X}=(X_1, ..., X_n)'\) be an \(n\)-dimensional random vector. Recall that we defined \(\text{E}(\mathbf{X})=(\text{E}(X_1), ..., \text{E}(X_n))'\), that is, the expectation of a random vector is just the vector of the expectations of its components.
- [Hogg, IMS, p.140] Now suppose \(\mathbf{W}\) is an \(m\times n\) matrix of random variables, say, \(\mathbf{W}=[W_{ij}]\) for the random variables \(W_{ij}\), \(1\leq i\leq m\) and \(1\leq j\leq n\). Note that we can always string out the matrix into an \(mn\times 1\) random vector. Hence, we define the expectation of a random matrix \[ \text{E}[\mathbf{W}]=[\text{E}(W_{ij})]. \tag{2.6.10} \] The linearity of the expectation operator follows easily from this definition.
- [Hogg, IMS, p.141] Let \(\mathbf{X}=(X_1, ..., X_n)'\) be an \(n\)-dimensional random vector, such that \(\sigma_i^2=\text{Var}(X_i)\lt \infty\). The mean of \(\mathbf{X}\) is \(\mathbf{\mu}=\text{E}[\mathbf{X}]\) and we define its variance-covariance matrix as \[ \text{Cov}(\mathbf{X})=\text{E}[(\mathbf{X}-\mathbf{\mu})(\mathbf{X}-\mathbf{\mu})']=[\sigma_{ij}], \tag{2.6.13} \] where \(\sigma_{ii}\) denotes \(\sigma_i^2\). As Exercise 2.6.8 shows, the \(i\)th diagonal entry of \(\text{Cov}(\mathbf{X})\) is \(\sigma_i^2=\text{Var}(X_i)\) and the \((i, j)\)th off diagonal entry is \(\text{Cov}(X_i, X_j)\).
- [Hogg, IMS, p.350] As another simple application, consider the multivariate analog of the sample mean and sample variance. Let \(\{\mathbf{X}_n\}\) be a sequence of iid random vectors with common mean vector \(\mathbf{\mu}\) and variance-covariance matrix \(\Sigma\). Denote the vector of means by \[ \overline{\mathbf{X}}_n=\frac{1}{n}\sum_{i=1}^{n}\mathbf{X}_i. \tag{5.4.5} \] Of course, \(\overline{\mathbf{X}}_n\) is just the vector of sample means, \((\overline{X}_1, ..., \overline{X}_p)'\). By the Weak Law of Large Numbers, Theorem 5.1.1, \(\overline{X}_j\to \mu_j\), in probability, for each \(j\). Hence, by Theorem 5.4.1, \(\overline{\mathbf{X}}_n\to \mathbf{\mu}\), in probability.
- The following diagram may help: \[ \begin{array}{cccc|ccc} \mathbf{X}_1 & \mathbf{X}_2 & \cdots & \mathbf{X}_n & \overline{\mathbf{X}}_n & \to & \mathbf{\mu} \\ || & || & \cdots & || & || & & || \\ \begin{bmatrix}X_{11}\\X_{21}\\\vdots\\X_{p1}\end{bmatrix} & \begin{bmatrix}X_{12}\\X_{22}\\\vdots\\X_{p2}\end{bmatrix} & \cdots & \begin{bmatrix}X_{1n}\\X_{2n}\\\vdots\\X_{pn}\end{bmatrix} & \begin{bmatrix}\overline{X}_1\\\overline{X}_2\\\vdots\\\overline{X}_p\end{bmatrix} & \begin{matrix}\to\\\to\\\vdots\\\to\end{matrix} & \begin{bmatrix}\mu_1\\\mu_2\\\vdots\\\mu_p\end{bmatrix} \end{array} \] Here \(\overline{X}_k\) is defined as \(\overline{X}_k=\frac{\sum_{l=1}^{n}X_{kl}}{n}\); I find that writing it as \(\overline{X}_{k\Box}\) makes this clearer at a glance. The phrase "common mean vector \(\mathbf{\mu}\) and variance-covariance matrix \(\Sigma\)" means that \[ \text{E}(\mathbf{X}_1)=\text{E}(\mathbf{X}_2)=\cdots =\mathbf{\mu}, \] that is, \[ \text{E}(X_{i1})=\text{E}(X_{i2})=\cdots=\mu_i \text{ for }i=1, 2, ..., p, \] and \[ \text{Cov}(\mathbf{X}_1)=\text{Cov}(\mathbf{X}_2)=\cdots =\Sigma. \]
Part II
- Let \(\mathbf{X}_1, \mathbf{X}_2, ...\) be a sequence of independent and identically distributed \(k\)-variate random vectors. For each \(i\), \(\mathbf{X}_i\) has a multinomial distribution \(\text{multinomial}(1, \mathbf{p})\), where \[ \mathbf{p}=\begin{bmatrix} p_1\\ p_2\\ \vdots\\ p_k \end{bmatrix} \]
- Each \(X_{ij}\) is either \(0\) or \(1\). As the following diagram shows, each column has exactly one \(1\) and \(0\)s elsewhere. \[ \begin{array}{ccccc} \mathbf{X}_1 & \mathbf{X}_2 & \cdots & \mathbf{X}_n & \\ || & || & \cdots & || & \\ \begin{bmatrix}X_{11}\\X_{21}\\\vdots\\X_{k1}\end{bmatrix} & \begin{bmatrix}X_{12}\\X_{22}\\\vdots\\X_{k2}\end{bmatrix} & \cdots & \begin{bmatrix}X_{1n}\\X_{2n}\\\vdots\\X_{kn}\end{bmatrix} & \begin{matrix} \sum_{j=1}^{n}X_{1j}=n\overline{X}_1=n\overline{X}_{1\Box}=Y_1\\\sum_{j=1}^{n}X_{2j}=n\overline{X}_2=n\overline{X}_{2\Box}=Y_2\\\vdots\\\sum_{j=1}^{n}X_{kj}=n\overline{X}_k=n\overline{X}_{k\Box}=Y_k\end{matrix} \end{array} \] Recall that \(Y_i\) represents the number of times the experiment results in \(A_i\), \(i=1, 2, ..., k\).
- By [Casella, p.182, line 5], \(X_{ij}\sim \text{B}(1, p_i)\). So \[ \text{E}(X_{ij})=p_i\text{ and }\text{Var}(X_{ij})=p_i(1-p_i). \] By [Casella, p.182, line -10], \[ \text{Cov}(X_{kj}, X_{lj})=-p_k p_l \text{ for } k\neq l. \] Thus, \[ \text{E}(\mathbf{X}_1)=\text{E}(\mathbf{X}_2)=\cdots=\mathbf{p} \] and the common variance-covariance matrix of \(\mathbf{X}_1, \mathbf{X}_2, ...\) is \[ \text{Cov}(\mathbf{X}_1)=\text{Cov}(\mathbf{X}_2)=\cdots=\Sigma= \begin{bmatrix} p_1(1-p_1) & -p_1 p_2 & \cdots & -p_1 p_k \\ -p_2 p_1 & p_2(1-p_2) & \cdots & -p_2 p_k \\ \vdots & \vdots & \ddots & \vdots \\ -p_k p_1 & -p_k p_2 & \cdots & p_k(1-p_k) \end{bmatrix} \] Note that the entries in each row and in each column of \(\Sigma\) sum to \(0\), because \(p_1+p_2+\cdots+p_k=1\). Hence, \(\det{\Sigma}=0\) and \(\Sigma\) is not invertible.
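The zero row and column sums of \(\Sigma\) can be spot-checked numerically. The sketch below builds \(\Sigma=\text{diag}(\mathbf{p})-\mathbf{p}\mathbf{p}^T\) in exact arithmetic; the helper name `multinomial_cov` and the probability vector are assumptions made for illustration only.

```python
from fractions import Fraction

def multinomial_cov(probs):
    """Covariance matrix of multinomial(1, p): diag(p) - p p^T."""
    k = len(probs)
    return [[(probs[i] if i == j else 0) - probs[i] * probs[j] for j in range(k)]
            for i in range(k)]

# Hypothetical probability vector with k = 3 (entries sum to 1).
probs = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
sigma = multinomial_cov(probs)

# Row i sums to p_i - p_i * (p_1 + ... + p_k) = 0; columns follow by symmetry.
row_sums = [sum(row) for row in sigma]
col_sums = [sum(sigma[i][j] for i in range(len(probs))) for j in range(len(probs))]
print(all(s == 0 for s in row_sums), all(s == 0 for s in col_sums))
```

Zero row sums mean that \(\Sigma\mathbf{1}=\mathbf{0}\), so the all-ones vector lies in the null space of \(\Sigma\), which is another way to see that \(\Sigma\) is singular.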
Part III
- For each \(j=1, 2, ...\), consider \[ \mathbf{Z}_j= \begin{bmatrix} X_{1j}\\ X_{2j}\\ \vdots\\ X_{k-1, j} \end{bmatrix} \] Check that \[\text{E}(\mathbf{Z}_1)=\text{E}(\mathbf{Z}_2)=\cdots =\mathbf{p}^*= \begin{bmatrix} p_1\\ p_2\\ \vdots\\ p_{k-1} \end{bmatrix} \tag{*} \] and the variance-covariance matrix of \(\mathbf{Z}_1, \mathbf{Z}_2, ...\) is \[ \text{Cov}(\mathbf{Z}_1)=\text{Cov}(\mathbf{Z}_2)=\cdots =\Sigma^*= \begin{bmatrix} p_1(1-p_1) & -p_1 p_2 & \cdots & -p_1 p_{k-1} \\ -p_2 p_1 & p_2(1-p_2) & \cdots & -p_2 p_{k-1} \\ \vdots & \vdots & \ddots & \vdots \\ -p_{k-1} p_1 & -p_{k-1} p_2 & \cdots & p_{k-1}(1-p_{k-1}) \end{bmatrix} \tag{**} \] That is, \(\Sigma^*\) is the upper-left \((k-1)\times (k-1)\) submatrix of \(\Sigma\).
- Note that \[ \begin{array}{lll} \Sigma^* &=& \begin{bmatrix} p_1(1-p_1) & -p_1 p_2 & \cdots & -p_1 p_{k-1} \\ -p_2 p_1 & p_2(1-p_2) & \cdots & -p_2 p_{k-1} \\ \vdots & \vdots & \ddots & \vdots \\ -p_{k-1} p_1 & -p_{k-1} p_2 & \cdots & p_{k-1}(1-p_{k-1}) \end{bmatrix} \\ &=& \begin{bmatrix} p_1 \\ & p_2 \\ & & \ddots \\ & & & p_{k-1} \end{bmatrix} - \begin{bmatrix} p_1\\ p_2\\ \vdots\\ p_{k-1} \end{bmatrix} \begin{bmatrix} p_1 & p_2 & \cdots & p_{k-1} \end{bmatrix}\\ &=& \text{diag}(p_1, p_2, ..., p_{k-1}) -\mathbf{p^*}(\mathbf{p}^*)^T \end{array}. \] Now, \(\Sigma^*\) is invertible and its inverse is \[ \begin{array}{lll} (\Sigma^*)^{-1} &=& \begin{bmatrix} \frac{1}{p_1}+\frac{1}{p_k} & \frac{1}{p_k} & \cdots & \frac{1}{p_k} \\ \frac{1}{p_k} & \frac{1}{p_2}+\frac{1}{p_k} & \cdots & \frac{1}{p_k} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{1}{p_k} & \frac{1}{p_k} & \cdots & \frac{1}{p_{k-1}}+\frac{1}{p_k} \end{bmatrix} \\ &=& \begin{bmatrix} \frac{1}{p_1} \\ & \frac{1}{p_2} \\ & & \ddots \\ & & & \frac{1}{p_{k-1}} \end{bmatrix} +\frac{1}{p_k} \begin{bmatrix} 1&1&\cdots&1\\ 1&1&\cdots&1\\ \vdots&\vdots&\ddots&\vdots\\ 1&1&\cdots&1 \end{bmatrix}\\ &=& \text{diag}(\frac{1}{p_1}, \frac{1}{p_2}, ..., \frac{1}{p_{k-1}})+\frac{1}{p_k}\mathbf{1}\mathbf{1}^T, \end{array} \tag{***} \] where \[ \mathbf{1}= \begin{bmatrix} 1\\ 1\\ \vdots\\ 1 \end{bmatrix}. \] It is easy to check that it is indeed the inverse by using that \[ \begin{array}{rcl} (\mathbf{p}^*)^T\text{diag}(\frac{1}{p_1}, \frac{1}{p_2}, ..., \frac{1}{p_{k-1}}) &=& \mathbf{1}^T, \\\text{diag}(p_1, p_2, ..., p_{k-1})\mathbf{1}&=& \mathbf{p}^*, \\ (\mathbf{p}^*)^T \mathbf{1} &=& p_1+p_2+\cdots+p_{k-1}=1-p_k. \end{array} \]
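The claimed inverse (***) can be checked in exact arithmetic for a particular probability vector. This is a minimal sketch, not a proof; the helper names `sigma_star`, `sigma_star_inverse` and `matmul`, and the vector `probs`, are hypothetical.

```python
from fractions import Fraction

def sigma_star(probs):
    """Upper-left (k-1)x(k-1) block of Sigma: diag(p*) - p* p*^T."""
    m = len(probs) - 1
    return [[(probs[i] if i == j else 0) - probs[i] * probs[j] for j in range(m)]
            for i in range(m)]

def sigma_star_inverse(probs):
    """Claimed inverse (***): diag(1/p_1, ..., 1/p_{k-1}) + (1/p_k) 1 1^T."""
    m = len(probs) - 1
    pk = probs[-1]
    return [[(1 / probs[i] if i == j else 0) + 1 / pk for j in range(m)]
            for i in range(m)]

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

# Hypothetical probabilities with k = 4, all strictly positive.
probs = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)]
product = matmul(sigma_star(probs), sigma_star_inverse(probs))
identity = [[Fraction(int(i == j)) for j in range(3)] for i in range(3)]
print(product == identity)  # True
```

Note that the formula needs every \(p_i\gt 0\), in particular \(p_k\gt 0\); otherwise the entries \(\frac{1}{p_i}\) are undefined.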
Part IV
- [Hogg, IMS, p.351, def.5.4.2] Let \(\{\mathbf{X}_n\}\) be a sequence of random vectors with \(\mathbf{X}_n\) having distribution function \(F_n(\mathbf{x})\) and \(\mathbf{X}\) be a random vector with distribution function \(F(\mathbf{x})\). Then \(\{\mathbf{X}_n\}\) converges in distribution to \(\mathbf{X}\) if \[ \lim_{n\to \infty} F_n(\mathbf{x})=F(\mathbf{x}), \tag{5.4.8} \] for all points \(\mathbf{x}\) at which \(F(\mathbf{x})\) is continuous. We write \(\mathbf{X}_n\stackrel{D}{\to}\mathbf{X}\).
- [Hogg, IMS, p.351, thm.5.4.4] (Multivariate Central Limit Theorem). Let \(\{\mathbf{X}_n\}\) be a sequence of iid random vectors with common mean vector \(\mathbf{\mu}\) and variance-covariance matrix \(\Sigma\) which is positive definite. Assume that the common moment generating function \(M(\mathbf{t})\) exists in an open neighborhood of \(\mathbf{0}\). Let \[ \mathbf{Y}_n=\frac{1}{\sqrt{n}}\sum_{i=1}^{n}(\mathbf{X}_i-\mathbf{\mu})=\sqrt{n}(\overline{\mathbf{X}}-\mathbf{\mu}). \] Then \(\mathbf{Y}_n\) converges in distribution to a \(\text{N}_p(\mathbf{0}, \Sigma)\) distribution.
- Consider that \[ \begin{array}{rcl} \overline{\mathbf{Z}}_n &=& \frac{\mathbf{Z}_1+\mathbf{Z}_2+\cdots+\mathbf{Z}_n}{n} \\ &=& \begin{bmatrix} \frac{X_{11}+X_{12}+\cdots+X_{1n}}{n}\\ \frac{X_{21}+X_{22}+\cdots+X_{2n}}{n}\\ \vdots\\ \frac{X_{k-1, 1}+X_{k-1, 2}+\cdots+X_{k-1, n}}{n} \end{bmatrix} \\ &=& \begin{bmatrix} \overline{X}_{1\Box}\\ \overline{X}_{2\Box}\\ \vdots\\ \overline{X}_{k-1,\Box} \end{bmatrix} \end{array} \] By (*), (**) and the Multivariate Central Limit Theorem, \[ \sqrt{n}(\overline{\mathbf{Z}}_n-\mathbf{p}^*)\stackrel{d}{\to}\text{N}_{k-1}(\mathbf{0}, \Sigma^*). \tag{****} \]
- [Hogg, IMS, p.202, thm.3.5.1] Suppose \(\mathbf{X}\) has a \(\text{N}_n(\mathbf{\mu}, \Sigma)\) distribution, where \(\Sigma\) is positive definite. Then the random variable \(Y=(\mathbf{X}-\mathbf{\mu})'\Sigma^{-1}(\mathbf{X}-\mathbf{\mu})\) has a \(\chi^2(n)\) distribution.
- \((\mathbf{X}-\mathbf{\mu})'\) means the transpose of \(\mathbf{X}-\mathbf{\mu}\). That is, \[ (\mathbf{X}-\mathbf{\mu})'=(\mathbf{X}-\mathbf{\mu})^T \]
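- Let \(\mathbf{W}\sim \text{N}_{k-1}(\mathbf{0}, \Sigma^*)\). By [Hogg, IMS, p.202, thm.3.5.1] (with \(\mathbf{X}=\mathbf{W}\) and \(\mathbf{\mu}=\mathbf{0}\)), \(\mathbf{W}^T(\Sigma^*)^{-1}\mathbf{W}\) has a \(\chi^2(k-1)\) distribution. Since the map \(\mathbf{w}\mapsto \mathbf{w}^T(\Sigma^*)^{-1}\mathbf{w}\) is continuous, (****) and the continuous mapping theorem imply that \[ \sqrt{n}(\overline{\mathbf{Z}}_n-\mathbf{p}^*)^T(\Sigma^*)^{-1}\sqrt{n}(\overline{\mathbf{Z}}_n-\mathbf{p}^*) \] converges in distribution to \(\chi^2(k-1)\). That is, for large \(n\) it has an approximate \(\chi^2(k-1)\) distribution.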
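The limiting \(\chi^2(k-1)\) behaviour can be illustrated by simulation. The sketch below is a rough sanity check under assumed settings (the function name `simulate_q`, the seed, \(n=200\), and the probability vector are all hypothetical). It compares the sample mean and variance of Pearson's statistic with the \(\chi^2(k-1)\) values \(k-1\) and \(2(k-1)\).

```python
import random

def simulate_q(n, probs, rng):
    """Draw one multinomial(n, p) sample and return Pearson's statistic."""
    k = len(probs)
    counts = [0] * k
    # Each trial lands in category i with probability probs[i].
    for outcome in rng.choices(range(k), weights=probs, k=n):
        counts[outcome] += 1
    return sum((y - n * p) ** 2 / (n * p) for y, p in zip(counts, probs))

rng = random.Random(0)          # fixed seed so the run is reproducible
probs = [0.1, 0.2, 0.3, 0.4]    # hypothetical p with k = 4
samples = [simulate_q(200, probs, rng) for _ in range(2000)]

mean_q = sum(samples) / len(samples)
var_q = sum((q - mean_q) ** 2 for q in samples) / (len(samples) - 1)
# For chi-square(3) the mean is 3 and the variance is 6; the simulated
# values should land near these, up to Monte Carlo error.
print(round(mean_q, 2), round(var_q, 2))
```

Matching two moments does not establish the limit law, but a large discrepancy here would signal an error in the setup.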
Part V
- The final step is to show that \[ \sqrt{n}(\overline{\mathbf{Z}}_n-\mathbf{p}^*)^T(\Sigma^*)^{-1}\sqrt{n}(\overline{\mathbf{Z}}_n-\mathbf{p}^*) =\sum_{i=1}^{k}\frac{(Y_i-np_i)^2}{np_i} \] Note that \[ \overline{\mathbf{Z}}_n-\mathbf{p}^*= \begin{bmatrix} \overline{X}_{1\Box}-p_1\\ \overline{X}_{2\Box}-p_2\\ \vdots\\ \overline{X}_{k-1, \Box}-p_{k-1} \end{bmatrix} \] Therefore, \[ \begin{array}{cl} & \sqrt{n}(\overline{\mathbf{Z}}_n-\mathbf{p}^*)^T(\Sigma^*)^{-1}\sqrt{n}(\overline{\mathbf{Z}}_n-\mathbf{p}^*) \\ =& n(\overline{\mathbf{Z}}_n-\mathbf{p}^*)^T(\Sigma^*)^{-1}(\overline{\mathbf{Z}}_n-\mathbf{p}^*) \\ \stackrel{\text{(***)}}{=}& n(\overline{\mathbf{Z}}_n-\mathbf{p}^*)^T\left(\text{diag}(\frac{1}{p_1}, \frac{1}{p_2}, ..., \frac{1}{p_{k-1}})+\frac{1}{p_k}\mathbf{1}\mathbf{1}^T\right)(\overline{\mathbf{Z}}_n-\mathbf{p}^*) \\ =& n\left[(\overline{\mathbf{Z}}_n-\mathbf{p}^*)^T\text{diag}(\frac{1}{p_1}, \frac{1}{p_2}, ..., \frac{1}{p_{k-1}})(\overline{\mathbf{Z}}_n-\mathbf{p}^*) +\frac{1}{p_k}(\overline{\mathbf{Z}}_n-\mathbf{p}^*)^T\mathbf{1}\mathbf{1}^T(\overline{\mathbf{Z}}_n-\mathbf{p}^*)\right] \\ =& n\left[\sum_{i=1}^{k-1}\frac{(\overline{X}_{i\Box}-p_i)^2}{p_i} +\frac{1}{p_k}\sum_{i, j=1}^{k-1}(\overline{X}_{i\Box}-p_i)(\overline{X}_{j\Box}-p_j)\right] \\ =& n\left\{\sum_{i=1}^{k-1}\frac{(\overline{X}_{i\Box}-p_i)^2}{p_i} +\frac{1}{p_k}\left[\sum_{i=1}^{k-1}(\overline{X}_{i\Box}-p_i)\right]^2\right\} \\ =& \sum_{i=1}^{k-1}\frac{(n\overline{X}_{i\Box}-np_i)^2}{np_i}+\frac{\left[\sum_{i=1}^{k-1}(n\overline{X}_{i\Box}-np_i)\right]^2}{np_k} \\ \stackrel{\sum_{i=1}^{k}n\overline{X}_{i\Box}=\sum_{i=1}^{k}Y_i=n}{=}& \sum_{i=1}^{k-1}\frac{(n\overline{X}_{i\Box}-np_i)^2}{np_i}+\frac{\left[(n-n\overline{X}_{k\Box})-n(1-p_k)\right]^2}{np_k} \\ =& \sum_{i=1}^{k}\frac{(n\overline{X}_{i\Box}-np_i)^2}{np_i} \\ =& \sum_{i=1}^{k}\frac{(Y_i-np_i)^2}{np_i} \end{array} \]
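The identity between the quadratic form and Pearson's statistic can be confirmed exactly on a small example. In this sketch the function names `pearson_q` and `quadratic_form` and the data are hypothetical; the quadratic form is evaluated through the closed-form inverse \(\text{diag}(\frac{1}{p_1}, ..., \frac{1}{p_{k-1}})+\frac{1}{p_k}\mathbf{1}\mathbf{1}^T\).

```python
from fractions import Fraction

def pearson_q(counts, probs):
    """Pearson's statistic: sum_i (y_i - n p_i)^2 / (n p_i)."""
    n = sum(counts)
    return sum((y - n * p) ** 2 / (n * p) for y, p in zip(counts, probs))

def quadratic_form(counts, probs):
    """n (Zbar - p*)^T [diag(1/p_i) + (1/p_k) 1 1^T] (Zbar - p*), first k-1 coordinates."""
    n = sum(counts)
    m = len(probs) - 1
    d = [Fraction(counts[i], n) - probs[i] for i in range(m)]  # Zbar - p*
    diag_part = sum(d[i] ** 2 / probs[i] for i in range(m))    # d^T diag(1/p_i) d
    ones_part = sum(d) ** 2 / probs[-1]                        # (1/p_k) (1^T d)^2
    return n * (diag_part + ones_part)

# Hypothetical data: n = 20 trials, k = 4 categories.
counts = [7, 2, 1, 10]
probs = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)]
print(quadratic_form(counts, probs) == pearson_q(counts, probs))  # True
print(pearson_q(counts, probs))  # 261/10
```

Only the first \(k-1\) counts enter `quadratic_form` directly; the last category appears through \(p_k\) and the constraint \(Y_1+\cdots+Y_k=n\), mirroring the derivation.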
Another Incomplete Proof
The following is another, incomplete proof, mainly following the reference here. It appears not to need the multivariate normal distribution, but the last step, which must show that a sum of dependent normal random variables is still normal (the place marked with three question marks ???), unavoidably requires the multivariate normal distribution; see here.
- By [Casella, p.182, line 5], \(Y_i\sim \text{B}(n, p_i)\). Then by the Central Limit Theorem or the De Moivre-Laplace theorem, \(Y_i\) is approximately \(\text{N}(np_i, np_i(1-p_i))\) for large \(n\).
- Let \(X_i=\frac{Y_i-np_i}{\sqrt{np_i}}\). Then \[ X_i=\frac{Y_i-np_i}{\sqrt{np_i}}=\frac{Y_i-np_i}{\sqrt{np_i(1-p_i)}}\sqrt{1-p_i}\stackrel{\text{Central Limit Theorem}}{\sim}\sqrt{1-p_i}\text{N}(0, 1). \] By [Casella, p.184, cor.4.6.10], \(X_i\) is approximately \(\text{N}(0, 1-p_i)\). It follows that \[ \text{Var}(X_i)=1-p_i \tag{i} \]
- If \(i\neq j\), then \[ \begin{array}{rcl} \text{Cov}(X_i, X_j) &=& \text{Cov}(\frac{Y_i-np_i}{\sqrt{np_i}}, \frac{Y_j-np_j}{\sqrt{np_j}}) \\ &\stackrel{\text{Cov}(aX+b, cY+d)=ac\text{Cov}(X, Y)}{=}& \frac{1}{\sqrt{np_i}}\cdot \frac{1}{\sqrt{np_j}}\cdot\text{Cov}(Y_i, Y_j) \\ &\stackrel{\text{[Casella, p.182, line -10]}}{=}& \frac{1}{\sqrt{np_i}}\cdot \frac{1}{\sqrt{np_j}}\cdot (-np_i p_j) \\ &=& -\sqrt{p_i p_j} \end{array} \tag{ii} \]
- By (i) and (ii), \[ \text{Cov}(\mathbf{X})= \begin{bmatrix} 1-p_1 & -\sqrt{p_1 p_2} & \cdots & -\sqrt{p_1 p_k} \\ -\sqrt{p_2 p_1} & 1-p_2 & \cdots & -\sqrt{p_2 p_k} \\ \vdots & \vdots & \ddots & \vdots \\ -\sqrt{p_k p_1} & -\sqrt{p_k p_2} & \cdots & 1-p_k \end{bmatrix} =I-\mathbf{p}\mathbf{p}^T, \tag{iii} \] where \[ \mathbf{p}= \begin{bmatrix} \sqrt{p_1}\\ \sqrt{p_2}\\ \vdots\\ \sqrt{p_k} \end{bmatrix}. \]
- We find the eigenvalues of \(\text{Cov}(\mathbf{X})\). Note that \(\mathbf{p}^T\mathbf{p}=p_1+p_2+\cdots+p_k=1\). \[ \begin{array}{rcl} \det{(\text{Cov}(\mathbf{X})-\lambda I)} &=& \det{(I-\mathbf{p}\mathbf{p}^T-\lambda I)} \\ &=& \det{((1-\lambda)I-\mathbf{p}\mathbf{p}^T)} \\ &=& (1-\lambda)^k \det{\left(I-\frac{1}{1-\lambda}\mathbf{p}\mathbf{p}^T\right)} \\ &\stackrel{\text{Sylvester's Theorem}}{=}& (1-\lambda)^k\left(1-\frac{1}{1-\lambda}\mathbf{p}^T\mathbf{p}\right) \\ &=& -\lambda(1-\lambda)^{k-1}. \end{array} \] The eigenvalues of \(\text{Cov}(\mathbf{X})\) are therefore \(0\) with multiplicity \(1\) and \(1\) with multiplicity \(k-1\).
- Since \(\text{Cov}(\mathbf{X})\) is symmetric, \(\text{Cov}(\mathbf{X})\) is orthogonally diagonalizable. (See [Friedberg, p.384, thm.6.20].) That is, there exists an orthogonal matrix \(Q\) (\(QQ^T=Q^TQ=I\)) such that \[ Q\text{Cov}(\mathbf{X})Q^T= \left[ \begin{array}{c|c} I_{k-1} & O \\ \hline O & 0 \end{array} \right] \text{ and } Q\text{Cov}(\mathbf{X})= \left[ \begin{array}{c|c} I_{k-1} & O \\ \hline O & 0 \end{array} \right]Q \tag{iv} \]
- Set \[ \mathbf{Z}=Q\mathbf{X} \tag{v} \]
- Note that \[ \text{Cov}(\mathbf{Z}) =\text{Cov}(Q\mathbf{X}) \stackrel{\text{[Hogg, IMS, p.141, thm.2.6.3]}}{=}Q\text{Cov}(\mathbf{X})Q^T= \left[ \begin{array}{c|c} I_{k-1} & O \\ \hline O & 0 \end{array} \right] \tag{vi} \]
Now we prove the four main results.
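The eigenvalue structure of the matrix in (iii) can be cross-checked numerically by a different route than the characteristic polynomial: a symmetric matrix \(M\) with \(M^2=M\) has only the eigenvalues \(0\) and \(1\), and its trace counts the eigenvalue \(1\). The sketch below is a floating-point check under assumed inputs; the helper names `cov_x` and `matmul` and the vector `probs` are hypothetical.

```python
import math

def cov_x(probs):
    """Matrix (iii): I - q q^T, where q_i = sqrt(p_i)."""
    q = [math.sqrt(p) for p in probs]
    k = len(q)
    return [[(1.0 if i == j else 0.0) - q[i] * q[j] for j in range(k)]
            for i in range(k)]

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

probs = [0.1, 0.2, 0.3, 0.4]   # hypothetical p with k = 4
k = len(probs)
m = cov_x(probs)
m2 = matmul(m, m)

# Idempotent (M^2 = M) up to rounding error, so eigenvalues lie in {0, 1}.
idempotent = all(math.isclose(m2[i][j], m[i][j], abs_tol=1e-12)
                 for i in range(k) for j in range(k))
# The trace equals the number of eigenvalues equal to 1, namely k - 1.
trace = sum(m[i][i] for i in range(k))
print(idempotent, round(trace, 12))
```

Here the trace is \(k-(p_1+\cdots+p_k)=k-1\), matching the multiplicity of the eigenvalue \(1\).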
- For each \(i\), by (v), \(Z_i\) is a linear combination of normal random variables \(X_1, X_2, ..., X_k\). By ???, \(Z_i\) is also normal.
- Since \(\text{E}(X_i)=0\), by [Casella, p.57], \(\text{E}(Z_i)=0\). By (vi), \(\text{Var}(Z_i)=1\) for \(i=1, 2, ..., k-1\).
- By (vi), \(\text{Cov}(Z_i, Z_j)=0\) for \(i\neq j\). That is, \(Z_i\) and \(Z_j\) are uncorrelated. By [Roussas, Course, p.466, cor.2], \(Z_1, Z_2, ..., Z_k\) are independent.
- Now, \[ \begin{array}{cl} \text{check that}& \mathbf{p}^T\mathbf{X}=0 \text{ by their definitions}\\ \Rightarrow& \text{Cov}(\mathbf{X})\mathbf{X}\stackrel{\text{(iii)}}{=}(I-\mathbf{p}\mathbf{p}^T)\mathbf{X}=\mathbf{X}-\mathbf{p}\mathbf{p}^T\mathbf{X}=\mathbf{X} \\ \Rightarrow& \mathbf{Z}\stackrel{\text{(v)}}{=}Q\mathbf{X}=Q(\text{Cov}(\mathbf{X})\mathbf{X})=(Q\text{Cov}(\mathbf{X}))\mathbf{X}\stackrel{\text{(iv)}}{=}\left[ \begin{array}{c|c} I_{k-1} & O \\ \hline O & 0 \end{array} \right] Q\mathbf{X} \\ \Rightarrow & Z_k=0 \end{array} \] It follows that \[\sum_{i=1}^{k-1}Z_i^2=\sum_{i=1}^{k}Z_i^2 \] and \[ \sum_{i=1}^{k-1}Z_i^2 =\sum_{i=1}^{k}Z_i^2 =\mathbf{Z}^T\mathbf{Z} =(Q\mathbf{X})^T(Q\mathbf{X}) =\mathbf{X}^T Q^T Q \mathbf{X} =\mathbf{X}^T \mathbf{X} =\sum_{i=1}^{k}X_i^2. \] Since \(X_i=\frac{Y_i-np_i}{\sqrt{np_i}}\), the right-hand side equals \(\sum_{i=1}^{k}\frac{(Y_i-np_i)^2}{np_i}=Q_{k-1}\). Thus, once the step marked ??? is justified, \(Q_{k-1}\) is a sum of squares of \(k-1\) independent, approximately \(\text{N}(0, 1)\) random variables, and hence is approximately \(\chi^2(k-1)\).
Textbooks that give no proof
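Two facts used in the last step, \(\mathbf{p}^T\mathbf{X}=0\) and \(\sum_{i=1}^{k}X_i^2=\sum_{i=1}^{k}\frac{(Y_i-np_i)^2}{np_i}\), can be spot-checked on sample counts. This is a floating-point sketch; the function name `standardized` and the data are hypothetical, and `q` below denotes the vector with entries \(\sqrt{p_i}\) (written \(\mathbf{p}\) in (iii)).

```python
import math

def standardized(counts, probs):
    """X_i = (Y_i - n p_i) / sqrt(n p_i) for observed counts Y_i."""
    n = sum(counts)
    return [(y - n * p) / math.sqrt(n * p) for y, p in zip(counts, probs)]

# Hypothetical data: n = 20 trials, k = 4 categories.
counts = [7, 2, 1, 10]
probs = [0.5, 0.25, 0.125, 0.125]
n = sum(counts)

x = standardized(counts, probs)
q = [math.sqrt(p) for p in probs]

# q^T X = (1 / sqrt(n)) * sum_i (Y_i - n p_i) = 0, because the counts sum to n.
q_dot_x = sum(qi * xi for qi, xi in zip(q, x))
# X^T X is Pearson's statistic.
sum_sq = sum(xi ** 2 for xi in x)
pearson = sum((y - n * p) ** 2 / (n * p) for y, p in zip(counts, probs))
print(abs(q_dot_x) < 1e-12, math.isclose(sum_sq, pearson))
```

The constraint \(\mathbf{p}^T\mathbf{X}=0\) is what removes one degree of freedom, leaving \(k-1\).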
- [Hogg, PSI, p.416] We accept Pearson's conclusion, as the proof is beyond the level of this text.
- [Hogg, IMS, p.284] It is proved in a more advanced course that, as \(n\to \infty\), \(Q_{k-1}\) has an approximate \(\chi^2(k-1)\) distribution.
- [Mood, p.445] We will not prove the above theorem, but we will indicate its proof for \(k=1\).
- [DeGroot, p.626] In 1900, Karl Pearson proved the following result, whose proof will not be given here.
- This theorem can also be proved using maximum likelihood estimation. See [Rice, p.341], [Roussas, Course, p.370], or [Spokoiny, p.205].
- This proof is similar to the approximation of multinomial by multivariate normal. See here
- Some books do not require \(\Sigma\) to be invertible. In that case, you can use [Hogg, IMS, p.202, thm.3.5.2] directly. At the end of Part II, we have \[ \sqrt{n}(\overline{\mathbf{X}}_n-\mathbf{p})\stackrel{d}{\to} \text{N}_k(\mathbf{0}, \Sigma). \] By [Hogg, IMS, p.202, thm.3.5.2] (with \(A=\left[\begin{array}{c|c}I_{k-1}&\mathbf{0}\end{array}\right]\) and \(\mathbf{b}=\mathbf{0}\)), \[ \sqrt{n}(\overline{\mathbf{Z}}_n-\mathbf{p}^*)=A\sqrt{n}(\overline{\mathbf{X}}_n-\mathbf{p})\stackrel{d}{\to} \text{N}_{k-1}(\mathbf{0}, \Sigma^*). \] This is the same result as (****).
