Principal Components Analysis (PCA)

1. PCA

Principal Component Analysis (PCA) is a multivariate statistical method and a common tool for dimensionality reduction and visualization. The idea of PCA is to project the original data matrix onto a new orthogonal coordinate system, choosing the directions of the new axes one at a time so that the variance of the data projected onto each axis (the sample values of the corresponding principal component) is maximized.
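
Before the derivation, here is a minimal usage sketch with scikit-learn's `PCA` class (assuming scikit-learn is available; the toy data below are purely illustrative):

```python
import numpy as np
from sklearn.decomposition import PCA

# Toy data: 100 samples in R^3 that vary mostly along one direction
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 1)) @ np.array([[2.0, 1.0, 0.5]]) \
    + 0.1 * rng.normal(size=(100, 3))

pca = PCA(n_components=2)        # keep the top-2 principal components
Z = pca.fit_transform(X)         # projected (dimension-reduced) samples, shape (100, 2)

print(pca.components_)                 # principal directions (one per row)
print(pca.explained_variance_ratio_)   # fraction of variance captured by each
```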

2. Derivation

Input: $x_i \in \mathbb{R}^n, \; i = 1, 2, \cdots, m$

Output: principal component vectors $z_1, z_2, \cdots, z_k \in \mathbb{R}^n, \; k \le n$

  • Center (normalize) the input data:
    $\tilde{X} = [\tilde{x}_1, \cdots, \tilde{x}_m], \quad \tilde{x}_i = x_i - \bar{x}, \; i = 1, \cdots, m, \quad \bar{x} = \frac{1}{m}\sum_{i=1}^{m} x_i$

  • PCA projects these points onto a direction $z \in \mathbb{R}^n, \; \|z\|_2 = 1$ chosen so that the projected values have the largest possible variance, i.e., the data are spread out as much as possible along that direction:
    $\alpha_i = \tilde{x}_i^T z, \; i = 1, \cdots, m$

  • The variance of the projections (the data are centered, so the projections have zero mean) is:
    $\frac{1}{m}\sum_{i=1}^{m}\alpha_i^2 = \frac{1}{m}\sum_{i=1}^{m} z^T \tilde{x}_i \tilde{x}_i^T z = \frac{1}{m} z^T \tilde{X}\tilde{X}^T z$

  • We therefore maximize this quantity:
    $\max_{z \in \mathbb{R}^n} z^T\left(\tilde{X}\tilde{X}^T\right) z, \quad \text{s.t. } \|z\|_2 = 1$

  • By the Rayleigh quotient bounds:
    $\lambda_{\min}(A) \le \frac{x^T A x}{x^T x} \le \lambda_{\max}(A), \quad \forall x \ne 0$

  • By the spectral theorem (for a symmetric matrix $A$):
    $A = U\Lambda U^T = \sum_{i=1}^{n}\lambda_i u_i u_i^T, \quad \Lambda = \operatorname{diag}(\lambda_1, \cdots, \lambda_n)$

  • Applied to PCA, the matrix being decomposed is:
    $H = \tilde{X}\tilde{X}^T = U_r\Sigma^2 U_r^T$

  • The first principal direction is $z_1 = u_1$, where $u_1$ is the first column of $U_r$ (the eigenvector of $H$ with the largest eigenvalue).

  • Perform the SVD of $\tilde{X}$:
    $\tilde{X} = U_r\Sigma V_r^T = \sum_{i=1}^{r}\sigma_i u_i v_i^T$

  • The second principal direction $z_2$ is obtained by deflating the data, i.e., subtracting from each sample its component along the first principal direction:
    $\tilde{x}_i^{(1)} = \tilde{x}_i - u_1\left(u_1^T\tilde{x}_i\right), \; i = 1, \cdots, m, \qquad \tilde{X}^{(1)} = \left[\tilde{x}_1^{(1)}, \cdots, \tilde{x}_m^{(1)}\right] = \left(I_n - u_1 u_1^T\right)\tilde{X}$

  • Combining the two expressions above gives:
    $\begin{aligned} \tilde{X}^{(1)} &= \sum_{i=1}^{r}\sigma_i u_i v_i^T - \left(u_1 u_1^T\right)\sum_{i=1}^{r}\sigma_i u_i v_i^T \\ &= \sum_{i=1}^{r}\sigma_i u_i v_i^T - \sum_{i=1}^{r}\sigma_i u_1 u_1^T u_i v_i^T \\ &= \sum_{i=1}^{r}\sigma_i u_i v_i^T - \sigma_1 u_1 v_1^T \quad \text{// the columns of } U \text{ are orthonormal} \\ &= \sum_{i=2}^{r}\sigma_i u_i v_i^T \end{aligned}$

  • With the first component removed, the second principal direction $z_2$ is obtained from:
    $\max_{z \in \mathbb{R}^n} z^T\left(\tilde{X}^{(1)}\tilde{X}^{(1)T}\right) z, \quad \text{s.t. } \|z\|_2 = 1, \qquad \tilde{X}^{(1)} = \sum_{i=2}^{r}\sigma_i u_i v_i^T$

  • The solution is $z_2 = u_2$, where $u_2$ is the second column of $U_r$.

  • The remaining directions $z_3, \cdots, z_k$ are computed by repeating the same deflation procedure (a small numerical check of the deflation step is given right after this list).
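
As a sanity check on the deflation argument, here is a minimal numpy sketch (the synthetic data are purely illustrative): it removes the component along the first left singular vector from the centered data and confirms that the top direction of the deflated matrix coincides with the second left singular vector of the original data.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 100))                  # n = 5 features, m = 100 samples (columns)
X_tilde = X - X.mean(axis=1, keepdims=True)    # center each feature

U, S, Vt = np.linalg.svd(X_tilde, full_matrices=False)
u1, u2 = U[:, 0], U[:, 1]

# Deflation: remove the component along u1 from every sample
X1 = (np.eye(5) - np.outer(u1, u1)) @ X_tilde

# The top left singular vector of the deflated matrix should align with u2
U1, _, _ = np.linalg.svd(X1, full_matrices=False)
print(np.abs(u2 @ U1[:, 0]))   # ~1.0 (up to sign)
```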

3. Summary

Given data $x_i \in \mathbb{R}^n, \; i = 1, 2, \cdots, m$, the steps of PCA are:

  1. Center the data:
    $\tilde{X} = [\tilde{x}_1, \cdots, \tilde{x}_m], \quad \tilde{x}_i = x_i - \bar{x}, \; i = 1, \cdots, m, \quad \bar{x} = \frac{1}{m}\sum_{i=1}^{m} x_i$
  2. Compute the SVD (equivalently, the eigendecomposition of $H$):
    $H = \tilde{X}\tilde{X}^T = U_r\Sigma^2 U_r^T$
  3. The principal directions are the columns of $U_r$ (the left singular vectors of $\tilde{X}$, which are the eigenvectors of $H$). A code sketch of these three steps follows.
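
These three steps translate directly into a short numpy routine. This is a minimal sketch (the function and variable names are my own), returning the top-$k$ principal directions and the projected samples:

```python
import numpy as np

def pca(X, k):
    """X: (n, m) matrix with m samples as columns; returns (Z, Y)."""
    x_bar = X.mean(axis=1, keepdims=True)                     # step 1: sample mean
    X_tilde = X - x_bar                                       # centered data
    U, S, Vt = np.linalg.svd(X_tilde, full_matrices=False)    # step 2: SVD
    Z = U[:, :k]                                              # step 3: top-k principal directions
    Y = Z.T @ X_tilde                                         # (k, m) projected samples
    return Z, Y

rng = np.random.default_rng(1)
X = rng.normal(size=(10, 200))
Z, Y = pca(X, k=3)
print(Z.shape, Y.shape)   # (10, 3) (3, 200)
```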

4. Kernel PCA

In general, Principal Components Analysis (PCA) performs linear dimensionality reduction. Kernel PCA (KPCA) enables nonlinear dimensionality reduction and is used for datasets that are not linearly separable.

The rough idea of KPCA is: for a data matrix $\mathbf{X}$ in the input space, first map every sample of $\mathbf{X}$ into a high-dimensional (possibly infinite-dimensional) space, called the feature space, via a nonlinear map, so that the data become linearly separable there, and then perform ordinary PCA in that high-dimensional space.
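
To make the "map, then run PCA" idea concrete, here is a small sketch with an explicit quadratic feature map $\phi(x) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$ (my own illustrative choice, corresponding to a degree-2 polynomial kernel). Two concentric circles that cannot be separated linearly in the input space separate cleanly along the top principal direction of the feature space:

```python
import numpy as np

rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, size=200)
r = np.where(np.arange(200) < 100, 1.0, 3.0)                   # inner / outer circle
X = np.stack([r * np.cos(theta), r * np.sin(theta)], axis=1)   # (200, 2), rows are samples

# Explicit feature map phi: R^2 -> R^3
Phi = np.stack([X[:, 0]**2, np.sqrt(2) * X[:, 0] * X[:, 1], X[:, 1]**2], axis=1)

# Ordinary linear PCA in the feature space
Phi_c = Phi - Phi.mean(axis=0)
U, S, Vt = np.linalg.svd(Phi_c, full_matrices=False)
proj = Phi_c @ Vt[0]          # projection onto the first principal direction

print(proj[:100].mean(), proj[100:].mean())   # the two circles get clearly different values
```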

5. Derivation

The input data are $x_i \in \mathbb{R}^n, \; i = 1, 2, \cdots, N$. Assume there exists a nonlinear map $\phi(\cdot)$ that lifts the data from $\mathbb{R}^n$ to $\mathbb{R}^{n_1}$, where $n \ll n_1$, producing the lifted data $\phi(X) = [\phi(x_1), \phi(x_2), \ldots, \phi(x_N)]$.

After lifting the dimension to $n_1$, standard linear PCA is applied to reduce the dimension to $n_2$, where $n < n_2 < n_1$:

  1. Assume $\phi(x_i)$ is already zero-centered:
    $\frac{1}{N}\sum_{i=1}^{N}\phi(x_i) = 0$
  2. Compute the correlation matrix:
    $\tilde{H} = \frac{1}{N}\sum_{i=1}^{N}\phi(x_i)\phi^T(x_i) = \frac{1}{N}\phi(X)\phi^T(X)$
  3. Solve the eigenvalue problem $\tilde{H}\tilde{z} = \tilde{\lambda}\tilde{z}$ for the eigenvectors and eigenvalues.

Two problems arise at this point:

  1. How should $\phi$ be defined?
  2. Can we avoid explicitly mapping the data into the high-dimensional space via $\phi$?

Note that each eigenvector can be expressed as a linear combination of the mapped samples:
$\tilde{z} = \sum_{j=1}^{N}\alpha_j\,\phi(x_j) = \phi(X)\alpha$
Here $\alpha$ is a vector, $\alpha = [\alpha_1, \alpha_2, \ldots, \alpha_N]^T$.


Proof:
$\tilde{H}\tilde{z} = \tilde{\lambda}\tilde{z}$

$\frac{1}{N}\sum_{i=1}^{N}\phi(x_i)\underbrace{\phi^T(x_i)\tilde{z}}_{\text{scalar}} = \tilde{\lambda}\tilde{z}$
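
Spelling out the step this implies (a short completion of the argument): each $\phi^T(x_i)\tilde{z}$ is a scalar, so for $\tilde{\lambda} \ne 0$ the eigenvector $\tilde{z}$ is indeed a linear combination of the $\phi(x_i)$, and the coefficients can be read off directly:

    $\tilde{z} = \frac{1}{N\tilde{\lambda}}\sum_{i=1}^{N}\phi(x_i)\left(\phi^T(x_i)\tilde{z}\right) = \sum_{i=1}^{N}\alpha_i\,\phi(x_i), \qquad \alpha_i = \frac{\phi^T(x_i)\tilde{z}}{N\tilde{\lambda}}$

which is exactly the linear-combination form used above.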

  • Substituting the linear combination into $\tilde{H}\tilde{z} = \tilde{\lambda}\tilde{z}$:
    $\frac{1}{N}\phi(X)\phi^T(X)\phi(X)\alpha = \tilde{\lambda}\phi(X)\alpha$

  • Define the kernel function $k(x_i, x_j) = \phi^T(x_i)\phi(x_j)$ and the kernel matrix $K = \phi^T(X)\phi(X) \in \mathbb{R}^{N\times N}$ with $K(i, j) = k(x_i, x_j)$; $K$ is symmetric. Then:
    $\frac{1}{N}\phi(X) K\alpha = \tilde{\lambda}\phi(X)\alpha$

  • Multiply both sides on the left by $\phi^T(X)$:
    $\frac{1}{N}\phi^T(X)\phi(X) K\alpha = \tilde{\lambda}\phi^T(X)\phi(X)\alpha$

  • This can be rewritten as:
    $K^2\alpha = N\tilde{\lambda} K\alpha$

  • Cancel one factor of $K$ on both sides (this does not affect the solutions with nonzero eigenvalue):
    $K\alpha = N\tilde{\lambda}\alpha, \quad \text{i.e.} \quad K\alpha = \lambda\alpha \;\text{ with }\; \lambda = N\tilde{\lambda}$

  • Solving this eigenvalue problem yields eigenvectors $\alpha_r$ and eigenvalues $\lambda_r, \; r = 1, \cdots, l$.

  • However, $\tilde{z}$ must be a unit vector; recall that we are solving linear PCA in the feature space:
    $\tilde{H}\tilde{z} = \tilde{\lambda}\tilde{z}, \quad \tilde{z} = \sum_{j=1}^{N}\alpha_j\,\phi(x_j) = \phi(X)\alpha$

  • The normalization of $\tilde{z}$ proceeds as follows:
    $\begin{aligned} 1 &= \tilde{z}_r^T\tilde{z}_r \\ 1 &= \sum_{i=1}^{N}\sum_{j=1}^{N}\alpha_{ri}\alpha_{rj}\,\phi^T(x_i)\phi(x_j) = \alpha_r^T\phi^T(X)\phi(X)\alpha_r \\ 1 &= \alpha_r^T K\alpha_r \end{aligned}$

  • Since $K\alpha_r = \lambda_r\alpha_r$, this becomes:
    $\alpha_r^T\lambda_r\alpha_r = 1, \quad \forall r$

  • That is, rescale each $\alpha_r$ so that $\|\alpha_r\|^2 = 1/\lambda_r$, which gives the normalized coefficient vector $\tilde{\alpha}_r$.

  • The $r$-th principal direction is then:
    $\tilde{z}_r = \sum_{j=1}^{N}\tilde{\alpha}_{rj}\,\phi(x_j)$

  • Now, for a data point $x$, its image $\phi(x)$ can be projected onto the direction $\tilde{z}_r$ using only kernel evaluations:
    $\phi^T(x)\tilde{z}_r = \sum_{j=1}^{N}\tilde{\alpha}_{rj}\,\phi^T(x)\phi(x_j) = \sum_{j=1}^{N}\tilde{\alpha}_{rj}\,k(x, x_j)$

  • One issue remains: we assumed above that the $\phi(x_i)$ are centered, so the centering must also be carried out implicitly through the kernel.

  • Centering $\phi(x_i)$ about the mean:
    $\tilde{\phi}(x_i) = \phi(x_i) - \frac{1}{N}\sum_{j=1}^{N}\phi(x_j)$

  • To simplify the expression, introduce the $N$-dimensional all-ones vector $\mathbf{1}_{N\times 1} = [1, 1, \ldots, 1]^T$; then
    $\tilde{\phi}(x_i) = \phi(x_i) - \frac{1}{N}\phi(X)\mathbf{1}_{N\times 1}$

  • Centering every column of $\phi(\mathbf{X})$ gives:
    $\begin{aligned} \tilde{\phi}(\mathbf{X}) &= \left[\phi(x_1) - \tfrac{1}{N}\phi(\mathbf{X})\mathbf{1}_{N\times 1},\; \phi(x_2) - \tfrac{1}{N}\phi(\mathbf{X})\mathbf{1}_{N\times 1},\; \ldots,\; \phi(x_N) - \tfrac{1}{N}\phi(\mathbf{X})\mathbf{1}_{N\times 1}\right] \\ &= \left[\phi(x_1), \phi(x_2), \ldots, \phi(x_N)\right] - \tfrac{1}{N}\phi(\mathbf{X})\mathbf{1}_{N\times 1}\mathbf{1}_{N\times 1}^T \\ &= \phi(\mathbf{X}) - \tfrac{1}{N}\phi(\mathbf{X})\mathbf{1}_{N\times 1}\mathbf{1}_{N\times 1}^T \end{aligned}$

  • For convenience, introduce the $N\times N$ matrix $\mathbf{1}_N = \frac{1}{N}\mathbf{1}_{N\times 1}\mathbf{1}_{N\times 1}^T$, every entry of which equals $\frac{1}{N}$. Then $\tilde{\phi}(\mathbf{X})$ can be written compactly as:
    $\tilde{\phi}(\mathbf{X}) = \phi(\mathbf{X}) - \phi(\mathbf{X})\mathbf{1}_N$

  • As noted earlier, we never need to compute $\tilde{\phi}(\mathbf{X})$ explicitly; it suffices to compute the centered kernel matrix:
    $\tilde{\mathbf{K}}_{ij} = \tilde{\phi}(x_i)^T\tilde{\phi}(x_j) = \left[\phi(x_i) - \tfrac{1}{N}\phi(\mathbf{X})\mathbf{1}_{N\times 1}\right]^T\left[\phi(x_j) - \tfrac{1}{N}\phi(\mathbf{X})\mathbf{1}_{N\times 1}\right]$

    • Alternatively, and more compactly, the whole centered kernel matrix can be derived at once:
      $\begin{aligned} \tilde{\mathbf{K}} &= \tilde{\phi}(\mathbf{X})^T\tilde{\phi}(\mathbf{X}) = \left[\phi(\mathbf{X}) - \phi(\mathbf{X})\mathbf{1}_N\right]^T\left[\phi(\mathbf{X}) - \phi(\mathbf{X})\mathbf{1}_N\right] \\ &= \phi(\mathbf{X})^T\phi(\mathbf{X}) - \phi(\mathbf{X})^T\phi(\mathbf{X})\mathbf{1}_N - \mathbf{1}_N^T\phi(\mathbf{X})^T\phi(\mathbf{X}) + \mathbf{1}_N^T\phi(\mathbf{X})^T\phi(\mathbf{X})\mathbf{1}_N \\ &= \mathbf{K} - \mathbf{K}\,\mathbf{1}_N - \mathbf{1}_N^T\,\mathbf{K} + \mathbf{1}_N^T\,\mathbf{K}\,\mathbf{1}_N \end{aligned}$
  • Since $\mathbf{1}_N$ is symmetric ($\mathbf{1}_N^T = \mathbf{1}_N$), this simplifies to (a numerical check of this centering step is given right after this list):
    $\tilde{\mathbf{K}} = \mathbf{K} - \mathbf{K}\,\mathbf{1}_N - \mathbf{1}_N\,\mathbf{K} + \mathbf{1}_N\,\mathbf{K}\,\mathbf{1}_N$
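
As referenced above, here is a minimal numpy sketch of the kernel-centering formula (the data are illustrative; the linear kernel is used only so that the explicitly centered feature vectors are available for comparison):

```python
import numpy as np

rng = np.random.default_rng(0)
Phi = rng.normal(size=(4, 10))          # stand-in for phi(X); columns are samples
N = Phi.shape[1]

K = Phi.T @ Phi                         # K(i, j) = phi(x_i)^T phi(x_j)
one_N = np.full((N, N), 1.0 / N)        # the matrix 1_N, every entry equal to 1/N

K_centered = K - K @ one_N - one_N @ K + one_N @ K @ one_N

# Reference: center the feature vectors explicitly, then form the kernel
Phi_c = Phi - Phi.mean(axis=1, keepdims=True)
print(np.allclose(K_centered, Phi_c.T @ Phi_c))   # True
```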

Available kernels (a short code sketch of these kernel functions follows the list):

  • Linear: $k(x_i, x_j) = x_i^T x_j$
  • Polynomial: $k(x_i, x_j) = \left(1 + x_i^T x_j\right)^p$
  • Gaussian: $k(x_i, x_j) = e^{-\beta\|x_i - x_j\|_2^2}$
  • Laplacian: $k(x_i, x_j) = e^{-\beta\|x_i - x_j\|_1}$
  • Sigmoid
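
A minimal sketch of these kernels as Python functions (the parameter values are placeholders; the sigmoid kernel is written in its common $\tanh$ form, which is an assumption since the list above only names it):

```python
import numpy as np

def linear_kernel(xi, xj):
    return xi @ xj

def polynomial_kernel(xi, xj, p=2):
    return (1.0 + xi @ xj) ** p

def gaussian_kernel(xi, xj, beta=1.0):
    return np.exp(-beta * np.sum((xi - xj) ** 2))    # squared L2 distance

def laplacian_kernel(xi, xj, beta=1.0):
    return np.exp(-beta * np.sum(np.abs(xi - xj)))   # L1 distance

def sigmoid_kernel(xi, xj, gamma=0.01, c=0.0):
    # common tanh form of the sigmoid kernel (assumed; not spelled out above)
    return np.tanh(gamma * (xi @ xj) + c)
```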

6. Summary

  • Choose a kernel $k(x_i, x_j)$ and compute the Gram matrix $K(i, j) = k(x_i, x_j)$.
  • Center $K$:
    $\tilde{\mathbf{K}} = \mathbf{K} - \mathbf{K}\,\mathbf{1}_N - \mathbf{1}_N\,\mathbf{K} + \mathbf{1}_N\,\mathbf{K}\,\mathbf{1}_N$
  • Solve for the eigenvalues and eigenvectors of $\tilde{K}$:
    $\tilde{K}\alpha_r = \lambda_r\alpha_r$
  • Normalize each $\alpha_r$ so that $\alpha_r^T\alpha_r = \frac{1}{\lambda_r}$.
  • For any data point $x \in \mathbb{R}^n$, compute its projection $y_r \in \mathbb{R}$ onto the $r$-th principal direction (the complete procedure is sketched in code below):
    $y_r = \phi^T(x)\tilde{z}_r = \sum_{j=1}^{N}\alpha_{rj}\,k(x, x_j)$
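
A minimal numpy sketch of the whole procedure (the function name, the Gaussian kernel choice, and its parameter are my own illustrative assumptions):

```python
import numpy as np

def kernel_pca(X, kernel, n_components):
    """Minimal KPCA sketch. X: (N, n) with samples as rows; kernel(xi, xj) -> float."""
    N = X.shape[0]

    # Gram matrix K(i, j) = k(x_i, x_j)
    K = np.array([[kernel(X[i], X[j]) for j in range(N)] for i in range(N)])

    # Center the kernel matrix: K~ = K - K 1_N - 1_N K + 1_N K 1_N
    one_N = np.full((N, N), 1.0 / N)
    K_tilde = K - K @ one_N - one_N @ K + one_N @ K @ one_N

    # Eigendecomposition of the symmetric centered kernel matrix
    eigvals, eigvecs = np.linalg.eigh(K_tilde)
    idx = np.argsort(eigvals)[::-1][:n_components]     # largest eigenvalues first
    lam, alpha = eigvals[idx], eigvecs[:, idx]

    # Rescale alpha_r so that alpha_r^T alpha_r = 1 / lambda_r
    alpha = alpha / np.sqrt(lam)

    # Projections of the training points: y_{i,r} = sum_j alpha_{jr} K~(i, j)
    # (the centered kernel is used here, consistent with the centering above)
    Y = K_tilde @ alpha
    return Y, alpha

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
Y, alpha = kernel_pca(X, lambda a, b: np.exp(-0.5 * np.sum((a - b) ** 2)), 2)
print(Y.shape)   # (50, 2)
```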

