[Algorithms Encountered While Reading Papers, No. 7]: Principal Component Analysis for Point Clouds
1. PCA
Principal Component Analysis (PCA) is a multivariate statistical method and a common tool for dimensionality reduction and visualization. The principle of PCA is to project the original data matrix onto a new orthogonal coordinate system, choosing the directions of the new axes one after another so that the variance of the projections onto each axis (the sample scores of the principal components) is maximized.
2. Derivation
Input: $x_{i} \in \mathbb{R}^{n}, i=1,2, \cdots, m$
Output: principal directions $z_{1}, z_{2}, \cdots, z_{k} \in \mathbb{R}^{n}, k \leq n$

- Center (normalize) the input data:
$$\tilde{X}=\left[\tilde{x}_{1}, \cdots, \tilde{x}_{m}\right], \quad \tilde{x}_{i}=x_{i}-\bar{x}, \quad i=1, \cdots, m, \qquad \bar{x}=\frac{1}{m} \sum_{i=1}^{m} x_{i}$$
- PCA projects these points onto a unit direction $z \in \mathbb{R}^{n}$, $\|z\|_{2}=1$, chosen so that the projected values have maximal variance, i.e. the points spread out as much as possible along that direction:
$$\alpha_{i}=\tilde{x}_{i}^{T} z, \quad i=1, \cdots, m$$
- The mean squared projection (the variance along $z$) is:
$$\frac{1}{m} \sum_{i=1}^{m} \alpha_{i}^{2}=\frac{1}{m} \sum_{i=1}^{m} z^{T} \tilde{x}_{i} \tilde{x}_{i}^{T} z=\frac{1}{m} z^{T} \tilde{X} \tilde{X}^{T} z$$
- So we maximize:
$$\max _{z \in \mathbb{R}^{n}} z^{T}\left(\tilde{X} \tilde{X}^{T}\right) z, \quad \text{s.t. } \|z\|_{2}=1$$
- By the Rayleigh quotient:
$$\lambda_{\min }(A) \leq \frac{x^{T} A x}{x^{T} x} \leq \lambda_{\max }(A), \quad \forall x \neq 0$$
- By the spectral theorem:
$$A=U \Lambda U^{T}=\sum_{i=1}^{n} \lambda_{i} u_{i} u_{i}^{T}, \quad \Lambda=\operatorname{diag}\left(\lambda_{1}, \cdots, \lambda_{n}\right)$$
- Applying this to PCA:
$$H=\tilde{X} \tilde{X}^{T}=U_{r} \Sigma^{2} U_{r}^{T}$$
- The first principal direction is $z_{1}=u_{1}$, the first column of $U_{r}$.
- Perform an SVD of $\tilde{X}$:
$$\tilde{X}=U_{r} \Sigma V_{r}^{T}=\sum_{i=1}^{r} \sigma_{i} u_{i} v_{i}^{T}$$
- The second principal direction $z_{2}$ is obtained by removing (deflating) the first component from the data:
$$\tilde{x}_{i}^{(1)}=\tilde{x}_{i}-u_{1}\left(u_{1}^{T} \tilde{x}_{i}\right), \quad i=1, \cdots, m, \qquad \tilde{X}^{(1)}=\left[\tilde{x}_{1}^{(1)}, \cdots, \tilde{x}_{m}^{(1)}\right]=\left(I_{n}-u_{1} u_{1}^{T}\right) \tilde{X}$$
- Combining the two expressions above:
$$\begin{aligned} \tilde{X}^{(1)} &=\sum_{i=1}^{r} \sigma_{i} u_{i} v_{i}^{T}-\left(u_{1} u_{1}^{T}\right) \sum_{i=1}^{r} \sigma_{i} u_{i} v_{i}^{T} \\ &=\sum_{i=1}^{r} \sigma_{i} u_{i} v_{i}^{T}-\sum_{i=1}^{r} \sigma_{i} u_{1} u_{1}^{T} u_{i} v_{i}^{T} \\ &=\sum_{i=1}^{r} \sigma_{i} u_{i} v_{i}^{T}-\sigma_{1} u_{1} v_{1}^{T} \qquad (U \text{ is orthogonal}) \\ &=\sum_{i=2}^{r} \sigma_{i} u_{i} v_{i}^{T} \end{aligned}$$
- With the first component removed, the second principal direction $z_{2}$ is obtained from:
$$\max _{z \in \mathbb{R}^{n}} z^{T}\left(\tilde{X}^{(1)} \tilde{X}^{(1) T}\right) z, \quad \text{s.t. } \|z\|_{2}=1, \qquad \tilde{X}^{(1)}=\sum_{i=2}^{r} \sigma_{i} u_{i} v_{i}^{T}$$
- The result is $z_{2}=u_{2}$, the second column of $U_{r}$.
- $z_{3}, \cdots, z_{k}$ follow from the same deflation procedure, as in the numerical sketch below.
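A minimal numerical sketch of the deflation argument above, assuming NumPy and made-up data (variable names such as `X_defl` are illustrative): the top eigenvector of $H=\tilde{X}\tilde{X}^{T}$ gives one principal direction, projecting it out and repeating gives the next ones, and they match the left singular vectors of $\tilde{X}$ up to sign.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 100)) * np.array([[5.0], [2.0], [0.5]])  # n x m data matrix
X_tilde = X - X.mean(axis=1, keepdims=True)                      # center the columns

directions = []
X_defl = X_tilde.copy()
for _ in range(3):
    H = X_defl @ X_defl.T                               # n x n scatter matrix
    eigvals, eigvecs = np.linalg.eigh(H)                # ascending eigenvalues
    z = eigvecs[:, -1]                                  # direction of maximal variance
    directions.append(z)
    X_defl = (np.eye(3) - np.outer(z, z)) @ X_defl      # deflate: remove that component

# The same directions (up to sign) come directly from the SVD of the centered data.
U, S, Vt = np.linalg.svd(X_tilde, full_matrices=False)
for r, z in enumerate(directions):
    print(r, abs(z @ U[:, r]))                          # ~1.0: same direction up to sign
```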
3. Summary
Given data $x_{i} \in \mathbb{R}^{n}, i=1,2, \cdots, m$, PCA proceeds as follows:
- Center the data:
$$\tilde{X}=\left[\tilde{x}_{1}, \cdots, \tilde{x}_{m}\right], \quad \tilde{x}_{i}=x_{i}-\bar{x}, \quad i=1, \cdots, m, \qquad \bar{x}=\frac{1}{m} \sum_{i=1}^{m} x_{i}$$
- Compute the SVD:
$$H=\tilde{X} \tilde{X}^{T}=U_{r} \Sigma^{2} U_{r}^{T}$$
- The principal directions are the columns of $U_{r}$ (the left singular vectors of $\tilde{X}$ are exactly the eigenvectors of $H$).
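Since the topic here is point clouds, below is a hedged sketch of these three steps on a synthetic, roughly planar patch; the helper name `pca_directions` and the data are made up for illustration. The least-variance direction is the usual PCA-based estimate of the surface normal.

```python
import numpy as np

def pca_directions(points):
    """points: (m, n) array, one point per row.
    Returns the mean, the principal directions (columns of U), and their variances."""
    mean = points.mean(axis=0)
    X_tilde = (points - mean).T                          # n x m centered data matrix
    U, S, _ = np.linalg.svd(X_tilde, full_matrices=False)
    return mean, U, (S ** 2) / points.shape[0]

# Synthetic patch: z is almost constant, so the least-variance direction
# should come out close to the z-axis (the surface normal).
rng = np.random.default_rng(1)
patch = np.c_[rng.uniform(-1, 1, size=(500, 2)), 0.01 * rng.normal(size=500)]
mean, U, variances = pca_directions(patch)
print(np.round(U[:, -1], 2))                             # normal estimate, ~ [0, 0, ±1]
```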
4.Kernel PCA
一般来说,主成分分析(Principal Components Analysis, PCA)适用于数据的线性降维。而核主成分分析(Kernel PCA, KPCA)可实现数据的非线性降维,用于处理线性不可分的数据集。
KPCA的大致思路是:对于输入空间(Input space)中的矩阵 X \mathbf{X} X ,我们先用一个非线性映射把 X \mathbf{X} X 中的所有样本映射到一个高维甚至是无穷维的空间(称为特征空间,Feature space),(使其线性可分),然后在这个高维空间进行PCA降维。
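One quick way to see the difference in practice is the following sketch with scikit-learn, assuming it is available (`gamma=10` is an illustrative value, not tuned): on two concentric circles no linear projection separates the classes, while the leading RBF kernel principal component tends to pull the two rings apart.

```python
from sklearn.datasets import make_circles
from sklearn.decomposition import PCA, KernelPCA

# Two concentric rings: not linearly separable in the original 2D coordinates.
X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

X_pca = PCA(n_components=2).fit_transform(X)                       # linear PCA: rings stay entangled
X_kpca = KernelPCA(n_components=2, kernel="rbf", gamma=10).fit_transform(X)

# Compare the class means along the first (kernel) principal component.
print(X_pca[y == 0, 0].mean(), X_pca[y == 1, 0].mean())
print(X_kpca[y == 0, 0].mean(), X_kpca[y == 1, 0].mean())
```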
5. Derivation
The input data are $x_{i} \in \mathbb{R}^{n}, i=1,2, \cdots, N$. Assume there exists a nonlinear map $\phi(\cdot)$ that lifts the data from $\mathbb{R}^{n}$ to $\mathbb{R}^{n_1}$ with $n \ll n_1$, giving the lifted data $\phi(X)=[\phi(x_1),\phi(x_2),\ldots,\phi(x_N)]$.
After raising the dimension to $n_1$, standard linear PCA is applied to reduce it to $n_2$ dimensions, where $n<n_2<n_1$:
- Assume $\phi(x_{i})$ is already zero-centered:
$$\frac{1}{N} \sum_{i=1}^{N} \phi\left(x_{i}\right)=0$$
- Compute the correlation matrix:
$$\tilde{H}=\frac{1}{N} \sum_{i=1}^{N} \phi\left(x_{i}\right) \phi^{T}\left(x_{i}\right)=\frac{1}{N}\,\phi(X)\,\phi^{T}(X)$$
- Solve the eigenvector/eigenvalue problem $\tilde{H} \tilde{z}=\tilde{\lambda} \tilde{z}$.

At this point there are two problems:
- How should $\phi$ be defined?
- Can we avoid explicitly mapping the data into the high-dimensional space through $\phi$?

Note that each eigenvector can be expressed as a linear combination of the mapped features:
$$\tilde{z}=\sum_{j=1}^{N} \alpha_{j} \phi\left(x_{j}\right)= \phi(X)\, \alpha$$
Here $\alpha$ is a vector, $\alpha = [\alpha_1, \alpha_2, \ldots, \alpha_N]^T$.
To see this, expand the eigenvalue equation:
$$\tilde{H} \tilde{z}=\tilde{\lambda} \tilde{z}
\;\Longrightarrow\;
\frac{1}{N}\sum_{i=1}^{N} \phi(x_{i})\underbrace{\phi^{T}(x_{i})\,\tilde z}_{\text{scalar}} = \tilde \lambda\, \tilde z$$
Each $\phi^{T}(x_{i})\,\tilde z$ is a scalar, so (for $\tilde\lambda \neq 0$) $\tilde z$ is indeed a linear combination of the $\phi(x_{i})$.
- Substituting this linear combination into $\tilde{H} \tilde{z}=\tilde{\lambda} \tilde{z}$:
$$\frac{1}{N}\, \phi(X)\,\phi^{T}(X)\,\phi(X)\, \alpha=\tilde{\lambda}\, \phi(X)\, \alpha$$
- Define the kernel function $k\left(x_{i}, x_{j}\right)=\phi^{T}\left(x_{i}\right) \phi\left(x_{j}\right)$ and the kernel matrix $K=\phi^{T}(X)\,\phi(X) \in \mathbb{R}^{N \times N}$ with $K(i, j) = k\left(x_{i}, x_{j}\right)$; $K$ is a symmetric matrix:
$$\frac{1}{N}\, \phi(X)\, K\, \alpha=\tilde{\lambda}\, \phi(X)\, \alpha$$
- Multiply both sides on the left by $\phi^{T}(X)$:
$$\frac{1}{N}\,\phi^{T}(X)\, \phi(X)\, K\, \alpha=\tilde{\lambda}\, \phi^{T}(X)\,\phi(X)\, \alpha$$
- This can be rewritten as:
$$K^{2} \alpha=N \tilde{\lambda}\, K \alpha$$
- Cancelling one factor of $K$ on both sides:
$$K \alpha=N \tilde{\lambda}\, \alpha, \qquad K \alpha=\lambda\, \alpha \quad \text{with } \lambda := N\tilde{\lambda}$$
- This yields the eigenvectors $\alpha_{r}$ and eigenvalues $\lambda_{r}$, $r=1, \cdots, l$.
- However, $\tilde{z}$ must be a unit vector; note that the linear PCA problem is being solved in the feature space:
$$\tilde{H} \tilde{z}=\tilde{\lambda} \tilde{z}, \qquad \tilde{z}=\sum_{j=1}^{N} \alpha_{j} \phi\left(x_{j}\right)=\phi(X)\, \alpha$$
- The normalization of $\tilde{z}$ works out as:
$$\begin{aligned} 1 &=\tilde{z}_{r}^{T} \tilde{z}_{r} \\ &=\sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_{r i} \alpha_{r j}\, \phi^{T}\left(x_{i}\right) \phi\left(x_{j}\right) =\alpha_{r}^{T}\, \phi^{T}(X)\, \phi(X)\, \alpha_{r} \\ &=\alpha_{r}^{T} K \alpha_{r} \end{aligned}$$
- Since $K \alpha=\lambda \alpha$, this gives:
$$\lambda_{r}\,\alpha_{r}^{T} \alpha_{r}=1, \quad \forall r$$
- That is, rescale each $\alpha_{r}$ so that $\alpha_{r}^{T}\alpha_{r} = 1 / \lambda_{r}$, yielding the normalized coefficients $\tilde\alpha_{r}$.
- The $r^{\text{th}}$ principal direction is then:
$$\tilde{z}_{r}=\sum_{j=1}^{N} \tilde\alpha_{r j}\, \phi\left(x_{j}\right)$$
- Given these principal directions for the mapped data, a point $x$ is projected onto the direction $\tilde{z}_{r}$ via:
$$\phi^{T}(x)\, \tilde{z}_{r}=\sum_{j=1}^{N} \tilde\alpha_{r j}\, \phi^{T}(x)\, \phi\left(x_{j}\right)=\sum_{j=1}^{N} \tilde\alpha_{r j}\, k\left(x, x_{j}\right)$$
- One more thing must be handled: we assumed above that the $\phi(x_{i})$ are centered, so this centering also has to be enforced through the kernel.
- Center each $\phi(x_{i})$ with respect to the mean:
$$\tilde{\phi}\left(x_{i}\right)=\phi\left(x_{i}\right)-\frac{1}{N} \sum_{j=1}^{N} \phi\left(x_{j}\right)$$
- To simplify the expression, introduce the $N$-dimensional all-ones vector $\mathbf{1}_{N\times1}=[1, 1, \ldots, 1]^T$; then
$$\tilde{\phi}\left(x_{i}\right)=\phi\left(x_{i}\right)-\frac{1}{N}\, \phi(X)\,\mathbf{1}_{N\times1}$$
- Centering all the columns of $\phi(\mathbf{X})$ gives:
$$\begin{aligned} \widetilde{\phi}(\mathbf{X})&=\left[\phi\left(\mathbf{x}_{1}\right)-\frac{1}{N} \phi(\mathbf{X}) \mathbf{1}_{N \times 1},\; \phi\left(\mathbf{x}_{2}\right)-\frac{1}{N} \phi(\mathbf{X}) \mathbf{1}_{N \times 1},\; \ldots,\; \phi\left(\mathbf{x}_{N}\right)-\frac{1}{N} \phi(\mathbf{X}) \mathbf{1}_{N \times 1}\right] \\ &=\left[\phi\left(\mathbf{x}_{1}\right), \phi\left(\mathbf{x}_{2}\right), \ldots, \phi\left(\mathbf{x}_{N}\right)\right]-\frac{1}{N} \phi(\mathbf{X}) \mathbf{1}_{N \times 1} \mathbf{1}_{N \times 1}^{T} \\ &=\phi(\mathbf{X})-\frac{1}{N} \phi(\mathbf{X}) \mathbf{1}_{N \times 1} \mathbf{1}_{N \times 1}^{T} \end{aligned}$$
- For convenience, introduce the matrix $\mathbf{1}_{N}=\frac{1}{N} \mathbf{1}_{N \times 1} \mathbf{1}_{N \times 1}^{T}$, an $N \times N$ matrix whose every entry is $\frac{1}{N}$. Then $\widetilde{\phi}(\mathbf{X})$ can be written compactly as:
$$\widetilde{\phi}(\mathbf{X})=\phi(\mathbf{X})-\phi(\mathbf{X})\, \mathbf{1}_{N}$$
- As mentioned above, $\tilde{\phi}(\mathbf{X})$ never needs to be computed explicitly; it suffices to compute the centered kernel matrix:
$$\widetilde{\mathbf{K}}_{i j}=\widetilde{\phi}\left(\mathbf{x}_{i}\right)^{T} \widetilde{\phi}\left(\mathbf{x}_{j}\right)=\left[\phi\left(\mathbf{x}_{i}\right)-\frac{1}{N} \phi(\mathbf{X}) \mathbf{1}_{N \times 1}\right]^{T}\left[\phi\left(\mathbf{x}_{j}\right)-\frac{1}{N} \phi(\mathbf{X}) \mathbf{1}_{N \times 1}\right]$$
- Or, more compactly, the centering of the whole kernel matrix can be derived directly:
$$\begin{aligned} \widetilde{\mathbf{K}}&=\widetilde{\phi}(\mathbf{X})^{T} \widetilde{\phi}(\mathbf{X})=\left[\phi(\mathbf{X})-\phi(\mathbf{X}) \mathbf{1}_{N}\right]^{T}\left[\phi(\mathbf{X})-\phi(\mathbf{X}) \mathbf{1}_{N}\right] \\ &=\phi(\mathbf{X})^{T} \phi(\mathbf{X})-\phi(\mathbf{X})^{T} \phi(\mathbf{X}) \mathbf{1}_{N}-\mathbf{1}_{N}^{T} \phi(\mathbf{X})^{T} \phi(\mathbf{X})+\mathbf{1}_{N}^{T} \phi(\mathbf{X})^{T} \phi(\mathbf{X}) \mathbf{1}_{N} \\ &=\mathbf{K}-\mathbf{K} \cdot \mathbf{1}_{N}-\mathbf{1}_{N}^{T} \cdot \mathbf{K}+\mathbf{1}_{N}^{T} \cdot \mathbf{K} \cdot \mathbf{1}_{N} \end{aligned}$$
- Since $\mathbf{1}_{N}^{T}=\mathbf{1}_{N}$ (it is symmetric), this simplifies to
$$\widetilde{\mathbf{K}}=\mathbf{K}-\mathbf{K} \cdot \mathbf{1}_{N}-\mathbf{1}_{N} \cdot \mathbf{K}+\mathbf{1}_{N} \cdot \mathbf{K} \cdot \mathbf{1}_{N}$$
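A small sketch of this double-centering formula, assuming NumPy and an RBF kernel with an illustrative `beta`; after centering, every row and column of $\widetilde{\mathbf{K}}$ sums to (numerically) zero.

```python
import numpy as np

def center_kernel(K):
    """Double-center a Gram matrix: K~ = K - K@1N - 1N@K + 1N@K@1N,
    where 1N is the N x N matrix with every entry equal to 1/N."""
    N = K.shape[0]
    one_N = np.full((N, N), 1.0 / N)
    return K - K @ one_N - one_N @ K + one_N @ K @ one_N

X = np.random.default_rng(0).normal(size=(6, 3))
sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K = np.exp(-0.5 * sq_dists)                       # RBF Gram matrix, beta = 0.5
K_tilde = center_kernel(K)
print(np.allclose(K_tilde.sum(axis=0), 0), np.allclose(K_tilde.sum(axis=1), 0))
```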
Available kernels (a few are sketched in code below):
- Linear: $k\left(x_{i}, x_{j}\right)=x_{i}^{T} x_{j}$
- Polynomial: $k\left(x_{i}, x_{j}\right)=\left(1+x_{i}^{T} x_{j}\right)^{p}$
- Gaussian: $k\left(x_{i}, x_{j}\right)=e^{-\beta\left\|x_{i}-x_{j}\right\|_{2}^{2}}$
- Laplacian: $k\left(x_{i}, x_{j}\right)=e^{-\beta\left\|x_{i}-x_{j}\right\|_{1}}$
- Sigmoid: $k\left(x_{i}, x_{j}\right)=\tanh\left(\beta\, x_{i}^{T} x_{j}+c\right)$
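Plain-function sketches of these kernels (parameter names follow the formulas above; the default values are arbitrary, not recommendations):

```python
import numpy as np

def linear(xi, xj):
    return xi @ xj

def polynomial(xi, xj, p=2):
    return (1.0 + xi @ xj) ** p

def gaussian(xi, xj, beta=1.0):
    return np.exp(-beta * np.sum((xi - xj) ** 2))     # squared L2 distance

def laplacian(xi, xj, beta=1.0):
    return np.exp(-beta * np.sum(np.abs(xi - xj)))    # L1 distance

def sigmoid(xi, xj, beta=1.0, c=0.0):
    return np.tanh(beta * (xi @ xj) + c)
```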
6. Summary
- Choose a kernel $k\left(x_{i}, x_{j}\right)$ and compute the Gram matrix $K(i, j)=k\left(x_{i}, x_{j}\right)$.
- Center $K$:
$$\widetilde{\mathbf{K}}=\mathbf{K}-\mathbf{K} \cdot \mathbf{1}_{N}-\mathbf{1}_{N} \cdot \mathbf{K}+\mathbf{1}_{N} \cdot \mathbf{K} \cdot \mathbf{1}_{N}$$
- Solve for the eigenvalues and eigenvectors of $\widetilde{K}$:
$$\widetilde{K} \alpha_{r}=\lambda_{r} \alpha_{r}$$
- Normalize each $\alpha_{r}$ so that $\alpha_{r}^{T} \alpha_{r}=\frac{1}{\lambda_{r}}$.
- For any data point $x \in \mathbb{R}^{n}$, its projection $y_{r} \in \mathbb{R}$ onto the $r^{\text{th}}$ principal direction is:
$$y_{r}=\phi^{T}(x)\, \tilde{z}_{r}=\sum_{j=1}^{N} \alpha_{r j}\, k\left(x, x_{j}\right)$$
A from-scratch sketch of these steps follows.
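This is a minimal sketch of the summary, assuming NumPy and a Gaussian kernel with an illustrative `beta`; for simplicity it only projects the training points themselves (for a new point one would evaluate and center its kernel values against the training set).

```python
import numpy as np

def kernel_pca(X, n_components=2, beta=1.0):
    """X: (N, n) samples as rows. Returns the normalized alphas and the
    projections of the training points onto the kernel principal directions."""
    N = X.shape[0]
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-beta * sq_dists)                              # Gram matrix

    one_N = np.full((N, N), 1.0 / N)
    K_tilde = K - K @ one_N - one_N @ K + one_N @ K @ one_N   # centered kernel

    eigvals, eigvecs = np.linalg.eigh(K_tilde)                # ascending order
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]        # largest first

    alphas = eigvecs[:, :n_components] / np.sqrt(eigvals[:n_components])  # alpha_r^T alpha_r = 1/lambda_r
    Y = K_tilde @ alphas                                      # y_r = sum_j alpha_rj * k~(x, x_j)
    return alphas, Y

X = np.random.default_rng(0).normal(size=(50, 3))
alphas, Y = kernel_pca(X, n_components=2, beta=0.5)
print(Y.shape)                                                # (50, 2)
```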