PCA数据预处理MATLAB

Reduce Input Dimensionality Using processpcaIn some situations, the dimension of the input vector is large, but the components of the vectors are highly correlated (redundant). It is useful in this si

phoenix@Capricornus

775人浏览 · 2024-06-10 09:34:53

phoenix@Capricornus · 2024-06-10 09:34:53 发布

Reduce Input Dimensionality Using processpca
In some situations, the dimension of the input vector is large, but the components of the vectors are highly correlated (redundant). It is useful in this situation to reduce the dimension of the input vectors. An effective procedure for performing this operation is principal component analysis. This technique has three effects: it orthogonalizes the components of the input vectors (so that they are uncorrelated with each other), it orders the resulting orthogonal components (principal components) so that those with the largest variation come first, and it eliminates those components that contribute the least to the variation in the data set. The following code illustrates the use of processpca, which performs a principal-component analysis using the processing setting maxfrac of 0.02.

[pn,ps1] = mapstd(p);
[ptrans,ps2] = processpca(pn,0.02);

The input vectors are first normalized, using mapstd, so that they have zero mean and unity variance. This is a standard procedure when using principal components. In this example, the second argument passed to processpca is 0.02. This means that processpca eliminates those principal components that contribute less than 2% to the total variation in the data set. The matrix ptrans contains the transformed input vectors. The settings structure ps2 contains the principal component transformation matrix. After the network has been trained, these settings should be used to transform any future inputs that are applied to the network. It effectively becomes a part of the network, just like the network weights and biases. If you multiply the normalized input vectors pn by the transformation matrix transMat, you obtain the transformed input vectors ptrans.

If processpca is used to preprocess the training set data, then whenever the trained network is used with new inputs, you should preprocess them with the transformation matrix that was computed for the training set, using ps2. The following code applies a new set of inputs to a network already trained.

pnewn = mapstd('apply',pnew,ps1);
pnewtrans = processpca('apply',pnewn,ps2);
a = sim(net,pnewtrans);

Principal component analysis is not reliably reversible. Therefore it is only recommended for input processing. Outputs require reversible processing functions.
在这里插入图片描述