pca_features

tangles.separations.finding.pca_features(M: ndarray, k: int | None = None, use_J: bool = False) → ndarray | tuple[ndarray, ndarray]

Generate features using a method inspired by Principal Component Analysis (PCA).

In principal component analysis we identify principal components: orthogonal vectors which describe the directions of highest variance in the data set.

If we interpret these orthogonal vectors as a coordinate system, we can assign to every data point a score for each of the orthogonal vectors, namely the coordinate of the data point with respect to that axis. The feature for a particular component is then the set of points which have a positive score with respect to that vector.
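The idea described above can be sketched in plain NumPy. This is only an illustration, not the library's implementation: the helper name `pca_sign_features`, the centering step, and the encoding of a feature as a column of +1/-1 entries are all assumptions made for the example.

```python
import numpy as np

def pca_sign_features(M, k=None):
    """Hypothetical helper: sign-of-score features, one column per component."""
    # Assumption: center each column so the axes describe variance around the mean.
    X = M - M.mean(axis=0)
    # Rows of Vt are the principal axes, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    # Score of every data point on every principal axis.
    scores = X @ Vt.T
    # A point lies in the feature of an axis if its score on that axis is positive.
    features = np.where(scores > 0, 1, -1)
    # Python slicing: k=None keeps every column, a negative k drops the last |k|.
    return features[:, :k]

rng = np.random.default_rng(0)
M = rng.normal(size=(6, 3))           # 6 measurements, 3 dimensions
features = pca_sign_features(M, k=2)
print(features.shape)                 # (6, 2)
```

Each column splits the data points into those on the positive side of one principal axis and the rest.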

Parameters

M (np.ndarray): A matrix of shape (\(n\), \(p\)), where \(n\) is the number of measurements and \(p\) is the number of dimensions of each measurement.
k (int, optional): The number of eigenvectors to return. The eigenvectors with the lowest eigenvalues (i.e. those of greatest magnitude) are returned first. Defaults to None, in which case every eigenvector is returned. Accepts negative values, where \(-k\) leads to the exclusion of the last \(k\) eigenvectors.
use_J (bool): Whether to use a shortcut that calculates these features directly from the eigendecomposition of \(J = -MM^T\). Since \(J\) is an \(n \times n\) matrix, this makes sense if the dimension \(p\) is large or the number of data points \(n\) is small. Defaults to False.
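To see why a shortcut through \(J = -MM^T\) is plausible, the following sketch compares two routes to the same splits of the data points. It is an illustration under assumptions, not the library's code: `M` is used as given (treated as already centered), and only `np.linalg.eigh`, which returns eigenvalues in ascending order, is relied on. With the minus sign, the eigenvalues of lowest value are the ones of greatest magnitude, so the leading components come first.

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.normal(size=(5, 40))              # few data points (n=5), many dimensions (p=40)

# Route 1: eigenvectors of the p x p matrix -M^T M, then project the data onto them.
_, V = np.linalg.eigh(-(M.T @ M))         # ascending order: greatest magnitude first
scores = M @ V[:, :3]                     # scores on the three leading axes, shape (5, 3)

# Route 2: eigenvectors of the much smaller n x n matrix J = -M M^T.
eigvals, U = np.linalg.eigh(-(M @ M.T))
U = U[:, :3]

# Each column of U is the matching score column rescaled to unit length,
# up to an overall sign, so both routes split the data points the same way.
agree = []
for j in range(3):
    a = np.sign(scores[:, j])
    b = np.sign(U[:, j])
    agree.append(bool(np.array_equal(a, b) or np.array_equal(a, -b)))
print(all(agree))
```

Route 2 only ever decomposes a 5 x 5 matrix here, whereas route 1 decomposes a 40 x 40 one, which is the trade-off the `use_J` description refers to.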

Returns

np.ndarray of dtype int: The PCA features.