PCA using the augmented correlation matrix #13

rmflight · 2017-01-27T18:16:42Z

It would be really cool to be able to do a PCA decomposition on the augmented or weighted correlation matrix generated by pairwise_correlations, so that the PCA actually reflects the augmented correlation directly.

There may be a way to do this via eigen and then generating the scores, keeping in mind that PCA on the correlation is already scaled and centered.

Note that I think we would have to set the diagonal to 1 for this to work properly.
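A minimal sketch of the eigen route described above, written in Python with NumPy purely for illustration (the thread itself refers to R's eigen). The function name `pca_from_correlation` and the random example data are hypothetical. Projecting standardized data onto the eigenvectors is one plausible way to generate scores, not a method confirmed in this thread, and it assumes complete data.

```python
import numpy as np


def pca_from_correlation(corr, data=None):
    """Eigen-decompose a feature-by-feature correlation matrix.

    corr: (p, p) symmetric correlation matrix (hypothetical input).
    data: optional (n, p) samples-by-features array without missing values.
    """
    corr = np.array(corr, dtype=float)  # work on a copy
    np.fill_diagonal(corr, 1.0)         # force a unit diagonal, as suggested above

    # eigh is intended for symmetric matrices and returns ascending eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]   # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    scores = None
    if data is not None:
        # Assumption: scores are the standardized data projected onto the eigenvectors.
        z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
        scores = z @ eigvecs
    return eigvals, eigvecs, scores


# Made-up complete data: 20 samples, 4 features.
rng = np.random.default_rng(0)
x = rng.normal(size=(20, 4))
corr = np.corrcoef(x, rowvar=False)
eigvals, eigvecs, scores = pca_from_correlation(corr, x)
```

One caveat: a correlation matrix assembled pair by pair from different subsets of samples is not guaranteed to be positive semidefinite, so some eigenvalues may come out negative.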

Thoughts @hunter-moseley ??


rmflight · 2017-01-27T18:17:57Z

This could be tested by generating a correlation matrix for data with non-missing values, and verifying that the centered / scaled PCA results match those from the correlation matrix.
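The check proposed above might look like the following sketch, again in Python with NumPy as a stand-in for the R workflow. The data are randomly generated and complete. An SVD of the centered and scaled data stands in for "the usual PCA", which is an assumption about how the current PCA is computed.

```python
import numpy as np

rng = np.random.default_rng(42)
# Made-up complete data with correlated features: 50 samples, 5 features.
x = rng.normal(size=(50, 5)) @ rng.normal(size=(5, 5))

# Center and scale each feature (sample standard deviation).
z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)

# Route 1: PCA via SVD of the centered / scaled data.
u, s, vt = np.linalg.svd(z, full_matrices=False)
svd_scores = u * s
svd_variances = s**2 / (x.shape[0] - 1)

# Route 2: PCA via eigen-decomposition of the feature correlation matrix.
corr = np.corrcoef(x, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(corr)
order = np.argsort(eigvals)[::-1]  # largest eigenvalue first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
eig_scores = z @ eigvecs

# The sign of each component is arbitrary, so compare absolute values.
variances_match = np.allclose(svd_variances, eigvals)
scores_match = np.allclose(np.abs(svd_scores), np.abs(eig_scores), atol=1e-6)
```

If both flags are true on complete data, the eigen route reproduces the centered / scaled PCA, and any difference seen with the augmented matrix can be attributed to how missing values are handled.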

rmflight · 2017-01-27T20:12:54Z

Possibly helpful posts:

http://sebastianraschka.com/Articles/2014_pca_step_by_step.html

hunter-moseley · 2017-01-28T02:57:37Z

I think of this from the standpoint of embedding from a distance matrix. The correlation matrix can be viewed as a normalized distance matrix, which is used to embed the rows/columns into a Euclidean space. I am starting to understand the link you sent: the covariance or correlation matrix captures the dependency between variables, and that dependency can be used to collapse the variables into principal components by computing the eigenvectors with the largest eigenvalues.

hunter-moseley · 2017-01-28T03:26:06Z

Just realized that the correlation matrix needs to be between the features and not the samples. If the current PCA we are using is not dropping zeros, then this approach is going to dramatically change the PCA results, since the correlation will be limited to features that co-occur and will not be over-weighted by the zeros.

rmflight · 2017-01-28T03:35:57Z

Yes, you are right, it needs to be between features. Right now, there is no way I know of to drop the zeros. On the log scale they are either zeros (log1p) or the log of 1e-8 or so. So the current PCA is more similar to computing the correlation without dropping the zeros in each sample. I'd have to look at that again to know how it compares to the augmented correlation.
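To illustrate how much shared zeros can weigh on a feature-by-feature correlation, here is a small Python/NumPy sketch. The helper `feature_correlation_dropping_zeros`, its `min_pairs` threshold, and the toy values are all hypothetical. It treats zeros as missing and correlates each pair of features only over samples where both are present, which is only a rough stand-in for whatever pairwise_correlations actually does.

```python
import numpy as np


def feature_correlation_dropping_zeros(data, min_pairs=3):
    """Correlate features using only samples where both features are non-zero.

    data: (n_samples, n_features) array; zeros are treated as missing.
    min_pairs: hypothetical minimum number of shared samples for a pair.
    """
    n_features = data.shape[1]
    corr = np.eye(n_features)  # unit diagonal
    present = data != 0
    for i in range(n_features):
        for j in range(i + 1, n_features):
            both = present[:, i] & present[:, j]
            if both.sum() < min_pairs:
                r = np.nan  # too few shared samples to estimate a correlation
            else:
                r = np.corrcoef(data[both, i], data[both, j])[0, 1]
            corr[i, j] = corr[j, i] = r
    return corr


# Toy data: two features that are uncorrelated where both are present,
# but share three samples in which both are zero.
data = np.array([
    [0.0, 0.0],
    [0.0, 0.0],
    [0.0, 0.0],
    [5.0, 6.0],
    [6.0, 7.0],
    [7.0, 7.0],
    [8.0, 6.0],
])

corr_zeros_kept = np.corrcoef(data, rowvar=False)
corr_zeros_dropped = feature_correlation_dropping_zeros(data)
```

In this toy case the shared zeros alone push the zeros-kept correlation close to 1, while the zeros-dropped correlation is essentially 0, which is the kind of shift in the PCA input discussed above.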


rmflight self-assigned this Jan 27, 2017
