PCA using the augmented correlation matrix #13
Comments
This could be tested by generating a correlation matrix from data with no missing values and verifying that the centered and scaled PCA results match those obtained from the correlation matrix.
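For anyone who wants to try this check, here is a minimal sketch in plain Python. It is not the package's code: the data matrix `X` is made up and every function name is hypothetical. It computes the first principal component two ways, once by iterating on the standardized data and once by power iteration on the correlation matrix alone, and the two results should agree on complete data.

```python
import math

def standardize(X):
    """Center each column and scale it to unit sample standard deviation."""
    n, p = len(X), len(X[0])
    Z = [[0.0] * p for _ in range(n)]
    for j in range(p):
        col = [row[j] for row in X]
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
        for i in range(n):
            Z[i][j] = (X[i][j] - mean) / sd
    return Z

def correlation(Z):
    """Feature-by-feature correlation matrix from standardized data."""
    n, p = len(Z), len(Z[0])
    return [[sum(Z[i][a] * Z[i][b] for i in range(n)) / (n - 1)
             for b in range(p)] for a in range(p)]

def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def first_pc_from_data(Z, iters=500):
    """First loading vector and score variance, iterating on the data itself."""
    n, p = len(Z), len(Z[0])
    load = normalize([1.0] * p)
    for _ in range(iters):
        scores = [sum(Z[i][j] * load[j] for j in range(p)) for i in range(n)]
        load = normalize([sum(Z[i][j] * scores[i] for i in range(n))
                          for j in range(p)])
    scores = [sum(Z[i][j] * load[j] for j in range(p)) for i in range(n)]
    return load, sum(s * s for s in scores) / (n - 1)

def first_pc_from_correlation(R, iters=500):
    """First eigenvector and eigenvalue, using only the correlation matrix."""
    p = len(R)
    vec = normalize([1.0] * p)
    for _ in range(iters):
        vec = normalize([sum(R[a][b] * vec[b] for b in range(p))
                         for a in range(p)])
    value = sum(vec[a] * R[a][b] * vec[b] for a in range(p) for b in range(p))
    return vec, value

# Made-up complete data: 5 samples (rows) by 3 features (columns).
X = [[2.0, 1.0, 4.0],
     [3.0, 3.0, 5.0],
     [5.0, 4.0, 5.5],
     [6.0, 7.0, 8.0],
     [9.0, 8.0, 7.0]]

Z = standardize(X)
R = correlation(Z)
loadings_from_data, variance_from_data = first_pc_from_data(Z)
loadings_from_corr, eigenvalue_from_corr = first_pc_from_correlation(R)

print("loadings (data):       ", loadings_from_data)
print("loadings (correlation):", loadings_from_corr)
print("score variance vs eigenvalue:", variance_from_data, eigenvalue_from_corr)
```

The variance of the first-component scores should equal the leading eigenvalue of the correlation matrix, and the loadings should match up to sign. Only the first component is compared here to keep the sketch short.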
Possibly helpful posts:
I think of this from the standpoint of embedding from a distance matrix. The correlation matrix can be viewed as a normalized distance matrix, which is used to embed the rows/columns into a Euclidean space. I'm starting to understand the link you sent: the covariance or correlation matrix shows the dependency between variables, and that dependency can be used to collapse the variables into principal components by computing the eigenvectors with the largest eigenvalues.
Just realized that the correlation matrix needs to be between the features and not the samples. If the current PCA we are using is not dropping zeros, then this approach is going to dramatically change the PCA results, since the correlation will be limited to features that co-occur and will not be over-weighted by the zeros.
Yes, you are right, it needs to be between features.
Right now, there is no way I know of to drop the zeros. On the log scale they are either zeros (log1p) or the log of 1e-8 or so.
So the current PCA is more similar to doing correlation without dropping zeros in the sample. I'd have to look at that again to know how it compares to the augmented correlation.
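To make the effect of the zeros concrete, here is a small sketch with two made-up features. The helper `pearson_nonzero` is a hypothetical stand-in, not `pairwise_correlations`, and it assumes that "dropping zeros" means keeping only the samples where both features are nonzero.

```python
import math

def pearson(x, y):
    """Plain Pearson correlation over all samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def pearson_nonzero(x, y):
    """Pearson correlation over samples where both features are nonzero.

    Assumption: a zero means "not observed" and the pair is dropped.
    """
    pairs = [(a, b) for a, b in zip(x, y) if a != 0 and b != 0]
    xs, ys = zip(*pairs)
    return pearson(xs, ys)

# Made-up features: absent together in two samples, and perfectly
# anti-correlated in the four samples where both are present.
feature_a = [0.0, 0.0, 5.0, 6.0, 7.0, 8.0]
feature_b = [0.0, 0.0, 8.0, 7.0, 6.0, 5.0]

with_zeros = pearson(feature_a, feature_b)
without_zeros = pearson_nonzero(feature_a, feature_b)

print("correlation keeping zeros: ", with_zeros)
print("correlation dropping zeros:", without_zeros)
```

With the zeros kept, the shared absences dominate and the correlation comes out strongly positive (about 0.84). With them dropped it is -1. A PCA built on one matrix versus the other could therefore look quite different.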
It would be really cool to be able to do a PCA decomposition on the augmented or weighted correlation matrix generated by `pairwise_correlations`, so that the PCA actually reflects the augmented correlation directly. There may be a way to do this via `eigen` and then generating the scores, keeping in mind that PCA on the correlation is already scaled and centered. Note that I think we would have to set the diagonal to 1 for this to work properly.
Thoughts @hunter-moseley ??
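A rough sketch of the idea in plain Python, under several assumptions: the `augmented` matrix and the data `X` are made-up values, not output from `pairwise_correlations`; `jacobi_eigen` is a hypothetical helper standing in for `eigen`; and how missing entries should be handled when standardizing the data is left open. Note also that a correlation matrix assembled pair by pair is not guaranteed to be positive semi-definite, so the eigenvalues are reported as-is.

```python
import math

def jacobi_eigen(matrix, max_sweeps=100, tol=1e-22):
    """Eigenvalues and eigenvectors of a symmetric matrix via Jacobi rotations."""
    n = len(matrix)
    A = [row[:] for row in matrix]
    V = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if A[p][q] == 0.0:
                    continue
                diff = A[q][q] - A[p][p]
                theta = (math.pi / 4 if diff == 0.0
                         else 0.5 * math.atan(2 * A[p][q] / diff))
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # rotate columns p and q
                    akp, akq = A[k][p], A[k][q]
                    A[k][p], A[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = A[p][k], A[q][k]
                    A[p][k], A[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # accumulate the eigenvectors
                    vkp, vkq = V[k][p], V[k][q]
                    V[k][p], V[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    order = sorted(range(n), key=lambda i: A[i][i], reverse=True)
    values = [A[i][i] for i in order]
    vectors = [[V[k][i] for k in range(n)] for i in order]  # one list per eigenvector
    return values, vectors

def standardize(X):
    """Center and scale each feature (column); assumes complete data."""
    n, p = len(X), len(X[0])
    Z = [[0.0] * p for _ in range(n)]
    for j in range(p):
        col = [row[j] for row in X]
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
        for i in range(n):
            Z[i][j] = (X[i][j] - mean) / sd
    return Z

# Made-up feature-by-feature matrix; the diagonal is deliberately not 1.
augmented = [[0.90, 0.80, 0.30],
             [0.80, 0.95, 0.50],
             [0.30, 0.50, 0.85]]

# Set the diagonal to 1 before decomposing.
corr = [row[:] for row in augmented]
for j in range(len(corr)):
    corr[j][j] = 1.0

eigenvalues, eigenvectors = jacobi_eigen(corr)

# Made-up data: 4 samples (rows) by 3 features (columns).
X = [[1.0, 2.0, 0.5],
     [2.0, 2.5, 1.5],
     [3.0, 4.0, 1.0],
     [4.0, 4.5, 3.0]]
Z = standardize(X)

# Scores: project each standardized sample onto each eigenvector.
scores = [[sum(z[j] * vec[j] for j in range(len(vec))) for vec in eigenvectors]
          for z in Z]

print("eigenvalues:", eigenvalues)
print("scores for first sample:", scores[0])
```

With the diagonal set to 1, the eigenvalues sum to the number of features, which is a quick sanity check that the decomposition behaves like a correlation-based PCA.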