Merge pull request #14 from openproblems-bio/bugfix/issue-13/adjust-waypoint-distances

Adjust waypoint distances
lazappi authored Dec 19, 2024
2 parents df707e4 + 1ecb467 commit 575355a
Showing 8 changed files with 50 additions and 74 deletions.
4 changes: 3 additions & 1 deletion .gitignore
@@ -7,4 +7,6 @@ target
 .DS_Store
 output
 trace-*
-.ipynb_checkpoints
+.ipynb_checkpoints
+*.h5ad
+temp/
24 changes: 11 additions & 13 deletions CHANGELOG.md
@@ -21,28 +21,26 @@
 ## BUGFIXES -->

-# dimensionality_reduction 0.2.0 2024-12-09
+# dimensionality_reduction 0.2.0 2024-12-19

 ## NEW FUNCTIONALITY

-* Add calculation of distances to/between waypoints and label centroids to dataset pre-processing
-* Add a post-processing component that calculates distances in the embedding space
-* Add to centroid and between label distance correlation scores
+* Define waypoint cells during dataset processing for use by metrics (PR #11, PR #14)
+* Add calculation of distances between waypoints to dataset preprocessing (PR #11, PR #14)
+* Define label centroids in dataset preprocessing and calculate distances between centroids and from waypoints to centroids (PR #11, PR #14)
+* Add a post-processing component that calculates distances in the embedding space (PR #11, PR #14)
+* Add to centroid and between label distance correlation scores (PR #11, PR #14)

 ## MAJOR CHANGES

-* Modify co-ranking metrics to use pre-computed distances
-* Modify distance correlation metrics to use pre-computed distances
-* Move spectral distance correlation to a separate component
-* Disable the trustworthiness metric as it is calculated as part of the co-ranking metrics
+* Modify co-ranking metrics to use pre-computed distances (PR #11)
+* Modify distance correlation metrics to use pre-computed distances (PR #11)
+* Move spectral distance correlation to a separate component (PR #11)
+* Disable the trustworthiness metric as it is calculated as part of the co-ranking metrics (PR #11)

 ## DOCUMENTATION

-* Update documentation for distance correlation metrics
-
-## MINOR CHANGES
-
-* Speed up calculating distance matrices in the co-ranking metrics (PR #4)
+* Update documentation for distance correlation metrics (PR #11, PR #14)

 # dimensionality_reduction 0.1.3 2024-10-09
11 changes: 3 additions & 8 deletions src/api/file_processed_embedding.yaml
@@ -12,14 +12,6 @@ info:
         name: X_emb
         description: The dimensionally reduced embedding.
         required: true
-      - type: double
-        name: waypoint_distances
-        description: Euclidean distances between all cells and waypoint cells calculated using the embedding.
-        required: true
-      - type: double
-        name: centroid_distances
-        description: Euclidean distances between all cells and label centroids calculated using the embedding.
-        required: true
     uns:
       - type: string
         name: dataset_id
@@ -39,6 +31,9 @@ info:
       - name: label_centroids
         type: double
         description: Centroid positions of each label in the normalized expression space.
+      - name: waypoint_centroid_distances
+        type: double
+        description: Euclidean distances from waypoint cells to label centroids.
       - name: between_centroid_distances
         type: double
         description: Euclidean distances between label centroids.
12 changes: 3 additions & 9 deletions src/api/file_solution.yaml
@@ -28,15 +28,6 @@ info:
         name: hvg_score
         description: High variability gene score (normalized dispersion). The greater, the more variable.
         required: true
-    obsm:
-      - type: double
-        name: waypoint_distances
-        description: Euclidean distances between all cells and waypoint cells calculated using normalized data.
-        required: true
-      - type: double
-        name: centroid_distances
-        description: Euclidean distances between all cells and label centroids calculated using normalized data.
-        required: true
     uns:
       - type: string
         name: dataset_id
@@ -76,6 +67,9 @@ info:
       - name: label_centroids
         type: double
         description: Centroid positions of each label in the normalized expression space.
+      - name: waypoint_centroid_distances
+        type: double
+        description: Euclidean distances from waypoint cells to label centroids.
       - name: between_centroid_distances
         type: double
         description: Euclidean distances between label centroids.
29 changes: 12 additions & 17 deletions src/data_processors/process_dataset/script.py
@@ -31,26 +31,21 @@
 print(adata, flush=True)

 print("\n>>> Selecting waypoint cells...", flush=True)
-if adata.n_obs < 10000:
+n_waypoints = 50000
+if adata.n_obs <= n_waypoints:
+    print(f"Using all {adata.n_obs} cells as waypoints", flush=True)
     waypoint_cells = adata.obs_names
 else:
+    print(f"Using {n_waypoints} random cells as waypoints", flush=True)
     np.random.seed(0) # Try to get the same cells each time
-    waypoint_cells = np.random.choice(adata.obs_names, 10000, replace=False)
+    waypoint_cells = np.random.choice(adata.obs_names, n_waypoints, replace=False)
 adata.obs["is_waypoint"] = adata.obs_names.isin(waypoint_cells)
-
-print("\n>>> Calculating distances to waypoints...", flush=True)
-adata.obsm["waypoint_distances"] = pairwise_distances(
-    adata.layers["normalized"],
-    adata.layers["normalized"][adata.obs["is_waypoint"].values, :],
-    metric="euclidean",
-    n_jobs=-2,
-)
-np.fill_diagonal(adata.obsm["waypoint_distances"], 0)
+is_waypoint = adata.obs["is_waypoint"].values

 print("\n>>> Calculating distances between waypoints...", flush=True)
 adata.uns["between_waypoint_distances"] = pairwise_distances(
-    adata.layers["normalized"][adata.obs["is_waypoint"].values, :],
-    adata.layers["normalized"][adata.obs["is_waypoint"].values, :],
+    adata.layers["normalized"][is_waypoint, :],
+    adata.layers["normalized"][is_waypoint, :],
     metric="euclidean",
     n_jobs=-2,
 )
@@ -66,11 +61,11 @@

 adata.uns["label_centroids"] = centroids

-print("\n>>> Calculating distances to centroids...", flush=True)
-adata.obsm["centroid_distances"] = pairwise_distances(
-    adata.layers["normalized"], centroids, metric="euclidean", n_jobs=-2
+print("\n>>> Calculating distances from waypoints to centroids...", flush=True)
+adata.uns["waypoint_centroid_distances"] = pairwise_distances(
+    adata.layers["normalized"][is_waypoint, :], centroids, metric="euclidean", n_jobs=-2
 )
-np.fill_diagonal(adata.obsm["centroid_distances"], 0)
+np.fill_diagonal(adata.uns["waypoint_centroid_distances"], 0)

 print("\n>>> Calculating distances between centroids...", flush=True)
 adata.uns["between_centroid_distances"] = pairwise_distances(
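The waypoint selection and between-waypoint distances in this hunk can be sketched outside AnnData. This is a minimal illustration under stated assumptions, not the script itself: `select_waypoints` and `euclidean_distances` are hypothetical helpers, the toy matrix stands in for `adata.layers["normalized"]`, plain NumPy broadcasting is swapped in for scikit-learn's `pairwise_distances`, and a `numpy.random.Generator` replaces the global `np.random.seed(0)` call, so the sampled cells will not match the ones the script picks.

```python
import numpy as np


def select_waypoints(obs_names, n_waypoints=50000, seed=0):
    """Return a boolean mask marking which cells are waypoints (hypothetical helper)."""
    obs_names = np.asarray(obs_names)
    if obs_names.size <= n_waypoints:
        # Small dataset: every cell is used as a waypoint
        return np.ones(obs_names.size, dtype=bool)
    # Larger dataset: sample a fixed number of cells without replacement
    rng = np.random.default_rng(seed)
    chosen = rng.choice(obs_names, n_waypoints, replace=False)
    return np.isin(obs_names, chosen)


def euclidean_distances(a, b):
    """Dense Euclidean distances between the rows of a and the rows of b."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff**2).sum(axis=-1))


rng = np.random.default_rng(1)
normalized = rng.normal(size=(20, 5))  # toy stand-in for adata.layers["normalized"]
obs_names = np.array([f"cell_{i}" for i in range(20)])

is_waypoint = select_waypoints(obs_names, n_waypoints=8)
between_waypoint = euclidean_distances(normalized[is_waypoint], normalized[is_waypoint])
print(is_waypoint.sum(), between_waypoint.shape)
```

With 20 cells and `n_waypoints=8`, the result is an 8 by 8 matrix. It no longer has one row per cell, which is consistent with the diff storing it in `uns` rather than `obsm`.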
22 changes: 7 additions & 15 deletions src/data_processors/process_embedding/script.py
@@ -21,20 +21,12 @@
 # Make sure cells have the same order
 adata = adata[solution.obs_names, :].copy()
 print(adata, flush=True)
-
-print("\n>>> Calculating distances to waypoints...", flush=True)
-adata.obsm["waypoint_distances"] = pairwise_distances(
-    adata.obsm["X_emb"],
-    adata.obsm["X_emb"][solution.obs["is_waypoint"].values, :],
-    metric="euclidean",
-    n_jobs=-2,
-)
-np.fill_diagonal(adata.obsm["waypoint_distances"], 0)
+is_waypoint = solution.obs["is_waypoint"].values

 print("\n>>> Calculating distances between waypoints...", flush=True)
 adata.uns["between_waypoint_distances"] = pairwise_distances(
-    adata.obsm["X_emb"][solution.obs["is_waypoint"].values, :],
-    adata.obsm["X_emb"][solution.obs["is_waypoint"].values, :],
+    adata.obsm["X_emb"][is_waypoint, :],
+    adata.obsm["X_emb"][is_waypoint, :],
     metric="euclidean",
     n_jobs=-2,
 )
@@ -50,11 +42,11 @@

 adata.uns["label_centroids"] = centroids

-print("\n>>> Calculating distances to centroids...", flush=True)
-adata.obsm["centroid_distances"] = pairwise_distances(
-    adata.obsm["X_emb"], centroids, metric="euclidean", n_jobs=-2
+print("\n>>> Calculating distances from waypoints to centroids...", flush=True)
+adata.uns["waypoint_centroid_distances"] = pairwise_distances(
+    adata.obsm["X_emb"][is_waypoint, :], centroids, metric="euclidean", n_jobs=-2
 )
-np.fill_diagonal(adata.obsm["centroid_distances"], 0)
+np.fill_diagonal(adata.uns["waypoint_centroid_distances"], 0)

 print("\n>>> Calculating distances between centroids...", flush=True)
 adata.uns["between_centroid_distances"] = pairwise_distances(
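The shapes of the centroid-related outputs can be illustrated with toy data. The hunks do not show how `centroids` is built, so this sketch assumes each centroid is the per-label mean of the cell coordinates; `label_centroids` and `euclidean_distances` are hypothetical helpers, NumPy broadcasting again stands in for `pairwise_distances`, and the `np.fill_diagonal` step from the diff is left out.

```python
import numpy as np


def label_centroids(x, labels):
    """Mean position of the cells carrying each label (assumed definition)."""
    unique_labels = np.unique(labels)
    centroids = np.vstack([x[labels == lab].mean(axis=0) for lab in unique_labels])
    return unique_labels, centroids


def euclidean_distances(a, b):
    """Dense Euclidean distances between the rows of a and the rows of b."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff**2).sum(axis=-1))


rng = np.random.default_rng(0)
x_emb = rng.normal(size=(12, 2))        # toy stand-in for adata.obsm["X_emb"]
labels = np.array(["a", "b", "c"] * 4)  # toy cell labels
is_waypoint = np.arange(12) % 2 == 0    # toy waypoint mask with 6 waypoints

unique_labels, centroids = label_centroids(x_emb, labels)
waypoint_centroid = euclidean_distances(x_emb[is_waypoint], centroids)
between_centroid = euclidean_distances(centroids, centroids)
print(waypoint_centroid.shape, between_centroid.shape)
```

The waypoint-to-centroid matrix has one row per waypoint and one column per label, so it is generally not square, while the between-centroid matrix is square with a zero diagonal.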
10 changes: 5 additions & 5 deletions src/metrics/distance_correlation/config.vsh.yaml
@@ -7,10 +7,10 @@ info:
   metrics:
     - name: waypoint_distance_correlation
       label: Waypoint Distance Correlation
-      summary: "Calculates the distance correlation by computing Spearman correlations between distances to waypoint cells."
+      summary: "Calculates the distance correlation by computing Spearman correlations between distances between waypoint cells."
       description: |
         Calculates the distance correlation by computing Spearman correlations
-        between distances to waypoint cells on the full (or processed) data
+        between distances between waypoint cells on the full (or processed) data
         matrix and the dimensionally-reduced matrix. Also known as the
         cellstruct global single-cell (GS) score when using Pearson correlation.
       references:
@@ -25,9 +25,9 @@ info:
       summary: "Calculates the distance correlation by computing Spearman correlations between distances to label centroids."
       description: |
         Calculates the distance correlation by computing Spearman correlations
-        between distances to label centroids on the full (or processed) data
-        matrix and the dimensionally-reduced matrix. Also known as Point-Cluster
-        Distance (PCD) correlation.
+        between distances from waypoint cells to label centroids on the full
+        (or processed) data matrix and the dimensionally-reduced matrix. Also
+        known as Point-Cluster Distance (PCD) correlation.
       references:
         doi:
           - 10.1038/s41467-023-37478-w
12 changes: 6 additions & 6 deletions src/metrics/distance_correlation/script.py
@@ -22,15 +22,15 @@
 embedding = ad.read_h5ad(par["input_embedding"])
 print(embedding, flush=True)

-print("\n>>> Calculating waypoint distance correlation..", flush=True)
-high_dists = solution.obsm["waypoint_distances"]
-emb_dists = embedding.obsm["waypoint_distances"]
+print("\n>>> Calculating between waypoint distance correlation..", flush=True)
+high_dists = solution.uns["between_waypoint_distances"]
+emb_dists = embedding.uns["between_waypoint_distances"]
 waypoint_corr = scipy.stats.spearmanr(high_dists, emb_dists, axis=None).correlation
 print(f"Waypoint distance correlation: {waypoint_corr}", flush=True)

-print("\n>>> Calculating centroid distance correlation..", flush=True)
-high_dists = solution.obsm["centroid_distances"]
-emb_dists = embedding.obsm["centroid_distances"]
+print("\n>>> Calculating waypoint-centroid distance correlation..", flush=True)
+high_dists = solution.uns["waypoint_centroid_distances"]
+emb_dists = embedding.uns["waypoint_centroid_distances"]
 centroid_corr = scipy.stats.spearmanr(high_dists, emb_dists, axis=None).correlation
 print(f"Centroid distance correlation: {centroid_corr}", flush=True)
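The metric relies on `scipy.stats.spearmanr` with `axis=None`, which ravels both distance matrices and returns one rank correlation over all entries. A small sketch with made-up matrices (the toy arrays are illustrative assumptions, not real task data) shows the behaviour:

```python
import numpy as np
import scipy.stats

rng = np.random.default_rng(0)
high_dists = rng.random((6, 6))   # toy distances in the high-dimensional space
emb_dists = high_dists**2         # monotone transform: same ranks, different values
noisy_dists = rng.random((6, 6))  # unrelated toy distances

# axis=None flattens both matrices before ranking, giving a single coefficient
same_ranks = scipy.stats.spearmanr(high_dists, emb_dists, axis=None).correlation
unrelated = scipy.stats.spearmanr(high_dists, noisy_dists, axis=None).correlation
print(same_ranks, unrelated)
```

Because Spearman correlation depends only on ranks, an embedding that rescales distances monotonically still scores close to 1, whereas unrelated distances score strictly between -1 and 1.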
