Graph plots #197

Open: wants to merge 29 commits into base: master
Changes from 20 commits

Commits (29)
622d7b8
Fix param name typo in function docstring
kamurani Jul 2, 2022
8a3f3a4
add scaling node size by "rsa" feature as well as degree
kamurani Jul 2, 2022
9c9520b
add option for scaling node size by meiler embedding dimensions. Tak…
kamurani Jul 2, 2022
44a0bf8
remove walrus operator := for compatability
kamurani Jul 6, 2022
920e14e
Merge pull request #1 from a-r-j/master
kamurani Jul 7, 2022
efc69bb
Merge pull request #2 from a-r-j/master
kamurani Jul 8, 2022
a2806b6
Add type hints
a-r-j Jul 8, 2022
024e7a0
Update changelog
a-r-j Jul 8, 2022
7569751
add support for sizing nodes by RSA and colouring by hydrophobicity i…
kamurani Jul 22, 2022
66baf18
Merge remote-tracking branch 'origin/master' into graph_plots
kamurani Jul 22, 2022
3513e0c
Merge branch 'master' into graph_plots
a-r-j Jul 22, 2022
21d496a
add amino acid 3-letter code mapping to hydrophobicity scales from th…
kamurani Jul 29, 2022
2f4f0ee
add amino acid 3-letter code mapping to hydrophobicity scales from th…
kamurani Jul 29, 2022
fbac7ea
colour by hydrophobicity implemented for different scales
kamurani Jul 29, 2022
5878e4f
refactor `_node_feature` function; colour_by and size_by msupported w…
kamurani Jul 29, 2022
8948b1f
fix import statement to use graphein actual
kamurani Jul 29, 2022
1212d5e
add `hydrophobicity()`. not sure if should be passed a parameter dec…
kamurani Jul 29, 2022
6b29498
"Add a utility for getting the names of node, edge and graph attribut…
a-r-j Aug 1, 2022
00f99a5
fix edge attribute selection in util
a-r-j Aug 1, 2022
28b0ee1
add test for attribute name selection util
a-r-j Aug 1, 2022
8f22fed
use typing_extensions Literal for 3.7 support and update changelog
a-r-j Aug 2, 2022
2b69d6e
docstring, black
a-r-j Oct 23, 2022
9743332
black
a-r-j Oct 23, 2022
43bc240
fix type import; black
a-r-j Oct 23, 2022
c5e6d83
Merge branch 'master' into graph_plots
a-r-j Oct 31, 2022
f43f5e7
Merge branch 'master' into graph_plots
a-r-j Feb 16, 2023
fd1f996
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Feb 16, 2023
37aaee8
Merge branch 'master' into graph_plots
a-r-j Mar 25, 2024
34c816b
Merge branch 'master' into graph_plots
a-r-j Oct 20, 2024
3 changes: 3 additions & 0 deletions CHANGELOG.md
@@ -1,4 +1,6 @@
### 1.5.1
* [Feature] - [#197](https://github.com/a-r-j/graphein/pull/197/) adds support for sizing and colouring nodes in asteroid plots


#### Protein

@@ -8,6 +10,7 @@
* [Feature] - [#189](https://github.com/a-r-j/graphein/pull/189/) adds a `residue_id` column to PDB dfs to enable easier accounting in atom graphs.
* [Feature] - [#189](https://github.com/a-r-j/graphein/pull/189/) refactors torch geometric datasets to use parallelised download for faster dataset preparation.


#### Bugfixes

* [Patch] - [#187](https://github.com/a-r-j/graphein/pull/187) updates sequence retrieval due to UniProt API changes.
42 changes: 41 additions & 1 deletion graphein/protein/features/nodes/amino_acid.py
@@ -8,7 +8,7 @@
import logging
from functools import lru_cache
from pathlib import Path
-from typing import Any, Dict, List, Optional, Union
+from typing import Any, Dict, List, Optional, Union, Literal

import numpy as np
import pandas as pd
@@ -18,6 +18,8 @@
    HYDROGEN_BOND_ACCEPTORS,
    HYDROGEN_BOND_DONORS,
    RESI_THREE_TO_1,
    HYDROPHOBICITY_SCALES,
    HYDROPHOBICITY_TYPE,
)
from graphein.utils.utils import onek_encoding_unk

@@ -248,3 +250,41 @@ def hydrogen_bond_acceptor(
    if not sum_features:
        features = np.array(features > 0).astype(int)
    d["hbond_acceptors"] = features


def hydrophobicity(
    n: str,
    d: Dict[str, Any],
    mapping: HYDROPHOBICITY_TYPE = "kd",
    return_array: bool = True,
) -> Union[np.ndarray, pd.Series]:
    """
    Adds hydrophobicity values for each residue to graph nodes.
    See :const:`~graphein.protein.resi_atoms.HYDROPHOBICITY_SCALES` for
    values and available scales.

    :param n: Node ID. Unused - kept to maintain consistent function signature.
    :type n: str
    :param d: Dictionary of node attributes.
    :type d: Dict[str, Any]
    :param mapping: Which hydrophobicity scale to use. See
        :const:`~graphein.protein.resi_atoms.HYDROPHOBICITY_TYPE` for supported types.
    :type mapping: graphein.protein.resi_atoms.HYDROPHOBICITY_TYPE
    :param return_array: If ``True``, returns a ``np.ndarray``, otherwise returns
        a ``pd.Series``. Default is ``True``.
    :type return_array: bool
    """
    assert mapping in HYDROPHOBICITY_SCALES.keys(), (
        f"Unsupported mapping: {mapping}. "
        f"Supported mappings: {HYDROPHOBICITY_SCALES.keys()}"
    )
    hydr = HYDROPHOBICITY_SCALES[mapping]

    amino_acid = d["residue_name"]
    try:
        features = hydr[amino_acid]
    except KeyError:
        # Residue is not in the selected scale: fall back to zero.
        features = pd.Series(np.zeros(1))

    if return_array:
        features = np.array(features)

    d[f"hydrophobicity_{mapping}"] = features
    return features
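
The lookup performed by `hydrophobicity()` can be illustrated without installing graphein. The sketch below is not the PR's implementation: `ScaleName`, `SCALES` and `add_hydrophobicity` are hypothetical stand-ins, only three residues of two scales are copied from the diff, and it raises `ValueError` (validated with `typing.get_args`) where the diff uses an `assert`. It assumes Python 3.8+ for `typing.Literal`.

```python
from typing import Any, Dict, Literal, get_args

# Hypothetical stand-ins for HYDROPHOBICITY_TYPE and HYDROPHOBICITY_SCALES;
# only three residues of the "kd" and "hh" scales are copied from the diff.
ScaleName = Literal["kd", "hh"]
SCALES: Dict[str, Dict[str, float]] = {
    "kd": {"ILE": 4.5, "GLY": -0.4, "ARG": -4.5},
    "hh": {"ILE": -0.60, "GLY": 0.74, "ARG": 2.58},
}


def add_hydrophobicity(d: Dict[str, Any], mapping: ScaleName = "kd") -> float:
    """Look up the residue named in ``d`` and store its value on the node dict."""
    # get_args() returns the allowed literals, so the check follows the type.
    if mapping not in get_args(ScaleName):
        raise ValueError(f"Unsupported mapping: {mapping!r}")
    # Residues missing from the scale fall back to 0.0, similar in spirit
    # to the zero fallback in the diff.
    value = SCALES[mapping].get(d["residue_name"], 0.0)
    d[f"hydrophobicity_{mapping}"] = value
    return value


node = {"residue_name": "ILE"}
add_hydrophobicity(node, "kd")
add_hydrophobicity(node, "hh")
unknown = add_hydrophobicity({"residue_name": "XXX"})
```

Each call writes a separate `hydrophobicity_<scale>` key, so several scales can coexist on one node, which is what lets a plot colour by one scale and size by another.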
147 changes: 146 additions & 1 deletion graphein/protein/resi_atoms.py
@@ -14,7 +14,7 @@
# Code Repository: https://github.com/a-r-j/graphein


-from typing import Dict, List
+from typing import Dict, List, Literal

import numpy as np
from sklearn.preprocessing import StandardScaler
@@ -836,6 +836,151 @@
https://pubs.acs.org/doi/10.1021/j100785a001
"""

HYDROPHOBICITY_TYPE = Literal["kd", "ww", "hh", "mf", "tt"]
"""Supported hydrophobicity types. See :const:`~graphein.protein.resi_atoms.HYDROPHOBICITY_SCALES` for further details."""

HYDROPHOBICITY_SCALES: Dict[str, Dict[str, float]] = {
"kd": { # kdHydrophobicity (a)
"ILE": 4.5,
"VAL": 4.2,
"LEU": 3.8,
"PHE": 2.8,
"CYS": 2.5,
"MET": 1.9,
"ALA": 1.8,
"GLY": -0.4,
"THR": -0.7,
"SER": -0.8,
"TRP": -0.9,
"TYR": -1.3,
"PRO": -1.6,
"HIS": -3.2,
"GLU": -3.5,
"GLN": -3.5,
"ASP": -3.5,
"ASN": -3.5,
"LYS": -3.9,
"ARG": -4.5,
},
"ww": { # wwHydrophobicity (b)
"ILE": 0.31,
"VAL": -0.07,
"LEU": 0.56,
"PHE": 1.13,
"CYS": 0.24,
"MET": 0.23,
"ALA": -0.17,
"GLY": -0.01,
"THR": -0.14,
"SER": -0.13,
"TRP": 1.85,
"TYR": 0.94,
"PRO": -0.45,
"HIS": -0.96,
"GLU": -2.02,
"GLN": -0.58,
"ASP": -1.23,
"ASN": -0.42,
"LYS": -0.99,
"ARG": -0.81,
},
"hh": { # hhHydrophobicity (c)
"ILE": -0.60,
"VAL": -0.31,
"LEU": -0.55,
"PHE": -0.32,
"CYS": -0.13,
"MET": -0.10,
"ALA": 0.11,
"GLY": 0.74,
"THR": 0.52,
"SER": 0.84,
"TRP": 0.30,
"TYR": 0.68,
"PRO": 2.23,
"HIS": 2.06,
"GLU": 2.68,
"GLN": 2.36,
"ASP": 3.49,
"ASN": 2.05,
"LYS": 2.71,
"ARG": 2.58,
},
"mf": { # mfHydrophobicity (d)
"ILE": -1.56,
"VAL": -0.78,
"LEU": -1.81,
"PHE": -2.20,
"CYS": 0.49,
"MET": -0.76,
"ALA": 0.0,
"GLY": 1.72,
"THR": 1.78,
"SER": 1.83,
"TRP": -0.38,
"TYR": -1.09,
"PRO": -1.52,
"HIS": 4.76,
"GLU": 1.64,
"GLN": 3.01,
"ASP": 2.95,
"ASN": 3.47,
"LYS": 5.39,
"ARG": 3.71,
},
"tt": { # ttHydrophobicity (e)
"ILE": 1.97,
"VAL": 1.46,
"LEU": 1.82,
"PHE": 1.98,
"CYS": -0.30,
"MET": 1.40,
"ALA": 0.38,
"GLY": -0.19,
"THR": -0.32,
"SER": -0.53,
"TRP": 1.53,
"TYR": 0.49,
"PRO": -1.44,
"HIS": -1.44,
"GLU": -2.90,
"GLN": -1.84,
"ASP": -3.27,
"ASN": -1.62,
"LYS": -3.46,
"ARG": -2.57,
}
}
"""
Five dictionaries, one per scale, that map three-letter amino acid codes to hydrophobicity values.

The scales included are from Chimera (UCSF) https://www.cgl.ucsf.edu/chimera/docs/UsersGuide/midas/hydrophob.html
and are as follows:

* kdHydrophobicity
(a) A simple method for displaying the hydropathic character of a protein. Kyte J, Doolittle RF. J Mol Biol. 1982 May 5;157(1):105-32.
https://www.ncbi.nlm.nih.gov/pubmed/7108955

* wwHydrophobicity
(b) Experimentally determined hydrophobicity scale for proteins at membrane interfaces. Wimley WC, White SH. Nat Struct Biol. 1996 Oct;3(10):842-8.
https://www.ncbi.nlm.nih.gov/pubmed/8836100

* hhHydrophobicity
(c) Recognition of transmembrane helices by the endoplasmic reticulum translocon. Hessa T, Kim H, Bihlmaier K, Lundin C, Boekel J, Andersson H, Nilsson I, White SH, von Heijne G. Nature. 2005 Jan 27;433(7024):377-81, supplementary data.
https://www.ncbi.nlm.nih.gov/pubmed/15674282

In this scale, negative values indicate greater hydrophobicity.

* mfHydrophobicity
(d) Side-chain hydrophobicity scale derived from transmembrane protein folding into lipid bilayers. Moon CP, Fleming KG. Proc Natl Acad Sci USA. 2011 Jun 21;108(25):10174-7, supplementary data.
https://www.ncbi.nlm.nih.gov/pubmed/21606332

In this scale, negative values indicate greater hydrophobicity.

* ttHydrophobicity
(e) An amino acid “transmembrane tendency” scale that approaches the theoretical limit to accuracy for prediction of transmembrane helices: relationship to biological hydrophobicity. Zhao G, London E. Protein Sci. 2006 Aug;15(8):1987-2001.
https://www.ncbi.nlm.nih.gov/pubmed/16877712
"""

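The docstring above notes that the `hh` and `mf` scales use the opposite sign convention, which matters when one colour map is reused across scales. The sketch below shows one way to put the scales on a common orientation. The helper names and the `NEGATIVE_IS_HYDROPHOBIC` set are hypothetical, the sample values are three residues copied from the diff, and treating `kd` as "positive means hydrophobic" is an assumption based on its values rather than something the docstring states.

```python
from typing import Dict

# Scales for which the docstring says negative values indicate greater
# hydrophobicity. All other scales are assumed to be positive-is-hydrophobic.
NEGATIVE_IS_HYDROPHOBIC = {"hh", "mf"}

# Values for three residues copied from the dictionaries in the diff.
SAMPLE: Dict[str, Dict[str, float]] = {
    "kd": {"ILE": 4.5, "GLY": -0.4, "ARG": -4.5},
    "hh": {"ILE": -0.60, "GLY": 0.74, "ARG": 2.58},
    "mf": {"ILE": -1.56, "GLY": 1.72, "ARG": 3.71},
}


def oriented(scale_name: str, scales: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Return the scale with signs flipped so that larger means more hydrophobic."""
    sign = -1.0 if scale_name in NEGATIVE_IS_HYDROPHOBIC else 1.0
    return {residue: sign * value for residue, value in scales[scale_name].items()}


def most_hydrophobic(scale_name: str, scales: Dict[str, Dict[str, float]]) -> str:
    """Return the residue with the highest value after orientation."""
    values = oriented(scale_name, scales)
    return max(values, key=values.get)


ranking = {name: most_hydrophobic(name, SAMPLE) for name in SAMPLE}
```

With the orientation applied, isoleucine ranks as the most hydrophobic of the three sample residues on every scale, whereas reading the raw `hh` or `mf` values would rank it last.
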
ISOELECTRIC_POINTS: Dict[str, float] = {
"ALA": 6.11,