This repository hosts the code and resources for predicting histone mark age in human tissues and cells. Our preprint detailing the methodology and results can be found here.
- `analysis/`: Analysis notebook files for the manuscript's four main sections [for peer-review].
- `metadata/`: Metadata required to run the scripts.
- `results/`: Tables with the data and statistics for the plots in Figure 2.
- `results/models/`: Feature selection, dimensionality reduction, and ARD regressor for each histone mark age predictor and the pan histone age predictor.
- `scripts/`: Scripts to reproduce the paper's results.
- `tutorial/`: Tutorial notebook for predicting histone mark age with your own data.
- `main.sh`: Main shell script calling other scripts in the scripts folder.
- `requirements.txt`: Required dependencies.
- `LICENSE`: License file.
- `.gitignore`: Files to ignore in Git (e.g., .DS_Store).
- `README.md`: This README file.
The recommended way to run the histone mark age predictors is with pyaging. A detailed tutorial is available on the package's documentation page.
Alternatively, follow this step-by-step guide to use the histone mark age predictors. A more detailed version is available as `tutorial.ipynb` in the `tutorial` folder.
- Load required packages: Import the necessary packages for data processing and prediction.
- Download an Example File from the ENCODE Project: Execute the command to download a training sample (bigWig file) used in the models. Alternatively, if you would like to use your own ChIP-Seq data, please refer to the ENCODE website for guidelines to handle your biosample appropriately and to the ENCODE ChIP-Seq pipeline GitHub to obtain the bigWig file from the sequencing data.
- Process the bigWig File: Use the function `process_bigWig(bigWig_file_path, annotation_file_path)` to extract genomic annotations and transform signal values.
- Predict the Histone Mark Age: Utilize the function `predict_histone_mark_age(processed_sample, histone)` to predict age based on the processed sample for a given histone type.
- Print the Result: The code will print the predicted histone mark age.
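Example code snippet:

```python
# Extract genomic annotations and transform the signal values of the example bigWig file
sample = process_bigWig('ENCFF386QWG.bigWig')
# Choose which histone mark age predictor to apply
histone_mark = 'H3K4me3'
# The predictor returns an array-like result; take the first (only) prediction
y_hat = predict_histone_mark_age(sample, histone=histone_mark)[0]
print(f'The predicted {histone_mark} age is {round(y_hat, 3)} years.')
```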
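
The download step in the guide above could be sketched as follows. This is a minimal illustration, not the repository's own download command: the helper names `encode_download_url` and `download_encode_file` are hypothetical, and the URL pattern is an assumption based on how the ENCODE portal commonly serves files by accession. Check it against the ENCODE website before relying on it.

```python
import shutil
import urllib.request
from pathlib import Path

# Assumption: the ENCODE portal serves files under this base URL.
ENCODE_BASE = "https://www.encodeproject.org"


def encode_download_url(accession: str, extension: str = "bigWig") -> str:
    """Build a download URL for an ENCODE file accession (assumed URL pattern)."""
    return f"{ENCODE_BASE}/files/{accession}/@@download/{accession}.{extension}"


def download_encode_file(accession: str, dest_dir: str = ".") -> Path:
    """Download a bigWig file unless it is already present; return the local path."""
    dest = Path(dest_dir) / f"{accession}.bigWig"
    if dest.exists():
        # Skip the network call when the file was downloaded earlier.
        return dest
    tmp = dest.with_suffix(".part")
    with urllib.request.urlopen(encode_download_url(accession)) as resp, open(tmp, "wb") as out:
        shutil.copyfileobj(resp, out)
    # Rename only after the full download, so a partial file is never mistaken for a complete one.
    tmp.replace(dest)
    return dest


# Example (requires network access):
# bigwig_path = download_encode_file("ENCFF386QWG")
```

The returned path can then be passed to `process_bigWig` as in the snippet above.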
All data used were publicly available from the ENCODE project and can be accessed and downloaded programmatically through the scripts in this repository. To download the already-processed data along with the results, please access our Google Drive. This should make it easier to train any future models.
- Set up Environment: Spin up an AWS SageMaker instance (e.g., `ml.t3.2xlarge`) or any other computer.
- Clone Repository: Clone this repository to your environment.
- Download Processed ENCODE Data (optional): Access our Google Drive. Copy the `data` folder and all files within to the root of your directory. If you've already downloaded the processed ENCODE data, comment out the download scripts.
- Run `main.sh`: Run the `main.sh` script to replicate the results.
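
Before launching a long run, it may help to confirm that the repository root looks as the steps above expect. The sketch below is a hypothetical convenience wrapper, not part of the repository: the helper names `missing_paths` and `run_main` are illustrative, and the list of expected entries is an assumption drawn from the folder layout described in this README.

```python
import subprocess
from pathlib import Path

# Assumption: entries this README says live at the repository root.
REQUIRED = ("main.sh", "scripts", "metadata")


def missing_paths(repo_root, expect_data=True):
    """Return the expected entries that are absent from repo_root."""
    root = Path(repo_root)
    names = list(REQUIRED)
    if expect_data:
        # Only relevant if you copied the processed `data` folder from Google Drive.
        names.append("data")
    return [name for name in names if not (root / name).exists()]


def run_main(repo_root, expect_data=True):
    """Run main.sh from the repository root after a basic layout check."""
    missing = missing_paths(repo_root, expect_data=expect_data)
    if missing:
        raise FileNotFoundError(f"Missing in {repo_root}: {', '.join(missing)}")
    # check=True raises CalledProcessError if the script exits with a non-zero status.
    subprocess.run(["bash", "main.sh"], cwd=repo_root, check=True)


# Example: run_main(".", expect_data=True)
```

Pass `expect_data=False` if you intend to let the download scripts fetch the ENCODE data instead of copying the processed `data` folder.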
To cite our study, please use the following:
de Lima Camillo, L.P., Asif, M. H., Horvath, S., Larschan, E. & Singh, R. Histone mark age of human tissues and cell types. Science Advances (2025). 10.1126/sciadv.adk9373
BibTeX citation:

```bibtex
@article{de_Lima_Camillo_HistoneClocks,
  author = {de Lima Camillo, Lucas Paulo and Asif, Muhammad H. and Horvath, Steve and Larschan, Erica and Singh, Ritambhara},
  title = {Histone mark age of human tissues and cell types},
  year = {2025},
  doi = {10.1126/sciadv.adk9373},
  URL = {https://www.science.org/doi/10.1126/sciadv.adk9373},
  journal = {Science Advances}
}
```