Goal: simulate GxE phenotypes for workflow testing.

assemble_summary_statistics.R: Ad hoc R script that wrangles the Excel supplementary file from Sung 2018 into a straightforward summary statistics table.

download_1000G_vcfs.sh: Download VCFs containing genotype calls for 2504 individuals from the 1000G phase 3 v5 dataset.

process_1000G_vcfs.sh: Filter VCFs first for MAF > 0.05 to keep the dataset to a manageable size, then for the top variants from Sung 2018 (smoking -> blood pressure GWIS), and concatenate the chromosomes.
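
To make the MAF > 0.05 filter concrete, here is a minimal Python sketch of what the threshold means for a single VCF record. It is not taken from process_1000G_vcfs.sh, which presumably relies on a dedicated tool such as bcftools or vcftools; the function names `minor_allele_frequency` and `filter_common` are hypothetical, and the sketch assumes biallelic records with a `GT` entry in the FORMAT column.

```python
def minor_allele_frequency(vcf_line):
    """Return the minor allele frequency of one biallelic VCF data line.

    Assumes tab-separated fields, with FORMAT in column 9 and samples after it.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    gt_index = fields[8].split(":").index("GT")
    alt_count = 0
    total = 0
    for sample in fields[9:]:
        genotype = sample.split(":")[gt_index]
        # Treat phased (|) and unphased (/) genotypes the same way.
        for allele in genotype.replace("|", "/").split("/"):
            if allele == ".":
                continue  # skip missing calls
            total += 1
            if allele != "0":
                alt_count += 1
    if total == 0:
        return 0.0
    alt_freq = alt_count / total
    return min(alt_freq, 1.0 - alt_freq)


def filter_common(vcf_lines, threshold=0.05):
    """Yield header lines plus data lines whose MAF exceeds the threshold."""
    for line in vcf_lines:
        if line.startswith("#") or minor_allele_frequency(line) > threshold:
            yield line
```

Header lines pass through untouched, so the output of `filter_common` is still a readable VCF body for the variants that survive the filter.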

simulate_phenos.R: Based on input genotypes and summary statistics, simulate phenotypes that contain a (sparse) genetic signal.
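
The sketch below illustrates one way such a simulation could work. It is written in Python for illustration and is not the code in simulate_phenos.R. It assumes a linear model in which the phenotype is an exposure effect, plus main and gene-by-exposure interaction effects at a small set of variants, plus Gaussian noise. The function name, argument names, and effect sizes are hypothetical; in the real workflow the effect sizes would presumably come from the summary statistics table.

```python
import random


def simulate_phenotypes(genotypes, exposure, main_effects, interaction_effects,
                        exposure_effect=0.0, noise_sd=1.0, seed=0):
    """Simulate one quantitative phenotype per individual.

    genotypes: list of per-individual lists of allele dosages (0, 1, or 2)
    exposure: list of per-individual exposure values (e.g. 0/1 smoking status)
    main_effects, interaction_effects: dicts mapping variant index -> effect
        size; variants missing from a dict contribute nothing, which is what
        makes the signal sparse
    """
    rng = random.Random(seed)  # seeded for reproducible simulations
    phenotypes = []
    for dosages, e in zip(genotypes, exposure):
        y = exposure_effect * e
        for j, beta in main_effects.items():
            y += beta * dosages[j]          # genetic main effect
        for j, beta in interaction_effects.items():
            y += beta * dosages[j] * e      # gene-by-exposure interaction
        y += rng.gauss(0.0, noise_sd)       # residual noise
        phenotypes.append(y)
    return phenotypes
```

For example, `simulate_phenotypes(genotypes, exposure, {0: 0.5}, {2: 0.3})` would give the first variant a main effect and the third variant an interaction effect, leaving every other variant null.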

fetch_MIS_results.sh & subset_imputed_vcfs.sh: If the sequenced genotypes used above (MAF > 0.05) are imputed with the Michigan Imputation Server, fetch the results (fetch_MIS_results.sh) and subset them to common variants (subset_imputed_vcfs.sh). These imputed genotypes can then be used for downstream interaction testing.