Pipeline parameters

Zhiao Shi edited this page Jul 20, 2021 · 1 revision

The pipeline currently uses the following parameters.

| name | meaning | default |
| --- | --- | --- |
| `--outdir` | output directory | `results` |
| `--catalog_file` | sample catalog file | `s3://zhanglab-pancancer/assets/CPTAC3.Catalog.dat` |
| `--genome_basedir` | location that contains the genome index and annotation files | `s3://zhanglab-pancancer/reference/GRCh38.p13` |
| `--genome_ref_prefix` | prefix for the genome reference file | `GRCh38.p13.genome.fa` |
| `--genome_ref` | path to the genome reference file | `s3://zhanglab-pancancer/reference/GRCh38.p13/GRCh38.p13.genome.fa` |
| `--genome_ref_index` | path to the genome index files | `s3://zhanglab-pancancer/reference/GRCh38.p13/GRCh38.p13.genome.fa.*` |
| `--genome_ref_anno` | path to the genome annotation file | `s3://zhanglab-pancancer/reference/GRCh38.p13/gencode.v34.basic.annotation.flat.gtf` |
| `--data_source` | location type of the fastq files (`gdc`, `s3`, or `local`) | `gdc` |
| `--run_indexing` | whether to index the reference genome | `false` |
| `--start`, `--end` | starting/ending index of the samples to be processed | `-1` for both, which means all samples in the case ID file (source type: `gdc`) or all samples in the catalog file (source type: `local` or `s3`) |
| `--allow_dup` | whether to allow duplicate sample names in the catalog file; applies to the `gdc` case only | `false` |
| `--gdc-token` | path to the GDC token file used for downloading data from GDC | |
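
The defaults above suggest two relationships that a short sketch can make concrete: the default `--genome_ref` equals `--genome_basedir` joined with `--genome_ref_prefix`, and `-1` for `--start`/`--end` selects every sample. The Python below is only an illustration and is not taken from the pipeline code. The class `PipelineParams` and the helper `select_samples` are hypothetical names, and the indexing convention (1-based, inclusive, with `-1` treated as "unbounded" on either side) is an assumption, since this page only states what `-1` for both means.

```python
from dataclasses import dataclass


@dataclass
class PipelineParams:
    """Hypothetical container; defaults are copied from the parameter table."""
    outdir: str = "results"
    genome_basedir: str = "s3://zhanglab-pancancer/reference/GRCh38.p13"
    genome_ref_prefix: str = "GRCh38.p13.genome.fa"
    data_source: str = "gdc"  # one of: gdc, s3, local
    start: int = -1
    end: int = -1

    def __post_init__(self):
        # The table lists exactly these three location types.
        if self.data_source not in ("gdc", "s3", "local"):
            raise ValueError(f"unknown data_source: {self.data_source}")

    @property
    def genome_ref(self) -> str:
        # Assumption: the reference path is basedir + "/" + prefix,
        # which reproduces the documented default.
        return f"{self.genome_basedir}/{self.genome_ref_prefix}"

    @property
    def genome_ref_index(self) -> str:
        # Assumption: index files share the reference path as a prefix.
        return f"{self.genome_ref}.*"


def select_samples(samples, start=-1, end=-1):
    """Return the samples to process.

    Assumption (not confirmed by the pipeline): indices are 1-based and
    inclusive, and -1 means "no bound" on that side.
    """
    first = 1 if start == -1 else start
    last = len(samples) if end == -1 else end
    return list(samples[first - 1:last])


params = PipelineParams()
samples = ["S1", "S2", "S3", "S4"]  # made-up sample names for illustration
selected_all = select_samples(samples, params.start, params.end)
selected_some = select_samples(samples, start=2, end=3)
```

With the default `-1` for both bounds, `selected_all` contains every sample; under the assumed convention, `start=2, end=3` keeps the second and third samples only.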