In addition to the data itself, we require some metadata about your file. When you use our website to load your data we fill in this metadata for you. When you use the command line, you will need to provide this data in an additional file.
The metadata file is a .json file and follows JSON formatting. It needs to be in the same directory as the data file. Its name must be the full name of the data file, including any file extension, followed by .json (e.g. my_first_dataset and my_first_dataset.json OR my_second_dataset.txt and my_second_dataset.txt.json).
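The naming rule above can be sketched in a few lines of Python. This is only an illustration: the helper name `metadata_path_for` is hypothetical and is not part of Xena.

```python
from pathlib import Path


def metadata_path_for(data_file):
    """Return the metadata path expected for a data file.

    The metadata file sits in the same directory and is named after the
    full data file name (extension included) with '.json' appended.
    """
    data_file = Path(data_file)
    return data_file.with_name(data_file.name + ".json")


print(metadata_path_for("my_first_dataset"))
print(metadata_path_for("my_second_dataset.txt"))
```

Note that the extension is appended, not swapped: `my_second_dataset.txt` maps to `my_second_dataset.txt.json`, not `my_second_dataset.json`.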
There are two required fields: type and cohort.
Type can be:
- 'genomicMatrix' -> genomic data where samples are columns and genomic regions are rows. Note that for loading on the command line we do not support the other orientation.
- 'clinicalMatrix' -> phenotypic data where samples are rows and phenotypic fields are columns. Note that for loading on the command line we do not support the other orientation.
- 'mutationVector' -> mutation data
- 'genomicSegment' -> segmented copy number data
{"type":"mutationVector"}
Cohort is used to determine whether there are other data on the samples that you are loading. You can either specify a pre-existing cohort or create your own. Cohort names are displayed on the dataset pages and in the cohort drop-down menu on the Heatmaps page.
For existing cohorts, you need to enter the cohort name EXACTLY as it appears as the existing cohort name. Note that our cohort names are case sensitive.
{"type":"mutationVector",
"cohort":"TCGA Breast Cancer"}
If you are loading a mutation or segmented copy number file, you will also need to specify the reference genome. You do not need to specify this for other file types.
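The rules described so far (two required fields, plus an assembly for mutation and segmented copy number data) can be collected in a small helper. This is a minimal sketch and not part of Xena: the function name `build_metadata` and the two constant names are hypothetical, and the checks only mirror what this page states.

```python
import json

# Data file types listed on this page (probemap files use their own type).
DATA_TYPES = {"genomicMatrix", "clinicalMatrix", "mutationVector", "genomicSegment"}
# Types for which the reference genome must be specified.
NEEDS_ASSEMBLY = {"mutationVector", "genomicSegment"}


def build_metadata(data_type, cohort, assembly=None):
    """Return a metadata dict, raising ValueError if a required field is missing."""
    if data_type not in DATA_TYPES:
        raise ValueError(f"unknown type: {data_type!r}")
    if not cohort:
        raise ValueError("cohort is required")
    metadata = {"type": data_type, "cohort": cohort}
    if data_type in NEEDS_ASSEMBLY and assembly is None:
        raise ValueError(f"{data_type} requires an assembly, e.g. 'hg19'")
    if assembly is not None:
        metadata["assembly"] = assembly
    return metadata


metadata = build_metadata("mutationVector", "TCGA Breast Cancer", "hg19")
print(json.dumps(metadata, indent=1))
```

The printed text can be saved as the .json file next to the data file. Remember that cohort names are case sensitive, so an existing cohort must be spelled exactly as it appears.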
{"type":"mutationVector",
"cohort":"TCGA Breast Cancer",
"assembly":"hg19"}
If you are loading a file that has probes, transcripts, or exons and you would like to query your data by gene, you will need to provide a mapping file. You do not need to specify this for other file types.
Here is an example probemap file (a delimited file): https://toil.xenahubs.net/download/probeMap/gencode.v23.annotation.gene.probemap
#id gene chrom chromStart chromEnd strand
id_1 AADACL3 chr1 12776118 12776347 +
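To check a probemap file before loading it, the rows can be read into dictionaries keyed by the header names. The sketch below assumes the file is tab-delimited and that the header line starts with `#`, as in the example above; the function name `read_probemap` is hypothetical.

```python
import csv
import io


def read_probemap(handle):
    """Read a probemap from an open text handle into a list of dicts."""
    reader = csv.reader(handle, delimiter="\t")  # assumption: tab-delimited
    # Strip the leading '#' from the header so the first key is 'id'.
    header = [name.lstrip("#") for name in next(reader)]
    return [dict(zip(header, row)) for row in reader if row]


# The two example lines above, written with tab separators.
sample = (
    "#id\tgene\tchrom\tchromStart\tchromEnd\tstrand\n"
    "id_1\tAADACL3\tchr1\t12776118\t12776347\t+\n"
)
rows = read_probemap(io.StringIO(sample))
print(rows[0]["id"], rows[0]["gene"], rows[0]["chrom"])
```

For a file on disk, pass `open(path, newline="")` instead of the `io.StringIO` object.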
We have many probemap files that you can see via our xenaPython app.
import xenaPython
# List the probemaps available on the reference hub
host = "https://reference.xenahubs.net"
xenaPython.probemap_list(host)
If you do not see a probemap that will work for you, please let us know.
To reference a probemap you need three things in place:
- Include the probemap reference in your data file .json
- Have the probemap file in the same directory as your data file and data file .json
- Also have a .json file for the probemap so that we know how to load it
{% hint style="warning" %} Note that to reference a probemap you need to load the probemap first, then load the data file. {% endhint %}
{"type":"genomicMatrix",
"cohort":"TCGA Breast Cancer",
":probeMap":"/unc_v2_exon_hg19_probe_TCGA"}
https://toil.xenahubs.net/download/probeMap/gencode.v23.annotation.gene.probemap
{"type":"probeMap",
"assembly":"hg19"}
{% content-ref url="../technical-documentation/metadata-specification-1.md" %} metadata-specification-1.md {% endcontent-ref %}
Put both your .tsv and .json files in your_home_directory/xena/files. Then run the jar, passing in the file name, like so:
java -jar cavm-0.xx.0-standalone.jar -l ~/xena/files/*
→ loads all files
OR
java -jar cavm-0.xx.0-standalone.jar -l ~/xena/files/file1.tsv
→ loads just file1.tsv
Note that you will need to substitute the name of the .jar file. As of the time of writing (September 20, 2018), the name of the .jar file was cavm-0.22.0-standalone.jar. On Linux this will be in the directory where you opened the archive. On Windows or MacOS, use your operating system’s file search capability to search for cavm*jar. On Windows you will need to use the full path to your home directory instead of “~”.
Note you do not need to load the .json files. Xena will automatically look for these and load them.
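If you script your loads, the `-l` command shown above can be assembled in Python. The sketch below only builds and prints the commands. It puts the probemap first, following the earlier note that a probemap must be loaded before the data file that references it. The function name `load_commands` and the probemap file name are hypothetical, and the .jar name is the one current at the time of writing, so substitute your own.

```python
import os


def load_commands(jar, probemap_file, data_file):
    """Build the two load commands, probemap first, as argument lists."""
    # Expand "~" ourselves, since no shell is involved when running
    # a command from an argument list.
    paths = [os.path.expanduser(p) for p in (probemap_file, data_file)]
    return [["java", "-jar", jar, "-l", path] for path in paths]


commands = load_commands(
    "cavm-0.22.0-standalone.jar",   # substitute the name of your .jar file
    "~/xena/files/my_probemap",     # hypothetical probemap file name
    "~/xena/files/file1.tsv",
)
for command in commands:
    print(" ".join(command))
    # To actually run it: subprocess.run(command, check=True)
```

The .json files are not passed on the command line here either, since Xena looks for them automatically.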
java -jar cavm-0.xx.0-standalone.jar -x ~/xena/files/file1.tsv
→ deletes just file1.tsv
java -jar cavm-0.xx.0-standalone.jar -x ~/xena/files/file1.tsv ~/xena/files/file2.tsv
→ deletes file1.tsv and file2.tsv
You can always type:
java -jar cavm-0.xx.0-standalone.jar -h
for help.