Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Export TreeSummarizedExperiment R object #807

Open
wants to merge 15 commits into
base: dev
Choose a base branch
from
1 change: 1 addition & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

- [#798](https://github.com/nf-core/ampliseq/pull/798) - Added SILVA version 138.2 of DADA2 taxonomy database: `silva=138.2` or `silva` as parameter to `--dada2_ref_taxonomy`
- [#804](https://github.com/nf-core/ampliseq/pull/804) - Added version 10 of Unite as parameter for `--dada_ref_taxonomy` (issue [#768](https://github.com/nf-core/ampliseq/issues/768))
- [#807](https://github.com/nf-core/ampliseq/pull/807) - Export of TreeSummarizedExperiment R object by default, can be omitted with `--skip_tse`, also added ability to skip phyloseq R object generation with `--skip_phyloseq`

### `Changed`

Expand Down
4 changes: 4 additions & 0 deletions CITATIONS.md
Original file line number Diff line number Diff line change
Expand Up @@ -149,6 +149,10 @@

> McMurdie PJ, Holmes S (2013). “phyloseq: An R package for reproducible interactive analysis and graphics of microbiome census data.” PLoS ONE, 8(4), e61217.

- [TreeSummarizedExperiment](https://doi.org/10.12688/f1000research.26669.2)

> Huang R, Soneson C, Ernst FGM et al. TreeSummarizedExperiment: a S4 class for data with hierarchical structure [version 2; peer review: 3 approved]. F1000Research 2021, 9:1246.

### Non-default tools

- [ITSx](https://besjournals.onlinelibrary.wiley.com/doi/10.1111/2041-210X.12073)
Expand Down
33 changes: 26 additions & 7 deletions assets/report_template.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -107,6 +107,7 @@ params:
picrust_pathways: FALSE
sbdi: FALSE
phyloseq: FALSE
tse: FALSE
---

<!-- Load libraries -->
Expand Down Expand Up @@ -1615,19 +1616,37 @@ but if you run nf-core/ampliseq with a sample metadata table (`--metadata`) any
"))
```

<!-- Section on PHYLOSEQ results -->
<!-- Section on exported R objects -->

```{r, eval = !isFALSE(params$phyloseq), results='asis'}
```{r, results='asis'}
any_robject <- !isFALSE(params$phyloseq) || !isFALSE(params$tse)
```

```{r, eval = !isFALSE(any_robject), results='asis'}
cat(paste0("
# Phyloseq
# R objects

[Phyloseq](https://doi.org/10.1371/journal.pone.0061217)
is a popular R package to analyse and visualize microbiom data.
The produced RDS files contain phyloseq objects and can be loaded directely into R and phyloseq.
Microbiome data can be analysed and visualized with certain R packages. For convenience, R objects in RDS format are provided.
"))

if ( !isFALSE(params$phyloseq) ) {
cat(paste0("
[Phyloseq](https://doi.org/10.1371/journal.pone.0061217) objects and can be loaded directely into R with package 'phyloseq'.
The objects contain an ASV abundance table and a taxonomy table.
If available, metadata and phylogenetic tree will also be included in the phyloseq object.
The files can be found in folder [phyloseq](../phyloseq/).
"))
"))
}

if ( !isFALSE(params$tse) ) {
cat(paste0("
[TreeSummarizedExperiment](https://doi.org/10.12688/f1000research.26669.2) (TreeSE, TSE)
objects can be loaded into R with package 'TreeSummarizedExperiment'. and contain an ASV abundance table,
a taxonomy table, and sequences.
If available, metadata and phylogenetic tree will also be included in the object.
The files can be found in folder [treesummarizedexperiment](../treesummarizedexperiment/).
"))
}
```

<!-- Section on methods -->
Expand Down
9 changes: 9 additions & 0 deletions conf/modules.config
Original file line number Diff line number Diff line change
Expand Up @@ -1057,6 +1057,15 @@ process {
pattern: "*.rds"
]
}

withName: TREESUMMARIZEDEXPERIMENT {
publishDir = [
path: { "${params.outdir}/treesummarizedexperiment" },
mode: params.publish_dir_mode,
pattern: "*.rds"
]
}

withName: 'MULTIQC' {
ext.args = { params.multiqc_title ? "--title \"$params.multiqc_title\"" : '' }
publishDir = [
Expand Down
3 changes: 3 additions & 0 deletions conf/test_multi.config
Original file line number Diff line number Diff line change
Expand Up @@ -30,4 +30,7 @@ params {
dada_ref_taxonomy = "rdp=18"
skip_dada_addspecies = true
input = params.pipelines_testdata_base_path + "ampliseq/samplesheets/Samplesheet_multi.tsv"

skip_phyloseq = true
skip_tse = true
}
2 changes: 2 additions & 0 deletions conf/test_pacbio_its.config
Original file line number Diff line number Diff line change
Expand Up @@ -37,4 +37,6 @@ params {

// Prevent default taxonomic classification
skip_dada_taxonomy = true

skip_phyloseq = true
}
2 changes: 2 additions & 0 deletions conf/test_reftaxcustom.config
Original file line number Diff line number Diff line change
Expand Up @@ -37,4 +37,6 @@ params {

// Skip downstream analysis with QIIME2
skip_qiime_downstream = true

skip_tse = true
}
8 changes: 5 additions & 3 deletions docs/output.md
Original file line number Diff line number Diff line change
Expand Up @@ -46,7 +46,7 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d
- [Differential abundance analysis](#differential-abundance-analysis) - Calling differentially abundant features with ANCOM or ANCOM-BC
- [PICRUSt2](#picrust2) - Predict the functional potential of a bacterial community
- [SBDI export](#sbdi-export) - Swedish Biodiversity Infrastructure (SBDI) submission file
- [Phyloseq](#phyloseq) - Phyloseq R objects
- [R object](#r-objects) - Phyloseq and TreeSummarizedExperiment R objects
- [Read count report](#read-count-report) - Report of read counts during various steps of the pipeline
- [Pipeline information](#pipeline-information) - Report metrics generated during the workflow execution

Expand Down Expand Up @@ -629,15 +629,17 @@ Most of the fields in the template will not be populated by the export process,

</details>

### Phyloseq
### R objects

This directory will hold phyloseq objects for each taxonomy table produced by this pipeline. The objects will contain an ASV abundance table and a taxonomy table. If the pipeline is provided with metadata, that metadata will also be included in the phyloseq object. A phylogenetic tree will also be included if the pipeline produces a tree.
Pipeline results are stored in phyloseq and TreeSummarizedExperiment R objects for each taxonomy table produced by this pipeline. The R objects will contain an ASV abundance table and a taxonomy table, and optionally sequences, metadata and a phylogenetic tree.

<details markdown="1">
<summary>Output files</summary>

- `phyloseq/`
- `<taxonomy>_phyloseq.rds`: Phyloseq R object.
- `treesummarizedexperiment/`
- `<taxonomy>_TreeSummarizedExperiment.rds`: TreeSummarizedExperiment R object.

</details>

Expand Down
2 changes: 2 additions & 0 deletions modules/local/summary_report.nf
Original file line number Diff line number Diff line change
Expand Up @@ -58,6 +58,7 @@ process SUMMARY_REPORT {
path(picrust_pathways)
path(sbdi, stageAs: 'sbdi/*')
path(phyloseq, stageAs: 'phyloseq/*')
path(tse, stageAs: 'tse/*')

output:
path "*.svg" , emit: svg, optional: true
Expand Down Expand Up @@ -137,6 +138,7 @@ process SUMMARY_REPORT {
ancombc_formula ? "ancombc_formula='"+ ancombc_formula.join(",") +"'" : "",
sbdi ? "sbdi='"+ sbdi.join(",") +"'" : "",
phyloseq ? "phyloseq='"+ phyloseq.join(",") +"'" : "",
tse ? "tse='"+ tse.join(",") +"'" : "",
]
// groovy list to R named list string; findAll removes empty entries
params_list_named_string = params_list_named.findAll().join(',').trim()
Expand Down
82 changes: 82 additions & 0 deletions modules/local/treesummarizedexperiment.nf
Original file line number Diff line number Diff line change
@@ -0,0 +1,82 @@
process TREESUMMARIZEDEXPERIMENT {
tag "$prefix"
label 'process_low'

conda "bioconda::bioconductor-treesummarizedexperiment=2.10.0"
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
d4straub marked this conversation as resolved.
Show resolved Hide resolved
'https://depot.galaxyproject.org/singularity/bioconductor-treesummarizedexperiment:2.10.0--r43hdfd78af_0' :
'biocontainers/bioconductor-treesummarizedexperiment:2.10.0--r43hdfd78af_0' }"

input:
tuple val(prefix), path(tax_tsv), path(otu_tsv)
path sam_tsv
path tree

output:
tuple val(prefix), path("*TreeSummarizedExperiment.rds"), emit: rds
path "versions.yml" , emit: versions

when:
task.ext.when == null || task.ext.when

script:
def sam_tsv = "\"${sam_tsv}\""
def otu_tsv = "\"${otu_tsv}\""
def tax_tsv = "\"${tax_tsv}\""
def tree = "\"${tree}\""
def prefix = "\"${prefix}\""
"""
#!/usr/bin/env Rscript

suppressPackageStartupMessages(library(TreeSummarizedExperiment))

# Read otu table. It must be in a SimpleList as a matrix where rows
# represent taxa and columns samples.
otu_mat <- read.table($otu_tsv, sep="\\t", header=TRUE, row.names=1)
otu_mat <- as.matrix(otu_mat)
assays <- SimpleList(counts = otu_mat)
# Read taxonomy table. Correct format for it is DataFrame.
taxonomy_table <- read.table($tax_tsv, sep="\\t", header=TRUE, row.names=1)
taxonomy_table <- DataFrame(taxonomy_table)

# Match rownames between taxonomy table and abundance matrix.
taxonomy_table <- taxonomy_table[match(rownames(otu_mat), rownames(taxonomy_table)), ]

# Create TreeSE object.
tse <- TreeSummarizedExperiment(
assays = assays,
rowData = taxonomy_table
)

# If taxonomy table contains sequences, move them to referenceSeq slot
if (!is.null(rowData(tse)[["sequence"]])) {
referenceSeq(tse) <- DNAStringSet( rowData(tse)[["sequence"]] )
rowData(tse)[["sequence"]] <- NULL
}

# If provided, we add sample metadata as DataFrame object. rownames of
# sample metadata must match with colnames of abundance matrix.
if (file.exists($sam_tsv)) {
sample_meta <- read.table($sam_tsv, sep="\\t", header=TRUE, row.names=1)
sample_meta <- sample_meta[match(colnames(tse), rownames(sample_meta)), ]
sample_meta <- DataFrame(sample_meta)
colData(tse) <- sample_meta
d4straub marked this conversation as resolved.
Show resolved Hide resolved
}

# If provided, we add phylogeny. The rownames in abundance matrix must match
# with node labels in phylogeny.
if (file.exists($tree)) {
phylogeny <- ape::read.tree($tree)
rowTree(tse) <- phylogeny
}

saveRDS(tse, file = paste0($prefix, "_TreeSummarizedExperiment.rds"))

# Version information
writeLines(c("\\"${task.process}\\":",
paste0(" R: ", paste0(R.Version()[c("major","minor")], collapse = ".")),
paste0(" TreeSummarizedExperiment: ", packageVersion("TreeSummarizedExperiment"))),
"versions.yml"
)
"""
}
2 changes: 2 additions & 0 deletions nextflow.config
Original file line number Diff line number Diff line change
Expand Up @@ -105,6 +105,8 @@ params {
skip_dada_addspecies = false
skip_alpha_rarefaction = false
skip_diversity_indices = false
skip_phyloseq = false
skip_tse = false
skip_multiqc = false
skip_report = false

Expand Down
8 changes: 8 additions & 0 deletions nextflow_schema.json
Original file line number Diff line number Diff line change
Expand Up @@ -790,6 +790,14 @@
"type": "boolean",
"description": "Skip alpha and beta diversity analysis"
},
"skip_phyloseq": {
"type": "boolean",
"description": "Skip exporting phyloseq rds object(s)"
},
"skip_tse": {
"type": "boolean",
"description": "Skip exporting TreeSummarizedExperiment rds object(s)"
},
"skip_multiqc": {
"type": "boolean",
"description": "Skip MultiQC reporting"
Expand Down
44 changes: 0 additions & 44 deletions subworkflows/local/phyloseq_workflow.nf

This file was deleted.

5 changes: 4 additions & 1 deletion subworkflows/local/qiime2_diversity.nf
Original file line number Diff line number Diff line change
Expand Up @@ -27,7 +27,8 @@ workflow QIIME2_DIVERSITY {
ch_versions_qiime2_diversity = Channel.empty()

//Phylogenetic tree for beta & alpha diversities
if (!ch_tree) {
produce_tree = !ch_tree ? true : false
if (produce_tree) {
QIIME2_TREE ( ch_seq )
ch_versions_qiime2_diversity = ch_versions_qiime2_diversity.mix(QIIME2_TREE.out.versions)
ch_tree = QIIME2_TREE.out.qza
Expand Down Expand Up @@ -82,6 +83,8 @@ workflow QIIME2_DIVERSITY {
}

emit:
tree_qza = ch_tree
tree_nwk = produce_tree ? QIIME2_TREE.out.nwk : []
depth = !skip_diversity_indices ? QIIME2_DIVERSITY_CORE.out.depth : []
alpha = !skip_diversity_indices ? QIIME2_DIVERSITY_ALPHA.out.alpha : []
beta = !skip_diversity_indices ? QIIME2_DIVERSITY_BETA.out.beta : []
Expand Down
50 changes: 50 additions & 0 deletions subworkflows/local/robject_workflow.nf
Original file line number Diff line number Diff line change
@@ -0,0 +1,50 @@
/*
* Create phyloseq objects
*/

include { PHYLOSEQ } from '../../modules/local/phyloseq'
include { PHYLOSEQ_INASV } from '../../modules/local/phyloseq_inasv'
include { TREESUMMARIZEDEXPERIMENT } from '../../modules/local/treesummarizedexperiment'

workflow ROBJECT_WORKFLOW {
take:
ch_tax
ch_tsv
ch_meta
ch_robject_intree
run_qiime2

main:
ch_versions_robject_workflow = Channel.empty()

if ( params.metadata ) {
ch_robject_inmeta = ch_meta.first() // The .first() is to make sure it's a value channel
} else {
ch_robject_inmeta = []
}

if ( run_qiime2 ) {
if ( params.exclude_taxa != "none" || params.min_frequency != 1 || params.min_samples != 1 ) {
ch_robject_inasv = PHYLOSEQ_INASV ( ch_tsv ).tsv
} else {
ch_robject_inasv = ch_tsv
}
} else {
ch_robject_inasv = ch_tsv
}

if ( !params.skip_phyloseq ) {
PHYLOSEQ ( ch_tax.combine(ch_robject_inasv), ch_robject_inmeta, ch_robject_intree )
ch_versions_robject_workflow = ch_versions_robject_workflow.mix(PHYLOSEQ.out.versions)
}

if ( !params.skip_tse ) {
TREESUMMARIZEDEXPERIMENT ( ch_tax.combine(ch_robject_inasv), ch_robject_inmeta, ch_robject_intree )
ch_versions_robject_workflow = ch_versions_robject_workflow.mix(TREESUMMARIZEDEXPERIMENT.out.versions)
}

emit:
phyloseq = !params.skip_phyloseq ? PHYLOSEQ.out.rds : []
tse = !params.skip_tse ? TREESUMMARIZEDEXPERIMENT.out.rds : []
versions = ch_versions_robject_workflow
}
1 change: 1 addition & 0 deletions tests/pipeline/doubleprimers.nf.test
Original file line number Diff line number Diff line change
Expand Up @@ -32,6 +32,7 @@ nextflow_pipeline {
{ assert snapshot(path("$outputDir/input/Samplesheet_double_primer.tsv")).match("input") },
{ assert new File("$outputDir/qiime2/abundance_tables/feature-table.tsv").exists() },
{ assert new File("$outputDir/phyloseq/kraken2_phyloseq.rds").exists() },
{ assert new File("$outputDir/treesummarizedexperiment/kraken2_TreeSummarizedExperiment.rds").exists() },
{ assert snapshot(path("$outputDir/kraken2/ASV_tax.greengenes.kraken2.classifiedreads.txt"),
path("$outputDir/kraken2/ASV_tax.greengenes.kraken2.complete.tsv"),
path("$outputDir/kraken2/ASV_tax.greengenes.kraken2.tsv")).match("kraken2") },
Expand Down
3 changes: 2 additions & 1 deletion tests/pipeline/failed.nf.test
Original file line number Diff line number Diff line change
Expand Up @@ -29,7 +29,8 @@ nextflow_pipeline {
{ assert new File("$outputDir/qiime2/diversity/alpha_diversity/shannon_vector/kruskal-wallis-pairwise-treatment1.csv").exists() },
{ assert new File("$outputDir/qiime2/diversity/beta_diversity/bray_curtis_pcoa_results-PCoA/index.html").exists() },
{ assert new File("$outputDir/summary_report/summary_report.html").exists() },
{ assert new File("$outputDir/phyloseq/dada2_phyloseq.rds").exists() }
{ assert new File("$outputDir/phyloseq/dada2_phyloseq.rds").exists() },
{ assert new File("$outputDir/treesummarizedexperiment/dada2_TreeSummarizedExperiment.rds").exists() }
)
}
}
Expand Down
Loading
Loading