nf-core · d4straub · Nov 25, 2024 · Nov 26, 2024 · Nov 26, 2024 · Nov 26, 2024
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -9,6 +9,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 - [#798](https://github.com/nf-core/ampliseq/pull/798) - Added SILVA version 138.2 of DADA2 taxonomy database: `silva=138.2` or `silva` as parameter to `--dada2_ref_taxonomy`
 - [#804](https://github.com/nf-core/ampliseq/pull/804) - Added version 10 of Unite as parameter for `--dada_ref_taxonomy` (issue [#768](https://github.com/nf-core/ampliseq/issues/768))
+- [#807](https://github.com/nf-core/ampliseq/pull/807) - Export of TreeSummarizedExperiment R object by default, can be omitted with `--skip_tse`, also added ability to skip phyloseq R object generation with `--skip_phyloseq`
 
 ### `Changed`
 

diff --git a/CITATIONS.md b/CITATIONS.md
@@ -149,6 +149,10 @@
 
   > McMurdie PJ, Holmes S (2013). “phyloseq: An R package for reproducible interactive analysis and graphics of microbiome census data.” PLoS ONE, 8(4), e61217.
 
+- [TreeSummarizedExperiment](https://doi.org/10.12688/f1000research.26669.2)
+
+  > Huang R, Soneson C, Ernst FGM et al. TreeSummarizedExperiment: a S4 class for data with hierarchical structure [version 2; peer review: 3 approved]. F1000Research 2021, 9:1246.
+
 ### Non-default tools
 
 - [ITSx](https://besjournals.onlinelibrary.wiley.com/doi/10.1111/2041-210X.12073)

diff --git a/assets/report_template.Rmd b/assets/report_template.Rmd
@@ -107,6 +107,7 @@ params:
     picrust_pathways: FALSE
     sbdi: FALSE
     phyloseq: FALSE
+    tse: FALSE
 ---
 
 <!-- Load libraries -->
@@ -1615,19 +1616,37 @@ but if you run nf-core/ampliseq with a sample metadata table (`--metadata`) any
 "))
 ```
 
-<!-- Section on PHYLOSEQ results -->
+<!-- Section on exported R objects -->
 
-```{r, eval = !isFALSE(params$phyloseq), results='asis'}
+```{r, results='asis'}
+any_robject <- !isFALSE(params$phyloseq) || !isFALSE(params$tse)
+```
+
+```{r, eval = !isFALSE(any_robject), results='asis'}
 cat(paste0("
-# Phyloseq
+# R objects
 
-[Phyloseq](https://doi.org/10.1371/journal.pone.0061217)
-is a popular R package to analyse and visualize microbiom data.
-The produced RDS files contain phyloseq objects and can be loaded directely into R and phyloseq.
+Microbiome data can be analysed and visualized with certain R packages. For convenience, R objects in RDS format are provided.
+"))
+
+if ( !isFALSE(params$phyloseq) ) {
+    cat(paste0("
+[Phyloseq](https://doi.org/10.1371/journal.pone.0061217) objects and can be loaded directely into R with package 'phyloseq'.
 The objects contain an ASV abundance table and a taxonomy table.
 If available, metadata and phylogenetic tree will also be included in the phyloseq object.
 The files can be found in folder [phyloseq](../phyloseq/).
-"))
+    "))
+}
+
+if ( !isFALSE(params$tse) ) {
+    cat(paste0("
+[TreeSummarizedExperiment](https://doi.org/10.12688/f1000research.26669.2) (TreeSE, TSE)
+objects can be loaded into R with package 'TreeSummarizedExperiment'. and contain an ASV abundance table,
+a taxonomy table, and sequences.
+If available, metadata and phylogenetic tree will also be included in the object.
+The files can be found in folder [treesummarizedexperiment](../treesummarizedexperiment/).
+    "))
+}
 ```
 
 <!-- Section on methods -->

diff --git a/conf/modules.config b/conf/modules.config
@@ -1057,6 +1057,15 @@ process {
             pattern: "*.rds"
         ]
     }
+
+    withName: TREESUMMARIZEDEXPERIMENT {
+        publishDir = [
+            path: { "${params.outdir}/treesummarizedexperiment" },
+            mode: params.publish_dir_mode,
+            pattern: "*.rds"
+        ]
+    }
+
     withName: 'MULTIQC' {
         ext.args   = { params.multiqc_title ? "--title \"$params.multiqc_title\"" : '' }
         publishDir = [

diff --git a/conf/test_multi.config b/conf/test_multi.config
@@ -30,4 +30,7 @@ params {
     dada_ref_taxonomy = "rdp=18"
     skip_dada_addspecies = true
     input = params.pipelines_testdata_base_path + "ampliseq/samplesheets/Samplesheet_multi.tsv"
+
+    skip_phyloseq = true
+    skip_tse = true
 }
diff --git a/conf/test_pacbio_its.config b/conf/test_pacbio_its.config
@@ -37,4 +37,6 @@ params {
 
     // Prevent default taxonomic classification
     skip_dada_taxonomy = true
+
+    skip_phyloseq = true
 }
diff --git a/conf/test_reftaxcustom.config b/conf/test_reftaxcustom.config
@@ -37,4 +37,6 @@ params {
 
     // Skip downstream analysis with QIIME2
     skip_qiime_downstream = true
+
+    skip_tse = true
 }
diff --git a/docs/output.md b/docs/output.md
@@ -46,7 +46,7 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d
   - [Differential abundance analysis](#differential-abundance-analysis) - Calling differentially abundant features with ANCOM or ANCOM-BC
 - [PICRUSt2](#picrust2) - Predict the functional potential of a bacterial community
 - [SBDI export](#sbdi-export) - Swedish Biodiversity Infrastructure (SBDI) submission file
-- [Phyloseq](#phyloseq) - Phyloseq R objects
+- [R object](#r-objects) - Phyloseq and TreeSummarizedExperiment R objects
 - [Read count report](#read-count-report) - Report of read counts during various steps of the pipeline
 - [Pipeline information](#pipeline-information) - Report metrics generated during the workflow execution
 
@@ -629,15 +629,17 @@ Most of the fields in the template will not be populated by the export process,
 
 </details>
 
-### Phyloseq
+### R objects
 
-This directory will hold phyloseq objects for each taxonomy table produced by this pipeline. The objects will contain an ASV abundance table and a taxonomy table. If the pipeline is provided with metadata, that metadata will also be included in the phyloseq object. A phylogenetic tree will also be included if the pipeline produces a tree.
+Pipeline results are stored in phyloseq and TreeSummarizedExperiment R objects for each taxonomy table produced by this pipeline. The R objects will contain an ASV abundance table and a taxonomy table, and optionally sequences, metadata and a phylogenetic tree.
 
 <details markdown="1">
 <summary>Output files</summary>
 
 - `phyloseq/`
   - `<taxonomy>_phyloseq.rds`: Phyloseq R object.
+- `treesummarizedexperiment/`
+  - `<taxonomy>_TreeSummarizedExperiment.rds`: TreeSummarizedExperiment R object.
 
 </details>
 

diff --git a/modules/local/summary_report.nf b/modules/local/summary_report.nf
@@ -58,6 +58,7 @@ process SUMMARY_REPORT  {
     path(picrust_pathways)
     path(sbdi, stageAs: 'sbdi/*')
     path(phyloseq, stageAs: 'phyloseq/*')
+    path(tse, stageAs: 'tse/*')
 
     output:
     path "*.svg"               , emit: svg, optional: true
@@ -137,6 +138,7 @@ process SUMMARY_REPORT  {
         ancombc_formula ? "ancombc_formula='"+ ancombc_formula.join(",") +"'" : "",
         sbdi ? "sbdi='"+ sbdi.join(",") +"'" : "",
         phyloseq ? "phyloseq='"+ phyloseq.join(",") +"'" : "",
+        tse ? "tse='"+ tse.join(",") +"'" : "",
     ]
     // groovy list to R named list string; findAll removes empty entries
     params_list_named_string = params_list_named.findAll().join(',').trim()

diff --git a/modules/local/treesummarizedexperiment.nf b/modules/local/treesummarizedexperiment.nf
@@ -0,0 +1,82 @@
+process TREESUMMARIZEDEXPERIMENT {
+    tag "$prefix"
+    label 'process_low'
+
+    conda "bioconda::bioconductor-treesummarizedexperiment=2.10.0"
+    container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
+        'https://depot.galaxyproject.org/singularity/bioconductor-treesummarizedexperiment:2.10.0--r43hdfd78af_0' :
+        'biocontainers/bioconductor-treesummarizedexperiment:2.10.0--r43hdfd78af_0' }"
+
+    input:
+    tuple val(prefix), path(tax_tsv), path(otu_tsv)
+    path sam_tsv
+    path tree
+
+    output:
+    tuple val(prefix), path("*TreeSummarizedExperiment.rds"), emit: rds
+    path "versions.yml"                                     , emit: versions
+
+    when:
+    task.ext.when == null || task.ext.when
+
+    script:
+    def sam_tsv = "\"${sam_tsv}\""
+    def otu_tsv = "\"${otu_tsv}\""
+    def tax_tsv = "\"${tax_tsv}\""
+    def tree    = "\"${tree}\""
+    def prefix  = "\"${prefix}\""
+    """
+    #!/usr/bin/env Rscript
+
+    suppressPackageStartupMessages(library(TreeSummarizedExperiment))
+
+    # Read otu table. It must be in a SimpleList as a matrix where rows
+    # represent taxa and columns samples.
+    otu_mat  <- read.table($otu_tsv, sep="\\t", header=TRUE, row.names=1)
+    otu_mat <- as.matrix(otu_mat)
+    assays <- SimpleList(counts = otu_mat)
+    # Read taxonomy table. Correct format for it is DataFrame.
+    taxonomy_table  <- read.table($tax_tsv, sep="\\t", header=TRUE, row.names=1)
+    taxonomy_table <- DataFrame(taxonomy_table)
+
+    # Match rownames between taxonomy table and abundance matrix.
+    taxonomy_table <- taxonomy_table[match(rownames(otu_mat), rownames(taxonomy_table)), ]
+
+    # Create TreeSE object.
+    tse <- TreeSummarizedExperiment(
+        assays = assays,
+        rowData = taxonomy_table
+    )
+
+    # If taxonomy table contains sequences, move them to referenceSeq slot
+    if (!is.null(rowData(tse)[["sequence"]])) {
+        referenceSeq(tse) <- DNAStringSet( rowData(tse)[["sequence"]] )
+        rowData(tse)[["sequence"]] <- NULL
+    }
+
+    # If provided, we add sample metadata as DataFrame object. rownames of
+    # sample metadata must match with colnames of abundance matrix.
+    if (file.exists($sam_tsv)) {
+        sample_meta  <- read.table($sam_tsv, sep="\\t", header=TRUE, row.names=1)
+        sample_meta <- sample_meta[match(colnames(tse), rownames(sample_meta)), ]
+        sample_meta  <- DataFrame(sample_meta)
+        colData(tse) <- sample_meta
+    }
+
+    # If provided, we add phylogeny. The rownames in abundance matrix must match
+    # with node labels in phylogeny.
+    if (file.exists($tree)) {
+        phylogeny <- ape::read.tree($tree)
+        rowTree(tse) <- phylogeny
+    }
+
+    saveRDS(tse, file = paste0($prefix, "_TreeSummarizedExperiment.rds"))
+
+    # Version information
+    writeLines(c("\\"${task.process}\\":",
+        paste0("    R: ", paste0(R.Version()[c("major","minor")], collapse = ".")),
+        paste0("    TreeSummarizedExperiment: ", packageVersion("TreeSummarizedExperiment"))),
+        "versions.yml"
+    )
+    """
+}
diff --git a/nextflow.config b/nextflow.config
@@ -105,6 +105,8 @@ params {
     skip_dada_addspecies   = false
     skip_alpha_rarefaction = false
     skip_diversity_indices = false
+    skip_phyloseq          = false
+    skip_tse               = false
     skip_multiqc           = false
     skip_report            = false
 

diff --git a/nextflow_schema.json b/nextflow_schema.json
@@ -790,6 +790,14 @@
                     "type": "boolean",
                     "description": "Skip alpha and beta diversity analysis"
                 },
+                "skip_phyloseq": {
+                    "type": "boolean",
+                    "description": "Skip exporting phyloseq rds object(s)"
+                },
+                "skip_tse": {
+                    "type": "boolean",
+                    "description": "Skip exporting TreeSummarizedExperiment rds object(s)"
+                },
                 "skip_multiqc": {
                     "type": "boolean",
                     "description": "Skip MultiQC reporting"

diff --git a/subworkflows/local/phyloseq_workflow.nf b/subworkflows/local/phyloseq_workflow.nf
diff --git a/subworkflows/local/qiime2_diversity.nf b/subworkflows/local/qiime2_diversity.nf
@@ -27,7 +27,8 @@ workflow QIIME2_DIVERSITY {
     ch_versions_qiime2_diversity = Channel.empty()
 
     //Phylogenetic tree for beta & alpha diversities
-    if (!ch_tree) {
+    produce_tree = !ch_tree ? true : false
+    if (produce_tree) {
         QIIME2_TREE ( ch_seq )
         ch_versions_qiime2_diversity = ch_versions_qiime2_diversity.mix(QIIME2_TREE.out.versions)
         ch_tree = QIIME2_TREE.out.qza
@@ -82,6 +83,8 @@ workflow QIIME2_DIVERSITY {
     }
 
     emit:
+    tree_qza = ch_tree
+    tree_nwk = produce_tree ? QIIME2_TREE.out.nwk : []
     depth    = !skip_diversity_indices ? QIIME2_DIVERSITY_CORE.out.depth : []
     alpha    = !skip_diversity_indices ? QIIME2_DIVERSITY_ALPHA.out.alpha : []
     beta     = !skip_diversity_indices ? QIIME2_DIVERSITY_BETA.out.beta : []

diff --git a/subworkflows/local/robject_workflow.nf b/subworkflows/local/robject_workflow.nf
@@ -0,0 +1,50 @@
+/*
+ * Create phyloseq objects
+ */
+
+include { PHYLOSEQ                                } from '../../modules/local/phyloseq'
+include { PHYLOSEQ_INASV                          } from '../../modules/local/phyloseq_inasv'
+include { TREESUMMARIZEDEXPERIMENT                } from '../../modules/local/treesummarizedexperiment'
+
+workflow ROBJECT_WORKFLOW {
+    take:
+    ch_tax
+    ch_tsv
+    ch_meta
+    ch_robject_intree
+    run_qiime2
+
+    main:
+    ch_versions_robject_workflow = Channel.empty()
+
+    if ( params.metadata ) {
+        ch_robject_inmeta = ch_meta.first() // The .first() is to make sure it's a value channel
+    } else {
+        ch_robject_inmeta = []
+    }
+
+    if ( run_qiime2 ) {
+        if ( params.exclude_taxa != "none" || params.min_frequency != 1 || params.min_samples != 1 ) {
+            ch_robject_inasv = PHYLOSEQ_INASV ( ch_tsv ).tsv
+        } else {
+            ch_robject_inasv = ch_tsv
+        }
+    } else {
+        ch_robject_inasv = ch_tsv
+    }
+
+    if ( !params.skip_phyloseq ) {
+        PHYLOSEQ ( ch_tax.combine(ch_robject_inasv), ch_robject_inmeta, ch_robject_intree )
+        ch_versions_robject_workflow = ch_versions_robject_workflow.mix(PHYLOSEQ.out.versions)
+    }
+
+    if ( !params.skip_tse ) {
+        TREESUMMARIZEDEXPERIMENT ( ch_tax.combine(ch_robject_inasv), ch_robject_inmeta, ch_robject_intree )
+        ch_versions_robject_workflow = ch_versions_robject_workflow.mix(TREESUMMARIZEDEXPERIMENT.out.versions)
+    }
+
+    emit:
+    phyloseq = !params.skip_phyloseq ? PHYLOSEQ.out.rds : []
+    tse      = !params.skip_tse ? TREESUMMARIZEDEXPERIMENT.out.rds : []
+    versions = ch_versions_robject_workflow
+}
diff --git a/tests/pipeline/doubleprimers.nf.test b/tests/pipeline/doubleprimers.nf.test
@@ -32,6 +32,7 @@ nextflow_pipeline {
                 { assert snapshot(path("$outputDir/input/Samplesheet_double_primer.tsv")).match("input") },
                 { assert new File("$outputDir/qiime2/abundance_tables/feature-table.tsv").exists() },
                 { assert new File("$outputDir/phyloseq/kraken2_phyloseq.rds").exists() },
+                { assert new File("$outputDir/treesummarizedexperiment/kraken2_TreeSummarizedExperiment.rds").exists() },
                 { assert snapshot(path("$outputDir/kraken2/ASV_tax.greengenes.kraken2.classifiedreads.txt"),
                                 path("$outputDir/kraken2/ASV_tax.greengenes.kraken2.complete.tsv"),
                                 path("$outputDir/kraken2/ASV_tax.greengenes.kraken2.tsv")).match("kraken2") },

diff --git a/tests/pipeline/failed.nf.test b/tests/pipeline/failed.nf.test
@@ -29,7 +29,8 @@ nextflow_pipeline {
                 { assert new File("$outputDir/qiime2/diversity/alpha_diversity/shannon_vector/kruskal-wallis-pairwise-treatment1.csv").exists() },
                 { assert new File("$outputDir/qiime2/diversity/beta_diversity/bray_curtis_pcoa_results-PCoA/index.html").exists() },
                 { assert new File("$outputDir/summary_report/summary_report.html").exists() },
-                { assert new File("$outputDir/phyloseq/dada2_phyloseq.rds").exists() }
+                { assert new File("$outputDir/phyloseq/dada2_phyloseq.rds").exists() },
+                { assert new File("$outputDir/treesummarizedexperiment/dada2_TreeSummarizedExperiment.rds").exists() }
             )
         }
     }