Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Export TreeSummarizedExperiment R object #807

Open
wants to merge 15 commits into
base: dev
Choose a base branch
from

Conversation

d4straub
Copy link
Collaborator

@d4straub d4straub commented Nov 25, 2024

This solves #802 and is based on #805

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the pipeline conventions in the contribution docs
  • If necessary, also make a PR on the nf-core/ampliseq branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core pipelines lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

@TuomasBorman
Copy link

Cool!

In SummarizedExperiment:

  • row names of abundance matrix must match with row names of taxonomy table
  • column names of abundance matrix must match with row names of sample metadata table

This error occurs if those names do not match, they must be exactly the same (same order also). I can check this week.

If the names do not match because of different order, it is simple to sort the data. If the naming scheme differs between abundance matrix versus taxonomy table and sample metadata, it is also easy to fix but then we just have to ensure that the order is the same despite the names.

@d4straub
Copy link
Collaborator Author

Dear @TuomasBorman , the pipeline runs successfully for me locally with singularity, however CI tests with docker fail with
docker: invalid reference format: repository name (bioconductor-treesummarizedexperiment%3A2.10.0--r43hdfd78af_0) must be lowercase.
Will need to be fixed somehow, most likely there are solutions somewhere in the nf-core slack solutions. Will search for this another time.

@d4straub
Copy link
Collaborator Author

d4straub commented Nov 26, 2024

There seems to be an issue with read_tree, see error with -profile test_pplace:

     Error in read_tree("test_pplace.graft.test_pplace.epa_result.newick") : 
      could not find function "read_tree"

@TuomasBorman
Copy link

There seems to be an issue with read_tree, see error with -profile test_pplace:

     Error in read_tree("test_pplace.graft.test_pplace.epa_result.newick") : 
      could not find function "read_tree"

read_tree() comes from phyloseq package. However, TreeSE does not have dedicated function for loading the tree, instead ape::read.tree() is commonly used.

@d4straub
Copy link
Collaborator Author

@TuomasBorman would you be so kind and check whether the four rds files are fine, i.e. contain what you would expect? Two files produced with -profile test, two with -profile test_pplace.
tse_output.zip

@TuomasBorman
Copy link

test/dada2_TreeSummarizedExperiment.rds

  • Does not have phylogeny; should it have?
  • Taxonomy table has sequences. More convenient place would be to store them to referenceSeq(tse). Worth to note is that also phyloseq has refseq(pseq) slot to store these sequences, but similarly to TreeSE it seems that this is not used in phyloseq.
# If taxonomy table contains sequences, move them to referenceSeq slot
if( !is.null(rowData(tse)[["sequence"]]){
    referenceSeq(tse) <- DNAStringSet(rowData(tse)[["sequence"]])
    rowData(tse)[["sequence"]] <- NULL
}

test/qiime2_TreeSummarizedExperiment.rds

  • Does not have phylogeny; should it have?

test_pplace/pplace_TreeSummarizedExperiment.rds

  • Everything seems to be correct

test_pplace/qiime2_TreeSummarizedExperiment.rds

  • Everything is good

@d4straub
Copy link
Collaborator Author

Great, thanks. All files look as expected. But the phylogeny potentially can still be added as newick tree from QIIME2. I will test that in the next days. Unfortunately I think it will take until next week to finish this PR.

@nf-core-bot
Copy link
Member

Warning

Newer version of the nf-core template is available.

Your pipeline is using an old version of the nf-core template: 3.0.2.
Please update your pipeline to the latest version.

For more documentation on how to update your pipeline, please see the nf-core documentation and Synchronisation documentation.

@d4straub
Copy link
Collaborator Author

So, finally some progress. @TuomasBorman Could you have a look at the two objects? They should contain now sequences & pylogenetic tree. Also, is the addition inCITATIONS.md fine?
tse_output.zip

@d4straub d4straub mentioned this pull request Dec 20, 2024
11 tasks
@d4straub d4straub marked this pull request as ready for review December 20, 2024 15:47
@TuomasBorman
Copy link

DADA2 object have reference sequences and both have phylogeny correctly defined. Citation for TreeSE is also correct.

--> looks good

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants