diff --git a/_config.yml b/_config.yml index f52482b5c6b2f8..e82e388b154aec 100644 --- a/_config.yml +++ b/_config.yml @@ -92,6 +92,7 @@ icon-tag: cofest: fas fa-users comment: far fa-comment-dots congratulations: far fa-thumbs-up + copy: far fa-copy curriculum: fas fa-graduation-cap details: fas fa-info-circle docker_image: fab fa-docker diff --git a/_plugins/jekyll-figurify.rb b/_plugins/jekyll-figurify.rb index 6cef94b0ec5746..711bab1e51d31b 100644 --- a/_plugins/jekyll-figurify.rb +++ b/_plugins/jekyll-figurify.rb @@ -79,9 +79,9 @@ def figurify(page, site) image = insert_image(url, alt, style, dimensions, actual_path) %( -
+
#{image} - Open image in new tab + Open image in new tab

#{prefix}#{num_figure}: #{title}
diff --git a/assets/css/main.scss b/assets/css/main.scss index 7c88abe8391c4d..ba69d168ffade7 100644 --- a/assets/css/main.scss +++ b/assets/css/main.scss @@ -362,7 +362,9 @@ div.main-content { figure { text-align: center; - margin: 1rem 2rem; + margin: 2rem; + border: thin silver solid; + padding: 1rem; & > img { margin-bottom: 1rem; @@ -374,6 +376,11 @@ figure { } } +@media (max-width:992px) { + figure { + margin: 1rem; + } +} :not(pre) > code { color: var(--code-foreground); @@ -1665,7 +1672,7 @@ figure > a[target="_blank"]::after { justify-content: space-between; align-items: flex-end; row-gap: 1em; - + figure { max-width: 20em; margin: 0rem 0.5rem; diff --git a/faqs/galaxy/collections_build_list.md b/faqs/galaxy/collections_build_list.md index 1456216e97c03d..6d0219133da1cc 100644 --- a/faqs/galaxy/collections_build_list.md +++ b/faqs/galaxy/collections_build_list.md @@ -15,3 +15,5 @@ contributors: [shiltemann, hexylena] * Enter a name for your collection * Click **Create List** to build your collection * Click on the checkmark icon at the top of your history again + +![Creating a simple collection]({{site.baseurl}}/faqs/galaxy/images/create_simple_list.png "Creating a simple (list) collection in Galaxy's history") diff --git a/faqs/galaxy/datasets_upload_from_genomeark.md b/faqs/galaxy/datasets_upload_from_genomeark.md new file mode 100644 index 00000000000000..3620e43aa60334 --- /dev/null +++ b/faqs/galaxy/datasets_upload_from_genomeark.md @@ -0,0 +1,13 @@ +--- +title: Upload datasets from GenomeArk +area: data upload +box_type: tip +layout: faq +contributors: [nekrut] +--- + +1. Open the file {% icon galaxy-upload %} __upload__ menu +2. Click on **Choose remote files** tab +3. Click on the **Genome Ark** button and then click on **species** + +You can find the data by following this path: `/species/${Genus}_${species}/${specimen_code}/genomic_data`. 
Inside a given datatype directory (*e.g.* `pacbio`), select all the relevant files individually until all the desired files are highlighted and click the Ok button. Note that there may be multiple pages of files listed. Also note that you may not want every file listed. \ No newline at end of file diff --git a/faqs/galaxy/images/create_simple_list.png b/faqs/galaxy/images/create_simple_list.png new file mode 100644 index 00000000000000..ababdd571cfc0a Binary files /dev/null and b/faqs/galaxy/images/create_simple_list.png differ diff --git a/faqs/galaxy/images/upload_fasta_via_url.png b/faqs/galaxy/images/upload_fasta_via_url.png new file mode 100644 index 00000000000000..2d33c30a99aa48 Binary files /dev/null and b/faqs/galaxy/images/upload_fasta_via_url.png differ diff --git a/faqs/galaxy/images/upload_fastqsanger_via_url.png b/faqs/galaxy/images/upload_fastqsanger_via_url.png new file mode 100644 index 00000000000000..9b33a97f6c501b Binary files /dev/null and b/faqs/galaxy/images/upload_fastqsanger_via_url.png differ diff --git a/faqs/galaxy/workflows_import.md b/faqs/galaxy/workflows_import.md index 4e53494a248de8..b1486809d1b875 100644 --- a/faqs/galaxy/workflows_import.md +++ b/faqs/galaxy/workflows_import.md @@ -12,3 +12,8 @@ contributors: [shiltemann,mblue9,hexylena] - Option 1: Paste the URL of the workflow into the box labelled *"Archived Workflow URL"* - Option 2: Upload the workflow file in the box labelled *"Archived Workflow File"* - Click the **Import workflow** button + +Below is a short video demonstrating how to import a workflow from GitHub using this procedure: + +

+ diff --git a/faqs/galaxy/workflows_import_from_dockstore.md b/faqs/galaxy/workflows_import_from_dockstore.md new file mode 100644 index 00000000000000..7cd6fcf5ed02fb --- /dev/null +++ b/faqs/galaxy/workflows_import_from_dockstore.md @@ -0,0 +1,26 @@ +--- +title: Import workflows from Dockstore +area: workflows +box_type: tip +layout: faq +contributors: [nekrut] +--- + +[Dockstore](https://dockstore.org/) is a free and open source platform for sharing reusable and scalable analytical tools and workflows. + +1. Go to [Dockstore](https://dockstore.org). +2. Select any Galaxy workflow you want to import. +3. Click on the "Galaxy" dropdown within the "Launch with" panel located in the upper right corner. +4. Select the Galaxy instance you want to launch this workflow with. +5. You will be redirected to Galaxy and presented with a list of workflow versions. +6. Click the version you want (usually the latest, labelled "main"). +7. You are done! + +> Make sure you are logged in! +> Ensure that you are logged in to your Galaxy account! +{: .warning} + +The following short video walks you through this uncomplicated procedure: + +

+{: .hands_on} diff --git a/faqs/galaxy/workflows_import_from_workflowhub.md b/faqs/galaxy/workflows_import_from_workflowhub.md index cff7399755e08b..4e671175f2d89f 100644 --- a/faqs/galaxy/workflows_import_from_workflowhub.md +++ b/faqs/galaxy/workflows_import_from_workflowhub.md @@ -3,21 +3,25 @@ title: Import workflows from WorkflowHub area: workflows box_type: tip layout: faq -contributors: [gallardoalba, abueg] +contributors: [gallardoalba, abueg, nekrut] --- [WorkflowHub](https://workflowhub.eu/) is a workflow management system which allows workflows to be FAIR (Findable, Accessible, Interoperable, and Reusable), citable, have managed metadata profiles, and be openly available for review and analytics. -> Import a workflow -> -> 1. Click on the **Workflow** menu, located in the top bar. -> 2. Click on the **Import** button, located in the right corner. -> 3. In the section "Import a Workflow from Configured GA4GH Tool Registry Servers (e.g. Dockstore)", click on **Search form**. -> -> 4. In the **TRS Server: *workflowhub.eu*** menu you should type {% if include.filter %}`{{ include.filter }}`{% else %}your query.{% endif %} -> ![galaxy TRS workflow search field, name:vgp is entered in the search bar, and five different workflows all labelled VGP are listed]({% link topics/assembly/images/vgp_assembly/workflow_list.png %}) -> 5. Click on the desired workflow, and finally select the latest available version. -{: .hands_on} +> Make sure you are logged in! +> Ensure that you are logged in to your Galaxy account! +{: .warning} + +1. Click on the **Workflow** menu, located in the top bar. +2. Click on the **Import** button, located in the right corner. +3. In the section "Import a Workflow from Configured GA4GH Tool Registry Servers (e.g. Dockstore)", click on **Search form**. +4. 
In the **TRS Server: *workflowhub.eu*** menu you should type {% if include.filter %}`{{ include.filter }}`{% else %}your query.{% endif %} + ![galaxy TRS workflow search field, name:vgp is entered in the search bar, and five different workflows all labelled VGP are listed]({% link topics/assembly/images/vgp_assembly/workflow_list.png %}) +5. Click on the desired workflow, and finally select the latest available version. +After that, the imported workflows will appear in the main workflow menu. To run a workflow, just click on the {% icon workflow-run %} **Run workflow** icon. -After that, the imported workflows will appear in the main workflow menu. In order to initialize the workflow, we just need to click in the {% icon workflow-run %} **Run workflow** icon. +Below is a short video showing this uncomplicated procedure: + +

+{: .hands_on} diff --git a/topics/assembly/images/vgp_assembly/BUSCO_full_table.png b/topics/assembly/images/vgp_assembly/BUSCO_full_table.png index e76bd2d66fec30..6a0a68cebf9ae3 100644 Binary files a/topics/assembly/images/vgp_assembly/BUSCO_full_table.png and b/topics/assembly/images/vgp_assembly/BUSCO_full_table.png differ diff --git a/topics/assembly/images/vgp_assembly/VGP_workflow_modules.svg b/topics/assembly/images/vgp_assembly/VGP_workflow_modules.svg new file mode 100644 index 00000000000000..a1c3579b44147d --- /dev/null +++ b/topics/assembly/images/vgp_assembly/VGP_workflow_modules.svg @@ -0,0 +1 @@ + \ No newline at end of file diff --git a/topics/assembly/images/vgp_assembly/after_upload.png b/topics/assembly/images/vgp_assembly/after_upload.png new file mode 100644 index 00000000000000..7e6d793f36c39e Binary files /dev/null and b/topics/assembly/images/vgp_assembly/after_upload.png differ diff --git a/topics/assembly/images/vgp_assembly/busco_after_contiging.svg b/topics/assembly/images/vgp_assembly/busco_after_contiging.svg new file mode 100644 index 00000000000000..3f834522286084 --- /dev/null +++ b/topics/assembly/images/vgp_assembly/busco_after_contiging.svg @@ -0,0 +1 @@ + \ No newline at end of file diff --git a/topics/assembly/images/vgp_assembly/genomescope_plot.png b/topics/assembly/images/vgp_assembly/genomescope_plot.png index 223ef0bd962649..e2f25848878d28 100644 Binary files a/topics/assembly/images/vgp_assembly/genomescope_plot.png and b/topics/assembly/images/vgp_assembly/genomescope_plot.png differ diff --git a/topics/assembly/images/vgp_assembly/hi-c_pretext_conclusion.svg b/topics/assembly/images/vgp_assembly/hi-c_pretext_conclusion.svg new file mode 100644 index 00000000000000..a0d375f090b46d --- /dev/null +++ b/topics/assembly/images/vgp_assembly/hi-c_pretext_conclusion.svg @@ -0,0 +1 @@ + \ No newline at end of file diff --git a/topics/assembly/images/vgp_assembly/hi-c_pretext_final.svg 
b/topics/assembly/images/vgp_assembly/hi-c_pretext_final.svg new file mode 100644 index 00000000000000..f4eb73dbd8ac66 --- /dev/null +++ b/topics/assembly/images/vgp_assembly/hi-c_pretext_final.svg @@ -0,0 +1 @@ + \ No newline at end of file diff --git a/topics/assembly/images/vgp_assembly/imported_workflows.png b/topics/assembly/images/vgp_assembly/imported_workflows.png index 5930f6e8c47d59..7fc53bde582197 100644 Binary files a/topics/assembly/images/vgp_assembly/imported_workflows.png and b/topics/assembly/images/vgp_assembly/imported_workflows.png differ diff --git a/topics/assembly/images/vgp_assembly/importing_via_url_vgp_specific.png b/topics/assembly/images/vgp_assembly/importing_via_url_vgp_specific.png new file mode 100644 index 00000000000000..350130f42e00a6 Binary files /dev/null and b/topics/assembly/images/vgp_assembly/importing_via_url_vgp_specific.png differ diff --git a/topics/assembly/images/vgp_assembly/making_list.svg b/topics/assembly/images/vgp_assembly/making_list.svg new file mode 100644 index 00000000000000..7f39334687c8b8 --- /dev/null +++ b/topics/assembly/images/vgp_assembly/making_list.svg @@ -0,0 +1 @@ + \ No newline at end of file diff --git a/topics/assembly/images/vgp_assembly/merqury_cn_after_purging.svg b/topics/assembly/images/vgp_assembly/merqury_cn_after_purging.svg new file mode 100644 index 00000000000000..cb968775624087 --- /dev/null +++ b/topics/assembly/images/vgp_assembly/merqury_cn_after_purging.svg @@ -0,0 +1 @@ + \ No newline at end of file diff --git a/topics/assembly/images/vgp_assembly/quast_plot.png b/topics/assembly/images/vgp_assembly/quast_plot.png new file mode 100644 index 00000000000000..72318f26d9f1fc Binary files /dev/null and b/topics/assembly/images/vgp_assembly/quast_plot.png differ diff --git a/topics/assembly/images/vgp_assembly/vgp_wfs.svg b/topics/assembly/images/vgp_assembly/vgp_wfs.svg new file mode 100644 index 00000000000000..db867c01a5d793 --- /dev/null +++ 
b/topics/assembly/images/vgp_assembly/vgp_wfs.svg @@ -0,0 +1 @@ + \ No newline at end of file diff --git a/topics/assembly/images/vgp_assembly/wf1_launch_ui.png b/topics/assembly/images/vgp_assembly/wf1_launch_ui.png new file mode 100644 index 00000000000000..e35de8abd70030 Binary files /dev/null and b/topics/assembly/images/vgp_assembly/wf1_launch_ui.png differ diff --git a/topics/assembly/images/vgp_assembly/yeast_c_merqury_cn.svg b/topics/assembly/images/vgp_assembly/yeast_c_merqury_cn.svg new file mode 100644 index 00000000000000..9a01cf16010151 --- /dev/null +++ b/topics/assembly/images/vgp_assembly/yeast_c_merqury_cn.svg @@ -0,0 +1 @@ + \ No newline at end of file diff --git a/topics/assembly/tutorials/vgp_genome_assembly/faqs/dataset_upload_fasta_via_urls.md b/topics/assembly/tutorials/vgp_genome_assembly/faqs/dataset_upload_fasta_via_urls.md new file mode 100644 index 00000000000000..d1cc442fb4da11 --- /dev/null +++ b/topics/assembly/tutorials/vgp_genome_assembly/faqs/dataset_upload_fasta_via_urls.md @@ -0,0 +1,13 @@ +--- +title: Upload fasta datasets via links +area: data upload +box_type: tip +layout: faq +contributors: [nekrut] +--- + +Uploading `fasta` or `fasta.gz` datasets via URL. + +![UploadAnimatedPng]({{site.baseurl}}/faqs/galaxy/images/upload_fasta_via_url.png) + + diff --git a/faqs/galaxy/dataset_upload_fastqsanger_via_urls.md b/topics/assembly/tutorials/vgp_genome_assembly/faqs/dataset_upload_fastqsanger_via_urls.md similarity index 65% rename from faqs/galaxy/dataset_upload_fastqsanger_via_urls.md rename to topics/assembly/tutorials/vgp_genome_assembly/faqs/dataset_upload_fastqsanger_via_urls.md index 462a5c034e3761..5a9d23656f6948 100644 --- a/faqs/galaxy/dataset_upload_fastqsanger_via_urls.md +++ b/topics/assembly/tutorials/vgp_genome_assembly/faqs/dataset_upload_fastqsanger_via_urls.md @@ -6,6 +6,7 @@ layout: faq contributors: [nekrut] --- +Uploading `fastqsanger` or `fastqsanger.gz` datasets via URL. 1. 
Click on **Upload Data** on the top of the left panel: @@ -24,3 +25,10 @@ contributors: [nekrut] ![ChangeTypeDropDown]({{site.baseurl}}/faqs/galaxy/images/paste_fetch_set_data_type.png): +> Danger: Make sure you choose the correct format! +> When selecting the datatype in the "**Type (set all)**" dropdown, make sure you select `fastqsanger` or `fastqsanger.gz` BUT NOT `fastqcssanger` or anything else! +{: .warning} + +![UploadAnimatedPng]({{site.baseurl}}/faqs/galaxy/images/upload_fastqsanger_via_url.png) + + diff --git a/topics/assembly/tutorials/vgp_genome_assembly/tutorial.md b/topics/assembly/tutorials/vgp_genome_assembly/tutorial.md index 4923c72f840691..a9559c58bb5bf1 100644 --- a/topics/assembly/tutorials/vgp_genome_assembly/tutorial.md +++ b/topics/assembly/tutorials/vgp_genome_assembly/tutorial.md @@ -26,6 +26,7 @@ contributors: - pickettbd - gf777 - msozzoni +- nekrut abbreviations: BUSCO: Benchmarking Universal Single-Copy Orthologs NGS: next generation sequencing @@ -51,7 +52,7 @@ Repetitive elements can be grouped into two categories: interspersed repeats, su Heterozygosity is also an important factor impacting genome assembly. Haplotype phasing, the identification of alleles that are co-located on the same chromosome, has become a fundamental problem in heterozygous and polyploid genome assemblies ({% cite Zhang2020 %}). When no reference sequence is available, the *state-of-the-art* strategy consists of constructing a string graph with vertices representing reads and edges representing consistent overlaps. In this kind of graph, after transitive reduction, heterozygous alleles in the string graph are represented by bubbles. When combined with {Hi-C} data, this approach allows complete diploid reconstruction ({% cite DominguezDelAngel2018 %}, {% cite Zhang2020 %}, {% cite Dida2021 %}). 
-The {G10K} launched the Vertebrate Genome Project ({VGP}), whose goal is generating high-quality, near-error-free, gap-free, chromosome-level, haplotype-phased, annotated reference genome assemblies for every vertebrate species ({% cite Rhie2021 %}). This tutorial will guide you step by step to assemble a high-quality genome using the VGP assembly pipeline, including multiple {QC} evaluations. +The {G10K} launched the Vertebrate Genome Project ({VGP}), whose goal is generating high-quality, near-error-free, gap-free, chromosome-level, haplotype-phased, annotated reference genome assemblies for every vertebrate species ({% cite Rhie2021 %}). This tutorial will guide you step by step to assemble a high-quality genome using the VGP assembly pipeline, including multiple {QC} evaluations. > Your results may differ! > @@ -72,7 +73,7 @@ The {G10K} launched the Vertebrate Genome Project ({VGP}), whose goal is generat # Important terms to know -Before getting into the thick of things, let's go over some terms you will often hear when learning about genome assembly. These concepts will be used often throughout this tutorial as well, so please refer to this section as necessary to help your understanding. +Before getting into the thick of things, let's go over some terms you will often hear when learning about genome assembly. These concepts will be used often throughout this tutorial as well, so please refer to this section as necessary to help your understanding. **Pseudohaplotype assembly**: A genome assembly that consists of long phased haplotype blocks separated by regions where the haplotype cannot be distinguished (often homozygous regions). This can result in "switch errors", when the parental haplotypes alternate along the same sequence. These types of assemblies are usually represented by a _primary assembly_ and an _alternate assembly_. (This definition largely taken from the [NCBI's Genome Assembly Model](https://www.ncbi.nlm.nih.gov/assembly/model/#asmb_def).) 
@@ -80,11 +81,11 @@ Before getting into the thick of things, let's go over some terms you will often **Alternate assembly**: The alternate assembly consists of the alternate loci not represented in the _primary assembly_ (heterozygous loci from the other haplotype). These types of sequences are often referred to as haplotigs. Traditionally, the alternate assembly is less complete compared to the primary assembly since homozygous regions are not represented. -**Phasing**: Phasing aims to partition the contigs for an individual according to the haplotype they are derived from. When possible, this is done by identifying parental alleles using read data from the parents. Locally, this is achieved using linkage information in long read datasets. Recent approaches have managed to phase using long-range Hi-C linkage information from the same individual ({% cite Cheng2021 %}). +**Phasing**: Phasing aims to partition the contigs for an individual according to the haplotype they are derived from. When possible, this is done by identifying parental alleles using read data from the parents. Locally, this is achieved using linkage information in long read datasets. Recent approaches have managed to phase using long-range Hi-C linkage information from the same individual ({% cite Cheng2021 %}). **Assembly graph**: A representation of the genome inferred from sequencing reads. Sequencing captures the genome as many fragmented pieces, instead of whole entire chromosomes at once (we eagerly await the day when this statement will be outdated!). The start of the assembly process pieces together these genome fragments to generate an assembly graph, which is a representation of the sequences and their overlaps. Visualizing assembly graphs can show where homozygous regions branch off into alternate paths on different haplotypes. -**Unitig**: Usually the smallest unit of an assembly graph, consistent with all the available sequencing data. 
A unitig is often constructed from an unambiguous path in the assembly graph where all the vertices have exactly one incoming and one outgoing edge, except the first vertex can have any number of incoming edges, while the last vertex can have any number of outgoing edges ({% cite Rahman2022 %}). In other words, the internal vertices in the unitig path can only be walked one way, so unitigs represent a path of confident sequence. In the assembly graph, unitig nodes can then have overlap edges with other unitigs. +**Unitig**: Usually the smallest unit of an assembly graph, consistent with all the available sequencing data. A unitig is often constructed from an unambiguous path in the assembly graph where all the vertices have exactly one incoming and one outgoing edge, except the first vertex can have any number of incoming edges, while the last vertex can have any number of outgoing edges ({% cite Rahman2022 %}). In other words, the internal vertices in the unitig path can only be walked one way, so unitigs represent a path of confident sequence. In the assembly graph, unitig nodes can then have overlap edges with other unitigs. **Contig**: A contiguous (*i.e.*, gapless) sequence in an assembly, usually inferred algorithmically from the unitig graph. @@ -96,9 +97,9 @@ False duplications via **overlaps** result from unresolved overlaps in the assem ![Types of false duplication.](../../images/vgp_assembly/falseduplications.png "Schematic of types of false duplication. Image adapted from {% cite Rhie2021 %}.") -**Purging**: Purging aims to remove false duplications, collapsed repeats, and very low support/coverage regions from an assembly. When performed on a primary assembly, the haplotigs are retained and typically placed in the alternate assembly. +**Purging**: Purging aims to remove false duplications, collapsed repeats, and very low support/coverage regions from an assembly. 
When performed on a primary assembly, the haplotigs are retained and typically placed in the alternate assembly. -**Scaffold**: A scaffold refers to one or more contigs separated by gap (unknown) sequence. Contigs are usually generated with the aid of additional information, such as Bionano optical maps, linked reads, Hi-C chromatin information, etc. The regions between contigs are usually of unknown sequence, thus they are represented by sequences of _N_'s. Gaps length in the sequence can be sized or arbitrary, depending on the technology used for scaffolding (*e.g.*, optical maps can introduce sized gaps). +**Scaffold**: A scaffold refers to one or more contigs separated by gap (unknown) sequence. Contigs are usually generated with the aid of additional information, such as Bionano optical maps, linked reads, Hi-C chromatin information, etc. The regions between contigs are usually of unknown sequence, thus they are represented by sequences of _N_'s. Gaps length in the sequence can be sized or arbitrary, depending on the technology used for scaffolding (*e.g.*, optical maps can introduce sized gaps). For more about the specific scaffolding technologies used in the VGP pipeline (currently Bionano optical maps and Hi-C chromatin conformation data), please refer to those specific sections within this tutorial. @@ -106,13 +107,13 @@ For more about the specific scaffolding technologies used in the VGP pipeline (c **Ultra-long reads**: Ultra-long reads are typically defined as reads of over 100 kbp, and are usually generated using Oxford Nanopore Technology. Read quality is often lower than HiFi or Illumina (*i.e.*, have a higher error rate), but they are often significantly longer than any other current sequencing technology, and can help assembly algorithms walk complex repeat regions in the assembly graphs. 
-**Manual curation**: This term refers to manually evaluating and manipulating an assembly based on the raw supporting evidence (*e.g.*, using Hi-C contact map information). The user takes into account the original sequencing data to resolve potential _misassemblies_ and _missed joins_. +**Manual curation**: This term refers to manually evaluating and manipulating an assembly based on the raw supporting evidence (*e.g.*, using Hi-C contact map information). The user takes into account the original sequencing data to resolve potential _misassemblies_ and _missed joins_. -**Misassembly**: Misassemblies are a type of assembly error that usually refers to any structural error in the genome reconstruction, *.e.g.*, sequences that are not adjacent in the genome being placed next to each other in the sequence. Misassemblies can be potentially identified and remedied by manual curation. +**Misassembly**: Misassemblies are a type of assembly error that usually refers to any structural error in the genome reconstruction, *.e.g.*, sequences that are not adjacent in the genome being placed next to each other in the sequence. Misassemblies can be potentially identified and remedied by manual curation. **Missed join**: A missed join happens when two sequences are adjacent to each other in the genome but are not represented contiguously in the final sequence. Missed joins can be identified and remedied in manual curation with Hi-C data. -**Telomere-to-telomere assembly**: Often abbreviated as "T2T", this term refers to an assembly where each chromosome is completely gapless from telomere to telomere. The term usually refers to the recently completed CHM13 human genome ({% cite Nurk2022 %}), though there is an increasing number of efforts to generate T2T genomes for other species. +**Telomere-to-telomere assembly**: Often abbreviated as "T2T", this term refers to an assembly where each chromosome is completely gapless from telomere to telomere. 
The term usually refers to the recently completed CHM13 human genome ({% cite Nurk2022 %}), though there is an increasing number of efforts to generate T2T genomes for other species. # VGP assembly pipeline overview @@ -168,7 +169,7 @@ The first step is to get the datasets from Zenodo. The VGP assembly pipeline use > - Click `Add Definition` button and select `Type`: column `C` > - Click `Add Definition` button and select `Name Tag`: column `D` > - Click `Apply` and press Upload -> +> > 3. Import the remaining datasets from [Zenodo]({{ page.zenodo_link }}) > > - Open the file {% icon galaxy-upload %} __upload__ menu @@ -193,7 +194,7 @@ The first step is to get the datasets from Zenodo. The VGP assembly pipeline use {: .hands_on} ### HiFi reads preprocessing with **cutadapt** - + Adapter trimming usually means trimming the adapter sequence off the ends of reads, which is where the adapter sequence is usually located in {NGS} reads. However, due to the nature of {SMRT} sequencing technology, adapters do not have a specific, predictable location in {HiFi} reads. Additionally, the reads containing adapter sequence could be of generally lower quality compared to the rest of the reads. Thus, we will use **cutadapt** not to trim, but to remove the entire read if a read is found to have an adapter inside of it. > Background on PacBio HiFi reads @@ -244,8 +245,6 @@ Adapter trimming usually means trimming the adapter sequence off the ends of rea # Genome profile analysis -[{% icon exchange %} Switch to short version]({% link topics/assembly/tutorials/vgp_workflow_training/tutorial.md %}#genome-profile-analysis) - Before starting a *de novo* genome assembly project, it is useful to collect metrics on the properties of the genome under consideration, such as the expected genome size, so that you know what to expect from your assembly. Traditionally, DNA flow cytometry was considered the golden standard for estimating the genome size. 
Nowadays, experimental methods have been replaced by computational approaches ({% cite wang2020estimation %}). One of the widely used genome profiling methods is based on the analysis of *k*-mer frequencies. It allows one to provide information not only about the genomic complexity, such as the genome size and levels of heterozygosity and repeat content, but also about the data quality. > K-mer size, sequencing coverage and genome size @@ -256,11 +255,11 @@ Before starting a *de novo* genome assembly project, it is useful to collect met >---------| -------------|----------------------- > Bases | K-mer size | Total possible k-mers >---------| -------------|----------------------- -> 4 | 1 | 4 -> 4 | 2 | 16 -> 4 | 3 | 64 -> 4 | ... | ... -> 4 | 10 | 1.048.576 +> 4 | 1 | 4 +> 4 | 2 | 16 +> 4 | 3 | 64 +> 4 | ... | ... +> 4 | 10 | 1.048.576 >---------|--------------|----------------------- > > Thus, the k-mer size is a key parameter, which must be large enough to map uniquely to the genome, but not too large, since it can lead to wasting computational resources. In the case of the human genome, *k*-mers of 31 bases in length lead to 96.96% of unique *k*-mers. @@ -299,7 +298,7 @@ Meryl will allow us to generate the *k*-mer profile by decomposing the sequencin > - *"Operations on sets of k-mers"*: `Union-sum: return k-mers that occur in any input, set the count to the sum of the counts` > - {% icon param-file %} *"Input meryldb"*: `Collection meryldb` > -> 4. Rename it as `Merged meryldb` +> 4. Rename it as `Merged meryldb` > > 5. 
Run {% tool [Meryl](toolshed.g2.bx.psu.edu/repos/iuc/meryl/meryl/1.3+galaxy6) %} for the third time with the following parameters: > - *"Operation type selector"*: `Generate histogram dataset` @@ -353,8 +352,6 @@ This distribution is the result of the Poisson process underlying the generation # Assembly with **hifiasm** -[{% icon exchange %} Switch to short version]({% link topics/assembly/tutorials/vgp_workflow_training/tutorial.md %}#assembly-with-hifiasm) - Once we have finished the genome profiling stage, we can start the genome assembly with hifiasm, a fast open-source *de novo* assembler specifically developed for PacBio HiFi reads. One of the key advantages of hifiasm is that it allows us to resolve near-identical, but not exactly identical, sequences, such as repeats and segmental duplications ({% cite Cheng2021 %}). > Hifiasm algorithm details @@ -371,17 +368,17 @@ The output of hifiasm will be {GFA} files. These differ from FASTA files in that Hifiasm can be run in multiple modes depending on data availability: -**Solo**: generates a pseudohaplotype assembly, resulting in a primary & an alternate assembly (fig. 5). +**Solo**: generates a pseudohaplotype assembly, resulting in a primary & an alternate assembly (fig. 5). - _Input: only HiFi reads_ - _Output: scaffolded primary assembly, and alternate contigs_ ![Diagram for hifiasm solo mode.](../../images/vgp_assembly/hifiasm_solo_schematic.png "The solo pipeline creates primary and alternate contigs, which then typically undergo purging with purge_dups to reconcile the haplotypes. During the purging process, haplotigs are removed from the primary assembly and added to the alternate assembly, which is then purged to generate the final alternate set of contigs. 
The purged primary contigs are then carried through scaffolding with Bionano and/or Hi-C data, resulting in one final draft primary assembly to be sent to manual curation.") -**Hi-C-phased**: generates a hap1 assembly and a hap2 assembly, which are phased using the {Hi-C} reads from the same individual (fig. 6). +**Hi-C-phased**: generates a hap1 assembly and a hap2 assembly, which are phased using the {Hi-C} reads from the same individual (fig. 6). - _Input: HiFi & HiC reads_ - _Output: scaffolded hap1 assembly, and scaffolded hap2 assembly (assuming you run the scaffolding on **both** haplotypes)_ ![Diagram for hifiasm hic mode.](../../images/vgp_assembly/hifiasm_hic_schematic.png "The Hi-C-phased mode produces hap1 and hap2 contigs, which have been phased using the HiC information as described in {% cite Cheng2021 %}. Typically, these assemblies do not need to undergo purging, but you should always look at your assemblies' QC to make sure. These contigs are then scaffolded separately using Bionano and/or Hi-C workflows, resulting in two scaffolded assemblies.") -**Trio**: generates a maternal assembly and a paternal assembly, which are phased using reads from the parents (fig. 7). +**Trio**: generates a maternal assembly and a paternal assembly, which are phased using reads from the parents (fig. 7). - _Input: HiFi reads from child, Illumina reads from both parents._ - _Output: scaffolded maternal assembly, and scaffolded paternal assembly (assuming you run the scaffolding on **both** haplotypes)_ ![Diagram for hifiasm trio mode.](../../images/vgp_assembly/hifiasm_trio_schematic.png "The trio mode produces maternal and paternal contigs, which have been phased using paternal short read data. Typically, these assemblies do not need to undergo purging, but you should always look at your assemblies' QC to make sure. 
These contigs are then scaffolded separately using Bionano and/or Hi-C workflows, resulting in two scaffolded assemblies.") @@ -389,10 +386,10 @@ Hifiasm can be run in multiple modes depending on data availability: No matter which way you run hifiasm, you will have to evaluate the assemblies' {QC} to ensure your genome is in good shape. The VGP pipeline features several reference-free ways of evaluating assembly quality, all of which are automatically generated with our workflows; however, we will run them manually in this tutorial so we can familiarize ourselves with how each QC metric captures a different aspect of assembly quality. ## Assembly evaluation -- **gfastats**: manipulation & evaluation of assembly graphs and FASTA files, particularly used for summary statistics (*e.g.*, contig count, N50, NG50, etc.) ({% cite Formenti2022 %}). +- **gfastats**: manipulation & evaluation of assembly graphs and FASTA files, particularly used for summary statistics (*e.g.*, contig count, N50, NG50, etc.) ({% cite Formenti2022 %}). ![Schematic of N50 calculation.](../../images/vgp_assembly/n50schematic.jpg "N50 is a commonly reported statistic used to represent genome contiguity. N50 is calculated by sorting contigs according to their lengths, and then taking the halfway point of the total genome length. The size of the contig at that halfway point is the N50 value. In the pictured example, the total genome length is 400 bp, so the N50 value is 60 because the contig at the halfway point is 60 bp long. N50 can be interpreted as the value where >50% of an assembly's contigs are at that value or higher. Image adapted from Elin Videvall at The Molecular Ecologist.") - **{BUSCO}**: assesses completeness of a genome from an evolutionarily informed functional point of view. 
BUSCO genes are genes that are expected to be present at single-copy in one haplotype for a certain clade, so their presence, absence, or duplication can inform scientists about if an assembly is likely missing important regions, or if it has multiple copies of them, which can indicate a need for purging ({% cite Simo2015 %}). -- **Merqury**: reference-free assessment of assembly completeness and phasing based on *k*-mers. Merqury compares *k*-mers in the reads to the *k*-mers found in the assemblies, as well as the {CN} of each *k*-mer in the assemblies ({% cite Rhie_merqury %}). +- **Merqury**: reference-free assessment of assembly completeness and phasing based on *k*-mers. Merqury compares *k*-mers in the reads to the *k*-mers found in the assemblies, as well as the {CN} of each *k*-mer in the assemblies ({% cite Rhie_merqury %}). {% include _includes/cyoa-choices.html option1="hic" option2="solo" default="hic" @@ -450,7 +447,7 @@ We have obtained the fully phased contig graphs (as {GFA} files) of hap1 and hap > {: .comment} -Let's use gfastats to get a basic idea of what our assembly looks like. We'll run gfastats on the {GFA} files because gfastats can report graph-specific statistics as well. After generating the stats, we'll be doing some text manipulations to get the stats side-by-side in a pretty fashion. +Let's use gfastats to get a basic idea of what our assembly looks like. We'll run gfastats on the {GFA} files because gfastats can report graph-specific statistics as well. After generating the stats, we'll be doing some text manipulations to get the stats side-by-side in a pretty fashion. > Assembly evaluation with gfastats > @@ -471,7 +468,7 @@ Let's use gfastats to get a basic idea of what our assembly looks like. We'll ru > {: .hands_on} -Take a look at the _gfastats on hap1 and hap2 contigs_ output — it should have three columns: 1) name of statistic, 2) hap1 value, and 3) hap2 value. 
According to the report, both assemblies are quite similar; the hap1 assembly includes 16 contigs, totalling ~11.3Mbp of sequence (the `Total contig length` statistic), while the hap2 assembly includes 17 contigs, whose total length is ~12.2Mbp. (**NB**: Your values may differ slightly, or be reversed between the two haplotypes!) +Take a look at the _gfastats on hap1 and hap2 contigs_ output — it should have three columns: 1) name of statistic, 2) hap1 value, and 3) hap2 value. According to the report, both assemblies are quite similar; the hap1 assembly includes 16 contigs, totalling ~11.3Mbp of sequence (the `Total contig length` statistic), while the hap2 assembly includes 17 contigs, whose total length is ~12.2Mbp. (**NB**: Your values may differ slightly, or be reversed between the two haplotypes!) > > @@ -534,7 +531,7 @@ Despite BUSCO being robust for species that have been widely studied, it can be > - {% icon param-file %} *"k-mer counts database"*: `Merged meryldb` > - *"Number of assemblies"*: `Two assemblies > - {% icon param-file %} *"First genome assembly"*: `Hap1 contigs FASTA` -> - {% icon param-file %} *"Second genome assembly"*: `Hap2 contigs FASTA` +> - {% icon param-file %} *"Second genome assembly"*: `Hap2 contigs FASTA` > {: .hands_on} @@ -557,7 +554,7 @@ The large green peak is centered at 50x coverage (remember that's our diploid co ## Pseudohaplotype assembly with **hifiasm** -When hifiasm is run without any additional phasing data, it will do its best to generate a pseudohaplotype primary/alternate set of assemblies. These assemblies will typically contain more contigs that switch between parental blocks. Because of this, the primary assembly generated with this method can have a higher N50 value than an assembly generated with haplotype-phasing, but the contigs will contain more switch errors. +When hifiasm is run without any additional phasing data, it will do its best to generate a pseudohaplotype primary/alternate set of assemblies. 
These assemblies will typically contain more contigs that switch between parental blocks. Because of this, the primary assembly generated with this method can have a higher N50 value than an assembly generated with haplotype-phasing, but the contigs will contain more switch errors. > Pseudohaplotype assembly with hifiasm > 1. {% tool [Hifiasm](toolshed.g2.bx.psu.edu/repos/bgruening/hifiasm/hifiasm/0.18.8+galaxy1) %} with the following parameters: @@ -610,7 +607,7 @@ We have obtained the primary and alternate contig graphs (as {GFA} files), but t > {: .comment} -Let's use gfastats to get a basic idea of what our assembly looks like. We'll run gfastats on the {GFA} files because gfastats can report graph-specific statistics as well. After generating the stats, we'll be doing some text manipulation to get the stats side-by-side in a pretty fashion. +Let's use gfastats to get a basic idea of what our assembly looks like. We'll run gfastats on the {GFA} files because gfastats can report graph-specific statistics as well. After generating the stats, we'll be doing some text manipulation to get the stats side-by-side in a pretty fashion. > Assembly evaluation with gfastats > @@ -631,7 +628,7 @@ Let's use gfastats to get a basic idea of what our assembly looks like. We'll ru > {: .hands_on} -Take a look at the _gfastats on pri and alt contigs_ output — it should have three columns: 1) name of statistic, 2) primary assembly value, and 3) alternate assembly value. The report makes it clear that the two assemblies are markedly uneven: the primary assembly has 25 contigs totalling ~18.5 Mbp, while the alternate assembly has 8 contigs totalling only about 4.95 Mbp. If you'll remember that our estimated genome size is ~11.7 Mbp, then you'll see that the primary assembly has almost 2/3 more sequence than expected for a haploid representation of the genome! 
This is because a lot of heterozygous regions have had *both* copies of those loci placed into the primary assembly, as a result of incomplete purging. The presence of false duplications can be confirmed by looking at {BUSCO} and Merqury results. +Take a look at the _gfastats on pri and alt contigs_ output — it should have three columns: 1) name of statistic, 2) primary assembly value, and 3) alternate assembly value. The report makes it clear that the two assemblies are markedly uneven: the primary assembly has 25 contigs totalling ~18.5 Mbp, while the alternate assembly has 8 contigs totalling only about 4.95 Mbp. If you'll remember that our estimated genome size is ~11.7 Mbp, then you'll see that the primary assembly has almost 2/3 more sequence than expected for a haploid representation of the genome! This is because a lot of heterozygous regions have had *both* copies of those loci placed into the primary assembly, as a result of incomplete purging. The presence of false duplications can be confirmed by looking at {BUSCO} and Merqury results. > > @@ -698,7 +695,7 @@ Despite BUSCO being robust for species that have been widely studied, it can be > - {% icon param-file %} *"k-mer counts database"*: `Merged meryldb` > - *"Number of assemblies"*: `Two assemblies > - {% icon param-file %} *"First genome assembly"*: `Primary contigs FASTA` -> - {% icon param-file %} *"Second genome assembly"*: `Alternate contigs FASTA` +> - {% icon param-file %} *"Second genome assembly"*: `Alternate contigs FASTA` > {: .hands_on} @@ -712,9 +709,9 @@ To get an idea of how the *k*-mers have been distributed between our hap1 and ha ![Merqury spectra-asm plot for the hap1/hap2 assemblies.](../../images/vgp_assembly/merqury_prialt_asm_prepurge.png "Merqury ASM plot. This plot tracks the multiplicity of each k-mer found in the Hi-Fi read set and colors it according to which assemblies contain those k-mers. 
This can tell you which k-mers are found in only one assembly or shared between them."){:width="65%"} -For an idea of what a properly phased spectra-asm plot would look like, **please click over to the Hi-C phasing version of this tutorial**. A properly phased spectra-asm plot should have a large green peak centered around the point of diploid coverage (here ~50X), and the two assembly-specific peaks should be centered around the point of haploid coverage (here ~25X) and resembling each other in size. +For an idea of what a properly phased spectra-asm plot would look like, **please click over to the Hi-C phasing version of this tutorial**. A properly phased spectra-asm plot should have a large green peak centered around the point of diploid coverage (here ~50X), and the two assembly-specific peaks should be centered around the point of haploid coverage (here ~25X) and resembling each other in size. -The spectra-asm plot we have for our primary & alternate assemblies here does not resemble one that is properly phased. There is a peak of green (shared) *k*-mers around diploid coverage, indicating that some homozygous regions have been properly split between the primary and alternate assemblies; however, there is still a large red peak of primary-assembly-only *k*-mers at that coverage value, too, which means that some homozygous regions are being represented twice in the primary assembly, instead of once in the primary and once in the alternate. Additionally, for the haploid peaks, the primary-only peak (in red) is much larger than the alternate-only peak (in blue), indicating that a lot of heterozygous regions might have both their alternate alleles represented in the primary assembly, which is false duplication. +The spectra-asm plot we have for our primary & alternate assemblies here does not resemble one that is properly phased. 
There is a peak of green (shared) *k*-mers around diploid coverage, indicating that some homozygous regions have been properly split between the primary and alternate assemblies; however, there is still a large red peak of primary-assembly-only *k*-mers at that coverage value, too, which means that some homozygous regions are being represented twice in the primary assembly, instead of once in the primary and once in the alternate. Additionally, for the haploid peaks, the primary-only peak (in red) is much larger than the alternate-only peak (in blue), indicating that a lot of heterozygous regions might have both their alternate alleles represented in the primary assembly, which is false duplication. For further confirmation, we can also look at the individual, assembly-specific {CN} plots. In the Merqury outputs, the `output_merqury.assembly_01.spectra-cn.fl` is a {CN} spectra with *k*-mers colored according to their copy number in the primary assembly. @@ -722,7 +719,7 @@ For further confirmation, we can also look at the individual, assembly-specific In the primary-only {CN} plot, we observe a large 2-copy (colored blue) peak at diploid coverage. Ideally, this would not be here, beacause these diploid regions would be *1-copy in both assemblies*. Purging this assembly should reconcile this by removing one copy of false duplicates, making these 2-copy *k*-mers 1-copy. You might notice the 'read-only' peak at haploid coverage — this is actually expected, because 'read-only' here just means that the *k*-mer in question is not seen in this specific assembly while it was in the original readset. **Often, these 'read-only' _k_-mers are actually present as alternate loci in the other assembly.** -Now that we have looked at our primary assembly with multiple {QC} metrics, we know that it should undergo purging. The VGP pipeline uses **purge_dups** to remove false duplications from the primary assembly and put them in the alternate assembly to reconcile the haplotypes. 
Additionally, purge_dups can also find collapsed repeats and regions of suspiciously low coverage. +Now that we have looked at our primary assembly with multiple {QC} metrics, we know that it should undergo purging. The VGP pipeline uses **purge_dups** to remove false duplications from the primary assembly and put them in the alternate assembly to reconcile the haplotypes. Additionally, purge_dups can also find collapsed repeats and regions of suspiciously low coverage. ## Purging the primary and alternate assemblies @@ -778,7 +775,7 @@ The first relevant parameter is the `estimated genome size`. > {: .hands_on} -Now let's parse the `transition between haploid & diploid` and `upper bound for the read depth estimation` parameters. The transition between haploid & diploid represents the coverage value halfway between haploid and diploid coverage, and helps purger_dups identify *haplotigs*. The upper bound parameter will be used by purge_dups as high read depth cutoff to identify *collapsed repeats*. When repeats are collapsed in an assembly, they are not as long as they actually are in the genome. This results in a pileup of reads at the collapsed region when mapping the reads back to the assembly. +Now let's parse the `transition between haploid & diploid` and `upper bound for the read depth estimation` parameters. The transition between haploid & diploid represents the coverage value halfway between haploid and diploid coverage, and helps purge_dups identify *haplotigs*. The upper bound parameter will be used by purge_dups as a high read-depth cutoff to identify *collapsed repeats*. When repeats are collapsed in an assembly, they appear shorter than they actually are in the genome. This results in a pileup of reads at the collapsed region when mapping the reads back to the assembly. 
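As a rough illustration of the two cutoffs described above, the sketch below computes the haploid/diploid transition as the midpoint between the two coverage peaks and labels a mean read depth relative to both cutoffs. It is a simplified reading of the text, not the purge_dups implementation: the function names, the labels, and the upper-bound value of 150 are assumptions made for the example, while the 25x and 50x peaks are the haploid and diploid coverage values quoted in the tutorial.

```python
def transition_cutoff(haploid_cov: float, diploid_cov: float) -> float:
    """Return the coverage value halfway between the haploid and diploid peaks."""
    return (haploid_cov + diploid_cov) / 2


def classify_depth(depth: float, transition: float, upper_bound: float) -> str:
    """Label a mean read depth relative to the two cutoffs (simplified)."""
    if depth > upper_bound:
        # Reads pile up where several repeat copies were collapsed into one.
        return "possible collapsed repeat"
    if depth < transition:
        # Roughly half the diploid depth: alleles assembled as separate contigs.
        return "haploid-level depth (haplotig candidate)"
    return "diploid-level depth"


# Haploid (~25x) and diploid (~50x) peaks as quoted in the tutorial.
transition = transition_cutoff(25, 50)
# Hypothetical upper bound, chosen only for illustration.
upper_bound = 150

for depth in (24, 52, 400):
    print(depth, classify_depth(depth, transition, upper_bound))
```

In practice purge_dups combines read depth with the self-alignment of the contigs, so a depth label on its own is only a first hint about how a contig will be treated.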
> Get maximum read depth > @@ -837,8 +834,6 @@ Now let's parse the `transition between haploid & diploid` and `upper bound for ## Purging with **purge_dups** -[{% icon exchange %} Switch to short version]({% link topics/assembly/tutorials/vgp_workflow_training/tutorial.md %}#purging-with-purge_dups) - An ideal haploid representation would consist of one allelic copy of all heterozygous regions in the two haplomes, as well as all hemizygous regions from both haplomes ({% cite Guan2019 %}). However, in highly heterozygous genomes, assembly algorithms are frequently not able to identify the highly divergent allelic sequences as belonging to the same region, resulting in the assembly of those regions as separate contigs. This can lead to issues in downstream analysis, such as scaffolding, gene annotation and read mapping in general ({% cite Small2007 %}, {% cite Guan2019 %}, {% cite Roach2018 %}). In order to solve this problem, we are going to use purge_dups; this tool will allow us to identify and reassign allelic contigs. This stage consists of three substages: read-depth analysis, generation of all versus all self-alignment and resolution of haplotigs and overlaps (fig. 8). @@ -848,7 +843,7 @@ This stage consists of three substages: read-depth analysis, generation of all v ### Read-depth analysis Initially, we need to collapse our HiFi trimmed reads collection into a single dataset. - + > Collapse the collection > > 1. {% tool [Collapse Collection](toolshed.g2.bx.psu.edu/repos/nml/collapse_collections/collapse_dataset/4.2) %} with the following parameters: @@ -872,7 +867,7 @@ Now, we will map the reads against the primary assembly by using Minimap2 ({% ci > 2. Rename the output as `Reads mapped to contigs` {: .hands_on} -Finally, we will use the `Reads mapped to contigs` pairwise mapping format (PAF) file for calculating some statistics required in a later stage. 
In this step, purge_dups (listed as **Purge overlaps** in Galaxy tool panel) initially produces a read-depth histogram from base-level coverages. This information is used for estimating the coverage cutoffs, taking into account that collapsed haplotype contigs will lead to reads from both alleles mapping to those contigs, whereas if the alleles have assembled as separate contigs, then the reads will be split over the two contigs, resulting in half the read-depth ({% cite Roach2018 %}). +Finally, we will use the `Reads mapped to contigs` pairwise mapping format (PAF) file for calculating some statistics required in a later stage. In this step, purge_dups (listed as **Purge overlaps** in Galaxy tool panel) initially produces a read-depth histogram from base-level coverages. This information is used for estimating the coverage cutoffs, taking into account that collapsed haplotype contigs will lead to reads from both alleles mapping to those contigs, whereas if the alleles have assembled as separate contigs, then the reads will be split over the two contigs, resulting in half the read-depth ({% cite Roach2018 %}). > Read-depth analisys > 1. {% tool [Purge overlaps](toolshed.g2.bx.psu.edu/repos/iuc/purge_dups/purge_dups/1.2.5+galaxy3) %} with the following parameters: @@ -939,7 +934,7 @@ During the final step of the purge_dups pipeline, it will use the self alignment {: .details} > Resolution of haplotigs and overlaps -> +> > 1. {% tool [Purge overlaps](toolshed.g2.bx.psu.edu/repos/iuc/purge_dups/purge_dups/1.2.5+galaxy5) %} with the following parameters: > - *"Select the purge_dups function"*: `Purge haplotigs and overlaps for an assembly (purge_dups)` > - {% icon param-file %} *"PAF input file"*: `Self-homology map primary` @@ -961,7 +956,7 @@ During the final step of the purge_dups pipeline, it will use the self alignment ### Process the alternate assembly Now we should repeat the same procedure with the alternate contigs generated by hifiasm. 
In that case, we should start by merging the `Alternate haplotype contigs` generated in the previous step and the `Alternate contigs FASTA` file. - + > Merge the purged sequences and the Alternate contigs > > 1. {% tool [Concatenate datasets](cat1) %} with the following parameters: @@ -1019,7 +1014,7 @@ Once we have merged the files, we should run the purge_dups pipeline again, but > - *"Select an output format"*: `PAF` > > 7. Rename the output as `Self-homology map alternate` -> +> > 8. {% tool [Purge overlaps](toolshed.g2.bx.psu.edu/repos/iuc/purge_dups/purge_dups/1.2.5+galaxy5) %} with the following parameters: > - *"Select the purge_dups function"*: `Purge haplotigs and overlaps for an assembly (purge_dups)` > - {% icon param-file %} *"PAF input file"*: `Self-homology map alternate` @@ -1027,7 +1022,7 @@ Once we have merged the files, we should run the purge_dups pipeline again, but > - {% icon param-file %} *"Cutoffs file"*: `calcuts cutoff alternate` > > 9. Rename the output as `purge_dups BED alternate` -> +> > 10. {% tool [Purge overlaps](toolshed.g2.bx.psu.edu/repos/iuc/purge_dups/purge_dups/1.2.5+galaxy2) %} with the following parameters: > - *"Select the purge_dups function"*: `Obtain sequences after purging (get_seqs)` > - {% icon param-file %} *"Assembly FASTA file"*: `Alternate contigs full` @@ -1078,13 +1073,13 @@ The summary statistics indicate that both assemblies are now of a similar size t ![BUSCO for primary assembly after purging.](../../images/vgp_assembly/busco_pri_purged.png "BUSCO for the primary assembly after purging.") -The {BUSCO} results for the purged primary assembly look much better, since we no longer have the large amount of duplicate BUSCOs that we previously had. Additionally, there is no large increase in missing BUSCOs, indicating that we have *not* over-purged the primary assembly. 
+The {BUSCO} results for the purged primary assembly look much better, since we no longer have the large amount of duplicate BUSCOs that we previously had. Additionally, there is no large increase in missing BUSCOs, indicating that we have *not* over-purged the primary assembly. The previous metrics tell us that the primary is likely fixed after purging, but what about the previously incomplete alternate assembly? Let's see if the Merqury spectra plots show any change in how *k*-mers are split up between the two assemblies. ![Merqury spectra-asm plot after purging.](../../images/vgp_assembly/merqury_prialt_asm_postpurge.png "Merqury ASM plot after purging."){:width="65%"} -This looks a lot better! The diploid regions are all shared between the two assemblies (the large green peak centered at 50x, the diploid coverage value), and the haplotypic variation is shared between the primary and alternate assemblies (the red and blue peaks centered around 25x, the haploid coverage value). +This looks a lot better! The diploid regions are all shared between the two assemblies (the large green peak centered at 50x, the diploid coverage value), and the haplotypic variation is shared between the primary and alternate assemblies (the red and blue peaks centered around 25x, the haploid coverage value). ![Merqury spectra-cn plot for primary assembly after purging.](../../images/vgp_assembly/merqury_prialt_priCN_postpurge.png "Merqury CN plot for the primary assembly only after purging."){:width="65%"} @@ -1094,7 +1089,7 @@ Additionally, when we look at the primary-only {CN} plot, we see that the large # Scaffolding -At this point, we have a set of contigs, which may or may not be fully phased, depending on how we ran hifiasm. Next, the contigs will be assembled into scaffolds, *i.e.*, sequences of contigs interspaced with gaps. The VGP pipeline currently scaffolds using two additional technologies: Bionano optical maps and {Hi-C} data. 
+At this point, we have a set of contigs, which may or may not be fully phased, depending on how we ran hifiasm. Next, the contigs will be assembled into scaffolds, *i.e.*, sequences of contigs interspaced with gaps. The VGP pipeline currently scaffolds using two additional technologies: Bionano optical maps and {Hi-C} data. > What assembly am I scaffolding?? > @@ -1106,8 +1101,6 @@ At this point, we have a set of contigs, which may or may not be fully phased, d # Hybrid scaffolding with Bionano optical maps -[{% icon exchange %} Switch to short version]({% link topics/assembly/tutorials/vgp_workflow_training/tutorial.md %}#hybrid-scaffolding-with-bionano-optical-maps) - In this step, the linkage information provided by optical maps is integrated with primary assembly sequences, and the overlaps are used to orient and order the contigs, resolve chimeric joins, and estimate the length of gaps between adjacent contigs. One of the advantages of optical maps is that they can easily span genomic regions that are difficult to resolve using DNA sequencing technologies ({% cite Savara2021 %}, {% cite Yuan2020 %}). > What are Bionano optical maps? @@ -1154,7 +1147,7 @@ The *Bionano Hybrid Scaffold* tool automates the scaffolding process, which incl ## Evaluating Bionano scaffolds -Let's evaluate our scaffolds to see the impact of scaffolding on some key assembly statistics. +Let's evaluate our scaffolds to see the impact of scaffolding on some key assembly statistics. 
> Bionano assembly evaluation with QUAST and BUSCO > @@ -1183,8 +1176,6 @@ Let's evaluate our scaffolds to see the impact of scaffolding on some key assemb # Hi-C scaffolding -[{% icon exchange %} Switch to short version]({% link topics/assembly/tutorials/vgp_workflow_training/tutorial.md %}#hi-c-scaffolding) - Hi-C is a sequencing-based molecular assay designed to identify regions of frequent physical interaction in the genome by measuring the contact frequency between all pairs of loci, allowing us to provide an insight into the three-dimensional organization of a genome ({% cite Dixon2012 %}, {% cite LiebermanAiden2009 %}). In this final stage, we will exploit the fact that the contact frequency between a pair of loci strongly correlates with the one-dimensional distance between them with the objective of linking the Bionano scaffolds to a chromosome scale. > How does Hi-C sequencing work? @@ -1200,7 +1191,7 @@ Hi-C is a sequencing-based molecular assay designed to identify regions of frequ {: .details} -### Pre-processing Hi-C data +## Pre-processing Hi-C data Despite Hi-C generating paired-end reads, we need to map each read separately. This is because most aligners assume that the distance between paired-end reads fit a known distribution, but in Hi-C data the insert size of the ligation product can vary between one base pair to hundreds of megabases ({% cite Lajoie2015 %}). @@ -1238,7 +1229,7 @@ Despite Hi-C generating paired-end reads, we need to map each read separately. T Finally, we need to convert the BAM file to BED format and sort it. -### Generate initial Hi-C contact map +## Generate initial Hi-C contact map After mapping the Hi-C reads, the next step is to generate an initial Hi-C contact map, which will allow us to compare the Hi-C contact maps before and after using the Hi-C for scaffolding. 
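To make the idea of a contact map more concrete, here is a minimal sketch that bins pairs of mapped Hi-C read positions into a symmetric count matrix. It is a toy illustration only, not what the Pretext tools do internally; the function name, the 0-based coordinates, and the example pairs are all assumptions made for the example.

```python
def contact_matrix(pairs, genome_length, bin_size):
    """Count Hi-C contacts between fixed-size bins along one sequence.

    `pairs` holds (pos_a, pos_b) tuples of 0-based positions, each smaller
    than `genome_length`, for the two reads of a Hi-C pair.
    """
    n_bins = -(-genome_length // bin_size)  # ceiling division
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for pos_a, pos_b in pairs:
        i, j = pos_a // bin_size, pos_b // bin_size
        matrix[i][j] += 1
        if i != j:
            # Keep the map symmetric: a contact between i and j is also j and i.
            matrix[j][i] += 1
    return matrix


# Made-up read pairs on a 100 bp toy sequence, binned at 25 bp.
pairs = [(5, 12), (8, 95), (40, 47), (41, 44), (90, 99)]
for row in contact_matrix(pairs, genome_length=100, bin_size=25):
    print(row)
```

Most counts fall on the diagonal because nearby loci are in contact more often than distant ones; the single off-diagonal count comes from the one long-range pair.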
@@ -1249,7 +1240,7 @@ After mapping the Hi-C reads, the next step is to generate an initial Hi-C conta > The higher interaction between cis regions can be explained, at least in part, by the territorial organization of chromosomes in interphase (chromosome territories), and in a genome-wide contact map, this pattern appears as blocks of high interaction centered along the diagonal and matching individual chromosomes (fig. 12) ({% cite Cremer2010 %}, {% cite Lajoie2015 %}). > > ![Hi-C map](../../images/vgp_assembly/hic_map.png "An example of a Hi-C map. Genomic regions are arranged along the x and y axes, and contacts are colored on the matrix like a heat map; here darker color indicates greater interaction frequency.") {:width="10%"} -> +> > On the other hand, the distance-dependent decay may be due to random movement of the chromosomes, and in the contact map appears as a gradual decrease of the interaction frequency the farther away from the diagonal it moves ({% cite Lajoie2015 %}). > > @@ -1276,9 +1267,9 @@ Let's have a look at the Hi-C contact maps generated by Pretext Snapshot. In the contact generated from the Bionano-scaffolded assembly can be identified 17 scaffolds, representing each of the haploid chromosomes of our genome (fig. 13.a). The fact that all the contact signals are found around the diagonal suggest that the contigs were scaffolded in the right order. However, during the assembly of complex genomes, it is common to find in the contact maps indicators of errors during the scaffolding process, as shown in the figure 13b. In that case, a contig belonging to the second chromosome has been misplaced as part of the fourth chromosome. We can also note that the final portion of the second chromosome should be placed at the beginning, as the off-diagonal contact signal suggests. 
-Once we have evaluated the quality of the scaffolded genome assembly, the next step consists in integrating the information contained in the HiC reads into our assembly, so that any errors identified can be resolved. For this purpose we will use SALSA2 ({% cite Ghurye2019 %}). - -### SALSA2 scaffolding +Once we have evaluated the quality of the scaffolded genome assembly, the next step consists of integrating the information contained in the Hi-C reads into our assembly, so that any errors identified can be resolved. For this purpose we will use SALSA2 ({% cite Ghurye2019 %}). + +## SALSA2 scaffolding SALSA2 is an open source software that makes use of Hi-C to linearly orient and order assembled contigs along entire chromosomes ({% cite Ghurye2019 %}). One of the advantages of SALSA2 with respect to most existing Hi-C scaffolding tools is that it doesn't require the estimated number of chromosomes. @@ -1333,9 +1324,9 @@ Now we can launch SALSA2 in order to generate the hybrid scaffolding based on th > {: .hands_on} -### Evaluate the final genome assembly with Pretext +## Evaluate the final genome assembly with Pretext -Finally, we should repeat the procedure described previously for generating the contact maps, but in that case, we will use the scaffold generated by SALSA2. +Finally, we should repeat the procedure described previously for generating the contact maps, but in this case, we will use the scaffolds generated by SALSA2. > Mapping reads against the scaffold > @@ -1380,8 +1371,8 @@ Finally, we should repeat the procedure described previously for generating the > {: .hands_on} -In order to evaluate the Hi-C hybrid scaffolding, we are going to compare the contact maps before and after running SALSA2 (fig. 15). - +In order to evaluate the Hi-C hybrid scaffolding, we are going to compare the contact maps before and after running SALSA2 (fig. 15). 
+ ![Figure 15: Pretext final contact map](../../images/vgp_assembly/hi-c_pretext_final.png "Hi-C map generated by Pretext after the hybrid scaffolding based on Hi-C data. The red circles indicate the differences between the contact map generated after (a) and before (b) Hi-C hybrid scaffolding.") Among the most notable differences that can be identified between the contact maps, it can be highlighted the regions marked with red circles, where inversion can be identified. @@ -1397,5 +1388,5 @@ With respect to the total sequence length, we can conclude that the size of our ![Comparison reference genome](../../images/vgp_assembly/hi-c_pretext_conclusion.png "Comparison between contact maps generated by using the final assembly (a) and the reference genome (b).") If we compare the contact map of our assembled genome (fig. 17a) with the reference assembly (fig. 17b), we can see that the two are essentially identical. This means that we have achieved an almost perfect assembly at the chromosome level. 
- + diff --git a/topics/assembly/tutorials/vgp_workflow_training/faqs/dataset_upload_fasta_via_urls.md b/topics/assembly/tutorials/vgp_workflow_training/faqs/dataset_upload_fasta_via_urls.md new file mode 120000 index 00000000000000..db82c116ba8cb6 --- /dev/null +++ b/topics/assembly/tutorials/vgp_workflow_training/faqs/dataset_upload_fasta_via_urls.md @@ -0,0 +1 @@ +../../vgp_genome_assembly/faqs/dataset_upload_fasta_via_urls.md \ No newline at end of file diff --git a/topics/assembly/tutorials/vgp_workflow_training/faqs/dataset_upload_fastqsanger_via_urls.md b/topics/assembly/tutorials/vgp_workflow_training/faqs/dataset_upload_fastqsanger_via_urls.md new file mode 120000 index 00000000000000..6bcf7be046cf73 --- /dev/null +++ b/topics/assembly/tutorials/vgp_workflow_training/faqs/dataset_upload_fastqsanger_via_urls.md @@ -0,0 +1 @@ +../../vgp_genome_assembly/faqs/dataset_upload_fastqsanger_via_urls.md \ No newline at end of file diff --git a/topics/assembly/tutorials/vgp_workflow_training/tutorial.bib b/topics/assembly/tutorials/vgp_workflow_training/tutorial.bib index f24affb86d9917..33a1cc88fe170e 100644 --- a/topics/assembly/tutorials/vgp_workflow_training/tutorial.bib +++ b/topics/assembly/tutorials/vgp_workflow_training/tutorial.bib @@ -246,3 +246,41 @@ @article{Lariviere2022 title = {VGP assembly pipeline}, journal = {Galaxy Training Network} } + +@article{Guan2020-st, + title = {Identifying and removing haplotypic duplication in primary genome + assemblies}, + author = {Guan, Dengfeng and McCarthy, Shane A and Wood, Jonathan and Howe, + Kerstin and Wang, Yadong and Durbin, Richard}, + abstract = {MOTIVATION: Rapid development in long-read sequencing and + scaffolding technologies is accelerating the production of + reference-quality assemblies for large eukaryotic genomes. 
+ However, haplotype divergence in regions of high heterozygosity + often results in assemblers creating two copies rather than one + copy of a region, leading to breaks in contiguity and + compromising downstream steps such as gene annotation. Several + tools have been developed to resolve this problem. However, they + either focus only on removing contained duplicate regions, also + known as haplotigs, or fail to use all the relevant information + and hence make errors. RESULTS: Here we present a novel tool, + purge\_dups, that uses sequence similarity and read depth to + automatically identify and remove both haplotigs and heterozygous + overlaps. In comparison with current tools, we demonstrate that + purge\_dups can reduce heterozygous duplication and increase + assembly continuity while maintaining completeness of the primary + assembly. Moreover, purge\_dups is fully automatic and can easily + be integrated into assembly pipelines. AVAILABILITY AND + IMPLEMENTATION: The source code is written in C and is available + at https://github.com/dfguan/purge\_dups. 
SUPPLEMENTARY + INFORMATION: Supplementary data are available at Bioinformatics + online.}, + doi = {10.1093/bioinformatics/btaa025}, + journal = {Bioinformatics}, + volume = {36}, + number = {9}, + pages = {2896--2898}, + month = {may}, + year = {2020}, + language = {en} +} + diff --git a/topics/assembly/tutorials/vgp_workflow_training/tutorial.md b/topics/assembly/tutorials/vgp_workflow_training/tutorial.md index ce0f8f7fb07c79..9106230ba203b0 100644 --- a/topics/assembly/tutorials/vgp_workflow_training/tutorial.md +++ b/topics/assembly/tutorials/vgp_workflow_training/tutorial.md @@ -23,12 +23,16 @@ contributors: - gallardoalba - pickettbd - abueg +- nekrut abbreviations: primary assembly: homozygous regions of the genome plus one set of alleles for the heterozygous loci alternate assembly: alternate loci not represented in the primary assembly QV: assembly consensus quality - unitig: + unitig: A uniquely assembleable subset of overlapping fragments. A unitig is an assembly of fragments for which there are no competing internal overlaps. A unitig is either a correctly assembled portion of a contig or a collapsed assembly of several high-fidelity copies of a repeat. contigs: contiguous sequences in an assembly + collection: Galaxy's way to represent multiple datasets as a single interface entity + collections: Galaxy's way to represent multiple datasets as a single interface entity + scaffold: one or more contigs joined by gap sequence scaffolds: one or more contigs joined by gap sequence Hi-C: all-versus-all chromatin conformation capture HiFi: high fidelity reads @@ -37,13 +41,7 @@ abbreviations: G10K: Genome 10K --- - - -The {VGP}, a project of the {G10K} Consortium, aims to generate high-quality, near error-free, gap-free, chromosome-level, haplotype-phased, annotated reference genome assemblies for every vertebrate species ({% cite Rhie2021 %}). 
The VGP has developed a fully automated *de-novo* genome assembly pipeline, which uses a combination of three different technologies: Pacbio {HiFi}, Bionano optical maps, and {Hi-C} data. - -As a result of a collaboration with the VGP team, a training including a step-by-step detailed description of parameter choices for each step of assembly was developed for the Galaxy Training Network ({% cite Lariviere2022 %}). The following tutorial instead provides a quick walkthrough on how the workflows can be used to rapidly assemble a genome using the VGP pipeline with the {GWS}. - -GWS facilitates analysis repeatability, while minimizing the number of manual steps required to execute an analysis workflow, and automating the process of inputting parameters and software tool version tracking. The objective of this training is to explain how to run the VGP workflow, focusing on what are the required inputs and which outputs are generated and delegating how the steps are executed to the GWS. +The {VGP}, a project of the {G10K} Consortium, aims to generate high-quality, near error-free, gap-free, chromosome-level, haplotype-phased, annotated reference genome assemblies for every vertebrate species ({% cite Rhie2021 %}). The VGP has developed a fully automated *de-novo* genome assembly pipeline, which uses a combination of three different technologies: Pacbio {HiFi}, {Hi-C} data, and (optionally) BioNano optical map data. The pipeline consists of nine distinct workflows. This tutorial provides a quick example of how to run these workflows for one particular scenario, which is, based on our experience, the most common: assembling genomes using {HiFi} Reads combined with {Hi-C} data (both generated from the same individual). > > @@ -56,169 +54,307 @@ GWS facilitates analysis repeatability, while minimizing the number of manual st # Getting started on Galaxy -This tutorial assumes you are comfortable getting data into Galaxy, running jobs, managing history, etc. 
If you are unfamiliar with Galaxy, we recommed you visit the [Galaxy Training Network](https://training.galaxyproject.org). Consider starting with the following trainings: +This tutorial assumes you are comfortable getting data into Galaxy, running jobs, managing history, etc. If you are unfamiliar with Galaxy, we recommend you visit the [Galaxy Training Network](https://training.galaxyproject.org). Consider starting with the following trainings: - [Introduction to Galaxy]({% link topics/introduction/tutorials/introduction/slides.html %}) - [Galaxy 101]({% link topics/introduction/tutorials/galaxy-intro-101/tutorial.md %}) - [Getting Data into Galaxy]({% link topics/galaxy-interface/tutorials/get-data/slides.html %}) - [Using Dataset Collections]({% link topics/galaxy-interface/tutorials/collections/tutorial.md %}) -- [Introduction to Galaxy Analyses]({% link topics/introduction/index.md %}) - [Understanding the Galaxy History System]({% link topics/galaxy-interface/tutorials/history/tutorial.md %}) +- [Introduction to Galaxy Analyses]({% link topics/introduction/index.md %}) - [Downloading and Deleting Data in Galaxy]({% link topics/galaxy-interface/tutorials/download-delete-data/tutorial.md %}) +# The VGP-Galaxy pipeline -# VGP assembly workflow structure +The {VGP} assembly pipeline has a modular organization, consisting of ten workflows (Fig. 1). It can be used with the following types of input data: -The {VGP} assembly pipeline has a modular organization, consisting in five main subworkflows (fig. 1), each one integrated by a series of data manipulation steps. Firstly, it allows the evaluation of intermediate steps, which facilitates the modification of parameters if necessary, without the need to start from the initial stage. Secondly, it allows to adapt the workflow to the available data. +| Input data | Assembly quality | Analysis trajectory ([Fig. 1](#figure-1)) | +|------|---------------|-----| +| HiFi | The minimum requirement | A | +| HiFi + Hi-C | Better continuity | B | +| HiFi + BioNano | Better continuity | C | +| HiFi + Hi-C + BioNano | Even better continuity | D | +| HiFi + parental data | Better haplotype resolution | E | +| HiFi + parental data + Hi-C | Better haplotype resolution and improved continuity | F | +| HiFi + parental data + BioNano | Better haplotype resolution and improved continuity | G | +| HiFi + parental data + Hi-C + BioNano | Better haplotype resolution and ultimate continuity | H | -![Figure 1: VGP pipeline modules](../../images/vgp_assembly/VGP_workflow_modules.png "VGP assembly pipeline. The VGP workflow is implemented in a modular fashion: it consists of five independent subworkflows. In addition, it includes some additional workflows (not shown in the figure), required for exporting the results to GenomeArk.") +In this table, "HiFi" and "Hi-C" are derived from the individual whose genome is being assembled. "Parental data" is high-coverage Illumina data derived from the parents of the individual being assembled. Datasets containing parental data are also called "*Trios*". Each combination of input datasets is supported by an *analysis trajectory*: a combination of workflows designed for generating an assembly given a particular combination of inputs. These trajectories are listed in the table above and shown in the figure below. We suggest at least 30✕ PacBio HiFi coverage and 30✕ Hi-C coverage per haplotype (parental genome), and up to 60✕ coverage to accurately assemble highly repetitive regions. -The VGP pipeline first uses an assembly program to generate {contigs}. When {Hi-C} data and Bionano data are avilable, then they are used to generate {scaffolds}. When both data types are available, then Bionano scaffolding is run first before Hi-C scaffolding, but if optical maps are not available then HiC scaffolding can be run on the contigs. 
+![The nine workflows of Galaxy assembly pipeline](../../images/vgp_assembly/VGP_workflow_modules.svg "Eight analysis trajectories are possible depending on the combination of input data. A decision on whether or not to invoke Workflow 6 is based on the analysis of QC output of workflows 3, 4, or 5. Thicker lines connecting Workflows 7, 8, and 9 represent the fact that these workflows are invoked separately for each phased assembly (once for maternal and once for paternal).") +
+The first stage of the pipeline is the generation of *k*-mer profiles of the raw reads to estimate genome size, heterozygosity, repetitiveness, and error rate, which are necessary for parameterizing downstream workflows. The generation of *k*-mer counts can be done from HiFi data only (Workflow 1) or include data from parental reads for trio-based phasing (Workflow 2; a trio is a combination of parental sequencing data with that from the offspring being assembled). The second stage is the phased contig assembly. In addition to using only {HiFi} reads (Workflow 3), the contig building (contiging) step can leverage {Hi-C} (Workflow 4) or parental read data (Workflow 5) to produce fully-phased haplotypes (hap1/hap2 or paternal/maternal assigned haplotypes), using [`hifiasm`](https://github.com/chhylp123/hifiasm). The contiging workflows also produce a number of critical quality control (QC) metrics such as *k*-mer multiplicity profiles. Inspection of these profiles provides the information needed to decide whether the third stage, purging of false duplication, is required. Purging (Workflow 6), using [`purge_dups`](https://github.com/dfguan/purge_dups), identifies and resolves haplotype-specific assembly segments incorrectly labeled as primary contigs, as well as heterozygous contig overlaps. This increases continuity and the quality of the final assembly. The purging stage is generally unnecessary for trio data, for which reliable haplotype resolution is performed using *k*-mer profiles obtained from parental reads. The fourth stage, scaffolding, produces chromosome-level scaffolds using optional Bionano optical maps (Workflow 7) with [`Bionano Solve`](https://bionano.com/software-downloads/), and Hi-C data (Workflow 8) with the [`YaHS`](https://github.com/c-zhou/yahs) algorithm. A final stage of decontamination (Workflow 9) removes exogenous sequences (e.g., viral and bacterial sequences) from the scaffolded assembly. 
A separate workflow (WF0) is used for mitochondrial assembly. -> Input option order -> This tutorial assumes the input datasets are high-quality. QC on raw read data should be performed before it is used. QC on raw read data is outside the scope of this tutorial. +> A note on data quality +> We suggest at least 30✕ PacBio HiFi coverage and 30✕ Hi-C coverage per haplotype (parental genome); and up to 60✕ coverage to accurately assemble highly repetitive regions. {: .comment} -## Get data +# Getting the data + +The following steps use PacBio {HiFi} and Illumina {Hi-C} data from baker's yeast ([*Saccharomyces cerevisiae*](https://en.wikipedia.org/wiki/Saccharomyces_cerevisiae)). The tutorial represents trajectory **B** from Fig. 1 above. For this tutorial, the first step is to get the datasets from Zenodo. Specifically, we will be uploading two datasets: -For this tutorial, the first step is to get the datasets from Zenodo. The VGP assembly pipeline uses data generated by a variety of technologies, including PacBio HiFi reads, Bionano optical maps, and Hi-C chromatin interaction maps. +1. A set of PacBio {HiFi} reads in `fasta` format +2. A set of Illumina {Hi-C} reads in `fastqsanger.gz` format -> Data upload +## Uploading `fasta` datasets from Zenodo + +The following two steps demonstrate how to upload three PacBio {HiFi} datasets into your Galaxy history. + + +> Uploading FASTA datasets from Zenodo > > 1. Create a new history for this tutorial -> 2. 
Import the files from [Zenodo]({{ page.zenodo_link }}) -> -> - Open the file {% icon galaxy-upload %} __upload__ menu -> - Click on **Rule-based** tab -> - *"Upload data as"*: `Datasets` -> - Copy the tabular data, paste it into the textbox and press Build -> -> ``` -> Hi-C_dataset_F https://zenodo.org/record/5550653/files/SRR7126301_1.fastq.gz?download=1 fastqsanger.gz Hi-C -> Hi-C_dataset_R https://zenodo.org/record/5550653/files/SRR7126301_2.fastq.gz?download=1 fastqsanger.gz Hi-C -> Bionano_dataset https://zenodo.org/record/5550653/files/bionano.cmap?download=1 cmap Bionano -> ``` -> -> - From **Rules** menu select `Add / Modify Column Definitions` -> - Click `Add Definition` button and select `Name`: column `A` -> - Click `Add Definition` button and select `URL`: column `B` -> - Click `Add Definition` button and select `Type`: column `C` -> - Click `Add Definition` button and select `Name Tag`: column `D` -> - Click `Apply` and press Upload -> -> 3. Import the remaining datasets from [Zenodo]({{ page.zenodo_link }}) -> -> - Open the file {% icon galaxy-upload %} __upload__ menu -> - Click on **Rule-based** tab -> - *"Upload data as"*: `Collections` -> - Copy the tabular data, paste it into the textbox and press Build -> -> ``` -> dataset_01 https://zenodo.org/record/6098306/files/HiFi_synthetic_50x_01.fasta?download=1 fasta HiFi HiFi_collection -> dataset_02 https://zenodo.org/record/6098306/files/HiFi_synthetic_50x_02.fasta?download=1 fasta HiFi HiFi_collection -> dataset_03 https://zenodo.org/record/6098306/files/HiFi_synthetic_50x_03.fasta?download=1 fasta HiFi HiFi_collection -> ``` -> -> - From **Rules** menu select `Add / Modify Column Definitions` -> - Click `Add Definition` button and select `List Identifier(s)`: column `A` -> - Click `Add Definition` button and select `URL`: column `B` -> - Click `Add Definition` button and select `Type`: column `C` -> - Click `Add Definition` button and select `Group Tag`: column `D` -> - Click `Add Definition` 
button and select `Collection Name`: column `E` -> - Click `Apply` and press Upload +> +> {% snippet faqs/galaxy/histories_create_new.md %} +> +> 2. Copy the following URLs to the clipboard. +> - You can do this by clicking on the {% icon copy %} button in the upper right corner of the box below. It will appear if you mouse over the box. +> +> ``` +> https://zenodo.org/record/6098306/files/HiFi_synthetic_50x_01.fasta +> https://zenodo.org/record/6098306/files/HiFi_synthetic_50x_02.fasta +> https://zenodo.org/record/6098306/files/HiFi_synthetic_50x_03.fasta +> ``` +> +> 3. Upload datasets into Galaxy. +> - set the datatype to `fasta` +> +> {% snippet faqs/galaxy/datasets_import_via_link.md format="fasta" %} +> +> {% snippet topics/assembly/tutorials/vgp_genome_assembly/faqs/dataset_upload_fasta_via_urls.md %} > {: .hands_on} -> Working with your own data +## Uploading `fastqsanger.gz` datasets from Zenodo + +Illumina {Hi-C} data is uploaded in essentially the same way, as shown in the following two steps. + +> DANGER: Make sure you choose the correct format! +> When selecting the datatype in the "**Type (set all)**" drop-down, make sure you select `fastqsanger` or `fastqsanger.gz` BUT NOT `fastqcssanger` or anything else! +{: .warning} + +> Uploading fastqsanger.gz datasets from Zenodo > -> If working on a genome other than the example yeast genome, you can upload the VGP data from the [VGP/Genome Ark AWS S3 bucket](https://genomeark.s3.amazonaws.com/index.html) as follows: +> 1. Copy the following URLs to the clipboard. +> - You can do this by clicking on the {% icon copy %} button in the upper right corner of the box below. It will appear if you mouse over the box. > -> > Import data from GenomeArk -> > -> > 1. Open the file {% icon galaxy-upload %} __upload__ menu -> > 2. Click on **Choose remote files** tab -> > 3. 
Click on the **Genome Ark** button and then click on **species** -> {: .hands_on} +> ``` +> https://zenodo.org/record/5550653/files/SRR7126301_1.fastq.gz +> https://zenodo.org/record/5550653/files/SRR7126301_2.fastq.gz +> ``` > -> You can find the VGP data following this path: `/species/${Genus}_${species}/${specimen_code}/genomic_data`. Inside a given datatype directory (*e.g.* `pacbio`), select all the relevant files individually until all the desired files are highlighted and click the Ok button. Note that there may be multiple pages of files listed. Also note that you may not want every file listed. +> 2. Upload datasets into Galaxy. +> - set the datatype to `fastqsanger.gz` > -> {% snippet faqs/galaxy/collections_build_list.md %} +> {% snippet faqs/galaxy/datasets_import_via_link.md format="fastqsanger.gz" %} > -{: .details} +> {% snippet topics/assembly/tutorials/vgp_genome_assembly/faqs/dataset_upload_fastqsanger_via_urls.md %} +> +{: .hands_on} -Once we have imported the datasets, the next step is to import the VGP workflows from the WorkflowHub. -## Import workflows from WorkflowHub -{% snippet faqs/galaxy/workflows_import_from_workflowhub.md filter="name:vgp" %} +> These datasets are large! +> Hi-C datasets are large. It will take some time (~15 min) for them to be fully uploaded. Please be patient. +{: .warning} -The workflows imported are marked with a red square in the following figure: +## Organizing the data -![Figure 2: Workflow menu](../../images/vgp_assembly/imported_workflows.png "Workflow main menu. The workflow menu lists all the workflows that have been imported. It provides useful information for organizing the workflows, such as last update and the tags. The worklows can be run by clicking in the play icon, marked in red in the image.") +If everything goes smoothly, your history will look as shown in Fig. 4 below. The three {HiFi} fasta files are better represented as a collection: {collection}. 
Also, importantly, the workflow we will be using for the analysis of our data takes a collection as input (it does not accept individual datasets). So let's create a collection using the steps outlined in the Tip {% icon tip %} "Creating a dataset collection" that you can find below Fig. 4. -Once we have imported the datasets and the workflows, we can start with the genome assembly. +![AfterUpload](../../images/vgp_assembly/making_list.svg "History after uploading HiFi and HiC data (left). Creation of a list (collection) combines all HiFi datasets into a single history item called 'HiFi data' (right). See below for instructions on how to make this collection.") + +{% snippet faqs/galaxy/collections_build_list.md %} + +> Other ways to upload the data +> You can also upload your own datasets via URLs, as illustrated above, or from your own computer. In addition, you can upload data from a major repository called [GenomeArk](https://genomeark.org). GenomeArk is integrated directly into Galaxy Upload. To use GenomeArk, follow the steps in the Tip {% icon tip %} below: +> +> {% snippet faqs/galaxy/datasets_upload_from_genomeark.md %} +{: .details} -> Workflow-centric Research Objects + +Once we have imported the datasets, the next step is to import the workflows necessary for the analysis of our data from [Dockstore](https://dockstore.org). + +# Importing workflows + +All analyses described in this tutorial are performed using *workflows* (chains of tools) shown in [Fig. 1](#figure-1). Specifically, we will use four workflows corresponding to analysis trajectory **B**: 1, 4, 6, and 8. To use these four workflows you need to import them into your Galaxy account following the steps below: + +> Importing workflows from GitHub > -> In WorkflowHub, workflows are packaged, registered, downloaded and exchanged as Research Objects using the RO-Crate specification, with test and example data, managed metadata profiles, citations and more. 
+> Links to the four workflows that will be used in this tutorial are listed in the table. Follow the procedure described below the table to import each of them into your Galaxy account. +>
> -{: .comment} +> | Workflow | Link | +> |---------|---------| +> | *K*-mer profiling workflow (WF1) | [https://raw.githubusercontent.com/iwc-workflows/kmer-profiling-hifi-VGP1/v0.1.4/kmer-profiling-hifi-VGP1.ga](https://raw.githubusercontent.com/iwc-workflows/kmer-profiling-hifi-VGP1/v0.1.4/kmer-profiling-hifi-VGP1.ga) | +> | Assembly (contiging) with Hi-C workflow (WF4) | [https://raw.githubusercontent.com/iwc-workflows/Assembly-Hifi-HiC-phasing-VGP4/v0.1.6/Assembly-Hifi-HiC-phasing-VGP4.ga](https://raw.githubusercontent.com/iwc-workflows/Assembly-Hifi-HiC-phasing-VGP4/v0.1.6/Assembly-Hifi-HiC-phasing-VGP4.ga) | +> | Purge duplicate contigs workflow (WF6) | [https://raw.githubusercontent.com/iwc-workflows/Purge-duplicate-contigs-VGP6/v0.3.2/Purge-duplicate-contigs-VGP6.ga](https://raw.githubusercontent.com/iwc-workflows/Purge-duplicate-contigs-VGP6/v0.3.2/Purge-duplicate-contigs-VGP6.ga) | +> | Scaffolding with Hi-C workflow (WF8) | [https://raw.githubusercontent.com/iwc-workflows/Scaffolding-HiC-VGP8/v0.2/Scaffolding-HiC-VGP8.ga](https://raw.githubusercontent.com/iwc-workflows/Scaffolding-HiC-VGP8/v0.2/Scaffolding-HiC-VGP8.ga)| +> +>
+> +> **Step 1: Copy the workflow URL to the clipboard** +> +> 1. Right click on a URL in the table above. +> 2. Select the "Copy link address" option in the dropdown menu that appears. +> 3. Go to Galaxy +> +>> Make sure you are logged in! +>> Ensure that you are logged in to your Galaxy account! +> {: .warning} +> +>
+> +> **Step 2: Import the workflow** +> +> 1. Click "Workflow" at the top of the Galaxy interface. +> 2. At the top right of the middle pane, click the "{% icon galaxy-upload %} Import" button. +> 3. Paste the URL you copied to the clipboard at Step 1 above into the "Archived Workflow URL" box. +> 4. Click the "Import workflow" button. +> +> This entire procedure is shown in the figure below. {% icon warning %} **You need to repeat this process for all four workflows** +> +> ![Upload via URL](../../images/vgp_assembly/importing_via_url_vgp_specific.png "Importing a workflow via URL.") +> +{: .hands_on} + +> Other ways to import workflows +> You can import workflows from a variety of different sources including [Dockstore](https://dockstore.org), [WorkflowHub](https://workflowhub.eu), or a URL: +> +> {% snippet faqs/galaxy/workflows_import_from_dockstore.md %} +> +> {% snippet faqs/galaxy/workflows_import_from_workflowhub.md %} +> +> {% snippet faqs/galaxy/workflows_import.md %} +> +{: .details} -# Genome profile analysis +Once all four workflows are imported, your workflow list should look like this: -[{% icon exchange %} Switch to step by step version]({% link topics/assembly/tutorials/vgp_genome_assembly/tutorial.md %}#genome-profile-analysis) +![Workflow menu](../../images/vgp_assembly/imported_workflows.png "Workflow list. The workflow menu lists all the workflows that have been imported. It provides useful information for organizing the workflows, such as last update and the tags. The workflows can be run by clicking on the play icon, marked in red in the image.") -Now that our data and workflows are imported, we can run our first workflow. Before the assembly can be run, we need to collect metrics on the properties of the genome under consideration, such as the expected genome size according to our data. The present pipeline uses **Meryl** for generating the k-mer database and **Genomescope2** for determining genome characteristics based on a k-mer analysis. 
+Once we have imported the datasets and the workflows, we can start with the genome assembly. + +# Performing the assembly + +Workflows listed in [Fig. 1](#figure-1) support a variety of "analysis trajectories". For the majority of species sequenced by the {VGP}, the data consist of {HiFi} reads from the individual being sequenced, supplemented with {Hi-C} data. As a result, most of our assemblies follow trajectory **B**, which is why this tutorial follows it as well. + +## Genome profile analysis (WF1) -> VGP genome profile analysis workflow +Now that our data and workflows are imported, we can run our first workflow. Before the assembly can be run, we need to collect metrics on the properties of the genome under consideration, such as the expected genome size according to our data. The present pipeline uses **Meryl** for generating the *k*-mer database and **Genomescope2** for determining genome characteristics based on a *k*-mer analysis. + +### Launching the workflow + +> Launching K-mer profile analysis workflow +> +> **Step 1: Identify inputs** +> +> The profiling workflow takes the following inputs: +> +> 1. {HiFi} reads as a collection +> 2. *K*-mer length +> 3. Ploidy +> +> **Step 2: Launch *k*-mer profiling workflow** > > 1. Click in the **Workflow** menu, located in the top bar -> 2. Click in the {% icon workflow-run %} **Run workflow** buttom corresponding to `VGP genome profile analysis` +> 2. Click in the {% icon workflow-run %} **Run workflow** button corresponding to `K-mer profiling and QC (WF1)` > 3. In the **Workflow: VGP genome profile analysis** menu: -> - {% icon param-collection %} "*Collection of Pacbio Data*": `7: HiFi_collection` -> - "*K-mer length*": `31` -> - "*Ploidy*": `2` +> - {% icon param-collection %} "*Collection of Pacbio Data*": `7: HiFi_collection` +> - "*K-mer length*": `31` +> - "*Ploidy*": `2` > 4. 
Click on the Run workflow buttom > -> > K-mer length -> > In this tutorial, we are using a k-mer length of 31. This can vary, but the VGP pipeline tends to use a k-mer length of 21, which tends to work well for most mammalian-size genomes. There is more discussion about k-mer length trade-offs in the extended VGP pipeline tutorial. +> This should look like this: +> +> +>![Parameters of *k*-mer profiling workflow](../../images/vgp_assembly/wf1_launch_ui.png "Workflow main menu. The workflow menu lists all the workflows that have been imported. It provides useful information for organizing the workflows, such as last update and the tags. The workflows can be run by clicking on the play icon, marked in red in the image.") +> +> +>> K-mer length +>> In this tutorial, we are using a *k*-mer length of 31. This can vary, but the VGP pipeline tends to use a *k*-mer length of 21, which tends to work well for most mammalian-size genomes. There is more discussion about *k*-mer length trade-offs in the extended VGP pipeline tutorial. > {: .comment} > +>
+> +> **Step 3: Refill your coffee** +> +> Assembly is not exactly an instantaneous type of analysis: this workflow will take approximately 15 minutes to complete. The same is true for all analyses in this tutorial. {: .hands_on} -Once the workflow has finished, we can evaluate the linear plot generated by **Genomescope** (fig. 3), which includes valuable information such as the observed k-mer profile, fitted models and estimated parameters. This file corresponds to the dataset `26`. +### Interpreting the results -![Figure 3: Genomescope plot](../../images/vgp_assembly/genomescope_plot.png "GenomeScope2 k-mer profile. The first peak located at about 25x corresponds to the heterozygous peak. The second peak at 50x, corresponds to the homozygous peak. The plot also includes information about the the inferred total genome length (len), genome unique length percent (uniq), overall heterozygosity rate (ab), mean k-mer coverage for heterozygous bases (kcov), read error rate (err), average rate of read duplications (dup) and k-mer size (k).") + Once the workflow has finished, we can evaluate the linear plot generated by [**`Genomescope`**](https://github.com/schatzlab/genomescope), which includes valuable information such as the observed *k*-mer profile, fitted models and estimated parameters. This file corresponds to the dataset `15` in this [history](https://usegalaxy.org/u/cartman/h/k-mer-profiling). +
+![`Genomescope` plot](../../images/vgp_assembly/genomescope_plot.png "GenomeScope2 k-mer profile. The first peak, located at about 25×, corresponds to the heterozygous peak. The second peak, at 50×, corresponds to the homozygous peak. The plot also includes information about the inferred total genome length (len), genome unique length percent (uniq), overall heterozygosity rate (ab), mean k-mer coverage for heterozygous bases (kcov), read error rate (err), average rate of read duplications (dup) and k-mer size (k).") +
+This distribution is the result of the Poisson process underlying the generation of sequencing reads. As we can see, the *k*-mer profile follows a bimodal distribution, indicative of a diploid genome. The distribution is consistent with the theoretical diploid model (model fit > 93%). Low frequency *k*-mers are the result of sequencing errors, and are indicated by the red line. Genomescope2 estimated a haploid genome size of around 11.7 Mbp, a value reasonably close to the *Saccharomyces* genome size. -This distribution is the result of the Poisson process underlying the generation of sequencing reads. As we can see, the k-mer profile follows a bimodal distribution, indicative of a diploid genome. The distribution is consistent with the theoretical diploid model (model fit > 93%). Low frequency *k*-mers are the result of sequencing errors, and are indicated by the red line. GenomeScope2 estimated a haploid genome size of around 11.7 Mbp, a value reasonably close to the *Saccharomyces* genome size. +## Assembly (contiging) with `hifiasm` (WF4) -# Assembly with hifiasm +To generate {contigs} we will use the [**hifiasm**](https://github.com/chhylp123/hifiasm) assembler. It is a part of the `Assembly with HiC (WF4)` workflow. This workflow uses **hifiasm** (HiC mode) to generate HiC-phased haplotypes (hap1 and hap2). This is in contrast to its default mode, which generates primary and alternate pseudohaplotype assemblies. This workflow includes three tools for evaluating assembly quality: [**`gfastats`**](https://github.com/vgl-hub/gfastats), [**`BUSCO`**](https://busco.ezlab.org/) and [**`Merqury`**](https://github.com/marbl/merqury). -[{% icon exchange %} Switch to step by step version]({% link topics/assembly/tutorials/vgp_genome_assembly/tutorial.md %}#assembly-with-hifiasm) +### Launching the workflow -To generate {contigs}, the VGP pipeline uses **hifiasm**. 
After genome profiling, the next step is to run the **VGP HiFi phased assembly with hifiasm and HiC data workflow**. This workflow uses **hifiasm** (HiC mode) to generate HiC-phased haplotypes (hap1 and hap2). This is in contrast to its default mode, which generates primary and alternate pseudohaplotype assemblies. This workflow includes three tools for evaluating assembly quality: **gfastats**, **BUSCO** and **Merqury**. - -> VGP HiFi phased assembly with hifiasm and HiC data workflow -> 1. Click in the **Workflow** menu, located in the top bar -> 2. Click in the {% icon workflow-run %} **Run workflow** buttom corresponding to `VGP HiFi phased assembly with hifiasm and HiC data` -> 3. In the **Workflow: VGP HiFi phased assembly with hifiasm and HiC data** menu: -> - {% icon param-collection %} "*Pacbio Reads Collection*": `7. HiFi_collection` -> - {% icon param-file %} "*Meryl database*": `12: Meryl on data 11, data 10, data 9: read-db.meryldb` -> - {% icon param-file %} "*HiC forward reads*": `3: Hi-C_dataset_F` -> - {% icon param-file %} "*HiC reverse reads*": `2: Hi-C_dataset_R` -> - {% icon param-file %} "*Genomescope summary dataset*": `19: Genomescope on data 13 Summary` -> 4. Click on the Run workflow button +> Launching assembly (contiging) workflow > -> > Input option order -> > Note that the order of the inputs may differ slightly. -> {: .comment} +>**Step 1: Identify inputs** +> +>The assembly workflow takes the following inputs: > +> 1. {HiFi} reads as a collection +> 2. Forward Hi-C reads +> 3. Reverse Hi-C reads +> 4. `Genomescope` Model Parameters generated by previous (*k*-mer profiling) workflow +> 5. `Genomescope` Summary generated by previous (*k*-mer profiling) workflow +> 6. Meryl *k*-mer database generated by previous (*k*-mer profiling) workflow +> 7. Busco lineage +>
+> +>**Step 2: Launch the workflow** +> +> 1. Click in the **Workflow** menu, located in the top bar +> 2. Click in the {% icon workflow-run %} **Run workflow** button corresponding to `VGP HiFi phased assembly with hifiasm and HiC data` +> 3. In the **Workflow: Assembly with HiC (WF4)** menu fill the following parameters: +> - {% icon param-collection %} "*Pacbio Reads Collection*": Collection with original HiFi data +> - {% icon param-file %} "*Meryl database*": Meryl *k*-mer database: one of the outputs of the previous workflow (contains tag "`MerylDatabase`") +> - {% icon param-file %} "*HiC forward reads*": Forward Hi-C reads +> - {% icon param-file %} "*HiC reverse reads*": Reverse Hi-C reads +> - {% icon param-file %} "*Provide lineage for BUSCO (e.g., Vertebrata)*": `Ascomycota` +> - {% icon param-file %} "*GenomeScope Summary*": GenomeScope summary: one of the outputs of the previous workflow (contains tag "`GenomeScopeSummary`") +> - {% icon param-file %} "*GenomeScope Model Parameters*": GenomeScope model parameters: one of the outputs of the previous workflow (contains tag "`GenomeScopeParameters`") +> 4. Click on the Run workflow button {: .hands_on} -Let's have a look at the stats generated by **gfastats**. This output summarizes some main assembly statistics, such as contig number, N50, assembly length, etc. - -According to the report, both assemblies are quite similar; the primary assembly includes 18 {contigs}, whose cumulative length is around 12.2Mbp. The alternate assembly includes 17 contigs, whose total length is 11.3Mbp. As we can see in figure 4a, both assemblies come close to the estimated genome size, which is as expected since we used hifiasm-HiC mode to generate phased assemblies which lowers the chance of false duplications that can inflate assembly size. +### Interpreting the results + +> There will be two assemblies! +> Because we are assembling a diploid organism this workflow will produce two assemblies: hap1 and hap2! 
+{: .warning} + +Let's have a look at the stats generated by **gfastats**. This output summarizes some main assembly statistics, such as contig number, N50, assembly length, etc. Below we provide a partial output of `gfastats` in which information about both assemblies is shown side-by-side: + +>| Statistic | Hap 1 | Hap 2 | +>|-----------|----------:|------:| +>| # contigs | 16 | 19 | +>| Total contig length | 12,050,076 | 12,360,746 | +>| Average contig length | 753,129.75 | 650,565.58 | +>| Contig N50 | 923,452 | 922,430 | +>| Contig auN | 909,022.62 | 891,508.36 | +>| Contig L50 | 6 | 6 | +>| Contig NG50 | 923,452 | 922,430 | +>| Contig auNG | 932,462.97 | 938,074.26 | +>| Contig LG50 | 6 | 6 | +>| Largest contig | 1,532,843 | 1,531,728 | +>| Smallest contig | 231,313 | 26,588 | +{: .matrix} + +According to the report, both assemblies are quite similar; the primary assembly (hap1) includes 16 {contigs}, whose cumulative length is around 12 Mbp. The alternate assembly (hap2) includes 19 contigs, whose total length is around 12.4 Mbp. Both assemblies come close to the estimated genome size. This is expected, since we used hifiasm in HiC mode to generate phased assemblies, which lowers the chance of false duplications that can inflate assembly size. > Are you working with pri/alt assemblies? > This tutorial uses the hifiasm-HiC workflow, which generates phased hap1 and hap2 assemblies. The phasing helps lower the chance of false duplications, since the phasing information helps the assembler know which genomic variation is heterozygosity at the same locus versus being two different loci entirely. If you are working with primary/alternate assemblies (especially if there is no internal purging in the initial assembly), you can expect higher false duplication rates than we observe here with the yeast HiC hap1/hap2. 
@@ -228,13 +364,11 @@ According to the report, both assemblies are quite similar; the primary assembly > > 1. What is the longest contig in the primary assembly? And in the alternate one? > 2. What is the N50 of the primary assembly? -> 3. Which percentage of reads mapped to each assembly? > > > > > -> > 1. The longest contig in the primary assembly is 1.532.843 bp, and 1.532.843 bp in the alternate assembly. -> > 2. The N50 of the primary assembly is 922.430 bp. -> > 3. According to the report, 100% of reads mapped to both the primary assembly and the alternate assembly. +> > 1. The longest contig in the primary assembly is 1,532,843 bp, and 1,531,728 bp in the alternate assembly. +> > 2. The N50 of the primary assembly is 923,452 bp. > > > {: .solution} > @@ -242,7 +376,11 @@ According to the report, both assemblies are quite similar; the primary assembly Next, we are going to evaluate the outputs generated by **BUSCO**. This tool provides quantitative assessment of the completeness of a genome assembly in terms of expected gene content. It relies on the analysis of genes that should be present only once in a complete assembly or gene set, while allowing for rare gene duplications or losses ({% cite Simo2015 %}). -![Figure 5 : BUSCO](../../images/vgp_assembly/BUSCO_full_table.png "BUSCO full table. It contains the complete results in a tabular format with scores and lengths of BUSCO matches, and coordinates.") +
+ +![BUSCO assessment](../../images/vgp_assembly/busco_after_contiging.svg "A composite of BUSCO completeness summaries for hap1 and hap2") + +
As we can see in the report, the results are simplified into four categories: *complete and single-copy*, *complete and duplicated*, *fragmented* and *missing*. @@ -253,121 +391,150 @@ As we can see in the report, the results are simplified into four categories: *c > > > > > -> > 1. According to the report, our assembly contains the complete sequence of 2080 complete BUSCO genes (97.3%). -> > 2. 19 BUSCO genes are missing. +> > 1. According to the report, our assembly contains the complete sequence of 1,562 complete BUSCO genes. +> > 2. 92 BUSCO genes are missing. > > > {: .solution} > {: .question} -Despite **BUSCO** being robust for species that have been widely studied, it can be inaccurate when the newly assembled genome belongs to a taxonomic group that is not well represented in [OrthoDB](https://www.orthodb.org/). Merqury provides a complementary approach for assessing genome assembly quality metrics in a reference-free manner via *k*-mer copy number analysis. Specifically, it takes our hap1 as the first genome assembly, hap2 as the second genome assembly, and the merylDB generated previously for k-mer counts. Like the other QC metrics we have been looking at, the VGP Hifiasm-HiC workflow will automatically generate the Merqury analysis. +Despite **BUSCO** being robust for species that have been widely studied, it can be inaccurate when the newly assembled genome belongs to a taxonomic group that is not well represented in [OrthoDB](https://www.orthodb.org/). `Merqury` provides a complementary approach for assessing genome assembly quality metrics in a reference-free manner via *k*-mer copy number analysis. Specifically, it takes our hap1 as the first genome assembly, hap2 as the second genome assembly, and the merylDB generated previously for *k*-mer counts. -By default, **Merqury** generates three collections as output: stats, plots and {QV} stats. 
The "stats" collection contains the completeness statistics, while the "QV stats" collection contains the quality value statistics. Let's have a look at the copy number (CN) spectrum plot, known as the *spectra-cn* plot. The spectra-cn plot looks at both of your assemblies (here, your haplotypes) taken *together* (fig. 6a). We can see a small amount of false duplications here: at the 50 mark on the x-axis, there is a small amount of k-mers present at 3-copy across the two assemblies (the green bump). +By default, `Merqury` generates three collections as output: stats, plots and {QV} stats. The "stats" collection contains the completeness statistics, while the "QV stats" collection contains the quality value statistics. Let's have a look at the copy number (CN) spectrum plot, known as the *spectra-cn* plot. The spectra-cn plot looks at both of your assemblies (here, your haplotypes) taken *together* (fig. 6a). We can see a small amount of false duplications here: at the 50 mark on the x-axis, there is a small amount of *k*-mers present at 3-copy across the two assemblies (the green bump). +
+![Figure 6: Merqury spectra-cn plot for initial yeast contigs](../../images/vgp_assembly/yeast_c_merqury_cn.svg "Merqury CN plot for yeast assemblies. The plot tracks the multiplicity of each k-mer found in the read set and colors it by the number of times it is found in a given assembly. Merqury connects the midpoint of each histogram bin with a line, giving the illusion of a smooth curve. a). K-mer distribution of both haplotypes. b). K-mer distribution of an individual haplotype (hap2)."){:width="100%"} +
-![Figure 6: Merqury spectra-cn plot for initial yeast contigs](../../images/vgp_assembly/yeast_c_merqury_cn.png "Merqury CN plot for these yeast haplotypes. The plot tracks the multiplicity of each k-mer found in the read set and colors it by the number of times it is found in a given assembly. Merqury connects the midpoint of each histogram bin with a line, giving the illusion of a smooth curve. K-mer distribution of both haplotypes (a). K-mer distribution of an individual haplotype (b)"){:width="100%"} - - Thus, we know there is some false duplication (the 3-copy green bump) present as 2-copy in one of our assemblies, but we don't know which one. We can look at the individual copy number spectrum for each haplotype in order to figure out which one contains the 2-copy k-mers (*i.e.*, the false duplications). In the Merqury spectra-CN plot for hap2 we can see the small bump of 2-copy k-mers at around the 50 mark on the x-axis (fig. 6b). +Thus, we know there is some false duplication (the 3-copy green bump) present as 2-copy in one of our assemblies, but we don't know which one. We can look at the individual copy number spectrum for each haplotype in order to figure out which one contains the 2-copy *k*-mers (*i.e.*, the false duplications). In the Merqury spectra-CN plot for hap2 we can see the small bump of 2-copy *k*-mers (blue) at around the 50 mark on the x-axis (fig. 6b). Now that we know which haplotype contains the false duplications, we can run the purging workflow to try to get rid of these duplicates. -# Purging with purge_dups +## Purging duplicates with `purge_dups` -[{% icon exchange %} Switch to step by step version]({% link topics/assembly/tutorials/vgp_genome_assembly/tutorial.md %}#purging-with-purgedups) +An ideal haploid representation would consist of one allelic copy of all heterozygous regions in the two haplotypes, as well as all hemizygous regions from both haplotypes ({% cite Guan2019 %}). 
However, in highly heterozygous genomes, assembly algorithms are frequently not able to identify the highly divergent allelic sequences as belonging to the same region, resulting in the assembly of those regions as separate contigs. In order to prevent potential issues in downstream analysis, we are going to run the **Purge duplicate contigs (WF6)** workflow, which will allow us to identify and reassign heterozygous contigs. This step is only necessary if haplotypic duplications are observed, and the output should be carefully checked for overpurging. -An ideal haploid representation would consist of one allelic copy of all heterozygous regions in the two haplotypes, as well as all hemizygous regions from both haplotypes ({% cite Guan2019 %}). However, in highly heterozygous genomes, assembly algorithms are frequently not able to identify the highly divergent allelic sequences as belonging to the same region, resulting in the assembly of those regions as separate contigs. In order to prevent potential issues in downstream analysis, we are going to run the **VGP purge assembly with purge_dups workflow**, which will allow to identify and reassign heterozygous contigs. This step is only necessary if haplotypic duplications are observed, and the output should be carefully checked for overpurging. +### Launching the workflow -> VGP purge assembly with purge_dups pipeline workflow +> Launching duplicate purging workflow +> +>**Step 1: Identify inputs** +> +>The purging workflow takes the following inputs: +> +> 1. {HiFi} reads as a collection +> 2. Primary assembly produced by `hifiasm` in the previous run of the assembly workflow (WF4). +> 3. Alternate assembly produced by `hifiasm` in the previous run of the assembly workflow (WF4). +> 4. `GenomeScope` Model Parameters generated by the previous (*k*-mer profiling) workflow +> 5. Estimated genome size parsed from the GenomeScope summary by the previous run of the assembly workflow (WF4). +> 6. 
Meryl *k*-mer database generated by previous (*k*-mer profiling, WF1) workflow +> 7. Busco lineage +> +>**Step 2: Launch Purge duplicate contigs workflow (WF6)** > > 1. Click in the **Workflow** menu, located in the top bar -> 2. Click in the {% icon workflow-run %} **Run workflow** buttom corresponding to `VGP purge assembly with purge_dups pipeline` +> 2. Click in the {% icon workflow-run %} **Run workflow** button corresponding to `Purge duplicate contigs (WF6)` > 3. In the **Workflow: VGP purge assembly with purge_dups pipeline** menu: -> - {% icon param-file %} "*Hifiasm Primary assembly*": `39: Hifiasm HiC hap1` -> - {% icon param-file %} "*Hifiasm Alternate assembly*": `40: Hifiasm HiC hap2` -> - {% icon param-collection %} "*Pacbio Reads Collection - Trimmed*": `22: Cutadapt` -> - {% icon param-file %} "*Genomescope model parameters*": `20: Genomescope on data 13 Model parameters` +> - {% icon param-collection %} "*Pacbio Reads Collection - Trimmed*": One of the outputs of the assembly workflow is a trimmed collection of HiFi reads. It has a tag `trimmed_hifi`. +> - {% icon param-file %} "*Hifiasm Primary assembly*": An output of the assembly workflow (WF4) containing contigs for Hap1 in FASTA format. This dataset has a tag `hifiasm_Assembly_Haplotype_1`. +> - {% icon param-file %} "*Hifiasm Alternate assembly*": An output of the assembly workflow (WF4) containing contigs for Hap2 in FASTA format. This dataset has a tag `hifiasm_Assembly_Haplotype_2` +> - {% icon param-file %} "*Meryl database*": Meryl *k*-mer database: one of the outputs of the previous workflow (contains tag "`MerylDatabase`") +> - {% icon param-file %} "*GenomeScope Model Parameters*": GenomeScope model parameters: one of the outputs of the previous workflow (contains tag "`GenomeScopeParameters`") +> - {% icon param-file %} "*Estimated genome size*": A dataset produced with the assembly workflow (WF4). It contains a tag `estimated_genome_size`. 
+> - {% icon param-file %} "*Provide lineage for BUSCO (e.g., Vertebrata)*": `Ascomycota` > 4. Click in the Run workflow button -> {: .hands_on} -This workflow generates a large number of outputs, among which we should highlight the datasets `74` and `91`, which correspond to the purged primary and alternative assemblies respectively. +### Interpreting results -# Hybrid scaffolding with Bionano optical maps +The two most important outputs of the purging workflow are the purged versions of the Primary and Alternate assemblies. These are tagged `PurgedPrimaryAssembly` and `PurgedAlternateAssembly`, respectively. This step also provides QC metrics for evaluating the effect of purging (Figure below). -[{% icon exchange %} Switch to step by step version]({% link topics/assembly/tutorials/vgp_genome_assembly/tutorial.md %}#hybrid-scaffolding-with-bionano-optical-maps) +
-Once the assemblies generated by **hifiasm** have been purged, the next step is to run the **VGP hybrid scaffolding with Bionano optical maps workflow**. It will integrate the information provided by optical maps with primary assembly sequences in order to detect and correct chimeric joins and misoriented contigs. In addition, this workflow includes some additonal steps for evaluating the outputs. +![Comparison of pre- and post-purging](../../images/vgp_assembly/merqury_cn_after_purging.svg "Comparison of pre-purging (a, c) and post-purging (b, d) Merqury CN spectra. The two top plots (a, b) are for our dataset (yeast) and the two bottom plots (c, d) are for a Chub mackerel (Scomber japonicus), a much larger genome. In the case of yeast the difference is not profound because our training dataset has been downsized and groomed to be as small as possible. In the case of the larger genome the green bump (k-mers appearing in three copies) is smaller after purging (although potential overpurging can be seen in the new read-only (grey) bump that was not there before). Given the scale of the Y-axis this difference is substantial."){:width="100%"} -> VGP hybrid scaffolding with Bionano optical maps workflow -> -> 1. Click in the **Workflow** menu, located in the top bar -> 2. Click in the {% icon workflow-run %} **Run workflow** buttom corresponding to `VGP hybrid scaffolding with Bionano optical maps` -> 3. In the **Workflow: VGP hybrid scaffolding with Bionano optical maps** menu: -> - {% icon param-file %} "*Bionano data*": `1: Bionano_dataset` -> - {% icon param-file %} "*Hifiasm Purged Assembly*": `90: Purge overlaps on data 88 and data 33: get_seqs purged sequences` (note: the data numbers may differ in your workflow, but it should still be tagged `p1` from the purge_dups workflow) -> - {% icon param-file %} "*Estimated genome size - Parameter File*": `60: Estimated Genome size` -> - "*Is genome large (>100Mb)?*": `No` -> 4. 
Click on the Run workflow buttom -{: .hands_on} +
-Once the workfow has finished, let's have a look at the assembly reports. +## Hi-C scaffolding -As we can observe in the cumulative plot of the file `119` (fig. 7a), the total length of the assembly (12.160.926 bp) is slightly larger than the expected genome size. With respect to the NG50 statistic (fig. 7b), the value is 922.430 bp, which is significantly higher than the value obtained during the first evaluation stage (813.039 bp). This increase in NG50 means this scaffolded assembly is more contiguous compared to the non-scaffolded contigs. +In this final stage, we will run the **Scaffolding HiC YAHS (WF8)** workflow, which exploits the fact that the contact frequency between a pair of loci strongly correlates with the one-dimensional distance between them. This information allows [**YAHS**](https://github.com/c-zhou/yahs) -- the main tool in this workflow -- to generate scaffolds that are often chromosome-sized. -It is also recommended to examine **BUSCO** outputs. In the summary image (fig. 7c), which can be found in the daset `117`, we can appreciate that most of the universal single-copy orthologs are present in our assembly at the expected single-copy. +### Launching Hi-C scaffolding workflow -> +> The scaffolding workflow is run on ONE haplotype at a time. +> Contiging (WF4) and purging (WF6) workflows work with both (hap1/hap2, primary/alternate) assemblies simultaneously. This is not the case for scaffolding -- it has to be run independently for each haplotype assembly. In this example (below) we run scaffolding on the hap1 (Primary) assembly only. +{: .warning} + +> Launching Hi-C scaffolding workflow > -> 1. How many scaffolds are in the primary assembly after the hybrid scaffolding? -> 2. What is the size of the largest scaffold? Has this changed with respect to the previous evaluation? -> 3. What is the percentage of completeness on the core set genes in BUSCO? Has Bionano scaffolding increased the completeness? +>**Step 1: Identify inputs** > -> > -> > -> > 1. 
The number of contigs is 17. -> > 2. The largest contig is 1.531.728 bp long. This value hasn't changed. This is expected, as the VGP pipeline implementation of Bionano scaffolding does not allow for breaking contigs. -> > 3. The percentage of complete BUSCOs is 95.7%. Yes, it has increased, since in the previous evaluation the completeness percentage was 88.7%. -> > -> {: .solution} +>The scaffolding workflow takes the following inputs: > -{: .question} +> 1. An assembly graph +> 2. Forward Hi-C reads +> 3. Reverse Hi-C reads +> 4. Estimated genome size parsed from the GenomeScope summary by the previous run of the assembly workflow (WF4). +> 5. Restriction enzymes used in the Hi-C library preparation procedure +> 6. BUSCO lineage +> +> **Step 2: Launch scaffolding workflow (WF8)** +> +> 1. Click in the **Workflow** menu, located in the top bar +> 2. Click in the {% icon workflow-run %} **Run workflow** button corresponding to `Scaffolding HiC YAHS (WF8)` +> 3. In the **Scaffolding HiC YAHS (WF8)** menu: +> - {% icon param-file %} "*input GFA*": Output of purging workflow (WF6) with a tag `PurgedPrimaryAssembly` (or `PurgedAlternateAssembly` if scaffolding the Alternate assembly). +> - {% icon param-file %} "*HiC forward reads*": Forward Hi-C reads +> - {% icon param-file %} "*HiC reverse reads*": Reverse Hi-C reads +> - {% icon param-file %} "*Estimated genome size - Parameter File*": An output of the contiging workflow (WF4) with a tag `estimated_genome_size`. +> - {% icon param-file %} "*Provide lineage for BUSCO (e.g., Vertebrata)*": `Ascomycota` +> 4. Click in the Run workflow button +{: .hands_on} -# Hi-C scaffolding +> Bypassing purging workflow +> In some situations (such as assemblies utilizing Trio data (Fig. 1)) you do not need to perform purging and can go directly from contiging to scaffolding. 
In this case you will need to use an output of contiging workflow that has a tag `hic_hap1_gfa` for primary assembly or `hic_hap2_gfa` for alternate assembly: +> +>In other words, the only parameter that you will need to set differently (relative to setting above) is this: +>
+> {% icon param-file %} "*input GFA*": Output of contiging workflow (WF4) with a tag `hic_hap1_gfa` for primary assembly or `hic_hap2_gfa` for alternate assembly. +>
+{: .comment} -[{% icon exchange %} Switch to step by step version]({% link topics/assembly/tutorials/vgp_genome_assembly/tutorial.md %}#hi-c-scaffolding) +### Interpreting the results -In this final stage, we will run the **VGP hybrid scaffolding with HiC data**, which exploits the fact that the contact frequency between a pair of loci strongly correlates with the one-dimensional distance between them. This information allows further scaffolding the Bionano scaffolds using **SALSA2**, usually generating chromosome-level scaffolds. +In order to evaluate the Hi-C hybrid scaffolding, we are going to compare the contact maps before and after running the HiC hybrid scaffolding workflow (Fig. below). They will have the following tags: +- Before scaffolding: `pretext_s1` +- After scaffolding: `pretext_s2` -> VGP hybrid scaffolding with HiC data -> -> 1. Click in the **Workflow** menu, located in the top bar -> 2. Click in the {% icon workflow-run %} **Run workflow** buttom corresponding to `VGP hybrid scaffolding with HiC data` -> 3. In the **Workflow: VGP hybrid scaffolding with HiC data** menu: -> - {% icon param-file %} "*Scaffolded Assembly*": `114: Concatenate datasets on data 110 and data 109` -> - {% icon param-file %} "*HiC Forward reads*": `3: Hi-C_dataset_F (as fastqsanger)` -> - {% icon param-file %} "*HiC Reverse reads*": `2: Hi-C_dataset_R (as fastqsanger)` -> - {% icon param-file %} "*Estimated genome size - Parameter File*": `50: Estimated Genome size` -> - "*Is genome large (>100Mb)?*": `No` -> 4. Click in the Run workflow buttom -{: .hands_on} +Below is a comparison of the two maps obtained from our data, together with a more pronounced real-life example from the assembly of the zebra finch (*Taeniopygia guttata*) genome: -In order to evaluate the Hi-C hybrid scaffolding, we are going to compare the contact maps before and after running the HiC hybrid scaffolding workflow (fig. 8), corresponding to the datasets `130` and `141` respectively. +
-![Figure 8: Pretext final contact map](../../images/vgp_assembly/hi-c_pretext_final.png "Hi-C maps generated by Pretext using Hi-C data. The red circles indicate the differences between the contact maps generated after (a) and before (b) Hi-C hybrid scaffolding.") +![Pretext final contact map](../../images/vgp_assembly/hi-c_pretext_final.svg "Hi-C maps generated by Pretext before and after scaffolding with Hi-C data. The red circles indicate the differences between the contact maps generated before and after Hi-C hybrid scaffolding. The bottom two panels show results of scaffolding on zebra finch where scaffolding dramatically decreases the number of segments by merging multiple contigs into scaffolds.") +
The regions marked with red circles highlight the most notable difference between the two contact maps, where an inversion has been fixed. - # Conclusion -To sum up, it is worthwhile to compare the final assembly with the [_S. cerevisiae_ S288C reference genome](https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/146/045/GCF_000146045.2_R64/GCF_000146045.2_R64_assembly_stats.txt). +To sum up, it is worthwhile to compare the final assembly with the [_S. cerevisiae_ S288C reference genome](https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/146/045/GCF_000146045.2_R64/GCF_000146045.2_R64_assembly_stats.txt): + +
+ +![Quast plot](../../images/vgp_assembly/quast_plot.png "Cumulative continuity plot comparing the assembly generated here (red line) with the existing yeast reference (black dotted line). Our assembly is slightly smaller (11,287,131 bp versus 12,071,326 bp). Our assembly is lacking the mitochondrial genome (~86 kb) because the initial data does not include mitochondrial reads. This is partially responsible for the discrepancy.") + +
+ +With respect to the total sequence length, we can conclude that the size of our genome assembly is very similar to the reference genome. It is noteworthy that the reference genome consists of 17 sequences, while our assembly includes only 16 chromosomes. This is due to the fact that the reference genome also includes the sequence of the mitochondrial DNA, which consists of 85,779 bp. (The above comparison is performed using {% tool [Quast](toolshed.g2.bx.psu.edu/repos/iuc/quast/quast/5.2.0+galaxy1) %} using Primary assembly generated with scaffolding workflow (WF8) and yeast reference.) + +
+ +![Comparison reference genome](../../images/vgp_assembly/hi-c_pretext_conclusion.svg "Comparison between contact maps generated using the final Primary assembly from this tutorial (left) and the reference genome (right).") -![Figure 9: Final stats](../../images/vgp_assembly/stats_conclusion.png "Comparison between the final assembly generated in this training and the reference genome. Contiguity plot using the reference genome size (a). Assemby statistics (b).") +
-With respect to the total sequence length, we can conclude that the size of our genome assembly is almost identical to the reference genome (fig.9a,b). It is noteworthy that the reference genome consists of 17 sequences, while our assembly includes only 16 chromosomes. This is due to the fact that the reference genome also includes the sequence of the mitochondrial DNA, which consists of 85,779 bp. The remaining statistics exhibit very similar values (fig. 9b). +If we compare the contact map of our assembled genome with the reference assembly (Fig. above), we can see that the two are indistinguishable, suggesting that we have generated a chromosome level genome assembly. -![Figure 10: Comparison reference genome](../../images/vgp_assembly/hi-c_pretext_conclusion.png "Comparison between contact maps generated using the final assembly (a) and the reference genome (b).") -If we compare the contact map of our assembled genome (fig. 10a) with the reference assembly (fig. 10b), we can see that the two are indistinguishable, suggesting that we have generated a chromosome level genome assembly. 
diff --git a/topics/assembly/tutorials/vgp_workflow_training/workflows/main_workflow.ga b/topics/assembly/tutorials/vgp_workflow_training/workflows/main_workflow.ga deleted file mode 100644 index f3ea5e13ccf94b..00000000000000 --- a/topics/assembly/tutorials/vgp_workflow_training/workflows/main_workflow.ga +++ /dev/null @@ -1,3635 +0,0 @@ -{ - "a_galaxy_workflow": "true", - "annotation": "VGP assembly tutorial", - "format-version": "0.1", - "name": "VGP assembly: training workflow", - "steps": { - "0": { - "annotation": "", - "content_id": null, - "errors": null, - "id": 0, - "input_connections": {}, - "inputs": [ - { - "description": "", - "name": "Hi-C_dataset_F" - } - ], - "label": "Hi-C_dataset_F", - "name": "Input dataset", - "outputs": [], - "position": { - "bottom": 616.6967338793205, - "height": 27.116729736328125, - "left": 207.52415512547347, - "right": 273.5241551254735, - "top": 589.5800041429924, - "width": 66, - "x": 207.52415512547347, - "y": 589.5800041429924 - }, - "tool_id": null, - "tool_state": "{\"optional\": false}", - "tool_version": null, - "type": "data_input", - "uuid": "250984e1-5af4-47aa-a7c9-90e0abe5bd42", - "workflow_outputs": [] - }, - "1": { - "annotation": "", - "content_id": null, - "errors": null, - "id": 1, - "input_connections": {}, - "inputs": [ - { - "description": "", - "name": "Hi-C_dataset_R" - } - ], - "label": "Hi-C_dataset_R", - "name": "Input dataset", - "outputs": [], - "position": { - "bottom": 736.7123505563446, - "height": 27.11669921875, - "left": 207.52415512547347, - "right": 273.5241551254735, - "top": 709.5956513375946, - "width": 66, - "x": 207.52415512547347, - "y": 709.5956513375946 - }, - "tool_id": null, - "tool_state": "{\"optional\": false}", - "tool_version": null, - "type": "data_input", - "uuid": "ee03cfe1-a698-497c-8a08-c73415b6f493", - "workflow_outputs": [] - }, - "2": { - "annotation": "", - "content_id": null, - "errors": null, - "id": 2, - "input_connections": {}, - "inputs": [ - { - 
"description": "", - "name": "Input Dataset Collection" - } - ], - "label": "Input Dataset Collection", - "name": "Input dataset collection", - "outputs": [], - "position": { - "bottom": 1338.7279903527462, - "height": 27.11669921875, - "left": -1480.569550485322, - "right": -1414.569550485322, - "top": 1311.6112911339962, - "width": 66, - "x": -1480.569550485322, - "y": 1311.6112911339962 - }, - "tool_id": null, - "tool_state": "{\"optional\": false, \"collection_type\": \"list\"}", - "tool_version": null, - "type": "data_collection_input", - "uuid": "5390d8d6-500b-4d79-b2ea-7879f1841c68", - "workflow_outputs": [] - }, - "3": { - "annotation": "", - "content_id": null, - "errors": null, - "id": 3, - "input_connections": {}, - "inputs": [ - { - "description": "", - "name": "Bionano_dataset" - } - ], - "label": "Bionano_dataset", - "name": "Input dataset", - "outputs": [], - "position": { - "bottom": 797.6811079545454, - "height": 27.11669921875, - "left": 1925.4616477272727, - "right": 1991.4616477272727, - "top": 770.5644087357954, - "width": 66, - "x": 1925.4616477272727, - "y": 770.5644087357954 - }, - "tool_id": null, - "tool_state": "{\"optional\": false}", - "tool_version": null, - "type": "data_input", - "uuid": "52f6fc65-1b24-4a75-b9ca-fb7bfdd68ad7", - "workflow_outputs": [] - }, - "4": { - "annotation": "", - "content_id": "toolshed.g2.bx.psu.edu/repos/lparsons/cutadapt/cutadapt/3.5+galaxy2", - "errors": null, - "id": 4, - "input_connections": { - "library|input_1": { - "id": 2, - "output_name": "output" - } - }, - "inputs": [], - "label": null, - "name": "Cutadapt", - "outputs": [ - { - "name": "out1", - "type": "fastqsanger" - } - ], - "position": { - "bottom": 1329.8157681551845, - "height": 44.20452880859375, - "left": -1202.5070652817235, - "right": -1136.5070652817235, - "top": 1285.6112393465908, - "width": 66, - "x": -1202.5070652817235, - "y": 1285.6112393465908 - }, - "post_job_actions": { - "ChangeDatatypeActionout1": { - "action_arguments": { - 
"newtype": "fasta" - }, - "action_type": "ChangeDatatypeAction", - "output_name": "out1" - } - }, - "tool_id": "toolshed.g2.bx.psu.edu/repos/lparsons/cutadapt/cutadapt/3.5+galaxy2", - "tool_shed_repository": { - "changeset_revision": "48f587c13075", - "name": "cutadapt", - "owner": "lparsons", - "tool_shed": "toolshed.g2.bx.psu.edu" - }, - "tool_state": "{\"adapter_options\": {\"action\": \"trim\", \"internal\": \"\", \"error_rate\": \"0.1\", \"no_indels\": \"false\", \"times\": \"3\", \"overlap\": \"35\", \"match_read_wildcards\": \" \", \"revcomp\": \"true\"}, \"filter_options\": {\"discard_trimmed\": \"true\", \"discard_untrimmed\": \"false\", \"minimum_length\": null, \"maximum_length\": null, \"length_R2_options\": {\"length_R2_status\": \"False\", \"__current_case__\": 1}, \"max_n\": null, \"pair_filter\": \"any\", \"max_expected_errors\": null, \"discard_cassava\": \"false\"}, \"library\": {\"type\": \"single\", \"__current_case__\": 0, \"input_1\": {\"__class__\": \"ConnectedValue\"}, \"r1\": {\"adapters\": [], \"front_adapters\": [], \"anywhere_adapters\": [{\"__index__\": 0, \"anywhere_adapter_source\": {\"anywhere_adapter_source_list\": \"user\", \"__current_case__\": 0, \"anywhere_adapter_name\": \"First adaptor\", \"anywhere_adapter\": \"ATCTCTCTCAACAACAACAACGGAGGAGGAGGAAAAGAGAGAGAT\"}, \"single_noindels\": \"false\"}, {\"__index__\": 1, \"anywhere_adapter_source\": {\"anywhere_adapter_source_list\": \"user\", \"__current_case__\": 0, \"anywhere_adapter_name\": \"Second adaptor\", \"anywhere_adapter\": \"ATCTCTCTCTTTTCCTCCTCCTCCGTTGTTGTTGTTGAGAGAGAT\"}, \"single_noindels\": \"false\"}], \"cut\": \"0\"}}, \"output_selector\": null, \"read_mod_options\": {\"quality_cutoff\": \"0\", \"nextseq_trim\": \"0\", \"trim_n\": \"false\", \"strip_suffix\": \"\", \"shorten_options\": {\"shorten_values\": \"False\", \"__current_case__\": 1}, \"length_tag\": \"\", \"rename\": \"\", \"zero_cap\": \"false\"}, \"__page__\": null, \"__rerun_remap_job_id__\": null}", - 
"tool_version": "3.5+galaxy2", - "type": "tool", - "uuid": "1ad390d7-995e-4fdc-ae71-2617414b2526", - "workflow_outputs": [ - { - "label": null, - "output_name": "out1", - "uuid": "292c0745-9898-44b7-b70d-62cb79af24ed" - } - ] - }, - "5": { - "annotation": "", - "content_id": "toolshed.g2.bx.psu.edu/repos/iuc/meryl/meryl/1.3+galaxy4", - "errors": null, - "id": 5, - "input_connections": { - "operation_type|input_reads": { - "id": 4, - "output_name": "out1" - } - }, - "inputs": [], - "label": null, - "name": "Meryl", - "outputs": [ - { - "name": "read_db", - "type": "meryldb" - } - ], - "position": { - "bottom": 1333.0712825890744, - "height": 37.475616455078125, - "left": -924.5538884943181, - "right": -858.5538884943181, - "top": 1295.5956661339962, - "width": 66, - "x": -924.5538884943181, - "y": 1295.5956661339962 - }, - "post_job_actions": {}, - "tool_id": "toolshed.g2.bx.psu.edu/repos/iuc/meryl/meryl/1.3+galaxy4", - "tool_shed_repository": { - "changeset_revision": "eadfd71dde37", - "name": "meryl", - "owner": "iuc", - "tool_shed": "toolshed.g2.bx.psu.edu" - }, - "tool_state": "{\"__input_ext\": \"input\", \"chromInfo\": \"/opt/galaxy/tool-data/shared/ucsc/chrom/?.len\", \"input_reads|__identifier__\": \"SRR13577846_1\", \"operation_type\": {\"command_type\": \"count-kmers\", \"__current_case__\": 0, \"count_operations\": \"count\", \"input_reads\": {\"__class__\": \"ConnectedValue\"}, \"options_kmer_size\": {\"kmer_size\": \"provide\", \"__current_case__\": 0, \"input_kmer_size\": \"21\"}}, \"__page__\": null, \"__rerun_remap_job_id__\": null}", - "tool_version": "1.3+galaxy4", - "type": "tool", - "uuid": "09524a39-45c1-4093-bc6b-ccdbbbc91e79", - "workflow_outputs": [ - { - "label": null, - "output_name": "read_db", - "uuid": "fdd64d8f-7be3-4664-886f-e168243d8465" - } - ] - }, - "6": { - "annotation": "", - "content_id": "toolshed.g2.bx.psu.edu/repos/nml/collapse_collections/collapse_dataset/5.1.0", - "errors": null, - "id": 6, - "input_connections": { - 
"input_list": { - "id": 4, - "output_name": "out1" - } - }, - "inputs": [], - "label": null, - "name": "Collapse Collection", - "outputs": [ - { - "name": "output", - "type": "input" - } - ], - "position": { - "bottom": 2374.815879128196, - "height": 44.20452880859375, - "left": 2223.5240589488635, - "right": 2289.5240589488635, - "top": 2330.611350319602, - "width": 66, - "x": 2223.5240589488635, - "y": 2330.611350319602 - }, - "post_job_actions": {}, - "tool_id": "toolshed.g2.bx.psu.edu/repos/nml/collapse_collections/collapse_dataset/5.1.0", - "tool_shed_repository": { - "changeset_revision": "90981f86000f", - "name": "collapse_collections", - "owner": "nml", - "tool_shed": "toolshed.g2.bx.psu.edu" - }, - "tool_state": "{\"__input_ext\": \"fastqsanger.gz\", \"chromInfo\": \"/opt/galaxy/tool-data/shared/ucsc/chrom/?.len\", \"filename\": {\"add_name\": \"false\", \"__current_case__\": 1}, \"input_list\": {\"__class__\": \"ConnectedValue\"}, \"one_header\": \"false\", \"__page__\": null, \"__rerun_remap_job_id__\": null}", - "tool_version": "5.1.0", - "type": "tool", - "uuid": "a7dcf565-464e-4ff8-a3ee-d1abe6a34134", - "workflow_outputs": [ - { - "label": null, - "output_name": "output", - "uuid": "7773735c-7a9e-4e73-a520-3b0094378110" - } - ] - }, - "7": { - "annotation": "", - "content_id": "toolshed.g2.bx.psu.edu/repos/iuc/meryl/meryl/1.3+galaxy4", - "errors": null, - "id": 7, - "input_connections": { - "operation_type|input_meryldb_02": { - "id": 5, - "output_name": "read_db" - } - }, - "inputs": [], - "label": null, - "name": "Meryl", - "outputs": [ - { - "name": "read_db", - "type": "meryldb" - } - ], - "position": { - "bottom": 1333.0712825890744, - "height": 37.475616455078125, - "left": -646.5852679628314, - "right": -580.5852679628314, - "top": 1295.5956661339962, - "width": 66, - "x": -646.5852679628314, - "y": 1295.5956661339962 - }, - "post_job_actions": {}, - "tool_id": "toolshed.g2.bx.psu.edu/repos/iuc/meryl/meryl/1.3+galaxy4", - 
"tool_shed_repository": { - "changeset_revision": "eadfd71dde37", - "name": "meryl", - "owner": "iuc", - "tool_shed": "toolshed.g2.bx.psu.edu" - }, - "tool_state": "{\"__input_ext\": \"input\", \"chromInfo\": \"/opt/galaxy/tool-data/shared/ucsc/chrom/?.len\", \"operation_type\": {\"command_type\": \"groups-kmers\", \"__current_case__\": 3, \"groups_operations\": \"union-sum\", \"input_meryldb_02\": {\"__class__\": \"ConnectedValue\"}}, \"__page__\": null, \"__rerun_remap_job_id__\": null}", - "tool_version": "1.3+galaxy4", - "type": "tool", - "uuid": "e10610dd-4ca0-4490-9d49-0484701d711e", - "workflow_outputs": [ - { - "label": null, - "output_name": "read_db", - "uuid": "4852beca-d492-486d-a87b-962f69920ad6" - } - ] - }, - "8": { - "annotation": "", - "content_id": "toolshed.g2.bx.psu.edu/repos/iuc/meryl/meryl/1.3+galaxy4", - "errors": null, - "id": 8, - "input_connections": { - "operation_type|input_meryldb_02": { - "id": 7, - "output_name": "read_db" - } - }, - "inputs": [], - "label": null, - "name": "Meryl", - "outputs": [ - { - "name": "read_db_hist", - "type": "tabular" - } - ], - "position": { - "bottom": 2249.0869381066523, - "height": 37.47564697265625, - "left": -358.4758411754261, - "right": -292.4758411754261, - "top": 2211.611291133996, - "width": 66, - "x": -358.4758411754261, - "y": 2211.611291133996 - }, - "post_job_actions": {}, - "tool_id": "toolshed.g2.bx.psu.edu/repos/iuc/meryl/meryl/1.3+galaxy4", - "tool_shed_repository": { - "changeset_revision": "eadfd71dde37", - "name": "meryl", - "owner": "iuc", - "tool_shed": "toolshed.g2.bx.psu.edu" - }, - "tool_state": "{\"__input_ext\": \"input\", \"chromInfo\": \"/opt/galaxy/tool-data/shared/ucsc/chrom/?.len\", \"operation_type\": {\"command_type\": \"histogram-kmers\", \"__current_case__\": 4, \"input_meryldb_02\": {\"__class__\": \"ConnectedValue\"}}, \"__page__\": null, \"__rerun_remap_job_id__\": null}", - "tool_version": "1.3+galaxy4", - "type": "tool", - "uuid": 
"86a0c60e-6ed9-438d-955b-4a13d34fa69b", - "workflow_outputs": [ - { - "label": null, - "output_name": "read_db_hist", - "uuid": "e4798cd6-a847-492c-9efd-1f5d227bf1d8" - } - ] - }, - "9": { - "annotation": "", - "content_id": "toolshed.g2.bx.psu.edu/repos/iuc/genomescope/genomescope/2.0+galaxy1", - "errors": null, - "id": 9, - "input_connections": { - "input": { - "id": 8, - "output_name": "read_db_hist" - } - }, - "inputs": [], - "label": null, - "name": "GenomeScope", - "outputs": [ - { - "name": "linear_plot", - "type": "png" - }, - { - "name": "log_plot", - "type": "png" - }, - { - "name": "transformed_linear_plot", - "type": "png" - }, - { - "name": "transformed_log_plot", - "type": "png" - }, - { - "name": "summary", - "type": "txt" - }, - { - "name": "model_params", - "type": "tabular" - } - ], - "position": { - "bottom": 2347.791522401752, - "height": 148.1802978515625, - "left": -80.56964296283144, - "right": -14.569627704042375, - "top": 2199.6112245501895, - "width": 66.00001525878906, - "x": -80.56964296283144, - "y": 2199.6112245501895 - }, - "post_job_actions": {}, - "tool_id": "toolshed.g2.bx.psu.edu/repos/iuc/genomescope/genomescope/2.0+galaxy1", - "tool_shed_repository": { - "changeset_revision": "3169a38c2656", - "name": "genomescope", - "owner": "iuc", - "tool_shed": "toolshed.g2.bx.psu.edu" - }, - "tool_state": "{\"__input_ext\": \"input\", \"advanced_options\": {\"topology\": null, \"initial_repetitiveness\": null, \"initial_heterozygosities\": \"\", \"transform_exp\": null, \"testing\": \"true\", \"true_params\": \"\", \"trace_flag\": \"false\", \"num_rounds\": null}, \"chromInfo\": \"/opt/galaxy/tool-data/shared/ucsc/chrom/?.len\", \"input\": {\"__class__\": \"ConnectedValue\"}, \"kmer_length\": \"21\", \"lambda\": null, \"max_kmercov\": null, \"output_options\": {\"output_files\": [\"summary_output\"], \"no_unique_sequence\": \"false\"}, \"ploidy\": null, \"__page__\": null, \"__rerun_remap_job_id__\": null}", - "tool_version": "2.0+galaxy1", 
- "type": "tool", - "uuid": "33b89b76-9661-4a04-bcd1-08845533e502", - "workflow_outputs": [ - { - "label": null, - "output_name": "transformed_log_plot", - "uuid": "99f43026-046e-49c9-8e57-4a6b13c5b7e3" - }, - { - "label": null, - "output_name": "transformed_linear_plot", - "uuid": "96f38618-c254-4cfa-b5c6-281237924d04" - }, - { - "label": null, - "output_name": "linear_plot", - "uuid": "190de9bf-15f9-4336-8172-a568bf7a8067" - }, - { - "label": null, - "output_name": "summary", - "uuid": "156d21bb-bdf4-446f-92c0-c899cacc3241" - }, - { - "label": null, - "output_name": "log_plot", - "uuid": "ab338100-cce9-4bf0-a32a-eb3a2f12c1cf" - }, - { - "label": null, - "output_name": "model_params", - "uuid": "e4eee33e-3be0-4b6d-83cb-4e2e83be8b13" - } - ] - }, - "10": { - "annotation": "", - "content_id": "toolshed.g2.bx.psu.edu/repos/devteam/column_maker/Add_a_column1/1.6", - "errors": null, - "id": 10, - "input_connections": { - "input": { - "id": 9, - "output_name": "model_params" - } - }, - "inputs": [], - "label": null, - "name": "Compute", - "outputs": [ - { - "name": "out_file1", - "type": "input" - } - ], - "position": { - "bottom": 2303.3423498905067, - "height": 30.74676513671875, - "left": 207.52415512547347, - "right": 273.5241551254735, - "top": 2272.595584753788, - "width": 66, - "x": 207.52415512547347, - "y": 2272.595584753788 - }, - "post_job_actions": {}, - "tool_id": "toolshed.g2.bx.psu.edu/repos/devteam/column_maker/Add_a_column1/1.6", - "tool_shed_repository": { - "changeset_revision": "02026300aa45", - "name": "column_maker", - "owner": "devteam", - "tool_shed": "toolshed.g2.bx.psu.edu" - }, - "tool_state": "{\"__input_ext\": \"tabular\", \"avoid_scientific_notation\": \"false\", \"chromInfo\": \"/opt/galaxy/tool-data/shared/ucsc/chrom/?.len\", \"cond\": \"1.5*c3\", \"header_lines_conditional\": {\"header_lines_select\": \"no\", \"__current_case__\": 0}, \"input\": {\"__class__\": \"ConnectedValue\"}, \"round\": \"true\", \"__page__\": null, 
\"__rerun_remap_job_id__\": null}", - "tool_version": "1.6", - "type": "tool", - "uuid": "6d662c83-b746-4994-83a3-febfcc350851", - "workflow_outputs": [ - { - "label": null, - "output_name": "out_file1", - "uuid": "a9625c03-b443-41de-bd2f-c3ae4f6d090c" - } - ] - }, - "11": { - "annotation": "", - "content_id": "toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_find_and_replace/1.1.3", - "errors": null, - "id": 11, - "input_connections": { - "infile": { - "id": 9, - "output_name": "summary" - } - }, - "inputs": [], - "label": null, - "name": "Replace", - "outputs": [ - { - "name": "outfile", - "type": "input" - } - ], - "position": { - "bottom": 2485.3580026337595, - "height": 30.7467041015625, - "left": 207.52415512547347, - "right": 273.5241551254735, - "top": 2454.611298532197, - "width": 66, - "x": 207.52415512547347, - "y": 2454.611298532197 - }, - "post_job_actions": {}, - "tool_id": "toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_find_and_replace/1.1.3", - "tool_shed_repository": { - "changeset_revision": "ddf54b12c295", - "name": "text_processing", - "owner": "bgruening", - "tool_shed": "toolshed.g2.bx.psu.edu" - }, - "tool_state": "{\"__input_ext\": \"txt\", \"caseinsensitive\": \"false\", \"chromInfo\": \"/opt/galaxy/tool-data/shared/ucsc/chrom/?.len\", \"find_pattern\": \"bp\", \"global\": \"true\", \"infile\": {\"__class__\": \"ConnectedValue\"}, \"is_regex\": \"false\", \"replace_pattern\": \"\", \"searchwhere\": {\"searchwhere_select\": \"line\", \"__current_case__\": 0}, \"skip_first_line\": \"false\", \"wholewords\": \"false\", \"__page__\": null, \"__rerun_remap_job_id__\": null}", - "tool_version": "1.1.3", - "type": "tool", - "uuid": "fb143b52-3c57-4ee9-813a-85d1b5c89f57", - "workflow_outputs": [ - { - "label": null, - "output_name": "outfile", - "uuid": "37e10dce-1aa5-46c6-9d31-1a64db93360d" - } - ] - }, - "12": { - "annotation": "", - "content_id": "toolshed.g2.bx.psu.edu/repos/devteam/column_maker/Add_a_column1/1.6", - 
"errors": null, - "id": 12, - "input_connections": { - "input": { - "id": 10, - "output_name": "out_file1" - } - }, - "inputs": [], - "label": null, - "name": "Compute", - "outputs": [ - { - "name": "out_file1", - "type": "input" - } - ], - "position": { - "bottom": 2303.3423498905067, - "height": 30.74676513671875, - "left": 495.50855232007575, - "right": 561.5085370612867, - "top": 2272.595584753788, - "width": 65.99998474121094, - "x": 495.50855232007575, - "y": 2272.595584753788 - }, - "post_job_actions": {}, - "tool_id": "toolshed.g2.bx.psu.edu/repos/devteam/column_maker/Add_a_column1/1.6", - "tool_shed_repository": { - "changeset_revision": "02026300aa45", - "name": "column_maker", - "owner": "devteam", - "tool_shed": "toolshed.g2.bx.psu.edu" - }, - "tool_state": "{\"__input_ext\": \"tabular\", \"avoid_scientific_notation\": \"false\", \"chromInfo\": \"/opt/galaxy/tool-data/shared/ucsc/chrom/?.len\", \"cond\": \"3*c7\", \"header_lines_conditional\": {\"header_lines_select\": \"no\", \"__current_case__\": 0}, \"input\": {\"__class__\": \"ConnectedValue\"}, \"round\": \"true\", \"__page__\": null, \"__rerun_remap_job_id__\": null}", - "tool_version": "1.6", - "type": "tool", - "uuid": "189d73fa-00cf-4c7a-bd50-eb28bfedbd42", - "workflow_outputs": [ - { - "label": null, - "output_name": "out_file1", - "uuid": "b3e8887f-e032-42a6-8c38-dc83dd3d73c9" - } - ] - }, - "13": { - "annotation": "", - "content_id": "toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_find_and_replace/1.1.3", - "errors": null, - "id": 13, - "input_connections": { - "infile": { - "id": 11, - "output_name": "outfile" - } - }, - "inputs": [], - "label": null, - "name": "Replace", - "outputs": [ - { - "name": "outfile", - "type": "input" - } - ], - "position": { - "bottom": 2485.3580026337595, - "height": 30.7467041015625, - "left": 495.50855232007575, - "right": 561.5085370612867, - "top": 2454.611298532197, - "width": 65.99998474121094, - "x": 495.50855232007575, - "y": 
2454.611298532197 - }, - "post_job_actions": {}, - "tool_id": "toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_find_and_replace/1.1.3", - "tool_shed_repository": { - "changeset_revision": "ddf54b12c295", - "name": "text_processing", - "owner": "bgruening", - "tool_shed": "toolshed.g2.bx.psu.edu" - }, - "tool_state": "{\"__input_ext\": \"txt\", \"caseinsensitive\": \"false\", \"chromInfo\": \"/opt/galaxy/tool-data/shared/ucsc/chrom/?.len\", \"find_pattern\": \",\", \"global\": \"true\", \"infile\": {\"__class__\": \"ConnectedValue\"}, \"is_regex\": \"false\", \"replace_pattern\": \"\", \"searchwhere\": {\"searchwhere_select\": \"line\", \"__current_case__\": 0}, \"skip_first_line\": \"false\", \"wholewords\": \"false\", \"__page__\": null, \"__rerun_remap_job_id__\": null}", - "tool_version": "1.1.3", - "type": "tool", - "uuid": "ca15ccc7-2122-4972-b510-e0f0cd46d6a3", - "workflow_outputs": [ - { - "label": null, - "output_name": "outfile", - "uuid": "9863a2fa-1b28-4703-93f8-2d552b544334" - } - ] - }, - "14": { - "annotation": "", - "content_id": "toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_cut_tool/1.1.0", - "errors": null, - "id": 14, - "input_connections": { - "input": { - "id": 12, - "output_name": "out_file1" - } - }, - "inputs": [ - { - "description": "runtime parameter for tool Advanced Cut", - "name": "input" - } - ], - "label": null, - "name": "Advanced Cut", - "outputs": [ - { - "name": "output", - "type": "tabular" - } - ], - "position": { - "bottom": 2118.0712761156487, - "height": 37.47564697265625, - "left": 773.5085227272726, - "right": 839.5085227272726, - "top": 2080.5956291429925, - "width": 66, - "x": 773.5085227272726, - "y": 2080.5956291429925 - }, - "post_job_actions": {}, - "tool_id": "toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_cut_tool/1.1.0", - "tool_shed_repository": { - "changeset_revision": "ddf54b12c295", - "name": "text_processing", - "owner": "bgruening", - "tool_shed": "toolshed.g2.bx.psu.edu" - 
}, - "tool_state": "{\"complement\": \"\", \"cut_type_options\": {\"cut_element\": \"-f\", \"__current_case__\": 0, \"list\": \"7\\n\"}, \"delimiter\": \"\", \"input\": {\"__class__\": \"RuntimeValue\"}, \"__page__\": null, \"__rerun_remap_job_id__\": null}", - "tool_version": "1.1.0", - "type": "tool", - "uuid": "37255296-b78f-488b-99be-abe6bc9afb50", - "workflow_outputs": [ - { - "label": null, - "output_name": "output", - "uuid": "08cd3c5d-525d-42a9-97b7-e0867ff7cf00" - } - ] - }, - "15": { - "annotation": "", - "content_id": "toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_cut_tool/1.1.0", - "errors": null, - "id": 15, - "input_connections": { - "input": { - "id": 12, - "output_name": "out_file1" - } - }, - "inputs": [ - { - "description": "runtime parameter for tool Advanced Cut", - "name": "input" - } - ], - "label": null, - "name": "Advanced Cut", - "outputs": [ - { - "name": "output", - "type": "tabular" - } - ], - "position": { - "bottom": 2300.0869288589015, - "height": 37.4755859375, - "left": 773.5085227272726, - "right": 839.5085227272726, - "top": 2262.6113429214015, - "width": 66, - "x": 773.5085227272726, - "y": 2262.6113429214015 - }, - "post_job_actions": {}, - "tool_id": "toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_cut_tool/1.1.0", - "tool_shed_repository": { - "changeset_revision": "ddf54b12c295", - "name": "text_processing", - "owner": "bgruening", - "tool_shed": "toolshed.g2.bx.psu.edu" - }, - "tool_state": "{\"complement\": \"\", \"cut_type_options\": {\"cut_element\": \"-f\", \"__current_case__\": 0, \"list\": \"8\\n\"}, \"delimiter\": \"\", \"input\": {\"__class__\": \"RuntimeValue\"}, \"__page__\": null, \"__rerun_remap_job_id__\": null}", - "tool_version": "1.1.0", - "type": "tool", - "uuid": "a065939e-7d41-45fa-bda8-2ae34350ae84", - "workflow_outputs": [ - { - "label": null, - "output_name": "output", - "uuid": "40044ea0-8d91-41e4-8f64-e50e73f07591" - } - ] - }, - "16": { - "annotation": "", - "content_id": 
"toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_grep_tool/1.1.1", - "errors": null, - "id": 16, - "input_connections": { - "infile": { - "id": 13, - "output_name": "outfile" - } - }, - "inputs": [], - "label": null, - "name": "Search in textfiles", - "outputs": [ - { - "name": "output", - "type": "input" - } - ], - "position": { - "bottom": 2482.0869214607005, - "height": 37.4755859375, - "left": 773.5085227272726, - "right": 839.5085227272726, - "top": 2444.6113355232005, - "width": 66, - "x": 773.5085227272726, - "y": 2444.6113355232005 - }, - "post_job_actions": {}, - "tool_id": "toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_grep_tool/1.1.1", - "tool_shed_repository": { - "changeset_revision": "ddf54b12c295", - "name": "text_processing", - "owner": "bgruening", - "tool_shed": "toolshed.g2.bx.psu.edu" - }, - "tool_state": "{\"__input_ext\": \"txt\", \"case_sensitive\": \"-i\", \"chromInfo\": \"/opt/galaxy/tool-data/shared/ucsc/chrom/?.len\", \"color\": \"NOCOLOR\", \"infile\": {\"__class__\": \"ConnectedValue\"}, \"invert\": \"\", \"lines_after\": \"0\", \"lines_before\": \"0\", \"regex_type\": \"-G\", \"url_paste\": \"Haploid\", \"__page__\": null, \"__rerun_remap_job_id__\": null}", - "tool_version": "1.1.1", - "type": "tool", - "uuid": "fbdba375-7b98-4ba7-a039-b8f2dd3f53e5", - "workflow_outputs": [ - { - "label": null, - "output_name": "output", - "uuid": "22d3d363-fc6b-4a30-be17-c9646f2ba411" - } - ] - }, - "17": { - "annotation": "", - "content_id": "param_value_from_file", - "errors": null, - "id": 17, - "input_connections": { - "input1": { - "id": 14, - "output_name": "output" - } - }, - "inputs": [ - { - "description": "runtime parameter for tool Parse parameter value", - "name": "input1" - } - ], - "label": "Transition parameter", - "name": "Parse parameter value", - "outputs": [ - { - "name": "integer_param", - "type": "expression.json" - } - ], - "position": { - "bottom": 2111.5291137695312, - "height": 50.93341064453125, - 
"left": 1081.5241033380682, - "right": 1147.5241033380682, - "top": 2060.595703125, - "width": 66, - "x": 1081.5241033380682, - "y": 2060.595703125 - }, - "post_job_actions": { - "HideDatasetActioninteger_param": { - "action_arguments": {}, - "action_type": "HideDatasetAction", - "output_name": "integer_param" - } - }, - "tool_id": "param_value_from_file", - "tool_state": "{\"input1\": {\"__class__\": \"RuntimeValue\"}, \"param_type\": \"integer\", \"remove_newlines\": \"true\", \"__page__\": null, \"__rerun_remap_job_id__\": null}", - "tool_version": "0.1.0", - "type": "tool", - "uuid": "e525f6e5-6adc-4fdb-bb36-ab67f099d397", - "workflow_outputs": [] - }, - "18": { - "annotation": "", - "content_id": "param_value_from_file", - "errors": null, - "id": 18, - "input_connections": { - "input1": { - "id": 15, - "output_name": "output" - } - }, - "inputs": [ - { - "description": "runtime parameter for tool Parse parameter value", - "name": "input1" - } - ], - "label": "Upper bound", - "name": "Parse parameter value", - "outputs": [ - { - "name": "integer_param", - "type": "expression.json" - } - ], - "position": { - "bottom": 2296.81572376598, - "height": 44.20452880859375, - "left": 1081.5241033380682, - "right": 1147.5241033380682, - "top": 2252.611194957386, - "width": 66, - "x": 1081.5241033380682, - "y": 2252.611194957386 - }, - "post_job_actions": { - "HideDatasetActioninteger_param": { - "action_arguments": {}, - "action_type": "HideDatasetAction", - "output_name": "integer_param" - } - }, - "tool_id": "param_value_from_file", - "tool_state": "{\"input1\": {\"__class__\": \"RuntimeValue\"}, \"param_type\": \"integer\", \"remove_newlines\": \"true\", \"__page__\": null, \"__rerun_remap_job_id__\": null}", - "tool_version": "0.1.0", - "type": "tool", - "uuid": "538f917a-0632-40ec-a416-0a9e77c44993", - "workflow_outputs": [] - }, - "19": { - "annotation": "", - "content_id": "Convert characters1", - "errors": null, - "id": 19, - "input_connections": { - "input": { - 
"id": 16, - "output_name": "output" - } - }, - "inputs": [], - "label": null, - "name": "Convert", - "outputs": [ - { - "name": "out_file1", - "type": "tabular" - } - ], - "position": { - "bottom": 2485.3580026337595, - "height": 30.7467041015625, - "left": 1081.5241033380682, - "right": 1147.5241033380682, - "top": 2454.611298532197, - "width": 66, - "x": 1081.5241033380682, - "y": 2454.611298532197 - }, - "post_job_actions": {}, - "tool_id": "Convert characters1", - "tool_state": "{\"__input_ext\": \"txt\", \"chromInfo\": \"/opt/galaxy/tool-data/shared/ucsc/chrom/?.len\", \"condense\": \"true\", \"convert_from\": \"s\", \"input\": {\"__class__\": \"ConnectedValue\"}, \"strip\": \"true\", \"__page__\": null, \"__rerun_remap_job_id__\": null}", - "tool_version": "1.0.0", - "type": "tool", - "uuid": "fd6f966f-feff-4e88-b650-c9f693e511bc", - "workflow_outputs": [ - { - "label": null, - "output_name": "out_file1", - "uuid": "050ecbee-75e4-49d5-b4e5-fb942239e85b" - } - ] - }, - "20": { - "annotation": "", - "content_id": "toolshed.g2.bx.psu.edu/repos/bgruening/hifiasm/hifiasm/0.16.1+galaxy2", - "errors": null, - "id": 20, - "input_connections": { - "hic_partition|h1": { - "id": 0, - "output_name": "output" - }, - "hic_partition|h2": { - "id": 1, - "output_name": "output" - }, - "mode|reads": { - "id": 4, - "output_name": "out1" - }, - "purge_options|purge_max": { - "id": 18, - "output_name": "integer_param" - } - }, - "inputs": [ - { - "description": "runtime parameter for tool Hifiasm", - "name": "hic_partition" - }, - { - "description": "runtime parameter for tool Hifiasm", - "name": "hic_partition" - }, - { - "description": "runtime parameter for tool Hifiasm", - "name": "mode" - } - ], - "label": null, - "name": "Hifiasm", - "outputs": [ - { - "name": "hic_pcontig_graph", - "type": "gfa1" - }, - { - "name": "hic_acontig_graph", - "type": "gfa1" - }, - { - "name": "hic_balanced_contig_hap1_graph", - "type": "gfa1" - }, - { - "name": "hic_balanced_contig_hap2_graph", 
- "type": "gfa1" - } - ], - "position": { - "bottom": 1146.3627014160156, - "height": 144.75137329101562, - "left": 495.50855232007575, - "right": 561.5085370612867, - "top": 1001.611328125, - "width": 65.99998474121094, - "x": 495.50855232007575, - "y": 1001.611328125 - }, - "post_job_actions": { - "HideDatasetActionhic_acontig_graph": { - "action_arguments": {}, - "action_type": "HideDatasetAction", - "output_name": "hic_acontig_graph" - }, - "HideDatasetActionhic_pcontig_graph": { - "action_arguments": {}, - "action_type": "HideDatasetAction", - "output_name": "hic_pcontig_graph" - } - }, - "tool_id": "toolshed.g2.bx.psu.edu/repos/bgruening/hifiasm/hifiasm/0.16.1+galaxy2", - "tool_shed_repository": { - "changeset_revision": "5bec28269d95", - "name": "hifiasm", - "owner": "bgruening", - "tool_shed": "toolshed.g2.bx.psu.edu" - }, - "tool_state": "{\"advanced_options\": {\"advanced_selector\": \"blank\", \"__current_case__\": 0}, \"assembly_options\": {\"assembly_selector\": \"blank\", \"__current_case__\": 0}, \"filter_bits\": \"37\", \"hic_partition\": {\"hic_partition_selector\": \"set\", \"__current_case__\": 1, \"h1\": {\"__class__\": \"RuntimeValue\"}, \"h2\": {\"__class__\": \"RuntimeValue\"}, \"seed\": null, \"n_weight\": null, \"n_perturb\": null, \"f_perturb\": null, \"l_msjoin\": \"500000\"}, \"log_out\": \"false\", \"mode\": {\"mode_selector\": \"standard\", \"__current_case__\": 0, \"reads\": {\"__class__\": \"RuntimeValue\"}}, \"purge_options\": {\"purge_selector\": \"set\", \"__current_case__\": 1, \"purge_level\": \"0\", \"similarity_threshold\": \"0.75\", \"minimum_overlap\": \"1\", \"purge_max\": {\"__class__\": \"ConnectedValue\"}, \"n_hap\": null}, \"__page__\": null, \"__rerun_remap_job_id__\": null}", - "tool_version": "0.16.1+galaxy2", - "type": "tool", - "uuid": "981d04da-27c4-471b-bbff-915625462bbe", - "workflow_outputs": [ - { - "label": null, - "output_name": "hic_balanced_contig_hap2_graph", - "uuid": 
"a93d61d5-bdfe-48fc-a027-7a2ddf306f3c" - }, - { - "label": null, - "output_name": "hic_balanced_contig_hap1_graph", - "uuid": "f0c2c6d9-002b-425f-b506-e5087ab75a18" - } - ] - }, - "21": { - "annotation": "", - "content_id": "toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_cut_tool/1.1.0", - "errors": null, - "id": 21, - "input_connections": { - "input": { - "id": 19, - "output_name": "out_file1" - } - }, - "inputs": [ - { - "description": "runtime parameter for tool Advanced Cut", - "name": "input" - } - ], - "label": null, - "name": "Advanced Cut", - "outputs": [ - { - "name": "output", - "type": "tabular" - } - ], - "position": { - "bottom": 2482.0869214607005, - "height": 37.4755859375, - "left": 1359.5085375236742, - "right": 1425.5085375236742, - "top": 2444.6113355232005, - "width": 66, - "x": 1359.5085375236742, - "y": 2444.6113355232005 - }, - "post_job_actions": {}, - "tool_id": "toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_cut_tool/1.1.0", - "tool_shed_repository": { - "changeset_revision": "ddf54b12c295", - "name": "text_processing", - "owner": "bgruening", - "tool_shed": "toolshed.g2.bx.psu.edu" - }, - "tool_state": "{\"complement\": \"\", \"cut_type_options\": {\"cut_element\": \"-f\", \"__current_case__\": 0, \"list\": \"5\\n\"}, \"delimiter\": \"\", \"input\": {\"__class__\": \"RuntimeValue\"}, \"__page__\": null, \"__rerun_remap_job_id__\": null}", - "tool_version": "1.1.0", - "type": "tool", - "uuid": "c6acbefe-f7ab-426b-84d7-7adf3ea71970", - "workflow_outputs": [ - { - "label": null, - "output_name": "output", - "uuid": "936cb6dc-3b06-4edb-93af-f46891c97777" - } - ] - }, - "22": { - "annotation": "", - "content_id": "toolshed.g2.bx.psu.edu/repos/iuc/gfa_to_fa/gfa_to_fa/0.1.2", - "errors": null, - "id": 22, - "input_connections": { - "in_gfa": { - "id": 20, - "output_name": "hic_balanced_contig_hap2_graph" - } - }, - "inputs": [], - "label": null, - "name": "GFA to FASTA", - "outputs": [ - { - "name": "out_fa", - "type": 
"fasta" - } - ], - "position": { - "bottom": 1075.8002097389915, - "height": 44.20452880859375, - "left": 773.5085227272726, - "right": 839.5085227272726, - "top": 1031.5956809303977, - "width": 66, - "x": 773.5085227272726, - "y": 1031.5956809303977 - }, - "post_job_actions": {}, - "tool_id": "toolshed.g2.bx.psu.edu/repos/iuc/gfa_to_fa/gfa_to_fa/0.1.2", - "tool_shed_repository": { - "changeset_revision": "e33c82b63727", - "name": "gfa_to_fa", - "owner": "iuc", - "tool_shed": "toolshed.g2.bx.psu.edu" - }, - "tool_state": "{\"in_gfa\": {\"__class__\": \"ConnectedValue\"}, \"__page__\": null, \"__rerun_remap_job_id__\": null}", - "tool_version": "0.1.2", - "type": "tool", - "uuid": "43474147-7c56-4609-8ced-529b6972623e", - "workflow_outputs": [ - { - "label": null, - "output_name": "out_fa", - "uuid": "07a9bbaf-73a2-4f6e-b4b6-bcf790f9961c" - } - ] - }, - "23": { - "annotation": "", - "content_id": "toolshed.g2.bx.psu.edu/repos/iuc/gfa_to_fa/gfa_to_fa/0.1.2", - "errors": null, - "id": 23, - "input_connections": { - "in_gfa": { - "id": 20, - "output_name": "hic_balanced_contig_hap1_graph" - } - }, - "inputs": [], - "label": null, - "name": "GFA to FASTA", - "outputs": [ - { - "name": "out_fa", - "type": "fasta" - } - ], - "position": { - "bottom": 1319.8158051461885, - "height": 44.20452880859375, - "left": 773.5085227272726, - "right": 839.5085227272726, - "top": 1275.6112763375947, - "width": 66, - "x": 773.5085227272726, - "y": 1275.6112763375947 - }, - "post_job_actions": {}, - "tool_id": "toolshed.g2.bx.psu.edu/repos/iuc/gfa_to_fa/gfa_to_fa/0.1.2", - "tool_shed_repository": { - "changeset_revision": "e33c82b63727", - "name": "gfa_to_fa", - "owner": "iuc", - "tool_shed": "toolshed.g2.bx.psu.edu" - }, - "tool_state": "{\"in_gfa\": {\"__class__\": \"ConnectedValue\"}, \"__page__\": null, \"__rerun_remap_job_id__\": null}", - "tool_version": "0.1.2", - "type": "tool", - "uuid": "5910ad13-7012-4c38-a5a8-5709a3c15994", - "workflow_outputs": [ - { - "label": null, - 
"output_name": "out_fa", - "uuid": "b0550d57-f348-4057-ab6c-ffce7ecd2437" - } - ] - }, - "24": { - "annotation": "", - "content_id": "param_value_from_file", - "errors": null, - "id": 24, - "input_connections": { - "input1": { - "id": 21, - "output_name": "output" - } - }, - "inputs": [ - { - "description": "runtime parameter for tool Parse parameter value", - "name": "input1" - } - ], - "label": "Estimated genome size", - "name": "Parse parameter value", - "outputs": [ - { - "name": "integer_param", - "type": "expression.json" - } - ], - "position": { - "bottom": 2475.544696229877, - "height": 50.9334716796875, - "left": 1637.5085079308712, - "right": 1703.5085079308712, - "top": 2424.6112245501895, - "width": 66, - "x": 1637.5085079308712, - "y": 2424.6112245501895 - }, - "post_job_actions": { - "HideDatasetActioninteger_param": { - "action_arguments": {}, - "action_type": "HideDatasetAction", - "output_name": "integer_param" - } - }, - "tool_id": "param_value_from_file", - "tool_state": "{\"input1\": {\"__class__\": \"RuntimeValue\"}, \"param_type\": \"integer\", \"remove_newlines\": \"true\", \"__page__\": null, \"__rerun_remap_job_id__\": null}", - "tool_version": "0.1.0", - "type": "tool", - "uuid": "bb1d6a46-6032-4dd6-8a3d-69ede06047ce", - "workflow_outputs": [] - }, - "25": { - "annotation": "", - "content_id": "toolshed.g2.bx.psu.edu/repos/iuc/busco/busco/5.2.2+galaxy2", - "errors": null, - "id": 25, - "input_connections": { - "input": { - "id": 22, - "output_name": "out_fa" - } - }, - "inputs": [], - "label": null, - "name": "Busco", - "outputs": [ - { - "name": "busco_sum", - "type": "txt" - }, - { - "name": "busco_table", - "type": "tabular" - } - ], - "position": { - "bottom": 1186.2869086988044, - "height": 67.69125366210938, - "left": 1081.5241033380682, - "right": 1147.5241033380682, - "top": 1118.595655036695, - "width": 66, - "x": 1081.5241033380682, - "y": 1118.595655036695 - }, - "post_job_actions": {}, - "tool_id": 
"toolshed.g2.bx.psu.edu/repos/iuc/busco/busco/5.2.2+galaxy2", - "tool_shed_repository": { - "changeset_revision": "46ae58b1d792", - "name": "busco", - "owner": "iuc", - "tool_shed": "toolshed.g2.bx.psu.edu" - }, - "tool_state": "{\"__input_ext\": \"input\", \"adv\": {\"evalue\": \"0.001\", \"limit\": \"3\"}, \"busco_mode\": {\"mode\": \"geno\", \"__current_case__\": 0, \"use_augustus\": {\"use_augustus_selector\": \"no\", \"__current_case__\": 0}}, \"chromInfo\": \"/opt/galaxy/tool-data/shared/ucsc/chrom/?.len\", \"input\": {\"__class__\": \"ConnectedValue\"}, \"lineage\": {\"lineage_mode\": \"select_lineage\", \"__current_case__\": 1, \"lineage_dataset\": \"saccharomycetes_odb10\"}, \"outputs\": [\"short_summary\"], \"__page__\": null, \"__rerun_remap_job_id__\": null}", - "tool_version": "5.2.2+galaxy2", - "type": "tool", - "uuid": "bbadf584-b321-4280-a5d4-be9c327c5087", - "workflow_outputs": [ - { - "label": null, - "output_name": "busco_sum", - "uuid": "53701c97-fd2a-4d18-8ad1-502a21bc50d8" - }, - { - "label": null, - "output_name": "busco_table", - "uuid": "a3dd7f65-4ae0-4388-8d09-bf13f43f18ac" - } - ] - }, - "26": { - "annotation": "", - "content_id": "toolshed.g2.bx.psu.edu/repos/iuc/merqury/merqury/1.3+galaxy1", - "errors": null, - "id": 26, - "input_connections": { - "mode|assembly_options|assembly_01": { - "id": 23, - "output_name": "out_fa" - }, - "mode|assembly_options|assembly_02": { - "id": 22, - "output_name": "out_fa" - }, - "mode|meryldb_F1": { - "id": 7, - "output_name": "read_db" - } - }, - "inputs": [], - "label": null, - "name": "Merqury", - "outputs": [ - { - "name": "qv_files", - "type": "input" - }, - { - "name": "png_files", - "type": "input" - }, - { - "name": "stats_files", - "type": "input" - } - ], - "position": { - "bottom": 872.6447531960226, - "height": 91.049072265625, - "left": 1081.5241033380682, - "right": 1147.5241033380682, - "top": 781.5956809303976, - "width": 66, - "x": 1081.5241033380682, - "y": 781.5956809303976 - }, - 
"post_job_actions": {}, - "tool_id": "toolshed.g2.bx.psu.edu/repos/iuc/merqury/merqury/1.3+galaxy1", - "tool_shed_repository": { - "changeset_revision": "39edec572bae", - "name": "merqury", - "owner": "iuc", - "tool_shed": "toolshed.g2.bx.psu.edu" - }, - "tool_state": "{\"__input_ext\": \"input\", \"chromInfo\": \"/opt/galaxy/tool-data/shared/ucsc/chrom/?.len\", \"label\": \"output_merqury\", \"mode\": {\"options\": \"default\", \"__current_case__\": 0, \"meryldb_F1\": {\"__class__\": \"ConnectedValue\"}, \"assembly_options\": {\"number_assemblies\": \"two\", \"__current_case__\": 1, \"assembly_01\": {\"__class__\": \"ConnectedValue\"}, \"assembly_02\": {\"__class__\": \"ConnectedValue\"}}}, \"output_selector\": [\"qv\", \"plots\", \"stats\"], \"__page__\": null, \"__rerun_remap_job_id__\": null}", - "tool_version": "1.3+galaxy1", - "type": "tool", - "uuid": "21e13849-7836-42a5-b57b-d1d55dab9b19", - "workflow_outputs": [ - { - "label": null, - "output_name": "qv_files", - "uuid": "34470957-c558-4e64-8af0-f0bf89d931d6" - }, - { - "label": null, - "output_name": "stats_files", - "uuid": "45a7ce2f-afc7-4100-98c5-fe6d4013a464" - }, - { - "label": null, - "output_name": "png_files", - "uuid": "2be0f6f9-b852-4e9d-9428-12d085b8f1e6" - } - ] - }, - "27": { - "annotation": "", - "content_id": "toolshed.g2.bx.psu.edu/repos/iuc/busco/busco/5.2.2+galaxy2", - "errors": null, - "id": 27, - "input_connections": { - "input": { - "id": 23, - "output_name": "out_fa" - } - }, - "inputs": [], - "label": null, - "name": "Busco", - "outputs": [ - { - "name": "busco_sum", - "type": "txt" - }, - { - "name": "busco_table", - "type": "tabular" - } - ], - "position": { - "bottom": 1430.2868448893228, - "height": 67.69122314453125, - "left": 1081.5241033380682, - "right": 1147.5241033380682, - "top": 1362.5956217447915, - "width": 66, - "x": 1081.5241033380682, - "y": 1362.5956217447915 - }, - "post_job_actions": {}, - "tool_id": "toolshed.g2.bx.psu.edu/repos/iuc/busco/busco/5.2.2+galaxy2", - 
"tool_shed_repository": { - "changeset_revision": "46ae58b1d792", - "name": "busco", - "owner": "iuc", - "tool_shed": "toolshed.g2.bx.psu.edu" - }, - "tool_state": "{\"__input_ext\": \"input\", \"adv\": {\"evalue\": \"0.001\", \"limit\": \"3\"}, \"busco_mode\": {\"mode\": \"geno\", \"__current_case__\": 0, \"use_augustus\": {\"use_augustus_selector\": \"no\", \"__current_case__\": 0}}, \"chromInfo\": \"/opt/galaxy/tool-data/shared/ucsc/chrom/?.len\", \"input\": {\"__class__\": \"ConnectedValue\"}, \"lineage\": {\"lineage_mode\": \"select_lineage\", \"__current_case__\": 1, \"lineage_dataset\": \"saccharomycetes_odb10\"}, \"outputs\": [\"short_summary\"], \"__page__\": null, \"__rerun_remap_job_id__\": null}", - "tool_version": "5.2.2+galaxy2", - "type": "tool", - "uuid": "97853834-9bec-435c-8d27-5b2272079f69", - "workflow_outputs": [ - { - "label": null, - "output_name": "busco_sum", - "uuid": "56773a24-69f9-475f-aaf5-40cf075448a6" - }, - { - "label": null, - "output_name": "busco_table", - "uuid": "7bf0073a-ba33-410a-927d-4b2da4df8c8a" - } - ] - }, - "28": { - "annotation": "", - "content_id": "toolshed.g2.bx.psu.edu/repos/iuc/purge_dups/purge_dups/1.2.5+galaxy4", - "errors": null, - "id": 28, - "input_connections": { - "function_select|input": { - "id": 23, - "output_name": "out_fa" - } - }, - "inputs": [], - "label": null, - "name": "Purge overlaps", - "outputs": [ - { - "name": "split_fasta", - "type": "fasta" - } - ], - "position": { - "bottom": 1656.5291008226798, - "height": 50.9334716796875, - "left": 1081.5241033380682, - "right": 1147.5241033380682, - "top": 1605.5956291429923, - "width": 66, - "x": 1081.5241033380682, - "y": 1605.5956291429923 - }, - "post_job_actions": {}, - "tool_id": "toolshed.g2.bx.psu.edu/repos/iuc/purge_dups/purge_dups/1.2.5+galaxy4", - "tool_shed_repository": { - "changeset_revision": "a315c25dc813", - "name": "purge_dups", - "owner": "iuc", - "tool_shed": "toolshed.g2.bx.psu.edu" - }, - "tool_state": "{\"__input_ext\": \"input\", 
\"chromInfo\": \"/opt/galaxy/tool-data/shared/ucsc/chrom/?.len\", \"function_select\": {\"functions\": \"split_fa\", \"__current_case__\": 1, \"input\": {\"__class__\": \"ConnectedValue\"}}, \"__page__\": null, \"__rerun_remap_job_id__\": null}", - "tool_version": "1.2.5+galaxy4", - "type": "tool", - "uuid": "36f0a1f5-e662-408b-a8a5-40a4f2cca2de", - "workflow_outputs": [ - { - "label": null, - "output_name": "split_fasta", - "uuid": "698b38fa-be8f-4510-86a8-b5dba07e9544" - } - ] - }, - "29": { - "annotation": "", - "content_id": "toolshed.g2.bx.psu.edu/repos/iuc/minimap2/minimap2/2.24+galaxy0", - "errors": null, - "id": 29, - "input_connections": { - "fastq_input|fastq_input1": { - "id": 4, - "output_name": "out1" - }, - "reference_source|ref_file": { - "id": 23, - "output_name": "out_fa" - } - }, - "inputs": [], - "label": null, - "name": "Map with minimap2", - "outputs": [ - { - "name": "alignment_output", - "type": "bam" - } - ], - "position": { - "bottom": 1872.0157507694128, - "height": 74.420166015625, - "left": 1081.5241033380682, - "right": 1147.5241033380682, - "top": 1797.5955847537878, - "width": 66, - "x": 1081.5241033380682, - "y": 1797.5955847537878 - }, - "post_job_actions": {}, - "tool_id": "toolshed.g2.bx.psu.edu/repos/iuc/minimap2/minimap2/2.24+galaxy0", - "tool_shed_repository": { - "changeset_revision": "11a0d50a54e6", - "name": "minimap2", - "owner": "iuc", - "tool_shed": "toolshed.g2.bx.psu.edu" - }, - "tool_state": "{\"__input_ext\": \"input\", \"alignment_options\": {\"splicing\": {\"splice_mode\": \"preset\", \"__current_case__\": 0}, \"A\": null, \"B\": null, \"O\": null, \"O2\": null, \"E\": null, \"E2\": null, \"z\": null, \"z2\": null, \"s\": null, \"no_end_flt\": \"true\"}, \"chromInfo\": \"/opt/galaxy/tool-data/shared/ucsc/chrom/?.len\", \"fastq_input\": {\"fastq_input_selector\": \"single\", \"__current_case__\": 0, \"fastq_input1\": {\"__class__\": \"ConnectedValue\"}, \"analysis_type_selector\": \"asm5\"}, 
\"fastq_input1|__identifier__\": \"SRR13577846_1\", \"indexing_options\": {\"H\": \"false\", \"k\": null, \"w\": null, \"I\": null}, \"io_options\": {\"output_format\": \"paf\", \"Q\": \"false\", \"L\": \"false\", \"K\": null, \"cs\": null, \"c\": \"false\", \"eqx\": \"false\", \"Y\": \"false\"}, \"mapping_options\": {\"N\": null, \"F\": null, \"f\": null, \"kmer_ocurrence_interval\": {\"interval\": \"\", \"__current_case__\": 1}, \"min_occ_floor\": null, \"q_occ_frac\": \"0.01\", \"g\": null, \"r\": null, \"n\": null, \"m\": null, \"max_chain_skip\": null, \"max_chain_iter\": null, \"X\": \"false\", \"p\": null, \"mask_len\": null}, \"reference_source\": {\"reference_source_selector\": \"history\", \"__current_case__\": 1, \"ref_file\": {\"__class__\": \"ConnectedValue\"}}, \"__page__\": null, \"__rerun_remap_job_id__\": null}", - "tool_version": "2.24+galaxy0", - "type": "tool", - "uuid": "b0ef8e59-2631-45c7-acf0-f2042b8d7b69", - "workflow_outputs": [ - { - "label": null, - "output_name": "alignment_output", - "uuid": "b196a114-96c6-40c8-8a42-9d303fc842c7" - } - ] - }, - "30": { - "annotation": "", - "content_id": "toolshed.g2.bx.psu.edu/repos/iuc/quast/quast/5.0.2+galaxy4", - "errors": null, - "id": 30, - "input_connections": { - "assembly|ref|est_ref_size": { - "id": 24, - "output_name": "integer_param" - }, - "in|inputs_0|input": { - "id": 23, - "output_name": "out_fa" - }, - "in|inputs_1|input": { - "id": 22, - "output_name": "out_fa" - }, - "reads|input_1": { - "id": 4, - "output_name": "out1" - } - }, - "inputs": [ - { - "description": "runtime parameter for tool Quast", - "name": "reads" - } - ], - "label": null, - "name": "Quast", - "outputs": [ - { - "name": "report_html", - "type": "html" - } - ], - "position": { - "bottom": 991.7869040749289, - "height": 101.20684814453125, - "left": 1925.4616477272727, - "right": 1991.4616477272727, - "top": 890.5800559303976, - "width": 66, - "x": 1925.4616477272727, - "y": 890.5800559303976 - }, - 
"post_job_actions": {}, - "tool_id": "toolshed.g2.bx.psu.edu/repos/iuc/quast/quast/5.0.2+galaxy4", - "tool_shed_repository": { - "changeset_revision": "875d0f36d66f", - "name": "quast", - "owner": "iuc", - "tool_shed": "toolshed.g2.bx.psu.edu" - }, - "tool_state": "{\"advanced\": {\"contig_thresholds\": \"0,1000\", \"strict_NA\": \"false\", \"extensive_mis_size\": \"1000\", \"scaffold_gap_max_size\": \"1000\", \"unaligned_part_size\": \"500\", \"skip_unaligned_mis_contigs\": \"true\", \"fragmented_max_indent\": null}, \"alignments\": {\"use_all_alignments\": \"false\", \"min_alignment\": \"65\", \"min_identity\": \"95.0\", \"ambiguity_usage\": \"one\", \"ambiguity_score\": \"0.99\", \"fragmented\": \"false\", \"upper_bound_assembly\": \"false\", \"upper_bound_min_con\": null}, \"assembly\": {\"type\": \"genome\", \"__current_case__\": 0, \"ref\": {\"use_ref\": \"false\", \"__current_case__\": 1, \"est_ref_size\": {\"__class__\": \"ConnectedValue\"}}, \"orga_type\": \"--eukaryote\"}, \"genes\": {\"gene_finding\": {\"tool\": \"none\", \"__current_case__\": 0}, \"rna_finding\": \"false\", \"conserved_genes_finding\": \"false\"}, \"in\": {\"custom\": \"true\", \"__current_case__\": 0, \"inputs\": [{\"__index__\": 0, \"input\": {\"__class__\": \"RuntimeValue\"}, \"labels\": \"Primary assebly\"}, {\"__index__\": 1, \"input\": {\"__class__\": \"RuntimeValue\"}, \"labels\": \"Alternate assembly\"}]}, \"large\": \"false\", \"min_contig\": \"500\", \"output_files\": [\"html\"], \"reads\": {\"reads_option\": \"pacbio\", \"__current_case__\": 5, \"input_1\": {\"__class__\": \"RuntimeValue\"}}, \"split_scaffolds\": \"false\", \"__page__\": null, \"__rerun_remap_job_id__\": null}", - "tool_version": "5.0.2+galaxy4", - "type": "tool", - "uuid": "c1d5c438-6d68-437d-90c1-aeeb46304be1", - "workflow_outputs": [ - { - "label": null, - "output_name": "report_html", - "uuid": "a0988654-d08b-4e08-9803-ed9133f35a69" - } - ] - }, - "31": { - "annotation": "", - "content_id": 
"toolshed.g2.bx.psu.edu/repos/iuc/minimap2/minimap2/2.24+galaxy0", - "errors": null, - "id": 31, - "input_connections": { - "fastq_input|fastq_input1": { - "id": 28, - "output_name": "split_fasta" - }, - "reference_source|ref_file": { - "id": 28, - "output_name": "split_fasta" - } - }, - "inputs": [], - "label": null, - "name": "Map with minimap2", - "outputs": [ - { - "name": "alignment_output", - "type": "bam" - } - ], - "position": { - "bottom": 1685.015869140625, - "height": 74.420166015625, - "left": 1359.5085375236742, - "right": 1425.5085375236742, - "top": 1610.595703125, - "width": 66, - "x": 1359.5085375236742, - "y": 1610.595703125 - }, - "post_job_actions": {}, - "tool_id": "toolshed.g2.bx.psu.edu/repos/iuc/minimap2/minimap2/2.24+galaxy0", - "tool_shed_repository": { - "changeset_revision": "11a0d50a54e6", - "name": "minimap2", - "owner": "iuc", - "tool_shed": "toolshed.g2.bx.psu.edu" - }, - "tool_state": "{\"__input_ext\": \"input\", \"alignment_options\": {\"splicing\": {\"splice_mode\": \"preset\", \"__current_case__\": 0}, \"A\": null, \"B\": null, \"O\": null, \"O2\": null, \"E\": null, \"E2\": null, \"z\": null, \"z2\": null, \"s\": null, \"no_end_flt\": \"true\"}, \"chromInfo\": \"/opt/galaxy/tool-data/shared/ucsc/chrom/?.len\", \"fastq_input\": {\"fastq_input_selector\": \"single\", \"__current_case__\": 0, \"fastq_input1\": {\"__class__\": \"ConnectedValue\"}, \"analysis_type_selector\": \"self-homology\"}, \"indexing_options\": {\"H\": \"false\", \"k\": null, \"w\": null, \"I\": null}, \"io_options\": {\"output_format\": \"paf\", \"Q\": \"false\", \"L\": \"false\", \"K\": null, \"cs\": null, \"c\": \"false\", \"eqx\": \"false\", \"Y\": \"false\"}, \"mapping_options\": {\"N\": null, \"F\": null, \"f\": null, \"kmer_ocurrence_interval\": {\"interval\": \"\", \"__current_case__\": 1}, \"min_occ_floor\": null, \"q_occ_frac\": \"0.01\", \"g\": null, \"r\": null, \"n\": null, \"m\": null, \"max_chain_skip\": null, \"max_chain_iter\": null, \"X\": 
\"false\", \"p\": null, \"mask_len\": null}, \"reference_source\": {\"reference_source_selector\": \"history\", \"__current_case__\": 1, \"ref_file\": {\"__class__\": \"ConnectedValue\"}}, \"__page__\": null, \"__rerun_remap_job_id__\": null}", - "tool_version": "2.24+galaxy0", - "type": "tool", - "uuid": "c9b90a4d-e3a6-4b72-84ef-9d0374b896e4", - "workflow_outputs": [ - { - "label": null, - "output_name": "alignment_output", - "uuid": "9b6ec151-11dc-4477-8740-fc70f5ca3d86" - } - ] - }, - "32": { - "annotation": "", - "content_id": "toolshed.g2.bx.psu.edu/repos/iuc/purge_dups/purge_dups/1.2.5+galaxy4", - "errors": null, - "id": 32, - "input_connections": { - "function_select|input": { - "id": 29, - "output_name": "alignment_output" - }, - "function_select|section_calcuts|transition": { - "id": 17, - "output_name": "integer_param" - }, - "function_select|section_calcuts|upper_depth": { - "id": 18, - "output_name": "integer_param" - } - }, - "inputs": [ - { - "description": "runtime parameter for tool Purge overlaps", - "name": "function_select" - } - ], - "label": null, - "name": "Purge overlaps", - "outputs": [ - { - "name": "pbcstat_cov", - "type": "tabular" - }, - { - "name": "hist", - "type": "png" - }, - { - "name": "calcuts_cutoff", - "type": "tabular" - } - ], - "position": { - "bottom": 2113.289277047822, - "height": 124.693603515625, - "left": 1359.5085375236742, - "right": 1425.5085375236742, - "top": 1988.5956735321968, - "width": 66, - "x": 1359.5085375236742, - "y": 1988.5956735321968 - }, - "post_job_actions": {}, - "tool_id": "toolshed.g2.bx.psu.edu/repos/iuc/purge_dups/purge_dups/1.2.5+galaxy4", - "tool_shed_repository": { - "changeset_revision": "a315c25dc813", - "name": "purge_dups", - "owner": "iuc", - "tool_shed": "toolshed.g2.bx.psu.edu" - }, - "tool_state": "{\"function_select\": {\"functions\": \"pbcstat\", \"__current_case__\": 2, \"input\": {\"__class__\": \"RuntimeValue\"}, \"pbcstat_options\": {\"max_cov\": \"500\", \"min_map_ratio\": 
\"0.0\", \"min_map_qual\": null, \"flank\": \"0\", \"primary_alignments\": \"true\"}, \"section_calcuts\": {\"min_depth\": \"0.1\", \"low_depth\": null, \"transition\": {\"__class__\": \"ConnectedValue\"}, \"upper_depth\": {\"__class__\": \"ConnectedValue\"}, \"ploidy\": \"-d 0\"}, \"section_hist\": {\"ymin\": null, \"ymax\": null, \"xmin\": null, \"xmax\": null, \"title\": \"Read depth histogram plot\"}, \"output_options\": [\"pbcstat_coverage\", \"histogram\", \"calcuts_cutoff\"]}, \"__page__\": null, \"__rerun_remap_job_id__\": null}", - "tool_version": "1.2.5+galaxy4", - "type": "tool", - "uuid": "2b7791ec-5f27-4b8f-ad94-1770c1895c70", - "workflow_outputs": [ - { - "label": null, - "output_name": "pbcstat_cov", - "uuid": "e99b7258-5f0d-4335-89c9-32d1b19f580d" - }, - { - "label": null, - "output_name": "hist", - "uuid": "8fb79644-8f37-4fac-aa18-80f7e4b8b4e0" - }, - { - "label": null, - "output_name": "calcuts_cutoff", - "uuid": "fd0f7672-419c-4ed5-87bd-9f2aed33e0d3" - } - ] - }, - "33": { - "annotation": "", - "content_id": "toolshed.g2.bx.psu.edu/repos/iuc/purge_dups/purge_dups/1.2.5+galaxy4", - "errors": null, - "id": 33, - "input_connections": { - "function_select|coverage": { - "id": 32, - "output_name": "pbcstat_cov" - }, - "function_select|cutoffs": { - "id": 32, - "output_name": "calcuts_cutoff" - }, - "function_select|input": { - "id": 31, - "output_name": "alignment_output" - } - }, - "inputs": [ - { - "description": "runtime parameter for tool Purge overlaps", - "name": "function_select" - }, - { - "description": "runtime parameter for tool Purge overlaps", - "name": "function_select" - }, - { - "description": "runtime parameter for tool Purge overlaps", - "name": "function_select" - } - ], - "label": null, - "name": "Purge overlaps", - "outputs": [ - { - "name": "purge_dups_bed", - "type": "bed" - } - ], - "position": { - "bottom": 2074.586921460701, - "height": 70.9912109375, - "left": 1637.5085079308712, - "right": 1703.5085079308712, - "top": 
2003.5957105232008, - "width": 66, - "x": 1637.5085079308712, - "y": 2003.5957105232008 - }, - "post_job_actions": {}, - "tool_id": "toolshed.g2.bx.psu.edu/repos/iuc/purge_dups/purge_dups/1.2.5+galaxy4", - "tool_shed_repository": { - "changeset_revision": "a315c25dc813", - "name": "purge_dups", - "owner": "iuc", - "tool_shed": "toolshed.g2.bx.psu.edu" - }, - "tool_state": "{\"function_select\": {\"functions\": \"purge_dups\", \"__current_case__\": 0, \"input\": {\"__class__\": \"RuntimeValue\"}, \"coverage\": {\"__class__\": \"RuntimeValue\"}, \"cutoffs\": {\"__class__\": \"RuntimeValue\"}, \"min_bad\": \"0.8\", \"min_align\": \"70\", \"min_match\": \"200\", \"min_chain\": \"500\", \"max_gap\": \"20000\", \"double_chain\": {\"chaining_rounds\": \"one\", \"__current_case__\": 1}, \"min_chain_score\": \"10000\", \"max_extend\": \"15000\", \"log_file\": \"false\"}, \"__page__\": null, \"__rerun_remap_job_id__\": null}", - "tool_version": "1.2.5+galaxy4", - "type": "tool", - "uuid": "9aca1622-3a22-4602-8d2d-8d36becd63f1", - "workflow_outputs": [ - { - "label": null, - "output_name": "purge_dups_bed", - "uuid": "6fe2bec0-96fd-4fd8-9c2c-354e414c4e01" - } - ] - }, - "34": { - "annotation": "", - "content_id": "toolshed.g2.bx.psu.edu/repos/iuc/purge_dups/purge_dups/1.2.5+galaxy4", - "errors": null, - "id": 34, - "input_connections": { - "function_select|bed_input": { - "id": 33, - "output_name": "purge_dups_bed" - }, - "function_select|fasta_input": { - "id": 23, - "output_name": "out_fa" - } - }, - "inputs": [ - { - "description": "runtime parameter for tool Purge overlaps", - "name": "function_select" - }, - { - "description": "runtime parameter for tool Purge overlaps", - "name": "function_select" - } - ], - "label": null, - "name": "Purge overlaps", - "outputs": [ - { - "name": "get_seqs_hap", - "type": "fasta" - }, - { - "name": "get_seqs_purged", - "type": "fasta" - } - ], - "position": { - "bottom": 1399.0602971857243, - "height": 84.44903564453125, - "left": 
1925.4616477272727, - "right": 1991.4616477272727, - "top": 1314.611261541193, - "width": 66, - "x": 1925.4616477272727, - "y": 1314.611261541193 - }, - "post_job_actions": {}, - "tool_id": "toolshed.g2.bx.psu.edu/repos/iuc/purge_dups/purge_dups/1.2.5+galaxy4", - "tool_shed_repository": { - "changeset_revision": "a315c25dc813", - "name": "purge_dups", - "owner": "iuc", - "tool_shed": "toolshed.g2.bx.psu.edu" - }, - "tool_state": "{\"function_select\": {\"functions\": \"get_seqs\", \"__current_case__\": 5, \"fasta_input\": {\"__class__\": \"RuntimeValue\"}, \"bed_input\": {\"__class__\": \"RuntimeValue\"}, \"advanced_options\": {\"coverage\": \"false\", \"haplotigs\": \"false\", \"length\": \"10000\", \"min_ratio\": \"0.05\", \"end_trim\": \"true\", \"split\": \"false\", \"min_gap\": \"10000\"}}, \"__page__\": null, \"__rerun_remap_job_id__\": null}", - "tool_version": "1.2.5+galaxy4", - "type": "tool", - "uuid": "d69ed63b-9065-4c55-94b8-88c0b88e6976", - "workflow_outputs": [ - { - "label": null, - "output_name": "get_seqs_purged", - "uuid": "37bd9afc-80e8-4855-ab52-8c013cf71b18" - }, - { - "label": null, - "output_name": "get_seqs_hap", - "uuid": "0855301e-baa3-4acc-b7c4-d4ed1a92cb81" - } - ] - }, - "35": { - "annotation": "", - "content_id": "toolshed.g2.bx.psu.edu/repos/bgruening/bionano_scaffold/bionano_scaffold/3.7.0+galaxy0", - "errors": null, - "id": 35, - "input_connections": { - "bionano_cmap": { - "id": 3, - "output_name": "output" - }, - "ngs_fasta": { - "id": 34, - "output_name": "get_seqs_purged" - } - }, - "inputs": [ - { - "description": "runtime parameter for tool Bionano Hybrid Scaffold", - "name": "bionano_cmap" - }, - { - "description": "runtime parameter for tool Bionano Hybrid Scaffold", - "name": "conflict_resolution" - }, - { - "description": "runtime parameter for tool Bionano Hybrid Scaffold", - "name": "ngs_fasta" - } - ], - "label": null, - "name": "Bionano Hybrid Scaffold", - "outputs": [ - { - "name": "ngs_contigs_scaffold_trimmed", - 
"type": "fasta" - }, - { - "name": "ngs_contigs_not_scaffolded_trimmed", - "type": "fasta" - }, - { - "name": "report", - "type": "txt" - }, - { - "name": "conflicts", - "type": "txt" - }, - { - "name": "ngs_contigs_scaffold_agp", - "type": "agp" - } - ], - "position": { - "bottom": 948.9603437943891, - "height": 178.39593505859375, - "left": 2223.5240589488635, - "right": 2289.5240589488635, - "top": 770.5644087357954, - "width": 66, - "x": 2223.5240589488635, - "y": 770.5644087357954 - }, - "post_job_actions": {}, - "tool_id": "toolshed.g2.bx.psu.edu/repos/bgruening/bionano_scaffold/bionano_scaffold/3.7.0+galaxy0", - "tool_shed_repository": { - "changeset_revision": "5258e18bbe23", - "name": "bionano_scaffold", - "owner": "bgruening", - "tool_shed": "toolshed.g2.bx.psu.edu" - }, - "tool_state": "{\"bionano_cmap\": {\"__class__\": \"RuntimeValue\"}, \"configuration_options\": {\"configuration\": \"vgp\", \"__current_case__\": 0, \"enzyme\": \"CTTAAG\"}, \"conflict_filter_genome\": \"2\", \"conflict_filter_sequence\": \"2\", \"conflict_resolution\": {\"__class__\": \"RuntimeValue\"}, \"ngs_fasta\": {\"__class__\": \"RuntimeValue\"}, \"trim_cut_sites\": \"true\", \"zip_file\": \"false\", \"__page__\": null, \"__rerun_remap_job_id__\": null}", - "tool_version": "3.7.0+galaxy0", - "type": "tool", - "uuid": "8afd7b0b-f414-4fb6-8026-ba3bdd333f2e", - "workflow_outputs": [ - { - "label": null, - "output_name": "ngs_contigs_not_scaffolded_trimmed", - "uuid": "0ae77c95-1392-4c91-bfb7-c59627dd06de" - }, - { - "label": null, - "output_name": "conflicts", - "uuid": "c3d98160-5c67-4755-9c5e-1d08e3a98808" - }, - { - "label": null, - "output_name": "ngs_contigs_scaffold_trimmed", - "uuid": "3d080b0c-b71b-4f75-954d-bb2022e2efa7" - }, - { - "label": null, - "output_name": "report", - "uuid": "b5bdd44d-fbcd-4643-9cd8-c655371a4bea" - }, - { - "label": null, - "output_name": "ngs_contigs_scaffold_agp", - "uuid": "2c2081c3-6c88-4da4-b1ac-28ee2e9ebe7a" - } - ] - }, - "36": { - 
"annotation": "", - "content_id": "toolshed.g2.bx.psu.edu/repos/iuc/busco/busco/5.2.2+galaxy2", - "errors": null, - "id": 36, - "input_connections": { - "input": { - "id": 34, - "output_name": "get_seqs_purged" - } - }, - "inputs": [], - "label": null, - "name": "Busco", - "outputs": [ - { - "name": "busco_sum", - "type": "txt" - }, - { - "name": "busco_table", - "type": "tabular" - } - ], - "position": { - "bottom": 1417.3025087298768, - "height": 67.6912841796875, - "left": 2223.5240589488635, - "right": 2289.5240589488635, - "top": 1349.6112245501893, - "width": 66, - "x": 2223.5240589488635, - "y": 1349.6112245501893 - }, - "post_job_actions": {}, - "tool_id": "toolshed.g2.bx.psu.edu/repos/iuc/busco/busco/5.2.2+galaxy2", - "tool_shed_repository": { - "changeset_revision": "46ae58b1d792", - "name": "busco", - "owner": "iuc", - "tool_shed": "toolshed.g2.bx.psu.edu" - }, - "tool_state": "{\"__input_ext\": \"input\", \"adv\": {\"evalue\": \"0.001\", \"limit\": \"3\"}, \"busco_mode\": {\"mode\": \"geno\", \"__current_case__\": 0, \"use_augustus\": {\"use_augustus_selector\": \"no\", \"__current_case__\": 0}}, \"chromInfo\": \"/opt/galaxy/tool-data/shared/ucsc/chrom/?.len\", \"input\": {\"__class__\": \"ConnectedValue\"}, \"lineage\": {\"lineage_mode\": \"select_lineage\", \"__current_case__\": 1, \"lineage_dataset\": \"saccharomycetes_odb10\"}, \"outputs\": [\"short_summary\"], \"__page__\": null, \"__rerun_remap_job_id__\": null}", - "tool_version": "5.2.2+galaxy2", - "type": "tool", - "uuid": "5a25e3ee-3b1a-4e53-b065-71430d5b1ac6", - "workflow_outputs": [ - { - "label": null, - "output_name": "busco_sum", - "uuid": "52fa2e25-c5e9-4a81-9f17-56a1cd1528a6" - }, - { - "label": null, - "output_name": "busco_table", - "uuid": "13af5346-e82f-4c6b-b8e1-4a1a38555545" - } - ] - }, - "37": { - "annotation": "", - "content_id": "cat1", - "errors": null, - "id": 37, - "input_connections": { - "input1": { - "id": 34, - "output_name": "get_seqs_hap" - }, - "queries_0|input2": { 
- "id": 22, - "output_name": "out_fa" - } - }, - "inputs": [], - "label": null, - "name": "Concatenate datasets", - "outputs": [ - { - "name": "out_file1", - "type": "input" - } - ], - "position": { - "bottom": 1640.1158095851088, - "height": 47.50457763671875, - "left": 2223.5240589488635, - "right": 2289.5240589488635, - "top": 1592.61123194839, - "width": 66, - "x": 2223.5240589488635, - "y": 1592.61123194839 - }, - "post_job_actions": {}, - "tool_id": "cat1", - "tool_state": "{\"__input_ext\": \"fasta\", \"chromInfo\": \"/opt/galaxy/tool-data/shared/ucsc/chrom/?.len\", \"input1\": {\"__class__\": \"ConnectedValue\"}, \"queries\": [{\"__index__\": 0, \"input2\": {\"__class__\": \"ConnectedValue\"}}], \"__page__\": null, \"__rerun_remap_job_id__\": null}", - "tool_version": "1.0.0", - "type": "tool", - "uuid": "dda3d7b0-3279-46af-93e6-2e2fd2024204", - "workflow_outputs": [ - { - "label": null, - "output_name": "out_file1", - "uuid": "f6175082-e57c-46c5-a990-3e2b249c1e46" - } - ] - }, - "38": { - "annotation": "", - "content_id": "cat1", - "errors": null, - "id": 38, - "input_connections": { - "input1": { - "id": 35, - "output_name": "ngs_contigs_scaffold_trimmed" - }, - "queries_0|input2": { - "id": 35, - "output_name": "ngs_contigs_not_scaffolded_trimmed" - } - }, - "inputs": [], - "label": null, - "name": "Concatenate datasets", - "outputs": [ - { - "name": "out_file1", - "type": "input" - } - ], - "position": { - "bottom": 1103.1158049612334, - "height": 47.504547119140625, - "left": 2511.5241773200755, - "right": 2577.5241773200755, - "top": 1055.6112578420928, - "width": 66, - "x": 2511.5241773200755, - "y": 1055.6112578420928 - }, - "post_job_actions": {}, - "tool_id": "cat1", - "tool_state": "{\"__input_ext\": \"fasta\", \"chromInfo\": \"/opt/galaxy/tool-data/shared/ucsc/chrom/?.len\", \"input1\": {\"__class__\": \"ConnectedValue\"}, \"queries\": [{\"__index__\": 0, \"input2\": {\"__class__\": \"ConnectedValue\"}}], \"__page__\": null, 
\"__rerun_remap_job_id__\": null}", - "tool_version": "1.0.0", - "type": "tool", - "uuid": "13317723-c405-4c9c-81e6-d5d00ad22538", - "workflow_outputs": [ - { - "label": null, - "output_name": "out_file1", - "uuid": "db4e650f-ff13-4914-bc5d-dbd31c1cc4d8" - } - ] - }, - "39": { - "annotation": "", - "content_id": "toolshed.g2.bx.psu.edu/repos/iuc/purge_dups/purge_dups/1.2.5+galaxy4", - "errors": null, - "id": 39, - "input_connections": { - "function_select|input": { - "id": 37, - "output_name": "out_file1" - } - }, - "inputs": [ - { - "description": "runtime parameter for tool Purge overlaps", - "name": "function_select" - } - ], - "label": null, - "name": "Purge overlaps", - "outputs": [ - { - "name": "split_fasta", - "type": "fasta" - } - ], - "position": { - "bottom": 1638.5290323893228, - "height": 50.93341064453125, - "left": 2511.5241773200755, - "right": 2577.5241773200755, - "top": 1587.5956217447915, - "width": 66, - "x": 2511.5241773200755, - "y": 1587.5956217447915 - }, - "post_job_actions": {}, - "tool_id": "toolshed.g2.bx.psu.edu/repos/iuc/purge_dups/purge_dups/1.2.5+galaxy4", - "tool_shed_repository": { - "changeset_revision": "a315c25dc813", - "name": "purge_dups", - "owner": "iuc", - "tool_shed": "toolshed.g2.bx.psu.edu" - }, - "tool_state": "{\"function_select\": {\"functions\": \"split_fa\", \"__current_case__\": 1, \"input\": {\"__class__\": \"RuntimeValue\"}}, \"__page__\": null, \"__rerun_remap_job_id__\": null}", - "tool_version": "1.2.5+galaxy4", - "type": "tool", - "uuid": "af48d338-441a-42e0-af04-8e64891e51b8", - "workflow_outputs": [ - { - "label": null, - "output_name": "split_fasta", - "uuid": "adb872ff-ece3-4e01-a940-366fbabaf36b" - } - ] - }, - "40": { - "annotation": "", - "content_id": "toolshed.g2.bx.psu.edu/repos/iuc/minimap2/minimap2/2.24+galaxy0", - "errors": null, - "id": 40, - "input_connections": { - "fastq_input|fastq_input1": { - "id": 6, - "output_name": "output" - }, - "reference_source|ref_file": { - "id": 37, - 
"output_name": "out_file1" - } - }, - "inputs": [], - "label": null, - "name": "Map with minimap2", - "outputs": [ - { - "name": "alignment_output", - "type": "bam" - } - ], - "position": { - "bottom": 1909.0158173532195, - "height": 74.420166015625, - "left": 2511.5241773200755, - "right": 2577.5241773200755, - "top": 1834.5956513375945, - "width": 66, - "x": 2511.5241773200755, - "y": 1834.5956513375945 - }, - "post_job_actions": {}, - "tool_id": "toolshed.g2.bx.psu.edu/repos/iuc/minimap2/minimap2/2.24+galaxy0", - "tool_shed_repository": { - "changeset_revision": "11a0d50a54e6", - "name": "minimap2", - "owner": "iuc", - "tool_shed": "toolshed.g2.bx.psu.edu" - }, - "tool_state": "{\"__input_ext\": \"input\", \"alignment_options\": {\"splicing\": {\"splice_mode\": \"preset\", \"__current_case__\": 0}, \"A\": null, \"B\": null, \"O\": null, \"O2\": null, \"E\": null, \"E2\": null, \"z\": null, \"z2\": null, \"s\": null, \"no_end_flt\": \"true\"}, \"chromInfo\": \"/opt/galaxy/tool-data/shared/ucsc/chrom/?.len\", \"fastq_input\": {\"fastq_input_selector\": \"single\", \"__current_case__\": 0, \"fastq_input1\": {\"__class__\": \"ConnectedValue\"}, \"analysis_type_selector\": \"asm5\"}, \"indexing_options\": {\"H\": \"false\", \"k\": null, \"w\": null, \"I\": null}, \"io_options\": {\"output_format\": \"paf\", \"Q\": \"false\", \"L\": \"false\", \"K\": null, \"cs\": null, \"c\": \"false\", \"eqx\": \"false\", \"Y\": \"false\"}, \"mapping_options\": {\"N\": null, \"F\": null, \"f\": null, \"kmer_ocurrence_interval\": {\"interval\": \"\", \"__current_case__\": 1}, \"min_occ_floor\": null, \"q_occ_frac\": \"0.01\", \"g\": null, \"r\": null, \"n\": null, \"m\": null, \"max_chain_skip\": null, \"max_chain_iter\": null, \"X\": \"false\", \"p\": null, \"mask_len\": null}, \"reference_source\": {\"reference_source_selector\": \"history\", \"__current_case__\": 1, \"ref_file\": {\"__class__\": \"ConnectedValue\"}}, \"__page__\": null, \"__rerun_remap_job_id__\": null}", - 
"tool_version": "2.24+galaxy0", - "type": "tool", - "uuid": "1d14b2d9-f019-4502-a063-562af76a08ae", - "workflow_outputs": [ - { - "label": null, - "output_name": "alignment_output", - "uuid": "903e5888-3e07-428a-b5fa-24da623530d8" - } - ] - }, - "41": { - "annotation": "", - "content_id": "toolshed.g2.bx.psu.edu/repos/devteam/bwa/bwa_mem/0.7.17.2", - "errors": null, - "id": 41, - "input_connections": { - "fastq_input|fastq_input1": { - "id": 0, - "output_name": "output" - }, - "reference_source|ref_file": { - "id": 38, - "output_name": "out_file1" - } - }, - "inputs": [], - "label": null, - "name": "Map with BWA-MEM", - "outputs": [ - { - "name": "bam_output", - "type": "bam" - } - ], - "position": { - "bottom": 757.7603140166311, - "height": 81.14907836914062, - "left": 2809.508537523674, - "right": 2875.508537523674, - "top": 676.6112356474905, - "width": 66, - "x": 2809.508537523674, - "y": 676.6112356474905 - }, - "post_job_actions": {}, - "tool_id": "toolshed.g2.bx.psu.edu/repos/devteam/bwa/bwa_mem/0.7.17.2", - "tool_shed_repository": { - "changeset_revision": "64f11cf59c6e", - "name": "bwa", - "owner": "devteam", - "tool_shed": "toolshed.g2.bx.psu.edu" - }, - "tool_state": "{\"__input_ext\": \"fasta\", \"analysis_type\": {\"analysis_type_selector\": \"illumina\", \"__current_case__\": 0}, \"chromInfo\": \"/opt/galaxy/tool-data/shared/ucsc/chrom/?.len\", \"fastq_input\": {\"fastq_input_selector\": \"single\", \"__current_case__\": 1, \"fastq_input1\": {\"__class__\": \"ConnectedValue\"}}, \"output_sort\": \"name\", \"reference_source\": {\"reference_source_selector\": \"history\", \"__current_case__\": 1, \"ref_file\": {\"__class__\": \"ConnectedValue\"}, \"index_a\": \"auto\"}, \"rg\": {\"rg_selector\": \"do_not_set\", \"__current_case__\": 3}, \"__page__\": null, \"__rerun_remap_job_id__\": null}", - "tool_version": "0.7.17.2", - "type": "tool", - "uuid": "ecfda239-f364-433a-afa3-3281c11d0f24", - "workflow_outputs": [ - { - "label": null, - "output_name": 
"bam_output", - "uuid": "69b7db14-b48a-40a0-9e89-6f95ba1dd63b" - } - ] - }, - "42": { - "annotation": "", - "content_id": "toolshed.g2.bx.psu.edu/repos/devteam/bwa/bwa_mem/0.7.17.2", - "errors": null, - "id": 42, - "input_connections": { - "fastq_input|fastq_input1": { - "id": 1, - "output_name": "output" - }, - "reference_source|ref_file": { - "id": 38, - "output_name": "out_file1" - } - }, - "inputs": [], - "label": null, - "name": "Map with BWA-MEM", - "outputs": [ - { - "name": "bam_output", - "type": "bam" - } - ], - "position": { - "bottom": 1041.7447509765625, - "height": 81.1490478515625, - "left": 2809.508537523674, - "right": 2875.508537523674, - "top": 960.595703125, - "width": 66, - "x": 2809.508537523674, - "y": 960.595703125 - }, - "post_job_actions": {}, - "tool_id": "toolshed.g2.bx.psu.edu/repos/devteam/bwa/bwa_mem/0.7.17.2", - "tool_shed_repository": { - "changeset_revision": "64f11cf59c6e", - "name": "bwa", - "owner": "devteam", - "tool_shed": "toolshed.g2.bx.psu.edu" - }, - "tool_state": "{\"__input_ext\": \"fasta\", \"analysis_type\": {\"analysis_type_selector\": \"illumina\", \"__current_case__\": 0}, \"chromInfo\": \"/opt/galaxy/tool-data/shared/ucsc/chrom/?.len\", \"fastq_input\": {\"fastq_input_selector\": \"single\", \"__current_case__\": 1, \"fastq_input1\": {\"__class__\": \"ConnectedValue\"}}, \"output_sort\": \"name\", \"reference_source\": {\"reference_source_selector\": \"history\", \"__current_case__\": 1, \"ref_file\": {\"__class__\": \"ConnectedValue\"}, \"index_a\": \"auto\"}, \"rg\": {\"rg_selector\": \"do_not_set\", \"__current_case__\": 3}, \"__page__\": null, \"__rerun_remap_job_id__\": null}", - "tool_version": "0.7.17.2", - "type": "tool", - "uuid": "9dd81280-dc44-44e0-8ed1-a470d6c6dc6f", - "workflow_outputs": [ - { - "label": null, - "output_name": "bam_output", - "uuid": "f7d1e802-955c-4fbb-bb1b-d0b9e03aa24a" - } - ] - }, - "43": { - "annotation": "", - "content_id": 
"toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_find_and_replace/1.1.3", - "errors": null, - "id": 43, - "input_connections": { - "infile": { - "id": 38, - "output_name": "out_file1" - } - }, - "inputs": [], - "label": null, - "name": "Replace", - "outputs": [ - { - "name": "outfile", - "type": "input" - } - ], - "position": { - "bottom": 1209.3579813639321, - "height": 30.746734619140625, - "left": 3087.524229107481, - "right": 3153.5242901426373, - "top": 1178.6112467447915, - "width": 66.00006103515625, - "x": 3087.524229107481, - "y": 1178.6112467447915 - }, - "post_job_actions": {}, - "tool_id": "toolshed.g2.bx.psu.edu/repos/bgruening/text_processing/tp_find_and_replace/1.1.3", - "tool_shed_repository": { - "changeset_revision": "ddf54b12c295", - "name": "text_processing", - "owner": "bgruening", - "tool_shed": "toolshed.g2.bx.psu.edu" - }, - "tool_state": "{\"__input_ext\": \"fasta\", \"caseinsensitive\": \"false\", \"chromInfo\": \"/opt/galaxy/tool-data/shared/ucsc/chrom/?.len\", \"find_pattern\": \":\", \"global\": \"true\", \"infile\": {\"__class__\": \"ConnectedValue\"}, \"is_regex\": \"false\", \"replace_pattern\": \"\", \"searchwhere\": {\"searchwhere_select\": \"line\", \"__current_case__\": 0}, \"skip_first_line\": \"false\", \"wholewords\": \"false\", \"__page__\": null, \"__rerun_remap_job_id__\": null}", - "tool_version": "1.1.3", - "type": "tool", - "uuid": "cf9b2414-81fd-4bbd-b478-b81fe7a71c20", - "workflow_outputs": [ - { - "label": null, - "output_name": "outfile", - "uuid": "90c175ad-a531-470f-ade0-930e908c372b" - } - ] - }, - "44": { - "annotation": "", - "content_id": "toolshed.g2.bx.psu.edu/repos/iuc/quast/quast/5.0.2+galaxy4", - "errors": null, - "id": 44, - "input_connections": { - "assembly|ref|est_ref_size": { - "id": 24, - "output_name": "integer_param" - }, - "in|inputs_0|input": { - "id": 38, - "output_name": "out_file1" - }, - "reads|input_1": { - "id": 6, - "output_name": "output" - } - }, - "inputs": [ - { - 
"description": "runtime parameter for tool Quast", - "name": "reads" - } - ], - "label": null, - "name": "Quast", - "outputs": [ - { - "name": "report_html", - "type": "html" - } - ], - "position": { - "bottom": 2381.06035082268, - "height": 84.4490966796875, - "left": 2809.508537523674, - "right": 2875.508537523674, - "top": 2296.6112541429925, - "width": 66, - "x": 2809.508537523674, - "y": 2296.6112541429925 - }, - "post_job_actions": {}, - "tool_id": "toolshed.g2.bx.psu.edu/repos/iuc/quast/quast/5.0.2+galaxy4", - "tool_shed_repository": { - "changeset_revision": "875d0f36d66f", - "name": "quast", - "owner": "iuc", - "tool_shed": "toolshed.g2.bx.psu.edu" - }, - "tool_state": "{\"advanced\": {\"contig_thresholds\": \"0,1000\", \"strict_NA\": \"false\", \"extensive_mis_size\": \"1000\", \"scaffold_gap_max_size\": \"1000\", \"unaligned_part_size\": \"500\", \"skip_unaligned_mis_contigs\": \"true\", \"fragmented_max_indent\": null}, \"alignments\": {\"use_all_alignments\": \"false\", \"min_alignment\": \"65\", \"min_identity\": \"95.0\", \"ambiguity_usage\": \"one\", \"ambiguity_score\": \"0.99\", \"fragmented\": \"false\", \"upper_bound_assembly\": \"false\", \"upper_bound_min_con\": null}, \"assembly\": {\"type\": \"genome\", \"__current_case__\": 0, \"ref\": {\"use_ref\": \"false\", \"__current_case__\": 1, \"est_ref_size\": {\"__class__\": \"ConnectedValue\"}}, \"orga_type\": \"--eukaryote\"}, \"genes\": {\"gene_finding\": {\"tool\": \"none\", \"__current_case__\": 0}, \"rna_finding\": \"false\", \"conserved_genes_finding\": \"false\"}, \"in\": {\"custom\": \"true\", \"__current_case__\": 0, \"inputs\": [{\"__index__\": 0, \"input\": {\"__class__\": \"RuntimeValue\"}, \"labels\": \"Primary assembly\"}]}, \"large\": \"false\", \"min_contig\": \"500\", \"output_files\": [\"html\"], \"reads\": {\"reads_option\": \"pacbio\", \"__current_case__\": 5, \"input_1\": {\"__class__\": \"RuntimeValue\"}}, \"split_scaffolds\": \"false\", \"__page__\": null, 
\"__rerun_remap_job_id__\": null}", - "tool_version": "5.0.2+galaxy4", - "type": "tool", - "uuid": "a584815c-1223-466b-bfd2-4ee7d2b26023", - "workflow_outputs": [ - { - "label": null, - "output_name": "report_html", - "uuid": "47a2b9b0-2eda-4624-a800-215b1744534b" - } - ] - }, - "45": { - "annotation": "", - "content_id": "toolshed.g2.bx.psu.edu/repos/iuc/minimap2/minimap2/2.24+galaxy0", - "errors": null, - "id": 45, - "input_connections": { - "fastq_input|fastq_input1": { - "id": 39, - "output_name": "split_fasta" - }, - "reference_source|ref_file": { - "id": 39, - "output_name": "split_fasta" - } - }, - "inputs": [], - "label": null, - "name": "Map with minimap2", - "outputs": [ - { - "name": "alignment_output", - "type": "bam" - } - ], - "position": { - "bottom": 1662.0157877604165, - "height": 74.420166015625, - "left": 2809.508537523674, - "right": 2875.508537523674, - "top": 1587.5956217447915, - "width": 66, - "x": 2809.508537523674, - "y": 1587.5956217447915 - }, - "post_job_actions": {}, - "tool_id": "toolshed.g2.bx.psu.edu/repos/iuc/minimap2/minimap2/2.24+galaxy0", - "tool_shed_repository": { - "changeset_revision": "11a0d50a54e6", - "name": "minimap2", - "owner": "iuc", - "tool_shed": "toolshed.g2.bx.psu.edu" - }, - "tool_state": "{\"__input_ext\": \"input\", \"alignment_options\": {\"splicing\": {\"splice_mode\": \"preset\", \"__current_case__\": 0}, \"A\": null, \"B\": null, \"O\": null, \"O2\": null, \"E\": null, \"E2\": null, \"z\": null, \"z2\": null, \"s\": null, \"no_end_flt\": \"true\"}, \"chromInfo\": \"/opt/galaxy/tool-data/shared/ucsc/chrom/?.len\", \"fastq_input\": {\"fastq_input_selector\": \"single\", \"__current_case__\": 0, \"fastq_input1\": {\"__class__\": \"ConnectedValue\"}, \"analysis_type_selector\": \"self-homology\"}, \"indexing_options\": {\"H\": \"false\", \"k\": null, \"w\": null, \"I\": null}, \"io_options\": {\"output_format\": \"paf\", \"Q\": \"false\", \"L\": \"false\", \"K\": null, \"cs\": null, \"c\": \"false\", \"eqx\": 
\"false\", \"Y\": \"false\"}, \"mapping_options\": {\"N\": null, \"F\": null, \"f\": null, \"kmer_ocurrence_interval\": {\"interval\": \"\", \"__current_case__\": 1}, \"min_occ_floor\": null, \"q_occ_frac\": \"0.01\", \"g\": null, \"r\": null, \"n\": null, \"m\": null, \"max_chain_skip\": null, \"max_chain_iter\": null, \"X\": \"false\", \"p\": null, \"mask_len\": null}, \"reference_source\": {\"reference_source_selector\": \"history\", \"__current_case__\": 1, \"ref_file\": {\"__class__\": \"ConnectedValue\"}}, \"__page__\": null, \"__rerun_remap_job_id__\": null}", - "tool_version": "2.24+galaxy0", - "type": "tool", - "uuid": "9d7527a9-c076-4a3d-ba6a-1f7aff53e279", - "workflow_outputs": [ - { - "label": null, - "output_name": "alignment_output", - "uuid": "d1d01a87-598d-401f-9127-6b3858b3c586" - } - ] - }, - "46": { - "annotation": "", - "content_id": "toolshed.g2.bx.psu.edu/repos/iuc/purge_dups/purge_dups/1.2.5+galaxy4", - "errors": null, - "id": 46, - "input_connections": { - "function_select|input": { - "id": 40, - "output_name": "alignment_output" - }, - "function_select|section_calcuts|transition": { - "id": 17, - "output_name": "integer_param" - }, - "function_select|section_calcuts|upper_depth": { - "id": 18, - "output_name": "integer_param" - } - }, - "inputs": [ - { - "description": "runtime parameter for tool Purge overlaps", - "name": "function_select" - } - ], - "label": null, - "name": "Purge overlaps", - "outputs": [ - { - "name": "pbcstat_cov", - "type": "tabular" - }, - { - "name": "hist", - "type": "png" - }, - { - "name": "calcuts_cutoff", - "type": "tabular" - } - ], - "position": { - "bottom": 1976.304931640625, - "height": 124.693603515625, - "left": 2809.508537523674, - "right": 2875.508537523674, - "top": 1851.611328125, - "width": 66, - "x": 2809.508537523674, - "y": 1851.611328125 - }, - "post_job_actions": {}, - "tool_id": "toolshed.g2.bx.psu.edu/repos/iuc/purge_dups/purge_dups/1.2.5+galaxy4", - "tool_shed_repository": { - 
"changeset_revision": "a315c25dc813", - "name": "purge_dups", - "owner": "iuc", - "tool_shed": "toolshed.g2.bx.psu.edu" - }, - "tool_state": "{\"function_select\": {\"functions\": \"pbcstat\", \"__current_case__\": 2, \"input\": {\"__class__\": \"RuntimeValue\"}, \"pbcstat_options\": {\"max_cov\": \"500\", \"min_map_ratio\": \"0.0\", \"min_map_qual\": null, \"flank\": \"0\", \"primary_alignments\": \"true\"}, \"section_calcuts\": {\"min_depth\": \"0.1\", \"low_depth\": null, \"transition\": {\"__class__\": \"ConnectedValue\"}, \"upper_depth\": {\"__class__\": \"ConnectedValue\"}, \"ploidy\": \"-d 0\"}, \"section_hist\": {\"ymin\": null, \"ymax\": null, \"xmin\": null, \"xmax\": null, \"title\": \"Read depth histogram plot\"}, \"output_options\": [\"pbcstat_coverage\", \"histogram\", \"calcuts_cutoff\"]}, \"__page__\": null, \"__rerun_remap_job_id__\": null}", - "tool_version": "1.2.5+galaxy4", - "type": "tool", - "uuid": "ef0a97be-df6f-4929-877d-d5d6ffe5385e", - "workflow_outputs": [ - { - "label": null, - "output_name": "pbcstat_cov", - "uuid": "52fa2145-438b-4352-b857-47f121110285" - }, - { - "label": null, - "output_name": "hist", - "uuid": "80c91e05-93c4-4704-8618-04d6e35a1cf0" - }, - { - "label": null, - "output_name": "calcuts_cutoff", - "uuid": "9a818a87-68b1-464c-9380-f939fd4cb7d1" - } - ] - }, - "47": { - "annotation": "", - "content_id": "toolshed.g2.bx.psu.edu/repos/iuc/bellerophon/bellerophon/1.0+galaxy0", - "errors": null, - "id": 47, - "input_connections": { - "forward": { - "id": 41, - "output_name": "bam_output" - }, - "reverse": { - "id": 42, - "output_name": "bam_output" - } - }, - "inputs": [], - "label": null, - "name": "Filter and merge", - "outputs": [ - { - "name": "outfile", - "type": "bam" - } - ], - "position": { - "bottom": 893.0845845540364, - "height": 47.504547119140625, - "left": 3087.524229107481, - "right": 3153.5242901426373, - "top": 845.5800374348958, - "width": 66.00006103515625, - "x": 3087.524229107481, - "y": 
845.5800374348958 - }, - "post_job_actions": {}, - "tool_id": "toolshed.g2.bx.psu.edu/repos/iuc/bellerophon/bellerophon/1.0+galaxy0", - "tool_shed_repository": { - "changeset_revision": "25ca5d73aedf", - "name": "bellerophon", - "owner": "iuc", - "tool_shed": "toolshed.g2.bx.psu.edu" - }, - "tool_state": "{\"__input_ext\": \"input\", \"chromInfo\": \"/opt/galaxy/tool-data/shared/ucsc/chrom/?.len\", \"forward\": {\"__class__\": \"ConnectedValue\"}, \"quality\": \"20\", \"reverse\": {\"__class__\": \"ConnectedValue\"}, \"__page__\": null, \"__rerun_remap_job_id__\": null}", - "tool_version": "1.0+galaxy0", - "type": "tool", - "uuid": "e330090d-e993-4b51-8d83-38cb7c7d6d82", - "workflow_outputs": [ - { - "label": null, - "output_name": "outfile", - "uuid": "9e94432d-64b1-43fa-bc0b-29a9701fa132" - } - ] - }, - "48": { - "annotation": "", - "content_id": "toolshed.g2.bx.psu.edu/repos/iuc/purge_dups/purge_dups/1.2.5+galaxy4", - "errors": null, - "id": 48, - "input_connections": { - "function_select|coverage": { - "id": 46, - "output_name": "pbcstat_cov" - }, - "function_select|cutoffs": { - "id": 46, - "output_name": "calcuts_cutoff" - }, - "function_select|input": { - "id": 45, - "output_name": "alignment_output" - } - }, - "inputs": [ - { - "description": "runtime parameter for tool Purge overlaps", - "name": "function_select" - }, - { - "description": "runtime parameter for tool Purge overlaps", - "name": "function_select" - }, - { - "description": "runtime parameter for tool Purge overlaps", - "name": "function_select" - } - ], - "label": null, - "name": "Purge overlaps", - "outputs": [ - { - "name": "purge_dups_bed", - "type": "bed" - } - ], - "position": { - "bottom": 1910.5712761156485, - "height": 70.99127197265625, - "left": 3084.4772801254735, - "right": 3150.4772190903172, - "top": 1839.5800041429923, - "width": 65.99993896484375, - "x": 3084.4772801254735, - "y": 1839.5800041429923 - }, - "post_job_actions": {}, - "tool_id": 
"toolshed.g2.bx.psu.edu/repos/iuc/purge_dups/purge_dups/1.2.5+galaxy4", - "tool_shed_repository": { - "changeset_revision": "a315c25dc813", - "name": "purge_dups", - "owner": "iuc", - "tool_shed": "toolshed.g2.bx.psu.edu" - }, - "tool_state": "{\"function_select\": {\"functions\": \"purge_dups\", \"__current_case__\": 0, \"input\": {\"__class__\": \"RuntimeValue\"}, \"coverage\": {\"__class__\": \"RuntimeValue\"}, \"cutoffs\": {\"__class__\": \"RuntimeValue\"}, \"min_bad\": \"0.8\", \"min_align\": \"70\", \"min_match\": \"200\", \"min_chain\": \"500\", \"max_gap\": \"20000\", \"double_chain\": {\"chaining_rounds\": \"one\", \"__current_case__\": 1}, \"min_chain_score\": \"10000\", \"max_extend\": \"15000\", \"log_file\": \"false\"}, \"__page__\": null, \"__rerun_remap_job_id__\": null}", - "tool_version": "1.2.5+galaxy4", - "type": "tool", - "uuid": "aab90c1f-06d8-4146-bf54-13cb5f7abb6c", - "workflow_outputs": [ - { - "label": null, - "output_name": "purge_dups_bed", - "uuid": "d50c11ea-dcb1-4500-8f41-67baa1258d5d" - } - ] - }, - "49": { - "annotation": "", - "content_id": "toolshed.g2.bx.psu.edu/repos/iuc/pretext_map/pretext_map/0.1.9+galaxy0", - "errors": null, - "id": 49, - "input_connections": { - "input": { - "id": 47, - "output_name": "outfile" - } - }, - "inputs": [], - "label": null, - "name": "PretextMap", - "outputs": [ - { - "name": "pretext_map_out", - "type": "pretext" - } - ], - "position": { - "bottom": 722.8001773718631, - "height": 44.204559326171875, - "left": 3365.524384469697, - "right": 3431.524384469697, - "top": 678.5956180456913, - "width": 66, - "x": 3365.524384469697, - "y": 678.5956180456913 - }, - "post_job_actions": {}, - "tool_id": "toolshed.g2.bx.psu.edu/repos/iuc/pretext_map/pretext_map/0.1.9+galaxy0", - "tool_shed_repository": { - "changeset_revision": "dfb8a4497339", - "name": "pretext_map", - "owner": "iuc", - "tool_shed": "toolshed.g2.bx.psu.edu" - }, - "tool_state": "{\"__input_ext\": \"input\", \"chromInfo\": 
\"/opt/galaxy/tool-data/shared/ucsc/chrom/?.len\", \"filter\": {\"filter_type\": \"\", \"__current_case__\": 0}, \"input\": {\"__class__\": \"ConnectedValue\"}, \"map_qual\": \"10\", \"sorting\": {\"sortby\": \"nosort\", \"__current_case__\": 0}, \"__page__\": null, \"__rerun_remap_job_id__\": null}", - "tool_version": "0.1.9+galaxy0", - "type": "tool", - "uuid": "391244e8-1cea-4bc0-9710-e7a23e88d3f3", - "workflow_outputs": [ - { - "label": null, - "output_name": "pretext_map_out", - "uuid": "2a24aa1c-6b47-4eab-ab6a-314e1ee9a899" - } - ] - }, - "50": { - "annotation": "", - "content_id": "toolshed.g2.bx.psu.edu/repos/iuc/bedtools/bedtools_bamtobed/2.30.0+galaxy2", - "errors": null, - "id": 50, - "input_connections": { - "input": { - "id": 47, - "output_name": "outfile" - } - }, - "inputs": [], - "label": null, - "name": "bedtools BAM to BED", - "outputs": [ - { - "name": "output", - "type": "bed" - } - ], - "position": { - "bottom": 894.800176447088, - "height": 44.20452880859375, - "left": 3365.524384469697, - "right": 3431.524384469697, - "top": 850.5956476384943, - "width": 66, - "x": 3365.524384469697, - "y": 850.5956476384943 - }, - "post_job_actions": {}, - "tool_id": "toolshed.g2.bx.psu.edu/repos/iuc/bedtools/bedtools_bamtobed/2.30.0+galaxy2", - "tool_shed_repository": { - "changeset_revision": "7ab85ac5f64b", - "name": "bedtools", - "owner": "iuc", - "tool_shed": "toolshed.g2.bx.psu.edu" - }, - "tool_state": "{\"__input_ext\": \"input\", \"chromInfo\": \"/opt/galaxy/tool-data/shared/ucsc/chrom/?.len\", \"ed_score\": \"false\", \"input\": {\"__class__\": \"ConnectedValue\"}, \"option\": \"-bed12\", \"split\": \"false\", \"tag\": \"\", \"__page__\": null, \"__rerun_remap_job_id__\": null}", - "tool_version": "2.30.0+galaxy2", - "type": "tool", - "uuid": "84331994-3df0-4e1f-8ab4-f2cdcb798351", - "workflow_outputs": [ - { - "label": null, - "output_name": "output", - "uuid": "06d76a1c-47d1-41c8-b9c1-6637bcd8278a" - } - ] - }, - "51": { - "annotation": "", - 
"content_id": "toolshed.g2.bx.psu.edu/repos/iuc/purge_dups/purge_dups/1.2.5+galaxy4", - "errors": null, - "id": 51, - "input_connections": { - "function_select|bed_input": { - "id": 48, - "output_name": "purge_dups_bed" - }, - "function_select|fasta_input": { - "id": 37, - "output_name": "out_file1" - } - }, - "inputs": [], - "label": null, - "name": "Purge overlaps", - "outputs": [ - { - "name": "get_seqs_hap", - "type": "fasta" - }, - { - "name": "get_seqs_purged", - "type": "fasta" - } - ], - "position": { - "bottom": 1592.0603656190813, - "height": 84.4490966796875, - "left": 3365.524384469697, - "right": 3431.524384469697, - "top": 1507.6112689393938, - "width": 66, - "x": 3365.524384469697, - "y": 1507.6112689393938 - }, - "post_job_actions": {}, - "tool_id": "toolshed.g2.bx.psu.edu/repos/iuc/purge_dups/purge_dups/1.2.5+galaxy4", - "tool_shed_repository": { - "changeset_revision": "a315c25dc813", - "name": "purge_dups", - "owner": "iuc", - "tool_shed": "toolshed.g2.bx.psu.edu" - }, - "tool_state": "{\"__input_ext\": \"input\", \"chromInfo\": \"/opt/galaxy/tool-data/shared/ucsc/chrom/?.len\", \"function_select\": {\"functions\": \"get_seqs\", \"__current_case__\": 5, \"fasta_input\": {\"__class__\": \"ConnectedValue\"}, \"bed_input\": {\"__class__\": \"ConnectedValue\"}, \"advanced_options\": {\"coverage\": \"false\", \"haplotigs\": \"false\", \"length\": \"10000\", \"min_ratio\": \"0.05\", \"end_trim\": \"true\", \"split\": \"false\", \"min_gap\": \"10000\"}}, \"__page__\": null, \"__rerun_remap_job_id__\": null}", - "tool_version": "1.2.5+galaxy4", - "type": "tool", - "uuid": "3b6f995a-5b7a-4669-9bad-0bc980555b50", - "workflow_outputs": [ - { - "label": null, - "output_name": "get_seqs_hap", - "uuid": "4575d989-6648-4968-a0f0-b172e1995fa2" - }, - { - "label": null, - "output_name": "get_seqs_purged", - "uuid": "58175e84-7757-40fa-bbb1-96c88c5728e9" - } - ] - }, - "52": { - "annotation": "", - "content_id": 
"toolshed.g2.bx.psu.edu/repos/iuc/pretext_snapshot/pretext_snapshot/0.0.3+galaxy1", - "errors": null, - "id": 52, - "input_connections": { - "input": { - "id": 49, - "output_name": "pretext_map_out" - } - }, - "inputs": [], - "label": null, - "name": "Pretext Snapshot", - "outputs": [ - { - "name": "pretext_snap_out", - "type": "input" - } - ], - "position": { - "bottom": 742.8157940488873, - "height": 44.20452880859375, - "left": 3643.524354876894, - "right": 3709.524354876894, - "top": 698.6112652402935, - "width": 66, - "x": 3643.524354876894, - "y": 698.6112652402935 - }, - "post_job_actions": {}, - "tool_id": "toolshed.g2.bx.psu.edu/repos/iuc/pretext_snapshot/pretext_snapshot/0.0.3+galaxy1", - "tool_shed_repository": { - "changeset_revision": "44c66e8d21e6", - "name": "pretext_snapshot", - "owner": "iuc", - "tool_shed": "toolshed.g2.bx.psu.edu" - }, - "tool_state": "{\"__input_ext\": \"input\", \"chromInfo\": \"/opt/galaxy/tool-data/shared/ucsc/chrom/?.len\", \"colormap\": \"5\", \"formats\": {\"outformat\": \"png\", \"__current_case__\": 0}, \"grid\": {\"showGrid\": \"true\", \"__current_case__\": 0, \"gridsize\": \"1\", \"gridcolor\": \"black\"}, \"input\": {\"__class__\": \"ConnectedValue\"}, \"mintexels\": \"64\", \"resolution\": \"1000\", \"sequencenames\": \"false\", \"sequences\": \"=full, =all\", \"__page__\": null, \"__rerun_remap_job_id__\": null}", - "tool_version": "0.0.3+galaxy1", - "type": "tool", - "uuid": "0748eba5-6679-4049-93a8-387f97d0f062", - "workflow_outputs": [ - { - "label": null, - "output_name": "pretext_snap_out", - "uuid": "0b1d8b2a-dbf4-4ee2-b118-cb4afd53ddd5" - } - ] - }, - "53": { - "annotation": "", - "content_id": "sort1", - "errors": null, - "id": 53, - "input_connections": { - "input": { - "id": 50, - "output_name": "output" - } - }, - "inputs": [], - "label": null, - "name": "Sort", - "outputs": [ - { - "name": "out_file1", - "type": "input" - } - ], - "position": { - "bottom": 901.3267720540364, - "height": 
30.746734619140625, - "left": 3643.524354876894, - "right": 3709.524354876894, - "top": 870.5800374348958, - "width": 66, - "x": 3643.524354876894, - "y": 870.5800374348958 - }, - "post_job_actions": {}, - "tool_id": "sort1", - "tool_state": "{\"__input_ext\": \"bed\", \"chromInfo\": \"/opt/galaxy/tool-data/shared/ucsc/chrom/?.len\", \"column\": \"4\", \"column_set\": [], \"header_lines\": \"0\", \"input\": {\"__class__\": \"ConnectedValue\"}, \"order\": \"ASC\", \"style\": \"alpha\", \"__page__\": null, \"__rerun_remap_job_id__\": null}", - "tool_version": "1.2.0", - "type": "tool", - "uuid": "699917c5-4c0e-429a-9cb9-e7c3211a8489", - "workflow_outputs": [ - { - "label": null, - "output_name": "out_file1", - "uuid": "237a5c4a-d28c-4370-ab5c-1eca7a08ea21" - } - ] - }, - "54": { - "annotation": "", - "content_id": "toolshed.g2.bx.psu.edu/repos/iuc/busco/busco/5.2.2+galaxy2", - "errors": null, - "id": 54, - "input_connections": { - "input": { - "id": 51, - "output_name": "get_seqs_purged" - } - }, - "inputs": [], - "label": null, - "name": "Busco", - "outputs": [ - { - "name": "busco_sum", - "type": "txt" - }, - { - "name": "busco_table", - "type": "tabular" - } - ], - "position": { - "bottom": 1069.3025512695312, - "height": 67.69122314453125, - "left": 3643.524354876894, - "right": 3709.524354876894, - "top": 1001.611328125, - "width": 66, - "x": 3643.524354876894, - "y": 1001.611328125 - }, - "post_job_actions": {}, - "tool_id": "toolshed.g2.bx.psu.edu/repos/iuc/busco/busco/5.2.2+galaxy2", - "tool_shed_repository": { - "changeset_revision": "46ae58b1d792", - "name": "busco", - "owner": "iuc", - "tool_shed": "toolshed.g2.bx.psu.edu" - }, - "tool_state": "{\"__input_ext\": \"input\", \"adv\": {\"evalue\": \"0.001\", \"limit\": \"3\"}, \"busco_mode\": {\"mode\": \"geno\", \"__current_case__\": 0, \"use_augustus\": {\"use_augustus_selector\": \"no\", \"__current_case__\": 0}}, \"chromInfo\": \"/opt/galaxy/tool-data/shared/ucsc/chrom/?.len\", \"input\": {\"__class__\": 
\"ConnectedValue\"}, \"lineage\": {\"lineage_mode\": \"select_lineage\", \"__current_case__\": 1, \"lineage_dataset\": \"saccharomycetes_odb10\"}, \"outputs\": [\"short_summary\"], \"__page__\": null, \"__rerun_remap_job_id__\": null}", - "tool_version": "5.2.2+galaxy2", - "type": "tool", - "uuid": "abe77b25-0bed-4ac9-a243-ec6fd0637446", - "workflow_outputs": [ - { - "label": null, - "output_name": "busco_table", - "uuid": "77118f8a-dfc0-4f4f-9767-3047ed5a846a" - }, - { - "label": null, - "output_name": "busco_sum", - "uuid": "fdd21f9e-463f-4f44-9fe7-894edb33e106" - } - ] - }, - "55": { - "annotation": "", - "content_id": "toolshed.g2.bx.psu.edu/repos/iuc/quast/quast/5.0.2+galaxy4", - "errors": null, - "id": 55, - "input_connections": { - "assembly|ref|est_ref_size": { - "id": 24, - "output_name": "integer_param" - }, - "in|inputs_0|input": { - "id": 34, - "output_name": "get_seqs_purged" - }, - "in|inputs_1|input": { - "id": 51, - "output_name": "get_seqs_purged" - }, - "reads|input_1": { - "id": 6, - "output_name": "output" - } - }, - "inputs": [ - { - "description": "runtime parameter for tool Quast", - "name": "reads" - } - ], - "label": null, - "name": "Quast", - "outputs": [ - { - "name": "report_html", - "type": "html" - } - ], - "position": { - "bottom": 2169.7868744821258, - "height": 101.20684814453125, - "left": 3643.524354876894, - "right": 3709.524354876894, - "top": 2068.5800263375945, - "width": 66, - "x": 3643.524354876894, - "y": 2068.5800263375945 - }, - "post_job_actions": {}, - "tool_id": "toolshed.g2.bx.psu.edu/repos/iuc/quast/quast/5.0.2+galaxy4", - "tool_shed_repository": { - "changeset_revision": "875d0f36d66f", - "name": "quast", - "owner": "iuc", - "tool_shed": "toolshed.g2.bx.psu.edu" - }, - "tool_state": "{\"advanced\": {\"contig_thresholds\": \"0,1000\", \"strict_NA\": \"false\", \"extensive_mis_size\": \"1000\", \"scaffold_gap_max_size\": \"1000\", \"unaligned_part_size\": \"500\", \"skip_unaligned_mis_contigs\": \"true\", 
\"fragmented_max_indent\": null}, \"alignments\": {\"use_all_alignments\": \"false\", \"min_alignment\": \"65\", \"min_identity\": \"95.0\", \"ambiguity_usage\": \"one\", \"ambiguity_score\": \"0.99\", \"fragmented\": \"false\", \"upper_bound_assembly\": \"false\", \"upper_bound_min_con\": null}, \"assembly\": {\"type\": \"genome\", \"__current_case__\": 0, \"ref\": {\"use_ref\": \"false\", \"__current_case__\": 1, \"est_ref_size\": {\"__class__\": \"ConnectedValue\"}}, \"orga_type\": \"--eukaryote\"}, \"genes\": {\"gene_finding\": {\"tool\": \"none\", \"__current_case__\": 0}, \"rna_finding\": \"false\", \"conserved_genes_finding\": \"false\"}, \"in\": {\"custom\": \"true\", \"__current_case__\": 0, \"inputs\": [{\"__index__\": 0, \"input\": {\"__class__\": \"RuntimeValue\"}, \"labels\": \"Primary assembly\"}, {\"__index__\": 1, \"input\": {\"__class__\": \"RuntimeValue\"}, \"labels\": \"Alternate assembly\"}]}, \"large\": \"false\", \"min_contig\": \"500\", \"output_files\": [\"html\"], \"reads\": {\"reads_option\": \"pacbio\", \"__current_case__\": 5, \"input_1\": {\"__class__\": \"RuntimeValue\"}}, \"split_scaffolds\": \"false\", \"__page__\": null, \"__rerun_remap_job_id__\": null}", - "tool_version": "5.0.2+galaxy4", - "type": "tool", - "uuid": "da02cb47-cd47-463e-81c2-061ecaf19b20", - "workflow_outputs": [ - { - "label": null, - "output_name": "report_html", - "uuid": "b5a3f213-36f5-4f1e-aa77-ea4a2c4cb07e" - } - ] - }, - "56": { - "annotation": "", - "content_id": "toolshed.g2.bx.psu.edu/repos/iuc/salsa/salsa/2.3+galaxy2", - "errors": null, - "id": 56, - "input_connections": { - "bed_file": { - "id": 53, - "output_name": "out_file1" - }, - "fasta_in": { - "id": 43, - "output_name": "outfile" - } - }, - "inputs": [ - { - "description": "runtime parameter for tool SALSA", - "name": "bed_file" - }, - { - "description": "runtime parameter for tool SALSA", - "name": "fasta_in" - }, - { - "description": "runtime parameter for tool SALSA", - "name": "gfa_file" - } 
- ], - "label": null, - "name": "SALSA", - "outputs": [ - { - "name": "scaffolds_fasta", - "type": "fasta" - }, - { - "name": "scaffolds_agp", - "type": "tabular" - } - ], - "position": { - "bottom": 957.3446988192471, - "height": 87.74908447265625, - "left": 3921.5243252840905, - "right": 3987.5243252840905, - "top": 869.5956143465909, - "width": 66, - "x": 3921.5243252840905, - "y": 869.5956143465909 - }, - "post_job_actions": {}, - "tool_id": "toolshed.g2.bx.psu.edu/repos/iuc/salsa/salsa/2.3+galaxy2", - "tool_shed_repository": { - "changeset_revision": "f77f7a7f3b83", - "name": "salsa", - "owner": "iuc", - "tool_shed": "toolshed.g2.bx.psu.edu" - }, - "tool_state": "{\"bed_file\": {\"__class__\": \"RuntimeValue\"}, \"cutoff\": null, \"enzyme_conditional\": {\"enzyme_options\": \"specific\", \"__current_case__\": 1, \"manual_enzyme\": \"CTTAAG\"}, \"fasta_in\": {\"__class__\": \"RuntimeValue\"}, \"gfa_file\": {\"__class__\": \"RuntimeValue\"}, \"iter\": null, \"__page__\": null, \"__rerun_remap_job_id__\": null}", - "tool_version": "2.3+galaxy2", - "type": "tool", - "uuid": "1b9f99a2-f931-4782-8bc1-44ba522a33a0", - "workflow_outputs": [ - { - "label": null, - "output_name": "scaffolds_fasta", - "uuid": "81f8bb38-a485-4f59-be03-089601221e7d" - }, - { - "label": null, - "output_name": "scaffolds_agp", - "uuid": "553b0bf4-90c9-40b8-8e76-ed5b49c39f84" - } - ] - }, - "57": { - "annotation": "", - "content_id": "toolshed.g2.bx.psu.edu/repos/devteam/bwa/bwa_mem/0.7.17.2", - "errors": null, - "id": 57, - "input_connections": { - "fastq_input|fastq_input1": { - "id": 0, - "output_name": "output" - }, - "reference_source|ref_file": { - "id": 56, - "output_name": "scaffolds_fasta" - } - }, - "inputs": [], - "label": null, - "name": "Map with BWA-MEM", - "outputs": [ - { - "name": "bam_output", - "type": "bam" - } - ], - "position": { - "bottom": 632.7447162974964, - "height": 81.14906311035156, - "left": 4209.524073745265, - "right": 4275.524073745265, - "top": 
551.5956531871449, - "width": 66, - "x": 4209.524073745265, - "y": 551.5956531871449 - }, - "post_job_actions": {}, - "tool_id": "toolshed.g2.bx.psu.edu/repos/devteam/bwa/bwa_mem/0.7.17.2", - "tool_shed_repository": { - "changeset_revision": "64f11cf59c6e", - "name": "bwa", - "owner": "devteam", - "tool_shed": "toolshed.g2.bx.psu.edu" - }, - "tool_state": "{\"__input_ext\": \"fasta\", \"analysis_type\": {\"analysis_type_selector\": \"illumina\", \"__current_case__\": 0}, \"chromInfo\": \"/opt/galaxy/tool-data/shared/ucsc/chrom/?.len\", \"fastq_input\": {\"fastq_input_selector\": \"single\", \"__current_case__\": 1, \"fastq_input1\": {\"__class__\": \"ConnectedValue\"}}, \"output_sort\": \"name\", \"reference_source\": {\"reference_source_selector\": \"history\", \"__current_case__\": 1, \"ref_file\": {\"__class__\": \"ConnectedValue\"}, \"index_a\": \"auto\"}, \"rg\": {\"rg_selector\": \"do_not_set\", \"__current_case__\": 3}, \"__page__\": null, \"__rerun_remap_job_id__\": null}", - "tool_version": "0.7.17.2", - "type": "tool", - "uuid": "f5bb38bd-a0aa-4439-a4c6-801f47366610", - "workflow_outputs": [ - { - "label": null, - "output_name": "bam_output", - "uuid": "e593f0ce-7349-4008-b896-a4c68806f50c" - } - ] - }, - "58": { - "annotation": "", - "content_id": "toolshed.g2.bx.psu.edu/repos/devteam/bwa/bwa_mem/0.7.17.2", - "errors": null, - "id": 58, - "input_connections": { - "fastq_input|fastq_input1": { - "id": 1, - "output_name": "output" - }, - "reference_source|ref_file": { - "id": 56, - "output_name": "scaffolds_fasta" - } - }, - "inputs": [], - "label": null, - "name": "Map with BWA-MEM", - "outputs": [ - { - "name": "bam_output", - "type": "bam" - } - ], - "position": { - "bottom": 916.7447509765625, - "height": 81.1490478515625, - "left": 4209.524073745265, - "right": 4275.524073745265, - "top": 835.595703125, - "width": 66, - "x": 4209.524073745265, - "y": 835.595703125 - }, - "post_job_actions": {}, - "tool_id": 
"toolshed.g2.bx.psu.edu/repos/devteam/bwa/bwa_mem/0.7.17.2", - "tool_shed_repository": { - "changeset_revision": "64f11cf59c6e", - "name": "bwa", - "owner": "devteam", - "tool_shed": "toolshed.g2.bx.psu.edu" - }, - "tool_state": "{\"__input_ext\": \"fasta\", \"analysis_type\": {\"analysis_type_selector\": \"illumina\", \"__current_case__\": 0}, \"chromInfo\": \"/opt/galaxy/tool-data/shared/ucsc/chrom/?.len\", \"fastq_input\": {\"fastq_input_selector\": \"single\", \"__current_case__\": 1, \"fastq_input1\": {\"__class__\": \"ConnectedValue\"}}, \"output_sort\": \"name\", \"reference_source\": {\"reference_source_selector\": \"history\", \"__current_case__\": 1, \"ref_file\": {\"__class__\": \"ConnectedValue\"}, \"index_a\": \"auto\"}, \"rg\": {\"rg_selector\": \"do_not_set\", \"__current_case__\": 3}, \"__page__\": null, \"__rerun_remap_job_id__\": null}", - "tool_version": "0.7.17.2", - "type": "tool", - "uuid": "daacf6aa-0784-468f-8b6f-59f67ebe590a", - "workflow_outputs": [ - { - "label": null, - "output_name": "bam_output", - "uuid": "de6cbec3-7873-4027-aeec-7754b512665b" - } - ] - }, - "59": { - "annotation": "", - "content_id": "toolshed.g2.bx.psu.edu/repos/iuc/busco/busco/5.2.2+galaxy2", - "errors": null, - "id": 59, - "input_connections": { - "input": { - "id": 56, - "output_name": "scaffolds_fasta" - } - }, - "inputs": [], - "label": null, - "name": "Busco", - "outputs": [ - { - "name": "busco_sum", - "type": "txt" - }, - { - "name": "busco_table", - "type": "tabular" - } - ], - "position": { - "bottom": 1187.2868680087001, - "height": 67.69125366210938, - "left": 4209.524073745265, - "right": 4275.524073745265, - "top": 1119.5956143465908, - "width": 66, - "x": 4209.524073745265, - "y": 1119.5956143465908 - }, - "post_job_actions": {}, - "tool_id": "toolshed.g2.bx.psu.edu/repos/iuc/busco/busco/5.2.2+galaxy2", - "tool_shed_repository": { - "changeset_revision": "46ae58b1d792", - "name": "busco", - "owner": "iuc", - "tool_shed": "toolshed.g2.bx.psu.edu" - }, 
- "tool_state": "{\"__input_ext\": \"input\", \"adv\": {\"evalue\": \"0.001\", \"limit\": \"3\"}, \"busco_mode\": {\"mode\": \"geno\", \"__current_case__\": 0, \"use_augustus\": {\"use_augustus_selector\": \"no\", \"__current_case__\": 0}}, \"chromInfo\": \"/opt/galaxy/tool-data/shared/ucsc/chrom/?.len\", \"input\": {\"__class__\": \"ConnectedValue\"}, \"lineage\": {\"lineage_mode\": \"select_lineage\", \"__current_case__\": 1, \"lineage_dataset\": \"saccharomycetes_odb10\"}, \"outputs\": [\"short_summary\"], \"__page__\": null, \"__rerun_remap_job_id__\": null}", - "tool_version": "5.2.2+galaxy2", - "type": "tool", - "uuid": "87b52b5f-bac9-4a63-aed0-0535a1e4171d", - "workflow_outputs": [ - { - "label": null, - "output_name": "busco_sum", - "uuid": "32dca78e-789b-4654-9869-fdc1679d1802" - }, - { - "label": null, - "output_name": "busco_table", - "uuid": "d42b5539-a486-485b-a84b-254cd0dd7910" - } - ] - }, - "60": { - "annotation": "", - "content_id": "toolshed.g2.bx.psu.edu/repos/iuc/quast/quast/5.0.2+galaxy4", - "errors": null, - "id": 60, - "input_connections": { - "assembly|ref|est_ref_size": { - "id": 24, - "output_name": "integer_param" - }, - "in|inputs_0|input": { - "id": 56, - "output_name": "scaffolds_fasta" - }, - "reads|input_1": { - "id": 6, - "output_name": "output" - } - }, - "inputs": [ - { - "description": "runtime parameter for tool Quast", - "name": "reads" - } - ], - "label": null, - "name": "Quast", - "outputs": [ - { - "name": "report_html", - "type": "html" - } - ], - "position": { - "bottom": 2512.0602916370735, - "height": 84.4490966796875, - "left": 4209.524073745265, - "right": 4275.524073745265, - "top": 2427.611194957386, - "width": 66, - "x": 4209.524073745265, - "y": 2427.611194957386 - }, - "post_job_actions": {}, - "tool_id": "toolshed.g2.bx.psu.edu/repos/iuc/quast/quast/5.0.2+galaxy4", - "tool_shed_repository": { - "changeset_revision": "875d0f36d66f", - "name": "quast", - "owner": "iuc", - "tool_shed": "toolshed.g2.bx.psu.edu" - }, 
- "tool_state": "{\"advanced\": {\"contig_thresholds\": \"0,1000\", \"strict_NA\": \"false\", \"extensive_mis_size\": \"1000\", \"scaffold_gap_max_size\": \"1000\", \"unaligned_part_size\": \"500\", \"skip_unaligned_mis_contigs\": \"true\", \"fragmented_max_indent\": null}, \"alignments\": {\"use_all_alignments\": \"false\", \"min_alignment\": \"65\", \"min_identity\": \"95.0\", \"ambiguity_usage\": \"one\", \"ambiguity_score\": \"0.99\", \"fragmented\": \"false\", \"upper_bound_assembly\": \"false\", \"upper_bound_min_con\": null}, \"assembly\": {\"type\": \"genome\", \"__current_case__\": 0, \"ref\": {\"use_ref\": \"false\", \"__current_case__\": 1, \"est_ref_size\": {\"__class__\": \"ConnectedValue\"}}, \"orga_type\": \"--eukaryote\"}, \"genes\": {\"gene_finding\": {\"tool\": \"none\", \"__current_case__\": 0}, \"rna_finding\": \"false\", \"conserved_genes_finding\": \"false\"}, \"in\": {\"custom\": \"true\", \"__current_case__\": 0, \"inputs\": [{\"__index__\": 0, \"input\": {\"__class__\": \"RuntimeValue\"}, \"labels\": \"Primary assembly\"}]}, \"large\": \"false\", \"min_contig\": \"500\", \"output_files\": [\"html\"], \"reads\": {\"reads_option\": \"pacbio\", \"__current_case__\": 5, \"input_1\": {\"__class__\": \"RuntimeValue\"}}, \"split_scaffolds\": \"false\", \"__page__\": null, \"__rerun_remap_job_id__\": null}", - "tool_version": "5.0.2+galaxy4", - "type": "tool", - "uuid": "becc24fe-7b91-49d4-928a-2509b9e0f523", - "workflow_outputs": [ - { - "label": null, - "output_name": "report_html", - "uuid": "646cde64-76b7-40d5-ad5d-fe1fb856306a" - } - ] - }, - "61": { - "annotation": "", - "content_id": "toolshed.g2.bx.psu.edu/repos/iuc/bellerophon/bellerophon/1.0+galaxy0", - "errors": null, - "id": 61, - "input_connections": { - "forward": { - "id": 57, - "output_name": "bam_output" - }, - "reverse": { - "id": 58, - "output_name": "bam_output" - } - }, - "inputs": [], - "label": null, - "name": "Filter and merge", - "outputs": [ - { - "name": "outfile", - 
"type": "bam" - } - ], - "position": { - "bottom": 677.0845577355586, - "height": 47.5045166015625, - "left": 4487.524044152462, - "right": 4553.524044152462, - "top": 629.5800411339961, - "width": 66, - "x": 4487.524044152462, - "y": 629.5800411339961 - }, - "post_job_actions": {}, - "tool_id": "toolshed.g2.bx.psu.edu/repos/iuc/bellerophon/bellerophon/1.0+galaxy0", - "tool_shed_repository": { - "changeset_revision": "25ca5d73aedf", - "name": "bellerophon", - "owner": "iuc", - "tool_shed": "toolshed.g2.bx.psu.edu" - }, - "tool_state": "{\"__input_ext\": \"input\", \"chromInfo\": \"/opt/galaxy/tool-data/shared/ucsc/chrom/?.len\", \"forward\": {\"__class__\": \"ConnectedValue\"}, \"quality\": \"20\", \"reverse\": {\"__class__\": \"ConnectedValue\"}, \"__page__\": null, \"__rerun_remap_job_id__\": null}", - "tool_version": "1.0+galaxy0", - "type": "tool", - "uuid": "a3b4e573-b9e3-4d5e-9557-93ec3e153e62", - "workflow_outputs": [ - { - "label": null, - "output_name": "outfile", - "uuid": "999b7fa7-1b39-48fd-afb8-31498c4d0c20" - } - ] - }, - "62": { - "annotation": "", - "content_id": "toolshed.g2.bx.psu.edu/repos/iuc/pretext_map/pretext_map/0.1.9+galaxy0", - "errors": null, - "id": 62, - "input_connections": { - "input": { - "id": 61, - "output_name": "outfile" - } - }, - "inputs": [], - "label": null, - "name": "PretextMap", - "outputs": [ - { - "name": "pretext_map_out", - "type": "pretext" - } - ], - "position": { - "bottom": 678.8001801461884, - "height": 44.20452880859375, - "left": 4765.524014559659, - "right": 4831.524014559659, - "top": 634.5956513375946, - "width": 66, - "x": 4765.524014559659, - "y": 634.5956513375946 - }, - "post_job_actions": {}, - "tool_id": "toolshed.g2.bx.psu.edu/repos/iuc/pretext_map/pretext_map/0.1.9+galaxy0", - "tool_shed_repository": { - "changeset_revision": "dfb8a4497339", - "name": "pretext_map", - "owner": "iuc", - "tool_shed": "toolshed.g2.bx.psu.edu" - }, - "tool_state": "{\"__input_ext\": \"input\", \"chromInfo\": 
\"/opt/galaxy/tool-data/shared/ucsc/chrom/?.len\", \"filter\": {\"filter_type\": \"\", \"__current_case__\": 0}, \"input\": {\"__class__\": \"ConnectedValue\"}, \"map_qual\": \"10\", \"sorting\": {\"sortby\": \"nosort\", \"__current_case__\": 0}, \"__page__\": null, \"__rerun_remap_job_id__\": null}", - "tool_version": "0.1.9+galaxy0", - "type": "tool", - "uuid": "063a413a-2760-47be-b4a4-215eed48a2f7", - "workflow_outputs": [ - { - "label": null, - "output_name": "pretext_map_out", - "uuid": "658447e1-83da-4a2a-b2dd-0d6c219e2e05" - } - ] - }, - "63": { - "annotation": "", - "content_id": "toolshed.g2.bx.psu.edu/repos/iuc/pretext_snapshot/pretext_snapshot/0.0.3+galaxy1", - "errors": null, - "id": 63, - "input_connections": { - "input": { - "id": 62, - "output_name": "pretext_map_out" - } - }, - "inputs": [], - "label": null, - "name": "Pretext Snapshot", - "outputs": [ - { - "name": "pretext_snap_out", - "type": "input" - } - ], - "position": { - "bottom": 703.8001801461884, - "height": 44.20452880859375, - "left": 5043.492912523674, - "right": 5109.492912523674, - "top": 659.5956513375946, - "width": 66, - "x": 5043.492912523674, - "y": 659.5956513375946 - }, - "post_job_actions": {}, - "tool_id": "toolshed.g2.bx.psu.edu/repos/iuc/pretext_snapshot/pretext_snapshot/0.0.3+galaxy1", - "tool_shed_repository": { - "changeset_revision": "44c66e8d21e6", - "name": "pretext_snapshot", - "owner": "iuc", - "tool_shed": "toolshed.g2.bx.psu.edu" - }, - "tool_state": "{\"__input_ext\": \"input\", \"chromInfo\": \"/opt/galaxy/tool-data/shared/ucsc/chrom/?.len\", \"colormap\": \"5\", \"formats\": {\"outformat\": \"png\", \"__current_case__\": 0}, \"grid\": {\"showGrid\": \"true\", \"__current_case__\": 0, \"gridsize\": \"1\", \"gridcolor\": \"black\"}, \"input\": {\"__class__\": \"ConnectedValue\"}, \"mintexels\": \"64\", \"resolution\": \"1000\", \"sequencenames\": \"false\", \"sequences\": \"=full, =all\", \"__page__\": null, \"__rerun_remap_job_id__\": null}", - "tool_version": 
"0.0.3+galaxy1", - "type": "tool", - "uuid": "08c9443e-c104-4d78-bdad-900fa8b1baf4", - "workflow_outputs": [ - { - "label": null, - "output_name": "pretext_snap_out", - "uuid": "58cdd2e1-4049-419b-8639-24fe288cc2dd" - } - ] - } - }, - "tags": ["assembly"], - "uuid": "16f91f3c-8354-4384-bc27-8adf963fe030", - "version": 9 -} \ No newline at end of file diff --git a/topics/galaxy-interface/tutorials/collections/tutorial.md b/topics/galaxy-interface/tutorials/collections/tutorial.md index 02b81f4671884b..5fdf5bee5319b4 100644 --- a/topics/galaxy-interface/tutorials/collections/tutorial.md +++ b/topics/galaxy-interface/tutorials/collections/tutorial.md @@ -30,7 +30,7 @@ Here we will show Galaxy features designed to help with the analysis of large nu # Getting data -First, we need to upload datasets. Cut and paste the following URLs to Galaxy upload tool (see a {% icon tip %} **Tip** on how to do this [below](#tip-upload-fastqsanger-datasets-via-links)). +First, we need to upload datasets. Cut and paste the following URLs to Galaxy upload tool (see a {% icon tip %} **Tip** on how to do this [below](#tip-upload-fastqsanger-datasets-via-links)). ``` https://zenodo.org/record/5119008/files/M117-bl_1.fq.gz @@ -44,9 +44,12 @@ https://zenodo.org/record/5119008/files/M117C1-ch_2.fq.gz ``` > Set format to `fastqsanger.gz` -> The above datasets are in `fastqsanger.gz` format. It is necessary to explicitly set format in Galaxy. The {% icon tip %} **Tip** section below explains how to upload these data and set the correct format. There is a variety of [fastq format flavors](https://en.wikipedia.org/wiki/FASTQ_format) and it is difficult to guess them automatically. +> The above datasets are in `fastqsanger.gz` format. It is necessary to explicitly set format in Galaxy. The {% icon tip %} **Tip** section below explains how to upload these data and set the correct format. 
There is a variety of [fastq format flavors](https://en.wikipedia.org/wiki/FASTQ_format) and it is difficult to guess them automatically. +> +> {% snippet faqs/galaxy/datasets_import_via_link.md format="fastqsanger.gz" %} +> +> {% snippet topics/assembly/tutorials/vgp_genome_assembly/faqs/dataset_upload_fastqsanger_via_urls.md %} > -> {% snippet faqs/galaxy/dataset_upload_fastqsanger_via_urls.md %} {: .hands_on} ## About these datasets @@ -64,9 +67,9 @@ These datasets represent genomic DNA (enriched for mitochondria via a long range # Creating a paired dataset collection -You can see that there are eight datasets forming four pairs. Obviously, we can manipulate them one-by-one (e.g., start four mapping jobs, call variants four times and so on), but this will unnecessarily tedious. Moreover, imagine if you have 100s or 1,000s of pairs: it will be impossible to process them individually. +You can see that there are eight datasets forming four pairs. Obviously, we can manipulate them one-by-one (e.g., start four mapping jobs, call variants four times and so on), but this will unnecessarily tedious. Moreover, imagine if you have 100s or 1,000s of pairs: it will be impossible to process them individually. -This is exactly why we developed collections. Dataset collections allow combining multiple datasets into a single entity. Thus instead of dealing with four, a hundred, or a thousand of individual datasets you have only one item in Galaxy history to deal with. +This is exactly why we developed collections. Dataset collections allow combining multiple datasets into a single entity. Thus instead of dealing with four, a hundred, or a thousand of individual datasets you have only one item in Galaxy history to deal with. Because our data is *paired* we need to create a hierarchical collection called **Paired Dataset Collection** or **Paired Collection**. In such collection there are two layers. The first layer corresponds to individual samples (e.g., `M117-bl`). 
The second layer represent `forward` and `reverse` reads corresponding to each sample: @@ -106,7 +109,7 @@ https://zenodo.org/record/5119008/files/chrM.fa.gz ``` > Set format to `fasta.gz` -> The above dataset is in `fasta.gz` format. The {% icon tip %} **Tip** section below explains how to upload these data and set the correct format. +> The above dataset is in `fasta.gz` format. The {% icon tip %} **Tip** section below explains how to upload these data and set the correct format. > > {% snippet faqs/galaxy/datasets_import_via_link.md reset_form="True" link="https://zenodo.org/record/5119008/files/chrM.fa.gz" format="fasta.gz" %} {: .hands_on} @@ -127,11 +130,9 @@ https://zenodo.org/record/5119008/files/chrM.fa.gz > > The interface should look like this: > -> ------ > -> ![bwa_mem_interface](../../images/collections/bwa_mem_interface_coll_tut.png) +> ![bwa_mem_interface](../../images/collections/bwa_mem_interface_coll_tut.png "Tool interface") > -> ------ > > - Click **Run Tool** button > @@ -143,7 +144,7 @@ You will see jobs being submitted and new datasets appearing in the history. Bec ## Calling variants -After we mapped reads against the mitochondrial genome, we can now call variants. In this step a variant calling tool `lofreq` will take a collection of BAM datasets (the one produced by `BWA-MEM`), identify differences between reads and the reference, and output these differences as a collection of [VCF](https://en.wikipedia.org/wiki/Variant_Call_Format) datasets. +After we mapped reads against the mitochondrial genome, we can now call variants. In this step a variant calling tool `lofreq` will take a collection of BAM datasets (the one produced by `BWA-MEM`), identify differences between reads and the reference, and output these differences as a collection of [VCF](https://en.wikipedia.org/wiki/Variant_Call_Format) datasets. 
> Call variants > @@ -153,7 +154,7 @@ After we mapped reads against the mitochondrial genome, we can now call variants > - {% icon param-file %} *"Reference"*: `chrM.fa.gz (as fasta)` (Input dataset) > - *"Call variants across"*: `Whole reference` > - *"Types of variants to call"*: `SNVs and indels` -> +> > The interface should look like this: > > ------ @@ -168,7 +169,7 @@ After we mapped reads against the mitochondrial genome, we can now call variants ## Create table of variants using **SnpSift Extract Fields** -We will now convert VCF datasets into tab delimited format as it will be easier to work with. This will be done with `SNPSift`: a tool specifically designed for manipulation of tab-delimited data. +We will now convert VCF datasets into tab delimited format as it will be easier to work with. This will be done with `SNPSift`: a tool specifically designed for manipulation of tab-delimited data. > Create table of variants @@ -284,7 +285,7 @@ From there you can import histories to make them your own. # Collection operations -In this brief analysis we took four paired datasets, created a collection, analyzed this collection and finally created a single report. Such "lifecycle" is shown in the figure below. Here we started with eight fastq datasets representing four paired end samples. A paired collection was reduced to a list of BAM datasets by `BWA-MEM`. Varinat calling by `lofreq` and field extraction with `SnpEff` maintained collection structure: these tools processed four individual datasets changing their formats from BAM to VCF, and from VCF to Tab-delimited. Finally, we collapsed collection by merging its content into a single dataset. +In this brief analysis we took four paired datasets, created a collection, analyzed this collection and finally created a single report. Such "lifecycle" is shown in the figure below. Here we started with eight fastq datasets representing four paired end samples. 
A paired collection was reduced to a list of BAM datasets by `BWA-MEM`. Varinat calling by `lofreq` and field extraction with `SnpEff` maintained collection structure: these tools processed four individual datasets changing their formats from BAM to VCF, and from VCF to Tab-delimited. Finally, we collapsed collection by merging its content into a single dataset. ![Collection lifecycle](../../images/collections/collection_lifecycle.svg "Collection lifecycle. Arrows = individual fastq datasets; Four shades of yellow = four samples analyzed in this example. ") @@ -339,8 +340,8 @@ This tools allow filtering elements from a data collection. It takes an input c Given a collection: ``` - Collection: [Dataset A] - [Dataset B] + Collection: [Dataset A] + [Dataset B] [Dataset X] ``` @@ -368,8 +369,8 @@ the tool will return two collections: Given a collection: ``` - Collection: [Dataset A] - [Dataset B] + Collection: [Dataset A] + [Dataset B] [Dataset X] ``` and a text file: @@ -392,7 +393,7 @@ the tool will return two collections: ### Relabel identifiers -{% icon tool %} **Relabel identifiers** changes identifiers of datasets within a collection using identifiers from a supplied file. +{% icon tool %} **Relabel identifiers** changes identifiers of datasets within a collection using identifiers from a supplied file. New identifiers can be supplied as either a simple list or a tab-delimited file mapping old identifier to the new ones. 
This is controlled using **How should the new identifiers be specified?** drop-down: @@ -401,8 +402,8 @@ New identifiers can be supplied as either a simple list or a tab-delimited file Given a collection: ``` - Collection: [Dataset A] - [Dataset B] + Collection: [Dataset A] + [Dataset B] [Dataset X] ``` @@ -418,8 +419,8 @@ and a simple text file: the tool will return: ``` - Collection: [Dataset Alpha] - [Dataset Beta] + Collection: [Dataset Alpha] + [Dataset Beta] [Dataset Gamma] ``` @@ -428,8 +429,8 @@ the tool will return: Given a collection: ``` - Collection: [Dataset A] - [Dataset B] + Collection: [Dataset A] + [Dataset B] [Dataset X] ``` @@ -445,8 +446,8 @@ and a simple text file (you can see that entries do not have to be in order here the tool will return: ``` - Collection: [Dataset Alpha] - [Dataset Beta] + Collection: [Dataset Alpha] + [Dataset Beta] [Dataset Gamma] ``` @@ -459,8 +460,8 @@ the tool will return: The tool sort in ascending order. When *numeric* sort is chosen, the tool ignores non-numeric characters. For example, if a collection contains the following elements: ``` - Collection: [Horse123] - [Donkey543] + Collection: [Horse123] + [Donkey543] [Mule176] ``` @@ -468,8 +469,8 @@ The tool will output: ``` Collection: [Horse123] - [Mule176] - [Donkey543] + [Mule176] + [Donkey543] ``` #### Sorting from file @@ -477,8 +478,8 @@ The tool will output: Alternative, one can supply a single column text file containing elements identifiers in the desired sort order. 
For example, suppose there a collection: ``` - Collection: [Horse123] - [Donkey543] + Collection: [Horse123] + [Donkey543] [Mule176] ``` @@ -486,15 +487,15 @@ and a file specifying sort order: ``` Donkey543 - Horse123 + Horse123 Mule176 ``` the output will predictably look like this: ``` - Collection: [Donkey543] - [Horse123] + Collection: [Donkey543] + [Horse123] [Mule176] ``` @@ -504,7 +505,7 @@ the output will predictably look like this: The relationship between element names and tags is specified in a two column tab-delimited file. This file may contain less entries than elements in the collection. In that case only matching list identifiers will be tagged. -To create name: or group: tags prepend them with `#` (you can also use `name:`) or `group:`, respectively. +To create name: or group: tags prepend them with `#` (you can also use `name:`) or `group:`, respectively. More about tags @@ -526,24 +527,24 @@ This tool takes nested collections such as a list of lists or a list of dataset ### Merge collections -{% icon tool %} **Merge collections** takes two or more collections and creates a single collection from them. +{% icon tool %} **Merge collections** takes two or more collections and creates a single collection from them. By default the tool assumes that collections that are being merged have unique dataset names. If it not the case only one (the first) of the datasets with a repeated name will be included in the merged collection. For example, suppose you have two collections. 
Each has two datasets named "A" and "B": ``` - Collection 1: [Dataset A] - [Dataset B] + Collection 1: [Dataset A] + [Dataset B] [Dataset X] - Collection 2: [Dataset A] - [Dataset B] + Collection 2: [Dataset A] + [Dataset B] [Dataset Y] ``` Merging them will produce a single collection with only two datasets: ``` - Merged Collection: [Dataset A] - [Dataset B] - [Dataset X] + Merged Collection: [Dataset A] + [Dataset B] + [Dataset X] [Dataset Y] ``` @@ -554,20 +555,20 @@ This behavior can be changed by clicking on "*Advanced Options*" link. The follo Input: ``` - Collection 1: [Dataset A] - [Dataset B] + Collection 1: [Dataset A] + [Dataset B] [Dataset X] - Collection 2: [Dataset A] - [Dataset B] + Collection 2: [Dataset A] + [Dataset B] [Dataset Y] ``` Output: ``` - Merged Collection: [Dataset A] - [Dataset B] - [Dataset X] + Merged Collection: [Dataset A] + [Dataset B] + [Dataset X] [Dataset Y] ``` @@ -579,20 +580,20 @@ Here if two collection have identical dataset names, a dataset is chosen from th Input: ``` - Collection 1: [Dataset A] - [Dataset B] + Collection 1: [Dataset A] + [Dataset B] [Dataset X] - Collection 2: [Dataset A] - [Dataset B] + Collection 2: [Dataset A] + [Dataset B] [Dataset Y] ``` Output: ``` - Merged Collection: [Dataset A] - [Dataset B] - [Dataset X] + Merged Collection: [Dataset A] + [Dataset B] + [Dataset X] [Dataset Y] ``` @@ -604,22 +605,22 @@ Here if two collection have identical dataset names, a dataset is chosen from th Input: ``` - Collection 1: [Dataset A] - [Dataset B] + Collection 1: [Dataset A] + [Dataset B] [Dataset X] - Collection 2: [Dataset A] - [Dataset B] + Collection 2: [Dataset A] + [Dataset B] [Dataset Y] ``` Output: ``` - Merged Collection: [Dataset A_1] + Merged Collection: [Dataset A_1] [Dataset B_1] - [Dataset A_2] - [Dataset B_2] - [Dataset X] + [Dataset A_2] + [Dataset B_2] + [Dataset X] [Dataset Y] ``` @@ -630,22 +631,22 @@ Input: ``` - Collection 1: [Dataset A] - [Dataset B] + Collection 1: [Dataset A] + 
[Dataset B] [Dataset X] - Collection 2: [Dataset A] - [Dataset B] + Collection 2: [Dataset A] + [Dataset B] [Dataset Y] ``` Output: ``` - Merged Collection: [Dataset A] + Merged Collection: [Dataset A] [Dataset B] - [Dataset A_2] - [Dataset B_2] - [Dataset X] + [Dataset A_2] + [Dataset B_2] + [Dataset X] [Dataset Y] ``` @@ -654,22 +655,22 @@ Output: Input: ``` - Collection 1: [Dataset A] - [Dataset B] + Collection 1: [Dataset A] + [Dataset B] [Dataset X] - Collection 2: [Dataset A] - [Dataset B] + Collection 2: [Dataset A] + [Dataset B] [Dataset Y] ``` Output: ``` - Merged Collection: [Dataset A_1] + Merged Collection: [Dataset A_1] + [Dataset B_2] + [Dataset A_2] [Dataset B_2] - [Dataset A_2] - [Dataset B_2] - [Dataset X_1] + [Dataset X_1] [Dataset Y_2] ``` @@ -679,7 +680,7 @@ This option will simply trigger an error. ### Zip collection -{% icon tool %} **Zip collection** takes two collections and creates a paired collection from them. +{% icon tool %} **Zip collection** takes two collections and creates a paired collection from them. If you have one collection containing only forward reads and one containing only reverse, this tools will "zip" them together into a simple paired collection. For example, given two collections with `forward` and `reverse` reads they can be "zipped" into a single paired collection: @@ -687,7 +688,7 @@ If you have one collection containing only forward reads and one containing only ### Unzip collection -{% icon tool %} **Unzip collection** takes a paired collection and "unzips" it into two simple dataset collections (lists of datasets). +{% icon tool %} **Unzip collection** takes a paired collection and "unzips" it into two simple dataset collections (lists of datasets). 
Given a paired collection of forward and reverse reads this tool will "unzip" it into two collections containing forward and reverse reads, respectively: @@ -697,13 +698,13 @@ Given a paired collection of forward and reverse reads this tool will "unzip" it ### Column join -{% icon tool %} **Column join** merges elements of a collection on a given column. +{% icon tool %} **Column join** merges elements of a collection on a given column. If you have a collection with three elements (image below), merging it on the first column will first produce a union on values found in the first column of each elements and then paste elements having the same value side-by-side: ![Column join](../../images/collections/join_on_column.svg) -### Collapse collection +### Collapse collection {% icon tool %} **Collapse collection** merges elements together (head-to-tail) in the order of the collection. Its power comes from the ability to add identifiers when it performs the merge. Identifiers can be added in variety of ways specified by the **Prepend File name** option as shown in the figure below (we used option **A** in the last step of this tutorial). **A** = `Same line and each line in dataset`; **B** = `Same line and only once per dataset`; **C** = `Line above`