HIV workflow error #168
Dear Antoine,
I suspect the key problem is the "my mac (latest gen)" part: support for ARM processors such as Apple Silicon in latest-gen Macs is considered experimental in Bioconda, and not all software is available yet. Regarding the instructions and tutorials, a new version of those will be available soon, courtesy of Google's Season of Docs grant. You can get a preview at: https://gsod-vpipe.readthedocs.io/
Note that you haven't specified a read length in either the config file or the 3rd column of the sample file.
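A quick way to check the sample file for this is sketched below. It is only an illustration, assuming a tab-separated samples file whose 3rd column holds the read length; the function name `samples_missing_read_length` and the example rows are hypothetical and not part of V-pipe.

```python
import csv
import io


def samples_missing_read_length(tsv_text):
    """Return (col1, col2) for rows lacking a numeric 3rd (read length) column.

    Assumption: the samples file is tab-separated and the read length,
    when given, sits in the 3rd column.
    """
    missing = []
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if not row:
            continue  # skip blank lines
        if len(row) < 3 or not row[2].strip().isdigit():
            missing.append(tuple(row[:2]))
    return missing


# Hypothetical example: the second sample has no read length column.
example = "patient1\t20200101\t150\npatient2\t20200102\n"
print(samples_missing_read_length(example))  # [('patient2', '20200102')]
```

If the list is non-empty, either fill in the 3rd column for those samples or set a read length in the config file.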
The main log sadly doesn't give enough information.
Other problems may lurk beyond the CPU type mentioned above. |
thank you for the quick follow up. I will try to set up on a linux platform and keep you posted |
just a quick follow up when installing on linux server
I get this error
but git is already installed
|
FYI: |
after ~20 min, the run stops but jobs are not completed. I am attaching the log and yaml. run on linux server with
thoughts? |
As I mentioned, the gsod-vpipe fork is a preview, some details need ironing out... 😅
According to the logs, the custom specialized aligner ngshmmalign that we use for highly variable viruses is crashing:
Could you have a look into the log files (see above) specific to this rule job? Note though that currently I am mostly working on SARS-CoV-2, so my ngshmmalign is a bit rusty. And the former PhD student who wrote it defended his thesis and moved on eons ago. (Maybe some of my colleagues who have run workshops could answer better once they're back from vacation?) |
this is the error I get in align
log files attached |
Okay, found the error. In your specific case it's accidentally picking up the (
Temporary work-around
Use relative path
You should test putting relative paths in your config file, if the way you have set up your cluster job allows it.
# go into working directory
cd /martinlab/users/achaillon/vp-analysis/work
# symlink the input and output directories into the current working directory
ln -s /martinlab/users/achaillon/_doluvoir/samples ./
ln -s /martinlab/users/achaillon/_doluvoir/results ./
# run V-pipe
./vpipe --jobs 48 --printshellcmds
Then you could use this configuration:
# …
input:
  samples_file: samples.tsv
  datadir: samples/
  read_length: 150
output:
  datadir: results/
# …
This will cause snakemake to use relative paths in the commands it generates (compare the Mac's logs from your first post: No
|
you are amazing! trying now and will confirm asap hopefully good news to come soon |
Besides that, here is some additional info:
HXB2 is fine.
Yes, we support sequencing of only some regions (incidentally, that's how HIV is tested in our CI/CD tests). (You might be interested in having a look at primer trimming (section |
This is controlled in the
Tools alternatives are controlled in the In the directory where you checked out V-pipe (that should be
Yes, indeed. @LaraFuhrmann might give you some advice once she's back from holidays, as V-pipe has introduced a new benchmark mode that you could try. There is some documentation in resources/auxiliary_workflows/benchmark/README.md.
For local haplotypes, you could try VILOCA, a more recent evolution of ShoRAH, also Lara's thesis subject -- so she could advise.
For global haplotypes, it's more hit and miss. Our testing has shown some edge for PredictHaplo, but all methods crash on some data sets. (And we don't have an ETA on fixing PredictHaplo due to the same "not enough C/C++ devs with biology background" situation as ngshmmalign 😅)
In general, local haplotype reconstruction (based on windows -- either evenly spread with random fragmentation, or aligned to amplicons if full amplicons are sequenced with, e.g., multiplex PCR) tends to be much more mature and stable than global haplotype reconstruction (full-length genome where multiple short reads need to be stitched together). Also note that despite the 'ShoR' in 'ShoRAH' standing for 'short reads', it also works well on longer reads, and VILOCA in particular can even leverage the quality scores in its new model (very useful on Nanopore long reads).
So depending on the research question you're trying to answer, local haplotypes might work better for you. |
all sounds great! |
Oh, that sounds cool.
It would be good to approach our professor (Prof. Niko Beerenwinkel, see our group's page) |
just a quick follow up.
should I adjust n-core? I am running it from a server
alignment is only done for one of the 3 samples (took 8hrs??) and still far from finished for the other 2. logs also attached. thoughts? should I modify the yaml? other params ? the refs? thank you! |
after nearly 48 hrs, the run failed with the attached log. no haplotypes generated. I could be wrong, but the alignment step seems really long for such a small region. there might also be a problem with the windows for shorah/viloca.
my final goal is to get full-length haplotypes (~530bp for this project) and other metrics such as diversity, etc. I was able to generate haplotypes with cliqueSNV without issues. all suggestions are welcome |
Hi Antoine, thanks for your message. Looking at vpipe-249574.out.txt I found this line in the report: Regarding VILOCA: The tool can only recover haplotypes of roughly read length, so no global haplotypes. If I understood correctly, your reads are of length 150bp, hence it is not possible to recover haplotypes of length 535bp. |
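The arithmetic behind that limit can be sketched as follows. This is only a back-of-the-envelope illustration using the numbers from this thread (150bp reads, a 535bp region); the function names are hypothetical and the code does not reflect how VILOCA actually places its windows.

```python
import math


def read_spans_region(read_length, region_length):
    """True if a single read is long enough to cover the whole region."""
    return read_length >= region_length


def min_segments_to_cover(read_length, region_length):
    """Smallest number of read-length segments needed to tile the region.

    A rough lower bound only: real windows overlap, so more are needed.
    """
    return math.ceil(region_length / read_length)


print(read_spans_region(150, 535))      # False
print(min_segments_to_cover(150, 535))  # 4
```

Since a local method reports one haplotype per read-length window, a 535bp haplotype would require stitching at least four such segments together, which is the global reconstruction problem.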
thank you Lara. regarding the haplotype reconstruction, I get it. I thought that the 'global=TRUE' option would allow me to get those 'FL' haplotypes. I confirm CliqueSNV works, but I like the option of ignoring regions with coverage < xx (ideally replaced by N). I also explored other options like haploclique or haploconduct with shorah. all suggestions are welcome. |
Hi Antoine, I am not an expert on the quality control step, maybe @DrYak you can give some input on that? For the haplotype reconstruction, yes, the 'global=TRUE' option allows you to run global haplotype reconstruction tools. However, none of the tools can guarantee the output of full-length haplotypes for any dataset. Global haplotype reconstruction from short reads is still a complex problem. |
Hello
I am very interested in evaluating v-pipe
I followed the quick-install instruction on my mac (latest gen)
I set up the config and samples.tsv files (attached)
but I get the attached error when running
logs.zip
thank you in advance for your help