bool object has no attribute split when running vcf2zarr explode #297

Open
no7ren opened this issue Dec 16, 2024 · 3 comments


no7ren commented Dec 16, 2024

I ran into this problem when converting a VCF file with the vcf2zarr explode command. After some debugging, it turns out that the problem is caused by INFO entries that, at some positions, have no value and, more importantly, no "=" either, as ReadPosRankSum does in the following:
MQ=4.61443;AN=120;AC=6,26;MQRankSum=1.656;ReadPosRankSum;DP=2423

Could you take this into account, for example by skipping the split call when there is no "=" delimiter?
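
To illustrate the situation described above, here is a minimal sketch in plain Python of how a key written without "=" can end up as a boolean rather than a string. The function parse_info is hypothetical and written only for this illustration; it is not how htslib, cyvcf2, or bio2zarr parse the INFO column.

```python
def parse_info(info):
    """Split a VCF INFO string into a dict (illustrative only).

    Keys written without "=" are treated as flags and mapped to True,
    which mirrors the bool seen in the error message.
    """
    parsed = {}
    for entry in info.split(";"):
        if "=" in entry:
            key, value = entry.split("=", 1)
            parsed[key] = value
        else:
            parsed[entry] = True
    return parsed


info = parse_info("MQ=4.61443;AN=120;AC=6,26;MQRankSum=1.656;ReadPosRankSum;DP=2423")
print(info["AC"].split(","))         # string values can be split on ","
print(type(info["ReadPosRankSum"]))  # the bare key is a bool, which has no split()
```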

Thank you!

@jeromekelleher
Contributor

Thanks for the report here @no7ren. I think this probably is a bug, and it would really be helpful to have a minimal example to provoke it. Do you think you could concoct a very small VCF (i.e., one data line, if possible) that provokes the issue please?

What do you get when running this through cyvcf2? We don't actually access the VCF data but depend on htslib/cyvcf2 to do the parsing.
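
A possible way to do both steps is sketched below: write a one-record VCF and print what cyvcf2 returns for each INFO key. The header lines, the contig name, and the helper names write_minimal_vcf and show_info_types are assumptions made for illustration, and the file attached to this issue may differ. Only VCF(path) and variant.INFO.get(key) are taken from cyvcf2 itself.

```python
import os
import tempfile

# Hypothetical one-record VCF: ReadPosRankSum is declared as a String in the
# header but appears without "=" in the data line.
MINIMAL_VCF = "\n".join([
    "##fileformat=VCFv4.2",
    "##contig=<ID=1,length=1000>",
    '##INFO=<ID=ReadPosRankSum,Number=1,Type=String,Description="Illustrative field">',
    '##INFO=<ID=DP,Number=1,Type=Integer,Description="Total depth">',
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t100\t.\tA\tT\t.\t.\tReadPosRankSum;DP=2423",
]) + "\n"


def write_minimal_vcf(path):
    """Write the illustrative VCF text to path and return the path."""
    with open(path, "w") as f:
        f.write(MINIMAL_VCF)
    return path


def show_info_types(path, keys=("ReadPosRankSum", "DP")):
    """Print the value and Python type cyvcf2 reports for each INFO key."""
    from cyvcf2 import VCF  # imported lazily so the writer works without cyvcf2

    for variant in VCF(path):
        for key in keys:
            value = variant.INFO.get(key)
            print(key, repr(value), type(value).__name__)


if __name__ == "__main__":
    vcf_path = write_minimal_vcf(os.path.join(tempfile.mkdtemp(), "minimal.vcf"))
    try:
        show_info_types(vcf_path)
    except ImportError:
        print("cyvcf2 is not installed; wrote", vcf_path)
```

If the type printed for ReadPosRankSum is bool, the value is already a boolean when it leaves cyvcf2, before bio2zarr sees it.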

Author

no7ren commented Dec 17, 2024

Hi,

You can find the test file in the attachment. One more thing I found is that the declared type of the field matters a lot: there is no error when I change it from String to Float.

Here is the error I got:
concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
  File "process.py", line 256, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "python3.11/site-packages/bio2zarr/vcf2zarr/icf.py", line 1078, in process_partition
    tcw.append(field.full_name, variant.INFO.get(field.name, None))
  File "python3.11/site-packages/bio2zarr/vcf2zarr/icf.py", line 806, in append
    self.field_writers[name].append(value)
  File "python3.11/site-packages/bio2zarr/vcf2zarr/icf.py", line 724, in append
    val = self.transformer.transform_and_update_bounds(val)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "python3.11/site-packages/bio2zarr/vcf2zarr/icf.py", line 503, in transform_and_update_bounds
    value = self.transform(vcf_value)
            ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "python3.11/site-packages/bio2zarr/vcf2zarr/icf.py", line 542, in transform
    value = np.array(list(vcf_value.split(",")))
                          ^^^^^^^^^^^^^^^
AttributeError: 'bool' object has no attribute 'split'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "vcf2zarr", line 8, in <module>
    sys.exit(vcf2zarr_main())
             ^^^^^^^^^^^^^^^
  File "python3.11/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "python3.11/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "python3.11/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "python3.11/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "python3.11/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "python3.11/site-packages/bio2zarr/cli.py", line 232, in explode
    Explode: 0%| | 0.00/1.00 [00:01<?, ?vars/s] vcf2zarr.explode(
  File "python3.11/site-packages/bio2zarr/vcf2zarr/icf.py", line 1188, in explode
    writer.explode(worker_processes=worker_processes, show_progress=show_progress)
  File "python3.11/site-packages/bio2zarr/vcf2zarr/icf.py", line 1128, in explode
    with core.ParallelWorkManager(worker_processes, progress_config) as pwm:
  File "python3.11/site-packages/bio2zarr/core.py", line 301, in __exit__
    wait_on_futures(self.futures)
  File "python3.11/site-packages/bio2zarr/core.py", line 104, in wait_on_futures
    raise exception
AttributeError: 'bool' object has no attribute 'split'
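
The failing line in the traceback calls split(",") on whatever value arrives for a String field, which raises when that value is the boolean True. Below is a rough sketch of one possible guard. It is not the fix adopted by bio2zarr: the function name split_string_value and the choice to map a bare key to an empty list are assumptions made for illustration, and the real code additionally wraps the result in np.array.

```python
def split_string_value(vcf_value):
    """Split a String-typed VCF value on commas, tolerating non-string input.

    Hypothetical helper: returning an empty list for a bool (a key that was
    present without "=") is one possible policy, not necessarily the right one.
    """
    if vcf_value is None or isinstance(vcf_value, bool):
        return []
    if isinstance(vcf_value, str):
        return vcf_value.split(",")
    # Any other scalar is converted to its string form.
    return [str(vcf_value)]


print(split_string_value("6,26"))  # ordinary comma-separated string
print(split_string_value(True))    # bare key without "=": no AttributeError
```

Whether a bare key should be stored as missing, as an empty string, or rejected with a clearer error is a design decision for the maintainers; the sketch only shows where a type check could sit.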

testcase.vcf.gz

@jeromekelleher
Contributor

Thanks for the test case @no7ren, this is very helpful. I'm afraid I won't get to this until the new year, but will pick it up then.
