Changing the metadata codec of existing metadata in a tree sequence from binary to JSON #629
-
Hey everyone, I tried reading about this in the tskit documentation, but since I am new to the API and not the most experienced programmer, I wasn't able to solve this problem on my own. So I'm sorry if my question is silly. Cheers, and thanks for any help or other advice.
-
That's a great question, and we should add the answer to the metadata tutorial. I suspect @benjeffery will know the best answer here, but there must be a way. I guess the obvious thing is to decode the struct into Python and then re-encode it as a JSON dict. I'll work up some example code.
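As a rough illustration of the decode-then-re-encode idea at the byte level, here is a minimal sketch using only the Python standard library. The record layout (a 4-byte integer `id` followed by a boolean `pcr` flag), the field names, and the helper `struct_to_json_bytes` are all made up for this example and are not part of tskit. In tskit itself the decoding step is done for you when you read `row.metadata` from a table that has a schema.

```python
import json
import struct

# Hypothetical binary layout (an assumption for this sketch, not a tskit schema):
# little-endian int32 "id" followed by a one-byte bool "pcr".
RECORD_FORMAT = "<i?"
FIELD_NAMES = ("id", "pcr")


def struct_to_json_bytes(raw: bytes) -> bytes:
    """Decode one packed record and re-encode it as UTF-8 JSON bytes."""
    values = struct.unpack(RECORD_FORMAT, raw)  # binary -> Python values
    record = dict(zip(FIELD_NAMES, values))     # attach field names
    return json.dumps(record).encode()          # Python dict -> JSON bytes


raw = struct.pack(RECORD_FORMAT, 1234, True)
converted = struct_to_json_bytes(raw)
print(converted)  # b'{"id": 1234, "pcr": true}'
```

The same two steps (decode with the old codec, encode with the new one) are what a table-level conversion has to do for every row.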
-
Here's some dummy code using a schema for individuals (from here) - NB: now edited to reflect the discussion below

import json
import tskit

basic_schema = tskit.MetadataSchema({'codec': 'json'})
complex_schema = tskit.MetadataSchema({
    'codec': 'json',
    'additionalProperties': False,
    'properties': {'accession': {'description': 'ENA accession number',
                                 'type': 'string'},
                   'pcr': {'description': 'Was PCR used on this sample',
                           'name': 'PCR Used',
                           'type': 'boolean'}},
    'required': ['accession', 'pcr'],
    'type': 'object',
})
# Make an example ts with schema-validated metadata on the individuals table.
# Note that this example schema uses the json codec; a real struct schema would
# use 'codec': 'struct' and give each property a 'binaryFormat'.
tables = tskit.TableCollection(sequence_length=1)
tables.individuals.metadata_schema = complex_schema
row_id = tables.individuals.add_row(0, metadata={"accession": "Bob1234", "pcr": True})
ts = tables.tree_sequence()

Convert struct to json:

def convert_to_json_metadata(ts):
    # make a new ts with json metadata
    new_tables = ts.dump_tables()
    # iterate through (nearly) all the tables
    for table_name, table in new_tables.name_map.items():
        # provenance table doesn't have metadata
        if table_name not in ["provenances"]:
            metadata = []
            for row in table:
                try:
                    # row.metadata is already decoded using the table's current schema
                    row_metadata = row.metadata or {}  # convert empty byte strings to empty json
                    metadata.append(json.dumps(row_metadata).encode())
                except TypeError as err:
                    raise TypeError(f"Can't convert {row.metadata} to JSON") from err
            # packset_metadata doesn't validate, so dump json in here and switch schema after
            table.packset_metadata(metadata)
            table.metadata_schema = basic_schema
    # May also need to convert top level metadata?
    return new_tables.tree_sequence()

# test
print(convert_to_json_metadata(ts).individual(0))
-
@JoshuaGensel Which versions of SLiM and tsinfer are you using? If SLiM has the correct schema, everything should work.
-
Now open as an issue: tskit-dev/tskit#2129