investigate dropping ome-zarr dependency #123

Draft
wants to merge 6 commits into
base: main
from

will-moore
Member

@will-moore will-moore commented Dec 13, 2024

Investigating a lighter-weight alternative to ome-zarr-py.
Uses zarr v3.

Not looking to fully replace ome-zarr-py yet, but just investigating what an alternative could look like to inform discussions.

Pros:

  • Mapping from OME-Zarr metadata to napari data happens in one step, instead of being split between ome-zarr-py and napari-ome-zarr.

Cons / known issues:

  • No support for versions before v0.4 (e.g. data with missing axes), which ome-zarr-py previously handled
  • Plates not handled
  • Transformations, scale, etc. not handled

Currently handles bioformats2raw, channels metadata and labels. Testing with:

$ napari --plugin napari-ome-zarr https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.5/idr0062A/6001240_labels.zarr

$ napari --plugin napari-ome-zarr https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.5/idr0066/ExpD_chicken_embryo_MIP.ome.zarr

$ napari --plugin napari-ome-zarr https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.4/idr0048A/9846152.zarr

# bioformats2raw.layout (single image)
$ napari --plugin napari-ome-zarr https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.4/idr0048A/9846151.zarr

# bioformats2raw.layout (3 images in series)
$ napari --plugin napari-ome-zarr https://storage.googleapis.com/jax-public-ngff-2024/public_data/2957/whitej_205/2021-08/19/15-29-23.322/X54803_10.zarr
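
For the channels metadata mentioned above, the one-step mapping could look roughly like the sketch below. It is not the code in this PR: the function name and the "channel_rgb" key are hypothetical, and it only assumes the usual "omero" fields (label, color, active, window.start / window.end). Converting the RGB tuples into whatever colormap objects napari expects is left out.

```python
def omero_to_layer_meta(omero: dict, channel_axis: int) -> dict:
    """Turn OME-Zarr "omero" channel metadata into napari-style layer keyword arguments."""
    names, visible, limits, rgb = [], [], [], []
    for index, channel in enumerate(omero.get("channels", [])):
        # Fall back to a generated name when the channel has no label.
        names.append(channel.get("label", f"channel_{index}"))
        visible.append(bool(channel.get("active", True)))
        window = channel.get("window", {})
        if "start" in window and "end" in window:
            limits.append((window["start"], window["end"]))
        else:
            limits.append(None)  # no rendering window stored for this channel
        # "color" is a hex string such as "00FF00"; convert it to floats in 0..1.
        color = channel.get("color", "FFFFFF")
        rgb.append(tuple(int(color[i:i + 2], 16) / 255 for i in (0, 2, 4)))
    return {
        "channel_axis": channel_axis,
        "name": names,
        "visible": visible,
        "contrast_limits": limits,
        "channel_rgb": rgb,  # hypothetical key, to be converted into colormaps
    }


# Hypothetical metadata, shaped like the "omero" block of an OME-Zarr image.
example_omero = {
    "channels": [
        {"label": "DAPI", "color": "0000FF", "active": True,
         "window": {"start": 0, "end": 1500}},
        {"label": "GFP", "color": "00FF00", "active": False,
         "window": {"start": 10, "end": 800}},
    ]
}
example_meta = omero_to_layer_meta(example_omero, channel_axis=1)
```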

@will-moore
Member Author

As discussed with @joshmoore this morning, I looked into whether https://github.com/BioImageTools/ome-zarr-models-py could perform some of the graph traversal logic in this PR, e.g. multiscales -> labels or bioformats2raw -> multiscales (not yet done in this PR). But I don't see any of that functionality in ome-zarr-models-py?
cc @dstansby

@dstansby
Contributor

> graph traversal logic in this PR

I had a quick look at the diff, but didn't quite understand what the required logic is. Could you explain a bit more? (or maybe add some short docstrings to the new classes/methods?)

We are generally 👍 on adding helpful functionality to ome-zarr-models-py, and I'm going to be sprinting to a first release tomorrow actually, so now is a good time to request stuff 😄

def matches(group: Group) -> bool:
    return "multiscales" in Spec.get_attrs(group)

def children(self):
Member Author

@dstansby By "graph traversal" logic I mean this: if I start with a multiscales group, e.g. group = zarr.open("https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.4/idr0062A/6001240.zarr"), I then want to get its labels (if they exist). Here this is implemented in the children() method, which looks in a child "labels" group, checks its attrs for "labels": ["labels1.zarr", "labels2.zarr"], and returns objects for those child labels so that their arrays (and metadata) can be added to the layers that are passed to napari.
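
To make the traversal concrete without any network access, here is a minimal sketch that uses a plain dict (group path -> attrs) in place of real zarr groups. The function name and the dict layout are made up for illustration; the actual children() method in this PR works on zarr Group objects.

```python
def label_children(attrs_by_path: dict, image_path: str) -> list:
    """Return paths of label images listed under an image group, or [] if none."""
    labels_path = f"{image_path}/labels"
    labels_attrs = attrs_by_path.get(labels_path)
    if labels_attrs is None:
        return []  # the image has no "labels" child group
    children = []
    for name in labels_attrs.get("labels", []):
        child_path = f"{labels_path}/{name}"
        child_attrs = attrs_by_path.get(child_path, {})
        # Only keep children that look like multiscale images themselves.
        if "multiscales" in child_attrs:
            children.append(child_path)
    return children


# Hypothetical hierarchy mirroring image.zarr/labels/<name> from the v0.4 layout.
hierarchy = {
    "6001240.zarr": {"multiscales": [{"datasets": [{"path": "0"}]}]},
    "6001240.zarr/labels": {"labels": ["0"]},
    "6001240.zarr/labels/0": {
        "multiscales": [{"datasets": [{"path": "0"}]}],
        "image-label": {},
    },
}
found = label_children(hierarchy, "6001240.zarr")
```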

I don't see that ome-zarr-models-py includes that kind of logic for traversing the graph between these objects?

Contributor

You can get the list of labels paths from Image.attributes.labels. But the labels part of the spec just says these point to "labels objects", which I don't think are more specifically defined anywhere else?

If the OME-Zarr spec was more prescriptive about what these "labels objects" were (are they meant to be groups with image-label metadata ??) then we could certainly do more, but I don't think the spec allows us to make those assumptions unfortunately.

Contributor

I am only now reading version 0.5 of OME-Zarr, and see that the labels section is much improved over 0.4 😄 . It's definitely within scope of ome-zarr-models-py to provide logic for getting from an Image dataset to the labels dataset if it's in the metadata. Tracking issue at ome-zarr-models/ome-zarr-models-py#92

Member Author

Currently in the tutorial at https://github.com/BioImageTools/ome-zarr-models-py/blob/main/docs/tutorial.py
If I add:

print(ome_zarr_image.attributes.labels)

I get None (even though that image does have labels).
I don't see any population of the labels in https://github.com/BioImageTools/ome-zarr-models-py/blob/7659a114a2428fe9d8acbd06aa7bc1c9d32624bb/src/ome_zarr_models/v04/image.py#L85 ?

Contributor

> even though that image does have labels

There's a labels group, but looking at that dataset in the validator the top level group is missing the labels metadata, which is why .labels is giving None.

Contributor

To be precise, if you look at the image group in the validator, it should have a "labels" key at the same level as the "multiscales" and "omero" keys. If that was there, the paths under the "labels" key would be in the .labels attribute in ome-zarr-models-py

Contributor

Oh hold on, am I just reading the spec wrong? Does:

> The special group "labels" found under an image Zarr

Really mean:

> The special Zarr group "labels" found inside an image Zarr group

?

If so then we should definitely implement that in ome-zarr-models-py!

Member Author

Yes, image.zarr/labels/ group.
This is shown a bit more clearly in the layout at https://ngff.openmicroscopy.org/0.4/index.html#image-layout

Contributor

Gotcha, I always thought the "labels" group was an arbitrary name and the example was just an example 🤦 - thanks for explaining, and I'll let you know once I've implemented this in ome-zarr-models-py 😄

@will-moore
Member Author

One issue I'm having with supporting bioformats2raw.layout here is how to load the /OME/METADATA.ome.xml (which could be local or remote etc).

E.g. https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.4/idr0048A/9846151.zarr/OME/METADATA.ome.xml

I'm trying something like:

import zarr
from zarr.core.buffer import default_buffer_prototype

url = "https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.4/idr0048A/9846151.zarr"
group = zarr.open(url)
# store.get() is a coroutine, so it has to be awaited (i.e. called from async code)
xml_data = await group.store.get("OME/METADATA.ome.xml", prototype=default_buffer_prototype())

but I want to avoid using async await if I can.
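
For reference, the generic stdlib way to call a coroutine from synchronous code is asyncio.run(). The sketch below uses a made-up FakeAsyncStore rather than a real zarr store, so it only shows the pattern. A real remote store may be tied to its own event loop, in which case this may not transfer directly, and asyncio.run() raises RuntimeError if an event loop is already running in the current thread.

```python
import asyncio


class FakeAsyncStore:
    """Hypothetical stand-in for a store with an async get(); not part of zarr."""

    def __init__(self, files: dict):
        self._files = dict(files)

    async def get(self, key: str):
        await asyncio.sleep(0)  # pretend to do some I/O
        return self._files.get(key)


def get_sync(store, key: str):
    """Run the store's async get() to completion from synchronous code."""
    return asyncio.run(store.get(key))


fake_store = FakeAsyncStore({"OME/METADATA.ome.xml": b"<OME/>"})
xml_bytes = get_sync(fake_store, "OME/METADATA.ome.xml")
```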

In https://github.com/ome/ome-zarr-py/pull/174/files it looks like this only handles local OME.xml files with root = ET.parse(filename)

@joshmoore
Member

> In ome/ome-zarr-py#174 (files) it looks like this only handles local OME.xml files with root = ET.parse(filename)

Then that would have been a bug. I assume a method like get_json() (get_text or get_contents) would have been needed to slurp the XML.
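
Once the XML has been read as bytes (from a local or remote store), it can be parsed with ET.fromstring() instead of ET.parse(filename). A small sketch, assuming Python 3.8+ for the "{*}" namespace wildcard; the function name and the sample document are made up and far smaller than a real METADATA.ome.xml:

```python
import xml.etree.ElementTree as ET


def image_ids_and_names(xml_bytes: bytes) -> list:
    """Return (ID, Name) for each Image element in an OME-XML document."""
    root = ET.fromstring(xml_bytes)
    # "{*}Image" matches Image elements in any namespace (or none).
    return [(img.get("ID"), img.get("Name")) for img in root.findall(".//{*}Image")]


# Hypothetical, heavily trimmed OME-XML payload.
sample_xml = (
    b'<?xml version="1.0" encoding="UTF-8"?>'
    b'<OME xmlns="http://www.openmicroscopy.org/Schemas/OME/2016-06">'
    b'<Image ID="Image:0" Name="series_0"/>'
    b'<Image ID="Image:1" Name="series_1"/>'
    b"</OME>"
)
images = image_ids_and_names(sample_xml)
```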

@will-moore
Member Author

Seems this works for getting XML, but need to check if there's a better way that doesn't use testing classes...

import zarr
from zarr.core.buffer import default_buffer_prototype
from zarr.testing.stateful import SyncStoreWrapper
url = "https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.4/idr0048A/9846151.zarr"
group = zarr.open(url)
store = group.store
wrapper = SyncStoreWrapper(store)
xml_data = wrapper.get("OME/METADATA.ome.xml", prototype=default_buffer_prototype())
print("xml_data", xml_data.to_bytes())

@joshmoore
Member

@will-moore
Member Author

Thanks, this is working...

import zarr
from zarr.core.sync import SyncMixin
from zarr.core.buffer import default_buffer_prototype

url = "https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.4/idr0048A/9846151.zarr"
group = zarr.open(url)
store = group.store
# SyncMixin._sync() runs the coroutine to completion (note: underscore-prefixed, so private)
mx = SyncMixin()
xml_data = mx._sync(store.get("OME/METADATA.ome.xml", prototype=default_buffer_prototype()))
print("xml_data", xml_data.to_bytes())

@will-moore will-moore force-pushed the investigate_ome_zarr_py_alternative branch from bc3d71a to e4cda75 Compare January 6, 2025 15:07
@will-moore
Member Author

If we wanted to use ome-zarr-models-py to handle some of the graph traversal (e.g. labels), this would now look like this:

import zarr

from ome_zarr_models.v04.image import Image
from ome_zarr_models.v04.labels import Labels, LabelsAttrs



group = zarr.open_group(
    "https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.4/idr0062A/6001240.zarr", mode="r"
)
image = Image.from_zarr(group)
print("image.labels", image.labels)
assert image.labels == Labels(
    zarr_version=2, attributes=LabelsAttrs(labels=["0"]), members={}
)

for label in image.labels.attributes.labels:
    print("label", label)
    label_group = group[f"labels/{label}"]
    print("label_group", label_group)
    label_image = Image.from_zarr(label_group)
    print("label_image", label_image)

    first_dataset_path = label_image.attributes.multiscales[0].datasets[0].path
    zarr_arr = label_group[first_dataset_path]
    print("zarr_arr", zarr_arr)

See ome-zarr-models/ome-zarr-models-py#96
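
Since the test URLs in this PR include both v0.4 and v0.5 data, note that v0.5 nests the metadata under an "ome" key in the group attributes, while v0.4 keeps "multiscales" at the top level. A minimal sketch of looking up the dataset paths from raw attrs in either case (the function name is made up, and plain dicts stand in for group.attrs):

```python
def multiscale_dataset_paths(attrs: dict, index: int = 0) -> list:
    """Return dataset paths of one multiscale, highest resolution first."""
    # v0.5 nests everything under "ome"; v0.4 has the keys at the top level.
    ome_attrs = attrs.get("ome", attrs)
    multiscales = ome_attrs.get("multiscales", [])
    if index >= len(multiscales):
        return []
    return [dataset["path"] for dataset in multiscales[index]["datasets"]]


# Hypothetical, trimmed attrs for the two layouts.
v04_attrs = {
    "multiscales": [{"version": "0.4", "datasets": [{"path": "0"}, {"path": "1"}]}]
}
v05_attrs = {
    "ome": {
        "version": "0.5",
        "multiscales": [{"datasets": [{"path": "0"}, {"path": "1"}]}],
    }
}
paths_v04 = multiscale_dataset_paths(v04_attrs)
paths_v05 = multiscale_dataset_paths(v05_attrs)
```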

@will-moore
Member Author

@joshmoore I'll stop working on this for now, as I think there's enough here to evaluate this approach.
I think this is a viable option to provide OME-Zarr support to napari without using ome-zarr-py, but would appreciate feedback & discussion.

@dstansby
Contributor

To chip in from ome-zarr-models-py, this is exactly the use case we'd love to support, so if there's anything else we can add to make this easier please let us know!
