Is it possible to write an updated transfer.xml after unpack? #71

Closed
glyg opened this issue Dec 4, 2023 · 7 comments

@glyg
Contributor

glyg commented Dec 4, 2023

Hi,
Thanks a lot for transfer, it makes my life much easier.

As Sébastien said in #7, omero-cli-transfer nicely computes a map of source to destination IDs, but this mapping is lost once the transfer is done (the information still exists server-side, but you need to query the server again to get it).

I would like to use that mapping directly, e.g. to write a pointer to the newly imported image URL (https://server.omero/webclient/?show=image-96) in an import report (or in my data brokering system). I could query the server again, but why do that job twice 😉?
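A minimal sketch of such a report step, assuming the source-to-destination mapping were available as a plain dict keyed by object type. The `id_map` structure, the `webclient_url` and `report_lines` helpers, and the IDs are hypothetical, not something omero-cli-transfer exposes today; only the URL pattern comes from the example above.

```python
# Hypothetical mapping: {object type: {source id: destination id}}.
id_map = {"image": {12: 96, 13: 97}, "dataset": {4: 51}}


def webclient_url(base_url, obj_type, dest_id):
    """Build a link of the form <base>/webclient/?show=<type>-<id>."""
    return f"{base_url.rstrip('/')}/webclient/?show={obj_type}-{dest_id}"


def report_lines(base_url, id_map):
    """Return one 'type source_id -> url' line per transferred object."""
    lines = []
    for obj_type, pairs in sorted(id_map.items()):
        for src_id, dest_id in sorted(pairs.items()):
            url = webclient_url(base_url, obj_type, dest_id)
            lines.append(f"{obj_type} {src_id} -> {url}")
    return lines


for line in report_lines("https://server.omero", id_map):
    print(line)
```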

I saw in the code that you duplicate and update the original OME object. Would there be a way to dump that new one - with updated ids reflecting the state of the destination database and the imported data? Something called transfered.xml for example?

This is kinda related to the --server option from Josh, if I understand correctly. transfer creates research objects, so having a representation of that RO at its root directory is nice.

@erickmartins
Collaborator

Ok, a few points:

  1. Is the goal to just have a "dump" of the image ID mapping between source and destination? Or a full snapshot of the destination server state after unpack?
  2. If you want the latter, it would need to be done at the very end of the process; doing it any earlier would risk generating an XML that does not correspond to the actual state of the destination server (if any errors occur along the way, for example).
  3. At that point, it basically falls into the same use case as RFE: server-side option #69: the only way to ensure you're getting a correct serialization of the objects of interest is by checking the server.

The main goal of cli-transfer is to always generate a 1-to-1 replica of the original element, and when that happens this is a straightforward problem. I worry about dealing with edge cases where an error happens.

I don't think "just query every image in the destination server for its original source ID" is a great answer, though (and it doesn't give you a mapping between dataset/project IDs, for example). We can do better and I think there are a few options for this:

  1. Consolidate this effort into the RFE: server-side option #69 use case. User-side, that means a single extra command that generates an XML with all that information (and more!). The downside is that, for your use case, the XML parsing and diffing needed to generate the mappings would be on you.
  2. Come up with a decent standardized format for outputting ID mappings. The mappings themselves are generated after destination object creation, so they can never be wrong. You might need to live with the idea that the hierarchies and relationships between those objects could differ destination-side (in case anything breaks). This option also duplicates some of the effort in RFE: server-side option #69, and means coming up with yet another standard format for outputting these things.
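For illustration only, option 2 could be as small as a JSON file keyed by object type. The layout, the `dump_id_mappings` and `load_id_mappings` helpers, and the file name are assumptions made for this sketch, not an existing omero-cli-transfer format.

```python
import json
import os
import tempfile


def dump_id_mappings(mappings, path):
    """Write {type: {source_id: dest_id}} as JSON (JSON keys must be strings)."""
    serializable = {
        obj_type: {str(src): dest for src, dest in pairs.items()}
        for obj_type, pairs in mappings.items()
    }
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(serializable, fh, indent=2, sort_keys=True)


def load_id_mappings(path):
    """Read the file back, restoring integer source IDs."""
    with open(path, encoding="utf-8") as fh:
        raw = json.load(fh)
    return {t: {int(s): d for s, d in pairs.items()} for t, pairs in raw.items()}


# Hypothetical IDs, written to a throwaway directory.
mappings = {"Project": {1: 301}, "Dataset": {4: 51}, "Image": {12: 96}}
out_path = os.path.join(tempfile.mkdtemp(), "id_mappings.json")
dump_id_mappings(mappings, out_path)
restored = load_id_mappings(out_path)
```

A flat file like this records only the ID pairs; as noted above, it says nothing about whether the hierarchy on the destination matches the source.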

As a developer, I worry about feature creep - running cli-transfer as a pure serialization tool (as #69 proposes) is a slam dunk in my opinion, and will be done at some point, so I worry about adding another extra feature that is close but not exactly the same. I think option 1 is a better idea - thoughts? @joshmoore?

@joshmoore
Member

Briefly before this slips off my radar:

  • I tend to think it's orthogonal as well, since in the non-server use-case I'd like the same thing.
  • ergo I definitely see the value of having it.
  • Naively, I'd think "Yet-Another-Annotation" (of the same structure as an existing one) would provide much of what is needed, and I imagine you could write a largely duplicate "transfer-update.xml" file in case anything goes wrong.
    • Sidenote: getting these internal structures into XSD and then into ome-types would be a nice-to-have.
  • But as long as we're looking for a new format, I'd be happy to suggest RDF 😄

@glyg
Contributor Author

glyg commented Dec 5, 2023

Thanks a lot for the detailed answer!

Just having the ID mappings would be nice, but I guess I'd have to walk back up its ancestry anyway...

> consolidate this effort into the #69 use case. That means (user-side) a single extra command that will generate an XML with all that information (and more!). The downside is that, for your use case, that would mean XML parsing and diff-ing for generating the mappings is on you.

I think this is the best option. The 'server side' transfer.xml would also be parsed as an OME object, then walking down the OME hierarchies and name matching between source and destination objects should be quick (and local).
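A rough sketch of that name-matching step, using only the standard library instead of ome-types. The two XML snippets are trimmed, hypothetical stand-ins for a source and a server-side transfer.xml (not schema-valid), `ids_by_name` and `match_by_name` are illustrative helpers, and the approach assumes names are unique per object type.

```python
import xml.etree.ElementTree as ET

NS = "{http://www.openmicroscopy.org/Schemas/OME/2016-06}"

# Hypothetical, heavily trimmed stand-ins for the two transfer.xml files.
SOURCE_XML = """<OME xmlns="http://www.openmicroscopy.org/Schemas/OME/2016-06">
  <Project ID="Project:1" Name="screen"/>
  <Dataset ID="Dataset:4" Name="plate-a"/>
  <Image ID="Image:12" Name="well-01.tiff"/>
  <Image ID="Image:13" Name="well-02.tiff"/>
</OME>"""

DEST_XML = """<OME xmlns="http://www.openmicroscopy.org/Schemas/OME/2016-06">
  <Project ID="Project:301" Name="screen"/>
  <Dataset ID="Dataset:51" Name="plate-a"/>
  <Image ID="Image:96" Name="well-01.tiff"/>
  <Image ID="Image:97" Name="well-02.tiff"/>
</OME>"""


def ids_by_name(xml_text):
    """Index Project/Dataset/Image IDs by (kind, Name)."""
    root = ET.fromstring(xml_text)
    found = {}
    for kind in ("Project", "Dataset", "Image"):
        for elem in root.iter(NS + kind):
            # Assumption: names are unique per kind; duplicates overwrite.
            found[(kind, elem.get("Name"))] = elem.get("ID")
    return found


def match_by_name(source_xml, dest_xml):
    """Map source IDs to destination IDs for objects sharing kind and name."""
    src = ids_by_name(source_xml)
    dst = ids_by_name(dest_xml)
    return {src_id: dst[key] for key, src_id in src.items() if key in dst}


mapping = match_by_name(SOURCE_XML, DEST_XML)
```

Objects with duplicate names would need an extra disambiguation step, for example matching on the parent dataset as well.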

Furthermore I feel #69 solves a lot of other issues so a big +1.

> But as long as we're looking for a new format, I'd be happy to suggest RDF 😄

Has someone written ome_types in linkML?

Wouldn't gen-pydantic ome_types.yml > ome_types/model.py
followed by transfer.xml -> OME -> linkml_runtime.dumpers.dump(...) "just work" then?

If you don't mind it, I can give #69 a try?

@joshmoore
Member

joshmoore commented Dec 5, 2023

> Has someone written ome_types in linkML?

cc: @tlambert03 (e.g., tlambert03/ome-types#222)

> If you don't mind it, I can give #69 a try?

❤️ I'll be at a hackathon next week largely on o-c-transfer, so if I can test anything, let me know.

@glyg
Contributor Author

glyg commented Dec 5, 2023

> cc: @tlambert03 (e.g., https://github.com//ome-types/pull/222)

This link is not good :)

I am closing this, as it should be solved by #69.

@glyg glyg closed this as completed Dec 5, 2023
@tlambert03

proper link: tlambert03/ome-types#222. fwiw :)

@joshmoore
Member

Thanks, @tlambert03. Not sure how that happened! 😉
