ASDF schema suggestions #83

Closed
eslavich opened this issue Aug 24, 2020 · 7 comments

@eslavich

Hi, I noticed a couple of issues in the ASDF schemas in this repository:

The dataset-0.1.0 schema specifies allowAdditionalProperties: true :
https://github.com/DKISTDC/dkist/blob/master/dkist/io/asdf/schemas/dkist.nso.edu/dkist/dataset-0.1.0.yaml#L38
but that isn't a recognized schema property. Probably it should be additionalProperties: true?
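
A minimal sketch of the corrected fragment, assuming the keyword sits at the top level of the schema next to `type: object`. The `data` property and its description are placeholders for illustration, not copied from the file:

```yaml
# Illustrative fragment only; the property below is a placeholder.
type: object
properties:
  data:
    description: Placeholder property for illustration.
# Standard JSON Schema keyword. Extra properties are already allowed
# when this keyword is absent, so the line mainly documents intent.
additionalProperties: true
```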

The $ref property is set to a tag URI in many cases:
https://github.com/DKISTDC/dkist/blob/master/dkist/io/asdf/schemas/dkist.nso.edu/dkist/dataset-0.1.0.yaml#L15
It should be a schema URI since $ref is an instruction to descend into the content of the referenced schema. This works currently due to an accident in the asdf Python implementation, but may not in the future. If the intention is to validate the tag of the child object, then the tag property is a good option:

tag: "tag:dkist.nso.edu:dkist/array_container-0.2.0"
@Cadair
Member

Cadair commented Aug 24, 2020

Thanks for the pointers! I will have more questions, but what does a schema URI look like?

@eslavich
Author

what does a schema URI look like?

That's the id property of the schema. In the case of that array container example it'll be http://dkist.nso.edu/schemas/dkist/array_container-0.2.0.
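
As a rough sketch, the relationship looks something like this. The two YAML documents stand in for the two schema files; the `data` property name is an assumption made for illustration:

```yaml
# array_container schema (header only): `id` is the schema URI.
id: "http://dkist.nso.edu/schemas/dkist/array_container-0.2.0"
---
# Referencing schema: $ref points at the id above, not at the tag URI.
# `data` is a hypothetical property name.
type: object
properties:
  data:
    $ref: "http://dkist.nso.edu/schemas/dkist/array_container-0.2.0"
```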

BTW, there's a new feature coming in asdf 2.8 that's going to make the tag validator super convenient. It'll allow wildcards like this:

tag: "tag:dkist.nso.edu:dkist/array_container-*"

So any version of the array_container tag will be valid. That'll save us from having to create e.g. a dataset-0.2.0 schema just because array_container-0.3.0 was released.
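
In context that might look like the fragment below. This is a sketch that assumes asdf 2.8 or later and a hypothetical `data` property:

```yaml
# Sketch only; `data` is a hypothetical property name.
type: object
properties:
  data:
    # Checks only the YAML tag of the child object; the wildcard
    # accepts array_container-0.2.0, array_container-0.3.0, and so on.
    tag: "tag:dkist.nso.edu:dkist/array_container-*"
```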

@Cadair
Member

Cadair commented Aug 25, 2020

Despite reading over the docs like 10 times, I am still struggling to work out what the difference between referencing the schema and the tag is. Especially given the standard docs say:

ASDF implementations must be able to resolve references using both id and tag attributes

That'll save us from having to create e.g. a dataset-0.2.0 schema just because array_container-0.3.0 was released.

This is also interesting given I am currently trying to work out how to update these schemas to adapt to the gwcs 1.1.0 schema (and removal of the 1.0.0 schema in 0.14). I posted something in the astropy #gwcs channel about this (not sure if you are in there).

@eslavich
Author

ASDF implementations must be able to resolve references using both id and tag attributes

Oh wow, I didn't know that was in the standard. So that behavior is not an accident at all. I'm going to have to think about that some more.

The tag URI identifies the object's YAML "type"; it lets readers know that the object is something special and not just a vanilla YAML structure. The schema URI identifies the schema content; it's the document identifier for the schema itself. Historically we've had a 1:1 correspondence between schemas and tags, but it doesn't have to be that way. For example, in the schemas for modeling, we noticed that about a third of them are exact duplicates of other schemas. If we reuse the same schema file for multiple tags, then we'll significantly cut down our maintenance burden.

I'm not sure what the original motivation was for allowing tag URIs to be $ref targets. It doesn't make sense to me: $ref is supposed to incorporate external schema content identified by a URI. Incorporating a tag into a schema has no meaning, since tags and schemas are apples and oranges.

Anyway, since the standard protects referencing schemas by tag, as you rightly point out, this doesn't have to change. We may strike this from the standard someday, but that could happen only with a new major version.

@Cadair
Member

Cadair commented Aug 25, 2020

heh, glad I found something in the spec which might need to be removed 😀

Am I correct in thinking that there will always be a single schema for a tag though, even if there are multiple tags for a schema?

Also the wildcard behaviour in asdf-format/asdf-standard#271 seems useful to me here, for instance in dealing with the gwcs version update, does that make sense to extend to schemas as well as tags?

@eslavich
Author

Am I correct in thinking that there will always be a single schema for a tag though, even if there are multiple tags for a schema?

I think so. If someone needed to validate against multiple schemas, they could just combine them into one using allOf and $ref.
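
Something along these lines, where both schema URIs are made-up placeholders:

```yaml
# Hypothetical combined schema; the example.org URIs are placeholders.
id: "http://example.org/schemas/combined-1.0.0"
allOf:
  # The object must validate against every schema listed here.
  - $ref: "http://example.org/schemas/first-1.0.0"
  - $ref: "http://example.org/schemas/second-1.0.0"
```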

Also the wildcard behaviour in asdf-format/asdf-standard#271 seems useful to me here, for instance in dealing with the gwcs version update, does that make sense to extend to schemas as well as tags?

I'm hesitant to modify the behavior of $ref because it's defined by JSON schema, whereas the tag property is something we own.

@Cadair
Member

Cadair commented Aug 26, 2020

I'm hesitant to modify the behavior of $ref because it's defined by JSON schema, whereas the tag property is something we own.

That makes sense, but then if I am using $ref and ids in the schemas, am I still in the situation where the schemas have to have hard-coded versions in them?

@Cadair closed this as completed Apr 5, 2022