External secondary instances: I/O support for `jr:` URLs #201
Comments
I was wondering whether the host application could just provide secondary attachments along with the Form XML, so the engine doesn't have to make network calls and deal with all the network-related errors. I am saying this with the following assumptions:
This is closer to the Option 1 presented above, except required attachments are resolved at the host application level before anything else happens.
I want to make sure to sum up a couple key conclusions from our discussion yesterday:

Design choice: Option 1, possibly supplemented by Option 0.5

We decided to go with Option 1. As a stretch goal, we may also include aspects of Option 0.5 as an additive aid to clients/host applications.

We refined the proposed Option 1 interface. Putting aside naming (included here so the type will be valid syntax/highlighted as such), this is the interface we anticipate clients/host applications to provide for all form attachments:

type FormAttachmentMapping = Record<`jr:${string}`, () => Promise<Response>>;

Open for bikesheddy discussion: is there any openness to making this a

As for the value side of the mapping, providing a thunk per resource:
We decided to represent each resource as a

On engine invocation of form requests broadly (i.e. including media)

While this design is primarily focused on support for external secondary instances, it has obvious implications for other form attachments. We discussed this, which was also raised in the above comment. We determined it makes sense for the engine to invoke all such requests, largely for reasons discussed in the last section. Notably, we expect that the engine will perform media requests earlier in a form session (either as part of form load, or perhaps immediately following resolution of the initial form state).

Addendum to yesterday's discussion of this point

As an additional point not covered yesterday, but which I think helps to bolster this decision: insofar as the engine expects clients/host applications to provide resource data, there's another very good reason for the engine to invoke requests, and in particular for the engine to get those requests back as a
And the engine does need to access at least some resource data for media attachments, as they may be associated with node values. In the future, we might consider expanding this interface to provide multiple representations, such as something like |
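To make the shape of that mapping concrete, here is a minimal sketch of how a host application might satisfy it. The type is the one proposed above; the attachment names, the example URL, and the `readAttachmentText` helper are illustrative assumptions, not part of any settled API.

```typescript
// Type as proposed in the discussion above (the name is still up for debate).
type FormAttachmentMapping = Record<`jr:${string}`, () => Promise<Response>>;

// Hypothetical attachment data a host application might already hold in memory.
const citiesCsv = 'name,label\nnairobi,Nairobi\nkampala,Kampala\n';

// Each value is a thunk: nothing is read or requested until the engine calls it,
// and each call yields a fresh `Response` whose body has not been consumed yet.
const attachments: FormAttachmentMapping = {
  'jr://file-csv/cities.csv': () =>
    Promise.resolve(
      new Response(citiesCsv, { headers: { 'Content-Type': 'text/csv' } })
    ),
  // A network-backed entry could simply defer to `fetch` (hypothetical URL).
  'jr://images/logo.png': () => fetch('https://example.org/attachments/logo.png'),
};

// Roughly what the engine side might do with an entry (hypothetical helper):
// invoke the thunk, then read the body once.
async function readAttachmentText(
  mapping: FormAttachmentMapping,
  key: `jr:${string}`
): Promise<string> {
  const request = mapping[key];
  if (request == null) {
    throw new Error(`No attachment provided for ${key}`);
  }
  const response = await request();
  return response.text();
}
```

Because the value is a function rather than a `Response`, the engine stays in control of when (and how often) a request is made, which is the property the thunk was chosen for.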
Good, that leaves room for the implementer to supply the data any which way. However, for the engine it's less convenient, as the annoying thing about responses is that you can read them (e.g. call .blob() or .text() on them) only once, so the engine will have to do some work: for instance, in the case of secondary instances, read once and deserialize the XML, while for the case of media URLs (pictures etc.) it'll probably want to use

Perhaps the most succinct typification of the expectations of the engine is... a blob after all? Initiator supplies blobs? Plain and simple?
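The read-once behavior mentioned here can be illustrated with the standard Fetch API alone. This is only a sketch of the constraint and of one common workaround (`Response.clone()`); the function names are made up for illustration, and nothing here is a claim about what the engine does.

```typescript
// A body can be consumed only once; a second read rejects with a TypeError.
async function readTwiceNaively(response: Response): Promise<string> {
  await response.text();
  try {
    await response.text();
    return 'second read succeeded';
  } catch {
    return 'second read failed';
  }
}

// One way around it: clone before the first read, so each consumer gets its
// own body (e.g. text for XML parsing, a Blob for media).
async function readAsTextAndBlob(
  response: Response
): Promise<{ text: string; blob: Blob }> {
  const copy = response.clone();
  const [text, blob] = await Promise.all([response.text(), copy.blob()]);
  return { text, blob };
}
```

With a thunk per resource, cloning is often unnecessary: the engine can call the thunk again to obtain a fresh `Response`.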
Perhaps a name like But to the point. This leaves it up in the air how the instantiator would know what So I think a two-stage mechanism can be useful. One, the engine can load a survey definition and return a list of |
Thanks @brontolosone, lots of good stuff to dig into here!
Yep! That is exactly the intent. We just want to set a baseline for read semantics, without being prescriptive at all about how the thing we read gets into that state. And the read semantics of
This is more or less what I anticipate the engine doing. At least for a first pass, there will be a little bit more nuance in terms of timing (i.e. we may block initializing state for external instances, and kick off other attachment requests immediately after). There are some other nuances to think about, around the nature of certain network calls (i.e. streaming). But otherwise this pretty much captures the behavior we intend to implement.
I appreciate you raising this concern! I think we should make it explicit that we WILL NOT do that. As I mentioned in part of our call a bit earlier, that guarantee is one of the specific reasons we chose
The intent of not reflecting that expectation is to help communicate that we don't have that expectation.
Also discussed on our call a bit ago, part of the intent of taking a Capturing another, related part of our discussion here for posterity... The intent of the design is also to support clients which may want greater (or even total) control over this set of error conditions (as in, greater than the easy default we're setting up: simple functions wrapping
I'm fine with that name if you like it better. Again, the name as proposed is mostly there so there's a concept we can reference unambiguously. I do think in general the "mapping to what" is answered by the type itself, and the "for what" is mostly answered by requisite familiarity with underlying details of the spec. But I am always in favor of naming clarity as a way to help reinforce assumptions!
Again capturing content from our earlier call:
However! As I think more on this, I think we probably should relax the type of the key. At least from memory, I'm pretty sure the already-produced information we expect clients to have will be a mapping between filename and I/O-resolvable-resource. I think it's reasonable to accept:

type FormAttachmentRetrievalMapping = Record<`${string}.${string}`, () => Promise<Response>>;
// ... or, preferable IMO: Map<...>
// ... if base.ext assumes too much: WhateverBagOfKVs<string, ...>
It's actually a bit more complicated than this. It's discussed a bit in #202, and we covered it a bit on our call, but I think it's really important to highlight here too: we are anticipating that at least for some usage scenarios, I/O failures are not inherently terminal. This is one of the biggest reasons the engine has an I/O responsibility (or this abstraction over it) at all:
This does actually raise the question of whether we're sufficiently serving the second point! It is certainly possible, with the proposed interface, for us to implement that interface in the engine so a client can totally block at least the stateful (post-parse) portion of form init. (It's possible because we already accept a similarly opaque I/O abstraction for the form definition itself.) But it certainly wouldn't be very ergonomic to do so. I don't think At this point, I think that portion of discussion is probably better had in #202. But briefly, my instinct is that we don't yet know enough about partial/total failure from a client perspective to merit thinking too hard on it yet. |
As soon as I posted this, I realized the likely mistake in my reasoning. While it's common to map |
I've read through the last couple of posts but I'm not able to follow all the details. Can you give me a sense of what other kinds of mappings might exist? |
So let's say we have a form definition which references these form attachments:
Taking the OpenRosa Manifest Document example as one of our typical cases, it would be sufficient to treat the manifest's But here already, If we have another form definition referencing:
The ODK Central REST API docs' section on listing form attachments also suggests a key of But wait. The Central API example has Suppose we have a form definition which references:
Now it seems likely we'll want to allow data type as part of the key. But we probably can't require it, because the OpenRosa manifest doesn't provide it. That covers our two most common use cases, but we can't assume either. The ODK XForms spec doesn't really say how these URLs map to anything, at all really. And there's ambiguity in both of the common use cases' examples. As I understand at least the ODK XForms side of things (and this is how it's handled in Enketo), a Also, based on experience on the Central frontend, I do know there's some accommodation for uploading form attachments that don't match the filename as defined in the form. I don't know offhand how that mismatch is represented at the API level. But if it doesn't match the I think the options are:
This feels like a good place to start to me. It should be possible to expand to one of the others from there if ever we did find it was needed, does that sound right? We could document the subset of the spec that is supported. I'm not aware of any system that allows expressing a subpath on the client side. Collect downloads all media declared in the manifest to a single directory. The source could come from some URL including a subpath but there's no way to get it to save in a subdirectory on the client.
XLSForm has different columns for specifying audio, video, images and different constructs for specifying data files. It uses that usage context to generate the Central has no notion of subdirectories for media either.
Central ignores the "actual" filename in that case and saves the file content in a slot with the filename referenced in the form.
Whoops! PR at getodk/central-backend#1247 This would be possible with XLSForm -- a user could put a |
My understanding: I like to think of the "full However. Outside of the form XML treatment of the "attachment identity" is a different story. For instance Central has a So to take an extremist standpoint: if all components (Central, Webforms, Collect) have historically done, and will keep on doing, interpretations of that thing (variations of "oh we feel like it's actually just a filename and we're going to discard parts of the identity") rather than keeping the ID the ID, then why do we bother having these The orthodox (also somewhat extremist) standpoint would be: the de-facto usage is heretic, the URI is an opaque identifier, it doesn't contain any filename, it doesn't contain an instruction on where to store it if you happen to be a J2ME client or even if it seems to then that's a historical error because the pious shouldn't store instructions in an identifier, so we should practice restraint on our instincts to treat it as anything else but an opaque identifier. I think I'm more on the Orthodox than on the Reformist side here, because I want there to be a clear and atomic identifier somewhere. Also, if we can get away with treating the
And then if a consumer of that API (could be us, will be us, but outside of the sanctuary of the webforms xforms engine) wants to do "interpretation" of those URIs, they may do that in the privacy of their own homes to match immaculate attachment IDs to earthly filenames. |
And this could potentially include the web forms Vue frontend? Host applications likely don't know the "attachment identifiers"/jr:// URIs, so requiring those as keys feels like it could be a barrier to usage. Does that sound right?
I think we could let our Vue frontend be the place to perform that match, yes. It should reference this issue in a code comment ;-)
Just FTR, the implementer's task sequence I see roughly as follows:
With that in mind, back to:
It depends! Perhaps in the host application's further ecosystem (say they have their own "Central"-like backend) the original sin of "Let's go off on tangents by extracting "filenames" and other meaning from those identifiers" was never committed, and in that case they won't have to do any mapping. Such a host application will be able to talk to their backends to express "give me the blob for For those that have sinful backends however, such as we do, we could make a utility shim or library that contains a function that does the mappings. We'd use it ourselves in our Vue frontend, and other host application developers can steal/use it (via a simple git submodule if need be, it doesn't need to be a full blown npm package) if they have the same sinful legacy as we do. That keeps the haram fiddly stuff contained there, rather than inside the webforms code (which would condone and enshrine it, and as such would proliferate the animist myth that (odk)xforms says something about "filenames"). So that'd be the "reactionary orthodox" view on the matter. It's not unreasonable I think? There's still the "revolutionary reformist" view, which asks "Well… if we live and breathe filenames, then why are we going through those pagan seances of converting them back and forth into/from those |
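For what the "utility shim" idea could look like in practice, here is a rough sketch. Both function names are hypothetical, and the last-path-segment rule is an assumption about how a filename-keyed backend would be matched; it is deliberately kept outside the engine.

```typescript
// Hypothetical shim: reduce a full `jr:` URL (the attachment's identifier) to
// the bare filename that a filename-keyed backend or manifest would know.
function filenameFromJrUrl(jrUrl: string): string {
  if (!jrUrl.startsWith('jr://')) {
    throw new Error(`Not a jr: URL: ${jrUrl}`);
  }
  // Drop any query string or fragment, then keep the last path segment.
  const withoutSuffix = jrUrl.split(/[?#]/)[0] ?? jrUrl;
  const lastSlash = withoutSuffix.lastIndexOf('/');
  return withoutSuffix.slice(lastSlash + 1);
}

// Match the identifiers referenced by a form to the URLs a backend exposes
// per filename; unmatched identifiers map to null.
function matchAttachments(
  jrUrls: readonly string[],
  urlsByFilename: ReadonlyMap<string, string>
): Map<string, string | null> {
  const result = new Map<string, string | null>();
  for (const jrUrl of jrUrls) {
    result.set(jrUrl, urlsByFilename.get(filenameFromJrUrl(jrUrl)) ?? null);
  }
  return result;
}
```

Note that this mapping is lossy by construction: `jr://images/a.png` and `jr://audio/a.png` collapse to the same filename, which is exactly the ambiguity the identifier-first view is trying to keep out of the engine.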
…chments

**IMPORTANT NOTES!**

1. This change deprecates the former, more general, `fetchResource` config. While conceptually suitable for the purpose it was originally designed (and for this API expansion), it seems likely to be a footgun in real world use.
2. This change **intentionally deviates** from the design direction which we previously settled in #201. It is, instead, intended to be directly responsive to the discussion which followed that previously settled design.

- - -

**Why another `fetch`-like configuration?**

Taking the above important notes in reverse order:

_Intentional deviation_

The discussion in #201 ultimately convinced me that it would likely be a mistake to favor any Map-like interface for the purpose of resolving form attachments. There is too much ambiguity and baggage around the current mechanisms for _satisfying_ the interface.

Ultimately, @brontolosone’s axiom, that a complete `jr:` URL is a particular form attachment’s identifier, **is correct**! Any truncation of them is a convenient shorthand which would align with known data sources.

Paradoxically, in typical usage, clients ultimately **MUST** extrapolate those identifiers from more limited data: the source data itself will _only_ contain truncated shorthand values.

We’d discussed supplying a parse-stage summary of the `jr:` URLs referenced by a form, allowing clients to populate a mapping before proceeding with post-parse initialization. But the logic a client would implement to satisfy that mapping would be **identical** to the logic it would implement to satisfy a `fetch`-like function: given an expected `jr://foo/bar.ext`, and given a URL to an attachment named `bar.ext`, provide access to that URL.

It seems more sensible to specify this API as a function rather than as a Map-like data structure, if for no other reason than to avoid introducing or further entrenching a particular shorthand interpretation.

We will call a client’s `fetchFormAttachment` with each attachment’s complete identifier (the full `jr:` URL), and the client can satisfy that with whatever data it has available and whatever mapping logic it sees fit _at call time_, without codifying that shorthand into the engine’s logic.

_Distinct `fetch`-like configurations_

A single `fetchResource` configuration would meet the minimum requisite API surface area to retrieve form definitions _and their attachments_, but it would be considerably more error prone for clients to integrate:

- `fetchResource` has up to this point been optional, and is typically ignored in existing client usage
- a single `fetchResource` config would need to serve dual purposes: resolve a form definition by URL (which is likely to be a real, network-accessible URL) and resolve form attachments by identifier (which happens to be a valid URL, but will never resolve to a real network resource except if clients implement the same resolution logic with a Service Worker or other network IO-intercepting mechanism)
- a client would need to determine the purpose of a given call _by inference_ (by inspecting the URL, and probably branching on a `jr:` prefix); the `fetch`-like API simply wouldn’t accommodate a more explicit distinction without **also overloading** HTTP semantics (e.g. by calling `fetchResource` with some special header)

- - -

It is important to note that `fetchFormAttachment` is also optional, at least for now. The main reasons for this are detailed on the interface property’s JSDoc, but it also bears mentioning that this (along with keeping `fetchResource` as an alias to `fetchFormDefinition`) allows all currently supported functionality to keep working without any breaking changes.
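As a rough illustration of the "given an expected `jr://foo/bar.ext`, and given a URL to an attachment named `bar.ext`" logic described above, a client-side implementation might look like the following. The `FetchFormAttachment` signature shown here is an assumption (the engine's real type may differ), and `createFetchFormAttachment`, `urlsByFilename`, and `fetchImpl` are hypothetical names.

```typescript
// Assumed shape of the configuration function; the engine's actual type may differ.
type FetchFormAttachment = (resource: URL) => Promise<Response>;

// Hypothetical factory: `urlsByFilename` holds what a manifest-like source
// provides (filename -> downloadable URL); `fetchImpl` is injectable for tests.
function createFetchFormAttachment(
  urlsByFilename: ReadonlyMap<string, string>,
  fetchImpl: (url: string) => Promise<Response> = (url) => fetch(url)
): FetchFormAttachment {
  return async (resource) => {
    // The shorthand interpretation lives here, in client code, at call time.
    const filename = resource.pathname.split('/').pop() ?? '';
    const downloadUrl = urlsByFilename.get(filename);
    if (downloadUrl == null) {
      // Report a miss with ordinary HTTP semantics rather than throwing.
      return new Response(null, { status: 404, statusText: 'Not Found' });
    }
    return fetchImpl(downloadUrl);
  };
}
```

A client would pass the returned function as its `fetchFormAttachment` option; whether a miss should be a 404 `Response` or a rejection is a design choice this sketch does not settle.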
…aders` This moves the existing resource-related types into a dedicated module (they probably already should have been!). It also revises their JSDoc to reflect some of the semantic expectations clarified in #201.
This design issue is part of broader support for external secondary instances:

- … `jr:` URLs

The intent in this issue is to decide on a direction for how we want to handle retrieval of form attachments generally, with the specific goal of supporting external secondary instance functionality.

These are the pertinent aspects of the ODK XForms spec.

The issue will focus on the engine/client interface to support the retrieval of `jr:` URLs. As I tend to do for engine/client interface design, I will present a few options which we can choose from or iterate on. Implementation in the engine will be derived from there.

Note: it is expected that the design we choose here will also lay groundwork for supporting other `jr:` URL use cases (i.e. media attachments), so I've tried to be mindful of that (as we should in discussion).

Note: this issue does not currently address the `jr://instance/last-saved` virtual endpoint, but I believe nothing in any of the proposed options would block or impede that functionality, when we're ready to address it.

Option 0: client/host application handles resource resolution
This is a true null option: in the narrowest sense, we could claim this work is already done with the provision of a `fetchResource` configuration option. This option is technically sufficient to satisfy the engine's spec responsibilities.

How this would work:

- … `fetchResource` with any external secondary instance's `jr:` URL
- … `fetchResource` option, the client is then responsible for resolving that `jr:` URL to the referenced resource, for the active form instance
- … `fetchResource` option, the engine will produce a well-defined error result (with errors to be discussed in a separate design)

At least until we support offline mode (or any other functionality that would imply runtime-level caching/persistence), this would leave resource resolution entirely to clients.

This is sort of the opposite of a "pit of success" option, with its primary appeal being limited engine-side work for this aspect of the targeted feature.

Beyond obvious non-"pit of success" drawbacks, I'll specifically note that it's the most likely option to result in disparities and drift between clients. It's also likely to promote disparities/drift between different functionality which intersect with it. [1]
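To show what Option 0 would ask of each client, here is a sketch of a single `fetchResource` that has to serve both form definitions and `jr:` URLs. The `FetchResource` signature and the `resolveAttachment` callback are assumptions made for illustration.

```typescript
// Assumed shape of the existing option: a fetch-like function over URLs.
type FetchResource = (resource: URL) => Promise<Response>;

// Hypothetical client-side wrapper: `resolveAttachment` is whatever lookup the
// client has for turning a `jr:` URL into a real, fetchable URL (or null).
function createFetchResource(
  resolveAttachment: (jrUrl: URL) => string | null,
  fetchImpl: (url: string) => Promise<Response> = (url) => fetch(url)
): FetchResource {
  return async (resource) => {
    // The purpose of the call can only be inferred from the URL itself.
    if (resource.protocol !== 'jr:') {
      return fetchImpl(resource.href);
    }
    const resolved = resolveAttachment(resource);
    if (resolved == null) {
      return new Response(null, { status: 404 });
    }
    return fetchImpl(resolved);
  };
}
```

Every client would need some version of this branching, which is where the drift between clients mentioned above would likely come from.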
Option 0.1: engine does not handle this aspect at all
Another null option variant.
This would effectively mean that clients must resolve `jr:` URLs before initializing a form. They'd probably supply the resources as `data:` or `blob:` URLs, substituted directly in the form definition provided by clients to the engine.

This option does not appeal to me, but I think it's worth mentioning so we can make a thoroughly informed decision.
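For completeness, the substitution this option implies could be sketched as below. `inlineAttachments` is a hypothetical client-side helper, and the sketch assumes text attachments and double-quoted XML attributes (percent-encoding leaves `'` unescaped).

```typescript
// Hypothetical pre-processing step a client could run before handing the form
// definition to the engine: replace each known `jr:` URL with a `data:` URL.
function inlineAttachments(
  formXml: string,
  attachments: ReadonlyMap<string, { mimeType: string; content: string }>
): string {
  let result = formXml;
  attachments.forEach(({ mimeType, content }, jrUrl) => {
    const dataUrl = `data:${mimeType},${encodeURIComponent(content)}`;
    // split/join replaces every literal occurrence without regex escaping.
    result = result.split(jrUrl).join(dataUrl);
  });
  return result;
}
```

Besides inflating the form definition, this discards the original identifier entirely, so the engine could no longer tell which attachment a value came from.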
Option 0.5: engine provides resolution handler(s) for common cases
An extension of option 0, this is similar in spirit to the submission API proposal (#188), and some of the discussion ongoing there. The idea would be that we recognize one or more typical resource mapping schemes, and expose default `fetchResource` implementations to address those (likely as some kind of factory function so clients can parameterize them for per-instance usage appropriately).

I would imagine starting with handlers for:
Option 1: engine provides one or more explicit mechanisms for form attachment resolution, tailored to feature-specific use cases
Instead of the engine calling a generalized `fetchResource` option with a `jr:` URL, the engine would instead accept a configuration mapping from specific `jr:` URLs to one of:

- … `fetch`-able URL (`blob:`, `data:`, ?); this mapped URL would then be accessed by the same `fetchResource` option
- … `Blob` of the resource's data (`Promise<Blob>`, `() => Promise<Blob>`, ?)

The mapping itself could be any of:

- … `Map`-like object (or even `Record<string, T>` if we're feeling really loosey goosey about it)

Footnotes

1. This has been a particular pain point in Enketo. Support for `jr:` URLs is spread across three packages, and difficult to iterate on even after moving the projects to a monorepo.