Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

External secondary instances: I/O support for jr: URLs #201

Closed
eyelidlessness opened this issue Aug 27, 2024 · 11 comments · Fixed by #259
Closed

External secondary instances: I/O support for jr: URLs #201

eyelidlessness opened this issue Aug 27, 2024 · 11 comments · Fixed by #259

Comments

@eyelidlessness
Copy link
Member

eyelidlessness commented Aug 27, 2024

This design issue is part of broader support for external secondary instances:

The intent in this issue is to decide on a direction for how we want to handle retrieval of form attachments generally, with specific goal of supporting external secondary instance functionality.

These are the pertinent aspects of the ODK XForms spec.

The issue will focus on the engine/client interface to support the retrieval of jr: URLs. As I tend to do for engine/client interface design, I will present a few options which we can choose from or iterate on. Implementation in the engine will be derived from there.

Note: it is expected that the design we choose here will also lay groundwork for supporting other jr: URL use cases—i.e. media attachments—so I've tried to be mindful of that (as we should in discussion).

Note: this issue does not currently address the jr://instance/last-saved virtual endpoint, but I believe nothing in any of the proposed options would block or impede that functionality, when we're ready to address it.

Option 0: client/host application handles resource resolution

This is a true null option: in the narrowest sense, we could claim this work is already done with the provision of a fetchResource configuration option. This option is technically sufficient to satisfy the engine's spec responsibilities.

How this would work:

  • The engine will call fetchResource with any external secondary instance's jr: URL
  • If the client has provided a fetchResource option, the client is then responsible for resolving that jr: URL to the referenced resource, for the active form instance
  • If the client has not provided a fetchResource option, the engine will produce a well-defined error result (with errors to be discussed in a separate design)

At least until we support offline mode (or any other functionality that would imply runtime-level caching/persistence), this would leave resource resolution entirely to clients.

This is sort of the opposite of a "pit of success" option, with its primary appeal being limited engine-side work for this aspect of the targeted feature.

Beyond obvious non-"pit of success" drawbacks, I'll specifically note that it's the most likely option to result in disparities and drift between clients. It's also likely to promote disparities/drift between different functionality which intersect with it.1

Option 0.1: engine does not handle this aspect at all

Another null option variant.

This would effectively mean that clients must resolve jr: URLs before initializing a form. They'd probably supply the resources as data: or blob: URLs, substituted directly in the form definition provided by clients to the engine.

This option does not appeal to me, but I think it's worth mentioning so we can make a thoroughly informed decision.

Option 0.5: engine provides resolution handler(s) for common cases

An extension of option 0, this is similar in spirit to the submission API proposal (#188), and some of the discussion ongoing there. The idea would be that we recognize one or more typical resource mapping schemes, and expose default fetchResource implementations to address those (likely as some kind of factory function so clients can parameterize them for per-instance usage appropriately).

I would imagine starting with handlers for:

Option 1: engine provides one or more explicit mechanisms for form attachment resolution, tailored to feature-specific use cases

Instead of the engine calling a generalized fetchResource option with a jr: URL, the engine would instead accept a configuration mapping between specific jr: URLs to one of:

  • the resource's real URL, or some other fetch-able URL (blob:, data:, ?); this mapped URL would then be accessed by the same fetchResource option
  • anything else capable of resolving a Blob of the resource's data (Promise<Blob>, () => Promise<Blob>, ?)

The mapping itself could be any of:

  • the OpenRosa manifest format
  • the response type from Central's form attachment list API
  • a simple Map-like object (or even Record<string, T> if we're feeling really loosey goosey about it)

Footnotes

  1. This has been a particular pain point in Enketo. Support for jr: URLs is spread across three packages, and difficult to iterate on even after moving the projects to a monorepo.

@sadiqkhoja
Copy link
Contributor

I was thinking if host application could just provide secondary attachments along with Form XML so engine doesn't have make network calls and deal with all the network related errors. I am saying this with the assumptions:

  • Engine doesn't really need media files like images/audio
  • Host application would have the list of all Form attachments (CSVs and XMLs only) and it can fetch them when it is fetching Form XML.

This is closer to the Option 1 presented above, except required attachments are resolved at the host application levle before anything else happens.

@eyelidlessness
Copy link
Member Author

eyelidlessness commented Sep 4, 2024

I want to make sure to sum up a couple key conclusions from our discussion yesterday:

Design choice: Option 1, possibly supplemented by Option 0.5

We decided to go with Option 1. As a stretch goal, we may also include aspects of Option 0.5 as an additive aid to clients/host applications.

We refined the proposed Option 1 interface. Putting aside naming (included here so the type will be valid syntax/highlighted as such), this is the interface we anticipate clients/host applications to provide for all form attachments:

type FormAttachmentMapping = Record<`jr:${string}`, () => Promise<Response>>;

Open for bikesheddy discussion: is there any openness to making this a Map rather than a Record? @sadiqkhoja your call. I would generally like to move away from using plain objects for bag-of-stuff collections where the keys aren't at least partially known/fixed at design time. But I can also understand this may be less convenient at the package boundary.

As for the value side of the mapping, providing a thunk per resource:

  • Allows clients and/or host applications to provide arbitrary, opaque implementations. Requesting a resource may simply pass through a fetch call, might perform some specialized error handling, may even handle fault tolerance (e.g. retries, which we determined not to do in the engine, at least for now).

  • Allows the engine to perform essential requests upfront, ensuring those resources' availability for form operations, and consistently surfacing any errors encountered in a format suitable for designs coming out of External secondary instances: Error conditions #202.

  • Allows the engine to throttle repeat requests (whether scheduled upfront or otherwise), ensuring consistency and predictability for a given form session.

We decided to represent each resource as a Response because it has semantics appropriate for likely usage scenarios, and suitable for error-reporting designs in #202 (without placing undue burden on clients to conform to a stricter Result-like representation as we've chosen there).

On engine invocation of form requests broadly (i.e. including media)

While this design is primarily focused on support for external secondary instances, it has obvious implications for other form attachments. We discussed this, which was also raised in the above comment. We determined it makes sense for the engine to invoke all such requests, largely for reasons discussed in the last section. Notably, we expect that the engine will perform media requests earlier in a form session (either as part of form load, or perhaps immediately following resolution of the initial form state).

Addendum to yesterday's discussion of this point

As an additional point not covered yesterday, but which I think helps to bolster this decision: insofar as the engine expects clients/host applications to provide resource data, there's another very good reason for the engine to invoke requests, and in particular to for the engine to get those requests back as a Response. Namely, streaming. If we expected:

  • Promise<Blob>, we'd block form load on potentially very large resources which typically load progressively
  • () => Promise<Blob>, we'd block either form load or other arbitrary form operations (potentially even infecting large amounts of the engine/client API with asynchrony in the process!)

And the engine does need to access at least some resource data for media attachments, as they may be associated with node values.


In the future, we might consider expanding this interface to provide multiple representations, such as something like Record<'jr:*', { request: () => Promise<Response>, url?: string }>. But I think it's best to stick to the simpler interface for now until we have some time to integrate it and understand from experience where there are real gaps.

@lognaturel lognaturel added this to the Next milestone Oct 22, 2024
@brontolosone
Copy link
Contributor

brontolosone commented Oct 24, 2024

  1. About

[...] represent each resource as a Response [...]

Good, that leaves room for the implementer to supply the data any which way.

However, gor the engine it's less convenient as the annoying thing about responses is that you can read them (eg call .blob() or .text() on them) only once, so the engine will have to do some work - for instance in the case of secondary instances, read once and deserialize the xml while for the case of media URLs - pictures etc - it'll probably want to use createObjectURL() so that it'll have a reusable URL to use in eg <img src="the-object-url"> or what have you.
Or, it could call the function-value in the below FormAttachmentMapping again under the assumption that the instantiator would then give it a freshly constructed, not-yet-read response. But then ideally that expectation would be reflected in the FormAttachmentMapping type. (and even if the type enforces it, people writing their own instantiator (which I hope people can do, I think it should be modular that way) directly against some JS wrapper might miss that detail, reuse responses, and won't understand that mistake months down the road when a survey needs to show image twice — so that's a bit fragile).
Probably the most prudent and robust thing to do is to write the engine with the assumption that one can only call the functions of FormAttachmentMapping once. Of course it's free to .clone() the responses or do whatever needs doing.

Perhaps the most succinct typification of the expectations of the engine is... a blob after all? Initiator supplies blobs? Plain and simple?

  1. About

type FormAttachmentMapping = Record<jr:${string}, () => Promise>;

Perhaps a name like FormAttachmentRetrievalMapping would be more descriptive, to pre-empt the question of "mapping to what? for what?".

But to the point. This leaves it up in the air how the instantiator would know what jr: URIs are actually used in the form. An instantiator could run a bit of xpath on the form xml, but it shouldn't have to, that kind of knowledge belongs inside the webforms engine. It could also get it as metadata from the place it gets the survey definition itself from — say Central. But that's just kicking the can down the road, because how does Central know? Run a bit of xpath? At some point we'd like this engine to be able to authoritatively answer "what attachments is this survey referencing?", and hopefully at that point we might use it as such in Central to construct the nudge for uploading form attachments. But at that point an instantiator should be able to get the same answers from the engine.

So I think a two-stage mechanism can be useful. One, the engine can load a survey definition and return a list of jr: URIs referenced. Two, the instantiator does its thing to construct the FormAttachmentMapping. Three, the instantiator instantiates (obviously), with that mapping.
Furthermore I think the I think error handling around those responses should be very thin inside the engine itself — simply fail loudly — as it's the instantiator's responsibility to handle the case where it cannot meet the precondition. Lazyloading things is thus the instantiator's risk!

@eyelidlessness
Copy link
Member Author

eyelidlessness commented Oct 24, 2024

Thanks @brontolosone, lots of good stuff to dig into here!

Good, that leaves room for the implementer to supply the data any which way.

Yep! That is exactly the intent. We just want to set a baseline for read semantics, without being proscriptive at all about how the thing we read gets into that state. And the read semantics of Response work really well for that:

  • It's the default behavior of the most idiomatic implementation, of the most typical use case (fetch, and network I/O, respectively)
  • It's a builtin which supports arbitrary construction, out of parts that also do; once constructed, the semantics are the same as if returned by a platform call like fetch

However, gor the engine it's less convenient as the annoying thing about responses is that you can read them (eg call .blob() or .text() on them) only once, so the engine will have to do some work - for instance in the case of secondary instances, read once and deserialize the xml while for the case of media URLs - pictures etc - it'll probably want to use createObjectURL() so that it'll have a reusable URL to use in eg <img src="the-object-url"> or what have you.

This is more or less what I anticipate the engine doing. At least for a first pass, there will be a little bit more nuance in terms of timing (i.e. we may block initializing state for external instances, and kick off other attachment requests immediately after). There are some other nuances to think about, around the nature of certain network calls (i.e. streaming). But otherwise this pretty much captures the behavior we intend to implement.

Or, it could call the function-value in the below FormAttachmentMapping again under the assumption that the instantiator would then give it a freshly constructed, not-yet-read response.

I appreciate you raising this concern! I think we should make it explicit that we WILL NOT do that. As I mentioned in part of our call a bit earlier, that guarantee is one of the specific reasons we chose Response for this: its semantics preclude repeatedly resolving the same body (or abstraction thereof). As you put it:

But then ideally that expectation would be reflected in the FormAttachmentMapping type.

The intent of not reflecting that expectation is to help communicate that we don't have that expectation.

Perhaps the most succinct typification of the expectations of the engine is... a blob after all? Initiator supplies blobs? Plain and simple?

Also discussed on our call a bit ago, part of the intent of taking a Response is in support of #202. Since Response has clear semantics for communicating error conditions, it's well positioned as a mechanism for us to collect those error conditions along with any others that occur during form init, and provide a uniform mechanism for conveying them back to the client.

Capturing another, related part of our discussion here for posterity...

The intent of the design is also to support clients which may want greater (or even total) control over this set of error conditions (as in, greater than the easy default we're setting up: simple functions wrapping fetch calls). Because Response and its component parts can be constructed out of basically anything, clients can:

  • Intercept network failure conditions. From there they can choose to suppress some or all failures, or provide alternate failure semantics, or access fallback resources, or perform any manner of retry logic, or...
  • Perform completely arbitrary I/O (or even no I/O at all!). From there they have a relatively minimal responsibility to map any error conditions they encounter to Response, i.e. HTTP, semantics. If a client fails to satisfy that responsibility, they'll get the same outcome they would if we took Blobs instead: anything fallible in their implementation is totally opaque to us. In which case, we'll effectively handle it the same way we would, say, an HTTP 500 error.

Perhaps a name like FormAttachmentRetrievalMapping would be more descriptive, to pre-empt the question of "mapping to what? for what?".

I'm fine with that name if you like it better. Again, the name as proposed is mostly there so there's a concept we can reference unambiguously. I do think in general the "mapping to what" is answered by the type itself, and the "for what" is mostly answered by requisite familiarity with underlying details of the spec. But I am always in favor of naming clarity as a way to help reinforce assumptions!


But to the point. This leaves it up in the air how the instantiator would know what jr: URIs are actually used in the form. An instantiator could run a bit of xpath on the form xml, but it shouldn't have to, that kind of knowledge belongs inside the webforms engine. It could also get it as metadata from the place it gets the survey definition itself from — say Central. But that's just kicking the can down the road, because how does Central know? Run a bit of xpath? At some point we'd like this engine to be able to authoritatively answer "what attachments is this survey referencing?", and hopefully at that point we might use it as such in Central to construct the nudge for uploading form attachments. But at that point an instantiator should be able to get the same answers from the engine.

So I think a two-stage mechanism can be useful. One, the engine can load a survey definition and return a list of jr: URIs referenced. Two, the instantiator does its thing to construct the FormAttachmentMapping. Three, the instantiator instantiates (obviously), with that mapping.

Again capturing content from our earlier call:

  • I agree that the engine has this responsibility. Further, it will have the responsibility internally as part of its own parsing logic, independent of whether we provide a client-facing interface to it. It's a previously implicit requirement, now explicit: the engine needs to identify jr: URL references and associate those with their usage in order to use any mapping provided by the client.

  • I agree that the engine should provide a client-facing interface to this functionality. I don't think it's well captured above, but it was actually considered in earlier discussion of the design. The reason we didn't make it a part of this design now is because the two common use cases we're targeting (OpenRosa protocol, Central REST API) will either already have produced the requisite information, or are already expected to do. As I said on our call, while Central is already doing similar work for various reasons, it's something we'll be doing in a consistent and predictable way. So it would make sense for us to think about how we want to expose that to clients/host applications in a way where they don't have to keep doing redundant work. The key here (I think) is getting a sense of those requirements, and then prioritizing API design.

However! As I think more on this, I think we probably should relax the type of the key. At least from memory, I'm pretty sure the already-produced information we expect clients to have will be a mapping between filename and I/O-resolvable-resource. I think it's reasonable to accept:

type FormAttachmentRetrievalMapping = Record<`${string}.${string}`, () => Promise<Response>>;
// ... or, preferable IMO:            Map<...>
// ... if base.ext assumes too much:  WhateverBagOfKVs<string, ...>

Furthermore I think the I think error handling around those responses should be very thin inside the engine itself — simply fail loudly — as it's the instantiator's responsibility to handle the case where it cannot meet the precondition. Lazyloading things is thus the instantiator's risk!

It's actually a bit more complicated than this. It's discussed a bit in #202, and we covered it a bit on our call, but I think it's really important to highlight here too: we are anticipating that at least for some usage scenarios, I/O failures are not inherently terminal. This is one of the biggest reasons the engine has an I/O responsibility (or this abstraction over it) at all:

  • The engine MAY be expected to initialize a form instance in spite of a failure to load some or all form attachments
  • The engine SHOULD NOT be the arbiter of that decision (excepting truly, unambiguously unrecoverable errors, of course)

This does actually raise the question of whether we're sufficiently serving the second point! It is certainly possible, with the proposed interface, for us to implement that interface in the engine so a client can totally block at least the stateful (post-parse) portion of form init. (It's possible because we already accept a similarly opaque I/O abstraction for the form definition itself.)

But it certainly wouldn't be very ergonomic to do so. I don't think Response is an ideal way to convey "this operation failed, and all other related operations should fail along with it."

At this point, I think that portion of discussion is probably better had in #202. But briefly, my instinct is that we don't yet know enough about partial/total failure from a client perspective to merit thinking too hard on it yet.

@eyelidlessness
Copy link
Member Author

However! As I think more on this, I think we probably should relax the type of the key. At least from memory, I'm pretty sure the already-produced information we expect clients to have will be a mapping between filename and I/O-resolvable-resource. [...]

As soon as I posted this, I realized the likely mistake in my reasoning. While it's common to map base.ext <-> jr://whatever/path/to/base.ext, it isn't a requirement. I do think we should consider accepting filenames as the common use case, but I'm not sure whether that's sufficient. But I'm betting @lognaturel will have thoughts on that!

@lognaturel
Copy link
Member

While it's common to map base.ext <-> jr://whatever/path/to/base.ext, it isn't a requirement.

I've read through the last couple of posts but I'm not able to follow all the details. Can you give me a sense of what other kinds of mappings might exist?

@eyelidlessness
Copy link
Member Author

So let's say we have a form definition which references these form attachments:

  • jr://images/badger.png
  • jr://images/path/to/agilefrog.png

Taking the OpenRosa Manifest Document example as one of our typical cases, it would be sufficient to treat the manifest's filename as a key. We'd need to compare against the URLs with "jr://" and the first path segment removed.

But here already, base.ext isn't sufficient, as there are potentially subpaths. I'm glad I looked, that's already clarifying!


If we have another form definition referencing:

  • jr://audio/myfile.mp3

The ODK Central REST API docs' section on listing form attachments also suggests a key of name would be sufficient.

But wait. The Central API example has "type": "image"! 😅1

Suppose we have a form definition which references:

  • jr://audio/myfile.mp3
  • jr://image/myfile.mp3

Now it seems likely we'll want to allow data type as part of the key. But we probably can't require it, because the OpenRosa manifest doesn't provide it.


That covers our two most common use cases, but we can't assume either. The ODK XForms spec doesn't really say how these URLs map to anything, at all really. And there's ambiguity in both of the common use cases' examples. As I understand at least the ODK XForms side of things (and this is how it's handled in Enketo), a jr: URL can be mapped to literally any URL at all, without any common substring key between them. I think that's why I originally landed on treating the full jr: URL as a key.

Also, based on experience on the Central frontend, I do know there's some accommodation for uploading form attachments that don't match the filename as defined in the form. I don't know offhand how that mismatch is represented at the API level. But if it doesn't match the jr: URL suffix we'll definitely want to account for it.


I think the options are:

  • Filename (base.ext) key: handles the obvious cases
  • Subpath key: handles the ambiguity in the OpenRosa manifest example
  • Suffix-of-jr: URL key: handles the type ambiguity in the Central example
  • Full jr: URL: handles the ambiguity of the ODK XForms spec

Footnotes

  1. This was just a lucky bit of silliness that happened to illustrate part of the point. I'd volunteer to make the example make a little bit more sense... but I'm not sure where the source for these docs lives!

@lognaturel
Copy link
Member

Filename (base.ext) key: handles the obvious cases

This feels like a good place to start to me. It should be possible to expand to one of the others from there if ever we did find it was needed, does that sound right? We could document the subset of the spec that is supported.

I'm not aware of any system that allows expressing a subpath on the client side. Collect downloads all media declared in the manifest to a single directory. The source could come from some URL including a subpath but there's no way to get it to save in a subdirectory on the client.

jr://image/, jr://audio/, etc all map to that same directory in Collect. I believe they were differentiated in the J2ME days so you could do things like specify that all images should be stored on auxiliary storage.

XLSForm has different columns for specifying audio, video, images and different constructs for specifying data files. It uses that usage context to generate the jr://image/, jr://audio, etc prefixes.

Central has no notion of subdirectories for media either.

do know there's some accommodation for uploading form attachments that don't match the filename as defined in the form

Central ignores the "actual" filename in that case and saves the file content in a slot with the filename referenced in the form.

The Central API example has "type": "image"

Whoops! PR at getodk/central-backend#1247 This would be possible with XLSForm -- a user could put a .mp3 file in the image column and what is now in the API docs would be the outcome in Central. Collect would happily consume it too and would try to render it as an image. It might crash or hopefully show an error message if the file is truly an mp3. I think it would succeed if the file were a jpg with an mp3 extension, for example.

@brontolosone
Copy link
Contributor

My understanding:

I like to think of the "full jr: URL" as the de-jure identity of an attachment. So not really an URL, URI or URN. Just an opaque ID. So let's take that as an axiom. It's how attachments are referenced in the form XML, so that's my gospel, my source of truth 🙏
Again: axiom. It may be wrong! Perhaps we have a need for it being an URL/URI/URN but I haven't seen a comprehensive articulation of that (or I haven't looked hard enough), I stand to be enlightened ;-)

However. Outside of the form XML treatment of the "attachment identity" is a different story. For instance Central has a form_attachments table with columns (formId | blobId | name | type | formDefId | updatedAt | datasetId) and stuffs the "file name part" in the name column; essentially deriving a new de-facto identity from the de-jure identity. ODKCollect at one point in time audaciously stored the attachments with the derived filename in the Android filesystem. And then the form definition can have <value form="image">jr://images/b.jpg</value> where the images part in the URI looks redundant indeed. All of that muddles the question of what the identity of an attachment is, which is why we end up discussing it right here more than a decade later still 😆

So to take an extremist standpoint: if all components (Central, Webforms, Collect) have historically done, and will keep on doing, interpretations of that thing (variations of "oh we feel like it's actually just a filename and we're going to discard parts of the identity") rather than keeping the ID the ID, then why do we bother having these jr: URIs at all? Just use the "filename" everywhere then; it's the de facto ID! (except now it's no longer a filename, it's the ID 🤡

The orthodox (also somewhat extremist) standpoint would be: the de-facto usage is heretic, the URI is an opaque identifier, it doesn't contain any filename, it doesn't contain an instruction on where to store it if you happen to be a J2ME client or even if it seems to then that's a historical error because the pious shouldn't store instructions in an identifier, so we should practice restraint on our instincts to treat it as anything else but an opaque identifier.
The one point at which we should allow ourselves to succumb to "filename urges" is in Central-like scenarios when we need to ask the user to supply the attachments that the form is missing; and the user reasons in files, browsers allow picking and uploading files, and probably the user was thinking in files when they made their .xlsx to start with. There, in the filename-y kingdom of sin, we would allow ourselves to talk about "filenames" — but once those uploads are done we wash our mouths with soap and will thenceforth only speak of attachment identifiers.

I think I'm more on the Orthodox than on the Reformist side here, because I want there to be a clear and atomic identifier somewhere. Also, if we can get away with treating the jr: thing as an opaque identifier rather than worrying about the semantics of inner structure, we should! I think we can. There are no current use cases for inner structure, are there?
And per that dogma, we should thus go with:

  • Full jr: URL: handles the ambiguity of the ODK XForms spec

And then if a consumer of that API (could be us, will be us, but outside of the sanctuary of the webforms xforms engine) wants to do "interpretation" of those URIs, they may do that in the privacy of their own homes to match immaculate attachment IDs to earthly filenames.

@lognaturel
Copy link
Member

lognaturel commented Oct 29, 2024

they may do that in the privacy of their own homes to match immaculate attachment IDs to earthly filenames.

And this could potentially include the web forms Vue frontend? Host applications likely don't know the "attachment identifiers"/jr:// URIS so requiring those as keys feels like it could be a barrier to usage. Does that sound right?

@brontolosone
Copy link
Contributor

brontolosone commented Oct 30, 2024

they may do that in the privacy of their own homes to match immaculate attachment IDs to earthly filenames.

And this could potentially include the web forms Vue frontend?

I think we could let our Vue frontend be the place to perform that match, yes. It should reference this issue in a code comment ;-)
I suspect that full reactionary-orthodoxy would say that actually, Central should have been returning jr:// identifiers at the attachment listing endpoint all along, and that would have allowed us to stay in ID-land and then we wouldn't have to do any matching.

Host applications likely don't know the "attachment identifiers"/jr:// URIS so requiring those as keys feels like it could be a barrier to usage. Does that sound right?

Just FTR, the implementer's task sequence I see roughly as follows:

  1. ask the webforms engine what attachments it sees referenced in the form xml
  2. it returns {"jr://file/blabla.png", …}
  3. I learn from Central what attachments are available for this form, it replies with earthly filenames: ["blabla.png", …]
  4. As a lazy developer, potentially unfamiliar with ODKXForm internals, I go "uhhhhh... Well OK, let me regex the 'filenames' out from those funny URL-y 'ids', piece of cake". Or I go "uhhhh... ok let me prepend "jr://file/" to all of those filenames I got from Central", also works.
  5. I supply the {"file-id": () => Promise<Response>} (or {"file-id": Blob}, as I think I'd still prefer) mapping to the webforms engine.

With that in mind, back to:

Host applications likely don't know the "attachment identifiers"/jr:// URIS so requiring those as keys feels like it could be a barrier to usage. Does that sound right?

It depends! Perhaps in the host application's further ecosystem (say they have their own "Central"-like backend) the original sin of "Let's go off on tangents by extracting "filenames" and other meaning from those identifiers" was never committed, and in that case they won't have to do any mapping. Such a host application will be able to talk to their backends to express "give me the blob for jr://file/asdasdasd.asda" and no one would need to be thinking or talking about "filenames".

For those that have sinful backends however, such as we do, we could make a utility shim or library that contains a function that does the mappings. We'd use it ourselves in our Vue frontend, and other host application developers can steal/use it (via a simple git submodule if need be, it doesn't need to be a full blown npm package) if they have the same sinful legacy as we do. That keeps the haram fiddly stuff contained there, rather than inside the webforms code (which would condone and enshrine it, and as such would proliferate the animist myth that (odk)xforms says something about "filenames").

So that'd be the "reactionary orthodox" view on the matter. It's not unreasonable I think?

There's still the "revolutionary reformist" view, which asks "Well… if we live and breathe filenames, then why are we going through those pagan seances of converting them back and forth into/from those jr:// URLs, can't we get rid off that useless liturgy?". But following up on that would break everyone's parsers and cause a schism, so I think the orthodox view is the more pragmatic and practical.

eyelidlessness added a commit that referenced this issue Nov 26, 2024
…chments

**IMPORTANT NOTES!**

1. This change deprecates the former, more general, `fetchResource` config. While conceptually suitable for the purpose it was originally designed (and for this API expansion), it seems likely to be a footgun in real world use.

2. This change **intentionally deviates** from the design direction which we previously settled in #201. It is, instead, intended to be directly responsive to the discussion which followed that previously settled design.

- - -

**Why another `fetch`-like configuration?**

Taking the above important notes in reverse order:

_Intentional deviation_

The discussion in #201 ultimately convinced me that it would likely be a mistake to favor any Map-like interface for the purpose of resolving form attachments. There is too much ambiguity and baggage around the current mechanisms for _satisfying_ the interface.

Ultimately, @brontolosone’s axiom—that a complete `jr:` URL is a particular form attachment’s identifier—**is correct**! Any truncation of them is a convenient shorthand which would align with known data sources.

Paradoxically, in typical usage, clients ultimately **MUST** extrapolate those identifiers from more limited data: the source data itself will _only_ contain truncated shorthand values.

We’d discussed supplying a parse-stage summary of the `jr:` URLs referenced by a form, allowing clients to populate a mapping before proceding with post-parse initialization. But the logic a client would implement to satisfy that mapping would be **identical** to the logic it would implement to satisfy a `fetch`-like function: given an expected `jr://foo/bar.ext`, and given a URL to an attachment named `bar.ext`, provide access to that URL.

It seems more sensible to specify this API as a function rather than as a Map-like data structure, if for no other reason than to avoid introducing or further entrenching a particular shorthand interpretation. We will call a client’s `fetchFormAttachment` with each attachment’s complete identifier (the full `jr:` URL), and the client can satisfy that with whatever data it has available and whatever mapping logic it sees fit _at call time_, without codifying that shorthand into the engine’s logic.

_Distinct `fetch`-like configurations_

A single `fetchResource` configuration would meet the minimum API requisite API surface area to retrieve form definitions _and their attachments_, but it would be considerably more error prone for clients to integrate:

- `fetchResource` has up to this point been optional, and is typically ignored in existing client usage
- a single `fetchResource` config would need to serve dual purposes: resolve a form definition by URL (which is likely to be a real, network-accessible URL) and resolve form attachments by identifier (which happens to be a valid URL, but will never resolve to a real network resource except if clients implement the same resolution logic with a Service Worker or other network IO-intercepting mechanism)
- a client would need to determine the purpose of a given call _by inference_ (by inspecting the URL, and probably branching on a `jr:` prefix); the `fetch`-like API simply wouldn’t accommodate a more explicit distinction without **also overloading** HTTP semantics (e.g. by callling `fetchResource` with some special header)

- - -

It is also important to note that `fetchFormAttachment` is also optional, at least for now. The main reasons for this are detailed on the interface property’s JSDoc, but it also bears mentioning that this (along with keeping `fetchResource` as an alias to `fetchFormDefinition`) allows all currently supported functionality to keep working without any breaking changes.
eyelidlessness added a commit that referenced this issue Nov 26, 2024
…aders`

This moves the existing resource-related types into a dedicated module (they probably already should have been!). It also revises their JSDoc to reflect some of the semantic expectations clarified in #201.
@eyelidlessness eyelidlessness moved this from Todo to In Progress in Web Forms Dec 3, 2024
@github-project-automation github-project-automation bot moved this from In Progress to Done in Web Forms Dec 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Done
Development

Successfully merging a pull request may close this issue.

4 participants