-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support for external secondary instances #259
Conversation
…test This isn’t directly related to coming work on this branch. It addresses a mistake in the original JavaRosa test porting effort. I caught it now as I was looking into existing coverage of external secondary instance functionality.
This evolves a similar pattern in the setup for XML XForms fixtures. It’s difficult to completely generalize glob imports, because the actual call to `import.meta.glob`, a Vite API, strictly requires inline literal values both for the glob pattern itself and for the options object. The generalization here allows use of a single call, which always produces a “URL”: - in browser runtimes, this is a URL which can be requested by `fetch` - on Node, it’s a path with a `/@fs/` prefix The same generalization will be applied to `xforms.ts` in a subsequent commit, simplifying that aspect of fixture loading there and reusing the same logic as much as possible.
**JRResource:** - Represents a resource resolvable for a given `jr:` URL - Provides a mechanism to load the resource’s data (currently just string data, we’ll likely expand this to support media attachments) - Provides a helper static factory method to create a `JRResource` instance from an `XFormAttachmentFixture` **JRResourceURL:** simple subclass of standard `URL` to ensure we’re working specifically with resources associated with `jr:` URLs. This will be helpful for internals as well as engine/client APIs. **JRResourceService:** - Provides an in-memory “service” (in the sense of a server) to activate and access `JRResource`s by URL. - Activation is designed to support: - Current test usage (as ported from JavaRosa) where a set of fixtures are made available from their directory path, under a set of categories (“schemes” in ported JR terminology) which are ultimately used to service URLs in a pattern like `jr://$category*` - Anticipated test usage, where we may activate certain fixtures _inline in a specific test_. Conceptually this will be similar to inline form definitions using the JR-ported XForm DSL. - Access is designed to support a fetch-like interface. More details on this approach in a later commit introducing the engine API for clients to provide access to form attachments. - - - These abstractions are shared because it’s anticipated they’ll be used in: - `@getodk/scenario` for test setup - `@getodk/web-forms` for dev/demo access to fixtures We could theoretically use the same interfaces to support form attachments in the Web Forms Preview as well, i.e. to allow users to upload form definitions and their associated attachments. But we’d probably want UI/UX design to accompany that.
…rceService **SharedJRResourceService:** - Subclass of `JRResourceService` which only allows initialization as a singleton (stored as module-private state) - Automatically resets its internal state before and after every test run **ReferenceManagerTestUtils:** this was previously a stub porting the same interface from JavaRosa. It is now updated to use the `SharedJRResourceService` singleton’s state.
…chments **IMPORTANT NOTES!** 1. This change deprecates the former, more general, `fetchResource` config. While conceptually suitable for the purpose it was originally designed (and for this API expansion), it seems likely to be a footgun in real world use. 2. This change **intentionally deviates** from the design direction which we previously settled in #201. It is, instead, intended to be directly responsive to the discussion which followed that previously settled design. - - - **Why another `fetch`-like configuration?** Taking the above important notes in reverse order: _Intentional deviation_ The discussion in #201 ultimately convinced me that it would likely be a mistake to favor any Map-like interface for the purpose of resolving form attachments. There is too much ambiguity and baggage around the current mechanisms for _satisfying_ the interface. Ultimately, @brontolosone’s axiom—that a complete `jr:` URL is a particular form attachment’s identifier—**is correct**! Any truncation of them is a convenient shorthand which would align with known data sources. Paradoxically, in typical usage, clients ultimately **MUST** extrapolate those identifiers from more limited data: the source data itself will _only_ contain truncated shorthand values. We’d discussed supplying a parse-stage summary of the `jr:` URLs referenced by a form, allowing clients to populate a mapping before proceding with post-parse initialization. But the logic a client would implement to satisfy that mapping would be **identical** to the logic it would implement to satisfy a `fetch`-like function: given an expected `jr://foo/bar.ext`, and given a URL to an attachment named `bar.ext`, provide access to that URL. It seems more sensible to specify this API as a function rather than as a Map-like data structure, if for no other reason than to avoid introducing or further entrenching a particular shorthand interpretation. We will call a client’s `fetchFormAttachment` with each attachment’s complete identifier (the full `jr:` URL), and the client can satisfy that with whatever data it has available and whatever mapping logic it sees fit _at call time_, without codifying that shorthand into the engine’s logic. _Distinct `fetch`-like configurations_ A single `fetchResource` configuration would meet the minimum API requisite API surface area to retrieve form definitions _and their attachments_, but it would be considerably more error prone for clients to integrate: - `fetchResource` has up to this point been optional, and is typically ignored in existing client usage - a single `fetchResource` config would need to serve dual purposes: resolve a form definition by URL (which is likely to be a real, network-accessible URL) and resolve form attachments by identifier (which happens to be a valid URL, but will never resolve to a real network resource except if clients implement the same resolution logic with a Service Worker or other network IO-intercepting mechanism) - a client would need to determine the purpose of a given call _by inference_ (by inspecting the URL, and probably branching on a `jr:` prefix); the `fetch`-like API simply wouldn’t accommodate a more explicit distinction without **also overloading** HTTP semantics (e.g. by callling `fetchResource` with some special header) - - - It is also important to note that `fetchFormAttachment` is also optional, at least for now. The main reasons for this are detailed on the interface property’s JSDoc, but it also bears mentioning that this (along with keeping `fetchResource` as an alias to `fetchFormDefinition`) allows all currently supported functionality to keep working without any breaking changes.
…port (XML) I wasn’t able to find any tests this basic/of XML support specifically. Both tests utilize the previous commit’s additive support for specifying a (local) `JRResourceService` to handle form attachments. Both tests exercise the same _logic_. As noted in the comment above the second test, the substantive difference is that the attachment fixture is defined inline in the test, similarly to how we use the JavaRosa XForm DSL to define form definitions themselves.
…aders` This moves the existing resource-related types into a dedicated module (they probably already should have been!). It also revises their JSDoc to reflect some of the semantic expectations clarified in #201.
This also lays most of the groundwork for supporting CSV and GeoJSON formats, except for the actual parsing of those formats into `SecondaryInstanceDefinition`s. The intent of this internal design is to break up parsing into these staged layers: 1. I/O 2. Data format detection 3. Per-format parsing logic 4. Common representation of all **external** secondary instances 5. Finally, common representation of **all** secondary instances - - - There’s a bit of churn at some of the interface boundaries _within the engine_, but they’re all entirely internal so this doesn’t represent a breaking change to clients. Observation: some of the `XFormDefinition` boundaries are starting to feel a little creaky! - - - This also introduces a base abstraction for reprsenting form attachments **generally**, which I believe will be suitable for expanding support to media. (I locally explored expanding that further to support generating a local `blob:` URL for those media, to ensure we respect the `fetch`-like semantic guarantees for client-configured I/O. That’s out of scope for this work, but I think it’s well positioned to handle that when we’re ready.)
…ttachments As of this commit, this allows local dev preview of XML external secondary instances when: - Previewing a fixture from the local file system - The secondary instance resources are located in the same local file system directory, referencing the same file name as specified in their `jr:` URL
🦋 Changeset detectedLatest commit: 128e15f The changes in this PR will be included in the next version bump. This PR includes changesets to release 6 packages
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
Implementation is largely derived from https://github.com/enketo/enketo/blob/05c9743b1a65edcf9f5d25a1e00f370b75243bcb/packages/enketo-express/public/js/src/module/geojson.js, which is in turn largely derived from JavaRosa. Deviations from the Enketo implementation: - Aside from building `StaticNode`s rather than a DOM representation, the actual logic to construct that subtree is much more localized to the individual elements being constructed - Various naming and type improvements
… including those identified in early review discussion (BOM, whitespace around column delimiter). Note the surprising case where parsing succeeds!
1ee1191
to
7222c30
Compare
… instances This matches the current behavior: when a user uploads a form with `<instance id=“whatever” src=“jr://whatever”/>`, it will be loaded as if the external secondary instance is blank. This also serves as a reference implementation for host applications integrating the same configuration.
Yay! The tests you added following our conversation look great. I haven't gotten to adding a test for JR around behavior when CSV tokens have whitespace around them. The behavior at 6b1f145#diff-0ae498d66cc2114687268adaea646f23d17985f3209cc8596693a1e55251eeffR918 looks like it matches the JR behavior and I'm not inclined to make any changes to it currently. For posterity -- Enketo strips those spaces. I'd be inclined to have an I am surprised by |
🎉
It might be interesting to track these specific divergences as they come up. If nothing else it could be valuable documentation. It could also theoretically be part of some kind of config-based way to trigger more Enketo-like behaviors, at least for behaviors users express interest in.
Unless I'm misunderstanding, I think that's covered by this prop. That's a prop on the I suppose it'd make sense to update the On that note: insofar as the
Yeah, this gets at the intent of making it an enum. The I also hope that |
Agreed. I try to at least have them in the differences doc and added some details at
I wasn't clear enough! I mean instead of having that at the top level, introduce a prop with name
So instead of a couple of booleans we'd end up with the corresponding cross product, basically. Something like Did you consider making this part of what the host returns on a per-resource basis instead of a blanket config ("here's how you get the resource and what you should do if you find you can't")? I imagine Central would just use the resource type to decide so what you have now is sufficient but other hosts could do things like state that a specific video is critical to the form but others aren't. |
Excellent! I had just created a note with this one case, but this is better and I'm glad it already exists.
Ah I see what you mean! I'm torn. On the one hand, the engine's API is the most intentionally designed at this point, so it's probably a good starting point for something at the UI's component boundary. On the other hand, I think a more intentional design for that boundary might lead somewhere different. For instance, I think we'll likely converge on a couple sets of config values for common cases. In that case, the config details themselves may become minutiae with less value at that granularity, and we might find it's friendlier to present those as distinct (wrapping) components or some other named/symbolic reference representing their distinct use cases.
Yeah. I haven't given too much thought to naming that most likely additive enum member, but conceptually this is pretty much what I had in mind.
It's funny you should mention this. In my last response, I almost mentioned: in a strict sense, Exampleimport type { JRResourceURL } from '@getodk/common/jr-resources/JRResourceURL.ts';
import { initializeForm } from '@getodk/xforms-engine';
declare const fileNameToURL: Map<string, URL>;
const formXML = '<h:html><!-- ... --></h:html>';
const missingResourceBehaviorEquivalents = {
/**
* This will fail, absent something like a Service Worker to intercept/handle
* requests for `jr:` URLs. (It's not **technically** equivalent since it won't
* produce a `404`, but close enough for an in-discussion example.)
*/
ERROR: (jrURL: JRResourceURL) => fetch(jrURL),
/**
* This will succeed, and it will produce a blank/empty external secondary
* instance for the requested resource.
*/
BLANK: (_: JRResourceURL) => {
return new Response('<root/>', {
status: 200,
headers: { 'Content-Type': 'text/xml' },
});
}
};
const rootNode = await initializeForm(formXML, {
config: {
fetchFormAttachment: async (jrURL) => {
const fileName = jrURL.pathname.split('/').at(-1)!;
const actualURL = fileNameToURL.get(fileName);
// When the `jr:` URL can be mapped to a real resource, perform a regular
// network request for it...
if (actualURL != null) {
return fetch(actualURL);
}
// ... otherwise, if the missing resource is critical, produce an error...
if (fileName === 'critical-instance.xml') {
return missingResourceBehaviorEquivalents.ERROR(jrURL);
}
// Lastly, for missing/non-critical resources, produce a blank response...
return missingResourceBehaviorEquivalents.BLANK(jrURL);
}
},
}); I don't think this is particularly friendly for clients/hosts, but it's available with the current design/approach if an integrator wants to do something really specific. I think a more likely scenario is that the engine will be able to determine whether a particular resource is critical for the form's functionality. Naively, this can be the distinction between external instance reference versus media. We can always get more specific than that with analysis of a form definition's specifics. Now having written all of this out... I do wonder if maybe a
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couple of small suggestions but this is ready to merge as far as I'm concerned.
I don't love the full jr:
URL being passed to clients because I'm quite confident all will strip the jr://type
. But we can always iterate!
} | ||
|
||
static isJRResourceReference(reference: string | null): reference is JRResourceURLString { | ||
return reference?.startsWith('jr:') ?? false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use the constant everywhere rather than literal?
readonly fetchFormDefinition?: FetchResource; | ||
|
||
/** | ||
* @deprecated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need to mark things as deprecated and keep them around yet. I think it's simpler to immediately get rid of them.
…ecific `fetchFormDefinition`
…nce fixtures These were intended as a signal for future reference if we wanted to expand coverage around parsing minutiae for specific external secondary instance formats. That was in fact exercised as predicted in a subsequent commit, adding tests for various CSV parsing details. These tests no longer serve that purpose, so they can be removed!
}; | ||
|
||
const stripContentTypeCharset = (contentType: string): string => { | ||
return contentType.replace(/;charset=.*$/, ''); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think about this:
- return contentType.replace(/;charset=.*$/, '');
+ return contentType.replace(/;\s?charset=.*$/, '');
Sometimes response contenttype includes space between media type and charset, including examples here: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Type#examples
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you file an issue? This PR has already been merged, so I can't make changes here.
Closes #201
I have verified this PR works in these browsers (latest versions):
[ ] Not applicableWhat else has been done to verify that this works as intended?
Some existing tests ported from JavaRosa now pass
At least one happy path test for each format (XML, CSV, GeoJSON)
Added a slew of CSV parsing tests based on conversation yesterday with @lognaturel
Added/updated integration in
@getodk/web-forms
dev/preview modes:These cases effectively represent the behaviors an integrating host application can expect. The latter is the default behavior for missing external secondary instance resources. The former requires a configuration option, and should be suitable for e.g. Central's form preview mode.
Worth noting: there is still a failing test ported from JavaRosa, where an error is expected when referencing a node that doesn't exist in a secondary instance. This test technically references an external secondary instance, but it isn't clear whether the behavior would be applicable to all secondary instances.
Another test ported from JavaRosa is now marked failing, representing the same default behavior for missing external secondary instances described above. A companion test has been added representing the optional/configured alternate behavior.
Why is this the best possible solution? Were any other approaches considered?
This builds on the work in #245, and mostly follows the design laid out in #201. It deviates somewhat from that design, mostly addressing some of the later discussion on that thread. I'll add more detail on that deviation below.
How does this change affect users? Describe intentional changes to behavior and behavior that could have accidentally been affected by code changes. In other words, what are the regression risks?
The main regression risk I can think of is that the newer default behavior is technically stricter for forms with external secondary instance references. Prior to this PR, those forms will always treat all such instances as blank. As of this PR, by default they'll be loaded and parsed by default, and evaluated as appropriate for any
instance(...)/...
expression defined in the form.The other regression risk is that this does introduce breaking changes for host applications. Those will also be detailed below.
Do we need any specific form for testing your changes? If so, please attach one.
This is pretty much covered by existing/added test fixtures. If you want to try it with your own form/attachments, they can be placed together in any fixtures subdirectory and accessed in dev mode.
What's changed
Breaking changes: host application <->
@getodk/web-forms
fetchFormAttachment
function.missingResourceBehavior
option.Both of these are specified (likely in
kebab-case
) as props on theOdkWebForm
component.Breaking changes: client <-> engine
The same host/client breaking changes apply to integration between clients and the engine. These breaking changes have been implemented in
@getodk/web-forms
to support host application integration. They've also been implemented in@getodk/scenario
to reflect the functionality in existing/added tests.Note: I am only realizing now that I've neglected to integrate these changes in
@getodk/ui-solid
. I am honestly not sure how valuable the package is at this point, so I'm hesitant to address this unless anyone feels strongly otherwise. If we all agree that the package itself isn't providing much value now, I'd like to consider removing it sometime soon (so it doesn't languish in a weird semi-broken state).Engine configuration:
fetchFormAttachment
The interface for this configuration is (simplified for reading inline):
... where
FetchResourceResponse
is the same subset ofResponse
already exposed for loading remote XForm XML resources, slightly expanded to allow an optionalheaders
property (which in turn is a subset of the nativeHeaders
type).This deviates from the design in #201: rather than providing a
Map
(or other map-like structure) for resolvingjr:
URLs, clients are expected to provide afetch
-like function. This deviation from the design is intended to address discussion in that issue around the appropriate keys for such a map-like structure.In spirit, it makes sense that the keys should be the full
jr:
URL (which I believe reflects @brontolosone's view in that discussion). In practice, a client and/or host will almost certainly be resolving thosejr:
URLs from a file name without additional information. In order for a caller to be able to satisfy a mapping betweenjr:
URL and something resolving it, we'd need to provide a parsed set of URLs to satisfy. We can certainly do this! And we've discussed doing it. But as a primary API, it would complicate the form initialization process by introducing a step I believe is unnecessary.This design change is a compromise recognizing that these are conceptually equivalent concepts:
I briefly considered providing a sample implementation for the two most common formats we'd expect to derive this configuration: OpenRosa Manifest documents and the Central Form Attachments API. But I decided that implementing them is trivial enough that it shouldn't hold up the PR. I'll be happy to assist with integration if it turns out to be less obvious downstream than I hope.
Engine configuration:
missingResourceBehavior
This configuration is mostly described above. It reflects a duality of expected behaviors:
The configuration options for this are exposed as constants, but a bare string is acceptable as well.
Notable engine internal implementation details
This is mostly covered in 0fbc3c5 and its commit notes. The most important detail is that parsing and loading is intentionally staged/layered:
That final common representation is then provided through
PrimaryInstance
to its instance ofXFormsXPathAdapter
. From there, evaluation of a form'sinstance(...)/...
expressions makes no distinction between any secondary instance's type (internal vs external), source (XForm definition,FetchFormAttachment
response) or format (XML DOM, XML text, CSV, GeoJSON).There is also a bit of groundwork in the I/O layer anticipating support for media attachments. Not much is done there other than recognizing the entrypoint for that functionality will likely flow through a common
FormAttachmentResource
abstraction (as do external secondary instance resources now).Acknowledgement: some of the
StaticNode
interfaces are super awkward(Where
StaticNode
and its subclasses are part of the engine's DOM adapter representation introduced in #245.)Parsing logic for both CSV and GeoJSON introduce some indirection around these representations to make them a little bit easier to reason about/more ergonomic to use. I think there's some room to simplify the base constructor signatures further, but I decided to put that off and live with that bit of tech debt for now. I'm eager to land the functionality, and expect others on the team are as well. So let's call this ready for review!