Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure backpressure is maintained in streams #71

Open
rubensworks opened this issue Nov 12, 2020 · 3 comments
Open

Ensure backpressure is maintained in streams #71

rubensworks opened this issue Nov 12, 2020 · 3 comments
Labels
enhancement New feature or request

Comments

@rubensworks
Copy link
Owner

rubensworks commented Nov 12, 2020

There may be some places where we are not adhering to Node's backpressure conventions with regards to streams.

Concretely, we seem to be creating new streams (such as Readable, PassThrough and Transform), and push-ing into them (via a 'data' handler on another stream). Node handles backpressuring via the return value of push, which we are ignoring in this manner.

A better solution would be to simply pipe instead of calling push on each data element.

Related to rubensworks/rdf-parse.js@269c757

Related to #66

@Tpt
Copy link
Contributor

Tpt commented Aug 16, 2022

I just had a quick look at it. I see two places where there might be a significant problems with backpressure:

  1. JsonLdParser.import does not care about backpressure at all. A quick way to add backpressure there would be to use the pipe method on the input stream when available.
  2. If no context is provided by the caller and not in streaming mode, the JSON-LD parser buffers all data before starting parsing only when the JSON is fully read. It will push all data when the last piece of JSON is read. I am not sure how much ignoring back pressure is bad knowing we already buffered the complete JSON into memory.

@rubensworks
Copy link
Owner Author

If no context is provided by the caller and not in streaming mode, the JSON-LD parser buffers all data before starting parsing only when the JSON is fully read. It will push all data when the last piece of JSON is read. I am not sure how much ignoring back pressure is bad knowing we already buffered the complete JSON into memory.

Indeed, unless the streaming profile is enabled, the full JSON-LD document is stored in memory before processing can start.

This is because without the streaming profile (which mandates that certain JSON keys come in specific orders), the following situation could occur:

{
  ... very long JSON-LD document
  "@context": "..."
}

Since the context in the example above comes at the end, the meaning of all preceding entries could be different. So that is why buffering must take place.
With the streaming profile on the other hand, the @context MUST occur first (or after @type).

Just pinging @wouterbeek and @LaurensRietveld on this as well, as this will have an impact on the bounty's resolution. I.e., if the streaming profile is not explicitly enabled, full buffering will take place, and memory issues can still arise.

rubensworks pushed a commit that referenced this issue Aug 17, 2022
rubensworks pushed a commit that referenced this issue Aug 17, 2022
@LaurensRietveld
Copy link

Hi @rubensworks and @Tpt , having to explicitly enable the streaming profile in order to properly (without buffering fully) stream through the jsonld is fine from our end

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants