ndjson-ld format #140

VladimirAlexiev · 2021-02-19T17:02:43Z

Why?

Newline-delimited JSON (line-oriented JSON) is often used in preference to JSON because it is streamable and can be processed with line-oriented tools (e.g. grep).
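A minimal Python sketch of what "streamable" means here: each line is a complete JSON text, so a reader can parse one record at a time without loading the whole file. The function name `iter_ndjson` is hypothetical. Skipping blank lines is a leniency choice made by this sketch, not something taken from the NDJSON spec.

```python
import io
import json
from typing import Iterator, TextIO


def iter_ndjson(stream: TextIO) -> Iterator[dict]:
    """Yield one parsed JSON value per non-blank line of `stream`."""
    for lineno, line in enumerate(stream, start=1):
        line = line.strip()
        if not line:
            # Tolerate blank lines (e.g. a stray empty line at end of file).
            continue
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {lineno}: invalid JSON: {exc.msg}") from exc


data = io.StringIO(
    '{"@id": "http://example.org/a", "name": "A"}\n'
    '{"@id": "http://example.org/b", "name": "B"}\n'
)
for record in iter_ndjson(data):
    print(record["@id"])
```

Because the generator yields as soon as a line is parsed, memory use is bounded by the longest line rather than by the file size.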

Previous work

specs: http://ndjson.org, https://github.com/ndjson/ndjson-spec, https://jsonlines.org (line-oriented JSON)
Sample data: https://sn-scigraph.figshare.com/ (projects/grants from Dimensions) (line-oriented JSON-LD)
Ontotext is now implementing ndjson-ld input in rdf4j and GraphDB: GH-2840 Add Newline Delimited JSON-LD format (desislava-hristova-ontotext/rdf4j#1)
We're also considering output (SPARQL CONSTRUCT serialization as ndjson) but that's trickier

Proposed solution

We're considering MIME type application/x-ld+ndjson (derived from the existing MIME type for JSON-LD application/ld+json and the MIME type of Newline Delimited JSON application/x-ndjson)
We're considering file extensions .ndjsonld, and maybe .jsonl and .ndjson

Considerations for backward compatibility

None?


ericprud · 2021-02-19T17:53:55Z

@VladimirAlexiev, I found some examples in the specs but didn't find what you were referring to in the "Sample data" link above.

I think streaming JSON would be an excellent tool for long-running SPARQL results and line-oriented is a nice benefit. I guess this is a small step from current JSON results as they already require newlines to be escaped, right?
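On the escaping point: a compact JSON serializer already writes a newline inside a string as the two-character escape `\n`, so one SPARQL JSON binding can always be written on a single physical line. A small Python illustration with a made-up binding:

```python
import json

# A hypothetical SPARQL JSON result binding whose literal contains a newline.
binding = {
    "comment": {"type": "literal", "value": "first line\nsecond line"}
}

# Compact serialization (json.dumps without indent): the newline inside the
# string is written as the escape sequence \n, so the whole binding occupies
# exactly one physical line and round-trips unchanged.
line = json.dumps(binding)
print(line)
```

Pretty-printing (`indent=...`) is what introduces raw newlines, so a line-oriented variant mostly amounts to serializing each binding compactly.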

TallTed · 2021-02-20T00:49:41Z

NDJSON is apparently also known as all of LDJSON, Line_Delimited_JSON, JSON_Lines, JSON_Streaming, JSONL, ndjson, NDJSON, and Newline_Delimited_JSON -- so this new thing could even be LD-JSON-LD!

Except that JSON-L (or JSONL) is definitely different from NDJSON... And I imagine there are other issues hiding behind the not-quite-synonym list above.

What is the (anticipated?) relationship between ND-JSON-LD (or NDJSON-LD) and JSON-LD (and 1.0, 1.1, etc.)?

Both JSON Lines and Newline Delimited JSON say they're also known by the other name, but as noted above these are different creatures. It's going to be necessary very quickly to clearly define which you're working with (and why not the other), as well as what may happen if the streams are crossed.

How and why is "Newline Delimited JSON-LD" (or is it "Linked Data in ND-JSON"?) related to the 1.2 update of SPARQL, which is the focus of this github project?

It seems to me that ND-JSON-LD should be a distinct project, maybe associated with JSON-LD given their apparent close cousin relationship.

On Media Type...

x- Media Types are generally frowned on these days, for good reason. Which the NDJSON folk know, and haven't done much about (ndjson/ndjson-spec#19, ndjson/ndjson-spec#21).

Media Types with Multiple Suffixes is heading toward RFC status, and application/ld+json already exists, so you might consider application/nd+ld+json, possibly with a synonymous application/ld+nd+json (which would need the apparently stagnant NDJSON project to change from application/x-ndjson to application/nd+json).

If you don't want to pin hopes on Media Types with Multiple Suffixes, you might also consider application/ld+ndjson, and again pushing the NDJSON project to change from application/x-ndjson to application/ndjson ...

Or leave the NDJSON project fallow as it stands, and consider application/ld+x-ndjson, which at least follows the general rules of Media Types, and parallels the existing application/ld+json.

This feels like a lot of frayed ends in search of a knot. That knot may be worthwhile, but I think it should be distinct from SPARQL 1.2.

afs · 2021-02-20T10:41:39Z

Won't it be application/sparql-results+x-ndjson for SELECT results and application/ld+x-ndjson for CONSTRUCT/DESCRIBE?

From JSON-LD, application/ld+... is about RDF graphs and datasets, and ...+json is the concrete syntax choice (cf. rdf+xml).
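Under this reading, the part of the subtype before the `+` names the data model and the structured-syntax suffix names the concrete syntax. A Python sketch of that decomposition, splitting at the last `+` (the usual convention for a single suffix). How multiple suffixes should be read is what the Multiple Suffixes draft mentioned above is about, so treat the multi-suffix case here as an assumption. `split_subtype` is a hypothetical helper.

```python
from typing import Optional, Tuple


def split_subtype(media_type: str) -> Tuple[str, Optional[str]]:
    """Split 'type/name+suffix' into (name, suffix); suffix is None if absent.

    Splits at the last '+', so 'nd+ld+json' yields ('nd+ld', 'json').
    """
    _, _, subtype = media_type.partition("/")
    subtype = subtype.split(";", 1)[0].strip().lower()  # drop any parameters
    name, plus, suffix = subtype.rpartition("+")
    if not plus:
        return subtype, None
    return name, suffix


for mt in ("application/ld+json",
           "application/sparql-results+x-ndjson",
           "application/ld+x-ndjson",
           "application/x-ndjson"):
    print(mt, "->", split_subtype(mt))
```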

gkellogg · 2021-02-20T19:20:44Z

It would seem that the appropriate place for this effort would be the JSON-LD CG (AKA the JSON for Linking Data Community Group), although the JSON-LD WG remains as a maintenance group.

Also, note that the WG published the Streaming JSON-LD note, which addresses the need for a streaming serialization format, in this case by imposing an order on object entries in the serialization, although it is not a line format, per se.

At first glance, NDJSON-LD would seem to follow well given an out-of-band specified context, such as via a Link header. That would make it much the same as parsing an outer object containing @context and the values of @graph. Going beyond that, an extension supporting an @context at the top level, either as a URL or as a one-line object, would be straightforward. Nothing would prevent an individual NDJSON line from including @context, either, unless there is some limitation on line length I didn't notice.
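A Python sketch of the equivalence described above, assuming the context is supplied out of band (for example via a Link header): wrapping the lines in one object with `@context` and `@graph` yields an ordinary JSON-LD document. `ndjsonld_to_document` and the context URL are hypothetical.

```python
import json
from typing import Iterable, Union


def ndjsonld_to_document(lines: Iterable[str], context: Union[str, dict]) -> dict:
    """Wrap NDJSON-LD lines in a single JSON-LD document with a shared context.

    `context` stands in for the out-of-band context. A line that carries its
    own @context keeps it, since a node object may have an embedded context.
    """
    graph = [json.loads(line) for line in lines if line.strip()]
    return {"@context": context, "@graph": graph}


lines = [
    '{"@id": "http://example.org/a", "name": "A"}',
    '{"@id": "http://example.org/b", "name": "B"}',
]
doc = ndjsonld_to_document(lines, "https://example.org/context.jsonld")  # hypothetical URL
print(json.dumps(doc, indent=2))
```

The resulting document can then be handed to any JSON-LD 1.1 processor, which is what makes the line format cheap to support on top of an existing parser.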

VladimirAlexiev · 2021-02-21T15:42:48Z

@ericprud
The sample data we cited in our Jira looks like this:

{"@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", "type": "MonetaryGrant", "id": "sg:grant.6616389",...\n 
{"@context": "https://springernature.github.io/scigraph/jsonld/sgcontext.json", "type": "MonetaryGrant", "id": "sg:grant.6616214",...\n ...

It's probably here http://scigraph.downloads.uberresearch.com/archives/current/grants.tar.gz

Right now we are considering NDJSON-LD for input, but you make a good point that a streaming sparql-results-json for SELECT output would also be useful.

In fact, CONSTRUCT output as NDJSON-LD is non-trivial: how would it know which triples to put on each line? How would it know which is the "main loop" of the query, or the "primary key", so to speak?
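One possible answer to the "primary key" question is the subject: emit one flat node object per subject. The Python sketch below assumes the triples already arrive grouped by subject (guaranteeing that from a CONSTRUCT query is exactly the open question). It uses full IRIs as keys with no context and does no nesting. `triples_to_ndjsonld` and the example IRIs are hypothetical.

```python
import json
from itertools import groupby
from operator import itemgetter
from typing import Dict, Iterable, Iterator, Tuple, Union

# An object is either an IRI reference {"@id": ...} or a plain string literal.
Obj = Union[Dict[str, str], str]


def triples_to_ndjsonld(triples: Iterable[Tuple[str, str, Obj]]) -> Iterator[str]:
    """Emit one JSON-LD node object per subject, one per line.

    Assumes triples for the same subject are adjacent in the input, so
    groupby sees each subject once and only one node is buffered at a time.
    """
    for subject, group in groupby(triples, key=itemgetter(0)):
        node: dict = {"@id": subject}
        for _, predicate, obj in group:
            node.setdefault(predicate, []).append(obj)
        yield json.dumps(node)


triples = [
    ("http://example.org/a", "http://schema.org/name", "A"),
    ("http://example.org/a", "http://schema.org/knows", {"@id": "http://example.org/b"}),
    ("http://example.org/b", "http://schema.org/name", "B"),
]
for line in triples_to_ndjsonld(triples):
    print(line)
```

If the input is not grouped, a subject is split across several lines. That still describes the same RDF graph, since node objects with the same `@id` merge, but it no longer gives one line per resource.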

@TallTed thanks for the pointers to MIME developments!

@gkellogg thanks for the pointer to Streaming jsonld!

rubensworks · 2021-02-22T07:09:20Z

Pinging @wouterbeek here regarding NDJSON-LD, as he suggested it a while back here rubensworks/jsonld-streaming-parser.js#64

ericprud · 2021-02-22T07:40:34Z

There's a longish discussion of media subtypes containing '+' on [email protected].
(I don't actually think nd+json is viable because people assume that +json means the resource matches RFC 4627, but folks can always relax their standards if they don't mind breaking some stuff.)

afs · 2021-02-22T19:09:31Z

sparql-results+json is streaming if the fields are in the right order ("head" before "results").
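To illustrate the ordering point: a writer that emits "head" before "results" lets a client read the variable list first and then consume bindings as they arrive. A Python sketch that yields text chunks of a SPARQL 1.1 JSON results document. The function name is hypothetical, and putting one binding per line is a choice of this sketch, not a requirement of the format.

```python
import json
from typing import Dict, Iterable, Iterator, List


def stream_select_results(variables: List[str],
                          rows: Iterable[Dict[str, dict]]) -> Iterator[str]:
    """Yield text chunks of a SPARQL 1.1 JSON results document, "head" first."""
    yield '{"head": ' + json.dumps({"vars": variables})
    yield ', "results": {"bindings": ['
    for i, row in enumerate(rows):
        # Separator before every binding except the first; one binding per line.
        yield ("," if i else "") + "\n" + json.dumps(row)
    yield "\n]}}\n"


rows = [
    {"s": {"type": "uri", "value": "http://example.org/a"}},
    {"s": {"type": "uri", "value": "http://example.org/b"}},
]
doc = "".join(stream_select_results(["s"], rows))
print(doc)
```

Each chunk can be written to the response as soon as it is produced. The concatenation is always a complete, valid results document, including when there are zero rows.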

Streaming a line format without a Content-Length: header means there can be silent truncation of results.
Omitting Content-Length also interacts with connection management, with some DoS potential from badly behaved clients.

These aren't reasons not to do it; they are things that should be noted in any design. Inside the enterprise is a different environment from the open web.

jaw111 · 2021-09-08T09:54:38Z

Just to note a real-world use case for newline delimited JSON-LD. For one application we developed, we index suitably framed JSON-LD documents in Elasticsearch where the documents are imported to Elasticsearch as NDJSON. That process uses a Jena model to gather RDF data from various sources (blackboard design pattern), then extracts and frames a sub-graph for resources of a given type.
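For context on this kind of pipeline: the Elasticsearch `_bulk` endpoint takes an NDJSON body in which each document line is preceded by an action line, and the body must end with a newline. Below is a Python sketch of building that body from already-framed JSON-LD documents. The index name `resources` and the choice of `@id` as the Elasticsearch `_id` are assumptions for illustration, and `to_bulk_body` is a hypothetical helper.

```python
import json
from typing import Iterable


def to_bulk_body(index: str, docs: Iterable[dict]) -> str:
    """Build an Elasticsearch _bulk request body from framed JSON-LD documents.

    Each document becomes an action line followed by the document line;
    every line, including the last, ends with a newline.
    """
    lines = []
    for doc in docs:
        # Hypothetical choice: reuse the node's @id as the Elasticsearch _id.
        lines.append(json.dumps({"index": {"_index": index, "_id": doc["@id"]}}))
        lines.append(json.dumps(doc))
    return "".join(line + "\n" for line in lines)


docs = [{"@context": "https://example.org/context.jsonld",  # hypothetical context
         "@id": "http://example.org/a", "name": "A"}]
print(to_bulk_body("resources", docs), end="")
```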

Whilst it would be nice to be able to get some NDJSON-LD serialization as the result of a SPARQL query directly, I think it would be necessary to have some way to indicate a JSON-LD frame (rather than just a context as @gkellogg suggested) in order to guarantee consistent nesting/embedding in the JSON object structure.

Arguably for our usage the JSON-LD frame IS the query, a SPARQL query is not even needed.

TallTed · 2021-09-08T19:34:00Z

> Streaming a line format without a Content-Length: header means there can be silent truncation of results.

@afs -- I would think that adding a specific termination marker to the syntax would avoid silent truncation without Content-length: -- and including the net line count in the termination marker (at which point, it should be trivially known) would prevent errors from missing lines, though it wouldn't give any good way to recover from such, other than repeating the request and running a diff on the two streams if the second also had some drop-outs...
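A Python sketch of the termination-marker idea. The `_end` key is purely hypothetical, and a real design would need a marker that cannot collide with data. The reader only detects truncation or missing lines; it makes no attempt at recovery.

```python
import json
from typing import Iterable, Iterator, List

END_KEY = "_end"  # hypothetical terminator key; not part of any spec


def write_with_terminator(records: Iterable[dict]) -> Iterator[str]:
    """Yield NDJSON lines followed by a terminator carrying the line count."""
    count = 0
    for record in records:
        count += 1
        yield json.dumps(record) + "\n"
    yield json.dumps({END_KEY: {"lines": count}}) + "\n"


def read_with_terminator(lines: Iterable[str]) -> List[dict]:
    """Parse NDJSON lines, failing loudly if the terminator is missing or wrong."""
    records: List[dict] = []
    for line in lines:
        if not line.strip():
            continue
        obj = json.loads(line)
        if isinstance(obj, dict) and END_KEY in obj:
            expected = obj[END_KEY]["lines"]
            if expected != len(records):
                raise ValueError(f"expected {expected} lines, got {len(records)}")
            return records
        records.append(obj)
    raise ValueError("stream ended without terminator (possible truncation)")


stream = list(write_with_terminator([{"@id": "ex:a"}, {"@id": "ex:b"}]))
print(read_with_terminator(stream))
```

Dropping the last line (the terminator) or any line in the middle makes `read_with_terminator` raise instead of silently returning a partial result.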

afs · 2021-09-09T18:31:00Z

Content-Length is understood by HTTP/1.1 libraries and is used by them to reuse connections.

A trailer as protocol-level termination and including end-transfer information would be a good thing. It does not completely replace Content-Length though.

There is of course HTTP/2: new protocol work ought to be an abstract design that exploits HTTP/2 features and can also be targeted at other transfer layers, for example streaming gRPC. HTTP/1.1 may not be able to expose all of that design, though improvements like early termination can be fitted.
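To make the trailer idea concrete at the byte level, here is a Python sketch that frames NDJSON lines as an HTTP/1.1 chunked body and reports the line count in a trailer field after the zero-size last chunk. The field name `NDJSON-Line-Count` is hypothetical. The response would also need `Transfer-Encoding: chunked` and a `Trailer: NDJSON-Line-Count` header, and recipients are allowed to discard trailer fields, so this is best-effort. HTTP/2 has no chunked coding and carries trailers in a final HEADERS frame instead.

```python
from typing import Iterable


def chunked_body_with_trailer(lines: Iterable[bytes]) -> bytes:
    """Frame NDJSON lines as an HTTP/1.1 chunked body with a line-count trailer."""
    out = bytearray()
    count = 0
    for line in lines:
        if not line:
            continue  # a zero-size chunk would signal end of body prematurely
        count += 1
        out += b"%x\r\n" % len(line) + line + b"\r\n"  # hex chunk-size, then data
    out += b"0\r\n"                                     # last-chunk
    out += b"NDJSON-Line-Count: %d\r\n" % count         # trailer field (hypothetical name)
    out += b"\r\n"                                      # end of trailer section
    return bytes(out)


body = chunked_body_with_trailer([b'{"@id": "http://example.org/a"}\n',
                                  b'{"@id": "http://example.org/b"}\n'])
print(body)
```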

VladimirAlexiev · 2022-02-17T09:26:11Z

@jaw111 thanks for the input!

> necessary to have some way to indicate a JSON-LD frame

Yes, unless you have #39, #48, #73, #128 :-)

> the JSON-LD frame IS the query

I think you're talking GraphQL here :-)

jaw111 · 2022-02-17T14:36:36Z

> I think you're talking GraphQL here :-)

I was not able to come to terms with GraphQL-LD, still prefer SPARQL.

There is definitely some overlap between JSON-LD frames and GraphQL.

VladimirAlexiev · 2024-05-10T08:38:49Z

Just a note that @butaloto is working to upgrade our NDJSON-LD implementation eclipse-rdf4j/rdf4j#2840 to JSON-LD 1.1.

JervenBolleman added the results application/sparql-results+rainbows label Nov 30, 2021

VladimirAlexiev mentioned this issue Sep 1, 2022: YAML Streams and JSON Sequences json-ld/yaml-ld#63 (Open)

pietercolpaert mentioned this issue Sep 6, 2024: Extending the writer with the possibility to write a comment to the outputStream rdfjs/N3.js#439 (Closed)
