LDES extension

We added an extension to YARRRML to generate RML that generates a Linked Data Event Stream (LDES). LDES specifies how to model and publish changes in documents as a stream of events. Each event, called a member in LDES terminology, is a version of an original document.

We provide an ldes key in subject mappings to generate the necessary LDES members and metadata.

Example data

The examples below explain the different options for generating an LDES. The YARRRML mappings and output are abbreviated (prefixes and sources are omitted) to focus on the relevant parts, but the complete examples can be found here.

All examples use the same input data, temperature readings from two sensors:

SensorID,Timestamp,Temperature
1,2023-01-01T08:00:00,8
2,2023-01-01T08:00:00,9
1,2023-01-01T09:00:00,9
2,2023-01-01T09:00:00,9

Example mappings

1. No configuration

A basic LDES can be generated by providing just the ldes key without values.

YARRML:

mappings:
  temperature-reading:
    sources: data-source
    subjects:
      - value: ex:$(SensorID)
        ldes:
        targets:
          - [out.ttl~void, turtle]
    po:
      - [a, ex:Thermometer]
      - [ex:temp, $(Temperature)]
      - [ex:ts, $(Timestamp), xsd:dateTime]

Output:

<1#0>
    ex:temp "8" ;
    ex:ts "2023-01-01T08:00:00"^^xsd:dateTime ;
    a ex:Thermometer .

<2#0>
    ex:temp "9" ;
    ex:ts "2023-01-01T08:00:00"^^xsd:dateTime ;
    a ex:Thermometer .

ex:eventstream
    a ldes:EventStream ;
    tree:member <1#0>, <2#0> ;
    tree:shape <shape.shacl> .

By default, the generated subject IRI is checked for uniqueness to determine whether a new member needs to be generated. In this case the subject IRI is based only on the SensorID, so there are only two members: one for sensor 1 and one for sensor 2. Later readings from the same sensors are discarded as duplicates.

2. Full configuration

Before diving into the details of every property, we show an example that uses all properties that define how an LDES gets generated, except memberIdFunction.

YARRRML:

mappings:
  temperature-reading:
    sources: data-source
    subjects:
      - value: ex:$(SensorID)
        targets:
          - [ out.ttl~void, turtle ]
        ldes:
          id: ex:myldes
          # basically generate a member for each record
          watchedProperties: [$(SensorID), $(Timestamp), $(Temperature)]
          shape: ex:shape.shacl
          timestampPath: [ex:ts, $(Timestamp), xsd:dateTime]
          versionOfPath: [ex:hasVersion, ex:$(SensorID)]

    po:
      - [a, ex:Thermometer]
      - [ex:temp, $(Temperature)]

Output:

<1#0>
    ex:hasVersion <1> ;
    ex:temp "8" ;
    ex:ts "2023-01-01T08:00:00"^^xsd:dateTime ;
    a ex:Thermometer .

<1#1>
    ex:hasVersion <1> ;
    ex:temp "9" ;
    ex:ts "2023-01-01T09:00:00"^^xsd:dateTime ;
    a ex:Thermometer .

<2#0>
    ex:hasVersion <2> ;
    ex:temp "9" ;
    ex:ts "2023-01-01T08:00:00"^^xsd:dateTime ;
    a ex:Thermometer .

<2#1>
    ex:hasVersion <2> ;
    ex:temp "9" ;
    ex:ts "2023-01-01T09:00:00"^^xsd:dateTime ;
    a ex:Thermometer .

ex:myldes
    a ldes:EventStream ;
    ldes:timestampPath ex:ts ;
    ldes:versionOfPath ex:hasVersion ;
    tree:member <1#0>, <1#1>, <2#0>, <2#1> ;
    tree:shape <shape.shacl> .

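The id key sets the IRI of the ldes:EventStream metadata to ex:myldes, and the shape key sets its tree:shape. The timestampPath and versionOfPath keys add ldes:timestampPath and ldes:versionOfPath to the event stream, and also define which triples (ex:ts and ex:hasVersion) are generated for each member. Because all three columns are watched, every record becomes a member.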
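The way the ldes keys in this example turn into event stream metadata can be summarized with a small Python sketch. It is not the YARRRML parser's implementation: the function name event_stream_metadata and the representation of triples as tuples of prefixed names are hypothetical, chosen only to show which ldes:EventStream triples each key contributes.

```python
from typing import List, Optional, Tuple

# Hypothetical representation: a triple is a (subject, predicate, object) tuple of strings.
Triple = Tuple[str, str, str]


def event_stream_metadata(
    ldes_id: str,
    shape: str,
    members: List[str],
    timestamp_path: Optional[str] = None,
    version_of_path: Optional[str] = None,
) -> List[Triple]:
    """Sketch: build the ldes:EventStream metadata triples from the ldes key's settings."""
    triples: List[Triple] = [(ldes_id, "rdf:type", "ldes:EventStream")]
    # timestampPath and versionOfPath only produce metadata when they are configured.
    if timestamp_path is not None:
        triples.append((ldes_id, "ldes:timestampPath", timestamp_path))
    if version_of_path is not None:
        triples.append((ldes_id, "ldes:versionOfPath", version_of_path))
    # Every generated member is linked to the stream with tree:member.
    triples.extend((ldes_id, "tree:member", member) for member in members)
    triples.append((ldes_id, "tree:shape", shape))
    return triples


# Settings taken from the full configuration example above.
metadata = event_stream_metadata(
    ldes_id="ex:myldes",
    shape="ex:shape.shacl",
    members=["<1#0>", "<1#1>", "<2#0>", "<2#1>"],
    timestamp_path="ex:ts",
    version_of_path="ex:hasVersion",
)
for triple in metadata:
    print(*triple)
```

Leaving out timestamp_path or version_of_path mirrors omitting those keys: the corresponding metadata triple is simply not produced.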

3. watchedProperties

The watchedProperties key determines which data records end up as members in the LDES. The watched properties, given as an array, are compared between records that would get the same subject IRI from the subject value template:

  • If at least one of these properties changes, the generated subject IRI is made unique and the member is added.
  • If the watched properties remain the same, or if none are given:
    • If the subject IRI template generates a unique IRI, the new member is added.
    • If the subject IRI template doesn't generate a unique IRI, the new member is discarded, because it is considered a duplicate of a previous one.
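The rules above can be sketched in Python as follows. This is a simplified model, not the actual implementation. It assumes that ex: expands to http://example.org/, that member IRIs are formed by appending #<n> to the subject IRI (as in the example outputs), and that watched values are compared with the most recent member for the same subject IRI. The function name select_members is hypothetical.

```python
import csv
import io
from typing import Dict, List, Sequence, Tuple

# Input data from the "Example data" section.
DATA = """SensorID,Timestamp,Temperature
1,2023-01-01T08:00:00,8
2,2023-01-01T08:00:00,9
1,2023-01-01T09:00:00,9
2,2023-01-01T09:00:00,9
"""


def select_members(
    rows: List[Dict[str, str]],
    watched: Sequence[str],
    base: str = "http://example.org/",  # assumed expansion of the ex: prefix
) -> List[str]:
    """Return member IRIs in input order, following the watchedProperties rules (sketch)."""
    last_values: Dict[str, Tuple[str, ...]] = {}  # subject IRI -> watched values of its latest member
    counts: Dict[str, int] = {}  # subject IRI -> number of members generated so far
    members: List[str] = []
    for row in rows:
        subject = base + row["SensorID"]  # subject template ex:$(SensorID)
        values = tuple(row[column] for column in watched)
        if subject in last_values and values == last_values[subject]:
            continue  # duplicate: same subject IRI and unchanged (or no) watched values
        n = counts.get(subject, 0)
        members.append(f"{subject}#{n}")  # subject IRI made unique with a counter (assumption)
        counts[subject] = n + 1
        last_values[subject] = values
    return members


rows = list(csv.DictReader(io.StringIO(DATA)))
print(select_members(rows, ["Temperature"]))
# ['http://example.org/1#0', 'http://example.org/2#0', 'http://example.org/1#1']
```

Passing an empty list reproduces example 1 (two members), ["Temperature"] reproduces example 3a (three members) and ["Timestamp"] reproduces example 3b (four members).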

Here are some examples:

a. Temperature

In this case we're only interested in generating a new member if the temperature changes for a sensor:

YARRRML:

mappings:
  temperature-reading:
    sources: data-source
    subjects:
      - value: ex:$(SensorID)
        ldes:
          watchedProperties: [$(Temperature)]
        targets:
          - [out.ttl~void, turtle]
    po:
      - [a, ex:Thermometer]
      - [ex:temp, $(Temperature)]
      - [ex:ts, $(Timestamp), xsd:dateTime]

Output:

<1#0>
    ex:temp "8" ;
    ex:ts "2023-01-01T08:00:00"^^xsd:dateTime ;
    a ex:Thermometer .

<1#1>
    ex:temp "9" ;
    ex:ts "2023-01-01T09:00:00"^^xsd:dateTime ;
    a ex:Thermometer .

<2#0>
    ex:temp "9" ;
    ex:ts "2023-01-01T08:00:00"^^xsd:dateTime ;
    a ex:Thermometer .

ex:eventstream
    a ldes:EventStream ;
    tree:member <1#0>, <1#1>, <2#0> ;
    tree:shape <shape.shacl> .

There are two members for sensor 1 (its two readings have different temperatures) and only one for sensor 2 (both readings report the same temperature).

b. Timestamp

The previous example showed how to create new members if temperature changes. This example creates new members if the timestamp changes.

YARRRML:

mappings:
  temperature-reading:
    sources: data-source
    subjects:
      - value: ex:$(SensorID)
        ldes:
          watchedProperties: [$(Timestamp)]
        targets:
          - [out.ttl~void, turtle]
    po:
      - [a, ex:Thermometer]
      - [ex:temp, $(Temperature)]
      - [ex:ts, $(Timestamp), xsd:dateTime]

Output:

<1#0>
    ex:temp "8" ;
    ex:ts "2023-01-01T08:00:00"^^xsd:dateTime ;
    a ex:Thermometer .

<1#1>
    ex:temp "9" ;
    ex:ts "2023-01-01T09:00:00"^^xsd:dateTime ;
    a ex:Thermometer .

<2#0>
    ex:temp "9" ;
    ex:ts "2023-01-01T08:00:00"^^xsd:dateTime ;
    a ex:Thermometer .

<2#1>
    ex:temp "9" ;
    ex:ts "2023-01-01T09:00:00"^^xsd:dateTime ;
    a ex:Thermometer .

ex:eventstream
    a ldes:EventStream ;
    tree:member <1#0>, <1#1>, <2#0>, <2#1> ;
    tree:shape <shape.shacl> .

This time every reading produces a member because for every sensor each reading has a different timestamp.

c. No watchedProperties

In example 1 (No configuration), no watchedProperties are given, so member generation depends only on the subject template, which is given by the value key in subjects. Since this template uses only the SensorID, a new member is generated only when the first reading of a sensor arrives.

4. versionOfPath

versionOfPath specifies the predicate used as the LDES's ldes:versionOfPath and the object that each member links to with that predicate.

  • If not present, no ldes:versionOfPath is generated.
  • If a predicate and an IRI template are given, ldes:versionOfPath is set to that predicate, and each member links with it to the IRI generated by the template. E.g.:
    versionOfPath: [dcterms:isVersionOf, ex:$(SensorID)]
  • If only a predicate is given, ldes:versionOfPath is set to that predicate, and the value is defined by:
    • the corresponding object mapping for the predicate, if any, or
    • the subject template. E.g.:
      versionOfPath: [dcterms:isVersionOf]
      In this case the value template is the subject template: ex:$(SensorID)
  • If an empty array is given, the predicate defaults to dcterms:isVersionOf and the value template defaults to:
    • the corresponding object mapping for the predicate, if any, or
    • the subject value template. E.g.:
      versionOfPath: []
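The defaulting rules for versionOfPath can be expressed as a small Python sketch. The function resolve_version_of_path and its inputs are hypothetical: the YARRRML value is passed as a list (or None when the key is absent), and the predicate-object mappings are simplified to a dict from predicate to object template.

```python
from typing import Dict, List, Optional, Tuple

DEFAULT_PREDICATE = "dcterms:isVersionOf"


def resolve_version_of_path(
    config: Optional[List[str]],
    subject_template: str,
    po: Dict[str, str],
) -> Optional[Tuple[str, str]]:
    """Return (predicate, value template) for versionOfPath, or None if the key is absent."""
    if config is None:
        return None  # no versionOfPath key: no ldes:versionOfPath is generated
    if len(config) > 2:
        raise ValueError("versionOfPath takes at most a predicate and a value template")
    if len(config) == 2:
        return config[0], config[1]  # explicit predicate and IRI template
    # An empty array falls back to the default predicate.
    predicate = config[0] if config else DEFAULT_PREDICATE
    # The value comes from an object mapping for the predicate, else from the subject template.
    return predicate, po.get(predicate, subject_template)


po = {"ex:temp": "$(Temperature)", "ex:ts": "$(Timestamp)"}
print(resolve_version_of_path([], "ex:$(SensorID)", po))
# ('dcterms:isVersionOf', 'ex:$(SensorID)')
print(resolve_version_of_path(["ex:hasOriginal", "ex:original/$(SensorID)"], "ex:$(SensorID)", po))
# ('ex:hasOriginal', 'ex:original/$(SensorID)')
```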

Here are some examples:

a. Empty versionOfPath

This example shows that an empty array generates the default ldes:versionOfPath with the predicate dcterms:isVersionOf. Each member gets a dcterms:isVersionOf triple pointing to the IRI generated by the subject template.

YARRRML:

mappings:
  temperature-reading:
    sources: data-source
    subjects: 
      - value: ex:$(SensorID)
        ldes:
          versionOfPath: []
        targets:
          - [out.ttl~void, turtle]
    po:
      - [a, ex:Thermometer]
      - [ex:temp, $(Temperature)]
      - [ex:ts, $(Timestamp), xsd:dateTime]

Output:

<1#0>
    ex:temp "8" ;
    ex:ts "2023-01-01T08:00:00"^^xsd:dateTime ;
    dcterms:isVersionOf <1> ;
    a ex:Thermometer .

<2#0>
    ex:temp "9" ;
    ex:ts "2023-01-01T08:00:00"^^xsd:dateTime ;
    dcterms:isVersionOf <2> ;
    a ex:Thermometer .

ex:eventstream
    a ldes:EventStream ;
    ldes:versionOfPath dcterms:isVersionOf ;
    tree:member <1#0>, <2#0> ;
    tree:shape <shape.shacl> .

b. versionOfPath with predicate

The next example shows how a versionOfPath with only a predicate results in members using that predicate, without having to define it in the predicate-object mappings.

YARRRML:

mappings:
  temperature-reading:
    sources: data-source
    subjects: 
      - value: ex:$(SensorID)
        ldes:
          versionOfPath: [ex:hasOriginal]
        targets:
          - [out.ttl~void, turtle]
    po:
      - [a, ex:Thermometer]
      - [ex:temp, $(Temperature)]
      - [ex:ts, $(Timestamp), xsd:dateTime]

Output:

<1#0>
    ex:hasOriginal <1> ;
    ex:temp "8" ;
    ex:ts "2023-01-01T08:00:00"^^xsd:dateTime ;
    a ex:Thermometer .

<2#0>
    ex:hasOriginal <2> ;
    ex:temp "9" ;
    ex:ts "2023-01-01T08:00:00"^^xsd:dateTime ;
    a ex:Thermometer .

ex:eventstream
    a ldes:EventStream ;
    ldes:versionOfPath ex:hasOriginal ;
    tree:member <1#0>, <2#0> ;
    tree:shape <shape.shacl> .

c. Custom versionOfPath

This example shows a versionOfPath with a custom predicate and an object referring to an IRI other than the one derived from the subject template.

YARRRML:

mappings:
  temperature-reading:
    sources: data-source
    subjects: 
      - value: ex:$(SensorID)
        ldes:
          versionOfPath: [ex:hasOriginal, ex:original/$(SensorID)]
        targets:
          - [out.ttl~void, turtle]
    po:
      - [a, ex:Thermometer]
      - [ex:temp, $(Temperature)]
      - [ex:ts, $(Timestamp), xsd:dateTime]

Output:

<1#0>
    ex:hasOriginal <original/1> ;
    ex:temp "8" ;
    ex:ts "2023-01-01T08:00:00"^^xsd:dateTime ;
    a ex:Thermometer .

<2#0>
    ex:hasOriginal <original/2> ;
    ex:temp "9" ;
    ex:ts "2023-01-01T08:00:00"^^xsd:dateTime ;
    a ex:Thermometer .

ex:eventstream
    a ldes:EventStream ;
    ldes:versionOfPath ex:hasOriginal ;
    tree:member <1#0>, <2#0> ;
    tree:shape <shape.shacl> .

5. timestampPath

timestampPath specifies the predicate, and optionally the object, used for the LDES's ldes:timestampPath.

  • If no timestampPath is present, no ldes:timestampPath is generated.
  • If only a predicate is given, it has to be present in the predicate-object mappings, which then define the object. E.g.:
    timestampPath: [ex:ts]
    In this case a predicate-object mapping must exist, e.g.:
    po: [[ex:ts, $(Timestamp)]]
  • If a predicate and an object are given, an implicit predicate-object mapping with the given object is added. E.g.:
    timestampPath: [ex:ts, $(Timestamp)]
    This is equivalent to the previous example, but no explicit predicate-object mapping needs to be defined.
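The two timestampPath forms can be sketched in Python like this. The function resolve_timestamp_path is hypothetical and models predicate-object mappings as tuples of (predicate, object[, datatype]); it returns the ldes:timestampPath predicate together with the mappings that end up being used. How an empty array is handled is not covered above, so raising an error for it is an assumption.

```python
from typing import List, Optional, Tuple

PO = Tuple[str, ...]  # (predicate, object) or (predicate, object, datatype)


def resolve_timestamp_path(
    config: Optional[List[str]], po: List[PO]
) -> Tuple[Optional[str], List[PO]]:
    """Return the ldes:timestampPath predicate (or None) and the effective po mappings."""
    if config is None:
        return None, po  # no timestampPath key: no ldes:timestampPath
    if not config:
        raise ValueError("timestampPath needs at least a predicate")  # assumption
    predicate = config[0]
    if len(config) == 1:
        # Predicate only: the object must come from an existing predicate-object mapping.
        if not any(mapping[0] == predicate for mapping in po):
            raise ValueError(f"no predicate-object mapping for {predicate}")
        return predicate, po
    # Predicate and object: add an implicit predicate-object mapping (new list, po is unchanged).
    return predicate, po + [tuple(config)]


po = [("a", "ex:Thermometer"), ("ex:temp", "$(Temperature)")]
predicate, effective = resolve_timestamp_path(["ex:ts", "$(Timestamp)", "xsd:dateTime"], po)
print(predicate, effective[-1])  # ex:ts ('ex:ts', '$(Timestamp)', 'xsd:dateTime')
```

With only ["ex:ts"], the same po list would be rejected, because no explicit ex:ts mapping exists; this corresponds to the first case in the list above.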

Here are some examples:

a. With predicate only

This example defines a timestampPath using an existing predicate-object mapping for ex:ts:

YARRRML:

mappings:
  temperature-reading:
    sources: data-source
    subjects: 
      - value: ex:$(SensorID)
        ldes:
          timestampPath: [ex:ts]
        targets:
          - [out.ttl~void, turtle]
    po:
      - [a, ex:Thermometer]
      - [ex:temp, $(Temperature)]
      - [ex:ts, $(Timestamp), xsd:dateTime]

Output:

<1#0>
    ex:temp "8" ;
    ex:ts "2023-01-01T08:00:00"^^xsd:dateTime ;
    a ex:Thermometer .

<2#0>
    ex:temp "9" ;
    ex:ts "2023-01-01T08:00:00"^^xsd:dateTime ;
    a ex:Thermometer .

ex:eventstream
    a ldes:EventStream ;
    ldes:timestampPath ex:ts ;
    tree:member <1#0>, <2#0> ;
    tree:shape <shape.shacl> .

b. Custom timestampPath

It is also possible to define a custom timestampPath whose predicate and object are not present in the predicate-object mappings. The ex:ts triple is then generated implicitly for each member.

YARRRML:

mappings:
  temperature-reading:
    sources: data-source
    subjects: 
      - value: ex:$(SensorID)
        ldes:
          timestampPath: [ex:ts, $(Timestamp), xsd:dateTime]
        targets:
          - [out.ttl~void, turtle]
    po:
      - [a, ex:Thermometer]
      - [ex:temp, $(Temperature)]

Output:

<1#0>
    ex:temp "8" ;
    ex:ts "2023-01-01T08:00:00"^^xsd:dateTime ;
    a ex:Thermometer .

<2#0>
    ex:temp "9" ;
    ex:ts "2023-01-01T08:00:00"^^xsd:dateTime ;
    a ex:Thermometer .

ex:eventstream
    a ldes:EventStream ;
    ldes:timestampPath ex:ts ;
    tree:member <1#0>, <2#0> ;
    tree:shape <shape.shacl> .

Event Stream ID (id)

The IRI of the generated ldes:EventStream object defaults to http://example.org/eventStream, which is often not what you want. You can customize this IRI with the id key:

YARRRML:

mappings:
  temperature-reading:
    sources: data-source
    subjects:
      - value: ex:$(SensorID)
        ldes:
          id: http://ldes.org/thisisanldeswithacustomid
        targets:
          - [out.ttl~void, turtle]
    po:
      - [a, ex:Thermometer]
      - [ex:temp, $(Temperature)]
      - [ex:ts, $(Timestamp), xsd:dateTime]

Output:

<1#0>
    ex:temp "8" ;
    ex:ts "2023-01-01T08:00:00"^^xsd:dateTime ;
    a ex:Thermometer .

<2#0>
    ex:temp "9" ;
    ex:ts "2023-01-01T08:00:00"^^xsd:dateTime ;
    a ex:Thermometer .

<http://ldes.org/thisisanldeswithacustomid>
    a ldes:EventStream ;
    tree:member <1#0>, <2#0> ;
    tree:shape <shape.shacl> .

SHACL Shape (shape)

The shape key refers to a SHACL shape that can be used to validate members, for instance by an LDES Server implementation. It defaults to ex:shape.shacl but can be customized. Note that the LDES extension does not generate the shape itself.