JSLDT Template Language
The JSON-LD Template (JSLDT) language is a template language for producing RDF from tabular data. SETLr uses it as the transform step that turns loaded tabular Pandas data frames into RDF graphs, applying the template to each row in the data frame. The row object provides access to the values of each row in the frame and is an instance of pandas.Series. Initially, the following variables are available:
- row: the row being processed in the table.
- table: the table currently being processed.
- name: the index of the row.
- template: the full JSON template being processed.
- transform: the RDFlib Resource describing the current transform.
- setl_graph: the RDFlib Graph that contains the SETL description.
- re: the Python regular expression library module.
- isempty(): a function to safely test if a particular value exists.
- hash(): a function that generates a random UUID.
- resources: the dictionary of resources that have been processed in the SETL script.
The full RDFlib package is also imported locally. A transform processes a table by declaring that it uses it with prov:used:
<http://example.com/social> a void:Dataset;
  prov:wasGeneratedBy [
    a setl:Transform, setl:JSLDT;
    prov:used :table;
    # ...
  ].
Additionally, if you have a very large dataset, you may need to persist the dataset to disk as it is generated instead of keeping it in memory. To do so, add the type setl:Persisted to the artifact:
<http://example.com/social> a void:Dataset, setl:Persisted;
  prov:wasGeneratedBy [
    a setl:Transform, setl:JSLDT;
    prov:used :table;
    # ...
  ].
JSLDT is processed as a tree, creating a new tree copy for each list, map, or value in the template. The strings of each key and value are processed as Jinja templates, which provides significant flexibility when creating the resulting template. The rest of this tutorial builds on the example started in the SETLr Tutorial. Extend the prov:value of the transform with the templates we show below. We start with the template used in the SETLr Tutorial:
[{
"@id": "https://example.com/social/{{row.ID}}",
"@type": "foaf:Person",
"foaf:name": "{{row.Name}}"
}]
For the first row in our sample data, {{row.ID}} is replaced with Alice and {{row.Name}} is replaced with Alice Smith to create the following JSON-LD:
[{
"@id": "https://example.com/social/Alice",
"@type": "foaf:Person",
"foaf:name": "Alice Smith"
}]
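The per-row substitution can be sketched in plain Python. This is only an illustration and not SETLr's implementation: the `render` and `render_string` helpers are hypothetical, a small regular expression stands in for Jinja, and a `SimpleNamespace` stands in for the pandas.Series row.

```python
import json
import re
from types import SimpleNamespace

# Hypothetical stand-in for Jinja: matches {{ expr }} placeholders.
PLACEHOLDER = re.compile(r"\{\{\s*(.*?)\s*\}\}")

def render_string(text, variables):
    # Replace each placeholder with the value of the Python expression inside it.
    return PLACEHOLDER.sub(lambda m: str(eval(m.group(1), {}, variables)), text)

def render(node, variables):
    # Walk the template tree, copying lists and maps and rendering every string.
    if isinstance(node, dict):
        return {render_string(k, variables): render(v, variables) for k, v in node.items()}
    if isinstance(node, list):
        return [render(item, variables) for item in node]
    if isinstance(node, str):
        return render_string(node, variables)
    return node

template = json.loads('''[{
  "@id": "https://example.com/social/{{row.ID}}",
  "@type": "foaf:Person",
  "foaf:name": "{{row.Name}}"
}]''')

# Stand-in for the first row of the sample table.
row = SimpleNamespace(ID="Alice", Name="Alice Smith")
result = render(template, {"row": row})
print(json.dumps(result, indent=2))
```

Both keys and values pass through `render_string`, mirroring the statement that the strings of each key and value are processed as templates.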
One thing to note is that, while the engine may produce valid JSON from an arbitrary JSLDT template, the result may not parse as valid RDF. SETLr therefore parses the generated JSON-LD into an RDFlib graph for complete row-by-row validation.
The @if keyword is used to include a map only when the conditional Python expression in the value of the @if key is true. For instance, if a cell is missing data, we can skip processing that value:
[{
"@id": "https://example.com/social/{{row.ID}}",
"@type": "foaf:Person",
"foaf:name": "{{row.Name}}",
"http://schema.org/spouse": [{
"@if" : "not isempty(row.MarriedTo)",
"@id" : "https://example.com/social/{{row.MarriedTo}}"
}]
}]
The array around the conditional map is needed so that valid JSON-LD is produced whether the conditional is true (and there is a single object) or false (where there are no objects, but instead an empty list). The output of our sample table then looks like this in Turtle:
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix ns1: <http://schema.org/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
<https://example.com/social/Charles> a foaf:Person ;
foaf:name "Charles Brown" .
<https://example.com/social/Dave> a foaf:Person ;
foaf:name "Dave Jones" .
<https://example.com/social/Alice> a foaf:Person ;
ns1:spouse <https://example.com/social/Bob> ;
foaf:name "Alice Smith" .
<https://example.com/social/Bob> a foaf:Person ;
ns1:spouse <https://example.com/social/Alice> ;
foaf:name "Bob Smith" .
Note that Alice and Bob are stated to be spouses of each other, but Charles and Dave have no spouse, because those cells were empty. Adding the schema: prefix for schema.org to the context makes SETLr output a more readable prefix than ns1:.
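Why the array around the conditional map matters can be sketched in Python. The helpers below are hypothetical and are not SETLr's code; in particular, this `isempty` only assumes that None, NaN, and blank strings count as empty, and the `@id` is pre-rendered with an f-string to keep the sketch short.

```python
from types import SimpleNamespace

def isempty(value):
    # Assumed behaviour (not taken from SETLr): None, NaN, and blank strings are empty.
    if value is None:
        return True
    if isinstance(value, float) and value != value:  # NaN is the only float unequal to itself
        return True
    return isinstance(value, str) and value.strip() == ""

def apply_if(items, variables):
    # Keep only the maps whose "@if" expression is true, and drop the "@if" key.
    kept = []
    for item in items:
        condition = item.get("@if")
        if condition is not None and not eval(condition, {}, variables):
            continue
        kept.append({key: value for key, value in item.items() if key != "@if"})
    return kept

def spouse_links(row):
    # The list wrapper lets a false condition collapse to an empty list.
    template = [{
        "@if": "not isempty(row.MarriedTo)",
        "@id": f"https://example.com/social/{row.MarriedTo}",
    }]
    return apply_if(template, {"row": row, "isempty": isempty})

alice = SimpleNamespace(ID="Alice", MarriedTo="Bob")
charles = SimpleNamespace(ID="Charles", MarriedTo=float("nan"))  # empty cell

alice_links = spouse_links(alice)
charles_links = spouse_links(charles)
print(alice_links)
print(charles_links)
```

An empty list is still valid JSON-LD as a property value, so a row with an empty cell simply contributes no spouse triple.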
Sometimes you need to break cells into multiple fields, or iterate over other values. JSLDT provides the means to iterate nodes over Python expressions that evaluate to iterable objects. For instance, strings that are split up can be iterated over to provide multiple links:
[{
"@id": "https://example.com/social/{{row.ID}}",
"@type": "foaf:Person",
"foaf:name": "{{row.Name}}",
"http://schema.org/spouse": [{
"@if" : "not isempty(row.MarriedTo)",
"@id" : "https://example.com/social/{{row.MarriedTo}}"
}],
"foaf:knows": [{
"@if" : "not isempty(row.Knows)",
"@for" : "friend in row.Knows.split('; ')",
"@do" : { "@id" : "https://example.com/social/{{friend}}" }
}]
}]
This results in the following RDF:
<https://example.com/social/Dave> a foaf:Person ;
foaf:name "Dave Jones" .
<https://example.com/social/Charles> a foaf:Person ;
foaf:knows <https://example.com/social/Alice>,
<https://example.com/social/Bob> ;
foaf:name "Charles Brown" .
<https://example.com/social/Alice> a foaf:Person ;
ns1:spouse <https://example.com/social/Bob> ;
foaf:knows <https://example.com/social/Bob>,
<https://example.com/social/Charles> ;
foaf:name "Alice Smith" .
<https://example.com/social/Bob> a foaf:Person ;
ns1:spouse <https://example.com/social/Alice> ;
foaf:knows <https://example.com/social/Alice>,
<https://example.com/social/Charles> ;
foaf:name "Bob Smith" .
The @do node is repeated in place of the parent @for node for each value, and the value is assigned to a scoped variable called friend. Note that we also used @if here: @if takes precedence in evaluation, so a test on the overall value can determine whether the @for should be attempted at all. To test each iterated value, place an @if inside the value of @do instead.
@for loops can also assign multiple variables if each item of the iterable is itself iterable. This makes it easier to operate on maps and other structures that might be returned by complex Python expressions.
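The expansion of a @for node, including the @if check that runs first and the assignment of several loop variables, can be sketched as follows. `expand_for`, `render`, and this `isempty` are hypothetical helpers written for illustration, with a regular expression standing in for Jinja; SETLr's actual implementation may differ.

```python
import re
from types import SimpleNamespace

def render(text, scope):
    # Stand-in for Jinja: substitute {{ name }} with the scoped variable of that name.
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", lambda m: str(scope[m.group(1)]), text)

def expand_for(node, variables):
    # "@if" is evaluated first; a false result skips the loop entirely.
    condition = node.get("@if")
    if condition is not None and not eval(condition, {}, variables):
        return []
    # Split "a, b in expr" into loop variable names and the iterable expression.
    target, iterable_expr = node["@for"].split(" in ", 1)
    names = [name.strip() for name in target.split(",")]
    results = []
    for value in eval(iterable_expr, {}, variables):
        scope = dict(variables)  # loop variables are scoped to this iteration
        if len(names) == 1:
            scope[names[0]] = value
        else:
            scope.update(zip(names, value))
        # Emit one rendered copy of the "@do" map per iteration.
        results.append({render(k, scope): render(v, scope) for k, v in node["@do"].items()})
    return results

def isempty(value):
    # Assumed behaviour: None, NaN, and the empty string are empty.
    return value is None or value != value or value == ""

variables = {"isempty": isempty, "row": SimpleNamespace(Knows="Bob; Charles")}
friends = expand_for({
    "@if": "not isempty(row.Knows)",
    "@for": "friend in row.Knows.split('; ')",
    "@do": {"@id": "https://example.com/social/{{friend}}"},
}, variables)

# Two loop variables, unpacked from each (key, value) pair.
pairs = expand_for({
    "@for": "p, o in columns.items()",
    "@do": {"http://example.com/ns/{{p}}": "{{o}}"},
}, {"columns": {"ID": "Alice", "Name": "Alice Smith"}})
print(friends)
print(pairs)
```

Because the outer @if is checked before the iterable expression is evaluated, a row with an empty Knows cell never reaches the call to `split`.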
Iterable values can also come from other sources, such as a data specification or dictionary, a query, or an ontology. These resources can be made accessible by including a PROV qualified usage with an identified role. For instance, to iterate over a secondary table in the transformation, include it as a variable that can be used in the template:
:graph a void:Dataset, dcat:Dataset;
prov:wasGeneratedBy [
a setl:Transform, setl:JSLDT;
prov:qualifiedUsage [ a prov:Usage; prov:entity :datadict; prov:hadRole [ dcterms:identifier "datadict"]];
prov:used :table;
# ...
].
This is the entirety of the JSLDT language. To better understand how to use this well, a deeper understanding of JSON-LD, Python expressions, Pandas, and the Jinja templating language would help significantly. The following recipes are ideas that help you explore the potential of JSLDT.
A naive RDF conversion can be done on almost any table using the following simple template:
[{
"@context":{
"@vocab" : "http://example.com/ns/"
},
"@for" : "p, o in row.items()",
"@do" : {
"@if" : "not isempty(o)",
"@id" : "https://example.com/social/{{name}}",
"http://example.com/ns/{{p}}" : "{{o}}"
}
}]
This results in the following RDF:
@prefix : <http://example.com/ns/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
<https://example.com/social/0> :DOB "1/12/1983" ;
:ID "Alice" ;
:Knows "Bob; Charles" ;
:MarriedTo "Bob" ;
:Name "Alice Smith" .
<https://example.com/social/1> :DOB "3/23/1985" ;
:ID "Bob" ;
:Knows "Alice; Charles" ;
:MarriedTo "Alice" ;
:Name "Bob Smith" .
<https://example.com/social/2> :DOB "12/15/1955" ;
:ID "Charles" ;
:Knows "Alice; Bob" ;
:Name "Charles Brown" .
<https://example.com/social/3> :DOB "4/25/1967" ;
:ID "Dave" ;
:Name "Dave Jones" .
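What the naive template emits for one row can be sketched with a plain dictionary. This is an assumption-laden illustration: `naive_maps` is a hypothetical helper, a dict stands in for the pandas.Series row, and empty cells are modelled as None.

```python
def naive_maps(name, row, base="https://example.com/social/", vocab="http://example.com/ns/"):
    # One JSON-LD map per non-empty cell; all maps share the row's @id,
    # so they merge into a single subject once parsed as RDF.
    maps = []
    for p, o in row.items():
        if o is None or o == "":
            continue  # mirrors the "not isempty(o)" test in the template
        maps.append({"@id": f"{base}{name}", f"{vocab}{p}": str(o)})
    return maps

# Stand-in for the fourth row of the sample table (row index 3).
dave = {"ID": "Dave", "Name": "Dave Jones", "DOB": "4/25/1967", "MarriedTo": None, "Knows": None}
maps = naive_maps(3, dave)
for m in maps:
    print(m)
```

Only the three non-empty cells produce maps, which corresponds to the three properties listed for https://example.com/social/3 in the output above.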