SETLr Basics Tutorial
SETLr is a powerful tool for creating RDF from tabular sources. This page will teach you the fundamentals of using SETLr to create semantic extract, transform, and load (SETL) workflows. We will start with a simple example of a spreadsheet containing just a few rows and columns, gradually introducing new SETL concepts and ideas as we work with more columns. By the end you will know the principles of SETLr and how to write your own SETL scripts.
To start, check out the code from GitHub, optionally create a Python virtual environment, and install SETLr using pip:
# Optional, but recommended.
virtualenv venv
source venv/bin/activate
pip install setlr
To follow along with this tutorial, copy and paste the sample data table below into a spreadsheet program like Excel and save it as a CSV file called social.csv in an empty directory.
ID | Name | MarriedTo | Knows | DOB |
---|---|---|---|---|
Alice | Alice Smith | Bob | Bob; Charles | 1/12/1983 |
Bob | Bob Smith | Alice | Alice; Charles | 3/23/1985 |
Charles | Charles Brown | | Alice; Bob | 12/15/1955 |
Dave | Dave Jones | | | 4/25/1967 |
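If you would rather not use a spreadsheet program, the same file can be produced with a short script. This is a minimal sketch using only the Python standard library. It assumes the blank cells in the Charles and Dave rows belong to the MarriedTo and Knows columns, and the names `build_csv_text` and `write_social_csv` are illustrative, not part of SETLr.

```python
import csv
import io

# Rows copied from the sample table; empty strings stand for blank cells.
# Assumption: Charles has no MarriedTo value, Dave has neither MarriedTo nor Knows.
HEADER = ["ID", "Name", "MarriedTo", "Knows", "DOB"]
ROWS = [
    ["Alice", "Alice Smith", "Bob", "Bob; Charles", "1/12/1983"],
    ["Bob", "Bob Smith", "Alice", "Alice; Charles", "3/23/1985"],
    ["Charles", "Charles Brown", "", "Alice; Bob", "12/15/1955"],
    ["Dave", "Dave Jones", "", "", "4/25/1967"],
]


def build_csv_text():
    """Serialize the sample table as comma-separated text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(HEADER)
    writer.writerows(ROWS)
    return buffer.getvalue()


def write_social_csv(path="social.csv"):
    """Write the sample table to `path` (hypothetical helper)."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        handle.write(build_csv_text())


csv_text = build_csv_text()

if __name__ == "__main__":
    write_social_csv()
```

Run it in the empty directory you created for the tutorial so that social.csv sits next to your SETL file.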
We are writing this SETL file in Turtle, which means we can define some convenient prefixes to make it easier to refer to certain vocabularies. The following prefixes should be added to the beginning of your SETL file, which here you should call social.setl.ttl:
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix void: <http://rdfs.org/ns/void#> .
@prefix setl: <http://purl.org/twc/vocab/setl/> .
@prefix csvw: <http://www.w3.org/ns/csvw#> .
@prefix pv: <http://purl.org/net/provenance/ns#> .
@prefix : <http://example.com/setl/> .
A SETL file is an RDF file that uses the PROV Ontology to describe activities (extracts, transforms, and loads) that use and generate entities (tables and graphs). Extracting data is fairly straightforward. The following describes a process where a setl:Table entity, called :table, is generated by a setl:Extract activity that uses the file social.csv. Add it to your file to load social.csv into the resource :table:
:table a csvw:Table, setl:Table;
csvw:delimiter ",";
prov:wasGeneratedBy [
a setl:Extract;
prov:used <social.csv>;
].
The type csvw:Table tells SETLr that the table is to be interpreted as a CSV table, using the CSV on the Web vocabulary. SETLr lets you indicate the delimiter used in the file (with csvw:delimiter) and the number of initial rows to skip (with csvw:skipRows). Each setl:Table is parsed into a data frame object using Pandas internally. Directly extracting RDF files is also supported. SETLr supports extracting the following data types:
Type | Format | Options | Parsed Type |
---|---|---|---|
csvw:Table, setl:Table | Comma (or other) Separated Value (CSV, TSV, etc.) | csvw:delimiter, csvw:skipRows | Data Frame |
setl:XPORT, setl:Table | SAS Transport (XPORT) file format | | Data Frame |
setl:SAS7BDAT, setl:Table | SAS Dataset file format | | Data Frame |
setl:Excel, setl:Table | XLS or XLSX file format | | Data Frame |
owl:Ontology | OWL Ontology file in RDF | | RDF Graph |
void:Dataset | RDF File | | RDF Graph |
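To make the two CSV options concrete, here is a rough sketch of what a delimiter and a skipped-row count mean when a file is read. It uses Python's standard `csv` module as a stand-in. SETLr itself parses tables with Pandas, so this is an illustration of the semantics under that assumption, not SETLr's code. The function name `read_table` and its parameters are hypothetical.

```python
import csv


def read_table(text, delimiter=",", skip_rows=0):
    """Return rows as dicts, dropping `skip_rows` leading lines before the header.

    `delimiter` plays the role of csvw:delimiter and `skip_rows` the role of
    csvw:skipRows in this sketch.
    """
    lines = text.splitlines()[skip_rows:]
    reader = csv.DictReader(lines, delimiter=delimiter)
    return list(reader)


# A tab-separated file with one junk line above the header row.
sample = (
    "exported from a spreadsheet\n"
    "ID\tName\n"
    "Alice\tAlice Smith\n"
    "Bob\tBob Smith\n"
)

rows = read_table(sample, delimiter="\t", skip_rows=1)
print(rows[0]["Name"])
```

Each resulting row exposes its cells by column name, which is the same shape the transform step relies on when it refers to values such as row.Name.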
We will use :table in the transformation process to generate some RDF. For more on Extract activities, see the Extract page.
The transformation process is easily the most complex of the processes to write. JSLDT, or JSON-LD Templates, relies on the design of JSON-LD to be a flexible templating system for RDF. We start this by describing the transformation with a very simple template:
<http://example.com/social> a void:Dataset;
prov:wasGeneratedBy [
a setl:Transform, setl:JSLDT;
prov:used :table;
setl:hasContext '''{
"foaf" : "http://xmlns.com/foaf/0.1/"
}''';
prov:value '''[{
"@id": "https://example.com/social/{{row.ID}}",
"@type": "foaf:Person",
"foaf:name": "{{row.Name}}"
}]'''].
Note that the dataset was generated by a setl:Transform that is also a setl:JSLDT, which tells SETLr how to process that transform. The property setl:hasContext is used to process contexts for all of the JSON-LD that is generated by this transform, but context can also be provided inside the JSLDT directly. The prov:value of the transform is the template itself:
[{
"@id": "https://example.com/social/{{row.ID}}",
"@type": "foaf:Person",
"foaf:name": "{{row.Name}}"
}]
This template is evaluated once for each row in the table, and every JSON key and value is rendered through the Jinja templating engine. When the JSLDT is processed on the first row, it produces the following RDF in JSON-LD:
[{
"@id": "https://example.com/social/Alice",
"@type": "foaf:Person",
"foaf:name": "Alice Smith"
}]
These individual JSON-LD graphs are then aggregated together into the final graph, here serialized into Turtle:
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
<https://example.com/social/Alice> a foaf:Person ;
foaf:name "Alice Smith" .
<https://example.com/social/Bob> a foaf:Person ;
foaf:name "Bob Smith" .
<https://example.com/social/Charles> a foaf:Person ;
foaf:name "Charles Brown" .
<https://example.com/social/Dave> a foaf:Person ;
foaf:name "Dave Jones" .
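The row-by-row expansion described above can be sketched in a few lines of Python. This is a simplified illustration, not SETLr's implementation: a regular expression stands in for the Jinja engine and only handles plain `{{row.Column}}` placeholders, and the names `PLACEHOLDER` and `render` are hypothetical.

```python
import json
import re

TEMPLATE = """[{
  "@id": "https://example.com/social/{{row.ID}}",
  "@type": "foaf:Person",
  "foaf:name": "{{row.Name}}"
}]"""

# Only the two columns the template uses are included here.
ROWS = [
    {"ID": "Alice", "Name": "Alice Smith"},
    {"ID": "Bob", "Name": "Bob Smith"},
    {"ID": "Charles", "Name": "Charles Brown"},
    {"ID": "Dave", "Name": "Dave Jones"},
]

# Stand-in for Jinja: matches {{row.Column}} and nothing more elaborate.
PLACEHOLDER = re.compile(r"\{\{\s*row\.(\w+)\s*\}\}")


def render(value, row):
    """Recursively fill placeholders in every key and string value."""
    if isinstance(value, str):
        return PLACEHOLDER.sub(lambda match: str(row[match.group(1)]), value)
    if isinstance(value, list):
        return [render(item, row) for item in value]
    if isinstance(value, dict):
        return {render(key, row): render(item, row) for key, item in value.items()}
    return value


template = json.loads(TEMPLATE)

# Render the template once per row and collect the results into one graph.
graph = []
for row in ROWS:
    graph.extend(render(template, row))

print(json.dumps(graph[0], indent=2))
```

The collected list holds one JSON-LD node per row. In SETLr these nodes are interpreted with the context from setl:hasContext and merged into a single RDF graph.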
There is a lot more to learn about using JSLDT that will help you create exactly the RDF that you want. The JSLDT Template Language wiki page has the full tutorial on it.
SETLr supports two types of loading, to a file on disk or to a SPARQL endpoint. Loading to a file is fairly straightforward:
<social.ttl> a pv:File;
dcterms:format "text/turtle";
prov:wasGeneratedBy [
a setl:Load;
prov:used <http://example.com/social> ;
].
SETLr supports the following formats:
- RDF/XML:
  - default
  - application/rdf+xml
  - text/rdf
- Turtle:
  - text/turtle
  - application/turtle
  - application/x-turtle
- N-Triples: text/plain
- N3: text/n3
- TriG: application/trig
- JSON-LD: application/json
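One way to picture the list above is as a lookup from the dcterms:format value to a serializer. The following is a hypothetical sketch, not SETLr's source. The serializer names on the right are assumed to follow rdflib's naming, `None` models a load with no dcterms:format, and raising an error for unlisted types is a choice made for this example.

```python
# MIME type -> serializer name (assumed to match rdflib's serializer names).
FORMATS = {
    None: "xml",  # no dcterms:format given: fall back to RDF/XML
    "application/rdf+xml": "xml",
    "text/rdf": "xml",
    "text/turtle": "turtle",
    "application/turtle": "turtle",
    "application/x-turtle": "turtle",
    "text/plain": "nt",
    "text/n3": "n3",
    "application/trig": "trig",
    "application/json": "json-ld",
}


def serializer_for(mime_type=None):
    """Return the serializer name for a MIME type, or raise for unlisted types."""
    try:
        return FORMATS[mime_type]
    except KeyError:
        raise ValueError("unsupported format: %r" % (mime_type,))


print(serializer_for("text/turtle"))
```

Under these assumptions, the social.ttl example with dcterms:format "text/turtle" resolves to the Turtle serializer.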
SETLr loads data into a triple store if the generated entity has the type sd:Service and an sd:endpoint value:
@prefix sd: <http://www.w3.org/ns/sparql-service-description#>.
:sparql_load a setl:Load, sd:Service;
sd:endpoint <http://example.com/sparql>.
You can run your SETL script using the setlr command:
$ setlr social.setl.ttl
It will create a file called social.ttl that matches the example.