ShEx
This document contains a short introduction to ShEx using rudof.
Preliminaries: Install and configure rudof
First, we install and configure rudof.
!pip install pyrudof
from pyrudof import Rudof, RudofConfig
rudof = Rudof(RudofConfig())
Validating RDF using ShEx
ShEx (Shape Expressions) is a concise and human-readable language to describe and validate RDF data.
A ShEx schema contains one or more shape declarations and can be written in several formats: a compact syntax (ShExC), a JSON-LD syntax (ShExJ) and an RDF-based syntax (ShExR).
Let’s start by defining a simple ShEx schema:
rudof.read_shex_str("""
prefix : <http://example.org/>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
:User {
  :name      xsd:string ;
  :birthDate xsd:date ;
  :knows     @:User * ;
  :worksFor  @:Company *
}
:Company {
  :name     xsd:string ;
  :code     xsd:integer ;
  :employee @:User *
}
""")
And let’s define some RDF data.
rudof.read_data_str("""
prefix : <http://example.org/>
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
:alice a :Person ;
  :name "Alice" ;
  :birthDate "2005-03-01"^^xsd:date ;
  :worksFor :acme ;
  :knows :bob .
:bob a :Person ;
  :name "Robert Smith" ;
  :birthDate "2003-01-02"^^xsd:date ;
  :worksFor :acme ;
  :knows :alice .
:acme a :Company ;
  :name "Acme Inc." ;
  :code 23 .
""")
In order to validate nodes in a graph, ShEx uses a ShapeMap, which associates a node selector with a shape.
The simplest shape map is a pair node@shape.
Rudof keeps the current shape map in a structure so it can be reused.
The method read_shapemap_str(shapemap) reads a shape map from a string and stores its value as the current shape map.
In the previous example, if we want to validate :alice as a :User we can use:
rudof.read_shapemap_str(":alice@:User")
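To make the node@shape structure concrete, the following minimal sketch splits a shape map string into (node, shape) pairs in plain Python. It is only an illustration: parse_simple_shapemap is a hypothetical helper, not part of pyrudof, and it assumes comma-separated pairs of prefixed names rather than the full ShapeMap syntax that rudof parses.

```python
def parse_simple_shapemap(text):
    """Split 'node@shape, node@shape' into a list of (node, shape) pairs.

    Hypothetical helper for illustration; only simple prefixed names are supported.
    """
    pairs = []
    for association in text.split(","):
        association = association.strip()
        if not association:
            continue  # tolerate a trailing comma
        node, separator, shape = association.partition("@")
        node, shape = node.strip(), shape.strip()
        if not separator or not node or not shape:
            raise ValueError(f"Expected node@shape, got: {association!r}")
        pairs.append((node, shape))
    return pairs


print(parse_simple_shapemap(":alice@:User"))
print(parse_simple_shapemap(":alice@:User, :bob@:User"))
```

Each pair names one node to check and the shape it is checked against, which is the information rudof keeps as the current shape map.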
Once the ShEx schema and the Shapemap have been added to rudof, it is possible to validate the current RDF data with the validate_shex() method:
results = rudof.validate_shex()
validate_shex() returns a ResultShapeMap object which contains a show_as_table() method to show the results of the validation.
print(results.show_as_table())
╭────────┬───────┬────────╮
│ Node │ Shape │ Status │
├────────┼───────┼────────┤
│ :alice │ :User │ OK │
╰────────┴───────┴────────╯
If we try to validate :alice as a :Company, the validation fails:
rudof.read_shapemap_str(":alice@:Company")
results = rudof.validate_shex()
print(results.show_as_table())
╭────────┬──────────┬────────╮
│ Node │ Shape │ Status │
├────────┼──────────┼────────┤
│ :alice │ :Company │ FAIL │
╰────────┴──────────┴────────╯
To see more details about the reason for a failure, pass the with_details=True parameter to show_as_table():
print(results.show_as_table(with_details=True))
╭────────┬──────────┬────────┬─────────────────────────────────────────────────────────────────────╮
│ Node │ Shape │ Status │ Details │
├────────┼──────────┼────────┼─────────────────────────────────────────────────────────────────────┤
│ :alice │ :Company │ FAIL │ Error Shape 1 failed for node http://example.org/alice with errors │
│ │ │ │ │
╰────────┴──────────┴────────┴─────────────────────────────────────────────────────────────────────╯
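The details shown here do not name a specific cause. Under ShEx semantics, a triple constraint without an explicit cardinality must match exactly once, so :Company requires exactly one :code value, and :alice has none. The sketch below mimics only that cardinality check in plain Python. The function cardinality_violations and the dictionary encodings of the data and the shapes are assumptions made for this example; it ignores datatypes and shape references, which a real ShEx validator also checks.

```python
# Number of values per outgoing predicate of :alice in the example data.
alice_arcs = {"rdf:type": 1, ":name": 1, ":birthDate": 1, ":worksFor": 1, ":knows": 1}

# (min, max) cardinality per predicate; None means unbounded (the * operator).
company_shape = {":name": (1, 1), ":code": (1, 1), ":employee": (0, None)}
user_shape = {":name": (1, 1), ":birthDate": (1, 1), ":knows": (0, None), ":worksFor": (0, None)}


def cardinality_violations(arc_counts, shape):
    """Return one message per predicate whose number of values is outside (min, max)."""
    violations = []
    for predicate, (minimum, maximum) in shape.items():
        count = arc_counts.get(predicate, 0)
        too_few = count < minimum
        too_many = maximum is not None and count > maximum
        if too_few or too_many:
            upper = "*" if maximum is None else maximum
            violations.append(f"{predicate}: found {count}, expected {minimum}..{upper}")
    return violations


print(cardinality_violations(alice_arcs, company_shape))
print(cardinality_violations(alice_arcs, user_shape))
```

Predicates that the shape does not mention, such as rdf:type, are ignored by the sketch, which matches the default behaviour of ShEx shapes (they are open unless declared CLOSED).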
The shape map can contain a list of node@shape pairs, so several validations can be run at once:
rudof.read_shapemap_str(":alice@:User, :bob@:User")
results = rudof.validate_shex()
print(results.show_as_table())
╭────────┬───────┬────────╮
│ Node │ Shape │ Status │
├────────┼───────┼────────┤
│ :alice │ :User │ OK │
├────────┼───────┼────────┤
│ :bob │ :User │ OK │
╰────────┴───────┴────────╯
Sometimes it is necessary to process the results in Python. The ResultShapeMap class contains a method to_list() which returns a list of (node, shape, status) tuples.
for (node, shape, status) in results.to_list():
    print(f"Node: {node.show()}")
    print(f"Shape: {shape.show()}")
    print(f"Conformant?: {status.is_conformant()}")
    print(f"Appinfo: {status.as_json()}")
    print("")
Node: http://example.org/bob
Shape: http://example.org/User
Conformant?: True
Appinfo: {'info': [{'reason': 'Shape passed. Node http://example.org/bob, shape 0: Shape Preds: http://example.org/name,http://example.org/birthDate,http://example.org/knows,http://example.org/worksFor, TripleExpr: RBE [C0;C1;C2*;C3*;], Keys: [http://example.org/name -> {C0}, http://example.org/birthDate -> {C1}, http://example.org/knows -> {C2}, http://example.org/worksFor -> {C3}], conds: [C0 -> xsd:string, C1 -> xsd:date, C2 -> @0, C3 -> @1], References: [http://example.org/knows->0, http://example.org/worksFor->1]'}], 'reason': 'Shape passed. Node :bob, shape 0: :User = {(:name xsd:string ; :birthDate xsd:date ; :knows @0* ; :worksFor @1* ; )}\n', 'status': 'conformant'}
Node: http://example.org/alice
Shape: http://example.org/User
Conformant?: True
Appinfo: {'info': [{'reason': 'Shape passed. Node http://example.org/alice, shape 0: Shape Preds: http://example.org/name,http://example.org/birthDate,http://example.org/knows,http://example.org/worksFor, TripleExpr: RBE [C0;C1;C2*;C3*;], Keys: [http://example.org/name -> {C0}, http://example.org/birthDate -> {C1}, http://example.org/knows -> {C2}, http://example.org/worksFor -> {C3}], conds: [C0 -> xsd:string, C1 -> xsd:date, C2 -> @0, C3 -> @1], References: [http://example.org/worksFor->1, http://example.org/knows->0]'}], 'reason': 'Shape passed. Node :alice, shape 0: :User = {(:name xsd:string ; :birthDate xsd:date ; :knows @0* ; :worksFor @1* ; )}\n', 'status': 'conformant'}
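Because to_list() yields plain tuples, the results can be post-processed with ordinary Python. The sketch below groups the validated nodes by conformance. To stay runnable without pyrudof, it uses two small stand-in classes (FakeTerm and FakeStatus, both hypothetical) that expose only the show() and is_conformant() methods used above; with real results one would pass results.to_list() instead of the sample list.

```python
from dataclasses import dataclass


@dataclass
class FakeTerm:
    """Stand-in for a node or shape label; mimics only show()."""
    value: str

    def show(self):
        return self.value


@dataclass
class FakeStatus:
    """Stand-in for a validation status; mimics only is_conformant()."""
    conformant: bool

    def is_conformant(self):
        return self.conformant


def split_by_conformance(result_tuples):
    """Return (conformant, failing) lists of (node, shape) string pairs."""
    conformant, failing = [], []
    for node, shape, status in result_tuples:
        entry = (node.show(), shape.show())
        if status.is_conformant():
            conformant.append(entry)
        else:
            failing.append(entry)
    return conformant, failing


sample = [
    (FakeTerm("http://example.org/bob"), FakeTerm("http://example.org/User"), FakeStatus(True)),
    (FakeTerm("http://example.org/alice"), FakeTerm("http://example.org/Company"), FakeStatus(False)),
]
ok, failed = split_by_conformance(sample)
print(f"{len(ok)} conformant, {len(failed)} failing")
for node, shape in failed:
    print(f"FAIL: {node} as {shape}")
```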
We reset rudof's state (the ShEx schema, the shape map and the current RDF data) for the next section.
rudof = Rudof(RudofConfig())
Validating SPARQL endpoints
It is also possible to validate RDF data which is not local but is available through a SPARQL endpoint like Wikidata or DBpedia. Let’s start with Wikidata:
rudof.use_endpoint("wikidata")
We can declare a simple schema for Wikidata entities as follows:
rudof.read_shex_str("""
prefix : <http://example.org/>
prefix wd: <http://www.wikidata.org/entity/>
prefix wdt: <http://www.wikidata.org/prop/direct/>
:Researcher {
  wdt:P31 [ wd:Q5 ] ;    # Instance of Human
  wdt:P19 @:Place ;      # BirthPlace
}
:Place {
  wdt:P17 @:Country * ;  # Country
}
:Country {}
""")
rudof.read_shapemap_str("wd:Q80@:Researcher")
results = rudof.validate_shex()
print(results.show_as_table(with_details=True))
╭────────────────────────────────────┬─────────────┬────────┬──────────────────────────────────────────────────────────────────────────────────╮
│ Node │ Shape │ Status │ Details │
├────────────────────────────────────┼─────────────┼────────┼──────────────────────────────────────────────────────────────────────────────────┤
│ http://www.wikidata.org/entity/Q80 │ :Researcher │ OK │ Shape passed. Node http://www.wikidata.org/entity/Q80, shape 0: :Researcher = {( │
│ │ │ │ wdt:P31 [wd:Q5 ] ; wdt:P19 @1 ; )} │
│ │ │ │ │
╰────────────────────────────────────┴─────────────┴────────┴──────────────────────────────────────────────────────────────────────────────────╯
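Note that the table shows the node as a full IRI rather than as wd:Q80. A prefixed name is shorthand for the IRI declared for its prefix followed by the local part. The following sketch reproduces that expansion with the prefixes declared in the schema above; expand and PREFIXES are hypothetical names introduced for the example, not part of pyrudof.

```python
# Prefix declarations copied from the schema above ("" is the empty prefix).
PREFIXES = {
    "": "http://example.org/",
    "wd": "http://www.wikidata.org/entity/",
    "wdt": "http://www.wikidata.org/prop/direct/",
}


def expand(prefixed_name, prefixes=PREFIXES):
    """Expand a prefixed name such as 'wd:Q80' into a full IRI."""
    prefix, separator, local = prefixed_name.partition(":")
    if not separator:
        raise ValueError(f"Not a prefixed name: {prefixed_name!r}")
    if prefix not in prefixes:
        raise ValueError(f"Undeclared prefix: {prefix!r}")
    return prefixes[prefix] + local


print(expand("wd:Q80"))       # http://www.wikidata.org/entity/Q80
print(expand(":Researcher"))  # http://example.org/Researcher
```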
Visualizing ShEx schemas
rudof can be used to convert ShEx to diagrams in UML-like style. The converter generates a PlantUML string which can be written to a file and converted to an image using the PlantUML tool.
from pyrudof import UmlGenerationMode
rudof.read_shex_str("""
prefix : <http://example.org/>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
:User {
  :name     xsd:string ;
  :worksFor @:Company * ;
  :address  @:Address ;
  :knows    @:User
}
:Company {
  :name     xsd:string ;
  :code     xsd:string ;
  :employee @:User
}
:Address {
  :name     xsd:string ;
  :zip_code xsd:string
}
""")
plant_uml = rudof.shex2plantuml_file(UmlGenerationMode(), 'out.puml')
Now we install the PlantUML tools necessary to process the generated out.puml file.
! pip install plantuml
! pip install ipython
!python -m plantuml out.puml
from IPython.display import Image
Image("out.png")
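The rendering step can fail, for example when the PlantUML web service used by the plantuml package is unreachable, and in that case there is no out.png to display. The following sketch wraps the same `python -m plantuml` command in a function that reports the failure instead of leaving a missing image. render_plantuml is a hypothetical helper, and it assumes, as the cells above do, that the PNG is written next to the input file with a .png extension.

```python
import subprocess
import sys
from pathlib import Path


def render_plantuml(puml_path, timeout_seconds=60):
    """Run `python -m plantuml` on a .puml file; return the PNG path, or None on failure."""
    puml = Path(puml_path)
    png = puml.with_suffix(".png")  # assumption: output is written next to the input
    try:
        completed = subprocess.run(
            [sys.executable, "-m", "plantuml", str(puml)],
            capture_output=True,
            text=True,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        print(f"Rendering {puml} timed out after {timeout_seconds} seconds")
        return None
    if completed.returncode != 0 or not png.is_file():
        error_lines = completed.stderr.strip().splitlines()
        reason = error_lines[-1] if error_lines else "no error output"
        print(f"Rendering {puml} failed: {reason}")
        return None
    return png
```

In the notebook, one could call render_plantuml("out.puml") and pass the returned path to Image only when the result is not None. A PNG left over from an earlier run would still be picked up if the command exits successfully, so deleting stale output first is advisable.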