ShEx

This document contains a short introduction to ShEx using rudof.

Preliminaries: Install and configure rudof

First, we install and configure rudof.

# @title
!pip install pyrudof
from pyrudof import Rudof, RudofConfig
rudof = Rudof(RudofConfig())

Validating RDF using ShEx

ShEx (Shape Expressions) is a concise and human-readable language to describe and validate RDF data.

A ShEx schema contains one or more shape declarations and can be written in several formats: a compact format (ShExC), a JSON-LD format (ShExJ) and an RDF-based format (ShExR).

Let’s start by defining a simple ShEx schema:

rudof.read_shex_str("""
prefix :     <http://example.org/>
prefix xsd:  <http://www.w3.org/2001/XMLSchema#>

:User {
 :name      xsd:string   ;
 :birthDate xsd:date     ;
 :knows     @:User     * ;
 :worksFor  @:Company  *
}

:Company {
  :name     xsd:string    ;
  :code     xsd:integer   ;
  :employee @:User      *
}
""")

And let’s define some RDF data.

rudof.read_data_str("""
prefix : <http://example.org/>
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>

:alice a :Person ;
 :name      "Alice"                ;
 :birthDate "2005-03-01"^^xsd:date ;
 :worksFor  :acme                  ;
 :knows     :bob                   .
:bob a :Person   ;
 :name      "Robert Smith"         ;
 :birthDate "2003-01-02"^^xsd:date ;
 :worksFor  :acme                  ;
 :knows     :alice                 .
:acme  a :Company  ;
 :name      "Acme Inc." ;
 :code      23          .
""")
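To make the cardinalities in the :User shape concrete, the following sketch re-implements a very small part of the check in plain Python. It is only an illustration under simplifying assumptions, not how rudof works internally: the dictionary layout, the `USER_CARDINALITIES` table and the `check_user` helper are hypothetical names introduced here, shape references such as @:Company are not followed, and the xsd:date check is only approximate.

```python
from datetime import date

# Hypothetical in-memory view of one node's outgoing triples:
# predicate name -> list of object values (plain strings for illustration).
alice = {
    "name": ["Alice"],
    "birthDate": ["2005-03-01"],
    "knows": ["bob"],
    "worksFor": ["acme"],
    "type": ["Person"],  # extra predicate: tolerated because the shape is not CLOSED
}

# (min, max) occurrences per predicate in :User; None means unbounded (the * in ShExC).
USER_CARDINALITIES = {
    "name": (1, 1),
    "birthDate": (1, 1),
    "knows": (0, None),
    "worksFor": (0, None),
}

def is_iso_date(text):
    """Rough stand-in for an xsd:date check (time zones are ignored)."""
    try:
        date.fromisoformat(text)
        return True
    except ValueError:
        return False

def check_user(node):
    """Return a list of violations of the simplified :User constraints."""
    errors = []
    for predicate, (low, high) in USER_CARDINALITIES.items():
        count = len(node.get(predicate, []))
        if count < low or (high is not None and count > high):
            errors.append(f"{predicate}: found {count} value(s)")
    for value in node.get("birthDate", []):
        if not is_iso_date(value):
            errors.append(f"birthDate: {value!r} is not a date")
    return errors

print(check_user(alice))                        # the node modelled on :alice
print(check_user({"name": ["Bob", "Robert"]}))  # two names and no birth date
```

A real ShEx validator also checks that the objects of :knows and :worksFor conform to the referenced shapes, which this sketch leaves out.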

In order to validate nodes in a graph, ShEx uses a ShapeMap, which associates a node selector with a shape.

The simplest shape map is a pair node@shape.

Rudof keeps the current shape map in a structure so it can be reused.

The method read_shapemap_str(shapemap) reads a shape map from a string and stores its value as the current shape map.

In the previous example, if we want to validate :alice as a :User we can use:

rudof.read_shapemap_str(":alice@:User")

Once the ShEx schema and the shape map have been loaded into rudof, the current RDF data can be validated with the validate_shex() method:

results = rudof.validate_shex()

validate_shex() returns a ResultShapeMap object, whose show_as_table() method displays the validation results as a table.

print(results.show_as_table())
╭────────┬───────┬────────╮
│ Node   │ Shape │ Status │
├────────┼───────┼────────┤
│ :alice │ :User │ OK     │
╰────────┴───────┴────────╯

If we try to validate :alice as a :Company, the validation fails:

rudof.read_shapemap_str(":alice@:Company")
results = rudof.validate_shex()
print(results.show_as_table())
╭────────┬──────────┬────────╮
│ Node   │ Shape    │ Status │
├────────┼──────────┼────────┤
│ :alice │ :Company │ FAIL   │
╰────────┴──────────┴────────╯

Passing with_details=True to show_as_table() adds a column with more details about the reason for the failure.

print(results.show_as_table(with_details=True))
╭────────┬──────────┬────────┬─────────────────────────────────────────────────────────────────────╮
│ Node   │ Shape    │ Status │ Details                                                             │
├────────┼──────────┼────────┼─────────────────────────────────────────────────────────────────────┤
│ :alice │ :Company │ FAIL   │ Error Shape 1 failed for node http://example.org/alice with errors  │
│        │          │        │                                                                     │
╰────────┴──────────┴────────┴─────────────────────────────────────────────────────────────────────╯

The shape map can contain several node@shape pairs separated by commas, so it is possible to run several validations at once:

rudof.read_shapemap_str(":alice@:User, :bob@:User")
results = rudof.validate_shex()
print(results.show_as_table())
╭────────┬───────┬────────╮
│ Node   │ Shape │ Status │
├────────┼───────┼────────┤
│ :alice │ :User │ OK     │
├────────┼───────┼────────┤
│ :bob   │ :User │ OK     │
╰────────┴───────┴────────╯
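The structure of such a comma-separated shape map can be pictured with a few lines of plain Python. The helper `parse_simple_shapemap` below is a hypothetical name used only for illustration. It is not rudof's parser and it handles only prefixed names like :alice@:User, not full IRIs in angle brackets or the richer node selectors of the ShapeMap language.

```python
def parse_simple_shapemap(text):
    """Split 'node@shape, node@shape' into (node, shape) pairs.

    Illustrative only: assumes prefixed names that contain no ',' or '@'.
    """
    pairs = []
    for association in text.split(","):
        node, sep, shape = association.strip().partition("@")
        if not sep or not node or not shape:
            raise ValueError(f"not a node@shape pair: {association!r}")
        pairs.append((node, shape))
    return pairs

pairs = parse_simple_shapemap(":alice@:User, :bob@:User")
print(pairs)  # [(':alice', ':User'), (':bob', ':User')]
```

Each pair corresponds to one row of the result table shown above.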

Sometimes it is necessary to process the results in Python. The ResultShapeMap class provides a to_list() method, which returns a list of (node, shape, status) tuples.

for (node, shape, status) in results.to_list():
    print(f"Node: {node.show()}")
    print(f"Shape: {shape.show()}")
    print(f"Conformant?: {status.is_conformant()}")
    print(f"Appinfo: {status.as_json()}")
    print("")
Node: http://example.org/bob
Shape: http://example.org/User
Conformant?: True
Appinfo: {'info': [{'reason': 'Shape passed. Node http://example.org/bob, shape 0: Shape  Preds: http://example.org/name,http://example.org/birthDate,http://example.org/knows,http://example.org/worksFor, TripleExpr: RBE [C0;C1;C2*;C3*;], Keys: [http://example.org/name -> {C0}, http://example.org/birthDate -> {C1}, http://example.org/knows -> {C2}, http://example.org/worksFor -> {C3}], conds: [C0 -> xsd:string, C1 -> xsd:date, C2 -> @0, C3 -> @1], References: [http://example.org/knows->0, http://example.org/worksFor->1]'}], 'reason': 'Shape passed. Node :bob, shape 0: :User = {(:name xsd:string ; :birthDate xsd:date ; :knows @0* ; :worksFor @1* ; )}\n', 'status': 'conformant'}

Node: http://example.org/alice
Shape: http://example.org/User
Conformant?: True
Appinfo: {'info': [{'reason': 'Shape passed. Node http://example.org/alice, shape 0: Shape  Preds: http://example.org/name,http://example.org/birthDate,http://example.org/knows,http://example.org/worksFor, TripleExpr: RBE [C0;C1;C2*;C3*;], Keys: [http://example.org/name -> {C0}, http://example.org/birthDate -> {C1}, http://example.org/knows -> {C2}, http://example.org/worksFor -> {C3}], conds: [C0 -> xsd:string, C1 -> xsd:date, C2 -> @0, C3 -> @1], References: [http://example.org/worksFor->1, http://example.org/knows->0]'}], 'reason': 'Shape passed. Node :alice, shape 0: :User = {(:name xsd:string ; :birthDate xsd:date ; :knows @0* ; :worksFor @1* ; )}\n', 'status': 'conformant'}
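Once the results are available as tuples, they can be aggregated like any other Python data. The sketch below counts conformant and non-conformant entries per shape. The `summarize` helper and the `FakeTerm` and `FakeStatus` classes are hypothetical: the fake classes only mimic the show() and is_conformant() methods used in the loop above, so that the example runs without rudof installed.

```python
from collections import Counter

def summarize(result_triples):
    """Count conformant and non-conformant entries per shape."""
    summary = Counter()
    for node, shape, status in result_triples:
        outcome = "conformant" if status.is_conformant() else "non-conformant"
        summary[(shape.show(), outcome)] += 1
    return summary

# Stand-in classes (not part of pyrudof) that expose the same two methods.
class FakeTerm:
    def __init__(self, text):
        self.text = text

    def show(self):
        return self.text

class FakeStatus:
    def __init__(self, ok):
        self.ok = ok

    def is_conformant(self):
        return self.ok

fake_results = [
    (FakeTerm("http://example.org/alice"), FakeTerm("http://example.org/User"), FakeStatus(True)),
    (FakeTerm("http://example.org/bob"), FakeTerm("http://example.org/User"), FakeStatus(True)),
    (FakeTerm("http://example.org/alice"), FakeTerm("http://example.org/Company"), FakeStatus(False)),
]

for (shape, outcome), count in summarize(fake_results).items():
    print(shape, outcome, count)
```

With real validation results, the same function could presumably be called as summarize(results.to_list()), assuming the tuples have the form shown above.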

Before the next section, we reset the ShEx schema, the shape map and the current RDF data by creating a fresh Rudof instance.

# @title
rudof = Rudof(RudofConfig())

Validating SPARQL endpoints

It is also possible to validate RDF data that is not local but is available through a SPARQL endpoint such as Wikidata or DBpedia. Let’s start with Wikidata:

rudof.use_endpoint("wikidata")

We can declare a simple schema for Wikidata entities and validate a node against it as follows:

rudof.read_shex_str("""
prefix : <http://example.org/>
prefix wd: <http://www.wikidata.org/entity/>
prefix wdt: <http://www.wikidata.org/prop/direct/>

:Researcher {
  wdt:P31 [ wd:Q5 ] ; # Instance of Human
  wdt:P19 @:Place   ; # BirthPlace
}
:Place {
  wdt:P17 @:Country * ; # Country
}
:Country {}
""")
rudof.read_shapemap_str("wd:Q80@:Researcher")
results = rudof.validate_shex()
print(results.show_as_table(with_details=True))
╭────────────────────────────────────┬─────────────┬────────┬──────────────────────────────────────────────────────────────────────────────────╮
│ Node                               │ Shape       │ Status │ Details                                                                          │
├────────────────────────────────────┼─────────────┼────────┼──────────────────────────────────────────────────────────────────────────────────┤
│ http://www.wikidata.org/entity/Q80 │ :Researcher │ OK     │ Shape passed. Node http://www.wikidata.org/entity/Q80, shape 0: :Researcher = {( │
│                                    │             │        │ wdt:P31 [wd:Q5 ] ; wdt:P19 @1 ; )}                                               │
│                                    │             │        │                                                                                  │
╰────────────────────────────────────┴─────────────┴────────┴──────────────────────────────────────────────────────────────────────────────────╯
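When validating against an endpoint, the triples of a node have to be fetched remotely instead of being read from a local graph. How rudof does this is not described here, so the following is only a sketch under assumptions: it builds (but does not send) a SPARQL protocol request URL for the outgoing triples of a node, using the public Wikidata query service address. The `outgoing_triples_url` helper and the exact query text are illustrative choices, not rudof's API.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Public Wikidata query service; assumed here as the target endpoint.
WIKIDATA_ENDPOINT = "https://query.wikidata.org/sparql"

def outgoing_triples_url(node_iri, endpoint=WIKIDATA_ENDPOINT):
    """Build a GET URL asking the endpoint for a node's outgoing triples."""
    query = f"SELECT ?p ?o WHERE {{ <{node_iri}> ?p ?o }}"
    return endpoint + "?" + urlencode({"query": query, "format": "json"})

url = outgoing_triples_url("http://www.wikidata.org/entity/Q80")
print(url)

# Decode the query parameter again to inspect what would be sent.
decoded_query = parse_qs(urlsplit(url).query)["query"][0]
print(decoded_query)
```

No network access is needed to run this sketch, since the request is never sent.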

Visualizing ShEx schemas

rudof can be used to convert ShEx to diagrams in UML-like style. The converter generates a PlantUML string which can be written to a file and converted to an image using the PlantUML tool.

from pyrudof import UmlGenerationMode
rudof.read_shex_str("""
prefix : <http://example.org/>
prefix xsd:    <http://www.w3.org/2001/XMLSchema#>

:User {
 :name     xsd:string  ;
 :worksFor @:Company * ;
 :address  @:Address   ;
 :knows    @:User
}

:Company {
  :name     xsd:string     ;
  :code     xsd:string     ;
  :employee @:User
}
:Address {
  :name     xsd:string ;
  :zip_code xsd:string
}
""")
plant_uml = rudof.shex2plantuml_file(UmlGenerationMode(), 'out.puml')

Now we install the PlantUML tools needed to process the generated out.puml file and convert it to an image.

# @title
! pip install plantuml
! pip install ipython
!python -m plantuml out.puml
from IPython.display import Image
Image("out.png")
[Image: UML-like diagram generated from the ShEx schema above]