ShEx

This document contains a short introduction to ShEx using rudof.

Preliminaries: Install and configure rudof

First, we install and configure rudof.

# @title
!pip install pyrudof

from pyrudof import Rudof, RudofConfig, RudofError
rudof = Rudof(RudofConfig())

Validating RDF using ShEx

ShEx (Shape Expressions) is a concise and human-readable language to describe and validate RDF data.

A ShEx schema consists of shape declarations and can be written in several formats: a compact syntax (ShExC), a JSON-LD syntax (ShExJ), and an RDF-based one (ShExR).

Let’s start by defining a simple ShEx schema:

rudof.read_shex("""
prefix :     <http://example.org/>
prefix xsd:  <http://www.w3.org/2001/XMLSchema#>

:User {
 :name      xsd:string   ;
 :birthDate xsd:date     ;
 :knows     @:User     * ;
 :worksFor  @:Company  *
}

:Company {
  :name     xsd:string    ;
  :code     xsd:integer   ;
  :employee @:User      *
}
""")

And let’s define some RDF data.

rudof.read_data("""
prefix : <http://example.org/>
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>

:alice a :Person ;
 :name      "Alice"                ;
 :birthDate "2005-03-01"^^xsd:date ;
 :worksFor  :acme                  ;
 :knows     :bob                   .
:bob a :Person   ;
 :name      "Robert Smith"         ;
 :birthDate "2003-01-02"^^xsd:date ;
 :worksFor  :acme                  ;
 :knows     :alice                 .
:acme  a :Company  ;
 :name      "Acme Inc." ;
 :code      23          .
""")

In order to validate nodes in a graph, ShEx uses a ShapeMap, which associates a node selector with a shape.

The simplest shape map is a pair node@shape.

Rudof keeps the current shape map in a structure so it can be reused.

The method read_shapemap(shapemap) reads a shape map from a string and stores its value as the current shape map.

In the previous example, if we want to validate :alice as a :User, we can use:

rudof.read_shapemap(":alice@:User")

Once the ShEx schema and the Shapemap have been added to rudof, it is possible to validate the current RDF data with the validate_shex() method:

from pyrudof import ResultShexValidationFormat

rudof.validate_shex()
results = rudof.serialize_shex_validation_results(format=ResultShexValidationFormat.Compact)
print(results)
Results:
╭────────┬───────┬────────╮
│ Node   │ Shape │ Status │
├────────┼───────┼────────┤
│ :alice │ :User │ OK     │
╰────────┴───────┴────────╯

If we instead try to validate :alice as a :Company, the validation fails, and the Details format reports the error:

from pyrudof import ResultShexValidationFormat

rudof.read_shapemap(":alice@:Company")
rudof.validate_shex()
results = rudof.serialize_shex_validation_results(format=ResultShexValidationFormat.Details)
print(results)
Results:
╭────────┬──────────┬────────┬─────────────────────────────────────────────────────────────────────╮
│ Node   │ Shape    │ Status │ Details                                                             │
├────────┼──────────┼────────┼─────────────────────────────────────────────────────────────────────┤
│ :alice │ :Company │ FAIL   │ Error Shape 1 failed for node http://example.org/alice with errors  │
│        │          │        │                                                                     │
╰────────┴──────────┴────────┴─────────────────────────────────────────────────────────────────────╯
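The Details output above reports the failure without naming the cause. One way to see it: in ShEx, a triple constraint with no cardinality marker must match exactly once, while `*` allows zero or more, and shapes are open by default, so properties not mentioned in the shape are ignored. The following is a minimal sketch of that cardinality check in plain Python. It is not how rudof works internally, it ignores datatypes and shape references, and the names `USER`, `COMPANY`, and `cardinality_violations` are hypothetical.

```python
from collections import Counter

# Outgoing arcs of :alice from the data above, as (predicate, object) pairs.
alice_arcs = [
    ("rdf:type", ":Person"),
    (":name", "Alice"),
    (":birthDate", "2005-03-01"),
    (":worksFor", ":acme"),
    (":knows", ":bob"),
]

# (min, max) occurrences per predicate; None means unbounded.
# No marker in ShExC means exactly one, "*" means zero or more.
USER = {
    ":name": (1, 1),
    ":birthDate": (1, 1),
    ":knows": (0, None),
    ":worksFor": (0, None),
}
COMPANY = {
    ":name": (1, 1),
    ":code": (1, 1),
    ":employee": (0, None),
}

def cardinality_violations(arcs, shape):
    """Return (predicate, count) pairs whose count falls outside the bounds."""
    counts = Counter(predicate for predicate, _ in arcs)
    problems = []
    for predicate, (low, high) in shape.items():
        n = counts[predicate]
        if n < low or (high is not None and n > high):
            problems.append((predicate, n))
    return problems

print(cardinality_violations(alice_arcs, USER))     # []
print(cardinality_violations(alice_arcs, COMPANY))  # [(':code', 0)]
```

Under these simplifications, :alice fails as a :Company because she has no :code value, while her extra properties are not a problem.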

The shapemap can contain a list of nodes and shapes, so it is possible to run several validations like:

rudof.read_shapemap(":alice@:User, :bob@:User")
rudof.validate_shex()
results = rudof.serialize_shex_validation_results()
print(results)
Results:
╭────────┬───────┬────────┬──────────────────────────────────────────────────────────────────────────────────╮
│ Node   │ Shape │ Status │ Details                                                                          │
├────────┼───────┼────────┼──────────────────────────────────────────────────────────────────────────────────┤
│ :alice │ :User │ OK     │ Shape passed. Node :alice, shape 0: :User = {(:name xsd:string ; :birthDate xsd: │
│        │       │        │ date ; :knows @0* ; :worksFor @1* ; )}                                           │
│        │       │        │                                                                                  │
├────────┼───────┼────────┼──────────────────────────────────────────────────────────────────────────────────┤
│ :bob   │ :User │ OK     │ Shape passed. Node :bob, shape 0: :User = {(:name xsd:string ; :birthDate xsd:da │
│        │       │        │ te ; :knows @0* ; :worksFor @1* ; )}                                             │
│        │       │        │                                                                                  │
╰────────┴───────┴────────┴──────────────────────────────────────────────────────────────────────────────────╯
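When many nodes have to be checked against the same shape, the shape map string can be assembled programmatically. The helper below is a small sketch that only covers the simple comma-separated node@shape form used above; the name `fixed_shape_map` is hypothetical and not part of pyrudof.

```python
def fixed_shape_map(nodes, shape):
    """Join node@shape associations with commas, e.g. ":alice@:User, :bob@:User"."""
    return ", ".join(f"{node}@{shape}" for node in nodes)

shape_map = fixed_shape_map([":alice", ":bob"], ":User")
print(shape_map)  # :alice@:User, :bob@:User
```

The resulting string can then be passed to rudof.read_shapemap(...) in the same way as the literal strings in the cells above.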

We reset the status of the ShEx schema, the Shapemap and the current RDF data for the next section.

# @title
rudof = Rudof(RudofConfig())

Validating SPARQL endpoints

It is also possible to validate RDF data that is not local but is available through a SPARQL endpoint such as Wikidata or DBpedia. Let’s start with Wikidata:

rudof.read_data(endpoint="wikidata")

We can declare a simple shape for Wikidata entities and validate the entity wd:Q80 against it as follows:

rudof.read_shex("""
prefix : <http://example.org/>
prefix wd: <http://www.wikidata.org/entity/>
prefix wdt: <http://www.wikidata.org/prop/direct/>

:Researcher {
  wdt:P31 [ wd:Q5 ] ; # Instance of Human
  wdt:P19 @:Place   ; # BirthPlace
}
:Place {
  wdt:P17 @:Country * ; # Country
}
:Country {}
""")
rudof.read_shapemap("wd:Q80@:Researcher")
rudof.validate_shex()
results = rudof.serialize_shex_validation_results()
print(results)
Results:
╭──────────────────────────────────────┬─────────────┬────────┬──────────────────────────────────────────────────────────────────────────────────╮
│ Node                                 │ Shape       │ Status │ Details                                                                          │
├──────────────────────────────────────┼─────────────┼────────┼──────────────────────────────────────────────────────────────────────────────────┤
│ <http://www.wikidata.org/entity/Q80> │ :Researcher │ OK     │ Shape passed. Node <http://www.wikidata.org/entity/Q80>, shape 0: :Researcher =  │
│                                      │             │        │ {(wdt:P31 [wd:Q5 ] ; wdt:P19 @1 ; )}                                             │
│                                      │             │        │                                                                                  │
╰──────────────────────────────────────┴─────────────┴────────┴──────────────────────────────────────────────────────────────────────────────────╯
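Conceptually, checking wd:Q80 against :Researcher requires the outgoing triples of that node, which have to be fetched from the endpoint. The sketch below builds the kind of SPARQL request that would retrieve them from the public Wikidata query service. It is an illustration under assumptions, not necessarily the query that rudof issues; the helper name `neighbourhood_query_url` is hypothetical, and no request is actually sent.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Public Wikidata SPARQL endpoint.
WIKIDATA_ENDPOINT = "https://query.wikidata.org/sparql"

def neighbourhood_query_url(node_iri, endpoint=WIKIDATA_ENDPOINT):
    """Build a GET URL asking for every (predicate, object) of node_iri."""
    query = f"SELECT ?p ?o WHERE {{ <{node_iri}> ?p ?o }}"
    return endpoint + "?" + urlencode({"query": query, "format": "json"})

url = neighbourhood_query_url("http://www.wikidata.org/entity/Q80")
print(url)

# Decode the URL again to show the SPARQL text it carries.
print(parse_qs(urlsplit(url).query)["query"][0])
```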

Visualizing ShEx schemas

rudof can be used to convert ShEx to diagrams in UML-like style. The converter generates a PlantUML string which can be written to a file and converted to an image using the PlantUML tool.

from pyrudof import ConversionMode, ResultConversionMode, ConversionFormat, ResultConversionFormat
shex_schema = """
prefix : <http://example.org/>
prefix xsd:    <http://www.w3.org/2001/XMLSchema#>

:User {
 :name     xsd:string  ;
 :worksFor @:Company * ;
 :addres   @:Address   ;
 :knows    @:User
}

:Company {
  :name     xsd:string     ;
  :code     xsd:string     ;
  :employee @:User
}
:Address {
  :name     xsd:string ;
  :zip_code xsd:string
}
"""
plant_uml = rudof.convert_schemas(
    shex_schema, 
    input_mode=ConversionMode.ShEx,
    output_mode=ResultConversionMode.Uml,
    input_format=ConversionFormat.ShExC,
    output_format=ResultConversionFormat.PlantUML
)

with open('out.puml', 'w') as _f:
    _f.write(plant_uml)

Now we install the PlantUML tools needed to render the generated plant_uml string as an image:

# @title
! pip install plantuml
! pip install ipython
!python -m plantuml out.puml
from IPython.display import Image
[{'filename': 'out.puml', 'gen_success': True}]
Image("out.png")
[Image: out.png, the UML-like diagram of the ShEx schema rendered by PlantUML]