RDF#

This document contains a short introduction to RDF using rudof.

Preliminaries: install and configure rudof#

The library is available as pyrudof.

!pip install pyrudof
Requirement already satisfied: pyrudof in /opt/hostedtoolcache/Python/3.11.13/x64/lib/python3.11/site-packages (0.1.135)

The main entry point is a class called Rudof, through which most of the functionality is provided.

from pyrudof import Rudof, RudofConfig

In order to initialize that class, it is possible to pass a RudofConfig instance which contains configuration parameters for customization. An instance of RudofConfig can be obtained from the default initialization method or can be read from a TOML file.

rudof = Rudof(RudofConfig())

We will use the Image class from IPython to display images of RDF graphs generated with rudof and PlantUML.

!pip install plantuml
from IPython.display import Image # For displaying images
Requirement already satisfied: plantuml in /opt/hostedtoolcache/Python/3.11.13/x64/lib/python3.11/site-packages (0.3.0)
Requirement already satisfied: httplib2 in /opt/hostedtoolcache/Python/3.11.13/x64/lib/python3.11/site-packages (from plantuml) (0.31.0)
Requirement already satisfied: pyparsing<4,>=3.0.4 in /opt/hostedtoolcache/Python/3.11.13/x64/lib/python3.11/site-packages (from httplib2->plantuml) (3.2.5)

The method reset_all() clears any content previously loaded into rudof.

RDF data model#

RDF is based on statements, or triples, of the form <subject> <predicate> <object>, where predicates are identified by IRIs and, in the most basic form, subjects and objects are also IRIs. An example could be: <http://example.org/alice> <http://example.org/knows> <http://example.org/bob>.

In rudof, it is possible to load a triple as follows:

rudof.read_data_str("<http://example.org/alice> <http://example.org/knows> <http://example.org/bob> .")

An RDF graph is defined as a set of triples. In the basic notation, a set of triples is written as one triple after another, each terminated by a dot. So we can add more statements as:

rudof.read_data_str("""
  <http://example.org/alice> <http://example.org/knows> <http://example.org/carol> .
  <http://example.org/alice> <http://example.org/worksFor> <http://example.org/acme> .
  <http://example.org/alice> <http://example.org/birthPlace> <http://example.org/spain> .
  <http://example.org/carol> <http://example.org/knows> <http://example.org/bob> .
  <http://example.org/bob> <http://example.org/knows> <http://example.org/alice> .
""")

Rudof can be used to visualize small RDF graphs.

uml = rudof.data2plantuml_file('out.puml')
!python -m plantuml out.puml
Image(f"out.png")
[{'filename': 'out.puml', 'gen_success': True}]
_images/5e54d1c6b3cc8c2334bddec8d0ca42b84f572b18cc847f6805f77f61ca48b9b9.png
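The basic notation used so far (three IRIs in angle brackets followed by a dot) is simple enough to read with a few lines of plain Python. The following is only a sketch to illustrate the set-of-triples model: it is not how rudof parses data, it only handles statements made of IRIs, and the names TRIPLE and parse_iri_triples are made up for this example.

```python
import re

# Matches one statement made of three IRIs in angle brackets followed by a dot.
# Illustrative only: literals, blank nodes and comments are not handled.
TRIPLE = re.compile(r"<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s*\.")

def parse_iri_triples(text):
    """Return the set of (subject, predicate, object) IRI triples found in text."""
    return {m.groups() for m in TRIPLE.finditer(text)}

data = """
  <http://example.org/alice> <http://example.org/knows> <http://example.org/carol> .
  <http://example.org/carol> <http://example.org/knows> <http://example.org/bob> .
  <http://example.org/alice> <http://example.org/knows> <http://example.org/carol> .
"""

graph = parse_iri_triples(data)
# The first and third statements are identical, so the set holds two triples.
print(len(graph))
```

Because the graph is a set, repeating a statement does not change it.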

For interoperability, it is better to use IRIs from agreed vocabularies. In this case, we could use the predicates <https://schema.org/knows>, <https://schema.org/worksFor> and <https://schema.org/birthPlace> from Schema.org, and <http://dbpedia.org/resource/Spain> from DBpedia to represent Spain.

rudof.reset_all()
rudof.read_data_str("""
  <http://example.org/alice> <https://schema.org/knows>      <http://example.org/carol> .
  <http://example.org/alice> <https://schema.org/worksFor>   <http://example.org/acme> .
  <http://example.org/alice> <https://schema.org/birthPlace> <http://dbpedia.org/resource/Spain> .
  <http://example.org/carol> <https://schema.org/knows>      <http://example.org/bob> .
  <http://example.org/bob>   <https://schema.org/knows>      <http://example.org/alice> .
""")
!rm -f out.puml out.png
uml = rudof.data2plantuml_file('out.puml')
!python -m plantuml out.puml
Image(f"out.png")
[{'filename': 'out.puml', 'gen_success': True}]
_images/8986db7b34feb448ccedfed9876ee54fa77028b74f5901b47243d9cb3ab8338f.png

Prefix declarations and qualified names#

Long IRIs can make RDF difficult to read. To make RDF documents more readable, it is possible to declare prefixes for IRIs and use qualified names of the form prefix:name. The previous example could be rewritten as:

rudof.reset_all()
rudof.read_data_str("""
 prefix : <http://example.org/>
 prefix schema: <http://schema.org/>
 prefix dbr: <http://dbpedia.org/resource/>

 :alice schema:knows :carol .
 :alice schema:worksFor :acme .
 :alice schema:birthPlace dbr:Spain .
 :carol schema:knows :bob .
 :bob   schema:knows :alice .
""")
!rm -f out.puml out.png
uml = rudof.data2plantuml_file('out.puml')
!python -m plantuml out.puml
Image(f"out.png")
[{'filename': 'out.puml', 'gen_success': True}]
_images/1b941990cc125ce1a921650e1764408d2e90ebba023e526e7ee1f5948fe6b03f.png
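A qualified name is just shorthand for a full IRI: the prefix is replaced by the IRI it was declared with. The sketch below shows that expansion in plain Python. The function name expand and the prefixes dictionary are assumptions made for this illustration; they are not part of pyrudof.

```python
def expand(qname, prefixes):
    """Expand a qualified name such as 'schema:knows' into a full IRI."""
    prefix, _, local = qname.partition(":")
    if prefix not in prefixes:
        raise KeyError(f"undeclared prefix: {prefix!r}")
    return prefixes[prefix] + local

# Same declarations as in the example above; "" stands for the empty prefix.
prefixes = {
    "": "http://example.org/",
    "schema": "http://schema.org/",
    "dbr": "http://dbpedia.org/resource/",
}

print(expand(":alice", prefixes))
print(expand("schema:knows", prefixes))
```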

RDF Literals#

Apart from IRIs, the objects in triples can also be literals, which can be seen as constants. There are 3 types of literals:

  • Plain strings, like "Robert Smith"

  • Language-tagged strings, like "Spain"@en or "España"@es

  • Datatype literals, like "23"^^xsd:integer

rudof.reset_all()
rudof.read_data_str("""
 prefix : <http://example.org/>
 prefix schema: <http://schema.org/>
 prefix dbr: <http://dbpedia.org/resource/>
 prefix xsd: <http://www.w3.org/2001/XMLSchema#>

 :alice schema:knows :carol .
 :alice schema:worksFor :acme .
 :alice schema:birthPlace dbr:Spain .
 :alice schema:birthDate "1990-01-01"^^xsd:date .
 :carol schema:knows :bob .
 :bob   schema:name "Robert Smith" .
 :bob   schema:knows :alice .
 :acme  schema:name "Acme Inc." .
""")
!rm -f out.puml out.png
uml = rudof.data2plantuml_file('out.puml')
!python -m plantuml out.puml
Image(f"out.png")
[{'filename': 'out.puml', 'gen_success': True}]
_images/58f0904b088ab629457e4b3d99b82a2e6bcc1440562b085a52c5ae0863d3679f.png
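The three kinds of literals differ only in what accompanies the lexical form: nothing, a language tag, or a datatype IRI. As a rough sketch (the Literal class below is hypothetical, is not a pyrudof type, and does not escape special characters), they could be modelled like this:

```python
from dataclasses import dataclass
from typing import Optional

XSD = "http://www.w3.org/2001/XMLSchema#"

@dataclass(frozen=True)
class Literal:
    """Hypothetical literal model: lexical form plus optional tag or datatype."""
    lexical: str
    language: Optional[str] = None   # e.g. "en"
    datatype: Optional[str] = None   # full datatype IRI

    def __str__(self):
        # No escaping of quotes or backslashes: illustration only.
        if self.language:
            return f'"{self.lexical}"@{self.language}'
        if self.datatype:
            return f'"{self.lexical}"^^<{self.datatype}>'
        return f'"{self.lexical}"'

print(Literal("Robert Smith"))
print(Literal("España", language="es"))
print(Literal("1990-01-01", datatype=XSD + "date"))
```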

Blank nodes#

Sometimes we want to make statements about things that do not have an IRI. For example, we may want to say that :alice knows someone who was born in Italy and works for Acme:

rudof.reset_all()
rudof.read_data_str("""
 prefix : <http://example.org/>
 prefix schema: <http://schema.org/>
 prefix dbr: <http://dbpedia.org/resource/>
 prefix xsd: <http://www.w3.org/2001/XMLSchema#>

 :alice schema:knows _:1 .
 _:1 schema:worksFor :acme .
 _:1 schema:birthPlace dbr:Italy .
""")
!rm -f out.puml out.png
uml = rudof.data2plantuml_file('out.puml')
!python -m plantuml out.puml
Image(f"out.png")
[{'filename': 'out.puml', 'gen_success': True}]
_images/fbbbcbb9ffc2f094a089cf3982554e128818f6e16ae3c3c7e76cb86dbf9f507c.png

The notation _:id represents a blank node. The id can be used to refer to the blank node within the definition of the RDF graph, but there is no guarantee that it will be preserved internally.
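One reason labels need not be preserved is that a blank node label is only meaningful inside the document where it appears. The sketch below illustrates the idea in plain Python with triples written as tuples of strings. The function relabel_blank_nodes is a made-up helper, and this is not a description of what rudof does internally.

```python
import itertools

_counter = itertools.count()  # source of fresh, globally unique suffixes

def relabel_blank_nodes(triples):
    """Return the triples with every blank node label replaced by a fresh one."""
    mapping = {}

    def fresh(term):
        if term.startswith("_:"):
            if term not in mapping:
                mapping[term] = f"_:b{next(_counter)}"
            return mapping[term]
        return term

    return {(fresh(s), p, fresh(o)) for s, p, o in triples}

# Two documents that happen to reuse the same local label _:1
doc1 = {(":alice", ":knows", "_:1"), ("_:1", ":birthPlace", "dbr:Italy")}
doc2 = {(":bob", ":knows", "_:1"), ("_:1", ":birthPlace", "dbr:Germany")}

# Relabelling each document separately keeps the two unnamed people apart;
# a plain doc1 | doc2 would wrongly treat them as the same node.
merged = relabel_blank_nodes(doc1) | relabel_blank_nodes(doc2)
print(sorted(merged))
```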

If we want to add that :bob knows someone who works for Acme and was born in Germany, we can do the following:

rudof.read_data_str("""
 prefix : <http://example.org/>
 prefix schema: <http://schema.org/>
 prefix dbr: <http://dbpedia.org/resource/>
 prefix xsd: <http://www.w3.org/2001/XMLSchema#>

 :bob schema:knows _:2 .
 _:2 schema:worksFor :acme .
 _:2 schema:birthPlace dbr:Germany .
""")
!rm -f out.puml out.png
uml = rudof.data2plantuml_file('out.puml')
!python -m plantuml out.puml
Image(f"out.png")
[{'filename': 'out.puml', 'gen_success': True}]
_images/480f4c279f158610fb17406414ab0db12b6c29be74ed89b97b563d4aab0ef8c4.png

Merging RDF data#

rudof.reset_all()

You may have noticed that we called rudof.reset_all() in some of the previous examples. The reason is that, by default, the read_data_str() method merges the data it reads with the RDF data already loaded in rudof.

As RDF data models are defined as sets of triples, they support merging quite easily.

One RDF graph plus another RDF graph is another RDF graph. This feature can be quite powerful and is one of the reasons why RDF can help data interoperability.
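Seen as sets, merging two graphs without blank nodes amounts to a set union. A minimal plain-Python illustration follows, with triples written as tuples of qualified names; it says nothing about how rudof stores or merges data.

```python
g1 = {
    (":x", "rdf:type", ":Person"),
    (":x", ":knows", ":y"),
}
g2 = {
    (":x", ":knows", ":y"),   # also present in g1
    (":u", ":knows", ":x"),
}

# Set union: a triple shared by both graphs appears only once in the result.
merged = g1 | g2
print(len(merged))
```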

rudof.read_data_str("""
prefix : <http://example.org/>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

:x a :Person     ;
   :name "Alice" ;
   :knows :y     .
:y a :Person     ;
   :name "Bob"   ;
   :knows :x     .
""")

The RDF data can easily be merged with other data.

rudof.read_data_str("""
prefix : <http://example.org/>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

:u a :Person     ;
   :name "Dave"  ;
   :knows :x, :y .
""", merge = True)
!rm -f out.puml out.png
uml = rudof.data2plantuml_file('out.puml')
!python -m plantuml out.puml
Image(f"out.png")
[{'filename': 'out.puml', 'gen_success': True}]
_images/690fc9fe24ef5d25433b9bc02daf7da157ab24bea9f022860f2c43e642200a3d.png

Different RDF data formats#

There are several RDF data formats like:

  • N-Triples: this is the most basic format, where an RDF graph is written as a list of triples, each terminated by a dot. This format doesn’t define any syntactic sugar like prefix declarations or other abbreviations.

  • Turtle: this format is intended for human readability. It provides syntactic sugar such as prefix declarations, numeric literals, and joining statements that share the same subject and predicate with a comma, or the same subject with a semicolon.

  • RDF/XML: this format was one of the first RDF formats and was defined when XML was popular.

  • JSON-LD: this format is a representation of RDF in JSON.
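To make the Turtle abbreviations concrete, the sketch below groups triples by subject and predicate and joins them with semicolons and commas. It is a simplified illustration, not a Turtle serializer and not what rudof does: terms are assumed to be already written as qualified names or quoted literals, and to_turtle_like is a name invented here.

```python
from collections import defaultdict

def to_turtle_like(triples):
    """Group triples by subject and predicate, using ';' and ',' as Turtle does."""
    by_subject = defaultdict(lambda: defaultdict(list))
    for s, p, o in sorted(triples):
        by_subject[s][p].append(o)
    blocks = []
    for s, preds in by_subject.items():
        # Same subject and predicate: objects separated by commas
        parts = [f"{p} {', '.join(objs)}" for p, objs in preds.items()]
        # Same subject: predicate-object groups separated by semicolons
        blocks.append(f"{s} " + " ;\n    ".join(parts) + " .")
    return "\n".join(blocks)

triples = {
    (":u", ":name", '"Dave"'),
    (":u", ":knows", ":x"),
    (":u", ":knows", ":y"),
}
print(to_turtle_like(triples))
```

The three triples are printed as a single statement block for :u, whereas N-Triples would need three full lines.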

Rudof can be used to convert between different RDF formats.

The following code converts the current RDF data to N-Triples:

from pyrudof import RDFFormat
ntriples = rudof.serialize_data(format=RDFFormat.NTriples)
print(ntriples)
<http://example.org/x> <http://example.org/knows> <http://example.org/y> .
<http://example.org/x> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://example.org/Person> .
<http://example.org/x> <http://example.org/name> "Alice" .
<http://example.org/y> <http://example.org/knows> <http://example.org/x> .
<http://example.org/y> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://example.org/Person> .
<http://example.org/y> <http://example.org/name> "Bob" .
<http://example.org/u> <http://example.org/knows> <http://example.org/x> .
<http://example.org/u> <http://example.org/knows> <http://example.org/y> .
<http://example.org/u> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://example.org/Person> .
<http://example.org/u> <http://example.org/name> "Dave" .

The following code converts it to RDF/XML:

rdfxml = rudof.serialize_data(format=RDFFormat.RDFXML)
print(rdfxml)
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns="http://example.org/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:xsd="http://www.w3.org/2001/XMLSchema#">
	<rdf:Description rdf:about="http://example.org/x">
		<knows rdf:resource="http://example.org/y"/>
		<rdf:type rdf:resource="http://example.org/Person"/>
		<name>Alice</name>
	</rdf:Description>
	<rdf:Description rdf:about="http://example.org/y">
		<knows rdf:resource="http://example.org/x"/>
		<rdf:type rdf:resource="http://example.org/Person"/>
		<name>Bob</name>
	</rdf:Description>
	<rdf:Description rdf:about="http://example.org/u">
		<knows rdf:resource="http://example.org/x"/>
		<knows rdf:resource="http://example.org/y"/>
		<rdf:type rdf:resource="http://example.org/Person"/>
		<name>Dave</name>
	</rdf:Description>
</rdf:RDF>

The following code converts it to JSON-LD:

jsonld = rudof.serialize_data(format=RDFFormat.JsonLd)
print(jsonld)
[{"@id":"http://example.org/x","http://example.org/knows":[{"@id":"http://example.org/y"}],"http://www.w3.org/1999/02/22-rdf-syntax-ns#type":[{"@id":"http://example.org/Person"}],"http://example.org/name":[{"@value":"Alice"}]},{"@id":"http://example.org/y","http://example.org/knows":[{"@id":"http://example.org/x"}],"http://www.w3.org/1999/02/22-rdf-syntax-ns#type":[{"@id":"http://example.org/Person"}],"http://example.org/name":[{"@value":"Bob"}]},{"@id":"http://example.org/u","http://example.org/knows":[{"@id":"http://example.org/x"},{"@id":"http://example.org/y"}],"http://www.w3.org/1999/02/22-rdf-syntax-ns#type":[{"@id":"http://example.org/Person"}],"http://example.org/name":[{"@value":"Dave"}]}]
rudof.reset_all()

Information about a node in an RDF graph#

rudof provides a command to get information about the neighbours of a node in an RDF graph. The neighbours are the outgoing arcs or the incoming arcs.

For example:

rudof.read_data_str("""
prefix : <http://example.org/>
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>

:alice a :Person ;
 :name      "Alice"                ;
 :birthDate "2005-03-01"^^xsd:date ;
 :worksFor  :acme                  ;
 :knows     :bob                   .
:bob a :Person   ;
 :name      "Robert Smith"         ;
 :birthDate "2003-01-02"^^xsd:date ;
 :worksFor  :acme                  ;
 :knows     :alice                 .
:acme a :Company ;
 :name "Acme Inc." .
""")
print(rudof.node_info(":alice", []))
Outgoing arcs
:alice
├─── :birthDate ─► "2005-03-01"^^xsd:date
├─── :knows ─► :bob
├─── :name ─► "Alice"
├─── :worksFor ─► :acme
└─── rdf:type ─► :Person

If you want to get the incoming edges, you can use:

print(rudof.node_info(":alice", [], show_incoming = True, show_outgoing = False))
Incoming arcs
:alice
▲
└─── :knows ── :bob

The list of predicates can be used to filter only the predicates that we are interested in:

print(rudof.node_info(":alice", [ "rdf:type", ":worksFor"]))
Outgoing arcs
:alice
├─── :worksFor ─► :acme
└─── rdf:type ─► :Person
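Conceptually, the neighbourhood of a node is obtained by scanning the triples for those that have the node as subject (outgoing arcs) or as object (incoming arcs), optionally keeping only some predicates. The following plain-Python sketch mirrors that idea. The function node_neighbours is hypothetical and only imitates the behaviour shown above; it is not the implementation of node_info.

```python
def node_neighbours(graph, node, predicates=(), incoming=False):
    """Return sorted (predicate, other_node) pairs around node.

    An empty predicates sequence means that no filter is applied.
    """
    arcs = []
    for s, p, o in graph:
        if predicates and p not in predicates:
            continue
        if not incoming and s == node:
            arcs.append((p, o))
        elif incoming and o == node:
            arcs.append((p, s))
    return sorted(arcs)

graph = {
    (":alice", "rdf:type", ":Person"),
    (":alice", ":worksFor", ":acme"),
    (":alice", ":knows", ":bob"),
    (":bob", ":knows", ":alice"),
}

print(node_neighbours(graph, ":alice", ["rdf:type", ":worksFor"]))
print(node_neighbours(graph, ":alice", incoming=True))
```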

The command node_info can be useful when you work with SPARQL endpoints, for example:

rudof = Rudof(RudofConfig())
rudof.use_endpoint("dbpedia")
print(rudof.node_info("dbr:Oviedo", ["foaf:depiction"]))
Outgoing arcs
dbr:Oviedo
├─── foaf:depiction ─► <http://commons.wikimedia.org/wiki/Special:FilePath/Oviedo_Landscape_(230140305).jpeg>
├─── foaf:depiction ─► <http://commons.wikimedia.org/wiki/Special:FilePath/C_44_OVIEDO_ESTE.jpg>
├─── foaf:depiction ─► <http://commons.wikimedia.org/wiki/Special:FilePath/Mapa_Parroquial_Uviéu_(color).jpg>
├─── foaf:depiction ─► <http://commons.wikimedia.org/wiki/Special:FilePath/Church_of_San_Isidoro_el_Real,_Oviedo_16.jpg>
├─── foaf:depiction ─► <http://commons.wikimedia.org/wiki/Special:FilePath/Santa_María_Naranco.jpg>
├─── foaf:depiction ─► <http://commons.wikimedia.org/wiki/Special:FilePath/Real_Monasterio_de_San_Pelayo_(Oviedo).jpg>
├─── foaf:depiction ─► <http://commons.wikimedia.org/wiki/Special:FilePath/Catedral_de_Oviedo_03.jpg>
├─── foaf:depiction ─► <http://commons.wikimedia.org/wiki/Special:FilePath/Escudo_de_Oviedo.svg>
├─── foaf:depiction ─► <http://commons.wikimedia.org/wiki/Special:FilePath/Oviedo_desde_el_monte_Naranco.jpg>
├─── foaf:depiction ─► <http://commons.wikimedia.org/wiki/Special:FilePath/15._Junta_General_del_Principado_de_Asturias_(36143894785).jpg>
├─── foaf:depiction ─► <http://commons.wikimedia.org/wiki/Special:FilePath/Ceremonia_de_entrega_de_los_Premios_Príncipe_de_Asturias_2010.jpg>
├─── foaf:depiction ─► <http://commons.wikimedia.org/wiki/Special:FilePath/Oviedo_Uría.jpg>
├─── foaf:depiction ─► <http://commons.wikimedia.org/wiki/Special:FilePath/11._Teatro_Campoamor_(36104135806).jpg>
├─── foaf:depiction ─► <http://commons.wikimedia.org/wiki/Special:FilePath/Oviedo-Plaza_del_Fontán.jpg>
├─── foaf:depiction ─► <http://commons.wikimedia.org/wiki/Special:FilePath/1._Iglesia_San_Julián_de_los_Prados_(35752657690).jpg>
├─── foaf:depiction ─► <http://commons.wikimedia.org/wiki/Special:FilePath/Oviedo02.jpg>
├─── foaf:depiction ─► <http://commons.wikimedia.org/wiki/Special:FilePath/Oviedo-ayuntamiento4.jpg>
├─── foaf:depiction ─► <http://commons.wikimedia.org/wiki/Special:FilePath/San_Miguel_de_Lillo_01.jpg>
├─── foaf:depiction ─► <http://commons.wikimedia.org/wiki/Special:FilePath/Día_de_América_en_Asturias-2015_35.jpg>
├─── foaf:depiction ─► <http://commons.wikimedia.org/wiki/Special:FilePath/La-Regenta-y-Catedral.jpg>
├─── foaf:depiction ─► <http://commons.wikimedia.org/wiki/Special:FilePath/Museo_arte_oviedo.jpg>
├─── foaf:depiction ─► <http://commons.wikimedia.org/wiki/Special:FilePath/Ayuntamiento-de-oviedo.jpg>
├─── foaf:depiction ─► <http://commons.wikimedia.org/wiki/Special:FilePath/C_44_OVIEDO_CENTRO.jpg>
├─── foaf:depiction ─► <http://commons.wikimedia.org/wiki/Special:FilePath/C_44_OVIEDO_OESTE.jpg>
├─── foaf:depiction ─► <http://commons.wikimedia.org/wiki/Special:FilePath/Fachada_principal_del_Teatro_Campoamor.jpg>
├─── foaf:depiction ─► <http://commons.wikimedia.org/wiki/Special:FilePath/Uvieu_flag.svg>
├─── foaf:depiction ─► <http://commons.wikimedia.org/wiki/Special:FilePath/Santa_María_del_Naranco._Oviedo.jpg>
├─── foaf:depiction ─► <http://commons.wikimedia.org/wiki/Special:FilePath/Estadio_Municipal_Carlos_Tartiere_(Real_Oviedo_S.A.D.).jpg>
└─── foaf:depiction ─► <http://commons.wikimedia.org/wiki/Special:FilePath/Oviedo,_Espanha_-_panoramio_(9).jpg>

Note that node_info can be simulated with SPARQL queries (see next chapter). Nevertheless, it is a quick and easy way to inspect the neighbourhood of a node, for example when you want to validate it.
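As a sketch of what such a simulation might look like, the helper below builds a SPARQL SELECT query that lists the outgoing or incoming arcs of a node given by its full IRI. The function name neighbours_query is an assumption made for this example, and whether node_info works this way internally is not stated here.

```python
def neighbours_query(node_iri, incoming=False):
    """Build a SPARQL SELECT query listing the arcs around node_iri."""
    if incoming:
        pattern = f"?other ?p <{node_iri}>"
    else:
        pattern = f"<{node_iri}> ?p ?other"
    return f"SELECT ?p ?other WHERE {{ {pattern} }}"

print(neighbours_query("http://example.org/alice"))
print(neighbours_query("http://example.org/alice", incoming=True))
```

The resulting string is an ordinary SPARQL query, so it could be run like any other query against the loaded data or an endpoint.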

Running SPARQL queries#

SPARQL is an RDF query language which is also available in rudof:

from pyrudof import Rudof, RudofConfig, QuerySolutions
rudof = Rudof(RudofConfig())
rudof.reset_all()
rdf = """
prefix :    <http://example.org/>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>

:alice a :Person ;
 :name      "Alice"                ;
 :birthDate "2005-03-01"^^xsd:date ;
 :worksFor  :acme                   .
:bob a :Person   ;
 :name      "Robert Smith"         ;
 :birthDate "2003-01-02"^^xsd:date ;
 :worksFor  :acme  .
:acme a :Company ;
 :name "Acme Inc." .
"""
rudof.read_data_str(rdf)
query = """
PREFIX : <http://example.org/>

SELECT ?person ?name ?date ?company WHERE {
  ?person a          :Person ;
          :name      ?name   ;
          :birthDate ?date   ;
          :worksFor  ?c   .
  ?c      :name      ?company .
}
"""

results = rudof.run_query_str(query)

Show the results:

print(results.show())
╭───┬─────────┬────────────────┬────────────────────────┬─────────────╮
│   │ ?person │ ?name          │ ?date                  │ ?company    │
├───┼─────────┼────────────────┼────────────────────────┼─────────────┤
│ 1 │ :bob    │ "Robert Smith" │ "2003-01-02"^^xsd:date │ "Acme Inc." │
├───┼─────────┼────────────────┼────────────────────────┼─────────────┤
│ 2 │ :alice  │ "Alice"        │ "2005-03-01"^^xsd:date │ "Acme Inc." │
╰───┴─────────┴────────────────┴────────────────────────┴─────────────╯
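Each row in the result corresponds to one way of assigning values to the variables so that every triple pattern in the WHERE clause matches a triple in the graph. The sketch below illustrates that idea with a naive matcher in plain Python. It is an assumption-laden toy (terms are plain strings, anything starting with "?" is a variable, and match is a made-up function), not how rudof or any SPARQL engine evaluates queries.

```python
def match(patterns, graph, binding=None):
    """Yield variable bindings that satisfy all triple patterns."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in graph:
        new = dict(binding)
        ok = True
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Bind the variable, or check it against its earlier binding
                if new.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from match(rest, graph, new)

graph = {
    (":alice", "rdf:type", ":Person"),
    (":alice", ":worksFor", ":acme"),
    (":bob", "rdf:type", ":Person"),
    (":bob", ":worksFor", ":acme"),
    (":acme", "rdf:type", ":Company"),
    (":acme", ":name", '"Acme Inc."'),
}
patterns = [
    ("?person", "rdf:type", ":Person"),
    ("?person", ":worksFor", "?c"),
    ("?c", ":name", "?company"),
]

solutions = list(match(patterns, graph))
rows = sorted(b["?person"] for b in solutions)
print(rows)
```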

Example about Tim Berners-Lee#

The following is an example of RDF used in some slides.

It describes a node :timbl that represents Tim Berners-Lee, stating that he was born in London in 1955 and knows someone who was born in Spain.

rudof.read_data_str("""
prefix :       <http://example.org/>
prefix xsd:    <http://www.w3.org/2001/XMLSchema#>
prefix rdf:    <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix schema: <http://schema.org/>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>

:timbl  rdf:type    :Human ;
        rdfs:label  "Tim Berners-Lee" ;
        :birthPlace :london ;
        :birthDate  "1955-06-08"^^xsd:date ;
        :employer   :CERN ;
        :knows      _:1 .
_:1     :birthPlace :Spain .
:CERN   rdf:type    :Organization .
:london rdf:type    :City, :Metropolis ;
        :country    :UK .
""")

This graph can be visualized as:

!rm -f out.puml out.png
uml = rudof.data2plantuml_file('out.puml')
!python -m plantuml out.puml
Image(f"out.png")
[{'filename': 'out.puml', 'gen_success': True}]
_images/e900b346ec316679982c33b9928b35e68bc985acca89f977f2f5a64d3961442d.png

And a simple SPARQL query:

rudof.read_query_str("""
prefix :       <http://example.org/>
prefix rdf:    <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?person ?date ?place ?country WHERE {
  ?person rdf:type    :Human    ;
          :birthDate  ?date     ;
          :birthPlace ?place .
  ?place  :country    ?country .
}
""")
results = rudof.run_current_query_select()
print(results.show())
╭───┬─────────┬────────────────────────┬─────────┬──────────╮
│   │ ?person │ ?date                  │ ?place  │ ?country │
├───┼─────────┼────────────────────────┼─────────┼──────────┤
│ 1 │ :timbl  │ "1955-06-08"^^xsd:date │ :london │ :UK      │
╰───┴─────────┴────────────────────────┴─────────┴──────────╯