Introduction to DCTAP

This document contains a short introduction to DCTAP using rudof.

!pip install pyrudof
from pyrudof import Rudof, RudofConfig
rudof = Rudof(RudofConfig())
!pip install ipython # If not already installed
!pip install plantuml
from IPython.display import Image # For displaying images

What is DCTAP?

DCTAP (Dublin Core Tabular Application Profiles) is a model for defining metadata application profiles in a tabular format.

Models can therefore be written in CSV and then converted to other schema languages such as ShEx or SHACL.

Converting DCTAP to ShEx

Rudof supports DCTAP: it can read DCTAP files in CSV or Excel format and convert those models to other schema languages.

DCTAP represents shapes in tabular form, using CSV or a spreadsheet format such as XLSX. As an example, the following CSV data defines two shapes, Person and Company:

dctap_str = """shapeId,propertyId,Mandatory,Repeatable,valueDatatype,valueShape
Person,name,true,false,xsd:string,
,birthdate,false,false,xsd:date,
,worksFor,false,true,,Company
Company,name,true,false,xsd:string,
,employee,false,true,,Person
"""
rudof.read_dctap_str(dctap_str)
dctap = rudof.get_dctap()
print(dctap)
Shape(Person)  
 name xsd:string 
 birthdate xsd:date ?
 worksFor @Company *
Shape(Company)  
 name xsd:string 
 employee @Person *
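The summary printed above can be read as two rules: a row with an empty shapeId belongs to the shape named in an earlier row, and the Mandatory/Repeatable flags together determine the cardinality marker. The following is a minimal sketch of those two rules using only the Python standard library. It is not how rudof parses DCTAP; the helper names `cardinality` and `group_shapes` are hypothetical, and the `+` marker for mandatory, repeatable properties is an assumption, since that combination does not occur in the example.

```python
import csv
import io

DCTAP_CSV = """shapeId,propertyId,Mandatory,Repeatable,valueDatatype,valueShape
Person,name,true,false,xsd:string,
,birthdate,false,false,xsd:date,
,worksFor,false,true,,Company
Company,name,true,false,xsd:string,
,employee,false,true,,Person
"""


def cardinality(mandatory: bool, repeatable: bool) -> str:
    """Map the two DCTAP flags to a ShEx-style cardinality marker."""
    if mandatory:
        # "+" (one or more) is assumed; "" means exactly one.
        return "+" if repeatable else ""
    # "*" means zero or more; "?" means optional.
    return "*" if repeatable else "?"


def group_shapes(text: str) -> dict:
    """Group the CSV rows by shape (hypothetical helper, not part of pyrudof)."""
    shapes = {}
    current = None
    for row in csv.DictReader(io.StringIO(text)):
        # A blank shapeId means the row continues the shape named above it.
        current = row["shapeId"] or current
        if row["valueShape"]:
            value = "@" + row["valueShape"]   # reference to another shape
        else:
            value = row["valueDatatype"]      # literal datatype
        marker = cardinality(row["Mandatory"] == "true",
                             row["Repeatable"] == "true")
        statement = f"{row['propertyId']} {value} {marker}".rstrip()
        shapes.setdefault(current, []).append(statement)
    return shapes


shapes = group_shapes(DCTAP_CSV)
for shape_id, statements in shapes.items():
    print(shape_id, statements)
```

Each statement in the resulting dictionary corresponds to one line of the summary that rudof prints for the same CSV.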

The DCTAP model obtained can then be converted to ShEx:

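# Convert the loaded DCTAP model into a ShEx schema held by rudof
rudof.dctap2shex()
from pyrudof import ShExFormatter, UmlGenerationMode
shex = rudof.get_shex()

# Serialize without colors so the text can be read back as a schema later
result = rudof.serialize_shex(shex, ShExFormatter().without_colors())
print(result)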
prefix dct: <http://purl.org/dc/terms/>
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix owl: <http://www.w3.org/2002/07/owl#>
prefix foaf: <http://xmlns.com/foaf/0.1/>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
prefix sdo: <https://schema.org/>
prefix schema: <http://schema.org/>
prefix ex: <http://example.org/>
prefix base: <http://base/>
ex:Person { ex:name xsd:string; ex:birthdate xsd:date ?; ex:worksFor @ex:Company * }
ex:Company { ex:name xsd:string; ex:employee @ex:Person * }
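In the output above, the bare names from the CSV (Person, name, worksFor, ...) appear under the `ex:` prefix, while names that already carry a prefix (xsd:string) are left alone. The sketch below reproduces that naming pattern for the two shapes. It illustrates the observed output only and is not rudof's conversion code; the `SHAPES` structure and the `qualify` and `render_shape` helpers are hypothetical.

```python
# Hypothetical in-memory form of the two shapes:
# each statement is (property, value, cardinality marker).
SHAPES = {
    "Person": [("name", "xsd:string", ""),
               ("birthdate", "xsd:date", "?"),
               ("worksFor", "@Company", "*")],
    "Company": [("name", "xsd:string", ""),
                ("employee", "@Person", "*")],
}


def qualify(name: str, prefix: str = "ex") -> str:
    """Prefix a bare local name; leave already-prefixed names untouched."""
    return name if ":" in name else f"{prefix}:{name}"


def render_shape(shape_id: str, statements: list, prefix: str = "ex") -> str:
    """Render one shape as a single line of ShEx compact syntax."""
    parts = []
    for prop, value, card in statements:
        if value.startswith("@"):
            # Shape reference: qualify the name after the "@" sign.
            value = "@" + qualify(value[1:], prefix)
        else:
            value = qualify(value, prefix)
        parts.append(f"{qualify(prop, prefix)} {value} {card}".rstrip())
    body = "; ".join(parts)
    return f"{qualify(shape_id, prefix)} {{ {body} }}"


lines = [render_shape(shape_id, stmts) for shape_id, stmts in SHAPES.items()]
print("\n".join(lines))
```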

Validating data with the ShEx generated from DCTAP

rudof.read_shex_str(result)
rudof.read_data_str("""
prefix : <http://example.org/>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
:alice :name "Alice" ;
       :birthdate "1970-01-01"^^xsd:date ;
       :worksFor :acme .
:acme  :name "ACME INC." .

:bob   :name 23 .
""")
rudof.read_shapemap_str(":alice@ex:Person, :bob@ex:Person")
validation_results = rudof.validate_shex()
print(validation_results.show_as_table())
╭────────┬───────────┬────────╮
│ Node   │ Shape     │ Status │
├────────┼───────────┼────────┤
│ :alice │ ex:Person │ OK     │
├────────┼───────────┼────────┤
│ :bob   │ ex:Person │ FAIL   │
╰────────┴───────────┴────────╯
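The table shows :bob failing ex:Person. In Turtle, the unquoted literal 23 is typed as xsd:integer, so it does not satisfy the xsd:string constraint on the name property. The toy check below walks through that reasoning with plain dictionaries. It is a simplified sketch, not the ShEx validation algorithm that rudof implements; the `DATA` and `SHAPES` structures and the `conforms` function are hypothetical, and the sketch only looks at datatypes, shape references and cardinalities.

```python
# Each node maps a property to a list of (value, datatype) pairs.
# IRIs use datatype None; the Turtle literal 23 is typed xsd:integer.
DATA = {
    ":alice": {"name": [("Alice", "xsd:string")],
               "birthdate": [("1970-01-01", "xsd:date")],
               "worksFor": [(":acme", None)]},
    ":acme": {"name": [("ACME INC.", "xsd:string")]},
    ":bob": {"name": [(23, "xsd:integer")]},
}

# property -> (expected datatype or "@Shape", min, max); max None = unbounded
SHAPES = {
    "Person": {"name": ("xsd:string", 1, 1),
               "birthdate": ("xsd:date", 0, 1),
               "worksFor": ("@Company", 0, None)},
    "Company": {"name": ("xsd:string", 1, 1),
                "employee": ("@Person", 0, None)},
}


def conforms(node, shape, seen=None):
    """Return True if the node satisfies the shape in this toy model."""
    seen = seen or set()
    if (node, shape) in seen:
        # Already being checked further up the call stack: avoid looping.
        return True
    seen = seen | {(node, shape)}
    props = DATA.get(node, {})
    for prop, (expected, lo, hi) in SHAPES[shape].items():
        values = props.get(prop, [])
        # Cardinality check: number of values must lie within [lo, hi].
        if len(values) < lo or (hi is not None and len(values) > hi):
            return False
        for value, datatype in values:
            if expected.startswith("@"):
                # Shape reference: the value must be an IRI that conforms.
                if datatype is not None or not conforms(value, expected[1:], seen):
                    return False
            elif datatype != expected:
                return False
    return True


for node in (":alice", ":bob"):
    print(node, "OK" if conforms(node, "Person") else "FAIL")
```

Changing the value for :bob to a string literal such as "Bob" would make the name constraint pass in this sketch.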

Visualizing DCTAP content as a UML diagram

rudof.shex2plantuml_file(UmlGenerationMode(), 'out.puml')
!python -m plantuml out.puml
Image("out.png")
_images/ab06d753e9534e13ddd6ac5aab335b514143ae362587bdff4c48206701d455af.png