rudof_rdf
The rudof_rdf crate is a core component of the Rudof project, providing foundational data structures, utilities, and algorithms for working with RDF (Resource Description Framework) data in Rust. It supports parsing, querying, manipulating, and visualizing RDF graphs, and serves as the backbone for higher-level crates in the Rudof ecosystem.
Architecture and Package Structure
The crate is organized into several key modules:
- rdf_core: Core logic for RDF handling, including:
  - term: RDF terms (IRIs, blank nodes, literals, triples)
  - parser: Parsers for RDF nodes and documents
  - query: SPARQL query support and result handling
  - vocab: Common RDF, RDFS, XSD, and SHACL vocabulary constants
  - utils: Utilities such as regex helpers
  - visualizer: Tools for visualizing RDF graphs (UML, styles, etc.)
  - matcher, focus_rdf, neighs_rdf, etc.: Advanced graph navigation and matching
- rdf_impl: Implementations of RDF storage and access:
  - oxigraph: Oxigraph-based backends
    - in_memory: In-memory RDF graph implementation (OxigraphInMemory)
    - endpoint: SPARQL endpoint integration (OxigraphEndpoint)
    - oxrdf_impl: Integration with the oxrdf crate
  - qlever: Locally-launched QLever Docker container backend (QleverGraphContainer)
Dependents and dependencies
This crate depends mainly on:
- Internal Rudof crates:
- External:
  oxigraph, oxrdf, oxjsonld, oxiri, oxilangtag, oxrdfio, oxrdfxml, oxsdatatypes, oxttl, reqwest, tokio
- External (only when the qlever feature is enabled): testcontainers, bollard, nix, futures, tracing
This crate is a foundational dependency for many other Rudof crates, including:
rudof_lib, rudof_cli, shacl_ast, shacl_ir, shacl_rdf, shacl_validation, shex_ast, shex_validation, shex_testsuite, shapes_comparator, shapes_converter, sparql_service, and others.
Cargo features
| Feature | Pulls in | Notes |
|---|---|---|
| sparql (default) | oxigraph, reqwest, tokio, spargebra, sparesults | Enables OxigraphEndpoint (remote SPARQL client). |
| qlever | Implies sparql; adds testcontainers, bollard, nix, futures, tracing | Enables the qlever submodule and its re-exports (QleverGraphContainer, QleverConfig, QleverError, …). |
| qlever-docker-tests | Implies qlever | Gates the Docker-dependent integration tests under rdf_impl/tests/qlever_docker.rs. The tests compile without Docker but need it to run. |
The qlever family of features is gated on cfg(not(target_family = "wasm")).
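A downstream crate might forward the feature, for example through a hypothetical qlever = ["rudof_rdf/qlever"] entry in its own Cargo.toml. Such a crate can then mirror the same gate in its own code. The sketch below is std-only and illustrative. The helper qlever_backend_available is made up, and feature = "qlever" refers to a feature of the crate compiling the snippet, not to rudof_rdf itself.

```rust
/// Hypothetical helper mirroring the README's gate: the QLever backend is only
/// compiled in when the `qlever` feature is on AND the target is not WASM.
#[cfg(all(feature = "qlever", not(target_family = "wasm")))]
fn qlever_backend_available() -> bool {
    true
}

/// Fallback used in every other configuration (feature off, or a WASM target).
#[cfg(not(all(feature = "qlever", not(target_family = "wasm"))))]
fn qlever_backend_available() -> bool {
    false
}

fn main() {
    if qlever_backend_available() {
        println!("QLever backend compiled in");
    } else {
        println!("QLever backend not available in this build");
    }
}
```

The two cfg attributes are exact negations of each other, so exactly one definition of the function exists in any build.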
Backends in rdf_impl
Every backend exposes the same trait surface (Rdf, NeighsRDF, QueryRDF, FocusRDF, BuildRDF, AsyncRDF), so higher-level code can swap one for another with minimal awareness of where the data actually lives.
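The point of a shared trait surface can be shown with a std-only sketch. It uses a hypothetical single-method trait, Neighbours, in place of the real Rdf / NeighsRDF traits, whose actual methods are not reproduced here. Code written against the trait accepts either backend unchanged.

```rust
use std::collections::HashMap;

/// Hypothetical, heavily simplified stand-in for a trait like NeighsRDF:
/// return the objects of (subject, predicate).
trait Neighbours {
    fn objects(&self, subject: &str, predicate: &str) -> Vec<String>;
}

/// Hypothetical backend #1: triples kept in a plain Vec.
struct InMemory {
    triples: Vec<(String, String, String)>,
}

impl Neighbours for InMemory {
    fn objects(&self, s: &str, p: &str) -> Vec<String> {
        self.triples
            .iter()
            .filter(|(ts, tp, _)| ts == s && tp == p)
            .map(|(_, _, o)| o.clone())
            .collect()
    }
}

/// Hypothetical backend #2: a pre-built index, standing in for data that a
/// remote store (e.g. a SPARQL endpoint) would answer.
struct Indexed {
    index: HashMap<(String, String), Vec<String>>,
}

impl Neighbours for Indexed {
    fn objects(&self, s: &str, p: &str) -> Vec<String> {
        self.index
            .get(&(s.to_string(), p.to_string()))
            .cloned()
            .unwrap_or_default()
    }
}

/// Backend-agnostic code: compiled against the trait, not a concrete store.
fn names_of<G: Neighbours>(graph: &G, subject: &str) -> Vec<String> {
    graph.objects(subject, "http://www.w3.org/ns/shacl#name")
}

fn main() {
    let name = "http://www.w3.org/ns/shacl#name".to_string();
    let alice = "http://example.org/Alice".to_string();

    let mem = InMemory {
        triples: vec![(alice.clone(), name.clone(), "Alice".to_string())],
    };
    let mut index = HashMap::new();
    index.insert((alice.clone(), name), vec!["Alice".to_string()]);
    let idx = Indexed { index };

    // Same call, two different storage strategies, same answer.
    println!("{:?} / {:?}", names_of(&mem, &alice), names_of(&idx, &alice));
}
```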
oxigraph (OxigraphInMemory and OxigraphEndpoint)
Both backends live under rdf_impl/oxigraph/.
- OxigraphInMemory: an oxrdf::Graph plus an optional Oxigraph Store for SPARQL evaluation. It is the default backend everywhere.
- OxigraphEndpoint: a read-only client for a remote SPARQL endpoint. It caches HTTP clients per QueryResultFormat.
The oxrdf_impl submodule provides the oxrdf-typed implementations of the rdf_core traits that both Oxigraph backends share.
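OxigraphEndpoint caches one HTTP client per QueryResultFormat. A minimal std-only sketch of that caching pattern follows. HttpClient is a hypothetical placeholder (the real backend uses reqwest), and ResultFormat is a hypothetical two-variant enum that may not match the real QueryResultFormat. Each client is built on first use and reused afterwards.

```rust
use std::collections::HashMap;

/// Hypothetical subset of result formats.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum ResultFormat {
    Json,
    Xml,
}

/// Hypothetical placeholder for an HTTP client preconfigured with an Accept header.
#[derive(Debug)]
struct HttpClient {
    accept: &'static str,
}

struct EndpointSketch {
    clients: HashMap<ResultFormat, HttpClient>,
    builds: usize, // how many clients were actually constructed
}

impl EndpointSketch {
    fn new() -> Self {
        Self { clients: HashMap::new(), builds: 0 }
    }

    /// Returns the cached client for `format`, constructing it only on first use.
    fn client_for(&mut self, format: ResultFormat) -> &HttpClient {
        let builds = &mut self.builds; // disjoint field borrow, separate from `clients`
        self.clients.entry(format).or_insert_with(|| {
            *builds += 1;
            let accept = match format {
                ResultFormat::Json => "application/sparql-results+json",
                ResultFormat::Xml => "application/sparql-results+xml",
            };
            HttpClient { accept }
        })
    }
}

fn main() {
    let mut ep = EndpointSketch::new();
    let accept = ep.client_for(ResultFormat::Json).accept;
    ep.client_for(ResultFormat::Json); // cache hit
    ep.client_for(ResultFormat::Xml); // new client
    println!("{accept}, clients built: {}", ep.builds);
}
```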
qlever (QleverGraphContainer)
Available when the qlever feature is enabled (and on non-WASM targets). The backend wraps a locally-launched QLever Docker container and exposes it as just another Rdf implementation. From the caller's perspective it is interchangeable with OxigraphInMemory: it produces the same oxrdf types and implements the same trait set, but the data lives in a QLever index on disk and is queried via the container's HTTP SPARQL endpoint.
Module layout (rdf_impl/qlever/)
| File | What it owns |
|---|---|
mod.rs | Public re-exports. |
config.rs | QleverConfig, InputFile, NativeFormat. The config maps 1:1 onto QLever's IndexBuilderMain / ServerMain flags (-m, -c, -e, -j, -P, -T, …). InputFile also carries the Option<Compression> that drives the streaming path. |
cli_probe.rs | Detects whether the running image exposes the v1 (IndexBuilderMain / ServerMain) or v2 (qlever-index / qlever-server) CLI. Also pings Docker and pulls the image. |
decompressor.rs | Strategy-pattern registry of host-side decompressors (CompressionStrategy trait, Bzip2Strategy, XzStrategy). Resolves the first available binary on $PATH per family via a process-wide OnceLock probe; exposes Compression, strip_compression_suffix, decompressor_probe. |
index_builder.rs | One-shot bollard invocations that build (or skip building) the on-disk index. Implements IndexHandle::is_built (checks for <name>.meta), convert_to_native, and run_one_shot_with_stdin (stdin-attached one-shot for streamed compressed inputs). |
server.rs | QleverServer, long-running container managed via testcontainers-rs. Owns port mapping and the HTTP readiness probe. |
graph_container.rs | QleverGraphContainer, the public façade. Composes an OxigraphEndpoint pointed at the container so the NeighsRDF / QueryRDF impls are just SPARQL-over-HTTP. |
error.rs | QleverError covering pre-flight, Docker, container, HTTP, format-conversion, and decompression (missing-binary / non-zero-exit / unsupported-inner-format) errors. |
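cli_probe.rs tells the two CLI generations apart by their binary names. The sketch below encodes only that name mapping. The CliFlavor enum and the detect function are hypothetical, and so is the idea of deciding from a list of executables found in the image. Pinging Docker and pulling the image are omitted.

```rust
/// Hypothetical representation of the two QLever CLI generations.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CliFlavor {
    V1, // IndexBuilderMain / ServerMain
    V2, // qlever-index / qlever-server
}

impl CliFlavor {
    fn index_binary(self) -> &'static str {
        match self {
            CliFlavor::V1 => "IndexBuilderMain",
            CliFlavor::V2 => "qlever-index",
        }
    }

    fn server_binary(self) -> &'static str {
        match self {
            CliFlavor::V1 => "ServerMain",
            CliFlavor::V2 => "qlever-server",
        }
    }

    /// Hypothetical detection: pick the first flavor whose index AND server
    /// binaries both appear among the executables the image exposes.
    fn detect(available: &[&str]) -> Option<CliFlavor> {
        [CliFlavor::V1, CliFlavor::V2].into_iter().find(|f| {
            available.contains(&f.index_binary()) && available.contains(&f.server_binary())
        })
    }
}

fn main() {
    let flavor = CliFlavor::detect(&["qlever-index", "qlever-server", "sh"]);
    println!("{flavor:?}");
}
```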
Key choices:
- Idempotent indexing. The index dir holds a <name>.meta marker file. IndexHandle::is_built() checks for it before re-running QLever, so repeated rudof invocations skip indexing.
- Multi-file support. The primary constructor is from_paths(paths, format, config). It accepts any number of file-system paths and feeds them to QLever's IndexBuilderMain in a single pass (one -f / -F / -g triple per file). from_path and from_reader are thin shims over from_paths.
- Optional explicit format. When from_paths is called with Some(&RDFFormat), that format overrides the per-file extension sniffing. When it is called with None, the format is guessed per path.
- Format coverage by transparent conversion. QLever's IndexBuilderMain only accepts ttl/nt/nq natively (NativeFormat). Anything else is streamed through oxrdfio via convert_to_native into a shared conversion dir (fingerprinted from the input paths), then handed to QLever. The target format is chosen to preserve quad information. Quad-bearing sources (TriG, JSON-LD) are written as N-Quads (.nq). Triple-only non-native sources (RDF/XML, N3) are written as N-Triples (.nt).
- Streaming compressed dumps via host-side decompressors. Inputs with a recognised compression suffix (e.g. dump.nt.bz2, data.ttl.xz) bypass the bind-mount path. input_file_from_path strips the suffix, validates the inner extension against NativeFormat, and tags the resulting InputFile with the matching Compression. build_argv_and_binds then emits -f - (instead of -f /inputs/N/...) for that input and omits its bind mount. build_index dispatches to run_one_shot_with_stdin instead of run_one_shot. The streaming path:
  - Creates the container with open_stdin: true, attach_stdin: true, stdin_once: true, tty: false.
  - Attaches before starting the container (reversing the order races with the first bytes the container reads).
  - Spawns the decompressor on the host with tokio::process::Command and kill_on_drop(true), so cancelling build_index tears down a multi-GB decompression cleanly.
  - Runs three sub-tasks under tokio::try_join!:
    - stdin-copy (host decompressor → container stdin, with an explicit input.shutdown().await to signal EOF);
    - container-output drain;
    - a bounded ring buffer for the decompressor's stderr tail.

    The concurrent drain is mandatory. Sequential awaits deadlock because bollard's output stream backpressures the container. The container then stops reading stdin, which in turn stalls the copy.
  - On a non-zero decompressor exit, surfaces QleverError::DecompressorExit (with the stderr tail) in preference to the container's symptomatic error, since the decompressor failure is usually the root cause.

  Decompressors are described by the CompressionStrategy trait, with one zero-sized impl per family (Bzip2Strategy, XzStrategy). Adding a new family (gzip, zstd, etc.) takes one new struct, one match arm in Compression::strategy, and one entry in strategies(). InputFile, build_index, run_one_shot_with_stdin, and the probe machinery stay unchanged. The probe (decompressor_probe()) walks $PATH once per process and picks the first available candidate per family in priority order (parallel before single-threaded). It caches the result in a OnceLock.

  Constraint: at most one compressed input per build, because IndexBuilderMain reads only one -f -. Violations are rejected at the argv-building stage with QleverError::PreFlight.
- Read-only. BuildRDF::add_triple, remove_triple, add_type and add_bnode all return QleverError::ReadOnly. BuildRDF::empty() currently panics.
- Sync trait surface, async work underneath. Methods on Rdf / NeighsRDF / QueryRDF are synchronous, but the heavy lifting (Docker, HTTP) is async. QleverGraphContainer therefore exposes async constructors (from_paths, from_path, from_reader, open) and async variants of the SPARQL methods (query_select_async, query_construct_async, query_ask_async). These sit alongside the trait-required sync methods, which delegate through the shared OxigraphEndpoint.
- Resource ownership. The server field is wrapped in an Arc, so clones cheaply share both the container and the HTTP keep-alive pool. When the last clone is dropped, Drop tears down the container automatically (via testcontainers). The on-disk index is removed only when auto_delete_if_created was set and this run actually created it.
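The idempotent-indexing check reduces to "does <name>.meta exist in the index dir?". Here is a std-only sketch. IndexHandleSketch and ensure_built are hypothetical names, and the build step is a closure that just writes the marker rather than invoking QLever.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for IndexHandle: an index directory plus a base name.
struct IndexHandleSketch {
    dir: PathBuf,
    name: String,
}

impl IndexHandleSketch {
    /// The index counts as built when `<name>.meta` exists in the index dir.
    fn is_built(&self) -> bool {
        self.dir.join(format!("{}.meta", self.name)).is_file()
    }

    /// Runs `build` only when no marker is present; returns whether it ran.
    fn ensure_built(&self, build: impl FnOnce(&Path) -> io::Result<()>) -> io::Result<bool> {
        if self.is_built() {
            return Ok(false); // skip: a previous run already indexed this data
        }
        build(self.dir.as_path())?;
        Ok(true)
    }
}

fn main() -> io::Result<()> {
    let dir = std::env::temp_dir().join("qlever_index_sketch");
    fs::create_dir_all(&dir)?;
    let handle = IndexHandleSketch { dir, name: "demo".to_string() };

    // Fake "build": only writes the marker file the real index build leaves behind.
    let write_marker = |d: &Path| fs::write(d.join("demo.meta"), b"");
    let first = handle.ensure_built(write_marker)?;
    let second = handle.ensure_built(write_marker)?;
    println!("first call built: {first}, second call built: {second}");
    Ok(())
}
```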
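The format-coverage rule can be written as a small decision table: native formats pass through, quad-bearing formats convert to N-Quads, and triple-only formats convert to N-Triples. The enums below are hypothetical simplifications. The real RDFFormat and NativeFormat types may have different variants.

```rust
/// Hypothetical, simplified input-format enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum InputFormat {
    Turtle,
    NTriples,
    NQuads,
    TriG,
    JsonLd,
    RdfXml,
    N3,
}

/// Formats QLever's index builder accepts natively (mirrors the NativeFormat idea).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Native {
    Ttl,
    Nt,
    Nq,
}

impl Native {
    fn extension(self) -> &'static str {
        match self {
            Native::Ttl => "ttl",
            Native::Nt => "nt",
            Native::Nq => "nq",
        }
    }
}

/// What to do with one input file before indexing.
#[derive(Debug, PartialEq, Eq)]
enum Plan {
    PassThrough(Native), // already native: hand the file to QLever as-is
    ConvertTo(Native),   // stream through a converter into this native format
}

fn plan_for(format: InputFormat) -> Plan {
    match format {
        InputFormat::Turtle => Plan::PassThrough(Native::Ttl),
        InputFormat::NTriples => Plan::PassThrough(Native::Nt),
        InputFormat::NQuads => Plan::PassThrough(Native::Nq),
        // Quad-bearing sources keep their graph names by going to N-Quads.
        InputFormat::TriG | InputFormat::JsonLd => Plan::ConvertTo(Native::Nq),
        // Triple-only sources need nothing richer than N-Triples.
        InputFormat::RdfXml | InputFormat::N3 => Plan::ConvertTo(Native::Nt),
    }
}

fn main() {
    for f in [InputFormat::Turtle, InputFormat::TriG, InputFormat::RdfXml] {
        match plan_for(f) {
            Plan::PassThrough(n) => println!("{f:?}: pass through as .{}", n.extension()),
            Plan::ConvertTo(n) => println!("{f:?}: convert to .{}", n.extension()),
        }
    }
}
```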
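The compressed-input handling combines three steps: strip a known suffix, check that the inner extension is native, and emit -f - for the one streamed input. The sketch is std-only, and every name in it is a hypothetical simplification of input_file_from_path and build_argv_and_binds. It models only the -f part of the per-file flags. Uncompressed inputs get their host path as a placeholder, where the real code passes a bind-mounted container path.

```rust
use std::path::Path;

/// Hypothetical subset of compression families.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Compression {
    Bzip2,
    Xz,
}

/// Hypothetical, simplified InputFile.
#[derive(Debug, Clone, PartialEq, Eq)]
struct InputFileSketch {
    path: String,      // path as given by the caller
    inner_ext: String, // extension left after stripping any compression suffix
    compression: Option<Compression>,
}

/// Strips a recognised compression suffix; compressed inputs must wrap ttl/nt/nq.
fn input_file_from_path(path: &str) -> Result<InputFileSketch, String> {
    let (inner, compression) = if let Some(rest) = path.strip_suffix(".bz2") {
        (rest, Some(Compression::Bzip2))
    } else if let Some(rest) = path.strip_suffix(".xz") {
        (rest, Some(Compression::Xz))
    } else {
        (path, None)
    };
    let ext = Path::new(inner)
        .extension()
        .and_then(|e| e.to_str())
        .ok_or_else(|| format!("no extension on {path}"))?;
    if compression.is_some() && !matches!(ext, "ttl" | "nt" | "nq") {
        return Err(format!("compressed input must wrap ttl/nt/nq, got .{ext}"));
    }
    Ok(InputFileSketch { path: path.to_string(), inner_ext: ext.to_string(), compression })
}

/// Emits one `-f <file>` pair per input; the compressed input reads stdin (`-f -`).
fn build_file_args(inputs: &[InputFileSketch]) -> Result<Vec<String>, String> {
    let compressed = inputs.iter().filter(|i| i.compression.is_some()).count();
    if compressed > 1 {
        // Only one stream can be piped into the container's stdin.
        return Err(format!("pre-flight: {compressed} compressed inputs, at most 1 allowed"));
    }
    let mut argv = Vec::new();
    for input in inputs {
        argv.push("-f".to_string());
        if input.compression.is_some() {
            argv.push("-".to_string());
        } else {
            // Placeholder for the bind-mounted container path used by the real code.
            argv.push(input.path.clone());
        }
    }
    Ok(argv)
}

fn main() {
    let inputs: Vec<InputFileSketch> = ["a.ttl", "dump.nt.bz2"]
        .iter()
        .map(|p| input_file_from_path(p).expect("valid input"))
        .collect();
    println!("{:?}", build_file_args(&inputs));
}
```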
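The decompressor registry pairs a strategy trait with a $PATH probe that runs once per process and is cached in a OnceLock. Here is a std-only sketch. The trait methods are hypothetical, and the candidate binary names (lbzip2, pbzip2, bzip2, pixz, xz) are illustrative assumptions, not necessarily the lists rudof_rdf uses. The lookup is Unix-style and ignores Windows .exe suffixes.

```rust
use std::env;
use std::path::PathBuf;
use std::sync::OnceLock;

/// Hypothetical shape of a per-family decompressor strategy.
trait CompressionStrategy {
    fn family(&self) -> &'static str;
    /// Candidate binaries in priority order (parallel tools first).
    fn candidates(&self) -> &'static [&'static str];
}

struct Bzip2Strategy; // zero-sized, one per family
struct XzStrategy;

impl CompressionStrategy for Bzip2Strategy {
    fn family(&self) -> &'static str { "bzip2" }
    fn candidates(&self) -> &'static [&'static str] { &["lbzip2", "pbzip2", "bzip2"] }
}

impl CompressionStrategy for XzStrategy {
    fn family(&self) -> &'static str { "xz" }
    fn candidates(&self) -> &'static [&'static str] { &["pixz", "xz"] }
}

/// Adding a family would mean one more struct and one more entry here.
fn strategies() -> [&'static dyn CompressionStrategy; 2] {
    [&Bzip2Strategy, &XzStrategy]
}

/// First candidate (in priority order) that exists as a file in some $PATH dir.
fn find_on_path(names: &[&str]) -> Option<PathBuf> {
    let path = env::var_os("PATH")?;
    names.iter().find_map(|name| {
        env::split_paths(&path)
            .map(|dir| dir.join(name))
            .find(|candidate| candidate.is_file())
    })
}

/// Walks $PATH once per process; later calls return the cached result.
fn decompressor_probe() -> &'static [(&'static str, Option<PathBuf>)] {
    static PROBE: OnceLock<Vec<(&'static str, Option<PathBuf>)>> = OnceLock::new();
    PROBE.get_or_init(|| {
        strategies()
            .iter()
            .map(|s| (s.family(), find_on_path(s.candidates())))
            .collect()
    })
}

fn main() {
    for (family, bin) in decompressor_probe() {
        match bin {
            Some(p) => println!("{family}: {}", p.display()),
            None => println!("{family}: no decompressor found on $PATH"),
        }
    }
}
```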
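The bounded stderr tail keeps only the last few lines a decompressor prints. A chatty or failing process therefore cannot grow memory without limit, yet the tail is still available for QleverError::DecompressorExit. This is a synchronous, std-only simplification. In the real code this drain is one of the async sub-tasks under tokio::try_join!, and StderrTail is a hypothetical name.

```rust
use std::collections::VecDeque;
use std::io::{self, BufRead};

/// Ring buffer holding at most `capacity` of the most recent lines.
struct StderrTail {
    lines: VecDeque<String>,
    capacity: usize,
}

impl StderrTail {
    fn new(capacity: usize) -> Self {
        Self { lines: VecDeque::with_capacity(capacity), capacity }
    }

    fn push(&mut self, line: String) {
        if self.capacity == 0 {
            return; // nothing is retained
        }
        if self.lines.len() == self.capacity {
            self.lines.pop_front(); // evict the oldest line
        }
        self.lines.push_back(line);
    }

    /// Drains a reader line by line, keeping only the tail.
    fn drain_from(&mut self, reader: impl BufRead) -> io::Result<()> {
        for line in reader.lines() {
            self.push(line?);
        }
        Ok(())
    }

    fn joined(&self) -> String {
        self.lines.iter().cloned().collect::<Vec<_>>().join("\n")
    }
}

fn main() -> io::Result<()> {
    // Made-up stderr output, for illustration only.
    let fake_stderr = "progress 1\nprogress 2\nprogress 3\nerror: corrupt input\n";
    let mut tail = StderrTail::new(2);
    tail.drain_from(io::Cursor::new(fake_stderr))?;
    println!("{}", tail.joined());
    Ok(())
}
```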
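The resource-ownership rule follows from Arc semantics: clones share one server, and the server's Drop runs only when the last clone goes away. ServerSketch and GraphSketch below are hypothetical std-only stand-ins. A counter replaces the real container teardown, and the index-deletion policy is not modelled.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Counts simulated container teardowns so the effect is observable.
static TEARDOWNS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for the server handle that owns the container.
struct ServerSketch;

impl Drop for ServerSketch {
    fn drop(&mut self) {
        // The real backend stops the container here (via testcontainers).
        TEARDOWNS.fetch_add(1, Ordering::SeqCst);
    }
}

/// Hypothetical façade: every clone shares the same server through the Arc.
#[derive(Clone)]
struct GraphSketch {
    server: Arc<ServerSketch>,
}

impl GraphSketch {
    fn start() -> Self {
        GraphSketch { server: Arc::new(ServerSketch) }
    }

    fn handles(&self) -> usize {
        Arc::strong_count(&self.server)
    }
}

fn main() {
    let a = GraphSketch::start();
    let b = a.clone(); // cheap: bumps a reference count, no second container
    println!("handles: {}", b.handles());
    drop(a);
    println!("teardowns after first drop: {}", TEARDOWNS.load(Ordering::SeqCst));
    drop(b);
    println!("teardowns after last drop: {}", TEARDOWNS.load(Ordering::SeqCst));
}
```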
Configuring via TOML
A [qlever] section in the rudof config TOML deserializes into a QleverConfig. The struct is exposed through RdfDataConfig::qlever and is only present when the qlever feature is compiled in.
Usage
The following example illustrates one of the many features rudof_rdf provides: fluent parser composition.
Composing Parsers with Fluent API
```rust
use rudof_rdf::rdf_core::{
    FocusRDF,
    parser::{
        RDFParse,
        rdf_node_parser::{
            RDFNodeParse, // Core trait
            ParserExt,    // Extension trait (fluent API)
            constructors::{
                ObjectParser,               // captures the current focus node as an Object
                SingleStringPropertyParser, // reads a single string-valued property
                ListParser,                 // traverses an RDF list (rdf:first/rdf:rest)
            },
        },
    },
    term::{Object, literal::Lang},
};
use rudof_rdf::rdf_impl::{OxigraphInMemory, ReaderMode};
use rudof_rdf::rdf_core::RDFFormat;
use rudof_iri::IriS;

// The domain type we want to build from the RDF graph.
#[derive(Debug, Clone)]
struct PersonShape {
    id: Object,
    name: String,
    known_langs: Vec<Lang>,
}

// --- Individual field parsers (reusable building blocks) ---

/// Reads the single string value of sh:name on the current focus node.
fn parse_name<RDF: FocusRDF>() -> impl RDFNodeParse<RDF, Output = String> {
    let sh_name = IriS::new_unchecked("http://www.w3.org/ns/shacl#name");
    SingleStringPropertyParser::new(sh_name)
}

/// Reads sh:languageIn ( "en" "fr" ) and returns a Vec<Lang>.
fn parse_language_in<RDF: FocusRDF>() -> impl RDFNodeParse<RDF, Output = Vec<Lang>> {
    let sh_language_in = IriS::new_unchecked("http://www.w3.org/ns/shacl#languageIn");
    ListParser::new()
        .flat_map(|terms: Vec<RDF::Term>| {
            // Keep only the list members that can be read as language tags.
            let langs: Vec<Lang> = terms.iter().flat_map(RDF::term_as_lang).collect();
            Ok(langs)
        })
        .map_property(sh_language_in)
        // One list per sh:languageIn value: take the last one, or an empty Vec if absent.
        .map(|mut vecs| vecs.pop().unwrap_or_default())
}

// --- Composite parser built with the fluent API ---

/// Combines all field parsers into a single `PersonShape`.
fn person_shape_parser<RDF: FocusRDF>() -> impl RDFNodeParse<RDF, Output = PersonShape> {
    ObjectParser::new().then(move |id: Object| {
        parse_name()
            .and(parse_language_in())
            .flat_map(move |(name, langs)| {
                Ok(PersonShape {
                    id: id.clone(),
                    name,
                    known_langs: langs,
                })
            })
    })
}

fn main() {
    let turtle = r#"
        @prefix ex: <http://example.org/> .
        @prefix sh: <http://www.w3.org/ns/shacl#> .

        ex:Alice sh:name "Alice" ;
                 sh:languageIn ( "en" "fr" ) .
    "#;

    // 1. Parse a Turtle string into an in-memory RDF graph.
    let graph = OxigraphInMemory::from_str(
        turtle,
        &RDFFormat::Turtle,
        None,
        &ReaderMode::default(),
    )
    .expect("Failed to parse Turtle");

    // 2. Wrap the graph in RDFParse, which tracks the mutable focus node.
    let mut rdf_parse = RDFParse::new(graph);

    // 3. Point the focus at the node we want to parse.
    let alice: Object = IriS::new_unchecked("http://example.org/Alice").into();
    rdf_parse.rdf_mut().set_focus(&alice.clone().into());

    // 4. Run the composite parser.
    let person = person_shape_parser()
        .parse_focused(rdf_parse.rdf_mut())
        .expect("Parsing failed");

    println!("Parsed: {:?}", person);
    // Parsed: PersonShape { id: Iri { .. "http://example.org/Alice" },
    //   name: "Alice",
    //   known_langs: [Lang { lang: "en" }, Lang { lang: "fr" }] }
}
```
Documentation
The crate documentation can be found here.