Markdown Frontmatter
What this is
Markdown Frontmatter is a serialization of an RDF directed labeled graph.
The mathematical invariant comes from the RDF data model (W3C 2004; Cyganiak, Wood, Lanthaler 2014). RDF (Resource Description Framework) represents knowledge as a directed labeled graph G = (V, E) where:
- V is a set of nodes: IRIs (named resources), literals (typed values), or blank nodes (anonymous resources)
- E is a set of directed labeled edges: triples (subject, predicate, object) where subject ∈ IRIs ∪ BlankNodes, predicate ∈ IRIs, object ∈ IRIs ∪ BlankNodes ∪ Literals
A YAML frontmatter block is a serialization of the RDF triples for one subject node. Each .md file is a subject IRI; each frontmatter key is a predicate IRI; each frontmatter value is the object — an IRI if it references a known entity id, a typed literal otherwise. The complete set of all frontmatter blocks in the corpus is a serialization of the full RDF graph over the corpus.
This encoding requires no separate .ttl data files: the graph is distributed across the frontmatter of the entity files themselves.
Frontmatter-to-graph conversion
- Subject IRI: derived from the entity’s id field using the repository’s base namespace (<repo:{id}>)
- Predicate IRI: each frontmatter key maps to <repo:{key}>
- Object: IRI if the value matches a known entity id; typed literal otherwise
- Structured values: YAML lists under a key produce one triple per list item, or blank nodes with sub-triples for nested structure
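The conversion rules above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the repository's converter: triples are rendered as strings rather than rdflib terms, the helper names (`iri`, `literal`, `frontmatter_to_triples`) and the sample keys are hypothetical, the datatype mapping is a guess at what "typed literal" means here, and the `id` value is always kept as a literal even though it trivially matches a known entity id.

```python
import itertools
from typing import Any

BASE = "repo:"  # assumed base namespace, following the <repo:{id}> pattern


def iri(local: str) -> str:
    """Render a local name as an IRI in the assumed repo: namespace."""
    return f"<{BASE}{local}>"


def literal(value: Any) -> str:
    """Render a scalar as a typed literal (display only, no string escaping)."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return f'"{str(value).lower()}"^^xsd:boolean'
    if isinstance(value, int):
        return f'"{value}"^^xsd:integer'
    if isinstance(value, float):
        return f'"{value}"^^xsd:double'
    return f'"{value}"^^xsd:string'


def frontmatter_to_triples(fm: dict, known_ids: set[str]) -> list[tuple[str, str, str]]:
    """Turn one parsed frontmatter block into (subject, predicate, object) triples."""
    triples: list[tuple[str, str, str]] = []
    blank_ids = itertools.count()

    def emit(subject: str, mapping: dict) -> None:
        for key, value in mapping.items():
            # A YAML list yields one triple per item; a scalar yields one triple.
            items = value if isinstance(value, list) else [value]
            for item in items:
                if isinstance(item, dict):
                    # Nested structure: a fresh blank node carrying sub-triples.
                    node = f"_:b{next(blank_ids)}"
                    triples.append((subject, iri(key), node))
                    emit(node, item)
                elif key != "id" and isinstance(item, str) and item in known_ids:
                    # Value names a known entity: the object is an IRI.
                    triples.append((subject, iri(key), iri(item)))
                else:
                    triples.append((subject, iri(key), literal(item)))

    emit(iri(fm["id"]), fm)
    return triples


if __name__ == "__main__":
    example = {
        "id": "markdown-frontmatter",
        "description": "RDF in YAML",
        "depends-on": ["rdf", "yaml"],  # hypothetical predicate
    }
    for triple in frontmatter_to_triples(example, known_ids={"markdown-frontmatter", "rdf"}):
        print(*triple, ".")
```

In the example, `rdf` is a known entity id and becomes an IRI object, while `yaml` is not known and stays a string literal.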
SHACL validation
SHACL (Shapes Constraint Language; W3C 2017) is a constraint language over RDF graphs. A SHACL shape is a formula in a restricted fragment of first-order logic that targets a subset of nodes and states properties they must satisfy. Validation is decidable and polynomially tractable for the core fragment.
One SHACL shape file per predicate lives at shacl/{predicate-id}.ttl. Each shape uses sh:targetSubjectsOf :{predicate-id} to target every node carrying that predicate. The shapes applicable to a given node are exactly those for the predicates in its frontmatter — the node selects its own applicable constraints.
shacl/id.ttl targets every node (every node has an id) and states the universal constraints: description is required, id is a kebab-case string, and the filename matches the id. Every other shape adds constraints specific to one predicate.
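For orientation, here is one way the universal shape might look, held as a Turtle string in Python so the pattern can be exercised directly. Everything specific is an assumption: the shape name `:IdShape`, the `repo:` prefix binding, and the regular expression taken to define "kebab-case". The filename check is left out, since this sketch does not assume how the filename reaches the graph.

```python
import re

# Assumed definition of kebab-case: lowercase alphanumeric runs joined by single hyphens.
KEBAB_CASE = r"[a-z0-9]+(-[a-z0-9]+)*"

# Hypothetical contents of shacl/id.ttl; the shape name :IdShape is illustrative.
ID_SHAPE_TTL = f"""\
@prefix sh:  <http://www.w3.org/ns/shacl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix :    <repo:> .

:IdShape a sh:NodeShape ;
    sh:targetSubjectsOf :id ;
    sh:property [
        sh:path :id ;
        sh:datatype xsd:string ;
        sh:minCount 1 ;
        sh:maxCount 1 ;
        sh:pattern "^{KEBAB_CASE}$" ;
    ] ;
    sh:property [
        sh:path :description ;
        sh:minCount 1 ;
    ] .
"""


def is_kebab_case(value: str) -> bool:
    """Plain-Python counterpart of the sh:pattern constraint above."""
    return re.fullmatch(KEBAB_CASE, value) is not None
```

Because `sh:targetSubjectsOf :id` selects every subject that has an `id` triple, the `description` requirement applies to all entities without listing them.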
Validation pipeline
- Parse the entity file’s YAML frontmatter
- Convert to an in-memory rdflib.Graph (no disk write)
- For each frontmatter key, load shacl/{key}.ttl if it exists
- Merge the applicable shapes into one shape graph
- Call pyshacl.validate(data_graph, shacl_graph=shapes)
- Fail if conforms is False; return the violation report
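The six steps could be wired together roughly as follows. This is a sketch, not the project's validator: the function names are hypothetical, PyYAML is assumed as the YAML parser, frontmatter is assumed to be delimited by `---` lines, and the graph conversion is simplified so that every value becomes a literal (no IRI detection, no nested values). The third-party imports sit inside `validate_entity` so the two helpers can be used without the RDF stack installed.

```python
from pathlib import Path


def split_frontmatter(text: str) -> str:
    """Return the raw YAML between the leading '---' lines of a Markdown file."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("file does not start with a frontmatter block")
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            return "\n".join(lines[1:i])
    raise ValueError("frontmatter block is not closed")


def applicable_shape_files(keys, shacl_dir: Path) -> list[Path]:
    """Step 3: one candidate shape file per frontmatter key, kept only if it exists."""
    candidates = (shacl_dir / f"{key}.ttl" for key in keys)
    return [path for path in candidates if path.is_file()]


def validate_entity(md_path: Path, shacl_dir: Path = Path("shacl")) -> str:
    """Validate one entity file; raise ValueError carrying the report on failure."""
    # Local imports: only this function needs the third-party packages.
    import yaml  # PyYAML (assumed parser)
    from pyshacl import validate
    from rdflib import Graph, Literal, URIRef

    # 1. Parse the YAML frontmatter.
    fm = yaml.safe_load(split_frontmatter(md_path.read_text(encoding="utf-8")))

    # 2. Build the data graph in memory (simplified: every value is a literal).
    data_graph = Graph()
    subject = URIRef(f"repo:{fm['id']}")
    for key, value in fm.items():
        for item in value if isinstance(value, list) else [value]:
            data_graph.add((subject, URIRef(f"repo:{key}"), Literal(item)))

    # 3 and 4. Load each applicable shape file into a single shape graph.
    shapes = Graph()
    for shape_file in applicable_shape_files(fm, shacl_dir):
        shapes.parse(str(shape_file), format="turtle")

    # 5. Run SHACL validation.
    conforms, _report_graph, report_text = validate(data_graph, shacl_graph=shapes)

    # 6. Fail with the violation report if the graph does not conform.
    if not conforms:
        raise ValueError(report_text)
    return report_text
```

Parsing every applicable file into the same `Graph` is what performs the merge in step 4; no separate union step is needed.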
Cross-file queries load multiple entity files, merge their frontmatter graphs into one in-memory graph, and run SPARQL against the merged graph.
Open questions
- Whether the graph should be materialized to disk (as Turtle or JSON-LD) for use by external tools, or kept in-memory only.
- Whether SHACL shapes cover all currently used frontmatter predicates, or whether there are unvalidated predicates in the wild.