Objective

The semantic pipeline is the path from human-authored prose to machine-actionable knowledge: prose is encoded in frontmatter, frontmatter generates TTL/RDF, TTL is queryable through MCP tools, and agents use those tools to reason about and improve the repository.

The pipeline runs in both directions. The encoding direction (prose→frontmatter→TTL→MCP→Agent) makes repository knowledge available to agents. The action direction (Agent→skills→MCP→scripts→repository) lets agents improve the repository. Improving the encoding direction also improves the action direction, because agents that can query the predicate graph make better decisions about what to improve.
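
As a minimal sketch of the first encoding step, assume frontmatter has already been parsed into a dictionary with a hypothetical `relations` list of predicate/object pairs. The field names, the `ex:` prefix, and both function names are illustrative assumptions; the semantic-frontmatter spec and generate-rdf.py may work differently. Under those assumptions, typed relations can be flattened into triples and written out as Turtle statements:

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical namespace prefix, not taken from the repository.
PREFIX = "@prefix ex: <http://example.org/> ."


def frontmatter_to_triples(page_id: str, frontmatter: Dict) -> List[Triple]:
    """Flatten the assumed `relations` entries into (subject, predicate, object)."""
    triples = []
    for relation in frontmatter.get("relations", []):
        triples.append((page_id, relation["predicate"], relation["object"]))
    return triples


def to_turtle(triples: List[Triple]) -> str:
    """Serialize triples as one Turtle statement per line, after the prefix."""
    lines = [PREFIX, ""]
    for subject, predicate, obj in triples:
        lines.append(f"ex:{subject} ex:{predicate} ex:{obj} .")
    return "\n".join(lines)


# Illustrative page; the identifiers are made up for this example.
page = {
    "title": "Example page",
    "relations": [
        {"predicate": "dependsOn", "object": "other-page"},
        {"predicate": "partOf", "object": "some-discipline"},
    ],
}
triples = frontmatter_to_triples("example-page", page)
print(to_turtle(triples))
```

Each relation becomes exactly one triple, so the triple count for a page equals the number of typed relations in its frontmatter.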

What exists

  • Prose: extensive content across disciplines
  • Frontmatter: semantic-frontmatter spec with typed relations, partially populated (8,130 triplets across 2,991 pages)
  • TTL: RDF generation script exists (generate-rdf.py); SHACL shapes and OWL ontologies are written for 5 domains
  • MCP: ASR MCP server with 9 tools covering find, triage, enrich, validate, plans, skills, and frontmatter enrichment via Ollama
  • Agent: skills and policies for agent work
  • Predicate graph: satisfaction checking against domain axiom registries (91.3% satisfaction rate, 37 errors, 308 warnings)
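
To illustrate what satisfaction checking involves, here is a small sketch. It assumes, hypothetically, that an axiom registry maps a page type to required and recommended predicates, that a missing required predicate counts as an error and a missing recommended one as a warning, and that the satisfaction rate is the share of checks that pass. The repository's actual registries, and the way its 91.3% figure is computed, may differ.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Report:
    checks: int = 0
    errors: List[str] = field(default_factory=list)
    warnings: List[str] = field(default_factory=list)

    @property
    def satisfaction_rate(self) -> float:
        """Share of checks that produced neither an error nor a warning."""
        if self.checks == 0:
            return 1.0
        failed = len(self.errors) + len(self.warnings)
        return (self.checks - failed) / self.checks


# Hypothetical registry: page type -> required and recommended predicates.
REGISTRY = {
    "concept": {"required": {"partOf"}, "recommended": {"relatedTo"}},
}


def check(pages: Dict[str, Dict], registry: Dict = REGISTRY) -> Report:
    """Check every page against the axioms registered for its type."""
    report = Report()
    for page_id, page in sorted(pages.items()):
        axioms = registry.get(page["type"], {})
        present = set(page["predicates"])
        for predicate in sorted(axioms.get("required", ())):
            report.checks += 1
            if predicate not in present:
                report.errors.append(f"{page_id}: missing {predicate}")
        for predicate in sorted(axioms.get("recommended", ())):
            report.checks += 1
            if predicate not in present:
                report.warnings.append(f"{page_id}: missing {predicate}")
    return report


# Illustrative pages, not repository data.
pages = {
    "a": {"type": "concept", "predicates": ["partOf", "relatedTo"]},
    "b": {"type": "concept", "predicates": ["partOf"]},
}
report = check(pages)
print(f"{report.satisfaction_rate:.1%}", report.errors, report.warnings)
```

Keeping errors and warnings as separate lists mirrors the way the figures above are reported, and gives an agent a concrete list of gaps to act on.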

What is missing

  • TTL generation is not integrated into the build
  • SHACL validation is not automated or available via MCP
  • Predicate graph satisfaction is not available via MCP
  • SPARQL querying is not available for the main repository (only in the separate rdf-cms prototype)
  • Frontmatter enrichment (Ollama) covers files in triage but not published files
  • The 37 errors and 308 warnings are known but are not being addressed systematically

Key results

  1. Predicate graph satisfaction checking available as an MCP tool
  2. SHACL validation available as an MCP tool
  3. At least one frontmatter enrichment skill works on published files (not just triage)
  4. Agent can query “what are the weakest files?” and get actionable results from the predicate graph
  5. Agent can run an improvement cycle: query weakness → enrich frontmatter → verify satisfaction improved
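
Key results 4 and 5 can be sketched together as a loop. The example below is illustrative only: the required predicates, the in-memory page store, and the `stub_enrich` function are hypothetical stand-ins for the predicate graph, the repository, and an inference-based enrichment skill.

```python
from typing import Callable, Dict, List, Set

# Hypothetical set of predicates every page is expected to carry.
REQUIRED: Set[str] = {"partOf", "relatedTo", "definedBy"}


def missing(predicates: Set[str]) -> Set[str]:
    """Return the required predicates a page does not yet have."""
    return REQUIRED - predicates


def weakest_files(pages: Dict[str, Set[str]], limit: int = 3) -> List[str]:
    """Rank pages with gaps by number of missing predicates, worst first."""
    weak = [page_id for page_id in pages if missing(pages[page_id])]
    return sorted(weak, key=lambda p: (-len(missing(pages[p])), p))[:limit]


def satisfaction(pages: Dict[str, Set[str]]) -> float:
    """Share of (page, required predicate) pairs that are satisfied."""
    total = len(pages) * len(REQUIRED)
    unmet = sum(len(missing(preds)) for preds in pages.values())
    return (total - unmet) / total if total else 1.0


def improvement_cycle(
    pages: Dict[str, Set[str]],
    enrich: Callable[[str, Set[str]], Set[str]],
) -> bool:
    """Query weakness, enrich the weakest page, verify the rate improved."""
    before = satisfaction(pages)
    targets = weakest_files(pages, limit=1)
    if not targets:
        return False  # nothing to improve
    target = targets[0]
    pages[target] = pages[target] | enrich(target, missing(pages[target]))
    return satisfaction(pages) > before


def stub_enrich(page_id: str, gaps: Set[str]) -> Set[str]:
    """Stand-in for an enrichment skill: proposes one missing predicate."""
    return {sorted(gaps)[0]}


# Illustrative pages, not repository data.
pages = {
    "a": {"partOf"},
    "b": {"partOf", "relatedTo", "definedBy"},
    "c": {"partOf", "relatedTo"},
}
improved = improvement_cycle(pages, stub_enrich)
print(improved, weakest_files(pages))
```

The verify step matters because enrichment may propose nothing useful. Returning a boolean lets a less capable agent decide whether to keep the change or stop.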

Constraints

  • Existing infrastructure is substantial — build on it, do not replace
  • Progressive automation: each improvement should make the next one easier for a less capable agent
  • Ollama must be running for inference-based operations
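
Because inference-based operations depend on Ollama, a tool could run a preflight check before attempting them. The sketch below assumes Ollama is served at its default local address (port 11434) and probes the model-listing endpoint `/api/tags`. The function name `ollama_available` is hypothetical and not part of the existing MCP server.

```python
import urllib.error
import urllib.request


def ollama_available(
    base_url: str = "http://localhost:11434", timeout: float = 2.0
) -> bool:
    """Return True if an Ollama server answers on its model-listing endpoint."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False


if __name__ == "__main__":
    if ollama_available():
        print("Ollama reachable: inference-based operations can run")
    else:
        print("Ollama not reachable: skip inference-based operations")
```

Failing fast with a clear message is easier for an agent to handle than a timeout in the middle of an enrichment run.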