The Semiotic Publisher
From Markdown Vault to Semiotic Universe and Static Outputs
This document explains how the Semiotic Publisher (semiotic_site.py) turns a Markdown vault into:
- a finite approximation of the Semiotic Universe (atoms, fragments, semantics, evaluation), and
- a set of static outputs (HTML site, JSON API, RDF graph, OpenAPI schema, and optional Pandoc documents).
It is the concrete, computable companion to the theoretical universes described in semiotic-universe.md and stewardable-semiotic-concept-universe.md.
1. High‑Level Overview
At a high level, the publisher does the following:
- Scan a vault of Markdown files with YAML frontmatter.
- Parse concepts: each file becomes a Concept with metadata, body, links, and annotations.
- Interpret annotations into atoms (tags, folders, links, status, type, facts, temporal markers).
- Configure j‑relations: a graph of implications between atoms (folder→tag, synonyms, status/type→tag).
- Compute fragments and semantics by iterated closure under j and G (trace).
- Evaluate concepts with a simple quantitative score based on semantic features.
- Build indices (tag, folder, type, status) over the universe.
- Export one or more targets:
  - html: a static site with an index and concept pages,
  - json: a static JSON API,
  - rdf: an RDF graph of concepts and atoms,
  - openapi: a static OpenAPI description of the JSON API,
  - pandoc: Markdown suitable for Pandoc (single concept or combined “book”).
The pipeline is driven from the CLI entrypoint and configured via a YAML file (semiotic.config.yml) plus command‑line overrides.
2. Concepts, Atoms, and the Universe
2.1 Concepts
Each Markdown file in the vault corresponds to a Concept:
- ID: derived from frontmatter id, or from the relative path (slugified).
- Title: title from frontmatter, or the file stem.
- Body: the Markdown content (later rendered to HTML for the site and to Pandoc Markdown for documents).
- Metadata:
  - tags: list of free‑form tags;
  - type: concept type;
  - status: lifecycle or quality status;
  - aliases: alternative names;
  - subjects/facts: structured subject–predicate–object assertions;
  - created, updated: temporal metadata.
- Structural context:
  - folders: path segments from the vault root;
  - links: internal links from Markdown links and [[wikilinks]].
All of this is assembled by load_markdown_concept, which uses an internal YAML frontmatter parser and a small wiki/Markdown link parser.
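The link-parsing step can be pictured with a short sketch. The regular expressions, the slugify rule, and the function name extract_links are assumptions made for illustration; the parser inside load_markdown_concept may behave differently.

```python
import re

# Hypothetical patterns; the real parser in semiotic_site.py may differ.
WIKILINK_RE = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")
MDLINK_RE = re.compile(r"(?<!!)\[[^\]]*\]\(([^)\s]+)\)")
SCHEME_RE = re.compile(r"[a-z][a-z0-9+.-]*:", re.IGNORECASE)


def slugify(text: str) -> str:
    """Lowercase and collapse runs of non-alphanumerics into single hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def extract_links(body: str) -> list[str]:
    """Return slugified internal targets: wikilinks first, then Markdown
    links, with duplicates removed."""
    targets = [m.group(1) for m in WIKILINK_RE.finditer(body)]
    for m in MDLINK_RE.finditer(body):
        href = m.group(1)
        # Skip external URLs (anything with a scheme) and in-page anchors.
        if SCHEME_RE.match(href) or href.startswith("#"):
            continue
        path = href.split("#", 1)[0]
        if path.endswith(".md"):
            path = path[:-3]
        targets.append(path)
    unique: dict[str, None] = {}
    for target in targets:
        slug = slugify(target)
        if slug:
            unique.setdefault(slug, None)
    return list(unique)
```

Image embeds such as ![alt](pic.png) are excluded by the negative lookbehind, so only navigational links end up in the concept's links.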
2.2 Atoms
Atoms are the finite feature vocabulary of the publisher:
- tag:foo
- folder:research
- link:some-concept-id
- status:draft, type:definition
- alias:alternate-name
- fact:predicate:subject->object
- created:2025-01-01, created_year:2025, updated:…, updated_year:…
Each (kind, value) pair is interned as a unique Atom with an integer ID.
The universe maintains:
- a map from (kind, value) to atom ID,
- the reverse map from ID to Atom, and
- relations between atoms (for j‑closure).
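A minimal sketch of the interning scheme, assuming a hypothetical AtomTable holder for the two maps. The class name, field names, and the choice of sequential IDs starting at 0 are illustrative and not taken from the script.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Atom:
    id: int
    kind: str
    value: str


@dataclass
class AtomTable:
    """Hypothetical holder for the forward and reverse atom maps."""
    by_key: dict[tuple[str, str], int] = field(default_factory=dict)
    by_id: list[Atom] = field(default_factory=list)

    def intern(self, kind: str, value: str) -> int:
        """Return the ID for (kind, value), allocating one on first sight."""
        key = (kind, value)
        if key not in self.by_key:
            atom_id = len(self.by_id)
            self.by_key[key] = atom_id
            self.by_id.append(Atom(atom_id, kind, value))
        return self.by_key[key]
```

Because IDs are allocated in order of first appearance, interning the same pair twice returns the same integer, and feature sets can be stored as plain sets of ints.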
2.3 Universe
The Universe is the finite structure over which the publisher computes:
- atoms: all atoms that appear in any concept;
- concepts: mapping from concept ID to Concept;
- j_relations: adjacency information for the modal closure;
- indices: derived indices (tag, folder, type, status) built from semantic features;
- config: the configuration used to build this universe.
Although the theoretical universe is an infinite Heyting structure, this implementation uses a finite feature set (atoms) and a graph closure as an approximation.
3. From Frontmatter to Annotations and Features
3.1 Annotation Terms
The publisher treats frontmatter and structure as a tiny annotation calculus:
- HasTag(tag) for each frontmatter tags entry,
- HasStatus(status) for frontmatter status,
- HasType(type) for frontmatter type,
- AliasFor(alias) for frontmatter aliases,
- InFolder(folder) for each folder segment,
- LinksTo(target) for each internal link,
- Fact(predicate, subject, object) for frontmatter facts.
These are stored as AnnotationTerm values on each concept.
3.2 Annotation Interpreters
The global registry ANNOTATION_INTERPRETERS maps annotation operators to small functions:
- HasTag → creates a tag atom;
- InFolder → creates a folder atom;
- LinksTo → creates a link atom;
- HasStatus → creates a status atom;
- HasType → creates a type atom;
- AliasFor → creates an alias atom;
- Fact → creates a fact atom of the form predicate:subject->object.
interpret_annotations runs all interpreters for a concept and produces its base feature set.
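The registry pattern can be sketched as follows. The names ANNOTATION_INTERPRETERS, register_annotation, interpret_annotations, and AnnotationTerm come from this document, but their signatures, the shape of AnnotationTerm, and the decision to skip unknown operators are assumptions.

```python
from typing import Callable, NamedTuple


class AnnotationTerm(NamedTuple):
    op: str                 # operator name, e.g. "HasTag"
    args: tuple[str, ...]   # operator arguments, e.g. ("foo",)


# Operator name -> function returning a (kind, value) atom key.
ANNOTATION_INTERPRETERS: dict[str, Callable[..., tuple[str, str]]] = {}


def register_annotation(op: str):
    """Decorator that records an interpreter under the operator name."""
    def decorator(fn):
        ANNOTATION_INTERPRETERS[op] = fn
        return fn
    return decorator


@register_annotation("HasTag")
def _has_tag(tag: str) -> tuple[str, str]:
    return ("tag", tag)


@register_annotation("InFolder")
def _in_folder(folder: str) -> tuple[str, str]:
    return ("folder", folder)


@register_annotation("Fact")
def _fact(predicate: str, subject: str, obj: str) -> tuple[str, str]:
    return ("fact", f"{predicate}:{subject}->{obj}")


def interpret_annotations(terms: list[AnnotationTerm]) -> set[tuple[str, str]]:
    """Run the registered interpreter for each term.

    Unknown operators are skipped (an assumption of this sketch).
    """
    features = set()
    for term in terms:
        interpreter = ANNOTATION_INTERPRETERS.get(term.op)
        if interpreter is not None:
            features.add(interpreter(*term.args))
    return features
```

Keeping the interpreters in a dictionary means a new operator needs only one decorated function, with no change to the loop that builds the base feature set.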
3.3 Temporal Features and G‑Trace
Temporal metadata (created, updated) is turned into atoms:
- created:YYYY-MM-DD,
- created_year:YYYY,
- updated:YYYY-MM-DD,
- updated_year:YYYY.
The G_trace operation enriches any feature set with these temporal atoms, providing a minimal approximation to the comonadic trace from the theory.
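As a rough illustration, temporal enrichment could look like the sketch below. The G_trace name is from this document; its signature, the helper temporal_atoms, and the use of datetime.date values are assumptions.

```python
from datetime import date
from typing import Optional


def temporal_atoms(created: Optional[date], updated: Optional[date]) -> set[tuple[str, str]]:
    """Build the day-level and year-level atoms for whichever dates are set."""
    atoms = set()
    for kind, stamp in (("created", created), ("updated", updated)):
        if stamp is not None:
            atoms.add((kind, stamp.isoformat()))            # YYYY-MM-DD
            atoms.add((f"{kind}_year", str(stamp.year)))
    return atoms


def G_trace(features, created=None, updated=None):
    """Return a new set: the input features plus the temporal atoms."""
    return set(features) | temporal_atoms(created, updated)
```

Applying G_trace twice with the same dates adds nothing new, which is what lets it take part in the fixed-point iteration used for fragments.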
4. Modal Closure, Fragments, and Semantics
4.1 j‑Closure as Graph Reachability
The modal operator j is implemented as a graph closure on atoms:
- j_relations is a directed graph whose nodes are atoms;
- j_closure takes a feature set and returns it together with all atoms reachable from it via this graph;
- this satisfies the usual closure properties (extensive, monotone, idempotent) in the finite setting.
Configuration (see §6) determines many of these relations (folder→tag, tag synonyms, type/status→tag).
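Reachability over the relation graph can be sketched with a breadth-first traversal. The layout of j_relations as a mapping from an atom to the atoms it implies is an assumption; only the function name comes from this document.

```python
from collections import deque


def j_closure(features, j_relations):
    """Return the features plus every atom reachable through j_relations.

    j_relations maps an atom to the atoms it implies (assumed layout).
    """
    closed = set(features)
    queue = deque(closed)
    while queue:
        atom = queue.popleft()
        for implied in j_relations.get(atom, ()):
            if implied not in closed:
                closed.add(implied)
                queue.append(implied)
    return closed
```

The result always contains the input (extensive), grows with the input (monotone), and is unchanged by a second application (idempotent). The visited check also keeps cycles such as symmetric synonym edges from looping forever.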
4.2 Fragments via j∘G Fixed Point
For each concept:
- Start from its base features (from annotations).
- Apply G_trace to add temporal atoms.
- Apply j_closure to propagate along the graph.
- Iterate until a fixed point is reached.
This fixed point is the concept’s fragment: the least j,G‑closed feature set containing the seed.
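The iteration can be sketched generically, taking the two operators as parameters. The function name compute_fragment and the toy stand-ins toy_g, toy_j, and RELATIONS are hypothetical and exist only to make the example runnable.

```python
def compute_fragment(seed, g_trace, j_closure):
    """Iterate j_closure(g_trace(.)) from seed until nothing new is added."""
    current = frozenset(seed)
    while True:
        nxt = frozenset(j_closure(g_trace(current)))
        if nxt == current:
            return current
        current = nxt


# Toy stand-ins for the two operators (assumptions for illustration only).
RELATIONS = {
    "created_year:2025": {"tag:2025"},
    "folder:research": {"tag:research"},
}


def toy_g(features):
    # Pretend the concept was created in 2025.
    return set(features) | {"created_year:2025"}


def toy_j(features):
    closed, stack = set(features), list(features)
    while stack:
        for implied in RELATIONS.get(stack.pop(), ()):
            if implied not in closed:
                closed.add(implied)
                stack.append(implied)
    return closed


fragment = compute_fragment({"folder:research"}, toy_g, toy_j)
```

Since both operators only ever add atoms and the atom vocabulary is finite, the loop must terminate. In this toy run the temporal atom added by toy_g triggers a further implication in toy_j, which is why the two steps are iterated together instead of being applied once each in isolation.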
4.3 Semantic Contribution
The semantic feature set of a concept is built by one more iteration of the same closure process starting from its fragment. This provides a slight “semantic halo” around the fragment, in line with the theoretical notion of semantic contribution.
4.4 Evaluation
Each concept receives a simple evaluation score based on its semantic features:
- +1 for each tag atom,
- +0.5 for each link atom,
- +0.3 for each fact atom,
- +0.1 for each temporal year atom (created_year, updated_year).
This is a deliberately simple monoid‑like score; in the future it can be replaced by more sophisticated stewardship‑informed metrics.
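The weights listed above translate directly into a lookup table. The evaluate_concept name appears in this document; representing features as (kind, value) pairs and scoring all other kinds as zero are assumptions of this sketch.

```python
# Per-kind weights taken from the list above.
WEIGHTS = {
    "tag": 1.0,
    "link": 0.5,
    "fact": 0.3,
    "created_year": 0.1,
    "updated_year": 0.1,
}


def evaluate_concept(semantic_features):
    """Sum the per-kind weights over (kind, value) atoms; other kinds score 0."""
    return sum(WEIGHTS.get(kind, 0.0) for kind, _ in semantic_features)
```

For example, a concept with two tags, one link, one fact, and a created_year atom scores 2 + 0.5 + 0.3 + 0.1 = 2.9, up to floating-point rounding.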
5. Indices and Relatedness
5.1 Indices
Once all concepts have semantic features, the publisher builds indices:
- Tag index: tag → {concept IDs},
- Folder index: folder → {concept IDs},
- Type index: type → {concept IDs},
- Status index: status → {concept IDs}.
These indices support:
- the HTML index page (displaying tag and folder counts, top tags, top concepts), and
- the JSON API and RDF exports (structured access to the same information).
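Index construction is a single pass over the semantic feature sets. In this sketch the function name build_indices and the input layout (concept ID mapped to a set of (kind, value) atoms) are assumptions.

```python
from collections import defaultdict

INDEXED_KINDS = ("tag", "folder", "type", "status")


def build_indices(semantics):
    """semantics: concept ID -> set of (kind, value) atoms.

    Returns kind -> value -> set of concept IDs for the indexed kinds.
    """
    indices = {kind: defaultdict(set) for kind in INDEXED_KINDS}
    for concept_id, features in semantics.items():
        for kind, value in features:
            if kind in indices:
                indices[kind][value].add(concept_id)
    return indices
```

Because the indices are built from semantic features and not from raw frontmatter, a concept shows up under tags it acquired only through j‑closure, such as a tag implied by its folder.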
5.2 Related Concepts
Relatedness between concepts uses Jaccard similarity over semantic feature sets:
\[ \mathrm{rel}(k, k') = \frac{|\mathrm{Sem}(k) \cap \mathrm{Sem}(k')|}{|\mathrm{Sem}(k) \cup \mathrm{Sem}(k')|}. \]
For each concept, the publisher:
- computes related concepts with positive similarity,
- sorts them by decreasing similarity,
- keeps the top few, and
- displays them on the HTML concept page as a “related concepts” mini‑index.
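The ranking steps above can be sketched as follows. The function names, the default cutoff top_n=5, and the tie-break by concept ID are assumptions; the document only says that the top few are kept.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def related_concepts(concept_id, semantics, top_n=5):
    """Return up to top_n (other_id, similarity) pairs with similarity > 0."""
    own = semantics[concept_id]
    scored = [
        (other, jaccard(own, feats))
        for other, feats in semantics.items()
        if other != concept_id
    ]
    scored = [pair for pair in scored if pair[1] > 0]
    # Sort by decreasing similarity, breaking ties by ID for stable output.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_n]
```

Comparing every pair of concepts is quadratic in the size of the vault, which is usually acceptable for a personal vault but worth keeping in mind for large ones.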
6. Configuration
Configuration is loaded from a YAML file (e.g. semiotic.config.yml) into Config. Key fields:
- Identity and routing
  - base_url: base URL for API and RDF URIs;
  - site_title: title displayed in the HTML site header.
- Vault and output
  - vault_dir: path to the Markdown vault;
  - output_dir: where generated files are written.
- Mode and targets
  - mode: "site" or "doc";
  - targets: list of enabled exporters (html, json, rdf, openapi, pandoc).
- Pandoc
  - pandoc_enabled: gate for Pandoc exports;
  - pandoc_formats: e.g. ["pdf", "epub"];
  - pandoc_template: optional Pandoc template;
  - pandoc_filters: list of Pandoc filters;
  - pandoc_output_single: override output filename for single‑concept export;
  - pandoc_book_title: title for combined documents.
- Vault filtering and semantic rules
  - ignore: glob patterns for files to skip;
  - folder_tag_rules: map folder → tag, used to configure j‑relations;
  - tag_synonyms: symmetric tag synonym relations;
  - type_tag_rules: map type → tag;
  - status_tag_rules: map status → tag.
These rules are applied when configuring j_relations and therefore affect fragments, semantics, evaluation, and indices.
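One way the four rule maps could be turned into graph edges is sketched below. The function name configure_j_relations, the representation of atoms as (kind, value) pairs, and the assumption that tag_synonyms maps a tag to a list of synonyms are all illustrative; the actual config schema may differ.

```python
from collections import defaultdict


def configure_j_relations(folder_tag_rules, tag_synonyms, type_tag_rules, status_tag_rules):
    """Build atom -> implied atoms edges from the four rule maps.

    Atoms are (kind, value) pairs. tag_synonyms is assumed to map a tag to a
    list of synonyms; each synonym pair yields edges in both directions.
    """
    edges = defaultdict(set)
    one_way = (
        ("folder", folder_tag_rules),
        ("type", type_tag_rules),
        ("status", status_tag_rules),
    )
    for kind, rules in one_way:
        for value, tag in rules.items():
            edges[(kind, value)].add(("tag", tag))
    for tag, synonyms in tag_synonyms.items():
        for synonym in synonyms:
            edges[("tag", tag)].add(("tag", synonym))
            edges[("tag", synonym)].add(("tag", tag))
    return dict(edges)
```

Folder, type, and status rules are one-directional implications into the tag space, while synonyms are symmetric, so a concept tagged with either synonym ends up carrying both after closure.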
7. CLI Usage
The script exposes a small CLI with subcommands. The primary entrypoint is:
python -m semiotic_site publish [VAULT_DIR] [OUTPUT_DIR] [options]
or, directly:
./semiotic_site.py publish [VAULT_DIR] [OUTPUT_DIR] [options]
7.1 Commands
- publish: main command (also aliased as build).
7.2 Arguments and Options
- vault_dir (positional, optional): path to the Markdown vault. If omitted, the config’s vault_dir is used.
- output_dir (positional, optional): output directory. If omitted, the config’s output_dir is used.
- --config PATH: path to the YAML config file (e.g. semiotic.config.yml).
- --dry-run: run the full pipeline but do not write any files.
- --targets "html,json,rdf,openapi,pandoc": override the list of export targets for this run.
- --mode site|doc: override mode from the config (site = static site & APIs, doc = Pandoc‑focused).
- --allow-pandoc: allow invoking Pandoc for pandoc targets (otherwise the publisher will print a message and skip).
- --concept ID: when using pandoc targets, restrict export to a single concept ID.
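For orientation, the options above map naturally onto an argparse parser. This is a sketch of how such a CLI could be wired, not the script's actual parser: the flag names come from the list above, while build_parser, the dest names, and the comma-splitting of --targets are assumptions.

```python
import argparse


def split_targets(raw: str) -> list[str]:
    """Turn "html, json" into ["html", "json"], dropping empty entries."""
    return [t.strip() for t in raw.split(",") if t.strip()]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="semiotic_site")
    sub = parser.add_subparsers(dest="command", required=True)

    publish = sub.add_parser("publish", aliases=["build"])
    publish.add_argument("vault_dir", nargs="?")    # falls back to config
    publish.add_argument("output_dir", nargs="?")   # falls back to config
    publish.add_argument("--config")
    publish.add_argument("--dry-run", action="store_true")
    publish.add_argument("--targets", type=split_targets)
    publish.add_argument("--mode", choices=["site", "doc"])
    publish.add_argument("--allow-pandoc", action="store_true")
    publish.add_argument("--concept")
    return parser
```

Leaving the optional values at None when they are not given makes the override rule easy to express: a value from the command line wins if it is not None, otherwise the value from the config file is used.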
7.3 Example Invocations
Build a full site and APIs using config defaults:
./semiotic_site.py publish --config semiotic.config.yml
Build only HTML and JSON to a specific directory:
./semiotic_site.py publish ./vault ./public \
--targets html,json \
--config semiotic.config.yml
Produce a single‑concept PDF via Pandoc:
./semiotic_site.py publish ./vault ./pub \
--mode doc \
--targets pandoc \
--allow-pandoc \
--concept some-concept-id \
--config semiotic.config.yml
8. Exported Targets
8.1 HTML Site
When html is in targets, the publisher writes:
- site/index.html: overview metrics, top tags, top concepts, concept list, tag list.
- site/concepts/<id>.html: one page per concept, with:
  - title and evaluation meter,
  - metadata and tags in a left margin column,
  - rendered Markdown body,
  - related concepts based on Jaccard similarity.
Templates are small Jinja2 strings embedded in the script; CSS is inline for fully static, self‑contained pages.
8.2 JSON API
When json is in targets, the publisher writes:
- api/concepts/index.json: summary information for all concepts.
- api/concepts/<id>.json: detailed information for each concept, including fragment and semantic features rendered as atom lists.
These JSON payloads can serve lightweight clients or be used for downstream processing.
8.3 RDF
When rdf is in targets, the publisher writes an RDF graph (typically api/graph.ttl or similar) encoding:
- concepts as RDF resources,
- atoms as resources or literals,
- relations induced by semantic features (tags, folders, links, facts, temporal markers).
The exact vocabulary is simple and pragmatic, aimed at making the semiotic universe queryable with SPARQL.
8.4 OpenAPI
When openapi is in targets, the publisher emits:
- api/openapi.json: an OpenAPI 3 specification describing the JSON API.
This makes it easy to generate clients or explore the API using standard tools.
8.5 Pandoc
When pandoc is in targets and pandoc_enabled is true:
- Each selected concept (or a combined set of concepts) is converted to Markdown with:
  - YAML frontmatter capturing key metadata,
  - the concept body,
  - an optional appendix listing semantic features.
The publisher then invokes pandoc with configured formats, templates, and filters to produce documents such as PDF or EPUB.
9. Extending the Publisher
Some common extension points:
- New annotation operators: add a new @register_annotation("OpName") function mapping frontmatter or structural patterns to atoms.
- New j‑relations: extend folder_tag_rules, tag_synonyms, type_tag_rules, or status_tag_rules in the config to change how fragment closure behaves.
- Alternative evaluation metrics: modify evaluate_concept to incorporate new signals (e.g. length of body, stewardship status, or user‑provided weights).
- Additional exporters: add new export_* functions and register them in export_all and targets handling.
These extensions stay within the same finite‐feature and closure framework, preserving alignment with the underlying theoretical universe.
10. Relationship to the Theoretical Docs
- semiotic-universe.md describes the idealized, infinite Heyting–modal–comonadic universe and its fusion mechanism.
- stewardable-semiotic-concept-universe.md describes a stewardship‑oriented extension with failure semantics, deltas, and sheaf‑like behavior.
- This document explains the concrete publisher implementation that:
  - approximates the universe using finite atoms and graph closure,
  - treats concepts as vault documents,
  - and exports multiple static views (site, API, RDF, documents) suitable for stewardship and analysis.
In short, the Semiotic Publisher is the bridge between vault practice and semiotic theory: it turns annotated Markdown into a working, inspectable instance of the semiotic universe.