Semiotic Triage

This specification defines the triage system: how raw content enters a repository, is processed, and is either promoted to published content or discarded. Triage is the intake pipeline that sits between the outside world and the repository’s published content.

This is a convention specification (per semiotic-specification §0).

0. Scope

This specification applies to the intake process for all content that enters a repository from external sources. External sources include: imported files, ingested collections (Obsidian vaults, git repositories, document archives), research outputs, and any other content that has not yet been processed according to the repository’s conventions.

In the emsemioverse, triage content lives at content/triage/. The structural requirements in this specification apply regardless of the specific directory.

1. What triage is

Triage is the process of receiving, normalizing, classifying, and dispositioning content that has entered the repository but is not yet part of its published body. Triage content is raw material — it has been collected but not yet processed into the repository’s format, vocabulary, and organizational structure.

The term is borrowed from medical triage: the process of rapidly assessing incoming patients to determine priority and treatment. In the repository context, triage assesses incoming content to determine what it is, where it belongs, and what processing it needs.

Triage is not permanent storage. Content in triage is in transit — it is waiting to be promoted (moved to published content), extracted (mined for concepts and terms), or discarded (identified as trash or irrelevant).

2. Triage content states

Triage content passes through a processing pipeline. Each piece of content has a triage status indicating its processing state:

Status	Meaning
raw	Ingested but unprocessed. No frontmatter enrichment.
enriched	Frontmatter has been mechanically enriched (title, date, basic metadata). Body content unchanged.
classified	Content has been classified by type, discipline, relevance. Ready for disposition.
promotable	Classified as valuable and assigned a target location in published content. Ready for promotion.
extracted	Valuable content has been mined from this file (terms, concepts, citations) and encoded elsewhere. The source file is no longer needed.
trash	Identified as non-content (build artifacts, editor metadata, tooling files). Should be removed.

2.1 Status transitions

Content MUST progress through statuses in order: raw → enriched → classified → (promotable | extracted | trash). Skipping enrichment or classification is not permitted — each step produces information needed by subsequent steps.

A file’s triage status MUST be recorded in its frontmatter:

triage-status: enriched
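The transition rules above can be checked mechanically. The Python sketch below is illustrative only: the names ORDER, TERMINAL and is_allowed are hypothetical. It also assumes that a move between two dispositions (for example promotable to extracted) is disallowed, which this section does not address. Backward moves are accepted only when a justification string is supplied, mirroring the status-monotonicity invariant in §6.

```python
# Sketch of the section 2.1 transition rules. Status names come from the
# status table; ORDER, TERMINAL and is_allowed are hypothetical names.
from typing import Optional

# Rank of each status in the pipeline; the three dispositions share rank 3.
ORDER = {
    "raw": 0,
    "enriched": 1,
    "classified": 2,
    "promotable": 3,
    "extracted": 3,
    "trash": 3,
}
TERMINAL = {"promotable", "extracted", "trash"}


def is_allowed(current: str, new: str,
               justification: Optional[str] = None) -> bool:
    """Return True if a file may move from `current` to `new`."""
    if current not in ORDER or new not in ORDER:
        raise ValueError(f"unknown triage-status: {current!r} or {new!r}")
    step = ORDER[new] - ORDER[current]
    if step == 1:
        # Exactly one stage forward: raw -> enriched -> classified -> disposition.
        return True
    if step < 0:
        # Backward moves break monotonicity unless a reason is recorded.
        return bool(justification)
    # Assumption: sideways moves between dispositions (step 0) are rejected,
    # and skipping a stage (step > 1) is never allowed.
    return False
```

Under this sketch, an empty justification string counts as no justification.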

3. Triage operations

3.1 Ingestion

Ingestion is the process of placing external content into the triage directory. Ingestion MUST:

Preserve the original content unchanged (the triage directory is not an editing space — it holds content as received).
Preserve directory structure from the source where meaningful (e.g., an Obsidian vault’s folder hierarchy may carry semantic information).
Record the source of the ingested content (where it came from, when it was ingested) either in frontmatter or in a manifest file.
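One way to meet these three requirements is to copy the source tree byte-for-byte and write a manifest beside it. The sketch below is one possible approach. It assumes a JSON manifest named triage-manifest.json at the root of each ingested collection. The function name ingest and the manifest keys are hypothetical. shutil.copytree mirrors the source hierarchy and raises FileExistsError if the destination already exists, so a second ingestion cannot silently overwrite an earlier one.

```python
# Sketch of ingestion (section 3.1). The manifest file name and its keys are
# assumptions; the spec allows recording provenance in frontmatter instead.
import json
import shutil
from datetime import datetime, timezone
from pathlib import Path


def ingest(source: Path, triage_root: Path, collection: str) -> Path:
    """Copy `source` into triage_root/collection and write a manifest.

    Files are copied byte-for-byte; nothing is edited on the way in.
    """
    dest = triage_root / collection
    # copytree keeps the source's folder hierarchy and refuses to write
    # into a destination that already exists.
    shutil.copytree(source, dest)
    files = sorted(
        p.relative_to(dest).as_posix() for p in dest.rglob("*") if p.is_file()
    )
    manifest = {
        "source": str(source.resolve()),  # where it came from
        "ingested-at": datetime.now(timezone.utc).isoformat(),  # when
        "files": files,
    }
    manifest_path = dest / "triage-manifest.json"
    manifest_path.write_text(json.dumps(manifest, indent=2), encoding="utf-8")
    return manifest_path
```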

3.2 Enrichment

Enrichment is the mechanical processing of triage content to add or fix frontmatter metadata. Enrichment operates on frontmatter only — it MUST NOT modify body content. Enrichment includes:

Adding or fixing title (derived from filename or first heading)
Adding date-created (from file system metadata or content)
Fixing deprecated field names (e.g., renaming created to date-created)
Adding triage-status: enriched

Enrichment is deterministic — given the same input, it produces the same output. It does not require inference or classification judgment.
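The enrichment steps listed above are small enough to show in full. This sketch makes two assumptions. First, frontmatter is a flat block of key: value lines between --- delimiters, with no YAML lists or quoting. Second, the caller supplies the file's modification date, which keeps the function itself deterministic. The helpers split_frontmatter, join_frontmatter and enrich are hypothetical names; they are not the enrich_triage MCP tool.

```python
# Sketch of deterministic enrichment (section 3.2). Assumes flat
# "key: value" frontmatter and a caller-supplied modification date.
import re
from datetime import date
from pathlib import PurePath


def split_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split a leading '---' frontmatter block from the body."""
    if text.startswith("---\n"):
        end = text.find("\n---\n", 3)
        if end != -1:
            fields = {}
            for line in text[4:end].splitlines():
                key, sep, value = line.partition(":")
                if sep:
                    fields[key.strip()] = value.strip()
            return fields, text[end + 5:]
    return {}, text


def join_frontmatter(fields: dict[str, str], body: str) -> str:
    header = "".join(f"{key}: {value}\n" for key, value in fields.items())
    return f"---\n{header}---\n{body}"


def enrich(text: str, filename: str, modified: date) -> str:
    """Add or fix frontmatter; the body is passed through untouched."""
    fields, body = split_frontmatter(text)
    # Deprecated field: 'created' becomes 'date-created' (an existing
    # 'date-created' value wins).
    if "created" in fields:
        fields.setdefault("date-created", fields.pop("created"))
    if not fields.get("title"):
        # Title from the first level-one heading, else from the filename.
        heading = re.search(r"^# +(.+?)[ \t]*$", body, re.MULTILINE)
        fields["title"] = heading.group(1) if heading else PurePath(filename).stem
    fields.setdefault("date-created", modified.isoformat())
    fields["triage-status"] = "enriched"
    return join_frontmatter(fields, body)
```

Running enrich twice on the same input yields identical text, which is the determinism property stated above.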

3.3 Classification

Classification assigns semantic metadata to triage content: content type, discipline, tags, description, and relevance scores. Classification MAY use inference (local LLM, agent judgment) and SHOULD record the classification source:

triage-status: classified
type: text
target-discipline: sociology
classified-by: local-llm

Classification SHOULD also assess relevance to the endeavor’s current focus. Content that scores below a relevance threshold MAY be deprioritized for promotion.
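Because classification may rely on inference, the sketch below takes the classifier as a parameter. It shows only how the result is recorded and how relevance could order the promotion queue. The relevance field name, the 0 to 1 score range, and the helper names are assumptions. Files below the threshold move to the back of the queue rather than being dropped, which matches the "MAY be deprioritized" wording.

```python
# Sketch of recording a classification (section 3.3). The classifier is
# injected; 'relevance' and its [0, 1] range are assumptions.
from typing import Callable, NamedTuple


class Classification(NamedTuple):
    type: str
    discipline: str
    relevance: float


def classify(fields: dict[str, str], body: str,
             classifier: Callable[[str], Classification],
             source: str) -> dict[str, str]:
    """Return new frontmatter with the classification and its source."""
    if fields.get("triage-status") != "enriched":
        raise ValueError("classification requires triage-status: enriched")
    result = classifier(body)
    return {
        **fields,
        "triage-status": "classified",
        "type": result.type,
        "target-discipline": result.discipline,
        "classified-by": source,  # e.g. local-llm, as in the example above
        "relevance": f"{result.relevance:.2f}",  # hypothetical field
    }


def promotion_order(index: dict[str, dict[str, str]],
                    threshold: float) -> list[str]:
    """Classified files, most relevant first; low scorers go last."""
    ranked = [
        (path, float(f.get("relevance", "0")))
        for path, f in index.items()
        if f.get("triage-status") == "classified"
    ]
    # False sorts before True, so files at or above the threshold lead.
    ranked.sort(key=lambda item: (item[1] < threshold, -item[1], item[0]))
    return [path for path, _ in ranked]
```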

3.4 Promotion

Promotion is the process of moving triage content into the repository’s published content structure. Promotion MUST:

Transform the content to conform to semiotic-markdown conventions.
Place the content in the correct discipline and content-type directory.
Add all required frontmatter fields per semiotic-markdown.
Remove the triage-status field (promoted content is no longer in triage).
Remove or relocate the original triage file to prevent duplication.

Promotion is not copying — it is transformation. The promoted content may differ substantially from the triage original in structure, frontmatter, and organization, while preserving the substantive content.
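Only the mechanical parts of promotion fit in a short example: the required-field check, placement, removal of triage-status, and deletion of the triage original. The sketch does not rewrite the body into semiotic-markdown. The REQUIRED tuple and the <discipline>/<type>/ layout are placeholders; they are not the field list or directory scheme from semiotic-markdown. The name promote is hypothetical.

```python
# Sketch of the mechanical steps of promotion (section 3.4). REQUIRED and
# the directory layout are placeholders, not semiotic-markdown's rules.
from pathlib import Path

REQUIRED = ("title", "date-created", "type", "target-discipline")


def promote(triage_path: Path, fields: dict[str, str], body: str,
            content_root: Path) -> Path:
    """Write the published file, then delete the triage original."""
    if fields.get("triage-status") != "promotable":
        raise ValueError("only promotable files may be promoted")
    missing = [key for key in REQUIRED if not fields.get(key)]
    if missing:
        raise ValueError(f"missing frontmatter: {', '.join(missing)}")
    target = (content_root / fields["target-discipline"] / fields["type"]
              / triage_path.name)
    if target.exists():
        # Refuse to create a second copy of already-published content.
        raise FileExistsError(target)
    # Promoted content is no longer in triage, so triage-status is dropped.
    published = {k: v for k, v in fields.items() if k != "triage-status"}
    header = "".join(f"{k}: {v}\n" for k, v in published.items())
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(f"---\n{header}---\n{body}", encoding="utf-8")
    triage_path.unlink()  # no duplication: the triage copy is removed
    return target
```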

3.5 Extraction

Extraction mines triage content for terms, concepts, citations, or other atomic pieces of knowledge that are encoded as new files in published content. The triage file itself is not promoted — its value is distributed across multiple new files.

After extraction, the triage file's frontmatter SHOULD be set to triage-status: extracted. The file MAY then be retained for reference or removed.
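Extraction can be modeled as a miner that turns one body into several new files, followed by a status change on the source. In this sketch the miner is a hypothetical callable that returns a mapping from relative path to file text. How it recognizes terms, concepts or citations is left open, as it is in the spec. The function checks every target before writing, so it never overwrites published files. When nothing was mined, it raises an error instead of setting the status.

```python
# Sketch of extraction (section 3.5). `miner` is hypothetical: it maps a
# body to {relative path: file text}, one entry per atomic piece.
from pathlib import Path
from typing import Callable


def extract(fields: dict[str, str], body: str,
            miner: Callable[[str], dict[str, str]],
            content_root: Path) -> tuple[dict[str, str], list[Path]]:
    """Write each mined piece as its own file, then mark the source extracted."""
    if fields.get("triage-status") != "classified":
        raise ValueError("extraction requires triage-status: classified")
    pieces = miner(body)
    if not pieces:
        # Nothing was encoded elsewhere, so the source is not 'extracted'.
        raise ValueError("miner found nothing to extract")
    targets = {content_root / rel: text for rel, text in sorted(pieces.items())}
    # Check every target before writing anything, to avoid partial output.
    clashes = [path for path in targets if path.exists()]
    if clashes:
        raise FileExistsError(clashes[0])
    for path, text in targets.items():
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(text, encoding="utf-8")
    # Updated frontmatter is returned for the caller to write back.
    return {**fields, "triage-status": "extracted"}, list(targets)
```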

3.6 Trash removal

Trash identification and removal is the process of detecting and removing non-content files from triage: build artifacts, editor metadata, tooling files, binary blobs, and other files that were ingested incidentally but have no informational value.

Trash removal SHOULD be deterministic (pattern-matching on filenames and directory names) and SHOULD support dry-run mode for review before execution.
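A trash remover that runs as a dry run by default might look like the following. The pattern lists are illustrative guesses at common editor and build debris; they are not taken from the emsemioverse scripts. Binary blobs cannot be caught by filename alone and would need an additional check. Sorting the walk makes the output reproducible between a dry run and the real run.

```python
# Sketch of pattern-based trash removal (section 3.6). The pattern lists
# are hypothetical examples; dry_run defaults to True.
import fnmatch
import shutil
from pathlib import Path

TRASH_DIRS = {".obsidian", ".git", "node_modules", "__pycache__", ".trash"}
TRASH_FILES = (".DS_Store", "Thumbs.db", "*.pyc", "*.swp")


def find_trash(root: Path) -> list[Path]:
    """Return trash paths under root in a stable, sorted order."""
    hits: list[Path] = []
    trash_dirs: set[Path] = set()
    # Sorting by path parts visits each directory before its contents.
    for path in sorted(root.rglob("*")):
        if any(parent in trash_dirs for parent in path.parents):
            continue  # already covered by a matched directory
        if path.is_dir() and path.name in TRASH_DIRS:
            trash_dirs.add(path)
            hits.append(path)
        elif path.is_file() and any(fnmatch.fnmatch(path.name, pat)
                                    for pat in TRASH_FILES):
            hits.append(path)
    return hits


def remove_trash(root: Path, dry_run: bool = True) -> list[Path]:
    """Report (dry run) or delete trash; returns the affected paths."""
    targets = find_trash(root)
    for path in targets:
        if dry_run:
            print(f"would remove: {path}")
        elif path.is_dir():
            shutil.rmtree(path)
        else:
            path.unlink()
    return targets
```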

4. Triage index

A triage system SHOULD maintain an index of all triage files and their current processing states. The index enables:

Querying triage by status (find all unprocessed files)
Querying by classification (find all sociology content)
Tracking processing progress (how much triage remains)
Prioritizing work (which files are most relevant)

The index MAY be implemented as a database, a generated file, or an in-memory structure built from frontmatter at query time.

The index MUST be rebuildable from the triage content itself — it is a derived artifact, not a source of truth. If the index and the frontmatter disagree, the frontmatter is authoritative.
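Because the index must be derivable from frontmatter, the simplest implementation is the in-memory option: walk the triage directory and parse each file's frontmatter at query time. The sketch assumes Markdown files with flat key: value frontmatter. It treats a file with no recorded triage-status as raw, which is an assumption intended for content preserved unchanged at ingestion. All function names are hypothetical.

```python
# Sketch of an in-memory triage index (section 4), rebuilt from frontmatter
# on every call. Only *.md files are indexed (an assumption).
from collections import Counter
from pathlib import Path


def read_frontmatter(path: Path) -> dict[str, str]:
    """Parse a flat '---' frontmatter block; return {} if there is none."""
    text = path.read_text(encoding="utf-8")
    if not text.startswith("---\n"):
        return {}
    end = text.find("\n---\n", 3)
    if end == -1:
        return {}
    fields = {}
    for line in text[4:end].splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def build_index(root: Path) -> dict[str, dict[str, str]]:
    """Rebuild the index from frontmatter alone; nothing else is consulted."""
    index = {}
    for path in sorted(root.rglob("*.md")):
        fields = read_frontmatter(path)
        # Assumption: no recorded status means the file is unprocessed.
        fields.setdefault("triage-status", "raw")
        index[path.relative_to(root).as_posix()] = fields
    return index


def query(index: dict[str, dict[str, str]], where: dict[str, str]) -> list[str]:
    """Paths whose frontmatter matches every key/value pair in `where`."""
    return [path for path, fields in index.items()
            if all(fields.get(k) == v for k, v in where.items())]


def progress(index: dict[str, dict[str, str]]) -> Counter:
    """Count files per status, e.g. to see how much triage remains."""
    return Counter(fields["triage-status"] for fields in index.values())
```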

5. Processing order

Triage processing SHOULD follow this order:

Trash removal first: remove known non-content before spending enrichment or classification effort on it.
Mechanical enrichment second: add deterministic metadata before inference-based classification.
Classification third: classify content to determine disposition.
Promotion/extraction last: transform and place content into published structure.

This order minimizes wasted effort: trash is removed before enrichment, and enrichment produces the metadata that classification needs.
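The ordering can be expressed as a single driver. Each stage runs over the files that survived the previous one, and the driver counts the work done. The stage functions are passed in, and all names here are hypothetical. Promotion and extraction are left to the caller because they need a per-file decision. The counts make the efficiency argument concrete: a file filtered out as trash is never enriched or classified.

```python
# Sketch of the section 5 processing order. Files are modeled as
# path -> frontmatter dicts; the stage callables are hypothetical.
from typing import Callable

Fields = dict[str, str]  # one file's frontmatter


def process(files: dict[str, Fields],
            is_trash: Callable[[str], bool],
            enrich: Callable[[Fields], Fields],
            classify: Callable[[Fields], Fields],
            ) -> tuple[dict[str, Fields], dict[str, int]]:
    """Run trash removal, enrichment and classification in that order."""
    work = {"trash": 0, "enrich": 0, "classify": 0}
    # 1. Trash removal first: a cheap path check, before any other effort.
    survivors: dict[str, Fields] = {}
    for path, fields in files.items():
        if is_trash(path):
            work["trash"] += 1
        else:
            survivors[path] = fields
    # 2. Enrichment second: deterministic metadata for files still raw.
    for path, fields in list(survivors.items()):
        if fields.get("triage-status", "raw") == "raw":
            survivors[path] = enrich(fields)
            work["enrich"] += 1
    # 3. Classification third: only files that are now enriched.
    for path, fields in list(survivors.items()):
        if fields.get("triage-status") == "enriched":
            survivors[path] = classify(fields)
            work["classify"] += 1
    return survivors, work
```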

6. Invariants

The triage system MUST maintain these invariants:

Body preservation: enrichment and classification MUST NOT modify body content. Only frontmatter changes until promotion.
No duplication: promoted content MUST NOT also exist in triage. After promotion, the triage file is removed or relocated out of the triage directory (§3.4).
Status monotonicity: triage status progresses forward. A file MUST NOT move backward in the status pipeline (e.g., from classified back to raw) without explicit justification.
Index derivability: the triage index is always rebuildable from frontmatter. The index is never the source of truth.
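Two of these invariants are easy to audit automatically. The sketch below records a SHA-256 digest of a file's body (everything after the frontmatter block). Body preservation can then be checked later without keeping a copy of the original. The sketch also compares a stored index against one rebuilt from frontmatter and lists the disagreements. The helper names are hypothetical, and the flat --- frontmatter delimiter is an assumption.

```python
# Sketch of two invariant checks from section 6: body preservation and
# index derivability. Function names are hypothetical.
import hashlib


def body_of(text: str) -> str:
    """Everything after a leading '---' frontmatter block, or the whole text."""
    if text.startswith("---\n"):
        end = text.find("\n---\n", 3)
        if end != -1:
            return text[end + 5:]
    return text


def body_digest(text: str) -> str:
    """SHA-256 of the body only, so frontmatter edits do not change it."""
    return hashlib.sha256(body_of(text).encode("utf-8")).hexdigest()


def body_preserved(recorded_digest: str, current_text: str) -> bool:
    """True if the body still matches the digest recorded before processing."""
    return body_digest(current_text) == recorded_digest


def stale_entries(stored: dict[str, str], rebuilt: dict[str, str]) -> list[str]:
    """Paths where a stored index (path -> triage-status) disagrees with one
    rebuilt from frontmatter; the rebuilt index is authoritative."""
    paths = stored.keys() | rebuilt.keys()
    return sorted(path for path in paths if stored.get(path) != rebuilt.get(path))
```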

Glossary

Triage: the intake pipeline for unprocessed content
Ingestion: placing external content into triage
Enrichment: mechanical frontmatter processing (deterministic)
Classification: semantic metadata assignment (may use inference)
Promotion: transforming triage content into published content
Extraction: mining triage content for atomic knowledge pieces
Triage index: a queryable view of triage processing state

Rationale (non-normative)

The emsemioverse’s triage directory contains thousands of files from ingested Obsidian vaults, git repositories, and document collections. Three MCP tools (enrich_triage, infer_triage_frontmatter, mine_triage_relevance) and multiple scripts operate on triage content. Yet the triage process had no specification — agents and scripts operated on implicit conventions.

This specification codifies the existing practice: the status progression (raw → enriched → classified → promotable), the enrichment/classification distinction, body preservation during processing, and the index as a derived artifact. It is descriptive of practice first, then normative — capturing what works and making it reproducible.

The medical triage metaphor is apt: like an emergency department, the repository receives content of varying quality and relevance, and must rapidly assess and route it. The processing pipeline is the assessment protocol; the triage statuses are the acuity levels.

Relationship to other specs

Requires: semiotic-specification, semiotic-markdown (triage content becomes semiotic-markdown on promotion), semiotic-endeavor (triage is an endeavor practice).
Informed by: triage/specifications/intake-stack.md (simulation spec for intake pipeline), triage/specifications/information-engine.md (broader information processing architecture).
Practices: the emsemioverse’s triage system at content/triage/ and the MCP tools (enrich_triage, infer_triage_frontmatter, mine_triage_relevance) implement this specification.

emsenn
