controlled vocabulary
A controlled vocabulary is a curated, maintained list of terms used for labeling, indexing, and categorizing content within a collection. Its purpose is consistency: ensuring that the same concept is always referred to by the same term, and that different concepts are not accidentally given the same label.
Controlled vocabularies range in complexity, from flat lists to formal models:
- Term lists: flat lists of approved terms (e.g., a list of valid tags for a knowledge base).
- Taxonomies: hierarchical arrangements of terms with broader/narrower relationships.
- Thesauri: terms with broader, narrower, and related relationships, plus “use for” references that point non-preferred synonyms to the preferred term (e.g., Library of Congress Subject Headings).
- Ontologies: formal specifications of classes, properties, and relationships, often machine-readable (e.g., OWL ontologies for the semantic web).
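To make the difference between these levels concrete, the following is a minimal sketch of a thesaurus-style vocabulary in Python. The `Term` and `Thesaurus` classes, their field names, and the sample terms are all hypothetical, chosen only for illustration; real systems typically use a standard such as SKOS rather than an ad hoc structure like this one.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Term:
    """One preferred term and its relationships (hypothetical structure)."""
    label: str
    broader: Set[str] = field(default_factory=set)   # parent terms
    related: Set[str] = field(default_factory=set)   # associative links
    use_for: Set[str] = field(default_factory=set)   # non-preferred synonyms


class Thesaurus:
    def __init__(self, terms: List[Term]) -> None:
        self.terms: Dict[str, Term] = {t.label: t for t in terms}
        # Case-insensitive lookup of preferred labels.
        self._by_lower: Dict[str, str] = {t.label.lower(): t.label for t in terms}
        # Map each non-preferred synonym to its preferred label.
        self._preferred: Dict[str, str] = {}
        for t in terms:
            for alt in t.use_for:
                self._preferred[alt.lower()] = t.label

    def resolve(self, text: str) -> Optional[str]:
        """Return the preferred label for text, or None if it is not in the vocabulary."""
        key = text.strip().lower()
        if key in self._by_lower:
            return self._by_lower[key]
        return self._preferred.get(key)

    def narrower(self, label: str) -> List[str]:
        """Derive narrower terms by inverting the broader relationship."""
        return sorted(t.label for t in self.terms.values() if label in t.broader)


vocab = Thesaurus([
    Term("Animals"),
    Term("Cats", broader={"Animals"}, related={"Dogs"}, use_for={"Felines", "House cats"}),
    Term("Dogs", broader={"Animals"}, related={"Cats"}),
])

print(vocab.resolve("felines"))   # Cats
print(vocab.resolve("Birds"))     # None
print(vocab.narrower("Animals"))  # ['Cats', 'Dogs']
```

Dropping the `related` and `use_for` fields leaves a taxonomy, and dropping `broader` as well leaves a flat term list. An ontology would go further than this sketch by typing the relationships and the classes themselves.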
Controlled vocabularies require maintenance [@hedden_AccidentalTaxonomist_2016]. As a collection grows, new terms are needed, old terms become obsolete, and relationships between terms shift. Without maintenance, the vocabulary drifts and consistency degrades.
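One way to notice drift is a periodic audit of the tags actually in use against the approved list. The sketch below assumes a simple setup that is not described in the source: items carry a list of tags, the vocabulary is a set of approved terms, and deprecated terms are kept in a mapping to their replacements. The function name `audit_tags` and the sample data are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set


def audit_tags(
    items: Dict[str, List[str]],
    approved: Set[str],
    deprecated: Dict[str, str],
) -> dict:
    """Compare tags in use with the vocabulary and report signs of drift."""
    usage = Counter(tag for tags in items.values() for tag in tags)
    return {
        # Tags in use that the vocabulary does not know: candidates for new terms.
        "unknown": sorted(t for t in usage if t not in approved and t not in deprecated),
        # Obsolete tags still in use, with the term that should replace each one.
        "deprecated_in_use": {t: deprecated[t] for t in sorted(usage) if t in deprecated},
        # Approved terms nobody uses: candidates for review or retirement.
        "unused": sorted(approved - set(usage)),
    }


items = {
    "doc1": ["python", "testing"],
    "doc2": ["py", "deployment"],
    "doc3": ["python", "k8s"],
}
approved = {"python", "testing", "deployment", "kubernetes"}
deprecated = {"py": "python"}

report = audit_tags(items, approved, deprecated)
print(report["unknown"])            # ['k8s']
print(report["deprecated_in_use"])  # {'py': 'python'}
print(report["unused"])             # ['kubernetes']
```

A report like this only surfaces candidates. Deciding whether `k8s` becomes a new term, a "use for" reference to `kubernetes`, or a tagging error is still an editorial judgment.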