Fine-Tuning Local Models on a Knowledge Corpus

Sun, 08 Mar 2026 00:00:00 +0000

Fine-tuning adapts a pre-trained large language model to perform better on a specific domain or task by training it further on a curated dataset. For a structured knowledge repository — thousands of markdown files with frontmatter, cross-references, and discipline-specific vocabulary — fine-tuning could produce a model that generates content matching the repository’s conventions without explicit prompting.

What fine-tuning does

A base model like Qwen 2.5 3B knows general language but nothing about semiotic markdown, CamelCase tags, or the difference between a term and a concept in this repository’s type system. Without fine-tuning, every inference call must include the frontmatter specification, valid type list, and formatting rules in the prompt, which consumes context window and adds latency.
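The per-call cost described above can be illustrated with a minimal sketch. The specification text, the field names (`title`, `type`, `tags`), and the helper names `build_prompt` and `rough_token_count` are assumptions made for illustration; the repository's actual specification is not reproduced here, and the word count is only a crude stand-in for a real tokenizer.

```python
from typing import Optional

# Hypothetical, heavily shortened stand-in for the repository's conventions.
FRONTMATTER_SPEC = (
    "Frontmatter fields: title, type, tags.\n"
    "Valid types: term, concept, text.\n"
    "Tags use CamelCase.\n"
)


def build_prompt(task: str, spec: Optional[str] = FRONTMATTER_SPEC) -> str:
    """Prepend the conventions to the task unless spec is None."""
    if spec is None:
        return task
    return f"{spec}\n{task}"


def rough_token_count(text: str) -> int:
    """Crude whitespace-based estimate; a real tokenizer will differ."""
    return len(text.split())


task = "Write a definition entry for the word 'sign'."
base_prompt = build_prompt(task)              # base model: spec sent on every call
tuned_prompt = build_prompt(task, spec=None)  # fine-tuned model: task only
overhead = rough_token_count(base_prompt) - rough_token_count(tuned_prompt)
print(f"Per-call overhead: about {overhead} whitespace-delimited words")
```

A real specification would be far longer than this three-line stand-in, so the overhead paid on each call would grow accordingly.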

Model Selection for Local Inference Tasks

Sun, 08 Mar 2026 00:00:00 +0000

Running multiple local large language models on heterogeneous hardware — CPU via Ollama and NPU via Foundry Local — requires a strategy for which model handles which task. The wrong choice wastes either time (running a large model on a simple classification) or quality (running a tiny model on a nuanced generation task).
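One way to picture such a strategy is a small routing table keyed by task kind. This is a sketch under stated assumptions: the task kinds, the pairing of each kind with a backend, and the model names are placeholders chosen for illustration, not the configuration the article settles on.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    backend: str  # "ollama" (CPU) or "foundry-local" (NPU)
    model: str    # placeholder name, not a recommendation


# Hypothetical routing table; which backend suits which task is an assumption.
ROUTES = {
    "classification": Route(backend="foundry-local", model="small-3b-placeholder"),
    "generation": Route(backend="ollama", model="larger-7b-placeholder"),
}


def pick_route(task_kind: str) -> Route:
    """Return the configured route, failing loudly on unknown task kinds."""
    try:
        return ROUTES[task_kind]
    except KeyError:
        raise ValueError(f"no route configured for task kind {task_kind!r}") from None


route = pick_route("classification")
print(route.backend, route.model)
```

Failing on an unknown task kind, rather than falling back to a default model, keeps a misrouted task from silently wasting time or quality.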

Task categories

Local inference tasks in a repository management context fall into three categories:

Classification tasks assign labels, scores, or categories to content. Examples: scoring triage file relevance (0 to 3), tagging content type (term/concept/text), identifying target discipline. These tasks have constrained output (a label or short JSON object), benefit from low latency, and tolerate lower model capability. A 3B-parameter model performs comparably to a 7B model on well-prompted classification.
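Because classification output is constrained, it can be checked mechanically before it is trusted. The following is a minimal sketch that assumes the model has been prompted to reply with a JSON object containing `relevance` and `type` keys; those key names and the function `parse_classification` are hypothetical, while the 0 to 3 range and the three content types come from the examples above.

```python
import json

VALID_TYPES = {"term", "concept", "text"}


def parse_classification(raw: str) -> dict:
    """Parse and validate a model reply; raise ValueError if it is malformed."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    relevance = data.get("relevance")
    content_type = data.get("type")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(relevance, bool) or not isinstance(relevance, int):
        raise ValueError("relevance must be an integer")
    if not 0 <= relevance <= 3:
        raise ValueError("relevance must be between 0 and 3")
    if content_type not in VALID_TYPES:
        raise ValueError(f"unknown content type: {content_type!r}")
    return {"relevance": relevance, "type": content_type}


print(parse_classification('{"relevance": 2, "type": "term"}'))
```

A reply that fails validation can be retried or escalated to a larger model, which is one reason a small model is tolerable for this category.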

On-Device-Inference on emsenn.net
