Model Selection for Local Inference Tasks

Sun, 08 Mar 2026 00:00:00 +0000

Running several local large language models on heterogeneous hardware (CPU via Ollama, NPU via Foundry Local) requires a strategy for deciding which model handles which task. The wrong choice wastes either time (a large model on a simple classification) or quality (a tiny model on a nuanced generation task).

¶Task categories

Local inference tasks in a repository management context fall into three categories:

Classification tasks assign labels, scores, or categories to content. Examples include scoring triage file relevance on a 0-3 scale, tagging content type (term, concept, or text), and identifying the target discipline. These tasks produce constrained output (a label or a short JSON object), benefit from low latency, and tolerate lower model capability: on well-prompted classification, a 3B-parameter model performs comparably to a 7B model.
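
Because the output of a classification task is constrained, it can be checked mechanically before anything downstream trusts it. The following is a minimal Python sketch of that idea, not the site's actual implementation. The function names, the prompt wording, and the choice to return the relevance score and content type together in one JSON object are all assumptions made for illustration. The model call is passed in as a plain callable, so the sketch does not depend on any particular backend.

```python
import json

# Content-type labels named in the text above.
ALLOWED_TYPES = {"term", "concept", "text"}


def build_prompt(content: str) -> str:
    """Ask for a short JSON object only, so the output stays constrained."""
    return (
        "Classify the content below. Reply with JSON only, in the form\n"
        '{"relevance": <integer 0-3>, "type": "term" | "concept" | "text"}\n\n'
        f"Content:\n{content}"
    )


def parse_label(raw: str) -> dict:
    """Validate a model reply; raise ValueError on anything off-schema."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not JSON: {raw!r}") from exc
    if not isinstance(data, dict):
        raise ValueError("reply is not a JSON object")

    relevance = data.get("relevance")
    kind = data.get("type")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(relevance, bool) or not isinstance(relevance, int):
        raise ValueError(f"relevance is not an integer: {relevance!r}")
    if not 0 <= relevance <= 3:
        raise ValueError(f"relevance out of range: {relevance!r}")
    if not isinstance(kind, str) or kind not in ALLOWED_TYPES:
        raise ValueError(f"unknown type: {kind!r}")
    return {"relevance": relevance, "type": kind}


def classify(content: str, generate) -> dict:
    """Classify content using `generate`, a hypothetical callable that
    takes a prompt string and returns the model's reply as text."""
    return parse_label(generate(build_prompt(content)))
```

Rejecting off-schema replies makes a weak answer from a small model a visible error rather than a silent mislabel. A caller could respond to that error by retrying or by escalating to a larger model.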

Task-Routing on emsenn.net
