<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Task-Routing on emsenn.net</title>
    <link>https://emsenn.net/tags/task-routing/</link>
    <description>Recent content in Task-Routing on emsenn.net</description>
    <generator>Hugo</generator>
    <language>en</language>
    <lastBuildDate>Sun, 08 Mar 2026 00:00:00 +0000</lastBuildDate>
    <atom:link href="https://emsenn.net/tags/task-routing/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Model Selection for Local Inference Tasks</title>
      <link>https://emsenn.net/library/domains/engineering/domains/tech/domains/computing/domains/on-device-inference/model-selection-for-local-inference/</link>
      <pubDate>Sun, 08 Mar 2026 00:00:00 +0000</pubDate>
      <guid>https://emsenn.net/library/domains/engineering/domains/tech/domains/computing/domains/on-device-inference/model-selection-for-local-inference/</guid>
      <description>&lt;p&gt;Running multiple local &lt;a href=&#34;terms/large-language-model.md&#34; class=&#34;link-internal&#34;&gt;large language models&lt;/a&gt; on heterogeneous hardware — CPU via &lt;a href=&#34;../../software/ollama/index.md&#34; class=&#34;link-internal&#34;&gt;Ollama&lt;/a&gt; and &lt;a href=&#34;../../terms/neural-processing-unit.md&#34; class=&#34;link-internal&#34;&gt;NPU&lt;/a&gt; via Foundry Local — requires a strategy for which model handles which task. The wrong choice wastes either time (running a large model on a simple classification) or quality (running a tiny model on a nuanced generation task).&lt;/p&gt;&#xA;&lt;h2 id=&#34;task-categories&#34;&gt;&lt;a href=&#34;#task-categories&#34; class=&#34;heading-anchor&#34; aria-label=&#34;Link to this section&#34;&gt;¶&lt;/a&gt;Task categories&#xA;&lt;/h2&gt;&#xA;&lt;p&gt;Local inference tasks in a repository management context fall into three categories:&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Classification&lt;/strong&gt; tasks assign labels, scores, or categories to content. Examples: scoring triage file relevance (0-3), tagging content type (term/concept/text), identifying target discipline. These tasks have constrained output (a label or short JSON), benefit from low latency, and tolerate lower model capability. A 3B-parameter model performs comparably to a 7B model on well-prompted classification.&lt;/p&gt;</description>
    </item>
  </channel>
</rss>
