<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>On-Device-Inference on emsenn.net</title>
    <link>https://emsenn.net/tags/on-device-inference/</link>
    <description>Recent content in On-Device-Inference on emsenn.net</description>
    <generator>Hugo</generator>
    <language>en</language>
    <lastBuildDate>Sun, 08 Mar 2026 00:00:00 +0000</lastBuildDate>
    <atom:link href="https://emsenn.net/tags/on-device-inference/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Fine-Tuning Local Models on a Knowledge Corpus</title>
      <link>https://emsenn.net/library/domains/engineering/domains/tech/domains/computing/domains/on-device-inference/fine-tuning-local-models/</link>
      <pubDate>Sun, 08 Mar 2026 00:00:00 +0000</pubDate>
      <guid>https://emsenn.net/library/domains/engineering/domains/tech/domains/computing/domains/on-device-inference/fine-tuning-local-models/</guid>
      <description>&lt;p&gt;Fine-tuning adapts a pre-trained &lt;a href=&#34;terms/large-language-model.md&#34; class=&#34;link-internal&#34;&gt;large language model&lt;/a&gt; to perform better on a specific domain or task by training it further on a curated dataset. For a structured knowledge repository (thousands of markdown files with frontmatter, cross-references, and discipline-specific vocabulary), fine-tuning could produce a model that generates content matching the repository&amp;rsquo;s conventions without explicit prompting.&lt;/p&gt;&#xA;&lt;h2 id=&#34;what-fine-tuning-does&#34;&gt;&lt;a href=&#34;#what-fine-tuning-does&#34; class=&#34;heading-anchor&#34; aria-label=&#34;Link to this section&#34;&gt;¶&lt;/a&gt;What fine-tuning does&#xA;&lt;/h2&gt;&#xA;&lt;p&gt;A base model like &lt;a href=&#34;terms/qwen.md&#34; class=&#34;link-internal&#34;&gt;Qwen&lt;/a&gt; 2.5 3B knows general language but nothing about semiotic markdown, CamelCase tags, or the difference between a term and a concept in this repository&amp;rsquo;s type system. Without fine-tuning, every inference call must include the frontmatter specification, valid type list, and formatting rules in the prompt, which consumes context window and adds latency.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Model Selection for Local Inference Tasks</title>
      <link>https://emsenn.net/library/domains/engineering/domains/tech/domains/computing/domains/on-device-inference/model-selection-for-local-inference/</link>
      <pubDate>Sun, 08 Mar 2026 00:00:00 +0000</pubDate>
      <guid>https://emsenn.net/library/domains/engineering/domains/tech/domains/computing/domains/on-device-inference/model-selection-for-local-inference/</guid>
      <description>&lt;p&gt;Running multiple local &lt;a href=&#34;terms/large-language-model.md&#34; class=&#34;link-internal&#34;&gt;large language models&lt;/a&gt; on heterogeneous hardware (CPU via &lt;a href=&#34;../../software/ollama/index.md&#34; class=&#34;link-internal&#34;&gt;Ollama&lt;/a&gt; and &lt;a href=&#34;../../terms/neural-processing-unit.md&#34; class=&#34;link-internal&#34;&gt;NPU&lt;/a&gt; via Foundry Local) requires a strategy for deciding which model handles which task. The wrong choice costs either time (running a large model on a simple classification) or quality (running a tiny model on a nuanced generation task).&lt;/p&gt;&#xA;&lt;h2 id=&#34;task-categories&#34;&gt;&lt;a href=&#34;#task-categories&#34; class=&#34;heading-anchor&#34; aria-label=&#34;Link to this section&#34;&gt;¶&lt;/a&gt;Task categories&#xA;&lt;/h2&gt;&#xA;&lt;p&gt;Local inference tasks in a repository management context fall into three categories:&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Classification&lt;/strong&gt; tasks assign labels, scores, or categories to content. Examples: scoring triage file relevance (0-3), tagging content type (term/concept/text), identifying the target discipline. These tasks have constrained output (a label or short JSON), benefit from low latency, and tolerate lower model capability. A 3B-parameter model can perform comparably to a 7B model on well-prompted classification.&lt;/p&gt;</description>
    </item>
  </channel>
</rss>
