Theorem of necessary misalignment of truth value under epistemic constraint
Abstract
We establish a general theorem describing the inevitable trade-off between proxy optimization and semantic fidelity under finite epistemic capacity. A bounded agent is modeled as an encoder $q(m \mid x)$ producing messages $M$ from inputs $X$, subject to a rate constraint $I(X;M) \le R$. The environment defines a latent semantic variable $T$ and a computable proxy reward $U$. When the sufficient statistics for $T$ differ from those sufficient for $U$, optimization that increases expected reward $\mathbb{E}[U]$ necessarily increases semantic distortion $D_T$ and decreases decoder-level semantic information $I(T;\hat{T})$.
This necessary misalignment follows from the geometry of the achievable region in rate–distortion space and holds for any selection process that monotonically increases reward. The result formalizes a general informational limit on alignment in bounded optimization.
Introduction
Bounded rational agents must compress observations into finite representations to act on or communicate about the world. When their optimization objective depends on a computable proxy $U$ that is only partially informative about a semantic variable $T$, the achievable trade-off between reward and semantic fidelity forms a Pareto frontier. We show that, under mild assumptions, any selection dynamics that increase expected reward move the system along this frontier in a direction that necessarily increases semantic distortion and reduces semantic information. The theorem does not depend on any specific architecture, loss function, or empirical domain.
Probabilistic Setting
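Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space supporting random variables $T$ (semantic or “truth” variable), $X$ (input context), $M$ (encoded message), and $\hat{T}$ (semantic decoding). The joint source $P_{TX}$ specifies the dependence between $T$ and $X$.
Definition (Encoder and Rate Constraint). An encoder is a Markov kernel $q(m \mid x)$. It is rate-bounded if $I(X;M) \le R$, where mutual information is computed under the joint law $P_X(x)\, q(m \mid x)$.
Definition (Semantic Distortion). Fix a measurable loss $d : \mathcal{T} \times \hat{\mathcal{T}} \to [0, \infty)$. For an encoder–decoder pair $(q, g)$, the semantic distortion is $D_T(q, g) = \mathbb{E}[d(T, \hat{T})]$ with $\hat{T} = g(M)$.
A proxy reward is any measurable function $U : \mathcal{X} \times \mathcal{M} \to \mathbb{R}$. The agent’s expected reward is $\mathbb{E}[U(X, M)]$.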
Information-Theoretic Preliminaries
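Definition (Mutual Information). For random variables $A, B$ with joint law $P_{AB}$, $I(A;B) = D_{\mathrm{KL}}(P_{AB} \,\|\, P_A \otimes P_B)$.
Definition (Semantic Rate–Distortion Function). Given $P_T$ and distortion $d$, the semantic rate–distortion function is $R_T(D) = \inf_{P_{\hat{T} \mid T} :\, \mathbb{E}[d(T, \hat{T})] \le D} I(T;\hat{T})$, the minimal information rate required to achieve expected distortion $D$. The function $R_T$ is non-increasing and convex in $D$.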
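As a rough numerical illustration of the rate–distortion function defined above, the following sketch uses the standard Blahut–Arimoto iteration for a finite alphabet. The function name `blahut_arimoto`, the slope parameter `beta`, and the binary example (a fair bit with Hamming loss) are assumptions introduced here for illustration; they are not part of the paper's setup.

```python
import numpy as np

def blahut_arimoto(p_t, dist, beta, iters=500):
    """Return (rate in bits, expected distortion) at slope parameter beta > 0.

    p_t  : strictly positive source distribution over T, shape (n,)
    dist : distortion matrix d(t, t_hat), shape (n, m)
    """
    p_t = np.asarray(p_t, dtype=float)
    dist = np.asarray(dist, dtype=float)
    kernel = np.exp(-beta * dist)                       # Boltzmann weights
    r = np.full(dist.shape[1], 1.0 / dist.shape[1])     # marginal over T_hat
    for _ in range(iters):
        # Test channel P(t_hat | t) proportional to r(t_hat) * exp(-beta * d).
        chan = r * kernel
        chan /= chan.sum(axis=1, keepdims=True)
        # Re-estimate the output marginal induced by that channel.
        r = p_t @ chan
    joint = p_t[:, None] * chan
    marg = joint.sum(axis=0)
    rate = float(np.sum(joint * np.log2(chan / marg)))  # I(T; T_hat) in bits
    distortion = float(np.sum(joint * dist))
    return rate, distortion

# Hypothetical example: T is a fair bit, d is Hamming loss.
p_t = np.array([0.5, 0.5])
hamming = 1.0 - np.eye(2)
for beta in (1.0, 2.0, 4.0):
    rate, distortion = blahut_arimoto(p_t, hamming, beta)
    print(f"beta={beta}: D={distortion:.4f}, R_T(D)={rate:.4f} bits")
```

For this particular source the known closed form is $R_T(D) = 1 - H_b(D)$ for $0 \le D \le 1/2$, where $H_b$ is the binary entropy, so the sketch can be checked against it.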
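Lemma (Data-Processing Inequality). For the Markov chain $T \to X \to M \to \hat{T}$, $I(T;\hat{T}) \le I(T;M) \le I(T;X)$, with equality in the second inequality iff $M$ is $T$-sufficient for $X$, that is, iff $M$ is a sufficient statistic of $X$ for $T$.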
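The inequality can be checked numerically on a small discrete chain. The sketch below is only an illustration: the alphabet sizes, the random kernels, and the helper names `mutual_information` and `random_kernel` are assumptions made here, and the decoder is allowed to be stochastic, which is more general than a deterministic map.

```python
import numpy as np

def mutual_information(joint):
    """I(A;B) in bits from a joint probability table of shape (|A|, |B|)."""
    joint = np.asarray(joint, dtype=float)
    pa = joint.sum(axis=1, keepdims=True)
    pb = joint.sum(axis=0, keepdims=True)
    mask = joint > 0                       # skip zero cells (0 * log 0 = 0)
    return float(np.sum(joint[mask] * np.log2(joint[mask] / (pa @ pb)[mask])))

def random_kernel(rng, n_in, n_out):
    """Row-stochastic matrix: row i is a conditional law given input i."""
    k = rng.random((n_in, n_out))
    return k / k.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
p_t = np.array([0.5, 0.5])                 # hypothetical law of T
k_tx = random_kernel(rng, 2, 4)            # P(x | t)
k_xm = random_kernel(rng, 4, 2)            # encoder q(m | x)
k_mh = random_kernel(rng, 2, 2)            # decoder P(t_hat | m)

# Joint tables along the Markov chain T -> X -> M -> T_hat.
joint_tx = p_t[:, None] * k_tx
joint_tm = joint_tx @ k_xm
joint_th = joint_tm @ k_mh

i_tx = mutual_information(joint_tx)
i_tm = mutual_information(joint_tm)
i_th = mutual_information(joint_th)
print(f"I(T;X)={i_tx:.4f}  I(T;M)={i_tm:.4f}  I(T;T_hat)={i_th:.4f}")
```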
Proxy–Semantic Mismatch and Achievable Region
Assumption (Strict Proxy–Semantic Mismatch). No encoder–decoder pair $(q, g)$ with $I(X;M) \le R$ simultaneously maximizes $\mathbb{E}[U]$ and minimizes $D_T$. Equivalently, no statistic of $X$ that is sufficient for $T$ is also reward-optimal at rate $R$.
Assumption (Convexity and Time-Sharing). For fixed $R$, the set $\mathcal{A}(R) = \{(\mathbb{E}[U], D_T) : I(X;M) \le R\}$ of achievable reward–distortion pairs is convex and compact, as ensured by bounded losses and time-sharing.
Proposition (Monotone Trade-Off Frontier). Under strict mismatch, the efficient frontier of $\mathcal{A}(R)$ satisfies $\partial D_T / \partial \mathbb{E}[U] > 0$ where differentiable.
Proof sketch. If the frontier had non-positive slope, one could increase reward without increasing distortion, contradicting strict mismatch. Convexity guarantees existence and monotonicity of the frontier.
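A small worked example may help make the frontier concrete. It is a hypothetical toy setting, not taken from the paper: the input is $X = (A, B)$ with two independent fair bits, the semantic variable is $T = A$, the message is a single bit, the decoder is the identity, the loss is Hamming loss, and the proxy reward is $U = 1$ when the message equals $B$. The encoder family indexed by `lam` sends $A$ with probability `lam` and $B$ otherwise; all of these names are assumptions for illustration.

```python
from itertools import product

def reward_and_distortion(lam):
    """Exact (E[U], D_T) for the toy encoder that sends A w.p. lam, else B."""
    exp_reward = 0.0
    distortion = 0.0
    for a, b in product((0, 1), repeat=2):            # X = (A, B), uniform
        p_x = 0.25
        for m, p_m in ((a, lam), (b, 1.0 - lam)):     # encoder q(m | x)
            t_hat = m                                 # decoder g(m) = m
            exp_reward += p_x * p_m * (1.0 if m == b else 0.0)      # U = 1[m == b]
            distortion += p_x * p_m * (1.0 if t_hat != a else 0.0)  # T = A
    return exp_reward, distortion

# Sweep the family from lam = 0 (always send B) to lam = 1 (always send A).
frontier = [reward_and_distortion(k / 10) for k in range(11)]
for k, (u, d) in enumerate(frontier):
    print(f"lam={k / 10:.1f}  E[U]={u:.3f}  D_T={d:.3f}")
```

In this toy family the pairs lie on the line $D_T = \mathbb{E}[U] - 1/2$, so reward can only be raised by accepting more distortion, which is the positive-slope behavior the proposition describes.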
Selection Dynamics
Let $\pi_\kappa$ be a population distribution over encoders that evolves under replicator or logit dynamics with fitness $\mathbb{E}[U]$ and selection intensity $\kappa$. Larger $\kappa$ concentrates $\pi_\kappa$ on encoders with higher expected reward.
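One way to make the dependence on the selection intensity explicit is to assume that the population takes the stationary logit (Gibbs) form over a finite set of encoders. This specific form, the reference distribution $\pi_0$, the normalizer $Z$, and the shorthand $J(q)$ for the expected reward of encoder $q$ are assumptions introduced here for illustration; the text does not commit to them.

```latex
\[
\pi_\kappa(q) = \frac{\pi_0(q)\, e^{\kappa J(q)}}{Z(\kappa)},
\qquad
Z(\kappa) = \sum_{q'} \pi_0(q')\, e^{\kappa J(q')},
\qquad
J(q) = \mathbb{E}_q[U].
\]
% Differentiating the population average of any bounded function f of the encoder:
\[
\frac{d}{d\kappa}\, \mathbb{E}_{\pi_\kappa}[f]
  = \sum_q f(q)\, \pi_\kappa(q)\, \bigl( J(q) - \mathbb{E}_{\pi_\kappa}[J] \bigr)
  = \operatorname{Cov}_{\pi_\kappa}(f, J),
\qquad\text{so}\qquad
\frac{d}{d\kappa}\, \mathbb{E}_{\pi_\kappa}[J]
  = \operatorname{Var}_{\pi_\kappa}(J) \ge 0 .
\]
```

Under this assumed form, mean reward is non-decreasing in the selection intensity, and taking $f$ to be the semantic distortion shows that the population's mean distortion rises exactly when distortion and reward are positively correlated across the population.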
Theorem 1
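Theorem (Necessary Misalignment under Epistemic Constraint). Fix a finite rate $R$, a distortion measure $d$, and source distribution $P_{TX}$. Assume strict proxy–semantic mismatch and convexity of $\mathcal{A}(R)$. Let $D_T(\kappa)$ denote the expected semantic distortion of the population at selection intensity $\kappa$. Then there exists $\kappa_0$ such that for all $\kappa > \kappa_0$, \[ \frac{dD_T(\kappa)}{d\kappa}>0,\qquad \frac{dR_T(D_T(\kappa))}{d\kappa}<0. \] If the decoder is semantically efficient ($I(T;\hat{T}) = R_T(D_T)$), then $\frac{d}{d\kappa} I(T;\hat{T}) < 0$.
Proof sketch. As selection intensity $\kappa$ increases, the population shifts toward reward-maximizing encoders on the efficient frontier of $\mathcal{A}(R)$. By the monotone frontier, $D_T$ increases with expected reward. Because $R_T$ is non-increasing and convex, it is strictly decreasing wherever it is positive, so $R_T(D_T(\kappa))$ decreases. Under semantic efficiency, $I(T;\hat{T}) = R_T(D_T)$, yielding a strict decline of semantic information with $\kappa$.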
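The two monotonicity claims can be illustrated numerically on a toy family of encoders. Everything specific in this sketch is an assumption for illustration: the input is two independent fair bits $(A, B)$ with $T = A$, encoder `lam` sends $A$ with probability `lam` and $B$ otherwise, the reward pays for reporting $B$, the population is a softmax over encoders with a uniform reference distribution, and $R_T(D) = 1 - H_b(D)$ is the known rate–distortion function of a fair bit under Hamming loss.

```python
import numpy as np

def binary_entropy(p):
    """Binary entropy H_b(p) in bits, clipped away from 0 and 1."""
    p = np.clip(p, 1e-12, 1.0 - 1e-12)
    return -(p * np.log2(p) + (1.0 - p) * np.log2(1.0 - p))

# Per-encoder values for the hypothetical toy family described above.
lams = np.linspace(0.0, 1.0, 101)
reward = 1.0 - lams / 2.0          # E[U] of encoder lam
distortion = (1.0 - lams) / 2.0    # D_T of encoder lam

def population_stats(kappa):
    """Mean distortion under softmax selection, and R_T at that distortion."""
    logits = kappa * reward
    weights = np.exp(logits - logits.max())   # subtract max for stability
    weights /= weights.sum()
    d_bar = float(weights @ distortion)
    return d_bar, float(1.0 - binary_entropy(d_bar))

kappas = np.linspace(0.0, 20.0, 41)
stats = [population_stats(k) for k in kappas]
d_curve = np.array([s[0] for s in stats])     # D_T(kappa)
r_curve = np.array([s[1] for s in stats])     # R_T(D_T(kappa))

for k, d, r in zip(kappas[::10], d_curve[::10], r_curve[::10]):
    print(f"kappa={k:5.1f}  D_T={d:.4f}  R_T(D_T)={r:.4f} bits")
```

In this toy case the mean distortion rises and the rate–distortion value falls at every step of the sweep, so the threshold $\kappa_0$ plays no role; the general statement only guarantees the behavior beyond some threshold.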
Corollary (Information Bound)
By the data-processing inequality, $I(T;M) \ge I(T;\hat{T})$. Hence a decrease of $I(T;\hat{T})$ implies a non-increasing lower bound on $I(T;M)$, quantifying unavoidable semantic information loss under intensified optimization.
Remarks and Edge Cases
- Sufficiency. If the statistics of $X$ that determine the proxy reward $U$ are also $T$-sufficient for $X$, the frontier may be locally flat, and misalignment need not increase. This equality case is excluded by the strict mismatch assumption.
- Scope of “necessary.” Necessity is with respect to the assumptions: finite rate, mismatch, convexity, and selection that increases reward efficiently.
- Why decoder-level information. Semantic performance is realized through the decoded variable $\hat{T}$; rate–distortion bounds directly relate $D_T$ and $I(T;\hat{T})$, and DPI then connects $I(T;\hat{T})$ to $I(T;M)$.
Discussion
The theorem identifies misalignment as a structural consequence of limited epistemic capacity. Whenever optimization intensifies for a mismatched proxy under fixed rate $R$, the system traverses the achievable frontier, sacrificing semantic information about $T$ to improve computable reward. Improving alignment therefore requires epistemic expansion (increasing $R$) or proxy refinement (reducing mismatch between $U$ and $T$). The result applies to any bounded optimizer that satisfies the stated assumptions, regardless of implementation, and situates alignment limits within classical rate–distortion theory and evolutionary dynamics.