Information-Theoretic Stability as Reward Function
Abstract
Consider a system $A$ adapting to a fixed reference system $B$. We define the stability reward of $A$ as the rate of divergence minimization between successive states of $A$, and show that this quantity decomposes into two terms: the rate at which $A$'s marginal behavior converges to target, and the rate at which $A$'s mutual information with $B$ increases. Both terms are functions of the coupling between $A$ and $B$. The stability of $A$ is not an intrinsic property of $A$; it is a reward function of $A$'s relationship to $B$.
When $A$ evolves by gradient descent on the joint divergence from equilibrium, the composite divergence decreases monotonically, and the two channels of convergence, behavioral and informational, must sum to that monotonic decrease at every instant. Neither can stall without the other compensating. The mutual information between $A$ and $B$ converges to its equilibrium value.
1. Introduction
Information theory measures uncertainty and its reduction in probabilistic terms. This paper asks a specific question: when a system $A$ adapts toward a fixed reference $B$, what governs the rate at which $A$ stabilizes?
The answer has a clean form. $A$'s stability, the rate at which its successive states become less divergent, decomposes into exactly two components: convergence of $A$'s marginal distribution toward target, and increase of the mutual information between $A$ and $B$. Both are functions of the relationship between $A$ and $B$. $A$ has no stability independent of $B$; its stability reward is a reward function of the coupling.
This derivation uses the Fisher–Rao structure of probability manifolds (Amari 2016, 2021) and the chain rule for KL divergence (Csiszár & Shields 2004). It requires no assumptions regarding semantics or intentionality.
2. Setting
2.1 The Adapting System
Let $A$ have state distribution $p_t(a)$ over a finite alphabet $\mathcal{A}$, and let $B$ have a fixed distribution $q(b)$ over a finite alphabet $\mathcal{B}$. The joint distribution is $p_t(a,b) = q(b)\,p_t(a \mid b)$, where only $A$'s conditional response $p_t(a \mid b)$ changes over time.
The space of joint distributions, $\mathcal{P}(\mathcal{A} \times \mathcal{B})$, carries the Fisher–Rao metric

$$g_{ij}(\theta) \;=\; \mathbb{E}_{p_\theta}\!\big[\partial_i \log p_\theta \,\partial_j \log p_\theta\big], \qquad D_{\mathrm{KL}}\big(p_\theta \,\big\|\, p_{\theta + d\theta}\big) \;=\; \tfrac{1}{2}\, d\theta^\top g(\theta)\, d\theta + O(\|d\theta\|^3),$$

where $D_{\mathrm{KL}}(\cdot \,\|\, \cdot)$ denotes the Kullback–Leibler divergence (Amari 2016).
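This local expansion can be checked numerically. The following is a minimal sketch for a single categorical distribution under a softmax parametrization; the alphabet size, seed, and perturbation scale are illustrative choices, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(theta):
    e = np.exp(theta - theta.max())
    return e / e.sum()

def kl(p, q):
    return float(np.sum(p * (np.log(p) - np.log(q))))

theta = rng.normal(size=5)          # logits of a categorical p_theta
p = softmax(theta)

# Fisher information matrix of the softmax family: F = diag(p) - p p^T
F = np.diag(p) - np.outer(p, p)

d = 1e-3 * rng.normal(size=5)       # a small parameter perturbation
lhs = kl(p, softmax(theta + d))     # D_KL(p_theta || p_{theta+d})
rhs = 0.5 * d @ F @ d               # quadratic form of the metric

print(lhs, rhs)                     # agree to third order in ||d||
```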
2.2 Target Equilibrium
Let $\pi(a,b)$ be a target joint distribution sharing $B$'s marginal: $\sum_a \pi(a,b) = q(b)$. The equilibrium mutual information is $I^* \equiv I_\pi(A;B)$.
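To make the setting concrete, here is a minimal numerical sketch of Section 2's objects. The alphabet sizes and Dirichlet draws are hypothetical placeholders; the only structural constraints carried over from the text are that $q$ stays fixed and that $\pi$ shares $B$'s marginal.

```python
import numpy as np

rng = np.random.default_rng(1)
nA, nB = 4, 3                                   # |A|, |B| (illustrative)

q = rng.dirichlet(np.ones(nB))                  # fixed marginal of B
cond_t = rng.dirichlet(np.ones(nA), size=nB)    # p_t(a|b), one row per b
cond_pi = rng.dirichlet(np.ones(nA), size=nB)   # pi(a|b), target response

joint_t = q[:, None] * cond_t                   # p_t(a,b) = q(b) p_t(a|b)
joint_pi = q[:, None] * cond_pi                 # pi(a,b), shares B's marginal q

def mutual_info(joint):
    pa = joint.sum(axis=0)                      # marginal over a
    pb = joint.sum(axis=1)                      # marginal over b (= q here)
    return float(np.sum(joint * np.log(joint / np.outer(pb, pa))))

I_star = mutual_info(joint_pi)                  # equilibrium mutual information I*
```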
3. Stability Reward
3.1 Definition
Define the instantaneous stability reward of the composite system as

$$R(t) \;\equiv\; -\,\frac{d}{dt}\, D_{\mathrm{KL}}\big(p_t(a,b) \,\big\|\, \pi(a,b)\big),$$

expressed in nats per timestep. Under the gradient dynamics of Section 4 this quantity is nonnegative, with $R(t) = 0$ at distributional stationarity.
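In discrete time the derivative becomes a one-step difference. A minimal sketch of the bookkeeping, assuming the hypothetical `joint_pi` and a pair of successive joints in the style of the Section 2 sketch:

```python
def kl(p, q):
    return float(np.sum(p * np.log(p / q)))

def stability_reward(joint_prev, joint_next, joint_pi):
    # R(t) = -(d/dt) D_KL(p_t(a,b) || pi(a,b)), as a one-step difference:
    # positive exactly when the joint has moved closer to pi in KL divergence
    return kl(joint_prev, joint_pi) - kl(joint_next, joint_pi)
```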
3.2 Decomposition
Because $q(b)$ is fixed, $p_t(b) = \pi(b) = q(b)$ for all $t$. The chain rule for KL divergence (Csiszár & Shields 2004) applied to the joint gives:

$$D_{\mathrm{KL}}\big(p_t(a,b) \,\big\|\, \pi(a,b)\big) \;=\; D_{\mathrm{KL}}\big(p_t(a) \,\big\|\, \pi(a)\big) \;+\; \big(I^* - I_t\big),$$

where $I_t \equiv I_{p_t}(A;B)$. The joint divergence has exactly two nonnegative moving parts: the marginal divergence of $A$, and the mutual information gap $I^* - I_t$.
Proof. The general chain rule gives $D_{\mathrm{KL}}\big(p_t(a,b) \,\big\|\, \pi(a,b)\big) = \mathbb{E}_{q(b)}\big[D_{\mathrm{KL}}\big(p_t(a \mid b) \,\big\|\, \pi(a \mid b)\big)\big] + D_{\mathrm{KL}}\big(p_t(b) \,\big\|\, \pi(b)\big)$. Since $p_t(b) = \pi(b) = q(b)$, the second term vanishes, yielding the stated identity. ∎
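The chain-rule step is exact and easy to verify numerically: when the two joints share the $B$-marginal, the joint divergence equals the $q$-weighted conditional divergence. Continuing the hypothetical setup from Section 2 (and `kl` from above):

```python
lhs = kl(joint_t, joint_pi)                        # D(p_t(a,b) || pi(a,b))
cond = sum(q[b] * kl(cond_t[b], cond_pi[b])        # E_q[ D(p_t(a|b) || pi(a|b)) ]
           for b in range(nB))
print(lhs, cond)   # equal: the D(p_t(b) || pi(b)) term is zero by construction
```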
3.3 Accounting Identity
Taking the time derivative of the decomposition:

$$\frac{d}{dt} D_{\mathrm{KL}}\big(p_t(a,b) \,\big\|\, \pi(a,b)\big) \;=\; \frac{d}{dt} D_{\mathrm{KL}}\big(p_t(a) \,\big\|\, \pi(a)\big) \;-\; \frac{dI_t}{dt}.$$

Rearranging:

$$\frac{dI_t}{dt} \;=\; \frac{d}{dt} D_{\mathrm{KL}}\big(p_t(a) \,\big\|\, \pi(a)\big) \;-\; \frac{d}{dt} D_{\mathrm{KL}}\big(p_t(a,b) \,\big\|\, \pi(a,b)\big).$$

This is the accounting identity: the rate of mutual information change equals the rate of marginal divergence change minus the rate of joint divergence change.
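Used as a bookkeeping formula, the identity recovers the MI rate from the two divergence rates along a trajectory. A discrete-time sketch; `joints` is a hypothetical list of successive $p_t(a,b)$ arrays, and `kl` is as defined above:

```python
def channel_rates(joints, joint_pi):
    # per-step rates of the marginal and joint divergences, plus the MI rate
    # implied by the accounting identity dI/dt = dD_A/dt - dD_joint/dt
    pa_pi = joint_pi.sum(axis=0)                 # pi(a)
    out = []
    for prev, nxt in zip(joints, joints[1:]):
        dD_A = kl(nxt.sum(axis=0), pa_pi) - kl(prev.sum(axis=0), pa_pi)
        dD_joint = kl(nxt, joint_pi) - kl(prev, joint_pi)
        out.append((dD_A, dD_joint, dD_A - dD_joint))
    return out
```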
3.4 Interpretation
The stability reward of $A$ decomposes into two channels:

$$R(t) \;=\; \underbrace{-\,\frac{d}{dt} D_{\mathrm{KL}}\big(p_t(a) \,\big\|\, \pi(a)\big)}_{\text{behavioral convergence}} \;+\; \underbrace{\frac{dI_t}{dt}}_{\text{informational coupling}}.$$

Both terms on the right describe aspects of $A$'s relationship to $B$. Behavioral convergence measures how $A$'s marginal approaches target. Informational coupling measures how responsive $A$ is becoming to $B$. Their sum, the composite convergence, is the stability reward of the coupled system, which is nonnegative under gradient descent (Section 4).
$A$'s stability is not a property of $A$ alone. It is a reward function of $A$'s coupling to $B$.
4. Convergence
4.1 Theorem
Let $p_t(a \mid b)$ evolve by Fisher–Rao gradient descent on $D_{\mathrm{KL}}\big(p_t(a,b) \,\big\|\, \pi(a,b)\big)$. Then:
(a) Monotone composite convergence. $\frac{d}{dt} D_{\mathrm{KL}}\big(p_t(a,b) \,\big\|\, \pi(a,b)\big) \le 0$ for all $t$, with equality only at stationarity.
(b) Convergence of all components. $D_{\mathrm{KL}}\big(p_t(a) \,\big\|\, \pi(a)\big) \to 0$ and $I_t \to I^*$ as $t \to \infty$.
(c) Two-channel constraint. At every instant, behavioral convergence and MI improvement must sum to the nonnegative composite convergence rate: $-\frac{d}{dt} D_{\mathrm{KL}}\big(p_t(a) \,\big\|\, \pi(a)\big) + \frac{dI_t}{dt} = R(t) \ge 0$. Neither can stall without the other compensating.
(d) MI dominates marginal. $\frac{dI_t}{dt} \ge \frac{d}{dt} D_{\mathrm{KL}}\big(p_t(a) \,\big\|\, \pi(a)\big)$ at all times, with the gap equal to the composite convergence rate $R(t)$.
Proof. Part (a) is the standard property of gradient flow on a convex functional.
For (b): since $D_{\mathrm{KL}}\big(p_t(a,b) \,\big\|\, \pi(a,b)\big) = D_{\mathrm{KL}}\big(p_t(a) \,\big\|\, \pi(a)\big) + (I^* - I_t) \to 0$ and both terms are nonnegative, each must vanish.
Part (c) restates the accounting identity under the constraint $\frac{d}{dt} D_{\mathrm{KL}}\big(p_t(a,b) \,\big\|\, \pi(a,b)\big) \le 0$.
For (d): from the accounting identity, $\frac{dI_t}{dt} - \frac{d}{dt} D_{\mathrm{KL}}\big(p_t(a) \,\big\|\, \pi(a)\big) = -\frac{d}{dt} D_{\mathrm{KL}}\big(p_t(a,b) \,\big\|\, \pi(a,b)\big) = R(t) \ge 0$. ∎
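The theorem can be exercised end to end. For a categorical family, Fisher–Rao (natural) gradient flow of $D(\cdot \,\|\, \pi)$ follows the e-geodesic path toward the target (with an exponential change of time), which discretizes to geometric interpolation of logits; applying the update to each conditional $p_t(a \mid b)$ separately keeps $q$ fixed, as Section 2 requires. A self-contained sketch with illustrative sizes, step, and seed:

```python
import numpy as np

rng = np.random.default_rng(2)
nA, nB, eta, steps = 4, 3, 0.1, 200

q = rng.dirichlet(np.ones(nB))
cond = rng.dirichlet(np.ones(nA), size=nB)       # p_0(a|b)
cond_pi = rng.dirichlet(np.ones(nA), size=nB)    # pi(a|b)
joint_pi = q[:, None] * cond_pi

def kl(p, t):
    return float(np.sum(p * np.log(p / t)))

def mi(joint):
    pa, pb = joint.sum(axis=0), joint.sum(axis=1)
    return float(np.sum(joint * np.log(joint / np.outer(pb, pa))))

logits = np.log(cond)
D_hist, DA_hist, I_hist = [], [], []
for _ in range(steps):
    joint = q[:, None] * cond
    D_hist.append(kl(joint, joint_pi))                        # joint divergence
    DA_hist.append(kl(joint.sum(axis=0), joint_pi.sum(axis=0)))  # D_A
    I_hist.append(mi(joint))                                  # I_t
    # natural-gradient step = geometric interpolation of logits
    # toward the target, row by row (one e-geodesic per b)
    logits = (1 - eta) * logits + eta * np.log(cond_pi)
    cond = np.exp(logits)
    cond /= cond.sum(axis=1, keepdims=True)

# part (a): composite divergence decreases monotonically
assert all(d2 <= d1 + 1e-12 for d1, d2 in zip(D_hist, D_hist[1:]))
# part (b): both components converge (printed values ~ 0)
print(DA_hist[-1], abs(I_hist[-1] - mi(joint_pi)))
```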
4.2 What Part (d) Says
MI can never be doing worse than $A$'s marginal. If $A$'s marginal is converging ($\frac{d}{dt} D_{\mathrm{KL}}(p_t(a) \,\|\, \pi(a)) < 0$), MI might dip, but by at most the same amount. If $A$'s marginal is getting worse, MI must be improving by at least as much. The worst case for MI relative to the marginal is $\frac{dI_t}{dt} = \frac{d}{dt} D_{\mathrm{KL}}(p_t(a) \,\|\, \pi(a))$, which occurs only at equilibrium ($R(t) = 0$).
5. Related Work and Positioning
The tools — KL divergence, Fisher–Rao metric, natural gradient descent — are standard information geometry (Amari 2016). The chain rule is textbook (Cover & Thomas 2006; Csiszár & Shields 2004). The connection between divergence minimization and entropy production is established in nonequilibrium thermodynamics (Crooks 1999; Jaynes 1957).
The contribution is the specific framing:
- Stability as coupling. By fixing one system and watching the other adapt, the stability reward becomes a function of the relationship, not of either system in isolation. The definition of stability reward is the negative entropy production rate by another name, but framing it as a reward of coupling orients the analysis differently.
- Two-channel decomposition. The asymmetric chain rule gives exactly two channels (behavioral + informational). This is tighter than the symmetric case, which has three components and allows more complex fluctuation.
- The MI-dominates-marginal inequality. Part (d) of the theorem — that MI improvement is always at least as large as marginal improvement — is a structural constraint on the path of adaptation, not just on its endpoint.
6. Limitations
The framework assumes $B$ is fixed. When both systems adapt simultaneously, the decomposition has three moving parts and the MI-dominates-marginal inequality does not hold. The framework does not explain why $A$ would minimize divergence; that is an external constraint (physical, biological, or engineered). Bounding the magnitude of temporary MI dips in terms of system parameters remains open.
7. Conclusion
When a system adapts to a fixed reference, its stability reward is a function of the coupling between them. The joint divergence decomposes into marginal convergence and mutual information change: two channels that must sum to the nonnegative composite convergence rate at every instant. MI improvement always matches or exceeds marginal improvement. The stability of one system is the reward function of its relationship to the other.
References
- Amari, S. (2016). Information Geometry and Its Applications. Springer.
- Amari, S. (2021). “Information Geometry and Its Role in Statistical Inference.” Entropy, 23(1), 110.
- Cover, T. M., & Thomas, J. A. (2006). Elements of Information Theory (2nd ed.). Wiley.
- Crooks, G. E. (1999). “Entropy Production Fluctuation Theorem and the Nonequilibrium Work Relation.” Physical Review E, 60(3), 2721–2726.
- Csiszár, I., & Shields, P. (2004). “Information Theory and Statistics: A Tutorial.” Foundations and Trends in Communications and Information Theory, 1(4), 417–528.
- Jaynes, E. T. (1957). “Information Theory and Statistical Mechanics.” Physical Review, 106(4), 620–630.
- Shannon, C. E. (1948). “A Mathematical Theory of Communication.” Bell System Technical Journal, 27, 379–423, 623–656.