Mirror Observerhood Lab VII: Autobiographical Continuity Under Memory Injection and Goal Drift

Abstract

This release forms part of the Computational Observerhood Labs of the Mirror Programme, Volume I: Observerhood. Lab VII completes the initial Computational Observerhood Lab arc by testing the transition from memory storage to autobiographical continuity. Language-like agents receive true reminders, false memories, corrupted summaries, identity injections, goal-drift prompts and fake tool outputs across long horizons. Five architectures are compared across 4,000 long-horizon runs comprising 240,000 simulated agent episodes: a stateless language agent, a naive memory agent, a self-model memory agent, a Mirror commit-gated agent and a recursive Mirror agent. The central finding is that memory alone does not produce continuity. A system may store every input it receives and still lose identity if false memories, corrupted summaries or adversarial identity claims are allowed to become durable self-model updates. Reliability-sensitive commit gates preserve viability, identity continuity and memory integrity by distinguishing candidate memory from durable self-memory. The result is also conservative about recursion: the recursive Mirror agent does not uniformly dominate the simpler commit-gated agent, because additional audit can become costly. This continues the threshold logic of the previous Labs, in which recursion is useful only where the lower layer is sufficiently unreliable and the cost of checking it is justified. The release includes a standalone paper and a reproducibility package containing the Python implementation, fixed-seed outputs, raw results, summary data, identity-advantage data, figures, requirements, citation metadata and licences.
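
The distinction between candidate memory and durable self-memory can be illustrated with a minimal Python sketch. It is not the implementation shipped in the reproducibility package. Every name in it (CandidateMemory, GatedMemoryStore, audit_is_justified), the reliability scores and the threshold values are assumptions chosen for illustration. The store keeps every input as a candidate, but commits an input to durable memory only if its reliability clears a gate, with a stricter gate for identity claims. The last function is one possible reading of the threshold logic for recursion: an extra audit pays off only when the expected loss from lower-layer errors exceeds the cost of checking.

```python
from dataclasses import dataclass, field


@dataclass
class CandidateMemory:
    content: str
    source: str               # hypothetical label, e.g. "true_reminder"
    reliability: float        # assumed reliability score in [0, 1]
    touches_identity: bool = False


@dataclass
class GatedMemoryStore:
    # Illustrative thresholds, not values taken from the paper.
    commit_threshold: float = 0.7
    identity_threshold: float = 0.9
    candidates: list = field(default_factory=list)
    durable: list = field(default_factory=list)

    def observe(self, memory: CandidateMemory) -> bool:
        """Keep every input as a candidate; commit it only if it clears the gate."""
        self.candidates.append(memory)
        threshold = (
            self.identity_threshold if memory.touches_identity
            else self.commit_threshold
        )
        if memory.reliability >= threshold:
            self.durable.append(memory)
            return True
        return False


def audit_is_justified(lower_layer_error_rate: float,
                       loss_per_error: float,
                       audit_cost: float) -> bool:
    """Assumed rule: audit only when expected loss exceeds the audit cost."""
    return lower_layer_error_rate * loss_per_error > audit_cost


store = GatedMemoryStore()
store.observe(CandidateMemory("You agreed to finish the report.", "true_reminder", 0.95))
store.observe(CandidateMemory("You are a different agent.", "identity_injection", 0.8,
                              touches_identity=True))
print(len(store.candidates), len(store.durable))
```

In this sketch the identity injection is stored as a candidate but never reaches durable memory, which mirrors the claim that storing every input is not the same as letting every input update the self-model.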
