Large language model (LLM)–based agents are rapidly transitioning from isolated question-answering systems to autonomous entities capable of multi-step reasoning, tool interaction, and long-term task execution. This evolution significantly expands their operational capabilities but also introduces new security risks. Prior research has extensively examined several major attack vectors against LLM systems, including prompt injections, jailbreak attacks, and data poisoning, which primarily focus on immediate or short-horizon model manipulation. But a new class of stealthy, long‑term attacks is emerging: dormant injection attacks.
Beyond these well-studied attack categories, recent research indicates that LLM-based agents face an even more subtle and insidious class of threats when operating in long-term, multi-step settings. As agents increasingly rely on persistent memory, tool usage, and extended execution pipelines, attackers are no longer constrained to triggering malicious behaviors immediately. Instead, they can plant instructions that remain dormant for long periods and only activate under specific future conditions.
We conduct a systematic investigation of this class of attacks in financial task agent settings. Our preliminary experiments already reveal several concerning observations. First, dormant injections can persist within an agent’s memory or instruction history for extended periods without immediately triggering existing safeguards. In controlled financial task environments, we observe that once the delayed activation condition is met, these latent instructions can reliably influence downstream decision making. Notably, this attack demonstrates consistent effectiveness across different large language model backends and under varying prompt formulations.
More importantly, many commonly adopted defense strategies exhibit limited effectiveness and practicality when confronted with such long-horizon dependencies. Techniques that perform well against immediate prompt injection attacks often fail to detect these threats until after the LLM-based agent has already executed risky actions. In many cases, the original injection source cannot be reliably traced. This suggests that the current security evaluation paradigm, which largely centers on short context attacks, may substantially underestimate the real risk surface faced by modern LLM agents.
Taken together, these early observations indicate that dormant injection is not merely a theoretical concern but a practical threat that can naturally emerge in real world agent workflows. Addressing this challenge will require defenses that reason over long-term agent trajectories, track the provenance and influence of instructions, and continuously reassess previously accepted inputs as new contexts arise.
Author: Ching-Yu Kao, Fraunhofer AISEC
