On Attacks and Defenses for LLMs and LLM-based Agents

With the widespread adoption of large language models (LLMs), LLM-based agents are increasingly becoming integrated into everyday human life. However, alongside their rapid development comes a growing and evolving landscape of attacks and security threats. These risks can not only undermine the performance and stability of the models themselves but also pose significant challenges to user privacy, data integrity, and overall system trustworthiness. Consequently, ensuring the security of LLMs and their powered agents has become a critical and urgent issue.


Existing attack methods can be broadly categorized into several types, including prompt injection attacks, which involve crafting malicious inputs to induce abnormal model behaviors; data poisoning attacks, where malicious data is inserted during training or inference; and data leakage attacks, which exploit the model’s output channels to exfiltrate sensitive information. Such attacks occur not only in online environments but can also be combined with real-world interactions, creating more complex cross-domain threats.


Defending against these attacks poses multiple challenges. On one hand, the high complexity and black-box nature of LLMs make attack detection and attribution inherently difficult. On the other hand, defense strategies must balance security with system performance, avoiding overprotection that could significantly degrade efficiency. Moreover, as attack techniques rapidly evolve, static
security measures often fail to respond in time to emerging threats, necessitating defense mechanisms that are continuously updated and adaptive.

Related Posts