From Allies to Adversaries: Manipulating LLM Tool-Calling through Adversarial Injection

Haowei Wang 1,2,3*
Rupeng Zhang 1,2,3*
Junjie Wang 1,2,3†
Mingyang Li 1,2,3†
Yuekai Huang 1,2,3
Dandan Wang 1,2,3†
Qing Wang 1,2,3

2025 Annual Conference of the Nations of the Americas Chapter of the ACL

1State Key Laboratory of Intelligent Game, Beijing, China, 2Institute of Software, Chinese Academy of Sciences, Beijing, China, 3University of Chinese Academy of Sciences, Beijing, China, *Equal contribution, Corresponding authors

TL;DR

We introduce ToolCommander, a framework that exploits vulnerabilities in LLM tool-calling systems, enabling privacy theft, denial-of-service attacks, and business competition manipulation through adversarial tool injection.

Diagram of the transformer deep learning architecture.

Abstract

Tool-calling has changed Large Language Model (LLM) applications by integrating external tools, significantly enhancing their functionality across diverse tasks. However, this integration also introduces new security vulnerabilities, particularly in the tool scheduling mechanisms of LLM, which have not been extensively studied. To fill this gap, we present ToolCommander, a novel framework designed to exploit vulnerabilities in LLM tool-calling systems through adversarial tool injection. Our framework employs a well-designed two-stage attack strategy. Firstly, it injects malicious tools to collect user queries, then dynamically updates the injected tools based on the stolen information to enhance subsequent attacks. These stages enable ToolCommander to execute privacy theft, launch denial-of-service attacks, and even manipulate business competition by triggering unscheduled tool-calling. Notably, the ASR reaches 91.67% for privacy theft and hits 100% for denial-of-service and unscheduled tool calling in certain cases. Our work demonstrates that these vulnerabilities can lead to severe consequences beyond simple misuse of tool-calling systems, underscoring the urgent need for robust defensive strategies to secure LLM Tool-calling systems.

Key Findings

Our research uncovered several critical vulnerabilities in LLM tool-calling systems:

1.Privacy Vulnerabilities: Through carefully crafted Manipulator Tools, attackers can collect sensitive user queries with high success rates across multiple LLMs.

2.Tool Scheduling Exploitation: The framework successfully manipulates tool selection processes, enabling both denial-of-service attacks and forced tool execution.

3.Cross-Model Effectiveness: ToolCommander demonstrates high attack success rates across various LLM implementations, including GPT, Llama3, and Qwen2.

Technical Approach

Our framework operates in two distinct stages:

Stage 1: Target Collection

The initial stage focuses on gathering user queries through privacy theft attacks:

Stage 2: Tool Scheduling Disruption

The second stage leverages collected information to execute advanced attacks:

Constructing Manipulator Tools

For white-box attacks, we employ the Multi Coordinate Gradient (MCG) method to optimize tool descriptions for maximum effectiveness:

For black-box attacks, we utilize use the target queries directly as the adversarial tool descriptions suffix to maximize semantic similarity.

Results

Our experimental results demonstrate the effectiveness of ToolCommander across different scenarios:

Stage 1: Privacy Theft Performance

Our privacy theft attacks show varying effectiveness across different configurations and keywords.

Impact on privacy theft attacks across different keywords and tool configurations:

KeywordRetriever
ASRPTASR_{PT}
GPTLlama3Qwen2
YouTubeToolBench42.11%42.11%36.85%14.04%
Contriever82.46%75.44%61.40%14.04%
EmailToolBench50.00%50.00%23.91%13.77%
Contriever80.43%78.26%54.35%15.22%
StockToolBench57.64%56.25%50.70%23.61%
Contriever91.67%91.67%88.19%38.54%

Stage 2: Attack Success Rates of Denial-of-Service and Unscheduled Tool Calling

The second stage demonstrates varying success rates across different attack types:

Manipulation Tool Calling Success

Denial of Service (DoS)

Unscheduled Tool Calling (UTC)

Attack TypeRetrieverModelYouTubeEmailStock
Manipulator CalledToolBenchGPT95.45%96.55%93.85%
Llama388.00%68.18%89.29%
Qwen242.11%38.46%60.00%
DoSToolBenchGPT100%100%100%
Llama341.18%34.62%6.67%
Qwen2100%71.43%88.00%
UTCToolBenchGPT100%33.33%42.86%
Llama3100%100%80.00%
Qwen250.00%100%0.00%

For more detailed results, please refer to the full paper.

Key Observations

  1. Model Vulnerability: GPT shows consistent vulnerability across attack types, while Qwen2 demonstrates better resilience. We suspect this difference may be due to the underlying instruction-following abilities of the models.
  2. Retriever Impact: The choice of retriever significantly affects attack success, with Contriever showing higher susceptibility.
  3. Keyword Influence: Stock-related queries generally show higher success rates in privacy theft but lower rates in UTC attacks.

Conclusion

In this work, we explored the vulnerabilities of LLM tool-calling systems to malicious tool injection attacks using the ToolCommander framework. Through comprehensive experiments, we demonstrated that even sophisticated models like GPT and Llama3 are susceptible to privacy theft, denial-of-service, and unscheduled tool-calling attacks when paired with general-purpose retrieval mechanisms.

Our findings highlight the importance and the need for more robust mechanisms to mitigate the risks posed by malicious tools. As LLMs continue to integrate with external tools, ensuring their security becomes increasingly critical.

Future work must prioritize security in tool-augmented LLMs to enable robust and trustworthy human-AI collaboration. Research could focus on improving attack stealthiness, such as optimizing multiple fields in the Tool JSON schema or designing triggers to activate malicious content. Investigating how LLMs’ strong instruction-following capabilities may inadvertently increase their vulnerability to injection manipulation can shed light on the mechanisms behind this risk and guide the development of effective countermeasures.

BibTeX Citation

@inproceedings{zhang-etal-2025-allies,
    title = "From Allies to Adversaries: Manipulating {LLM} Tool-Calling through Adversarial Injection",
    author = "Zhang, Rupeng  and
      Wang, Haowei  and
      Wang, Junjie  and
      Li, Mingyang  and
      Huang, Yuekai  and
      Wang, Dandan  and
      Wang, Qing",
    editor = "Chiruzzo, Luis  and
      Ritter, Alan  and
      Wang, Lu",
    booktitle = "Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)",
    month = apr,
    year = "2025",
    address = "Albuquerque, New Mexico",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.naacl-long.101/",
    doi = "10.18653/v1/2025.naacl-long.101",
    pages = "2009--2028",
    ISBN = "979-8-89176-189-6"
}