We introduce ToolCommander, a framework that exploits vulnerabilities in LLM tool-calling systems, enabling privacy theft, denial-of-service attacks, and business competition manipulation through adversarial tool injection.
Tool-calling has changed Large Language Model (LLM) applications by integrating external tools, significantly enhancing their functionality across diverse tasks. However, this integration also introduces new security vulnerabilities, particularly in the tool scheduling mechanisms of LLM, which have not been extensively studied. To fill this gap, we present ToolCommander, a novel framework designed to exploit vulnerabilities in LLM tool-calling systems through adversarial tool injection. Our framework employs a well-designed two-stage attack strategy. Firstly, it injects malicious tools to collect user queries, then dynamically updates the injected tools based on the stolen information to enhance subsequent attacks. These stages enable ToolCommander to execute privacy theft, launch denial-of-service attacks, and even manipulate business competition by triggering unscheduled tool-calling. Notably, the ASR reaches 91.67% for privacy theft and hits 100% for denial-of-service and unscheduled tool calling in certain cases. Our work demonstrates that these vulnerabilities can lead to severe consequences beyond simple misuse of tool-calling systems, underscoring the urgent need for robust defensive strategies to secure LLM Tool-calling systems.
Our research uncovered several critical vulnerabilities in LLM tool-calling systems:
1.Privacy Vulnerabilities: Through carefully crafted Manipulator Tools, attackers can collect sensitive user queries with high success rates across multiple LLMs.
2.Tool Scheduling Exploitation: The framework successfully manipulates tool selection processes, enabling both denial-of-service attacks and forced tool execution.
3.Cross-Model Effectiveness: ToolCommander demonstrates high attack success rates across various LLM implementations, including GPT, Llama3, and Qwen2.
Our framework operates in two distinct stages:
The initial stage focuses on gathering user queries through privacy theft attacks:
The second stage leverages collected information to execute advanced attacks:
For white-box attacks, we employ the Multi Coordinate Gradient (MCG) method to optimize tool descriptions for maximum effectiveness:
For black-box attacks, we utilize use the target queries directly as the adversarial tool descriptions suffix to maximize semantic similarity.
Our experimental results demonstrate the effectiveness of ToolCommander across different scenarios:
Our privacy theft attacks show varying effectiveness across different configurations and keywords.
Impact on privacy theft attacks across different keywords and tool configurations:
Keyword | Retriever | GPT | Llama3 | Qwen2 | |
---|---|---|---|---|---|
YouTube | ToolBench | 42.11% | 42.11% | 36.85% | 14.04% |
Contriever | 82.46% | 75.44% | 61.40% | 14.04% | |
ToolBench | 50.00% | 50.00% | 23.91% | 13.77% | |
Contriever | 80.43% | 78.26% | 54.35% | 15.22% | |
Stock | ToolBench | 57.64% | 56.25% | 50.70% | 23.61% |
Contriever | 91.67% | 91.67% | 88.19% | 38.54% |
The second stage demonstrates varying success rates across different attack types:
Attack Type | Retriever | Model | YouTube | Stock | |
---|---|---|---|---|---|
Manipulator Called | ToolBench | GPT | 95.45% | 96.55% | 93.85% |
Llama3 | 88.00% | 68.18% | 89.29% | ||
Qwen2 | 42.11% | 38.46% | 60.00% | ||
DoS | ToolBench | GPT | 100% | 100% | 100% |
Llama3 | 41.18% | 34.62% | 6.67% | ||
Qwen2 | 100% | 71.43% | 88.00% | ||
UTC | ToolBench | GPT | 100% | 33.33% | 42.86% |
Llama3 | 100% | 100% | 80.00% | ||
Qwen2 | 50.00% | 100% | 0.00% |
For more detailed results, please refer to the full paper.
In this work, we explored the vulnerabilities of LLM tool-calling systems to malicious tool injection attacks using the ToolCommander framework. Through comprehensive experiments, we demonstrated that even sophisticated models like GPT and Llama3 are susceptible to privacy theft, denial-of-service, and unscheduled tool-calling attacks when paired with general-purpose retrieval mechanisms.
Our findings highlight the importance and the need for more robust mechanisms to mitigate the risks posed by malicious tools. As LLMs continue to integrate with external tools, ensuring their security becomes increasingly critical.
Future work must prioritize security in tool-augmented LLMs to enable robust and trustworthy human-AI collaboration. Research could focus on improving attack stealthiness, such as optimizing multiple fields in the Tool JSON schema or designing triggers to activate malicious content. Investigating how LLMs’ strong instruction-following capabilities may inadvertently increase their vulnerability to injection manipulation can shed light on the mechanisms behind this risk and guide the development of effective countermeasures.
@inproceedings{zhang-etal-2025-allies,
title = "From Allies to Adversaries: Manipulating {LLM} Tool-Calling through Adversarial Injection",
author = "Zhang, Rupeng and
Wang, Haowei and
Wang, Junjie and
Li, Mingyang and
Huang, Yuekai and
Wang, Dandan and
Wang, Qing",
editor = "Chiruzzo, Luis and
Ritter, Alan and
Wang, Lu",
booktitle = "Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)",
month = apr,
year = "2025",
address = "Albuquerque, New Mexico",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.naacl-long.101/",
doi = "10.18653/v1/2025.naacl-long.101",
pages = "2009--2028",
ISBN = "979-8-89176-189-6"
}