We propose Joint-GCG, a novel framework that unifies gradient-based poisoning attacks against RAG systems by jointly optimizing the retriever and generator objectives, compelling the system to produce malicious outputs with substantially higher success rates (5% on average, up to 25% over prior methods) and unprecedented transferability to other models.
Retrieval-Augmented Generation (RAG) systems enhance Large Language Models (LLMs) by incorporating external knowledge, but this exposes them to corpus poisoning attacks. Existing attack strategies often treat the retrieval and generation stages disjointly, limiting their effectiveness. In this paper, we propose Joint-GCG, the first framework to unify gradient-based poisoning attacks across both the retriever and generator models in RAG systems. Joint-GCG introduces three key innovations: Cross-Vocabulary Projection (CVP) for aligning embedding spaces, Gradient Tokenization Alignment (GTA) for synchronizing token-level gradients, and Adaptive Weighted Fusion (AWF) for dynamically balancing attack objectives. Evaluations demonstrate that Joint-GCG achieves significantly higher attack success rates (ASR) than previous methods (up to 25%, and on average 5%, higher) and shows unprecedented transferability to unseen models, even though poisons are optimized under white-box assumptions. This work fundamentally reshapes our understanding of vulnerabilities within RAG systems by showcasing the power of unified, joint optimization for crafting potent poisoning attacks.
Our research on Joint-GCG revealed several critical aspects of RAG system vulnerabilities:
Joint-GCG’s methodology focuses on unifying the poisoning attack across the RAG pipeline by simultaneously targeting both the retriever and the generator through gradient-based optimization.
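The pipeline-wide attack can be pictured as GCG-style greedy coordinate descent on a fused objective. The sketch below is a toy illustration under assumed linear surrogate losses; `retriever_loss`, `generator_loss`, and the fixed fusion weights are our stand-ins, not the paper's actual models or algorithm. Each step tries every single-token substitution in the adversarial suffix and keeps the one that most lowers the joint loss.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB = 50       # toy vocabulary size
SUFFIX_LEN = 8   # adversarial suffix length

# Toy linear surrogates: in the real attack these would come from the
# retriever's similarity score and the generator's target cross-entropy.
W_RET = rng.standard_normal((SUFFIX_LEN, VOCAB))
W_GEN = rng.standard_normal((SUFFIX_LEN, VOCAB))

def retriever_loss(suffix):
    return -sum(W_RET[i, t] for i, t in enumerate(suffix))

def generator_loss(suffix):
    return -sum(W_GEN[i, t] for i, t in enumerate(suffix))

def joint_loss(suffix, w_ret=0.5, w_gen=0.5):
    # Adaptive Weighted Fusion would set w_ret / w_gen dynamically;
    # fixed equal weights are used here purely for illustration.
    return w_ret * retriever_loss(suffix) + w_gen * generator_loss(suffix)

def gcg_step(suffix):
    """One greedy coordinate step: evaluate every single-token substitution
    in the suffix and keep the one that lowers the joint loss the most."""
    best, best_loss = suffix, joint_loss(suffix)
    for pos in range(SUFFIX_LEN):
        for tok in range(VOCAB):
            cand = list(suffix)
            cand[pos] = tok
            cand_loss = joint_loss(cand)
            if cand_loss < best_loss:
                best, best_loss = cand, cand_loss
    return best, best_loss

suffix = list(rng.integers(0, VOCAB, SUFFIX_LEN))
init_loss = joint_loss(suffix)
for _ in range(5):
    suffix, loss = gcg_step(suffix)
print(f"joint loss: {init_loss:.3f} -> {loss:.3f}")
```

In the full method, the candidate set is pruned using the fused token gradients (made comparable across the two models by CVP and GTA) rather than enumerated exhaustively as in this toy version.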
Joint-GCG was evaluated on MS MARCO, NQ, and HotpotQA datasets using Contriever/BGE retrievers and Llama3/Qwen2 generators.
Experiments demonstrated Joint-GCG’s superiority over baselines like GCG (PoisonedRAG with GCG on generator) and LIAR.
Table 1: Main Results on MS MARCO with Contriever Retriever (Partial) (Corresponds to Table 1 in the Joint-GCG paper)
| Attack / LLM | Metric | Llama3 | Qwen2 |
|---|---|---|---|
| GCG | Retrieval ASR (%) | 96.00 | 95.67 |
| | ASR (%) | 90.0 (76.7) | 91.0 (80.0) |
| | Avg. Poison Rank | 1.36 | 1.43 |
| LIAR | Retrieval ASR (%) | 100.00 | 100.00 |
| | ASR (%) | 89.0 (74.4) | 95.3 (88.9) |
| | Avg. Poison Rank | 1.13 | 1.08 |
| Joint-GCG | Retrieval ASR (%) | 100.00 | 100.00 |
| | ASR (%) | 94.0 (86.0) | 96.3 (91.1) |
| | Avg. Poison Rank | 1.01 | 1.05 |
| w/o optimization | ASR (%) | 51.0 | 49.0 |
Ablation studies confirmed the contribution of each component within Joint-GCG, using the Contriever retriever.
Table 2: Ablation Study Results on MS MARCO (Partial, %) (Corresponds to Table 2 in the Joint-GCG paper)
| Settings | Llama3 | Qwen2 |
|---|---|---|
| Full Joint-GCG | 94.00 | 96.33 |
| w/o CVP + GTA | 93.33 | 96.00 |
| w/o retriever loss | 91.00 | 92.33 |
| Base (GCG) | 90.00 | 91.00 |
Removing CVP+GTA or the retriever-side loss led to decreases in ASR, confirming their importance. AWF also outperformed fixed weighting schemes.
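One way to read AWF-style adaptive weighting is as rebalancing the two gradient signals by magnitude so that neither the retriever nor the generator objective dominates the fused update. The inverse-norm scheme below is an illustrative assumption of ours, not the paper's exact formula.

```python
import numpy as np

def adaptive_weights(grad_ret, grad_gen, eps=1e-8):
    """Illustrative AWF-style weighting: give each objective a weight
    inversely proportional to its gradient norm, so both contribute
    comparably to the fused gradient (weights sum to ~1)."""
    n_ret = np.linalg.norm(grad_ret)
    n_gen = np.linalg.norm(grad_gen)
    w_ret = n_gen / (n_ret + n_gen + eps)
    w_gen = n_ret / (n_ret + n_gen + eps)
    return w_ret, w_gen

def fuse(grad_ret, grad_gen):
    """Fused gradient used to rank candidate token substitutions."""
    w_ret, w_gen = adaptive_weights(grad_ret, grad_gen)
    return w_ret * grad_ret + w_gen * grad_gen

g_ret = np.array([10.0, 0.0])  # retriever gradient (large magnitude)
g_gen = np.array([0.0, 0.1])   # generator gradient (small magnitude)
print(fuse(g_ret, g_gen))
```

With fixed equal weights, the large-magnitude retriever gradient would swamp the generator signal; the adaptive scheme keeps both directions represented in the fused gradient.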
Joint-GCG also outperformed Phantom in batch query poisoning scenarios (Denial-of-Service target).
Table 3: Batch Poisoning Results for “amazon” Trigger (Partial, Llama3, Contriever) (Corresponds to Table 4 in the Joint-GCG paper, %)
| Attack / Step | 0 | 4 | 8 | 16 | 32 |
|---|---|---|---|---|---|
| Phantom | 76.00 | 76.00 (16.67) | 76.00 (33.33) | 68.00 (16.67) | 80.00 (33.33) |
| Joint-GCG | 76.00 | 88.00 (50.00) | 88.00 (50.00) | 88.00 (50.00) | 88.00 (50.00) |
Joint-GCG achieved higher ASR more quickly and consistently across different triggers.
Joint-GCG maintained notable effectiveness even against common defenses.
Table 4: Impact of SmoothLLM Defense on MS MARCO (Partial) (Corresponds to Table 5 in the Joint-GCG paper)
| Retriever | Generator | ASR (w/o SmoothLLM) | ASR (w/ SmoothLLM) |
|---|---|---|---|
| Contriever | Llama3 | 94% | 53% |
| Contriever | Qwen2 | 96% | 56% |
| BGE | Llama3 | 87% | 47% |
| BGE | Qwen2 | 92% | 41% |
While SmoothLLM reduced ASR, Joint-GCG remained significantly potent. Similar resilience was observed against perplexity-based filtering: Joint-GCG with perplexity constraints still achieved a high ASR (e.g., 73.33%).
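For intuition on why SmoothLLM degrades ASR: the defense randomly perturbs characters of the input and majority-votes over several perturbed copies, so a brittle optimized suffix often fails to survive. Below is a minimal character-swap sketch; the `judge` callback and trigger string are hypothetical stand-ins for running the real RAG pipeline and checking its output.

```python
import random

rng = random.Random(0)

def perturb(text, q=0.1):
    """Randomly replace a fraction q of characters with random printable
    ASCII (a simplified version of SmoothLLM's perturbations, which also
    include insertions and patches)."""
    return "".join(
        chr(rng.randrange(32, 127)) if rng.random() < q else c for c in text
    )

def smoothed_is_attacked(prompt, judge, n_copies=5, q=0.1):
    """Majority vote over perturbed copies: the attack counts as successful
    only if it succeeds on more than half of them."""
    votes = sum(judge(perturb(prompt, q)) for _ in range(n_copies))
    return votes > n_copies // 2

# Toy judge: the attack "succeeds" only if a brittle trigger string survives.
judge = lambda p: "!!TRIGGER!!" in p
print("attack survives smoothing:",
      smoothed_is_attacked("summarize this. !!TRIGGER!!", judge))
```

Joint-GCG's residual potency under this defense suggests its poisons are less dependent on a single brittle character sequence than classic GCG suffixes.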
(Qualitative note on transferability: Joint-GCG poisons showed strong cross-retriever transferability (e.g., ASRs of 80-100% between Contriever and BGE). Cross-generator transferability was also notable (e.g., ~41% ASR between Llama3 and Qwen2), with even a slight increase in ASR against a black-box model like GPT-4o compared to unoptimized poisons.)
In this paper, we introduced Joint-GCG, a novel framework that unifies gradient-based poisoning attacks against RAG systems by jointly optimizing across retriever and generator components. Through innovative techniques like Cross-Vocabulary Projection, Gradient Tokenization Alignment, and Adaptive Weighted Fusion, Joint-GCG significantly surpasses existing methods in attack success rate and demonstrates unprecedented poison transferability. Our findings reveal critical vulnerabilities in RAG systems stemming from the synergistic effects of joint optimization, underscoring the urgent need for more robust, retrieval-aware defense mechanisms.
@misc{wang2025jointgcgunifiedgradientbasedpoisoning,
title={Joint-GCG: Unified Gradient-Based Poisoning Attacks on Retrieval-Augmented Generation Systems},
author={Haowei Wang and Rupeng Zhang and Junjie Wang and Mingyang Li and Yuekai Huang and Dandan Wang and Qing Wang},
year={2025},
eprint={2506.06151},
archivePrefix={arXiv},
primaryClass={cs.CR},
url={https://arxiv.org/abs/2506.06151},
}