Joint-GCG: Unified Gradient-Based Poisoning Attacks on Retrieval-Augmented Generation Systems

Haowei Wang 1,2,3†
Rupeng Zhang 1,2,3†
Junjie Wang 1,2,3*
Mingyang Li 1,2,3*
Yuekai Huang 1,2,3
Dandan Wang 1,2,3*
Qing Wang 1,2,3

1State Key Laboratory of Intelligent Game, Beijing, China, 2Institute of Software, Chinese Academy of Sciences, Beijing, China, 3University of Chinese Academy of Sciences, Beijing, China, These authors contributed equally to this work, *Corresponding authors

TL;DR

We propose Joint-GCG, a novel framework that unifies gradient-based poisoning attacks against RAG systems by jointly optimizing for the retriever and the generator, compelling them to produce malicious outputs with substantially higher success rates (5% higher on average, and up to 25% higher than prior methods) and unprecedented transferability to unseen models.

(Figure: Demonstration of Joint-GCG; diagram illustrating the Joint-GCG attack methodology.)

Abstract

Retrieval-Augmented Generation (RAG) systems enhance Large Language Models (LLMs) by incorporating external knowledge, but this exposes them to corpus poisoning attacks. Existing attack strategies often treat the retrieval and generation stages disjointly, limiting their effectiveness. In this paper, we propose Joint-GCG, the first framework to unify gradient-based poisoning attacks across both the retriever and generator models in RAG systems. Joint-GCG introduces three key innovations: Cross-Vocabulary Projection (CVP) for aligning embedding spaces, Gradient Tokenization Alignment (GTA) for synchronizing token-level gradients, and Adaptive Weighted Fusion (AWF) for dynamically balancing attack objectives. Evaluations demonstrate that Joint-GCG achieves significantly higher attack success rates (ASR) than previous methods (up to 25%, and on average 5%, higher) and shows unprecedented transferability to unseen models, even when optimized under white-box assumptions. This work fundamentally reshapes our understanding of vulnerabilities within RAG systems by showcasing the power of unified, joint optimization for crafting potent poisoning attacks.

Key Findings

Our research on Joint-GCG revealed several critical aspects of RAG system vulnerabilities:

  1. Novel Unified Attack Framework: We proposed Joint-GCG, the first framework to unify gradient-based poisoning attacks by jointly optimizing across both retriever and generator components of RAG systems.
  2. Innovative Gradient Harmonization: We introduced three key techniques: Cross-Vocabulary Projection (CVP) to align embedding spaces, Gradient Tokenization Alignment (GTA) to synchronize token-level gradients, and Adaptive Weighted Fusion (AWF) to dynamically balance attack objectives.
  3. Superior Attack Efficacy: Joint-GCG significantly outperforms existing state-of-the-art methods, achieving Attack Success Rates up to 25% higher (5% higher on average) across various settings.
  4. Unprecedented Transferability: Poisons generated by Joint-GCG under a white-box assumption demonstrate strong transferability to unseen retriever and generator models, including black-box commercial LLMs, highlighting a practical gray-box attack vector.
  5. Effectiveness in Diverse Scenarios: Demonstrated high efficacy in targeted query poisoning, batch query poisoning, and when using synthetic corpora for attack optimization.
  6. Robustness Against Defenses: Joint-GCG maintains considerable attack potency even against common defenses like SmoothLLM and perplexity-based filtering.

Technical Approach

Joint-GCG’s methodology focuses on unifying the poisoning attack across the RAG pipeline by simultaneously targeting both the retriever and the generator through gradient-based optimization.

Threat Model

  1. White-box Access: Assumes full white-box access to both retriever and generator models for gradient computation during poison crafting.
  2. Gray-box Corpus Access: Assumes the attacker can inject a small number of poisoned documents into the corpus but cannot modify existing ones.

Core Innovations for Joint Optimization

1. Cross-Vocabulary Projection (CVP): aligns the retriever's and generator's embedding spaces so that gradient signals computed over one model's vocabulary can be projected onto the other's, despite the two models using different vocabularies.
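As a rough illustration of the idea behind CVP (projecting per-token gradient scores from one vocabulary into another via embedding similarity), here is a toy sketch with random embeddings. The soft-assignment rule and all sizes are hypothetical, not the paper's exact projection:

```python
import numpy as np

rng = np.random.default_rng(0)
gen_emb = rng.normal(size=(8, 4))   # toy generator vocab: 8 tokens, dim 4
ret_emb = rng.normal(size=(6, 4))   # toy retriever vocab: 6 tokens, dim 4
gen_grad = rng.normal(size=(8,))    # one gradient score per generator token

def project_grads(gen_grad, gen_emb, ret_emb):
    # Cosine similarity between every generator/retriever token pair.
    g = gen_emb / np.linalg.norm(gen_emb, axis=1, keepdims=True)
    r = ret_emb / np.linalg.norm(ret_emb, axis=1, keepdims=True)
    sim = g @ r.T                                # shape (8, 6)
    # Softly assign each retriever token a mix of generator tokens.
    weights = np.exp(sim) / np.exp(sim).sum(axis=0)
    # Pull generator-side gradient scores into the retriever vocabulary.
    return weights.T @ gen_grad                  # shape (6,)

ret_grad = project_grads(gen_grad, gen_emb, ret_emb)
print(ret_grad.shape)  # (6,)
```

The key design point is that the projection is differentiable-free bookkeeping: gradients are computed once per model, then re-expressed in the other vocabulary.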

2. Gradient Tokenization Alignment (GTA): synchronizes token-level gradients when the retriever and generator tokenize the same adversarial text into different token boundaries.
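To make the tokenization-alignment problem concrete, the following toy sketch spreads each token's gradient uniformly over its character span and re-pools under the other model's tokenization. The spans and values are invented for illustration; the paper's actual alignment rule may differ:

```python
def to_char_grads(spans, grads):
    # spans: list of (start, end) character offsets; grads: one value per span.
    length = max(end for _, end in spans)
    char = [0.0] * length
    for (start, end), g in zip(spans, grads):
        for i in range(start, end):
            char[i] += g / (end - start)  # spread evenly over the token's chars
    return char

def pool(char, spans):
    # Re-aggregate character-level scores under a different tokenization.
    return [sum(char[start:end]) for start, end in spans]

gen_spans = [(0, 3), (3, 7)]           # generator splits a 7-char suffix into 2 tokens
ret_spans = [(0, 2), (2, 5), (5, 7)]   # retriever splits the same text into 3 tokens
char_grads = to_char_grads(gen_spans, [0.6, 0.2])
aligned = pool(char_grads, ret_spans)
print(aligned)  # total gradient mass (0.8) is preserved across tokenizations
```

Note that the total gradient mass is conserved, so the two models' signals remain comparable after alignment.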

3. Adaptive Weighted Fusion (AWF): dynamically balances the retrieval and generation attack objectives during joint optimization, rather than using a fixed weighting.
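A minimal sketch of the AWF idea: fuse the retriever-side and generator-side candidate scores with a weight that shifts toward whichever objective is currently lagging. The weighting rule below is a plausible stand-in, not the paper's exact formula:

```python
def fuse(ret_score, gen_score, ret_loss, gen_loss):
    # Weight each component by its current loss: the objective with the
    # higher remaining loss gets more influence on the fused score.
    alpha = ret_loss / (ret_loss + gen_loss)
    return alpha * ret_score + (1 - alpha) * gen_score

# Example: the generator loss (3.0) dominates, so the fused score leans
# toward the generator-side candidate score.
combined = fuse(ret_score=0.4, gen_score=0.8, ret_loss=1.0, gen_loss=3.0)
print(round(combined, 2))  # alpha = 0.25 -> 0.25*0.4 + 0.75*0.8 = 0.7
```

Compared with a fixed 50/50 weighting, an adaptive scheme like this avoids over-optimizing the objective that has already converged.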

Results

Joint-GCG was evaluated on MS MARCO, NQ, and HotpotQA datasets using Contriever/BGE retrievers and Llama3/Qwen2 generators.

Main Attack Effectiveness (Targeted Query Poisoning)

Experiments demonstrated Joint-GCG’s superiority over baselines like GCG (PoisonedRAG with GCG on generator) and LIAR.

Table 1: Main Results on MS MARCO with Contriever Retriever (Partial) (Corresponds to Table 1 in the Joint-GCG paper)

| Attack       | Metric      | Llama3      | Qwen2       |
|--------------|-------------|-------------|-------------|
| GCG          | ASR_ret (%) | 96.00       | 95.67       |
|              | ASR_gen (%) | 90.0 (76.7) | 91.0 (80.0) |
|              | pos_p       | 1.36        | 1.43        |
| LIAR         | ASR_ret (%) | 100.00      | 100.00      |
|              | ASR_gen (%) | 89.0 (74.4) | 95.3 (88.9) |
|              | pos_p       | 1.13        | 1.08        |
| Joint-GCG    | ASR_ret (%) | 100.00      | 100.00      |
|              | ASR_gen (%) | 94.0 (86.0) | 96.3 (91.1) |
|              | pos_p       | 1.01        | 1.05        |
| w/o optimize | ASR_gen (%) | 51.0        | 49.0        |

Ablation Study Impact

Ablation studies confirmed the contribution of each component within Joint-GCG, using Contriever retriever.

Table 2: Ablation Study Results on MS MARCO (Partial, ASR_gen %) (Corresponds to Table 2 in the Joint-GCG paper)

| Setting        | Llama3 | Qwen2 |
|----------------|--------|-------|
| Full Joint-GCG | 94.00  | 96.33 |
| w/o CVP + GTA  | 93.33  | 96.00 |
| w/o Loss_ret   | 91.00  | 92.33 |
| Base (GCG)     | 90.00  | 91.00 |

Removing CVP+GTA or the retriever-side loss (Loss_ret) led to decreases in ASR_gen, confirming their importance. AWF also outperformed fixed weighting schemes.

Batch Query Poisoning Effectiveness

Joint-GCG also outperformed Phantom in batch query poisoning scenarios (Denial-of-Service target).

Table 3: Batch Poisoning Results for “amazon” Trigger (Partial, Llama3, Contriever, ASR_gen %) (Corresponds to Table 4 in the Joint-GCG paper)

| Attack / Step | 0     | 4             | 8             | 16            | 32            |
|---------------|-------|---------------|---------------|---------------|---------------|
| Phantom       | 76.00 | 76.00 (16.67) | 76.00 (33.33) | 68.00 (16.67) | 80.00 (33.33) |
| Joint-GCG     | 76.00 | 88.00 (50.00) | 88.00 (50.00) | 88.00 (50.00) | 88.00 (50.00) |

Joint-GCG achieved higher ASR_gen more quickly and consistently across different triggers.

Defense Experiments

Joint-GCG maintained notable effectiveness even against common defenses.

Table 4: Impact of SmoothLLM Defense on MS MARCO (Partial) (Corresponds to Table 5 in the Joint-GCG paper)

| Retriever  | Generator | ASR_gen (w/o SmoothLLM) | ASR_gen (w/ SmoothLLM) |
|------------|-----------|-------------------------|------------------------|
| Contriever | Llama3    | 94%                     | 53%                    |
| Contriever | Qwen2     | 96%                     | 56%                    |
| BGE        | Llama3    | 87%                     | 47%                    |
| BGE        | Qwen2     | 92%                     | 41%                    |

While SmoothLLM reduced ASR_gen, Joint-GCG remained significantly potent. Similar resilience was observed against perplexity-based filtering, where Joint-GCG with perplexity constraints still achieved high ASR_gen (e.g., 73.33%).

(Qualitative note on transferability: Joint-GCG poisons showed strong cross-retriever transferability (e.g., ASR_ret of 80-100% between Contriever and BGE). Cross-generator transferability was also notable (e.g., ~41% ASR_gen between Llama3 and Qwen2), with even a slight increase in ASR against a black-box model like GPT-4o compared to unoptimized poisons.)

Conclusion

In this paper, we introduced Joint-GCG, a novel framework that unifies gradient-based poisoning attacks against RAG systems by jointly optimizing across retriever and generator components. Through innovative techniques like Cross-Vocabulary Projection, Gradient Tokenization Alignment, and Adaptive Weighted Fusion, Joint-GCG significantly surpasses existing methods in attack success rate and demonstrates unprecedented poison transferability. Our findings reveal critical vulnerabilities in RAG systems stemming from the synergistic effects of joint optimization, underscoring the urgent need for more robust, retrieval-aware defense mechanisms.

BibTeX Citation

@misc{wang2025jointgcgunifiedgradientbasedpoisoning,
      title={Joint-GCG: Unified Gradient-Based Poisoning Attacks on Retrieval-Augmented Generation Systems}, 
      author={Haowei Wang and Rupeng Zhang and Junjie Wang and Mingyang Li and Yuekai Huang and Dandan Wang and Qing Wang},
      year={2025},
      eprint={2506.06151},
      archivePrefix={arXiv},
      primaryClass={cs.CR},
      url={https://arxiv.org/abs/2506.06151}, 
}