6-Agent Adversarial Pipeline

Your Legal Brief Has Weaknesses.
Find Them Before Opposing Counsel Does.

Six AI agents stress-test your brief using the same adversarial method that has powered legal systems for centuries -- attack, defend, judge, verify, and rewrite.

The 6-Agent Pipeline

Attacker
Defender
Judge
Citation Verifier
Jurisdiction Expert
Brief Rewriter

58%

GPT-4 legal hallucination rate -- Stanford HAI

+15pp

Accuracy boost from multi-agent debate -- ICML 2024

300+

Judges require AI citation verification -- post Mata v. Avianca

$31K

Sanctions for AI-fabricated citations -- U.S. courts, 2023-2025
Simple Process

How It Works

1

Upload Your Brief

Paste or upload your legal brief. Our system accepts any jurisdiction, any practice area.
2

AI Stress Test

Six specialized agents attack, defend, verify citations, and check jurisdiction-specific requirements.
3

Get Your Report

Receive a scored report with every weakness found, every citation verified, and a rewritten brief.
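For teams that consume the report programmatically, here is a minimal sketch of what a scored report could look like as a data structure. The field names and the 0-100 scoring scale are illustrative assumptions, not the product's actual schema.

from dataclasses import dataclass, field
from typing import List

# Illustrative sketch only: field names and the 0-100 scale are assumptions,
# not the product's actual report schema.

@dataclass
class Finding:
    case_name: str    # matter the finding belongs to
    severity: str     # "critical" | "high" | "medium" | "low"
    description: str  # what was found and how opposing counsel could exploit it

@dataclass
class CitationCheck:
    citation: str     # citation exactly as it appears in the brief
    verified: bool    # False if the cited authority could not be located
    note: str = ""    # e.g. "holding mischaracterized"

@dataclass
class StressTestReport:
    score: int                                            # 0-100, higher = more defensible
    findings: List[Finding] = field(default_factory=list)
    citations: List[CitationCheck] = field(default_factory=list)
    rewritten_brief: str = ""                             # produced after review, never before

    def critical_findings(self) -> List[Finding]:
        """The weaknesses a filer should fix before anything else."""
        return [f for f in self.findings if f.severity == "critical"]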
Research-Backed

Why Adversarial Analysis Works

Our approach isn't a guess -- it's grounded in decades of legal tradition and cutting-edge AI safety research. The adversarial system has been humanity's best truth-finding mechanism for centuries. We apply it to AI.

58%

GPT-4 legal hallucination rate -- Stanford HAI, Journal of Legal Analysis 2024

+15pp

Accuracy boost from multi-agent debate -- Du et al., ICML 2024

300+

Judges now require AI citation verification -- post Mata v. Avianca standing orders

$31K

Sanctions for AI-fabricated citations -- combined across U.S. courts, 2023-2025

The Core Insight

A single AI reviewing a legal brief will miss weaknesses and hallucinate citations -- Stanford researchers found that GPT-4 hallucinated on 58% of verifiable legal questions. The adversarial system solves this: one agent attacks, another defends, and an impartial judge weighs the evidence. Lon Fuller argued in 1961 that adversarial challenge is the only effective means of combating the natural tendency to judge too swiftly -- a fact-finder who forms a premature hypothesis will unconsciously commit to it. LLMs exhibit exactly this behavior. Our 6-agent pipeline breaks the pattern.
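As a rough illustration of that structure, the sketch below runs two attack/defense rounds and then asks a neutral judge to weigh the transcript. The complete() hook, the prompts, and the round count are assumptions for illustration, not the production pipeline.

# Rough sketch of one adversarial review cycle. `complete()` stands in for any
# LLM completion call; prompts and round count are illustrative assumptions.

def complete(prompt: str) -> str:
    raise NotImplementedError("wire this to an LLM provider")

def adversarial_review(brief: str, rounds: int = 2) -> str:
    """Attack/defend for a few rounds, then have a neutral judge weigh the exchange."""
    transcript: list[str] = []
    for i in range(1, rounds + 1):
        prior = "\n".join(transcript)
        attack = complete(
            "You are opposing counsel. Identify the weakest arguments and most "
            "vulnerable citations in this brief.\n\nBRIEF:\n" + brief +
            "\n\nPRIOR EXCHANGE:\n" + prior
        )
        defense = complete(
            "You are the filing attorney. Rebut these attacks where you can and "
            "concede where you cannot.\n\nATTACKS:\n" + attack
        )
        transcript += [f"ATTACK {i}: {attack}", f"DEFENSE {i}: {defense}"]

    # The judge sees the full exchange, not just the brief -- the step that
    # counters the premature-commitment behavior described above.
    return complete(
        "You are a neutral judge. List the weaknesses that survived rebuttal, "
        "ranked by severity.\n\nEXCHANGE:\n" + "\n".join(transcript)
    )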

The Research

AI Debate -- 2018

AI Safety via Debate

Irving, Christiano & Amodei -- arXiv 1805.00899
Finding: Two AI agents debating can answer questions in PSPACE -- exponentially harder than what a single judge could verify alone. The authors explicitly cite legal adversarial proceedings as the motivating analogy. Our Attacker/Defender/Judge pipeline is a direct implementation of this framework, applied to legal brief analysis.
AI Debate -- 2024

Debating with More Persuasive LLMs Leads to More Truthful Answers

Khan, Hughes, Valentine et al. -- ICML 2024 (Best Paper Award)
Finding: When two LLM experts debate, non-expert LLM judges reach 76% accuracy and non-expert human judges reach 88%, vs. 48% and 60% baselines. Debate helps weaker agents evaluate stronger ones. Empirically proves our core mechanism: structured debate between AI agents surfaces truth even when the judge is less expert than the debaters.
Multi-Agent -- 2024

Improving Factuality and Reasoning through Multiagent Debate

Du, Li, Torralba, Tenenbaum & Mordatch -- ICML 2024
Finding: Multi-agent debate boosts reasoning accuracy by +15 percentage points on arithmetic and +8 on math reasoning. Even when all agents start wrong, debate converges on the correct answer. Validates our multi-round approach: 3 agents debating over 2+ rounds catch errors that a single pass never would.
Multi-Agent -- 2024

Encouraging Divergent Thinking in LLMs through Multi-Agent Debate

Liang, He, Jiao, Wang et al. -- EMNLP 2024
Finding: Identifies the "Degeneration-of-Thought" problem: LLMs become locked into initial (potentially incorrect) positions during self-reflection. Multi-agent debate with a judge breaks this pattern. Explains exactly why single-agent brief review fails -- the model commits to its first reading and cannot self-correct. Our adversarial structure forces genuine re-examination.
Hallucination -- 2024

Large Legal Fictions: Profiling Legal Hallucinations in LLMs

Dahl, Magesh, Suzgun & Ho (Stanford) -- Journal of Legal Analysis
Finding: LLMs hallucinate legal facts 58-88% of the time on verifiable questions about federal court cases: GPT-4 at 58%, GPT-3.5 at 69%, and Llama 2 at 88%. The scientific basis for our Citation Verifier agent. Single-pass AI cannot be trusted with legal citations -- period.
Hallucination -- 2025

Hallucination-Free? Assessing Reliability of Leading AI Legal Research Tools

Magesh, Surani, Dahl et al. (Stanford) -- Journal of Empirical Legal Studies
Finding: Even purpose-built legal AI tools hallucinate: Lexis+ AI at 17%, Westlaw AI at 33%, GPT-4 at 43%. RAG alone does not solve the legal hallucination problem. Even the best single-pass tools fail. Adversarial verification provides the additional layer that RAG-based tools lack.
Legal Scholarship -- 1975

Procedural Justice: A Psychological Analysis

Thibaut & Walker -- Lawrence Erlbaum Associates
Finding: Landmark empirical study comparing adversarial and inquisitorial procedures. Found that adversarial systems -- where parties control evidence presentation -- produce more thorough fact-finding than any single investigator. The foundational experiment proving adversarial systems work. 50 years of legal scholarship builds on this. We apply it to AI.
Case Law -- 2023

Mata v. Avianca, Inc. -- The Case That Changed Legal AI

Judge P. Kevin Castel -- No. 1:22-cv-01461 (S.D.N.Y.)
Finding: Counsel were sanctioned $5,000 for filing a ChatGPT-generated brief containing six fabricated cases. Since then, $31K+ in combined sanctions have been imposed across U.S. courts, and 300+ judges now require AI citation verification. The landmark wake-up call. Our system would have caught every fabricated citation before filing.

How the Research Maps to Our Pipeline

Attacker -- Irving et al. (2018), Perez et al. (2022)
The adversarial probe. Grounded in the AI Safety via Debate framework and red-teaming methodology -- adversarial agents expose flaws a cooperative reviewer would miss.
Defender -- Du et al. (ICML 2024), Liang et al. (EMNLP 2024)
Multi-agent debate research shows that wrong initial answers converge on the truth through rounds of challenge. The Defender breaks the Degeneration-of-Thought trap.
Judge -- Khan et al. (ICML 2024 Best Paper)
Non-expert judges reach 88% accuracy when aided by structured debate vs. a 60% baseline. The Judge leverages the adversarial structure to make better decisions.
Citation Verifier -- Stanford HAI (2024, 2025)
With 58-88% hallucination rates on legal citations -- and even Lexis+ AI wrong 17% of the time -- a dedicated verification agent is a professional obligation.
Jurisdiction Expert -- Thibaut & Walker (1975)
A brief that wins in one jurisdiction can lose in another. Procedural precision is the adversarial system's core requirement -- and its greatest strength.
Brief Rewriter -- Mata v. Avianca (2023)
The lesson of $31K+ in sanctions: never file unreviewed AI output. The Rewriter produces a revised brief only after the other five agents have vetted every claim. A sketch of this ordering follows.
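As a sketch under stated assumptions, the six roles above could be wired together as shown below. The ask hook and the role prompts are hypothetical stand-ins, not the product's internal API; the point is only the ordering, with the Rewriter running last, after the other five agents have reported.

from typing import Callable, Dict

# Hypothetical orchestration sketch -- role prompts and the single `ask` hook
# are assumptions, not the product's internal API.

def make_agent(role: str, ask: Callable[[str], str]) -> Callable[[str], str]:
    """Bind a fixed role prompt to a generic LLM call."""
    return lambda material: ask(role + "\n\n" + material)

def run_pipeline(brief: str, jurisdiction: str, ask: Callable[[str], str]) -> Dict[str, str]:
    attacker = make_agent("Attack this brief as opposing counsel.", ask)
    defender = make_agent("Defend the brief against the attacks appended to it.", ask)
    judge = make_agent("As a neutral judge, rank the weaknesses that survive rebuttal.", ask)
    verifier = make_agent("Flag every citation you cannot confirm exists and supports the claim.", ask)
    expert = make_agent(f"Check the brief against {jurisdiction} procedural requirements.", ask)
    rewriter = make_agent("Rewrite the brief, addressing only the vetted findings appended to it.", ask)

    attacks = attacker(brief)
    rebuttals = defender(brief + "\n\nATTACKS:\n" + attacks)
    rulings = judge("ATTACKS:\n" + attacks + "\n\nREBUTTALS:\n" + rebuttals)
    findings = "\n\n".join([rulings, verifier(brief), expert(brief)])

    # The Rewriter runs last, only after the other five agents have reported.
    return {"findings": findings,
            "rewritten_brief": rewriter(brief + "\n\nFINDINGS:\n" + findings)}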

Dashboard

Find the weaknesses in your brief before opposing counsel does. Upload a brief and get an instant adversarial tear-down -- weak arguments, shaky citations, and jurisdiction-specific risks scored and ranked.
Cases Analyzed

4

Average Score

64

Critical Findings

3

Total Findings

19

Recent Critical Findings

Floyd v. City of New York -- Stop-and-Frisk Class Action (critical)
Brief claims 88% of stops resulted in no further action, but the cited dataset (2004-2012) includes years after policy changes. Opposing counsel can argue the aggregate figure is misleading for the class period at issue.
Floyd v. City of New York -- Stop-and-Frisk Class Action (high)
The brief applies strict scrutiny to the equal protection claim without establishing that the stop-and-frisk policy constitutes a racial classification on its face. Under Arlington Heights, a facially neutral policy requires showing discriminatory intent, not just disparate impact.
Floyd v. City of New York -- Stop-and-Frisk Class Action (high)
Brief cites Terry v. Ohio (1968) for the reasonable suspicion standard but omits Illinois v. Wardlow (2000), which established that presence in a high-crime area is a relevant factor. Opposing counsel will use Wardlow to justify many of the challenged stops.
Smith v. Jones Construction LLC -- Breach of Contract / Specific Performance (high)
Brief seeks specific performance of a construction contract but fails to establish that money damages are inadequate -- a prerequisite under NY law. The property is a standard commercial build, not unique.
People v. Weinstein -- Criminal Prosecution Motion in Limine (critical)
The motion in limine seeks to exclude all prior-bad-act testimony under Molineux, but fails to address the People's likely argument under the doctrine of chances. With multiple complainants alleging similar conduct, the pattern itself is probative of intent and absence of consent.
Brief Stress-Tester -- Adversarial Legal Analysis