The Real Economics of AI in Legal Operations

Token costs are falling dramatically. Headlines celebrate each price reduction as a victory for AI adoption. Yet organisations deploying AI at scale are watching their total spend climb steadily upward. Understanding this paradox requires looking beyond the simple price-per-token metric to how modern AI workflows actually consume resources.

Between 2023 and 2025, the cost per million tokens for frontier models dropped from roughly $10 to under $2.50, a reduction of about 75%. The natural assumption is that AI-powered legal operations should be getting dramatically cheaper. The reality on the ground tells a different story.

The Token Cost Paradox

While per-token costs have plummeted, total AI expenditure at most organisations has increased. The explanation lies in how AI deployment has evolved. Early adopters used simple, single-prompt interactions: summarise this document, extract these fields, answer this question. Token consumption was predictable and modest.

Modern deployments look nothing like this. Agentic workflows—where AI systems reason through multi-step processes, use tools, cross-reference information, and iteratively refine outputs—consume tokens at rates that dwarf simple interactions. The price per unit fell, but units consumed exploded.

[Figure: The Token Economics Paradox: Price vs. Consumption]

Context Window Economics

The shift to larger context windows has amplified this effect. Models that can process 100,000+ tokens enable new capabilities—analysing entire contract portfolios, maintaining conversation context across complex negotiations, understanding documents in full rather than in fragments. But those capabilities come with a hidden cost structure that catches many organisations off-guard.

Input and output tokens are priced asymmetrically. Output tokens typically cost 3-10x more than input tokens. An agentic workflow that reads a 50,000-token document and produces 15,000 tokens of analysis, markup, and recommendations doesn't cost what naive arithmetic suggests. The output-heavy nature of useful legal AI work means the effective cost per interaction is higher than headline rates imply.
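The asymmetry can be made concrete with a short calculation. This is a minimal sketch: the 50,000/15,000 token split comes from the example above, but the prices ($2.50 per million input tokens, output at 4x that rate) are assumptions chosen for illustration, not any vendor's actual rates.

```python
def interaction_cost(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Dollar cost of one interaction, with prices quoted per million tokens."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000

# Hypothetical prices, for illustration only.
INPUT_PRICE = 2.50    # dollars per million input tokens
OUTPUT_PRICE = 10.00  # dollars per million output tokens (4x input)

# Naive arithmetic: treat all 65,000 tokens as if billed at the input rate.
naive = interaction_cost(50_000 + 15_000, 0, INPUT_PRICE, OUTPUT_PRICE)

# Actual arithmetic: bill input and output at their own rates.
actual = interaction_cost(50_000, 15_000, INPUT_PRICE, OUTPUT_PRICE)

# Share of the bill that comes from output tokens alone.
output_share = (15_000 * OUTPUT_PRICE / 1_000_000) / actual

print(f"naive:  ${naive:.4f}")
print(f"actual: ${actual:.4f}")
print(f"output share of cost: {output_share:.0%}")
```

Under these assumed prices, output tokens are less than a quarter of the tokens but more than half of the cost.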

[Figure: Input vs Output Token Cost Asymmetry]

The Agentic Multiplier

Consider a contract review task. A simple summarisation might consume 10,000 tokens total—document input plus brief output. Useful, but limited. An agentic workflow that actually helps with contract review looks very different.

Agentic Contract Review: Token Consumption Flow

flowchart TB
    subgraph WORKFLOW["AGENTIC CONTRACT REVIEW"]
        direction TB
        A["Read Contract<br/>~30,000 tokens"] --> B["Identify Key Provisions<br/>~8,000 tokens"]
        B --> C["Cross-Reference Policy<br/>~25,000 tokens"]
        C --> D["Draft Initial Markup<br/>~12,000 tokens"]
        D --> E["Self-Review & Refine<br/>~35,000 tokens"]
        E --> F["Generate Explanation<br/>~15,000 tokens"]
        F --> G["Produce Final Output<br/>~20,000 tokens"]
    end
    G --> TOTAL["Total: ~145,000 tokens"]
    subgraph SIMPLE["SIMPLE SUMMARISATION"]
        S1["Read Contract<br/>~8,000 tokens"] --> S2["Generate Summary<br/>~2,000 tokens"]
    end
    S2 --> SIMPLE_TOTAL["Total: ~10,000 tokens"]

The agentic approach consumes roughly 14.5x as many tokens as simple summarisation (145,000 versus 10,000). But it delivers something fundamentally different: actual contract review assistance rather than a summary that still requires a human to do all the analytical work. The question isn't whether the agentic approach costs more; it does. The question is whether the value delivered justifies that cost.
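The token totals can be checked, and combined with the 75% price drop cited earlier, in a few lines. This is a sketch under a simplifying assumption: every token is billed at one flat rate, ignoring the input/output price split. The step names and counts are the ones in the flow above.

```python
# Per-step token estimates for the two workflows.
AGENTIC_STEPS = {
    "Read Contract": 30_000,
    "Identify Key Provisions": 8_000,
    "Cross-Reference Policy": 25_000,
    "Draft Initial Markup": 12_000,
    "Self-Review & Refine": 35_000,
    "Generate Explanation": 15_000,
    "Produce Final Output": 20_000,
}
SIMPLE_STEPS = {
    "Read Contract": 8_000,
    "Generate Summary": 2_000,
}

agentic_total = sum(AGENTIC_STEPS.values())
simple_total = sum(SIMPLE_STEPS.values())
multiplier = agentic_total / simple_total

# Assumption: one flat per-token price that fell by 75%.
price_factor = 1 - 0.75

# Cost of an agentic review at today's price, relative to a simple
# summary at the old price.
net_cost_factor = multiplier * price_factor

print(f"agentic: {agentic_total:,} tokens, simple: {simple_total:,} tokens")
print(f"token multiplier: {multiplier}x, net cost factor: {net_cost_factor}x")
```

Under that flat-rate assumption, the per-document cost still rises even after the price cut, because consumption grew faster than the price fell.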

Simple Prompt: ~10K tokens per document
Basic summarisation, field extraction, simple Q&A. Human still does analytical work.

Agentic Workflow: ~145K tokens per document
Full analysis, cross-referencing, markup generation, self-review. Reduces human work significantly.

Cost Comparison: 14.5x token consumption increase
Per-token costs fell 75%, but task costs rose significantly. Value per token is the metric that matters.

The Legal AI Walled Garden Problem

Legal operations face a constraint that general-purpose AI deployments don't: the information they need lives in walled gardens. Westlaw, LexisNexis, and similar platforms contain the case law, statutory materials, and secondary sources that legal analysis requires. These platforms aren't designed for AI integration.

This creates an architectural problem. A general-purpose LLM can't access proprietary legal databases directly. Its training data includes some legal materials, but not the comprehensive, current coverage that professional legal work requires. The model knows law exists but can't reliably cite to current authority or access the specific documents a matter requires.

Retrieval-Augmented Generation (RAG) addresses this by connecting the LLM to relevant sources at inference time. But RAG adds its own costs: embedding generation, vector storage, retrieval operations, and—critically—the tokens required to include retrieved context in each prompt. A RAG-enhanced legal query might include 20,000 tokens of retrieved context before the actual question is even asked.
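A rough calculation shows how much of a RAG prompt is retrieved context. This is a sketch only: the 20,000-token context figure comes from the paragraph above, while the 200-token question, the query volume and the $2.50 per million input price are assumptions for illustration. Embedding, storage and retrieval costs are left out.

```python
def rag_prompt_tokens(question_tokens, retrieved_tokens, system_tokens=0):
    """Total input tokens sent to the model for one RAG query."""
    return system_tokens + retrieved_tokens + question_tokens

def retrieved_share(question_tokens, retrieved_tokens, system_tokens=0):
    """Fraction of the prompt taken up by retrieved context."""
    total = rag_prompt_tokens(question_tokens, retrieved_tokens, system_tokens)
    return retrieved_tokens / total

def context_cost(retrieved_tokens, queries, input_price_per_m):
    """Dollar cost of the retrieved context alone, across many queries."""
    return retrieved_tokens * queries * input_price_per_m / 1_000_000

# Hypothetical query: a 200-token question with 20,000 tokens of context.
share = retrieved_share(question_tokens=200, retrieved_tokens=20_000)
cost = context_cost(retrieved_tokens=20_000, queries=1_000, input_price_per_m=2.50)

print(f"retrieved context share of prompt: {share:.1%}")
print(f"context cost per 1,000 queries: ${cost:.2f}")
```

In this example nearly the whole input bill is retrieved context, so the amount retrieved per query is the lever worth measuring.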

Self-Hosted Solutions: The New Calculus

The economics of self-hosted AI have shifted dramatically. What once required enterprise-scale infrastructure can now run on commodity hardware. A capable open-source model running on a mid-range server can handle many legal AI tasks at a fraction of API costs—with the added benefits of data sovereignty and predictable pricing.

The decision framework for self-hosting versus API-based solutions depends on several factors: volume of usage, sensitivity of data, need for customisation, and tolerance for operational complexity. Neither approach is universally superior.

Self-Hosted vs API: Decision Framework

flowchart TB
    START["Evaluate AI<br/>Deployment"] --> VOL{"High Volume?<br/>>100K tokens/day"}
    VOL -->|"Yes"| SENS{"Sensitive Data?"}
    VOL -->|"No"| API1["API Services<br/>More Cost-Effective"]
    SENS -->|"Yes"| SELF["Self-Hosted<br/>Recommended"]
    SENS -->|"No"| CUSTOM{"Need Custom<br/>Fine-Tuning?"}
    CUSTOM -->|"Yes"| SELF
    CUSTOM -->|"No"| HYBRID["Hybrid Approach<br/>Consider Both"]
    SELF --> COSTS["Fixed Costs:<br/>Hardware + Ops"]
    API1 --> VARIABLE["Variable Costs:<br/>Per-Token Pricing"]
    HYBRID --> BOTH["Mixed Model:<br/>Sensitive = Local<br/>General = API"]
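The decision flow above can be written as a small function. This is a sketch that mirrors the diagram's branches and its 100K tokens/day threshold; the function and enum names are illustrative, and a real decision would also weigh operational complexity.

```python
from enum import Enum

class Deployment(Enum):
    API = "API services"
    SELF_HOSTED = "Self-hosted"
    HYBRID = "Hybrid approach"

# Threshold taken from the diagram: above this counts as high volume.
HIGH_VOLUME_TOKENS_PER_DAY = 100_000

def recommend_deployment(tokens_per_day, sensitive_data, needs_fine_tuning):
    """Follow the flowchart: volume first, then sensitivity, then customisation."""
    if tokens_per_day <= HIGH_VOLUME_TOKENS_PER_DAY:
        return Deployment.API          # low volume: per-token pricing wins
    if sensitive_data:
        return Deployment.SELF_HOSTED  # high volume and sensitive data
    if needs_fine_tuning:
        return Deployment.SELF_HOSTED  # high volume and custom fine-tuning
    return Deployment.HYBRID           # high volume, no special constraints

print(recommend_deployment(500_000, sensitive_data=True, needs_fine_tuning=False).value)
print(recommend_deployment(50_000, sensitive_data=False, needs_fine_tuning=False).value)
```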

API-Based Deployment

+ Zero infrastructure management
+ Access to frontier models
+ Scales instantly
- Variable, unpredictable costs
- Data leaves your environment
- Limited customisation

Self-Hosted Deployment

+ Complete data sovereignty
+ Predictable fixed costs
+ Full customisation control
- Requires ops capability
- Hardware capital expense
- Model capability ceiling
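The trade-off between fixed and variable costs reduces to a break-even volume. This is a minimal sketch: the $2,000 monthly fixed cost and the $5 per million blended API price are assumptions for illustration, and the comparison ignores any capability gap between a self-hosted model and a frontier API.

```python
def breakeven_tokens_per_month(fixed_monthly_cost, api_price_per_m):
    """Monthly token volume at which API spend equals the fixed self-hosted cost."""
    if api_price_per_m <= 0:
        raise ValueError("api_price_per_m must be positive")
    return fixed_monthly_cost / api_price_per_m * 1_000_000

def cheaper_option(tokens_per_month, fixed_monthly_cost, api_price_per_m):
    """Return which option costs less at a given monthly volume."""
    api_cost = tokens_per_month * api_price_per_m / 1_000_000
    return "self-hosted" if api_cost > fixed_monthly_cost else "api"

# Hypothetical inputs: hardware amortisation plus ops, and a blended API price.
FIXED_MONTHLY = 2_000.0
API_PRICE_PER_M = 5.0

breakeven = breakeven_tokens_per_month(FIXED_MONTHLY, API_PRICE_PER_M)
docs_at_breakeven = breakeven / 145_000  # agentic reviews at ~145K tokens each

print(f"break-even: {breakeven:,.0f} tokens/month")
print(f"roughly {docs_at_breakeven:,.0f} agentic contract reviews per month")
```

Below the break-even volume, per-token pricing is cheaper. Above it, the fixed cost is spread over enough work to win on price alone.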

Legal Tokenomics: A Framework

Managing AI economics in legal operations requires moving beyond simple cost tracking to what we call "legal tokenomics"—understanding the relationship between token consumption and legal value delivered.

The framework has three components: consumption monitoring, value attribution, and optimisation cycles. Each serves a distinct purpose in controlling costs while maximising the utility of AI investments.

flowchart TB
    subgraph FRAMEWORK["LEGAL TOKENOMICS FRAMEWORK"]
        direction TB
        subgraph MONITOR["CONSUMPTION MONITORING"]
            direction LR
            M1["Track tokens<br/>per task type"]
            M2["Identify cost<br/>drivers"]
            M3["Benchmark against<br/>outcomes"]
        end
        subgraph VALUE["VALUE ATTRIBUTION"]
            direction LR
            V1["Map AI tasks to<br/>business outcomes"]
            V2["Calculate cost per<br/>completed matter"]
            V3["Compare to<br/>traditional costs"]
        end
        subgraph OPTIMISE["OPTIMISATION CYCLES"]
            direction LR
            O1["Refine prompts<br/>& workflows"]
            O2["Implement<br/>caching"]
            O3["Right-size<br/>model selection"]
        end
    end
    MONITOR --> VALUE
    VALUE --> OPTIMISE
    OPTIMISE -->|"Continuous<br/>Improvement"| MONITOR
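The monitoring and attribution steps can start as a simple usage ledger. This is a sketch, not a prescribed implementation: the class names, matter IDs, task types and prices are all hypothetical, and a production system would persist records rather than hold them in memory.

```python
from collections import defaultdict
from dataclasses import dataclass, field

@dataclass(frozen=True)
class UsageRecord:
    matter_id: str
    task_type: str
    input_tokens: int
    output_tokens: int

@dataclass
class TokenLedger:
    input_price_per_m: float   # assumed price, dollars per million input tokens
    output_price_per_m: float  # assumed price, dollars per million output tokens
    records: list = field(default_factory=list)

    def log(self, record):
        self.records.append(record)

    def cost(self, record):
        """Dollar cost of one record at the ledger's prices."""
        return (record.input_tokens * self.input_price_per_m
                + record.output_tokens * self.output_price_per_m) / 1_000_000

    def tokens_by_task_type(self):
        """Consumption monitoring: total tokens per task type."""
        totals = defaultdict(int)
        for r in self.records:
            totals[r.task_type] += r.input_tokens + r.output_tokens
        return dict(totals)

    def cost_per_matter(self):
        """Value attribution: total AI spend per matter."""
        totals = defaultdict(float)
        for r in self.records:
            totals[r.matter_id] += self.cost(r)
        return dict(totals)

ledger = TokenLedger(input_price_per_m=2.50, output_price_per_m=10.00)
ledger.log(UsageRecord("M-001", "contract_review", 100_000, 45_000))
ledger.log(UsageRecord("M-001", "summarisation", 8_000, 2_000))
ledger.log(UsageRecord("M-002", "summarisation", 8_000, 2_000))

print(ledger.tokens_by_task_type())
print(ledger.cost_per_matter())
```

Cost per matter is the figure to set against traditional costs. Tokens per task type shows where the optimisation cycle should look first.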

Key Optimisation Strategies

Organisations successfully managing AI economics focus on four areas that compound over time:

Workflow Design

Minimise unnecessary iteration. Not every task needs agentic processing. Simple extractions can use simple prompts. Reserve complex workflows for tasks that genuinely require them.

Intelligent Caching

Cache embeddings, common retrievals, and repeated analysis patterns. Legal work often involves similar documents and questions. Caching can reduce redundant token consumption by 30-50% in mature deployments.
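The simplest form of this is an exact-match cache keyed on the prompt and document content. This is a sketch under stated assumptions: `fake_model_call` is a stand-in for whatever model client is in use, and only identical prompt and document pairs hit the cache. Near-duplicate documents would need a different technique.

```python
import hashlib

class AnalysisCache:
    """Exact-match cache for model outputs, keyed on prompt plus document text."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(prompt, document):
        # A separator byte keeps ("ab", "c") distinct from ("a", "bc").
        payload = f"{prompt}\x00{document}".encode("utf-8")
        return hashlib.sha256(payload).hexdigest()

    def get_or_compute(self, prompt, document, compute):
        key = self._key(prompt, document)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        result = compute(prompt, document)  # tokens are spent only on a miss
        self._store[key] = result
        return result

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0

def fake_model_call(prompt, document):
    """Hypothetical stand-in for a real model call."""
    return f"analysis of {len(document)} characters"

cache = AnalysisCache()
for doc in ["NDA text", "NDA text", "MSA text", "NDA text"]:
    cache.get_or_compute("Flag non-standard clauses", doc, fake_model_call)

print(f"hits={cache.hits} misses={cache.misses} hit_rate={cache.hit_rate():.0%}")
```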

Model Selection

Match model capability to task requirements. Not every task needs the most capable (and expensive) model. Smaller models handle routine work effectively; reserve frontier models for complex analysis.
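Routing by task type can be as simple as a lookup table. This is a sketch: the tier names, task types, prices and workload figures are all hypothetical, and unknown task types fall back to the frontier tier as a conservative default.

```python
# Hypothetical routing table and per-million-token prices.
MODEL_FOR_TASK = {
    "field_extraction": "small",
    "summarisation": "small",
    "contract_review": "frontier",
}
PRICE_PER_M = {"small": 0.50, "frontier": 5.00}

def pick_model(task_type):
    """Route routine tasks to the small tier; default to frontier when unsure."""
    return MODEL_FOR_TASK.get(task_type, "frontier")

def workload_cost(tokens_by_task, router=pick_model):
    """Dollar cost of a workload given a routing function."""
    return sum(
        tokens * PRICE_PER_M[router(task)]
        for task, tokens in tokens_by_task.items()
    ) / 1_000_000

# Hypothetical monthly workload, in tokens per task type.
workload = {
    "field_extraction": 40_000_000,
    "summarisation": 20_000_000,
    "contract_review": 40_000_000,
}

routed = workload_cost(workload)
all_frontier = workload_cost(workload, router=lambda task: "frontier")

print(f"routed: ${routed:.2f}, all frontier: ${all_frontier:.2f}")
```

The saving depends entirely on how much of the workload is genuinely routine, which is why routing decisions should follow measured task mix rather than guesswork.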

Outcome Metrics

Track value delivered, not just tokens consumed. The goal isn't minimising token consumption—it's maximising value per token spent. A workflow that costs more but delivers substantially better outcomes may be the right choice.
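One way to express value per token is to set reviewer time saved against AI spend. This is a sketch with invented inputs: the hours saved, the $200 hourly rate and the AI costs are placeholders that an organisation would replace with its own measurements. Only the token counts come from the earlier comparison.

```python
from dataclasses import dataclass

@dataclass
class WorkflowOutcome:
    name: str
    tokens: int
    ai_cost: float               # dollars spent on tokens
    reviewer_hours_saved: float  # must be measured, not assumed, in practice

def net_value(outcome, hourly_rate):
    """Dollar value of time saved, minus what the AI run cost."""
    return outcome.reviewer_hours_saved * hourly_rate - outcome.ai_cost

def value_per_thousand_tokens(outcome, hourly_rate):
    """Dollar value of time saved per 1,000 tokens consumed."""
    return outcome.reviewer_hours_saved * hourly_rate / (outcome.tokens / 1_000)

HOURLY_RATE = 200.0  # hypothetical reviewer rate

# Placeholder figures for illustration only.
simple = WorkflowOutcome("simple summary", 10_000, ai_cost=0.04, reviewer_hours_saved=0.1)
agentic = WorkflowOutcome("agentic review", 145_000, ai_cost=0.70, reviewer_hours_saved=1.5)

for outcome in (simple, agentic):
    print(
        f"{outcome.name}: net value ${net_value(outcome, HOURLY_RATE):.2f}, "
        f"${value_per_thousand_tokens(outcome, HOURLY_RATE):.2f} per 1K tokens"
    )
```

With these placeholder inputs the more expensive workflow comes out ahead. With different measurements it might not, which is the reason to track outcomes alongside consumption.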

[Figure: Token Cost Optimisation Over Time]

Looking Forward

The economics of AI in legal operations will continue evolving. Token costs will likely continue falling. Model capabilities will improve. New architectures may change the consumption patterns we see today.

But the fundamental principle will remain: understanding your actual consumption patterns and their relationship to business value matters more than tracking headline token prices. Organisations that build measurement and optimisation capabilities now will be better positioned to capture value as the technology evolves.

The companies getting AI economics right aren't the ones spending the least on tokens. They're the ones who understand what they're getting for each token spent and continuously optimise that ratio. In legal operations, where the value of accuracy is high and the cost of errors significant, this disciplined approach to AI economics isn't optional—it's essential.