Outcome forecasting
We design prediction tasks around disputes where the decision path can be inspected, not just guessed from labels.
CaseCrunch Lab studies whether automated reasoning systems can predict legal outcomes with enough discipline to assist lawyers, researchers, policymakers, and justice technologists.
The work starts with a practical question: can a machine read the same dispute material as a trained lawyer and reach a defensible forecast before the final decision is known? That question sounds narrow, but it touches almost every hard problem in legal AI: factual ambiguity, incomplete records, shifting standards of review, and the difference between a plausible argument and a likely result.
The lab studies how systems represent facts, claims, legal rules, and uncertainty when a case does not fit a clean template.
Controlled lawyer-versus-system exercises help separate model confidence from useful legal judgment.
A completed prediction exercise may cover no more than a few hundred disputes, but the work around those files is exacting. Each matter needs a stable record, a defined question, a time cut-off, and an outcome that can be checked after prediction.
According to measurements from completed exercises, the strongest signal rarely comes from a single dramatic fact. It comes from the structure of the claim: who bears the burden, what evidence was available, which remedy was requested, and whether the decision-maker had followed similar reasoning before.
Dataset design is therefore treated as legal work, not clerical preparation. A case file becomes useful only when the relevant facts, procedural posture, decision horizon, and scoring rule have been fixed before the prediction is made.
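As an illustration only, the requirements above (a stable record, a defined question, a time cut-off, a scoring rule fixed in advance, and an outcome checked afterwards) can be sketched as a small data structure. The class and field names here are hypothetical and are not taken from any CaseCrunch Lab dataset; the example dates and documents are invented for the sketch.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Document:
    """One item in the case record, stamped with the date it became available."""
    title: str
    available_on: date


@dataclass(frozen=True)
class PredictionTask:
    """A dispute framed as a forecasting question (hypothetical field names)."""
    matter_id: str
    question: str                     # the defined question being forecast
    cutoff: date                      # time cut-off: later material is excluded
    scoring_rule: str                 # fixed before any prediction is made
    record: Tuple[Document, ...] = ()
    outcome: Optional[bool] = None    # filled in only once the decision is known

    def visible_record(self) -> List[Document]:
        """Return only the documents available on or before the cut-off."""
        return [d for d in self.record if d.available_on <= self.cutoff]


# Invented example matter: the final decision postdates the cut-off,
# so it must not be visible to the forecaster.
task = PredictionTask(
    matter_id="demo-001",
    question="Will the complaint be upheld?",
    cutoff=date(2020, 3, 1),
    scoring_rule="brier",
    record=(
        Document("Complaint form", date(2020, 1, 10)),
        Document("Firm response", date(2020, 2, 14)),
        Document("Final decision", date(2020, 5, 2)),
    ),
)
visible_titles = [d.title for d in task.visible_record()]
print(visible_titles)
```

Freezing the dataclass is a deliberate choice in this sketch: once the question, cut-off, and scoring rule are set, they cannot be changed quietly after the forecast has been made.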
Prediction quality still turns on the framing of the dispute and the completeness of the record. A model can score a defined legal question; it cannot repair a badly specified one.
Calibration measures whether confidence tracks reality, not whether a model sounds persuasive. In controlled trials, this distinction mattered: a system that gave modest probabilities for difficult cases often served the research question better than one that gave bold answers across the board.
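One common way to put a number on this distinction is the Brier score, the mean squared gap between the forecast probability and the actual 0/1 outcome (lower is better). The sketch below assumes that scoring rule for illustration; the text does not say which rule the lab uses, and the probabilities and outcomes are invented.

```python
def brier_score(probs, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes."""
    if not probs or len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must be non-empty and equal in length")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


# Invented data: four difficult disputes, 1 = claimant succeeded.
outcomes = [1, 0, 1, 0]

# Both systems make the same calls (three right, one wrong on the last case);
# they differ only in how confident they claim to be.
modest = [0.60, 0.40, 0.60, 0.60]
bold = [0.95, 0.05, 0.95, 0.95]

modest_score = brier_score(modest, outcomes)
bold_score = brier_score(bold, outcomes)
print(f"modest: {modest_score:.4f}  bold: {bold_score:.4f}")
```

On this toy data the two systems have identical accuracy, yet the modest one scores roughly 0.21 against roughly 0.23 for the bold one, because the bold system pays heavily for its single confident miss.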
The technical side of this work connects closely with evaluation metrics for legal reasoning engines, where accuracy, calibration, coverage, and error type each tell a different part of the story.
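A minimal sketch of how accuracy, coverage, and error type can diverge on the same set of forecasts follows. It assumes a binary outcome and a system that abstains when its probability is close to 0.5; the function name, the abstention band, and the data are all hypothetical.

```python
from collections import Counter


def evaluate(forecasts, outcomes, abstain_band=0.1):
    """Summarise forecasts that may abstain when too close to 0.5.

    forecasts: probabilities that the claimant succeeds.
    outcomes:  1 if the claimant succeeded, 0 otherwise.
    """
    counts = Counter()
    for p, won in zip(forecasts, outcomes):
        if abs(p - 0.5) < abstain_band:
            counts["abstained"] += 1          # no call made on this dispute
        elif (p > 0.5) == bool(won):
            counts["correct"] += 1
        elif p > 0.5:
            counts["false_positive"] += 1     # predicted success, claimant lost
        else:
            counts["false_negative"] += 1     # predicted failure, claimant won
    answered = len(forecasts) - counts["abstained"]
    return {
        "coverage": answered / len(forecasts),
        "accuracy": counts["correct"] / answered if answered else None,
        "false_positive": counts["false_positive"],
        "false_negative": counts["false_negative"],
    }


# Invented forecasts and outcomes for five disputes.
report = evaluate([0.9, 0.55, 0.2, 0.8, 0.3], [1, 1, 0, 0, 1])
print(report)
```

Here the system answers four of five disputes (coverage 0.8), gets two of those four right (accuracy 0.5), and makes one error of each type. Reporting the figures separately shows whether a system earns its accuracy by declining the hard cases, and which direction its mistakes lean.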
Some projects ask whether a system can beat a human prediction panel. Others ask a quieter question: which parts of legal knowledge can be represented in a machine-readable form without stripping away the context that makes the dispute legally meaningful?
The first approach works well for public demonstrations and tightly scoped contests. The second is slower, but it gives better material for court administrators, regulators, and researchers who need to understand why a system reached a forecast. CaseCrunch Lab uses both approaches where they fit.
AI versus human lawyer contests and outcome forecasting experiments.
Machine-driven legal analysis, rule representation, and decision modeling.
Financial mis-selling, PPI, and ombudsman-style dispute prediction.
The implications of AI for legal knowledge and access to justice.
Datasets, experimental design, scoring frameworks, and evaluation practice.
Legal AI attracts confident language quickly. CaseCrunch Lab takes a narrower route: define the task, run the comparison, publish enough method for scrutiny, then explain what the result can and cannot support.
Recorded results show why this discipline matters. A model may perform well on consumer redress disputes and still need different testing before anyone relies on it for judicial triage, settlement valuation, or regulatory review.
The lab director and research contributors bring expertise across legal analysis, machine learning evaluation, dispute datasets, and justice-system design. Their shared concern is not whether automation can replace legal judgment. It is where machine prediction can make human judgment more consistent, auditable, and empirically grounded.
Legal outcome prediction is useful when the question is precise, the dataset is inspectable, and the system is tested against the decision standard it claims to forecast.
Gather and review existing literature and data.
Apply rigorous methods to evaluate findings.
Share results with the scientific community.