Red Label
Methodology

How We Calculate Confidence Levels

The intelligence community solved this problem in 1964. Commercial due diligence still hasn't adopted it.


62 years

Since Sherman Kent published standardized confidence frameworks for the CIA. Commercial due diligence still has no equivalent standard.

CIA Studies in Intelligence, Vol. 8, No. 4 (1964)

The Problem

When a due diligence report states that a finding is "confirmed" or "verified," what does that actually mean?

  • One source said so?
  • Three independent sources corroborated?
  • Primary documentation was reviewed?
  • The analyst feels confident based on experience?

In most DD reports, you cannot tell. The language is imprecise, the methodology unstated, and the certainty level undefined.

The Cost of Imprecision

A high-confidence finding of regulatory exposure is decision-relevant. A low-confidence finding of the same thing is a research lead. Treating them identically is how deals get killed for the wrong reasons, or approved when they shouldn't be.

How Intelligence Agencies Solved This

In 1964, Sherman Kent, head of the CIA's Office of National Estimates, published "Words of Estimative Probability" in the agency's classified journal. Kent's problem: when analysts wrote "probable" in a National Intelligence Estimate, what did that mean?

Kent surveyed his colleagues. He found that when presented with the phrase "there is a serious possibility," interpretations ranged from 20% to 80% probability, a four-fold variance that rendered the language meaningless for decision-makers.

The Kent Scale (1964)

Term                   Midpoint   Range
Almost certain         93%        87-99%
Probable               75%        63-87%
Chances about even     50%        40-60%
Probably not           30%        20-40%
Almost certainly not   7%         2-12%
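Kent's mapping can be expressed as a simple lookup from estimative terms to probability ranges. A minimal Python sketch (the names KENT_SCALE and term_for are illustrative, not part of any standard):

```python
# Kent's 1964 ranges, taken from the table above.
KENT_SCALE = {
    "almost certain":       (0.87, 0.99),
    "probable":             (0.63, 0.87),
    "chances about even":   (0.40, 0.60),
    "probably not":         (0.20, 0.40),
    "almost certainly not": (0.02, 0.12),
}

def term_for(p):
    """Return the Kent term whose range contains probability p, if any."""
    for term, (lo, hi) in KENT_SCALE.items():
        if lo <= p <= hi:
            return term
    return None  # p falls in a gap between defined ranges

print(term_for(0.75))  # probable
print(term_for(0.15))  # None: Kent's scale deliberately leaves gaps
```

The gaps between ranges are a feature of the original scale: a probability that falls between terms forces the analyst to pick a side rather than blur the boundary.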

Source: CIA Center for the Study of Intelligence, declassified 1993

The Regulatory Standard: ICD 203

In 2004, the Intelligence Reform and Terrorism Prevention Act mandated formal analytic standards across the U.S. Intelligence Community. The result was Intelligence Community Directive 203, first issued in 2007 and revised in January 2015.

ICD 203 establishes nine Analytic Tradecraft Standards. Three are directly relevant to confidence calibration:

Standard 1

Source Quality

Properly describe the quality and credibility of underlying sources, data, and methodologies

Standard 2

Uncertainty Expression

Properly express and explain uncertainties associated with major analytic judgments

Standard 3

Intelligence vs. Judgment

Properly distinguish between underlying intelligence information and analysts' assumptions and judgments

Key Distinction

ICD 203 requires analysts to separate what the intelligence shows from what the analyst concludes. This distinction is absent from virtually all commercial due diligence reports.

Case Study: The 2002 Iraq WMD NIE

The October 2002 National Intelligence Estimate on Iraq's weapons programs illustrates what happens when confidence calibration fails.

The NIE stated with "high confidence" that Iraq possessed chemical and biological weapons. According to former CIA briefer Andy Makridis: "We were wrong on most of them."

Layering

Judgments layered without carrying forward uncertainties

Confidence Inflation

No rigor in calibrating claimed confidence levels

Missing Debate

Internal disagreement not visible to decision-makers

Source: CBS News, "U.S. invasion of Iraq 20 years later" (2023)

The Science of Calibration

Philip Tetlock, professor at Wharton, spent two decades studying forecast accuracy. His Good Judgment Project (2011-2015) recruited 20,000+ volunteers to make predictions on geopolitical events.

60%

Good Judgment Project volunteers outperformed the official control group by 60% in Year 1. By Year 2, they outperformed analysts with access to classified information.

Tetlock & Gardner, "Superforecasting" (2015)

Brier Scores: Measuring Calibration

Developed by meteorologist Glenn W. Brier in 1950, the Brier score measures the accuracy of probabilistic predictions:

  • 0.0: perfect prediction
  • 0.25: random guessing (always forecasting 50% on yes/no events)
  • 1.0: worst possible

Superforecasters in Tetlock's study achieved Brier scores around 0.15, beating intelligence analysts despite having no classified access. Their advantage: rigorous calibration of stated confidence levels.
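For two-outcome events, the Brier score is simply the mean squared error between the stated probabilities and what actually happened (1 if the event occurred, 0 if not). A minimal sketch:

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between probabilistic forecasts (0-1)
    and binary outcomes (0 or 1). Lower is better."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)

# A forecaster who always says 50% scores 0.25 no matter what happens.
print(brier_score([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 1]))  # 0.25

# A calibrated, decisive forecaster scores much lower.
print(round(brier_score([0.9, 0.1, 0.8, 0.7], [1, 0, 1, 1]), 4))  # 0.0375
```

Note that the score punishes overconfidence quadratically: saying 90% and being wrong costs far more than saying 60% and being wrong, which is exactly the incentive calibration requires.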

The Due Diligence Gap

Commercial due diligence has not adopted these frameworks. A comparison:

Intelligence Community

  • Standardized confidence levels (ICD 203)
  • Source quality disclosure required
  • Uncertainty must be explicitly stated
  • Annual review against analytic standards

Commercial DD Industry

  • No standardized confidence framework
  • Source quality rarely disclosed
  • Uncertainty typically unstated
  • No industry-wide quality standards

How Red Label Does It

We apply intelligence community tradecraft to commercial due diligence. Every Red Label finding carries an explicit confidence rating with disclosed methodology.

Red Label Confidence Framework

High
Criteria: multiple independent sources; primary documentation; corroboration across source types
What it means: decision-ready finding. We would stake our reputation on this.

Moderate
Criteria: credible sourcing; plausible finding; intelligence gaps prevent a higher rating
What it means: material finding with caveats. We state what would change the assessment.

Low
Criteria: limited, fragmented, or single-source; included because material if true
What it means: research lead, not conclusion. We state what additional investigation would clarify.
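As an illustration only (the field names and thresholds below are hypothetical, not Red Label's actual rubric), the framework amounts to deriving a confidence level from disclosed sourcing attributes rather than asserting it:

```python
from dataclasses import dataclass, field

@dataclass
class Finding:
    claim: str
    independent_sources: int
    has_primary_documentation: bool
    source_types: set = field(default_factory=set)  # e.g. {"press", "registry"}

    def confidence(self):
        # Thresholds are illustrative stand-ins for the criteria above.
        if (self.independent_sources >= 3
                and self.has_primary_documentation
                and len(self.source_types) >= 2):
            return "High"      # decision-ready
        if self.independent_sources >= 2 or self.has_primary_documentation:
            return "Moderate"  # material, with stated caveats
        return "Low"           # research lead, not conclusion

f = Finding("Regulatory exposure in market X", 1, False, {"press"})
print(f.confidence())  # Low
```

The point of the sketch: confidence is a function of the evidence record, so two analysts looking at the same record must arrive at the same rating.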

The Difference

Most DD reports give you findings. We give you findings plus the epistemological basis for believing them. That's not academic rigor for its own sake. It's how you make better investment decisions.

The Question to Ask

When was the last time your DD provider told you their confidence level on a finding, and what would change it?

Sources

  • Sherman Kent, "Words of Estimative Probability" (CIA Center for the Study of Intelligence, 1964)
  • ICD 203 Analytic Tradecraft Standards (DNI, Intelligence Community Directive 203, January 2015)
  • Iraq WMD NIE failures (CBS News, Intelligence Matters, 2023)
  • Good Judgment Project results (Tetlock & Gardner, "Superforecasting", 2015)
  • Brier score methodology (UVA Library, 1950)