Technology & Society

Evaluating claims about artificial intelligence

Marco Deforest

Claims about artificial intelligence arrive faster than the evidence to judge them. Our approach is to treat each claim as a testable proposition and ask what would confirm or refute it.

Capability versus projection

A benchmark result is a measured capability under stated conditions. A forecast about what the same system will do in a different setting is a projection. The two are often reported in the same sentence; we keep them apart.

  • What task was measured, and on whose data?
  • Would the result hold outside the test distribution?
  • Who bears the cost if the projection is wrong?

Independent analysis, not endorsement

We do not certify products. Our role is to describe, in plain terms, what the available evidence supports and where it runs out. For decision-makers, the useful output is usually not a verdict but a clear map of the uncertainty.

That map is more durable than any single benchmark. Systems change quickly; the questions a reader should ask about them change more slowly.