How many issues does your crawler actually find?

crawlbench is a fixed corpus of deliberately broken pages — planted SEO and AI-search defects with ground-truth labels — used to score the detection rate of commercial and open-source crawlers. One number, reproducible, versioned.

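| # | Tool | Type | License | Detection rate | Found | Missed | False positives |
|---|------|------|---------|----------------|-------|--------|-----------------|
| 1 | Screaming Frog SEO Spider | seo | commercial | 86% | 120 | 20 | 4 |
| 2 | Sitebulb | seo | commercial | 84% | 117 | 23 | 3 |
| 3 | Lumar | both | commercial | 80% | 112 | 28 | 5 |
| 4 | Botify | both | commercial | 77% | 108 | 32 | 4 |
| 5 | Oncrawl | both | commercial | 74% | 103 | 37 | 6 |
| 6 | Semrush Site Audit | seo | commercial | 63% | 88 | 52 | 9 |
| 7 | Ahrefs Site Audit | seo | commercial | 60% | 84 | 56 | 8 |
| 8 | SiteOne Crawler | seo | oss | 51% | 72 | 68 | 7 |
| 9 | SEOcrawl | both | commercial | 39% | 54 | 86 | 2 |
| 10 | SiteTest AI Visibility Checker | geo | freemium | 33% | 46 | 94 | 2 |
| 11 | LLMClicks AI Readiness Analyzer | geo | freemium | 30% | 42 | 98 | 3 |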

Run date: 2026-05-10 · Corpus: v0.1-preview · 11 tools tested against 140 planted issues across 9 categories.
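
The detection rates in the table are consistent with Found divided by the 140 planted issues, rounded to the nearest whole percent, with false positives left out of the rate. A minimal sketch of that arithmetic follows. The formula is inferred from the published numbers rather than taken from the methodology page, and `detection_rate` is a hypothetical helper name.

```python
# Assumption: detection rate = found / (found + missed), rounded to a whole percent.
# The counts below are copied from the results table; the formula is inferred.
TOTAL_PLANTED = 140


def detection_rate(found: int, missed: int) -> int:
    """Return the share of planted issues found, as a whole-number percentage."""
    total = found + missed
    return round(100 * found / total)


results = {
    "Screaming Frog SEO Spider": (120, 20),
    "SiteOne Crawler": (72, 68),
    "LLMClicks AI Readiness Analyzer": (42, 98),
}

for tool, (found, missed) in results.items():
    # Every row should account for all planted issues.
    assert found + missed == TOTAL_PLANTED
    print(f"{tool}: {detection_rate(found, missed)}%")
```

Under this reading, false positives are reported alongside the rate but do not lower it.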

Planted, not pulled

Every issue in the corpus is intentionally introduced and labelled. No "real websites" — no ambiguity about whether an issue is actually a bug.

Reproducible

The corpus is versioned and self-hostable. Anyone can re-run any tool against the same fixtures and replicate the score.
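
Replicating a score comes down to comparing what a tool reports with the corpus labels. The sketch below shows one way that comparison could work, assuming each planted issue has a unique ID and that a tool's output has already been mapped onto those IDs. The ID format, the `Score` type, and `score_report` are hypothetical illustrations, not the crawlbench scoring code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Score:
    found: int            # planted issues the tool reported
    missed: int           # planted issues the tool did not report
    false_positives: int  # reported issues that were never planted


def score_report(planted: set[str], reported: set[str]) -> Score:
    """Compare a tool's reported issue IDs against the ground-truth labels."""
    return Score(
        found=len(planted & reported),
        missed=len(planted - reported),
        false_positives=len(reported - planted),
    )


# Hypothetical issue IDs in a "page:defect" form, for illustration only.
planted = {
    "page-01:missing-canonical",
    "page-02:noindex-conflict",
    "page-03:hreflang-mismatch",
}
reported = {
    "page-01:missing-canonical",
    "page-03:hreflang-mismatch",
    "page-04:thin-content",
}

print(score_report(planted, reported))
```

Treating both sides as sets keeps the comparison independent of the order in which a tool lists its findings, and a finding reported twice is counted once.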

SEO + GEO

Coverage spans classic technical SEO (crawlability, indexability, hreflang) and the newer AI-search readiness category, often called GEO (generative engine optimization), that few tools test rigorously.

What's in scope

The v1 corpus targets seven issue categories. See the methodology page for the full taxonomy and scoring rules.