How many issues does your crawler actually find?
crawlbench is a fixed corpus of deliberately broken pages — planted SEO and AI-search defects with ground-truth labels — used to score the detection rate of commercial and open-source crawlers. One number, reproducible, versioned.
| Rank | Tool | Type | License | Detection rate | Found | Missed | False positives |
|---|---|---|---|---|---|---|---|
| 1 | Screaming Frog SEO Spider | seo | commercial | 86% | 120 | 20 | 4 |
| 2 | Sitebulb | seo | commercial | 84% | 117 | 23 | 3 |
| 3 | Lumar | both | commercial | 80% | 112 | 28 | 5 |
| 4 | Botify | both | commercial | 77% | 108 | 32 | 4 |
| 5 | Oncrawl | both | commercial | 74% | 103 | 37 | 6 |
| 6 | Semrush Site Audit | seo | commercial | 63% | 88 | 52 | 9 |
| 7 | Ahrefs Site Audit | seo | commercial | 60% | 84 | 56 | 8 |
| 8 | SiteOne Crawler | seo | oss | 51% | 72 | 68 | 7 |
| 9 | SEOcrawl | both | commercial | 39% | 54 | 86 | 2 |
| 10 | SiteTest AI Visibility Checker | geo | freemium | 33% | 46 | 94 | 2 |
| 11 | LLMClicks AI Readiness Analyzer | geo | freemium | 30% | 42 | 98 | 3 |
Run date: 2026-05-10 · Corpus: v0.1-preview · 11 tools tested against 140 planted issues across 9 categories.
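The percentages in the table are consistent with a simple ratio: issues found divided by issues planted (found plus missed). The sketch below reproduces a few rows under that assumption. The function name `detection_rate` is illustrative, and the formula is inferred from the published counts, not taken from the scoring rules.

```python
def detection_rate(found: int, missed: int) -> float:
    """Fraction of planted issues a tool reported (assumed formula)."""
    planted = found + missed
    if planted == 0:
        raise ValueError("corpus contains no planted issues")
    return found / planted


# (Found, Missed) counts copied from the results table.
rows = {
    "Screaming Frog SEO Spider": (120, 20),
    "SiteOne Crawler": (72, 68),
    "LLMClicks AI Readiness Analyzer": (42, 98),
}

for tool, (found, missed) in rows.items():
    # Format as a whole-number percentage, matching the table.
    print(f"{tool}: {detection_rate(found, missed):.0%}")
```

Under this reading, false positives are reported alongside the rate but do not change it.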
Planted, not pulled
Every issue in the corpus is intentionally introduced and labelled. Because no live websites are involved, there is no ambiguity about whether a reported issue is a real defect.
Reproducible
The corpus is versioned and self-hostable. Anyone can re-run any tool against the same fixtures and replicate the score.
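Replicating a score amounts to comparing what a tool reports with the corpus labels. A minimal sketch of that comparison follows, assuming each issue can be identified by a (page path, issue code) pair. The identifiers, the `Score` type, and `score_report` are hypothetical and do not describe the actual crawlbench label format.

```python
from typing import NamedTuple

Issue = tuple[str, str]  # (page path, issue code); an assumed identifier shape


class Score(NamedTuple):
    found: int
    missed: int
    false_positives: int


def score_report(planted: set[Issue], reported: set[Issue]) -> Score:
    """Compare a tool's reported issues with the ground-truth labels."""
    return Score(
        found=len(planted & reported),            # planted and reported
        missed=len(planted - reported),           # planted but not reported
        false_positives=len(reported - planted),  # reported but never planted
    )


# Hypothetical ground truth and tool output, for illustration only.
planted = {
    ("/a", "missing-title"),
    ("/b", "broken-canonical"),
    ("/c", "hreflang-mismatch"),
}
reported = {
    ("/a", "missing-title"),
    ("/c", "hreflang-mismatch"),
    ("/d", "thin-content"),
}

score = score_report(planted, reported)
print(score)
```

Using sets keeps the comparison independent of the order in which a tool lists its findings, so the same report always yields the same counts.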
SEO + GEO
Coverage spans classic technical SEO (crawlability, indexability, hreflang) and the newer AI-search readiness category, often called generative engine optimization (GEO), that few tools test rigorously.
What's in scope
The v1 corpus targets seven issue categories. See the methodology page for the full taxonomy and scoring rules.
- Crawlability — robots.txt, redirects, status codes, blocked resources
- Indexability — canonical, meta robots, sitemap consistency
- On-page — titles, meta descriptions, headings, content gaps
- Structured data — schema.org validity, JSON-LD errors
- Internationalization — hreflang, language tags, regional duplicates
- Performance — render-blocking, image sizing, Core Web Vitals signals
- AI readiness — AI crawler access, citability, structured passage extraction