How many issues does your crawler actually find?
crawlbench is a fixed corpus of deliberately broken pages — planted SEO and AI-search defects with ground-truth labels — used to score the detection rate of commercial and open-source crawlers. One number, reproducible, versioned.
| # | Tool | Type | License | Detection rate | Found | Missed | False positives | Ignored |
|---|---|---|---|---|---|---|---|---|
| 1 | SECrawl | both | freemium | 80% | 242 | 61 | 0 | 7,570 |
| 2 | Screaming Frog SEO Spider | seo | commercial | 31% | 94 | 209 | 0 | 2,809 |
Run date: 2026-06-15 · Corpus: v0.1 · 2 tools tested against 303 planted issues across 25 categories.
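As a quick sanity check on the table, the detection rates can be reproduced from the Found and Missed columns. This is a minimal sketch that assumes the rate is simply found / (found + missed); the methodology page remains the authority on the actual scoring rules, and the function name `detection_rate` is illustrative only.

```python
# Found/Missed counts copied from the leaderboard table.
results = {
    "SECrawl": {"found": 242, "missed": 61},
    "Screaming Frog SEO Spider": {"found": 94, "missed": 209},
}


def detection_rate(found: int, missed: int) -> float:
    """Share of planted issues detected (assumed formula: found / planted)."""
    planted = found + missed
    return found / planted if planted else 0.0


for tool, counts in results.items():
    # Found + Missed adds up to the 303 planted issues for both tools.
    print(f"{tool}: {detection_rate(**counts):.0%}")
```

Under that assumption both rows round to the published figures (80% and 31%).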
Planted, not pulled
Every issue in the corpus is intentionally introduced and labelled. No real websites are used, so there is no ambiguity about whether a reported issue is actually a bug.
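Because every planted issue carries a label, scoring can in principle reduce to set comparison. The sketch below is one way that might look, not the benchmark's actual implementation: the `Issue` shape, the page paths, and the issue codes are all hypothetical, and it lumps every unmatched report into one bucket rather than guessing how the leaderboard separates false positives from ignored findings.

```python
from typing import NamedTuple


class Issue(NamedTuple):
    url: str   # fixture page path (hypothetical)
    code: str  # issue identifier (hypothetical naming)


def score(planted: set[Issue], reported: set[Issue]) -> dict[str, int]:
    """Compare a tool's reported issues against the ground-truth labels."""
    return {
        "found": len(planted & reported),      # planted and reported
        "missed": len(planted - reported),     # planted but not reported
        "unmatched": len(reported - planted),  # reported but never planted
    }


planted = {Issue("/a", "missing-title"), Issue("/b", "broken-canonical")}
reported = {Issue("/a", "missing-title"), Issue("/c", "thin-content")}
print(score(planted, reported))
```

Exact matching on a (page, issue) pair is the simplest possible rule; a real harness would likely need a mapping from each tool's own issue names to the corpus labels.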
Reproducible
The corpus is versioned and self-hostable. Anyone can re-run any tool against the same fixtures and replicate the score.
SEO + GEO
Coverage spans classic technical SEO (crawlability, indexability, hreflang) and the newer AI-search readiness category, often called generative engine optimization (GEO), that few tools test rigorously.
What's in scope
The v1 corpus targets seven top-level issue categories, listed below. See the methodology page for the full taxonomy and scoring rules.
- Crawlability — robots.txt, redirects, status codes, blocked resources
- Indexability — canonical, meta robots, sitemap consistency
- On-page — titles, meta descriptions, headings, content gaps
- Structured data — schema.org validity, JSON-LD errors
- Internationalization — hreflang, language tags, regional duplicates
- Performance — render-blocking, image sizing, Core Web Vitals signals
- AI readiness — AI crawler access, citability, structured passage extraction
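To make one of these categories concrete, here is a rough sketch of an "AI crawler access" check using Python's standard `urllib.robotparser`. The robots.txt content, the URL, and the helper name `blocked_agents` are assumptions for illustration; the corpus's real fixtures and checks are defined on the methodology page.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical fixture: blocks one AI crawler, allows everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def blocked_agents(robots_txt: str, url: str, agents: list[str]) -> list[str]:
    """Return the user agents that robots.txt forbids from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


print(blocked_agents(ROBOTS_TXT, "https://example.test/article", ["GPTBot", "Googlebot"]))
```

A crawler that only evaluates rules for its own user agent would never surface this kind of defect, which is one reason a check along these lines belongs in a separate category.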