AI Crawler Access — RankAudit Checks GPTBot, ClaudeBot, More
RankAudit checks crawler access for GPTBot, ClaudeBot, PerplexityBot, and Google-Extended. Learn how to unblock each bot so your content reaches AI engines.
---
Introduction
Before AI engines can cite your content, their crawlers must be able to reach it. RankAudit checks four crawler user-agents specifically — GPTBot, ClaudeBot, PerplexityBot, and Google-Extended — and flags any that are blocked by your robots.txt or by HTTP-level rules. This article walks through each crawler, what it serves, the robots.txt rule patterns that block each one, and the exact fix.
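RankAudit's own implementation is not described here. As a rough illustration, the robots.txt half of the check can be sketched with Python's standard-library `urllib.robotparser`. `blocked_crawlers` is a hypothetical helper. The sketch ignores HTTP-level rules entirely. That parser also checks rules in file order and matches agent names by substring, which is simpler than RFC 9309, so treat its answers as approximate.

```python
from urllib.robotparser import RobotFileParser

# The four user-agent tokens the RankAudit check covers.
AI_CRAWLERS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended"]


def blocked_crawlers(robots_txt: str, path: str = "/") -> list[str]:
    """Hypothetical helper: return the AI crawlers robots.txt bars from `path`."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines and marks the file as read,
    # so can_fetch() works without a network fetch.
    parser.parse(robots_txt.splitlines())
    return [bot for bot in AI_CRAWLERS if not parser.can_fetch(bot, path)]


example = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

print(blocked_crawlers(example))  # ['GPTBot']
```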
Where the Checks Live
Inside the AI Readiness tab of any RankAudit report, scroll past the Live AI Visibility Probe. The first checklist section is titled "Can AI reach you? — Crawler accessibility — directives observed at scan time."
For rankar.ai, all four checks pass. Each shows a green checkmark with the bot name and a one-line confirmation: "GPTBot can access your site," "ClaudeBot can access your site," and so on.
GPTBot — OpenAI / ChatGPT
GPTBot is the crawler OpenAI uses to update ChatGPT's knowledge of the web. If GPTBot is blocked, your content does not flow into ChatGPT.
How it gets blocked. A robots.txt rule like:
```
User-agent: GPTBot
Disallow: /
```
Or a server-level rule that blocks the GPTBot user-agent string.
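A robots.txt check cannot see server-level blocks. One rough way to spot them is to request the same page twice, once with a browser-like User-Agent header and once with the bot's name, and compare the status codes. This is a standard-library sketch; the browser string, the helper names, and the rule that "OK" means a 2xx or 3xx status are all assumptions. A spoofed request is also not the real crawler: many WAFs verify bots by their published IP ranges, so a spoofed request can be blocked even when the real bot is allowed, or the reverse.

```python
import urllib.error
import urllib.request

# Assumed browser-like User-Agent string, for illustration only.
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0"


def fetch_status(url: str, user_agent: str, timeout: float = 10.0) -> int:
    """GET `url` with the given User-Agent header and return the status code."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx; the code is still useful.
        # Network failures (URLError) propagate to the caller.
        return err.code


def is_ok(status: int) -> bool:
    """Treat 2xx and 3xx responses as a successful fetch."""
    return 200 <= status < 400


def looks_ua_blocked(browser_status: int, bot_status: int) -> bool:
    """Heuristic: the page loads for a browser UA but fails for the bot UA."""
    return is_ok(browser_status) and not is_ok(bot_status)
```

For example, compare `fetch_status(url, BROWSER_UA)` with `fetch_status(url, "GPTBot")` for a page on your own site, then pass both codes to `looks_ua_blocked`.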
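The fix. Remove the blocking rule. Or, if you want partial access, allow GPTBot on your public marketing pages and disallow it from gated content:
```
User-agent: GPTBot
Disallow: /app/
Disallow: /admin/
Allow: /
```
Crawlers that follow RFC 9309 apply the most specific matching rule, so the order does not matter to them. Listing the Disallow lines first also keeps simpler first-match parsers from reading `Allow: /` as permission for everything.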
After saving robots.txt, re-run the RankAudit scan to confirm the unblock landed.
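Before re-running the scan, you can sanity-check partial-access rules locally. This is a minimal sketch using Python's `urllib.robotparser`. The paths are hypothetical examples of public and gated pages. This parser checks rules in file order and stops at the first match, which is why the Disallow lines come before `Allow: /`.

```python
from urllib.robotparser import RobotFileParser

# Partial-access rules for GPTBot. Disallow lines come first because
# Python's RobotFileParser applies the first matching rule, whereas
# RFC 9309 crawlers pick the most specific (longest) match.
PARTIAL_ACCESS = """\
User-agent: GPTBot
Disallow: /app/
Disallow: /admin/
Allow: /
"""

parser = RobotFileParser()
parser.parse(PARTIAL_ACCESS.splitlines())

# Hypothetical paths: two public pages, two gated ones.
for path in ["/", "/pricing", "/app/dashboard", "/admin/users"]:
    verdict = "allowed" if parser.can_fetch("GPTBot", path) else "blocked"
    print(f"{path:<16} {verdict}")
```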
ClaudeBot — Anthropic / Claude
ClaudeBot is the crawler Anthropic uses to update Claude's knowledge of the web. Same logic as GPTBot but a different user-agent.
The block pattern:
```
User-agent: ClaudeBot
Disallow: /
```
The fix. Remove the rule, or allow ClaudeBot with selective Disallow rules, as with GPTBot.
A note on naming: Anthropic also operates `anthropic-ai`, `claude-web`, and `Claude-User` user-agents for different surface contexts. The RankAudit check covers the main ClaudeBot agent, which is the one that influences training and search updates.
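Each User-agent line names a single token, so a rule for one Anthropic agent does not cover the others. The sketch below uses Python's `urllib.robotparser` to show that a group blocking `anthropic-ai` and `claude-web` leaves ClaudeBot allowed. This parser matches agent names by substring, which is looser than RFC 9309's token matching, but the three tokens here do not overlap, so the result is the same.

```python
from urllib.robotparser import RobotFileParser

# One group naming two older Anthropic tokens; ClaudeBot is not listed.
ROBOTS = """\
User-agent: anthropic-ai
User-agent: claude-web
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS.splitlines())

for agent in ["anthropic-ai", "claude-web", "ClaudeBot"]:
    print(agent, parser.can_fetch(agent, "/"))
# anthropic-ai False
# claude-web False
# ClaudeBot True
```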
PerplexityBot — Perplexity
PerplexityBot is the crawler Perplexity uses to find content for its search-style answers. Perplexity relies heavily on real-time retrieval, so unblocking PerplexityBot affects your visibility unusually quickly.
The block pattern:
```
User-agent: PerplexityBot
Disallow: /
```
The fix. Remove the rule. PerplexityBot is particularly worth unblocking because Perplexity surfaces explicit source citations — when it includes you, users see your domain name and may click through.
Google-Extended — Google Gemini
Google-Extended is a robots.txt product token rather than a separate crawler. Google's regular crawlers fetch your pages, and the Google-Extended rule controls whether that content may be used to train Gemini models and to ground Gemini's answers. It is separate from Googlebot, so you can block Google-Extended without affecting your regular Google ranking.
The block pattern:
```
User-agent: Google-Extended
Disallow: /
```
Important nuance. Some sites intentionally block Google-Extended while allowing Googlebot: they want to rank in regular Google but not contribute to Gemini. RankAudit will still flag this as a warning, since it reduces your AI Readiness score, but it is a defensible business decision for paywalled content. Google-Extended does not control AI Overviews: those are part of Google Search, which follows Googlebot's rules.
For most sites, allowing Google-Extended is the right choice. Opting out keeps your content from informing Gemini's answers.
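Robots.txt groups apply only to the agents they name, which is why a Google-Extended rule leaves Googlebot's access alone. Here is a small sketch with Python's `urllib.robotparser`; `/article` is a hypothetical path.

```python
from urllib.robotparser import RobotFileParser

# Block Google-Extended everywhere; every other agent falls back to `*`.
ROBOTS = """\
User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS.splitlines())

print("Google-Extended:", parser.can_fetch("Google-Extended", "/article"))  # False
print("Googlebot:", parser.can_fetch("Googlebot", "/article"))  # True
```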
The Fix Workflow
For each blocked crawler:
- Open your robots.txt at `yourdomain.com/robots.txt`.
- Find the blocking rule for the specific user-agent.
- Remove the rule, or replace `Disallow: /` with `Allow: /`.
- Save the file, then re-run the RankAudit scan to confirm the unblock landed.
The whole loop takes 5-15 minutes per blocked bot. For most sites, all four bots can be allowed in one robots.txt edit.
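The "find the blocking rule" step can be scripted for long robots.txt files. This is a minimal sketch. `find_blocking_lines` is a hypothetical helper, not a RankAudit feature. It lists the `Disallow` lines in groups that name a given bot and treats consecutive User-agent lines as one group, as RFC 9309 does. It skips the `*` group, because crawlers follow that group only when no group names them. It does not evaluate path matching.

```python
def find_blocking_lines(robots_txt: str, bot: str) -> list[tuple[int, str]]:
    """Return (line number, rule) for Disallow lines in groups that name `bot`."""
    hits: list[tuple[int, str]] = []
    group_agents: list[str] = []
    in_rules = False  # True once the current group has seen a rule line
    for number, raw in enumerate(robots_txt.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue  # blank or malformed line
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a User-agent line after rules starts a new group
                group_agents, in_rules = [], False
            group_agents.append(value.lower())
        elif field in ("allow", "disallow"):
            in_rules = True
            # An empty Disallow value allows everything, so it is not a block.
            if field == "disallow" and value and bot.lower() in group_agents:
                hits.append((number, line))
    return hits


robots = """\
User-agent: *
Disallow: /private/

User-agent: GPTBot
User-agent: ClaudeBot
Disallow: /   # leftover from an old policy
"""

print(find_blocking_lines(robots, "ClaudeBot"))  # [(6, 'Disallow: /')]
```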
Why This Matters
Sites end up blocking AI crawlers, often by accident, in three common ways:
- Over-broad Disallow rules. A line meant to block scrapers ends up blocking GPTBot.
- Default WAF rules. Web Application Firewalls (Cloudflare, AWS WAF) sometimes ship default rules that block known LLM user-agents.
- Privacy-driven policy. Some teams intentionally block all AI crawlers out of an abundance of caution about training data. That is a defensible stance, but one with real visibility cost.

RankAudit makes the choice explicit and lets you decide deliberately, rather than discovering by accident that ChatGPT has not indexed your site for the last six months.
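The over-broad rule case is easy to reproduce. Under RFC 9309, a crawler follows the group that names it and falls back to `User-agent: *` only when no group does. A named group for the AI crawlers therefore overrides a blanket `Disallow: /` while other bots stay blocked. The sketch uses Python's `urllib.robotparser`; `ExampleScraper` is a hypothetical agent name.

```python
from urllib.robotparser import RobotFileParser

AI_CRAWLERS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended"]

# An over-broad rule: every crawler without its own group is blocked.
WILDCARD_ONLY = """\
User-agent: *
Disallow: /
"""

# The same rule plus one group that names the AI crawlers explicitly.
WITH_AI_GROUP = WILDCARD_ONLY + """
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: PerplexityBot
User-agent: Google-Extended
Allow: /
"""


def allowed(robots_txt: str, agent: str, path: str = "/") -> bool:
    """Parse robots.txt text and report whether `agent` may fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


for bot in AI_CRAWLERS:
    print(f"{bot:<16} wildcard only: {allowed(WILDCARD_ONLY, bot)}  "
          f"with AI group: {allowed(WITH_AI_GROUP, bot)}")

# A bot with no group of its own still falls back to the blanket block.
print("ExampleScraper allowed:", allowed(WITH_AI_GROUP, "ExampleScraper"))
```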
What's Next
Crawlers can reach your site. The next article moves to making sure they can parse what they find — content structure, llms.txt, FAQPage schema, and brand entity signals.
Apply This With the Rankar Toolkit
RankAudit works best paired with the rest of the Rankar suite. Spin up the relevant tools: RankTalk • RankOps • RankAudit • RankWriter • RankTracker • RankAIO • RankBridge • RankLinks • RankLocal • RankLaunch • RankSpy • RankUX • RankLead. Each module shares data with the others — fewer tabs, one source of truth.