Is your firewall (WAF) accidentally blocking ChatGPT? Here is how to check.
If you are a technical founder or an SEO, you have probably checked your robots.txt file recently.
You added:
```
User-agent: *
Allow: /
```
And you thought you were safe.
You are not safe.
We recently audited 1,000 B2B SaaS websites to see if they were accessible to the new generation of AI search engines (ChatGPT Search, Perplexity, Gemini).
The Result: 30% of sites that thought they were open were returning 403 Forbidden errors to AI crawlers.
The "Silent Killer": Legacy WAF Rules
The problem isn't your robots.txt. The problem is your Web Application Firewall (WAF).
If you use Cloudflare, AWS WAF, or Fastly, you likely set up "Bot Protection" rules years ago. These rules are designed to stop scrapers, DDoS attacks, and unknown non-browser agents.
To a strict WAF, GPTBot/1.0 looks exactly like a Python scraper: a non-browser User-Agent hitting your site. So the WAF rejects the request (typically with a 403) before it ever reaches your server (or your robots.txt).
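You can see this effect directly by requesting /robots.txt itself with a bot User-Agent. The script below is a minimal sketch, assuming bash and curl; the function names `fetch_status` and `compare_robots` are hypothetical, and "Mozilla/5.0" is only a stand-in for a browser-like User-Agent. If the browser-like request gets a 200 but the GPTBot request does not, the crawler is being turned away before it can read your `Allow: /` rule.

```bash
#!/usr/bin/env bash
# Sketch: fetch /robots.txt as a browser-like UA and as a bot UA.
# If only the bot request fails, something in front of the origin
# (often the WAF) is filtering by User-Agent before robots.txt is read.

# fetch_status DOMAIN USER_AGENT [PATH] -> prints the HTTP status code
# (curl prints "000" when no HTTP response was received)
fetch_status() {
  local domain="$1" ua="$2" path="${3:-/}"
  curl -s -o /dev/null -w '%{http_code}' -A "$ua" "https://${domain}${path}"
}

# compare_robots DOMAIN -> prints one "UA -> status" line per User-Agent
compare_robots() {
  local domain="$1" ua
  for ua in "Mozilla/5.0" "GPTBot"; do
    printf '%s -> %s\n' "$ua" "$(fetch_status "$domain" "$ua" /robots.txt)"
  done
}

# Only make network requests when a domain is passed as an argument
if [[ $# -gt 0 ]]; then
  compare_robots "$1"
fi
```

Run it as `bash robots_check.sh yourdomain.com` (the filename is arbitrary); with no argument it only defines the functions.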
How to verify if you are blocking AI
You can't test this with a browser. You need to spoof the User Agent.
Open your terminal and run:
```bash
# Test OpenAI (ChatGPT)
curl -I -A "GPTBot" https://yourdomain.com

# Test Perplexity
curl -I -A "PerplexityBot" https://yourdomain.com
```
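The commands above use `-I`, which sends a HEAD request, while a crawler fetches pages with GET, and some origins or rules answer the two methods differently. Here is a minimal sketch, assuming bash and curl, that checks a browser-like baseline plus several crawler tokens with both methods. The User-Agent list is an assumption: OAI-SearchBot (the crawler OpenAI documents for ChatGPT search) is added next to the tokens this post names, and these short tokens are not the crawlers' full User-Agent strings. Treat the result as an approximation: a spoofed User-Agent sent from your own IP address is not identical to real crawler traffic, because some bot-management products also verify the source IP of known crawlers.

```bash
#!/usr/bin/env bash
# Sketch: compare HEAD and GET status codes for several User-Agent tokens.
# The list below is an assumption; adjust it to the crawlers you care about.
UAS=("Mozilla/5.0" "GPTBot" "OAI-SearchBot" "PerplexityBot" "ClaudeBot")

# status METHOD URL USER_AGENT -> HTTP status code ("000" if no response)
status() {
  local method="$1" url="$2" ua="$3"
  if [[ "$method" == "HEAD" ]]; then
    # -I sends HEAD, like the one-liners in this post
    curl -s -o /dev/null -w '%{http_code}' -I -A "$ua" "$url"
  else
    # plain GET, which is what a crawler fetching a page sends
    curl -s -o /dev/null -w '%{http_code}' -A "$ua" "$url"
  fi
}

# report URL -> one "UA HEAD=... GET=..." line per User-Agent
report() {
  local url="$1" ua
  for ua in "${UAS[@]}"; do
    printf '%-15s HEAD=%s GET=%s\n' "$ua" \
      "$(status HEAD "$url" "$ua")" "$(status GET "$url" "$ua")"
  done
}

# Only make network requests when a URL is passed as an argument
if [[ $# -gt 0 ]]; then
  report "$1"
fi
```

Example: `bash ua_check.sh https://yourdomain.com` (hypothetical filename). A row where the Mozilla/5.0 baseline shows 200 but a bot shows 403 or 406 is the pattern described in the next section.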
Interpreting the results:
- HTTP 200 OK: You are fine. The bot can see you.
- HTTP 403 Forbidden: You are most likely blocked by your WAF.
- HTTP 406 Not Acceptable: A rule is actively filtering unwanted User-Agents.
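The verdicts above can be encoded as a small helper. This is a hypothetical `interpret` function, a sketch assuming you already have two status codes from curl: one for the bot User-Agent and one for a browser-like baseline. The baseline matters: a 403 for both means the block is not specific to the bot, while a 403 only for the bot points at User-Agent filtering. Note that curl does not follow redirects unless you pass `-L`, so a 301 or 302 falls into the "check manually" branch.

```bash
#!/usr/bin/env bash
# Sketch: map (bot status, baseline status) to a one-line verdict.

# interpret BOT_STATUS BASELINE_STATUS -> prints a verdict
interpret() {
  local bot="$1" base="$2"
  case "$bot" in
    200)
      echo "OK: the bot gets a 200 response" ;;
    403|406)
      if [[ "$base" == "200" ]]; then
        # Browser-like UA passes, bot UA does not: likely UA filtering
        echo "BLOCKED: ${bot} for the bot but ${base} for a browser UA (likely User-Agent filtering)"
      else
        # Baseline is blocked too, so the rule is not bot-specific
        echo "BLOCKED: ${bot} for the bot and ${base} for a browser UA (not specific to the bot)"
      fi ;;
    000)
      echo "ERROR: no HTTP response (DNS, TLS, or network failure)" ;;
    *)
      # Redirects (301/302 without -L), challenges, 5xx, etc.
      echo "CHECK: unexpected status ${bot}; inspect manually" ;;
  esac
}
```

For example, `interpret 403 200` prints `BLOCKED: 403 for the bot but 200 for a browser UA (likely User-Agent filtering)`.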
The Fix (Cloudflare Example)
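If you are using Cloudflare, Bot Fight Mode (or Super Bot Fight Mode on paid plans) is a common culprit. Cloudflare's documentation notes that the free Bot Fight Mode cannot be skipped with a custom rule, so the rule below applies to Super Bot Fight Mode; on a free plan you may need to disable Bot Fight Mode itself.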
- Go to Security -> WAF.
- Create a Custom Rule.
- Field: User Agent -> contains -> GPTBot (add OR conditions for PerplexityBot, ClaudeBot).
- Action: Skip -> select All Super Bot Fight Mode Rules (or Allow).
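Note: Do NOT simply turn off your WAF. Just whitelist the specific bots you want to allow. A User-Agent string is trivially spoofed, so where possible also match Cloudflare's Known Bots field (`cf.client.bot`) rather than relying on the User-Agent alone.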
Why this matters now
In 2026, "Being Indexed" doesn't mean being in Google's database. It means being in the LLM's context window.
If GPTBot cannot crawl your new changelog or blog post, that content never enters the RAG (Retrieval-Augmented Generation) pipeline.
When a user asks: "What is the best alternative to X?", your tool won't be cited. Not because it's bad, but because it's invisible.
Check your site instantly (Free Tool)
If you don't feel like opening the terminal, we built a free tool that runs these specific cURL checks for you (and also checks your Content Density).
Frequently Asked Questions
- What is GPTBot?
  - GPTBot is OpenAI's web crawler. If you block it, OpenAI cannot crawl your content. OpenAI also documents a separate crawler, OAI-SearchBot, for ChatGPT search, so it is worth allowing both.
- Why is Cloudflare blocking GPTBot?
  - Legacy "Bot Fight Mode" or strict WAF rules often classify unknown non-browser User-Agents as malicious bots.
