Robots.txt Generator for WordPress: Control AI Crawlers Without Editing Code

The robots.txt file is one of the smallest files on your site and one of the easiest to get wrong. A single stray line can hide your whole site from Google, and a missing one can leave you with no say over the wave of AI crawlers now fetching pages for ChatGPT, Perplexity, and Google’s AI features. Editing it by hand, over FTP, on a live site, is exactly the kind of task where a typo turns into a traffic drop.

A robots.txt generator takes that risk away. Instead of remembering the exact syntax, you choose what to allow and block, and the file is written for you. This guide covers what belongs in a WordPress robots.txt in 2026, including the new AI crawler rules, and how to generate and manage it without touching code using RankReady.

RankReady manages robots.txt and AI crawler rules from the WordPress dashboard.

What robots.txt does, and what changed in 2026

Robots.txt is a plain text file at the root of your domain that tells automated visitors which parts of your site they may crawl. It works on the honor system: it is a set of instructions, not a lock. Search engines like Google read it to decide what to fetch, and it is the first place you express crawl preferences.

What changed is who is reading it. Alongside Googlebot and Bingbot, a growing list of AI user agents now shows up in robots.txt rules: OpenAI’s GPTBot, Anthropic’s ClaudeBot, PerplexityBot, Google’s Google-Extended token, and others. Each one can be allowed or blocked by name, which means robots.txt is no longer just an SEO housekeeping file. It is now where you decide whether your content is available for AI training and AI answers at all.

Google documents how robots.txt guides crawlers.

Do you actually need a generator?

WordPress creates a virtual robots.txt automatically, so every site already has one, even if you never made a file. The problem is that the default is minimal, and the moment you want to customize it, you are editing raw directives where the syntax matters and mistakes are silent. There is no error message for a robots.txt that accidentally blocks your blog.

A generator solves three things at once: it uses the correct syntax so a typo cannot slip through, it gives you named toggles for the crawlers you care about instead of memorized user-agent strings, and on WordPress it can manage the file from the dashboard rather than over FTP. If you would rather understand the allow-and-block decisions first, our full guide to the WordPress robots.txt for AI crawlers walks through which bots to permit and why.

What goes in a WordPress robots.txt

A healthy WordPress robots.txt is short. It names the user agents it is addressing, allows the parts of the site that should be crawled, disallows anything that should not be, and points to your XML sitemap so crawlers can find your content map. A common, sensible baseline allows general crawling, keeps bots out of admin areas that do not belong in search, and lists the sitemap URL.
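
As a rough illustration of that baseline, the sketch below holds a small robots.txt as a string and checks it with Python's standard-library parser (Python 3.8 or later for `site_maps()`). The domain `example.com`, the sitemap path, and the helper name `load_rules` are placeholders chosen for this example, not values from any particular site or plugin. The Allow line is listed first because Python's parser applies the first rule that matches, whereas Google picks the most specific match regardless of order.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical baseline; example.com and the sitemap path are placeholders.
BASELINE = """\
User-agent: *
Allow: /wp-admin/admin-ajax.php
Disallow: /wp-admin/

Sitemap: https://example.com/wp-sitemap.xml
"""

def load_rules(robots_text: str) -> RobotFileParser:
    """Parse robots.txt text into a rule set that can be queried."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser

rules = load_rules(BASELINE)

# A normal post should be crawlable, an admin screen should not.
print(rules.can_fetch("Googlebot", "https://example.com/blog/hello-world/"))
print(rules.can_fetch("Googlebot", "https://example.com/wp-admin/options.php"))

# The sitemap line is exposed separately from the allow and disallow rules.
print(rules.site_maps())
```

Three kinds of line do all the work here: a user agent, its allow and disallow paths, and a sitemap pointer.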

The two mistakes worth avoiding are over-blocking and under-thinking. Disallowing your uploads folder can stop images from being indexed, and blocking resources like CSS or JavaScript can stop Google from rendering your pages properly. When in doubt, block less rather than more, and lean on a generator to keep the syntax clean.
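
To make the over-blocking risk concrete, here is a minimal sketch that runs a deliberately over-restrictive file against a few URLs a crawler needs. The rules, the URLs, the theme name `mytheme`, and the helper `blocked_urls` are all invented for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical over-blocking file: it shuts crawlers out of uploads and theme assets.
OVERBLOCKED = """\
User-agent: *
Disallow: /wp-content/uploads/
Disallow: /wp-content/themes/
"""

# Placeholder URLs a crawler needs in order to index images and render pages.
MUST_STAY_OPEN = [
    "https://example.com/wp-content/uploads/2026/01/hero.jpg",
    "https://example.com/wp-content/themes/mytheme/style.css",
    "https://example.com/blog/hello-world/",
]

def blocked_urls(robots_text, urls, agent="Googlebot"):
    """Return the URLs that the given agent may not fetch under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return [url for url in urls if not parser.can_fetch(agent, url)]

# The image and the stylesheet are reported; the blog post is not.
for url in blocked_urls(OVERBLOCKED, MUST_STAY_OPEN):
    print("blocked:", url)
```

Neither rule mentions images or CSS by name, yet both end up unreachable, which is why mistakes of this kind tend to go unnoticed.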

Controlling AI crawlers

This is the part that did not exist a couple of years ago. AI crawlers do not all work the same way, and the differences matter for your decision. Training crawlers, such as OpenAI’s GPTBot, fetch content to help train or improve models. Google handles it differently: Google-Extended is not a separate crawler but a control token that lets you decide whether content Googlebot already crawls may be used to train Gemini. Then there are real-time fetchers, such as Perplexity-User and ChatGPT-User (the fetcher behind ChatGPT’s browsing), which grab a page while answering a specific question, and these are the ones that can lead to your site being cited. Perplexity also runs PerplexityBot, an indexing crawler that collects pages so sites can appear in its results.

That framing usually points to a nuanced choice rather than a blanket block: many sites are happy to let retrieval crawlers through, because being fetched is a prerequisite for being cited in AI search, while being more selective about training crawlers. If you want to see the full cast of bots before deciding, our web crawlers list and the deep dive on ChatGPT-User and OpenAI’s crawlers lay them out.
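
One way such a selective policy might look is sketched below: training-related agents are grouped under a Disallow, retrieval and indexing agents under an Allow, and everyone else gets the general rules. This is an example policy, not a recommendation and not RankReady's actual output; the helper `who_may_fetch` and the `example.com` URL are placeholders.

```python
from urllib.robotparser import RobotFileParser

# One possible policy: opt out of training, keep retrieval and indexing allowed.
AI_POLICY = """\
User-agent: GPTBot
User-agent: Google-Extended
Disallow: /

User-agent: ChatGPT-User
User-agent: Perplexity-User
User-agent: PerplexityBot
Allow: /

User-agent: *
Disallow: /wp-admin/
"""

def who_may_fetch(robots_text, agents, url):
    """Map each user agent name to whether it may fetch the given URL."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}

AGENTS = ["Googlebot", "GPTBot", "Google-Extended", "ChatGPT-User", "PerplexityBot"]
for agent, allowed in who_may_fetch(AI_POLICY, AGENTS, "https://example.com/blog/").items():
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

Several User-agent lines can share one set of rules, so a policy like this needs only one group per decision rather than one per bot.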

Google lists its crawlers, including Google-Extended for AI training.

Generate and manage it in WordPress with RankReady

This is where RankReady turns the theory into a few clicks. It ships with 31 AI crawler controls, covering the bots that matter today including GPTBot, ClaudeBot, PerplexityBot, Google-Extended, and Bytespider, so you toggle a crawler by name rather than hand-writing user-agent blocks. Because it runs inside WordPress, you manage all of this from the dashboard, and there is no FTP and no raw file editing.

RankReady also supports Content Signals, the ai-train, search, and ai-input directives, which let you express how your content may be used rather than only whether a bot may fetch it. Pairing that with the llms.txt file it generates gives AI systems both a permission layer and a clean content map. If you want the companion piece, our guide to the llms.txt generator for WordPress covers that side.
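
For a sense of how a usage preference differs from a fetch rule, the sketch below builds a single `Content-Signal` line and places it in a robots.txt group. The line format follows the published Content Signals convention as best understood here, and that is an assumption; the file RankReady actually writes may look different. The helper name `content_signal_line` is invented for this example.

```python
from urllib.robotparser import RobotFileParser

def content_signal_line(search: bool, ai_input: bool, ai_train: bool) -> str:
    """Build one Content-Signal line from three yes/no preferences."""
    def yes_no(flag: bool) -> str:
        return "yes" if flag else "no"
    return (
        f"Content-Signal: search={yes_no(search)}, "
        f"ai-input={yes_no(ai_input)}, ai-train={yes_no(ai_train)}"
    )

# Assumed preference: fine for search and AI answers, not for model training.
signal = content_signal_line(search=True, ai_input=True, ai_train=False)
robots_text = "\n".join(["User-agent: *", signal, "Allow: /"])

parser = RobotFileParser()
parser.parse(robots_text.splitlines())

print(signal)
# Python's parser skips directives it does not recognize, so fetching stays allowed.
print(parser.can_fetch("GPTBot", "https://example.com/blog/"))
```

The signal states how content may be used, while the Allow line still decides whether a bot may fetch it; the two answer separate questions.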

One honest point to keep in mind: robots.txt and content signals are advisory. Well-behaved crawlers from the major AI companies respect them, but the standard cannot force a badly behaved bot to comply. What a generator gives you is a correct, current, and easy-to-change set of instructions for the crawlers that do play by the rules, which is the large majority of the ones you care about. RankReady is free on WordPress.org under a GPL license, so adding this control costs nothing.

Test your robots.txt

Never trust a robots.txt you have not tested. After you generate or change it, open the robots.txt report in Google Search Console to confirm Google can read the file and see which version it has cached. It is also worth loading yourdomain.com/robots.txt directly in a browser to check the live output matches what you intended.
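
If you prefer a repeatable check alongside those manual ones, a small script can compare what you intended with what the file says. The sketch below is one way to do that with Python's standard library; the expectations, the sample file, and the helper names `fetch_robots` and `find_mismatches` are hypothetical. Python's parser applies the first matching rule rather than the most specific one, so treat it as an approximation and Search Console as the authority on how Google reads the file.

```python
from urllib.request import urlopen
from urllib.robotparser import RobotFileParser

def fetch_robots(site: str) -> str:
    """Download the live robots.txt for a site such as 'https://example.com'."""
    with urlopen(f"{site.rstrip('/')}/robots.txt", timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")

def find_mismatches(robots_text, expectations):
    """Return the (agent, path, allowed) triples that the parsed rules contradict."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return [
        (agent, path, allowed)
        for agent, path, allowed in expectations
        if parser.can_fetch(agent, path) != allowed
    ]

# Hypothetical intent for a site; adjust agents and paths to your own policy.
EXPECTED = [
    ("Googlebot", "/blog/hello-world/", True),
    ("Googlebot", "/wp-admin/", False),
    ("GPTBot", "/blog/hello-world/", False),
]

# Stand-in for fetch_robots("https://example.com") so the sketch runs offline.
SAMPLE = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /wp-admin/
"""

# An empty list means the file matches every stated expectation.
print(find_mismatches(SAMPLE, EXPECTED))
```

Swapping `SAMPLE` for the text returned by `fetch_robots` turns this into a check against the live site.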

Make this a habit after any big change, since a migration or a new plugin can quietly rewrite the file. Checking robots.txt is a small item on a larger list, and it fits naturally into a wider review, which our guide on how to do an SEO audit covers step by step.

Use the robots.txt report in Google Search Console to test your file.

Wrapping up

Robots.txt went from a quiet technical file to the front door for both search engines and AI systems. A generator keeps that door working: correct syntax, named controls for the crawlers you care about, and no risky hand-editing on a live site. Decide which AI crawlers you want to allow, generate the file, test it, and revisit it whenever your site changes. On WordPress, RankReady lets you do all of that from the dashboard, for free.

Want to manage your AI crawler rules without touching a file? See how RankReady handles robots.txt and content signals.

About the Author

Aditya Sharma
CMO at POSIMYTH Innovations · The Plus Addons for Elementor · 7 years experience

He has spent years in the WordPress ecosystem building, breaking, and optimizing sites until they actually perform. He works at the intersection of speed, growth, and usability, helping creators ship websites that load fast and convert. An active WordPress community contributor sharing through tools, tutorials, and direct collaboration. Tested practice, not theory.
