HowTo: Disallow Web Bots For WordPress

For any website administrator, it is important to regulate how search engine robots crawl your site before they suck the life out of it. While .htaccess is the right tool for password protection and for stopping image leeching, simply adding a robots.txt file can take a lot of load off your server. This matters most for administrators who run Amazon Webstores, and for anyone whose content is fed from remote XML data: if an external data source means your site can generate an effectively unlimited number of pages, you should look at more in-depth ways to restrict bot access.

For WordPress users, the goal is to restrict access to files and directories that are not part of your public site. You may also want to restrict access to your images and cache files. A basic WordPress robots.txt file could look like the following. Notice that User-agent: * applies to all bots, that the Disallow rules come first, followed by overrides for Google AdSense, and that a bad bot, identified by its user-agent string in the web server logs, is banned outright.
User-agent: *
Disallow: /cgi-bin
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content/plugins
Disallow: /wp-content/cache
Disallow: /wp-content/themes
Disallow: /trackback
Disallow: /feed
Disallow: /comments
Disallow: /category/*/*
Disallow: */trackback
Disallow: */feed
Disallow: */comments
Disallow: /*?*
Disallow: /*?
Allow: /wp-content/uploads

# Google AdSense
User-agent: Mediapartners-Google*
Disallow:
Allow: /*

# BadBot found in my logs
User-agent: badbot
Disallow: /
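If the load comes from well-behaved but over-eager crawlers, you can also ask them to slow down. Crawl-delay is a non-standard directive, honored by some crawlers such as Bing and Yandex but ignored by Google, so treat this as an optional extra rather than a guarantee. A minimal sketch:

# Optional: ask compliant bots to pause between requests
# (add this line inside the User-agent: * group above)
Crawl-delay: 10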
Remember that a robots.txt file is only as good as the bot that reads it. If someone is using wget to mirror your site, or running a bot designed to harvest emails and links, you are better off stopping it with script-level denial, .htaccess rules, or firewall commands. If your site is getting hammered, contact your service provider for help. This info is just for reducing unnecessary load by keeping bots from running around in circles after pages they should not be crawling in the first place.
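For reference, here is a minimal sketch of that kind of .htaccess blocking, assuming Apache with mod_rewrite enabled. The user-agent strings here are placeholders; replace them with the agents actually showing up in your own logs.

<IfModule mod_rewrite.c>
  RewriteEngine On
  # Return 403 Forbidden to any request whose User-Agent header
  # matches one of these strings (case-insensitive). The names are
  # placeholders; use the agents you see in your own server logs.
  RewriteCond %{HTTP_USER_AGENT} (badbot|EmailSiphon|WebReaper) [NC]
  RewriteRule .* - [F,L]
</IfModule>

Unlike robots.txt, this denial is enforced by the server itself, so it works even against bots that never read, or deliberately ignore, your robots file.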