I'm going to talk about setting up a robots.txt, especially for your self-hosted WordPress blog, to help search engine crawlers index your site properly and to help with search engine optimization. Due to the recent content duplication rules in the Google index, you want to make sure that you're submitting one version of your posts/pages, and also that the crawler isn't trying to index pages it really doesn't need to at all: pages like trackbacks, admin, includes, and your RSS feed.
It seems from reading many blogs and postings that not everyone agrees about category pages. I've heard some say that they want their category pages indexed, and that it helps them. I think it depends on the site and on how you have been tagging things. On some of my sites I go overboard on tagging, so I end up with a ton of category pages, and many times I file a post under several different categories. Having a post on its own page, listed on the front page, and repeated on 5 category pages doesn't seem like a very good plan for SEO, and it is an obvious setup for content duplication (in my eyes). So just to be safe, I filter out my category pages too in my robots.txt.
First, I read over at Lorelle on WordPress (link in sidebar) that Google now supports sitemap inclusion, so you can add a Sitemap line to your robots.txt.
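A minimal sketch of such a Sitemap line, assuming a hypothetical site at www.example.com with its sitemap at the site root (swap in your own full URL; the Sitemap value has to be an absolute URL, not a relative path):

```text
# Tell crawlers where the sitemap lives.
# The URL below is a placeholder, not a real sitemap.
Sitemap: http://www.example.com/sitemap.xml
```

The line is not tied to any User-agent section, so it can sit at the top or bottom of the file.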
With a Sitemap line in your robots.txt, you no longer have to submit your sitemap (the crawler will know what to do with it). So this is a new entry for me. I also read that you can tell the Google image crawler where to go (and where not to go) on your site, so I added this:
# The Googlebot-Image is the image bot for google
# Allow Everything
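A minimal sketch of what an entry like this can look like, assuming you want the image crawler to reach the whole site. `Googlebot-Image` is the user agent name Google uses for its image crawler; the `Allow` rule is an extension that Google honors, and other crawlers may ignore it:

```text
# The Googlebot-Image is the image bot for google
User-agent: Googlebot-Image
# Allow Everything
Allow: /
```

If you would rather keep images out of image search, the same section with `Disallow: /` in place of the Allow line does the opposite.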
I also saw that you can do the same for the AdSense crawler, which has nothing to do with indexing, but if you use AdSense it would be smart to have this as well:
# This is the ad bot for google
# Allow Everything
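Again as a sketch rather than the exact entry: `Mediapartners-Google` is the user agent name of the AdSense crawler, and the assumption here is that you want it to read every page so ads can be matched to the content:

```text
# This is the ad bot for google
User-agent: Mediapartners-Google
# Allow Everything
Allow: /
```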
So these are all new entries for me. Now, Daily Blog Tips (link in sidebar) has a quick, down-and-dirty post on a robots.txt file for WordPress. It's pretty simple:
I kinda like that, but it doesn't seem to cover everything. Fili's Tech has an article on SEO for WordPress too, and I like his ideas. So I ended up with something like this:
# Disallow all directories and files within
# Disallow all files ending with these extensions
# Disallow parsing individual post feeds, categories and trackbacks..
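The three groups of rules described by those comments can be sketched as follows. This is an illustration, not the exact file: the paths assume a default WordPress install at the site root with `/category/`, `/feed/` and `/trackback/` style permalinks, and the choice of `.php` as the blocked extension is only an example. The `*` and `$` wildcards are extensions that Google understands, and not every crawler does.

```text
User-agent: *

# Disallow all directories and files within
Disallow: /wp-admin/
Disallow: /wp-includes/

# Disallow all files ending with these extensions
# (".php" is an illustrative choice; "$" anchors the match to the end of the URL)
Disallow: /*.php$

# Disallow parsing individual post feeds, categories and trackbacks..
Disallow: /feed/
Disallow: /*/feed/
Disallow: /category/
Disallow: /trackback/
Disallow: /*/trackback/
```

Before going live, it is worth checking the rules against a few real URLs from your own site, since one overly broad pattern can hide posts you do want indexed.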
For right or wrong, I have one section for:
and another section for:
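As a sketch of how per-crawler sections work, assuming one section for Googlebot and one catch-all section for every other crawler (both choices, and the paths in them, are illustrative):

```text
# Section 1: rules read only by Google's main crawler
User-agent: Googlebot
Disallow: /wp-admin/
Disallow: /*/feed/

# Section 2: rules for every crawler without a section of its own
User-agent: *
Disallow: /wp-admin/
Disallow: /feed/
```

A crawler follows the most specific section that matches its name and ignores the rest, so Googlebot here never reads the `*` section. Anything you want blocked for everyone has to be repeated in each section.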
If you have any comments, improvements, or suggestions – please comment now!