Concentrating on robots.txt specifically for Wordpress | JTPRATT's Blogging Mistakes

Posted in: wordpress, wordpress-seo


I’m going to talk about setting up a robots.txt specifically for your self-hosted WordPress blog, to help search engine crawlers index your site properly and to help with search engine optimization. Because of the recent duplicate content rules in the Google index, you want to make sure that you’re submitting only one version of your posts and pages, and that the crawler isn’t trying to index pages it really doesn’t need to index at all: pages like trackbacks, admin, includes, and your RSS feed.

It seems from reading many blogs and postings that not everyone agrees about category pages. I’ve heard some say that they want their category pages indexed, and that it helps them. I think it depends on the site and on how you have been tagging things. On some of my sites I go overboard on tagging, so I end up with a ton of category pages, and I often tag things in many different categories. Having a post on its own page, on the front page, and on 5 category pages doesn’t seem like a good plan for SEO, and it looks like an obvious setup for content duplication (in my eyes). So just to be safe, I filter out my category pages too in my robots.txt.

First, I read over at Lorelle on WordPress (link in sidebar) that Google now has sitemap inclusion, and you can add this to your robots.txt file:

User-agent: *
Sitemap: http://www.jtpratt.com/sitemap.xml

and you no longer have to submit your sitemap (the crawler will know what to do with it). So this is a new entry for me. I also read that you can tell the Google image crawler where to go (and where not to go) on your site, so I added this:

# The Googlebot-Image is the image bot for google
User-agent: Googlebot-Image
# Allow Everything
Allow: /*

I also saw that you can do the same for the AdSense crawler, which has nothing to do with indexing, but if you use AdSense it would be smart to have this as well:

# This is the ad bot for google
User-agent: Mediapartners-Google*
# Allow Everything
Allow: /*
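
As a quick sanity check on the Sitemap entry shown earlier, Python's standard-library `urllib.robotparser` can report which sitemap URLs it finds in a robots.txt. This is only a sketch under a couple of assumptions: it parses the two lines from this post as a string rather than fetching the live file, and `site_maps()` needs Python 3.8 or newer.

```python
from urllib.robotparser import RobotFileParser

# The two lines from the post, parsed from a string (no network access).
ROBOTS_TXT = """\
User-agent: *
Sitemap: http://www.jtpratt.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# site_maps() returns a list of Sitemap URLs, or None if the file has none.
sitemaps = parser.site_maps()
print(sitemaps)
```

If the list comes back empty (`None`), the Sitemap line is probably misspelled or missing its colon.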

So these are all new entries for me. Now, Daily Blog Tips (link in sidebar) has a quick, down-and-dirty post on a robots.txt file for WordPress. It’s pretty simple:

User-agent: *
Disallow: /wp-
Disallow: /feed/
Disallow: /trackback/

I kinda like that, but it doesn’t seem to cover everything. Fili’s Tech has an article on WordPress SEO too, and I like his ideas. So I ended up with something like this:

# Disallow all directories and files within
Disallow: /cgi-bin/
Disallow: /wp-admin/
Disallow: /wp-includes/

# Disallow all files ending with these extensions
Disallow: /*.php$
Disallow: /*.js$
Disallow: /*.inc$
Disallow: /*.css$

# Disallow parsing individual post feeds, categories and trackbacks..
Disallow: /trackback/
Disallow: /feed/
Disallow: /category/
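
Python's `urllib.robotparser` does plain prefix matching and does not interpret the `*` and `$` wildcards, so to try out rules like `/*.php$` here is a rough matcher of my own. It is a sketch that assumes Google-style wildcard semantics (`*` matches any run of characters, a trailing `$` anchors the end of the URL), not a full robots.txt parser. `rule_to_regex` and `is_blocked` are hypothetical helper names, and the paths in the usage note are invented.

```python
import re

# Disallow rules copied from the list above.
DISALLOW = [
    "/cgi-bin/", "/wp-admin/", "/wp-includes/",
    "/*.php$", "/*.js$", "/*.inc$", "/*.css$",
    "/trackback/", "/feed/", "/category/",
]

def rule_to_regex(rule):
    """Translate a rule: '*' is any run of characters, trailing '$' anchors the end."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape the literal pieces and join them with '.*' where the rule had '*'.
    body = ".*".join(re.escape(piece) for piece in rule.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def is_blocked(path):
    """Return True if any Disallow rule matches from the start of the path."""
    return any(rule_to_regex(rule).match(path) for rule in DISALLOW)
```

With these rules, `is_blocked("/wp-login.php")` and `is_blocked("/category/seo/")` come out true, while `is_blocked("/index.php?p=12")` is false because the URL no longer ends in `.php`, and `is_blocked("/my-post/trackback/")` is false because `/trackback/` only matches at the start of the path.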

For right or wrong, I have one section for:

User-agent: Googlebot

and another section for:

User-agent: ia_archiver
User-agent: Scooter
User-agent: Atomz
User-agent: FAST-WebCrawler
User-agent: ArchitextSpider
User-agent: Googlebot
User-agent: Slurp.so/1.0
User-agent: Slurp/2.0j
User-agent: Slurp/2.0-KiteHourly
User-agent: Slurp/2.0-OwlWeekly
User-agent: Slurp/3.0-AU
User-agent: UltraSeek
User-agent: MantraAgent
User-agent: Lycos_Spider_(T-Rex)
User-agent: MSNBOT/0.1
User-agent: Gulliver
User-agent: Scrubby/
User-agent: ZyBorg
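
If both sections carry the same rules, one way to keep them from drifting apart is to generate the file from a single rule list. The sketch below is only an illustration: `build_robots_txt` is a hypothetical helper, and the second group is shortened to three of the agents listed above.

```python
# Rules copied from the post; both groups share them.
RULES = [
    "/cgi-bin/", "/wp-admin/", "/wp-includes/",
    "/*.php$", "/*.js$", "/*.inc$", "/*.css$",
    "/trackback/", "/feed/", "/category/",
]

# One group for Googlebot alone, one for other crawlers (shortened list).
GROUPS = [
    ["Googlebot"],
    ["ia_archiver", "Scooter", "ZyBorg"],
]

def build_robots_txt(groups, rules):
    """Return robots.txt text with one block per group, separated by a blank line."""
    blocks = []
    for agents in groups:
        lines = [f"User-agent: {agent}" for agent in agents]
        lines += [f"Disallow: {rule}" for rule in rules]
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks) + "\n"

robots_txt = build_robots_txt(GROUPS, RULES)
print(robots_txt)
```

Each block lists its User-agent lines first and then the shared Disallow rules, so a change to `RULES` reaches every section at once.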

If you have any comments, improvements, or suggestions - please comment now!


Responses to “Concentrating on robots.txt specifically for Wordpress”

  1. Dave Has the following to say...

    You say you have one section for Googlebot and one for the others “for right or wrong”. Do you do anything different between the two sections?

  2. admin Has the following to say...

    No, I do both sections the same way. I just want to make sure Google’s instructions are very clean and don’t get muddied by the other crawlers’ listings.
