Above the Fold & Socially Acceptable

Where Search and Social Have a Party

Robots.txt and 500 Server Errors: Toxic Combination

Posted by Erik Dafforn on Apr 23, 2012 in Articles | 4 comments

About a month ago, we got a call from a partner agency, worried that the site for a pretty recognizable brand had somehow run afoul of Google’s guidelines.

At first glance, it appeared to be exactly that. A search for the brand name returned only one deep page, and that on page 3 of the SERP. A site: query showed some deep pages indexed, but not the core (home and product) URLs.

The usual suspects in this case were accidental crawling exclusion and penalty, but we also asked for access to Google Webmaster Tools. While waiting for GWT access, we ran a full crawl and asked the client for information about anything happening on (or to) the site over the past couple of months.

The crawl didn’t turn up anything odd. As for the site, there had been a push to drive some affiliate traffic recently, but nothing that set off any big alarms. Still, this was a lingering concern, due to the sheer number of sites that had been receiving Google’s warnings of unnatural links.

There was no accidental exclusion. Requesting the robots.txt file brought up a 404 error page, and there were no on-page meta directives for robots. Surfing as Googlebot (with a user-agent spoofer, not through GWT) showed identical results, so there was no inadvertent cloaking going on, either.

We were leaning toward the affiliate linking and were preparing a full backlink analysis, but then we got GWT access, and that changed everything.

Webmaster Tools' robots.txt fetch alerts vs. organic traffic

The correlation between organic traffic and 500 errors for robots.txt

The robots.txt URL was not returning a 404, as the error page suggested. Instead, the server was returning a 500 status code. In GWT’s “Robots.txt Fetch” report, we learned that this had been the case since about February 17th.
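If you want to check the real status code rather than trusting what the error page says, here is a minimal Python sketch. The function names, the user-agent string, and the example.com URL are all made up for illustration, and the mapping in `crawl_implication` is a simplified reading of the behavior described in this post, not Google's full rule set.

```python
import urllib.error
import urllib.request


def robots_status(url, user_agent="status-check-sketch"):
    """Return the HTTP status code the server sends for `url`."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # 4xx and 5xx responses raise HTTPError. The code is what counts,
        # not whatever text the error page displays.
        return error.code


def crawl_implication(status):
    """Simplified reading of a robots.txt status code (an assumption)."""
    if 200 <= status < 300:
        return "rules in the file apply"
    if 400 <= status < 500:
        return "no robots.txt: crawling is unrestricted"
    if 500 <= status < 600:
        return "server error: crawling may be suspended"
    return "unclear: inspect manually"


if __name__ == "__main__":
    # Hypothetical URL; substitute the site you are checking.
    code = robots_status("https://www.example.com/robots.txt")
    print(code, crawl_implication(code))
```

A page that displays "404 Not Found" in the browser while the sketch reports 500 would be the same mismatch we ran into here.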

We quickly wrote up a robots.txt file with no exclusions and asked the client to upload it immediately. As Google had just attempted the fetch two hours earlier (and it seems to document an attempt about once per day), we had a long wait ahead. At the next documented fetch, the following day, Google downloaded the new robots.txt file without any problems. More important, the Crawler Access report showed that the new file was valid.
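For reference, a robots.txt file with no exclusions usually amounts to two lines: a wildcard user-agent and an empty Disallow. The sketch below is not the client's actual file. It embeds such a file as a string and uses Python's standard `urllib.robotparser` to confirm that it blocks nothing. The function name and the example.com URL are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# A robots.txt with no exclusions: every crawler, nothing disallowed.
PERMISSIVE_ROBOTS_TXT = """\
User-agent: *
Disallow:
"""


def is_allowed(robots_txt, user_agent, url):
    """Parse robots.txt text and report whether `user_agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical URL, used only to exercise the parser.
    print(is_allowed(PERMISSIVE_ROBOTS_TXT, "Googlebot",
                     "https://www.example.com/products/"))
```

Running the same check against a file containing `Disallow: /` should report the opposite, which is a quick way to catch an accidental exclusion before uploading.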

Just in case you’re interested, the preceding diagram shows some key points in the event:

A. This is the 10-day period between Google first getting a 500 error when fetching the robots.txt file and organic traffic to the home page crashing pretty hard.

B. This is a date when, inexplicably, the 500 errors subsided briefly. Notice the subsequent growth, then decline, of organic traffic.

C. This is the date when the new robots.txt file was uploaded. The errors drop, and traffic slowly begins to return to its normal state.

The takeaway here is pretty obvious: A 404 error and a 500 error could not be more different, especially when the page we’re talking about is the robots.txt file. One says to Google, “Go ahead and crawl me,” while the other holds up a shotgun and says “Get offa my porch.”

GWT's Crawl Stats during the 500-error period

This report confirms that crawling more or less ceased during the period.

As seen here, the Crawl Stats charts more or less echo what we’ve already seen, but if you can explain to me how we have KB/day and “time spent” values greater than zero for the “non-crawling” days, I’m all ears. It may boil down to multiple crawling sessions that are begun and ended independently of one another, and which may continue based on an older fetch of the robots.txt file.
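Where server access logs are available, a tally like the following could help reconcile those numbers by showing what Googlebot actually requested each day. This is only a sketch: it assumes logs in the common "combined" format, matches Googlebot by user-agent string alone (which can be spoofed), and uses a hypothetical function name.

```python
import re
from collections import defaultdict

# Assumed format: Apache/Nginx "combined" access log lines.
LOG_LINE = re.compile(
    r'\S+ \S+ \S+ \[(?P<day>[^:]+):[^\]]+\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) "[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_daily_totals(lines):
    """Count Googlebot requests and response bytes per day."""
    totals = defaultdict(lambda: {"requests": 0, "bytes": 0})
    for line in lines:
        match = LOG_LINE.match(line)
        if not match or "Googlebot" not in match.group("agent"):
            continue  # skip unparseable lines and other visitors
        day = totals[match.group("day")]
        day["requests"] += 1
        size = match.group("size")
        day["bytes"] += 0 if size == "-" else int(size)
    return dict(totals)
```

Feeding it the lines of an access log (for example, `googlebot_daily_totals(open("access.log"))`, where the file name is a placeholder) gives per-day counts to set beside the Crawl Stats charts.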

Erik joined Intrapromote full-time in 2002 after coming on as a contractor in 1999. Yeah, he’s been here a while. Currently, as President, he divides his time between overseeing the SEO department and managing organic SEO campaigns. He’s also one of Intrapromote’s Chief Big Idea Guys. Prior to working at Intrapromote, Erik worked in publishing as a development editor in the programming imprints of IDG Books and Prentice-Hall Computer Publishing, so if anyone needs a keyboard shortcut, they ask Erik.

Erik’s professional bragging rights include the fact that he led the team that was awarded Honda’s prestigious “Premier Partner” vendor award for SEO Services. He also contributed to the recently published Search Engine Optimization Secrets with Danny Dover.

4 Responses to “Robots.txt and 500 Server Errors: Toxic Combination”

  1. “if you can explain to me how we have KB/day and “time spent” values greater than zero for the “non-crawling” days”

    Take a look at the server logs and check what files Googlebot crawled. It should be pretty straightforward :)

  2. Thanks Ryan. Yes, I get the part about finding out what files were crawled by using server logs (which, for this client, we don’t have access to), but I was just trying to get a handle on the apparent contradiction of Google saying that on average, 1500KB per day were crawled (if you look at the middle chart) yet if you look at the top chart, zero pages were crawled.

  3. The tricky part about the charts in Google Webmaster Tools is that you can never know if it is really zero pages.

    In your case, the values on the vertical axis are 300 and 600, so I guess any number between 0 and 10 looks like 0 on the chart when it actually is not. The same may also apply to KB downloaded per day and Time spent downloading a page.

    I think the Crawl Stats charts in GWT only give you a feel for the trend rather than exact numbers :)

  4. How do I fix it? I have a Robots.txt Fetch error.
