Robots.txt and 500 Server Errors: Toxic Combination
About a month ago, we got a call from a partner agency, worried that the site for a pretty recognizable brand had somehow run afoul of Google’s guidelines.
At first glance, it appeared to be exactly that. A search for the brand name turned up only a single deep page, and that was on page 3 of the SERP. A site: query showed some deep pages indexed, but not the core (home and product) URLs.
The usual suspects in a case like this are accidental crawling exclusion and a penalty, but we also asked for access to Google Webmaster Tools (GWT). While waiting for GWT access, we ran a full crawl and asked the client for information about anything happening on (or to) the site over the past couple of months.
The crawl didn’t turn up anything odd. As for the site, there had been a push to drive some affiliate traffic recently, but nothing that set off any big alarms. Still, this was a lingering concern, due to the sheer number of sites that had been receiving Google’s warnings of unnatural links.
There was no apparent accidental exclusion. Requesting the robots.txt file brought up a 404 error page, and there were no on-page meta robots directives. Surfing as Googlebot (with a user-agent spoofer, not through GWT) showed identical results, so there was no inadvertent cloaking going on, either.
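For readers who want to repeat the user-agent comparison without a browser plug-in, here is a minimal sketch of the idea. It is not the tool we used. The browser user-agent string and the `serves_same_content` helper are made up for illustration, and the Googlebot string is the commonly published desktop token.

```python
import hashlib
import urllib.request

# Commonly published Googlebot token; the browser string is an arbitrary stand-in.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0"


def fetch_body(url, user_agent, timeout=10):
    """Download `url` while presenting the given User-Agent header."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


def serves_same_content(url, fetch=fetch_body):
    """Return True when the bot and browser user agents receive identical bytes."""
    as_bot = hashlib.sha256(fetch(url, GOOGLEBOT_UA)).hexdigest()
    as_browser = hashlib.sha256(fetch(url, BROWSER_UA)).hexdigest()
    return as_bot == as_browser
```

A byte-for-byte match is a strict test: pages with timestamps or rotating content can differ for harmless reasons, and spoofing the user agent will not reveal cloaking that keys on IP address.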
We were leaning toward the affiliate linking and were preparing a full backlink analysis, but then we got GWT access, and that changed everything.
The robots.txt URL was not returning a 404 status, as the error page suggested. Instead, the server was returning a 500 error. In GWT’s “Robots.txt Fetch” report, we learned that this had been the case since about February 17th.
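The lesson in that discovery is that the text on an error page and the status code the server sends are two separate things. A rough sketch of checking the code directly, using only Python's standard library, might look like this. The `fetch_status` name and the user-agent label are our own inventions, and `opener` is injectable only so the function can be exercised without a network.

```python
import urllib.error
import urllib.request


def fetch_status(url, opener=urllib.request.urlopen, timeout=10):
    """Return the HTTP status code the server actually sends for `url`."""
    request = urllib.request.Request(
        url, headers={"User-Agent": "robots-status-check"}  # arbitrary label
    )
    try:
        with opener(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses; the code is still the server's real answer.
        return err.code
```

Calling `fetch_status("https://www.example.com/robots.txt")` (with the real domain in place of the placeholder) would return the numeric code, whatever the page body claims.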
We quickly wrote up a robots.txt file with no exclusions and asked the client to upload it immediately. Google had attempted a fetch just two hours earlier (and it seems to document an attempt about once per day), so we had a long wait ahead. At the next documented fetch, the following day, Google downloaded the new robots.txt file without any problems. More importantly, the Crawler Access report showed that the new file was valid.
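The exact file we uploaded isn't reproduced here, but a robots.txt with no exclusions conventionally looks like the string in the sketch below. The sketch, which assumes that conventional form, runs it through Python's standard `urllib.robotparser` as a quick sanity check; the `allows` helper is a name made up for this example.

```python
from urllib.robotparser import RobotFileParser

# Assumed content: the conventional "allow everything" file, not the client's actual file.
ALLOW_ALL_ROBOTS_TXT = """\
User-agent: *
Disallow:
"""


def allows(robots_txt, user_agent, url):
    """Return True if `robots_txt` lets `user_agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

An empty `Disallow:` value excludes nothing, so `allows(ALLOW_ALL_ROBOTS_TXT, "Googlebot", "https://www.example.com/")` should come back true, while changing the rule to `Disallow: /` should flip it to false.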
Just in case you’re interested, the preceding diagram marks some key points in the event:
A. The 10-day period between Google first getting a 500 error when fetching the robots.txt file and organic traffic to the home page crashing pretty hard.
B. A date when, inexplicably, the 500 errors subsided briefly. Notice the subsequent growth, then decline, of organic traffic.
C. The date when the new robots.txt file was uploaded. The errors drop, and traffic slowly begins to return to normal.
The takeaway here is pretty obvious: A 404 error and a 500 error could not be more different, especially when the page we’re talking about is the robots.txt file. One says to Google, “Go ahead and crawl me,” while the other holds up a shotgun and says “Get offa my porch.”
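To make the contrast concrete, here is a small sketch that maps a robots.txt status code to the crawl behavior described in this post. It encodes only what we observed in this one case, not Google's full rules, and the function name and return strings are ours.

```python
def crawl_effect(robots_status):
    """Map the HTTP status of /robots.txt to the behavior this post describes."""
    if 200 <= robots_status < 300:
        return "obey the rules in the file"
    if robots_status == 404:
        return "no file, so crawl freely"
    if 500 <= robots_status < 600:
        return "server error, so hold off crawling"
    # Redirects and other client errors are outside what this post covers.
    return "not covered by this post"
```

The point of writing it out is that 404 and 500 land in opposite branches, even though both look like "an error page" to a person in a browser.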
As seen here, the Crawl Stats charts more or less echo what we’ve already seen, but if you can explain to me how we have KB/day and “time spent” values greater than zero for the “non-crawling” days, I’m all ears. It may boil down to multiple crawling sessions that are begun and ended independently of one another, and which may continue based on an older fetch of the robots.txt file.
Follow Erik Dafforn on Twitter:

4 Responses to “Robots.txt and 500 Server Errors: Toxic Combination”
“if you can explain to me how we have KB/day and “time spent” values greater than zero for the “non-crawling” days”
Take a look at the server logs and check what files Googlebot crawled. It should be pretty straightforward.
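For anyone who does have log access, a rough sketch of that check might look like the following. It assumes the common Apache/Nginx "combined" log format, which may not match a given server, and the function name is made up for illustration.

```python
import re
from collections import defaultdict

# Assumes the "combined" access-log format; adjust the pattern for other layouts.
LOG_LINE = re.compile(
    r'\S+ \S+ \S+ \[(?P<day>[^:]+):[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_daily_totals(lines):
    """Count requests and bytes per day for lines whose user agent mentions Googlebot."""
    totals = defaultdict(lambda: {"requests": 0, "bytes": 0})
    for line in lines:
        match = LOG_LINE.match(line)
        if not match or "Googlebot" not in match["agent"]:
            continue
        day = totals[match["day"]]
        day["requests"] += 1
        size = match["size"]
        # A "-" in the size field means no body was sent; count it as zero bytes.
        day["bytes"] += int(size) if size.isdigit() else 0
    return dict(totals)
```

Feeding it the lines of an access log (for example, `googlebot_daily_totals(open("access.log"))`, where the file name is a placeholder) gives per-day request and byte counts to set against the Crawl Stats charts. Matching on the user-agent string alone will also count anything merely claiming to be Googlebot.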
Thanks Ryan. Yes, I get the part about finding out what files were crawled by using server logs (which, for this client, we don’t have access to), but I was just trying to get a handle on the apparent contradiction of Google saying that on average, 1500KB per day were crawled (if you look at the middle chart) yet if you look at the top chart, zero pages were crawled.
The tricky part about the charts in Google Webmaster Tools is that you can never know if it is really zero pages.
In your case, the values on the vertical axis are 300 and 600, so I guess any number between 0 and 10 looks like 0 on the chart when it actually is not. The same may also apply to KB downloaded per day and Time spent downloading a page.
I think the Crawl Stats charts in GWT only give you a feel for the trend rather than exact numbers.
How do I fix it? I have a Robots.txt Fetch error.