- Redesign best practices have been well documented for years. If you’re not following them, you’re not doing your job.
- Even the biggest brands in the world are not immune to losing clicks due to sloppy practices.
- If you’re not there to claim your clicks, someone else will be. Probably. Someone. You. Hate.
Point is, Costco had some pretty healthy results for [costco tvs] prior to its redesign: a couple of duplicate listings of its main TV page, as well as two subcategory pages. (And what was Walmart doing there? We’ll get to that.)
But no 301s went up. Oh, wait. 301s did go up, but they went to the home page. Yes. Every legacy URL on the site got 301-redirected to the root. Chew on that for a while and imagine the sort of chest pains you’d be feeling if this were your site.
A few days later, Google was having some real problems understanding what was going on. Which is to say, Google knew exactly what was going on, but it was taking way longer to process than it should have. Here’s the same SERP about five days into the rollover: an “extended warranty” page for TVs, the home page (which is no surprise, given that about 4.6 jillion old pages now redirected there), and Walmart. We’ll get to Walmart.
Ten days in, the same SERP shows a link to a specific TV, the home page, and the extended warranty page. Walmart is still at No. 4. We’ll get to Walmart.

So about two weeks into this redesign, Google is really just starting to put the pieces together. Costco has painted itself into a corner of full, from-scratch crawling and indexing, and its organic traffic likely looks like the backside of a mountain right now.
Moving on, you may find it interesting to see Walmart showing up for [costco]-related queries. (I sure do.) Better yet, it’s interesting to see page titles like “Costco TVs on Sale” on Walmart’s own site. This seems about as likely as finding a Seattle’s Best endcap at Starbucks.
Costco, of course, does not make televisions. So technically, there is no such thing as a “Costco TV.” But Walmart is smart enough to know that people search this way to find the televisions at the Costco.com site. To put it in SEO parlance, [costco tvs] is the layperson’s version of [intitle:tv|tvs site:costco.com].
What Walmart is doing is really nothing new. They’re simply allowing their own internal search results to be indexed, and it just so happens that many of these search results are poised to compete for terms Walmart.com wouldn’t typically rank for naturally. It’s no secret that Google doesn’t care for pages like this (“Use robots.txt to prevent crawling of search results pages …”), but given Walmart’s footprint, Google is unlikely to care too much about slight transgressions.
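For reference, the exclusion that guideline describes is a one-line affair. Here’s a minimal robots.txt sketch, assuming the internal search results live under a /search/ path (the real URL structure varies by site):

```
# Hypothetical entry: keep internal search results out of the crawl.
# The /search/ path is an assumption, not Walmart's actual structure.
User-agent: *
Disallow: /search/
```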
(Digging deeper, Google knows of about 47 million search-results URLs on Walmart.com, yet the site has only about 22 million pages indexed in total. Yes, you read that right. What an inverted ratio like this typically means is that the vast majority of those 47 million are considered fairly low quality. Obviously not all of them are, though, or we wouldn’t see them in SERPs like these.)
The indexing of internal search results is secondary, though. What’s really interesting is how you can go to Walmart’s internal search engine, search for [costco] anything (such as costco computers or costco prepaid phone cards), and it comes up with results. This means that somehow, Walmart is tagging these SKUs with “costco” for its internal search appliance and then getting them crawled — or at a minimum, it’s allowing search results to appear even when all query terms aren’t satisfied. I can’t find any trace of “costco” on the actual product pages for SKUs that show up in these results, or in the user reviews, so all in all, this process is pretty sly. The tags may exist somewhere in Walmart’s vast XML repository or someplace else; feel free to dig in and find out.
In just the same way, people search for [costco laptops] to find laptops on Costco.com. In this case, Walmart is there too, but it’s doing it with a little more class — by buying “costco” keywords through AdWords:
While Walmart doesn’t have an organic presence for this query, Costco doesn’t do much better. Why? Because its top organic listing results in an error page. A 404? They wish. It’s a “connection refused” error, which is such a slap in the face that it doesn’t even merit a numerical error code, because the server won’t even connect to give you one. So good luck reclaiming any of these clicks. This is another instance where a simple two-column 301 map would have made all the difference:
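For the record, that map needn’t be anything fancy: one column of legacy URLs, one column of their closest living replacements, translated into your server’s redirect syntax. Here’s a hedged sketch of a few rows as Apache directives, with the paths invented for illustration:

```
# Excerpt of a hypothetical two-column 301 map. Every legacy URL
# points to the specific page that replaced it -- never blindly
# to the home page.
Redirect 301 /electronics/televisions.aspx        /tvs.html
Redirect 301 /electronics/televisions/lcd.aspx    /tvs/lcd.html
Redirect 301 /electronics/televisions/plasma.aspx /tvs/plasma.html
```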
Here’s the conclusion: Walmart is no angel here, but truthfully, if you’re going to sleep through the SEO portion of a redesign, you’re practically asking your traffic to go somewhere else. When your mistake coincides with someone else’s tenacity, you lose.
One of the major annoyances for automobile manufacturers is that their footprint in image search rarely matches their footprint in organic search. One reason for this is that Google’s image search algorithm is (in my opinion) less sophisticated and doesn’t carry the same recognition of authoritative domains that its traditional organic algorithm does.
But another reason is that for many OEMs (original equipment manufacturers — another name for “automakers”), the presentation format for images on their sites is fairly technical, often including script-based carousel galleries, Flash, or other rich media technologies that make it difficult for engines to interpret all the image data.
To add insult to injury, imagine poorly crafted, low-quality, no-content, made-for-AdSense sites ranking above yours in image search — using images that your company paid for and that were lifted from your own site with little more than a right-click or a screen shot.
That’s what we found in a large study of automotive OEMs and [make + model] image queries. In an effort to account for those low-tech but high-performing sites, we made several recommendations designed to increase the visibility of the OEM’s own image data.
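One standard tactic along these lines is an image XML sitemap, which hands the engines image URLs and context directly, even when the on-page galleries are script-based. A minimal sketch using Google’s image sitemap extension, with the URLs invented:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
  <url>
    <loc>http://www.example-oem.com/models/roadster/gallery</loc>
    <image:image>
      <image:loc>http://www.example-oem.com/img/roadster-front.jpg</image:loc>
      <image:title>2012 Roadster, front view</image:title>
    </image:image>
  </url>
</urlset>
```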
To represent pre- and post-optimization data, we measured organic image visits from Google to the OEM site across partial (but equal) time periods in 2011 and 2012. Here are the resulting visits from image search:
Percent increase: 592%
… or, ”Be the First of Your Friends to Whitewash This Fence.”
While social media sharing buttons can do wonders for the sites they sit upon (and those wonders are best explained by people smarter than I), their benefit to users is, at best, minimal. Consequently, if we offered the user a bit more for that coveted click, we’d get more of what we want.
One thing I find very annoying about Facebook and Twitter is that it’s very hard to find a historically accurate list of the external links I’ve liked, recommended, or (re)tweeted. These networks live more or less in “the now,” and while that’s fine to a point, I can’t tell you how many times I’ve remembered something about an article I liked from someone’s stream, only to find it’s nearly impossible to track down again.
Twitter’s “Favorites” feature is a decent attempt, but it’s yet another action to remember. You not only have to tweet something, you have to go back and “favorite” it. And it’s not very searchable.
In short, simply offering a way for your users to collect (and possibly manage) the list of URLs they’ve “promoted” would be a very nice feature.
Google+ already does this. It’s one of the lesser known and more helpful features of the service.
Consider this article on McSweeney’s, which (ahem) offers insight into your various depravities based on which ’80s band is your fave.
I “+1'd” this article (which was significantly easier than actually typing “+1'd”) and shared it on Google+. That’s fine for the time being. But what if I want to read it in parts, or come back to it later, or find the actual link?
In Facebook, this is a nightmare. Good luck sifting through your own timeline — or worse yet, your friends’ — trying to find where you think you saw it. Twitter is not much better if you’re talking about a story you tweeted several months ago. And if you use URL shorteners, finding the link again is even less intuitive.
In Google+, however, there’s a very easy way to see the URLs you’ve +1'd. Just go to your Profile link and click the +1's tab, as seen here:
As I said, few people know about this, and it’s so helpful that I fear Google might quickly delete it.
But back to the main point. Just knowing that using the +1 button on an article will put it in a private repository for later reading is a great incentive for me to use that button. But have you ever seen anything on a page that lets you know that?
(And so you know, I realize I’m guilty. At the bottom of this post are the typical sharing buttons. What’s in it for you? On the surface, not much, except the appreciation of a grateful blog post writer.)
The closest thing I ever see to social media sharing buttons offering a benefit to the user is Facebook’s last-resortish “Be the first” button. And to be honest, along the lines of Groucho Marx’s line about not wanting to be a member of any club that would have him, I typically think that if none of my friends currently like a page, there’s probably a reason:
When I see Facebook’s “Be the first of your friends to like this” button, I can’t help but think of a passage written by Mark Twain over 130 years ago:
Tom resumed his whitewashing, and answered carelessly:
“Well, maybe it is [work], and maybe it ain’t. All I know, is, it suits Tom Sawyer.”
“Oh come, now, you don’t mean to let on that you like it?”
The brush continued to move.
“Like it? Well, I don’t see why I oughtn’t to like it. Does a boy get a chance to whitewash a fence every day?”
That put the thing in a new light. Ben stopped nibbling his apple. Tom swept his brush daintily back and forth – stepped back to note the effect – added a touch here and there – criticised the effect again – Ben watching every move and getting more and more interested, more and more absorbed. Presently he said:
“Say, Tom, let me whitewash a little.”
In short, this isn’t really much of a motivation for me. And in the long run, just being the guy who shares the content isn’t going to be much of a motivator for anyone except a small few. So social media platforms need to get on the stick about what’s in it for everyone else out there, so that the content everyone is spending so much time writing is complemented by a similarly effective call to action.
And if you enjoy this post, why not hit the +1 button, so that you can store it cleanly, for immediate, any-time retrieval from your Google+ profile page? It’s about you, my friend. You.
About a month ago, we got a call from a partner agency, worried that the site for a pretty recognizable brand had somehow run afoul of Google’s guidelines.
On first glance, it appeared to be exactly that. A search for the brand name turned up only a deep page on page 3 of the SERP. A site: query showed some deep pages indexed, but not the core (home and product) URLs.
The usual suspects in this case were accidental crawling exclusion and penalty, but we also asked for access to Google Webmaster Tools. While waiting for GWT access, we ran a full crawl and asked the client for information about anything happening on (or to) the site over the past couple of months.
The crawl didn’t turn up anything odd. As for the site, there had been a push to drive some affiliate traffic recently, but nothing that set off any big alarms. Still, this was a lingering concern, due to the sheer number of sites that had been receiving Google’s warnings of unnatural links.
There was no accidental exclusion. The robots.txt file appeared to return a 404, and there were no on-page meta directives for robots. Surfing as Googlebot (with a user-agent spoofer, not through GWT) showed identical results, so there was no inadvertent cloaking going on, either.
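If you’ve never run the user-agent check yourself, it takes only a few lines. A rough Python sketch — the URL is a placeholder, and since IP-based cloaking can’t be caught this way, treat it as a first pass only:

```python
import requests

URL = "http://www.example.com/"  # placeholder for the site under review

BROWSER_UA = ("Mozilla/5.0 (Windows NT 6.1; WOW64) "
              "AppleWebKit/537.36 (KHTML, like Gecko) Safari/537.36")
GOOGLEBOT_UA = ("Mozilla/5.0 (compatible; Googlebot/2.1; "
                "+http://www.google.com/bot.html)")

as_browser = requests.get(URL, headers={"User-Agent": BROWSER_UA})
as_bot = requests.get(URL, headers={"User-Agent": GOOGLEBOT_UA})

# Identical status codes and bodies suggest no user-agent cloaking;
# any difference warrants a closer look.
print("Browser:", as_browser.status_code, "| Googlebot:", as_bot.status_code)
print("Bodies identical:", as_browser.text == as_bot.text)
```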
We were leaning toward the affiliate linking and were preparing a full backlink analysis, but then we got GWT access, and that changed everything.
The robots.txt page was not giving a 404 error, as its error messaging had suggested. Instead, it was returning a 500 error. In GWT’s “Robots.txt Fetch” report, we learned that this had been the case since about February 17th.
We quickly wrote up a robots.txt file with no exclusions and asked the client to upload it immediately. As Google had just attempted a fetch two hours earlier (and it seems to document an attempt about once per day), we had a long wait ahead. At the next documented fetch, a day later, Google downloaded the new robots.txt file without any problems. More important, the Crawler Access report showed that the new file was valid.
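(In case you’ve never had to write one, a “no exclusions” robots.txt is about as small as files get — an empty Disallow line means nothing is off limits:)

```
User-agent: *
Disallow:
```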
Just in case you’re interested, the preceding diagram shows some key points in the event:
A. This is a 10-day period between Google getting a 500 error when fetching the robots.txt file, and organic traffic to the home page crashing pretty hard.
B. This is a date when, inexplicably, the 500 errors subsided briefly. Notice the subsequent growth, then decline, of organic traffic.
C. This is the date when the new robots.txt file was uploaded. The errors drop, and traffic slowly begins to return to normal state.
The takeaway here is pretty obvious: A 404 error and a 500 error could not be more different, especially when the page we’re talking about is the robots.txt file. One says to Google, “Go ahead and crawl me,” while the other holds up a shotgun and says “Get offa my porch.”
As seen here, the Crawl Stats charts more or less echo what we’ve already seen, but if you can explain to me how we have KB/day and “time spent” values greater than zero for the “non-crawling” days, I’m all ears. It may boil down to multiple crawling sessions that are begun and ended independently of one another, and which may continue based on an older fetch of the robots.txt file.
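The silver lining is that this failure mode is cheap to monitor. Here’s a minimal Python sketch — the URL is a placeholder — that flags any robots.txt response other than the two statuses that keep the crawl gates open. Run it from a daily cron job and you’ll hear about a 500 in hours instead of weeks:

```python
import requests

ROBOTS_URL = "http://www.example.com/robots.txt"  # placeholder

resp = requests.get(ROBOTS_URL, timeout=10)
if resp.status_code in (200, 404):
    # 200 = "here are my rules"; 404 = "no rules, crawl away."
    print("robots.txt OK (%d)" % resp.status_code)
else:
    # Anything else -- especially a 5xx -- can halt crawling entirely.
    print("ALERT: robots.txt returned %d" % resp.status_code)
```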
Pay attention. The 600 series had rubber skin. We spotted them easy. But these are new. They look human. Sweat, bad breath, everything. Very hard to spot. I had to wait ’til he moved on you before I could zero him.
— Kyle Reese, in “The Terminator”
Suppose you’re a company that’s purchased a link-building campaign, and part of the deal was that your vendor promised you a specific number of links per month. 50. 75. 150. 200. Whatever. And that, I’m sure, was part of what made them an attractive offer: the promise of a guaranteed quantity of links in a given time period.
We’re going to be following that train of thought in the next few months, but for now, I want to focus on one specific aspect of a quantity-based link-building program: Links from auto-generated content.
It’s certainly not happening to you, because unlike everyone else, you get your link report each month, click each one, make all sorts of checks to ensure that it’s really a link, read the entire article in which it sits, and so on. But believe it or not, some clients just look at the Excel sheet, see 200 rows filled, and check it off their mental list. Let’s talk about what those people are missing.
We get offers all the time from vendors who want us to outsource our link-development programs to them. One recent offer came from a fellow who owns 18,000 domains and is offering us a tidy link package. I checked out some of the sample domains he listed in his message to see what style of link development he’s selling.
I’ll show my age here a bit: the interesting thing about the posts on his sample domains is not that they’re poorly written, but that they’re not “written” at all. Instead, they’re built, assembled, compiled (use any verb you want, except “written”) by a content-generation program. The early generations of those programs produced some real garbage, but with these, it’s harder to detect. They look very much like they were simply written by someone who’s not a great writer, or perhaps someone for whom English is a second language — but there are other signs that they’re “generated.” Take a look at the following passage from one of his sample blog posts — the type of post he is offering to use to link back to our clients:
Now, much more about this excellent product! This skin firming product is put on thoroughly clean pores and skin and during first minutes it dries and tightens the skin to some smooth delicate, satin -like finish. I totally adore the feeling- it really businesses the skin, it is quite amazing actually! Once dried, you merely apply make-up ( I propose mineral make-up) and you’re simply all set! Your skin can look organization, pores are decreased, as well as your skin’s appearance will you should be SMOOTH. I completely adore adore really like this stuff!
At the end of sentence 1, the space between “satin” and “-like” is one clue that terms are changed and replaced regularly, programmatically, as variables. The same goes for inappropriate spaces before and after certain parentheses. To make the text seem more random, content generators use a thesaurus database. I’ve underlined some words that are pulled from such a database, and it’s clear that in these circumstances, the matching was off. Take the use of “organization,” which, on this thesaurus page, can be matched with “make-up.”
Then there’s the line “… it really businesses the skin.” It took me a while, but I think I’ve figured that out. Down in the list of synonyms for “business,” I saw “contract.” It makes the skin tighter, or contracts it. From “contract,” the program found “business” as a synonym, and voilà.
And then, of course, there’s the last sentence:
I completely adore adore really like this stuff.
It reminds me of the old Certs commercial: “… two, two, two mints in one.” But it leaves the opposite taste in my mouth.
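To appreciate how little machinery this takes, here’s a toy sketch of the technique: naive whole-word synonym swaps with no sense of context. The thesaurus below is invented and tiny; real spinners draw on much larger databases:

```python
import random
import re

# A deliberately context-blind thesaurus, as described above.
THESAURUS = {
    "firms": ["businesses", "organizations"],
    "love": ["adore", "really like"],
    "smooth": ["delicate", "satin -like"],  # note the stray space
}

def spin(text):
    """Replace each recognized word with a random 'synonym'."""
    def swap(match):
        word = match.group(0).lower()
        return random.choice(THESAURUS.get(word, [match.group(0)]))
    return re.sub(r"[A-Za-z-]+", swap, text)

print(spin("I love this product. It really firms the skin."))
# One possible output:
# "I adore this product. It really businesses the skin."
```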
This isn’t new technology. But it may be new to you. Recently, several sales prospects have come to us complaining of having received Google’s warning of “possibly artificial or unnatural links pointing to your site that could be intended to manipulate PageRank.” When we examine their links, we see a lot of these auto-generated pages.
So watch for this when you’re looking at links your SEO company sends to you.
[A]n enterprise-level client recently came to us with an issue we typically file under “good problems to have:” He’s running an organic campaign in which the number of unique referring phrases will soon exceed 50,000 per day.
The 50,000 number is important, because that’s the practical limit of data rows you’re allowed to export under the Google Analytics API.
His challenge was to find out whether any third-party API tools can circumvent that 50,000-row limit. As far as we could tell, none can. (Google Analytics Premium extends that limit to one million rows for $150,000 per year.)
They all talk about working “within” the limit but none discusses breaking the ceiling, with the exception of some that piece together multiple queries.
As a result, I was playing around with the keyword filters and found a bit of a hacky solution. Using some simple Regular Expressions (simple RegEx is the only RegEx I know), it became pretty easy to break up a set that’s larger than 50,000 rows into two or more smaller bits.
For example, suppose your Organic Search Traffic report for a given day has around 52,000 unique phrases. You can divide the list of terms alphabetically, in theory breaking the set into two roughly equal halves.
Following are the instructions to break a large (>50K-row) dataset into two sets. Start in your Google Analytics organic keyword report (Traffic Sources | Sources | Search | Organic).
To obtain the “first” half (words beginning with A-K or a digit), click the “Advanced” link, which will allow you to create a filter for your keywords. Configure the first filter like this:
Include -> Keyword -> Matching RegExp -> ^[a-k0-9]
Basically, this command tells GA to list all keywords that begin with any letter from A to K, or any digit (0-9).
When this filter is complete, click the “Apply” button. The resulting dataset will reflect the terms that we’re looking for — the “first half” of our large dataset. The “advanced” link will now say “edit” because there is a filter currently being used.
To find the “second” half (words beginning with L-Z or a non-alphanumeric character, such as a comma, colon, or other punctuation mark), click the “edit” button and set up the following criteria:
Include -> Keyword -> Matching RegExp -> ^[l-z\W]
As stated, this configuration will show you the remaining phrases — any queries that begin with letters L through Z or a non-alphanumeric character.
I tried this on a few random days, and in all cases the sum of the two segments equaled exactly the total number of visits, so I feel like these expressions cover all the character bases. (Incidentally, all your “(not provided)” terms will appear in the rows pulled from the second half of the dataset, since they technically begin with an opening parenthesis.)
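If you’d rather not rely on the visit totals alone, the partition is easy to verify offline. A quick Python sketch — the keyword list is invented — confirming that every phrase matches exactly one of the two filters:

```python
import re

FIRST = re.compile(r"^[a-k0-9]")   # A-K or a digit
SECOND = re.compile(r"^[l-z\W]")   # L-Z or punctuation

keywords = ["costco tvs", "laptops", "(not provided)", "4k tv", "zebra print"]

for kw in keywords:
    hits = bool(FIRST.match(kw)) + bool(SECOND.match(kw))
    # Exactly one filter should claim each phrase. (A phrase starting
    # with an underscore would match neither, since \W excludes it,
    # but GA keywords are lowercase words and punctuation in practice.)
    assert hits == 1, "%r matched %d filters" % (kw, hits)
    print("%-16s -> %s half" % (kw, "first" if FIRST.match(kw) else "second"))
```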
You can export both halves of the dataset and re-combine them in Excel, giving you the entire set to work with. It’s a little clunky, but with traffic growing, it’s a good way to deal with days that contain more than 50,000 unique phrases.
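If Excel isn’t your thing, stitching the exports back together programmatically is nearly a one-liner. A sketch assuming both halves were saved as CSVs with matching columns — the filenames and the “Visits” column name are assumptions about your export:

```python
import pandas as pd

# Filenames are placeholders for your two GA exports.
first_half = pd.read_csv("organic_a-k.csv")
second_half = pd.read_csv("organic_l-z.csv")

full = pd.concat([first_half, second_half], ignore_index=True)
print(len(full), "rows;", full["Visits"].sum(), "total visits")
full.to_csv("organic_full.csv", index=False)
```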
RegEx, of course, can do far more than divide a large dataset into two smaller chunks. It’s a very powerful filtering mechanism and can help you with very complex sorts and advanced segmentation. A couple hours of reading on the subject will enhance your analytics skills immensely.