Breaking Google Analytics’ 50,000 Row Limit: Enterprise Analytics
An enterprise-level client recently came to us with an issue we typically file under “good problems to have:” He’s running an organic campaign in which the number of unique referring phrases will soon exceed 50,000 per day.
The 50,000 number is important, because that’s the practical limit of data rows you’re allowed to export under the Google Analytics API.
His challenge was to find out whether a) any third-party API tools can circumvent that 50,000-row limit. As far as we could tell, none can. (Google Analytics Premium extends that limit to one million for $150,000 per year.)
They all talk about working “within” the limit but none discusses breaking the ceiling, with the exception of some that piece together multiple queries.
As a result, I was playing around with the keyword filters and found a bit of a hacky solution. Using some simple Regular Expressions (simple RegEx is the only RegEx I know), it became pretty easy to break up a set that’s larger than 50,000 rows into two or more smaller bits.
For example, suppose your Organic Search Traffic report for a given day has around 52,000 unique phrases. You can divide the list of terms alphabetically, in theory breaking the set into two roughly equal halves.
Following are the instructions to break a large (> 50K row) dataset into two sets. Start at your Google Analytics (Traffic Sources | Sources | Search | Organic).
To obtain the “first” half (words beginning with A-K or a digit), click the “Advanced” link, which will allow you to create a filter for your keywords. Configure the first filter like this:
Include -> Keyword -> Matching RegExp -> ^[a-k0-9]
Basically, this command tells GA to list all keywords that begin with any letter from A to K, or any digit (0-9).
When this filter is complete, click the “Apply” button. The resulting dataset will reflect the terms that we’re looking for — the “first half” of our large dataset. The “advanced” link will now say “edit” because there is a filter currently being used.
To find the “second” half (words beginning with L-Z or a non-alphanumeric character, such as a comma, colon, or other punctuation mark), click the “edit” button and set up the following criteria:
Include -> Keyword -> Matching RegExp -> ^[l-z\W]
As stated, this configuration will show you the remaining phrases — any queries that begin with letters L through Z or a non-alphanumeric character.
I tried this on a few random days and in all cases, the sum of the two segments equalled exactly the total number of visits, so I feel like these expressions cover all the character bases. (Incidentally, all your “(not provided)” terms will appear in the rows pulled from the second half of the dataset, since they technically begin with an opening parenthesis).
You can export both halves of the dataset and re-combine them in Excel and have the entire set to work with. It’s a little clunky, but with traffic growing, it’s a good way to deal with days that contain more than 50,000 unique phrases.
RegEx, of course, can do far more than divide a large dataset into two smaller chunks. It’s a very powerful filtering mechanism and can help you with very complex sorts and advanced segmentation. A couple hours of reading on the subject will enhance your analytics skills immensely.
Follow Erik Dafforn on Twitter: