The Long Tail is a Power Law

So I’d like to introduce you to a friend of mine. His name is George. George Zipf. Besides having an exceedingly cool last name, George here was a pretty smart guy. George studied at Harvard and became a world-renowned linguist. He became an expert in what is known as quantitative linguistics, meaning the study of how we learn language, how language changes, and the structure of language, including the frequency of word usage. Now that last one is important as that’s what made George famous. He studied the frequency of word usage in language. He discovered something interesting. When mapping the frequency of word usage in any language, only a few words are used very often, and most are only rarely used. And how does one determine the frequency of words in a language? Well, you count a lot of words in a lot of books.

To put that in scientific language: given a corpus of natural language word usage, the frequency of any word is inversely proportional to its rank in the frequency table.

Well, what does that mean exactly? Let’s take a look at the most commonly used words in the English language.

You can find this on Wikipedia. Unsurprisingly, “the” is the most common word, followed by be, to, of, and, and a and so on. Let’s go to the virtual whiteboard and write this out. Alright, let’s write out our words. We have the, be, to, of, and, and a. If “the” is the most popular word, that would be 1 or, fractionally speaking, 1/1. The second most popular occurs half as often, so it is half or 1/2. The third happens 1/3 as often. The fourth 1/4 and so on.

If we graph this out it looks like this: a curve that quickly falls and flattens out, but never actually hits zero. It approaches zero as an asymptote. This is what’s called a power law.
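The whiteboard arithmetic can be sketched in a few lines of Python. This is only an illustration of the idealized 1/rank pattern, not measured word counts, and the function name `zipf_relative_frequency` is made up for this example.

```python
from fractions import Fraction

# The six most common English words, in the order listed above.
words = ["the", "be", "to", "of", "and", "a"]


def zipf_relative_frequency(rank):
    """Idealized frequency of the word at `rank`, relative to the top word.

    Hypothetical helper: under Zipf's law the word at rank r is used
    1/r as often as the word at rank 1.
    """
    return Fraction(1, rank)


# Map each word to its idealized relative frequency (rank starts at 1).
relative = {
    word: zipf_relative_frequency(rank)
    for rank, word in enumerate(words, start=1)
}

for word, freq in relative.items():
    # Show both the fraction and its decimal value.
    print(f"{word:>4}: {freq} ({float(freq):.3f})")
```

Plotting those values against rank gives the falling, flattening curve described above.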

The interesting thing is that George Zipf and others started to notice that a lot of other data sets from the physical and social sciences seemed to follow the exact same “Zipfian” pattern.

And thus was born Zipf’s Law.

Let’s look at an example completely unrelated to word frequency: city populations in the United States.

Here’s our list of the major cities in the US and here are their populations. If Zipf’s Law is actually reflected in city size, then we would expect the second most populous city to be right around half as big as the most populous. So Los Angeles should be around half the size of New York City, or 4 million people.

And that’s what we see. The predicted population using Zipf’s law (or multiplying New York City’s population by one half) gives us 4 million, which ends up only being off by 7%. 

Looking at Chicago, which would be one third the size of New York City, the results are darn close. For the rest of the top cities, you can see the results are also pretty close.
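The same check can be sketched in Python. Two caveats: the 8 million figure for New York City is a round number implied by the "half is 4 million" arithmetic above, not census data, and the helper names are hypothetical. Actual populations are left for you to plug in from the list.

```python
def zipf_predicted_size(largest, rank):
    """Size Zipf's law predicts for the item at `rank` (1 = largest)."""
    return largest / rank


def percent_error(predicted, actual):
    """How far off a prediction is, as a percentage of the actual value."""
    return abs(predicted - actual) * 100 / actual


# Assumption: a round 8 million for the largest city, implied by the
# "half of New York City is 4 million" figure above. Not census data.
NYC_APPROX = 8_000_000

# Predicted populations for ranks 1 through 5.
predictions = {
    rank: zipf_predicted_size(NYC_APPROX, rank) for rank in range(1, 6)
}

for rank, size in predictions.items():
    print(f"rank {rank}: about {size:,.0f} people")
```

Passing a city's predicted and actual population to `percent_error` gives the kind of "off by" figure quoted for Los Angeles.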

So what does this have to do with digital marketing?

Let’s look at an important example, search marketing. Let’s say you’re a logistics management software development firm looking to advertise yourself on Google Search. Because you know that Google’s Search Ads system is based on a real-time bidding auction, you know that bidding so your ad comes up for the popular keyword searches is going to give you the greatest number of individuals seeing your ad.

If you bid on the keyword “logistics” you are guaranteed to get exposure to the greatest number of eyeballs. The problem is, you’re selling logistics management software. The keyword “logistics” is overly broad. Let’s narrow our keyword bidding a bit. Now we bid on the keyword “logistics management software”. Because the keyword is narrower, fewer people will see the ad, but chances are those people are going to be more relevant to us because they typed “logistics management software” or something similar into the search engine.

Now, you’ve done your marketing research and you know that the vast majority of logistics management software purchases are preceded by purchasing managers reading up on successful deployment case studies. Wouldn’t it be great if we could target those people? Well, we can. Again, just narrow the keyword we bid on to “logistics management software case studies” and now our ad will show up to fewer people, but those people are far more likely to convert to a sale.

The great thing about this is that as you increase the complexity of your keywords and more narrowly target your audience, looking for only those people typing in the search terms that will match your keywords, your competition goes down. And when your competition goes down, so does your bid cost. Thus those lowest-common-denominator keywords in the so-called “fat head”, which are expensive and a fairly blunt instrument, are far less appealing than the keywords that sit here in the long tail.

How is Google addressing the Long Tail? 

Obviously, the narrower the search terms people use to find things, the better the results they get. But does Google benefit from pushing people towards more common, “fat head” searches instead? Turns out, they do.

Only 15% of Google’s daily search engine queries are unique. Just a few years ago, that number was 20%. When unique queries are run by a search engine, it is more processor-intensive and thus more expensive. So convincing users to utilize search results that have already been indexed is more cost effective. Here are a few ways they do that.

When you start typing into the Google search field you are offered suggestions. This is Google subtly steering you toward those pre-indexed pages. In some cases, as in this Disneyland example, you don’t even have to finish typing a question before the Google dropdown provides an answer. Finally, if you’ve already made a query, you’ve established a context. And future searches operate within that search context. For example, if I do a search for the original opening date of Disneyland, but then start a new query with “Who is the…” Google will fill in the rest using artificial intelligence, trying to figure out what you might want. Again, subtly steering you toward pre-indexed, less expensive results.

To sum up, George Zipf observed a pattern in nature that’s reflected in many physical and social science phenomena. This pattern creates the same long tail distributions that we can use to our advantage as digital marketers. Oh, and by the way, the long tail doesn’t just apply to search engine keywords; it can be used to understand a number of phenomena that are relevant to digital marketers.