Keyword Prevalence for SEO

One component of text analysis is the prevalence of keywords. Prevalence is similar to frequency (or density) in that it measures how often a word is used. The terms are not quite interchangeable, though: when we talk about prevalence as a ranking factor we may use it synonymously with frequency, but relative frequency is the more important quantity to consider.

It should be clear why prevalence is an important factor, but determining its objective value and appropriateness is not so clear. In this article, we will discuss some of the issues and factors involved in a ranking method that utilizes keyword prevalence.

Metadata and Page Details
Creator:	Devin Peterson
Date:	Created 04/26/2016 - (Updated 04/28/2016)
Subject:	Text analysis, Keyword frequency, prevalent, density of phrases
Publisher:	SEO Writ
Contributors:
Peer Review:
Resources:
Citation:	Peterson, D. (2016), "Document Ranking Based on Keyword Prevalence", Retrieved (date), from http://seowrit.com/prevalence

What is Keyword Prevalence?

Keyword prevalence is a more technical way to describe the frequency or incidence of a keyword. In short, it's a measure of how often a word or term is used in a document. This is generally an easy value to obtain: you can simply count all the occurrences of a term, and that count is a solid, objective measure of prevalence. However, a raw count is not the best way to determine how relevant the term is to the document. It also makes it easy to fool ranking engines by inflating the frequency of the terms used. Fortunately, there are some very clever ways to filter out this sort of black hat tactic, and many methods that take 'relative frequency' into account, such as tf-idf and other advanced techniques.
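
As a minimal sketch of the difference between a raw count and a relative frequency, the Python below counts a term and then divides by the document length. The function names and the simple regular-expression tokenizer are assumptions made for illustration; real engines tokenize far more carefully.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens (an assumed tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def term_counts(text):
    """Raw prevalence: how many times each token occurs in the text."""
    return Counter(tokenize(text))


def relative_frequency(term, text):
    """Occurrences of a single-word term divided by the total number of tokens."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return tokens.count(term.lower()) / len(tokens)


sample = "Keyword prevalence measures how often a keyword appears in a document."
print(term_counts(sample)["keyword"])
print(relative_frequency("keyword", sample))
```

The raw count grows with document length, while the relative frequency stays comparable between a short page and a long one, which is why the latter is usually the more useful number.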

Boring Fact: This measurement is often used for generating WORD CLOUDS!

What is Term Frequency - Inverse Document Frequency?

For the most thorough explanation of tf-idf, there's an entire website about TF-IDF. But we will summarize a few main points to help understand the ranking function a little better.

Term Frequency measures how many times a term is present in a document.

Inverse Document Frequency measures how important the presence of a term is, based on how frequently it occurs in other texts.

These two metrics combined (usually as a mathematical product) generate the TF-IDF score.

There are numerous ways to calculate each metric and combine them to generate some sort of 'score'. The way Google does it specifically is probably proprietary, but the details of HOW they do it should not matter much to the common SEO. The method we discuss below is one of many ways to assess keyword prevalence and it may or may not be the best way, but it still provides valuable insights to SEOs to consider keyword implementation when creating content.
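
To make the product concrete, here is a small Python sketch of one textbook variant: term frequency as a share of the document's words, and inverse document frequency as the natural log of (number of documents / documents containing the term). This is only one of the many formulations mentioned above and is not a description of how any search engine scores pages. The function names and the three-document corpus are invented for the example.

```python
import math
import re


def tokenize(text):
    """Lowercase the text and split it into simple word tokens (an assumed tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def term_frequency(term, tokens):
    """Share of the document's tokens that are the term."""
    return tokens.count(term) / len(tokens) if tokens else 0.0


def inverse_document_frequency(term, corpus_tokens):
    """log(N / df): terms found in every document score 0, rare terms score higher."""
    docs_with_term = sum(1 for tokens in corpus_tokens if term in tokens)
    if docs_with_term == 0:
        return 0.0
    return math.log(len(corpus_tokens) / docs_with_term)


def tf_idf(term, doc_index, corpus):
    """TF-IDF of a single-word term for one document in a list of texts."""
    corpus_tokens = [tokenize(doc) for doc in corpus]
    term = term.lower()
    tf = term_frequency(term, corpus_tokens[doc_index])
    return tf * inverse_document_frequency(term, corpus_tokens)


# A made-up corpus, purely for illustration.
corpus = [
    "the keyword density of the page",
    "the page ranks well",
    "the writer writes the text",
]
for word in ("the", "page", "keyword"):
    print(word, tf_idf(word, 0, corpus))
```

In this toy corpus "the" appears in every document, so its score is zero even though it is the most frequent word in the first document, while "keyword" (found in only one document) scores highest.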

Going Overboard With Keyword Stuffing

Google now refers to "keyword stuffing" as "irrelevant keywords". The simple counting procedure described so far allows one to overzealously stuff keywords into a document in the hope of making it look far more relevant. In the real world this is too easy for spammers, so there has to be a clever way to negate the effect of added irrelevant keywords.

The simplest method of blocking keyword stuffers is to set a maximum frequency on key terms, so that any occurrences beyond the maximum are not factored into the keyword prevalence function. However, such a cutoff would be somewhat arbitrary. Rather than abruptly ignoring additional keywords, it probably makes more sense to gradually decrease the value added by each repetition (density-wise) until further repetitions add essentially nothing.
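
The two options can be compared in a short Python sketch. The hard cap is the 'abrupt' approach; the smooth alternative borrows the saturation term used in BM25-style ranking, count * (k + 1) / (count + k), which is a swapped-in technique and not necessarily what any search engine uses. The cap of 5 and k = 1.2 are arbitrary values chosen for illustration.

```python
def capped_count(count, cap=5):
    """Hard cap: occurrences beyond `cap` are simply ignored."""
    return min(count, cap)


def saturating_count(count, k=1.2):
    """Diminishing returns: each extra occurrence adds less, approaching k + 1."""
    if count <= 0:
        return 0.0
    return count * (k + 1) / (count + k)


for n in (1, 2, 5, 10, 50):
    print(n, capped_count(n), round(saturating_count(n), 3))
```

With the saturating version the first mention counts in full, later mentions add progressively less, and no amount of repetition pushes the value past k + 1.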

Even so, without a way to actually 'penalize' keyword stuffers, adding too many keywords would still be a beneficial practice. We must therefore implement a means of reducing the quality of a document that contains too many irrelevant keywords. Whether the relation is linear or exponential/logarithmic, there will be a threshold at which higher keyword frequency goes from being beneficial to harmful, although this threshold may not necessarily be a constant. Further exploration of the irrelevant keyword function is in order. Keep in mind that this score will be assessed as a quality factor and not a relevance factor.
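
One hypothetical shape for such a quality factor is a curve that rises up to the threshold and then falls away. The Python sketch below uses x * exp(1 - x), where x is the density divided by the threshold, so the factor peaks at exactly 1.0 when the density equals the threshold. Both the formula and the 5% default threshold are assumptions chosen for illustration, not the ranking function this article promises and not a known search engine formula.

```python
import math


def stuffing_quality_factor(density, threshold=0.05):
    """Hypothetical quality factor in [0, 1]: rises until `threshold`, then decays.

    `density` and `threshold` are fractions (0.05 means 5%).
    """
    if density <= 0 or threshold <= 0:
        return 0.0
    ratio = density / threshold
    return ratio * math.exp(1.0 - ratio)


for d in (0.01, 0.05, 0.10, 0.25):
    print(d, round(stuffing_quality_factor(d), 3))
```

Because the threshold is a parameter, it could vary per term or per document, which fits the point that the turning point need not be a constant.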

The Concept of Prevalences (multiple)

Sometimes words or phrases can be too prevalent, not in the sense of keyword stuffing, but because terms that aren't necessarily that relevant show up more often simply because that's how the writer writes and the text flows. For example, in this document, I may use the word "Google" often, but it's certainly not about Google. However, a document can only be relevant to so many things; as more terms become prevalent, it becomes too general, which diminishes the perceived focus on a particular topic or set of keywords.

Likewise, I haven't even mentioned words like "abundance", "recurrence", "commonness", and "significance". But they would all be quite relevant to this document when coupled with another term such as "catchword", "phrasing" or "terminology". Did you see what I did there?

The Keyword Prevalence Ranking Function

ALGORITHM COMING SOON!

Best Practices

Let's be clear: there is no single recommended value for keyword density, although the metric is measured and there is a threshold at which it goes from being helpful to hurtful. There is probably also a range in which it doesn't affect anything. With that in mind, here are a few tips to remember when considering keyword prevalence in your document.

Allow the document to be written naturally, without giving any thought to keyword frequency. Natural language algorithms will weed out a majority of attempts to stuff irrelevant keywords.
Make sure there is at least one or two instances of your keywords in a more prominent part of the document.
For single word terms, you can get away with higher frequencies, typically in the 4-8% range before it becomes 'unnatural' sounding.
For multi-word phrases (2-4 words), a safe range would be 1-3%.
For longer phrases or long-tail keywords, it's probably best practice to keep the frequency below 1%, perhaps even closer to 0.1% - 0.5%. If you find yourself wanting to throw the phrase into your document more often than that, you may want to consider a variant of the phrase.
Do not REDUCE your keyword frequency for the sake of avoiding the 'penalty' unless you think it seems poorly written or unnatural.
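
For anyone who wants to check a draft against the rough ranges in the tips above, here is a small Python sketch. It assumes density means the number of phrase occurrences divided by the total word count (other tools define it differently), and the function names and tokenizer are made up for the example. The ceilings are simply the upper ends of the ranges listed above, not official limits.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into simple word tokens (an assumed tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def phrase_density(phrase, text):
    """Occurrences of the phrase divided by the total number of words in the text."""
    words = tokenize(text)
    target = tokenize(phrase)
    if not words or not target:
        return 0.0
    n = len(target)
    hits = sum(1 for i in range(len(words) - n + 1) if words[i:i + n] == target)
    return hits / len(words)


def suggested_ceiling(phrase):
    """Upper end of the rough ranges above: 8% (1 word), 3% (2-4 words), 1% (longer)."""
    n = len(tokenize(phrase))
    if n <= 1:
        return 0.08
    if n <= 4:
        return 0.03
    return 0.01


def within_suggested_range(phrase, text):
    """True if the phrase's density does not exceed the suggested ceiling."""
    return phrase_density(phrase, text) <= suggested_ceiling(phrase)


draft = "seo tips " + "filler " * 98  # 100 words, one occurrence of the phrase
print(phrase_density("seo tips", draft), within_suggested_range("seo tips", draft))
```

Treat the result as a sanity check on a finished draft rather than a target to write toward, in keeping with the first tip.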

SEO "Expert" Opinions

What do other SEOs think about keyword prevalence as a ranking factor?

Shaun Anderson, Hobo-Web

There may not be a perfect % for you to aim for – but I do think you run the risk of tripping keyword penalty filters if you, for instance, were to keyword stuff a page and every element on it with your focus terms.

Matt Cutts

So the way that modern search engines, or at least Google, are built is that the first time you mention a word-- hey, that's still about that word. And once you start to mention it a whole lot, it really doesn't help that much more. There's diminishing returns. It's just an incremental benefit, but it's really not that large. And then what you'll find is if you continue to repeat stuff over and over again, then you're in danger of getting into keyword stuffing, or gibberish and those kinds of things. So the first one or two times you mention a word, then that might help with your ranking, absolutely. But just because you can say it seven or eight times, that doesn't mean that it will necessarily help your rankings.

Common Questions About Keyword Density

How Important is Keyword Density for SEO?

I would not describe keyword density as "important", but it is certainly a metric (in one way or another) that is considered. That does NOT mean that higher densities are better, but merely that how frequently words appear is something that is measured as part of a very complex formula.

Is Keyword Density a Factor in Search Engines?

How densely certain words appear is measured in terms of how frequent those words are compared to their expected occurrence. It's obviously important to use your keywords in a text, so a density of 0% is not helpful. But as discussed, there is a point of diminishing returns, which itself is not a constant value.

The Prevalence of a Keyword as a Ranking Factor