SEO Autopilot Forum

Full Version: TF*IDF Equation explained!
Hello guys,

How many of you are using the TF*IDF equation in your Technical SEO Strategies?

[Image: TFIDF-equation.jpg]

First off, let's see what the TF*IDF equation is...

- Term Frequency times Inverse Document Frequency -

TF*IDF is an equation that combines two measurements, how frequently a term is used on a page (TF) and how rare that term is across all pages of a collection (IDF), to assign a score, or weight, to the importance of that term to the page.

Generally, you don’t want to be caught doing this work by hand if you’re trying to optimize a site. These equations will help you understand how TF*IDF functions, but there are tools that calculate TF*IDF for you and I'll talk about them later on.

Well...Let's do some SEO Maths, shall we?

How to calculate Term Frequency?

By doing a raw count of the number of times a term appears on one page. Then, put that number into the equation below:

Term frequency = (raw count of the term) / (total word count of document)

Alone, the TF score can tell you whether you’re using a word too rarely or too often, but it’s only really useful when weighed against the other measure.
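
If you'd like to see the TF equation in action, here is a minimal Python sketch. The function name term_frequency, the simple word-splitting rule and the sample sentence are my own assumptions for illustration; real tools may count words differently.

```python
import re

def term_frequency(term, text):
    """Share of the words in `text` that are exactly `term` (case-insensitive)."""
    # Assumption: a "word" is any run of letters, digits or apostrophes.
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return 0.0
    return words.count(term.lower()) / len(words)

# Made-up sample page: 2 of its 11 words are "tire", so TF is about 0.18.
page = "Tire repair is cheap. We repair any tire in one hour."
print(term_frequency("tire", page))
```

This only handles single words; counting a phrase like "tire repair" would need a different matching rule.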

How to calculate IDF?

By dividing the total number of documents in the chosen collection by the number of documents the term appears in, then taking the logarithm of the result, like so:

Inverse Document Frequency (term) = log((total number of docs) / (number of docs containing the term))

With the IDF score, you can now measure the importance of a phrase to a page, not just its number of uses. This is important because it’s putting you in the mindset of the people who are building search engine algorithms.
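
Here is a rough Python sketch of the IDF equation. The equation above doesn't say which log base to use, so the natural log here is an assumption, as are the function name, the word-splitting rule, the choice to return 0 for a term that appears nowhere, and the four made-up documents.

```python
import math
import re

def inverse_document_frequency(term, documents):
    """log(total docs / docs containing the term), using the natural log."""
    term = term.lower()
    containing = sum(
        1 for doc in documents
        if term in re.findall(r"[a-z0-9']+", doc.lower())
    )
    if containing == 0:
        # Assumption: a term found in no document gets no weight,
        # which also avoids dividing by zero.
        return 0.0
    return math.log(len(documents) / containing)

# Made-up collection of four tiny "pages".
docs = [
    "tire repair and tire rotation",
    "brake and tire service",
    "turbo engine repair and tire check",
    "oil change and tire pressure",
]
print(inverse_document_frequency("tire", docs))   # in every doc, so log(1) = 0
print(inverse_document_frequency("turbo", docs))  # in 1 of 4 docs, so log(4)
```

Notice that a word found in every document scores zero, while a rarer word scores higher. That is the "inverse" part of IDF.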

Why does it Matter to SEOs?

The end goal of being able to fill out this equation is to be able to give an actionable relevance score to your content. Using the TF*IDF tools available now, you can then compare your scores to the scores of the top-performing pages for any term.

By grading pages on this measure, you get a rough look behind the curtain at how Google might grade sites dedicated to the same topic.
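
To make the comparison idea concrete, here is a small Python sketch that multiplies TF by IDF and sets your page's score against the average of a few competitor pages. Everything in it (the function names, the natural log, the sample pages) is an assumption for illustration, and it is not how Google or any particular tool computes its scores.

```python
import math
import re

def tokenize(text):
    # Assumption: a "word" is any run of letters, digits or apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())

def tf_idf(term, page, collection):
    """TF of `term` on `page` times IDF of `term` across `collection`."""
    words = tokenize(page)
    tf = words.count(term) / len(words) if words else 0.0
    containing = sum(1 for doc in collection if term in tokenize(doc))
    idf = math.log(len(collection) / containing) if containing else 0.0
    return tf * idf

def compare(term, my_page, competitors):
    """Return (my score, average competitor score) for one term."""
    collection = competitors + [my_page]
    mine = tf_idf(term, my_page, collection)
    theirs = [tf_idf(term, page, collection) for page in competitors]
    return mine, sum(theirs) / len(theirs)

# Made-up pages: two of three competitors mention "tire", my page does not.
my_page = "we repair engines and brakes"
competitors = [
    "tire repair and turbo engine repair",
    "tire repair while you wait",
    "brake service and oil change",
]
mine, average = compare("tire", my_page, competitors)
print(mine, average)
```

In this toy case my score is zero while the competitor average is above zero, which is the kind of gap the tools below are built to surface.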

TF*IDF Tools you can use:

Two words...

WebSite Auditor

TF-IDF (short for "term frequency — inverse document frequency") is used to measure the importance of a given keyword on a page. 

Unlike keyword density, it doesn't just look at the number of times the term is used on the page; it also analyzes a larger set of pages and tries to determine how important this or that keyword is.

Say, in car repair, the term "tire repair" is likely more important than "turbocharged engine repair" — simply because every car has tires, and only a small number of cars have turbo engines. Because of that, the term "tire repair" is going to be used in a larger set of pages that speak about car repair. And this is exactly what the TF-IDF tool in WebSite Auditor will be able to catch! Looking at keyword usage stats of a large number of your competitors, the TF-IDF formula is going to show you:

1) Which keywords are the most important and relevant to your topic;

2) Which of them are used on your page properly (as much as search engines expect them to appear, since Google's known to use TF-IDF in indexing);

3) Which terms on your page are used too much or too little.

Under Content Analysis > TF-IDF, you'll find the full list of terms and phrases associated with your target keyword, based on your top-ranking competitors' content. Here, you'll also get usage recommendations for specific terms and see how your keyword usage compares to competitors' on the TF-IDF chart.

[Image: Image-01.png]

TF-IDF is now also used to calculate optimization metrics under WebSite Auditor's Page Audit dashboard. Unlike the old-school metrics like keyword density, TF-IDF will accurately determine if there are keyword stuffing or under-optimization issues in your content or any given page element.

[Image: Image-02.png]

How do you use TF-IDF in WebSite Auditor?

1. See how well your page is optimized.

With TF-IDF used to calculate content optimization factors, these metrics have become much more robust and reliable. Jump to Content Analysis > Page Audit and analyze how well your page is optimized for your target keywords.

2. Discover new topic-relevant keywords.

Go to the TF-IDF dashboard to discover the keywords and phrases that your top-ranking competitors are using — those are the terms that can improve your page's topic relevance and help it rank better. Switch between Single-word Keywords and Multi-word Keywords to look for both individual words and phrases. Look for the keywords with an Add recommendation — these are the terms most of your competitors are using, while you aren't.

3. Find the terms you need to use more of — or tone down on.

You may be surprised to find that you're overusing certain terms in your content, and not using enough of others. Still under the TF-IDF dashboard, look for the words and phrases with Use less or Use more recommendations to see how you can tweak your copy to improve relevance.

4. Make the changes!

Now that you know which terms you want to add, use more, or use less of, jump right to the Content Editor dashboard and make the necessary changes. When you're done, hit the Save button to save the optimized version of your HTML to your computer, ready for upload to your site.
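
For anyone curious how labels like Add, Use more and Use less could be derived from TF-IDF scores, here is a purely hypothetical rule of thumb in Python. It is not WebSite Auditor's actual logic (which isn't published in this post). The function name recommend, the 25% tolerance and the "OK" label are my own assumptions.

```python
def recommend(my_score, competitor_scores, tolerance=0.25):
    """Hypothetical rule of thumb; NOT WebSite Auditor's actual logic.

    Compares my TF-IDF score for a term with the competitor average and
    returns one of "Add", "Use more", "Use less" or "OK".
    """
    average = sum(competitor_scores) / len(competitor_scores)
    if average == 0:
        # Competitors don't use the term at all.
        return "OK" if my_score == 0 else "Use less"
    if my_score == 0:
        return "Add"
    if my_score < average * (1 - tolerance):
        return "Use more"
    if my_score > average * (1 + tolerance):
        return "Use less"
    return "OK"

# Made-up scores for four terms against the same two competitors.
print(recommend(0.0, [0.04, 0.06]))
print(recommend(0.02, [0.04, 0.06]))
print(recommend(0.10, [0.04, 0.06]))
print(recommend(0.05, [0.04, 0.06]))
```

The tolerance band is there so small differences from the competitor average don't trigger a recommendation.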

There's a free version of WebSite Auditor, and you can get it here.


Elias Livadaras