Artificial intelligence in SEO and content optimization

This post examines how artificial intelligence is changing SEO and proposes specific techniques for content marketers to adapt accordingly.

In the past five years, Google has introduced two algorithm updates that put a clear focus on content quality and language comprehensiveness. In 2013, Hummingbird gave search engines semantic analysis capability. In 2015, Google announced RankBrain, which marked the beginning of Google’s AI-first strategy. This means that Google uses multiple AI-driven techniques to rank search results.

As a result, search engine optimization (SEO) has shifted focus from keywords to topical authority. Keyword research is still important, but its role has changed. Simply put, AI systems can now understand far more than individual keywords. Much like humans, new systems can understand relationships between topics and develop a contextual interpretation. In other words, AI is learning to read.

Increasingly, marketers are using AI-powered tools to help them reverse engineer the way search engines find the best content. When it comes to research tools, one widely discussed topic is the difference between SEO and content optimization.

SEO runs on keywords, content optimization runs on NLP

SEO has traditionally run on keywords, while content optimization runs on Natural Language Processing (NLP).

Typically, your content strategy will start with a set of important keywords. However, each keyword now has more sophisticated properties than it used to: What user intent does it relate to? What broader topics does it link with? What cluster does it belong to?

Content optimization is all about understanding these additional properties in language. The ultimate goal is to create the most authoritative content for a given query, optimized for both topic breadth (horizontal topic coverage) and depth (how detailed you go into the topic).

We are teaching machines to read

The terms “topic modeling” and “latent semantic indexing” have been widely used in the digital marketing and SEO arenas to describe the way semantic search works. It is worth exploring some specific data science techniques that are powering the latest AI-powered tools.

Word vectors:

Word vectorization is a natural language processing (NLP) technique where words and phrases from a vocabulary are mapped to vectors of real numbers. Word vectors typically have around 200 dimensions, meaning each word gets a position in a 200-dimensional space. Placing words in a multi-dimensional vector space allows us to perform similarity comparisons, among other operations. The sum of word vectors may also be used to calculate document vectors. A typical challenge in word vectors is ambiguity: different meanings of a word (“apple” vs “Apple”) can be embedded into the same vector location. More advanced vectors, called “sense embeddings”, solve this problem by interpreting each version of the word differently. For example, a sense embedding might be able to position “apple” the fruit and “Apple” the company in very different positions within the vector space.

Figure: Simple 2-dimensional vector representation of a small house-related vocabulary.
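
To make the similarity comparison and the document vector idea concrete, here is a minimal sketch in Python. The 2-dimensional vectors are made up for illustration (real models learn vectors with hundreds of dimensions from large text collections), and the function names are my own.

```python
import math

# Toy 2-dimensional vectors, invented for illustration only.
VECTORS = {
    "house": [0.9, 0.1],
    "home": [0.8, 0.2],
    "roof": [0.7, 0.4],
    "banana": [0.1, 0.9],
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def document_vector(words):
    """Sum the vectors of all known words; unknown words are skipped."""
    dims = len(next(iter(VECTORS.values())))
    total = [0.0] * dims
    for word in words:
        if word in VECTORS:
            for i, value in enumerate(VECTORS[word]):
                total[i] += value
    return total


# "house" sits much closer to "home" than to "banana" in this toy space.
print(cosine_similarity(VECTORS["house"], VECTORS["home"]))
print(cosine_similarity(VECTORS["house"], VECTORS["banana"]))
print(document_vector(["house", "roof"]))
```

Cosine similarity is a common choice here because it compares the direction of two vectors rather than their length, so a long document and a short one on the same topic can still score as similar.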

Named entity recognition (NER):

This technique seeks to locate and classify named entities in text into pre-defined categories such as concepts, organizations, locations or people. Traditionally, NER relied on large databases like Wikipedia to recognize known entities. Current neural networks are trained on tagged NER datasets to learn language patterns and identify entities the system has never seen before. For example, say your new startup appears in the news tomorrow. A good NER system should be able to classify it as an organization, even if the startup name is something new in the vocabulary. SEO tools with this capability provide a deeper understanding of a topic as they can identify more unique sub-topics in context. While search engines utilize automatic NER systems, it is always a good idea to enrich your data with schema markup conventions.

Figure: Example of automatic NER run on a sentence.
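
The traditional, database-driven approach can be sketched in a few lines of Python. This is a toy lookup against a tiny hand-made list (the `GAZETTEER` entries and labels are assumptions for illustration), not how a neural NER system works. It also shows the limitation described above: a name that is not in the database goes unrecognized.

```python
import re

# Tiny hand-made "database" of known entities (illustrative only).
GAZETTEER = {
    "google": "ORGANIZATION",
    "salesforce": "ORGANIZATION",
    "france": "LOCATION",
}


def tag_entities(text):
    """Return (word, label) pairs for every word found in the gazetteer."""
    entities = []
    for match in re.finditer(r"[A-Za-z]+", text):
        label = GAZETTEER.get(match.group().lower())
        if label:
            entities.append((match.group(), label))
    return entities


print(tag_entities("Google opened an office in France."))
# A brand-new startup name is missed because it is not in the lookup table.
print(tag_entities("Acmelytics raised a seed round."))
```

A trained model would instead rely on context ("raised a seed round" is something organizations do) to label the unseen name.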

Query classification:

Query classification is the process where a search engine deciphers user intent from a short text input. The main challenge is related to ambiguity. Search engines like Google collect click-through data from users to validate search intent and train machine learning models around them. Techniques like word vectors and NER are also used in query classification algorithms to compare the topics in your query against a set of potential results.
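
As a rough illustration of intent classification, the sketch below scores a query against a few sets of cue words. The three intent labels and the cue words are assumptions chosen for the example. Real search engines learn these signals from click-through data rather than from a fixed list.

```python
# Hypothetical intent categories and cue words, chosen for illustration.
INTENT_CUES = {
    "informational": {"how", "what", "why", "guide", "tutorial"},
    "transactional": {"buy", "price", "cheap", "discount", "order"},
    "navigational": {"login", "homepage", "website", "official"},
}


def classify_query(query):
    """Pick the intent whose cue words overlap most with the query."""
    tokens = set(query.lower().split())
    scores = {intent: len(tokens & cues) for intent, cues in INTENT_CUES.items()}
    best = max(scores, key=scores.get)
    # With no matching cue word the query stays ambiguous.
    return best if scores[best] > 0 else "unknown"


print(classify_query("how to write a pillar page"))
print(classify_query("buy running shoes cheap"))
print(classify_query("apple"))
```

The last query shows the ambiguity problem: a single word carries no cue about intent, which is where click-through data becomes valuable.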

Question answering:

This is concerned with building systems that automatically answer questions posed by humans in a natural language. In general, there are two types of questions to tackle: fact-based (e.g. what is the capital of France?) and open-domain (e.g. what is the future of SEO?). The latter usually involves analyzing dozens of search results for a given search query, and composing a “multi-document summary”. Question answering is one of the most active areas of research as it powers new mediums like voice search, which may require special considerations when it comes to SEO.

Automatic document summarization: 

Text summarization is the process of shortening a text document with software, in order to create a summary with the major points of the original document. Technologies that can make a coherent summary take into account variables such as length, writing style and syntax. Salesforce has made major breakthroughs in summarization.

Below is a screenshot from Frase that shows the taxonomy of a summary:

automatic summarization
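
For intuition, here is a minimal sketch of one of the simplest summarization approaches: extractive summarization, which scores each sentence by how frequent its words are in the document and keeps the top ones. This is an assumption-laden toy (the stopword list and scoring are my own choices), not the method used by Salesforce or Frase.

```python
import re
from collections import Counter

# Small illustrative stopword list; real systems use much longer ones.
STOPWORDS = {"the", "a", "an", "of", "to", "and", "is", "in", "it",
             "that", "for", "on", "with", "as", "are"}


def content_words(text):
    """Lowercase the text and drop stopwords."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def summarize(text, max_sentences=1):
    """Keep the sentences whose words are most frequent in the whole text."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(content_words(text))

    def score(sentence):
        tokens = content_words(sentence)
        # Average frequency, so long sentences are not favored automatically.
        return sum(freq[w] for w in tokens) / len(tokens) if tokens else 0.0

    top = sorted(sentences, key=score, reverse=True)[:max_sentences]
    # Preserve the original sentence order in the summary.
    return " ".join(s for s in sentences if s in top)


text = ("Search engines rank content. "
        "Search engines use AI to rank content by topic. "
        "Cats sleep.")
print(summarize(text))
```

Systems that write new sentences instead of selecting existing ones (abstractive summarization) are considerably more complex and rely on trained neural models.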

Textual entailment: 

Entailment is a fundamental concept in logic, which describes the relationship between statements that hold true when one statement logically follows from one or more statements. A valid logical argument is one in which the conclusion is entailed by the premises, because the conclusion is the consequence of the premises. Textual Entailment algorithms can take a pair of sentences and predict whether the facts in the first necessarily imply the facts in the second one. This can be a useful technique to measure logic and cohesiveness in documents.
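
The logical notion of entailment can be checked mechanically for simple true/false statements. The sketch below tests every possible combination of truth values and reports whether the conclusion holds whenever all premises hold. It illustrates the logic concept only: textual entailment models are trained on sentence pairs and do not work this way. The `rain` and `wet` variables are invented for the example.

```python
from itertools import product


def entails(premises, conclusion, variables):
    """Return True if the conclusion is true in every world where all premises are true."""
    for values in product([True, False], repeat=len(variables)):
        world = dict(zip(variables, values))
        if all(p(world) for p in premises) and not conclusion(world):
            return False  # found a counterexample
    return True


variables = ["rain", "wet"]
if_rain_then_wet = lambda w: (not w["rain"]) or w["wet"]
it_rains = lambda w: w["rain"]
street_is_wet = lambda w: w["wet"]

# "If it rains the street is wet" and "it rains" entail "the street is wet".
print(entails([if_rain_then_wet, it_rains], street_is_wet, variables))
# The rule alone does not entail that it rains.
print(entails([if_rain_then_wet], it_rains, variables))
```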

The importance of pillar pages and topic clusters

It is widely known that Google analyzes your full website to determine whether your content demonstrates topic authority in certain subjects. AI systems frequently employ document clustering as a technique to group data according to specific properties. For example, if your website has thousands of pages, a clustering algorithm may be able to group them by theme. If your website doesn’t present any clear themes, it might mean it lacks focus or expertise.

As a way to mimic document clustering algorithms, SEO is shifting to a topic cluster model, where pillar pages act as nodes connecting subpages. This model is a fairly sophisticated way to organize your website’s information architecture and content strategy.

Topic cluster model for content marketing strategy
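
To give a feel for document clustering, here is a small sketch that groups pages by word overlap (Jaccard similarity). The page texts, the threshold and the greedy grouping strategy are all assumptions made for the example. Production systems typically cluster on vector representations rather than raw word sets.

```python
def jaccard(a, b):
    """Share of words two sets have in common (0.0 = none, 1.0 = identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_pages(pages, threshold=0.25):
    """Greedily assign each page to the first cluster it overlaps with enough."""
    clusters = []
    for title, text in pages.items():
        tokens = set(text.lower().split())
        for cluster in clusters:
            if jaccard(tokens, cluster["tokens"]) >= threshold:
                cluster["titles"].append(title)
                cluster["tokens"] |= tokens
                break
        else:
            # No existing cluster is similar enough, so start a new one.
            clusters.append({"titles": [title], "tokens": set(tokens)})
    return [cluster["titles"] for cluster in clusters]


pages = {
    "/seo-research": "seo keyword research tips",
    "/seo-ranking": "seo keyword ranking tips",
    "/chocolate-cake": "chocolate cake baking recipe",
    "/vanilla-cake": "vanilla cake baking recipe",
}
print(cluster_pages(pages))
```

If a site's pages all end up in clusters of one, that is a hint that the content lacks clear themes.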

How to fit AI tools into your content creation process

Now that we better understand the way search engines “think”, it is time to consider the overall workflow that will help you match your content to what your target audience is actually interested in.

1. Perform a semantic content audit

Crawl your entire website and analyze all of its topics. Does it look like your content is well organized around cohesive themes? Which are the most relevant topics? Which pages receive the most internal links? A semantic content audit is a full-width analysis of your website’s content that will measure topic breadth. Ideally, you would perform the same analysis on both your competitors and external industry thought leaders. The goal here is to understand the big picture and identify topic gaps. To accomplish this, you will need a tool that can crawl your full website, automatically extract key topics (through named entity recognition) and understand semantic relationships (through word vectors).

2. Define topic clusters

Browse topics from the content audit to identify groups and semantic associations. Build a list of sub-topics for each cluster. These topics should be optimized for two key metrics:

  • Search growth: topics that are receiving an increased exposure in search engine queries.
  • Competition: topics that your direct competitors might have failed to mention.

3. Develop pillar pages

Compose an outline of the main topics your pillar page should include. Define a search query you would like your pillar page to rank for and perform semantic analysis on the top results. Make sure your pillar page covers key topics optimizing for these two metrics:

  • Topic coverage: your content should cover the most relevant topics from SERP pages. Of course, beware of keyword stuffing and make sure your story flows smoothly. This is where you can pay attention to document length; a story that aggressively covers all the top topics mentioned in SERP results will likely have to be longer. As an alternative, you may want to consider breaking down your topics into multiple shorter articles.
  • Content authenticity: while you have to align your content with the topics mentioned by SERP pages, you also have to find a unique angle to the story. One way to do this is using related topics without specifically using the same terms used by competitors. Remember word vectors understand similarity between topics, so by using good similar topics you may still rank high in search. Once you’ve accomplished a wide and authentic topic coverage, it is always valuable to incorporate proprietary insights nobody else has mentioned.
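
The topic coverage metric can be approximated with a simple check of which SERP topics appear in your draft. This sketch uses plain substring matching and an invented topic list. A real tool would also count related terms as matches (through word vectors), which is exactly what lets you stay authentic without repeating competitors' wording.

```python
def topic_coverage(draft, serp_topics):
    """Return (share of SERP topics mentioned in the draft, list of missing topics)."""
    text = draft.lower()
    missing = [topic for topic in serp_topics if topic.lower() not in text]
    covered = len(serp_topics) - len(missing)
    score = covered / len(serp_topics) if serp_topics else 0.0
    return score, missing


draft = "Pillar pages link topic clusters together."
serp_topics = ["pillar pages", "topic clusters", "internal links"]
score, missing = topic_coverage(draft, serp_topics)
print(round(score, 2), missing)
```

A low score is a prompt to review the outline, not a reason to force every missing term into the text.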

4. Develop content for each sub-topic

Use the same outline process described in point 3 to develop content around sub-topics that point to your pillar pages.

5. Continuous content optimization and re-publishing

Monitor what thought leaders publish about your target topics. This will help you come up with ideas to either write new content or optimize existing content around up-and-coming topics. The strategy of re-publishing content has proven to generate positive results in search rankings.

Can obsessive SEO limit your creativity?

Today’s marketers use many tools and there is certainly a sense of software fatigue. It almost looks like you have to break down your content creation workflow into stages that might end up limiting your creativity. Am I forgetting a keyword? Am I mentioning this keyword too much?

At some point, you have to think about whether you are over-obsessing about content optimization. In my view, research and content creation should go hand in hand and work together in a more natural way. For example, based on what you are writing and your intended outcome, a system should be able to recommend topics in context. Helping writers incorporate SEO best practices into their creative workflow is something we think about at Frase.

There are different tools that can help you accomplish some of the analytical tasks explained in this post. At Frase, we’ve created a platform that helps content marketers perform large-scale semantic content audits, along with a writing tool that acts as a Research Assistant. It is totally free to try!
