What is automatic text summarization?

Automatic text summarization is the data science problem of creating a short, accurate, and fluent summary from a longer document. Summarization methods are greatly needed to consume the ever-growing amount of text data available online. In essence, summarization is meant to help us consume relevant information faster.

While summarization has been a field of study for decades, it has certainly grown in popularity in recent years. In 2017, Salesforce announced breakthroughs in abstractive summarization, and use cases have since proliferated across the enterprise.

Back in 2014, data scientist Juan Manuel Torres Moreno published a full book on the subject, titled “Automatic Text Summarization”, in which he gave six reasons why we need automatic text summarization tools:

  • Summaries reduce reading time.
  • When researching documents, summaries make the selection process easier.
  • Automatic summarization improves the effectiveness of indexing.
  • Automatic summarization algorithms are less biased than human summarizers.
  • Personalized summaries are useful in question-answering systems as they provide personalized information.
  • Using automatic or semi-automatic summarization systems enables commercial abstract services to increase the number of texts they are able to process.

As described by Agolo, a Microsoft-backed summarization startup, a document summarizer must generally overcome a set of challenges:

  • Determining which sentences are the most salient.
  • Making the summary cohesive and readable.
  • Minimizing the number of references to ideas and entities not mentioned in the summary (i.e., coreference resolution).

Types of automatic summarization

Automatic summarization can be used in a variety of applications. Depending on the use case and type of documents, summarization systems can fall into different categories.

Abstractive vs. Extractive

When a human is given a corpus of text to summarize, they might rewrite the main points in their own words. This is called abstractive summarization and it requires high-level human skills like the ability to combine multiple perspectives into coherent natural language. As of 2018, the state of the art for abstractive summarization is not yet up to par, so many automatic summarization systems opt for a technique called extractive summarization.

Extractive summaries are excerpts taken directly from the input documents and presented in a readable way. The summary does not contain any rephrasing of the ideas presented in the original text. Extractive summarization methods employ AI-powered techniques to identify the most important sentences directly from the source.
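To make the idea concrete, here is a minimal sketch of one classic extractive technique: score every sentence by the frequency of its content words across the document, then keep the top scorers in their original order. The stopword list and the regex-based sentence splitter below are deliberately crude placeholders, not any production system’s implementation:

```python
import re
from collections import Counter

# Toy stopword list; real systems use much larger, curated lists.
STOPWORDS = {"the", "a", "an", "of", "to", "in", "and", "is", "are",
             "for", "on", "that", "this", "with", "as", "it", "its"}

def extractive_summary(text, num_sentences=2):
    """Score each sentence by the document-wide frequency of its
    content words, then return the top sentences in document order."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    words = [w for w in re.findall(r"[a-z']+", text.lower())
             if w not in STOPWORDS]
    freq = Counter(words)

    def score(sentence):
        tokens = [w for w in re.findall(r"[a-z']+", sentence.lower())
                  if w not in STOPWORDS]
        return sum(freq[t] for t in tokens) / (len(tokens) or 1)

    ranked = sorted(sentences, key=score, reverse=True)[:num_sentences]
    # Preserve the original order of the chosen sentences.
    return " ".join(s for s in sentences if s in ranked)
```

Sentences that share vocabulary with the document’s dominant topic score highest, which is exactly the “salience” challenge described above, solved in its simplest possible form.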

Illustration of Salesforce’s model generating a multi-sentence summary from a news article. For each generated word, the model pays attention to specific words of the input and the previously generated output.

Single-document vs. Multi-document summarization

When summarizing a single document, the summarization system can rely on a cohesive piece of text with very little repetition of facts. However, the chance of redundancy increases with multi-document summarization systems. An ideal multi-document summarizer maximizes the important information included in the summary while minimizing repetition.
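One common way to make that trade-off explicit is greedy selection in the spirit of Maximal Marginal Relevance (MMR): at each step, pick the sentence that best balances relevance against similarity to the sentences already chosen. A toy sketch follows; the relevance scores are assumed to come from some upstream model, and word-overlap (Jaccard) similarity stands in for real semantic similarity:

```python
def jaccard(a, b):
    """Word-overlap (Jaccard) similarity between two sentences."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if (wa | wb) else 0.0

def mmr_select(sentences, relevance, k=2, lam=0.5):
    """Greedy MMR: repeatedly pick the sentence with the best trade-off
    between relevance (weight lam) and redundancy with what has
    already been selected (weight 1 - lam)."""
    selected, candidates = [], list(sentences)
    while candidates and len(selected) < k:
        best = max(
            candidates,
            key=lambda s: lam * relevance[s]
            - (1 - lam) * max((jaccard(s, t) for t in selected), default=0.0),
        )
        selected.append(best)
        candidates.remove(best)
    return selected
```

With a low enough lam, a near-duplicate sentence loses out to a less relevant but novel one, which is the behavior an ideal multi-document summarizer wants.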

Indicative vs. informative

The taxonomy of summaries largely depends on the user’s end goal. For example, journalists or analysts looking to skim information as fast as possible are interested in the high-level points of an article. This use case calls for an ‘indicative’ type of summary.

On the other hand, when the reader wants more granularity, summaries may require more detail. For example, a summary might need to support topic filtering to let the reader drill down further into the summary. This type of summary is considered ‘informative’.

Document length and type

The length of the input text heavily impacts the sort of approaches a summarization system can take. The largest summarization datasets, like Newsroom by Cornell University, have focused on news articles, which usually range from 300 to 1,000 words. Extractive summarizers can be very effective when dealing with relatively short documents like news or blog articles. On the other hand, a 20-page report or a chapter of a book can only be summarized with the help of more advanced approaches like hierarchical clustering or discourse analysis.

In addition to length, documents also fall into different genres. Summarizing a news article is very different from summarizing a financial earnings report or a technical white paper; these document types may require entirely distinct summarization approaches.

Recommended articles

As a recap, here is a list of articles that cover the basics of automatic summarization. These articles were actually summarized by Frase’s summarization engine, which uses AI-powered extractive summarization.

New AI Breakthrough from Salesforce Research Boosts Productivity with Text Summarization (salesforce.com)

  • Salesforce Research is tackling this exact challenge and today we’re excited to announce two new breakthroughs in natural language processing towards the goal of automatically summarizing a long text and serving up coherent, digestible highlights that help you stay informed in a fraction of the time.
  • Text summarization is a very tough challenge, especially for longer texts such as news articles, and the work we are doing at Salesforce Research is pushing the state of the art.
  • I’m honored to work with Caiming Xiong and Richard Socher to introduce a more contextual word generation model and a new way of training summarization models with reinforcement learning (RL).

Introduction to Automatic Text Summarization (blog.algorithmia.com) – Jan 05 2017

  • Without an abstract or summary, it can take minutes just to figure out what the heck someone is talking about in a paper or report.
  • Automatic text summarization is part of the field of natural language processing, which is how computers can analyze, understand, and derive meaning from human language.
  • By keeping things simple and general purpose, the automatic text summarization algorithm is able to function in a variety of situations that other implementations might struggle with, such as documents containing foreign languages or unique word associations that aren’t found in standard English-language corpuses.

A Gentle Introduction to Text Summarization (machinelearningmastery.com) – Nov 28 2017

  • Automatic text summarization methods are greatly needed to address the ever-growing amount of text data available online to both better help discover relevant information and to consume relevant information faster.
  • After reading this post, you will know: Why text summarization is important, especially given the wealth of text available on the internet.
  • Automatic text summarization, or just text summarization, is the process of creating a short and coherent version of a longer document.
  • These deep learning approaches to automatic text summarization may be considered abstractive methods and generate a wholly new description by learning a language generation model specific to the source documents.

Taming Recurrent Neural Networks for Better Summarization (abigailsee.com)

  • Abstractive approaches use natural language generation techniques to write novel sentences.
  • In the past few years, the recurrent neural network (RNN) – a type of neural network that can perform calculations on sequential data (e.g. sequences of words) – has become the standard approach for many natural language processing tasks.
  • The decoder’s ability to freely generate words in any order – including words such as “beat” that do not appear in the source text – makes the sequence-to-sequence model a potentially powerful solution to abstractive summarization.
  • Explanation for Problem 2: Repetition may be caused by the decoder’s over-reliance on the decoder input (i.e. the previous summary word), rather than storing longer-term information in the decoder state.

An Overview of Summarization – agolo (blog.agolo.com) – Nov 03 2016

  • A summarization system with what’s called a generic trigger will find the most important topics in a given input text and summarize it without further guidance.
  • A generic trigger for summarization is useful in cases where the user does not yet know the contents of the text to be summarized.
  • Agolo’s summarizer takes these factors into account at various points in the summarization process.

A Quick Introduction to Text Summarization in Machine Learning (towardsdatascience.com) – Sep 18 2018

  • Text summarization refers to the technique of shortening long pieces of text.
  • Machine learning models are usually trained to understand documents and distill the useful information before outputting the required summarized texts.
  • With such a big amount of data circulating in the digital space, there is need to develop machine learning algorithms that can automatically shorten longer texts and deliver accurate summaries that can fluently pass the intended messages.
  • However, the text summarization algorithms required to do abstraction are more difficult to develop; that’s why the use of extraction is still popular.
  • As research in this area continues, we can expect to see breakthroughs that will assist in fluently and accurately shortening long text documents.

How to Make a Text Summarizer – Intro to Deep Learning (YouTube)

13 blockchain projects to help content creators and freelancers

Independent content creators are struggling to monetize their content at acceptable rates. Advertisers battle with fraud, ad blockers and declining advertising revenue. Publishers are finding it hard to monetize content through paywalls.

Blockchain promises solutions to remove unnecessary middlemen between freelancers and employers, prevent plagiarism, provide efficient escrow services and allow new types of reward programs.

Given the growth of the freelance economy, it seems like these solutions – many of them early-stage startups – are addressing a large, growing market of content creators.

Source: Dorie Clark, Harvard Business Review


If you are one of the 81 million people who use WordPress to publish content, you’ll soon be able to automatically timestamp your works within the WordPress content management system using the new Po.et plugin. (bitcoinmagazine.com)

February 8, 2018 marked the launch of Frost , an open API and set of developer tools from Po.et that will enable content publishers and developers to more easily register their creative works on the blockchain. The new API will enable integrations and decentralized applications, including the WordPress plug-in. Developers can get instructions on how to make an account, create an API key, read developer documentation and access the javascript library at the Frost website. (bitcoinmagazine.com)


The media sharing and advertising platform Snapparazzi has announced the release of the minimum viable product (MVP) of its blockchain-based platform. The app aims to allow everyone with a smartphone “to become a reporter” or a content creator. The user takes footage or a photo of newsworthy events with their smartphone and shares it using the platform. Interested buyers – TV, newspapers, radio, etc. – pay for the content in fiat currency. The user is in turn paid in SnapCoin, the platform’s token, for their contribution. The platform also targets content creators, saying they can get paid substantially more with Snapparazzi than with YouTube for creating or watching content. (cointelegraph.com)


Currently, the most popular decentralized content platform is Steemit, a blogging and social networking website with over one million users. The platform rewards publishers according to the popularity of their content as determined by upvotes. The Steemit ecosystem involves three types of tokens: steem, steem power, and steem dollars. (nasdaq.com)

When publishers create popular content, they earn Steem dollars, a stable currency that can be sold on the open market at any time. The steem token creation depends on the interaction in the Steemit platform and is usually distributed to content creators and curators either as steem power or steem dollars. Just like bitcoin, people can buy the crypto for speculation purposes. Steem and steem dollars can be converted to steem power to increase voting power. (nasdaq.com)

ASQ Protocol

Another project with an even bigger user base is the ASKfm-backed ASQ Protocol. Just like Steemit, the platform enables content creators to get paid for the value of their work rather than the revenue they generate from advertisements placed on their content. With this project, consumers can order their desired content from ASQ-supported platforms and pay with ASQ tokens. Brands can also sponsor content and reward users who engage with it. (nasdaq.com)


WOM is the creation of YEAY, an app that gives teens and young adults a place to shoot and share videos featuring streetwear styles. The app makes the videos shoppable by incorporating affiliate links. Brands benefit from user-generated product recommendations, and creators earn WOM Tokens as rewards for the value driven through their videos. To date, YEAY reports it has received $7 million in seed funding from investors including the former COO of Airbnb, the former CEO of Deutsche Telekom, Grazia Equity, Mountain Partners, and others. (crowdfundinsider.com)


You42 says creators will be offered “dynamic commerce capabilities” enabling them to sell content through their own page – effectively creating a “personalized marketplace” that meets their needs. They would generate an income through the U42 token – a cryptocurrency that has been specially created for the platform. (cointelegraph.com)

Users and creators would also be able to use the platform’s internal currency, UCoin – which can be earned for certain activities or purchased via U42 tokens – to access premium content or tip content creators for work they appreciate. (cointelegraph.com)


A new promising project called AC3 hopes to make life easier for content creators and educators. It’ll allow them to share their work with their audience directly, and use a more secure and transparent system to avoid plagiarism. (newsbtc.com)

Unlike many other blockchain companies, AC3 didn’t go down the ICO route when funding their project and chose to focus instead on building the technology. The platform is straightforward: fans and followers pay with AC3 tokens to access content, and creators can also sell their content for these same tokens. They can be used as currency to buy things like design and programming courses within the platform. (newsbtc.com)

Microsoft and Ernst & Young

Microsoft and Ernst & Young (EY) announced the launch of a blockchain solution for content rights and royalties management on Wednesday. The blockchain solution is first implemented for Microsoft’s game publisher partners. Indeed, gaming giant Ubisoft is already experimenting with the technology. After successful testing, Microsoft and EY hope to implement the solution across all industry verticals which require licensing of intellectual property or assets. (thenextweb.com)


The model implemented by the C3C blockchain, for instance, creates a direct consumer-to-creator network, replacing intermediaries in the process. Artists, writers, and musicians currently share somewhere between 10% and 20% of all the revenue they make on a particular online platform. The amount rises markedly when you add other expenditures such as payment processing fees, bank charges, and value-added taxes. A payment processor, for instance, will take some 3% to 6% of any payment made to a content creator, while cryptocurrency payments eliminate these charges. (c3c.network)

Blockchain provides complete control not only over the copyright but also empowers creators to price their content dynamically and perform micro-metering activities. Dynamic pricing is among the greatest benefits of blockchain, enabling prices adjusted by demand, advertiser-support, and many more factors. (c3c.network)


The community-owned open source social networking platform Minds has amassed one million users and has recently launched a cryptocurrency reward program based on the ethereum blockchain for all users on the platform. (zdnet.com)

Minds is also introducing a direct peer-to-peer advertising tool that allows users to offer tokens to one another in exchange for shares of specific content. (zdnet.com)


Escrow services, which hold money independently from two parties until all terms are satisfied, could help alleviate some of the stress between freelancers and employers. StratusCore added a Digital Escrow service to its platform — which was built using blockchain technology — last year. With the service, employers and freelancers can agree to the terms of a project and the necessary benchmarks or milestones that must be met to ensure it is completed on time. (pymnts.com)

When an agreement is reached, an employer deposits the funds for the project into the StratusCore Digital escrow account. Funds are disbursed from the escrow account directly into a freelancer’s preferred bank account once the necessary deliverables are met and the digital assets are uploaded to the platform. This occurs within one to two days after the employer digitally signs off on the delivered asset and the escrow funds are released. (pymnts.com)

ARA Blocks

Among other factors, the often necessary presence of middlemen significantly shrinks and limits the profits of content creators. Luckily for them, there is now a new platform that promises to make things easier, more profitable, and more secure, all thanks to blockchain technology: meet ARA Blocks. (technewsbeat.com)

With this platform, content creators will be able to sell and distribute their content directly to their target audiences, eliminating the need for the aforementioned middlemen and effectively increasing their revenue. And, given that it is built on Ethereum (which uses a blockchain), all this comes with the technology’s hallmarks, namely security and traceability. (technewsbeat.com)

ACG Network’s DApp Store

ACG Network’s DApp store would allow content creators across the global digital content industry to develop and release their own DApps and Ethereum-based smart contracts. These DApps are tokenized and can operate within the ACG Network public blockchain for voting and forecasting purposes by accessing Ethereum smart contracts. For an application to be considered a DApp on the blockchain, it must meet the following criteria: it must be completely open-source, and it must operate autonomously, with no single entity controlling the majority of its tokens. (medium.com)

What is Natural Language Processing (NLP)?

Natural language processing (NLP) is an area of artificial intelligence that helps computers understand human natural language. Often referred to as the engineering side of computational linguistics, NLP focuses on extracting meaning from unstructured data. NLP includes many different techniques for interpreting human language, ranging from statistical and machine learning methods to rules-based and algorithmic approaches.

NLP has removed many of the barriers between humans and computers, not only enabling them to understand and interact with each other, but also creating new opportunities to augment human intelligence and accomplish tasks that were impossible before. NLP enables real-world applications, including:

  • Automatic summarization: the process of creating a short and coherent version of a longer document. (machinelearningmastery.com)
  • Sentiment analysis: the process of determining the emotional tone behind a series of words, used to gain an understanding of the attitudes, opinions and emotions expressed within an online mention. (brandwatch.com)
  • Named entity recognition (NER) locates and classifies the named entities present in the text. NER classifies entities into pre-defined categories such as the names of persons, organizations, locations, quantities, monetary values, specialized terms, product terminology and expressions of times. (blog.paralleldots.com)
  • Parts-of-speech tagging: the process of marking up a word in a text as corresponding to a particular part of speech (noun, verb, adjective, etc.), based on both its definition and its context, i.e., its relationship with adjacent and related words in a phrase, sentence, or paragraph. (en.wikipedia.org)
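As a flavor of what the NER task involves, here is a deliberately naive pattern-based spotter for just two entity types. The regexes are toy placeholders; real NER systems rely on trained statistical or neural models, not patterns:

```python
import re

def toy_ner(text):
    """Illustration only: spot monetary amounts and capitalized
    multi-word name spans with regexes. Real NER classifies entities
    with trained models over pre-defined categories."""
    money = re.findall(r"\$\d[\d,.]*(?:\s(?:million|billion))?", text)
    names = [m.strip() for m in
             re.findall(r"(?:[A-Z][a-z]+\s)+[A-Z][a-z]+", text)]
    return {"MONEY": money, "NAME": names}
```

Even this crude version shows the shape of the output: spans of text labeled with entity categories such as MONEY or NAME.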

Subcategories of NLP include natural language generation (NLG) — a computer’s ability to create communication of its own — and natural language understanding (NLU) — the ability to understand slang, mispronunciations, misspellings, and other variants in language. (cio.com)

Business applications of NLP

NLP has been widely applied across many industries. Some examples include:

  • Enterprise question answering tools leverage NLP to enhance customer experience and improve administrative activities by allowing users to ask questions in everyday language about products, services or applications and receive immediate and accurate answers.  Many companies are successfully using customer support chatbots to streamline some of the work that would traditionally fall to representatives. Models built with NLP algorithms are the brains of these chatbots. They’re trained using text data from past conversations between your customer support agents and customers.
  • Optimizing Customer Satisfaction: manually sifting through product reviews and surveys can be prohibitively time consuming, but NLP can be used to build data models that generate insights to help optimize customer satisfaction. Sentiment analysis is used to classify text into positive, neutral, or negative categories.
  • Classifying medical records: Researchers at MIT in 2012 were able to attain a 75 percent accuracy rate for deciphering the semantic meaning of specific clinical terms contained in free-text clinical notes, using a statistical probability model to assess surrounding terms and put ambiguous terms into context. (healthitanalytics.com)
  • Ad placement: NLP can help with intelligent advertisement targeting and placement. Media buying is usually the largest channel in an organization’s advertising budget, so it is important to ensure that advertisements reach the right eyeballs. Browsing behavior, social media, and email contain a lot of embedded information about consumer preferences. NLP can be used to match keywords of interest in these texts to target the right consumers, and for disambiguation, i.e., identifying the sense in which a word is used in a sentence.
  • Reputation monitoring: with increased competition in diverse markets, monitoring reputation is essential to avoid being swept away by the tide. With a plethora of information sources about companies, such as social media, blog posts, and reports, it becomes imperative to use these sources to gain insight into a company’s reputation and reviews. NLP is well suited to understanding and extracting insights from these sources.
  • Helping hiring managers: NLP can help hiring managers filter resumes. Automated candidate-sourcing tools can scan applicants’ CVs to extract the required information and pinpoint the candidates who are the right fit for the job, saving time and making the process more efficient.
  • Market intelligence: NLP can help to monitor and track market intelligence. Since markets are influenced by information exchange, using event extraction, NLP can recognize what happens to an entity.
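The sentiment-classification step mentioned above can be illustrated with a tiny lexicon-based scorer. The word lists here are toy placeholders; real systems use trained models or large curated lexicons:

```python
# Toy polarity lexicons, for illustration only.
POSITIVE = {"great", "love", "excellent", "happy", "fast", "good"}
NEGATIVE = {"bad", "slow", "terrible", "hate", "broken", "poor"}

def sentiment(text):
    """Classify text as positive / negative / neutral by counting
    matches against the polarity lexicons."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"
```

Applied to product reviews or survey responses, even this simple scheme sorts text into the positive, neutral, and negative buckets described above; trained classifiers do the same job far more robustly.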

These are only a few examples of the many ways NLP can be used to unlock valuable information from text data. To learn how to get started implementing these techniques in your business, below are some resources that might help you dive deeper into the world of NLP:

Relevant Wikipedia articles

  • Natural language generation (NLG) is the natural language processing task of generating natural language from a machine representation system such as a knowledge base or a logical form. Psycholinguists prefer the term language production when such formal representations are interpreted as models for mental representations.
  • Computational linguistics is an interdisciplinary field concerned with the statistical or rule-based modeling of natural language from a computational perspective, as well as the study of appropriate computational approaches to linguistic questions.
  • Question answering (QA) is a computer science discipline within the fields of information retrieval and natural language processing (NLP), which is concerned with building systems that automatically answer questions posed by humans in a natural language.
  • Textual entailment (TE) in natural language processing is a directional relation between text fragments. The relation holds whenever the truth of one text fragment follows from another text.

  • The bag-of-words model is a simplifying representation used in natural language processing and information retrieval (IR). Also known as the vector space model. In this model, a text (such as a sentence or a document) is represented as the bag of its words, disregarding grammar and even word order but keeping multiplicity.
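The bag-of-words idea takes only a few lines to sketch: each document becomes a vector of word counts over the shared corpus vocabulary, with grammar and word order thrown away but multiplicity kept:

```python
from collections import Counter

def bag_of_words(docs):
    """Represent each document as a count vector over the corpus
    vocabulary. Word order is discarded; counts are kept."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    vectors = [[Counter(toks)[w] for w in vocab] for toks in tokenized]
    return vocab, vectors
```

Note how a repeated word (“cat” appearing twice) yields a count of 2 in the vector: that is the “keeping multiplicity” part of the definition.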

Open Source NLP Libraries

These libraries provide the algorithmic building blocks of NLP in real-world applications.

  • Apache OpenNLP: a machine learning toolkit that provides tokenizers, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, parsing, and more.
  • Natural Language Toolkit (NLTK): a Python library that provides modules for processing text, classifying, tokenizing, stemming, tagging, parsing, and more.
  • Stanford NLP: a suite of NLP tools providing part-of-speech tagging, named entity recognition, coreference resolution, sentiment analysis, and more.
  • MALLET: a Java package that provides Latent Dirichlet Allocation, document classification, clustering, topic modeling, information extraction, and more.

NLP Courses

If you are looking to get your feet wet with NLP, here are two popular online courses for beginners:

  • Stanford Natural Language Processing on Coursera: “This course covers a broad range of topics in natural language processing, including word and sentence tokenization, text classification and sentiment analysis, spelling correction, information extraction, parsing, meaning extraction, and question answering. We will also introduce the underlying theory from probability, statistics, and machine learning that are crucial for the field, and cover fundamental algorithms like n-gram language modeling, naive bayes and maxent classifiers, sequence models like Hidden Markov Models, probabilistic dependency and constituent parsing, and vector-space models of meaning.”
  • Udemy’s Introduction to Natural Language Processing: “This course introduces Natural Language Processing through the use of Python and the Natural Language Toolkit. Through a practical approach, you’ll get hands-on experience working with and analyzing text. As a student of this course, you’ll get updates for free, which include lecture revisions, new code examples, and new data projects.”

NLP YouTube videos

  • Natural Language Processing with Deep Learning – Stanford University
  • Introduction to Natural Language Processing – Cambridge Data Science Bootcamp
  • Natural Language Generation at Google Research

5 ways freelance writers can use artificial intelligence

The demand for online writing services has exploded in the past decade. Every organization needs a website, every website needs content, and content needs writers to create it. Entire business models live online. By many accounts, the bulk of the world’s content has been created in the past few years.

According to data from research firm Technavio, global spending on content marketing—which includes freelance writing services—will increase at a compound annual growth rate of 16% through 2021 to a projected total of $412 billion.

Anyone would think it is a great time to be a writer. However, the reality is not so straightforward. Making a decent living out of writing remains tough.

The New Economy

It’s not just freelance writers who are experiencing the uncertainty of the digital economy. The march of automation is leaving many other workers apprehensive for the future of their jobs.

According to freelance marketplace Upwork, the majority of workers in the U.S. economy will be freelance by 2027. That is a stunning prediction by any standard. “Freelancers increasingly think that having a diversified portfolio of clients is more secure than one employer,” said Upwork in its recent survey of the status of freelancing in the United States.

According to Upwork, the trend toward freelancing is unstoppable

So if the economy is increasingly turning freelance and the demand for content is plentiful, isn’t this a great time for wordsmiths everywhere?

The outlook is mixed. Although 2016 data from the Bureau of Labor Statistics put the median writer salary at $61,820, other surveys point to much lower pay for freelance writers, exacerbated by low hours.

Take the 2017 survey from Freelance Writing in which the average freelance writer reported working fewer than 20 hours a week for less than $10,000 a year. Most of these low-income writers had fewer than three years’ professional experience, with 70% saying that finding enough work was their main struggle.

By contrast, survey respondents earning $40,000 or more a year and working more than 40 hours a week had at least five years’ experience and picked up most of their work through networking and word of mouth.

The Role of AI

The pattern is clear: making a sustainable living as a freelance writer is possible, but it takes time to get established. The question is how can writers best navigate that tricky period when they are finding their feet and building their credentials?

Here we turn to AI. If artificial intelligence is nipping at the heels of conventional employment, it may also be the thing that helps freelance writers gain more jobs, increase their productivity, and generate a higher wage.

What follows are five ways in which AI can turn you into a more accomplished and productive writer, regardless of experience.

1. Topic Ideation

It’s easy to run out of topic ideas. There is so much content online and so many things already written about, how will you find fresh topics to run with?

AI has an answer for that. Topic analysis tools provide great ways of reviewing existing content and identifying gaps. Through natural language processing, these tools analyze huge swathes of content within seconds and generate recommendations for your review. Platforms such as MarketMuse or Frase can help in providing topic recommendations that also incorporate SEO value.

You’re still making the final call about what topics to select and why. But AI guides you along the way in a fraction of the time.

2. Accelerate your Pitching

As the Freelance Writing survey shows, the majority of writers are saddled with the need to pitch multiple topics to secure one job. Even highly experienced writers must spend time preparing topic pitches to ensure buy-in from their clients.

The problem is that pitching takes time. A draft headline alone won’t cut it. You will likely need to provide an article outline and provisional research to highlight what your planned article is about. You may also need to demonstrate how the article is relevant in terms of target audience and SEO.

In those cases where a prospective client supplies the topic and invites proposals, the pressure on the quality of the pitch is even higher. Freelance writer Rachel Brooks says she pitches at least 10 times per day to guarantee a steady stream of work.

The good news is that AI-driven summarization tools can not only cut down the time spent on pitches but also improve the quality of the pitches themselves.

A good quality pitch also makes it much easier when the time comes to create the content. Content strategist Fernando Nikolic has taken this technique to the next level. He supplies his international team of freelance writers with research-rich creative briefs using the same AI summarization techniques that are helping Rachel speed up her pitching process. AI-generated article outlines get the content creation process rolling in minimal time and keep it on track.

3. Optimize your Content for SEO

Selecting a good topic and putting together a strong pitch are necessary components, but what happens if your content is not optimized for search? It’s like the proverbial tree falling in a forest with nobody around to hear it.

In the past, editors would help guide content development for you. Now the machines can do much of the hands-on work. Platforms such as Can I Rank ensure your content is optimized for search. They also help keep your writing on topic and free from digression. This leaves more time for you to polish up the final piece.

And just think how satisfying it is to see your article rise to the top of the search engine rankings for the keyword query you targeted. That’s tangible ROI by anyone’s measure.

4. Creating emotionally-optimized content

So you’re checking all the best practice boxes, but what about voice, personality, and emotion?

Your content needs to resonate with its audience. Sometimes just a small change in wording or the way you structure the opening of an article will determine whether or not a person reads on. AI platforms like Persado offer great functionality for optimizing your content emotionally.

Headlines are imperative here. A smart headline that immediately conveys its meaning while generating a level of curiosity that propels a reader to click through will win out.

And although clickbait headlines might be tempting, remember the actual content of your piece must deliver against the promise of the headline. Don’t oversell your story, but don’t underplay it either.

5. Improve your Writing Style and Grammar

Finally, don’t forget the plethora of tools available, from Grammarly to Acrolinx, to help you write cleaner and more readable prose. When was the last time you opened a paper dictionary or thesaurus? You have everything you need online to render your writing as strong, as clear, and as optimized for search as possible.

Making AI accessible

While freelance writers understand the potential of AI-powered tools, many of these solutions are not easily affordable. AI companies often target large companies and deliver enterprise products at enterprise-level prices, leading to questions around accessibility for freelancers.

By contrast, less expensive tools may fall short in the eyes of professional writers, who need solutions that actually improve the quality of their work.

Frase meets the needs of freelance writers by offering an enterprise-grade research and content optimization solution at a price point that freelance writers can afford. Freelance writer Rachel Brooks agrees. Find out more about her experience as reported in the nDash blog.

Save time, write better, secure more work. Frase offers AI-driven solutions that help freelance writers thrive at every stage.

Frase Workflow: how to use automatic summarization in newsletters and content curation

At Frase, we are exploring various use cases for summarization. This post describes how Frase brings the power of automatic summarization to newsletters and content curation programs.

If you are like me, you subscribe to newsletters that give you a nicely curated selection of content about topics that matter to you. This content is usually delivered as a list of titles and hyperlinks, which can sometimes be overwhelming if the list is long. Summarization adds value by summarizing the contents inside those selected links, providing convenience and effortless insight to the reader.

The ultimate goal here is to deliver a consistent and cohesive stream of curated content to our target audience. This process can be fully accomplished in the Frase platform in 3 steps. To illustrate the process, let’s say we wanted to publish a daily roundup about the topic of artificial intelligence for SEO:

Customize your media monitors

Customize your own media monitoring system by following topics and publishers of your choosing. In contrast to other feed monitoring tools, Frase goes one step further by summarizing and extracting key topics from the content you monitor.

1. Topics: Frase allows boolean logic and multiple conditions to help you narrow down your target topic. In this case, a simple keyword rule to monitor articles that mention “artificial intelligence” AND “search engine optimization” should do the work for us. Depending on the topic you want to follow, you might need to add more keywords, use different boolean operators, or add more conditions.

2. Sources: you can either pull from the full Frase index (thousands of publishers, plus Google News), or you can customize your own list of RSS feeds. I would always recommend building a custom list of blogs and publishers that interest you. In parallel, you can always keep a separate monitor for more general sources.
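A keyword rule like the one above is essentially a boolean filter over incoming articles. The sketch below is a minimal, hypothetical illustration of how such a monitor rule might evaluate article text (it is not the Frase API, just a plain-Python stand-in):

```python
def matches_rule(article_text, all_of=(), any_of=(), none_of=()):
    """Evaluate a simple boolean keyword rule against an article:
    every term in `all_of` must appear (AND), at least one term in
    `any_of` must appear if any are given (OR), and no term in
    `none_of` may appear (NOT)."""
    text = article_text.lower()
    if not all(term.lower() in text for term in all_of):
        return False
    if any_of and not any(term.lower() in text for term in any_of):
        return False
    return not any(term.lower() in text for term in none_of)
```

For the example monitor, you would call it with `all_of=("artificial intelligence", "search engine optimization")`; adding more conditions means extending these tuples.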

Review daily summaries

Once your monitor is up and running, you can browse incoming daily summaries in multiple ways:

1. The Frase app: ideal for reviewing summaries on the go. The app also allows bookmarking and social media sharing.

2. Frase editor: if you are looking to compose a custom daily roundup, you should access your monitor from the Frase editor. This way you can incorporate summaries directly into your document and edit them when needed.

Publish to Mailchimp or WordPress

Once you’ve composed a daily roundup out of your monitor summaries, you can publish it directly to your Mailchimp or WordPress accounts.

Interested in leveraging automatic summarization to empower your newsletter and content curation efforts? Try out Frase for free today.

20 Applications of Automatic Summarization in the Enterprise

Summarization has been and continues to be a hot research topic in the data science arena. While text summarization algorithms have existed for a while, major advances in natural language processing and deep learning have been made in recent years. Many internet companies are actively publishing research papers on the subject. Salesforce has published various groundbreaking papers presenting state-of-the-art abstractive summarization. In May 2018, the largest summarization dataset to date was revealed in a project supported by a Google Research award.

While there is intense activity in the research field, there is less literature available regarding real-world applications of AI-driven summarization. One of the challenges with summarization is that it is hard to generalize. For example, summarizing a news article is very different from summarizing a financial earnings report. Certain text features, like document length or genre (tech, sports, finance, travel, etc.), make summarization a serious data science problem to solve. For this reason, the way summarization works largely depends on the use case, and there is no one-size-fits-all solution.

Summarization: the basics

Before diving into an overview of use cases, it is worth explaining a few basics around summarization:

There are two main approaches to summarization:

  • Extractive summarization: works by selecting the most meaningful sentences in an article and arranging them in a comprehensive manner. This means the summary sentences are extracted from the article without any modifications.
  • Abstractive summarization: works by paraphrasing the most important ideas in the article in its own words.
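To make the extractive approach concrete, here is a minimal Python sketch of a frequency-based extractive summarizer: sentences are scored by how often their content words appear in the document, and the top-scoring sentences are returned in their original order. This is illustrative only; production systems use far richer scoring signals.

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "is", "are", "of", "to", "in", "and", "it", "that"}

def extractive_summary(text, n_sentences=2):
    """Score each sentence by the average frequency of its content words,
    then return the top-scoring sentences in their original order."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]
    freq = Counter(words)
    scored = []
    for idx, sent in enumerate(sentences):
        tokens = [w for w in re.findall(r"[a-z']+", sent.lower()) if w not in STOPWORDS]
        score = sum(freq[t] for t in tokens) / (len(tokens) or 1)
        scored.append((score, idx, sent))
    top = sorted(scored, reverse=True)[:n_sentences]
    return " ".join(sent for _, _, sent in sorted(top, key=lambda x: x[1]))
```

Because every output sentence is copied verbatim from the input, this is extractive by construction; an abstractive system would instead generate new sentences of its own.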

There are also two scales of document summarization:

  • Single-document summarization: the task of summarizing a standalone document. Note that a “document” could refer to different things depending on the use case (URL, internal PDF file, legal contract, financial report, email, etc.).
  • Multi-document summarization: the task of assembling a collection of documents (usually through a query against a database or search engine) and generating a summary that incorporates perspectives from across documents.

Finally, there are two common metrics any summarizer attempts to optimize:

  • Topic coverage: does the summary incorporate the main topics from the document?
  • Readability: do the summary sentences flow in a logical way?
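Topic coverage can be approximated very crudely as the fraction of a document’s key topics that survive into the summary. The sketch below is a toy proxy for that idea (real evaluations use metrics such as ROUGE, which compare n-gram overlap against reference summaries):

```python
def topic_coverage(summary, document_topics):
    """Return the fraction of the document's key topics that are
    mentioned in the summary (a crude proxy for coverage)."""
    text = summary.lower()
    covered = [t for t in document_topics if t.lower() in text]
    return len(covered) / len(document_topics)
```

Readability is harder to score mechanically; it usually requires human judgment or learned models of sentence flow.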

Use cases in the enterprise:

These are some use cases where automatic summarization can be used across the enterprise:

1. Media monitoring

The problem of information overload and “content shock” has been widely discussed. Automatic summarization presents an opportunity to condense the continuous torrent of information into smaller, more digestible pieces.

2. Newsletters

Many weekly newsletters take the form of an introduction followed by a curated selection of relevant articles. Summarization would allow organizations to further enrich newsletters with a stream of summaries (versus a list of links), which can be a particularly convenient format on mobile.

3. Search marketing and SEO

When evaluating search queries for SEO, it is critical to have a well-rounded understanding of what your competitors are talking about in their content. This has become particularly important since Google updated its algorithm and shifted focus towards topical authority (versus keywords). Multi-document summarization can be a powerful tool to quickly analyze dozens of search results, understand shared themes and skim the most important points.

4. Internal document workflow

Large companies are constantly producing internal knowledge, which frequently gets stored and under-used in databases as unstructured data. These companies should embrace tools that let them re-use already existing knowledge. Summarization can enable analysts to quickly understand everything the company has already done in a given subject, and quickly assemble reports that incorporate different points of view.

5. Financial research

Investment banking firms spend large amounts of money acquiring information to drive their decision-making, including automated stock trading. When you are a financial analyst looking at market reports and news every day, you will inevitably hit a wall and won’t be able to read everything. Summarization systems tailored to financial documents like earnings reports and financial news can help analysts quickly derive market signals from content.

6. Legal contract analysis

Related to point 4 (internal document workflow), more specific summarization systems could be developed to analyze legal documents. In this case, a summarizer might add value by condensing a contract down to its riskiest clauses, or by helping you compare agreements.

7. Social media marketing

Companies producing long-form content, like whitepapers, e-books and blogs, might be able to leverage summarization to break down this content and make it shareable on social media sites like Twitter or Facebook. This would allow companies to further re-use existing content.

8. Question answering and bots

Personal assistants are taking over the workplace and the smart home. However, most assistants are fairly limited to very specific tasks. Large-scale summarization could become a powerful question answering technique. By collecting the most relevant documents for a particular question, a summarizer could assemble a cohesive answer in the form of a multi-document summary.

9. Video scripting

Video is becoming one of the most important marketing mediums. Besides video-focused platforms like YouTube or Vimeo, people are now sharing videos on professional networks like LinkedIn. Depending on the type of video, more or less scripting might be required. Summarization can be a valuable ally when looking to produce a script that incorporates research from many sources.

10. Medical cases

With the growth of tele-health, there is a growing need to better manage medical cases, which are now fully digital. As telemedicine networks promise a more accessible and open healthcare system, technology has to make the process scalable. Summarization can be a crucial component in the tele-health supply chain when it comes to analyzing medical cases and routing these to the appropriate health professional.

11. Books and literature

Google has reportedly worked on projects that attempt to understand novels. Summarization can help consumers quickly understand what a book is about as part of their buying process.

12. Email overload

Companies like Slack were born to keep us away from constant emailing. Summarization could surface the most important content within email and let us skim emails faster.

13. E-learning and class assignments

Many teachers utilize case studies and news to frame their lectures. Summarization can help teachers more quickly update their content by producing summarized reports on their subject of interest.

14. Science and R&D

Academic papers typically include a human-made abstract that acts as a summary. However, when you are tasked with monitoring trends and innovation in a given sector, it can become overwhelming to read every abstract. Systems that can group papers and further compress abstracts can become useful for this task.

15. Patent research

Researching patents can be a tedious process. Whether you are doing market intelligence research or looking to file a new patent, a summarizer to extract the most salient claims across patents could be a time saver.

16. Meetings and video-conferencing

With the growth of tele-working, the ability to capture key ideas and content from conversations is increasingly needed. A system that could turn voice to text and generate summaries from your team meetings would be fantastic.

17. Help desk and customer support

Knowledge bases have been around for a while, and they are critical for SaaS platforms to provide customer support at scale. Still, users can sometimes feel overwhelmed when browsing help docs. Could multi-document summarization provide key points from across help articles and give the user a well-rounded understanding of the issue?

18. Helping disabled people

As voice-to-text technology continues to improve, people with hearing disabilities could benefit from summarization to keep up with content in a more efficient way.

19. Programming languages

There have been multiple attempts to build AI technology that could write code and build websites by itself. It is possible that custom “code summarizers” will emerge to help developers get the big picture of a new project.

20. Automated content creation

“Will robo-writers replace my job?” That’s what writers are increasingly asking themselves. If artificial intelligence is able to replace any stage of the content creation process, automatic summarization is likely going to play an important role. Related to point 3 (applications in search marketing and SEO), writing a good blog post usually involves summarizing existing sources for a given query. Summarization technology might reach a point where it can compose an entirely original article by summarizing related search results.

Frase Workflow: How to make a research-driven content brief for SEO

At Frase, we are developing AI-driven research tools to accelerate content creation. Creating a research-driven content brief is one of the workflows Frase aims to help with.

When you have to create a blog post targeting a particular search query, it is helpful to start off with a content brief. A typical creative brief would at least include the following information:

  • Tentative title
  • Key topics
  • Links and related articles
  • Customer persona and user intent
  • Writing tone and style
  • Target word count and delivery date

What is a research-driven content brief?

As you may know, SEO has evolved well beyond simple keywords. Your content has to cover topics widely and deeply, but finding the right topics is only half of the work. Most SEO research tools will give you a list of relevant topics, but a list can feel rather superficial to a writer.

A research-driven brief surrounds each topic with a wealth of information and perspectives. The ultimate goal is to help you create authoritative content that shows off a well-rounded understanding of the subject.

Frase accelerates the creation of content briefs by combining two powerful AI-powered technologies:

  • Named entity recognition: the ability to automatically identify and classify topics in text, as well as to draw relationships between them.
  • Automatic summarization: the ability to condense long articles down to a selection of the most meaningful sentences.

Creating research-driven briefs on Frase

Let’s dive into a real life example. How could you generate a research-driven brief for the query: “how is artificial intelligence transforming content creation?”

1. Define your document theme

The theme is the search query you would like to rank for. In Frase documents, the theme is used as a baseline topic to help the Research Assistant better understand context and make recommendations.

2. Frase editor

The editor is the heart of the Frase platform. It features a minimalistic word processor on the left side, and a research assistant on the right side. The main goal is to help you write and research in the same environment, which is helpful for research-intensive workflows like composing a content brief.

3. Explore topics

Frase will scan search results for your query and automatically extract key topics for your review.

4. List down your preferred topics

As an initial step towards building your brief, cherry-pick the topics that will give form to your upcoming story.

5. Select top paragraphs for each topic

Once you’ve built your list of topics, explore each topic to understand in what context they are mentioned. For example, in the case below we can quickly understand that Salesforce has been mentioned because it employs algorithms to summarize content. Select those paragraphs you consider insightful and unique for each topic. Each selected paragraph will get added to your document with a citation.

6. Your research-driven content brief is done

In a matter of minutes, you’ve generated a research-driven content brief that incorporates key topics along with deeper information and quality links. Frase allows you to export or share your content in various formats.

Why should you use research-driven content briefs?

  • Understand deeper perspectives for each of your strategic topics.
  • Accelerate the content creation process by directly incorporating ideas from your brief.
  • Better contextualize your linking strategy with more informed background research.
  • If you are a marketing manager or strategist, help your writers meet your content expectations.

Interested in leveraging NLP techniques to create content briefs?

Try out Frase for free today.

How AI-generated news roundups can power your content strategy

News is a powerful way that organizations can build topic authority and create content on a regular basis. Search marketers have been leaning on news-driven content for many years to help themselves and their clients make progress in the search engines. News is plentiful and fresh, and almost never lets you down. For all its benefits, news is not without its challenges.

Search marketers and content strategists looking to produce news-based content must develop an editorial process that is timely, relevant, and draws on best practice for fact-checking and accuracy.

Bill Smith, owner of Boston SEO Services, may have found just the solution.

With Frase, Bill is able to create daily news roundups for publication on his websites in a fraction of the time that a fully manual operation would take.


The process is simple. Using Frase’s AI research assistant, Bill is presented with a selection of sources relevant to his search query. He now has a number of choices. The Frase research assistant allows him to:

  • Further refine the selection by adding fresh terms to his query.
  • Summarize each of the articles for quick and easy understanding of their contents.
  • View individual articles in full within the platform, or go direct to the original source if he prefers.
  • Explore specific themes and topics within each article to find exactly what he is looking for.

Once Bill has decided what he wants to include in his roundup, things couldn’t be easier. With a simple click of a button, Frase allows him to add comprehensive summaries of his earmarked articles to a working document, all within the same browser. Article images, dates, topic information, and links are pulled across seamlessly too.

Bill now has the option to add custom copy to his document and otherwise edit the content, or to publish directly to his feed. Whatever he decides, his document already looks great.

Publication is simple. Frase offers a WordPress integration and a variety of ways for exporting a completed document into a separate content management system. The end result is impressive, and incredibly easy to create. “It’s very straightforward to use,” Bill says.

Bill’s site The Young Marketeers benefits from Frase’s AI research tool, which selects and summarizes relevant articles for easy inclusion into content

The marketing upsides of news roundups

News roundups have many benefits and should be a staple element of a marketer’s toolkit. They drive traffic to a website, help with search rankings, and provide great fuel for social media and email campaigns.

Roundups assist with building topic authority, and introduce themes into your website that can form the basis of additional content. They are, most importantly, of great interest to your site visitors and help position your website as a thought leader in the space.

And they work.

Bill, who shares his published roundups on LinkedIn, was able to see steady traffic coming from the news roundups into his site within two months of launch. He is also using Frase to create roundups for a new digital venture selling CBD hemp oil for medicinal purposes. The content “works great” for email newsletters using Frase’s MailChimp integration, Bill says, and he already has a couple of thousand visitors a month to the new website.

Bill praises Frase on its ability to save him time while generating relevant, effective content powered by cutting-edge AI. Describing his experience with Frase as an “ongoing case study,” he looks forward to putting the platform to work in different ways and on different projects as his business evolves.

So many uses, so little time.

How will you use Frase?

Artificial intelligence in SEO and content optimization

This post examines how artificial intelligence is changing SEO and proposes specific techniques for content marketers to adapt accordingly.

In the past five years, Google has introduced two algorithm updates that put a clear focus on content quality and language comprehensiveness. In 2013, Hummingbird gave search engines semantic analysis capability. In 2015, Google announced RankBrain, which marked the beginning of Google’s AI-first strategy. This means that Google uses multiple AI-driven techniques to rank search results.

As a result, search engine optimization (SEO) has shifted focus from keywords to topical authority. Keyword research is still important, but its role has changed. Simply put, AI systems can now understand far more than individual keywords. Much like humans do, new systems can understand relationships between topics and develop a contextual interpretation. In other words, AI is learning to read.

Increasingly, marketers are using AI-powered tools to help them reverse engineer the way search engines find the best content. When it comes to research tools, one widely discussed topic is the difference between SEO and content optimization.

SEO runs on keywords, content optimization runs on NLP

SEO has traditionally run on keywords, while content optimization runs on Natural Language Processing (NLP).

Typically, your content strategy will start with a set of important keywords. However, each keyword now has more sophisticated properties than it used to: What user intent does it relate to? What broader topics does it link with? What cluster does it belong to?

Content optimization is all about understanding these additional properties in language. The ultimate goal is to create the most authoritative content for a given query, optimized for both topic breadth (horizontal topic coverage) and depth (how detailed you go into the topic).

We are teaching machines to read

The terms “topic modeling” and “latent semantic indexing” have been widely used in the digital marketing and SEO arenas to describe the way semantic search works. It is worth exploring some specific data science techniques that are powering the latest AI-powered tools.

Word vectors:

Word vectorization is a natural language processing (NLP) technique where words and phrases from a vocabulary are mapped to vectors of real numbers. Word vectors typically have around 200 dimensions, meaning each word gets a position in a 200-dimension space. Placing words in a multi-dimensional vector space allows us to perform similarity comparisons, among other operations. The sum of word vectors may also be used to calculate document vectors. A typical challenge in word vectors is ambiguity: the different meanings of a word (“apple” vs. “Apple”) can be embedded into the same vector location. More advanced vectors, called “sense embeddings”, solve this problem by interpreting each version of the word differently. For example, a sense embedding might be able to position “apple” the fruit and “Apple” the company in very different positions within the vector space.

word vector representation
Simple 2-dimension vector representation of a small house-related vocabulary.
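The similarity comparison mentioned above is typically computed as the cosine of the angle between two vectors. Here is a toy illustration using hand-made 2-dimensional vectors (real embeddings have ~200 learned dimensions, but the math is identical):

```python
import math

# Hand-made 2-D "word vectors" for a toy house-related vocabulary
# (illustrative only; real embeddings are learned from large corpora).
vectors = {
    "house":     [0.9, 0.8],
    "apartment": [0.85, 0.75],
    "stock":     [-0.8, 0.1],
}

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b: 1.0 means same
    direction (very similar), values near or below 0 mean unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm
```

With these toy vectors, "house" sits much closer to "apartment" than to "stock", which is exactly the property semantic search exploits.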

Named entity recognition (NER):

This technique seeks to locate and classify named entities in text into pre-defined categories such as concepts, organizations, locations or people. Traditionally, NER relied on large databases like Wikipedia to recognize known entities. Current neural networks are trained on tagged NER datasets to learn language patterns and identify entities the system has never seen before. For example, say your new startup appears in the news tomorrow. A good NER system should be able to classify it as an organization, even if the startup name is new to the vocabulary. SEO tools with this capability provide a deeper understanding of a topic, as they can identify more unique sub-topics in context. While search engines utilize automatic NER systems, it is always a good idea to enrich your data with schema markup conventions.

named entity recognition
Example of automatic NER run on a sentence.
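For illustration only, the sketch below shows the input/output shape of the NER task using a tiny hand-made gazetteer (a lookup list); as noted above, modern systems use neural models rather than lookup lists precisely so they can label entities they have never seen:

```python
import re

# Toy gazetteer (hypothetical data), mapping known names to entity labels.
GAZETTEER = {"Salesforce": "ORG", "Google": "ORG", "Paris": "LOC"}

def tag_entities(sentence):
    """Return (entity, label) pairs for capitalized words found in the
    gazetteer. This is the traditional, database-driven approach to NER."""
    entities = []
    for match in re.finditer(r"\b[A-Z][a-zA-Z]+\b", sentence):
        label = GAZETTEER.get(match.group())
        if label:
            entities.append((match.group(), label))
    return entities
```

A neural NER model would accept the same input and produce the same output shape, but without needing every entity pre-listed.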

Query classification:

Query classification is the process where a search engine deciphers user intent from a short text input. The main challenge is related to ambiguity. Search engines like Google collect click-through data from users to validate search intent and train machine learning models around them. Techniques like word vectors and NER are also used in query classification algorithms to compare the topics in your query against a set of potential results.

Question answering:

This is concerned with building systems that automatically answer questions posed by humans in natural language. In general, there are two types of questions to tackle: fact-based (i.e. what is the capital of France?) and open-domain (i.e. what is the future of SEO?). The latter usually involves analyzing dozens of search results for a given search query and composing a “multi-document summary”. Question answering is one of the most active areas of research, as it powers new mediums like voice search, which may require special considerations when it comes to SEO.

Automatic document summarization: 

Text summarization is the process of shortening a text document with software, in order to create a summary with the major points of the original document. Technologies that can make a coherent summary take into account variables such as length, writing style and syntax. Salesforce has made major breakthroughs in summarization.

Below is a screenshot from Frase that shows the taxonomy of a summary:

automatic summarization

Textual entailment: 

Entailment is a fundamental concept in logic, which describes the relationship between statements that hold true when one statement logically follows from one or more statements. A valid logical argument is one in which the conclusion is entailed by the premises, because the conclusion is the consequence of the premises. Textual Entailment algorithms can take a pair of sentences and predict whether the facts in the first necessarily imply the facts in the second one. This can be a useful technique to measure logic and cohesiveness in documents.

The importance of pillar pages and topic clusters

It is widely known that Google analyzes your full website to determine whether your content demonstrates topic authority in certain subjects. AI systems frequently employ document clustering as a technique to group data according to specific properties. For example, if your website has thousands of pages, a clustering algorithm may be able to group them by theme. If your website doesn’t present any clear themes, it might mean it lacks focus or expertise.

As a way to mimic document clustering algorithms, SEO is shifting to a topic cluster model, where pillar pages act as nodes connecting subpages. This model is a fairly sophisticated way to organize your website’s information architecture and content strategy.

Topic cluster model for content marketing strategy
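As a toy illustration of the grouping idea behind topic clusters, the sketch below assigns pages to hypothetical pillar themes by keyword overlap. Real clustering systems would use embeddings and an algorithm such as k-means over your full site; this only shows the shape of the task:

```python
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

# Hypothetical pillar themes, each described by a few seed keywords.
THEMES = {
    "seo":   tokenize("seo search engine ranking keywords"),
    "email": tokenize("email newsletter subscribers campaign"),
}

def assign_cluster(page_text):
    """Assign a page to the theme whose seed keywords it overlaps most."""
    tokens = Counter(tokenize(page_text))
    scores = {name: sum(tokens[w] for w in words) for name, words in THEMES.items()}
    return max(scores, key=scores.get)
```

A site whose pages all score near zero against every theme is exactly the unfocused site the paragraph above warns about.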

How to fit AI tools into your content creation process

So now that we better understand the way search engines “think”, it is time to think about the overall workflow that will help you match your content to what your target audience is actually interested in.

1. Perform a semantic content audit

Crawl your entire website and analyze all of its topics. Does it look like your content is well organized around cohesive themes? Which are the most relevant topics? Which pages receive the most internal links? A semantic content audit is a site-wide analysis of your website’s content that will measure topic breadth. Ideally, you would perform the same analysis on both your competitors and external industry thought leaders. The goal here is to understand the big picture and identify topic gaps. To accomplish this, you will need a tool that can crawl your full website, automatically extract key topics (through named entity recognition) and understand semantic relationships (through word vectors).

2. Define topic clusters

Browse topics from the content audit to identify groups and semantic associations. Build a list of sub-topics for each cluster. These topics should be optimized for two key metrics:

  • Search growth: topics that are receiving increased exposure in search engine queries.
  • Competition: topics that your direct competitors might have failed to mention.

3. Develop pillar pages

Compose an outline of the main topics your pillar page should include. Define a search query you would like your pillar page to rank for and perform semantic analysis on the top results. Make sure your pillar page covers key topics optimizing for these two metrics:

  • Topic coverage: your content should cover the most relevant topics from SERP pages. Of course, beware of keyword stuffing and make sure your story flows smoothly. This is where you can pay attention to document length; a story that aggressively covers all the top topics mentioned in SERP results will likely have to be longer. As an alternative, you may want to consider breaking your topics down into multiple shorter articles.
  • Content authenticity: while you have to align your content with the topics mentioned by SERP pages, you also have to find a unique angle to the story. One way to do this is to use related topics without specifically repeating the same terms used by competitors. Remember, word vectors capture similarity between topics, so by using closely related topics you may still rank high in search. Once you’ve achieved wide and authentic topic coverage, it is always valuable to incorporate proprietary insights nobody else has mentioned.
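Topic coverage, at its simplest, is the fraction of the top SERP topics that your draft already mentions. The topic lists below are invented for illustration; a real analysis would extract them from actual SERP results.

```python
def topic_coverage(draft_topics, serp_topics):
    """Fraction of the top SERP topics that the draft already covers."""
    if not serp_topics:
        return 1.0
    covered = set(draft_topics) & set(serp_topics)
    return len(covered) / len(set(serp_topics))

# Hypothetical topics extracted from top-ranking pages vs. your draft:
serp = ["link building", "anchor text", "domain authority", "backlinks"]
draft = ["link building", "backlinks", "guest posting"]
cov = topic_coverage(draft, serp)
```

Here the draft covers 2 of the 4 SERP topics (coverage of 0.5), which suggests either expanding the pillar page or splitting the missing topics into supporting articles.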

4. Develop content for each sub-topic

Use the same outline process described in point 3 to develop content around sub-topics that point to your pillar pages.

5. Continuous content optimization and re-publishing

Monitor what thought leaders publish about your target topics. This will help you come up with ideas to either write new content or optimize existing content around up-and-coming topics. The strategy of re-publishing content has proven to generate positive results in search rankings.

Can obsessive SEO limit your creativity?

Today’s marketers use many tools and there is certainly a sense of software fatigue. It almost looks like you have to break down your content creation workflow into stages that might end up limiting your creativity. Am I forgetting a keyword? Am I mentioning this keyword too much?

At some point, you have to ask yourself whether you are over-obsessing over content optimization. In my view, research and content creation should go hand in hand and work together in a more natural way. For example, based on what you are writing and your intended outcome, a system should be able to recommend topics in context. Helping writers incorporate SEO best practices into their creative workflow is something we think about at Frase.

There are different tools that can help you accomplish some of the analytical tasks explained in this post. At Frase, we’ve created a platform that helps content marketers perform large-scale semantic content audits, along with a writing tool that acts as a Research Assistant. It is totally free to try!


15 things you should consider before building an AI startup

At Frase, we are using AI to improve the way people write and research on the internet. Back in 2016, we came up with the idea of an AI-powered Research Assistant, an intelligent agent that would interact with the writer by providing sources and ideas in context. Since the early days, Frase was largely a technology play that required lots of research, and we had (and continue to have) many unanswered questions. This technology uncertainty can have a major impact on your company, and you have to be ready to embrace it.

If you are about to start an AI company, particularly if you are a non-technical CEO, these are some things you should consider:

1. You might need a dataset that doesn’t exist

In simple terms, artificial intelligence is possible because we train computers to learn from data. For example, if we have a dataset of 20,000 tweets, where 10,000 are positive and 10,000 are negative, we could train a model to detect sentiment in text. That sounds easy, right?
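To make the tweet example concrete, here is a toy Naive Bayes sentiment classifier trained on a handful of hand-labeled examples. The training snippets and labels are invented; as the point below argues, the hard part in practice is assembling the 20,000 real labeled tweets, not this training code.

```python
import math
from collections import Counter

def train(examples):
    """examples: list of (text, label) pairs. Returns per-label word counts."""
    counts = {"pos": Counter(), "neg": Counter()}
    for text, label in examples:
        counts[label].update(text.lower().split())
    return counts

def classify(counts, text):
    """Naive Bayes with add-one smoothing over the shared vocabulary."""
    vocab = set(counts["pos"]) | set(counts["neg"])
    scores = {}
    for label in counts:
        total = sum(counts[label].values())
        score = 0.0
        for word in text.lower().split():
            score += math.log((counts[label][word] + 1) / (total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

# A tiny invented "dataset" standing in for the 20,000 labeled tweets:
examples = [
    ("great product love it", "pos"),
    ("awesome service very happy", "pos"),
    ("terrible experience hate it", "neg"),
    ("awful support very sad", "neg"),
]
model = train(examples)
label = classify(model, "love this awesome product")
```

With four training examples this model is a toy, which is exactly the point: the algorithm fits in thirty lines, while the labeled data is the scarce ingredient.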

When it comes to developing AI solutions, having access to a good dataset is frequently the most challenging part. Recently, IBM released the largest ever dataset of facial images, with over 1 million tagged images; that’s a big deal. In the case of Frase, one persistent dataset challenge is summarization. While there is a lot of activity in the space, I wouldn’t say there is a great dataset available. This may also mean there is an opportunity to build it.

When you hit a dataset wall, you either have to build your own dataset (time-consuming and potentially costly, but possible), use the best available proxy dataset, or simply move on and focus on other problems.

2. Servers get expensive: CPU vs. GPU

Without getting into any technical details, CPUs have been the traditional hardware powering the cloud for years. GPU machines are more modern and powerful, and they are used for more compute-intensive applications, including video games. The bad news is that some of the most promising technologies in AI require a GPU machine. It is bad news because GPUs are expensive, possibly too expensive when you are a very early-stage startup. Alternative solutions: buy your own hardware, or get credits from companies like Google, Amazon, or Microsoft.

3. AI is not always the right solution

Nowadays, AI is such a hot topic that everyone would always choose to make something “AI-driven”. While the hype is great, sometimes basic statistics can do the job equally well. Using the best neural network solution might only gain you 2-3% in accuracy, which is something your users might never feel… but they will feel the slowdown if you haven’t put up the costly architecture to support it.

4. Users will complain during the early stages

Some people are generally skeptical about AI, and oftentimes they can be too judgmental. The early users of your AI product might feel a bit frustrated at the beginning. If your model has an accuracy of 65%, that means 35% of the time your users will not get satisfactory results. Of course, this sucks for the user.

5. AI is a black box, but users want to know how it works

You see data scientists brag about their inventions, but in reality they don’t really know what is going on inside of a neural network. By design, a neural network has hidden layers, and all you can actually see is an input and an output. This can be frustrating when you are trying to improve a given model but don’t have enough data points to take new directions. In addition, users will frequently ask you about the “algorithm” behind your magical product. And the reality is that you can’t really explain the inner workings, only give a generic explanation of the process behind it.
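To see why the "black box" complaint sticks, consider a toy two-layer network: from the outside you only observe the input and the output, while the hidden activations are just unlabeled numbers with no human-readable meaning. The weights below are arbitrary, chosen purely for illustration.

```python
import math

def forward(x, w1, w2):
    """One forward pass through a toy 2-layer network. The `hidden` list
    is what lives between input and output: plain numbers that carry no
    explanation of *why* the network produced its answer."""
    hidden = [math.tanh(sum(wi * xi for wi, xi in zip(row, x))) for row in w1]
    output = sum(w * h for w, h in zip(w2, hidden))
    return hidden, output

x = [0.5, -1.0]                   # the input you can see
w1 = [[0.2, -0.4], [0.7, 0.1]]    # arbitrary first-layer weights
w2 = [0.3, -0.6]                  # arbitrary second-layer weights
hidden, y = forward(x, w1, w2)    # hidden: opaque numbers; y: the output you can see
```

Even in this two-neuron toy, nothing in `hidden` maps to a concept a user would recognize, which is why all you can honestly offer is a generic description of the process.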

6. Developing AI systems requires a rigorous scientific research process

Working on AI can feel like having a university department in your company. Successful machine learning practitioners usually have an academic background or actively contribute to academic journals. For example, in the area of text summarization, Salesforce has published numerous papers and some of their authors are industry leaders.

7. You have to follow arXiv every day

In relation to point 6, you have to live and breathe arXiv to keep up to date. Even if you are not a data scientist or developer, you can only benefit from following what is happening in the space. Don’t be intimidated by the technical formatting and mathematics in most papers; even laymen can understand the abstracts and directions of papers, which is valuable not only for teaching yourself, but also as something you can pass on to your researchers.

8. Be ready for manual labor and repetitive tasks as you test your creation

Every now and then, you will have to spend time doing very repetitive tasks to evaluate a given model, or assemble a testing dataset. In relation to point 5, there is nothing better than using your own product to recognize its weaknesses.

9. Generalized versus highly specific models

Again, I will use summarization to illustrate this point. You could train a summarization model on a massive dataset of news articles, and that may work well when summarizing news articles. But what if you try to summarize a technology blog post, will it work equally well? In that case, you might consider training a separate model for technology blog posts. Of course, having numerous models creates a challenge related to infrastructure, performance, etc.

Once you’ve decided what model you want to work on, don’t try to predict the subject matter your users will use. You will fail. Design your systems abstractly and generally because somebody will always try something absolutely ridiculous on a demo.

10. Data scientists

There is a supply problem: market demand for data scientists is very high, so salaries are through the roof. On the other hand, more and more data scientists are being trained in both universities and online courses. Talented software developers can become great data scientists over time.

11. Open source libraries are great, until you dig deeper

There are a few de facto open source libraries and frameworks used in data science. Most of them are great, particularly those supported by big companies, like TensorFlow (Google). Of course, AI is a very new field and some libraries are fairly new, which increases the risk of bugs or unexpected issues. Occasionally, you will also find that some libraries don’t release their best-kept secrets. You’ll almost always see their developers go on to create businesses around their open source library that seem to work much better than yours. Don’t be afraid to reach out and start a conversation with them.

12. A data scientist cannot be your CTO

If you are assembling a team for your AI startup, I believe you need at least 2 partners: a CTO taking care of the whole platform, and a dedicated data scientist who is fully focused on machine learning.

13. Make your own data

We’ve already discussed the challenge with datasets. The ultimate solution is when your own product produces enough data to train models around it. The most valuable thing about today’s AI companies is their in-house generated data. Of course, this may take time and be a long term strategy. Many large companies are starting to look inside and realize they have massive amounts of unstructured data. This represents a major opportunity for them to develop AI solutions, although they might not have in-house data science talent.

14. User experience is key for AI systems to succeed

Related to point 4 (users will complain), users will always hit edge cases where your AI system gets confused. You must think of UX concepts to either hide or mitigate your model’s errors. A good example is a bot: through human-AI interaction, you create a more guided journey where the user can help your model take fewer risks.

15. “How would a human do it?”

If you are thinking of starting an AI company, you probably have an idea that will revolutionize a certain human process. Something that helps me think about AI solutions is asking myself how a human would solve the problem.

Good luck.