At Frase, we are exploring various use cases for summarization. This post describes how Frase brings the power of automatic summarization to newsletters and content curation programs.
If you are like me, you subscribe to newsletters that give you a nicely curated selection of content about topics that matter to you. This content is usually delivered as a list of titles and hyperlinks, which can be overwhelming if the list is long. Summarization adds value by condensing the content behind those links, providing convenience and effortless insight to the reader.
The ultimate goal here is to deliver a consistent and cohesive stream of curated content to our target audience. This process can be fully accomplished in the Frase platform in three steps. To illustrate the process, let’s say we wanted to publish a daily roundup about the topic of artificial intelligence for SEO:
Customize your media monitors
Customize your own media monitoring system by following topics and publishers of your choosing. In contrast to other feed monitoring tools, Frase goes one step further by summarizing and extracting key topics from the content you monitor.
1. Topics: Frase allows boolean logic and multiple conditions to help you narrow down your target topic. In this case, a simple keyword rule to monitor articles that mention “artificial intelligence” AND “search engine optimization” should do the job. Depending on the topic you want to follow, you might need to add more keywords, use different boolean operators or add more conditions.
2. Sources: you can either pull from the full Frase index (thousands of publishers, plus Google News) or customize your own list of RSS feeds. I would always recommend building a custom list of blogs and publishers that interest you. In parallel, you can always keep a separate monitor to follow more general sources.
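To make the keyword rule concrete, here is a minimal sketch in plain Python of how an AND-style monitor rule can filter incoming content. The function name and article titles are illustrative, not Frase’s actual implementation:

```python
def matches_monitor(text, required_keywords):
    """True when every required keyword appears in the text (a case-insensitive AND rule)."""
    lowered = text.lower()
    return all(keyword.lower() in lowered for keyword in required_keywords)

# Hypothetical incoming article titles
articles = [
    "How artificial intelligence is changing search engine optimization",
    "Ten tips for better email subject lines",
]

rule = ["artificial intelligence", "search engine optimization"]
matching = [a for a in articles if matches_monitor(a, rule)]
```

Real monitors add OR and NOT operators and multiple conditions on top of this basic idea.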
Review daily summaries
Once your monitor is up and running, you can browse incoming daily summaries in multiple ways:
1. The Frase app: ideal for reviewing summaries on the go. The app also allows bookmarking and social media sharing.
2. Frase editor: if you are looking to compose a custom daily roundup, you should access your monitor from the Frase editor. This way you can incorporate summaries directly into your document and edit them when needed.
Publish to Mailchimp or WordPress
Once you’ve composed a daily roundup out of your monitor summaries, you can publish it directly to your Mailchimp or WordPress accounts.
Summarization has been and continues to be a hot research topic in the data science arena. While text summarization algorithms have existed for a while, major advances in natural language processing and deep learning have been made in recent years. Many internet companies are actively publishing research papers on the subject. Salesforce has published various groundbreaking papers presenting state-of-the-art abstractive summarization. In May 2018, the largest summarization dataset to date was revealed in a project supported by a Google Research award.
While there is intense activity in the research field, there is less literature available regarding real world applications of AI-driven summarization. One of the challenges with summarization is that it is hard to generalize. For example, summarizing a news article is very different from summarizing a financial earnings report. Certain text features like document length or genre (tech, sports, finance, travel, etc.) make the task of summarization a serious data science problem to solve. For this reason, the way summarization works largely depends on the use case and there is no one-size-fits-all solution.
Summarization: the basics
Before diving into an overview of use cases, it is worth explaining a few basics around summarization:
There are two main approaches to summarization:
Extractive summarization: works by selecting the most meaningful sentences in an article and arranging them in a coherent manner. The summary sentences are taken from the article verbatim, without any modifications.
Abstractive summarization: works by paraphrasing, generating its own version of the most important sentences in the article.
There are also two scales of document summarization:
Single-document summarization: the task of summarizing a standalone document. Note that a “document” could refer to different things depending on the use case (URL, internal PDF file, legal contract, financial report, email, etc.).
Multi-document summarization: the task of assembling a collection of documents (usually through a query against a database or search engine) and generating a summary that incorporates perspectives from across documents.
Finally, there are two common metrics any summarizer attempts to optimize:
Topic coverage: does the summary incorporate the main topics from the document?
Readability: do the summary sentences flow in a logical way?
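As a toy illustration of the extractive approach, here is a deliberately naive frequency-based summarizer in plain Python. It scores sentences by how common their words are in the document, which is a classic baseline; production systems use far more sophisticated models:

```python
import re
from collections import Counter

def extractive_summary(text, num_sentences=2):
    """Score each sentence by the frequency of its words across the whole
    document, keep the top scorers, and restore their original order."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(re.findall(r"[a-z']+", text.lower()))
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: sum(freq[w] for w in re.findall(r"[a-z']+", sentences[i].lower())),
        reverse=True,
    )
    keep = sorted(ranked[:num_sentences])  # document order improves readability
    return " ".join(sentences[i] for i in keep)

article = (
    "Summarization condenses long articles. "
    "Summarization research is moving fast. "
    "The weather was nice yesterday."
)
summary = extractive_summary(article, num_sentences=2)
```

Note how this baseline only optimizes topic coverage; readability is addressed crudely by keeping the selected sentences in document order.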
Use cases in the enterprise:
These are some use cases where automatic summarization can be used across the enterprise:
1. Media monitoring
The problem of information overload and “content shock” has been widely discussed. Automatic summarization presents an opportunity to condense the continuous torrent of information into smaller, more digestible pieces.
2. Newsletters
Many weekly newsletters take the form of an introduction followed by a curated selection of relevant articles. Summarization would allow organizations to further enrich newsletters with a stream of summaries (versus a list of links), which can be a particularly convenient format on mobile.
3. Search marketing and SEO
When evaluating search queries for SEO, it is critical to have a well-rounded understanding of what your competitors are talking about in their content. This has become particularly important since Google updated its algorithm and shifted focus towards topical authority (versus keywords). Multi-document summarization can be a powerful tool to quickly analyze dozens of search results, understand shared themes and skim the most important points.
4. Internal document workflow
Large companies are constantly producing internal knowledge, which frequently gets stored and under-used in databases as unstructured data. These companies should embrace tools that let them re-use already existing knowledge. Summarization can enable analysts to quickly understand everything the company has already done in a given subject, and quickly assemble reports that incorporate different points of view.
5. Financial research
Investment banking firms spend large amounts of money acquiring information to drive their decision-making, including automated stock trading. When you are a financial analyst looking at market reports and news everyday, you will inevitably hit a wall and won’t be able to read everything. Summarization systems tailored to financial documents like earning reports and financial news can help analysts quickly derive market signals from content.
6. Legal contract analysis
Related to point 4 (internal document workflow), more specific summarization systems could be developed to analyze legal documents. In this case, a summarizer might add value by condensing a contract to the riskier clauses, or help you compare agreements.
7. Social media marketing
Companies producing long-form content, like whitepapers, e-books and blogs, might be able to leverage summarization to break down this content and make it sharable on social media sites like Twitter or Facebook. This would allow companies to further re-use existing content.
8. Question answering and bots
Personal assistants are taking over the workplace and the smart home. However, most assistants are fairly limited to very specific tasks. Large-scale summarization could become a powerful question answering technique. By collecting the most relevant documents for a particular question, a summarizer could assemble a cohesive answer in the form of a multi-document summary.
9. Video scripting
Video is becoming one of the most important marketing mediums. Besides video-focused platforms like YouTube or Vimeo, people are now sharing videos on professional networks like LinkedIn. Depending on the type of video, more or less scripting might be required. Summarization can be an ally when looking to produce a script that incorporates research from many sources.
10. Medical cases
With the growth of tele-health, there is a growing need to better manage medical cases, which are now fully digital. As telemedicine networks promise a more accessible and open healthcare system, technology has to make the process scalable. Summarization can be a crucial component in the tele-health supply chain when it comes to analyzing medical cases and routing these to the appropriate health professional.
11. Books and literature
Google has reportedly worked on projects that attempt to understand novels. Summarization can help consumers quickly understand what a book is about as part of their buying process.
12. Email overload
Companies like Slack were born to keep us away from constant emailing. Summarization could surface the most important content within email and let us skim emails faster.
13. E-learning and class assignments
Many teachers utilize case studies and news to frame their lectures. Summarization can help teachers more quickly update their content by producing summarized reports on their subject of interest.
14. Science and R&D
Academic papers typically include a human-made abstract that acts as a summary. However, when you are tasked with monitoring trends and innovation in a given sector, it can become overwhelming to read every abstract. Systems that can group papers and further compress abstracts can become useful for this task.
15. Patent research
Researching patents can be a tedious process. Whether you are doing market intelligence research or looking to file a new patent, a summarizer to extract the most salient claims across patents could be a time saver.
16. Meetings and video-conferencing
With the growth of tele-working, the ability to capture key ideas and content from conversations is increasingly needed. A system that could turn voice to text and generate summaries from your team meetings would be fantastic.
17. Help desk and customer support
Knowledge bases have been around for a while, and they are critical for SaaS platforms to provide customer support at scale. Still, users can sometimes feel overwhelmed when browsing help docs. Could multi-document summarization provide key points from across help articles and give the user a well-rounded understanding of the issue?
18. Helping disabled people
As voice-to-text technology continues to improve, people with hearing disabilities could benefit from summarization to keep up with content in a more efficient way.
19. Programming languages
There have been multiple attempts to build AI technology that can write code and build websites by itself. It is possible that custom “code summarizers” will emerge to help developers get the big picture out of a new project.
20. Automated content creation
“Will robo-writers replace my job?” That’s what writers are increasingly asking themselves. If artificial intelligence is able to replace any stage of the content creation process, automatic summarization is likely going to play an important role. Related to point 3 (applications in search marketing and SEO), writing a good blog post usually begins with summarizing existing sources for a given query. Summarization technology might reach a point where it can compose an entirely original article out of summarizing related search results.
At Frase, we are developing AI-driven research tools to accelerate content creation. Creating a research-driven content brief is one of the workflows Frase aims to help with.
When you have to create a blog post targeting a particular search query, it is helpful to start off with a content brief. A typical creative brief would at least include the following information:
Links and related articles
Customer persona and user intent
Writing tone and style
Target word count and delivery date
What is a research-driven content brief?
As you may know, SEO has evolved well beyond simple keywords. Your content has to cover topics widely and deeply, but finding the right topics is only half of the work. Most SEO research tools will give you a list of relevant topics, but a list can feel rather superficial to a writer.
A research-driven brief surrounds each topic with a wealth of information and perspectives. The ultimate goal is to help you create authoritative content that shows off a well-rounded understanding of the subject.
Frase accelerates the creation of content briefs by combining two powerful AI-powered technologies:
Named entity recognition: the ability to automatically identify and classify topics in text, as well as drawing relationships between them.
Automatic summarization: the ability to condense long articles down to a selection of the most meaningful sentences.
Creating research-driven briefs on Frase
Let’s dive into a real life example. How could you generate a research-driven brief for the query: “how is artificial intelligence transforming content creation?”
1. Define your document theme
The theme is the search query you would like to rank for. In Frase documents, the theme is used as a baseline topic to help the Research Assistant better understand context and make recommendations.
2. Frase editor
The editor is the heart of the Frase platform. It features a minimalistic word processor on the left side, and a research assistant on the right side. The main goal is to help you write and research in the same environment, which is helpful for research-intensive workflows like composing a content brief.
3. Explore topics
Frase will scan search results for your query and automatically extract key topics for your review.
4. List down your preferred topics
As an initial step towards building your brief, cherry-pick the topics that can give shape to your upcoming story.
5. Select top paragraphs for each topic
Once you’ve built your list of topics, explore each topic to understand in what context they are mentioned. For example, in the case below we can quickly understand that Salesforce has been mentioned because it employs algorithms to summarize content. Select those paragraphs you consider insightful and unique for each topic. Each selected paragraph will get added to your document with a citation.
6. Your research-driven content brief is done
In a matter of minutes, you’ve generated a research-driven content brief that incorporates key topics along with deeper information and quality links. Frase allows you to export or share your content in various formats.
Why should you use research-driven content briefs?
Understand deeper perspectives for each of your strategic topics.
Accelerate the content creation process by directly incorporating ideas from your brief.
Better contextualize your linking strategy with more informed background research.
If you are a marketing manager or strategist, help your writers meet your content expectations.
Interested in leveraging NLP techniques to create content briefs?
News is a powerful way that organizations can build topic authority and create content on a regular basis. Search marketers have been leaning on news-driven content for many years to help themselves and their clients make progress in the search engines. News is plentiful and fresh, and almost never lets you down. For all its benefits, news is not without its challenges.
Search marketers and content strategists looking to produce news-based content must develop an editorial process that is timely, relevant, and draws on best practice for fact-checking and accuracy.
With Frase, Bill is able to create daily news roundups for publication on his websites in a fraction of the time that a fully manual operation would take.
The process is simple. Using Frase’s AI research assistant, Bill is presented with a selection of sources relevant to his search query. He now has a number of choices. The Frase research assistant allows him to:
Further refine the selection by adding fresh terms to his query.
Summarize each of the articles for quick and easy understanding of their contents.
View individual articles in full within the platform, or go direct to the original source if he prefers.
Explore specific themes and topics within each article to find exactly what he is looking for.
Once Bill has decided what he wants to include in his roundup, things couldn’t be easier. With a simple click of a button, Frase allows him to add comprehensive summaries of his earmarked articles to a working document, all within the same browser. Article images, dates, topic information, and links are pulled across seamlessly too.
Bill now has the option to add custom copy to his document and otherwise edit the content, or to publish directly to his feed. Whatever he decides, his document already looks great.
Publication is simple. Frase offers a WordPress integration and a variety of ways for exporting a completed document into a separate content management system. The end result is impressive, and incredibly easy to create. “It’s very straightforward to use,” Bill says.
The marketing upsides of news roundups
News roundups have many benefits and should be a staple element of a marketer’s toolkit. They drive traffic to a website, help with search rankings, and provide great fuel for social media and email campaigns.
Roundups assist with building topic authority, and introduce themes into your website that can form the basis of additional content. They are, most importantly, of great interest to your site visitors and help position your website as a thought leader in the space.
And they work.
Bill, who shares his published roundups on LinkedIn, was able to see steady traffic coming from the news roundups into his site within two months of launch. He is also using Frase to create roundups for a new digital venture selling CBD hemp oil for medicinal purposes. The content “works great” for email newsletters using Frase’s MailChimp integration, Bill says, and he already has a couple of thousand visitors a month to the new website.
Bill praises Frase on its ability to save him time while generating relevant, effective content powered by cutting-edge AI. Describing his experience with Frase as an “ongoing case study,” he looks forward to putting the platform to work in different ways and on different projects as his business evolves.
This post examines how artificial intelligence is changing SEO and proposes specific techniques for content marketers to adapt accordingly.
In the past five years, Google has introduced two algorithm updates that put a clear focus on content quality and language comprehensiveness. In 2013, Hummingbird gave search engines semantic analysis capabilities. In 2015, Google announced RankBrain, which marked the beginning of Google’s AI-first strategy. This means that Google uses multiple AI-driven techniques to rank search results.
As a result, search engine optimization (SEO) has shifted focus from keywords to topical authority. Keyword research is still important, but its role has changed. Simply put, AI systems can now understand way beyond individual keywords. Much like humans do, new systems can understand relationships between topics and develop a contextual interpretation. In other words, AI is learning to read.
Increasingly, marketers are using AI-powered tools to help them reverse engineer the way search engines find the best content. When it comes to research tools, one widely discussed topic is the difference between SEO and content optimization.
SEO runs on keywords, content optimization runs on NLP
SEO has traditionally run on keywords, while content optimization runs on Natural Language Processing (NLP).
Typically, your content strategy will start with a set of important keywords. However, each keyword now has more sophisticated properties than it used to: what user intent does it relate to? what broader topics does it link with? what cluster does it belong to?
Content optimization is all about understanding these additional properties in language. The ultimate goal is to create the most authoritative content for a given query, optimized for both topic breadth (horizontal topic coverage) and depth (how detailed you go into the topic).
We are teaching machines to read
The terms “topic modeling” and “latent semantic indexing” have been widely used in the digital marketing and SEO arenas to describe the way semantic search works. It is worth exploring some specific data science techniques that are powering the latest AI-powered tools.
Word vectorization:
Word vectorization is a natural language processing (NLP) technique where words and phrases from a vocabulary are mapped to vectors of real numbers. Word vectors typically have around 200 dimensions, meaning each word gets a position in a 200-dimension space. Placing words in a multi-dimensional vector space allows us to perform similarity comparisons, among other operations. The sum of word vectors may also be used to calculate document vectors. A typical challenge with word vectors is ambiguity: different meanings of a word (“apple” vs “Apple”) can be embedded into the same vector location. More advanced vectors, called “sense embeddings”, solve this problem by interpreting each version of the word differently. For example, a sense embedding might be able to position “apple” the fruit and “Apple” the company in very different positions within the vector space.
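To make vector similarity tangible, here is a hand-rolled cosine similarity over tiny made-up three-dimensional “word vectors”. Real embeddings have hundreds of dimensions and are learned from data, not written by hand; the numbers below are purely illustrative:

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: values near 1.0 mean 'very similar'."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Toy, hand-made vectors purely for illustration
vectors = {
    "king":   [0.90, 0.80, 0.10],
    "queen":  [0.88, 0.82, 0.15],
    "banana": [0.10, 0.05, 0.90],
}

royal = cosine_similarity(vectors["king"], vectors["queen"])   # close to 1.0
fruit = cosine_similarity(vectors["king"], vectors["banana"])  # much lower
```

The same operation over document vectors (sums of word vectors) is what lets tools compare whole pages for topical similarity.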
Named entity recognition (NER):
This technique seeks to locate and classify named entities in text into pre-defined categories such as concepts, organizations, locations or people. Traditionally, NER relied on large databases like Wikipedia to recognize known entities. Current neural networks are trained on tagged NER datasets to learn language patterns and identify entities the system has never seen before. For example, say your new startup appears in the news tomorrow. A good NER system should be able to classify it as an organization, even if the startup name is something new in the vocabulary. SEO tools with this capability provide a deeper understanding of a topic as they can identify more unique sub-topics in context. While search engines utilize automatic NER systems, it is always a good idea to enrich your data with schema markup conventions.
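To show the input/output shape of NER, here is a toy gazetteer lookup in plain Python. This is only a stand-in: the entity table and function are ours, and unlike a trained neural model, a lookup cannot label names it has never seen:

```python
# A hand-built entity lookup; real NER models learn language patterns so they
# can also label unseen names, which this toy cannot do.
GAZETTEER = {
    "google": "ORGANIZATION",
    "salesforce": "ORGANIZATION",
    "boston": "LOCATION",
}

def tag_entities(text):
    """Return (token, label) pairs for tokens found in the gazetteer."""
    entities = []
    for token in text.replace(",", "").replace(".", "").split():
        label = GAZETTEER.get(token.lower())
        if label:
            entities.append((token, label))
    return entities

tags = tag_entities("Google and Salesforce both publish NLP research, and Frase is based in Boston.")
```

Notice that “Frase” goes untagged here, which is exactly the new-startup-in-the-news failure mode that neural NER systems are trained to overcome.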
Query classification:
Query classification is the process where a search engine deciphers user intent from a short text input. The main challenge is ambiguity. Search engines like Google collect click-through data from users to validate search intent and train machine learning models around it. Techniques like word vectors and NER are also used in query classification algorithms to compare the topics in your query against a set of potential results.
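As a minimal sketch of intent classification, here is a word-overlap classifier against hand-written intent prototypes. The prototype sets are invented for illustration; real systems train on click-through data rather than hard-coded word lists:

```python
INTENT_PROTOTYPES = {
    "informational": {"what", "how", "why", "guide", "tutorial"},
    "transactional": {"buy", "price", "cheap", "order", "discount"},
}

def classify_query(query):
    """Pick the intent whose prototype words overlap most with the query;
    a crude stand-in for models trained on real click-through data."""
    words = set(query.lower().split())
    return max(INTENT_PROTOTYPES, key=lambda intent: len(words & INTENT_PROTOTYPES[intent]))
```

A query like “how does rankbrain work” leans informational, while “buy cheap running shoes” leans transactional; ambiguity arises when a query overlaps several intents at once.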
Question answering:
This is concerned with building systems that automatically answer questions posed by humans in natural language. In general, there are two types of questions to tackle: fact-based (i.e. what is the capital of France?) and open-domain (i.e. what is the future of SEO?). The latter usually involves analyzing dozens of search results for a given search query and composing a “multi-document summary”. Question answering is one of the most active areas of research, as it powers new mediums like voice search, which may require special considerations when it comes to SEO.
Automatic document summarization:
Text summarization is the process of shortening a text document with software, in order to create a summary with the major points of the original document. Technologies that can make a coherent summary take into account variables such as length, writing style and syntax. Salesforce has made major breakthroughs in summarization.
Below is a screenshot from Frase that shows the taxonomy of a summary:
Textual entailment:
Entailment is a fundamental concept in logic, describing the relationship between statements in which one statement logically follows from one or more others. A valid logical argument is one in which the conclusion is entailed by the premises. Textual entailment algorithms can take a pair of sentences and predict whether the facts in the first necessarily imply the facts in the second. This can be a useful technique to measure logic and cohesiveness in documents.
The importance of pillar pages and topic clusters
It is widely known that Google analyzes your full website to determine whether your content demonstrates topic authority in certain subjects. AI systems frequently employ document clustering as a technique to group data according to specific properties. For example, if your website has thousands of pages, a clustering algorithm may be able to group them by theme. If your website doesn’t present any clear themes, it might mean it lacks focus or expertise.
As a way to mimic document clustering algorithms, SEO is shifting to a topic cluster model, where pillar pages act as nodes connecting subpages. This model is a fairly sophisticated way to organize your website’s information architecture and content strategy.
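As an illustration of the clustering idea, here is a crude sketch that groups pages by their most frequent term. The page contents and stopword list are made up; real clustering would run k-means (or similar) over document embeddings rather than counting raw words:

```python
from collections import Counter, defaultdict

STOPWORDS = {"the", "and", "for", "with", "your", "how"}

def cluster_pages(pages):
    """Group pages by their most frequent non-stopword; a crude stand-in for
    real document clustering over embeddings (e.g. k-means on doc vectors)."""
    clusters = defaultdict(list)
    for title, body in pages.items():
        words = [w for w in body.lower().split() if w not in STOPWORDS and len(w) > 2]
        dominant = Counter(words).most_common(1)[0][0]
        clusters[dominant].append(title)
    return dict(clusters)

# Hypothetical mini-site
site = {
    "post-1": "seo tips seo audits seo tools",
    "post-2": "email marketing email automation email lists",
    "post-3": "advanced seo checklist seo wins",
}
themes = cluster_pages(site)
```

A site whose pages fall into a few clear clusters like this signals focus; a site whose clusters are scattered may lack a coherent theme.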
How to fit AI tools into your content creation process
So now that we better understand the way search engines “think”, it is time to consider the overall workflow that will help us match our content to what our target audience is actually interested in.
1. Perform a semantic content audit
Crawl your entire website and analyze all of its topics. Does it look like your content is well organized around cohesive themes? Which are the most relevant topics? Which pages receive more internal links? A semantic content audit is a site-wide analysis of your website’s content that measures topic breadth. Ideally, you would perform the same analysis on both your competitors and external industry thought leaders. The goal here is to understand the big picture and identify topic gaps. To accomplish this, you will need a tool that can crawl your full website, automatically extract key topics (through named entity recognition) and understand semantic relationships (through word vectors).
2. Define topic clusters
Browse topics from the content audit to identify groups and semantic associations. Build a list of sub-topics for each cluster. These topics should be optimized for two key metrics:
Search growth: topics that are receiving an increased exposure in search engine queries.
Competition: topics that your direct competitors might have failed to mention.
3. Develop pillar pages
Compose an outline of the main topics your pillar page should include. Define a search query you would like your pillar page to rank for and perform semantic analysis on the top results. Make sure your pillar page covers key topics optimizing for these two metrics:
Topic coverage: your content should cover the most relevant topics from SERP pages. Of course, be wary of keyword stuffing and make sure your story flows smoothly. This is where you can pay attention to document length; a story that aggressively covers all the top topics mentioned in SERP results will likely have to be longer. As an alternative, you may want to consider breaking down your topics into multiple shorter articles.
Content authenticity: while you have to align your content with the topics mentioned by SERP pages, you also have to find a unique angle to the story. One way to do this is using related topics without specifically using the same terms used by competitors. Remember word vectors understand similarity between topics, so by using good similar topics you may still rank high in search. Once you’ve accomplished a wide and authentic topic coverage, it is always valuable to incorporate proprietary insights nobody else has mentioned.
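The topic coverage metric above can be sketched as a simple score: what fraction of the target SERP topics does a draft mention at least once? The draft and topic list here are invented, and real tools would also weight topics by importance and match them semantically rather than by exact string:

```python
def topic_coverage(draft, serp_topics):
    """Fraction of target topics mentioned at least once in the draft
    (exact substring match; real tools use semantic matching and weighting)."""
    lowered = draft.lower()
    covered = [t for t in serp_topics if t.lower() in lowered]
    return len(covered) / len(serp_topics)

draft = "Our pillar page covers natural language processing and semantic search in depth."
score = topic_coverage(draft, ["natural language processing", "semantic search", "rankbrain"])
```

A score of 2/3 here would flag “rankbrain” as a topic gap to address before publishing.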
4. Develop content for each sub-topic
Use the same outline process described in point 3 to develop content around sub-topics that point to your pillar pages.
5. Continuous content optimization and re-publishing
Monitor what thought leaders publish about your target topics. This will help you come up with ideas to either write new content or optimize existing content around up-and-coming topics. The strategy of re-publishing content has proven to generate positive results in search rankings.
Can obsessive SEO limit your creativity?
Today’s marketers use many tools and there is certainly a sense of software fatigue. It almost looks like you have to break down your content creation workflow into stages that might end up limiting your creativity. Am I forgetting a keyword? Am I mentioning this keyword too much?
At some point, you have to ask yourself whether you are over-obsessing about content optimization. In my view, research and content creation should go hand in hand and work together in a more natural way. For example, based on what you are writing and your intended outcome, a system should be able to recommend topics in context. Helping writers incorporate SEO best practices into their creative workflow is something we think about at Frase.
There are different tools that can help you accomplish some of the analytical tasks explained in this post. At Frase, we’ve created a platform that helps content marketers perform large-scale semantic content audits, along with a writing tool that acts as a Research Assistant. It is totally free to try!
At Frase, we are using AI to improve the way people write and research on the internet. Back in 2016, we came up with the idea of an AI-powered Research Assistant, an intelligent agent that would interact with the writer, providing sources and ideas in context. Since the early days, Frase was largely a technology play that required lots of research, and we had (and continue to have) many unanswered questions. This technological uncertainty can have a major impact on your company, and you have to be ready to embrace it.
If you are about to start an AI company, particularly if you are a non-technical CEO, these are some things you should consider:
1. You might need a dataset that doesn’t exist
In simple terms, artificial intelligence is possible because we train computers to learn from data. For example, if we have a dataset of 20,000 tweets, where 10,000 are positive and 10,000 are negative, we could train a model to detect sentiment in text. That sounds easy, right?
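To make the tweet example concrete, here is a bare-bones sketch of “training” a sentiment detector by counting which words co-occur with which label. The four training examples stand in for the 20,000 tweets, and this is nowhere near a real model, just the shape of the idea:

```python
from collections import Counter

def train_sentiment(examples):
    """Count word/label co-occurrences; a bare-bones stand-in for real model training."""
    counts = {"pos": Counter(), "neg": Counter()}
    for text, label in examples:
        counts[label].update(text.lower().split())
    return counts

def predict(counts, text):
    """Label text by whether its words lean positive or negative overall."""
    score = sum(counts["pos"][w] - counts["neg"][w] for w in text.lower().split())
    return "pos" if score >= 0 else "neg"

# A toy labeled dataset standing in for the 20,000 tweets above
TRAIN = [
    ("i love this product", "pos"),
    ("great service and great people", "pos"),
    ("i hate waiting", "neg"),
    ("terrible support and awful experience", "neg"),
]
model = train_sentiment(TRAIN)
```

Even this toy makes the dataset problem obvious: the model can only recognize sentiment words it has actually seen labeled.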
When it comes to developing AI solutions, having access to a good dataset is frequently the most challenging part. Recently, IBM released the largest ever dataset of facial images, with over 1 million tagged images; that’s a big deal. In the case of Frase, one traditional dataset challenge is summarization. While there is a lot of activity in the space, I wouldn’t say there is a great dataset available. This may also mean there is an opportunity to build it.
When you hit a dataset wall, you either have to build your own dataset (time-consuming and potentially costly, but possible if you have the time), use the best available proxy dataset, or simply move on and focus on other problems.
2. Servers get expensive: CPU vs. GPU
Without getting into technical details, CPUs have been the traditional hardware powering the cloud over the past years. GPU machines are more modern and powerful, and they are used for more computationally intensive applications, including video games. The bad news is that some of the most promising technologies in AI require a GPU machine. It is bad news because GPUs are expensive, possibly prohibitively so when you are a very early-stage startup. Alternative solutions: buy your own hardware, or get credits from companies like Google, Amazon or Microsoft.
3. AI is not always the right solution
Nowadays, AI is such a hot topic that everyone would always choose to make something “AI-driven”. While the hype is great, sometimes basic statistics can do the job equally well. Using the best neural network solution might only gain you 2-3% in accuracy, which is something your users might never feel… but they will feel the slowdown if you haven’t put up the costly architecture to support it.
4. Users will complain during the early stages
Some people are generally skeptical about AI, and oftentimes they can be too judgmental. The early users of your AI product might feel a bit frustrated at the beginning. If your model has an accuracy of 65%, that means 35% of the time your user will not get satisfactory results. Of course, this sucks for the user.
5. AI is a black box, but users want to know how it works
You see data scientists brag about their inventions, but in reality they don't really know what is going on inside a neural network. By design, a neural network will have hidden layers, and all you can actually see is an input and output. This can be frustrating when you are trying to improve a given model, but you don't have enough data points to take new directions. In addition, users will frequently ask you about the "algorithm" behind your magical product. And the reality is that you can't really explain the inner workings, but only give a generic explanation of the process behind it.
6. Developing AI systems requires a rigorous scientific research process
Working on AI can feel like having a university department in your company. Successful machine learning practitioners usually have an academic background or actively contribute to academic journals. For example, in the area of text summarization, Salesforce has published numerous papers and some of their authors are industry leaders.
7. You have to follow Arxiv every day
In relation to point number 6, you have to live and breathe Arxiv to keep up to date. Even if you are not a data scientist or developer, you can only benefit from following what is happening in the space. Don't be intimidated by the technical formatting and mathematics in most papers; even laymen can understand abstracts and directions of papers…valuable not only to teach yourself but maybe something you can pass on to your researcher.
8. Be ready for manual labor and repetitive tasks as you test your creation
Every now and then, you will have to spend time doing very repetitive tasks to evaluate a given model, or assemble a testing dataset. In relation to point 5, there is nothing better than using your own product to recognize its weaknesses.
9. Generalized versus highly specific models
Again, I will use summarization to illustrate this point. You could train a summarization model on a massive dataset of news articles, and that may work well when summarizing news articles. But what if you try to summarize a technology blog post, will it work equally well? In that case, you might consider training a separate model for technology blog posts. Of course, having numerous models creates a challenge related to infrastructure, performance, etc.
Once you've decided what model you want to work on, don't try to predict the subject matter your users will bring to it. You will fail. Design your systems abstractly and generally, because somebody will always try something absolutely ridiculous in a demo.
10. Data scientists
There is a supply problem: market demand for data scientists is very high, so salaries are through the roof. On the other hand, more and more data scientists are being trained in both universities and online courses. Talented software developers can become great data scientists over time.
11. Open source libraries are great, until you dig deeper
There are a few de-facto open source libraries and frameworks used in data science. Most of them are great, particularly those supported by big companies like TensorFlow (Google). Of course, AI is a very new field and some libraries are fairly new, which increases the risk of bugs or unexpected issues. Occasionally, you will also find that some libraries don’t release their best kept secrets. You’ll almost always see their developers go on to create businesses around their open source library that seem to work much better than yours. Don’t be afraid to reach out and start a conversation with them.
12. A data scientist cannot be your CTO
If you are assembling a team for your AI startup, I believe you need at least 2 partners: a CTO taking care of the whole platform, and a dedicated data scientist who is fully focused on machine learning.
13. Make your own data
We’ve already discussed the challenge with datasets. The ultimate solution is when your own product produces enough data to train models around it. The most valuable thing about today’s AI companies is their in-house generated data. Of course, this may take time and be a long term strategy. Many large companies are starting to look inside and realize they have massive amounts of unstructured data. This represents a major opportunity for them to develop AI solutions, although they might not have in-house data science talent.
14. User experience is key for AI systems to succeed
Related to point 4 (users will complain), users will always hit edge cases where your AI system gets confused. You must develop UX concepts to either hide or mitigate your model's errors. A good example is a bot: human-AI interaction creates a more guided journey where the user can help your model take fewer risks.
15. “How would a human do it?”
So if you are thinking of starting an AI company, you probably have an idea that will revolutionize a certain human process. Something that helps me think about AI solutions is asking myself how a human would solve the problem.
Journalistic integrity is a precious commodity these days. In an era of fake news stories and clickbait headlines, the premium on scrupulous research and investigative reporting is high. Even reputable news outlets fall foul of unreliably sourced stories. In the rush to publish under the pressure of the 24/7 news cycle, editorial accuracy gets compromised.
“More and more we are seeing the larger media organizations lose credibility because of their bias,” says Rohit, who set up Democracy News Live after a series of senior editorial positions at high-profile news organizations such as CNN and CBC and a storied career as an international correspondent covering South Asia.
Right from the outset, the mission for Democracy News Live was clear: to counter the deluge of plagiarized and derivative content online and to shine a light on social justice issues across India.
Here is the challenge Rohit faced: producing well researched, unbiased content takes a lot of time and resources. How could a journalist leverage Artificial Intelligence to research every side of a story?
Research is the “bedrock” of strong editorial for news organizations
Rohit Gandhi and his colleague Vrinda Aggarwal have been working with the Frase team since February 2017, when the AI research platform was in its infancy. Running a large operation of primarily freelance contributors is a challenge at the best of times, but Frase has helped give DNL’s core team of editorial staffers the research tools they need to ensure strong citation practices to drive their reporting.
The DNL team uses Frase to accelerate their research process. Frase’s content creation tool has two main components:
According to Rohit, the strong citation practices of the Frase system as well as its ability to source content from YouTube and Twitter make it a powerful, one-stop platform for editorial research. Frase means a news organization no longer needs a research department, Rohit states.
Leverage AI-powered support for video scripting
And it doesn’t stop there: Democracy News Live is particularly active with video content and uses Frase to assist with scripting for its roving reporters as they counter “fake news” narratives with their real-life video accounts. The ease of interaction between Frase’s word processor and its research pane makes it a perfect tool for script creation.
Powered by Frase’s AI research assistant, scripts can be ideated, researched, developed, and written in the system before the finished copy is shared within the organization. Writers are able to submit completed copy to editors and content managers for review and comment. Documents can be shared and exported in various formats that accommodate different workflows.
As DNL looks to open up new revenue streams to support its editorial mission, it is testing the waters with sponsored content in the form of original documentaries. Video production is close to the heart for Democracy News Live, as evidenced by Rohit’s extensive career both behind and in front of the camera. Frase helps the organization’s video development flow.
Harness competitive intelligence through media monitoring
Alongside scripting, Democracy News Live understands the power of Frase’s monitor solution, which generates a daily stream of summaries for your topics of interest. DNL writers use Frase monitors to keep them focused and up-to-date on their respective beats.
It is easy to set up comprehensive and state-of-the-art monitoring in Frase for key topics and organizations, then watch the alerts come in. Users can access their monitors at any time within the system, have them emailed to their inbox daily, or view everything within a custom microsite for ease of reading.
In addition to its deployment in the newsroom, Rohit offers the monitor service to third party organizations eager to stay ahead of what is being said about them online. Pairing competitive intelligence with timely editorial research, the Frase monitor feature is best-in-class.
Spanning continents, Frase helps reporters around the globe research better
From editorial research and video scripting to media monitoring, Frase is an integral part of Democracy News Live's daily operations. It facilitates deeper, more focused research, enforces best citation practices, and supports script creation for video content. For news organizations looking to strengthen their research capabilities, Frase is an invaluable ally.
We are constantly reading stories about new applications of artificial intelligence, and for the most part, it is good news. On the other hand, every now and then, we also read stories about the negative consequences of artificial intelligence. Over the past few years we’ve heard concerning remarks from thought leaders ranging from Stephen Hawking to Elon Musk describing the darker side of artificial intelligence.
In partnership with Lucid, the leading Human Answers Platform, we asked 300 people about their concerns regarding artificial intelligence:
It seems like most people are worried about humans becoming fully reliant on computers, but almost one third of respondents were specifically concerned about AI destroying humanity. To further illustrate these perspectives on the issue, we collected a few of the most shared articles addressing some key questions:
Will artificial intelligence take our jobs?
Robots will destroy our jobs – and we’re not ready for it (theguardian.com) – Jan 11 2017
In a classic example of optimism bias, while approximately two-thirds of Americans believe that robots will inevitably perform most of the work currently done by human beings during the next 50 years, about 80% also believe their current jobs will either “definitely” or “probably” exist in their current form within the same timeframe.
As Enbar observed, the most urgent question we must answer is not one of robots’ role in the workforce of 21st-century America, but rather one of inclusion – and whether turning our backs on those who need our help the most is acceptable to us as a nation.
Will Robots Take Our Children’s Jobs? (nytimes.com) – Dec 11 2017
The Associated Press already has used a software program from a company called Automated Insights to churn out passable copy covering Wall Street earnings and some college sports, and last year awarded the bots the minor league baseball beat.
A much-quoted 2013 study by the University of Oxford Department of Engineering Science — surely the most sober of institutions — estimated that 47 percent of current jobs, including insurance underwriter, sports referee and loan officer, are at risk of falling victim to automation, perhaps within a decade or two.
Just this week, the McKinsey Global Institute released a report that found that a third of American workers may have to switch jobs in the next dozen or so years because of A.I.
So, workers, experts say artificial intelligence will take all of our jobs by 2060 (newsweek.com) – May 31 2017
There is a 50 percent chance that AI will be able to perform all human tasks better than humans in 45 years, and all human jobs are expected to be automated within the next 120 years, according to a survey of 352 AI researchers who published at either the Conference on Neural Information Processing Systems or the International Conference on Machine Learning in 2015.
Transportation innovators like Uber's Travis Kalanick and Tesla's Elon Musk have predicted that automated vehicles will disrupt the industry over the course of the next 20 years, and Musk estimates "it will be very unusual" for cars that aren't autonomous to be manufactured in the next decade.
Researchers are just now beginning to understand the ways in which automation can interact with the human body, and the impact AI will have on the health industry in the coming decades is impossible to estimate, except the idea that it will be significant.
"The accumulated doubling of Moore's Law, and the ample doubling still to come, gives us a world where supercomputer power becomes available to toys in just a few years, where ever-cheaper sensors enable inexpensive solutions to previously intractable problems, and where science fiction keeps becoming reality," Brynjolfsson and Andrew McAfee, associate director of the Center for Digital Business at MIT, write in the book.
"There's no economic law that says 'You will always create enough jobs or the balance will always be even'; it's possible for a technology to dramatically favour one group and to hurt another group, and the net of that might be that you have fewer jobs," said Brynjolfsson.
"I already talked to one big law firm and they said they're not hiring as many of those sorts of people because a machine can scan through hundreds of thousands or millions of documents and find the relevant information for a case or a trial much more quickly and accurately than a human can," said Brynjolfsson.
"According to our estimate, 47 percent of total US employment is in the high risk category, meaning that associated occupations are potentially automatable over some unspecified number of years, perhaps a decade or two," they predict in the report The Future of Employment.
"Imagine you are a home help aide or a nurse and you see an unusual mole or a lesion and are not quite sure what it is; you could use augmented reality glasses or other tools to send a photograph of that growth to a human expert or even an expert system, a decision-making system that analyses the shape and contours of that lesion and gives advice on whether you need to bring that person in for treatment," he added.
"What our simulations show is that one aspect of the debate around artificial intelligence that is frequently lost is the fact that AI and digitisation will impact certain activities in our everyday lives, such as marketing automation or robotic advice, but it may not fully remove the 50 percent of jobs that some pundits talk about.
Disruption is a signal from the future that it is high time to adapt, and that smart investments in the right hardware and software, which includes your own thinking software, have to be made.”
"To me it is astounding that in Australia we are so obsessed with bricks and mortar property, but we are less concerned with investments in our own intellectual property, and AI certainly raises the stakes to ensure our thinking remains future-compatible.
"As a global futurist and futurephile, one of the things that excites me about artificial intelligence is the death of procrastination — anything 'left brained' that we avoided and delayed doing, like taxes, filing, travel expense coding, receipt management, and updating our calendars will be procrastinated on no longer.
Will artificial intelligence make humans fully reliant on computers?
Nicholas Carr: ‘Are we becoming too reliant on computers?’ (theguardian.com) – Jan 17 2015
As digital technology sprints forward, we’re not just learning about the possibilities of computer intelligence, we’re also getting a lesson in its limits.
The most subtle of our human skills – our common sense, our ingenuity and adaptability, the fluidity of our thinking – remain well beyond the reach of programmers.
It’s possible to imagine self-driving cars operating independently in tightly controlled circumstances, such as on dedicated highway lanes, but as long as cars have to handle the vagaries of real-world traffic in cities and neighbourhoods, a watchful, adept human will continue to have a place in the driver’s seat.
The shortcomings of robotic drivers and pilots reveal that the skills we humans take for granted – our ability to make sense of an unpredictable world and navigate our way through its complexities – are ones that computers can replicate only imperfectly.
Are computers making our lives too easy? (bbc.com)
And it does seem to me that this is a naive approach to take when thinking about technology in all its forms: in particular when thinking about computer automation, but also when thinking about our own desires and experience of life and of the world.
NC: I think that gets to a fundamental point, which is that the question isn't, "should we automate these sophisticated tasks?", it's "how should we use automation, how should we use the computer to complement human expertise, to offset the weaknesses and flaws in human thinking and behaviour, and also to ensure that we get the most out of our expertise by pushing ourselves to ever higher levels?"
What happens then is that you not only lose the distinctive strengths of human intelligence – the ability of human beings to actually question what they are doing in a way that computers can’t – but you push forward with these systems in a thoughtless way, assuming that speed of decision-making is the most important thing.
And I said, you know, I’m not saying that there is no role for labour-saving technology; I’m saying that we can do this wisely, or we can do it rashly; we can do it in a way that understands the value of human experience and human fulfilment, or in a way that simply understands value as the capability of computers.
And in the end I do think that our latest technologies, if we demand more of them, can do what technologies and tools have done through human history, which is to make the world a more interesting place for us, and to make us better people.
Will artificial intelligence destroy humanity?
Stephen Hawking: Artificial Intelligence Could End Human Race (livescience.com) – November 2017
"The development of full artificial intelligence (AI) could spell the end of the human race," Hawking told the BBC.
The CEO of the spaceflight company SpaceX and the electric car company Tesla Motors told an audience at MIT that humanity needs to be "very careful" with AI, and he called for national and international oversight of the field.
"I don't see any reason to think that as machines become more intelligent … which is not going to happen tomorrow — they would want to destroy us or do harm," Ortiz told Live Science.
Artificial Intelligence Is Our Future. But Will It Save Or Destroy Humanity? (futurism.com) – Sep 29 2017
“Normally, the way regulations are set up is a whole bunch of bad things happen, there’s a public outcry, and after many years, a regulatory agency is set up to regulate that industry,” Musk said during the same NGA talk.
Russian President Vladimir Putin recently stoked this fear at a meeting with Russian students in early September, when he said, “The one who becomes the leader in this sphere will be the ruler of the world.” These comments further emboldened Musk’s position — he tweeted that the race for AI superiority is the “most likely cause of WW3.”
Facebook’s Mark Zuckerberg went even further during a Facebook Live broadcast back in July, saying that Musk’s comments were “pretty irresponsible.” Zuckerberg is optimistic about what AI will enable us to accomplish and thinks that these unsubstantiated doomsday scenarios are nothing more than fear-mongering.
“We must first understand the concepts of how the brain works and then we can apply that knowledge to AI development.” Better understanding of our own brains would not only lead to AI sophisticated enough to rival human intelligence, but also to better brain-computer interfaces to enable a dialogue between the two.
South Korean university is secretly developing killer AI robot army that could destroy humanity, scientists fear (thesun.co.uk) – Apr 05 2018
A TOP South Korean university is secretly developing a killer Artificial Intelligence robot army that could destroy humanity, scientists fear.
KAIST university allegedly launched a new AI weapons lab in February, leading dozens of researchers to believe the products will "have the potential to be weapons of terror".
AI Could Kill Us All: Meet the Man Taking the Threat Seriously (thenextweb.com) – Mar 08 2014
“So yes, the thing is that what we actually need to do is to try and program in essentially what is a good life for humans or what things it’s not allowed to interfere with and what things it is allowed to interfere with… and do this in a form that can be coded or put into an AI using one or another method.”
“Proper AI of the (kind where) ‘we program it in a computer using some method or other’… the uncertainties are really high and we may not have them for centuries, but there’s another approach that people are pursuing which is whole-brain emulations, some people call them ‘uploads’, which is the idea of copying human brains and instantiating them in a computer.
“We would just have to worry about the fact of an extremely powerful human – a completely different challenge but it’s the kind of challenge that we’re more used to – constraining powerful humans – we have a variety of methods for that that may or may not work, but it is a completely different challenge than dealing with the completely alien mind of a true AI.”
“(If) someone comes up with a nearly neat algorithm, feeds it a lot of data, this turns out to be able to generalize well and – poof – you have it very rapidly, though it is likely that we won’t have it any time soon, we can’t be entirely confident of that either.”
We would want to have done at least enough philosophy that we could get the good parts into the AI so that when it started extending it didn’t extend it in dangerous or counterproductive areas, but then again it would be ‘our final invention’ so we would want to get it right.
Frase shows SEOs not only what to write but how, says AI content marketer
Anyone who has managed a team of writers knows the importance of providing clear directions before the writing begins. Too little direction and the finished work can end up very different from what you had intended.
The challenge nowadays is that a list of keywords no longer cuts it as the basis for a creative brief. What may have worked in SEO five years ago doesn’t fly today. Instead, creative briefs must go deep on background research.
Consider how much Google’s understanding of semantic search has evolved in recent years
Google is no longer looking for specific keywords in your content, but is seeking deeper topic authority. Your creative brief needs to provide a wider understanding of the topic so the writer can incorporate the big picture into the creative process.
This involves doing background research as part of composing a brief. And doing research well takes time. At least that is the conventional wisdom.
Enter Fernando Nikolic, a digital marketer using AI to produce research-driven content briefs
Fernando Nikolic manages a global team of writers from his base in the German capital Berlin. He deeply understands the need for solid topic research and for translating what he finds into robust article outlines for his team. In fact, Fernando frequently blogs about the importance of incorporating AI-driven tools in content marketing. And Frase has proven invaluable to Fernando in running his editorial operation.
It wasn’t always the case. With previous research tools, Fernando ended up with a list of topics but no clear direction about how to craft the article. “These tools for content marketers only give you the keywords,” he observes.
Automatic summarization and topic modeling to speed up the research process
With Frase, it’s a different proposition. Using SEO solution CanIRank, Fernando begins by researching key terms across subject areas such as AI and cryptocurrency. Turning next to a dedicated Slack channel (the aptly named #KeywordFriday), he gives his international team of writers a chance to select the topics that most appeal to them.
From there, Fernando uses Frase to generate a research-driven content brief for each topic. At its core, Frase’s AI-powered word processor provides two main components:
The Frase Research Assistant (right side panel of the interface) summarizes and recommends topics by analyzing the most relevant search results for a given theme. For each source, Frase’s summaries provide the following structure:
In addition to summaries, Frase allows Fernando to explore the most salient topics across top search results. For example, for the theme “artificial intelligence for content marketing”, these are the related topics Frase encourages you to research:
Fernando uses both Frase’s summarization and topic modeling to compose a final research document, which is ultimately shared with his writers. “This is where Frase revolutionizes the whole game,” he says of his finely-tuned process.
The value to Fernando and his team is tangible. Frase’s ability to summarize articles in a user-friendly and intuitive way that immediately extracts the content’s core elements has proven key to fast, high-value editorial research and content creation.
Other content optimization tools fall short on value and usability, Fernando notes. “Ever since I discovered Frase I stopped looking for alternatives. Frase does everything I need.”
Better creative briefs for greater productivity, stronger commercial outcomes
The more deeply researched your creative brief, the better the final content is likely to be. Writers appreciate the additional context and the stronger editorial direction. Content managers value the extra control it provides around quality.
And that’s not all. Not only do the search engines reward deep, differentiated content that displays topic authority. The process efficiencies generated by Frase’s AI summarization technology will expedite areas of your editorial workflow that previously dragged.
Deeper, more relevant content. Your bottom line will thank you for it.
At Frase we are constantly thinking about the future of content creation, and how AI will play a role in making us a more informed society. This post discusses what the general public thinks about AI as a possible tool to combat fake news. It also explores possible alternatives.
In partnership with Lucid, the leading Human Answers Platform, we surveyed a demographic balanced audience in the US with the question: “can Artificial Intelligence (AI) fix the Fake News problem?”
Now, let’s give more background to the Fake News controversy. Around the 2016 US presidential election, political scientists from Princeton University reported that 1 in 4 Americans visited a fake news site. Considering that 62 percent of Americans get their news from social media, platforms like Facebook or Twitter should play a prominent role in combating the Fake News epidemic.
When Mark Zuckerberg told Congress that Facebook would use artificial intelligence to detect fake news posted on the social media site, he wasn’t particularly specific about what that meant or how feasible that would be. We know Facebook is currently working on a number of initiatives to mitigate the fake news problem, including an increased focus on content from friends and family as it relates to Facebook’s News Feed.
Most people don’t believe AI is the solution to Fake News
Based on our survey we can conclude that the majority of Americans don’t believe AI is ready to tackle the problem yet. That said, it is worth looking at some innovations occurring in the space. In fact, I could argue that the proliferation of fake articles on the Internet plays in favor of AI’s chances of learning how to detect fake stories.
After doing some research, I discovered that fake news can generally fall into these different categories:
entirely false or fake information;
discussion of real events with inaccurate interpretations;
pseudoscientific articles that pretend to have a research foundation;
articles that include a mix of unreliable opinions from online forums or social media platforms, most notably Twitter.
My initial reaction was that an AI system could be trained to classify articles into these categories. But of course, the system would not be perfect and would have a high degree of bias based on who labels the training dataset. In addition, a given website might have a combination of real and fake information, which makes the data collection more time consuming.
A company called Fakebox found an interesting solution. Their answer isn’t detecting fake news, but detecting real news! Real news is factual and has little to no interpretation. There are also plenty of reputable sources to build a dataset from.
In the words of Fakebox’s co-founder:
We trained a machine learning model that analyzes the way an article is written, and tells you if it's similar to an article written with little to no biased words, strong adjectives, opinion, or colorful language. It can have a hard time if an article is too short, or if it's primarily comprised of other people's quotes (or Tweets). It is not the end-all solution to fake news. But hopefully it will help spot articles that need to be taken with a grain of salt.
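The intuition behind style-based detection can be illustrated with a far cruder heuristic than Fakebox's actual model: score an article by the density of loaded, opinionated words. The word list and threshold below are purely illustrative assumptions, not their method.

```python
# Crude heuristic sketch (NOT Fakebox's model): score text by the
# fraction of words drawn from an illustrative list of loaded words.
# A real system would learn these signals from data instead.
LOADED_WORDS = {
    "shocking", "outrageous", "amazing", "disgraceful", "unbelievable",
    "horrible", "incredible", "disastrous", "stunning", "terrifying",
}

def opinion_score(text: str) -> float:
    """Return the fraction of words that appear in the loaded-word list."""
    words = [w.strip(".,:;!?\"'").lower() for w in text.split()]
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in LOADED_WORDS)
    return hits / len(words)

neutral = "The committee approved the budget on Tuesday after a vote."
charged = "Shocking and outrageous: the disgraceful budget vote stunned everyone."

assert opinion_score(neutral) < opinion_score(charged)
```

Even this toy scorer separates dry, factual phrasing from charged language, which is the same signal a trained "real news" detector exploits at much greater sophistication.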
Another interesting tool is FakeRank, like Google’s PageRank for Fake News detection, only that instead of links between web pages, the network consists of facts and supporting evidence. It leverages knowledge from the Web with Deep Learning and Natural Language Processing techniques to understand the meaning of a news story and verify that it is supported by facts.
Unfortunately, AI can also be used to create even more fake content
Here are a few quotes that illustrate the dark side of AI in content creation:
“Video Machine-learning experts have built a neural network that can manipulate videos to create fake footage – in which people appear to say something they never actually said.” (theregister.co.uk)
“So incredible, that of the 1,600 reviews posted on the book’s Amazon page in just a few hours, the company soon deleted 900 it suspected of being bogus: written by people who said they loved or hated the book, but had neither purchased nor likely even read it.” (scientificamerican.com)
“In a paper published this month, the researchers explained their methodology: Using a neural network trained on 17 hours of footage of the former US president’s weekly addresses, they were able to generate mouth shapes from arbitrary audio clips of Obama’s voice.” (qz.com)
“Artificial intelligence can use still images of people and turn them into FAKE video – and even put words in their mouths. The system takes an image of a person as well as an audio clip to create a video. It identifies facial features in photos using algorithms that recognize the face. As audio plays, it manipulates the mouth so it looks like the person is speaking. With improvement, the researchers say the AI could make fake videos seem real. The system could eventually render video evidence unreliable in court cases.” (dailymail.co.uk)
So…could blockchain come to the rescue?
Besides AI, there is another technology that could play a major role in solving the fake news problem: blockchain. I found two projects that propose very similar approaches, as per their websites:
Eventum makes it easy for people to get paid for reporting on real-time events and information around them, while developers can get any data they want in a cheap, fast and secure data feed. It uses a ‘wisdom of the crowd’ principle and ‘blockchain-as-a-court-system’ on the Ethereum network to solve problems, including fake news, eSports data extraction and real-time feedback to AI algorithms.
PUBLIQ, a nonprofit foundation, launched a decentralized content platform that offers merit-based, real-time, and instant remuneration to authors from all over the world, in order to combat fake news and biased reporting. The PUBLIQ Foundation builds a trust-based ecosystem that is operated by authors, journalists, bloggers and advertisers around the world and encourages them to share their written perspectives in a safe and encrypted way.
The survey results are not surprising considering that even Facebook can't articulate a clear plan as to how AI will solve the fake news problem. It seems like instead of attempting to determine whether each story is factual, we might be better off examining its source and distributors. This is where blockchain might be able to help. In addition, AI could prove valuable in recognizing when the content of a story is very similar to that of other stories that have proven to be fake.