This is everything I've written here or elsewhere over the last few years. You can search, filter by publication, or choose from some highlights below.

Defining the Context Layer in NLG. Part 1: What Just Happened


In my last blog post, I outlined the differences between good Natural Language Generation and bad NLG. My conclusion was that all of the hard work and sweat should be going into defining the NLG context layer, the logical decision matrix that sits between the data and the words.

This is because NLG content is less about the words and more about the insights, the data. The context layer will help make sure the words are used to convey the data to the reader in a manner that can be absorbed quickly and easily.
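To make the idea of a "logical decision matrix between the data and the words" concrete, here is a minimal sketch of a context layer in Python. Everything in it is hypothetical and invented for illustration — the field names, thresholds, and phrases are not Wordsmith's API or any customer's actual rules — but it shows the key separation: first decide what story the data is telling, then (and only then) pick words.

```python
# A minimal, hypothetical sketch of a context layer: rules that map raw
# stats to a narrative angle before any words are chosen. All field names
# and thresholds are illustrative, not Wordsmith's actual logic.

def narrative_angle(team_score: int, opp_score: int, projected: int) -> str:
    """Decide which story the data is telling -- no words yet."""
    margin = team_score - opp_score
    beat_projection = team_score > projected
    if margin > 30:
        return "blowout"
    if margin > 0 and not beat_projection:
        return "ugly_win"          # won, but underperformed expectations
    if margin > 0:
        return "solid_win"
    if margin < 0 and beat_projection:
        return "hard_luck_loss"    # lost despite a strong showing
    return "flat_loss"

# Only after the context layer has classified the data do we reach for words.
PHRASES = {
    "blowout": "steamrolled the competition",
    "ugly_win": "escaped with a win despite a sluggish outing",
    "solid_win": "earned a convincing victory",
    "hard_luck_loss": "played well but came up short",
    "flat_loss": "never got going",
}

print(PHRASES[narrative_angle(142, 101, 120)])
```

The point of the sketch is the ordering: the hard, industry-expert work lives in `narrative_angle`, while the phrase table at the end is comparatively trivial to swap out.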

So how do you define a context layer?

First and foremost, defining the context layer is a business process, not a technical process. It's akin to defining business requirements for custom software, and then translating those business requirements into technical requirements. The beauty of Wordsmith is that it allows anyone, regardless of technical aptitude, to translate their own business requirements into automated content.

But like any business initiative worth pursuing, just because the process isn't technical doesn't mean it isn't complex. In fact, when we work with customers on massive automated content projects, like Yahoo's Fantasy Football Recaps or the AP's Quarterly Earnings Reports, we spend the vast majority of our time helping them define the context layer, and we take most of our cues from their industry experts.

The goal when creating good automated content is to unlock the story hidden within the data. And those stories can usually be boiled down to a few basic structural elements.

In this post, we'll look at the first and most important of those structural elements, and talk about building a context layer around it.

read the rest at:

Good vs. Bad Automated Content? It's In the Context Layer


The difference between good automated content and bad automated content can be boiled down to the number of scenarios the programmer creates to turn ordinary data into beautiful prose.

Data variability, which is predicated upon the number and the depth of insights driven by changes in the data, is the key quality driver in Natural Language Generation (NLG). And to do NLG data variability right, you have to create a lot of scenarios.

NLG creators must always be asking: How vast is the universe of outcomes that the engine takes into account when creating a narrative?

In other words: How many ways can you say something?

It's not a coincidence that this is the same approach used when developing NLG's reverse twin, Natural Language Processing (NLP).

Words from Data meets Data from Words

People get touchy when you confuse NLG and NLP, especially those people who do either for a living (which is not a lot of people, but they still get touchy). The truth is that there is a lot of commonality between NLG and NLP. The core concept is the same: Understand the input and translate to the output.

While NLP takes in words and translates those words to data, NLG takes in data and translates that data to words. But creating words isn't the hard part of NLG. In fact, we've reached the point where machines can create complex sentences without too much trouble. In its simplest form, creating words from data is a binary proposition:
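The excerpt cuts off here, but the "binary proposition" can be sketched in a few lines. This is my own illustrative example, not the post's continuation: in its simplest form, turning data into words is a single branch on a value.

```python
# A hypothetical one-line NLG decision: data in, a branch, words out.
# The metric and phrasing are invented for illustration.

def describe_change(pct: float) -> str:
    direction = "rose" if pct >= 0 else "fell"
    return f"Revenue {direction} {abs(pct):.1f}% over the quarter."

print(describe_change(3.2))   # "Revenue rose 3.2% over the quarter."
print(describe_change(-1.5))  # "Revenue fell 1.5% over the quarter."
```

Everything harder than this — picking which metric matters, what counts as notable, how to vary the phrasing — is where the scenario count, and the quality, comes from.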

read the rest at:

NLG: The Secret Weapon in the War Between Financial Managers and Robo-Advisors

How to make the human touch more human using natural language generation


When the experts discuss the merits of robo-advisors over financial managers, they'll be quick to point out that the robo-advisor, a form of low-AI personal financial portfolio management, is popular with new and younger investors, and is gaining ground with lower-net-worth investors.

They'll point to fees as the culprit. Specifically, the robo-advisor is far less expensive, charging less than 1% of the value of one's portfolio, as opposed to the 1% to 3% charged by most traditional professional financial managers.

They'll also jump on the shift in user experience expectations from boomers to millennials. Younger people tend to do everything digitally and quickly, with as little personal contact as possible. That's how they shop, get themselves from place to place, order food, find lodging, and so on. On-demand, push-button, machine-recommended options are just the way the kids do things these days.

As a counter-argument, the professional managers offer a more -- pun intended -- human touch. Their experience, their ability to research, and the option to call or email or visit the local branch are all selling points.

What the professional financial managers tend to miss is that the human touch, so often lauded as their unique differentiator, isn't as human as it used to be. If professional managers want to reach and accommodate this new investor class, they need to be able to scale the human touch.

read the rest at:

The NFL Provides a Quantum Leap for Automated Journalism


Yet another huge advancement for automated journalism was announced yesterday. Not surprisingly, it came from a vertical that's been at the forefront of automation technology for years: Professional sports, or in this case specifically, the NFL.

The NFL announced that it will expand an existing stats-tracking test program and, for the 2014/15 season, will be equipping every player with a sensor under each shoulder pad. The sensors will provide near-real-time information on each player's location and speed.

Back in May, I gave a talk on the future of automated journalism at the Tow Center for Digital Journalism at Columbia University. During that talk, I devoted some time to discussing the Robot Reporter, a growing network of chips and sensors that collect and deliver data to automated content platforms like Automated Insights' Wordsmith, which then instantly creates news articles from that data.

One example I gave was Quakebot, the LA Times template-driven software that broke its first widely-recognized earthquake news back in March. The second example was about sports, and all of the sensors currently being used to track events like balls and strikes, measurements like first downs, and the NFL's existing trial with player sensors.

With those sensors in place, I said, it's easy for us to make the jump to more qualitative analysis of a game, not just a statistical overview.

That part was met with a lot of excitement, and I spent most of the Q&A talking about whether or not it was true and how big an advancement those sensors were.

read the rest

Study Finds Human Writing Indistinguishable from Automated Insights Content


A study published in the 2014 issue of Journalism Practice found that not only was Automated Insights' machine-generated content indistinguishable from journalist-created content, but that our automated content was viewed as more informative and more credible.

Christer Clerwall, from the Department of Media and Communication Studies at Karlstad University in Sweden, conducted a pilot study. For the test, 46 students in media and communications studies were given either a professionally-written NFL game recap from the L.A. Times or an automated recap of the same game from Automated Insights. They were asked to assess the article on both quality and credibility. They were also asked whether the article was written by a journalist or our engine.

Robot or Human

This certainly isn't the first time Automated Insights has been directly or indirectly involved with a robot vs. human test. It's something we always do in-house as a part of our normal QA process. Considering we've been at this for almost four years, we weren't surprised at the results.

From the study:

"Of the 27 respondents who read the software-generated text, 10 thought a journalist wrote it and 17 thought it was software-generated. For the 18 respondents in the “journalist group”, 8 perceived it as having been written by a journalist, but 10 thought software wrote it. Using a Mann–Whitney test for significance, we can conclude that there is no significant difference (U = 225, r = -0.07, significance = 0.623) between how the groups have perceived the texts."
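For readers unfamiliar with the test cited above, the Mann–Whitney U statistic is just a count over pairs drawn from the two groups. The snippet below computes it by direct counting; the two rating samples are invented for illustration and are not the data from Clerwall's study.

```python
# Mann-Whitney U by brute-force pair counting. The two samples below are
# invented ratings for illustration only, NOT the data from the study.

def mann_whitney_u(xs, ys):
    """U for sample xs: count the pairs where x beats y (ties count 0.5)."""
    u = 0.0
    for x in xs:
        for y in ys:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u

software_ratings = [4, 3, 5, 4, 2]
journalist_ratings = [3, 4, 4, 2, 3]
print(mann_whitney_u(software_ratings, journalist_ratings))
```

A handy sanity check on the pair-counting definition: the two U values from either direction always sum to the number of pairs, `len(xs) * len(ys)`. In practice you'd use a statistics library (e.g. SciPy's `mannwhitneyu`) to get the p-value as well.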

read the rest

Power to the People: Redefining Personalized Content


Personalized content isn't what you've been led to believe it is. But it's awesome.

In the great race to bring more eyeballs to web and mobile sites, the term “personalized content” has come to mean the employment of machines (algorithms) to determine what the viewer is most interested in reading. Automation is then used to aggregate that content from across the Internet to present to that viewer, who is then more likely to read and enjoy it. More importantly, the viewer is more likely to click on the ads that accompany that content.

So if you're reading an article about Las Vegas, and you've read other articles about Las Vegas, you're probably getting an ad for the Venetian.


This isn't a bad theory. It's like Pandora for content (and I've recently seen more than one startup call themselves the Pandora for content). But there are a couple of things wrong with it:

read the rest at:

Evolving Personalized Automated Content with Yahoo Fantasy Football Matchup Recaps


Over the span of 67 days, the team at Automated Insights went from a blank sheet of paper to what will eventually become tens of millions of Fantasy Football matchup recaps for all players of Yahoo Fantasy Football. It's the largest, deepest, broadest application of personalized automated content generation ever attempted and, judging by the thousands of tweets and emails that we've received, I'd say we nailed it.


The concept of personalized content isn't new. In fact, it's one of the oldest concepts in the communication book. The process of disseminating facts and aggregating data for the sole purpose of updating a single person is one usually undertaken by a team of people (or in some cases, one highly-paid expert individual), in order to provide a decision-maker with the necessary information to make additional decisions. It's an executive status update, a high-net-worth individual's financial statement, a scouting report for the head coach.

In all those cases, however, the person receiving the report is in a position to spend a lot of time and money to have someone draw insights, formulate conclusions, and coax suggestions out of impossibly large amounts of data.

read the rest at:
