Automating Digital Video

We're On The Cusp of a Video Data Revolution, and We Need to Tell Those Data Stories


"It is estimated that media companies and user-generated content creates over 2 billion digital images and over 1 billion hours of video watch time every day." -- Justin Pang, head of publishing partnerships at Google.

I don't have to tell you that a digital video revolution is underway. That you know. But what I do want to get across in this article is that we're quickly approaching a critical point at which the explosion of unstructured data generated by digital video content will make it next to impossible to understand, utilize, or even recall most of the information contained in all of that video.

Automation can fix that.

Natural Language Generation (NLG) should be telling the stories behind all that video using all that unstructured data. This is the message I'm bringing to NABShow, the annual gathering of the National Association of Broadcasters, when I speak there on April 23rd. If you're in broadcasting and working with digital video, I want to talk to you. Get in touch.

I've been automating content for seven years, from the very inception of our company, Automated Insights, and have produced billions of unique and insightful human-sounding narratives from raw data for companies like the Associated Press and Yahoo. All along, I've been fighting a battle for acceptance of automated content in the universe of traditional journalism.

Last week, the Associated Press published a report that neatly summarized that battle and declared it all but over. Augmented journalism, the term they use for the integration of human and machine in the creation of news stories, is not meant to take journalism jobs away from humans, it said. Augmented journalism should be standing side-by-side with traditional journalism to incorporate the data science required in contemporary journalism while complementing the investigative process and conclusive reasoning inherent in the job of the journalist.

That's the message I took to the Columbia School of Journalism in 2013, and to SXSW in 2016, and again to SXSW earlier this year when I spoke about the Automated Future of Journalism with executives from the Washington Post and the New York Times. Each time I relayed this message, it resonated to a greater degree with journalists and media executives.

However, at this year's SXSW talk, I also started discussing the role of automation around video content. It's something I touched on at the end of this interview with NPR in late March, and it's the focus of what I'll talk about at NABShow on April 23rd. I've been researching, strategizing, and prototyping for about a year now, and I've figured out where automation plays with video.

It's not where you might think.

Despite recent media speculation, we're not heading for a video future in which one automated talking head holds a conversation with another automated talking head. This isn't happening. It's the same kind of misunderstanding of the medium I had to debunk when people thought robots would be writing all the news, all the time.

This speculation ignores the fact that machines and humans will continue to work together, as they have through more than 100 years of automation history. I get it: ignoring the symbiotic working relationship between machine and human is easy, and it makes for good dystopian movies and novels. But as I've said from the beginning, if you want automation to work well, it has to be a partnership. The focus shouldn't be on making the machine independent; it should be on removing the most expensive, time- and resource-consuming tasks from the human's plate.

Video Data: What we know now

As video publishing formats and distribution models evolve, we're creating more data around video content. In most cases, much of this data isn't generated automatically, but it is required for the video to be published and discovered on channels like YouTube or Facebook.

We get what I call the basics: title, category, keywords, length, and even who is in the video and where and when it was shot. Quite a bit of information about the content can be discovered from this metadata. In a lot of cases, a description is also entered at publishing, although these descriptions can be lacking. They're far from in-depth, they're unstructured, and they're usually an afterthought.

As automation comes to digital video, we're starting to learn a lot more about the content. Auto-captioning is standard on Facebook now, and while those auto-captions are far from accurate today, they have fortified the concept of standard subtitle files (SubRip Text or SRT) for digital video.
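To make the SRT idea concrete, here is a minimal sketch of turning a subtitle file into structured records. It assumes the common SRT layout (a numeric index, a timing line, one or more lines of text, then a blank line). The names `Cue`, `parse_srt`, and `SAMPLE_SRT` are hypothetical and chosen only for illustration; this is not any platform's actual caption pipeline.

```python
import re
from dataclasses import dataclass

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:04,200".
TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d{2}):(\d{2}):(\d{2}),(\d{3})"
)


@dataclass
class Cue:
    index: int
    start: float  # seconds from the start of the video
    end: float    # seconds from the start of the video
    text: str


def _seconds(hours, minutes, seconds, millis):
    """Convert the four captured timestamp fields into seconds."""
    return int(hours) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000


def parse_srt(srt_text):
    """Turn the contents of an SRT file into a list of Cue records."""
    cues = []
    normalized = srt_text.replace("\r\n", "\n").strip()
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.strip().splitlines()
        if len(lines) < 3 or not lines[0].strip().isdigit():
            continue  # skip malformed blocks rather than guess at them
        match = TIMING.match(lines[1].strip())
        if not match:
            continue
        fields = match.groups()
        cues.append(Cue(
            index=int(lines[0].strip()),
            start=_seconds(*fields[:4]),
            end=_seconds(*fields[4:]),
            text=" ".join(line.strip() for line in lines[2:]),
        ))
    return cues


# Hypothetical sample captions, invented for this example.
SAMPLE_SRT = """1
00:00:01,000 --> 00:00:04,200
Welcome back to the show.

2
00:00:04,500 --> 00:00:09,000
Tonight we look at
automation in the newsroom.
"""

cues = parse_srt(SAMPLE_SRT)
```

Once captions are in this shape, each spoken line carries a start and end time, which is what makes it possible to tie words back to a moment in the video.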

What we will know soon

As the audio detection technology that automatically generates these SRT files becomes more accurate and robust, video detection technology will be right behind it. This will allow for the automatic recognition of people and objects in the video, providing not only information as to who said what but also context based on what objects are in the shot.

Again, this is not as dystopian as you might fear. Think of how Facebook can, in photos, identify faces and recognize some and prompt the publisher to name others. The same process is happening in video, and it's close, but not quite ready for prime time yet.

What we can do

Once this information is automated and available, it can be used to further refine the categorization of the content of the video, a sort of audio and video topic modeling. The end result is that the more you know about the video before you watch it, the better you can determine whether it's something you'll want to watch.
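As a rough illustration of what "topic modeling" over captions could look like at its very simplest, the sketch below counts the most frequent non-trivial words in a video's caption text. To be clear about the assumptions: this is a plain word-frequency count standing in for real topic modeling, the stopword list is a tiny hand-picked set, and `top_terms` and the sample captions are hypothetical.

```python
import re
from collections import Counter

# A deliberately small, hand-picked stopword list (an assumption for this
# sketch; a real system would use a much fuller one).
STOPWORDS = {
    "the", "a", "an", "and", "or", "but", "is", "are", "was", "were",
    "to", "of", "in", "on", "for", "with", "at", "by", "it", "this", "that",
}


def top_terms(caption_texts, n=5):
    """Return the n most frequent non-stopword terms across caption lines."""
    counts = Counter()
    for text in caption_texts:
        for word in re.findall(r"[a-z']+", text.lower()):
            if word not in STOPWORDS and len(word) > 2:
                counts[word] += 1
    return [word for word, _ in counts.most_common(n)]


# Hypothetical caption lines, invented for this example.
captions = [
    "The storm is moving toward the coast tonight.",
    "Residents along the coast are preparing for the storm.",
    "The storm could bring heavy rain.",
]

terms = top_terms(captions, n=2)
```

Even a count this crude surfaces "storm" and "coast" for the sample above, which hints at how caption text can feed categorization before anyone watches the video.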

Unstructured data becomes structured. We'll be able to auto-summarize video content the same way we auto-summarize written content, allowing for a much broader and richer viewing experience, more meaningful engagement with video, and more useful information delivered to the end user in a shorter amount of time.

Why we should do it

When I talk about automating narrative content from data, especially in a media and journalism context, I often come back to three gains: increases in the reach, depth, and speed with which an organization can bring unique, relevant, and personalized information to its audience.

We do this very well with written content today, but as video eclipses written content as the preferred method of information delivery, which we can all agree is happening at a pace we didn't imagine just five years ago, we need to be able to do this with video content as well.

Earlier this year, Automated Insights co-hosted a hackathon with the Amazon Alexa team and 15 teams ranging from startups to Fortune 500 companies. Using our NLG technology and Alexa's Natural Language Processing (NLP) and speech technology, we created mind-blowing applications that allowed end users to receive spoken personalized news, financial, school, weather, and all sorts of other information, just by asking a question.

This is where we're going with video, and this is why we need to tell these data stories around that video. Again, if you're in this space, and especially if you'll be in Las Vegas on April 23rd, I want to talk to you. Hit me up here.