In the enterprise and in higher education, video content that can’t be searched has little value.
The majority of business and academic video content is long-form. Town hall meetings are often 30-60 minutes in length. Recorded classroom lectures typically run an hour. Online training videos can range from 15 minutes to well over an hour. According to Cisco, long-form video made up 64% of all video traffic in 2014, a figure that is expected to grow.
With long-form video, traditional “YouTube-style” search is insufficient. Even if videos are extensively tagged, YouTube-style search can only help users find the start of the video. It doesn’t help them find the specific points in the video where their search term actually appears.
Finding content inside a video’s talk track and other presented materials is the challenge of enterprise video search. It’s what makes a 15-, 30-, or 60-minute video valuable: employees can search and reach the content as easily as they would in email, documents, or web pages.
In 2014, Panopto launched Smart Search to address the shortcomings of traditional video indexing. Smart Search automatically indexes words in the presenter’s talk track (a process called automatic speech recognition or ASR) and all words that appear in the video (a process called optical character recognition or OCR). OCR is particularly important for business and academic videos, which typically include formal presentation materials or on-screen demonstrations.
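To make the idea of searching inside a video concrete, here is a minimal sketch of a time-stamped word index. It is not Panopto’s implementation. The function names, the `(seconds, source, text)` event format, and the sample data are all hypothetical, chosen only to show how words from ASR and OCR could point back to moments in a video.

```python
from collections import defaultdict

def build_index(events):
    """Map each word to the moments it was spoken or shown.

    `events` is an iterable of (seconds, source, text) tuples, where
    `source` is "ASR" or "OCR". This format is an assumption made for
    illustration only.
    """
    index = defaultdict(list)
    for seconds, source, text in events:
        for raw in text.lower().split():
            word = raw.strip(".,;:!?\"'()")  # drop surrounding punctuation
            if word:
                index[word].append((seconds, source))
    return index

def search(index, term):
    """Return (seconds, source) hits for a single word, earliest first."""
    return sorted(index.get(term.lower(), []))

# Hypothetical sample data: one line of slide text and one spoken sentence.
events = [
    (95.0, "OCR", "Q3 Budget Overview"),
    (312.5, "ASR", "so the budget for next quarter is unchanged"),
]
index = build_index(events)
hits = search(index, "Budget")
print(hits)
```

Each hit carries a timestamp, so a player could jump straight to the moment where the word appears rather than to the start of the video.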
Today, we’re excited to announce an important update to Smart Search. In the next couple of days, customers on the Panopto cloud will notice significant improvements in the speed of OCR indexing and in the quality of search results.
Near-instant indexing: Through updates to our OCR engine, we’ve dramatically improved the speed at which videos are indexed. In our tests, videos ranging in duration from one minute to 30 minutes were fully indexed during the encoding process. As a result, when the videos were ready for playback, they were also ready to be searched.
Improvements to OCR quality: In addition to speeding up the indexing process, we’ve also dramatically improved the quality of our indexing algorithm.
To provide a sense of the new algorithm’s accuracy, we created two tests. The first shows how well Panopto’s OCR handles text of gradually decreasing font size. On a 1920×1080 screen, character recognition was accurate down to, and including, 8-point font.
The second test shows the accuracy of Panopto’s OCR as contrast ratio decreases. In this case, the contrast ratio measures the difference in luminance between the text and the background.
You’d expect text recognition to work well when the text is black (RGB 0, 0, 0) and the background is white (RGB 255, 255, 255). As the text color gets lighter, however, the contrast ratio decreases. This makes it harder for OCR to accurately distinguish the text from the background.
In our test, we used 16-point font, which is the default size for desktop web browsers. We began with a contrast ratio of 21 (black text on a white background) and gradually decreased the contrast ratio to 1.7 (RGB 200, 200, 200 on a white background). As context, a contrast ratio of 1.7 falls far below the W3C’s Web Content Accessibility Guidelines (WCAG 2.0), which specify that the presentation of text have a contrast ratio of no less than 4.5:1.
Yet, even at this low contrast ratio, Panopto’s OCR engine was able to accurately recognize 100% of the text.
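For readers who want to check the contrast figures quoted above, the sketch below computes contrast ratio using the relative luminance formula defined in WCAG 2.0. The function names are our own for illustration, and this is not part of Panopto’s OCR engine.

```python
def _linearize(channel_8bit):
    """Convert one 8-bit sRGB channel to linear light, per WCAG 2.0."""
    c = channel_8bit / 255.0
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

def relative_luminance(rgb):
    """Relative luminance of an (R, G, B) color: 0 for black, 1 for white."""
    r, g, b = (_linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def contrast_ratio(foreground, background):
    """WCAG 2.0 contrast ratio, from 1 (identical colors) to 21."""
    l1 = relative_luminance(foreground)
    l2 = relative_luminance(background)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)

white = (255, 255, 255)
print(round(contrast_ratio((0, 0, 0), white), 1))        # 21.0
print(round(contrast_ratio((200, 200, 200), white), 1))  # 1.7
```

The two printed values match the endpoints of the test: 21 for black text on white, and roughly 1.7 for RGB 200, 200, 200 on white, well under the WCAG 2.0 minimum of 4.5:1.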
This article was originally published on Panopto’s blog.