
Will Businesses Drown in Data?

The Enterprise Content Management (ECM) space is all about capturing, storing, managing and preserving unstructured data such as documents, scans, video and audio. Many companies are currently struggling with the data flood that is heading right for them.

As the volume of unstructured data increases, the amount of storage available to preserve it all will decrease. Some of that data is redundant, obsolete or trivial (ROT) and provides no real business value to the company. Storing this ROT data is like throwing money into a wishing well. For the non-ROT data that you do store, you need to know its value and contents so you can effectively apply lifecycle governance to it.
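To make the ROT idea concrete, here is a minimal sketch of how such data might be flagged. It is an illustration only, not a feature of any ECM product: the `Doc` record, the `flag_rot` function and the age and size thresholds are all hypothetical, and a real governance policy would use far richer criteria.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Doc:
    name: str
    content: bytes
    age_days: int  # hypothetical field: days since the document was last used


def flag_rot(docs, max_age_days=365 * 7, min_size=16):
    """Return {name: reason} for documents that look redundant, obsolete or trivial.

    The thresholds are assumptions chosen for illustration only.
    """
    seen = {}   # content hash -> name of the first document with that content
    flags = {}
    for doc in docs:
        digest = hashlib.sha256(doc.content).hexdigest()
        if digest in seen:
            # Byte-for-byte identical to a document we already keep.
            flags[doc.name] = f"redundant (duplicate of {seen[digest]})"
            continue
        seen[digest] = doc.name
        if doc.age_days > max_age_days:
            flags[doc.name] = "obsolete"
        elif len(doc.content) < min_size:
            flags[doc.name] = "trivial"
    return flags
```

Anything the function does not flag is a candidate for retention, which is exactly the data whose value and contents still need to be determined.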

In the capture area of ECM, where documents are scanned or imported from the network, it has always been a challenge to identify documents correctly. This is mainly because documents were classified by their visual representation rather than by their content. Sure, we could use machine reading, like Optical Character Recognition (OCR), and scrape metadata values from the documents. But machine reading relies on predefined areas where specific metadata fields (like policy number, client number or address) are expected to be found.
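The limitation of predefined areas can be shown with a small sketch. It assumes the OCR output is already available as a list of text lines; the `ZONES` table, the field positions and the `extract_fields` function are hypothetical and do not reflect any specific capture product.

```python
# Hypothetical zone definitions: field name -> (line index, start column, end column)
ZONES = {
    "policy_number": (0, 15, 30),
    "client_number": (1, 15, 30),
}


def extract_fields(page_lines, zones=ZONES):
    """Read each field from its fixed position in the OCR text, ignoring content."""
    fields = {}
    for name, (row, start, end) in zones.items():
        line = page_lines[row] if row < len(page_lines) else ""
        fields[name] = line[start:end].strip()
    return fields


# A page that matches the expected layout is read correctly.
form = ["Policy number: PN-2015-001", "Client number: C-48213"]
print(extract_fields(form))

# The same content shifted down by one line yields wrong values,
# because the zones know nothing about what the text actually says.
letter = ["Dear Sir,", "Policy number: PN-2015-001", "Client number: C-48213"]
print(extract_fields(letter))
```

In the second call the policy number ends up in the client number field, which is the kind of error a human reviewer later has to correct.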

Cognitive Computing to the Rescue

"The future is Cognitive" is a much-heard phrase these days. And it is true: processing power and machine learning will sky-rocket IT into an era where better decisions can be made by computers. But when you look at the currently available technologies, it is clear that the cognitive era is already here. IBM has been a leader in this space for some years now, ever since IBM Watson won the game of Jeopardy! IBM Watson has stepped out of the laboratories and is being, or will be, incorporated in IBM products. One example is IBM Datacap Insight Edition, which IBM announced at Insight 2015.

This approach is very different from how humans read and classify documents. Take for instance a letter. With the above approach, the letter would be classified as correspondence and sent to a correspondence answering process in the organization. Once it reaches the first human in the process, he or she will read the document and determine that it concerns an inquiry about an insurance claim. Thus, the document is sent off to the insurance claims handling process, and so forth. What the reviewer does is look at the structure: the most important information in a letter is often in the second or third paragraph. A letter starts with the letterhead, address and salutation, then the introductory paragraph, followed by the real message in the paragraphs after that, and ends with the closing and signature.

So what makes IBM Datacap Insight Edition different? In the past, capture processes had to be extended with custom capabilities to integrate the extracted metadata values and assemble business objects from them. These objects and values are still reviewed by humans to correct errors in the classification of the document or to refine the metadata values.

Example of classification

IBM has looked at this process and improved its capture process by including analytics capabilities, bringing it closer to the human approach. But it is more than only analytics, otherwise it wouldn't be cognitive, would it? The analytics capabilities are used to "read" a document as humans would, as described above. Business objects such as sender and addressee are captured, of course. Additionally, natural language processing is used to interpret the paragraphs as a human would. The contents are analyzed and the document is classified. This way more facts about the document are captured automatically, for instance that it is a letter, that it is a complaint, that it is urgent, and that it contains confidential information like account numbers and product names. This information is stored and used by the cognitive component to provide a classification based on that context. And the best part? The system learns from previous classifications, so the quality will improve once the system has been running over a longer period.
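The idea of classifying by content and learning from earlier classifications can be illustrated with a toy sketch. This is not how IBM Datacap Insight Edition works internally; the `ContentClassifier` class, its word-count scoring and the label names are assumptions made purely for illustration, and real natural language processing is far more sophisticated.

```python
import re
from collections import Counter, defaultdict


class ContentClassifier:
    """Toy classifier: scores a text by word overlap with earlier labelled texts."""

    def __init__(self):
        # label -> how often each word was seen in documents with that label
        self.word_counts = defaultdict(Counter)

    @staticmethod
    def _tokens(text):
        return re.findall(r"[a-z]+", text.lower())

    def learn(self, text, label):
        """Record a confirmed (for example human-reviewed) classification."""
        self.word_counts[label].update(self._tokens(text))

    def classify(self, text):
        """Return the label whose known vocabulary best matches the text."""
        tokens = self._tokens(text)
        scores = {}
        for label, counts in self.word_counts.items():
            total = sum(counts.values())
            # Normalise so labels with more training text are not favoured.
            scores[label] = sum(counts[t] for t in tokens) / total
        return max(scores, key=scores.get) if scores else None


clf = ContentClassifier()
clf.learn("I want to file a claim for the damage to my car", "insurance_claim")
clf.learn("I am unhappy with your service and want to complain", "complaint")
print(clf.classify("Please process my claim for water damage"))
```

Every reviewed document passed to `learn` adds to what the classifier knows, which mirrors in a very simplified way why classification quality can improve the longer such a system runs.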

Cognitive Computing Will Disrupt ECM

This cognitive way of working will disrupt ECM, because it will provide faster and higher-quality classification. Before you can derive insights from the data, you need to know the value of the data. And once you know the value, you can act upon it in a meaningful way. Classifying the data by leveraging analytics and cognitive computing at the start of the document lifecycle, in the capture process, has shown a precision rate of 95-98% (see note 1). No human can match this precision.

The here and now is already cognitive; take, for example, the cases discussed by my colleague Ron Tolido. Organizations that use these technologies will derive insights from data, after first making sure that they are able to put a price tag on that data. They will maneuver through the waves and take control of where the wave is heading, managing that part of the data. Those who are fighting off the waves of data might spend a lot of time and effort analyzing data that has no business value. If you see the data flood coming and you don’t want to drown in a document disaster, catch the next wave and take off.

  • 1. Note: these figures come from proofs of concept performed by IBM at various clients.
  • Image courtesy of IBM, from client-facing presentations used at IBM Insight 2015.

Patrick van der Horst is a managing consultant at Capgemini. Read more Capgemini blogs here.


Sponsored by Capgemini

With more than 180,000 people in over 40 countries, Capgemini is a global leader in consulting, technology and outsourcing services. The Group reported 2015 global revenues of EUR 11.9 billion. Together with its clients, Capgemini creates and delivers business, technology and digital solutions that fit their needs, enabling them to achieve innovation and competitiveness. A deeply multicultural organization, Capgemini has developed its own way of working, the Collaborative Business Experience(TM), and draws on Rightshore®, its worldwide delivery model.