The Extraordinary Value of Dark Data

It seems as if everyone is talking about big data this week, with an infographic here and there making the rounds at CES. It makes sense: we are awash in data, the Twitter firehose runs 24/7/365, and the IoT is truly ramping up.

But is the only valuable data new data?

I think we all know that is not the case.

In fact, the opposite is probably true. Dark data holds far more value than the shiny new data.

To develop a more complete picture, one that is comprehensive and can subsequently be validated by more recent data, we need to look at what we already have.

We have written extensively on how new database technologies have revolutionized the collection of data, to the point that deciding whether or not to store something is as expensive as just storing it. The natural result is that data collection will continue unabated, if not accelerate.

Still, every business executive knows in their heart that, while they watch the streams coming in, there is value in what already exists inside their businesses.

They just don’t know how to get at it. They don’t have the right tools, techniques or approaches. This is often a function of the old adage, “in the right place at the right time.” When the data came in, did you have what you needed to extract maximum value?

If you didn’t, or if you tried and couldn’t find the value, did you move on to what you could address, or to whatever your tools did work with?

What was left behind had value, but it remained untapped. Treasure, buried in a chest with a hastily drawn map that was quickly forgotten if not lost.

But the value is still there and, frankly, has compounded as new data arrived. What has also compounded is the size and complexity of the challenge.

This is where techniques like machine intelligence are at their best. Topological data analysis and powerful machine learning algorithms can reveal the value and the secrets that exist in these massive stores of data.

This isn’t a theoretical post, but a real one. Our clients are doing it across multiple lines of business, and our collaborators are publishing groundbreaking work on the subject.

Take UCSF, for example. The team there worked with a 20-year-old study that cost north of $60M and was widely considered a failure.

The reason?

The technology to discern what was important in the data didn’t exist. That technology turned out to be ours, and the results were profound enough to land in Nature Communications and Fast Company.

There are petabytes of this dark data and it all has value.

Another example is from the Netherlands Cancer Institute. They assembled a rich genomic picture of breast cancer and made that data available. Hundreds of researchers studied the data for the better part of a decade. When Ayasdi applied TDA to the problem, an important new discovery was made, one that gave hope to patients who would otherwise have been expected to die. The answers were there; extracting them was the challenge.

Finally, our client and collaborator Mt. Sinai recently published some breakthrough work on Type 2 diabetes.

Here again, they had collected genomic, clinical and other types of data but struggled to create a coherent picture of the disease. With topological data analysis, they were able to “see” the groups clearly and use additional techniques to verify those discoveries. This has broad impacts on how to treat this epidemic.

All of these examples, and the dozens more that we don’t have room for or permission to share, speak directly to the value that exists inside our organizations today. In the past we didn’t have the techniques to extract it, and indeed, a decade from now we will look back on today’s tools and technologies with the same perspective.

What is clear, however, is that techniques like machine intelligence can transform how businesses operate, using what we already have in our possession.

Let’s get started.