The Science of Clinical Carepaths

Healthcare in America is becoming increasingly electronic and data-driven. 

Databases of Electronic Medical Records (EMRs) are rapidly replacing the traditional pen-and-paper practice, collecting years of diagnoses, lab test results, surgical records, doctor’s notes, etc. into a single searchable data store. Mining this wealth of data will enable care providers to deliver better, cheaper and more timely healthcare in the future.

To date, however, analytical techniques have been relatively rudimentary, and the rewards have been correspondingly modest.

The challenge can be summarized as follows:

How do we use our records of the past to inform us about how we should treat patients in the future?

This question is simple to ask but surprisingly difficult to answer.

One proposed operational solution to this problem is to utilize clinical carepaths. Carepaths are templates for delivering care during a particular clinical procedure; for instance, total knee replacement surgery or treatment for cancer.

Loosely speaking, carepaths provide step-by-step instructions for treating a patient during an encounter with the healthcare provider. Currently, the steps in the carepaths are determined by a panel of expert physicians, who consult the medical literature and their own experience to come up with a set of best practices. This approach is labor-intensive, prone to human bias, and inherently limited in how much information the panel can absorb. Ideally, one would want to look at all the records (typically tens or hundreds of thousands) of previous treatments for a condition, and determine from the data what works and what fails.

The scale and complexity of EMRs make this prohibitively difficult to do manually.

Even if a human could read 100,000 medical charts, they would not be able to synthesize the massive variation in treatment due to patient conditions and physician practices. Add to that the inherently noisy nature of medical data, with its many missing (null) values, and one quickly sees that traditional analytical approaches are rendered useless.

A method that can cut through the noise and variation, as well as infer the best series of steps, is necessary to solve the problem.

Topological Data Analysis (TDA), coupled with algorithms from computational biology and standard machine learning tools, provides just such a solution.

By constructing topological summaries of the space of treatments for a medical procedure, it is possible to get a handle on EMR data that has tens of thousands of features that vary over time. This compressed representation of the data allows accurate identification of groups of past treatments that led to good clinical outcomes.
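The text does not say which topological construction is used. One widely used TDA construction is Mapper, and the Python sketch below shows a stripped-down version of that idea purely as an illustration. It assumes each treatment has already been turned into a numeric feature vector; the names `mapper_graph`, `lens`, `n_intervals`, `overlap` and `eps` are hypothetical and chosen for this example only.

```python
from itertools import combinations

def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def single_linkage(points, idxs, eps):
    """Split idxs into groups; points within eps of each other are linked."""
    clusters = []
    unvisited = set(idxs)
    while unvisited:
        seed = unvisited.pop()
        component, frontier = {seed}, [seed]
        while frontier:
            i = frontier.pop()
            near = {j for j in unvisited if dist(points[i], points[j]) <= eps}
            unvisited -= near
            component |= near
            frontier.extend(near)
        clusters.append(frozenset(component))
    return clusters

def mapper_graph(points, lens, n_intervals=4, overlap=0.25, eps=1.0):
    """Build a Mapper-style summary graph (illustrative sketch only).

    1. Project every point to a number with the `lens` function.
    2. Cover the range of lens values with overlapping intervals.
    3. Cluster the points that fall in each interval; each cluster is a node.
    4. Connect two nodes whenever their clusters share a point.
    """
    values = [lens(p) for p in points]
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_intervals or 1.0  # fall back if all values are equal
    nodes = []
    for k in range(n_intervals):
        a = lo + k * width - overlap * width
        b = lo + (k + 1) * width + overlap * width
        idxs = [i for i, v in enumerate(values) if a <= v <= b]
        nodes.extend(single_linkage(points, idxs, eps))
    edges = {(i, j) for i, j in combinations(range(len(nodes)), 2)
             if nodes[i] & nodes[j]}
    return nodes, edges
```

Each node stands for a group of similar records, so a dataset with many thousands of rows collapses into a small graph whose branches and clusters can then be compared against clinical outcomes.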

Grouping medical treatments, which are highly complex series of events, was a previously unsolved problem. We were able to tackle it by blending state-of-the-art techniques from genomics with our in-house expertise in topological mathematics. Once the data has been segmented in this fashion, it is possible to adapt other methods from biology and signal processing to the problem of determining optimal outcomes. It is also possible to link predictive machine learning methods like regression or classification to perform real-time carepath editing. What this means is that any proposed carepath can immediately be optimized further based on the current situation as determined by the physician.
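The genomics techniques are not named here. As an assumption for illustration, one natural candidate is sequence alignment: treat each treatment as an ordered list of event codes and measure how many insertions, deletions or substitutions separate two treatments. The sketch below uses plain edit distance; the function names and event labels are hypothetical, and a real system would likely need weighted costs and timing information.

```python
def edit_distance(a, b):
    """Minimum number of insertions, deletions and substitutions
    needed to turn event sequence a into event sequence b."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]

def pairwise_distances(treatments):
    """Symmetric matrix of edit distances between all treatments."""
    n = len(treatments)
    matrix = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = edit_distance(treatments[i], treatments[j])
            matrix[i][j] = matrix[j][i] = d
    return matrix

# Hypothetical event sequences for three knee-replacement encounters.
treatments = [
    ["admit", "xray", "surgery", "physio", "discharge"],
    ["admit", "surgery", "physio", "discharge"],
    ["admit", "xray", "surgery", "infection", "antibiotics", "discharge"],
]
distances = pairwise_distances(treatments)
```

A distance matrix of this kind is what a clustering or topological method would consume in order to group similar treatments together.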

In this manner, algorithmic approaches can effectively side-step the problem of data complexity and size, letting caregivers work hands-on with their data and receive decision support backed by hundreds of thousands of impartial records rather than only their own experiences and biases.


More generally speaking, this framework can be thought of in terms of process optimization: given some process containing a series of complex actions, and records of previous processes, how can we find the optimal actions? It need not be restricted to constructing standard carepaths: if one is halfway through a series of steps, the method could be extended to suggest the next most appropriate action.
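To make the "next most appropriate action" idea concrete, here is a deliberately naive Python baseline. It is not the method described above: it simply looks up historical records that began with the same steps and picks the follow-up action with the best observed success rate. The record format, the function name `suggest_next` and the event labels are all assumptions made for this example.

```python
from collections import defaultdict

def suggest_next(records, prefix):
    """Suggest the next action after `prefix`.

    records: list of (steps, good_outcome) pairs, where steps is a list of
             action labels and good_outcome is a bool.
    Returns the action with the highest success rate among records that
    start with `prefix`, breaking ties by how often the action was seen.
    Returns None if no record continues past the prefix.
    """
    n = len(prefix)
    stats = defaultdict(lambda: [0, 0])  # action -> [good count, total count]
    for steps, good in records:
        if len(steps) > n and list(steps[:n]) == list(prefix):
            entry = stats[steps[n]]
            entry[0] += int(good)
            entry[1] += 1
    if not stats:
        return None
    return max(stats, key=lambda a: (stats[a][0] / stats[a][1], stats[a][1]))

# Hypothetical historical records.
records = [
    (["admit", "surgery", "physio"], True),
    (["admit", "surgery", "physio"], True),
    (["admit", "surgery", "discharge"], False),
    (["admit", "imaging"], True),
]
next_step = suggest_next(records, ["admit", "surgery"])
```

Exact prefix matching breaks down quickly on real data, where no two encounters are identical; that is the gap that grouping similar treatments is meant to close.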

The applications to healthcare are multifarious, and also extend to any domain where longitudinal records of a business process are kept (banking, retail, manufacturing, etc.).

Given enough data, it would also be possible to create true precision medicine.

One can use these TDA techniques to characterize the type of each individual patient, and then provide medical advice in accordance with their unique properties, in real time as the patient's condition changes during treatment.

Compared to the current state of medicine, which operates almost entirely on intuition and individual experience, the combination of big data with powerful algorithms and human creativity is set to propel healthcare far beyond its current ad-hoc practices.