Understanding the disease cycle: groundbreaking research from Stanford using TDA

One of the great rewards of our collaboration program is seeing groundbreaking work get published using our software. Stanford Microbiology, and in particular Dr. David Schneider and his lab, seems to achieve this every year.

To wit, they recently won a $6M grant from DARPA to continue their pioneering work in understanding disease recovery. Their work was so substantial that it was published in two separate PLOS papers. Both papers demonstrate how our underlying technology, Topological Data Analysis (TDA), was used to map the way hosts loop through disease space.

The work was performed using the unsupervised learning capabilities of our platform and is distinguished by its visualizations of disease maps built from cross-sectional data.

Currently, little or no data is collected while animals or humans recover from disease, so the potential business impact is significant, though it would vary by disease indication.

The Schneider lab used TDA to cluster data without imposing a connection structure such as a hierarchical pattern or least branching tree. Topological networks provide a striking representation of the health space that resembles the disease maps in which distinct regions of the networks correspond to distinct parts of the disease: comfort, sickness, and recovery.  Indeed, both the mouse and human datasets form these elegant, clearly defined looping structures.

Dr. Schneider’s team mapped the intensity of parameters such as parasites, RBCs, granulocytes, and reticulocytes, finding that the mouse and human infections are collinear in many respects, with the same order of events. The team used TDA graphs to separate the living and dying mice into two different paths and then determined how gene expression differed between the two groups. The researchers demonstrated that RBCs and reticulocytes differed in their representation in living and dying mice as their paths through disease space separated.

One interesting element is that TDA was the only technology that would have worked in this case.

TDA produced graphs that are more obviously looping because the topological networks pull gaps closed. The researchers visualized the “disease space” traversed by infected hosts and identified the different states of the infection process. Resilient systems cannot be fit by a tree and are better described by loops, but loops elude most analytical approaches.

TDA, on the other hand, is sensitive to the “shape” of the data and will not arbitrarily linearize a loop and then force it to fit a tree. Instead, the analysis simply clusters related data points, represented as nodes on a network graph, and the shape of that graph reveals the connections between the time points. In the case of a resilient system, such as hosts recovering from an infection, this graph forms a loop.
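To make the idea concrete, here is a minimal toy sketch of a Mapper-style construction, one common way of building a topological network. It is not the platform's API or the Schneider lab's pipeline: the function names, the choice of lens (the first coordinate), the single-linkage clustering, and all parameter values are assumptions chosen for illustration. The sketch covers the lens range with overlapping intervals, clusters the points inside each interval, links clusters that share points, and then counts independent loops in the resulting graph.

```python
import math
from itertools import combinations

def single_linkage(points, idx, eps):
    """Cluster the indices in idx: points closer than eps end up linked."""
    unvisited = set(idx)
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        cluster, frontier = {seed}, [seed]
        while frontier:
            i = frontier.pop()
            near = {j for j in unvisited
                    if math.dist(points[i], points[j]) < eps}
            unvisited -= near
            cluster |= near
            frontier.extend(near)
        clusters.append(cluster)
    return clusters

def mapper_graph(points, lens, n_intervals=6, overlap=0.3, eps=0.15):
    """Nodes are clusters of points; edges join clusters that share points."""
    lo, hi = min(lens), max(lens)
    width = (hi - lo) / ((n_intervals - 1) * (1 - overlap) + 1)
    step = width * (1 - overlap)
    tol = 1e-9  # guard against floating-point error at interval ends
    nodes = []
    for k in range(n_intervals):
        a = lo + k * step
        b = a + width
        idx = [i for i, v in enumerate(lens) if a - tol <= v <= b + tol]
        nodes.extend(single_linkage(points, idx, eps))
    edges = [(u, v) for u, v in combinations(range(len(nodes)), 2)
             if nodes[u] & nodes[v]]
    return nodes, edges

def count_loops(n_nodes, edges):
    """Independent cycles in a graph: edges - nodes + connected components."""
    parent = list(range(n_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        parent[find(u)] = find(v)
    components = len({find(x) for x in range(n_nodes)})
    return len(edges) - n_nodes + components

# Synthetic stand-in for a host that gets sick and recovers: a closed orbit
# in a plane of two hypothetical health parameters.
circle = [(math.cos(2 * math.pi * k / 120), math.sin(2 * math.pi * k / 120))
          for k in range(120)]
circle_nodes, circle_edges = mapper_graph(circle, lens=[p[0] for p in circle])
loops_circle = count_loops(len(circle_nodes), circle_edges)

# Contrast case: a one-way trajectory with no return to the starting state.
line = [(-1 + 2 * k / 119, 0.0) for k in range(120)]
line_nodes, line_edges = mapper_graph(line, lens=[p[0] for p in line])
loops_line = count_loops(len(line_nodes), line_edges)

print(loops_circle, loops_line)
```

With these toy inputs the closed orbit produces a network containing one loop and the one-way trajectory produces none. Real datasets are noisy and high-dimensional, so the lens, the cover, and the clustering threshold would all need to be chosen with care.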

Below are the author summaries for each paper, with a link to the actual paper.


Tracking Resilience to Infections by Mapping Disease Space


“When we get sick, we long for recovery; thus, a major goal of medicine is to promote resilience—the ability of a host to return to its original health following an infection. While in the laboratory we can study the response to infection with precise knowledge of inoculation time and dose, sick patients in the clinic do not have this information. This creates a problem because we can’t easily differentiate between patients who are early in the stages of infection that will develop severe disease from more disease-tolerant patients who present later in the infection. The distinction between these two types of patients is important, as the less disease-tolerant patient would require a more aggressive treatment regime. To determine where patients lie along the infection timeline, we charted “disease maps” that trace a patient’s route through “disease space.” We select symptoms that produce looping graphs as patients grow sick and recover. Using a mouse–malaria model, we demonstrate that less resilient individuals take wider loops through this space, representing a longer infection time with more severe symptoms. We find this looping behavior also applies to humans and suggest that people carrying the sickle cell trait are more resilient to malaria infections.”

How Many Parameters Does It Take to Describe Disease Tolerance?

“It is an intuitive assumption that the severity of symptoms suffered during an infection must be linked to pathogen loads. However, the dose–response relationship explaining how health varies with respect to pathogen load is non-linear and can be described as a “disease tolerance curve;” this relationship can vary in response to the genetic properties of the host or pathogen as well as environmental conditions. We studied what changes in the shape of this curve can teach us about the underlying circuitry of the immune response. Using a model system in which we infected fruit flies with the bacterial pathogen Listeria monocytogenes, we observed an S-shaped disease tolerance curve. This type of curve can be described by three or four parameters in a standard manner, which allowed us to develop a simple mathematical model to explain how the curve is expected to change shape as the immune response changes. After observing the variation in curve shape due to host and pathogen genetic variation, we conclude that the damage caused by Listeria infection does not result from an over-exuberant immune response but rather is caused more directly by the pathogen.”
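For readers who want a feel for what a four-parameter S-shaped curve looks like, here is a minimal sketch using a standard four-parameter logistic. The summary above does not give the authors' exact parameterization, so the function name, the parameter names (vigor, floor, midpoint, slope), and the example values are all assumptions made for illustration.

```python
import math

def tolerance_curve(log_load, vigor, floor, midpoint, slope):
    """Health as a four-parameter logistic function of log10 pathogen load.

    vigor    -- health at very low pathogen load (upper plateau)
    floor    -- health at very high pathogen load (lower plateau)
    midpoint -- log10 load at which health is halfway between the plateaus
    slope    -- steepness of the drop around the midpoint
    All four names are hypothetical labels, not terms from the paper.
    """
    return floor + (vigor - floor) / (1.0 + math.exp(slope * (log_load - midpoint)))

# Illustrative values only; they are not fitted to any real data.
params = dict(vigor=10.0, floor=1.0, midpoint=5.0, slope=1.5)
curve = [tolerance_curve(x, **params) for x in range(0, 11)]
health_at_midpoint = tolerance_curve(5.0, **params)

for log_load, health in zip(range(0, 11), curve):
    print(log_load, round(health, 2))
```

Changing one parameter at a time shows how the curve can change shape: lowering vigor drops the upper plateau, shifting midpoint moves the point at which health starts to collapse, and raising slope sharpens the transition.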