By Devi Ramanan
Technology startups wanting to “Change the World” have become something of a cliché in Silicon Valley. Some would say that this attitude reflects a naïve sense of optimism, but at Ayasdi, we not only want to change the world, we are doing it. Together with our collaborators. Incrementally. One scientific breakthrough at a time.
We are leveraging our AI platform, machine learning capabilities and software applications to provide transformative solutions to a wide variety of long-standing challenges, ranging from Polio/Refugees to Disease Recovery, Genomics, Spinal Cord Injury/Traumatic Brain Injury, Asthma, Diabetes, Oncology, Proteomics, Predicting Earthquakes, and Macroeconomic loops… to name a few. You can see our full list of peer-reviewed publications here — more than 24 top publications in the last year alone! Our scientific collaborators are thought leaders in their respective fields who, with access to our platform/SDK, are solving some of the world’s most complex and pressing challenges.
What is the key to enabling so many breakthroughs in such diverse disciplines? Our AI platform — leveraging Topological Data Analysis, or “TDA” — is the innovation behind the solutions to all of these challenges where traditional technologies have failed.
TDA gives us the ability to handle high-dimensional, unstructured or unlabeled data with weak signals, including data that arrives as a time series. Put another way, TDA lets us analyze tremendous amounts of data, and how that data evolves over time, to identify patterns and relationships that previously went unrecognized, and to find answers to questions no one knew to ask.
The very purpose of data analysis and data science is hypothesis generation and hypothesis refinement. Our platform supports hypothesis generation by giving subject matter experts a tool that is simple to use but delivers impressive scientific results.
As noted in our white paper, Understanding Ayasdi, we believe that for an application to be truly intelligent, it needs to meet several criteria. We call these criteria the five pillars of enterprise intelligence: Discover, Predict, Justify, Act, Learn. In this post, I will focus on the first three pillars – Discover, Predict, Justify – and how they have delivered a windfall of successful results for scientific researchers.
Why is discovery important? It is very difficult to confirm you are asking the “right” questions with any dataset, and this is especially true with complex datasets such as genomics data. In medical research applications, for example, Ayasdi’s approach does not require the development of extensive clinical hypotheses and can automatically map relevant patient subgroups based on advanced mathematical algorithms, providing researchers with answers to questions they didn’t even know to ask.
TDA uses unsupervised and semi-supervised machine learning to find patterns in data. U-BIOPRED used our artificial intelligence to discover a 1693-gene signature to meaningfully distinguish severe asthmatics from non-asthmatics and mild-to-moderate asthmatics. By segmenting the asthma population, the researchers hope to develop targeted treatments for patients who will respond to therapy. Such treatments have been effective in treating diseases that involve just a small number of genes, but it has been far more challenging to develop targeted medicines for conditions – like asthma – involving hundreds or thousands of genes. Ayasdi’s technology is well-suited to the challenge, given its ability to manage extremely high levels of complexity.
“Because asthma is a disease with a high variance in pathologies and is still not well understood, the ability to use the Ayasdi platform to drive unsupervised, multidimensional queries has been integral in accelerating our research,” said Dr. Timothy Hinks. “This progress has allowed our team to be less biased in generating hypotheses about the data. This has helped us focus on driving data-driven hypothesis that saves time and makes our work applicable to all healthcare workers treating asthma and similarly pathologically diverse diseases. Using Ayasdi, generating a network at an appropriate resolution to give significant insight takes only a few hours until insights can be gained. This gives a clear picture of the distinct groups of asthma we as clinicians see presenting to our severe asthma clinics, and will help with identifying subgroups for future clinical trials.”
This and other asthma research is published in the American Journal of Respiratory and Critical Care Medicine, and in three different JACI articles (here, here, and here). The four studies, taken together, are helping researchers begin to build a clinical and genomic profile of patients with severe asthma. This methodology is also applicable to other complex chronic diseases.
Evaluation of pathological heterogeneity using TDA with the aim of visualizing disease clusters and microclusters
TDA representation of hierarchical clustering of the severe asthma disease signature
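For readers who want a feel for how a topological network of this kind can be built, here is a minimal sketch of the Mapper construction from the TDA literature. It is an illustration under stated assumptions, not Ayasdi's implementation: the function names, the single-linkage clustering rule, and the parameters `n_intervals`, `overlap`, and `eps` are all hypothetical choices made for this example. The idea is to project the data through a "lens" function, cover the lens range with overlapping intervals, cluster the points that fall in each interval, and link any two clusters that share a point.

```python
from itertools import combinations
from math import cos, dist, pi, sin


def interval_cover(lo, hi, n_intervals, overlap):
    """Cover [lo, hi] with equal-length intervals; neighbors share `overlap` of their length."""
    length = (hi - lo) / (n_intervals - (n_intervals - 1) * overlap)
    step = length * (1 - overlap)
    cover = [(lo + i * step, lo + i * step + length) for i in range(n_intervals)]
    cover[-1] = (cover[-1][0], hi)  # guard against floating-point shortfall at the top end
    return cover


def single_linkage(indices, points, eps):
    """Group indices into connected components of the 'within eps of each other' graph."""
    remaining, clusters = set(indices), []
    while remaining:
        seed = remaining.pop()
        cluster, frontier = {seed}, [seed]
        while frontier:
            i = frontier.pop()
            near = {j for j in remaining if dist(points[i], points[j]) <= eps}
            remaining -= near
            cluster |= near
            frontier.extend(near)
        clusters.append(frozenset(cluster))
    return clusters


def mapper_graph(points, lens, n_intervals=4, overlap=0.5, eps=0.5):
    """Return (nodes, edges): nodes are sets of point indices, edges join overlapping nodes."""
    values = [lens(p) for p in points]
    nodes = []
    for a, b in interval_cover(min(values), max(values), n_intervals, overlap):
        members = [i for i, v in enumerate(values) if a <= v <= b]
        nodes.extend(single_linkage(members, points, eps))
    edges = {(u, v) for u, v in combinations(range(len(nodes)), 2) if nodes[u] & nodes[v]}
    return nodes, edges


# Toy data (made up for illustration): 24 evenly spaced points on the unit circle,
# viewed through the lens "x-coordinate".
circle = [(cos(2 * pi * k / 24), sin(2 * pi * k / 24)) for k in range(24)]
nodes, edges = mapper_graph(circle, lens=lambda p: p[0], n_intervals=4, overlap=0.5, eps=0.3)
print(len(nodes), "nodes,", len(edges), "edges")
```

On this toy circle the sketch produces six nodes joined in a single ring, so the loop in the data survives the compression. Real analyses differ mainly in the choice of lens, metric, and clustering method.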
Another advantage of TDA in the discovery phase is that it does not need labeled data but can readily incorporate it when available. In diabetes research, our collaborators at Mt. Sinai published a paper in Science Translational Medicine on their work using TDA to identify three previously unknown subgroups of type 2 diabetes (T2D) in patient-patient networks.
Mt. Sinai has a large database that pairs the genetic, clinical, and medical record data of over 30 thousand patients. In addition to genomic sequencing data, the database also includes information about each patient’s age, gender, height, weight, race, allergies, blood tests, diagnoses, and family history. The team used a precision medicine approach to characterize the complexity of T2D patient populations based on high-dimensional electronic medical records (EMRs) and genotype data from 11,210 individuals, including genetic markers and clinical data, such as blood levels and symptoms. They were able to uncover hidden patterns in large and complex datasets, enabling research institutions to expedite biomarker discovery, segment disease types, and target drug discovery.
Patient-patient network for topology patterns on T2D patients
Leveraging unsupervised and semi-supervised machine learning, we are also able to identify previously unrecognized patterns in data. Upgrade Capital, as part of the Code in Finance program, used Ayasdi to help investors better understand the current state of financial markets by identifying analogous past states. Such an exercise is very challenging in traditional risk analysis, which tends to rely heavily on dimensionality reduction. Using Ayasdi, the researcher was able to tie together four decades of macroeconomic and market data in a loop, representing the economic cycle. To confirm the validity of this interpretation, he identified distinct regimes such as expansion, contraction, and recovery, and then confirmed that they generally follow each other in a consistent order.
Economic Cycles: Data assembles into distinct regimes. The economic cycle runs clockwise.
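The consistency check described above, confirming that regimes tend to follow one another in a fixed order, can be sketched in a few lines. This is an illustrative assumption about how such a check might look, not the researcher's actual method. The regime labels below are invented toy data, and the cycle order (expansion, then contraction, then recovery) is assumed from the order in which the regimes are listed.

```python
from collections import Counter


def transition_counts(labels):
    """Count changes between consecutive, different regime labels."""
    return Counter((a, b) for a, b in zip(labels, labels[1:]) if a != b)


def fraction_in_cycle_order(labels, cycle):
    """Share of regime changes that move to the next regime in `cycle`."""
    successor = {r: cycle[(i + 1) % len(cycle)] for i, r in enumerate(cycle)}
    counts = transition_counts(labels)
    total = sum(counts.values())
    in_order = sum(n for (a, b), n in counts.items() if successor[a] == b)
    return in_order / total if total else 0.0


# Invented toy sequence of period-by-period regime labels (not real market data).
toy_labels = [
    "expansion", "expansion", "contraction", "recovery",
    "recovery", "expansion", "contraction", "expansion",
]
assumed_cycle = ["expansion", "contraction", "recovery"]
share = fraction_in_cycle_order(toy_labels, assumed_cycle)
print(f"{share:.0%} of regime changes follow the assumed cycle order")
```

A share close to 100% supports reading the loop as a cycle traversed in one direction, while a share near chance would suggest the regimes do not follow a consistent order.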
We generate relevant features for use in prediction tasks or find local patches of data where supervised algorithms may struggle. Why is prediction important? Our collaborators are not satisfied with discovery — they want to impact the world, and we help them predict the impacts of their discoveries.
TDA uses compressed representation to generate novel features, for example from time series datasets. Stanford microbiologists used our platform to predict whether an individual will recover from disease. Currently, there is little data collected when animals or humans recover from disease, so the potential business impact is significant and would vary by disease indication. This work was so substantial that it was published in two separate PLOS papers, and on the strength of these two papers, the team won a $6M grant from DARPA to continue their pioneering work in understanding disease recovery. Using the unsupervised learning capabilities of our platform, distinguished by visualizations of disease maps using cross-sectional data, they mapped the way hosts loop through the disease space. TDA was the only technology that would have worked in this case: it produced graphs that are more obviously looping because the topological networks close gaps in the data. Stanford visualized the “disease space” traversed by infected hosts and identified the different states of the infection process. Resilient systems cannot be mapped by a tree and are better described by loops, which elude most analytical approaches but not our platform.
Reconstructed human and mouse disease space maps from longitudinal and cross-sectional data
Using hierarchical models, TDA also discovers the structure of the data and builds distinct local models. In yet another groundbreaking example of our prediction capabilities, our collaborators at the University of Montana used Ayasdi to predict earthquakes, of all things! Using Ayasdi and time series analysis, they demonstrated that large earthquakes appear to synchronize globally, in the sense that they are organized in time, according to their renewal properties, and occur in groups in response to very low-stress interactions. Major quakes appeared to cluster in time, although not in space, and the number of large earthquakes seemed to peak at 32-year intervals. The earthquakes could be somehow ‘talking’ to each other, or an external force could be nudging the earth into rupture.
A topological network [Carlsson, 2009] relating earthquakes (nodes) with similar renewal interval and date of occurrence using a variance-normalized Euclidean metric on two real-valued derived measures of event properties: L-infinity centrality and Gaussian density.
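The three ingredients named in this caption can be sketched as follows. This assumes the definitions commonly given in the TDA literature: the variance-normalized Euclidean metric divides each squared coordinate difference by that column's variance, L-infinity centrality is a point's distance to the point farthest from it, and Gaussian density sums a Gaussian kernel over all points. The function names, the use of population variance, the width parameter `epsilon`, and the toy rows are assumptions made for illustration. They are not taken from the earthquake study.

```python
from math import exp, sqrt
from statistics import pvariance


def variance_normalized_distance(rows):
    """Return d(i, j): Euclidean distance with each column scaled by its variance.

    Assumes every column has non-zero variance (population variance is used here).
    """
    variances = [pvariance(col) for col in zip(*rows)]

    def d(i, j):
        return sqrt(sum((a - b) ** 2 / v for a, b, v in zip(rows[i], rows[j], variances)))

    return d


def linf_centrality(n, d):
    """For each point, the distance to the point farthest from it (low = central)."""
    return [max(d(i, j) for j in range(n)) for i in range(n)]


def gaussian_density(n, d, epsilon=1.0):
    """For each point, an unnormalized Gaussian kernel sum over all points (high = dense)."""
    return [sum(exp(-d(i, j) ** 2 / epsilon) for j in range(n)) for i in range(n)]


# Invented toy rows with two columns on very different scales.
toy_rows = [(0.0, 0.0), (1.0, 10.0), (2.0, 20.0)]
d = variance_normalized_distance(toy_rows)
centrality = linf_centrality(len(toy_rows), d)
density = gaussian_density(len(toy_rows), d)
print("centrality:", [round(c, 3) for c in centrality])
print("density:   ", [round(x, 3) for x in density])
```

Because each column is scaled by its own variance, the second column does not dominate the distance despite its larger raw values. The middle row comes out as the most central (lowest L-infinity centrality) and the densest.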
In another example of TDA’s use of compressed representation to generate novel features, UCSF used Ayasdi to discover insights into Spinal Cord Injury/Traumatic Brain Injury and Osteoarthritis. They verified that hypertension is a prognostic indicator of survival in Spinal Cord Injury / Traumatic Brain Injury — simply using drugs to lower hypertension immediately post-injury/pre-surgery could drastically improve outcomes. Using techniques designed for uncovering hidden relationships between large numbers of variables, UCSF retroactively mined old, “dark” data, discarded from an old clinical trial, potentially saving millions of dollars. In another Nature paper, UCSF used TDA to identify a unique diagnostic subgroup of patients with unfavorable outcome after mild TBI that were significantly predicted by the presence of specific genetic polymorphisms — hence TDA could provide a robust method for patient stratification and treatment planning targeting identified biomarkers in future clinical trials.
Behavioral outcomes of forelimb function and histopathology were mapped onto the topological network using TDA
Methodological work-flow for integrating diverse clinical TBI data
Why is justification important? For prediction to have value it must be able to justify and explain its assertions and diagnose failures. TDA uses local information to provide justification, with the quality of the groups dictating the quality of the justifications.
Our collaboration program has demonstrated extraordinary potential to better the human condition. SumAll.org used Ayasdi to perform a quick systemic cluster analysis of a polio vaccination campaign for Syrian children. The data was provided by HumanitarianTracker.org and was complex in that it contained patient, temporal and geographical data — but was fluid, given the study area was an active warzone. Our software was able to interact with all of the available feature classes at once and the results were then visualized in such a way that any interesting behavior in the data could be quickly identified. By understanding where children are most likely not reachable for follow-up doses, SumAll.org was able to work with Humanitarian Tracker to evaluate a strategy to ensure that all individuals were receiving their doses. For humanitarian aid organizations strapped for time and resources, ease and efficiency of reliable statistics and reporting are critical, and in cases like this, Ayasdi can significantly reduce the investment needed for analysis.
SumAll.org identified two statistically distinct groups where Not Reached Doses were the largest defining factor, with statistically significant districts within each group
In the field of materials science and engineering, EPFL researchers used our software to develop a pore recognition approach to quantify the similarity of pore structures and classify them in nanoporous materials, publishing in Nature. Quantifying similarity of pore structures allowed them not only to find structures geometrically similar to top-performing ones, but also to organize the set of materials with respect to the similarity of their pore shapes. This will be extremely useful to architects and construction companies.
Mapper plot of the best zeolites (top 1%) for methane storage
Columbia University published an analysis of the topology of viral evolution in PNAS, illustrating another example of TDA identifying relevant structure in real-world data that is invisible to classical techniques. Mathematical biology has traditionally represented evolutionary processes as a branching tree, modeling Darwin’s tree of life. That is, evolutionary networks have been depicted as tree-like, without any loops in the underlying networks that model time evolution. This paper shows such loops exist in real data, and that TDA is required to find and understand them. This is the first rigorous and systematic proof that such structures exist, and now it is clear that this is a ubiquitous phenomenon.
Linking algebraic topology to evolution
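The difference between a tree and a network with loops can be made concrete with a small calculation. For a graph, the number of independent loops (the first Betti number) equals edges minus nodes plus connected components, and a tree always scores zero. The sketch below is a simplified graph-level illustration, not the computation used in the paper. The function name and the toy edge lists are assumptions, and the formula as written expects a simple graph with no duplicate edges.

```python
def count_independent_loops(n_nodes, edges):
    """First Betti number of a simple graph: E - V + (number of connected components)."""
    parent = list(range(n_nodes))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    components = n_nodes
    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v
            components -= 1
    return len(edges) - n_nodes + components


# Toy example: a five-node branching tree, then the same tree with one
# extra link joining two separate branches.
tree_edges = [(0, 1), (0, 2), (1, 3), (1, 4)]
reticulate_edges = tree_edges + [(3, 2)]
print(count_independent_loops(5, tree_edges))        # a tree has no loops
print(count_independent_loops(5, reticulate_edges))  # one extra link creates one loop
```

Any method that forces the data into a tree reports zero loops by construction, which is why a loop-aware representation is needed to see this kind of structure at all.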
These are just a handful of examples in which our collaborators are leveraging TDA to change the world — one scientific breakthrough at a time. There are many more top publications documenting significant examples of these powerful collaborations, including Genomics/Oncology, Remote Sensing, Spectroscopy, Fragile X, and more.
This amazing collaboration between the scientific community and Ayasdi has been ongoing for a few years, and the resulting insights will continue to benefit science, medicine, and patients. Stay tuned for even more news in the coming year as our collaborators continue to change the world using our software.
To become a collaborator, email us at firstname.lastname@example.org.