We like to say we have the special sauce when it comes to using geometric shape to analyze complex datasets. Our secret ingredient, Topological Data Analysis (TDA), combines machine learning algorithms with topology to derive powerful insights.
Since topology is the foundation of our company, we started an annual tradition three years ago to celebrate it. “Topology Day” is a day when we invite mathematicians and computer scientists who are currently using applied topology to come and present what they’re working on.
We kicked off the day with a presentation by David Schneider, who specializes in microbiology and immunology at Stanford. David’s lab is trying to understand what makes a host resilient to infections by focusing on health; by this he means the host encounters a pathogen, gets sick and pretty much returns to its original state of health. Studying the entire timeline of the disease, from health through sickness and back to health, can give valuable insight into how to improve healthcare.
Next we welcomed Rob Ghrist, a professor in the Department of Mathematics at the University of Pennsylvania, who led us through a visually entertaining session on emerging mathematical tools relevant to TDA. Rob focused on how functoriality, drawing on functional analysis, homological algebra and applied topology, can transport information between related data sets. Sliding windows over time series put features related to the underlying dynamics and periodic processes into point-cloud form, in a more model-independent way than traditional techniques like Fourier analysis.
Following Rob, Leonidas Guibas, a professor of computer science at Stanford, explained how he is developing networks capable of transporting information between highly inter-correlated data sets, based on ideas inspired by functional analysis, homological algebra, and algebraic topology. He described a framework that encodes information as functions over the data and leads to linear operators for mapping between data sets, enabling the use of many powerful tools from linear algebra and optimization. The information transport and aggregation that such networks provide can help us clean up the maps, discover shared structure, and infer missing information in data, in other words to “see the unseen.”
Raul Rabadan, associate professor in the Department of Systems Biology and the Department of Biomedical Informatics at Columbia, presented a mathematical structure able to capture and represent large-scale properties of evolution. Persistent homology aims to extract global topological features from sequence data by constructing simplicial complexes that, at a particular scale of genetic distance, represent the relations between different genomes. He showed that there exist topological obstructions to the use of phylogeny for certain genomic datasets. This is another example of TDA seeing relevant structure in real-world data that is invisible to classical techniques. Mathematical biology has modeled evolution following Darwin’s tree of life. That is, the evolutionary networks are assumed to be tree-like, without any loops in the underlying networks that model evolution over time. His work shows that such loops exist in real data, and that TDA is required to find and understand them. This is the first rigorous and systematic demonstration that such structures exist, and it is now clear that this is a ubiquitous phenomenon.
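To make the idea of a "loop that a tree cannot explain" concrete, here is a minimal sketch in plain Python. It is not the pipeline from Raul's talk: it looks at a single distance scale rather than tracking features across all scales as persistent homology does, and the two 4-genome distance matrices (`tree_like`, `with_loop`) are made-up toy data. The helper names `gf2_rank` and `betti_1` are our own. The sketch builds the Vietoris-Rips complex at scale `eps` and counts its independent loops (the first Betti number, computed mod 2).

```python
from itertools import combinations


def gf2_rank(vectors):
    """Rank over GF(2) of vectors encoded as integer bitmasks."""
    basis = {}  # highest set bit -> reduced vector with that leading bit
    for v in vectors:
        while v:
            high = v.bit_length() - 1
            if high not in basis:
                basis[high] = v
                break
            v ^= basis[high]  # cancel the leading bit and keep reducing
    return len(basis)


def betti_1(dist, eps):
    """Number of independent loops (mod 2) in the Rips complex at scale eps."""
    n = len(dist)
    # An edge joins two genomes whose distance is at most eps.
    edges = [(i, j) for i, j in combinations(range(n), 2) if dist[i][j] <= eps]
    edge_index = {edge: k for k, edge in enumerate(edges)}
    # Boundary of an edge = its two endpoints (bitmask over vertices).
    edge_vectors = [(1 << i) | (1 << j) for i, j in edges]
    # A triangle is filled in when all three of its edges are present;
    # its boundary is the set of those three edges (bitmask over edges).
    triangle_vectors = [
        (1 << edge_index[(i, j)])
        | (1 << edge_index[(i, k)])
        | (1 << edge_index[(j, k)])
        for i, j, k in combinations(range(n), 3)
        if (i, j) in edge_index and (i, k) in edge_index and (j, k) in edge_index
    ]
    cycles = len(edges) - gf2_rank(edge_vectors)  # dimension of the cycle space
    return cycles - gf2_rank(triangle_vectors)    # minus cycles filled by triangles


# Toy data: distances along a path 0-1-2-3 (a tree metric) ...
tree_like = [[0, 1, 2, 3], [1, 0, 1, 2], [2, 1, 0, 1], [3, 2, 1, 0]]
# ... and distances around a 4-cycle 0-1-2-3-0 (not tree-like).
with_loop = [[0, 1, 2, 1], [1, 0, 1, 2], [2, 1, 0, 1], [1, 2, 1, 0]]

for eps in (1, 2, 3):
    print(eps, betti_1(tree_like, eps), betti_1(with_loop, eps))
```

In this toy example the tree-like distances never produce a loop at any of the scales tried, while the second matrix produces one at scale 1. That single-scale check is only an illustration of the kind of obstruction described above.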
We concluded Ayasdi’s Topology Day event with Jose Perea, an active researcher in the area of computational topology and topological data analysis. Jose spoke about sliding windows, or time-delay embeddings, that are widely used in representations of time series data. These windows put into point-cloud form features related to underlying dynamics, periodic processes and extremal events. He showed how persistent homology can be used to analyze such point-clouds, both theoretically and practically, and presented some applications to periodicity quantification in gene expression time series data.
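For readers who have not met sliding windows before, the following is a minimal sketch in plain Python, not Jose's own code. The function name `sliding_window` and the parameters `dim` (number of coordinates per window) and `tau` (delay between coordinates, in samples) are our own choices for illustration. It turns a time series into a point cloud and shows that a periodic signal traces out a closed loop, which is the kind of shape persistent homology can then detect.

```python
import math


def sliding_window(series, dim, tau):
    """Map a series to points (x[t], x[t + tau], ..., x[t + (dim - 1) * tau])."""
    n_windows = len(series) - (dim - 1) * tau
    return [
        [series[t + k * tau] for k in range(dim)]
        for t in range(n_windows)
    ]


# A sine wave sampled 20 times per period, embedded in the plane with a
# delay of a quarter period: each window is (sin(theta), cos(theta)).
period = 20
series = [math.sin(2 * math.pi * t / period) for t in range(100)]
cloud = sliding_window(series, dim=2, tau=period // 4)

# Every point lies on the unit circle, so the cloud is a loop.
radii = [math.hypot(x, y) for x, y in cloud]
print(len(cloud), min(radii), max(radii))
```

A noisy or non-periodic series would give a cloud with no clear loop, and how strongly the loop shows up is what a periodicity score based on persistent homology would measure.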
Check back soon, as we will be uploading videos of some of the talks. Some of the presenters are still submitting papers and did not want their presentations recorded, but we will post the others.