The Shape of the Genome: going beyond the double helix with Topological Data Analysis

One of the more iconic shapes in science, or perhaps of the 20th century as a whole, is the double helix.

The shape of DNA, discovered in 1953 by Francis Crick and James Watson (with due credit to Maurice Wilkins and Rosalind Franklin), was a critical milestone in understanding the human genome and served as the starting point for some extraordinary research.

Of note from our perspective is the number of collaborators who are working on “omics” datasets using topological data analysis (TDA). In this post, we cite a few recent publications where TDA delivered breakthrough findings.

Our collaborators at Columbia just published a paper in Nature Genetics entitled “Spatiotemporal genomic architecture informs precision oncology in glioblastoma.” The paper looks at genomic and expression profiles from 52 individuals with glioblastoma (GBM) using our software. They found that samples from the same tumor mass share genomic and expression signatures, whereas geographically separated, multifocal tumors and/or long-term recurrent tumors are seeded from different clones. These findings can inform targeted therapeutic interventions for patients with GBM, potentially enabling precision medicine therapies in cancer through genomic characterization.

UCSF published a paper a few weeks ago in PLOS One entitled “Uncovering Precision Phenotype Biomarker Associations in Traumatic Brain Injury Using Topological Data Analysis.” Here again, the team used TDA to reveal data-driven patterns in patient outcomes and to identify potential biomarkers of recovery. These biomarkers could significantly predict patient recovery after a traumatic brain injury when evaluated with more traditional univariate statistical tests. This is the second publication emanating from our collaboration with UCSF. These analyses may provide a robust method for patient stratification and for treatment planning that targets the identified biomarkers in future clinical trials in TBI patients.

The Stowers Institute for Medical Research published a paper last month entitled “Identification of Topological Network Modules in Perturbed Protein Interaction Networks” in Nature Scientific Reports. This is the third publication emanating from our collaboration with Stowers, following a PLOS paper and an EMBO Reports paper. Biological networks consist of functional modules, but identifying and characterizing these modules is difficult. The Stowers scientists identified topological network modules made up of proteins with shared properties that were found in particular locations in networks. The airport analogy from first author Mihaela Sardiu is highly relevant: “Think of a protein as an airport in a hub and spoke system. The system works one way in its regular state. But what happens when a snow storm shuts down a major hub? A portion of the network is affected. A change in one part of the network impacts not just that component but surrounding ones, too.” As more perturbed protein interaction networks become available, analyzing these datasets with advanced mathematical tools like ours will likely provide new insights into the study of diseases like cancer and other human genetic disorders, where protein interaction networks are altered by chemotherapy or by the disease itself. By studying the proteins and their environment, researchers hope to gain insight into a wide range of biological functions, including drug resistance and the mutations that cause cancer.

The work we have done in this area stretches back years.

For example, in 2016, U-BIOPRED (Unbiased BIOmarkers in PREDiction of respiratory disease outcomes), a European medical research consortium, used our software to discover a 1700-gene signature that meaningfully distinguishes severe asthmatics from non-asthmatics and mild-to-moderate asthmatics. By segmenting the asthma population, the researchers hope to develop targeted treatments for patients who will respond to therapy. Such treatments have been effective in treating diseases that involve just a small number of genes. It has been far more challenging to develop targeted medicines for conditions involving hundreds or thousands of genes, like asthma. The research, published in the American Journal of Respiratory and Critical Care Medicine, included 610 patients at 16 sites in 11 European countries, supplementing two previous papers published in the Journal of Allergy and Clinical Immunology.

For several years now, Stanford’s David Schneider has used TDA in the complicated field of disease recovery. In a paper published in 2016, the team used our software to reproduce the circular trajectories that mice and humans infected with the malaria parasite trace through transcription phase space as they go from a healthy state, to a sick state, and back to a healthy state.

In 2015, a team at Mt. Sinai explored the phenotypic space of 11,210 type 2 diabetes patients using Ayasdi’s software and identified 3 previously unreported subgroups of patients with distinct genetic and disease associations. The result made mainstream news: it was picked up by Fast Company and made the cover of Science Translational Medicine.

There are dozens of papers that precede 2015, but a few are worth noting. One is “The Topology of Viral Evolution,” where the team from Columbia proposes using the persistent homology of genetic phase spaces to study reticulate evolution, and applies this idea to viral reassortment and recombination. Call it a “Revolution in Evolution.” This is another example of TDA visualizing relevant structure in real-world data that is invisible to classical techniques. Mathematical biology has modeled evolution following Darwin’s tree of life. That is, evolutionary networks are assumed to be tree-like, without any loops in the underlying networks that model evolution over time. This paper shows that such loops exist in real data, and that TDA is required to find and understand them. It is the first rigorous and systematic demonstration that such structures exist, and it is now clear that this is a ubiquitous phenomenon.
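To make the "tree versus loop" distinction concrete, here is a minimal sketch in Python. It is not the method from the paper, which applies persistent homology to genetic distance data. It only counts independent loops in a small graph (the graph's first Betti number, edges minus nodes plus connected components), which is the simplest setting in which a tree and a reticulate network differ. The function name `cycle_rank` and the toy phylogeny are assumptions made up for illustration, not real data.

```python
def cycle_rank(num_nodes, edges):
    """Count independent loops in an undirected graph.

    Returns the first Betti number E - V + C, where C is the number of
    connected components. A tree (or forest) gives 0; each extra
    "cross-link" between lineages adds 1.
    """
    # Union-find structure to count connected components.
    parent = list(range(num_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    components = num_nodes
    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v
            components -= 1

    return len(edges) - num_nodes + components


# Hypothetical toy phylogeny: node 0 is the common ancestor,
# nodes 3, 4 and 5 are sampled strains.
tree_edges = [(0, 1), (0, 2), (1, 3), (1, 4), (2, 5)]

# Same phylogeny plus one reassortment event: strain 4 also
# inherits material from lineage 2, which closes a loop.
reticulate_edges = tree_edges + [(2, 4)]

tree_loops = cycle_rank(6, tree_edges)
reticulate_loops = cycle_rank(6, reticulate_edges)

print("loops in tree:", tree_loops)
print("loops with reassortment:", reticulate_loops)
```

In this toy setting the loop count is read directly off a known graph. With real sequence data the network is not known in advance, which is where persistent homology comes in: it looks for loops that persist across a range of genetic distance scales.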

Finally, a paper from Nicolau et al., published in PNAS in 2011, detailed how the team used TDA to build low-dimensional topological representations of the transcription phase space of breast cancer tumors, identifying a previously unreported group of patients with excellent prognosis and distinctive molecular signatures.

If you are interested, you can find a recent review paper in Current Opinion in Systems Biology by our Columbia collaborators that cites these breakthrough papers.

And we have more top papers coming in the exciting field of genetics! Stay tuned.