Unbiased Disease Stratification within the IMI U-BIOPRED Severe Asthma Program Using Topological Data Analysis


1. Centre for Proteomic Research, University of Southampton, UK.

2. NIHR Southampton Respiratory Biomedical Research Unit, Southampton University Hospital, UK.

3. Ayasdi Inc, USA.

4. Imperial College, London, UK.

5. Janssen R&D, UK 

One of the great things about our collaboration program is seeing the impact it makes across a number of different industries. Our strong roots in biotech and life sciences stem from the fact that these industries immediately grasped the power of topological data analysis (TDA) and how it could dramatically accelerate insight discovery, saving lives in the process.

Over the past year, we have been collaborating with U-BIOPRED on finding insights into severe asthma. The U-BIOPRED consortium, an IMI/EU-funded project bringing together 20 academic institutions, 12 biopharma industry partners, 3 SMEs, and 6 patient organizations, is working collaboratively, using Ayasdi Core, to improve understanding of severe asthma, benefitting science, medicine and patients alike.

While the work is ongoing and still in a preliminary exploratory phase, the team at U-BIOPRED shared the data from two posters presented earlier this year at the PRISME and American Society for Mass Spectrometry conferences.

  • Asthma is a collective clinical definition of a number of inflammatory respiratory syndromes that are poorly defined at the molecular level. To address this lack of molecular knowledge, U-BIOPRED put together the largest-ever study cohort for asthma (over 1,000 participants). Several sample types were collected from participants (e.g., urine, serum, induced sputum, bronchial biopsy), and multiple ‘omics technologies are being used to analyze these samples (e.g., proteomics, transcriptomics, metabolomics, breathomics via e-nose).
  • Complexity in the study cohort (e.g., participants on a combination of medications, with varying co-morbidities) and the different analysis and data types necessitated the development of multiple data analysis pipelines to mine these complex datasets. These pipelines are part of a central process framework that U-BIOPRED refers to as their ‘analytical toolbox’, and TDA via Ayasdi Core is a key component of that toolbox.


TDA generates topological networks that allow the scientist to explore, condense, visualize and extract useful information from these complex and multi-modal data. One of the remarkable features of TDA is that while you explore a summary of the data’s relationships, no information is lost in that summary. All of the underlying data remains available.
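To make the idea of a topological network concrete, here is a minimal sketch in the spirit of the Mapper construction from the TDA literature. It is not Ayasdi Core's implementation; the function names (`mapper`, `single_linkage`), the `lens` argument, the parameters and the toy data are all assumptions made for illustration. Rows are binned by overlapping ranges of a lens value, clustered within each bin, and linked wherever two clusters share rows. Each node keeps the indices of the rows it summarizes, which is the sense in which the underlying data stays available.

```python
from itertools import combinations
from math import dist


def single_linkage(indices, points, eps):
    """Split indices into groups whose members are chained by distances <= eps."""
    remaining = set(indices)
    clusters = []
    while remaining:
        seed = min(remaining)  # deterministic choice of starting point
        remaining.remove(seed)
        cluster, frontier = {seed}, [seed]
        while frontier:
            i = frontier.pop()
            near = {j for j in remaining if dist(points[i], points[j]) <= eps}
            remaining -= near
            cluster |= near
            frontier.extend(near)
        clusters.append(frozenset(cluster))
    return clusters


def mapper(points, lens, n_intervals=4, overlap=0.25, eps=1.0):
    """Return (nodes, edges); each node is the frozenset of row indices it summarizes."""
    values = [lens(p) for p in points]
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_intervals
    nodes = []
    for k in range(n_intervals):
        # Overlapping interval of lens values, so adjacent intervals share rows.
        a = lo + (k - overlap) * width
        b = lo + (k + 1 + overlap) * width
        members = [i for i, v in enumerate(values) if a <= v <= b]
        nodes.extend(single_linkage(members, points, eps))
    # Two nodes are linked when they contain at least one common row.
    edges = [(u, v) for u, v in combinations(range(len(nodes)), 2)
             if nodes[u] & nodes[v]]
    return nodes, edges


# Toy data (hypothetical): two separate horizontal strands of points.
points = [(float(i), 0.0) for i in range(10)] + [(float(i), 10.0) for i in range(4)]
nodes, edges = mapper(points, lens=lambda p: p[0])
for n, rows in enumerate(nodes):
    print(n, sorted(rows))
print(edges)
```

On this toy input the two strands come out as two disconnected chains of nodes, and every row index can still be looked up from the node that contains it.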

U-BIOPRED has used Ayasdi’s platform to combine and analyze a preliminary subset of the proteomic data produced in their study, highlighting the utility of the TDA approach both for exploring the biology of the data and as an unbiased feature selection tool.

Data were analyzed via a workflow incorporating a variety of approaches, including classical statistics. Data were stratified in an unbiased manner using unsupervised class selection (e.g., TDA), followed by feature selection and machine learning (e.g., Support Vector Machines (SVMs)).

The key takeaway was that classical statistics using clinical definitions as class labels (severe to mild asthma) identified significant differential abundance in only a small number of proteins, particularly for serum. Stand-alone machine learning techniques using the same clinical definitions as class labels also produced poorly performing models.

Figure 3, Poster 2

TDA, however, allowed for unbiased exploration of these proteomic datasets in combination with clinical and experimental metadata, either as separate analyses or combined and analyzed together.

Figure 3

Using Core to analyze this preliminary dataset, U-BIOPRED was able to identify potential sub-groupings in asthma, with specific group assignment guided by persistence of structure and contrasting clinical metadata. Many of these groupings persist when alternative analysis approaches (e.g., Artificial Neural Networks) are used, adding confidence to the observations. The subgroups are defined by specific sets of proteins, clinical variables, biological pathways and functional enrichment. The topological groups were subsequently used as class labels in supervised machine learning approaches, resulting in improved classification performance and better predictive models than cohort information alone.
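The step of reusing data-driven groups as class labels can be illustrated with a small, self-contained sketch. Everything here is an assumption for illustration: the data are synthetic, a tiny 2-means clustering (`two_means`) stands in for the TDA grouping, and a nearest-centroid classifier scored by leave-one-out accuracy (`loo_accuracy`) stands in for the SVM and feature selection used in the actual workflow. None of it reproduces the U-BIOPRED analysis or its results.

```python
import random
from math import dist
from statistics import fmean


def centroid(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    return [fmean(col) for col in zip(*rows)]


def two_means(X, n_iter=20):
    """Tiny 2-means clustering; a stand-in for the unsupervised grouping step."""
    c0 = X[0]
    c1 = max(X, key=lambda x: dist(x, c0))  # farthest-first initialisation
    labels = []
    for _ in range(n_iter):
        labels = [0 if dist(x, c0) <= dist(x, c1) else 1 for x in X]
        c0 = centroid([x for x, lab in zip(X, labels) if lab == 0])
        c1 = centroid([x for x, lab in zip(X, labels) if lab == 1])
    return labels


def loo_accuracy(X, y):
    """Leave-one-out accuracy of a nearest-centroid classifier."""
    hits = 0
    for i, (x, target) in enumerate(zip(X, y)):
        train = [(xj, yj) for j, (xj, yj) in enumerate(zip(X, y)) if j != i]
        cents = {c: centroid([xj for xj, yj in train if yj == c]) for c in set(y)}
        pred = min(cents, key=lambda c: dist(x, cents[c]))
        hits += pred == target
    return hits / len(X)


# Synthetic "protein abundance" table (hypothetical): a hidden subgroup shifts
# the first two features, while the clinical label is unrelated to that subgroup.
rng = random.Random(0)
X, clinical = [], []
for i in range(40):
    hidden = i % 2
    clinical.append((i // 2) % 2)
    X.append([rng.gauss(10.0 * hidden, 1.0) for _ in range(2)]
             + [rng.gauss(0.0, 1.0) for _ in range(2)])

derived = two_means(X)  # data-driven groups, used below as class labels
acc_clinical = loo_accuracy(X, clinical)
acc_derived = loo_accuracy(X, derived)
print(f"clinical labels: {acc_clinical:.2f}, derived labels: {acc_derived:.2f}")
```

Labels derived from the data are by construction easier to predict from that same data, so a comparison like this only shows that the groups are internally consistent. Whether they are biologically meaningful still has to be checked against independent evidence, such as the clinical metadata and alternative methods mentioned above.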

Figure 5

This application is an excellent example of TDA in action, finding critical insights that eluded researchers due to the complexity of the data. Further, once those insights had been gleaned, those same researchers were able to leverage more traditional techniques to verify and occasionally extend the initial findings.

We look forward to seeing these results published in multiple scientific manuscripts in peer-reviewed journals.