Poster presentations with our collaborators at the upcoming ASM meeting in San Francisco
Topological Data Analysis of Escherichia coli O157:H7 Survival in Conventional and Organic Soils
Poster Title: Topological Data Analysis of Escherichia coli O157:H7 Survival in Conventional and Organic Soils
A. Mark Ibekwe1,*, Jincai Ma1,2, David E. Crowley2, Ching-Hong Yang3, Alexis M. Johnson4, Pek Y. Lum4
1 USDA-ARS U. S. Salinity Laboratory, Riverside, CA 92507
2 Department of Environmental Sciences, University of California, Riverside, CA 92521
3 Department of Biological Sciences, University of Wisconsin, Milwaukee, WI 53211
4 Ayasdi, Inc., Palo Alto, CA
Corresponding author:
A. Mark Ibekwe
E-mail: Mark.Ibekwe@ars.usda.gov
Shiga toxin-producing E. coli O157:H7 has been implicated in many foodborne illnesses caused by the consumption of contaminated fresh produce. However, data on its persistence in major fresh produce-growing soils are limited because of the complexity of the datasets generated from different environmental variables and bacterial taxa. There is a continuing need to distinguish the various environmental variables and bacterial groups in order to understand the relationships between these factors and pathogen survival. Using the Ayasdi Iris platform, which employs Topological Data Analysis (TDA) methods, we reconstructed the relationship structure of E. coli O157:H7 survival in 32 soils (16 organic, 16 conventional) from California (CA) and Arizona (AZ) with a multi-resolution output. Our goal was to correlate the survival time of E. coli O157:H7 in soils with soil properties and with 16S rRNA 454 pyrosequencing-based bacterial community composition. Results showed that the longest survival time, ttd (detection limit of 100 CFU g⁻¹ dry soil), of E. coli O157:H7 was observed in the soils from northern CA and in organic soils from AZ. E. coli O157:H7 survival in soils was negatively correlated with electrical conductivity (EC) and positively correlated with water soluble organic carbon (WSOC) and total nitrogen (T-N). A laboratory experiment with soils spiked with increasing salt concentrations confirmed that the concentration of Na+ in soil water extracts negatively affected ttd (P < 0.001). Bacterial diversity, as determined by the Shannon diversity index, had no significant effect on ttd (P = 0.635), but individual bacterial phyla had different effects. The survival of E. coli O157:H7 was significantly enhanced by Actinobacteria (P < 0.001) and Acidobacteria (P < 0.05), and significantly suppressed by Proteobacteria and Bacteroidetes (P < 0.05). Our data showed a complex interaction among E. coli O157:H7, soil microbiota, and soil properties in the survival of this pathogen in the soils studied. Therefore, good agricultural practices must be followed during pre-harvest operations to prevent the introduction of E. coli O157:H7 into produce-growing soils and to reduce the potential public health impact and economic losses associated with foodborne outbreaks.
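For readers unfamiliar with the Shannon diversity index mentioned in the abstract above, the following is a minimal sketch of the standard formula, H = -sum(p_i * ln p_i), where p_i is the relative abundance of taxon i. The function name and the example counts are hypothetical and are not taken from the poster; the abstract does not state how the authors computed the index or which logarithm base they used.

```python
from math import log


def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln(p_i)) over taxa with nonzero counts.

    `counts` is a sequence of per-taxon read counts (hypothetical input;
    the natural logarithm is an assumption, other bases are also in use).
    """
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one positive value")
    # Taxa with zero reads contribute nothing to the sum.
    return -sum((c / total) * log(c / total) for c in counts if c > 0)


if __name__ == "__main__":
    # Made-up read counts for four taxa in one soil sample.
    print(shannon_index([40, 30, 20, 10]))
```

A community dominated by a single taxon gives a value near zero, while an even community of n taxa gives ln(n).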
Comparative Analysis of Feature Frequency Profiles and Single Nucleotide Polymorphisms for Determining Escherichia coli and Shigella Strain Relationships Based on Topological Data Analysis
Poster Title: Comparative Analysis of Feature Frequency Profiles and Single Nucleotide Polymorphisms for Determining Escherichia coli and Shigella Strain Relationships Based on Topological Data Analysis
J. L. LEWIS1, M. K. MAMMEL1, D. W. LACHER1, A. JOHNSON2, P. Y. LUM2, and C. A. ELKINS1.
1US FDA, CFSAN, Laurel, Maryland; 2Ayasdi Inc., Palo Alto, CA
Background: The Centers for Disease Control and Prevention estimates 110,000 cases of enterohemorrhagic Escherichia coli infection occur annually in the United States, while 14,000 cases of shigellosis are reported. There is a continuing need to be able to distinguish the various serotypes and pathotypes of E. coli and Shigella in order to understand the relationships among these strains collected during food-associated outbreaks. We compared the performance of single nucleotide polymorphism (SNP) and feature frequency profile (FFP) methods in determining strain relatedness.
Methods: Alignment of 2583 core genes present in 159 sequenced genomes of E. coli and Shigella yielded 106,323 SNPs. FFPs of these genomes were determined based on complete nucleotide alphabet (ACGT) 15-mers and reduced alphabet (purine/pyrimidine) 24-mers counted in sequences 0 to 3 and 1 to 3 times, respectively. A third approach, k-SNP, which uses 25-mers that contain a SNP at the central base position, was also used. Using topological data analysis (TDA) software, we interactively visualized the similarity relationships among the 159 genomes.
Results: Distinct groups identified by the software included ECOR groups A, B1, B2, D, and E. The traditional SNP and k-SNP methods gave nearly identical relationships among the ECOR groups. In contrast, the relationships determined by the FFP method were less distinct, especially for strains belonging to the ECOR B1 and E groups. Adjusting resolution and gain parameters of the TDA software produced a progressively fragmented view of the data, allowing for the construction of a phylogenetic tree.
Conclusions: We have compared FFP and SNP methods for determining strain relatedness and used TDA with a multiresolution output to visualize the phylogenetic relationships. Although FFP and k-SNP save time by eliminating the need for constructing gene alignments, traditional SNP methods show the best performance in determining relationship structure both within and among ECOR groups.
Keywords: Escherichia coli, Shigella, phylogeny, Single Nucleotide Polymorphism.
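The feature frequency profiles described in the Methods above can be illustrated with a short sketch. It counts k-mers in a sequence, optionally after recoding to the reduced purine/pyrimidine alphabet (A, G to R; C, T to Y), keeps only features whose counts fall in a chosen range, and normalizes the counts to frequencies. All function names and the count-filter parameters are hypothetical. Comparing two profiles with the Jensen-Shannon divergence is an assumption made here for illustration; the abstract does not say which distance was used, and this is not the authors' implementation.

```python
from collections import Counter
from math import log2

# Reduced alphabet: purines (A, G) -> R, pyrimidines (C, T) -> Y.
RY_TABLE = str.maketrans("ACGT", "RYRY")


def ffp(seq, k, reduced=False, min_count=1, max_count=None):
    """Return a feature frequency profile: k-mer -> relative frequency.

    Only k-mers whose raw count lies in [min_count, max_count] are kept
    (hypothetical filter parameters; max_count=None means no upper bound).
    """
    seq = seq.upper()
    if reduced:
        seq = seq.translate(RY_TABLE)
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kept = {
        kmer: c for kmer, c in counts.items()
        if c >= min_count and (max_count is None or c <= max_count)
    }
    total = sum(kept.values())
    return {kmer: c / total for kmer, c in kept.items()} if total else {}


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two profiles.

    Assumed distance measure; ranges from 0 (identical) to 1 (disjoint).
    """
    d = 0.0
    for kmer in set(p) | set(q):
        pi, qi = p.get(kmer, 0.0), q.get(kmer, 0.0)
        mi = 0.5 * (pi + qi)
        if pi > 0:
            d += 0.5 * pi * log2(pi / mi)
        if qi > 0:
            d += 0.5 * qi * log2(qi / mi)
    return d


if __name__ == "__main__":
    # Toy sequences and a small k; the poster used 15-mers and 24-mers.
    a = ffp("ACGTACGTTGCA", k=3)
    b = ffp("ACGTTCGTTGCA", k=3)
    print(js_divergence(a, b))
```

A pairwise matrix of such distances between genomes is the kind of input a similarity-based visualization could start from, without requiring a gene alignment.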
Attendance at Past Conferences
Bio-IT 2012
April 24-26, Boston, MA
Booth #: 201
Ayasdi Presentation: 2:45 – 3:15 pm, Wednesday, April 25, 2012
Track: Systems & Multi Scale Biology
Title: Topology as a novel approach to detect patterns in complex data sets - Pek Lum, Ph.D., VP Life Sciences, Ayasdi Inc.
Molecular Tri-Conference 2012
February 19-23, San Francisco, CA
Ayasdi Presentation: 1:30 – 2:00 pm, Wednesday, February 22, 2012
Track: Integrated R & D Informatics & Knowledge Management program
Title: Topological Data Analysis: A Novel Approach for the Analysis of Large and Complex Data Sets - Pek Yee Lum, VP Life Sciences, Ayasdi Inc.

