Topology can be characterized as the part of mathematics that studies notions of shape.  It really consists of at least two separate threads: one in which one attempts to “measure” shape, and another in which one attempts to find compressed combinatorial representations of shape and to analyze the degree to which these representations are faithful to the shape.  The first proceeds primarily via algebraic invariants, such as homology and homotopy groups, which measure and count the instances of particular patterns within the shape in a suitably systematic way.  The second is the subject of a great deal of manifold topology, and is exemplified by the work on the “Hauptvermutung” concerning the existence of a common subdivision of any two triangulations of a manifold.

Both of these threads have been extended to the world of point clouds of data.  The measurement thread is extended via the theory of persistent homology and its variants.  The representation thread is extended by various simplicial complex constructions, such as Vietoris-Rips complexes, witness complexes, and the complexes constructed by Ayasdi’s Platform.  In ordinary topology, the role of the combinatorial representations is to lend additional concreteness to the study of the shape, as well as to provide a succinct representation of it.  They serve the same purpose in the study of high-dimensional and complex data sets: they provide a compressed representation of the data that retains information about the geometric relationships between data points.  The representations are also easy to work with, so they provide simple and extremely useful ways to interrogate the data and to understand the driving variables that characterize various subgroups.  At a high level, one can say that they allow for easy identification of coherent groups within the data.  The search for coherent groups, performed naively, is clearly intractable, since it requires searching through all subsets of the data set, and a data set of n points has 2^n subsets.
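To make the idea of a simplicial complex built from a point cloud concrete, here is a minimal sketch of a Vietoris-Rips construction in plain Python. It is a brute-force illustration only, not how Ayasdi’s Platform or any production library builds its complexes; the function name `vietoris_rips` and the parameters `eps` and `max_dim` are assumptions made for this example. A set of points spans a simplex whenever all of its pairwise distances are at most `eps`.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def vietoris_rips(points, eps, max_dim=2):
    """Return the simplices of the Vietoris-Rips complex as tuples of point indices.

    Hypothetical helper for illustration: a k-vertex subset is a simplex
    when every pair of its points lies within distance eps.
    """
    simplices = []
    for k in range(1, max_dim + 2):  # k = number of vertices in the simplex
        for combo in combinations(range(len(points)), k):
            if all(dist(points[i], points[j]) <= eps
                   for i, j in combinations(combo, 2)):
                simplices.append(combo)
    return simplices


# Four corners of a unit square: sides have length 1, diagonals about 1.414.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
coarse = vietoris_rips(square, eps=1.1)  # sides only: a hollow square
fine = vietoris_rips(square, eps=1.5)    # diagonals included: triangles appear
```

With `eps=1.1` the complex has four vertices and four edges, which is a combinatorial loop; with `eps=1.5` the diagonals enter and the loop is filled in by triangles. How the representation changes with the scale parameter is exactly what persistent homology is designed to track.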

Ultimately, both sets of ideas will be useful in permitting investigators to study their data.  The representations are at the forefront, because they are what a user deals with directly.  As we move further into automation, measuring the shape of a data set and of Ayasdi’s complex outputs will be critical.  We will want, for example, to test Ayasdi constructions for the presence of geometric features such as flares and loops, so as to provide the user with the best possible “quick analysis”, automatically building complexes for the user without requiring hand selection of parameter values, metrics, and lenses.
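As a rough illustration of what testing a complex for loops can mean, the sketch below computes the first two Betti numbers of a small simplicial complex over GF(2): b0 counts connected components and b1 counts independent loops. This is ordinary (not persistent) homology, and it is not a description of Ayasdi’s software; the helper names `rank_gf2` and `betti_numbers` are assumptions made for this example, and the input is assumed to be a valid complex (every face of a simplex is also listed).

```python
from itertools import combinations


def rank_gf2(rows):
    """Rank over GF(2) of a matrix whose rows are integer bitmasks."""
    pivots = {}  # highest set bit -> reduced row that owns that pivot
    rank = 0
    for row in rows:
        while row:
            high = row.bit_length() - 1
            if high not in pivots:
                pivots[high] = row
                rank += 1
                break
            row ^= pivots[high]  # eliminate the leading bit and keep reducing
    return rank


def betti_numbers(simplices):
    """Return (b0, b1) for a complex given as tuples of vertex labels."""
    vertices = sorted(s[0] for s in simplices if len(s) == 1)
    edges = sorted(tuple(sorted(s)) for s in simplices if len(s) == 2)
    triangles = [tuple(sorted(s)) for s in simplices if len(s) == 3]
    v_index = {v: i for i, v in enumerate(vertices)}
    e_index = {e: i for i, e in enumerate(edges)}
    # Boundary of an edge: its two endpoints, as a bitmask over vertices.
    d1 = [(1 << v_index[a]) | (1 << v_index[b]) for a, b in edges]
    # Boundary of a triangle: its three edges, as a bitmask over edges.
    d2 = [sum(1 << e_index[face] for face in combinations(t, 2))
          for t in triangles]
    r1, r2 = rank_gf2(d1), rank_gf2(d2)
    return len(vertices) - r1, len(edges) - r1 - r2


hollow_square = [(0,), (1,), (2,), (3,), (0, 1), (1, 2), (2, 3), (0, 3)]
filled_triangle = [(0,), (1,), (2,), (0, 1), (1, 2), (0, 2), (0, 1, 2)]
loop_result = betti_numbers(hollow_square)
disk_result = betti_numbers(filled_triangle)
```

Here `loop_result` is `(1, 1)`, one component with one loop, while `disk_result` is `(1, 0)`, because the triangle fills the loop in. An automated check of this kind could, in principle, flag which candidate complexes contain loops before a user ever looks at them.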

9 thoughts on “Why Topological Data Analysis?”

  1. March 22, 2013

    vijay sharma

    Really useful article.
    Could you please point me to a case study related to IRIS complex?

    Thanks & Regards,
    Vijay

      1. July 11, 2013

        TRIDIB DUTTA

        Hi TJ Laher,

        I also find it very interesting. I am a mathematician by training (I actually have a PhD in commutative algebra), but I am fascinated by big data (having worked as a postdoc in a computational biology lab).
        I tried to get the paper, but it asks for employer information, etc. Unfortunately, I am currently unemployed and have no affiliation.
        Can I still get the paper? If not, can you give me some pointers that would help in understanding these fascinating relationships?
        Thanks & regards,
        Tridib
        PS. I can be reached via dutta.tridib@gmail.com

        1. July 11, 2013

          TJ Laher

          Hi Tridib,

          Thanks for reaching out. Yes, you can still download the paper. You can put no affiliation and it should work just fine.

          Thanks,
          TJ
