A Recipe for Success: Introducing the Ayasdi Cookbook and Segmentation Recipe

Ayasdi has a strong focus on building intelligent applications. That focus comes from our belief that, in the near future, every application will need to become intelligent or it will cease to exist.

As a result, where we see repeatability and value, we build those applications ourselves. In addition to our industry-specific and industry-agnostic application offerings, we also offer a Python SDK for developers who are comfortable programming against the underlying capabilities of the platform.

Over the years, we have learned a great deal about how best to use our platform for a variety of real-world problems across finance, healthcare, telecommunications and more. Our platform’s SDK offers developers a full breadth of functionality, ranging from machine learning capabilities to options for generating topological models. Along the way, we have identified common patterns in data science problems that can be packaged to target specific types of analyses.

In an effort to empower developers using our platform to the fullest extent, we have put energy into packaging this expertise into a format that can be consumed by our developers just as easily as they consume our Python SDK.

Therefore, we are excited to announce our newest offering to help meet these goals: the Ayasdi Cookbooks and Recipes.

About the Ayasdi Cookbooks and Recipes

The Ayasdi Cookbook consists of Python libraries that provide “Recipes” for solving a range of data science problems. The recipes encapsulate years of experience solving complex real-world problems for enterprises with Ayasdi’s machine intelligence platform.

The documentation for the Ayasdi Cookbook Recipes can be found here: https://platform.ayasdi.com/cookbookdocs/

Experienced Python developers will recognize these recipes as a collection of Python classes and methods that offer intuitive input and output signatures that ultimately reduce the amount of code required to achieve commonly desired functionality from the platform. Engaging this functionality is as simple as importing the recipe into your Python code – just like you would with any other Python library or module.

We will continue to grow our recipes in the coming months, but we wanted to start with our segmentation recipe, which ships with our 7.7 release.

Segmentation

The segmentation recipe enables users to generate segments given a data source or a topological network. The recipe documentation page includes a step-by-step sample tutorial (in the form of a Jupyter Notebook) that walks developers through a typical segmentation workflow, including the creation of a classification model to predict the segment of a new incoming data point.

Details of the Segmentation Recipe can be found here: https://platform.ayasdi.com/cookbookdocs/segmentation.html

And documentation on the new Segment module can be found here: https://platform.ayasdi.com/cookbookdocs/segment.html
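
The final step of that workflow, predicting the segment of a new data point, can be illustrated with a deliberately simple stand-in. The sketch below is plain Python and does not use the recipe or the Ayasdi SDK (the linked documentation describes the actual classes). The nearest-centroid rule, the function names and the toy data are all assumptions made for illustration.

```python
from collections import defaultdict
from math import dist


def fit_centroids(points, labels):
    """Average the points of each segment to get one centroid per segment."""
    sums = {}
    counts = defaultdict(int)
    for point, label in zip(points, labels):
        if label not in sums:
            sums[label] = [0.0] * len(point)
        for i, value in enumerate(point):
            sums[label][i] += value
        counts[label] += 1
    return {
        label: tuple(total / counts[label] for total in totals)
        for label, totals in sums.items()
    }


def predict_segment(centroids, point):
    """Assign a new point to the segment whose centroid is closest."""
    return min(centroids, key=lambda label: dist(centroids[label], point))


# Hypothetical data: six rows that have already been assigned to segments A and B.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels = ["A", "A", "A", "B", "B", "B"]

centroids = fit_centroids(points, labels)
print(predict_segment(centroids, (0.5, 0.2)))
print(predict_segment(centroids, (9, 12)))
```

In practice any classifier could play this role; the point is only that once segments exist, they become labels that a model can learn to predict.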

Segmentation is a core data science pattern of the Ayasdi Machine Intelligence Platform. Many enterprise problems will rely on the segmentation of an underlying dataset based on features that can often be difficult to engineer or extract due to weak signals in the data. The Ayasdi platform, with its patented TDA-based capabilities, offers a segmentation approach that is unique in its ability to leverage topology to find these weak signals.

The recipe leverages platform capabilities to offer different segmentation strategies based on the shape of the data. This approach is far superior to traditional methods of segmentation and clustering for reasons that are beyond the scope of this blog but well worth your time if you are interested.

The segmentation recipe utilizes auto-grouping to create a framework for four different algorithmic approaches:

  1. Community
  2. Connected Components
  3. AHCL
  4. Hotspot

While the user ultimately has control over which approach to take, the documentation outlines which approach is appropriate in which circumstance.

To put this in context, let’s consider two alternative segmentation strategies: community and connected components.

Community detection is a common approach in graph theory. Our specific implementation is based on the Louvain Modularity Algorithm. In a Community segmentation, the nodes within a segment of the topological network are densely connected to each other, whereas connections to nodes outside the segment are sparse. This is useful for subdividing your topological model into groups of connected nodes.

An analogy would be a rural town. There are many connections within the segment (surface streets) but few external connections (highways or major roads to other towns).

That manifests itself in a network as shown below. You have density in the center with lots of connections but as one moves out (to neighboring, smaller towns) one has fewer connections. While ultimately connected, there are distinct groups. This is valuable in understanding heterogeneous phenomena. This approach is more faithful to the data and produces better segmentations.

Community Segmentation
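
To make the "dense inside, sparse outside" idea concrete, here is a minimal sketch of the modularity score that Louvain-style community detection tries to maximize. It is plain Python, not the Ayasdi SDK or the recipe's implementation. The toy graph and the function name `modularity` are illustrative assumptions.

```python
from collections import defaultdict


def modularity(edges, partition):
    """Modularity of an undirected, unweighted graph for a given partition.

    edges: iterable of (u, v) pairs.
    partition: dict mapping each node to a community label.
    """
    edges = list(edges)
    m = len(edges)
    if m == 0:
        return 0.0
    intra = defaultdict(int)   # edges with both ends in the same community
    degree = defaultdict(int)  # summed node degree per community
    for u, v in edges:
        cu, cv = partition[u], partition[v]
        degree[cu] += 1
        degree[cv] += 1
        if cu == cv:
            intra[cu] += 1
    # Fraction of edges inside each community minus the fraction expected by chance.
    return sum(intra[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Hypothetical toy network: two triangles ("towns") joined by one edge ("highway").
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_towns = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_blob = {node: "all" for node in range(6)}

q_split = modularity(edges, two_towns)
q_merged = modularity(edges, one_blob)
print(q_split, q_merged)
```

Treating the two triangles as separate communities scores higher than lumping every node together, which is the behavior the town analogy describes.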

However, if the data exhibited the following shape one would need a different approach.

Topological Model

Connected Components

In this case, one would be more likely to apply a connected components strategy. The connected components strategy is most useful when the underlying topological network exhibits connected components (i.e. islands of interconnected nodes) that are separate from the others in the same network.
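
As a rough illustration of what "islands of interconnected nodes" means, the sketch below finds connected components with a breadth-first search. It is generic Python rather than the recipe's code. The node and edge lists are made up, and `connected_components` is a hypothetical helper name.

```python
from collections import defaultdict, deque


def connected_components(nodes, edges):
    """Group nodes into islands: sets of nodes reachable from one another."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)

    seen = set()
    components = []
    for start in nodes:
        if start in seen:
            continue
        # Breadth-first search from an unvisited node collects one island.
        seen.add(start)
        queue = deque([start])
        component = []
        while queue:
            node = queue.popleft()
            component.append(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(sorted(component))
    return components


# Hypothetical network: two chains of three nodes and one isolated node.
nodes = range(7)
edges = [(0, 1), (1, 2), (3, 4), (4, 5)]
segments = connected_components(nodes, edges)
print(segments)
```

Each island becomes its own segment, and a node with no edges forms a segment of one.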

While these examples underscore the ability for even a novice user to find utility in this recipe, advanced users of Ayasdi Workbench could apply any of the approaches in the framework. Further, IT can easily build them into intelligent applications.

For instance, AHCL blends the community approach with the ability to color by an outcome of interest. This enables the user to rapidly identify data points that carry a high degree of similarity. That is, they are close to each other and share the same color for the outcome of interest – say fraud or money-laundering or patients likely to escalate in cost.

AHCL Coloring – In this case incidence of fraud in online transactions.
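
Coloring by an outcome can be thought of as summarizing that outcome over the members of each group. Below is a minimal sketch, assuming the color is simply the mean of a 0/1 fraud flag per segment. The data, the function name and the choice of the mean are assumptions, and this is not the AHCL algorithm itself.

```python
from collections import defaultdict


def outcome_rate_by_segment(segment_ids, outcomes):
    """Mean outcome per segment, e.g. the share of fraudulent transactions."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for segment, outcome in zip(segment_ids, outcomes):
        totals[segment] += outcome
        counts[segment] += 1
    return {segment: totals[segment] / counts[segment] for segment in counts}


# Hypothetical data: eight transactions, their segment and a 0/1 fraud flag.
segment_ids = ["s1", "s1", "s1", "s1", "s2", "s2", "s3", "s3"]
is_fraud = [0, 0, 1, 0, 1, 1, 0, 0]

rates = outcome_rate_by_segment(segment_ids, is_fraud)
hottest = max(rates, key=rates.get)
print(rates, hottest)
```

Segments with a high rate are the ones that would stand out in a coloring like the one above.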

Why It Matters

One of the challenges that classically trained and citizen data scientists struggle with is repeatability and tribal knowledge. This can manifest itself in many ways across an organization. It can result in “drift” where a lightly understood technique is copied and applied to a different problem type. It can result in knowledge loss and lack of repeatability when you have employee attrition or the simple passing of time. You can have undocumented procedures for which the initial purpose is lost.

The development of well-documented and well-understood recipes makes the power of topological data analysis more approachable, consumable and repeatable. This is important for a classically trained data scientist who is building their understanding of TDA as well as a citizen data scientist who seeks the power of machine learning without the requirement to understand the math that underpins it. Finally, these recipes power and accelerate intelligent applications.

Looking Forward

Ayasdi will continue to roll out new recipes in the coming months as we fine-tune the various patterns that correspond to different business challenges and data science scenarios.

Stay tuned – this is just the beginning!