By Jorge Zuniga
Ayasdi has a strong focus on building intelligent applications. This comes from our belief that, in the near future, every application will need to become intelligent or it will cease to exist.
As a result, when we see repeatability and value, we build those applications ourselves. In addition to our industry-specific and industry-agnostic application offerings, we also offer a Python SDK for developers who are comfortable programming against the underlying capabilities of the platform.
Over the years, we have gained insights into how best to use our platform for a variety of real-world problems across finance, healthcare, telecommunications and more. Although our platform's SDK offers developers a full breadth of functionality, from different machine learning capabilities to different options for generating topological models, we have identified common patterns in data science problems that can be packaged to target specific types of analyses.
To help developers get the most out of our platform, we have packaged this expertise into a format they can consume just as easily as they consume our Python SDK.
Therefore, we are excited to announce our newest offering to help meet these goals: the Ayasdi Cookbooks and Recipes.
About the Ayasdi Cookbooks and Recipes
The Ayasdi Cookbook consists of Python libraries that constitute "Recipes" for solving a range of data science problems. The recipes encapsulate years of experience solving complex real-world problems for enterprises with Ayasdi's machine intelligence platform.
The documentation for the Ayasdi Cookbook Recipes can be found here: https://platform.ayasdi.com/cookbookdocs/
Experienced Python developers will recognize these recipes as a collection of Python classes and methods that offer intuitive input and output signatures that ultimately reduce the amount of code required to achieve commonly desired functionality from the platform. Engaging this functionality is as simple as importing the recipe into your Python code – just like you would with any other Python library or module.
We will continue to add recipes in the coming months, but we wanted to start with our segmentation recipe, which ships with our 7.7 release.
The segmentation recipe enables users to generate segments from a data source or a topological network. The recipe documentation page includes a step-by-step sample tutorial (in the form of a Jupyter Notebook) that walks developers through a typical segmentation workflow, including the creation of a classification model that predicts the segment of a new incoming data point.
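The last step of that workflow, predicting the segment of a new data point, can be illustrated in plain Python. The sketch below is a minimal nearest-centroid classifier, assuming each training row already carries a segment label; it is not the model the recipe or notebook builds, and the names `fit_centroids` and `predict_segment` and the toy data are hypothetical.

```python
from math import dist


def fit_centroids(rows, labels):
    """Average the feature vectors in each segment into one centroid per segment."""
    sums, counts = {}, {}
    for row, label in zip(rows, labels):
        acc = sums.setdefault(label, [0.0] * len(row))
        for i, value in enumerate(row):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict_segment(centroids, point):
    """Return the segment whose centroid is closest (Euclidean) to the new point."""
    return min(centroids, key=lambda label: dist(centroids[label], point))


# Hypothetical toy data: two features per row, segment labels from a prior segmentation.
rows = [[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.9]]
labels = ["A", "A", "B", "B"]
centroids = fit_centroids(rows, labels)
print(predict_segment(centroids, [1.1, 1.0]))  # nearest centroid is segment "A"
```

Nearest-centroid is chosen here only because it fits in a few lines with the standard library; any classifier trained on the segment labels plays the same role.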
Details of the Segmentation Recipe can be found here: https://platform.ayasdi.com/cookbookdocs/segmentation.html
And documentation on the new Segment module can be found here: https://platform.ayasdi.com/cookbookdocs/segment.html
Segmentation is a core data science pattern of the Ayasdi Machine Intelligence Platform. Many enterprise problems will rely on the segmentation of an underlying dataset based on features that can often be difficult to engineer or extract due to weak signals in the data. The Ayasdi platform, with its patented TDA-based capabilities, offers a segmentation approach that is unique in its ability to leverage topology to find these weak signals.
The recipe leverages platform capabilities to offer different segmentation strategies based on the shape of the data. We believe this approach is far superior to traditional methods of segmentation and clustering, for reasons that are beyond the scope of this blog but are well worth exploring if you are interested.
The segmentation recipe utilizes auto-grouping to create a framework for four different algorithmic approaches:
- Connected Components
While the user ultimately controls which approach to take, the documentation outlines which approach is appropriate in which circumstances.
To put this in context, let's consider two alternative segmentation strategies: community and connected components.
Community detection is a common approach in graph theory. Our specific implementation is based on the Louvain Modularity Algorithm. In a community segmentation, connections within a segment of the topological network are dense, whereas connections between segments are sparse. This is useful for subdividing your topological model into groups of connected nodes.
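The quantity that Louvain-style algorithms try to maximize is modularity: the fraction of edges that fall inside communities, minus the fraction expected if edges were wired at random with the same node degrees. The sketch below only scores a given partition of an undirected, unweighted edge list (no self-loops assumed); it does not perform the Louvain node-moving search and is not Ayasdi's implementation. The function name `modularity` and the toy graph are illustrative.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Modularity Q of a partition of an undirected, unweighted graph.

    Q = sum over communities c of (L_c / m) - (d_c / 2m)^2, where L_c is the number
    of edges inside c, d_c is the total degree of c's nodes, and m is the edge count.
    """
    m = len(edges)
    internal = defaultdict(int)
    degree_sum = defaultdict(int)
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


# Two "towns" (dense triangles of surface streets) joined by a single "highway" edge.
edges = [("a1", "a2"), ("a2", "a3"), ("a1", "a3"),
         ("b1", "b2"), ("b2", "b3"), ("b1", "b3"),
         ("a1", "b1")]
by_town = {n: n[0] for n in ["a1", "a2", "a3", "b1", "b2", "b3"]}
one_group = {n: "all" for n in by_town}
print(modularity(edges, by_town))    # 5/14, about 0.357
print(modularity(edges, one_group))  # 0.0
```

Splitting the graph along the sparse "highway" yields a higher score than lumping everything together, which is the property a community segmentation looks for.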
A rural town is a useful analogy: there are many connections within the town (surface streets) but few external connections (highways or major roads to other towns).
That structure manifests itself in a network as shown below. There is density in the center with many connections, but as one moves out (to neighboring, smaller towns) there are fewer connections. While the network is ultimately connected, there are distinct groups. This is valuable for understanding heterogeneous phenomena, and for data of this shape the community approach is more faithful to the data and produces better segmentations.
However, if the data exhibited the following shape, one would need a different approach.
In this case, one would be more likely to apply a connected components strategy. The connected components strategy is most useful when the underlying topological network exhibits connected components (i.e., islands of interconnected nodes) that are separate from one another in the same network.
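Finding connected components is a standard graph traversal: start from any unvisited node, collect everything reachable from it, and repeat. The sketch below uses breadth-first search over an undirected edge list; it is a generic illustration, not the recipe's code, and the function name and toy "islands" are hypothetical. Nodes are passed separately so that isolated nodes come back as single-node components.

```python
from collections import defaultdict, deque


def connected_components(nodes, edges):
    """Group nodes into islands of mutually reachable nodes (undirected graph)."""
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    seen = set()
    components = []
    for start in nodes:
        if start in seen:
            continue
        # Breadth-first search collects every node reachable from `start`.
        seen.add(start)
        component = []
        queue = deque([start])
        while queue:
            node = queue.popleft()
            component.append(node)
            for nxt in neighbors[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        components.append(component)
    return components


# Hypothetical network with three islands: {a1, a2, a3}, {b1, b2} and the lone node c1.
nodes = ["a1", "a2", "a3", "b1", "b2", "c1"]
edges = [("a1", "a2"), ("a2", "a3"), ("b1", "b2")]
print(connected_components(nodes, edges))
```

Unlike the community strategy, no score is optimized here: each island simply becomes its own segment.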
While these examples show that even a novice user can find utility in this recipe, advanced users of Ayasdi Workbench can apply any of the approaches in the framework. Furthermore, IT teams can easily build them into intelligent applications.
For instance, AHCL blends community detection with the ability to color by an outcome of interest. This enables the user to rapidly identify datapoints that carry a high degree of similarity: they are close to each other and share the same color for the outcome of interest, say fraud, money laundering, or patients likely to escalate in cost.
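Coloring segments by an outcome amounts to summarizing that outcome within each segment. The sketch below computes, for hypothetical segment labels and a 0/1 outcome flag, the fraction of flagged points per segment and ranks the segments; it illustrates only the coloring idea, not AHCL itself, and the names and data are assumptions.

```python
from collections import Counter


def outcome_rate_by_segment(segments, outcomes):
    """Fraction of points in each segment whose outcome flag is set."""
    totals = Counter(segments)
    positives = Counter(seg for seg, hit in zip(segments, outcomes) if hit)
    return {seg: positives[seg] / totals[seg] for seg in totals}


# Hypothetical data: a segment label and a 0/1 outcome flag (e.g. confirmed fraud) per point.
segments = ["s1", "s1", "s1", "s2", "s2", "s3"]
outcomes = [1, 1, 0, 0, 0, 1]
rates = outcome_rate_by_segment(segments, outcomes)
for seg, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
    print(seg, round(rate, 2))  # s3 1.0, then s1 0.67, then s2 0.0
```

Segments with a high rate are the tightly knit, same-colored groups the paragraph above describes.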
Why It Matters
Two challenges that classically trained and citizen data scientists struggle with are repeatability and tribal knowledge. These can manifest themselves in many ways across an organization. They can result in "drift", where a poorly understood technique is copied and applied to a different problem type. They can result in knowledge loss and lack of repeatability through employee attrition or the simple passing of time. They can also leave undocumented procedures whose original purpose has been lost.
Well-documented, well-understood recipes make the power of topological data analysis more approachable, consumable and repeatable. This is important for classically trained data scientists who are building their understanding of TDA, as well as for citizen data scientists who seek the power of machine learning without needing to understand the math that underpins it. Finally, these recipes power and accelerate intelligent applications.
Ayasdi will continue to roll out new recipes in the coming months as we fine-tune the various patterns that correspond to different business challenges and data science scenarios.
Stay tuned – this is just the beginning!