Feature Rich – How Features Feature in Our 7.10 Release

Our position is pretty clear – we believe that every application will become an intelligent application or it will cease to exist. This has broad implications for the software development and management process, only part of which we touch. Still, we are working furiously to accelerate that vision, and our most recent release, 7.10, is a testament to that commitment.

The 7.10 release is one of Ayasdi’s most functionality-packed releases ever. We won’t cover everything in this post, but will touch on the highlights, of which there are many:

  • Additional Transformation Services
  • New Statistical Functions
  • Automated Feature Selection and Enhanced Feature Scoring Results
  • Decision Tree Classification Modeling Enhancements
  • Ability to Predict using a Subset of Data or Rows of Input Data
  • On-Premise Installation Capabilities (with High Availability Option)

Additional Transformation Services

We covered our growing portfolio of transformation services in a recent post and so will focus on what’s new in this release.

Data is being collected at faster and faster rates, everywhere. In its raw form, however, that data is not very valuable from a predictive perspective. The main way for companies to harness the data and make it valuable (i.e. give it predictive power) is through data transformation. These transformations include merges, concatenations, sums and other operations.

Ayasdi continues to grow its portfolio of transformations and with Release 7.10, Ayasdi has added Pivot, Binarize, Union, and Math/String Operator transformations to its already extensive list.   

Let’s start with the Math/String Operator transformations. Customers can now perform mathematical operations (+, -, *, /) between a set of columns (to create ratios, sums, differences, etc.), or concatenate string or numeric values across multiple columns, using the new Math/String Operators.
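To make the idea concrete, here is a minimal sketch in plain Python. It is not the Ayasdi SDK; the helper names `add_ratio` and `add_concat` and the sample columns are invented for illustration.

```python
# Generic illustration in plain Python; this is NOT the Ayasdi SDK.
# All column names and helper functions here are hypothetical.
rows = [
    {"first": "Ada", "last": "Lovelace", "spend": 120.0, "visits": 4},
    {"first": "Alan", "last": "Turing", "spend": 90.0, "visits": 3},
]

def add_ratio(rows, numerator, denominator, new_column):
    """Return copies of rows with new_column = numerator / denominator."""
    result = []
    for row in rows:
        new_row = dict(row)
        # Guard against division by zero; None marks an undefined ratio.
        new_row[new_column] = (
            row[numerator] / row[denominator] if row[denominator] else None
        )
        result.append(new_row)
    return result

def add_concat(rows, columns, new_column, sep=" "):
    """Return copies of rows with new_column = the listed columns joined as text."""
    return [
        {**row, new_column: sep.join(str(row[c]) for c in columns)}
        for row in rows
    ]

with_ratio = add_ratio(rows, "spend", "visits", "spend_per_visit")
derived = add_concat(with_ratio, ["first", "last"], "full_name")
```

Both helpers return new rows rather than modifying the input, which keeps the raw source intact while derived columns are added.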

Next are Union transformations. With this new feature, a number of data sources can now be united into one.  For example, transactions from 2015 can be combined with those from 2016 to produce a data source that spans both years.
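Conceptually, a union stacks the rows of several sources on top of each other. The sketch below shows that idea in plain Python; `union_sources` is a hypothetical helper, and the requirement that all sources share the same columns is an assumption of this sketch rather than a documented rule of the product.

```python
# Generic illustration in plain Python; this is NOT the Ayasdi SDK.
def union_sources(*sources):
    """Stack several lists of rows into one list.

    Assumption of this sketch: every row must have the same columns.
    """
    combined = []
    expected_columns = None
    for source in sources:
        for row in source:
            if expected_columns is None:
                expected_columns = set(row)
            elif set(row) != expected_columns:
                raise ValueError("all sources must share the same columns")
            combined.append(dict(row))
    return combined

# Hypothetical sample data.
transactions_2015 = [{"id": 1, "date": "2015-03-02", "amount": 40.0}]
transactions_2016 = [
    {"id": 2, "date": "2016-07-19", "amount": 65.5},
    {"id": 3, "date": "2016-11-05", "amount": 12.0},
]
all_transactions = union_sources(transactions_2015, transactions_2016)
```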

We also added the ability to binarize any column in a dataset.  For example, for an age column, any value above 20 can be labeled with a 1 and any value 20 or below can be given a 0 value in a new binary column.
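The age example can be sketched in a few lines of plain Python. This is an illustration of the idea only, not the Ayasdi SDK; `binarize` and the column names are hypothetical.

```python
# Generic illustration in plain Python; this is NOT the Ayasdi SDK.
def binarize(rows, column, threshold, new_column):
    """Add new_column: 1 where row[column] > threshold, otherwise 0."""
    return [
        {**row, new_column: 1 if row[column] > threshold else 0}
        for row in rows
    ]

# Hypothetical sample data: values above 20 get a 1, 20 or below get a 0.
people = [{"age": 18}, {"age": 20}, {"age": 21}, {"age": 45}]
flagged = binarize(people, "age", 20, "age_over_20")
```

Note that the comparison is strict, so a value exactly at the threshold (20 here) maps to 0, matching the example in the text.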

Finally, we’ve added pivot transformations.  Pivoting can convert, for example, a customer-transaction source into a customer/product/number-of-transactions source.  This is similar to Microsoft Excel Pivot functionality, except the Ayasdi system handles much larger data pivots.
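As a rough sketch of what such a pivot does, the plain-Python snippet below counts transactions per customer and product. It is not the Ayasdi SDK. The output layout (one row per customer, one column per product) is one plausible reading of the example above and is an assumption, as are the helper name `pivot_counts` and the sample data.

```python
# Generic illustration in plain Python; this is NOT the Ayasdi SDK.
from collections import Counter, defaultdict

def pivot_counts(rows, index, columns):
    """Count rows per (index value, columns value) pair.

    Returns one row per index value with one count column per distinct
    value of `columns`.
    """
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[index]][row[columns]] += 1
    products = sorted({row[columns] for row in rows})
    return [
        {index: key, **{p: counts[key][p] for p in products}}
        for key in sorted(counts)
    ]

# Hypothetical customer-transaction source.
transactions = [
    {"customer": "A", "product": "book"},
    {"customer": "A", "product": "book"},
    {"customer": "A", "product": "pen"},
    {"customer": "B", "product": "pen"},
]
pivoted = pivot_counts(transactions, "customer", "product")
```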

Expect more transformations going forward as Ayasdi continues to deliver against our customer requirements in this area.    

New Statistical Functions

Part of the transformation puzzle involves retrieving statistical facts about the data.  The raw form of the data itself might not be meaningful but understanding the distribution of the values is often quite valuable.  

Release 7.10 provides new statistical functions, including Row Group Stats, Histograms, and Distributions. These and many other statistical functions can be accessed using the Ayasdi Python SDK Source.get_stats function.  

Some of the highlights include Most Common Value (mode) within a group or the entire source (e.g. what song did a customer listen to the most?), the number of unique values, and the actual distinct values themselves.

Additionally, the distribution of values (e.g. percentiles) and histogram data are now easily obtained. For example, a histogram of the number of times a customer listened to each song genre might provide an interesting summary visual for that customer.
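For readers who want to see what these statistics look like in practice, here is a small sketch using only the Python standard library. It does not call Source.get_stats or any other part of the Ayasdi SDK; the listening log, the column names and the helper `group_values` are all invented for illustration.

```python
# Generic illustration using the Python standard library; NOT the Ayasdi SDK.
from collections import Counter
import statistics

# Hypothetical listening log.
plays = [
    {"customer": "A", "song": "Blue", "genre": "jazz", "minutes": 1},
    {"customer": "A", "song": "Blue", "genre": "jazz", "minutes": 2},
    {"customer": "A", "song": "Red", "genre": "rock", "minutes": 3},
    {"customer": "B", "song": "Red", "genre": "rock", "minutes": 4},
    {"customer": "B", "song": "Red", "genre": "rock", "minutes": 5},
]

def group_values(rows, group_col, value_col):
    """Collect the values of value_col for each distinct value of group_col."""
    grouped = {}
    for row in rows:
        grouped.setdefault(row[group_col], []).append(row[value_col])
    return grouped

# Per-customer mode, number of unique values, and the distinct values.
song_stats = {
    customer: {
        "mode": Counter(songs).most_common(1)[0][0],
        "unique_count": len(set(songs)),
        "distinct": sorted(set(songs)),
    }
    for customer, songs in group_values(plays, "customer", "song").items()
}

# Histogram of genres for one customer.
genre_histogram_a = Counter(group_values(plays, "customer", "genre")["A"])

# Quartiles (25th, 50th, 75th percentiles) of listening time over the source.
quartiles = statistics.quantiles(
    [row["minutes"] for row in plays], n=4, method="inclusive"
)
```

The per-customer dictionary is the kind of summary that could feed a dashboard tile, and the genre histogram maps directly onto a bar chart.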

Frankly, these features are not impressive from a statistical perspective, but exposing the functionality allows a user to create robust dashboards, something that was previously more difficult (though not impossible) to do.

Automated Feature Selection and Enhanced Feature Scoring Results

Features are an integral part of any machine intelligence exercise. Our technology, Topological Data Analysis, feasts on features. The more the better. That is why we spend the time we do on transformations – they allow us to increase the feature space. Where some technologies struggle with this dimensionality, we thrive.

Having said that, more features can create challenges. The greater the number, the more interaction there likely is and the more difficult the process of manual selection becomes. For an accomplished data scientist, there are methods, approaches, and intuition that can be applied. These can be time-consuming.

Release 7.10 has introduced two elements that help with this challenge. The first is Automated Feature Selection and the second is an enhanced internal search algorithm that provides improved Feature Selection and scoring results.  

Users now have three options. They can directly select features for analysis as they have traditionally done, choose the automatically created feature subset (i.e. column set), or use the scoring results from Automated Feature Selection to choose alternative feature sets.

Automated Feature Selection, which is currently available for supervised feature selection, identifies features (columns such as age, location or transaction type) in a data source with the most predictive value with respect to a given outcome.

Ayasdi’s Feature Selection works to identify features that have high relevance to the outcome while reducing redundancy within the selected set of features.  The Feature Selection returns an ordered list of features. Ayasdi’s new Automated Feature Selection now proposes feature sets based on this list and scores them for their overall ability to build good topological models that localize the outcome.
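The relevance-versus-redundancy trade-off can be illustrated with a simple greedy heuristic. To be clear, this is not Ayasdi's algorithm, which this post does not describe in detail. The sketch assumes Pearson correlation as the measure of both relevance and redundancy, and the function names, columns and data are invented. It also stops at proposing nested feature sets and does not attempt to score them against topological models.

```python
# Generic greedy relevance-minus-redundancy sketch; NOT Ayasdi's algorithm.
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists (0.0 if constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)

def rank_features(columns, outcome):
    """Order features by relevance to the outcome, penalizing redundancy.

    At each step, pick the feature with the highest
    |corr(feature, outcome)| minus its mean |corr| with features already picked.
    """
    remaining = dict(columns)
    ranked = []
    while remaining:
        def score(name):
            relevance = abs(pearson(remaining[name], outcome))
            if not ranked:
                return relevance
            redundancy = sum(
                abs(pearson(remaining[name], columns[picked])) for picked in ranked
            ) / len(ranked)
            return relevance - redundancy
        best = max(remaining, key=score)
        ranked.append(best)
        del remaining[best]
    return ranked

# Hypothetical data: "sessions" is nearly a duplicate of "visits".
columns = {
    "age": [20, 25, 30, 40, 45, 50],
    "visits": [1, 2, 5, 1, 4, 5],
    "sessions": [1, 2, 5, 2, 4, 5],
}
outcome = [0, 0, 1, 0, 1, 1]

ranked = rank_features(columns, outcome)
# Propose nested candidate feature sets from the ordered list.
candidate_sets = [ranked[:k] for k in range(1, len(ranked) + 1)]
```

In this toy data, the near-duplicate column is ranked last even though, taken alone, it is more strongly correlated with the outcome than age. That is the redundancy penalty at work.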

The new Automated Feature Selection functionality substantially reduces the amount of time necessary to identify the subset of features that are relevant to the outcome and facilitates the rapid discovery of high-performing models.

These features (pun intended) are massive time savers – and come with a well thought-out set of guard rails to ensure the output is meaningful. We often talk about co-optimizing for efficiency and effectiveness – these features exemplify that pursuit.

Decision Tree Classification Modeling Enhancements

Once the causal factors for decisions are selected and a classification model is created, the user typically begins to predict future outcomes.  Often, classification systems become a “black box” of sorts providing a predicted classification group but not providing any insight into why that prediction was made.

Until this release, users had the ability to create one_vs_rest classification models within Ayasdi MIP. While this is a great strategy for creating performant predictive models, it suffers from two major issues when it comes to justifiability:

  1. Similar rules could be present across multiple groups if the decision tree was not deep enough.
  2. Rules extracted from such trees would not serve the purpose of justifying and comparing across multiple classification groups.

Ayasdi addresses this challenge through its Multiclass Model framework. The past several releases have brought major enhancements to this framework and this release is no different.  

The new Ayasdi Decision Tree MultiClass Predictor provides a single set of rules to justify and explain why certain transactions/claims/genre choices belong to a particular classification group in the Topological Network (using dt.get_rules(), dt.rules, and dt.dot).  This is important because users can now understand, at a granular level, the causal factors behind the groupings. It is particularly useful when a network has multiple disjoint groups and the user wants to predict which group a data point belongs to.
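To show why a single multiclass tree yields one coherent rule set, the sketch below walks a tiny hand-written tree and emits one rule per leaf. It does not use dt.get_rules() or any Ayasdi object; the tree, its features, thresholds and group names are all hypothetical.

```python
# Generic illustration in plain Python; NOT the Ayasdi Decision Tree predictor.
# Hand-written, hypothetical tree: internal nodes test "feature <= threshold".
tree = {
    "feature": "amount",
    "threshold": 1000,
    "left": {"leaf": "group_low_value"},
    "right": {
        "feature": "country_risk",
        "threshold": 0.5,
        "left": {"leaf": "group_standard"},
        "right": {"leaf": "group_high_risk"},
    },
}

def extract_rules(node, conditions=()):
    """Return a list of (rule text, group) pairs, one per leaf of the tree."""
    if "leaf" in node:
        return [(" and ".join(conditions) or "always", node["leaf"])]
    feature, threshold = node["feature"], node["threshold"]
    return (
        extract_rules(node["left"], conditions + (f"{feature} <= {threshold}",))
        + extract_rules(node["right"], conditions + (f"{feature} > {threshold}",))
    )

rules = extract_rules(tree)
```

Because every leaf comes from the same tree, the rules are mutually exclusive and can be compared across groups, which is harder to guarantee when each group has its own one-vs-rest tree.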

We have maintained for a long time that rules-based systems are ultimately flawed. However, we realize that they are exceptionally prevalent, especially in the financial services and healthcare verticals, and have justifiable use in situations where very fast, millisecond-scale decisions need to be made.

A minor addition we’ve made is that results are now returned in the form of prediction probabilities and decisions.  For example, a model can predict if Customer A will be a churning customer (the Decision) with a 90% probability (how confident the model is of this Decision).  
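The relationship between the two outputs is simple: the decision is the most probable class, and the probability is the model's confidence in it. A minimal sketch, with a hypothetical `decide` helper and made-up class names (this is not the SDK's return format):

```python
# Generic illustration in plain Python; NOT the Ayasdi SDK's return format.
def decide(probabilities):
    """Return (decision, probability) for the most likely class."""
    decision = max(probabilities, key=probabilities.get)
    return decision, probabilities[decision]

# Hypothetical class probabilities for Customer A.
customer_a = {"churn": 0.90, "stay": 0.10}
decision, confidence = decide(customer_a)
```

Keeping the probability alongside the decision lets a downstream application apply its own threshold, for example acting only on churn predictions above a chosen confidence.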

This is useful in a number of constructs such as determining churn, identifying upgrade candidates and identifying failed program states just to name a few.

Ability to Predict using a Subset of Data or Rows of Input Data

Often, the user would like to perform prediction using just a subset of the data. With this release, Ayasdi has made this even easier. Now, the user can construct a predictive model using a subset of the data or using particular datapoints. The new functionality is enabled through the Ayasdi Python SDK SourceSubset and DataPointList parameters.  

Each DataPointList represents the values for the row corresponding to the original column set used for training.  This meets a common requirement for Logistic Regression, Random Forest and Decision Tree models to be able to predict against only one or a few incoming data points (instead of a full dataset).  
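The key constraint is that each incoming row lists its values in the same order as the column set used for training. The sketch below illustrates that with a stand-in logistic model in plain Python. It does not use SourceSubset, DataPointList or any other SDK object; the column set, coefficients and intercept are invented for illustration.

```python
# Generic illustration in plain Python; NOT the Ayasdi SDK.
import math

# Hypothetical stand-in for a trained logistic regression model.
TRAINING_COLUMNS = ["age", "visits"]  # column order used at training time
COEFFICIENTS = [0.04, 0.5]            # made-up weights, one per column
INTERCEPT = -3.0                      # made-up intercept

def predict_rows(data_point_lists):
    """Score a few incoming rows without loading them as a full dataset.

    Each inner list holds one row's values, ordered like TRAINING_COLUMNS.
    Returns the predicted probability of the positive class for each row.
    """
    predictions = []
    for values in data_point_lists:
        if len(values) != len(TRAINING_COLUMNS):
            raise ValueError("row does not match the training column set")
        z = INTERCEPT + sum(w * v for w, v in zip(COEFFICIENTS, values))
        predictions.append(1.0 / (1.0 + math.exp(-z)))
    return predictions

# Two incoming data points, e.g. read from an external stream.
incoming = [[25, 1], [50, 4]]
probabilities = predict_rows(incoming)
```

Validating the row length up front catches the most common mistake, which is sending values for a different column set than the one the model was trained on.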

New or test data no longer needs to be imported into the system as a source.  This abstraction means that a stream of data, as it is generated in real time by an external source system, can now be used as an input.  This addition is a milestone in our constant progress towards real-time prediction capabilities.

Multiple API Connection Capability to Support Integrated Applications

Ayasdi is in the business of building intelligent applications. All of those applications are built on our machine intelligence platform and thus have a common analytical foundation (whether it uses TDA or not).

Often, however, multiple applications would like to access a single, platform-powered component, such as a classification model.

Previously this would require multiple simultaneous connections. With this release, we have introduced API Connection changes that allow a user to call the Python SDK in order to make connections to multiple installations, as multiple users, without having to log out and log in.

This functionality is key for a number of AI applications, such as Anti-Money Laundering, where multi-user access (multiple authorized users at the same time) and multi-tenancy (multiple users can spawn concurrent jobs) are requirements.

On-Premise Installation Capabilities (with High Availability Option)

Release 7.10 also offers an option for on-premise or private cloud installation (adding Azure to our existing AWS support). As with all our previous releases, we continue to support high availability in our installations.

New for Release 7.10, the installation process no longer requires administrator or super-user privileges.  

Additional Ayasdi Python SDK Tutorials and Getting Started Page

Whew. To think that only covers the highlights. We are constantly looking for ways to make this information more accessible and more digestible. There is now a new Getting Started page for the Python SDK, which was introduced to help users set up their environment before launching into the Ayasdi Python SDK Tutorials.  This page provides links to all the sample data and code needed for the Ayasdi Python SDK Tutorials, describes how to upload the data to the Ayasdi Machine Intelligence Platform, and explains how to start a fresh Ayasdi Notebook.  

Additionally, new documentation includes Jupyter Notebook samples with code to create topological networks and produce groups and auto-groups.

Please refer to the Ayasdi Machine Intelligence Platform 7.10 Release Notes and Ayasdi Python SDK Documentation for more details on these features and more.