Using Artificial Intelligence to Reduce Customer Churn in Private Banking

It is no secret that private banking is in turmoil.

While our view is that large banks possess a massive competitive advantage given the amount of data they create, trade in, and see, private banking remains an area of concern.

Technology-driven start-ups have made real inroads with millennials, and private wealth management growth has stalled at many banks. Given the fixed-cost nature of the supporting infrastructure, this can quickly eat into earnings.

There are a number of areas that banks can focus on, from aligning costs more effectively to enhancing the customer experience, but one chronic and elusive prize is churn. The research is clear: it is far easier and more profitable to retain a customer than it is to generate net new customers. For this reason, technology companies with subscription revenue streams have spent heavily to better understand addressable churn and to identify its drivers.

Banks need to take a similarly aggressive approach in order to stabilize these businesses as they re-tool for growth.

There are several challenges attached to the problem of churn prediction.

First of all, customer churn in private banking is an infrequent event. Typical rates are around 1-7%, depending on geography and client segment. In addition, there is the challenge of separating addressable from non-addressable churn. Death, for example, is non-addressable (even if the assets remain in the bank).

The problem has two primary components. The first is predicting churn accurately. The second is understanding the drivers of that churn. The understanding is as important as the prediction, because the bank needs to develop strategies that address the potential causes before a customer leaves the bank.

From a data science perspective, the first component comes down to dealing with a class-imbalanced data set. The second component is about surfacing the nuances in client groups and their behavior patterns.

At Ayasdi, we solve these problems through topological segmentation of the data set, building local models for the groups that topology surfaces, and providing statistical explanations for those groups.

To illustrate how Ayasdi's AI platform tackles these challenges, we'll use a recent customer project as an example.

Our customer is both innovative and data-savvy. They are committed to leveraging every piece of data at their disposal to create better customer experiences, predict churn, and ultimately intercede before a customer leaves.

The data they possess on their customers includes demographics, asset summaries, accounts, equities and holdings, trading history, transfer history, contacts, and meeting logs. In general, banks capture client data cross-sectionally, taking a snapshot of every client at a fixed interval. Concretely, in this example, our customer took a snapshot of all their clients at the end of every month.

To capture clients' temporal patterns, we began the data transformation by aggregating client statistics over a time window, say six months, to predict client churn in the following six months. We moved the window forward in three-month steps until it hit the end of the analysis period. We then concatenated all of these window slides except the last into one training data set, reserving the last slide as an out-of-sample validation set for final model evaluation.
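
To make the windowing concrete, here is a minimal sketch in Python with pandas. The snapshot schema, the column names (`client_id`, `snapshot_date`, `churned`), and the aggregation statistics are our assumptions for illustration, not the customer's actual schema.

```python
import pandas as pd

# Hypothetical monthly-snapshot table: one row per client per month, with a
# `churned` flag and numeric feature columns. All names are illustrative.
def build_windows(snapshots, feature_cols, window=6, step=3):
    """Aggregate `window`-month features and label churn in the next `window` months."""
    months = sorted(snapshots["snapshot_date"].unique())
    slides = []
    # Slide the observation window forward `step` months at a time.
    for start in range(0, len(months) - 2 * window + 1, step):
        obs = snapshots[snapshots["snapshot_date"].isin(months[start:start + window])]
        fut = snapshots[snapshots["snapshot_date"].isin(months[start + window:start + 2 * window])]
        # Aggregate each client's statistics over the observation window.
        feats = obs.groupby("client_id")[feature_cols].agg(["mean", "sum", "std"])
        feats.columns = ["_".join(col) for col in feats.columns]
        # Label: did the client churn at any point in the following window?
        labels = fut.groupby("client_id")["churned"].max().rename("churn")
        slides.append(feats.join(labels, how="inner"))
    # All slides but the last form the training set; the last is held out.
    return pd.concat(slides[:-1]), slides[-1]
```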

Once the training data set was ready, we kicked off our analysis workflow.

First, a partial least squares (PLS) transformation was performed on the data matrix with respect to the outcome, which in this case was churn. The first two PLS components were appended back to the training data set.
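
For reference, this step can be reproduced with scikit-learn's PLSRegression; the variable names (`X_train`, `y_train`) are ours, and this is a sketch of the idea rather than the platform's internal implementation.

```python
from sklearn.cross_decomposition import PLSRegression

# Extract two supervised components with respect to the churn outcome and
# append them to the feature set (X_train: numeric features as a DataFrame,
# y_train: binary churn labels; names are ours, for illustration).
pls = PLSRegression(n_components=2)
scores = pls.fit_transform(X_train, y_train)[0]   # X scores, shape (n_samples, 2)
X_train["pls_1"], X_train["pls_2"] = scores[:, 0], scores[:, 1]
```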

Next, we utilized the minimum-redundancy maximum-relevance (mRMR) algorithm on our platform to pick the most relevant features. Isolating the most relevant features filtered out noise and enhanced our TDA networks, producing better groups, or clusters, of customers.
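
mRMR greedily selects features that carry high mutual information with the churn label while penalizing redundancy with features already chosen. Below is a simplified sketch assuming discretized features; Ayasdi's implementation may differ.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.metrics import mutual_info_score

def mrmr(X, y, k=20):
    """Greedy mRMR over a matrix X of discretized features and labels y."""
    relevance = mutual_info_classif(X, y, discrete_features=True)
    selected = [int(np.argmax(relevance))]       # start with the most relevant
    while len(selected) < min(k, X.shape[1]):
        best, best_score = None, -np.inf
        for j in range(X.shape[1]):
            if j in selected:
                continue
            # Redundancy: average mutual information with features already kept.
            redundancy = np.mean([mutual_info_score(X[:, j], X[:, s]) for s in selected])
            if relevance[j] - redundancy > best_score:
                best, best_score = j, relevance[j] - redundancy
        selected.append(best)
    return selected
```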

The network was then colored by the outcome variable so that the locality of churn could be checked.

This network supports operational deployment: our auto-grouping algorithms create a topological segmentation of the data set, and a local model is trained for each segment. A group classifier is also trained; it sits in the prediction pipeline ahead of the local models.

When a new data point arrives, the group classifier returns the probability of the point falling into each group, and each group's local model gives a probability of churn. The final churn probability is the local-model probability weighted by the group-membership probability. This workflow reduces systematic bias compared with a single global model for the whole population.
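
The following sketch shows the shape of that two-stage pipeline. The classifier choices and the `groups` array (each training row's segment id from the topological segmentation) are stand-ins of ours, not Ayasdi's actual components.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression

# Stage 1: a classifier that assigns new points to topological segments.
group_clf = GradientBoostingClassifier().fit(X_train, groups)

# Stage 2: one local churn model per segment.
local_models = {
    g: LogisticRegression(max_iter=1000).fit(X_train[groups == g], y_train[groups == g])
    for g in np.unique(groups)
}

def churn_probability(x):
    """P(churn | x) = sum over groups g of P(g | x) * P(churn | x, g)."""
    x = np.asarray(x).reshape(1, -1)
    p_group = group_clf.predict_proba(x)[0]
    return sum(p_group[i] * local_models[g].predict_proba(x)[0, 1]
               for i, g in enumerate(group_clf.classes_))
```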

While certainly not an a-ha moment, it was immediately clear that clients churned for a wide variety of reasons, demonstrated by their distribution across the network in Figure 1. Our auto-grouping algorithm identified the high-churn-rate groups within a few seconds. Furthermore, the characteristics of each group were immediately apparent via the explain tables.

The Kolmogorov-Smirnov test on continuous variables and the hypergeometric test on categorical variables shed light on the nuances from one churner group to another.
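
Both tests are available in scipy.stats. A sketch, where `group` and `rest` are hypothetical DataFrames holding one segment and its complement (the column names are ours):

```python
from scipy.stats import ks_2samp, hypergeom

# Continuous variable: does its distribution differ inside the group?
ks_stat, ks_p = ks_2samp(group["account_balance"], rest["account_balance"])

# Categorical variable: is a level (e.g. missing marital status) over-represented
# in the group? hypergeom.sf gives P(at least k successes in N draws by chance).
k = group["marital_status"].isna().sum()              # successes in the group
M = len(group) + len(rest)                            # population size
n = k + rest["marital_status"].isna().sum()           # successes in the population
N = len(group)                                        # draws (the group size)
p_enriched = hypergeom.sf(k - 1, M, n, N)
```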

Take the high-churn-rate group at the top of the network, for example (colored red and yellow). This client group tended to have missing marital status information (80% in the group vs. 20% in the rest of the data) and was entirely women. They also tended to have low balances in their asset and investment accounts, along with low transfer and trading activity.

If we move to the lower-left corner of the network and focus on the high-churn-rate group there (again colored red and yellow), the pattern is significantly different. The group was 100% male, and one of the standout reasons they churned was the underperformance of their assets.

In this context, TDA not only helps our customer predict the clients at risk but also reveals the variety of underlying drivers for leaving the bank. This client profiling replaces the painful process of guessing at appropriate and relevant retention proposals. Without TDA, these client insights typically take months or years to develop. We surfaced them in a matter of days.

Remember, this is an imbalanced data set (base churn rate of 3.4%): if a classifier labeled every client a non-churner, it would achieve an accuracy of 96.6%, which is misleading. The appropriate way to evaluate the final model is the ROC curve and the area under the curve (AUC), shown below.

The dashed line in the chart represents a random-guessing model and the blue line is our final model, which produces a 24% lift.
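
Computing the curve on the held-out last window is straightforward with scikit-learn (again, `X_valid` and `y_valid` are our names for the out-of-sample slide, and `churn_probability` is the pipeline sketched earlier):

```python
from sklearn.metrics import roc_auc_score, roc_curve

# Score the out-of-sample window and evaluate; AUC is threshold-independent,
# so it is meaningful despite the 3.4% base churn rate.
probs = [churn_probability(x) for x in X_valid]
fpr, tpr, thresholds = roc_curve(y_valid, probs)
print(f"AUC = {roc_auc_score(y_valid, probs):.3f}")
```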

Moving along the curve is essentially a tradeoff between recall and precision. In the context of client churn, recall (or sensitivity) gauges how many churners our customer will catch out of all true churners, while precision gives the fraction of predicted churners who actually churn.

From a private banking operations standpoint, it is a tradeoff between client asset retention (higher recall means more client outreach, hence more expense, but more assets retained) and relationship maintenance cost (higher precision means less client outreach, hence cheaper, but some churners slip through). Our model provides our customer with a comprehensive view of these tradeoffs and of the expected return on their client retention efforts.
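
One way to expose that tradeoff, as a sketch: sweep the decision threshold over the validation scores and report precision and recall at each operating point (the threshold grid is illustrative):

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score

# Each threshold is a candidate operating point: lower thresholds raise recall
# (more outreach, more assets retained); higher thresholds raise precision
# (cheaper outreach, but some churners slip through).
for t in np.arange(0.1, 0.9, 0.1):
    preds = (np.asarray(probs) >= t).astype(int)
    print(f"threshold={t:.1f}  "
          f"precision={precision_score(y_valid, preds, zero_division=0):.2f}  "
          f"recall={recall_score(y_valid, preds):.2f}")
```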

The modeling, segmentation, profiling, and prediction workflow above was based on our customer's historical data. In practice, one can set up an automated process to retrain the TDA model whenever new data arrive, so that the model stays up to date and adapts to shifts in client behavior patterns.

There are obviously more details to the story, and the technology extends to adjacent problems (credit default prediction, false positives in AML, etc.).

Feel free to reach out to us to get more information on how we might apply our technology to your challenges.