In the wake of the 2008 credit market implosion, the Global Systemically Important Banks (G-SIBs) faced a multitude of challenges around risk management, reputation, financial stability and, not least, regulation. The new mantra from regulators was to ensure that the G-SIBs could endure extreme volatility and prolonged periods of economic stress.

One such set of regulations is the Basel accords. Basel offers standardized, rules-based approaches as well as the option to use data-driven modeling approaches.

The second part is key.

By adopting a data-driven approach, G-SIB institutions can achieve a significant reduction in capital requirements, gain more visibility into ongoing risk levels and open up more options for improvement, such as greater collateralization and optimal product mixes. It is a laborious undertaking by any measure: thousands of people develop models and hundreds more validate them across various stages, often in fragmented teams spread across different locations and using disparate analytics approaches.

One critical component of this modeling exercise is Risk Weighted Assets (RWA). RWA are a bank's exposures weighted by their riskiness, and they determine how much capital must be held against risks such as counterparty default. The requirement extends beyond on-balance sheet exposures to cover off-balance sheet items, derivative contracts and other exposures. In addition, RWA encompass Credit Risk, Market Risk and Operational Risk. A whole host of parameters go into Credit Risk calculations, including Probability of Default (PD), Loss Given Default (LGD), Exposure at Default (EAD) and Maturity. For the purposes of this post, we will focus on the PD component and show how Ayasdi can take a team from insight generation to model creation.
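To make the role of PD, LGD, EAD and Maturity concrete, here is a minimal sketch of how they combine into a credit RWA figure. It assumes the Basel II internal ratings-based risk-weight function for corporate exposures as it is commonly stated; retail exposures such as consumer loans use a different correlation and no maturity adjustment, so treat this as an illustration rather than a regulatory calculator. The function name and argument names are our own.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def irb_corporate_rwa(prob_default, lgd, ead, maturity=2.5):
    """Illustrative credit RWA for one exposure (assumed Basel II IRB corporate form).

    prob_default and lgd are fractions (0.01 means 1%), ead is a currency
    amount and maturity is in years.
    """
    # Asset correlation: falls from 0.24 towards 0.12 as PD rises.
    weight = (1 - math.exp(-50 * prob_default)) / (1 - math.exp(-50))
    correlation = 0.12 * weight + 0.24 * (1 - weight)

    # Maturity adjustment factor, a function of PD only.
    b = (0.11852 - 0.05478 * math.log(prob_default)) ** 2

    # PD conditional on a 99.9th percentile systematic shock.
    stressed_pd = _STD_NORMAL.cdf(
        (_STD_NORMAL.inv_cdf(prob_default)
         + math.sqrt(correlation) * _STD_NORMAL.inv_cdf(0.999))
        / math.sqrt(1 - correlation)
    )

    # Capital requirement K: unexpected loss, scaled by the maturity adjustment.
    k = lgd * (stressed_pd - prob_default)
    k *= (1 + (maturity - 2.5) * b) / (1 - 1.5 * b)

    # RWA is the capital requirement scaled by 12.5 (the inverse of 8%).
    return 12.5 * k * ead
```

Under these assumptions, an exposure of 100 with a PD of 1%, an LGD of 45% and a 2.5 year maturity comes out at an RWA of roughly 92, and the figure rises as PD rises. That sensitivity is why the accuracy of the PD model matters so much for capital.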

Let’s take a look at how this would work in practice using a public, representative loan dataset from Lending Club. This dataset includes approximately five years of data on over 25,000 loans. Characteristics include borrower-specific attributes, loan-specific attributes and market conditions at the time of origination.

The Ayasdi platform brings together an optimal combination of machine learning algorithms to produce a topological summary. The topological summary here is a similarity map across all of these dimensions. Each node in this map represents a set of similar loans, while each edge between two nodes indicates that those nodes have one or more loans in common. Color is used to visualize an outcome of interest.
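The node-and-edge structure described above can be illustrated with a generic Mapper-style construction. This is a simplified sketch and not the Ayasdi platform's implementation: the `lens` function, the interval cover, the `eps` distance threshold and the simple single-linkage clustering are all assumptions chosen to keep the example short.

```python
import math
from itertools import combinations


def single_linkage(points, members, eps):
    """Split member indices into groups whose points chain together within eps."""
    unvisited = set(members)
    clusters = []
    while unvisited:
        stack = [unvisited.pop()]
        cluster = set(stack)
        while stack:
            i = stack.pop()
            near = {j for j in unvisited
                    if math.dist(points[i], points[j]) <= eps}
            unvisited -= near
            cluster |= near
            stack.extend(near)
        clusters.append(cluster)
    return clusters


def mapper_graph(points, lens, n_intervals=4, overlap=0.5, eps=1.0):
    """Return (nodes, edges): nodes are sets of point indices,
    edges join nodes that share at least one point."""
    values = [lens(p) for p in points]
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_intervals or 1.0
    nodes = []
    for k in range(n_intervals):
        # Widen each interval so that neighbouring intervals overlap.
        start = lo + k * width - overlap * width / 2
        end = lo + (k + 1) * width + overlap * width / 2
        members = [i for i, v in enumerate(values) if start <= v <= end]
        # Each cluster of similar points inside the interval becomes a node.
        nodes.extend(single_linkage(points, members, eps))
    edges = {(a, b) for a, b in combinations(range(len(nodes)), 2)
             if nodes[a] & nodes[b]}
    return nodes, edges
```

For loan data, each point would be a loan's numeric feature vector and `lens` might be a model score or a projection of the features. Because the intervals overlap, a loan can land in two nodes at once, and that shared membership is what produces an edge.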


The topological summary of the loan data is shown twice: on the left it is colored by the predictions from a single global logistic regression model fitted across the entire dataset, and on the right by the actual default behavior. The global model captures portions of the default behavior but contains distinct regions of false positives and false negatives. The reason is that there are nonlinearities in the underlying data. Let’s take a closer look at what’s driving the island region on the top left side of the topological summary.


The above shows a statistical ranking of the key drivers that distinguish this small subpopulation from the rest. The top factors include borrower-specific characteristics such as debt-to-income ratio and number of bankruptcies. Loan-specific characteristics such as the amount and term structure also surface, as do market-specific characteristics such as three-month changes in the Ten Year Rate and Unemployment Claims.

A local logistic regression model incorporating these key factors can be built on the top-left island region. The side-by-side comparison shows that the local model captures the default characteristics more accurately and reduces false positives and false negatives. The same step can easily be repeated programmatically to create local models for all such regions within the topological summary.
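The idea of fitting one model per region can be sketched in a few lines. The example below is a toy illustration under stated assumptions, not the Ayasdi workflow: the data is synthetic, the `segment` labels stand in for regions picked out of a topological summary, and the logistic regression is a bare-bones gradient descent fit written out so the sketch has no dependencies.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_pd(w, x):
    """Predicted default probability; w[0] is the intercept."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def fit_logistic(X, y, lr=0.5, epochs=500):
    """Fit logistic regression weights by batch gradient descent on log-loss."""
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            err = predict_pd(w, xi) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


def fit_local_models(X, y, segment):
    """Fit one logistic model per segment id."""
    models = {}
    for s in set(segment):
        rows = [i for i, si in enumerate(segment) if si == s]
        models[s] = fit_logistic([X[i] for i in rows], [y[i] for i in rows])
    return models


def accuracy(probs, y):
    return sum((p >= 0.5) == bool(t) for p, t in zip(probs, y)) / len(y)


# Synthetic data: one scaled feature (think debt-to-income) whose effect on
# default is reversed between two hypothetical regions "A" and "B".
xs = [-0.9, -0.7, -0.5, -0.3, -0.1, 0.1, 0.3, 0.5, 0.7, 0.9]
X = [[x] for x in xs] + [[x] for x in xs]
segment = ["A"] * len(xs) + ["B"] * len(xs)
y = [int(x > 0) for x in xs] + [int(x < 0) for x in xs]

global_model = fit_logistic(X, y)
local_models = fit_local_models(X, y, segment)

global_acc = accuracy([predict_pd(global_model, x) for x in X], y)
local_acc = accuracy(
    [predict_pd(local_models[s], x) for x, s in zip(X, segment)], y
)
```

On this toy data the single pooled model cannot beat 50% accuracy, because the two regions carry mirrored labels for the same feature values, while each local model only has to describe its own region. Real regions will rarely be this clean, but the mechanism is the same.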


What this means is that through these topological summaries, Ayasdi can identify areas of localization that would ultimately elude standard techniques. As a result, with knowledge of these groups, default modelers can build models that are more accurate and statistically valid while always being transparent about the underlying model generation process.

This has significant implications for Basel and regulation. The reason is that regulators demand simplicity, often in the form of a regression model. A single regression, however, is often not the right approach. With Ayasdi’s approach, the areas that can be modeled with regression are identified and then modeled in a way that is far superior from the regulator’s perspective, and in a fraction of the time.

For G-SIB institutions looking to enhance their risk management function efficiently, in terms of both people and time, there are few better options.

In the next post, we will take a closer look at corporate and international datasets as well as approaches to characterizing Market Risk. If you have a suggestion for this series, drop us a note at