# Topological Data Analysis of Oil and Gas Petrophysical Data

The explosion of data in the energy sector is simply stunning. In an industry where the old-timers still visit the map room, the depth and breadth of digital data has completely transformed how energy is discovered, produced, and stored.

Look at drilling, for example. Between active and passive seismic sensors, complex borehole sensor data, and drilling and completion operations, the drilling community is producing massive amounts of data per well.

The challenge for drillers, and for the industry at large, is how to turn these stores of data into actionable insights.

One such example is how to use knowledge of petrophysical composition to find the sweet spots for fracking or refracturing. If you rely on standard statistical and machine learning methods (such as Principal Component Analysis, k-means and hierarchical clustering, Gaussian mixture models, and neural network classifiers), you will soon find yourself in need of a more efficient and rigorous mathematical method.

Topological Data Analysis (TDA) provides an effective way of extracting actionable insights from disparate data sources.

TDA uses topology, the mathematical study of geometrical shape, to understand complex datasets. Linear regression, the fitting of a straight line to a cloud of points on a plane, is perhaps the most rudimentary type of topological data analysis, and one with which all scientists and engineers are familiar.

A cloud of points on a plane roughly distributed along a circle is another familiar shape that is associated with periodicity.

Groups of points on a plane also have a topological interpretation as disconnected components, i.e., independent logical units. Another familiar shape is the Y junction, which is associated with bifurcation phenomena.
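To make the idea of disconnected groups concrete, here is a minimal Python sketch that counts the groups in a small point cloud. It links any two points closer than a threshold and counts the resulting connected components. The function name `connected_components`, the threshold `eps`, and the synthetic points are all assumptions made for illustration; they are not taken from any particular TDA package or from the study discussed in this article.

```python
import math
from itertools import combinations

def connected_components(points, eps):
    """Count groups of points linked by chains of neighbors within eps."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the representative of i's group.
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    # Merge the groups of every pair of points closer than eps.
    for i, j in combinations(range(len(points)), 2):
        if math.dist(points[i], points[j]) <= eps:
            parent[find(i)] = find(j)

    return len({find(i) for i in range(len(points))})

# Two synthetic, well-separated groups of points (illustrative data only).
group_a = [(0.0, 0.0), (0.5, 0.2), (0.3, 0.6)]
group_b = [(5.0, 5.0), (5.4, 5.3), (4.8, 5.5)]
n_groups = connected_components(group_a + group_b, eps=1.0)
print(n_groups)
```

The answer depends on the scale `eps`: with a very small threshold every point is its own group, and with a very large one everything merges into a single group.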

When the number of dimensions in the data increases, however, it becomes increasingly difficult to discern geometrical features such as these, and all the more difficult to derive locally linear models. TDA builds upon these geometric concepts and generalizes them to real-world datasets that can span millions of explanatory variables and millions of measurement points.

The idea behind the application of TDA is to represent data via topological networks: similar data points are grouped into *nodes*, and two nodes are connected by an *edge* if they have at least one data point in common. Because each node represents multiple data points, the network gives a compressed, low-dimensional version of extremely high-dimensional data. When used as a framework in conjunction with machine learning, TDA enables the understanding of the shape of complex datasets, highlighting previously hidden groups of data and revealing the relevant explanatory variables.
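The node-and-edge construction described above can be sketched in a few lines of Python. The version below follows the spirit of the Mapper construction commonly used in TDA: it slices the data into overlapping intervals of a "lens" function, clusters the points within each slice into nodes, and connects nodes that share points. It is a simplified illustration, not necessarily the method used in the work discussed here. The function names, the choice of lens (the x coordinate), and the parameters `n_intervals`, `overlap`, and `eps` are all assumptions, and the sketch assumes the lens values are not all equal.

```python
import math
from itertools import combinations

def cluster(indices, points, eps):
    """Single-linkage clusters among indices: link points closer than eps."""
    clusters, unvisited = [], set(indices)
    while unvisited:
        seed = unvisited.pop()
        current, frontier = {seed}, [seed]
        while frontier:
            i = frontier.pop()
            near = {j for j in unvisited
                    if math.dist(points[i], points[j]) <= eps}
            unvisited -= near
            current |= near
            frontier.extend(near)
        clusters.append(frozenset(current))
    return clusters

def topological_network(points, lens, n_intervals=4, overlap=0.3, eps=0.3):
    """Return (nodes, edges); each node is a frozenset of point indices."""
    values = [lens(p) for p in points]
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_intervals
    nodes = []
    for k in range(n_intervals):
        # Widen each interval so that neighboring intervals overlap.
        start = lo + k * width - overlap * width
        stop = lo + (k + 1) * width + overlap * width
        members = [i for i, v in enumerate(values) if start <= v <= stop]
        nodes.extend(cluster(members, points, eps))
    # Two nodes are joined when they have at least one point in common.
    edges = {(a, b) for a, b in combinations(range(len(nodes)), 2)
             if nodes[a] & nodes[b]}
    return nodes, edges

# Synthetic example: 24 points on a unit circle, lens = x coordinate.
circle = [(math.cos(2 * math.pi * k / 24), math.sin(2 * math.pi * k / 24))
          for k in range(24)]
nodes, edges = topological_network(circle, lens=lambda p: p[0])
print(len(nodes), "nodes,", len(edges), "edges")
```

With these settings the 24 points compress into a small network that closes into a loop, mirroring the circular shape of the data. Changing the lens, the number of intervals, or the overlap changes the resolution of the network, so these choices should be treated as tuning parameters.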

We have applied TDA to the problem of lithofacies classification for a set of nine wells in the Marcellus play and found that points with high Total Organic Carbon (TOC) can belong to different groups. The details of this work can be found in [Cortis, 2015].

Moreover, those lithofacies groups that have been found by means of TDA have very distinct marginal distributions in the compressional velocity vs. density representation, a property that is not shared by groups found by other classification methods.

Once groups with well-defined marginal distributions have been identified, it is possible to determine the main variables that characterize each group and to build models with unprecedented predictive precision.

Add production, completion, seismic, microseismic, and geomechanical data to the mix and, thanks to TDA, the oil and gas industry has a powerful and flexible new tool in its exploration and production toolbox, one that makes it possible to tackle decision problems that previously seemed hopeless.