Topology | July 9, 2018

Incorporating Prior Information In Order to Boost Neural Networks

BY Gunnar Carlsson

Deep neural nets typically operate on “raw data” of some kind, such as images, text, time series, etc., without the benefit of “derived” features. The idea is that because of their flexibility, neural networks can learn the features relevant to the problem at hand, be it a classification problem or an estimation problem.  Whether derived or learned, features are important. The challenge is in determining how one might use what one learned from the features in future work (staying inside the same typology, e.g. images, text, etc.).

By eliminating the need to relearn these features one can increase the performance (speed and accuracy) of a neural networks.

In this post, we describe how these ideas are used to speed up the learning process in image datasets while simultaneously improving accuracy. This work is a continuation of our collaboration with Rickard Brüel Gabrielsson

As we recall from our first and second installments we can, discover how convolutional neural nets learn and how to use that information to improve performance while allowing for the generalization of those techniques between data sets of the same type.

These findings drew from a previous paper where we analyzed a high density set of 3×3 image patches in a large database of images.  That paper identified the patches visible in Figure 1 below and allowed them to be described algebraically. This permits the creation of features (for more detail, see the technical note at the end).  What was found in the paper was that at a given density threshold (e.g. the top 10% densest patches) those patches were organized around a circle. The organization is summarized by the following image.


Figure 1

While the initial finding was via the study of a data set of patches, we have confirmed the finding through the study of a space of weight vectors in a neural network trained on data sets of images of hand-drawn digits. Each point on the circle is specified by an angle , and there is a corresponding patch P.


What this allows us to do is to create a feature that is very simple to compute using trigonometric functions, that for each angle that measures the similarity of the given patch to P. See the technical note at the end for a precise definition. By adjoining these features to two distinct data sets, and treating them as separate data channels, we have obtained improved training results for the two distinct data sets, one MNIST and the other SVHN.  We’ll call this approach the boosted method. We are using a simple two layer convolutional neural net in both cases.  The results are as follows.


Validation Accuracy # Batch iterations Boosted # Batch iterations standard
.8 187 293
.9 378 731
.95 1046 2052
.96 1499 2974
.97 2398 4528
.98 5516 8802
.985 9584 16722


Validation Accuracy # Batch iterations Boosted # Batch iterations standard
.25 303 1148
.5 745 2464
.75 1655 5866
.8 2534 8727
.83 4782 13067
.84 6312 15624
.85 8426 21009

The application to MNIST gives a rough speedup of a little under 2X.  For the more complicated SVHN, we obtain improvement by a rough factor of  3.5X until hitting the .8 threshold, and lowering to 2.5-3X thereafter. An examination of the graphs of validation accuracy vs. number of batch iterations suggests, plotted to 30,000 iterations, that at the higher accuracy ranges, the standard method may never attain the results of the boosted method.

These findings suggest that one should always use these features for image data sets in order to speed up learning.  The results in our earlier post also suggest that they contribute to improved generalization capability from one data set to another.

These results are only a beginning.  There are more complex models of spaces of high density patches in natural images, which can generate richer sets of features, and which one would expect to enable improvement, particularly when used in higher layers.  


In our next post, we will describe a methodology for constructing neural net architectures especially adapted to data sets of fixed structures, or adapted to individual data sets based on information concerning their sets of features.  By using a set of features equipped with a geometry or distance function, we will show how that information can be used to inform the development of architectures for various applications of deep neural nets.

Additional Resources

Artificial Intelligence | July 21, 2020
Beyond Explainability – AI Transparency

It is now well understood that in order to make Artificial Intelligence broadly useful, it is critical that humans can interact with and have confidence in the algorithms that are being used. This observation has led to the development of the notion of explainable AI (sometimes called XAI) which was initially a loosely defined concept […]

Artificial Intelligence | February 12, 2020
AyasdiAI Model Accelerator recognized as Prime Example of AI Governance by Singapore Government at Davos 2020

On Jan 22nd, 2020, Singapore launched its second edition of its Model AI Governance Framework at the World Economic Forum in Davos at a joint press conference for the Fourth Industrial Revolution (WEF C4IR). This framework’s “unique contribution to the global discourse on AI ethics lies in translating ethical principles into practical recommendations that organizations […]

Artificial Intelligence | December 3, 2018
Relationships, Geometry, and Artificial Intelligence

In their very provocative paper, Peter Battaglia and his colleagues, posit that in order for artificial intelligence (AI) to achieve the capabilities of human intelligence, it must be able to compute with highly structured data types, such as the ones humans deal with. This means that AI must be able to work with information about relationships […]