
In this article, I will present to you a famous neural network architecture known as the Deep Autoencoder. We will discuss in detail how an Autoencoder works and what Deep Autoencoders are used for.

But most importantly I will show you how to implement a Deep AutoEncoder in TensorFlow 2.0 — nice :)


Table of Contents

  1. Recap: Architecture of Feedforward Neural Networks
  2. The Architecture of an Autoencoder
  3. Dimensionality Reduction
  4. Training of an Autoencoder
  5. Autoencoder TensorFlow 2.0 Code
  6. What are Autoencoders used for?

1. Recap: Architecture of Feedforward Neural Networks

Before we address the architecture of an autoencoder, let us quickly recap the architecture of probably the simplest and most common neural network architecture: the feedforward neural network.

The architecture of a feedforward neural network looks as follows:

Feedforward Neural Network

This architecture consists of a collection of nodes/neurons that are connected by edges. The neurons represent numeric values, which are intermediate results of the computations going on inside the network.

The edges, on the other hand, represent weights, which are the parameters of the network that are adjusted and improved during training.

A feedforward neural network can have an arbitrary number of layers, each with an arbitrary number of neurons.

We call the first layer the input layer; it receives the input. The last layer is the output layer, which represents the prediction of the network.

The layers between the input and the output layer are called the hidden layers. More information on feedforward neural networks and how they work can be found in the article “What is Deep Learning and how does it work?”.


2. The Architecture of an Autoencoder

The simplest form of Autoencoder is a feedforward neural network that you are already familiar with. As with feedforward neural networks, an Autoencoder has an input layer, an output layer, and one or more hidden layers:

AutoEncoder

But there is a significant difference to a regular feedforward neural network. An Autoencoder always has the same number of neurons in the output layer as there are neurons in the input layer.

This is because an Autoencoder is used for a special type of task. While a feedforward neural network is usually used for regression or classification tasks, an Autoencoder is used to compress and then reconstruct the input data.

The input data, which we usually refer to as the input feature vector x, is compressed into a shorter (latent) representation. This latent representation is then decompressed into something that approximates the original input data.

AutoEncoder Architecture

The Autoencoder has several hidden layers. The number of hidden neurons decreases with the depth of the network up until the middle layer, which contains the latent representation of the input data.

We call this transition the encoding step, because the decreasing number of neurons in these layers enforces the compression, or encoding, of the input features x into a latent representation located in the middle layer.

On the other hand, we have the decoding step, which represents the transition from the middle layer up to the final output layer. In the decoding step, the number of neurons in the hidden layers increases again, reconstructing the latent representation into its original form.

At this point, it should be noted that an Autoencoder does not use the labels in the same way as a feedforward neural network. The labels here are the features.

So, in the end, we compare the predictions of an Autoencoder with the original input data. In a nutshell, all an Autoencoder does is take some input features x, encode them into a latent representation, and use this latent representation to reconstruct the original input features.


3. Dimensionality Reduction

The decreasing number of neurons during the encoding step forces the Autoencoder to engage in dimensionality reduction. This can only be accomplished by learning to disregard unimportant information in the input features.

This way the input features are reduced to only the most relevant information.

Please consider the case where an Autoencoder has 20 input neurons. The network encodes the input features into a latent representation spread across 10 hidden neurons in the middle layer. If this Autoencoder is able to use this shorter, latent representation to reconstruct the original input, then this means that some information in the input features was not that relevant.


Advantages of Depth

An Autoencoder is often implemented and trained with only a single hidden layer. However, by using a deeper architecture we gain several advantages:

  • Depth can exponentially reduce the computational cost of representing the input features
  • Depth can exponentially decrease the amount of training data needed to learn a representation
  • Experimentally, Deep Autoencoders yield better compression compared to shallow or linear Autoencoders.

An Autoencoder can be used to encode data of varying degrees of complexity, from simple numbers to full images. Keep in mind that for data of higher complexity, such as images, the Autoencoder needs more hidden layers.

Only the additional layers allow the network to learn more and more complex data patterns.

An Intuitive Example

Let’s take a look at a simple but intuitive example of how an Autoencoder works in practice. Please consider the following Autoencoder, which is used to encode an image of a mushroom into a latent representation. This representation is then used to restore the original input image:

AutoEncoder Example

The hidden layers of the Autoencoder encode the essential features of the mushroom (e.g. round head, yellow face, black eyes, red head, and white spots) into a latent representation z in the middle layer.

These encoded features are used during the decoding step to recreate the input. Although it was possible to reconstruct the mushroom, the result differs somewhat from the original input. Certain features such as shadows or the exact color palette were not learned by the Autoencoder, resulting in a slight difference between input and output.

This example shows how things are in reality: an Autoencoder cannot reconstruct the input to 100%. There is always a measurable difference between the input and the output.

Our goal in building and training the Autoencoder is to minimize this difference, because a smaller difference between input and output means a better and more accurate latent representation of the input. Or in other words: a better encoded, lower-dimensional representation of the essential information in the features.

This latent representation is very important because it can be used for many different purposes, such as detecting fraud in financial transactions and other applications which we will cover in the following sections.

First, I would like to discuss with you how the Autoencoder can be trained so that we get an accurate latent representation of the input features.


4. Training of an AutoEncoder

Fortunately, we can train an Autoencoder the same way as we train a regular feedforward neural network. The layers of an Autoencoder are connected by weights:

AutoEncoder Weights

Given the input vector x, we perform a dot product between the input vector x and the weight matrix that connects the input layer with the first hidden layer. After this step, we apply a nonlinear function such as Tanh, ReLU, or Sigmoid. (We have covered different kinds of activation functions in Deep Learning in detail in the article “Activation Functions in Neural Networks”.)

We repeat this process until we reach the middle layer. Here the hidden neurons are usually represented by the letter z, which is the latent representation of the input vector x. This is the encoding step, which can be summarized by the following equations:

Encoding Step
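
As a rough sketch of these equations (assuming, for illustration, a single hidden layer between the input and the middle layer, with weight matrices W_1, W_2 and bias vectors b_1, b_2):

a = σ(W_1 x + b_1)
z = σ(W_2 a + b_2)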

Here, a is the vector of hidden values in the first hidden layer of the network and σ is an arbitrary non-linear activation function.

During the decoding step, we use the latent vector z as the new input and repeat the same process as before: computing dot products and applying activation functions. The final output of the network is the vector x_hat, which represents the reconstruction of the input x:

Decoding Step
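
A corresponding sketch of the decoding equations, under the same illustrative assumption of a single hidden layer in the decoder, with weight matrices W_3, W_4 and bias vectors b_3, b_4:

a' = σ(W_3 z + b_3)
x_hat = σ(W_4 a' + b_4)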

Please note that in this case, it is important to choose the activation function in the last layer very wisely. Depending on the range of values in the input vector x, some non-linear functions may not be able to produce the values you want to reconstruct. A Sigmoid output layer, for example, can only produce values between 0 and 1.

As already mentioned, we want to reduce the difference between the input vector x and its reconstructed counterpart x_hat. We can formulate this goal as the minimization of the Mean Squared Error (MSE) loss function using a regular gradient descent approach. (For more information on the Mean Squared Error and other loss functions, please refer to the article “Loss Functions in Deep Learning”):

MSE
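
Written out for the n components of the input vector (the notation here is one common variant), the loss is:

L(x, x_hat) = (1/n) * Σ_{i=1}^{n} (x_i - x_hat_i)^2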

Minimizing the MSE loss function reduces the difference between x and x_hat, which automatically results in a better latent representation of the input vector x.

There are three hyperparameters we must consider before training an Autoencoder:

  • Size of the latent vector z: the number of neurons in the middle layer. A smaller size results in stronger compression.
  • Number of layers: The neural network can have as many layers as we want; however, it is recommended to have at least one hidden layer for the encoding step and one hidden layer for the decoding step.
  • Number of neurons per layer: A deep Autoencoder usually has a decreasing number of neurons per layer during the encoding step and an increasing number of neurons per layer during the decoding step. In addition, the network should be symmetrical.

5. Autoencoder TensorFlow 2.0 Code

In this section, we will build an Autoencoder that encodes the MNIST dataset into a latent representation and uses this representation to reconstruct the original input.

First, we must import the libraries and download the MNIST dataset:
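
A minimal sketch of what this step could look like (the preprocessing details, flattening the 28x28 images to 784-dimensional vectors, scaling to [0, 1], and a batch size of 128, are illustrative choices):

```python
import tensorflow as tf

# Download MNIST; we only need the images, not the labels
(x_train, _), (x_test, _) = tf.keras.datasets.mnist.load_data()

# Flatten the 28x28 images into 784-dimensional vectors and scale them to [0, 1]
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0
x_test = x_test.reshape(-1, 784).astype("float32") / 255.0

# Build a shuffled, batched training dataset
train_dataset = tf.data.Dataset.from_tensor_slices(x_train).shuffle(60000).batch(128)
```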

After that, we can implement the encoding step of the autoencoder as a class. In this case, the encoder has two hidden layers:
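
A possible sketch of such an encoder class (the layer sizes of 128 and 32 are illustrative choices, not fixed values):

```python
class Encoder(tf.keras.layers.Layer):
    """Encodes the input vector x into the latent representation z."""

    def __init__(self, hidden_dim=128, latent_dim=32):
        super().__init__()
        # First hidden layer compresses the 784-dimensional input
        self.hidden_layer = tf.keras.layers.Dense(hidden_dim, activation="relu")
        # Middle layer holds the latent representation z
        self.latent_layer = tf.keras.layers.Dense(latent_dim, activation="relu")

    def call(self, x):
        a = self.hidden_layer(x)
        z = self.latent_layer(a)
        return z
```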

After that follows the decoding step, again implemented as a class. The decoder has one hidden layer and one output layer. So, in total, the autoencoder has three hidden layers. All layers use ReLU as the activation function:
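
A matching sketch of the decoder class (again with illustrative layer sizes; ReLU in the output layer works here because the normalized pixel values are non-negative):

```python
class Decoder(tf.keras.layers.Layer):
    """Decodes the latent representation z back into a reconstruction x_hat."""

    def __init__(self, hidden_dim=128, original_dim=784):
        super().__init__()
        # Hidden layer expands the latent vector again
        self.hidden_layer = tf.keras.layers.Dense(hidden_dim, activation="relu")
        # Output layer has as many neurons as the input vector x
        self.output_layer = tf.keras.layers.Dense(original_dim, activation="relu")

    def call(self, z):
        a = self.hidden_layer(z)
        x_hat = self.output_layer(a)
        return x_hat
```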

Now we can build the autoencoder class, where the encoder and the decoder are connected with each other. We also define functions for the computation of the MSE loss and for training the network.
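
One possible sketch of this class and the helper functions; the model simply chains the encoder and the decoder, the loss is the MSE between input and reconstruction, and the training step applies one gradient descent update per batch:

```python
class Autoencoder(tf.keras.Model):
    """Connects the encoder and the decoder into a single model."""

    def __init__(self, hidden_dim=128, latent_dim=32, original_dim=784):
        super().__init__()
        self.encoder = Encoder(hidden_dim, latent_dim)
        self.decoder = Decoder(hidden_dim, original_dim)

    def call(self, x):
        z = self.encoder(x)
        x_hat = self.decoder(z)
        return x_hat


def mse_loss(x, x_hat):
    # Mean squared error between the input and its reconstruction
    return tf.reduce_mean(tf.square(x - x_hat))


@tf.function
def train_step(model, optimizer, x):
    # One gradient descent step on a batch of inputs
    with tf.GradientTape() as tape:
        x_hat = model(x)
        loss = mse_loss(x, x_hat)
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
    return loss
```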

After that, we can initialize an instance of the autoencoder class and start training it for 5 epochs.
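
A sketch of the training loop (the Adam optimizer and the learning rate of 1e-3 are illustrative choices):

```python
autoencoder = Autoencoder()
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)

for epoch in range(5):
    for batch in train_dataset:
        loss = train_step(autoencoder, optimizer, batch)
    print(f"Epoch {epoch + 1}, loss: {loss.numpy():.4f}")
```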

After training, the autoencoder is able to reconstruct the original input images of the MNIST dataset. The image on the left-hand side below shows the original MNIST input features x that are used as input for the Autoencoder. The image on the right-hand side shows the reconstruction of this input. Although the input can be reconstructed, we can observe a difference between the input and the output:
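
To produce such a side-by-side comparison yourself, a small matplotlib snippet along these lines could be used (this visualization code is an illustrative addition, not part of the training pipeline):

```python
import matplotlib.pyplot as plt

# Reconstruct a single test image with the trained autoencoder
x = x_test[:1]
x_hat = autoencoder(x).numpy()

fig, axes = plt.subplots(1, 2)
axes[0].imshow(x.reshape(28, 28), cmap="gray")
axes[0].set_title("Original")
axes[1].imshow(x_hat.reshape(28, 28), cmap="gray")
axes[1].set_title("Reconstruction")
plt.show()
```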


6. What are Autoencoders used for?

Autoencoders are used in a variety of different fields, such as dimensionality reduction, anomaly detection (in particular, detection of financial fraud), image processing, and information retrieval.

Dimensionality Reduction

Dimensionality reduction is not only a use case for deep autoencoders; in fact, it was one of the first applications of deep learning and one of the early motivations to study autoencoders.

In a nutshell, we want to project data from a high-dimensional feature space to a lower-dimensional feature space. This is basically the encoding step.

The input feature vector x represents the higher-dimensional space and the hidden vector z represents the lower-dimensional representation.

One milestone paper on this subject was published in Science in 2006 (Hinton, G. E.; Salakhutdinov, R. R. (28 July 2006). “Reducing the Dimensionality of Data with Neural Networks”). In this paper, a deep autoencoder achieved a better reconstruction error than PCA, which until then had been the go-to method for dimensionality reduction.

Representing data in a lower-dimensional space can improve performance on different tasks, such as classification.

Detection of Financial Fraud

Another field of application where autoencoders are used is anomaly detection. In particular, the detection of fraudulent financial transactions (which can be considered anomalies) is an area where autoencoders perform very well.

Since the network learns to reconstruct the most important information in the training data, the model is able to reproduce precisely the most frequent characteristics and patterns of that data. During training, the autoencoder sees only non-anomalous data instances. After training, it will reconstruct normal data very well, while failing to do so with anomalous data it has not encountered.

This means that if the autoencoder faces an anomaly as input data, then the error between this input and the reconstruction will be much higher.

Based on this reconstruction error an anomaly can be distinguished from a normal data instance.
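
As a rough sketch of this idea, reusing the autoencoder from the code section above: compute the per-sample reconstruction error and flag samples whose error exceeds a threshold (treating the top 1% of errors as anomalies is an illustrative choice):

```python
import numpy as np

def reconstruction_error(model, x):
    # Per-sample mean squared error between input and reconstruction
    x = tf.convert_to_tensor(x, dtype=tf.float32)
    x_hat = model(x)
    return tf.reduce_mean(tf.square(x - x_hat), axis=1)

errors = reconstruction_error(autoencoder, x_test).numpy()

# Flag the samples with the largest reconstruction errors as anomalies
threshold = np.percentile(errors, 99)
anomalies = errors > threshold
```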

Image Processing: Compression, Denoising, Super-Resolution

Another field of application for autoencoders is image processing.

One example is image compression, where an image is compressed, or encoded, into a format that takes up less disk space than the original image. A standard approach for the encoding of images is JPEG 2000. However, autoencoders have already demonstrated their potential in image compression by outperforming this standard approach.

This is due to the unique ability of the network to encode features from a higher-dimensional space into a lower-dimensional space while keeping the essential information of the data (the image).

Another useful application of autoencoders in the field of image processing is image denoising. Images or videos taken in poor conditions often require some kind of restoration or quality improvement. Autoencoders are able to remove unwanted artifacts from images and videos.

An even more astonishing capability of autoencoders is increasing the resolution of images and videos, for example from 1080p to 4K. This application is becoming more and more popular in the gaming industry.

Information Retrieval

Autoencoders can also be used for information retrieval. This kind of task benefits from dimensionality reduction: searching for certain items becomes more efficient in low-dimensional spaces.

Autoencoders have indeed been applied to semantic hashing. In a nutshell, the autoencoder is trained to produce a low-dimensional binary code. After that, all database entries can be stored in a hash table mapping these binary code vectors to the entries in the database.

This table then allows performing information retrieval by returning all entries with the same binary code as the query, or slightly less similar entries by flipping some bits of the query’s encoding.


Take-Home-Message

  • Autoencoders are basically feedforward neural networks where the input and the output layer have the same number of neurons
  • The features are used as labels, meaning we compare the input features with the reconstructed input
  • Autoencoders perform dimensionality reduction of input data, by disregarding non-relevant information
  • Autoencoders are widely used in image processing, fraud detection, and information retrieval