
MatLab Two-Layer Perceptron

This is a MatLab implementation of a two-layer perceptron, that is, a neural network with one input, one hidden, and one output layer. The implementation was assessed using the MNIST dataset. The work was part of a seminar paper at the Chair for Computer Science i6, RWTH Aachen University.

Overview

The MNIST dataset consists of $60k$ training images of handwritten digits at a resolution of $28 \times 28$ pixels; the task is to classify each image as the digit it shows. For this seminar paper, a two-layer perceptron was implemented in MatLab. The code is available on GitHub:

MatLab Two-Layer Perceptron on GitHub

Seminar Paper

Update. The presentation slides of this seminar are part of Prof. Schiele's and Dr. Mario Fritz's lecture slides on deep learning.

In this seminar paper we study artificial neural networks, their training, and their application to pattern recognition. We start by giving a general definition of artificial neural networks and introduce both the single-layer and the multilayer perceptron. After considering several activation functions, we discuss network topology and the expressive power of multilayer perceptrons. The second section introduces supervised network training. To this end, we discuss gradient descent and Newton's method for parameter optimization. We derive the error backpropagation algorithm for evaluating the gradient of the error function and extend this approach to evaluate its Hessian. In addition, the concept of regularization is introduced. The third section introduces pattern classification. Using maximum likelihood estimation, we derive the cross-entropy error function. As an application, we train a two-layer perceptron to recognize handwritten digits based on the MNIST dataset.

Seminar Paper | Presentation Slides | Paper on GitHub

Network Training

We want to train a two-layer perceptron to recognize handwritten digits, that is, given a new $28 \times 28$ pixel image, the goal is to decide which digit it represents. For this purpose, the two-layer perceptron consists of $28 \cdot 28 = 784$ input units, a variable number of hidden units, and $10$ output units. The general case of a two-layer perceptron with $D$ input units, $m$ hidden units and $C$ output units is shown in figure 1. The network is trained using a stochastic variant of mini-batch training, the sum-of-squared error function, and the error backpropagation algorithm. The method returns the weights of the hidden layer and the output layer after training as well as the normalized sum-of-squared error after the last iteration. In addition, it plots the normalized error over time, resulting in a plot as shown in figure 2.
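For reference, the sum-of-squared error over a batch $B$ of training images can be written as

$E = \frac{1}{2 |B|} \sum_{n \in B} \| y(x_n) - t_n \|^2$

where $y(x_n)$ is the network output for image $x_n$ and $t_n$ the corresponding target vector. The exact normalization constant used in the implementation is an assumption here; conceptually, the plotted error is a per-batch average of squared output errors.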

function [hiddenWeights, outputWeights, error] = trainStochasticSquaredErrorTwoLayerPerceptron(activationFunction, dActivationFunction, numberOfHiddenUnits, inputValues, targetValues, epochs, batchSize, learningRate)
% trainStochasticSquaredErrorTwoLayerPerceptron Creates a two-layer perceptron
% and trains it on the MNIST dataset.
%
% INPUT:
% activationFunction             : Activation function used in both layers.
% dActivationFunction            : Derivative of the activation function
%                                  used in both layers.
% numberOfHiddenUnits            : Number of hidden units.
% inputValues                    : Input values for training (784 x 60000).
% targetValues                   : Target values for training (10 x 60000).
% epochs                         : Number of epochs to train.
% batchSize                      : Number of images per batch; the error is
%                                  plotted after each batch.
% learningRate                   : Learning rate to apply.
%
% OUTPUT:
% hiddenWeights                  : Weights of the hidden layer.
% outputWeights                  : Weights of the output layer.
% error                          : Normalized sum-of-squared error after the
%                                  last iteration.
% 
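The body of the function is not reproduced here; the following is a rough sketch of the stochastic mini-batch training it performs. The initialization scheme, the variable names, and the omission of error tracking and plotting are assumptions of this sketch, not necessarily the repository's exact code.

% Sketch of the core training loop (illustrative, not the exact repository code).
trainingSetSize = size(inputValues, 2);

% Initialize both weight matrices with small random values (assumed scheme).
hiddenWeights = rand(numberOfHiddenUnits, size(inputValues, 1)) - 0.5;
outputWeights = rand(size(targetValues, 1), numberOfHiddenUnits) - 0.5;

for t = 1: epochs
    for k = 1: batchSize
        % Pick a random training image and its target vector.
        n = floor(rand(1)*trainingSetSize) + 1;
        inputVector = inputValues(:, n);
        targetVector = targetValues(:, n);

        % Forward pass through both layers.
        hiddenActualInput = hiddenWeights*inputVector;
        hiddenOutputVector = activationFunction(hiddenActualInput);
        outputActualInput = outputWeights*hiddenOutputVector;
        outputVector = activationFunction(outputActualInput);

        % Backward pass: error backpropagation for the sum-of-squared error.
        outputDelta = dActivationFunction(outputActualInput).*(outputVector - targetVector);
        hiddenDelta = dActivationFunction(hiddenActualInput).*(outputWeights'*outputDelta);

        % Gradient descent update of both weight matrices.
        outputWeights = outputWeights - learningRate.*outputDelta*hiddenOutputVector';
        hiddenWeights = hiddenWeights - learningRate.*hiddenDelta*inputVector';
    end;
end;

In this sketch every image of a batch triggers an immediate weight update; computing and plotting the normalized error after each batch is omitted for brevity.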

Figure 2: The normalized error plotted during training with learning rate $\gamma = 0.5$, $500$ hidden units, and a batch size of $100$ images, iterated $500$ times. The $y$-axis shows the normalized error; the $x$-axis shows the iteration.

The above method requires the activation function used for both the hidden layer and the output layer to be given as a parameter. The logistic sigmoid, defined by

$\sigma(z) = \frac{1}{1 + \exp(-z)}$

is a commonly used activation function and is implemented in logisticSigmoid. In addition, the error backpropagation algorithm needs the derivative of the activation function, which is implemented as dLogisticSigmoid.
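The derivative can be expressed in terms of the sigmoid itself,

$\frac{\partial \sigma}{\partial z}(z) = \sigma(z) (1 - \sigma(z)),$

which is the identity dLogisticSigmoid evaluates.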

function y = logisticSigmoid(x)
% logisticSigmoid Logistic sigmoid activation function
% 
% INPUT:
% x     : Input vector.
%
% OUTPUT:
% y     : Output vector where the logistic sigmoid was applied element by
% element.
%
    y = 1./(1 + exp(-x));
end
function y = dLogisticSigmoid(x)
% dLogisticSigmoid Derivative of the logistic sigmoid.
% 
% INPUT:
% x     : Input vector.
%
% OUTPUT:
% y     : Output vector where the derivative of the logistic sigmoid was
% applied element by element.
%
    y = logisticSigmoid(x).*(1 - logisticSigmoid(x));
end

Usage and Validation

The method applyStochasticSquaredErrorTwoLayerPerceptronMNIST provides an example of how to use the above methods:

% Load MNIST dataset.
inputValues = loadMNISTImages('train-images.idx3-ubyte');
labels = loadMNISTLabels('train-labels.idx1-ubyte');
    
% Transform the labels to correct target values.
targetValues = 0.*ones(10, size(labels, 1));
for n = 1: size(labels, 1)
    targetValues(labels(n) + 1, n) = 1;
end;
    
% Choose form of MLP:
numberOfHiddenUnits = 700;
    
% Choose appropriate parameters.
learningRate = 0.1;
    
% Choose activation function.
activationFunction = @logisticSigmoid;
dActivationFunction = @dLogisticSigmoid;
    
% Choose batch size and epochs. Remember there are 60k input values.
batchSize = 100;
epochs = 500;
    
fprintf('Train two-layer perceptron with %d hidden units.\n', numberOfHiddenUnits);
fprintf('Learning rate: %g.\n', learningRate);
    
[hiddenWeights, outputWeights, error] = trainStochasticSquaredErrorTwoLayerPerceptron(activationFunction, dActivationFunction, numberOfHiddenUnits, inputValues, targetValues, epochs, batchSize, learningRate);
    
% Load validation set.
inputValues = loadMNISTImages('t10k-images.idx3-ubyte');
labels = loadMNISTLabels('t10k-labels.idx1-ubyte');
    
% Choose decision rule.
fprintf('Validation:\n');
    
[correctlyClassified, classificationErrors] = validateTwoLayerPerceptron(activationFunction, hiddenWeights, outputWeights, inputValues, labels);
    
fprintf('Classification errors: %d\n', classificationErrors);
fprintf('Correctly classified: %d\n', correctlyClassified);

First, the MNIST dataset needs to be loaded using the methods mentioned above (loadMNISTImages and loadMNISTLabels). The labels are provided as a vector where the $i^{th}$ entry contains the digit represented by the $i^{th}$ image. We transform the labels to form a $10 \times N$ matrix, where $N$ is the number of training images, such that the $i^{th}$ entry of the $n^{th}$ column vector is $1$ iff the $n^{th}$ training image represents the digit $i - 1$.
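As a side note, the same one-hot encoding can also be built without an explicit loop; the following variant is not part of the original code, merely an equivalent alternative:

% Vectorized construction of the 10 x N target matrix (alternative to the loop above).
targetValues = zeros(10, size(labels, 1));
targetValues(sub2ind(size(targetValues), labels' + 1, 1:size(labels, 1))) = 1;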

The network is trained using the logistic sigmoid activation function, a fixed batch size and a fixed number of iterations. The training method trainStochasticSquaredErrorTwoLayerPerceptron returns the weights of the hidden layer and the output layer as well as the normalized sum-of-squared error after the last iteration.

The method validateTwoLayerPerceptron uses the network weights to count the number of classification errors on the validation set.
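The exact implementation is available in the repository; the following is a minimal sketch of such a validation routine, assuming the decision rule of assigning the digit whose output unit has maximal activation (comments and structure are illustrative):

function [correctlyClassified, classificationErrors] = validateTwoLayerPerceptron(activationFunction, hiddenWeights, outputWeights, inputValues, labels)
% validateTwoLayerPerceptron Sketch: classify each validation image and
% compare the predicted digit with its label.
%
% INPUT:
% activationFunction   : Activation function used in both layers.
% hiddenWeights        : Weights of the hidden layer.
% outputWeights        : Weights of the output layer.
% inputValues          : Validation images (784 x 10000).
% labels               : Validation labels (10000 x 1).
%
% OUTPUT:
% correctlyClassified  : Number of correctly classified images.
% classificationErrors : Number of misclassified images.
%
    correctlyClassified = 0;
    classificationErrors = 0;

    for n = 1: size(inputValues, 2)
        % Forward pass through both layers.
        hiddenOutputVector = activationFunction(hiddenWeights*inputValues(:, n));
        outputVector = activationFunction(outputWeights*hiddenOutputVector);

        % Decision rule: the output unit with maximal activation gives the digit.
        [~, classIndex] = max(outputVector);
        predictedDigit = classIndex - 1;

        if predictedDigit == labels(n)
            correctlyClassified = correctlyClassified + 1;
        else
            classificationErrors = classificationErrors + 1;
        end;
    end;
end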

Results

Some results of validating the two-layer perceptron on the provided validation set can be found in my seminar paper or in figure 3.

Two-layer Perceptron Results

Figure 3: Results of the trained two-layer perceptron evaluated on the validation set of $10,000$ handwritten digits. Left: $500$ training iterations with a batch size of $100$ and the shown learning rates $\gamma = 0.5$ and $\gamma = 0.1$. Right: $500$ training iterations with fixed learning rate $\gamma = 0.5$ and batch sizes $100$ and $200$.