MatLab Two-Layer Perceptron
This is a MatLab implementation of a two-layer perceptron, that is, a neural network with one input, one hidden, and one output layer. The implementation was assessed using the MNIST dataset. The work was part of a seminar paper at the chair for computer science i6, RWTH Aachen University.
Overview
The MNIST dataset consists of handwritten digits at a resolution of $28 \times 28$ pixels, split into $60{,}000$ training images and $10{,}000$ test images; the task is to classify each image as the correct digit. For this seminar paper, a two-layer perceptron was implemented in MatLab. The code is available on GitHub:
MatLab Two-Layer Perceptron on GitHub | Seminar Paper
Update. The slides of the seminar paper are part of Prof. Schiele's and Dr. Mario Fritz' lecture slides on deep learning.
In this seminar paper we study artificial neural networks, their training, and their application to pattern recognition. We start by giving a general definition of artificial neural networks and introduce both the single-layer and the multilayer perceptron. After considering several activation functions, we discuss network topology and the expressive power of multilayer perceptrons. The second section introduces supervised network training. To this end, we discuss gradient descent and Newton's method for parameter optimization. We derive the error backpropagation algorithm for evaluating the gradient of the error function and extend this approach to evaluate its Hessian. In addition, the concept of regularization is introduced. The third section introduces pattern classification. Using maximum likelihood estimation, we derive the cross-entropy error function. As an application, we train a two-layer perceptron to recognize handwritten digits based on the MNIST dataset.
Seminar Paper | Presentation Slides | Paper on GitHub
Network Training
We want to train a two-layer perceptron to recognize handwritten digits, that is, given a new $28 \times 28$ pixel image, the goal is to decide which digit it represents. For this purpose, the two-layer perceptron consists of $28 \cdot 28 = 784$ input units, a variable number of hidden units, and $10$ output units. The general case of a two-layer perceptron with $D$ input units, $m$ hidden units, and $C$ output units is shown in figure 1. The network is trained using a stochastic variant of mini-batch training, the sum-of-squared error function, and the error backpropagation algorithm. The method returns the weights of the hidden layer and the output layer after training as well as the normalized sum-of-squared error after the last iteration. In addition, it plots the normalized error over time, resulting in a plot as shown in figure 2.
    function [hiddenWeights, outputWeights, error] = trainStochasticSquaredErrorTwoLayerPerceptron(activationFunction, dActivationFunction, numberOfHiddenUnits, inputValues, targetValues, epochs, batchSize, learningRate)
    % trainStochasticSquaredErrorTwoLayerPerceptron Creates a two-layer perceptron
    % and trains it on the MNIST dataset.
    %
    % INPUT:
    % activationFunction  : Activation function used in both layers.
    % dActivationFunction : Derivative of the activation
    %                       function used in both layers.
    % numberOfHiddenUnits : Number of hidden units.
    % inputValues         : Input values for training (784 x 60000)
    % targetValues        : Target values for training (10 x 60000)
    % epochs              : Number of epochs to train.
    % batchSize           : Plot error after batchSize images.
    % learningRate        : Learning rate to apply.
    %
    % OUTPUT:
    % hiddenWeights       : Weights of the hidden layer.
    % outputWeights       : Weights of the output layer.
    %
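To make the training procedure more concrete, the following is a minimal sketch of a single stochastic update step with the sum-of-squared error function and error backpropagation. It is not the body of trainStochasticSquaredErrorTwoLayerPerceptron: the toy dimensions, the fixed weight values, the omission of bias terms, and the names sigma, dSigma, outputDelta and hiddenDelta are assumptions made for illustration.

```matlab
% Sketch only: one gradient descent step on a single toy sample.
% Assumed toy dimensions: D = 3 inputs, m = 2 hidden units, C = 2 outputs.
sigma  = @(z) 1 ./ (1 + exp(-z));               % logistic sigmoid
dSigma = @(z) sigma(z) .* (1 - sigma(z));       % its derivative

hiddenWeights = [0.1 -0.2 0.3; 0.4 0.1 -0.1];   % m x D (assumed values)
outputWeights = [0.2 -0.3; -0.1 0.5];           % C x m (assumed values)
x = [1; 0.5; -1];                               % one input vector (D x 1)
t = [1; 0];                                     % its target vector (C x 1)
learningRate = 0.1;

% Forward pass through both layers.
hiddenActualInput = hiddenWeights * x;
hiddenOutput      = sigma(hiddenActualInput);
outputActualInput = outputWeights * hiddenOutput;
y                 = sigma(outputActualInput);
errorBefore       = 0.5 * sum((y - t).^2);

% Backward pass: errors (deltas) of the output layer and the hidden layer.
outputDelta = dSigma(outputActualInput) .* (y - t);
hiddenDelta = dSigma(hiddenActualInput) .* (outputWeights' * outputDelta);

% Gradient descent update of both weight matrices.
outputWeights = outputWeights - learningRate * outputDelta * hiddenOutput';
hiddenWeights = hiddenWeights - learningRate * hiddenDelta * x';

% Recompute the error on the same sample after the update.
yAfter     = sigma(outputWeights * sigma(hiddenWeights * x));
errorAfter = 0.5 * sum((yAfter - t).^2);
fprintf('Error before: %.6f, after: %.6f\n', errorBefore, errorAfter);
```

In a stochastic mini-batch setting, a step of this kind would be repeated for randomly drawn training samples, with the error evaluated at regular intervals for plotting.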
The above method requires the activation function used for both the hidden layer and the output layer to be given as a parameter. The logistic sigmoid defined by
$\sigma(z) = \frac{1}{1 + \exp(-z)}$
is a commonly used activation function and is implemented in logisticSigmoid. In addition, the error backpropagation algorithm needs the derivative of the activation function, which is implemented as dLogisticSigmoid.
    function y = logisticSigmoid(x)
    % logisticSigmoid Logistic sigmoid activation function
    %
    % INPUT:
    % x     : Input vector.
    %
    % OUTPUT:
    % y     : Output vector where the logistic sigmoid was applied element by
    %         element.
    %
    function y = dLogisticSigmoid(x)
    % dLogisticSigmoid Derivative of the logistic sigmoid.
    %
    % INPUT:
    % x     : Input vector.
    %
    % OUTPUT:
    % y     : Output vector where the derivative of the logistic sigmoid was
    %         applied element by element.
    %
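The derivative of the logistic sigmoid can be written in terms of the function itself, $\sigma'(z) = \sigma(z)(1 - \sigma(z))$. The sketch below shows one possible element-wise implementation of both functions as anonymous functions and compares the analytic derivative with a central finite difference. It follows the formula above and is not necessarily identical to the function bodies in the repository; the step size h and the test points are arbitrary choices.

```matlab
% Sketch only: anonymous-function versions following the formula above.
logisticSigmoid  = @(x) 1 ./ (1 + exp(-x));
dLogisticSigmoid = @(x) logisticSigmoid(x) .* (1 - logisticSigmoid(x));

% Compare the analytic derivative with a central finite difference.
x = linspace(-5, 5, 11);   % arbitrary test points
h = 1e-6;                  % arbitrary small step size
numericalDerivative = (logisticSigmoid(x + h) - logisticSigmoid(x - h)) / (2 * h);
maxDeviation = max(abs(numericalDerivative - dLogisticSigmoid(x)));
fprintf('Maximum deviation: %g\n', maxDeviation);
```

A check of this kind is a cheap way to catch sign or indexing mistakes before the derivative is used inside backpropagation.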
Usage and Validation
The method applyStochasticSquaredErrorTwoLayerPerceptronMNIST provides an example of how to use the above methods:
    % Load MNIST dataset.
    inputValues = loadMNISTImages('train-images.idx3-ubyte');
    labels = loadMNISTLabels('train-labels.idx1-ubyte');

    % Transform the labels to correct target values.
    targetValues = zeros(10, size(labels, 1));
    for n = 1:size(labels, 1)
        targetValues(labels(n) + 1, n) = 1;
    end

    % Choose form of MLP:
    numberOfHiddenUnits = 700;

    % Choose appropriate parameters.
    learningRate = 0.1;

    % Choose activation function.
    activationFunction = @logisticSigmoid;
    dActivationFunction = @dLogisticSigmoid;

    % Choose batch size and epochs. Remember there are 60k input values.
    batchSize = 100;
    epochs = 500;

    fprintf('Train two-layer perceptron with %d hidden units.\n', numberOfHiddenUnits);
    % Use %g because the learning rate is not an integer.
    fprintf('Learning rate: %g.\n', learningRate);

    [hiddenWeights, outputWeights, error] = trainStochasticSquaredErrorTwoLayerPerceptron(activationFunction, dActivationFunction, numberOfHiddenUnits, inputValues, targetValues, epochs, batchSize, learningRate);

    % Load validation set.
    inputValues = loadMNISTImages('t10k-images.idx3-ubyte');
    labels = loadMNISTLabels('t10k-labels.idx1-ubyte');

    % Choose decision rule.
    fprintf('Validation:\n');

    [correctlyClassified, classificationErrors] = validateTwoLayerPerceptron(activationFunction, hiddenWeights, outputWeights, inputValues, labels);

    fprintf('Classification errors: %d\n', classificationErrors);
    fprintf('Correctly classified: %d\n', correctlyClassified);
First, the MNIST dataset needs to be loaded using loadMNISTImages and loadMNISTLabels. The labels are provided as a vector where the $i^{th}$ entry contains the digit represented by the $i^{th}$ image. We transform the labels into a $10 \times N$ matrix, where $N$ is the number of training images, such that the $i^{th}$ entry of the $n^{th}$ column vector is $1$ if and only if the $n^{th}$ training image represents the digit $i - 1$.
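The label transformation can be tried out on a handful of labels without loading MNIST. The sketch below builds the target matrix with the same loop as the usage example and, as an alternative, with linear indexing via sub2ind. The five labels are made up and stand in for the output of loadMNISTLabels; the variable targetValuesVectorized is introduced here for illustration only.

```matlab
% Hypothetical labels standing in for the output of loadMNISTLabels.
labels = [5; 0; 4; 1; 9];
N = size(labels, 1);

% Loop version, as in the usage example: row i corresponds to digit i - 1.
targetValues = zeros(10, N);
for n = 1:N
    targetValues(labels(n) + 1, n) = 1;
end

% Vectorized alternative: set one entry per column using linear indices.
targetValuesVectorized = zeros(10, N);
targetValuesVectorized(sub2ind([10, N], labels' + 1, 1:N)) = 1;

% Both constructions should produce the same matrix.
disp(isequal(targetValues, targetValuesVectorized));
```

Each column of the resulting matrix contains exactly one $1$, for example in row $6$ of the first column for the digit $5$.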
The network is trained using the logistic sigmoid activation function, a fixed batch size and a fixed number of iterations. The training method trainStochasticSquaredErrorTwoLayerPerceptron returns the weights of the hidden layer and the output layer as well as the normalized sum-of-squared error after the last iteration.
The method validateTwoLayerPerceptron uses the network weights to count the number of classification errors on the validation set.
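As a rough illustration of how classification errors can be counted, the sketch below runs a forward pass on a toy network and assigns each input to the class with the largest output. This decision rule is an assumption and may differ from what validateTwoLayerPerceptron does; the weights, inputs, and labels are made up, and bias terms are omitted.

```matlab
% Sketch only: toy weights and inputs, not the trained MNIST network.
activationFunction = @(x) 1 ./ (1 + exp(-x));
hiddenWeights = [1 -1; -1 1];      % 2 hidden units, 2 inputs (assumed values)
outputWeights = [2 -2; -2 2];      % 2 classes standing in for digits 0 and 1
inputValues   = [1 0 1; 0 1 0];    % three inputs stored as columns
labels        = [0; 1; 1];         % third label deliberately disagrees

% Forward pass for all inputs at once.
hiddenOutputs = activationFunction(hiddenWeights * inputValues);
outputs       = activationFunction(outputWeights * hiddenOutputs);

% Assumed decision rule: choose the class with the largest output.
[~, predictedIndex] = max(outputs, [], 1);
predictedLabels = predictedIndex' - 1;   % row index i corresponds to digit i - 1

correctlyClassified  = sum(predictedLabels == labels);
classificationErrors = numel(labels) - correctlyClassified;
fprintf('Correct: %d, errors: %d\n', correctlyClassified, classificationErrors);
```

For this toy data the sketch reports two correctly classified inputs and one classification error.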
Results
Some of the results after validating the two-layer perceptron on the provided validation set can be found in my seminar paper or in figure 3.