# DAVIDSTUTZ


## Recognizing Handwritten Digits using a Two-Layer Perceptron and the MNIST Dataset

In the course of my seminar paper on neural networks and their usage in pattern recognition, I came across the MNIST dataset. The MNIST dataset provides training and validation images of handwritten digits. As a result, I implemented a two-layer perceptron in MATLAB to apply my knowledge of neural networks to the problem of recognizing handwritten digits.

### Foreword

In the course of a seminar on “Selected Topics in Human Language Technology and Pattern Recognition” I wrote a seminar paper on neural networks: “Introduction to Neural Networks”. The paper and the slides of the corresponding talk can be found in my previous article, Seminar Paper “Introduction to Neural Networks”; background on neural networks and the two-layer perceptron is given there as well.

### Introduction

The MNIST dataset provides a training set of $60,000$ handwritten digits and a validation set of $10,000$ handwritten digits. The images have a size of $28 \times 28$ pixels. We want to train a two-layer perceptron to recognize handwritten digits, that is, given a new $28 \times 28$ pixel image, the goal is to decide which digit it represents. For this purpose, the two-layer perceptron consists of $28 \cdot 28 = 784$ input units, a variable number of hidden units and $10$ output units. The general case of a two-layer perceptron with $D$ input units, $m$ hidden units and $C$ output units is shown in figure 1.
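Written out (a sketch in the usual feed-forward notation; $\sigma$ denotes the activation function, $w^{(1)}$ and $w^{(2)}$ the hidden and output layer weights, and biases are omitted as in the code below), the network computes

$y^{(1)}_i = \sigma\left(\sum_{j=1}^{D} w^{(1)}_{ij} x_j\right), \quad i = 1, \ldots, m$

$y^{(2)}_k = \sigma\left(\sum_{i=1}^{m} w^{(2)}_{ki} y^{(1)}_i\right), \quad k = 1, \ldots, C$

so that the $k^{th}$ output unit gives a score for the digit $k - 1$.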

### Code

The two-layer perceptron is implemented in MATLAB; the code can be found on GitHub and is available under the GNU General Public License, version 3.

The methods loadMNISTImages and loadMNISTLabels are used to load the MNIST dataset as it is stored in a special file format. The methods can be found online at http://ufldl.stanford.edu/wiki/index.php/Using_the_MNIST_Dataset.

### Network Training

The network is trained using a stochastic variant of mini-batch training, the sum-of-squared error function and the error backpropagation algorithm. The method returns the weights of the hidden layer and the output layer after training as well as the normalized sum-of-squared error after the last iteration. In addition, it plots the normalized error over time resulting in a plot as shown in figure 2.
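The procedure just described can be sketched in NumPy as follows. This is an illustrative re-implementation under my reading of the procedure, not the original MATLAB code; all names (`train`, `sigma`, `d_sigma`) and the default arguments are my own:

```python
import numpy as np

def sigma(z):
    # Logistic sigmoid activation.
    return 1.0 / (1.0 + np.exp(-z))

def d_sigma(z):
    # Derivative of the logistic sigmoid: sigma(z) * (1 - sigma(z)).
    s = sigma(z)
    return s * (1.0 - s)

def train(input_values, target_values, hidden_units, epochs, batch_size, learning_rate, seed=0):
    # input_values: (D, N) array; target_values: (C, N) array with one-hot columns.
    rng = np.random.default_rng(seed)
    D, N = input_values.shape
    C = target_values.shape[0]
    # Initialize weights uniformly and normalize by fan-in.
    hidden_weights = rng.random((hidden_units, D)) / D
    output_weights = rng.random((C, hidden_units)) / hidden_units
    for _ in range(epochs):
        # Pick batch_size random training examples, updating after each one
        # (stochastic training: epochs * batch_size weight updates in total).
        picks = rng.integers(0, N, size=batch_size)
        for n in picks:
            x = input_values[:, n]
            t = target_values[:, n]
            # Forward pass.
            hidden_in = hidden_weights @ x
            hidden_out = sigma(hidden_in)
            output_in = output_weights @ hidden_out
            y = sigma(output_in)
            # Backpropagate the sum-of-squared error.
            output_delta = d_sigma(output_in) * (y - t)
            hidden_delta = d_sigma(hidden_in) * (output_weights.T @ output_delta)
            output_weights -= learning_rate * np.outer(output_delta, hidden_out)
            hidden_weights -= learning_rate * np.outer(hidden_delta, x)
    # Normalized error over the last mini-batch.
    error = np.mean([
        np.linalg.norm(
            sigma(output_weights @ sigma(hidden_weights @ input_values[:, n]))
            - target_values[:, n])
        for n in picks])
    return hidden_weights, output_weights, error
```

The actual MATLAB interface is documented below.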

```matlab
function [hiddenWeights, outputWeights, error] = trainStochasticSquaredErrorTwoLayerPerceptron(activationFunction, dActivationFunction, numberOfHiddenUnits, inputValues, targetValues, epochs, batchSize, learningRate)
% trainStochasticSquaredErrorTwoLayerPerceptron Creates a two-layer perceptron
% and trains it on the MNIST dataset.
%
% INPUT:
% activationFunction             : Activation function used in both layers.
% dActivationFunction            : Derivative of the activation
% function used in both layers.
% numberOfHiddenUnits            : Number of hidden units.
% inputValues                    : Input values for training (784 x 60000).
% targetValues                   : Target values for training (10 x 60000).
% epochs                         : Number of epochs to train.
% batchSize                      : Plot error after batchSize images.
% learningRate                   : Learning rate to apply.
%
% OUTPUT:
% hiddenWeights                  : Weights of the hidden layer.
% outputWeights                  : Weights of the output layer.
% error                          : Normalized sum-of-squared error after
% the last iteration.
%
```


The above method requires the activation function used for both the hidden layer and the output layer to be given as parameter. The logistic sigmoid defined by

$\sigma(z) = \frac{1}{1 + \exp(-z)}$

is a commonly used activation function; it is implemented in logisticSigmoid. In addition, the error backpropagation algorithm needs the derivative of the activation function, which for the logistic sigmoid is $\frac{d\sigma}{dz}(z) = \sigma(z)(1 - \sigma(z))$ and is implemented as dLogisticSigmoid.

```matlab
function y = logisticSigmoid(x)
% logisticSigmoid Logistic sigmoid activation function.
%
% INPUT:
% x     : Input vector.
%
% OUTPUT:
% y     : Output vector where the logistic sigmoid was applied element by
% element.
%
```

```matlab
function y = dLogisticSigmoid(x)
% dLogisticSigmoid Derivative of the logistic sigmoid.
%
% INPUT:
% x     : Input vector.
%
% OUTPUT:
% y     : Output vector where the derivative of the logistic sigmoid was
% applied element by element.
%
```
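As a quick sanity check of the closed-form derivative $\sigma(z)(1 - \sigma(z))$, hypothetical Python counterparts of these two functions can be compared against a central finite-difference approximation:

```python
import numpy as np

def logistic_sigmoid(x):
    # Element-wise logistic sigmoid.
    return 1.0 / (1.0 + np.exp(-x))

def d_logistic_sigmoid(x):
    # Element-wise derivative: sigma(x) * (1 - sigma(x)).
    s = logistic_sigmoid(x)
    return s * (1.0 - s)

# Compare the closed form against central finite differences.
x = np.linspace(-5.0, 5.0, 11)
h = 1e-6
numeric = (logistic_sigmoid(x + h) - logistic_sigmoid(x - h)) / (2.0 * h)
assert np.allclose(d_logistic_sigmoid(x), numeric, atol=1e-8)
```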


### Usage and Validation

The method applyStochasticSquaredErrorTwoLayerPerceptronMNIST provides an example of how to use the above methods:

```matlab
% Load MNIST dataset.

% Transform the labels to correct target values.
targetValues = zeros(10, size(labels, 1));
for n = 1: size(labels, 1)
    targetValues(labels(n) + 1, n) = 1;
end;

% Choose form of MLP:
numberOfHiddenUnits = 700;

% Choose appropriate parameters.
learningRate = 0.1;

% Choose activation function.
activationFunction = @logisticSigmoid;
dActivationFunction = @dLogisticSigmoid;

% Choose batch size and epochs. Remember there are 60k input values.
batchSize = 100;
epochs = 500;

fprintf('Train two-layer perceptron with %d hidden units.\n', numberOfHiddenUnits);
fprintf('Learning rate: %f.\n', learningRate);

[hiddenWeights, outputWeights, error] = trainStochasticSquaredErrorTwoLayerPerceptron(activationFunction, dActivationFunction, numberOfHiddenUnits, inputValues, targetValues, epochs, batchSize, learningRate);

% Choose decision rule.
fprintf('Validation:\n');

[correctlyClassified, classificationErrors] = validateTwoLayerPerceptron(activationFunction, hiddenWeights, outputWeights, inputValues, labels);

fprintf('Classification errors: %d\n', classificationErrors);
fprintf('Correctly classified: %d\n', correctlyClassified);
```


First the MNIST dataset needs to be loaded using the methods mentioned above (loadMNISTImages and loadMNISTLabels). The labels are provided as a vector where the $i^{th}$ entry contains the digit represented by the $i^{th}$ image. We transform the labels to form a $10 \times N$ matrix, where $N$ is the number of training images, such that the $i^{th}$ entry of the $n^{th}$ column vector is $1$ iff the $n^{th}$ training image represents the digit $i - 1$.
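This transformation can be sketched in NumPy (the function name `labels_to_targets` is mine; labels are assumed to be a vector of digits 0–9):

```python
import numpy as np

def labels_to_targets(labels, num_classes=10):
    # Build a (num_classes x N) matrix whose n-th column is the one-hot
    # encoding of labels[n]: row i - 1 in MATLAB's 1-based indexing,
    # i.e. row labels[n] in 0-based indexing, is set to 1.
    labels = np.asarray(labels, dtype=int)
    targets = np.zeros((num_classes, labels.size))
    targets[labels, np.arange(labels.size)] = 1.0
    return targets

# Example: three labels.
targets = labels_to_targets([3, 0, 9])
```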

The network is trained using the logistic sigmoid activation function, a fixed batch size and a fixed number of iterations. The training method trainStochasticSquaredErrorTwoLayerPerceptron returns the weights of the hidden layer and the output layer as well as the normalized sum-of-squared error after the last iteration.

The method validateTwoLayerPerceptron uses the network weights to count the number of classification errors on the validation set.
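The decision rule behind this count can be sketched as picking the output unit with the largest activation. The following NumPy sketch assumes the weight shapes from the training sketch above and a logistic sigmoid; the function name `validate` is mine:

```python
import numpy as np

def sigma(z):
    # Logistic sigmoid activation.
    return 1.0 / (1.0 + np.exp(-z))

def validate(hidden_weights, output_weights, input_values, labels):
    # Forward pass for all validation images at once; input_values is
    # (D, N), labels is a length-N vector of digits 0-9.
    outputs = sigma(output_weights @ sigma(hidden_weights @ input_values))
    # Decision rule: the output unit with the largest activation wins.
    predicted = np.argmax(outputs, axis=0)
    correct = int(np.sum(predicted == np.asarray(labels)))
    return correct, len(labels) - correct
```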

### Results

Some of the results after validating the two-layer perceptron on the provided validation set can be found in my seminar paper or in figure 3.

### References

• [1] David Stutz, Pavel Golik, Ralf Schlüter, and Hermann Ney. Introduction to Neural Networks. Seminar on Selected Topics in Human Language Technology and Pattern Recognition, 2014. PDF

What is your opinion on this article? Did you find it interesting or useful? Let me know your thoughts in the comments below.

• masoud

Hello, I found my problem: you have used the apply method as the main function.
Please tell me how I can recognize how many digits have been classified by one of the units in the last layer (with a plot of a figure). Thanks.

• masoud

On the other hand, I want to plot a figure that shows the weights in every iteration of applying the input values.

• masoud

Thank you for your help and contributions.

• davidstutz

I am not quite sure what you mean, but have a look at the validateTwoLayerPerceptron function. For a given digit, you can simply count the number of correctly classified samples with this digit. However, I do not understand what you want to plot; this is a single number, and dividing it by the total number of samples with this particular digit, you get a kind of in-class accuracy.

• masoudtala20

Hello, thanks. How do I draw a figure that shows hidden units against accuracy, i.e. plot(weights, accuracy)?

• davidstutz

Well, if you have multiple hidden units (e.g. roughly 300), then your hidden unit weights have dimensionality 300. So a simple (weights, accuracy) plot is not possible. You could only plot single weights against accuracy; however, you will see that such a plot does not carry much information.

• myasuka

Hi, David.
In your code, you update the weight matrices after randomly selecting one item and running forward and backward propagation, which means you actually update the weight matrices epochs * batchSize times. I tried to use batchSize items per update during the training stage, which means updating the weight matrices only epochs times.
I modified your code to make inputVector a 784 x 10 matrix instead of the previous 784 x 1 vector, and set epochs to 500 or even 5000, but the result is totally wrong. The correct classification percentage is only about 10%.

Can you explain why? Thanks.

• davidstutz

Hard to tell without seeing your adapted code, but it is not sufficient to change the dimensions of inputVector. You also need to adapt the update equations (and possibly the backpropagation equations). For example, lines 58-59 in trainStochasticSquaredErrorTwoLayerPerceptron.m need to be adapted.

Hope this helps!

• M Faizan Khan

Thanks, I appreciate your work, it helped me a lot. I was working on this and needed working MATLAB code. I had never used MATLAB before, but your code is nicely written and really helped.

• ketul parikh

Hi David

When I tried to convert this code into a three-layer network, I ran into one difficulty: the error should be less than 0.1 or even lower, but in my code I find it constantly fluctuating between 0.94 and 0.96. Here is my code for training on the MNIST data using a three-layer network. It is the function to train the network; the only change is that I tried two hidden layers. Please help me find the loophole. It would be a great help to me. THANK YOU

```matlab
function [hiddenWeights1, hiddenWeights2, outputWeights] = trainStochasticSquaredErrorTwoLayerPerceptron1(activationFunction, dActivationFunction, numberOfHiddenUnits1, inputValues, targetValues, epochs, batchSize, learningRate)
% trainStochasticSquaredErrorTwoLayerPerceptron Creates a two-layer perceptron
% and trains it on the MNIST dataset.
%
% INPUT:
% activationFunction  : Activation function used in both layers.
% dActivationFunction : Derivative of the activation
% function used in both layers.
% numberOfHiddenUnits1: Number of hidden units in first hidden layer.
% numberOfHiddenUnits2: Number of hidden units in second hidden layer.
% inputValues         : Input values for training (784 x 60000)
% targetValues        : Target values for training (10 x 60000)
% epochs              : Number of epochs to train.
% batchSize           : Plot error after batchSize images.
% learningRate        : Learning rate to apply.
%
% OUTPUT:
% hiddenWeights1      : Weights of the hidden layer one.
% hiddenWeights2      : Weights of the hidden layer two.
% outputWeights       : Weights of the output layer.
%

numberOfHiddenUnits2 = 400;

% The number of training vectors.
trainingSetSize = size(inputValues, 2);

% Input vector has 784 dimensions.
inputDimensions = size(inputValues, 1);

% We have to distinguish 10 digits.
outputDimensions = size(targetValues, 1);

% Initialize the weights for the hidden layer and the output layer.
hiddenWeights1 = rand(numberOfHiddenUnits1, inputDimensions);
hiddenWeights2 = rand(numberOfHiddenUnits2, numberOfHiddenUnits1);
outputWeights = rand(outputDimensions, numberOfHiddenUnits2);

hiddenWeights1 = hiddenWeights1./size(hiddenWeights1, 2);
hiddenWeights2 = hiddenWeights2./size(hiddenWeights2, 2);
outputWeights = outputWeights./size(outputWeights, 2);

n = zeros(batchSize);

figure; hold on;

for t = 1: epochs
    display(t);
    for k = 1: batchSize
        % Select which input vector to train on.
        n(k) = floor(rand(1)*trainingSetSize + 1);

        % Propagate the input vector through the network.
        inputVector = inputValues(:, n(k));
        hiddenActualInput1 = hiddenWeights1*inputVector;
        hiddenOutputVector1 = activationFunction(hiddenActualInput1);
        hiddenActualInput2 = hiddenWeights2*hiddenOutputVector1;
        hiddenOutputVector2 = activationFunction(hiddenActualInput2);
        outputActualInput = outputWeights*hiddenOutputVector2;
        outputVector = activationFunction(outputActualInput);

        targetVector = targetValues(:, n(k));

        % Backpropagate the errors.
        outputDelta = dActivationFunction(outputActualInput).*(outputVector - targetVector);
        hiddenDelta1 = dActivationFunction(hiddenActualInput2).*(outputWeights'*outputDelta);
        hiddenDelta2 = dActivationFunction(hiddenActualInput1).*(hiddenWeights2'*hiddenDelta1);

        outputWeights = outputWeights - learningRate.*((outputDelta*hiddenOutputVector2'));
        hiddenWeights2 = hiddenWeights2 - learningRate.*((hiddenDelta1*hiddenOutputVector1'));
        hiddenWeights1 = hiddenWeights1 - learningRate.*((hiddenDelta2*inputVector'));
    end;

    % Calculate the error for plotting.
    error = zeros(epochs, 1);
    for k = 1: batchSize
        inputVector = inputValues(:, n(k));
        targetVector = targetValues(:, n(k));
        error(t, 1) = error(t, 1) + norm(activationFunction(outputWeights*activationFunction(hiddenWeights2*activationFunction(hiddenWeights1*inputVector))) - targetVector, 2);
    end;
    error(t, 1) = error(t, 1)/batchSize;

    plot(t, error(t, 1), '*');
end;
end
```