
# MNIST Handwritten Digit Classifier

An implementation of a multilayer neural network using Python's numpy library. The implementation is a modified version of Michael Nielsen's implementation in the *Neural Networks and Deep Learning* book.

## Why a modified implementation?

The book and Stanford's Machine Learning Course by Prof. Andrew Ng are both recommended as good resources for beginners. At times it got confusing for me while referring to both resources, because:

- The Stanford course uses MATLAB, which has 1-indexed vectors and matrices.
- The book uses Python's numpy library, which has 0-indexed vectors and arrays.

Furthermore, some parameters of a neural network are not defined for the input layer, which made it hard for me to get the hang of the implementation in Python. For example, according to the book, the bias vector of the second layer of the network is referred to as `biases[0]`, since the input layer (the first layer) has no bias vector. So the indexing gets confusing across numpy and MATLAB.

## Brief Background

For total beginners who landed here before reading anything about neural networks:

### Sigmoid Neuron

- Usually, neural networks are made up of building blocks known as *sigmoid neurons*, named so because their output follows the *sigmoid function*.
- x_j are the inputs, each weighted by a weight w_j, and the neuron has its own intrinsic bias b. The output of the neuron is known as its *activation* (a); see the sketch after this list.
- A neural network is built by stacking layers of neurons and is defined by the weights of its connections and the biases of its neurons. The activations depend on the particular input.
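To make this concrete, here is a minimal numpy sketch of a single sigmoid neuron; the input, weight and bias values are purely illustrative and not taken from this repository:

```python
import numpy as np

def sigmoid(z):
    # Squashes any real number into the interval (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative values (assumptions, not from the repository):
x = np.array([0.5, -1.0])   # inputs x_j
w = np.array([0.8, 0.2])    # weights w_j
b = 0.1                     # intrinsic bias b

a = sigmoid(np.dot(w, x) + b)   # activation a of the neuron
print(a)                        # ~0.5744
```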

## Naming and Indexing Convention

I have followed a particular convention in indexing quantities. The dimensions of the quantities are listed according to the figure below.

*(figure: small labelled neural network)*

### Layers

- The input layer is the 0th layer, and the output layer is the Lth layer. Number of layers: N_L = L + 1.

```python
sizes = [2, 3, 1]
```
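As a quick sanity check of this convention (a minimal sketch; only `sizes` comes from this README):

```python
sizes = [2, 3, 1]

L = len(sizes) - 1   # index of the output layer: 2
N_L = L + 1          # number of layers: 3
```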

### Weights

- Weights in this implementation are a list of matrices (`numpy.ndarray`s). `weights[l]` is the matrix of weights entering the lth layer of the network (denoted w^l).
- An element of this matrix is denoted w^l_jk. It sits in the jth row, which collects all the weights entering the jth neuron of layer l, one from each neuron k of the (l-1)th layer.
- No weights enter the input layer, so `weights[0]` is redundant; `weights[1]` is then the collection of weights entering layer 1, and so on, as drawn below.
```
weights = [ [[]],           # weights[0]: unused, no weights enter the input layer

            [[a, b],        # weights[1]: 3x2, row j holds the weights entering
             [c, d],        # neuron j of layer 1 from the two input neurons
             [e, f]],

            [[p, q, r]] ]   # weights[2]: 1x3, the weights entering the single
                            # neuron of layer 2 from the three neurons of layer 1
```
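A minimal sketch of how such a list could be built with numpy, keeping the redundant placeholder at index 0; the random initialization here is an assumption for illustration, not necessarily what this repository uses:

```python
import numpy as np

sizes = [2, 3, 1]

# weights[l] has shape (sizes[l], sizes[l - 1]): one row per neuron of layer l.
weights = [np.array([[]])] + [np.random.randn(sizes[l], sizes[l - 1])
                              for l in range(1, len(sizes))]

print([w.shape for w in weights])   # [(1, 0), (3, 2), (1, 3)]
```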

### Biases

- Biases in this implementation are a list of column vectors (`numpy.ndarray`s). `biases[l]` is the vector of biases of the neurons in the lth layer of the network (denoted b^l).
- An element of this vector is denoted b^l_j: the bias of the jth neuron in layer l.
- The input layer has no biases, so `biases[0]` is redundant; `biases[1]` then holds the biases of the neurons of layer 1, and so on, as drawn below.
```
biases = [ [[],             # biases[0]: unused, the input layer has no biases
            []],

           [[0],            # biases[1]: biases of the three neurons of layer 1
            [1],
            [2]],

           [[0]] ]          # biases[2]: bias of the single neuron of layer 2
```
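The matching sketch for the biases, stored as (n, 1) column vectors as drawn above; again, the random initialization is only an illustrative assumption:

```python
import numpy as np

sizes = [2, 3, 1]

# biases[l] has shape (sizes[l], 1): one bias per neuron of layer l.
biases = [np.array([[]])] + [np.random.randn(sizes[l], 1)
                             for l in range(1, len(sizes))]

print([b.shape for b in biases])    # [(1, 0), (3, 1), (1, 1)]
```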

### 'Z's

- For the input vector x to a layer l, z is defined as: z^l = w^l . x + b^l.
- The input layer provides the x vector as input to layer 1 and itself has no input, weights or biases, hence `zs[0]` is redundant.
- The dimensions of `zs` are the same as those of `biases`; a sketch follows this list.
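A minimal sketch of computing z for layer 1 of the `sizes = [2, 3, 1]` network; all parameter values here are hypothetical:

```python
import numpy as np

w1 = np.random.randn(3, 2)   # weights entering layer 1
b1 = np.random.randn(3, 1)   # biases of layer 1
x  = np.random.randn(2, 1)   # input column vector from layer 0

z1 = np.dot(w1, x) + b1      # z^1 = w^1 . x + b^1
print(z1.shape)              # (3, 1), same shape as b1
```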

### Activations

- Activations of the lth layer are the outputs of the neurons of the lth layer, which serve as inputs to the (l+1)th layer. The dimensions of `biases`, `zs` and `activations` are the same.
- The input layer provides the x vector as input to layer 1, hence `activations[0]` can be identified with x, the input training example. The sketch below puts the whole convention together.
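Putting the whole convention together, here is a hedged sketch of a feedforward pass; `feedforward` is a hypothetical helper written for this explanation, not necessarily the function this repository defines:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def feedforward(x, weights, biases):
    """Return (zs, activations) for one input column vector x.

    zs[0] is a redundant placeholder, matching the convention above;
    activations[0] is the input x itself.
    """
    zs = [np.array([[]])]
    activations = [x]
    for w, b in zip(weights[1:], biases[1:]):
        z = np.dot(w, activations[-1]) + b   # z^l = w^l . a^(l-1) + b^l
        zs.append(z)
        activations.append(sigmoid(z))
    return zs, activations
```

For example, with the `weights` and `biases` lists sketched earlier and `x = np.random.randn(2, 1)`, the call `zs, activations = feedforward(x, weights, biases)` returns lists whose shapes follow the dimensions described above.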