For the past few weeks, I have been experimenting with the latest-and-greatest deep learning frameworks, all written in Python, to decide which one I should dive into and become an expert in. After looking at hebel, keras, chainer, and Lasagne, I decided to go with Lasagne because of the documentation and tutorials available online. The other frameworks are great; Lasagne just seemed to have the most tutorials and the best docs at the moment.

In this blog post, I am going to show you how to use the Lasagne framework to train a neural network on the MNIST database. In later blog posts, I am going to use Lasagne to solve a variety of deep learning problems in natural language processing and computer vision.

All of my work was done on a g2.8xlarge GPU instance rented from Amazon AWS, using the following AMI: ami-55deaf30.

To start, I need to load the MNIST database into a format that Lasagne accepts, which happens to be NumPy arrays. Luckily, I did not need to write much code here because mnielsen did most of the work for me. I ended up writing a single method that uses his code:

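```python
def load(testing=False):
    """Return the MNIST training set as float32 NumPy arrays (X, y).

    X has shape (50000, 784): one flattened 28x28 image per row.
    y has shape (50000, 10): one one-hot label vector per row.
    Relies on load_data() and vectorized_result() from mnielsen's loader.
    """
    if not testing:
        tr_d, va_d, te_d = load_data()
        training_inputs = [np.reshape(x, (784, 1)) for x in tr_d[0]]
        training_results = [vectorized_result(y) for y in tr_d[1]]
        X, y = np.array(training_inputs), np.array(training_results)
        X = np.reshape(X, (50000, 784))
        y = np.reshape(y, (50000, 10))
        X = X.astype(np.float32)
        y = y.astype(np.float32)
        return (X, y)
    else:
        # Loading the test set is not implemented yet.
        raise NotImplementedError("testing=True is not supported yet")
```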
For reference, here is mnielsen's mnist_loader.py, which supplies the load_data() and vectorized_result() helpers used by load():

```python
"""
mnist_loader
~~~~~~~~~~~~

A library to load the MNIST image data.  For details of the data
structures that are returned, see the doc strings for load_data
and load_data_wrapper.  In practice, load_data_wrapper is the
function usually called by our neural network code.
"""

#### Libraries
# Standard library
import cPickle  # Python 2; on Python 3 use pickle.load(f, encoding='latin1')
import gzip

# Third-party libraries
import numpy as np

def load_data():
    """Return the MNIST data as a tuple containing the training data,
    the validation data, and the test data.

    The training_data is returned as a tuple with two entries.
    The first entry contains the actual training images.  This is a
    numpy ndarray with 50,000 entries.  Each entry is, in turn, a
    numpy ndarray with 784 values, representing the 28 * 28 = 784
    pixels in a single MNIST image.

    The second entry in the training_data tuple is a numpy ndarray
    containing 50,000 entries.  Those entries are just the digit
    values (0...9) for the corresponding images contained in the first
    entry of the tuple.

    The validation_data and test_data are similar, except
    each contains only 10,000 images.

    This is a nice data format, but for use in neural networks it's
    helpful to modify the format of the training_data a little.
    That's done in the wrapper function load_data_wrapper(), see
    below.
    """
    f = gzip.open('./data/mnist.pkl.gz', 'rb')
    training_data, validation_data, test_data = cPickle.load(f)
    f.close()
    return (training_data, validation_data, test_data)

def load_data_wrapper():
    """Return a tuple containing (training_data, validation_data,
    test_data). Based on load_data, but the format is more
    convenient for use in our implementation of neural networks.

    In particular, training_data is a list containing 50,000
    2-tuples (x, y).  x is a 784-dimensional numpy.ndarray
    containing the input image.  y is a 10-dimensional
    numpy.ndarray representing the unit vector corresponding to the
    correct digit for x.

    validation_data and test_data are lists containing 10,000
    2-tuples (x, y).  In each case, x is a 784-dimensional
    numpy.ndarray containing the input image, and y is the
    corresponding classification, i.e., the digit values (integers)
    corresponding to x.

    Obviously, this means we're using slightly different formats for
    the training data and the validation / test data.  These formats
    turn out to be the most convenient for use in our neural network
    code."""
    tr_d, va_d, te_d = load_data()
    training_inputs = [np.reshape(x, (784, 1)) for x in tr_d[0]]
    training_results = [vectorized_result(y) for y in tr_d[1]]
    training_data = zip(training_inputs, training_results)
    validation_inputs = [np.reshape(x, (784, 1)) for x in va_d[0]]
    validation_data = zip(validation_inputs, va_d[1])
    test_inputs = [np.reshape(x, (784, 1)) for x in te_d[0]]
    test_data = zip(test_inputs, te_d[1])
    return (training_data, validation_data, test_data)

# The load(testing=False) method shown above goes here.

def vectorized_result(j):
    """Return a 10-dimensional unit vector with a 1.0 in the jth
    position and zeroes elsewhere.  This is used to convert a digit
    (0...9) into a corresponding desired output from the neural
    network."""
    e = np.zeros((10, 1))
    e[j] = 1.0
    return e
```
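The load() method turns each label into a (10, 1) column with vectorized_result and then reshapes the stacked columns into a (50000, 10) matrix. As an alternative (my own sketch, not mnielsen's code), the same float32 one-hot matrix can be built in one step by indexing rows of an identity matrix. `one_hot` is a hypothetical helper name, and the labels and images below are made up for illustration:

```python
import numpy as np

def vectorized_result(j):
    """mnielsen's helper: a (10, 1) column vector with 1.0 at index j."""
    e = np.zeros((10, 1))
    e[j] = 1.0
    return e

def one_hot(labels, num_classes=10):
    """Hypothetical helper: row i is the one-hot vector for labels[i]."""
    return np.eye(num_classes, dtype=np.float32)[labels]

labels = np.array([5, 0, 4, 1, 9])  # a few example digit labels

# The load() route: stack (10, 1) columns, then reshape to (n, 10).
stacked = np.array([vectorized_result(j) for j in labels])  # shape (5, 10, 1)
y_loop = np.reshape(stacked, (len(labels), 10)).astype(np.float32)

# The one-step route: pick rows of a 10x10 identity matrix.
y_fast = one_hot(labels)

# The same reshape turns (784, 1) image columns back into rows.
images = np.random.RandomState(0).rand(3, 784)  # 3 fake flattened images
columns = [np.reshape(x, (784, 1)) for x in images]
X = np.reshape(np.array(columns), (3, 784)).astype(np.float32)

print(y_fast.shape, np.array_equal(y_loop, y_fast))
print(X.shape, np.array_equal(X, images.astype(np.float32)))
```

Applied to the real data, one_hot(tr_d[1]) should match the y that load() returns.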
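With the data in that shape, the network can be defined and trained with nolearn's NeuralNet class, which wraps Lasagne layers (this assumes load() lives in mnist_loader.py):

```python
from lasagne import layers
from lasagne.updates import nesterov_momentum
from nolearn.lasagne import NeuralNet

from mnist_loader import load  # assumes load() was added to mnist_loader.py

X, y = load()  # X: (50000, 784) float32, y: (50000, 10) float32 one-hot

net1 = NeuralNet(
    layers=[  # three layers: one hidden layer
        ('input', layers.InputLayer),
        ('hidden', layers.DenseLayer),
        ('output', layers.DenseLayer),
        ],
    # layer parameters:
    input_shape=(None, 784),  # 28x28 = 784 pixels per image; batch size left open
    hidden_num_units=100,  # number of units in hidden layer
    output_nonlinearity=None,  # output layer uses identity function
    output_num_units=10,  # 10 target values, one per digit

    # optimization method:
    update=nesterov_momentum,
    update_learning_rate=0.01,
    update_momentum=0.9,

    regression=True,  # fit the one-hot vectors as real-valued regression targets
    max_epochs=400,  # we want to train this many epochs
    verbose=1,
    )

net1.fit(X, y)
```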
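The update, update_learning_rate and update_momentum arguments select Lasagne's nesterov_momentum. Lasagne's documentation describes that update as `velocity := momentum * velocity - learning_rate * gradient` followed by `param := param + momentum * velocity - learning_rate * gradient`. The sketch below applies that rule by hand, with the same learning rate (0.01) and momentum (0.9), to a one-dimensional loss f(w) = (w - 3)^2 chosen purely for illustration; Lasagne itself builds these updates as Theano expressions rather than running a Python loop like this:

```python
def nesterov_step(param, velocity, grad, learning_rate=0.01, momentum=0.9):
    """One Nesterov momentum step, in the form given in Lasagne's docs."""
    velocity = momentum * velocity - learning_rate * grad
    param = param + momentum * velocity - learning_rate * grad
    return param, velocity

def grad_f(w):
    # Gradient of the illustrative loss f(w) = (w - 3)**2
    return 2.0 * (w - 3.0)

w, v = 0.0, 0.0  # start away from the minimum, with zero velocity
for _ in range(400):  # 400 steps, echoing max_epochs=400 (illustrative only)
    w, v = nesterov_step(w, v, grad_f(w))

print(round(w, 4))
```

The parameter settles at the minimum w = 3; in the real network the same rule is applied to every weight and bias after each mini-batch.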
With verbose=1, nolearn prints a summary of the network and then one line per epoch (the middle epochs are elided):

```text
# Neural Network with 79510 learnable parameters

## Layer information

  #  name      size
---  ------  ------
  0  input      784
  1  hidden     100
  2  output      10

  epoch    train loss    valid loss    train/val  dur
-------  ------------  ------------  -----------  -----
      1       0.06673       0.04498      1.48367  0.48s
      2       0.04156       0.03616      1.14935  0.41s
      3       0.03523       0.03184      1.10660  0.41s
      4       0.03165       0.02910      1.08774  0.41s
      5       0.02921       0.02713      1.07678  0.43s
      6       0.02738       0.02560      1.06937  0.44s
      7       0.02593       0.02437      1.06394  0.44s
      8       0.02475       0.02336      1.05926  0.44s
      9       0.02376       0.02251      1.05548  0.42s
     10       0.02293       0.02179      1.05204  0.42s

......

    391       0.00703       0.00939      0.74809  0.44s
    392       0.00702       0.00939      0.74772  0.41s
    393       0.00702       0.00939      0.74724  0.42s
    394       0.00701       0.00939      0.74683  0.43s
    395       0.00700       0.00938      0.74634  0.42s
    396       0.00700       0.00938      0.74590  0.42s
    397       0.00699       0.00938      0.74554  0.43s
    398       0.00699       0.00938      0.74508  0.45s
    399       0.00698       0.00938      0.74453  0.43s
    400       0.00698       0.00937      0.74412  0.41s
```
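The 79,510 learnable parameters reported in the log follow from the layer sizes: a dense layer with n_in inputs and n_out units has an n_in x n_out weight matrix plus n_out biases. The NumPy sketch below counts them and runs one forward pass, assuming Lasagne's default rectify (ReLU) nonlinearity on the hidden layer and the identity on the output layer (output_nonlinearity=None). The small uniform weights are a stand-in chosen for illustration, not Lasagne's Glorot initialisation, and the input batch is random:

```python
import numpy as np

layer_sizes = [784, 100, 10]  # input, hidden, output (from the layer table)

# Each dense layer contributes n_in * n_out weights plus n_out biases.
n_params = sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))
print(n_params)

rng = np.random.RandomState(0)
W1 = rng.uniform(-0.05, 0.05, size=(784, 100)).astype(np.float32)
b1 = np.zeros(100, dtype=np.float32)
W2 = rng.uniform(-0.05, 0.05, size=(100, 10)).astype(np.float32)
b2 = np.zeros(10, dtype=np.float32)

def forward(X):
    """Forward pass: ReLU hidden layer, identity output layer."""
    hidden = np.maximum(0.0, X.dot(W1) + b1)
    return hidden.dot(W2) + b2

X_batch = rng.rand(32, 784).astype(np.float32)  # stand-in for 32 MNIST images
outputs = forward(X_batch)
print(outputs.shape)
```

That is 784 * 100 + 100 = 78,500 parameters in the hidden layer and 100 * 10 + 10 = 1,010 in the output layer.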

TODO - add evaluation stuff here …
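Because the network was fit as a regression onto one-hot vectors, its 10 outputs are not probabilities, but taking the index of the largest output is one reasonable way to read off a predicted digit. A minimal NumPy sketch of classification accuracy computed that way follows; `accuracy_from_outputs` is a hypothetical helper, the toy outputs are made up, and in practice the outputs could come from net1.predict(...) on held-out images:

```python
import numpy as np

def accuracy_from_outputs(outputs, labels):
    """Fraction of rows whose largest output matches the integer label.

    outputs: (n_samples, 10) array of network outputs
    labels:  (n_samples,) array of digit labels 0..9
    """
    predicted = np.argmax(outputs, axis=1)
    return float(np.mean(predicted == labels))

# Toy example: three made-up output rows and their true digits.
outputs = np.array([
    [0.1, 0.0, 0.9, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],  # largest at index 2
    [0.0, 0.8, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.1],  # largest at index 1
    [0.0, 0.0, 0.0, 0.3, 0.6, 0.0, 0.0, 0.0, 0.0, 0.0],  # largest at index 4
])
labels = np.array([2, 1, 3])
print(accuracy_from_outputs(outputs, labels))
```

With one-hot targets such as the y returned by load(), pass np.argmax(y, axis=1) as the labels.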