MNIST classification
%matplotlib inline
import numpy as np
import utils; reload(utils)
from utils import plots
from matplotlib import pyplot as plt
Download the MNIST dataset
from keras.datasets import mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
Print some of the data
# Data shape
# show raw pixels
plt.imshow(x_train[0], cmap='gray')
# plot the first 10 images
(60000, 28, 28)
Prepare data for VGG-like computations.
First we need to add a color channel to all images.
X = np.expand_dims(x_train, 1)
X_val = np.expand_dims(x_test, 1)
plots(X[:10], titles=y_train[:10])
(60000, 1, 28, 28)
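As a small illustration of the channel step above, the sketch below uses a hypothetical array of blank images (not the real MNIST data) to show that `np.expand_dims(..., 1)` only inserts a length-1 axis and is equivalent to indexing with `np.newaxis` or reshaping.

```python
import numpy as np

# Hypothetical stand-in for x_train: 5 grayscale images of 28x28 pixels.
images = np.zeros((5, 28, 28), dtype=np.uint8)

# Insert a length-1 channel axis at position 1 (channels-first layout).
channels_first = np.expand_dims(images, 1)

# Two equivalent spellings of the same operation.
same_via_newaxis = images[:, np.newaxis, :, :]
same_via_reshape = images.reshape(5, 1, 28, 28)

print(channels_first.shape)
```

No pixel data is copied or changed by this step; only the shape metadata differs.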
Then we need to convert the labels to one-hot encodings
def one_hot_encoded(_y):
    # One column per class; classes are assumed to be numbered 0..max
    max_class = np.max(_y)
    one_hot = np.zeros((_y.shape[0], max_class + 1), dtype=float)
    for i, clazz in enumerate(_y):
        one_hot[i][clazz] = 1
    return one_hot
y_ = one_hot_encoded(y_train)
[5 0 4 1 9 2 1 3 1 4]
[[ 0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
[ 1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]
[ 0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
[ 0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]
[ 0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
[ 0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
[ 0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]]
The same effect can be obtained by using the Keras function below
from keras.utils.np_utils import to_categorical
y = to_categorical(y_train)
y_val = to_categorical(y_test)
assert np.allclose(y, y_)
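The explicit loop can also be avoided with plain NumPy indexing. The following is a minimal sketch, not the notebook's own code: `one_hot_vectorized` is a hypothetical helper name, and it assumes integer labels starting at 0.

```python
import numpy as np

def one_hot_vectorized(labels, num_classes=None):
    """One-hot encode integer labels by indexing rows of an identity matrix."""
    labels = np.asarray(labels)
    if num_classes is None:
        # Assumption: the largest label present defines the number of classes.
        num_classes = int(labels.max()) + 1
    return np.eye(num_classes, dtype=float)[labels]

labels = np.array([5, 0, 4, 1, 9])
encoded = one_hot_vectorized(labels)
print(encoded.shape)
```

Passing `num_classes` explicitly is safer when a batch might not contain the highest class.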
We should also normalize the inputs by subtracting the mean and dividing by the standard deviation. The mean and std are computed over all the features at once (one scalar each), because the goal is to bring all the features to the same order of magnitude so that training converges faster.
x_mean = np.mean(X).astype(float)
x_std = np.std(X).astype(float)
def normalize_input(X):
    return (X - x_mean) / x_std
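To make the effect of this normalization concrete, here is a small sketch on random stand-in data (the arrays `train_pixels` and `val_pixels` are hypothetical, not MNIST). It shows that a single global mean and std bring the training data to zero mean and unit variance, and that the same training statistics are reused for held-out data.

```python
import numpy as np

rng = np.random.RandomState(0)
# Hypothetical pixel data in the 0..255 range, channels-first layout.
train_pixels = rng.randint(0, 256, size=(100, 1, 28, 28)).astype(np.float64)
val_pixels = rng.randint(0, 256, size=(20, 1, 28, 28)).astype(np.float64)

# One scalar mean and one scalar std over every pixel of the training set.
mean = train_pixels.mean()
std = train_pixels.std()

train_normalized = (train_pixels - mean) / std
# Held-out data is scaled with the training statistics, not its own.
val_normalized = (val_pixels - mean) / std

print(train_normalized.mean(), train_normalized.std())
```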
Build a first really simple model, a linear classifier (softmax regression)
from keras.models import Sequential
from keras.layers.core import Lambda
from keras.layers import Dense, InputLayer, Flatten
from keras.optimizers import Adam
model = Sequential([
    Lambda(normalize_input, input_shape=(1, 28, 28)),
    Flatten(),  # Dense expects a flat vector per image
    Dense(10, activation='softmax')
])
model.compile(Adam(), loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(X, y, nb_epoch=10, validation_data=(X_val, y_val))
Train on 60000 samples, validate on 10000 samples
Epoch 1/10
60000/60000 [==============================] - 4s - loss: 0.3911 - acc: 0.8854 - val_loss: 0.2899 - val_acc: 0.9159
Epoch 2/10
60000/60000 [==============================] - 3s - loss: 0.2993 - acc: 0.9153 - val_loss: 0.2902 - val_acc: 0.9158
Epoch 3/10
60000/60000 [==============================] - 3s - loss: 0.2881 - acc: 0.9190 - val_loss: 0.2794 - val_acc: 0.9243
Epoch 4/10
60000/60000 [==============================] - 3s - loss: 0.2801 - acc: 0.9215 - val_loss: 0.2932 - val_acc: 0.9185
Epoch 5/10
60000/60000 [==============================] - 3s - loss: 0.2774 - acc: 0.9227 - val_loss: 0.2764 - val_acc: 0.9235
Epoch 6/10
60000/60000 [==============================] - 3s - loss: 0.2742 - acc: 0.9244 - val_loss: 0.2835 - val_acc: 0.9210
Epoch 7/10
60000/60000 [==============================] - 3s - loss: 0.2715 - acc: 0.9250 - val_loss: 0.2967 - val_acc: 0.9173
Epoch 8/10
60000/60000 [==============================] - 3s - loss: 0.2684 - acc: 0.9250 - val_loss: 0.2881 - val_acc: 0.9211
Epoch 9/10
60000/60000 [==============================] - 3s - loss: 0.2674 - acc: 0.9253 - val_loss: 0.2882 - val_acc: 0.9225
Epoch 10/10
60000/60000 [==============================] - 3s - loss: 0.2660 - acc: 0.9255 - val_loss: 0.2871 - val_acc: 0.9231
<keras.callbacks.History at 0x7fb48115bf90>
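For intuition about what this first model computes, the sketch below reproduces the forward pass of a single softmax layer in NumPy. The weights are random placeholders rather than trained values, and the input batch is synthetic, so only the shapes and the parameter count are meaningful.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.RandomState(0)
batch = rng.randn(4, 1, 28, 28)      # hypothetical normalized inputs
W = rng.randn(28 * 28, 10) * 0.01    # placeholder weights, one column per class
b = np.zeros(10)                     # one bias per class

flat = batch.reshape(len(batch), -1)  # flatten each image to 784 values
probs = softmax(flat.dot(W) + b)      # class probabilities per image

n_params = W.size + b.size
print(probs.shape, n_params)
```

Each output row is a probability distribution over the ten digits, which is what the categorical cross-entropy loss compares against the one-hot labels.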
A model with one hidden layer
model = Sequential([
    Lambda(normalize_input, input_shape=(1, 28, 28)),
    Flatten(),  # Dense expects a flat vector per image
    Dense(100, activation='sigmoid'),
    Dense(10, activation='softmax')
])
model.compile(Adam(), loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(X, y, validation_data=(X_val, y_val))
Train on 60000 samples, validate on 10000 samples
Epoch 1/10
60000/60000 [==============================] - 6s - loss: 0.3240 - acc: 0.9119 - val_loss: 0.1898 - val_acc: 0.9435
Epoch 2/10
60000/60000 [==============================] - 6s - loss: 0.1578 - acc: 0.9546 - val_loss: 0.1424 - val_acc: 0.9582
Epoch 3/10
60000/60000 [==============================] - 6s - loss: 0.1139 - acc: 0.9677 - val_loss: 0.1162 - val_acc: 0.9660
Epoch 4/10
60000/60000 [==============================] - 6s - loss: 0.0883 - acc: 0.9752 - val_loss: 0.1031 - val_acc: 0.9690
Epoch 5/10
60000/60000 [==============================] - 6s - loss: 0.0715 - acc: 0.9797 - val_loss: 0.1037 - val_acc: 0.9694
Epoch 6/10
60000/60000 [==============================] - 6s - loss: 0.0594 - acc: 0.9838 - val_loss: 0.0867 - val_acc: 0.9737
Epoch 7/10
60000/60000 [==============================] - 6s - loss: 0.0500 - acc: 0.9859 - val_loss: 0.0889 - val_acc: 0.9726
Epoch 8/10
60000/60000 [==============================] - 6s - loss: 0.0414 - acc: 0.9894 - val_loss: 0.0897 - val_acc: 0.9729
Epoch 9/10
60000/60000 [==============================] - 6s - loss: 0.0359 - acc: 0.9906 - val_loss: 0.0805 - val_acc: 0.9753
Epoch 10/10
60000/60000 [==============================] - 6s - loss: 0.0304 - acc: 0.9923 - val_loss: 0.0829 - val_acc: 0.9756
<keras.callbacks.History at 0x7fb480d87bd0>
A model with convolutions
from keras.layers.convolutional import Convolution2D, MaxPooling2D
model = Sequential([
    Lambda(normalize_input, input_shape=(1, 28, 28)),
    Convolution2D(7, 3, 3, activation='relu', border_mode='same'),
    Convolution2D(7, 3, 3, activation='relu', border_mode='same'),
    MaxPooling2D(pool_size=(2,2), strides=(2,2), dim_ordering='tf'),
    Convolution2D(14, 3, 3, activation='relu', border_mode='same'),
    Convolution2D(14, 3, 3, activation='relu', border_mode='same'),
    MaxPooling2D(pool_size=(2,2), strides=(2,2), dim_ordering='tf'),
    Convolution2D(28, 3, 3, activation='relu', border_mode='same'),
    Convolution2D(28, 3, 3, activation='relu', border_mode='same'),
    Convolution2D(28, 3, 3, activation='relu', border_mode='same'),
    MaxPooling2D(pool_size=(2,2), strides=(2,2), dim_ordering='tf'),
    Convolution2D(56, 3, 3, activation='relu', border_mode='same'),
    Convolution2D(56, 3, 3, activation='relu', border_mode='same'),
    Convolution2D(56, 3, 3, activation='relu', border_mode='same'),
    MaxPooling2D(pool_size=(2,2), strides=(2,2), dim_ordering='tf'),
    Flatten(),  # flatten the feature maps before the dense layers
    Dense(100, activation='sigmoid'),
    Dense(10, activation='softmax')
])
model.compile(Adam(), loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(X, y, validation_data=(X_val, y_val), nb_epoch=1)
Train on 60000 samples, validate on 10000 samples
Epoch 1/1
60000/60000 [==============================] - 570s - loss: 0.2619 - acc: 0.9223 - val_loss: 0.1007 - val_acc: 0.9684
<keras.callbacks.History at 0x7fb47bcf8850>
Do some training data augmentation
from keras.preprocessing.image import ImageDataGenerator
gen = ImageDataGenerator(rotation_range=10, zoom_range=0.1, shear_range=0.1, dim_ordering='th')
batch_gen = gen.flow(X, y)
img, _ = next(batch_gen)
model.fit_generator(batch_gen, batch_gen.n, nb_epoch=1, validation_data=(X_val, y_val))
Epoch 1/1
60000/60000 [==============================] - 573s - loss: 0.1227 - acc: 0.9621 - val_loss: 0.1090 - val_acc: 0.9633
<keras.callbacks.History at 0x7fb47ac37550>
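The generator pattern used above can be sketched without Keras. The example below is only an illustration under stated assumptions: `augmenting_batches` is a hypothetical helper, and small random pixel shifts stand in for the rotation, zoom and shear that `ImageDataGenerator` applies. What it shares with the real generator is the shape of the loop: shuffle, slice a batch, perturb each image, and yield `(images, labels)` forever.

```python
import numpy as np

def augmenting_batches(images, labels, batch_size=32, max_shift=2, seed=0):
    """Yield shuffled (images, labels) batches forever, shifting each image randomly."""
    rng = np.random.RandomState(seed)
    n = len(images)
    while True:
        order = rng.permutation(n)
        for start in range(0, n, batch_size):
            batch_idx = order[start:start + batch_size]
            batch = images[batch_idx]  # fancy indexing returns a copy
            for i in range(len(batch)):
                dy, dx = rng.randint(-max_shift, max_shift + 1, size=2)
                # Shift rows (axis 1) and columns (axis 2) of a (1, H, W) image.
                batch[i] = np.roll(np.roll(batch[i], dy, axis=1), dx, axis=2)
            yield batch, labels[batch_idx]

# Hypothetical data: ten blank images with a single bright pixel in the centre.
images = np.zeros((10, 1, 28, 28))
images[:, 0, 14, 14] = 1.0
labels = np.arange(10)

batches = augmenting_batches(images, labels, batch_size=4)
batch_x, batch_y = next(batches)
print(batch_x.shape, batch_y.shape)
```

Because every epoch sees slightly different versions of the same digits, the network cannot simply memorize exact pixel positions.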
Add Dropout for better regularization
from keras.layers import Dense
from keras.layers.core import Dropout
from keras.layers.convolutional import Convolution2D, MaxPooling2D
model = Sequential([
    Lambda(normalize_input, input_shape=(1, 28, 28)),
    Convolution2D(10, 3, 3, activation='relu', border_mode='same'),
    Convolution2D(10, 3, 3, activation='relu', border_mode='same'),
    MaxPooling2D(pool_size=(2,2), strides=(2,2), dim_ordering='th'),
    Convolution2D(20, 3, 3, activation='relu', border_mode='same'),
    Convolution2D(20, 3, 3, activation='relu', border_mode='same'),
    MaxPooling2D(pool_size=(2,2), strides=(2,2), dim_ordering='th'),
    Convolution2D(40, 3, 3, activation='relu', border_mode='same'),
    Convolution2D(40, 3, 3, activation='relu', border_mode='same'),
    MaxPooling2D(pool_size=(2,2), strides=(2,2), dim_ordering='th'),
    Flatten(),
    Dense(400, activation='relu'),
    Dropout(0.5),  # rate is not shown in the original; 0.5 is an assumed value
    Dense(400, activation='relu'),
    Dropout(0.5),  # rate is not shown in the original; 0.5 is an assumed value
    Dense(10, activation='softmax')
])
model.compile(Adam(lr=0.001), loss='categorical_crossentropy', metrics=['accuracy'])
model.summary()
model.fit_generator(batch_gen, batch_gen.n, nb_epoch=1, validation_data=(X_val, y_val))
Layer (type)                      Output Shape          Param #     Connected to
lambda_38 (Lambda)                (None, 1, 28, 28)     0           lambda_input_37[0][0]
convolution2d_269 (Convolution2D  (None, 10, 28, 28)    100         lambda_38[0][0]
convolution2d_270 (Convolution2D  (None, 10, 28, 28)    910         convolution2d_269[0][0]
maxpooling2d_106 (MaxPooling2D)   (None, 10, 14, 14)    0           convolution2d_270[0][0]
convolution2d_271 (Convolution2D  (None, 20, 14, 14)    1820        maxpooling2d_106[0][0]
convolution2d_272 (Convolution2D  (None, 20, 14, 14)    3620        convolution2d_271[0][0]
maxpooling2d_107 (MaxPooling2D)   (None, 20, 7, 7)      0           convolution2d_272[0][0]
convolution2d_273 (Convolution2D  (None, 40, 7, 7)      7240        maxpooling2d_107[0][0]
convolution2d_274 (Convolution2D  (None, 40, 7, 7)      14440       convolution2d_273[0][0]
maxpooling2d_108 (MaxPooling2D)   (None, 40, 3, 3)      0           convolution2d_274[0][0]
flatten_36 (Flatten)              (None, 360)           0           maxpooling2d_108[0][0]
dense_86 (Dense)                  (None, 400)           144400      flatten_36[0][0]
dropout_44 (Dropout)              (None, 400)           0           dense_86[0][0]
dense_87 (Dense)                  (None, 400)           160400      dropout_44[0][0]
dropout_45 (Dropout)              (None, 400)           0           dense_87[0][0]
dense_88 (Dense)                  (None, 10)            4010        dropout_45[0][0]
Total params: 336,940
Trainable params: 336,940
Non-trainable params: 0
Epoch 1/1
60000/60000 [==============================] - 343s - loss: 0.2592 - acc: 0.9191 - val_loss: 0.0468 - val_acc: 0.9849
<keras.callbacks.History at 0x7fb44b2bba10>
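The numbers in the model summary above can be checked by hand. The sketch below recomputes them with two hypothetical helpers, `conv_params` and `dense_params`, assuming 3x3 kernels with one bias per output channel and 2x2 pooling that rounds down.

```python
def conv_params(in_channels, out_channels, kernel=3):
    """Weights (kernel * kernel * in_channels per filter) plus one bias per filter."""
    return (kernel * kernel * in_channels + 1) * out_channels

def dense_params(n_in, n_out):
    """One weight per input plus one bias, for every output unit."""
    return (n_in + 1) * n_out

# (input channels, output channels) of the six convolutions, in order.
conv_channels = [(1, 10), (10, 10), (10, 20), (20, 20), (20, 40), (40, 40)]
conv_counts = [conv_params(i, o) for i, o in conv_channels]

# Three 2x2 poolings with stride 2: 28 -> 14 -> 7 -> 3 (integer division).
size = 28
for _ in range(3):
    size //= 2
flat = 40 * size * size  # features entering the first dense layer

dense_counts = [dense_params(flat, 400),
                dense_params(400, 400),
                dense_params(400, 10)]
total = sum(conv_counts) + sum(dense_counts)

print(conv_counts, flat, dense_counts, total)
```

Dropout and pooling layers contribute no parameters, which is why they show 0 in the summary. Most of the weights sit in the two 400-unit dense layers rather than in the convolutions.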
Adding batch normalization
from keras.layers import Dense
from keras.layers.core import Dropout
from keras.layers.convolutional import Convolution2D, MaxPooling2D
from keras.layers.normalization import BatchNormalization
model = Sequential([
    Lambda(normalize_input, input_shape=(1, 28, 28)),
    Convolution2D(10, 3, 3, activation='relu', border_mode='same'),
    BatchNormalization(axis=1),  # normalize per channel (channels-first)
    Convolution2D(10, 3, 3, activation='relu', border_mode='same'),
    BatchNormalization(axis=1),
    MaxPooling2D(pool_size=(2,2), strides=(2,2), dim_ordering='th'),
    Convolution2D(20, 3, 3, activation='relu', border_mode='same'),
    BatchNormalization(axis=1),
    Convolution2D(20, 3, 3, activation='relu', border_mode='same'),
    BatchNormalization(axis=1),
    MaxPooling2D(pool_size=(2,2), strides=(2,2), dim_ordering='th'),
    Convolution2D(40, 3, 3, activation='relu', border_mode='same'),
    BatchNormalization(axis=1),
    Convolution2D(40, 3, 3, activation='relu', border_mode='same'),
    BatchNormalization(axis=1),
    MaxPooling2D(pool_size=(2,2), strides=(2,2), dim_ordering='th'),
    Flatten(),
    Dense(400, activation='relu'),
    BatchNormalization(),
    Dropout(0.5),  # rate is not shown in the original; 0.5 is an assumed value
    Dense(400, activation='relu'),
    BatchNormalization(),
    Dropout(0.5),  # rate is not shown in the original; 0.5 is an assumed value
    Dense(10, activation='softmax')
])
model.compile(Adam(lr=0.001), loss='categorical_crossentropy', metrics=['accuracy'])
model.summary()
model.fit_generator(batch_gen, batch_gen.n, nb_epoch=1, validation_data=(X_val, y_val))
Layer (type)                      Output Shape          Param #     Connected to
lambda_41 (Lambda)                (None, 1, 28, 28)     0           lambda_input_40[0][0]
convolution2d_287 (Convolution2D  (None, 10, 28, 28)    100         lambda_41[0][0]
batchnormalization_17 (BatchNorm  (None, 10, 28, 28)    40          convolution2d_287[0][0]
convolution2d_288 (Convolution2D  (None, 10, 28, 28)    910         batchnormalization_17[0][0]
batchnormalization_18 (BatchNorm  (None, 10, 28, 28)    40          convolution2d_288[0][0]
maxpooling2d_115 (MaxPooling2D)   (None, 10, 14, 14)    0           batchnormalization_18[0][0]
convolution2d_289 (Convolution2D  (None, 20, 14, 14)    1820        maxpooling2d_115[0][0]
batchnormalization_19 (BatchNorm  (None, 20, 14, 14)    80          convolution2d_289[0][0]
convolution2d_290 (Convolution2D  (None, 20, 14, 14)    3620        batchnormalization_19[0][0]
batchnormalization_20 (BatchNorm  (None, 20, 14, 14)    80          convolution2d_290[0][0]
maxpooling2d_116 (MaxPooling2D)   (None, 20, 7, 7)      0           batchnormalization_20[0][0]
convolution2d_291 (Convolution2D  (None, 40, 7, 7)      7240        maxpooling2d_116[0][0]
batchnormalization_21 (BatchNorm  (None, 40, 7, 7)      160         convolution2d_291[0][0]
convolution2d_292 (Convolution2D  (None, 40, 7, 7)      14440       batchnormalization_21[0][0]
batchnormalization_22 (BatchNorm  (None, 40, 7, 7)      160         convolution2d_292[0][0]
maxpooling2d_117 (MaxPooling2D)   (None, 40, 3, 3)      0           batchnormalization_22[0][0]
flatten_39 (Flatten)              (None, 360)           0           maxpooling2d_117[0][0]
dense_95 (Dense)                  (None, 400)           144400      flatten_39[0][0]
batchnormalization_23 (BatchNorm  (None, 400)           1600        dense_95[0][0]
dropout_50 (Dropout)              (None, 400)           0           batchnormalization_23[0][0]
dense_96 (Dense)                  (None, 400)           160400      dropout_50[0][0]
batchnormalization_24 (BatchNorm  (None, 400)           1600        dense_96[0][0]
dropout_51 (Dropout)              (None, 400)           0           batchnormalization_24[0][0]
dense_97 (Dense)                  (None, 10)            4010        dropout_51[0][0]
Total params: 340,700
Trainable params: 338,820
Non-trainable params: 1,880
Epoch 1/1
60000/60000 [==============================] - 802s - loss: 0.3070 - acc: 0.9103 - val_loss: 0.0619 - val_acc: 0.9814
<keras.callbacks.History at 0x7fb432456b10>
model.fit_generator(batch_gen, batch_gen.n, nb_epoch=1, validation_data=(X_val, y_val))
Epoch 1/1
60000/60000 [==============================] - 830s - loss: 0.0785 - acc: 0.9768 - val_loss: 0.0243 - val_acc: 0.9917
<keras.callbacks.History at 0x7fb45a341b10>
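To show what a batch normalization layer does to channels-first feature maps, here is a minimal NumPy sketch of the training-time forward pass. It is an illustration, not the Keras implementation: `batch_norm_channels_first` is a hypothetical helper, the activations are random, and the running averages used at inference time are left out. The last lines check the summary above, assuming four values per normalized feature (scale, shift, and two running statistics), of which the two running statistics are not trained.

```python
import numpy as np

def batch_norm_channels_first(x, gamma, beta, eps=1e-5):
    """Normalize each channel of an (N, C, H, W) batch, then scale and shift."""
    # One mean and variance per channel, taken over batch and spatial axes.
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma.reshape(1, -1, 1, 1) * x_hat + beta.reshape(1, -1, 1, 1)

rng = np.random.RandomState(0)
activations = rng.randn(8, 10, 28, 28) * 3.0 + 5.0  # hypothetical conv outputs
gamma = np.ones(10)   # learned scale, initialised to 1
beta = np.zeros(10)   # learned shift, initialised to 0
out = batch_norm_channels_first(activations, gamma, beta)

# Features normalized by each BatchNormalization layer in the model above.
bn_features = [10, 10, 20, 20, 40, 40, 400, 400]
bn_total = sum(4 * c for c in bn_features)
bn_non_trainable = sum(2 * c for c in bn_features)

print(bn_total, bn_non_trainable)
```

The 3,760 batch normalization values account for the difference between the 336,940 parameters of the previous model and the 340,700 reported here.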
Other experiments with preprocessing input data
Adding all three color channels
# Add two new color channels (g and b, set to 0) by extending the array along axis 1
colors = np.zeros((X.shape[0], 2, X.shape[2], X.shape[3]))
X = np.append(X, colors, axis=1)
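The same padding step can be tried on a tiny array to see exactly what ends up in each channel. This sketch uses a hypothetical 2x2 "image" batch and `np.concatenate`, which behaves like `np.append` with an explicit axis; creating the zeros with the input's dtype avoids an implicit upcast.

```python
import numpy as np

# Hypothetical batch: 2 single-channel images of 2x2 pixels.
gray = np.arange(2 * 1 * 2 * 2, dtype=np.float64).reshape(2, 1, 2, 2)

# Two extra all-zero channels with matching batch and spatial dimensions.
zeros = np.zeros((gray.shape[0], 2) + gray.shape[2:], dtype=gray.dtype)

# Channel 0 keeps the original data; channels 1 and 2 stay empty.
rgb = np.concatenate([gray, zeros], axis=1)

print(rgb.shape)
```

Displayed as RGB, such an image shows the digit in red only, since the green and blue channels are zero.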
Experiment with the numpy rollaxis function. It takes one axis and moves it to the specified final position (start=).
x = X[:3]
x = np.rollaxis(x, 2, 1) # move axis 2 to position 1
x = np.rollaxis(x, 3, 2) # move axis 3 (the fourth axis) to position 2
# TODO: Merging the three examples into the color channels axis:
# - we need the 3 to be the last axis
# - we need the 1 to be the first axis (one final image)
x = np.rollaxis(x, 0, 3) # move axis 0 (the 3 examples) to just before position 3
x = np.rollaxis(x, 3, 0) # move axis 3 (the single channel) to position 0
# if we now print the image, we should see the three numbers initially selected, merged into one,
# each on a different channel (one on the red, one on the green, one on the blue)
(3, 1, 28, 28)
(3, 28, 1, 28)
(3, 28, 28, 1)
(28, 28, 3, 1)
(1, 28, 28, 3)
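The four `rollaxis` calls above amount to a single axis permutation. As a sketch on a stand-in array of the same shape (not the MNIST data), the code below repeats the sequence and checks that it matches one `np.transpose` call that sends the axes `(N, C, H, W)` to `(C, H, W, N)`.

```python
import numpy as np

# Stand-in for X[:3]: 3 examples, 1 channel, 28x28 pixels, all values distinct.
x = np.arange(3 * 1 * 28 * 28).reshape(3, 1, 28, 28)

rolled = np.rollaxis(x, 2, 1)        # (3, 28, 1, 28)
rolled = np.rollaxis(rolled, 3, 2)   # (3, 28, 28, 1)
rolled = np.rollaxis(rolled, 0, 3)   # (28, 28, 3, 1)
rolled = np.rollaxis(rolled, 3, 0)   # (1, 28, 28, 3)

# Same result in one step: new axis order is (C, H, W, N).
transposed = np.transpose(x, (1, 2, 3, 0))

print(rolled.shape, np.array_equal(rolled, transposed))
```

After the permutation, `rolled[0]` is one 28x28 image whose three color channels hold the three original digits.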