Lab 8: Object recognition with convolutional neural networks#

In this lab we consider the CIFAR dataset, but model it using convolutional neural networks instead of linear models. There is no separate tutorial, but you can find lots of examples in the lecture notebook on convolutional neural networks.

Tip: You can run these exercises faster on a GPU (but they will also run fine on a CPU). If you do not have a GPU locally, you can upload this notebook to Google Colab. You can enable GPU support at “runtime” -> “change runtime type”.

import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = "2"  # Suppress TF info messages; must be set before importing TF
import tensorflow as tf
tf.config.experimental.list_physical_devices('GPU') # Check whether GPUs are available
%matplotlib inline
import openml as oml
import matplotlib.pyplot as plt
# Download CIFAR data. Takes a while the first time.
# This version returns 3x32x32 resolution images. 
# If you feel like it, repeat the exercises with the 96x96x3 resolution version by using ID 41103 
cifar = oml.datasets.get_dataset(40926) 
X, y, _, _ = cifar.get_data(target=cifar.default_target_attribute, dataset_format='array'); 
cifar_classes = {0: "airplane", 1: "automobile", 2: "bird", 3: "cat", 4: "deer",
                 5: "dog", 6: "frog", 7: "horse", 8: "ship", 9: "truck"}
# The data comes as flat arrays in channels-first (3,32,32) order; reshape and transpose to channels-last (32,32,3)
Xr = X.reshape((len(X),3,32,32)).transpose(0,2,3,1)
# Take some random examples, reshape to a 32x32 image and plot
from random import randint
fig, axes = plt.subplots(1, 5,  figsize=(10, 5))
for i in range(5):
    n = randint(0, len(Xr) - 1)  # randint is inclusive on both ends
    # Pixel values are 0-255 integers; scale to [0,1] for imshow
    axes[i].imshow(Xr[n]/255)
    axes[i].set_xlabel(cifar_classes[int(y[n])])
    axes[i].set_xticks(()), axes[i].set_yticks(())
plt.show();
[Figure: five random CIFAR images with their class labels]

Exercise 1: A simple model#

  • Split the data into 80% training and 20% validation sets

  • Normalize the data to [0,1]

  • Build a ConvNet with 3 convolutional layers interspersed with MaxPooling layers, and one dense layer.

    • Use at least 32 filters in the first layer and ReLU activation.

    • Otherwise, make rational design choices or experiment a bit to see what works.

  • You should get at least 60% accuracy.

  • For training, you can try a batch size of 64 and 20-50 epochs, but feel free to explore these as well

  • Plot and interpret the learning curves

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(Xr,y, stratify=y, train_size=0.8)
from tensorflow.keras.utils import to_categorical
X_train = X_train / 255.
X_test = X_test / 255.
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)
from tensorflow.keras import layers
from tensorflow.keras import models

model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(10, activation='softmax'))
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
history = model.fit(X_train, y_train, epochs=25, batch_size=64, verbose=0,
                    validation_data=(X_test, y_test))
import pandas as pd
import numpy as np
pd.DataFrame(history.history).plot(lw=2,style=['b:','r:','b-','r-']);
print("Max val_acc",np.max(history.history['val_accuracy']))
Max val_acc 0.6492500305175781
[Figure: learning curves (loss and accuracy on training and validation data)]

Already decent performance, but the model starts overfitting heavily after epoch 15.
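
One way to curb this overfitting (a sketch, not used in the solutions below) is early stopping: halt training once the validation loss stops improving and keep the best weights seen so far. This reuses the compiled model above; for a clean comparison you would rebuild it first.

from tensorflow.keras.callbacks import EarlyStopping

# Sketch: stop after 5 epochs without val_loss improvement; restore the best weights
early_stop = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)
history = model.fit(X_train, y_train, epochs=50, batch_size=64, verbose=0,
                    validation_data=(X_test, y_test), callbacks=[early_stop])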

Exercise 2: VGG-like model#

  • Mimic the VGG model by building 3 ‘blocks’ of 2 convolutional layers each

  • Do MaxPooling after each block

  • The first layer should have at least 32 filters

  • Use zero-padding to be able to build a deeper model

  • Use a dense layer with at least 128 hidden nodes.

  • Plot and interpret the learning curves

model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', padding='same', input_shape=(32, 32, 3)))
model.add(layers.Conv2D(32, (3, 3), activation='relu', padding='same'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu', padding='same'))
model.add(layers.Conv2D(64, (3, 3), activation='relu', padding='same'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(128, (3, 3), activation='relu', padding='same'))
model.add(layers.Conv2D(128, (3, 3), activation='relu', padding='same'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Flatten())
model.add(layers.Dense(128, activation='relu'))
model.add(layers.Dense(10, activation='softmax'))
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
history = model.fit(X_train, y_train, epochs=25, batch_size=64, verbose=0,
                    validation_data=(X_test, y_test))
pd.DataFrame(history.history).plot(lw=2,style=['b:','r:','b-','r-']);
print("Max val_acc",np.max(history.history['val_accuracy']))
Max val_acc 0.6827500462532043
[Figure: learning curves for the VGG-like model]

A better result, but still overfitting heavily.

Exercise 3: Regularization#

  • Explore different ways to regularize your VGG-like model

    • Try adding some dropout after every MaxPooling and Dense layer.

      • What are good Dropout rates? (one way to compare rates empirically is sketched after this list)

    • Try batch normalization together with Dropout

  • Plot and interpret the learning curves
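
One way to compare dropout rates empirically (a rough sketch, not the full solution: a deliberately small model and short runs to keep it cheap) is:

def build_dropout_model(rate):
    m = models.Sequential()
    m.add(layers.Conv2D(32, (3, 3), activation='relu', padding='same', input_shape=(32, 32, 3)))
    m.add(layers.MaxPooling2D((2, 2)))
    m.add(layers.Dropout(rate))
    m.add(layers.Flatten())
    m.add(layers.Dense(128, activation='relu'))
    m.add(layers.Dropout(rate))
    m.add(layers.Dense(10, activation='softmax'))
    m.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])
    return m

# Short runs are only indicative, but cheap enough to scan several rates
for rate in [0.1, 0.2, 0.3, 0.5]:
    h = build_dropout_model(rate).fit(X_train, y_train, epochs=5, batch_size=64, verbose=0,
                                      validation_data=(X_test, y_test))
    print("rate %.1f: max val_acc %.3f" % (rate, np.max(h.history['val_accuracy'])))

The solution below settles on a flat rate of 0.2 after every block: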

model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', padding='same', input_shape=(32, 32, 3)))
model.add(layers.Conv2D(32, (3, 3), activation='relu', padding='same'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Dropout(0.2))
model.add(layers.Conv2D(64, (3, 3), activation='relu', padding='same'))
model.add(layers.Conv2D(64, (3, 3), activation='relu', padding='same'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Dropout(0.2))
model.add(layers.Conv2D(128, (3, 3), activation='relu', padding='same'))
model.add(layers.Conv2D(128, (3, 3), activation='relu', padding='same'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Dropout(0.2))
model.add(layers.Flatten())
model.add(layers.Dense(128, activation='relu'))
model.add(layers.Dropout(0.2))
model.add(layers.Dense(10, activation='softmax'))
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
history = model.fit(X_train, y_train, epochs=25, batch_size=64, verbose=0,
                    validation_data=(X_test, y_test))
pd.DataFrame(history.history).plot(lw=2,style=['b:','r:','b-','r-']);
print("Max val_acc",np.max(history.history['val_accuracy']))
Max val_acc 0.7305000424385071
[Figure: learning curves for the VGG-like model with dropout]

Accuracy is quite a bit better, and overfitting seems lessened.

Another common approach is to gradually increase the amount of dropout with depth. This regularizes layers deeper in the model more strongly than layers closer to the input.

model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', padding='same', input_shape=(32, 32, 3)))
model.add(layers.Conv2D(32, (3, 3), activation='relu', padding='same'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Dropout(0.2))
model.add(layers.Conv2D(64, (3, 3), activation='relu', padding='same'))
model.add(layers.Conv2D(64, (3, 3), activation='relu', padding='same'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Dropout(0.3))
model.add(layers.Conv2D(128, (3, 3), activation='relu', padding='same'))
model.add(layers.Conv2D(128, (3, 3), activation='relu', padding='same'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Dropout(0.4))
model.add(layers.Flatten())
model.add(layers.Dense(128, activation='relu'))
model.add(layers.Dropout(0.5))
model.add(layers.Dense(10, activation='softmax'))
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
history = model.fit(X_train, y_train, epochs=25, batch_size=64, verbose=0,
                    validation_data=(X_test, y_test))
pd.DataFrame(history.history).plot(lw=2,style=['b:','r:','b-','r-']);
print("Max val_acc",np.max(history.history['val_accuracy']))
Max val_acc 0.7412500381469727
[Figure: learning curves with gradually increasing dropout]

Slightly better accuracy and very little overfitting remains.

Next, we try adding Batch Normalization.

model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', padding='same', input_shape=(32, 32, 3)))
model.add(layers.BatchNormalization())
model.add(layers.Conv2D(32, (3, 3), activation='relu', padding='same'))
model.add(layers.BatchNormalization())
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Dropout(0.2))
model.add(layers.Conv2D(64, (3, 3), activation='relu', padding='same'))
model.add(layers.BatchNormalization())
model.add(layers.Conv2D(64, (3, 3), activation='relu', padding='same'))
model.add(layers.BatchNormalization())
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Dropout(0.3))
model.add(layers.Conv2D(128, (3, 3), activation='relu', padding='same'))
model.add(layers.BatchNormalization())
model.add(layers.Conv2D(128, (3, 3), activation='relu', padding='same'))
model.add(layers.BatchNormalization())
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Dropout(0.4))
model.add(layers.Flatten())
model.add(layers.Dense(128, activation='relu'))
model.add(layers.BatchNormalization())
model.add(layers.Dropout(0.5))
model.add(layers.Dense(10, activation='softmax'))
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
history = model.fit(X_train, y_train, epochs=25, batch_size=64, verbose=0,
                    validation_data=(X_test, y_test))
pd.DataFrame(history.history).plot(lw=2,style=['b:','r:','b-','r-']);
print("Max val_acc",np.max(history.history['val_accuracy']))
Max val_acc 0.7827500104904175
[Figure: learning curves with batch normalization and dropout]

Batch normalization gives another clear improvement, raising the best validation accuracy from about 0.74 to 0.78.

Exercise 4: Data Augmentation#

  • Perform image augmentation. You can use the ImageDataGenerator for this.

  • What is the effect? What is the effect with and without Dropout?

  • Plot and interpret the learning curves

from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(
    width_shift_range=0.1,
    height_shift_range=0.1,
    horizontal_flip=True,)
it_train = train_datagen.flow(X_train, y_train, batch_size=64)
from tensorflow.keras import layers
from tensorflow.keras import models

model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', padding='same', input_shape=(32, 32, 3)))
model.add(layers.BatchNormalization())
model.add(layers.Conv2D(32, (3, 3), activation='relu', padding='same'))
model.add(layers.BatchNormalization())
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Dropout(0.2))
model.add(layers.Conv2D(64, (3, 3), activation='relu', padding='same'))
model.add(layers.BatchNormalization())
model.add(layers.Conv2D(64, (3, 3), activation='relu', padding='same'))
model.add(layers.BatchNormalization())
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Dropout(0.3))
model.add(layers.Conv2D(128, (3, 3), activation='relu', padding='same'))
model.add(layers.BatchNormalization())
model.add(layers.Conv2D(128, (3, 3), activation='relu', padding='same'))
model.add(layers.BatchNormalization())
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Dropout(0.4))
model.add(layers.Flatten())
model.add(layers.Dense(128, activation='relu'))
model.add(layers.BatchNormalization())
model.add(layers.Dropout(0.5))
model.add(layers.Dense(10, activation='softmax'))
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

steps = int(X_train.shape[0] / 64)
history = model.fit(it_train, epochs=25, steps_per_epoch=steps, verbose=0,
                    validation_data=(X_test, y_test))
import pandas as pd
import numpy as np
pd.DataFrame(history.history).plot(lw=2,style=['b:','r:','b-','r-']);
print("Max val_acc",np.max(history.history['val_accuracy']))
Max val_acc 0.8022500276565552
[Figure: learning curves with data augmentation]

We get a 2-3% improvement. The best results come from very subtle augmentation (small shifts and flips): the images are quite low-resolution, and rotation or shear would destroy too much information.
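
As a side note, recent TensorFlow versions (2.6+) let you express the same subtle augmentation as Keras preprocessing layers inside the model itself, active during fit() and disabled at inference time. A minimal sketch (the layer choices are illustrative, not part of the original solution):

from tensorflow.keras import layers, models

# Sketch: subtle augmentation as model layers (only applied during training)
augmented_model = models.Sequential([
    layers.Input(shape=(32, 32, 3)),
    layers.RandomFlip("horizontal"),                                # mirror left-right
    layers.RandomTranslation(height_factor=0.1, width_factor=0.1), # small shifts
    layers.Conv2D(32, (3, 3), activation='relu', padding='same'),
    # ... the remaining VGG-style blocks from above would follow here
    layers.Flatten(),
    layers.Dense(10, activation='softmax'),
])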

Exercise 5: Interpreting misclassifications#

Chances are that even your best models are not yet perfect. It is important to understand what kinds of errors they still make.

  • Run the test images through the network and detect all misclassified ones

  • Interpret the results. Are these misclassifications to be expected?

  • Compute the confusion matrix. Which classes are often confused?

y_pred = model.predict(X_test)
misclassified_samples = np.nonzero(np.argmax(y_test, axis=1) != np.argmax(y_pred, axis=1))[0]

Since the network outputs one value per class (softmax probabilities), we take the class with the maximum value as the predicted class.

# Visualize the (first five) misclassifications, together with the predicted and actual class
fig, axes = plt.subplots(1, 5,  figsize=(10, 5))
for nr, i in enumerate(misclassified_samples[:5]):
    axes[nr].imshow(X_test[i])
    axes[nr].set_xlabel("Predicted: %s,\n Actual : %s" % (cifar_classes[np.argmax(y_pred[i])],cifar_classes[np.argmax(y_test[i])]))
    axes[nr].set_xticks(()), axes[nr].set_yticks(())

plt.show();
[Figure: five misclassified test images with their predicted and actual classes]

Some of these are indeed hard to categorize, although we can probably still improve the model quite a bit.

from sklearn.metrics import confusion_matrix
cm = confusion_matrix(np.argmax(y_test, axis=1),np.argmax(y_pred, axis=1))
fig, ax = plt.subplots()
im = ax.imshow(cm)
ax.set_xticks(np.arange(10)), ax.set_yticks(np.arange(10))
ax.set_xticklabels(list(cifar_classes.values()), rotation=45, ha="right")
ax.set_yticklabels(list(cifar_classes.values()))
ax.set_ylabel('True')
ax.set_xlabel('Predicted')
# Annotate each cell with its count (x-axis: predicted class, y-axis: true class)
for row in range(10):
    for col in range(10):
        ax.text(col, row, cm[row, col], ha="center", va="center", color="w")
[Figure: confusion matrix heatmap, true vs. predicted classes]

Most misclassifications seem to involve cats, birds, and horses. The most common misclassification is between cats and dogs.
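
To quantify this rather than eyeball the heatmap, we can rank the off-diagonal entries of the confusion matrix. A small sketch using the cm computed above:

# Zero the diagonal so only misclassifications remain, then list the largest ones
off_diag = cm.copy()
np.fill_diagonal(off_diag, 0)
for idx in np.argsort(off_diag, axis=None)[::-1][:5]:
    true_c, pred_c = np.unravel_index(idx, off_diag.shape)
    print("%s -> %s: %d" % (cifar_classes[true_c], cifar_classes[pred_c], off_diag[true_c, pred_c]))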

Exercise 6: Interpreting the model#

Retrain your best model on all the data. Next, retrieve and visualize the activations (feature maps) for every filter in every layer, or at least for a few filters per layer. Tip: see the course notebooks for examples of how to do this.

Interpret the results. Is your model indeed learning something useful?
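
The retraining step itself is only sketched here (reusing the compiled model and the data generator from Exercise 4; for a clean run you would rebuild the model first):

# Sketch: retrain on all available data (no held-out validation remains)
X_all = np.concatenate([X_train, X_test])
y_all = np.concatenate([y_train, y_test])
model.fit(train_datagen.flow(X_all, y_all, batch_size=64),
          epochs=25, steps_per_epoch=len(X_all) // 64, verbose=0)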

model.summary()
Model: "sequential_5"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 conv2d_27 (Conv2D)          (None, 32, 32, 32)        896       
                                                                 
 batch_normalization_7 (Batc  (None, 32, 32, 32)       128       
 hNormalization)                                                 
                                                                 
 conv2d_28 (Conv2D)          (None, 32, 32, 32)        9248      
                                                                 
 batch_normalization_8 (Batc  (None, 32, 32, 32)       128       
 hNormalization)                                                 
                                                                 
 max_pooling2d_14 (MaxPoolin  (None, 16, 16, 32)       0         
 g2D)                                                            
                                                                 
 dropout_12 (Dropout)        (None, 16, 16, 32)        0         
                                                                 
 conv2d_29 (Conv2D)          (None, 16, 16, 64)        18496     
                                                                 
 batch_normalization_9 (Batc  (None, 16, 16, 64)       256       
 hNormalization)                                                 
                                                                 
 conv2d_30 (Conv2D)          (None, 16, 16, 64)        36928     
                                                                 
 batch_normalization_10 (Bat  (None, 16, 16, 64)       256       
 chNormalization)                                                
                                                                 
 max_pooling2d_15 (MaxPoolin  (None, 8, 8, 64)         0         
 g2D)                                                            
                                                                 
 dropout_13 (Dropout)        (None, 8, 8, 64)          0         
                                                                 
 conv2d_31 (Conv2D)          (None, 8, 8, 128)         73856     
                                                                 
 batch_normalization_11 (Bat  (None, 8, 8, 128)        512       
 chNormalization)                                                
                                                                 
 conv2d_32 (Conv2D)          (None, 8, 8, 128)         147584    
                                                                 
 batch_normalization_12 (Bat  (None, 8, 8, 128)        512       
 chNormalization)                                                
                                                                 
 max_pooling2d_16 (MaxPoolin  (None, 4, 4, 128)        0         
 g2D)                                                            
                                                                 
 dropout_14 (Dropout)        (None, 4, 4, 128)         0         
                                                                 
 flatten_5 (Flatten)         (None, 2048)              0         
                                                                 
 dense_10 (Dense)            (None, 128)               262272    
                                                                 
 batch_normalization_13 (Bat  (None, 128)              512       
 chNormalization)                                                
                                                                 
 dropout_15 (Dropout)        (None, 128)               0         
                                                                 
 dense_11 (Dense)            (None, 10)                1290      
                                                                 
=================================================================
Total params: 552,874
Trainable params: 551,722
Non-trainable params: 1,152
_________________________________________________________________
from tensorflow.keras import models

img_tensor = X_test[4]
img_tensor = np.expand_dims(img_tensor, axis=0) 

# Extract the outputs of the first 15 layers:
layer_outputs = [layer.output for layer in model.layers[:15]]
# Creates a model that will return these outputs, given the model input:
activation_model = models.Model(inputs=model.input, outputs=layer_outputs)

# This will return a list of 15 NumPy arrays:
# one array per layer activation
activations = activation_model.predict(img_tensor)
plt.rcParams['figure.dpi'] = 120
first_layer_activation = activations[0]

f, (ax1, ax2) = plt.subplots(1, 2, sharey=True)
ax1.imshow(img_tensor[0])
ax2.matshow(first_layer_activation[0, :, :, 2], cmap='viridis')
ax1.set_xticks([])
ax1.set_yticks([])
ax2.set_xticks([])
ax2.set_yticks([])
ax1.set_xlabel('Input image')
ax2.set_xlabel('Activation of filter 2');
[Figure: the input image next to the activation of filter 2 in the first layer]
images_per_row = 16

layer_names = []
for layer in model.layers[:15]:
    layer_names.append(layer.name)

def plot_activations(layer_index, activations):
    start = layer_index
    end = layer_index+1
    # Now let's display our feature maps
    for layer_name, layer_activation in zip(layer_names[start:end], activations[start:end]):
        # This is the number of features in the feature map
        n_features = layer_activation.shape[-1]

        # The feature map has shape (1, size, size, n_features)
        size = layer_activation.shape[1]

        # We will tile the activation channels in this matrix
        n_cols = n_features // images_per_row
        display_grid = np.zeros((size * n_cols, images_per_row * size))

        # We'll tile each filter into this big horizontal grid
        for col in range(n_cols):
            for row in range(images_per_row):
                # Copy so we do not modify the activations array in place
                channel_image = layer_activation[0,
                                                 :, :,
                                                 col * images_per_row + row].copy()
                # Post-process the feature to make it visually palatable
                channel_image -= channel_image.mean()
                # Add a small epsilon: dead filters have zero standard deviation
                channel_image /= (channel_image.std() + 1e-8)
                channel_image *= 64
                channel_image += 128
                channel_image = np.clip(channel_image, 0, 255).astype('uint8')
                display_grid[col * size : (col + 1) * size,
                             row * size : (row + 1) * size] = channel_image

        # Display the grid
        scale = 1. / size
        plt.figure(figsize=(scale * display_grid.shape[1],
                            scale * display_grid.shape[0]))
        plt.title("Activation of layer {} ({})".format(layer_index+1,layer_name))
        plt.grid(False)
        plt.imshow(display_grid, aspect='auto', cmap='viridis')

    plt.show()
plot_activations(0, activations);
[Figure: activation of layer 1 (conv2d)]
plot_activations(2, activations);
[Figure: activation of layer 3 (conv2d)]
plot_activations(6, activations);
[Figure: activation of layer 7 (conv2d)]
plot_activations(8, activations)
[Figure: activation of layer 9 (conv2d)]
plot_activations(12, activations)
[Figure: activation of layer 13 (conv2d)]

Optional: Take it a step further#