Lab 7b: Neural Networks for text#

# Auto-setup when running on Google Colab
if 'google.colab' in str(get_ipython()):
    !pip install openml

# General imports
%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import openml as oml
import tensorflow as tf

Before you start, read the Tutorial for this lab (‘Deep Learning with Python’)

Exercise 1: Sentiment Analysis#

  • Take the IMDB dataset from keras.datasets, keeping only the 10000 most frequent words, and use the default train-test split

from tensorflow.keras.datasets import imdb
# Download the IMDB data, keeping only the 10000 most frequent words
(train_data, train_labels), (test_data, test_labels) = imdb.load_data(num_words=10000)
# Build a reverse index to decode reviews back to words (the word indices are offset by 3)
word_index = imdb.get_word_index()
reverse_word_index = dict([(value, key) for (key, value) in word_index.items()])

for i in [0,5,10]:
    print("Review {}:".format(i),
          ' '.join([reverse_word_index.get(w - 3, '?') for w in train_data[i]][:20]))
Review 0: ? this film was just brilliant casting location scenery story direction everyone's really suited the part they played and you
Review 5: ? begins better than it ends funny that the russian submarine crew ? all other actors it's like those scenes
Review 10: ? french horror cinema has seen something of a revival over the last couple of years with great films such
  • Vectorize the reviews using one-hot-encoding (see tutorial for helper code)
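One possible vectorization helper, following the tutorial's helper code (the function and variable names here are ours):

# Multi-hot encode each review: a 10000-dimensional vector with 1 at every word index that occurs
def vectorize_sequences(sequences, dimension=10000):
    results = np.zeros((len(sequences), dimension))
    for i, sequence in enumerate(sequences):
        results[i, sequence] = 1.
    return results

x_train = vectorize_sequences(train_data)
x_test = vectorize_sequences(test_data)
y_train = np.asarray(train_labels).astype('float32')
y_test = np.asarray(test_labels).astype('float32')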

  • Build a network of 2 Dense layers with 16 nodes each and the ReLU activation function.

  • Use binary cross-entropy as the loss function, RMSprop as the optimizer, and accuracy as the evaluation metric.
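A sketch of such a model, using the layer sizes from the bullets above (binary cross-entropy because there are two sentiment classes):

from tensorflow.keras import models, layers

# Two hidden layers of 16 ReLU units, sigmoid output for the binary sentiment label
model = models.Sequential([
    layers.Dense(16, activation='relu', input_shape=(10000,)),
    layers.Dense(16, activation='relu'),
    layers.Dense(1, activation='sigmoid')
])
model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])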

  • Plot the learning curves, using the first 10000 samples as the validation set and the rest as the training set.

  • Use 20 epochs and a batch size of 512
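One way to run this fit and plot the curves, assuming the vectorized data from above (the variable names are ours):

# Hold out the first 10000 reviews for validation, train on the remaining 15000
x_val, partial_x_train = x_train[:10000], x_train[10000:]
y_val, partial_y_train = y_train[:10000], y_train[10000:]

history = model.fit(partial_x_train, partial_y_train,
                    epochs=20, batch_size=512,
                    validation_data=(x_val, y_val))

# Learning curves: loss and accuracy per epoch for the training and validation sets
pd.DataFrame(history.history).plot()
plt.xlabel('Epoch')
plt.show()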

  • Retrain the model, this time using early stopping to stop training at the optimal time

  • Evaluate on the test set and report the accuracy
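A possible retraining run with early stopping, followed by the test-set evaluation (the patience value is just an example choice):

from tensorflow.keras.callbacks import EarlyStopping

# Rebuild the same architecture so training starts from fresh weights
model = models.Sequential([
    layers.Dense(16, activation='relu', input_shape=(10000,)),
    layers.Dense(16, activation='relu'),
    layers.Dense(1, activation='sigmoid')
])
model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy'])

# Stop once the validation loss stops improving and keep the best weights seen so far
early_stop = EarlyStopping(monitor='val_loss', patience=2, restore_best_weights=True)
model.fit(partial_x_train, partial_y_train, epochs=20, batch_size=512,
          validation_data=(x_val, y_val), callbacks=[early_stop])

test_loss, test_acc = model.evaluate(x_test, y_test)
print("Test accuracy: {:.3f}".format(test_acc))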

  • Try to manually improve the score and explain what you observe (one example variant is sketched after this list). E.g. you could:

    • Try 3 hidden layers

    • Change to a higher learning rate (e.g. 0.4)

    • Try another optimizer (e.g. Adagrad)

    • Use more or fewer hidden units (e.g. 64)

    • Use tanh activation instead of ReLU
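As an illustration, one of the many variants you could try: three tanh layers of 64 units trained with Adagrad. This is just one combination of the suggestions above, not a recommendation:

from tensorflow.keras import models, layers, optimizers

# Example variant: deeper and wider network, tanh activations, Adagrad optimizer
variant = models.Sequential([
    layers.Dense(64, activation='tanh', input_shape=(10000,)),
    layers.Dense(64, activation='tanh'),
    layers.Dense(64, activation='tanh'),
    layers.Dense(1, activation='sigmoid')
])
variant.compile(optimizer=optimizers.Adagrad(learning_rate=0.01),
                loss='binary_crossentropy', metrics=['accuracy'])
variant.fit(partial_x_train, partial_y_train, epochs=20, batch_size=512,
            validation_data=(x_val, y_val))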

  • Tune the results by doing a grid search for the most interesting hyperparameters (a sketch follows the list below)

    • Tune the learning rate between 0.001 and 1

    • Tune the number of epochs between 1 and 20

    • Use only 3-4 values for each
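A simple manual grid search over example values in the suggested ranges, scored on the held-out validation set. You could also wrap the model for scikit-learn's GridSearchCV; note that this many fits takes a while:

from tensorflow.keras import models, layers, optimizers

def build_model(learning_rate):
    m = models.Sequential([
        layers.Dense(16, activation='relu', input_shape=(10000,)),
        layers.Dense(16, activation='relu'),
        layers.Dense(1, activation='sigmoid')
    ])
    m.compile(optimizer=optimizers.RMSprop(learning_rate=learning_rate),
              loss='binary_crossentropy', metrics=['accuracy'])
    return m

# 4 example values per hyperparameter
results = {}
for lr in [0.001, 0.01, 0.1, 1.0]:
    for n_epochs in [2, 5, 10, 20]:
        m = build_model(lr)
        m.fit(partial_x_train, partial_y_train, epochs=n_epochs, batch_size=512, verbose=0)
        _, val_acc = m.evaluate(x_val, y_val, verbose=0)
        results[(lr, n_epochs)] = val_acc
print("Best (learning rate, epochs):", max(results, key=results.get))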

Exercise 2: Topic classification#

  • Take the Reuters dataset from keras.datasets, keeping only the 10000 most frequent words, and use the default train-test split

from tensorflow.keras.datasets import reuters

(train_data, train_labels), (test_data, test_labels) = reuters.load_data(num_words=10000)
word_index = reuters.get_word_index()
reverse_word_index = dict([(value, key) for (key, value) in word_index.items()])

for i in [0,5,10]:
    # Note that the word indices are offset by 3
    print("News wire {}:".format(i),
          ' '.join([reverse_word_index.get(w - 3, '?') for w in train_data[i]]))
News wire 0: ? ? ? said as a result of its december acquisition of space co it expects earnings per share in 1987 of 1 15 to 1 30 dlrs per share up from 70 cts in 1986 the company said pretax net should rise to nine to 10 mln dlrs from six mln dlrs in 1986 and rental operation revenues to 19 to 22 mln dlrs from 12 5 mln dlrs it said cash flow per share this year should be 2 50 to three dlrs reuter 3
News wire 5: ? the u s agriculture department estimated canada's 1986 87 wheat crop at 31 85 mln tonnes vs 31 85 mln tonnes last month it estimated 1985 86 output at 24 25 mln tonnes vs 24 25 mln last month canadian 1986 87 coarse grain production is projected at 27 62 mln tonnes vs 27 62 mln tonnes last month production in 1985 86 is estimated at 24 95 mln tonnes vs 24 95 mln last month canadian wheat exports in 1986 87 are forecast at 19 00 mln tonnes vs 18 00 mln tonnes last month exports in 1985 86 are estimated at 17 71 mln tonnes vs 17 72 mln last month reuter 3
News wire 10: ? period ended december 31 shr profit 11 cts vs loss 24 cts net profit 224 271 vs loss 511 349 revs 7 258 688 vs 7 200 349 reuter 3
  • Vectorize the data and the labels using one-hot-encoding
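One way to vectorize both, reusing the vectorize_sequences helper from Exercise 1 and one-hot encoding the 46 topic labels with to_categorical:

from tensorflow.keras.utils import to_categorical

# Multi-hot encode the newswires, one-hot encode the topic labels
x_train = vectorize_sequences(train_data)
x_test = vectorize_sequences(test_data)
one_hot_train_labels = to_categorical(train_labels)
one_hot_test_labels = to_categorical(test_labels)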

  • Build a network with 2 dense layers of 64 nodes each

  • Make sensible choices about the activation functions, loss, …
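A sketch of one sensible setup: a softmax output over the 46 topics with categorical cross-entropy as the loss:

from tensorflow.keras import models, layers

# Two hidden layers of 64 ReLU units; softmax output because there are 46 mutually exclusive topics
model = models.Sequential([
    layers.Dense(64, activation='relu', input_shape=(10000,)),
    layers.Dense(64, activation='relu'),
    layers.Dense(46, activation='softmax')
])
model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])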

  • Take a validation set from the first 1000 points of the training set

  • Fit the model with 20 epochs and a batch size of 512

  • Plot the learning curves
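Putting these three steps together (the variable names are ours):

# First 1000 newswires as the validation set, the rest for training
x_val, partial_x_train = x_train[:1000], x_train[1000:]
y_val, partial_y_train = one_hot_train_labels[:1000], one_hot_train_labels[1000:]

history = model.fit(partial_x_train, partial_y_train,
                    epochs=20, batch_size=512,
                    validation_data=(x_val, y_val))

# Learning curves: loss and accuracy per epoch for the training and validation sets
pd.DataFrame(history.history).plot()
plt.xlabel('Epoch')
plt.show()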

  • Create an information bottleneck: rebuild the model, but now use only 4 hidden units in the second layer. Evaluate the model. Does it still perform well?
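A sketch of the bottlenecked model, reusing the data split from the previous cell:

# Second hidden layer squeezed to 4 units: too narrow to pass along all 46 classes' information
bottleneck = models.Sequential([
    layers.Dense(64, activation='relu', input_shape=(10000,)),
    layers.Dense(4, activation='relu'),
    layers.Dense(46, activation='softmax')
])
bottleneck.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])
bottleneck.fit(partial_x_train, partial_y_train, epochs=20, batch_size=512,
               validation_data=(x_val, y_val))
print(bottleneck.evaluate(x_test, one_hot_test_labels))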

Exercise 3: Regularization#

  • Go back to the IMDB dataset

  • Retrain with only 4 units per layer

  • Plot the results. What do you observe?
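One possible low-capacity version. It re-creates the IMDB inputs first, because Exercise 2 reused the same variable names; the model itself differs from Exercise 1 only in the layer width:

from tensorflow.keras.datasets import imdb
from tensorflow.keras import models, layers

# Re-create the vectorized IMDB data and the same validation split as in Exercise 1
(train_data, train_labels), (test_data, test_labels) = imdb.load_data(num_words=10000)
x_train, x_test = vectorize_sequences(train_data), vectorize_sequences(test_data)
y_train = np.asarray(train_labels).astype('float32')
y_test = np.asarray(test_labels).astype('float32')
x_val, partial_x_train = x_train[:10000], x_train[10000:]
y_val, partial_y_train = y_train[:10000], y_train[10000:]

# Same architecture as before, but with only 4 units per hidden layer
small_model = models.Sequential([
    layers.Dense(4, activation='relu', input_shape=(10000,)),
    layers.Dense(4, activation='relu'),
    layers.Dense(1, activation='sigmoid')
])
small_model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy'])
history = small_model.fit(partial_x_train, partial_y_train, epochs=20, batch_size=512,
                          validation_data=(x_val, y_val))
pd.DataFrame(history.history).plot()
plt.xlabel('Epoch')
plt.show()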

  • Use 16 hidden nodes in the layers again, but now add weight regularization. Use L2 regularization with alpha=0.001. What do you observe?
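A sketch with an L2 weight penalty on both hidden layers (alpha=0.001, as suggested):

from tensorflow.keras import models, layers, regularizers

# 16 units per hidden layer again, now with an L2 penalty on the layer weights
l2_model = models.Sequential([
    layers.Dense(16, activation='relu', input_shape=(10000,),
                 kernel_regularizer=regularizers.l2(0.001)),
    layers.Dense(16, activation='relu', kernel_regularizer=regularizers.l2(0.001)),
    layers.Dense(1, activation='sigmoid')
])
l2_model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy'])

Fit and plot this in the same way as above to compare the validation curves with the unregularized runs.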

  • Add a dropout layer after every dense layer. Use a dropout rate of 0.5. What do you observe?
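One way to add the dropout layers (rate 0.5, applied after each hidden Dense layer):

from tensorflow.keras import models, layers

# Dropout randomly zeroes 50% of the activations during training, which counters overfitting
dropout_model = models.Sequential([
    layers.Dense(16, activation='relu', input_shape=(10000,)),
    layers.Dropout(0.5),
    layers.Dense(16, activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(1, activation='sigmoid')
])
dropout_model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy'])

Again, fit and plot as before to see how the learning curves change.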

Exercise 4: Word embeddings#

  • Instead of one-hot-encoding, use a word embedding of length 300

  • Only add an output layer after the Embedding layer.

  • Train the embedding as well as you can (takes time!)

    • Evaluate as before. Does it perform better?
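A minimal sketch, assuming the IMDB variables re-created in Exercise 3 are still in memory. The reviews are padded to a fixed length (500 is an arbitrary example cut-off) and a Flatten layer is needed between the Embedding and the output layer:

from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras import models, layers

max_len = 500  # example cut-off: pad or truncate every review to the same length
x_train_pad = pad_sequences(train_data, maxlen=max_len)
x_test_pad = pad_sequences(test_data, maxlen=max_len)

# A trainable 300-dimensional embedding per word, flattened and fed straight into the output layer
emb_model = models.Sequential([
    layers.Embedding(10000, 300),
    layers.Flatten(),
    layers.Dense(1, activation='sigmoid')
])
emb_model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy'])
emb_model.fit(x_train_pad, y_train, epochs=10, batch_size=512, validation_split=0.2)
print(emb_model.evaluate(x_test_pad, y_test))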

  • Import a GloVe embedding pretrained on Wikipedia

    • Evaluate as before. Does it perform better?
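A sketch of the pretrained variant. It assumes you have downloaded glove.6B.300d.txt from the GloVe project page (https://nlp.stanford.edu/projects/glove/) into the working directory, and it reuses the padded sequences from the previous sketch; note the 3-position index offset of the Keras IMDB encoding:

from tensorflow.keras import models, layers
from tensorflow.keras.initializers import Constant

# Parse the GloVe file into a word -> 300-dimensional vector dictionary
embeddings_index = {}
with open('glove.6B.300d.txt', encoding='utf-8') as f:
    for line in f:
        values = line.split()
        embeddings_index[values[0]] = np.asarray(values[1:], dtype='float32')

# Row j of the matrix holds the GloVe vector for the word with IMDB index j (offset by 3)
imdb_word_index = imdb.get_word_index()
embedding_matrix = np.zeros((10000, 300))
for word, idx in imdb_word_index.items():
    if idx + 3 < 10000 and word in embeddings_index:
        embedding_matrix[idx + 3] = embeddings_index[word]

# Frozen embedding initialized with the GloVe vectors, then the same output layer as before
glove_model = models.Sequential([
    layers.Embedding(10000, 300, embeddings_initializer=Constant(embedding_matrix),
                     trainable=False),
    layers.Flatten(),
    layers.Dense(1, activation='sigmoid')
])
glove_model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy'])
glove_model.fit(x_train_pad, y_train, epochs=10, batch_size=512, validation_split=0.2)
print(glove_model.evaluate(x_test_pad, y_test))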