Example: IMDB review database
Text contains 88,584 unique words.

Review 0: the this film was just brilliant casting location scenery story direction everyone's really suited the part they played and you could just imagine being there robert redford's is an amazing actor and now the same being director norman's father came from the same scottish island as myself so i loved

Review 5: the begins better than it ends funny that the russian submarine crew outperforms all other actors it's like those scenes where documentary shots br br spoiler part the message dechifered was contrary to the whole story it just does not mesh br br

Review 10: the french horror cinema has seen something of a revival over the last couple of years with great films such as inside and switchblade romance bursting on to the scene maléfique preceded the revival just slightly but stands head and shoulders over most modern horror titles and is surely one
Review 3: the the scots excel at storytelling the traditional sort many years after the event i can still see in my mind's eye an elderly lady my friend's mother retelling the battle of culloden she makes the characters come alive her passion is that of an eye witness one to the events on the sodden heath a mile or so from where she lives br br of course it happened many years before she was born but you wouldn't guess from

Encoded review: [1, 1, 18606, 16082, 30, 2801, 1, 2037, 429, 108, 150, 100, 1, 1491, 10, 67, 128, 64, 8, 58, 15302, 741, 32, 3712, 758, 58, 5763, 449, 9211, 1, 982, 4, 64314, 56, 163, 1, 102, 213, 1236, 38, 1794, 6, 12, 4, 32, 741, 2410, 28, 5, 1, 684, 20, 1, 33926, 7336, 3, 3690, 39, 35, 36, 118, 56, 453, 7, 7, 4, 262, 9, 572, 108, 150, 156, 56, 13, 1444, 18, 22, 583, 479, 36]

One-hot-encoded review: [0. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 0. 1. 0. 1. 1. 1. 0. 1. 1. 1. 0. 1. 1. 1. 1. 1. 0. 1. 1. 1. 1. 1. 1. 1. 1. 0. 1. 0. 1. 1. 1. 1. 1. 1. 0. 1. 1. 0. 0. 0. 1. 1. 1. 1. 0. 0. 1. 1. 0. 1. 0. 0. 1. 1. 0. 0. 0. 1. 0. 0. 0. 0. 1. 0. 1.]
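The one-hot (really multi-hot) vector above is just an indicator encoding over the vocabulary: a fixed-length 0/1 vector with a 1 at every word index that occurs in the review. A minimal numpy sketch, using the first few indices of the encoded review above (the cutoff at the 10,000 most frequent words is an assumption matching the model input below):

```python
import numpy as np

def multi_hot(sequences, dim=10000):
    """Encode each list of word indices as a fixed-length 0/1 vector."""
    out = np.zeros((len(sequences), dim), dtype="float32")
    for i, seq in enumerate(sequences):
        out[i, [w for w in seq if w < dim]] = 1.0  # drop out-of-vocabulary indices
    return out

encoded = [[1, 1, 18606, 16082, 30, 2801]]  # first few indices of the review above
x = multi_hot(encoded)
# x[0, 1] and x[0, 30] are 1.0; x[0, 0] stays 0.0; indices >= 10000 are dropped
```

Note that counts and word order are lost: a word occurring twice (like index 1 here) still yields a single 1.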
CountVectorizer (on 2 documents)

Vocabulary (feature names) after fit: ['actor' 'amazing' 'an' 'and' 'are' 'as' 'bad' 'be' 'being' 'best' 'big' 'boobs' 'brilliant' 'but' 'came' 'casting' 'cheesy' 'could' 'describe' 'direction' 'director' 'ever' 'everyone' 'father' 'film' 'from' 'giant' 'got' 'had' 'hair' 'horror' 'hundreds' 'imagine' 'is' 'island' 'just' 'location' 'love' 'loved' 'made' 'movie' 'movies' 'music' 'myself' 'norman' 'now' 'of' 'on' 'paper' 'part' 'pin' 'played' 'plot' 'really' 'redford' 'ridiculous' 'robert' 'safety' 'same' 'scenery' 'scottish' 'seen' 'so' 'story' 'suited' 'terrible' 'the' 'there' 'these' 'they' 'thin' 'this' 'to' 've' 'was' 'words' 'worst' 'you']

Count encoding doc 1: [1 1 1 2 0 1 0 0 2 0 0 0 1 0 1 1 0 1 0 1 1 0 1 1 1 1 0 0 0 0 0 0 1 1 1 2 1 0 1 0 0 0 0 1 1 1 0 0 0 1 0 1 0 1 1 0 1 0 2 1 1 0 1 1 1 0 4 1 0 1 0 1 0 0 1 0 0 1]
Count encoding doc 2: [0 0 0 3 1 0 1 1 0 1 2 1 0 1 0 0 1 0 1 0 0 1 0 0 0 0 1 1 1 1 1 1 0 1 0 0 0 1 0 1 1 1 1 0 0 0 1 1 1 0 1 0 1 0 0 1 0 1 0 0 0 1 0 0 0 1 4 0 1 0 1 2 2 1 0 1 1 0]
Logistic regression accuracy: 0.8538
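That baseline is a plain logistic regression fit on the bag-of-words vectors. A minimal sketch of the same recipe on synthetic toy data (the random count matrix and labels here are made up, not the IMDB data):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(200, 50)).astype(float)  # toy count matrix
w = rng.normal(size=50)
y = (X @ w > 0).astype(int)                           # linearly separable labels

clf = LogisticRegression(max_iter=1000).fit(X, y)
acc = clf.score(X, y)  # training accuracy; high because the data is separable
```

On the real IMDB count vectors the same call, evaluated on held-out test data, yields the 0.8538 reported above.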
from tensorflow.keras import models, layers

model = models.Sequential()
model.add(layers.Dense(16, activation='relu', input_shape=(10000,)))
model.add(layers.Dropout(0.5))
model.add(layers.Dense(16, activation='relu'))
model.add(layers.Dropout(0.5))
model.add(layers.Dense(1, activation='sigmoid'))
model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy'])
Let's look at a few predictions:
782/782 [==============================] - 2s 2ms/step

Review 0: ? please give this one a miss br br ? ? and the rest of the cast rendered terrible performances the show is flat flat flat br br i don't know how michael madison could have allowed this one on his plate he almost seemed to know this wasn't going to work out and his performance was quite ? so all you madison fans give this a miss
Predicted positiveness: [0.046]

Review 16: ? from 1996 first i watched this movie i feel never reach the end of my satisfaction i feel that i want to watch more and more until now my god i don't believe it was ten years ago and i can believe that i almost remember every word of the dialogues i love this movie and i love this novel absolutely perfection i love willem ? he has a strange voice to spell the words black night and i always say it for many times never being bored i love the music of it's so much made me come into another world deep in my heart anyone can feel what i feel and anyone could make the movie like this i don't believe so thanks thanks
Predicted positiveness: [0.956]
Let's try this:
max_length = 100 # pad documents to a maximum number of words
vocab_size = 10000 # vocabulary size
embedding_length = 20 # embedding length (more would be better)
model = models.Sequential()
model.add(layers.Embedding(vocab_size, embedding_length, input_length=max_length))
model.add(layers.GlobalAveragePooling1D())
model.add(layers.Dense(1, activation='sigmoid'))
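The Embedding + GlobalAveragePooling1D combination simply averages the word vectors of a document into one fixed-size vector, which the sigmoid layer then classifies. A numpy sketch of just the pooling step, with toy numbers:

```python
import numpy as np

# toy "embedded" batch: 1 document, 4 words, 3-dimensional embeddings
emb = np.array([[[1., 0., 2.],
                 [3., 0., 0.],
                 [0., 4., 2.],
                 [0., 0., 0.]]])

pooled = emb.mean(axis=1)  # average over the word (time) axis
# pooled == [[1., 1., 1.]], shape (1, 3): one vector per document
```

Averaging ignores word order entirely, so this model is still essentially a (learned) bag-of-words classifier.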
A 100-dimensional word embedding (e.g. a GloVe vector) looks like this:

array([-0.5 , -0.708, 0.554, 0.673, 0.225, 0.603, -0.262, 0.739, -0.654, -0.216, -0.338, 0.245, -0.515, 0.857, -0.372, -0.588, 0.306, -0.307, -0.219, 0.784, -0.619, -0.549, 0.431, -0.027, 0.976, 0.462, 0.115, -0.998, 1.066, -0.208, 0.532, 0.409, 1.041, 0.249, 0.187, 0.415, -0.954, 0.368, -0.379, -0.68 , -0.146, -0.201, 0.171, -0.557, 0.719, 0.07 , -0.236, 0.495, 1.158, -0.051, 0.257, -0.091, 1.266, 1.105, -0.516, -2.003, -0.648, 0.164, 0.329, 0.048, 0.19 , 0.661, 0.081, 0.336, 0.228, 0.146, -0.51 , 0.638, 0.473, -0.328, 0.084, -0.785, 0.099, 0.039, 0.279, 0.117, 0.579, 0.044, -0.16 , -0.353, -0.049, -0.325, 1.498, 0.581, -1.132, -0.607, -0.375, -1.181, 0.801, -0.5 , -0.166, -0.706, 0.43 , 0.511, -0.803, -0.666, -0.637, -0.36 , 0.133, -0.561], dtype=float32)
Same simple model, but with frozen GloVe embeddings: much worse!
embedding_layer = layers.Embedding(input_dim=10000, output_dim=100,
                                   input_length=max_length, trainable=False)
embedding_layer.set_weights([weights])  # set pre-trained weights

model = models.Sequential([
    embedding_layer,
    layers.GlobalAveragePooling1D(),
    layers.Dense(1, activation='sigmoid')])
We can also stack Conv1D and MaxPooling1D layers on top of the embeddings. Better!

model = models.Sequential([
    embedding_layer,
    layers.Conv1D(32, 7, activation='relu'),
    layers.MaxPooling1D(5),
    layers.Conv1D(32, 7, activation='relu'),
    layers.GlobalAveragePooling1D(),
    layers.Dense(1, activation='sigmoid')])
157/157 [==============================] - 3s 18ms/step - loss: 0.0807 - accuracy: 0.9932 - val_loss: 1.5570 - val_accuracy: 0.8172
${\color{orange} h_t} = \sigma \left( {\color{orange} W } \left[ \begin{array}{c} {\color{blue}x}_t \\ {\color{orange} h}_{t-1} \end{array} \right] + b \right)$
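The recurrence above can be written directly in numpy: W multiplies the concatenation of the current input and the previous hidden state. A toy-dimension sketch (all-zero weights chosen so the output is easy to check; sigmoid here stands in for the σ of the equation):

```python
import numpy as np

def rnn_step(x_t, h_prev, W, b):
    """One recurrent step: h_t = sigmoid(W [x_t; h_{t-1}] + b)."""
    z = W @ np.concatenate([x_t, h_prev]) + b
    return 1.0 / (1.0 + np.exp(-z))  # elementwise sigmoid

x_t = np.zeros(3)     # input at time t (input_size = 3)
h_prev = np.zeros(2)  # previous hidden state (hidden_size = 2)
W = np.zeros((2, 5))  # hidden_size x (input_size + hidden_size)
b = np.zeros(2)

h_t = rnn_step(x_t, h_prev, W, b)
# with all-zero weights and bias: h_t == sigmoid(0) == [0.5, 0.5]
```

Running this step once per word, feeding each h_t back in as h_prev, processes a whole review while keeping word order, unlike the pooling models above.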
Acknowledgement
Several figures came from the excellent VU Deep Learning course.