Lecture 10. Neural Networks for text¶

Turning text into numbers

Joaquin Vanschoren

Overview¶

  • Word embeddings
    • Word2Vec, FastText, GloVe
    • Neural networks on word embeddings
  • Sequence-to-sequence models
    • Self-attention
    • Transformer models

Bag of words representation¶

  • First, build a vocabulary of all occurring words. Maps every word to an index.
  • Represent each document as an $N$-dimensional vector (top-$N$ most frequent words)
    • One-hot (sparse) encoding: 1 if the word occurs in the document
  • Destroys the order of the words in the text (hence, a 'bag' of words)
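A minimal sketch of such a multi-hot document encoding (assuming each document is given as a list of word indices, all smaller than the vocabulary size):

import numpy as np

def multi_hot_encode(documents, vocab_size=10000):
    # documents: list of documents, each a list of word indices < vocab_size
    X = np.zeros((len(documents), vocab_size))
    for row, word_indices in enumerate(documents):
        X[row, word_indices] = 1.  # 1 if the word occurs in the document
    return X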

Example: IMDB review database

  • 50,000 reviews, labeled positive (1) or negative (0)
    • Every row (document) is one review, no other input features
    • Already tokenized. All markup, punctuation,... removed
Text contains 88584 unique words

Review 0: the this film was just brilliant casting location scenery story direction everyone's really suited the part they played and you could just imagine being there robert redford's is an amazing actor and now the same being director norman's father came from the same scottish island as myself so i loved

Review 5: the begins better than it ends funny that the russian submarine crew outperforms all other actors it's like those scenes where documentary shots br br spoiler part the message dechifered was contrary to the whole story it just does not mesh br br

Review 10: the french horror cinema has seen something of a revival over the last couple of years with great films such as inside and switchblade romance bursting on to the scene maléfique preceded the revival just slightly but stands head and shoulders over most modern horror titles and is surely one
Review 3: the the scots excel at storytelling the traditional sort many years after the event i can still see in my mind's eye an elderly lady my friend's mother retelling the battle of culloden she makes the characters come alive her passion is that of an eye witness one to the events on the sodden heath a mile or so from where she lives br br of course it happened many years before she was born but you wouldn't guess from

Encoded review:  [1, 1, 18606, 16082, 30, 2801, 1, 2037, 429, 108, 150, 100, 1, 1491, 10, 67, 128, 64, 8, 58, 15302, 741, 32, 3712, 758, 58, 5763, 449, 9211, 1, 982, 4, 64314, 56, 163, 1, 102, 213, 1236, 38, 1794, 6, 12, 4, 32, 741, 2410, 28, 5, 1, 684, 20, 1, 33926, 7336, 3, 3690, 39, 35, 36, 118, 56, 453, 7, 7, 4, 262, 9, 572, 108, 150, 156, 56, 13, 1444, 18, 22, 583, 479, 36]

One-hot-encoded review:  [0. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 0. 1. 0. 1. 1. 1. 0. 1. 1.
 1. 0. 1. 1. 1. 1. 1. 0. 1. 1. 1. 1. 1. 1. 1. 1. 0. 1. 0. 1. 1. 1. 1. 1.
 1. 0. 1. 1. 0. 0. 0. 1. 1. 1. 1. 0. 0. 1. 1. 0. 1. 0. 0. 1. 1. 0. 0. 0.
 1. 0. 0. 0. 0. 1. 0. 1.]

Word counts¶

  • Count the number of times each word appears in the document
  • Example using sklearn CountVectorizer (on 2 documents, see the sketch below)
  • In practice, we also:
    • Preprocess the text (tokenization, stemming, remove stopwords, ...)
    • Use n-grams ("not terrible", "terrible acting",...), character n-grams ('ter', 'err', 'eri',...)
    • Scale the word-counts (e.g. L2 normalization or TF-IDF)
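The vocabulary and count vectors below come from the lecture's own two review documents; a minimal sketch of the same call on two hypothetical documents:

from sklearn.feature_extraction.text import CountVectorizer

docs = ["this film was just brilliant casting location scenery",  # hypothetical stand-ins for
        "the worst movie ever terrible acting ridiculous plot"]    # the two review documents

vect = CountVectorizer()              # optionally: ngram_range=(1, 2), stop_words='english'
X = vect.fit_transform(docs)          # sparse matrix of word counts, shape (n_docs, n_features)
print(vect.get_feature_names_out())   # the learned vocabulary
print(X.toarray())
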
Vocabulary (feature names) after fit: ['actor' 'amazing' 'an' 'and' 'are' 'as' 'bad' 'be' 'being' 'best' 'big'
 'boobs' 'brilliant' 'but' 'came' 'casting' 'cheesy' 'could' 'describe'
 'direction' 'director' 'ever' 'everyone' 'father' 'film' 'from' 'giant'
 'got' 'had' 'hair' 'horror' 'hundreds' 'imagine' 'is' 'island' 'just'
 'location' 'love' 'loved' 'made' 'movie' 'movies' 'music' 'myself'
 'norman' 'now' 'of' 'on' 'paper' 'part' 'pin' 'played' 'plot' 'really'
 'redford' 'ridiculous' 'robert' 'safety' 'same' 'scenery' 'scottish'
 'seen' 'so' 'story' 'suited' 'terrible' 'the' 'there' 'these' 'they'
 'thin' 'this' 'to' 've' 'was' 'words' 'worst' 'you']
Count encoding doc 1: [1 1 1 2 0 1 0 0 2 0 0 0 1 0 1 1 0 1 0 1 1 0 1 1 1 1 0 0 0 0 0 0 1 1 1 2 1
 0 1 0 0 0 0 1 1 1 0 0 0 1 0 1 0 1 1 0 1 0 2 1 1 0 1 1 1 0 4 1 0 1 0 1 0 0
 1 0 0 1]
Count encoding doc 2: [0 0 0 3 1 0 1 1 0 1 2 1 0 1 0 0 1 0 1 0 0 1 0 0 0 0 1 1 1 1 1 1 0 1 0 0 0
 1 0 1 1 1 1 0 0 0 1 1 1 0 1 0 1 0 0 1 0 1 0 0 0 1 0 0 0 1 4 0 1 0 1 2 2 1
 0 1 1 0]

Classification¶

  • With this tabular representation, we can fit any model, e.g. logistic regression (see the sketch below)
  • Visualize the coefficients: which words are indicative of positive/negative reviews?
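A sketch of what such a pipeline could look like (the variable names text_train, y_train, text_test, y_test are hypothetical; the accuracy below comes from the lecture's own run):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

clf = make_pipeline(TfidfVectorizer(max_features=10000),
                    LogisticRegression(max_iter=1000))
clf.fit(text_train, y_train)   # raw review texts and 0/1 labels
print("Logistic regression accuracy:", clf.score(text_test, y_test))
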
Logistic regression accuracy: 0.8538

Neural networks on bag of words¶

  • We can also build neural networks on bag-of-word vectors
    • E.g. Use one-hot-encoding with 10000 most frequent words
  • Simple model with 2 dense layers, ReLU activation, dropout
    • Binary classification: single output node, sigmoid activation, binary cross-entropy loss
from tensorflow.keras import models, layers

model = models.Sequential()
model.add(layers.Dense(16, activation='relu', input_shape=(10000,)))
model.add(layers.Dropout(0.5))
model.add(layers.Dense(16, activation='relu'))
model.add(layers.Dropout(0.5))
model.add(layers.Dense(1, activation='sigmoid'))
model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy'])

Evaluation¶

  • Take a validation set of 10,000 samples from the training set (see the sketch below)
  • The validation loss reaches its minimum after a few epochs, after which the model starts to overfit
    • Performance is better than logistic regression
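A training sketch, assuming x_train and y_train hold the one-hot-encoded reviews and their labels:

x_val, partial_x_train = x_train[:10000], x_train[10000:]
y_val, partial_y_train = y_train[:10000], y_train[10000:]

history = model.fit(partial_x_train, partial_y_train,
                    epochs=20, batch_size=512,
                    validation_data=(x_val, y_val))  # validation loss/accuracy per epoch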

Predictions¶

Let's look at a few predictions:

782/782 [==============================] - 2s 2ms/step
Review 0:  ? please give this one a miss br br ? ? and the rest of the cast rendered terrible performances the show is flat flat flat br br i don't know how michael madison could have allowed this one on his plate he almost seemed to know this wasn't going to work out and his performance was quite ? so all you madison fans give this a miss
Predicted positiveness:  [0.046]

Review 16:  ? from 1996 first i watched this movie i feel never reach the end of my satisfaction i feel that i want to watch more and more until now my god i don't believe it was ten years ago and i can believe that i almost remember every word of the dialogues i love this movie and i love this novel absolutely perfection i love willem ? he has a strange voice to spell the words black night and i always say it for many times never being bored i love the music of it's so much made me come into another world deep in my heart anyone can feel what i feel and anyone could make the movie like this i don't believe so thanks thanks
Predicted positiveness:  [0.956]

Word Embeddings¶

  • A word embedding is a numeric vector representation of a word
    • Can be manual or learned from an existing representation (e.g. one-hot)

Learning embeddings from scratch¶

  • Input layer uses fixed length documents (with 0-padding). 2D tensor (samples, max_length)
  • Add an embedding layer to learn the embedding
    • Create $n$-dimensional one-hot encoding. Yields a 3D tensor (samples, max_length, $n$)
    • To learn an $m$-dimensional embedding, use $m$ hidden nodes. Weight matrix $W^{n \times m}$
    • Linear activation function: $\mathbf{X}_{embed} = W \mathbf{X}_{orig}$. 3D tensor (samples, max_length, $m$)
  • Combine all word embeddings into a document embedding (e.g. global pooling).
  • Add (optional) layers to map word embeddings to the output. Learn embedding weights from data.

Let's try this:

max_length = 100 # pad documents to a maximum number of words
vocab_size = 10000 # vocabulary size
embedding_length = 20 # embedding length (more would be better)

model = models.Sequential()
model.add(layers.Embedding(vocab_size, embedding_length, input_length=max_length))
model.add(layers.GlobalAveragePooling1D())
model.add(layers.Dense(1, activation='sigmoid'))
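Before training, the integer-encoded reviews are padded (or truncated) to max_length; a sketch, assuming x_train holds the encoded reviews:

from tensorflow.keras.preprocessing.sequence import pad_sequences

x_train_padded = pad_sequences(x_train, maxlen=max_length)  # 2D tensor (samples, max_length)
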
  • Training on the IMDB dataset: slightly worse than using bag-of-words?
    • Embedding of dim 20 is very small, should be closer to 100 (or 300)
    • We don't have enough data to learn a really good embedding from scratch

Pre-trained embeddings¶

  • With more data we can build better embeddings, but we also need more labels
  • Solution: learn the embedding on an auxiliary task that doesn't require labels
    • E.g. given a word, predict the surrounding words.
    • Also called self-supervised learning. Supervision is provided by the data itself
  • Freeze the embedding weights to produce simple word embeddings, or fine-tune them on new tasks
  • Most common approaches:
    • Word2Vec: Learn neural embedding for a word based on surrounding words
    • FastText: learns embedding for character n-grams
      • Can also produce embeddings for new, unseen words
    • GloVe (Global Vectors): Count co-occurrences of words in a matrix
      • Use a low-rank approximation to get a latent vector representation

Word2Vec¶

  • Move a window over text to get $C$ context words ($V$-dim one-hot encoded)
  • Add embedding layer with $N$ linear nodes, global average pooling, and softmax layer(s)
  • CBOW: predict the word given its context, use the weights of the last layer $W'_{N \times V}$ as embedding
  • Skip-Gram: predict the context given a word, use the weights of the first layer $W^{T}_{V \times N}$ as embedding
    • Scales to larger text corpora and learns relationships between words better
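A minimal training sketch using the gensim library (not used elsewhere in these notes); sentences is assumed to be a list of tokenized documents:

from gensim.models import Word2Vec

w2v = Word2Vec(sentences=sentences, vector_size=100, window=5,
               min_count=5, sg=1, workers=4)   # sg=1: Skip-Gram, sg=0: CBOW
vector = w2v.wv['film']                        # 100-dimensional embedding of 'film'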

Word2Vec properties¶

  • Word2Vec happens to learn interesting relationships between words
    • Simple vector arithmetic can map words to plurals, conjugations, gender analogies,...
    • e.g. Gender relationships: $vec_{king} - vec_{man} + vec_{woman} \sim vec_{queen}$
    • PCA applied to embeddings shows Country - Capital relationship
  • Careful: embeddings can capture gender and other biases present in the data.
    • Important unsolved problem!
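The analogy above can be checked with pretrained vectors via gensim's downloader (a sketch; this fetches a large model on first use):

import gensim.downloader as api

wv = api.load('word2vec-google-news-300')   # pretrained Word2Vec vectors (Google News corpus)
print(wv.most_similar(positive=['king', 'woman'], negative=['man'], topn=3))
# 'queen' is typically the top result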

Doc2Vec¶

  • Alternative way to combine word embeddings (instead of global pooling)
  • Adds a paragraph (or document) embedding: learns how paragraphs (or docs) relate to each other
    • Captures document-level semantics: context and meaning of entire document
  • Can be used to determine semantic similarity between documents.

FastText¶

  • Limitations of Word2Vec:
    • Cannot represent new (out-of-vocabulary) words
    • Similar words are learned independently: less efficient (no parameter sharing)
      • E.g. 'meet' and 'meeting'
  • FastText: same model, but uses character n-grams
    • Words are represented by all character n-grams of length 3 to 6
      • "football" 3-grams: <fo, foo, oot, otb, tba, bal, all, ll>
    • Because there are so many n-grams, they are hashed (dimensionality = bin size)
    • Representation of word "football" is sum of its n-gram embeddings
  • Negative sampling: also trains on random negative examples (out-of-context words)
    • Weights are updated so that they are less likely to be predicted
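A minimal sketch with gensim's FastText implementation (sentences is again assumed to be a list of tokenized documents):

from gensim.models import FastText

ft = FastText(sentences=sentences, vector_size=100, window=5,
              min_count=5, min_n=3, max_n=6)   # character n-grams of length 3 to 6
vector = ft.wv['footbal']                      # works even for misspelled or unseen words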

Global Vectors model (GloVe)¶

  • Builds a co-occurrence matrix $\mathbf{X}$: counts how often 2 words occur in the same context
  • Learns a k-dimensional embedding $W$ through matrix factorization with rank k
    • Actually learns 2 embeddings $W$ and $W'$ (differ in random initialization)
  • Minimizes loss $\mathcal{L}$, where $b_i$ and $b'_i$ are bias terms and $f$ is a weighting function
$$\mathcal{L} = \sum_{i,j=1}^{V} f(\mathbf{X}_{ij}) (\mathbf{w}_i^T \mathbf{w'}_j + b_i + b'_j - \log(\mathbf{X}_{ij}))^2$$

Let's try this

  • Download the GloVe embeddings trained on Wikipedia
  • We can now get embeddings for 400,000 English words
  • E.g. 'queen' (in 100-dim):
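The vector below was produced by such a lookup; a sketch of how it could be built, assuming the standard glove.6B.100d.txt file from the Stanford GloVe download:

import numpy as np

embeddings = {}
with open('glove.6B.100d.txt', encoding='utf-8') as f:
    for line in f:
        word, *coefs = line.split()
        embeddings[word] = np.asarray(coefs, dtype='float32')

embeddings['queen']   # 100-dimensional vector, shown below
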
array([-0.5  , -0.708,  0.554,  0.673,  0.225,  0.603, -0.262,  0.739,
       -0.654, -0.216, -0.338,  0.245, -0.515,  0.857, -0.372, -0.588,
        0.306, -0.307, -0.219,  0.784, -0.619, -0.549,  0.431, -0.027,
        0.976,  0.462,  0.115, -0.998,  1.066, -0.208,  0.532,  0.409,
        1.041,  0.249,  0.187,  0.415, -0.954,  0.368, -0.379, -0.68 ,
       -0.146, -0.201,  0.171, -0.557,  0.719,  0.07 , -0.236,  0.495,
        1.158, -0.051,  0.257, -0.091,  1.266,  1.105, -0.516, -2.003,
       -0.648,  0.164,  0.329,  0.048,  0.19 ,  0.661,  0.081,  0.336,
        0.228,  0.146, -0.51 ,  0.638,  0.473, -0.328,  0.084, -0.785,
        0.099,  0.039,  0.279,  0.117,  0.579,  0.044, -0.16 , -0.353,
       -0.049, -0.325,  1.498,  0.581, -1.132, -0.607, -0.375, -1.181,
        0.801, -0.5  , -0.166, -0.706,  0.43 ,  0.511, -0.803, -0.666,
       -0.637, -0.36 ,  0.133, -0.561], dtype=float32)

Same simple model, but with frozen GloVe embeddings: much worse!

embedding_layer = layers.Embedding(input_dim=10000, output_dim=100,
                                   input_length=max_length, trainable=False)
embedding_layer.set_weights([weights])  # set pre-trained weights
model = models.Sequential([
    embedding_layer, layers.GlobalAveragePooling1D(),
    layers.Dense(1, activation='sigmoid')
])

Sequence-to-sequence (seq2seq) models¶

  • Global average pooling or flattening destroys the word order
  • We need to model sequences explicitly, e.g.:
    • 1D convolutional models: run a 1D filter over the input data
      • Fast, but can only look at small part of the sentence
    • Recurrent neural networks (RNNs)
      • Can look back at the entire previous sequence
      • Much slower to train, have limited memory in practice
    • Attention-based networks (Transformers)
      • Best of both worlds: fast and very long memory

seq2seq models¶

  • Produce a series of outputs given a series of inputs over time
  • Can handle sequences of different lengths
    • Label-to-sequence, Sequence-to-label, seq2seq,...
    • Autoregressive models (e.g. predict the next character, unsupervised)

1D convolutional networks¶

  • Similar to 2D convnets, but moves only in 1 direction (time)
    • Extract local 1D patch, apply filter (kernel) to every patch
    • Pattern learned can later be recognized elsewhere (translation invariance)
  • Limited memory: only sees a small part of the sequence (receptive field)
    • You can use multiple layers, dilations,... but becomes expensive
  • Looks at 'future' parts of the series, but can be made to look only at the past
    • Known as 'causal' models (not related to causality)
  • Same embedding, but add 2 Conv1D layers and MaxPooling1D. Better!
model = models.Sequential([
    embedding_layer,
    layers.Conv1D(32, 7, activation='relu'),
    layers.MaxPooling1D(5),
    layers.Conv1D(32, 7, activation='relu'),
    layers.GlobalAveragePooling1D(),
    layers.Dense(1, activation='sigmoid')
])
157/157 [==============================] - 3s 18ms/step - loss: 0.0807 - accuracy: 0.9932 - val_loss: 1.5570 - val_accuracy: 0.8172

Recurrent neural networks (RNNs)¶

  • Adds a recurrent connection that concatenates the previous output to the next input
    • Hidden layer:

${\color{orange} h_t} = \sigma \left( {\color{orange} W } \left[ \begin{array}{c} {\color{blue}x}_t \\ {\color{orange} h}_{t-1} \end{array} \right] + b \right)$

  • Unbounded memory, but training requires backpropagation through time
    • Requires storing previous network states (slow + lots of memory)
    • Vanishing gradients strongly limit practical memory
  • Improved with gating: learn what to input, forget, output (LSTMs, GRUs,...)
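A sketch of an LSTM-based classifier on the same padded inputs (hyperparameters are illustrative):

model = models.Sequential([
    layers.Embedding(vocab_size, embedding_length, input_length=max_length),
    layers.LSTM(32),                     # gated recurrent layer; layers.GRU(32) also works
    layers.Dense(1, activation='sigmoid')
])
model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy'])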

Simple self-attention¶

  • Compute dot product of input vector $x_i$ with every $x_j$ (including itself): ${\color{Orange} w_{ij}}$
  • Compute softmax over all these weights (positive, sum to 1)
  • Multiply by each input vector, and sum everything up
  • Can be easily vectorized: ${\color{green} Y}^T = {\color{orange} W}{\color{blue} X^T}$, ${\color{orange} W} = \textrm{softmax}( {\color{blue} X}^T {\color{blue}X} )$
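A minimal NumPy sketch of this operation, with the input vectors as rows of X (so X and $X^T$ swap roles compared to the column convention above):

import numpy as np

def simple_self_attention(X):
    # X: (sequence_length, embedding_dim), one input vector per row
    W = X @ X.T                                 # w_ij = dot product of x_i and x_j
    W = np.exp(W - W.max(axis=1, keepdims=True))
    W = W / W.sum(axis=1, keepdims=True)        # row-wise softmax: positive, rows sum to 1
    return W @ X                                # each output is a weighted mix of all inputs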

Simple self-attention (2)¶

  • Output is mostly influenced by the current input (${\color{Orange} w_{ii}}$ is largest)
    • Mixes in information from other inputs according to how similar they are
  • Doesn't learn (no parameters), the embedding of ${\color{blue} X}$ defines self-attention
  • ${\color{green} Y}^T = {\color{orange} W}{\color{blue} X^T}$ is a linear operation, so vanishing gradients can only arise through the softmax
  • Has no problem looking very far back in the sequence
  • Operates on sets (permutation invariant): allows img-to-set, set-to-set,... tasks
    • No access to word order. For sequence tasks, we encode the position in the embedding

Simple self-attention layer¶

  • Let's add a simple self-attention layer to our movie sentiment model
  • Without self-attention, every word would contribute independently of the others
    • Exactly as in a bag-of-words model
    • The word terrible will likely result in a negative prediction

Simple self-attention layer (2)¶

  • Self-attention can learn that the meaning of the word terrible is inverted by the presence of the word not, even if it is further away in the sequence.
  • In general, each self-attention layer can learn specific relationships between words (e.g. inversion). We'll need many of them.

Standard self-attention¶

  • Inputs occur in one of 3 positions in the self-attention layer:
    • Value $v$: input vector that provides the output, weighted by:
    • Query $q$: the vector that corresponds to the wanted output
    • Key $k$: the vector that the query is matched against
  • Works as a soft version of a dictionary, in which:
    • Every key matches the query to some extent (measured by their dot product)
    • The output is a weighted mixture of all values (weights normalized by a softmax)

Standard self-attention¶

  • We want to learn how each of these interacts by adding learned transformations
    • $k_i = K x_i + b_k$
    • $q_i = Q x_i + b_q$
    • $v_i = V x_i + b_v$
  • Makes self-attention more flexible, learnable
  • Learn what to pay attention to in the input (e.g. sequence, image,...)

Standard self-attention (2)¶

  • Inputs can have multiple relationships with each other (e.g. negation, strengthening,...)
  • To learn these in parallel, we can split the self-attention in multiple heads
  • Input vector is embedded (linearly) into lower dimensionality, multiple times

Standard self-attention (3)¶

  • The softmax operation can still lead to vanishing gradients (unless values are small)
    • We can scale the dot product by the square root of the input dimension $k$: ${\color{orange}w^{'}_{ij}} = \frac{{\color{blue} x_i}^T \color{blue} x_j}{\sqrt{k}}$
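Putting the last three slides together, a minimal NumPy sketch of single-head scaled dot-product self-attention; the projections K, Q, V and the biases are learned in practice but assumed given here (inputs again as rows of X):

import numpy as np

def self_attention(X, K, Q, V, b_k, b_q, b_v):
    # X: (sequence_length, k) input embeddings; K, Q, V: (k, k) projection matrices
    keys, queries, values = X @ K + b_k, X @ Q + b_q, X @ V + b_v
    W = queries @ keys.T / np.sqrt(X.shape[1])   # scaled dot products w'_ij
    W = np.exp(W - W.max(axis=1, keepdims=True))
    W = W / W.sum(axis=1, keepdims=True)         # row-wise softmax
    return W @ values                            # weighted mixture of the values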

Transformer model¶

  • Repeat self-attention multiple times in controlled fashion
  • Works for sequences, images, graphs,... (learn how sets of objects interact)
  • Models consist of multiple transformer blocks, usually:
    • Layer normalization (every input is normalized independently)
    • Self-attention layer (learn interactions)
    • Residual connections (preserve gradients in deep networks)
    • Feed-forward layer (learn mappings)
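A sketch of one such block with Keras layers, using one common (post-norm) ordering; the exact placement of normalization and residual connections varies between transformer variants:

from tensorflow.keras import layers

def transformer_block(x, num_heads=4, key_dim=32, ff_dim=64):
    # self-attention sub-block with residual connection and layer normalization
    attn = layers.MultiHeadAttention(num_heads=num_heads, key_dim=key_dim)(x, x)
    x = layers.LayerNormalization()(layers.Add()([x, attn]))
    # feed-forward sub-block, again with residual connection and layer normalization
    ff = layers.Dense(ff_dim, activation='relu')(x)
    ff = layers.Dense(x.shape[-1])(ff)
    return layers.LayerNormalization()(layers.Add()([x, ff]))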

Positional encoding¶

  • We need some way to tell the self-attention layer about position in the sequence
  • Represent positions by vectors, using some easy-to-learn, predictable pattern
    • Add these encodings to the word embeddings
    • They give information on how far one input is from the others
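One common choice of such a pattern is the sinusoidal encoding from the original Transformer paper; a sketch:

import numpy as np

def positional_encoding(max_length, dim):
    # sinusoidal encodings: each position gets a unique, smoothly varying vector
    pos = np.arange(max_length)[:, None]
    i = np.arange(dim)[None, :]
    angles = pos / np.power(10000, (2 * (i // 2)) / dim)
    enc = np.zeros((max_length, dim))
    enc[:, 0::2] = np.sin(angles[:, 0::2])   # sine on even dimensions
    enc[:, 1::2] = np.cos(angles[:, 1::2])   # cosine on odd dimensions
    return enc                               # added element-wise to the word embeddings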

Summary¶

  • Bag of words representations
    • Useful, but limited, since they destroy the order of the words in text
  • Word embeddings
    • Learning word embeddings from labeled data is hard, you may need a lot of data
    • Pretrained word embeddings
      • Word2Vec: learns good embeddings and interesting relationships
      • FastText: can also compute embeddings for entirely new words
      • GloVe: also takes the global context of words into account
  • Sequence-to-sequence models
    • 1D convolutional nets (fast, limited memory)
    • RNNs (slow, also quite limited memory)
    • Self-attention (allows very large memory)
  • Transformers

Acknowledgement

Several figures came from the excellent VU Deep Learning course