!pip install keras
Requirement already satisfied: keras in /usr/local/lib/python3.10/dist-packages (3.4.1)
Requirement already satisfied: absl-py in /usr/local/lib/python3.10/dist-packages (from keras) (1.4.0)
...
Keras has been around for a long time, and in the early days I used to assemble networks with it, before PyTorch and TensorFlow became what they are today (we are in September 2024). The ML scene has evolved tremendously; code which used to work no longer does, and if you want to get anywhere you need to revisit all the basics you thought you had in your fingers.
Below is the basic LSTM setup. I record it here as a stepping stone for the next project, so that next time I don’t have to rewire everything and scan the documentation all over again.
The following has to run before anything else; it defines the backend, in our case PyTorch. You can use TensorFlow if you prefer.
The MPS fallback is only needed if you run on Apple Silicon and tells PyTorch to fall back to the CPU when an operation is not implemented on MPS.
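A minimal sketch of that setup cell; both environment variables must be set before keras is imported:

import os

# Select the backend for Keras 3; this has to happen before `import keras`.
os.environ["KERAS_BACKEND"] = "torch"

# Apple Silicon only: let PyTorch fall back to the CPU for ops not implemented on MPS.
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"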
Keras has direct access to IMDB:
import numpy as np
from keras.preprocessing import sequence
from keras.models import Sequential
from keras.layers import Dense, Dropout, Embedding, LSTM
from keras.callbacks import EarlyStopping
from keras.datasets import imdb
# Keep only the 1000 most frequent words; rarer words are replaced by the out-of-vocabulary index (the many 2s in the examples below).
n_words = 1000
(X_train, y_train), (X_test, y_test) = imdb.load_data(num_words=n_words)
print('Train seq: {}'.format(len(X_train)))
print('Test seq: {}'.format(len(X_test)))
print('Train example: {}'.format(X_train[0]))
print('Test example: {}'.format(X_test[0]))
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/imdb.npz
17464789/17464789 ━━━━━━━━━━━━━━━━━━━━ 0s 0us/step
Train seq: 25000
Test seq: 25000
Train example: [1, 14, 22, 16, 43, 530, 973, 2, 2, 65, 458, 2, 66, 2, 4, 173, 36, 256, 5, 25, 100, 43, 838, 112, 50, 670, 2, 9, 35, 480, 284, 5, 150, 4, 172, 112, 167, 2, 336, 385, 39, 4, 172, 2, 2, 17, 546, 38, 13, 447, 4, 192, 50, 16, 6, 147, 2, 19, 14, 22, 4, 2, 2, 469, 4, 22, 71, 87, 12, 16, 43, 530, 38, 76, 15, 13, 2, 4, 22, 17, 515, 17, 12, 16, 626, 18, 2, 5, 62, 386, 12, 8, 316, 8, 106, 5, 4, 2, 2, 16, 480, 66, 2, 33, 4, 130, 12, 16, 38, 619, 5, 25, 124, 51, 36, 135, 48, 25, 2, 33, 6, 22, 12, 215, 28, 77, 52, 5, 14, 407, 16, 82, 2, 8, 4, 107, 117, 2, 15, 256, 4, 2, 7, 2, 5, 723, 36, 71, 43, 530, 476, 26, 400, 317, 46, 7, 4, 2, 2, 13, 104, 88, 4, 381, 15, 297, 98, 32, 2, 56, 26, 141, 6, 194, 2, 18, 4, 226, 22, 21, 134, 476, 26, 480, 5, 144, 30, 2, 18, 51, 36, 28, 224, 92, 25, 104, 4, 226, 65, 16, 38, 2, 88, 12, 16, 283, 5, 16, 2, 113, 103, 32, 15, 16, 2, 19, 178, 32]
Test example: [1, 591, 202, 14, 31, 6, 717, 10, 10, 2, 2, 5, 4, 360, 7, 4, 177, 2, 394, 354, 4, 123, 9, 2, 2, 2, 10, 10, 13, 92, 124, 89, 488, 2, 100, 28, 2, 14, 31, 23, 27, 2, 29, 220, 468, 8, 124, 14, 286, 170, 8, 157, 46, 5, 27, 239, 16, 179, 2, 38, 32, 25, 2, 451, 202, 14, 6, 717]
The vector size is the length of the sequences, but this is unrelated to the range of values within them, i.e. the vocabulary size. It is the latter that the Embedding layer below somewhat confusingly calls the ‘input dimension’ (input_dim, which we pass as n_words).
How the length of the sequences affects the outcome/accuracy is an interesting question in its own right. For now, we pad the vectors to 200:
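The padding itself is a one-liner per split; a minimal sketch using the sequence module imported above (in Keras 3 the same helper is also exposed as keras.utils.pad_sequences):

# Pad (or truncate) every review to exactly 200 tokens.
X_train = sequence.pad_sequences(X_train, maxlen=200)
X_test = sequence.pad_sequences(X_test, maxlen=200)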
Our network with LSTM looks like the following:
model = Sequential([
    Embedding(n_words, 50, name="Embedding"),
    Dropout(0.2, name="Dropout 1"),
    LSTM(100, dropout=0.2, recurrent_dropout=0.2, name="LSTM"),
    Dense(250, activation='relu', name="Dense"),
    Dropout(0.2, name="Dropout 2"),
    Dense(1, activation='sigmoid', name="Sigmoid"),
])
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()
Model: "sequential"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ Layer (type)             ┃ Output Shape         ┃     Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩
│ Embedding (Embedding)    │ ?                    │ 0 (unbuilt) │
├──────────────────────────┼──────────────────────┼─────────────┤
│ Dropout 1 (Dropout)      │ ?                    │ 0 (unbuilt) │
├──────────────────────────┼──────────────────────┼─────────────┤
│ LSTM (LSTM)              │ ?                    │ 0 (unbuilt) │
├──────────────────────────┼──────────────────────┼─────────────┤
│ Dense (Dense)            │ ?                    │ 0 (unbuilt) │
├──────────────────────────┼──────────────────────┼─────────────┤
│ Dropout 2 (Dropout)      │ ?                    │ 0 (unbuilt) │
├──────────────────────────┼──────────────────────┼─────────────┤
│ Sigmoid (Dense)          │ ?                    │ 0 (unbuilt) │
└──────────────────────────┴──────────────────────┴─────────────┘
Total params: 0 (0.00 B)
Trainable params: 0 (0.00 B)
Non-trainable params: 0 (0.00 B)
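The parameter counts read 0 because Keras 3 builds layers lazily: the weights are only created once the model sees data, or when it is built explicitly. A sketch of an explicit build, assuming the padded length of 200 from above:

# Build the model explicitly so summary() can report the parameter counts.
# Shape (None, 200): batches of integer sequences padded to length 200.
model.build(input_shape=(None, 200))
model.summary()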
# Stop training once validation accuracy has not improved for 3 consecutive epochs.
callbacks = [EarlyStopping(monitor='val_accuracy', patience=3, mode="max")]
batch_size = 128
n_epochs = 100
model.fit(X_train, y_train, batch_size=batch_size, epochs=n_epochs, validation_split=0.2, callbacks=callbacks)
print('Accuracy on test set: {}'.format(model.evaluate(X_test, y_test)[1]))
Epoch 1/100
157/157 ━━━━━━━━━━━━━━━━━━━━ 59s 354ms/step - accuracy: 0.6012 - loss: 0.6486 - val_accuracy: 0.7794 - val_loss: 0.4697
Epoch 2/100
157/157 ━━━━━━━━━━━━━━━━━━━━ 49s 311ms/step - accuracy: 0.8050 - loss: 0.4342 - val_accuracy: 0.7854 - val_loss: 0.4532
Epoch 3/100
157/157 ━━━━━━━━━━━━━━━━━━━━ 48s 303ms/step - accuracy: 0.8158 - loss: 0.4184 - val_accuracy: 0.8240 - val_loss: 0.4078
Epoch 4/100
157/157 ━━━━━━━━━━━━━━━━━━━━ 47s 299ms/step - accuracy: 0.8226 - loss: 0.4063 - val_accuracy: 0.8316 - val_loss: 0.3862
Epoch 5/100
157/157 ━━━━━━━━━━━━━━━━━━━━ 47s 302ms/step - accuracy: 0.8309 - loss: 0.3955 - val_accuracy: 0.8132 - val_loss: 0.4175
Epoch 6/100
157/157 ━━━━━━━━━━━━━━━━━━━━ 47s 302ms/step - accuracy: 0.8317 - loss: 0.3896 - val_accuracy: 0.8222 - val_loss: 0.3975
Epoch 7/100
157/157 ━━━━━━━━━━━━━━━━━━━━ 48s 308ms/step - accuracy: 0.8435 - loss: 0.3702 - val_accuracy: 0.8280 - val_loss: 0.3951
782/782 ━━━━━━━━━━━━━━━━━━━━ 122s 156ms/step - accuracy: 0.8290 - loss: 0.3906
Accuracy on test set: 0.8313199877738953
Another fun example of an LSTM is simple addition, an example we also assembled in Wolfram. It is interesting to note that a single LSTM layer (plus a one-unit Dense output) is sufficient: no stack of dense layers, no dropout, nothing else.
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense
# Generate data for addition
def generate_addition_data(num_samples, max_value):
    """Generates data for addition task."""
    X = np.random.randint(0, max_value + 1, size=(num_samples, 2))
    y = np.sum(X, axis=1)
    return X, y
# Generate training and testing data
num_samples = 10000
max_value = 100
X_train, y_train = generate_addition_data(num_samples, max_value)
X_test, y_test = generate_addition_data(num_samples // 10, max_value)
# Reshape data for LSTM
X_train = X_train.reshape(X_train.shape[0], X_train.shape[1], 1)
X_test = X_test.reshape(X_test.shape[0], X_test.shape[1], 1)
# Create LSTM model
model = Sequential()
model.add(LSTM(32, input_shape=(X_train.shape[1], X_train.shape[2])))
model.add(Dense(1))
# Compile the model
model.compile(loss='mean_squared_error', optimizer='adam')
# Train the model
model.fit(X_train, y_train, epochs=100, batch_size=32, validation_split=0.2)
# Evaluate the model
loss = model.evaluate(X_test, y_test)
print('Test Loss:', loss)
# Make predictions
predictions = model.predict(X_test)
print('Predictions:', predictions[:10])
print('Actual values:', y_test[:10])
Epoch 1/100
15/250 ━━━━━━━━━━━━━━━━━━━━ 1s 8ms/step - loss: 11907.7441
250/250 ━━━━━━━━━━━━━━━━━━━━ 2s 9ms/step - loss: 11319.8467 - val_loss: 9217.2393
Epoch 2/100
250/250 ━━━━━━━━━━━━━━━━━━━━ 2s 9ms/step - loss: 8795.1885 - val_loss: 7656.9336
Epoch 3/100
250/250 ━━━━━━━━━━━━━━━━━━━━ 3s 11ms/step - loss: 7350.7256 - val_loss: 6516.9985
...
Predictions: [[ 95.97442 ]
[103.975876]
[ 47.84532 ]
[144.85732 ]
[118.88151 ]
[157.88887 ]
[ 89.905106]
[ 67.921936]
[ 47.886024]
[ 38.99708 ]]
Actual values: [ 96 104 48 145 119 158 90 68 48 39]
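As a quick sanity check of the trained model, a hypothetical single prediction (the pair 12 + 34 is only an illustration):

# Shape (1, 2, 1): one sample, two timesteps, one feature per step.
pair = np.array([12, 34]).reshape(1, 2, 1)
print(model.predict(pair))  # expect a value close to 46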