Keras LSTM setup

LSTM network setup.

Keras has been around for a long time, and in the early days I used to assemble networks with it, before PyTorch and TensorFlow were what they are today (it is September 2024 as I write this). The ML scene has evolved tremendously. Because of this, code which used to work no longer does, and if you want to get anywhere you need to revisit all the basics you thought you had in your fingers.

Below is a basic LSTM setup. I record it here as a stepping stone for the next project, so I don’t have to rewire everything and scan the documentation all over again next time.

!pip install keras
Requirement already satisfied: keras in /usr/local/lib/python3.10/dist-packages (3.4.1)
Requirement already satisfied: absl-py in /usr/local/lib/python3.10/dist-packages (from keras) (1.4.0)
...

The following has to run before anything else to define the backend, in our case PyTorch. You can use TensorFlow if you prefer.

The MPS fallback is only needed if you run on Apple Silicon; it tells PyTorch to fall back to the CPU when an operation is not implemented for MPS.

import os
import numpy as np
import torch

# The backend must be set before keras is imported.
os.environ["KERAS_BACKEND"] = "torch"
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"
import keras
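
To double check that the environment variable was picked up, Keras 3 can report its active backend (keras.config.backend() here; older versions expose this differently):

print(keras.config.backend())  # should print "torch"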

Keras has direct access to the IMDB dataset:

import numpy as np

from keras.preprocessing import sequence
from keras.models import Sequential
from keras.layers import Dense, Dropout, Embedding, LSTM
from keras.callbacks import EarlyStopping

from keras.datasets import imdb

n_words = 1000
(X_train, y_train), (X_test, y_test) = imdb.load_data(num_words=n_words)
print('Train seq: {}'.format(len(X_train)))
print('Test seq: {}'.format(len(X_test)))

print('Train example: {}'.format(X_train[0]))
print('Test example: {}'.format(X_test[0]))
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/imdb.npz
17464789/17464789 ━━━━━━━━━━━━━━━━━━━━ 0s 0us/step
Train seq: 25000
Test seq: 25000
Train example: [1, 14, 22, 16, 43, 530, 973, 2, 2, 65, 458, 2, 66, 2, 4, 173, 36, 256, 5, 25, 100, 43, 838, 112, 50, 670, 2, 9, 35, 480, 284, 5, 150, 4, 172, 112, 167, 2, 336, 385, 39, 4, 172, 2, 2, 17, 546, 38, 13, 447, 4, 192, 50, 16, 6, 147, 2, 19, 14, 22, 4, 2, 2, 469, 4, 22, 71, 87, 12, 16, 43, 530, 38, 76, 15, 13, 2, 4, 22, 17, 515, 17, 12, 16, 626, 18, 2, 5, 62, 386, 12, 8, 316, 8, 106, 5, 4, 2, 2, 16, 480, 66, 2, 33, 4, 130, 12, 16, 38, 619, 5, 25, 124, 51, 36, 135, 48, 25, 2, 33, 6, 22, 12, 215, 28, 77, 52, 5, 14, 407, 16, 82, 2, 8, 4, 107, 117, 2, 15, 256, 4, 2, 7, 2, 5, 723, 36, 71, 43, 530, 476, 26, 400, 317, 46, 7, 4, 2, 2, 13, 104, 88, 4, 381, 15, 297, 98, 32, 2, 56, 26, 141, 6, 194, 2, 18, 4, 226, 22, 21, 134, 476, 26, 480, 5, 144, 30, 2, 18, 51, 36, 28, 224, 92, 25, 104, 4, 226, 65, 16, 38, 2, 88, 12, 16, 283, 5, 16, 2, 113, 103, 32, 15, 16, 2, 19, 178, 32]
Test example: [1, 591, 202, 14, 31, 6, 717, 10, 10, 2, 2, 5, 4, 360, 7, 4, 177, 2, 394, 354, 4, 123, 9, 2, 2, 2, 10, 10, 13, 92, 124, 89, 488, 2, 100, 28, 2, 14, 31, 23, 27, 2, 29, 220, 468, 8, 124, 14, 286, 170, 8, 157, 46, 5, 27, 239, 16, 179, 2, 38, 32, 25, 2, 451, 202, 14, 6, 717]
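
The integers are indices into the IMDB vocabulary. As a small sketch, assuming the conventional offset of 3 for the reserved indices (padding, start, unknown), a review can be decoded back into words:

word_index = imdb.get_word_index()
inv_index = {i + 3: w for w, i in word_index.items()}  # shift past the reserved indices
print(' '.join(inv_index.get(i, '?') for i in X_train[0]))  # '?' marks out-of-vocabulary tokens

With num_words=1000, most rare words come out as '?', which is exactly the cut the loader applied.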
print("Vector size; ",len(X_train[3]))
Vector size;  550

The vector size is the length of the vector, but this is unrelated to the range of values inside it. That range is what, somewhat confusingly, goes into the ‘input dimension’ of the Embedding layer (n_words below).
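
To make the distinction concrete, both quantities can be read off directly from the data (just a sanity check, not part of the pipeline):

print(len(X_train[3]))                   # length of one sequence: 550
print(max(max(seq) for seq in X_train))  # largest token index; stays below n_words because of num_words=1000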

How the length of the tensors affects the outcome/accuracy is an interesting question on its own. For now, we will pad the vectors to 200:

max_len = 200
X_train = sequence.pad_sequences(X_train, maxlen=max_len)
X_test = sequence.pad_sequences(X_test, maxlen=max_len)
print("Vector size after padding: ",len(X_train[3]))
Vector size after padding:  200
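
Note that pad_sequences both pads and truncates, and by default it does so at the front of the sequence (padding='pre', truncating='pre'), so long reviews lose their beginning rather than their end. A tiny illustration:

print(sequence.pad_sequences([[1, 2, 3]], maxlen=5))  # [[0 0 1 2 3]]
print(sequence.pad_sequences([[1, 2, 3]], maxlen=2))  # [[2 3]]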

Our network with LSTM looks like the following:

model = Sequential([
    Embedding(n_words, 50, name="Embedding"),
    Dropout(0.2, name="Dropout 1"),
    LSTM(100, dropout=0.2, recurrent_dropout=0.2, name="LSTM"),
    Dense(250, activation='relu', name="Dense"),
    Dropout(0.2, name="Dropout 2"),
    Dense(1, activation='sigmoid', name="Sigmoid"),
    ])
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()
Model: "sequential"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ Layer (type)                          Output Shape                         Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ Embedding (Embedding)                │ ?                           │     0 (unbuilt) │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ Dropout 1 (Dropout)                  │ ?                           │     0 (unbuilt) │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ LSTM (LSTM)                          │ ?                           │     0 (unbuilt) │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ Dense (Dense)                        │ ?                           │     0 (unbuilt) │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ Dropout 2 (Dropout)                  │ ?                           │     0 (unbuilt) │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ Sigmoid (Dense)                      │ ?                           │     0 (unbuilt) │
└──────────────────────────────────────┴─────────────────────────────┴─────────────────┘
 Total params: 0 (0.00 B)
 Trainable params: 0 (0.00 B)
 Non-trainable params: 0 (0.00 B)
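The parameter counts read “unbuilt” because Keras 3 defers building the layers until the model first sees data. If you want the counts before training, a minimal sketch (assuming max_len from above) is to build the model explicitly and print the summary again:

model.build(input_shape=(None, max_len))  # build the weights for sequences of length max_len
model.summary()                           # now shows the actual parameter counts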
# Stop training once validation accuracy has not improved for 3 epochs.
callbacks = [EarlyStopping(monitor='val_accuracy', patience=3, mode="max")]
batch_size = 128
n_epochs = 100
model.fit(X_train, y_train, batch_size=batch_size, epochs=n_epochs, validation_split=0.2, callbacks=callbacks)

print('Accuracy on test set: {}'.format(model.evaluate(X_test, y_test)[1]))
Epoch 1/100
157/157 ━━━━━━━━━━━━━━━━━━━━ 59s 354ms/step - accuracy: 0.6012 - loss: 0.6486 - val_accuracy: 0.7794 - val_loss: 0.4697
Epoch 2/100
157/157 ━━━━━━━━━━━━━━━━━━━━ 49s 311ms/step - accuracy: 0.8050 - loss: 0.4342 - val_accuracy: 0.7854 - val_loss: 0.4532
Epoch 3/100
157/157 ━━━━━━━━━━━━━━━━━━━━ 48s 303ms/step - accuracy: 0.8158 - loss: 0.4184 - val_accuracy: 0.8240 - val_loss: 0.4078
Epoch 4/100
157/157 ━━━━━━━━━━━━━━━━━━━━ 47s 299ms/step - accuracy: 0.8226 - loss: 0.4063 - val_accuracy: 0.8316 - val_loss: 0.3862
Epoch 5/100
157/157 ━━━━━━━━━━━━━━━━━━━━ 47s 302ms/step - accuracy: 0.8309 - loss: 0.3955 - val_accuracy: 0.8132 - val_loss: 0.4175
Epoch 6/100
157/157 ━━━━━━━━━━━━━━━━━━━━ 47s 302ms/step - accuracy: 0.8317 - loss: 0.3896 - val_accuracy: 0.8222 - val_loss: 0.3975
Epoch 7/100
157/157 ━━━━━━━━━━━━━━━━━━━━ 48s 308ms/step - accuracy: 0.8435 - loss: 0.3702 - val_accuracy: 0.8280 - val_loss: 0.3951
782/782 ━━━━━━━━━━━━━━━━━━━━ 122s 156ms/step - accuracy: 0.8290 - loss: 0.3906
Accuracy on test set: 0.8313199877738953
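
As a quick sanity check after training, the model can score a few test reviews directly (values near 1 mean positive sentiment):

probs = model.predict(X_test[:3])
print(probs.ravel())  # predicted probabilities of a positive review
print(y_test[:3])     # ground-truth labels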

Arithmetic

Another fun LSTM example is simple addition, an example we also assembled in Wolfram. It’s interesting to note that a single LSTM layer feeding a one-unit dense output is sufficient; no extra dense stacks, dropout, or anything else.

import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense

# Generate data for addition
def generate_addition_data(num_samples, max_value):
  """Generates data for addition task."""
  X = np.random.randint(0, max_value + 1, size=(num_samples, 2))
  y = np.sum(X, axis=1)
  return X, y

# Generate training and testing data
num_samples = 10000
max_value = 100
X_train, y_train = generate_addition_data(num_samples, max_value)
X_test, y_test = generate_addition_data(num_samples // 10, max_value)

# Reshape data for LSTM
X_train = X_train.reshape(X_train.shape[0], X_train.shape[1], 1)
X_test = X_test.reshape(X_test.shape[0], X_test.shape[1], 1)

# Create LSTM model
model = Sequential()
model.add(LSTM(32, input_shape=(X_train.shape[1], X_train.shape[2])))
model.add(Dense(1))

# Compile the model
model.compile(loss='mean_squared_error', optimizer='adam')

# Train the model
model.fit(X_train, y_train, epochs=100, batch_size=32, validation_split=0.2)

# Evaluate the model
loss = model.evaluate(X_test, y_test)
print('Test Loss:', loss)

# Make predictions
predictions = model.predict(X_test)
print('Predictions:', predictions[:10])
print('Actual values:', y_test[:10])
Epoch 1/100
 15/250 ━━━━━━━━━━━━━━━━━━━━ 1s 8ms/step - loss: 11907.7441
250/250 ━━━━━━━━━━━━━━━━━━━━ 2s 9ms/step - loss: 11319.8467 - val_loss: 9217.2393
Epoch 2/100
250/250 ━━━━━━━━━━━━━━━━━━━━ 2s 9ms/step - loss: 8795.1885 - val_loss: 7656.9336
Epoch 3/100
250/250 ━━━━━━━━━━━━━━━━━━━━ 3s 11ms/step - loss: 7350.7256 - val_loss: 6516.9985
...
Predictions: [[ 95.97442 ]
 [103.975876]
 [ 47.84532 ]
 [144.85732 ]
 [118.88151 ]
 [157.88887 ]
 [ 89.905106]
 [ 67.921936]
 [ 47.886024]
 [ 38.99708 ]]
Actual values: [ 96 104  48 145 119 158  90  68  48  39]
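
As a small usage example, the trained model can be asked for a sum of your choosing; the pair just has to be reshaped into the same (samples, 2, 1) layout as the training data:

pair = np.array([[17, 25]]).reshape(1, 2, 1)
print(model.predict(pair))  # should land close to 42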