I would like a code review with suggestions for improving my text-generating model. The model is taken from the official TensorFlow site but is trained on a different dataset. I am using the GPU build of TensorFlow 2.0 (beta1) with Keras.
I trained it on a Harry Potter book, but the output was not great, even after a couple of hours of training (by which point the loss had stabilized at around 0.0565). Here is the code:
import os

import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
# All layers and callbacks below come from tf.keras.
# Read the whole book into a single string
with open("/home/awesome_ruler/Documents/Atom projects/HarryPotter/hp.txt") as f:
    text = f.read()
vocab = sorted(set(text))
# Creating a mapping from unique characters to indices
char2idx = {u:i for i, u in enumerate(vocab)}
idx2char = np.array(vocab)
text_as_int = np.array([char2idx[c] for c in text])
# The maximum length sentence we want for a single input in characters
seq_length = 200
examples_per_epoch = len(text)//(seq_length+1)
# Create training examples / targets
char_dataset = tf.data.Dataset.from_tensor_slices(text_as_int)
sequences = char_dataset.batch(seq_length+1, drop_remainder=True)
def split_input_target(chunk):
    input_text = chunk[:-1]
    target_text = chunk[1:]
    return input_text, target_text
dataset = sequences.map(split_input_target)
# Batch size
BATCH_SIZE = 64  # Default is 64
# Buffer size to shuffle the dataset
# (TF data is designed to work with possibly infinite sequences,
# so it doesn't attempt to shuffle the entire sequence in memory. Instead,
# it maintains a buffer in which it shuffles elements).
BUFFER_SIZE = 10000
dataset = dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True)
# Length of the vocabulary in chars
vocab_size = len(vocab)
# The embedding dimension
embedding_dim = 256
# Number of RNN units
rnn_units = 960  # kept a multiple of 32
def build_model(vocab_size, embedding_dim, rnn_units, batch_size):
    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(vocab_size, embedding_dim,
                                  batch_input_shape=[batch_size, None]),
        tf.keras.layers.LSTM(rnn_units,
                             return_sequences=True,
                             stateful=True,
                             recurrent_initializer='glorot_uniform'),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Dense(vocab_size)
    ])
    return model
model = build_model(
    vocab_size=len(vocab),
    embedding_dim=embedding_dim,
    rnn_units=rnn_units,
    batch_size=BATCH_SIZE)
def loss(labels, logits):
    return tf.keras.losses.sparse_categorical_crossentropy(labels, logits, from_logits=True)
model.compile(optimizer='Adam', loss=loss)
# Directory where the checkpoints will be saved
checkpoint_dir = '/home/awesome_ruler/Documents/Atom projects/HarryPotter/CheckPoints'
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt_{epoch}")
checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_prefix,
    save_weights_only=True)
EPOCHS = 350
history = model.fit(dataset, epochs=EPOCHS, callbacks=[checkpoint_callback])  # Comment out this line to only evaluate a saved checkpoint
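# Path of the checkpoint to restore. Note: ckpt_380 must already exist;
# with EPOCHS = 350 above, this script alone only writes up to ckpt_350.
checkpoint_path = 'CheckPoints/ckpt_380'
# Rebuild the model with batch size 1 for generation and load the trained weights
model = build_model(vocab_size, embedding_dim, rnn_units, batch_size=1)
model.load_weights(checkpoint_path)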
model.summary()
def generate_text(model, start_string):
    # Evaluation step (generating text using the learned model)
    # Number of characters to generate
    num_generate = 1000
    # Converting our start string to numbers (vectorizing)
    input_eval = [char2idx[s] for s in start_string]
    input_eval = tf.expand_dims(input_eval, 0)
    # Empty list to store our results
    text_generated = []
    # Low temperatures result in more predictable text.
    # Higher temperatures result in more surprising text.
    # Experiment to find the best setting.
    temperature = 0.2
    # Here batch size == 1
    model.reset_states()
    for i in range(num_generate):
        predictions = model(input_eval)
        # remove the batch dimension
        predictions = tf.squeeze(predictions, 0)
        # scale the logits by the temperature, then sample the next character
        # from the resulting categorical distribution
        predictions = predictions / temperature
        predicted_id = tf.random.categorical(predictions, num_samples=1)[-1, 0].numpy()
        # We pass the predicted character as the next input to the model
        # along with the previous hidden state
        input_eval = tf.expand_dims([predicted_id], 0)
        text_generated.append(idx2char[predicted_id])
    return start_string + ''.join(text_generated)
print(generate_text(model, start_string=u"homework"))
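To make the effect of the temperature setting concrete, here is a small standalone sketch of what dividing the logits by the temperature does to the sampling distribution. The helper `softmax_with_temperature` and the example logits are hypothetical and not part of the model code; the sketch only assumes that sampling from logits is equivalent to sampling from their softmax.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw logits into probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for three candidate characters.
example_logits = [2.0, 1.0, 0.5]

for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(example_logits, t)
    print(t, [round(p, 3) for p in probs])
```

At a temperature of 0.2 almost all of the probability mass lands on the highest-scoring character, so generation is close to greedy decoding; higher values flatten the distribution and make rarer characters more likely.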
Since the original dataset had many more characters, I reduced the number of RNN units, keeping it a multiple of 32. I also replaced the GRU layer with an LSTM, since in theory it retains information over longer spans. Can anyone suggest any other improvements? I would be really glad to hear them.
Here is some sample output to give an idea of how bad the model is (I think the main reason is that the dataset is very small). Is there still something that can be done?
homework than Privet Drive, which was
difficult as there wasn’t much longerulble – he went into the
living-room window. Harry was streaming it all right – now, where at last
on to a confused – and we couldn’t remember what I said what’s best is tites,
but the reflected here, sixteen minutes later, Dumbledore with the Quidditch Cup, what’s already.’
He was engreed. It was a very direction boats as passing the wall, because of it and
the fire, flanding in the middle of a ploughed field, halfway
across a suspension bridge and at the top of a munched of ‘Hagrid’s arms ... this is what is it?’ Harry whispered.
‘It’s the black eyes on his cupboard, there’s something you’ve
got your owl broomstick, Potter’s obviously spet the only one who’ve got yeh anything ...
‘Master,’ Harry tried. And there, she were all get still staring at the cart. He
was still started looking from the roof of his neck, and fell one
of them at night and getting pellets were died. He managed to bane
when we arrived.
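Because the training loss alone cannot show whether the model is simply memorising a small dataset, holding out part of the text for validation is one way to check. Below is a minimal sketch in plain Python, assuming a hypothetical helper `split_chunks` and a 10% hold-out fraction (neither appears in the code above).

```python
def split_chunks(encoded_text, seq_length, val_fraction=0.1):
    """Cut the encoded text into (seq_length + 1)-long chunks and hold out the tail."""
    chunk_len = seq_length + 1
    n_chunks = len(encoded_text) // chunk_len
    chunks = [encoded_text[i * chunk_len:(i + 1) * chunk_len]
              for i in range(n_chunks)]
    # Always keep at least one chunk for validation.
    n_val = max(1, int(n_chunks * val_fraction))
    return chunks[:-n_val], chunks[-n_val:]


# Stand-in for text_as_int: 1005 integer "characters".
fake_encoded = list(range(1005))
train_chunks, val_chunks = split_chunks(fake_encoded, seq_length=200)
print(len(train_chunks), len(val_chunks))
```

In the `tf.data` pipeline the analogous step would presumably be `take` and `skip` on `sequences` before shuffling, with the held-out part passed to `model.fit` through `validation_data`. A validation loss that rises while the training loss keeps falling would point to overfitting.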