ML text-generating code with Python and TensorFlow

Question

I would like a simple code review with suggestions for improving my text-generating model. The model is taken from the official TensorFlow site, but I am training it on a different dataset. I am using the GPU version of TensorFlow 2.0 (beta1) with Keras.

I trained it on a Harry Potter book, but the output was not very good despite a couple of hours of training (the loss stabilized at around 0.0565). Here is the code:

import os

import numpy as np
import tensorflow as tf

# Read the whole book; the with-block closes the file afterwards
with open("/home/awesome_ruler/Documents/Atom projects/HarryPotter/hp.txt") as f:
    text = f.read()
vocab = sorted(set(text))

# Creating a mapping from unique characters to indices
char2idx = {u:i for i, u in enumerate(vocab)}
idx2char = np.array(vocab)

text_as_int = np.array([char2idx[c] for c in text])

# The maximum length sentence we want for a single input in characters
seq_length = 200
examples_per_epoch = len(text)//(seq_length+1)

# Create training examples / targets
char_dataset = tf.data.Dataset.from_tensor_slices(text_as_int)

sequences = char_dataset.batch(seq_length+1, drop_remainder=True)

def split_input_target(chunk):
    input_text = chunk[:-1]
    target_text = chunk[1:]
    return input_text, target_text

dataset = sequences.map(split_input_target)

# Batch size
BATCH_SIZE = 64 # Default is 64

# Buffer size to shuffle the dataset
# (TF data is designed to work with possibly infinite sequences,
# so it doesn't attempt to shuffle the entire sequence in memory. Instead,
# it maintains a buffer in which it shuffles elements).
BUFFER_SIZE = 10000

dataset = dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True)

# Length of the vocabulary in chars
vocab_size = len(vocab)

# The embedding dimension
embedding_dim = 256

# Number of RNN units
rnn_units = 960 # a multiple of 32

def build_model(vocab_size, embedding_dim, rnn_units, batch_size):
  model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, embedding_dim,
                              batch_input_shape=[batch_size, None]),
    tf.keras.layers.LSTM(rnn_units,
                        return_sequences=True,
                        stateful=True,
                        recurrent_initializer='glorot_uniform'),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(vocab_size)
  ])
  return model

model = build_model(
  vocab_size=vocab_size,
  embedding_dim=embedding_dim,
  rnn_units=rnn_units,
  batch_size=BATCH_SIZE)

def loss(labels, logits):
  return tf.keras.losses.sparse_categorical_crossentropy(labels, logits, from_logits=True)

model.compile(optimizer='Adam', loss=loss)

# Directory where the checkpoints will be saved
checkpoint_dir = '/home/awesome_ruler/Documents/Atom projects/HarryPotter/CheckPoints'

checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt_{epoch}")

checkpoint_callback=tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_prefix,
    save_weights_only=True)


EPOCHS = 350
history = model.fit(dataset, epochs=EPOCHS, callbacks=[checkpoint_callback]) # Comment out this line to only evaluate the model

checkpoint_dir = 'CheckPoints/ckpt_380'

model = build_model(vocab_size, embedding_dim, rnn_units, batch_size=1)

model.load_weights(checkpoint_dir)

model.summary()

def generate_text(model, start_string):
  # Evaluation step (generating text using the learned model)

  # Number of characters to generate
  num_generate = 1000

  # Converting our start string to numbers (vectorizing)
  input_eval = [char2idx[s] for s in start_string]
  input_eval = tf.expand_dims(input_eval, 0)

  # Empty string to store our results
  text_generated = []

  # Low temperatures result in more predictable text.
  # Higher temperatures result in more surprising text.
  # Experiment to find the best setting.
  temperature = 0.2

  # Here batch size == 1
  model.reset_states()
  for i in range(num_generate):
      predictions = model(input_eval)
      # remove the batch dimension
      predictions = tf.squeeze(predictions, 0)

      # using a categorical distribution to predict the character returned by the model
      predictions = predictions / temperature
      predicted_id = tf.random.categorical(predictions, num_samples=1)[-1,0].numpy()

      # We pass the predicted character as the next input to the model
      # along with the previous hidden state
      input_eval = tf.expand_dims([predicted_id], 0)

      text_generated.append(idx2char[predicted_id])

  return (start_string + ''.join(text_generated))

print(generate_text(model, start_string=u"homework"))
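
To make the role of `temperature` in `generate_text` concrete, here is a minimal sketch in plain Python (no TensorFlow) of what dividing the logits by the temperature does to the sampling distribution. The function name `softmax_with_temperature` and the three logit values are hypothetical and only serve the illustration; `tf.random.categorical` applies the equivalent normalization internally.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw logits into probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for three candidate characters
logits = [2.0, 1.0, 0.5]

for t in (1.0, 0.2):
    probs = softmax_with_temperature(logits, t)
    print("temperature", t, [round(p, 3) for p in probs])
```

With a temperature of 0.2, almost all of the probability mass ends up on the highest-scoring character, so sampling becomes close to greedy decoding. Values nearer 1.0 keep more of the model's uncertainty.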

Since the original dataset had many more characters, I reduced the number of RNN units, keeping it a multiple of 32. I also replaced the GRU layer with an LSTM, since it (theoretically) retains information over longer sequences. Can anyone suggest any other improvements? I would be really glad to hear them.
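
As a rough way to compare the two layer types, the sketch below counts the trainable weights of a single recurrent layer using the standard per-gate formulas. The helper names `lstm_params` and `gru_params` are hypothetical, and the GRU formula assumes the Keras `reset_after=True` layout (two bias vectors per gate), which I believe is the TF 2.x default.

```python
def lstm_params(input_dim, units):
    # 4 gates, each with an input kernel, a recurrent kernel, and one bias vector
    return 4 * (units * (input_dim + units) + units)

def gru_params(input_dim, units):
    # 3 gates; assumes the reset_after=True layout with two bias vectors per gate
    return 3 * (units * (input_dim + units) + 2 * units)

# Values taken from the code above
embedding_dim = 256
rnn_units = 960

print("LSTM weights:", lstm_params(embedding_dim, rnn_units))
print("GRU weights: ", gru_params(embedding_dim, rnn_units))
```

At the same width an LSTM carries about a third more weights than a GRU, so swapping the layer type while only slightly reducing the unit count can increase the model's capacity, not lower it.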

Here is some sample output to give you an idea of how bad the model is (I think the main reason is that the dataset is very small). Is there still something that can be done?
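
To put "very small" into numbers, this sketch repeats the chunking arithmetic from the code above: non-overlapping chunks of `seq_length + 1` characters, then batches of `BATCH_SIZE` with the remainder dropped. The function name `dataset_stats` and the character count of 450,000 are hypothetical placeholders, not the real size of `hp.txt`.

```python
def dataset_stats(num_chars, seq_length=200, batch_size=64):
    """Return (training sequences, batches per epoch) for non-overlapping chunks."""
    # Mirrors char_dataset.batch(seq_length + 1, drop_remainder=True)
    sequences = num_chars // (seq_length + 1)
    # Mirrors dataset.batch(BATCH_SIZE, drop_remainder=True)
    batches = sequences // batch_size
    return sequences, batches

num_chars = 450_000  # hypothetical length of the text in characters
sequences, batches = dataset_stats(num_chars)
print("sequences:", sequences, "batches per epoch:", batches)
```

Substituting `len(text)` for the placeholder gives the real figures. If they come out at a few thousand sequences, that is little data for a layer with several million weights, which would be consistent with a very low training loss alongside weak generated text.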

homework than Privet Drive, which was
difficult as there wasn’t much longerulble – he went into the
living-room window. Harry was streaming it all right – now, where at last
on to a confused – and we couldn’t remember what I said what’s best is tites,
but the reflected here, sixteen minutes later, Dumbledore with the Quidditch Cup, what’s already.’
He was engreed. It was a very direction boats as passing the wall, because of it and
the fire, flanding in the middle of a ploughed field, halfway
across a suspension bridge and at the top of a munched of ‘Hagrid’s arms ... this is what is it?’ Harry whispered.
‘It’s the black eyes on his cupboard, there’s something you’ve
got your owl broomstick, Potter’s obviously spet the only one who’ve got yeh anything ...
‘Master,’ Harry tried. And there, she were all get still staring at the cart. He
was still started looking from the roof of his neck, and fell one
of them at night and getting pellets were died. He managed to bane
when we arrived.

So, you want better output because the current output is unsatisfactory, and you borrowed most of the code?
– Mast ♦, Commented Mar 31, 2020 at 16:58
@Mast Basically, I want to get a feel for transfer learning: adapting existing models to a new dataset. Since it is my first time, I want to know which parts of the code pros usually target when adapting a model to a different dataset. This would help me a lot in my other projects. Advice about the number of layers and RNN units would also be very helpful and appreciated.
– neel g, Commented Mar 31, 2020 at 17:06
