RNN生成新序列

RNN实例

在这篇文章中我们将使用 Anna Karenina 来构建rnn,然后生成新的文本,文章来源于udacity。 参考文章:

jpeg

import time
from collections import namedtuple

import numpy as np
import tensorflow as tf

首先我们将文本转换为整数值。

with open('anna.txt', 'r') as f:
    text=f.read()
vocab = sorted(set(text))
vocab_to_int = {c: i for i, c in enumerate(vocab)}
int_to_vocab = dict(enumerate(vocab))
encoded = np.array([vocab_to_int[c] for c in text], dtype=np.int32)
encoded[:100]
array([68, 72,  2,  0, 15, 74, 34, 20, 36,  3,  3,  3, 33,  2,  0,  0, 21,
       20,  1,  2, 79, 13, 49, 13, 74, 75, 20,  2, 34, 74, 20,  2, 49, 49,
       20,  2, 49, 13,  4, 74, 37, 20, 74, 42, 74, 34, 21, 20,  5, 80, 72,
        2,  0,  0, 21, 20,  1,  2, 79, 13, 49, 21, 20, 13, 75, 20,  5, 80,
       72,  2,  0,  0, 21, 20, 13, 80, 20, 13, 15, 75, 20, 45, 29, 80,  3,
       29,  2, 21, 35,  3,  3, 26, 42, 74, 34, 21, 15, 72, 13, 80], dtype=int32)
len(vocab)
83

构造训练的batch

下图是一个mini-batch: png


接下来,我们创建一个函数实现上述功能。 我们想要的结果如下所示:

y[:, :-1], y[:, -1] = x[:, 1:], x[:, 0]

x是输入input,y是target。

def get_batches(arr, n_seqs, n_steps):
    '''Create a generator that returns batches of size
       n_seqs x n_steps from arr.
       
       Arguments
       ---------
       arr: Array you want to make batches from
       n_seqs: Batch size, the number of sequences per batch
       n_steps: Number of sequence steps per batch
    '''
    # Get the number of characters per batch and number of batches we can make
    characters_per_batch = n_seqs * n_steps
    n_batches = len(arr)//characters_per_batch
    
    # Keep only enough characters to make full batches
    arr = arr[:n_batches * characters_per_batch]
    
    # Reshape into n_seqs rows
    arr = arr.reshape((n_seqs, -1))
    
    for n in range(0, arr.shape[1], n_steps):
        # The features
        x = arr[:, n:n+n_steps]
        # The targets, shifted by one
        y = np.zeros_like(x)
        y[:, :-1], y[:, -1] = x[:, 1:], x[:, 0]
        yield x, y

检验一下实现的函数是否正确。

batches = get_batches(encoded, 10, 50)
x, y = next(batches)
print('x\n', x[:10, :10])
print('\ny\n', y[:10, :10])
x
 [[68 72  2  0 15 74 34 20 36  3]
 [20  2 79 20 80 45 15 20 11 45]
 [42 13 80 35  3  3 38 70 74 75]
 [80 20 78  5 34 13 80 11 20 72]
 [20 13 15 20 13 75 81 20 75 13]
 [20 32 15 20 29  2 75  3 45 80]
 [72 74 80 20 64 45 79 74 20  1]
 [37 20  6  5 15 20 80 45 29 20]
 [15 20 13 75 80 65 15 35 20 60]
 [20 75  2 13 78 20 15 45 20 72]]

y
 [[72  2  0 15 74 34 20 36  3  3]
 [ 2 79 20 80 45 15 20 11 45 13]
 [13 80 35  3  3 38 70 74 75 81]
 [20 78  5 34 13 80 11 20 72 13]
 [13 15 20 13 75 81 20 75 13 34]
 [32 15 20 29  2 75  3 45 80 49]
 [74 80 20 64 45 79 74 20  1 45]
 [20  6  5 15 20 80 45 29 20 75]
 [20 13 75 80 65 15 35 20 60 72]
 [75  2 13 78 20 15 45 20 72 74]]

构建模型

下面是我们将要构建的一个模型。

png

输入

首先,我们需要创建输入。

def build_inputs(batch_size, num_steps):
    ''' Define placeholders for inputs, targets, and dropout 
    
        Arguments
        ---------
        batch_size: Batch size, number of sequences per batch
        num_steps: Number of sequence steps in a batch
        
    '''
    # Declare placeholders we'll feed into the graph
    inputs = tf.placeholder(tf.int32, [batch_size, num_steps], name='inputs')
    targets = tf.placeholder(tf.int32, [batch_size, num_steps], name='targets')
    
    # Keep probability placeholder for drop out layers
    keep_prob = tf.placeholder(tf.float32, name='keep_prob')
    
    return inputs, targets, keep_prob

LSTM Cell

我们在隐藏层会使用 LSTM cell

lstm = tf.contrib.rnn.BasicLSTMCell(num_units)

num_units是隐藏层中LSTM cell的个数,当然,在这里我们同样可以使用dropout:

tf.contrib.rnn.DropoutWrapper(lstm, output_keep_prob=keep_prob)

最后,我们需要将这些LSTM cell堆叠起来,我们可以使用 tf.contrib.rnn.MultiRNNCell

def build_cell(num_units, keep_prob):
    lstm = tf.contrib.rnn.BasicLSTMCell(num_units)
    drop = tf.contrib.rnn.DropoutWrapper(lstm, output_keep_prob=keep_prob)
    
    return drop
    
tf.contrib.rnn.MultiRNNCell([build_cell(num_units, keep_prob) for _ in range(num_layers)])

我们还需要创建一个初始状态:

initial_state = cell.zero_state(batch_size, tf.float32)
def build_lstm(lstm_size, num_layers, batch_size, keep_prob):
    ''' Build LSTM cell.
    
        Arguments
        ---------
        keep_prob: Scalar tensor (tf.placeholder) for the dropout keep probability
        lstm_size: Size of the hidden layers in the LSTM cells
        num_layers: Number of LSTM layers
        batch_size: Batch size

    '''
    ### Build the LSTM Cell
    
    def build_cell(lstm_size, keep_prob):
        # Use a basic LSTM cell
        lstm = tf.contrib.rnn.BasicLSTMCell(lstm_size)
        
        # Add dropout to the cell
        drop = tf.contrib.rnn.DropoutWrapper(lstm, output_keep_prob=keep_prob)
        return drop
    
    
    # Stack up multiple LSTM layers, for deep learning
    cell = tf.contrib.rnn.MultiRNNCell([build_cell(lstm_size, keep_prob) for _ in range(num_layers)])
    initial_state = cell.zero_state(batch_size, tf.float32)
    
    return cell, initial_state

输出

RNN的输出同样需要全连接层,我们使用了softmax

def build_output(lstm_output, in_size, out_size):
    ''' Build a softmax layer, return the softmax output and logits.
    
        Arguments
        ---------
        
        x: Input tensor
        in_size: Size of the input tensor, for example, size of the LSTM cells
        out_size: Size of this softmax layer
    
    '''

    # Reshape output so it's a bunch of rows, one row for each step for each sequence.
    # That is, the shape should be batch_size*num_steps rows by lstm_size columns
    seq_output = tf.concat(lstm_output, axis=1)
    x = tf.reshape(seq_output, [-1, in_size])
    
    # Connect the RNN outputs to a softmax layer
    with tf.variable_scope('softmax'):
        softmax_w = tf.Variable(tf.truncated_normal((in_size, out_size), stddev=0.1))
        softmax_b = tf.Variable(tf.zeros(out_size))
    
    # Since output is a bunch of rows of RNN cell outputs, logits will be a bunch
    # of rows of logit outputs, one for each step and sequence
    logits = tf.matmul(x, softmax_w) + softmax_b
    
    # Use softmax to get the probabilities for predicted characters
    out = tf.nn.softmax(logits, name='predictions')
    
    return out, logits

训练损失

损失我们仍然可以使用交叉熵。

def build_loss(logits, targets, lstm_size, num_classes):
    ''' Calculate the loss from the logits and the targets.
    
        Arguments
        ---------
        logits: Logits from final fully connected layer
        targets: Targets for supervised learning
        lstm_size: Number of LSTM hidden units
        num_classes: Number of classes in targets
        
    '''
    
    # One-hot encode targets and reshape to match logits, one row per batch_size per step
    y_one_hot = tf.one_hot(targets, num_classes)
    y_reshaped = tf.reshape(y_one_hot, logits.get_shape())
    
    # Softmax cross entropy loss
    loss = tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y_reshaped)
    loss = tf.reduce_mean(loss)
    return loss

优化器

一般的RNN都有梯度消失或者梯度爆炸的问题,LSTM可以修复梯度消失的问题;梯度爆炸,我们可以通过剪枝来实现。

def build_optimizer(loss, learning_rate, grad_clip):
    ''' Build optmizer for training, using gradient clipping.
    
        Arguments:
        loss: Network loss
        learning_rate: Learning rate for optimizer
    
    '''
    
    # Optimizer for training, using gradient clipping to control exploding gradients
    tvars = tf.trainable_variables()
    grads, _ = tf.clip_by_global_norm(tf.gradients(loss, tvars), grad_clip)
    train_op = tf.train.AdamOptimizer(learning_rate)
    optimizer = train_op.apply_gradients(zip(grads, tvars))
    
    return optimizer

构建网络

现在,我们需要把上面的各个部分组合起来,构成一个网络。

class CharRNN:
    
    def __init__(self, num_classes, batch_size=64, num_steps=50, 
                       lstm_size=128, num_layers=2, learning_rate=0.001, 
                       grad_clip=5, sampling=False):
    
        # When we're using this network for sampling later, we'll be passing in
        # one character at a time, so providing an option for that
        if sampling == True:
            batch_size, num_steps = 1, 1
        else:
            batch_size, num_steps = batch_size, num_steps

        tf.reset_default_graph()
        
        # Build the input placeholder tensors
        self.inputs, self.targets, self.keep_prob = build_inputs(batch_size, num_steps)

        # Build the LSTM cell
        cell, self.initial_state = build_lstm(lstm_size, num_layers, batch_size, self.keep_prob)

        ### Run the data through the RNN layers
        # First, one-hot encode the input tokens
        x_one_hot = tf.one_hot(self.inputs, num_classes)
        
        # Run each sequence step through the RNN and collect the outputs
        outputs, state = tf.nn.dynamic_rnn(cell, x_one_hot, initial_state=self.initial_state)
        self.final_state = state
        
        # Get softmax predictions and logits
        self.prediction, self.logits = build_output(outputs, lstm_size, num_classes)
        
        # Loss and optimizer (with gradient clipping)
        self.loss = build_loss(self.logits, self.targets, lstm_size, num_classes)
        self.optimizer = build_optimizer(self.loss, learning_rate, grad_clip)

超参数

下面是网络会使用到的一些超参数:

  • batch_size - 传入网络的batch个数
  • num_steps - 每个batch中序列的长度
  • lstm_size - 每个隐藏层中LSTM单元的个数
  • num_layers - 隐藏层LSTM的层数
  • learning_rate - 学习率
  • keep_prob - 训练期间,神经网络单元被保留的概率,如果过拟合了,尝试减小这个值

你可以在这儿了解一些小技巧 where it originally came from.

Tips and Tricks

Monitoring Validation Loss vs. Training Loss

If you’re somewhat new to Machine Learning or Neural Networks it can take a bit of expertise to get good models. The most important quantity to keep track of is the difference between your training loss (printed during training) and the validation loss (printed once in a while when the RNN is run on the validation data (by default every 1000 iterations)). In particular:

  • If your training loss is much lower than validation loss then this means the network might be overfitting. Solutions to this are to decrease your network size, or to increase dropout. For example you could try dropout of 0.5 and so on.
  • If your training/validation loss are about equal then your model is underfitting. Increase the size of your model (either number of layers or the raw number of neurons per layer)

Approximate number of parameters

The two most important parameters that control the model are lstm_size and num_layers. I would advise that you always use num_layers of either 2/3. The lstm_size can be adjusted based on how much data you have. The two important quantities to keep track of here are:

  • The number of parameters in your model. This is printed when you start training.
  • The size of your dataset. 1MB file is approximately 1 million characters.

These two should be about the same order of magnitude. It’s a little tricky to tell. Here are some examples:

  • I have a 100MB dataset and I’m using the default parameter settings (which currently print 150K parameters). My data size is significantly larger (100 mil » 0.15 mil), so I expect to heavily underfit. I am thinking I can comfortably afford to make lstm_size larger.
  • I have a 10MB dataset and running a 10 million parameter model. I’m slightly nervous and I’m carefully monitoring my validation loss. If it’s larger than my training loss then I may want to try to increase dropout a bit and see if that helps the validation loss.

Best models strategy

The winning strategy to obtaining very good models (if you have the compute time) is to always err on making the network larger (as large as you’re willing to wait for it to compute) and then try different dropout values (between 0,1). Whatever model has the best validation performance (the loss, written in the checkpoint filename, low is good) is the one you should use in the end.

It is very common in deep learning to run many different models with many different hyperparameter settings, and in the end take whatever checkpoint gave the best validation performance.

By the way, the size of your training and validation splits are also parameters. Make sure you have a decent amount of data in your validation set or otherwise the validation performance will be noisy and not very informative.

batch_size = 100        # Sequences per batch
num_steps = 100         # Number of sequence steps per batch
lstm_size = 512         # Size of hidden layers in LSTMs
num_layers = 2          # Number of LSTM layers
learning_rate = 0.001   # Learning rate
keep_prob = 0.5         # Dropout keep probability

训练

epochs = 20
# Save every N iterations
save_every_n = 200

model = CharRNN(len(vocab), batch_size=batch_size, num_steps=num_steps,
                lstm_size=lstm_size, num_layers=num_layers, 
                learning_rate=learning_rate)

saver = tf.train.Saver(max_to_keep=100)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    
    # Use the line below to load a checkpoint and resume training
    #saver.restore(sess, 'checkpoints/______.ckpt')
    counter = 0
    for e in range(epochs):
        # Train network
        new_state = sess.run(model.initial_state)
        loss = 0
        for x, y in get_batches(encoded, batch_size, num_steps):
            counter += 1
            start = time.time()
            feed = {model.inputs: x,
                    model.targets: y,
                    model.keep_prob: keep_prob,
                    model.initial_state: new_state}
            batch_loss, new_state, _ = sess.run([model.loss, 
                                                 model.final_state, 
                                                 model.optimizer], 
                                                 feed_dict=feed)
            
            end = time.time()
            print('Epoch: {}/{}... '.format(e+1, epochs),
                  'Training Step: {}... '.format(counter),
                  'Training loss: {:.4f}... '.format(batch_loss),
                  '{:.4f} sec/batch'.format((end-start)))
        
            if (counter % save_every_n == 0):
                saver.save(sess, "checkpoints/i{}_l{}.ckpt".format(counter, lstm_size))
    
    saver.save(sess, "checkpoints/i{}_l{}.ckpt".format(counter, lstm_size))

Saved checkpoints

checkpoint 保留了训练中的状态: https://www.tensorflow.org/programmers_guide/variables

tf.train.get_checkpoint_state('checkpoints')
model_checkpoint_path: "checkpoints/i3960_l512.ckpt"
all_model_checkpoint_paths: "checkpoints/i200_l512.ckpt"
all_model_checkpoint_paths: "checkpoints/i400_l512.ckpt"
all_model_checkpoint_paths: "checkpoints/i600_l512.ckpt"
all_model_checkpoint_paths: "checkpoints/i800_l512.ckpt"
all_model_checkpoint_paths: "checkpoints/i1000_l512.ckpt"
all_model_checkpoint_paths: "checkpoints/i1200_l512.ckpt"
all_model_checkpoint_paths: "checkpoints/i1400_l512.ckpt"
all_model_checkpoint_paths: "checkpoints/i1600_l512.ckpt"
all_model_checkpoint_paths: "checkpoints/i1800_l512.ckpt"
all_model_checkpoint_paths: "checkpoints/i2000_l512.ckpt"
all_model_checkpoint_paths: "checkpoints/i2200_l512.ckpt"
all_model_checkpoint_paths: "checkpoints/i2400_l512.ckpt"
all_model_checkpoint_paths: "checkpoints/i2600_l512.ckpt"
all_model_checkpoint_paths: "checkpoints/i2800_l512.ckpt"
all_model_checkpoint_paths: "checkpoints/i3000_l512.ckpt"
all_model_checkpoint_paths: "checkpoints/i3200_l512.ckpt"
all_model_checkpoint_paths: "checkpoints/i3400_l512.ckpt"
all_model_checkpoint_paths: "checkpoints/i3600_l512.ckpt"
all_model_checkpoint_paths: "checkpoints/i3800_l512.ckpt"
all_model_checkpoint_paths: "checkpoints/i3960_l512.ckpt"

预测

现在,网络已经训练好了,我们可以产生新的文本了。

def pick_top_n(preds, vocab_size, top_n=5):
    p = np.squeeze(preds)
    p[np.argsort(p)[:-top_n]] = 0
    p = p / np.sum(p)
    c = np.random.choice(vocab_size, 1, p=p)[0]
    return c
def sample(checkpoint, n_samples, lstm_size, vocab_size, prime="The "):
    samples = [c for c in prime]
    model = CharRNN(len(vocab), lstm_size=lstm_size, sampling=True)
    saver = tf.train.Saver()
    with tf.Session() as sess:
        saver.restore(sess, checkpoint)
        new_state = sess.run(model.initial_state)
        for c in prime:
            x = np.zeros((1, 1))
            x[0,0] = vocab_to_int[c]
            feed = {model.inputs: x,
                    model.keep_prob: 1.,
                    model.initial_state: new_state}
            preds, new_state = sess.run([model.prediction, model.final_state], 
                                         feed_dict=feed)

        c = pick_top_n(preds, len(vocab))
        samples.append(int_to_vocab[c])

        for i in range(n_samples):
            x[0,0] = c
            feed = {model.inputs: x,
                    model.keep_prob: 1.,
                    model.initial_state: new_state}
            preds, new_state = sess.run([model.prediction, model.final_state], 
                                         feed_dict=feed)

            c = pick_top_n(preds, len(vocab))
            samples.append(int_to_vocab[c])
        
    return ''.join(samples)
tf.train.latest_checkpoint('checkpoints')
'checkpoints/i3960_l512.ckpt'
checkpoint = tf.train.latest_checkpoint('checkpoints')
samp = sample(checkpoint, 2000, lstm_size, len(vocab), prime="Far")
print(samp)
Farens. The man and this presence there was a long whell against the
serfice. And she would have seen a dest of sorry of any arrangements,
true. But the side house was in the money about the chief same, and
that he had to say to all his face, with some significance for his wife
that he had come and done to the penslition of a little and sounds of
the society, and always was in her and horses, and that that they
could not be a chalacteristic on her so second become in her soul that it
was all sincerity, and her clothest having been talking to horror of
a presine that was something was all his mother had no station.

The conversation had suddenly had not been anything.

Sometying her health she came over the services, had sat down again, and
at the conversation was not in the strange shreed to the door, and with
his sight of the party, the sick man thinking the secret of the mother
when they had brought a humored friends, and the prayers that he could not
let him any more and marking this subject, and they. The most of the
steps about indeed and triude of accordance of the position that was
steping himself. They went to those on its ordered and could only to
chart how when they all and the more angry and the sight about it. The
peasants would his chillish accounts and at the partious feeling of
particularly ares, and so thore was some shorting or them. And he so
see the place, who was seeing that something coughed hands, as though
he was at the same, was not that she was insulted. Anna had been at
the prince tell out his hat this article and to begin to say.

"Yive were a selvated the members who have taken the station in which the
committee? And is to darling a minutate of the carriage of marnated. I have
took to myself in the steps and a single morn and happy.

"If that's into a single morning, this seems thanks to her and thinks
in the past and married that." The side had been carrying all the
portrait of his words. These wires he was sitting at the past of the
room, t
checkpoint = 'checkpoints/i200_l512.ckpt'
samp = sample(checkpoint, 1000, lstm_size, len(vocab), prime="Far")
print(samp)
Fard,. ""
I"
"The tat he winse he wos al alt ong the hor an thot hed the
the sin the sand we he sos and se th the an soris he as oth thes sosed to he the he ad ond on har soron sha sas sitis he
sentes the he wis as arton thir ans on ond wome to whe the sos thes sosit to the sos at hos ho she sis the she him an tot thes th an setit and wo he angeras,. Tho hor ho wot thime
ne tot ha to we we he won thar al thir send tougtint ting and wh wamed, the the her what hot wit he wat ta hor,
thing th whe tousederede sad sithe the the he se the han shes oress ho tot on here an on ot to he the hhressasesid thise where ans andentor tat the
t the won he ans te hite wost hin sos torinnd
tare wes to se her
inntor ond tot ont ate
silt wens
thithe serede ho he the soros, sers on asis and af on he tare sin hos tot this won oth the he tins, and ang tee ar the ant oud thot woth shat ha lan and sithe to she sos thoth sar te sesilt ther singetin to he te artor to ho he
wans he ans her han ase and. 
I hind wasd we
checkpoint = 'checkpoints/i600_l512.ckpt'
samp = sample(checkpoint, 1000, lstm_size, len(vocab), prime="Far")
print(samp)
Farnt out a seenen a doons the werly a prostines, bet of to as sone him, al her his and and she had booke word and wert on her had and said to hoult, sto stat to she whon he wond to
denting andond to stat had he he seed a sele to the corsions.

"The more when she shead, thourg that so he had and to the sort, sunding him his and and sace and where her
headed he said
the haress and than sheard her, her has thing his stalith his tongenter," she said
he dasted the monting the sond of
their that allong her ald
have a tare him that to the somesily wourd to dive in and som her and whene the sead, and withon te said whine and had sail he having had
same to sime her has
antanter that
she she distlidion of her heally.

"When, an the pinter on atone with
the pasting, and sooked hus a sant of her his her ander was having to the panting to shingher at a shiss to thow to and walk nover as to the portanion," said Stepan'Akknavitcch with which all she coundsess in this his,
boring the panitiou, and
atonge
checkpoint = 'checkpoints/i1200_l512.ckpt'
samp = sample(checkpoint, 1000, lstm_size, len(vocab), prime="Far")
print(samp)
Farcial the
confiring to the mone of the correm and thinds. She
she saw the
streads of herself hand only astended of the carres to her his some of the princess of which he came him of
all that his white the dreasing of
thisking the princess and with she was she had
bettee a still and he was happined, with the pood on the mush to the peaters and seet it.

"The possess a streatich, the may were notine at his mate a misted
and the
man of the mother at the same of the seem her
felt. He had not here.

"I conest only be alw you thinking that the partion
of their said."

"A much then you make all her
somether. Hower their centing
about
this, and I won't give it in
himself.
I had not come at any see it will that there she chile no one that him.

"The distiction with you all.... It was
a mone of the mind were starding to the simple to a mone. It to be to ser in the place," said Vronsky.
"And a plais in
his face, has alled in the consess on at they to gan in the sint
at as that
he would not be and t