Foreword¶
In addition to convolutional neural networks (CNNs), recurrent neural networks (RNNs) are also widely used in deep learning, especially for natural language processing tasks. In this chapter, we will learn how to implement an RNN using PaddlePaddle and use this network to train a sentiment analysis model.
Training the Model¶
Create a Python file named text_classification.py and start by importing the required libraries. We have already used paddle, fluid, and numpy in previous chapters, so they need no further introduction; the new import is the imdb module, which provides an English movie review dataset labeled with two classes, positive and negative sentiment.
import paddle
import paddle.dataset.imdb as imdb
import paddle.fluid as fluid
import numpy as np
RNNs have evolved into several advanced variants, such as Long Short-Term Memory (LSTM) networks. The following code implements a simple RNN:
def rnn_net(ipt, input_dim):
    # Embed the input word IDs
    emb = fluid.layers.embedding(input=ipt, size=[input_dim, 128], is_sparse=True)
    sentence = fluid.layers.fc(input=emb, size=128, act='tanh')

    rnn = fluid.layers.DynamicRNN()
    with rnn.block():
        # Read one word vector per time step
        word = rnn.step_input(sentence)
        # Hidden state carried over from the previous time step
        prev = rnn.memory(shape=[128])
        hidden = fluid.layers.fc(input=[word, prev], size=128, act='relu')
        rnn.update_memory(prev, hidden)
        rnn.output(hidden)

    # Classify using the hidden state of the last time step
    last = fluid.layers.sequence_last_step(rnn())
    out = fluid.layers.fc(input=last, size=2, act='softmax')
    return out
The following code implements a simple LSTM network, which addresses the vanishing/exploding gradient problems of RNNs with long sequences:
# Define the LSTM network
def lstm_net(ipt, input_dim):
    # Embed the input word IDs
    emb = fluid.layers.embedding(input=ipt, size=[input_dim, 128], is_sparse=True)
    # First fully connected layer
    fc1 = fluid.layers.fc(input=emb, size=128)
    # Apply the dynamic LSTM operation
    lstm1, _ = fluid.layers.dynamic_lstm(input=fc1, size=128)
    # Max pooling over each sequence
    fc2 = fluid.layers.sequence_pool(input=fc1, pool_type='max')
    lstm2 = fluid.layers.sequence_pool(input=lstm1, pool_type='max')
    # Output layer with softmax over the two classes
    out = fluid.layers.fc(input=[fc2, lstm2], size=2, act='softmax')
    return out
Define the input layer. Each sample is a variable-length sequence of word IDs, so the words input is declared with lod_level=1 to mark it as sequence (LoD) data, while the label is an ordinary integer:
# Define input data with lod_level=1 for sequence data
words = fluid.layers.data(name='words', shape=[1], dtype='int64', lod_level=1)
label = fluid.layers.data(name='label', shape=[1], dtype='int64')
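If the LoD representation is new to you, the following standalone sketch (not part of the training script; the word IDs are made up) shows how two sequences of different lengths are packed into a single lod_level=1 tensor:

import paddle.fluid as fluid

place = fluid.CPUPlace()
# Two made-up word-ID sequences of lengths 3 and 2
seqs = [[1, 2, 3], [4, 5]]
# The second argument records the length of each sequence (one LoD level)
t = fluid.create_lod_tensor(seqs, [[3, 2]], place)
print(t.recursive_sequence_lengths())  # [[3, 2]]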
Load the data dictionary, which maps words to integer IDs in the training set:
# Load the data dictionary
print("Loading data dictionary...")
word_dict = imdb.word_dict()
dict_dim = len(word_dict)
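To sanity-check the dictionary, you can print a few entries; this peek is purely illustrative and not part of the script. As in the prediction code later on, the lookup keys may need to be byte strings:

# Purely illustrative: inspect a few word-to-ID mappings and the vocabulary size
for w in ['the', 'movie', 'great']:
    print(w, '->', word_dict.get(w.encode('utf-8'), word_dict.get(w)))
print('Vocabulary size:', dict_dim)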
Select the network architecture (LSTM is recommended here):
# Choose the network: LSTM or RNN
model = lstm_net(words, dict_dim)
# model = rnn_net(words, dict_dim) # Uncomment to use RNN
Define the loss function (cross-entropy for classification) and accuracy metric:
# Define loss and accuracy
cost = fluid.layers.cross_entropy(input=model, label=label)
avg_cost = fluid.layers.mean(cost)
acc = fluid.layers.accuracy(input=model, label=label)
Clone the main program as a test program before defining the optimizer; the clone shares the trained parameters but omits the backward and parameter-update operators, so it can be used for evaluation:
# Create a test program for evaluation
test_program = fluid.default_main_program().clone(for_test=True)
Define the optimizer with a learning rate of 0.002. Adagrad adapts the learning rate per parameter, which works well with the sparse embedding updates here:
# Define optimizer
optimizer = fluid.optimizer.AdagradOptimizer(learning_rate=0.002)
opt = optimizer.minimize(avg_cost)
Initialize the executor (use GPU if available with fluid.CUDAPlace(0)):
# Create executor
place = fluid.CPUPlace()
# place = fluid.CUDAPlace(0) # Uncomment for GPU
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
Load the training and test data. The training reader shuffles samples with a buffer of 25000, and both readers batch the data with a batch size of 128:
# Load training and test data
print("Loading training data...")
train_reader = paddle.batch(paddle.reader.shuffle(imdb.train(word_dict), 25000), batch_size=128)
print("Loading test data...")
test_reader = paddle.batch(imdb.test(word_dict), batch_size=128)
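Each sample yielded by the reader is a pair of a word-ID sequence and a 0/1 label (0 for a positive review, 1 for a negative one). If you want to confirm the format, a purely illustrative peek looks like this:

# Purely illustrative: look at one raw training sample (re-reads the dataset)
sample_words, sample_label = next(imdb.train(word_dict)())
print('Sequence length:', len(sample_words), 'Label:', sample_label)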
Define the data feeder:
# Define data feeder
feeder = fluid.DataFeeder(place=place, feed_list=[words, label])
Start training. Only one pass (epoch) is run here; increase the number of passes for better convergence. Every 40 batches, the script prints the training cost and evaluates on the test set:
# Start training
for pass_id in range(1):
    train_cost = 0
    for batch_id, data in enumerate(train_reader()):
        train_cost = exe.run(program=fluid.default_main_program(),
                             feed=feeder.feed(data),
                             fetch_list=[avg_cost])
        if batch_id % 40 == 0:
            print('Pass:%d, Batch:%d, Cost:%0.5f' % (pass_id, batch_id, train_cost[0]))

            # Evaluate on the test set
            test_costs = []
            test_accs = []
            for test_data in test_reader():
                test_cost, test_acc = exe.run(program=test_program,
                                              feed=feeder.feed(test_data),
                                              fetch_list=[avg_cost, acc])
                test_costs.append(test_cost[0])
                test_accs.append(test_acc[0])
            test_cost = np.mean(test_costs)
            test_acc = np.mean(test_accs)
            print('Test: Cost:%0.5f, ACC:%0.5f' % (test_cost, test_acc))
Expected output:
Pass:0, Batch:0, Cost:0.69274
Test: Cost:0.69329, ACC:0.50175
Pass:0, Batch:40, Cost:0.61183
Test: Cost:0.61142, ACC:0.82659
...
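The script keeps the trained parameters in memory and predicts with the same executor right away. If you also want to persist the model to disk, one possible sketch (not part of the original walkthrough; the directory name is just an example) uses fluid.io.save_inference_model:

# Optional sketch: save the trained network for later inference.
# 'sentiment_model' is an arbitrary example directory.
fluid.io.save_inference_model(dirname='sentiment_model',
                              feeded_var_names=[words.name],
                              target_vars=[model],
                              executor=exe)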
Predicting New Data¶
Define three review sentences to classify (roughly neutral, positive, and negative) and split each one into words:
# Define prediction data
reviews_str = ['read the book forget the movie', 'this is a great movie', 'this is very bad']
reviews = [c.split() for c in reviews_str]
Convert sentences to word IDs using the dictionary:
# Encode words to IDs
UNK = word_dict.get('<unk>', 0)  # Unknown token ID
lod = []
for c in reviews:
    encoded = [word_dict.get(word.encode('utf-8'), UNK) for word in c]
    lod.append(encoded)
Create an LoD tensor from the word-ID sequences; base_shape records the length of each review:
# Prepare prediction data
base_shape = [[len(seq) for seq in lod]]
tensor_words = fluid.create_lod_tensor(lod, base_shape, place)
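As an optional sanity check (not in the original script), the LoD stored in the tensor should match the number of words in each review:

# Expected: [[6, 5, 4]] for the three sentences defined above
print(tensor_words.recursive_sequence_lengths())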
Run prediction with the test program. Because the cloned test program still contains the cost and accuracy operators, a placeholder label has to be fed alongside the words:
# Predict
results = exe.run(program=test_program,
                  feed={'words': tensor_words,
                        'label': np.array([[0]] * len(reviews)).astype('int64')},
                  fetch_list=[model])
Print prediction results:
# Print sentiment probabilities
for i, r in enumerate(results[0]):
    print("'%s' - Positive: %0.5f, Negative: %0.5f" % (reviews_str[i], r[0], r[1]))
Expected output:
'read the book forget the movie' - Positive: 0.53604, Negative: 0.46396
'this is a great movie' - Positive: 0.67564, Negative: 0.32436
'this is very bad' - Positive: 0.35406, Negative: 0.64594