Introduction

In deep learning, training a model such as an image classifier from scratch consumes a large amount of time and compute, and when the dataset is small the model is also hard to fit well. Transfer learning addresses both problems: by initializing the network to be trained with a pre-trained model, convergence is faster and final accuracy often improves. The model used for initialization should be one trained to full convergence on a large dataset, and ideally the new network uses the same architecture as the pre-trained model, so that as many layers as possible can be initialized from it.

Initial Training of the Model

The pre-trained model used in this chapter is the official ResNet50 model provided by PaddlePaddle, trained on the ImageNet dataset. It can be downloaded from: http://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_pretrained.zip (more pre-trained models are available from the same repository). After downloading, unzip it into the models directory.
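For convenience, the download and extraction can also be scripted. The following is a minimal sketch using only the standard library, with the URL and the models directory taken from above:

import os
import zipfile

try:
    from urllib.request import urlretrieve  # Python 3
except ImportError:
    from urllib import urlretrieve  # Python 2

url = 'http://paddle-imagenet-models-name.bj.bcebos.com/ResNet50_pretrained.zip'
zip_path = 'models/ResNet50_pretrained.zip'

# Download the archive and unpack it into the models directory
if not os.path.exists('models'):
    os.makedirs('models')
urlretrieve(url, zip_path)
with zipfile.ZipFile(zip_path) as archive:
    archive.extractall('models')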

Write a Python program pretrain_model.py for initial training of the model. First, import relevant dependency packages.

import os
import shutil
import paddle as paddle
import paddle.dataset.flowers as flowers
import paddle.fluid as fluid
from paddle.fluid.param_attr import ParamAttr

Define a residual neural network (ResNet), as provided by PaddlePaddle; its source is the models_name repository. This network assigns an explicit name to each layer's parameters, which makes it convenient to initialize the model by name: even if the network structure changes, any layer whose parameter names remain unchanged can still be initialized from the pre-trained model, parameter by parameter. (A quick check of these names appears after the base program is cloned below.)

# Define Residual Neural Network (ResNet)
def resnet50(input):
    def conv_bn_layer(input, num_filters, filter_size, stride=1, groups=1, act=None, name=None):
        conv = fluid.layers.conv2d(input=input,
                                   num_filters=num_filters,
                                   filter_size=filter_size,
                                   stride=stride,
                                   padding=(filter_size - 1) // 2,
                                   groups=groups,
                                   act=None,
                                   param_attr=ParamAttr(name=name + "_weights"),
                                   bias_attr=False,
                                   name=name + '.conv2d.output.1')
        if name == "conv1":
            bn_name = "bn_" + name
        else:
            bn_name = "bn" + name[3:]
        return fluid.layers.batch_norm(input=conv,
                                       act=act,
                                       name=bn_name + '.output.1',
                                       param_attr=ParamAttr(name=bn_name + '_scale'),
                                       bias_attr=ParamAttr(bn_name + '_offset'),
                                       moving_mean_name=bn_name + '_mean',
                                       moving_variance_name=bn_name + '_variance', )

    def shortcut(input, ch_out, stride, name):
        ch_in = input.shape[1]
        if ch_in != ch_out or stride != 1:
            return conv_bn_layer(input, ch_out, 1, stride, name=name)
        else:
            return input

    def bottleneck_block(input, num_filters, stride, name):
        conv0 = conv_bn_layer(input=input,
                              num_filters=num_filters,
                              filter_size=1,
                              act='relu',
                              name=name + "_branch2a")
        conv1 = conv_bn_layer(input=conv0,
                              num_filters=num_filters,
                              filter_size=3,
                              stride=stride,
                              act='relu',
                              name=name + "_branch2b")
        conv2 = conv_bn_layer(input=conv1,
                              num_filters=num_filters * 4,
                              filter_size=1,
                              act=None,
                              name=name + "_branch2c")

        short = shortcut(input, num_filters * 4, stride, name=name + "_branch1")

        return fluid.layers.elementwise_add(x=short, y=conv2, act='relu', name=name + ".add.output.5")

    depth = [3, 4, 6, 3]
    num_filters = [64, 128, 256, 512]

    conv = conv_bn_layer(input=input, num_filters=64, filter_size=7, stride=2, act='relu', name="conv1")
    conv = fluid.layers.pool2d(input=conv, pool_size=3, pool_stride=2, pool_padding=1, pool_type='max')

    for block in range(len(depth)):
        for i in range(depth[block]):
            conv_name = "res" + str(block + 2) + chr(97 + i)
            conv = bottleneck_block(input=conv,
                                    num_filters=num_filters[block],
                                    stride=2 if i == 0 and block != 0 else 1,
                                    name=conv_name)

    pool = fluid.layers.pool2d(input=conv, pool_size=7, pool_type='avg', global_pooling=True)
    return pool

Define the input layer for image data and label data. The image dataset used in this chapter is flowers. The images of the flowers dataset obtained through PaddlePaddle’s interface are 3-channel RGB images with a width and height of 224, and there are a total of 102 categories.

# Define input layers
image = fluid.layers.data(name='image', shape=[3, 224, 224], dtype='float32')
label = fluid.layers.data(name='label', shape=[1], dtype='int64')
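To confirm the data layout, one can peek at a single sample from the reader. This is a minimal sanity-check sketch; the reader may yield the image as a flattened float array, which the DataFeeder later reshapes to the declared [3, 224, 224] input shape:

# Peek at one (image, label) sample from the flowers training reader
sample_image, sample_label = next(flowers.train()())
# The image holds 3*224*224 float values; the label is an integer in [0, 101]
print(sample_image.shape, sample_label)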

Build the network up to the layer just below the classifier, and clone a base program from the main program for the subsequent parameter loading. Because this clone is taken before the classifier is added, it contains only the backbone.

# Get the upper layer of the classifier
pool = resnet50(image)
# Freeze the backbone: stop gradients from propagating below the pooling output
pool.stop_gradient = True
# Create a basic main program here
base_model_program = fluid.default_main_program().clone()
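As a quick check of the naming convention described earlier, the parameter names recorded in the cloned base program can be listed and compared against the file names in the unzipped models/ResNet50_pretrained directory. A minimal sketch:

# List the backbone parameter names in the cloned base program; each name
# should correspond to a file of the same name in the pre-trained model directory
for param in base_model_program.global_block().all_parameters():
    print(param.name)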

Now add the network's classifier. The pre-trained model was trained on 1000 ImageNet categories, so the classifier must be replaced to match the number of categories in the new dataset. This separation of the classifier is the key step of this stage: it resolves the mismatch in category counts between the two datasets. Because stop_gradient was set on the pooling output above, only this new classifier is actually trained in this stage.

# Reload the network classifier, with size corresponding to the number of categories in this project
model = fluid.layers.fc(input=pool, size=102, act='softmax')

Then obtain the loss function, accuracy function, and optimization method.

# Get loss function and accuracy function
cost = fluid.layers.cross_entropy(input=model, label=label)
avg_cost = fluid.layers.mean(cost)
acc = fluid.layers.accuracy(input=model, label=label)

# Define optimization method
optimizer = fluid.optimizer.AdamOptimizer(learning_rate=1e-3)
opts = optimizer.minimize(avg_cost)

Obtain the flowers dataset. Since testing is not required here, the test dataset does not need to be read.

# Get flowers data
train_reader = paddle.batch(flowers.train(), batch_size=16)

Create an executor. It is best to use GPU for training because the dataset and network are relatively large.

# Define an executor using GPU
place = fluid.CUDAPlace(0)
# place = fluid.CPUPlace()
exe = fluid.Executor(place)
# Initialize parameters
exe.run(fluid.default_startup_program())

Here is the key part: loading the pre-trained model. The if_exist function checks whether a file with the given variable's name exists in the pre-trained model directory; fluid.io.load_vars then uses it as a predicate and loads only the variables whose files exist. Note that the previously cloned base program is passed as the main_program, so only the backbone's variables are considered, and the predicate additionally guards against files missing from the directory.

# Official pre-trained model
src_pretrain_model_path = 'models/ResNet50_pretrained/'


# Function to check if the model file exists
def if_exist(var):
    path = os.path.join(src_pretrain_model_path, var.name)
    exist = os.path.exists(path)
    if exist:
        print('Load model: %s' % path)
    return exist


# Load model files, only load existing model files
fluid.io.load_vars(executor=exe, dirname=src_pretrain_model_path, predicate=if_exist, main_program=base_model_program)

Then train for 10 passes using this pre-trained model.

# Optimize memory
optimized = fluid.transpiler.memory_optimize(input_program=fluid.default_main_program(), print_log=False)

# Define input data dimensions
feeder = fluid.DataFeeder(place=place, feed_list=[image, label])

# Train for 10 passes
for pass_id in range(10):
    # Training process
    for batch_id, data in enumerate(train_reader()):
        train_cost, train_acc = exe.run(program=fluid.default_main_program(),
                                        feed=feeder.feed(data),
                                        fetch_list=[avg_cost, acc])
        # Print information every 100 batches
        if batch_id % 100 == 0:
            print('Pass:%d, Batch:%d, Cost:%0.5f, Accuracy:%0.5f' %
                  (pass_id, batch_id, train_cost[0], train_acc[0]))

Training output information:

Load model: models/ResNet50_pretrained/res5a_branch2a_weights
Load model: models/ResNet50_pretrained/res4c_branch2a_weights
Load model: models/ResNet50_pretrained/res4f_branch2b_weights
Load model: models/ResNet50_pretrained/bn2a_branch2b_variance
Load model: models/ResNet50_pretrained/bn4d_branch2b_variance
Load model: models/ResNet50_pretrained/bn4f_branch2b_variance
Load model: models/ResNet50_pretrained/bn4e_branch2a_offset
Load model: models/ResNet50_pretrained/res4f_branch2c_weights
Load model: models/ResNet50_pretrained/res5c_branch2b_weights
......
Pass:0, Batch:0, Cost:6.92118, Accuracy:0.00000
Pass:0, Batch:100, Cost:3.31085, Accuracy:0.31250
Pass:0, Batch:200, Cost:3.32227, Accuracy:0.18750
Pass:0, Batch:300, Cost:3.85708, Accuracy:0.31250
Pass:1, Batch:0, Cost:3.36264, Accuracy:0.25000
......

After training, use the fluid.io.save_params interface to save the parameters. Because this model now matches the number of categories in the new dataset, it can later be used to initialize the formal training directly, without separating the classifier again.

# Save the parameter model
save_pretrain_model_path = 'models/pretrain_model/'
# Delete old model files
shutil.rmtree(save_pretrain_model_path, ignore_errors=True)
# Create directory to save model files
os.makedirs(save_pretrain_model_path)
# Save parameter model
fluid.io.save_params(executor=exe, dirname=save_pretrain_model_path)
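fluid.io.save_params writes one file per parameter, named after the parameter, which is what makes name-based initialization possible. As a quick check that the save succeeded:

# List a few of the saved parameter files; their names match the parameter names
print(sorted(os.listdir(save_pretrain_model_path))[:10])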

At this point, the first step of pre-training (processing the original pre-trained model) is completed. The next step is to use this processed model for formal training.

Formal Training Using the Processed Model

This part uses the processed model for the formal training. Create a Python program train.py; first, import the relevant dependency packages.

import os
import shutil
import paddle as paddle
import paddle.dataset.flowers as flowers
import paddle.fluid as fluid
from paddle.fluid.param_attr import ParamAttr

Define the residual neural network. It is essentially the same as before, except that it now includes the classifier, so this is a complete network.

# Define Residual Neural Network (ResNet)
def resnet50(input, class_dim):
    def conv_bn_layer(input, num_filters, filter_size, stride=1, groups=1, act=None, name=None):
        conv = fluid.layers.conv2d(input=input,
                                   num_filters=num_filters,
                                   filter_size=filter_size,
                                   stride=stride,
                                   padding=(filter_size - 1) // 2,
                                   groups=groups,
                                   act=None,
                                   param_attr=ParamAttr(name=name + "_weights"),
                                   bias_attr=False,
                                   name=name + '.conv2d.output.1')
        if name == "conv1":
            bn_name = "bn_" + name
        else:
            bn_name = "bn" + name[3:]
        return fluid.layers.batch_norm(input=conv,
                                       act=act,
                                       name=bn_name + '.output.1',
                                       param_attr=ParamAttr(name=bn_name + '_scale'),
                                       bias_attr=ParamAttr(bn_name + '_offset'),
                                       moving_mean_name=bn_name + '_mean',
                                       moving_variance_name=bn_name + '_variance', )

    def shortcut(input, ch_out, stride, name):
        ch_in = input.shape[1]
        if ch_in != ch_out or stride != 1:
            return conv_bn_layer(input, ch_out, 1, stride, name=name)
        else:
            return input

    def bottleneck_block(input, num_filters, stride, name):
        conv0 = conv_bn_layer(input=input,
                              num_filters=num_filters,
                              filter_size=1,
                              act='relu',
                              name=name + "_branch2a")
        conv1 = conv_bn_layer(input=conv0,
                              num_filters=num_filters,
                              filter_size=3,
                              stride=stride,
                              act='relu',
                              name=name + "_branch2b")
        conv2 = conv_bn_layer(input=conv1,
                              num_filters=num_filters * 4,
                              filter_size=1,
                              act=None,
                              name=name + "_branch2c")

        short = shortcut(input, num_filters * 4, stride, name=name + "_branch1")

        return fluid.layers.elementwise_add(x=short, y=conv2, act='relu', name=name + ".add.output.5")

    depth = [3, 4, 6, 3]
    num_filters = [64, 128, 256, 512]

    conv = conv_bn_layer(input=input, num_filters=64, filter_size=7, stride=2, act='relu', name="conv1")
    conv = fluid.layers.pool2d(input=conv, pool_size=3, pool_stride=2, pool_padding=1, pool_type='max')

    for block in range(len(depth)):
        for i in range(depth[block]):
            conv_name = "res" + str(block + 2) + chr(97 + i)
            conv = bottleneck_block(input=conv,
                                    num_filters=num_filters[block],
                                    stride=2 if i == 0 and block != 0 else 1,
                                    name=conv_name)

    pool = fluid.layers.pool2d(input=conv, pool_size=7, pool_type='avg', global_pooling=True)
    output = fluid.layers.fc(input=pool, size=class_dim, act='softmax')
    return output

Then define the remaining pieces: the input layers, the classification network, the loss and accuracy functions, and the optimization method; obtain the training and test readers for the flowers dataset; and create an executor. Note that the test program is cloned before the optimization method is added.

# Define input layers
image = fluid.layers.data(name='image', shape=[3, 224, 224], dtype='float32')
label = fluid.layers.data(name='label', shape=[1], dtype='int64')

# Get classifier
model = resnet50(image, 102)

# Get loss function and accuracy function
cost = fluid.layers.cross_entropy(input=model, label=label)
avg_cost = fluid.layers.mean(cost)
acc = fluid.layers.accuracy(input=model, label=label)

# Clone the test program (before adding the optimization method)
test_program = fluid.default_main_program().clone(for_test=True)

# Define optimization method
optimizer = fluid.optimizer.AdamOptimizer(learning_rate=1e-3)
opts = optimizer.minimize(avg_cost)

# Get flowers data
train_reader = paddle.batch(flowers.train(), batch_size=16)
test_reader = paddle.batch(flowers.test(), batch_size=16)

# Define an executor using GPU
place = fluid.CUDAPlace(0)
# place = fluid.CPUPlace()
exe = fluid.Executor(place)
# Initialize parameters
exe.run(fluid.default_startup_program())

Here, use the fluid.io.load_params interface to load the processed pre-trained model. Because the saved parameters already match this network, including the classifier, they can all be loaded directly.
# Processed pre-trained model
pretrained_model_path = 'models/pretrain_model/'

# Load all the parameters of the processed pre-trained model
fluid.io.load_params(executor=exe, dirname=pretrained_model_path)
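From here, the formal training proceeds with a loop analogous to the pretraining one. The following is a minimal sketch mirroring the earlier script; it reuses the readers, executor, and fetch targets defined above, and additionally evaluates the cloned test_program after each pass:

# Define the data feeder, as in the pretraining script
feeder = fluid.DataFeeder(place=place, feed_list=[image, label])

# Train for 10 passes, testing after each pass
for pass_id in range(10):
    # Training process
    for batch_id, data in enumerate(train_reader()):
        train_cost, train_acc = exe.run(program=fluid.default_main_program(),
                                        feed=feeder.feed(data),
                                        fetch_list=[avg_cost, acc])
        # Print information every 100 batches
        if batch_id % 100 == 0:
            print('Pass:%d, Batch:%d, Cost:%0.5f, Accuracy:%0.5f' %
                  (pass_id, batch_id, train_cost[0], train_acc[0]))

    # Test process, using the cloned test program
    test_costs, test_accs = [], []
    for data in test_reader():
        test_cost, test_acc = exe.run(program=test_program,
                                      feed=feeder.feed(data),
                                      fetch_list=[avg_cost, acc])
        test_costs.append(test_cost[0])
        test_accs.append(test_acc[0])
    print('Test:%d, Cost:%0.5f, Accuracy:%0.5f' %
          (pass_id, sum(test_costs) / len(test_costs), sum(test_accs) / len(test_accs)))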