Foreword

In the previous chapters of this series we never saved the model: once training finished, the result was simply discarded. This chapter introduces how to save the model during training so it can later be used for prediction, for resuming training, or as pre-trained weights when training on another dataset. We will cover three methods of saving and using models.

Training the Model

During model training, you can save the model at any time, and you can also load a previously trained model before starting training. To demonstrate the three saving methods, we have prepared three Python programs: save_infer_model.py, save_use_params_model.py, and save_use_persistables_model.py.

Import required libraries

import os
import shutil
import paddle
import paddle.dataset.cifar as cifar
import paddle.fluid as fluid

Define a Residual Neural Network (ResNet)
ResNet is a widely used architecture: its residual (shortcut) connections allow the network to be made much deeper while avoiding the accuracy degradation that plain networks suffer as depth increases.

# Define Residual Neural Network (ResNet) for CIFAR10
def resnet_cifar10(ipt, class_dim):
    def conv_bn_layer(input,
                      ch_out,
                      filter_size,
                      stride,
                      padding,
                      act='relu',
                      bias_attr=False):
        tmp = fluid.layers.conv2d(
            input=input,
            filter_size=filter_size,
            num_filters=ch_out,
            stride=stride,
            padding=padding,
            bias_attr=bias_attr)
        return fluid.layers.batch_norm(input=tmp, act=act)

    def shortcut(input, ch_in, ch_out, stride):
        if ch_in != ch_out:
            return conv_bn_layer(input, ch_out, 1, stride, 0, None)
        else:
            return input

    def basicblock(input, ch_in, ch_out, stride):
        tmp = conv_bn_layer(input, ch_out, 3, stride, 1)
        tmp = conv_bn_layer(tmp, ch_out, 3, 1, 1, act=None, bias_attr=True)
        short = shortcut(input, ch_in, ch_out, stride)
        return fluid.layers.elementwise_add(x=tmp, y=short, act='relu')

    def layer_warp(block_func, input, ch_in, ch_out, count, stride):
        tmp = block_func(input, ch_in, ch_out, stride)
        for i in range(1, count):
            tmp = block_func(tmp, ch_out, ch_out, 1)
        return tmp

    conv1 = conv_bn_layer(ipt, ch_out=16, filter_size=3, stride=1, padding=1)
    res1 = layer_warp(basicblock, conv1, 16, 16, 5, 1)
    res2 = layer_warp(basicblock, res1, 16, 32, 5, 2)
    res3 = layer_warp(basicblock, res2, 32, 64, 5, 2)
    pool = fluid.layers.pool2d(input=res3, pool_size=8, pool_type='avg', pool_stride=1)
    predict = fluid.layers.fc(input=pool, size=class_dim, act='softmax')
    return predict

Define Input Layers
CIFAR images are 32x32 color images with 3 channels, so the input layer shape is specified as [3, 32, 32] (channels first).

# Define input layers
image = fluid.layers.data(name='image', shape=[3, 32, 32], dtype='float32')
label = fluid.layers.data(name='label', shape=[1], dtype='int64')

Get the ResNet Classifier
Specify 10 categories since the CIFAR10 dataset has 10 classes.

# Get the classifier
model = resnet_cifar10(image, 10)

Define Loss and Accuracy Metrics
Cross-entropy loss and Top-1 accuracy are used here.

# Define loss function and accuracy metric
cost = fluid.layers.cross_entropy(input=model, label=label)
avg_cost = fluid.layers.mean(cost)
acc = fluid.layers.accuracy(input=model, label=label)

Get Test Program
Clone the main program, before the optimizer is attached, to obtain a program for evaluation on the test set.

# Create test program for evaluation
test_program = fluid.default_main_program().clone(for_test=True)

Define Optimization Method
Adam optimizer with a learning rate of 1e-3 is used.

# Define optimization method
optimizer = fluid.optimizer.AdamOptimizer(learning_rate=1e-3)
opts = optimizer.minimize(avg_cost)

Load CIFAR Dataset
Use PaddlePaddle's built-in reader for CIFAR10, the 10-class version of the CIFAR dataset.

# Load CIFAR10 dataset
train_reader = paddle.batch(cifar.train10(), batch_size=32)
test_reader = paddle.batch(cifar.test10(), batch_size=32)
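
A side note: the reader above yields training samples in a fixed order, and shuffling the training data usually helps convergence. A minimal sketch using paddle.reader.shuffle; the buffer size of 50000 (the full CIFAR10 training set) is an illustrative choice and not part of the original programs:

# Optional: shuffle training samples within a buffer before batching
train_reader = paddle.batch(
    paddle.reader.shuffle(cifar.train10(), buf_size=50000),
    batch_size=32)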

Create Executor
For large networks like ResNet, GPU is strongly recommended for faster training (CPU will be very slow).

# Create executor (GPU recommended for speed)
# place = fluid.CPUPlace()
place = fluid.CUDAPlace(0)
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())  # Initialize parameters

Loading the Model

After creating the executor, you can load a previously trained model in two ways, corresponding to two of the three saving methods (the inference model saved with fluid.io.save_inference_model is used for prediction, not for resuming training):

  • save_use_params_model.py loads the parameter model saved with fluid.io.save_params and uses it to initialize the network before continuing training:
# Load pre-trained parameter model
save_path = 'models/params_model/'
if os.path.exists(save_path):
    print('Using parameter model as pre-trained weights')
    fluid.io.load_params(executor=exe, dirname=save_path)
  • save_use_persistables_model.py loads the persistable-variables model saved with fluid.io.save_persistables and uses it to initialize the network before continuing training (see the sketch after this list):
# Load pre-trained persistable variables model
save_path = 'models/persistables_model/'
if os.path.exists(save_path):
    print('Using persistable variables model as pre-trained weights')
    fluid.io.load_persistables(executor=exe, dirname=save_path)
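
The difference between the two methods: fluid.io.save_params stores only the network parameters, while fluid.io.save_persistables stores all persistable variables, i.e. the parameters plus optimizer state such as Adam's moment accumulators. A minimal sketch for inspecting both sets in the current program (illustrative, not part of the three programs):

# List the variables each saving method would cover
prog = fluid.default_main_program()
param_names = [v.name for v in prog.list_vars()
               if isinstance(v, fluid.framework.Parameter)]
persistable_names = [v.name for v in prog.list_vars() if v.persistable]
# Parameters are a subset of persistable variables; the extra entries
# are optimizer state (e.g. Adam moments) and other bookkeeping variables
print(len(param_names), len(persistable_names))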

Start Training
Whether or not a pre-trained model was loaded, training proceeds in the same way:

# Define data feeder
feeder = fluid.DataFeeder(place=place, feed_list=[image, label])

for pass_id in range(10):
    # Training
    for batch_id, data in enumerate(train_reader()):
        train_cost, train_acc = exe.run(
            program=fluid.default_main_program(),
            feed=feeder.feed(data),
            fetch_list=[avg_cost, acc]
        )
        if batch_id % 100 == 0:
            print(f'Pass:{pass_id}, Batch:{batch_id}, Cost:{train_cost[0]:.5f}, Accuracy:{train_acc[0]:.5f}')

    # Testing
    test_accs, test_costs = [], []
    for batch_id, data in enumerate(test_reader()):
        test_cost, test_acc = exe.run(
            program=test_program,
            feed=feeder.feed(data),
            fetch_list=[avg_cost, acc]
        )
        test_accs.append(test_acc[0])
        test_costs.append(test_cost[0])
    test_cost = sum(test_costs)/len(test_costs)
    test_acc = sum(test_accs)/len(test_accs)
    print(f'Test: Pass:{pass_id}, Cost:{test_cost:.5f}, Accuracy:{test_acc:.5f}')

Training Outputs

  • Without loading a pre-trained model:
Pass:0, Batch:0, Cost:2.73460, Accuracy:0.03125
Pass:0, Batch:100, Cost:1.93663, Accuracy:0.25000
...
  • With parameter model:
Using parameter model as pre-trained weights
Pass:0, Batch:0, Cost:0.27627, Accuracy:0.90625
Pass:0, Batch:100, Cost:0.40026, Accuracy:0.87500
...
  • With persistable variables model:
Using persistable variables model as pre-trained weights
Pass:0, Batch:0, Cost:0.51357, Accuracy:0.81250
Pass:0, Batch:100, Cost:0.64380, Accuracy:0.78125
...

Saving the Model

After training finishes, save the model; a checkpoint can also be saved after each pass (see the sketch after this list). The three saving methods correspond to the three programs:

  • save_infer_model.py saves the inference model for prediction:
# Save inference model for prediction
save_path = 'models/infer_model/'
shutil.rmtree(save_path, ignore_errors=True)  # Remove old files
os.makedirs(save_path)
fluid.io.save_inference_model(save_path, feeded_var_names=[image.name], target_vars=[model], executor=exe)
  • save_use_params_model.py saves parameter model for initialization during training:
# Save parameter model for future training
save_path = 'models/params_model/'
shutil.rmtree(save_path, ignore_errors=True)
os.makedirs(save_path)
fluid.io.save_params(executor=exe, dirname=save_path)
  • save_use_persistables_model.py saves persistable variables model for initialization:
# Save persistable variables model
save_path = 'models/persistables_model/'
shutil.rmtree(save_path, ignore_errors=True)
os.makedirs(save_path)
fluid.io.save_persistables(executor=exe, dirname=save_path)
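
As mentioned above, you can also save a checkpoint after each pass instead of only at the end of training. A minimal sketch, assuming it is placed at the end of the training loop body; the path pattern is an illustrative choice:

# Illustrative: save a persistables checkpoint at the end of every pass,
# so training can resume from the latest epoch after an interruption
save_path = 'models/checkpoints/pass_%d/' % pass_id
shutil.rmtree(save_path, ignore_errors=True)
os.makedirs(save_path)
fluid.io.save_persistables(executor=exe, dirname=save_path)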

Prediction

To predict using the model saved via fluid.io.save_inference_model, use the following use_infer_model.py:

Import required libraries

import paddle.fluid as fluid
from PIL import Image
import numpy as np

Create Executor
Prediction is much lighter than training, so the CPU is sufficient here.

# Create executor (CPU recommended for prediction)
place = fluid.CPUPlace()
exe = fluid.Executor(place)

Load Inference Model
Load the saved model to get the prediction program and output layers.

# Load inference model
save_path = 'models/infer_model/'
[infer_program, feeded_var_names, target_var] = fluid.io.load_inference_model(dirname=save_path, executor=exe)

Image Preprocessing
Convert the input image to the format required by PaddlePaddle.

# Preprocess image to CIFAR10 format
def load_image(file):
    im = Image.open(file)
    im = im.resize((32, 32), Image.ANTIALIAS)
    im = np.array(im).astype(np.float32)
    # Convert HWC (PIL) to CHW (PaddlePaddle)
    im = im.transpose((2, 0, 1))
    # Original CIFAR10 uses BGR, so swap RGB to BGR
    im = im[(2, 1, 0), :, :]  # BGR
    im = im / 255.0
    im = np.expand_dims(im, axis=0)
    return im

Predict with the Model
Unlike in training, no label needs to be fed: save_inference_model pruned the program so that only the image input remains.

# Load image and predict
img = load_image('image/cat.png')
result = exe.run(
    program=infer_program,
    feed={feeded_var_names[0]: img},
    fetch_list=target_var
)

# Output the result
# result is a list holding one array of shape (1, 10) with class probabilities;
# argsort is ascending, so the last index is the most probable class
lab = np.argsort(result)[0][0][-1]
names = ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
print(f'Prediction: Label {lab}, Name {names[lab]}, Probability {result[0][0][lab]:.6f}')

Prediction Output:

Prediction: Label 3, Name cat, Probability 0.864919

Conclusion

This covers model saving and usage in PaddlePaddle. In the next chapter, we will introduce transfer learning using pre-trained models.

GitHub Code: https://github.com/yeyupiaoling/LearnPaddle2/tree/master/note8

Previous Chapter: Reinforcement Learning
Next Chapter: Transfer Learning
