Introduction

Many developers publish pre-trained models alongside their deep learning projects, and we can reuse these models in our own work. When a model was trained with the framework we already use, it can be loaded directly. When it comes from a different framework, it has to be converted first. This article shows how to convert a Caffe model to a PaddlePaddle Fluid model.

Environment Preparation

  • Install PaddlePaddle with pip:
pip install paddlepaddle
  • To install a specific version instead, choose the appropriate package from the installation guide below and install it with pip:
http://www.paddlepaddle.org/documentation/docs/zh/0.14.0/new_docs/beginners_guide/install/install_doc.html#id26
  • Clone the PaddlePaddle models source code:
git clone https://github.com/PaddlePaddle/models.git
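
Before moving on, it can help to confirm that the packages used in the rest of this walkthrough are importable. The snippet below is a minimal sketch and not part of the official setup: the helper name missing_packages is hypothetical, and the package list (paddle, numpy, PIL) is taken from the imports of the prediction program later in this article.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level module names that cannot be found on this system."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Package list assumed from the prediction program in this article
    missing = missing_packages(["paddle", "numpy", "PIL"])
    if missing:
        print("Missing packages: %s" % ", ".join(missing))
    else:
        print("All required packages are importable.")
```

If anything is reported as missing, install it with pip before continuing.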

Model Conversion

  • Navigate to the caffe2fluid directory under the cloned models:
cd models/fluid/image_classification/caffe2fluid/
  • Download the required Python dependency file:
cd proto/ && wget https://raw.githubusercontent.com/ethereon/caffe-tensorflow/master/kaffe/caffe/caffepb.py
  • Rename the downloaded file:
mv caffepb.py caffe_pb2.py
  • Obtain the Caffe model to convert (using the following open-source model as an example):
https://gist.github.com/ksimonyan/211839e770f7b538e2d8

First, download the network configuration file:

cd ../ && wget https://gist.githubusercontent.com/ksimonyan/211839e770f7b538e2d8/raw/ded9363bd93ec0c770134f4e387d8aaaaa2407ce/VGG_ILSVRC_16_layers_deploy.prototxt

Second, download the weight file:

wget http://www.robots.ox.ac.uk/~vgg/software/very_deep/caffe/VGG_ILSVRC_16_layers.caffemodel
  • Convert the Caffe model into a Fluid network definition and a weight file. The command below writes VGG16.py, the Python file that defines the PaddlePaddle network structure, and VGG16.npy, the weight file:
python convert.py VGG_ILSVRC_16_layers_deploy.prototxt \
        --caffemodel VGG_ILSVRC_16_layers.caffemodel \
        --data-output-path VGG16.npy \
        --code-output-path VGG16.py

During execution, the following information will be printed:

register layer[Axpy]
register layer[Flatten]
register layer[ArgMax]
register layer[Reshape]
register layer[ROIPooling]
register layer[PriorBox]
register layer[Permute]
register layer[DetectionOutput]
register layer[Normalize]
register layer[Select]
register layer[Crop]
register layer[Reduction]

------------------------------------------------------------
    WARNING: PyCaffe not found!
    Falling back to a pure protocol buffer implementation.
    * Conversions will be drastically slower.
------------------------------------------------------------

Type                 Name                                          Param               Output
----------------------------------------------------------------------------------------------
Data                 data                                             --    (10, 3, 224, 224)
Convolution          conv1_1                               (64, 3, 3, 3)   (10, 64, 224, 224)
Convolution          conv1_1                                       (64,)   (10, 64, 224, 224)
Convolution          conv1_2                              (64, 64, 3, 3)   (10, 64, 224, 224)
Convolution          conv1_2                                       (64,)   (10, 64, 224, 224)
Pooling              pool1                                            --   (10, 64, 112, 112)
Convolution          conv2_1                             (128, 64, 3, 3)  (10, 128, 112, 112)
Convolution          conv2_1                                      (128,)  (10, 128, 112, 112)
Convolution          conv2_2                            (128, 128, 3, 3)  (10, 128, 112, 112)
Convolution          conv2_2                                      (128,)  (10, 128, 112, 112)
Pooling              pool2                                            --    (10, 128, 56, 56)
Convolution          conv3_1                            (256, 128, 3, 3)    (10, 256, 56, 56)
Convolution          conv3_1                                      (256,)    (10, 256, 56, 56)
Convolution          conv3_2                            (256, 256, 3, 3)    (10, 256, 56, 56)
Convolution          conv3_2                                      (256,)    (10, 256, 56, 56)
Convolution          conv3_3                            (256, 256, 3, 3)    (10, 256, 56, 56)
Convolution          conv3_3                                      (256,)    (10, 256, 56, 56)
Pooling              pool3                                            --    (10, 256, 28, 28)
Convolution          conv4_1                            (512, 256, 3, 3)    (10, 512, 28, 28)
Convolution          conv4_1                                      (512,)    (10, 512, 28, 28)
Convolution          conv4_2                            (512, 512, 3, 3)    (10, 512, 28, 28)
Convolution          conv4_2                                      (512,)    (10, 512, 28, 28)
Convolution          conv4_3                            (512, 512, 3, 3)    (10, 512, 28, 28)
Convolution          conv4_3                                      (512,)    (10, 512, 28, 28)
Pooling              pool4                                            --    (10, 512, 14, 14)
Convolution          conv5_1                            (512, 512, 3, 3)    (10, 512, 14, 14)
Convolution          conv5_1                                      (512,)    (10, 512, 14, 14)
Convolution          conv5_2                            (512, 512, 3, 3)    (10, 512, 14, 14)
Convolution          conv5_2                                      (512,)    (10, 512, 14, 14)
Convolution          conv5_3                            (512, 512, 3, 3)    (10, 512, 14, 14)
Convolution          conv5_3                                      (512,)    (10, 512, 14, 14)
Pooling              pool5                                            --      (10, 512, 7, 7)
InnerProduct         fc6                                   (4096, 25088)           (10, 4096)
InnerProduct         fc6                                         (4096,)           (10, 4096)
Dropout              drop6                                            --           (10, 4096)
InnerProduct         fc7                                    (4096, 4096)           (10, 4096)
InnerProduct         fc7                                         (4096,)           (10, 4096)
Dropout              drop7                                            --           (10, 4096)
InnerProduct         fc8                                    (1000, 4096)           (10, 1000)
InnerProduct         fc8                                         (1000,)           (10, 1000)
Softmax              prob                                             --           (10, 1000)
Converting data...
Saving data...
Saving source...
set env variable before using converted model if used custom_layers:
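
The Output column in the log above can be reproduced by hand, which is a quick sanity check that the network was parsed as expected. The sketch below assumes the usual VGG16 layout: 3x3 convolutions with padding 1 and stride 1 (spatial size unchanged) and 2x2 pooling with stride 2 (spatial size halved). The padding and stride values are not printed in the log, so they are assumptions, and the helper names are hypothetical.

```python
BATCH = 10  # batch size shown in the log
# (output channels, number of convolutions) for each of the five blocks in the log
BLOCKS = [(64, 2), (128, 2), (256, 3), (512, 3), (512, 3)]


def conv_out(size, kernel=3, pad=1, stride=1):
    """Spatial output size of a convolution (assumed 3x3, pad 1, stride 1)."""
    return (size + 2 * pad - kernel) // stride + 1


def pool_out(size, kernel=2, stride=2):
    """Spatial output size of a pooling layer (assumed 2x2, stride 2)."""
    return (size - kernel) // stride + 1


def vgg16_shapes(size=224):
    """Walk the convolution and pooling layers and record each output shape."""
    shapes = {}
    for i, (channels, n_convs) in enumerate(BLOCKS, start=1):
        for j in range(1, n_convs + 1):
            size = conv_out(size)
            shapes["conv%d_%d" % (i, j)] = (BATCH, channels, size, size)
        size = pool_out(size)
        shapes["pool%d" % i] = (BATCH, channels, size, size)
    return shapes


shapes = vgg16_shapes()
# fc6 flattens pool5, so its input width is channels * height * width
fc6_inputs = shapes["pool5"][1] * shapes["pool5"][2] * shapes["pool5"][3]
print(shapes["pool5"], fc6_inputs)  # (10, 512, 7, 7) 25088
```

The value 25088 matches the second dimension of the fc6 parameter shape (4096, 25088) in the log.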
  • Generate the prediction model file using PaddlePaddle’s network structure and weight files:
python VGG16.py VGG16.npy ./fluid_models
  • After the command finishes, the prediction model is stored in the fluid_models directory as two files: model and params. These files follow the format written by the paddle.fluid.io.save_inference_model interface (see the PaddlePaddle documentation), so they can be loaded with paddle.fluid.io.load_inference_model. We will use this model for image prediction in the next step.
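
A quick way to confirm that the export step produced both files is to check the output directory before writing any prediction code. This is only a sketch: the helper name missing_files is hypothetical, and the file names model and params are the ones described above.

```python
import os


def missing_files(directory, names=("model", "params")):
    """Return the expected file names that are not present in the directory."""
    return [n for n in names if not os.path.isfile(os.path.join(directory, n))]


if __name__ == "__main__":
    missing = missing_files("./fluid_models")
    if missing:
        print("Missing from fluid_models: %s" % ", ".join(missing))
    else:
        print("fluid_models contains both model and params.")
```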

Testing the Prediction Model

To predict images using the converted model, first write a PaddlePaddle prediction program:

# coding=utf-8
from PIL import Image
import numpy as np
import paddle.fluid as fluid


def load_image(file):
    # Force 3-channel RGB so that grayscale or RGBA files also work
    im = Image.open(file).convert('RGB')
    # Image.LANCZOS is the same filter as the older Image.ANTIALIAS name
    im = im.resize((224, 224), Image.LANCZOS)
    im = np.array(im).astype(np.float32)
    # PIL returns images in H(Height), W(Width), C(Channels) order
    # PaddlePaddle requires CHW order, so we transpose dimensions
    im = im.transpose((2, 0, 1))
    # The Caffe VGG model expects BGR order, while PIL opens RGB, so swap channels
    im = im[(2, 1, 0), :, :]  # Convert to BGR
    # Subtract the per-channel mean, listed in BGR order to match the swap above
    mean = [103.94, 116.78, 123.68]
    mean = np.array(mean, dtype=np.float32)
    mean = mean[:, np.newaxis, np.newaxis]
    im -= mean

    return im


def infer_one(image_path, use_cuda, model_path, model_filename, params_filename):
    # Set execution device
    place = fluid.CUDAPlace(0) if use_cuda else fluid.CPUPlace()
    exe = fluid.Executor(place)

    inference_scope = fluid.core.Scope()
    with fluid.scope_guard(inference_scope):
        # Load the inference model
        [inference_program, feed_target_names, fetch_targets] = fluid.io.load_inference_model(
            model_path, exe, model_filename=model_filename, params_filename=params_filename)

        # Prepare input data
        test_datas = [load_image(image_path)]
        test_data = np.array(test_datas)

        # Execute prediction
        results = exe.run(
            inference_program,
            feed={feed_target_names[0]: test_data},
            fetch_list=fetch_targets)

        # Process results (sort in descending order)
        results = np.argsort(-results[0])
        result = results[0][0]

        print("Predicted label: %d" % result)


if __name__ == '__main__':
    image_path = "0b77aba2-9557-11e8-a47a-c8ff285a4317.jpg"
    use_cuda = False
    model_path = "fluid_models/"
    model_filename = "model"
    params_filename = "params"
    infer_one(image_path, use_cuda, model_path, model_filename, params_filename)

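This program handles image preprocessing and model inference. Note that the preprocessing (input size, channel order, and mean subtraction) must match what was used when the model was trained; otherwise the predictions will be unreliable.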
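
To see what the preprocessing does without needing an image file or PaddlePaddle, the steps can be applied to a synthetic array. This is a minimal sketch: the function name preprocess is hypothetical, and the mean values are the ones from the program above, listed here in BGR order on the assumption that they should line up with the channels after the swap.

```python
import numpy as np


def preprocess(rgb_hwc, mean_bgr=(103.94, 116.78, 123.68)):
    """Convert an (H, W, 3) RGB array to a mean-subtracted (3, H, W) BGR array."""
    chw = rgb_hwc.astype(np.float32).transpose((2, 0, 1))  # HWC -> CHW
    bgr = chw[::-1, :, :]                                  # RGB -> BGR
    mean = np.array(mean_bgr, dtype=np.float32)[:, np.newaxis, np.newaxis]
    return bgr - mean                                      # broadcast over H and W


# A synthetic, pure-red 224x224 image stands in for a real photo
red = np.zeros((224, 224, 3), dtype=np.uint8)
red[:, :, 0] = 255

out = preprocess(red)
print(out.shape)  # (3, 224, 224)
# After the swap, channel 0 is blue (0 - 103.94) and channel 2 is red (255 - 123.68)
print(out[:, 0, 0])
```

Running the same function on inputs with a known color is an easy way to confirm that the channel order and mean values are paired correctly.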

References

  1. https://github.com/PaddlePaddle/models/tree/develop/fluid/image_classification/caffe2fluid
  2. https://gist.github.com/ksimonyan/211839e770f7b538e2d8
Xiaoye