Original blog: Doi Tech Team
Link: https://blog.doiduoyi.com/authors/1584446358138
Original intention: Record the excellent learning experiences of the Doi Tech Team

*This article is based on PaddlePaddle 0.11.0 and Python 2.7

Introduction¶

Before reading this article, you should read the previous article Object Detection Implementation Using VOC Dataset, as most of the code and dataset formats used in this article are derived from the previous one. This article introduces how to perform object detection using a custom image dataset.

Dataset Introduction¶

The dataset for this article consists of license plates photographed in natural scenes. In the article End-to-End Recognition of License Plates, we located and cropped license plates by chaining several OpenCV image-processing steps, but the localization results were poor. In this article, we try to locate the license plate with a neural network instead.

Download License Plates¶

First, we download license plate images from Baidu Image Search to use as training data. The core code snippet is as follows:

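def start_download(self):
    self.download_sum = 0
    # Value of the gsm query parameter sent with every request
    gsm = 80
    str_gsm = str(gsm)
    if not os.path.exists(self.save_path):
        os.makedirs(self.save_path)
    # The loop ends only if downloadImages() advances self.download_sum
    while self.download_sum < self.download_max:
        # Page offset: continue from the number of images downloaded so far
        str_pn = str(self.download_sum)
        url = 'http://image.baidu.com/search/flip?tn=baiduimage&ie=utf-8&' \
              'word=' + self.key_word + '&pn=' + str_pn + '&gsm=' + str_gsm + '&ct=&ic=0&lm=-1&width=0&height=0'
        print url
        result = requests.get(url)
        # Extract the image links from the result page and download them
        self.downloadImages(result.text)
    print 'Download completed'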

Rename Images¶

Downloaded images are stored in data/plate_number/images/. Some of them may not show license plates, so delete those by hand first. To match the VOC dataset format, we then rename the remaining images to six-digit sequential numbers:

# coding=utf-8
import os

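def rename(images_dir):
    # Get all images in a stable order (os.listdir returns them in arbitrary order)
    images = sorted(os.listdir(images_dir))
    i = 1
    for image in images:
        src_name = os.path.join(images_dir, image)
        # Name with six-digit numbers (VOC dataset format)
        name = '%06d.jpg' % i
        dst_name = os.path.join(images_dir, name)
        # Note: on Linux, os.rename silently overwrites an existing dst_name
        os.rename(src_name, dst_name)
        i += 1
    print 'Renaming completed'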

if __name__ == '__main__':
    # Path to the directory containing images to be renamed
    images_dir = '../data/plate_number/images/'
    rename(images_dir)
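
If the directory already contains files with six-digit names (for example after a second download run), renaming in a single pass can overwrite a file that has not been processed yet. The following is a minimal sketch of a two-pass variant that avoids this. The function name rename_sequentially and the temporary-name scheme are illustrative and not part of the original project; the sketch assumes the directory contains only image files, and it is written to run on Python 3 as well as Python 2.7.

```python
import os
import uuid


def rename_sequentially(images_dir):
    """Rename every file in images_dir to 000001.jpg, 000002.jpg, ... in two passes."""
    # Sort so the numbering is reproducible; os.listdir order is arbitrary.
    names = sorted(os.listdir(images_dir))
    # Pass 1: move each file to a unique temporary name, so that no final
    # name can collide with a file that has not been processed yet.
    tag = uuid.uuid4().hex
    temp_names = []
    for index, name in enumerate(names, start=1):
        temp_name = 'tmp_%s_%06d' % (tag, index)
        os.rename(os.path.join(images_dir, name),
                  os.path.join(images_dir, temp_name))
        temp_names.append(temp_name)
    # Pass 2: assign the final six-digit names (VOC naming convention).
    final_names = []
    for index, temp_name in enumerate(temp_names, start=1):
        final_name = '%06d.jpg' % index
        os.rename(os.path.join(images_dir, temp_name),
                  os.path.join(images_dir, final_name))
        final_names.append(final_name)
    return final_names
```

Like the script above, this only changes file names; it does not convert images of other formats to JPEG.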

Annotate the Dataset¶

The images are now downloaded and renamed, but we still need annotations. In the VOC dataset, each image's annotations are stored in an XML file with the same base name as the image (the file name without its extension). We use LabelImg to create these annotation files.

Install LabelImg¶

Installation on Ubuntu 16.04 is straightforward with the following commands:

# Get root privileges
sudo su
# Install dependencies
apt-get install pyqt4-dev-tools
pip install lxml
# Install LabelImg
pip install labelImg
# Exit root privileges
exit
# Run LabelImg
labelImg

Use LabelImg¶

After running the program, the interface appears as follows:

Click Open Dir to select the image directory data/plate_number/images/, and the program will display the images:

Before annotating, set the save directory for annotation files by clicking Change Save Dir and selecting data/plate_number/annotation/. Then click Create RectBox to mark the license plate and label it plate_number. Finally, save the annotation by clicking Save, which will generate an XML file named after the image. Repeat for other images.

The annotation XML file should look like this (compatible with VOC format):

<annotation>
    <folder>images</folder>
    <filename>000001.jpg</filename>
    <path>/home/yeyupiaoling/data/plate_number/images/000001.jpg</path>
    <source>
        <database>Unknown</database>
    </source>
    <size>
        <width>750</width>
        <height>562</height>
        <depth>3</depth>
    </size>
    <segmented>0</segmented>
    <object>
        <name>plate_number</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>225</xmin>
            <ymin>298</ymin>
            <xmax>560</xmax>
            <ymax>405</ymax>
        </bndbox>
    </object>
</annotation>
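
To check that an annotation file contains what you expect, you can read it back with Python's standard xml.etree.ElementTree module. The sketch below extracts the image size and every labeled box from the XML text. The function name read_voc_boxes is illustrative and not part of the original project, and the sketch assumes the file follows the layout shown above.

```python
import xml.etree.ElementTree as ET


def read_voc_boxes(xml_text):
    """Return (width, height, boxes); boxes holds (name, xmin, ymin, xmax, ymax) tuples."""
    root = ET.fromstring(xml_text)
    size = root.find('size')
    width = int(size.find('width').text)
    height = int(size.find('height').text)
    boxes = []
    # One <object> element per annotated rectangle
    for obj in root.findall('object'):
        bndbox = obj.find('bndbox')
        coords = [int(bndbox.find(tag).text)
                  for tag in ('xmin', 'ymin', 'xmax', 'ymax')]
        boxes.append((obj.find('name').text,) + tuple(coords))
    return width, height, boxes
```

For the annotation shown above, this returns a 750 x 562 image with a single plate_number box at (225, 298, 560, 405).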

Generate Image Lists¶

We need two image lists: trainval.txt (for training) and test.txt (for testing). Because our directory layout differs from that of the VOC dataset, the list-generation program from the previous article cannot be reused as is, so we write a new one.

First, read all images and their corresponding annotations:

for image in all_images:
    # The annotation file shares the image's base name
    name = image.split('.')[0]
    annotation = os.path.join(annotation_path, name + '.xml')
    # Skip images that have no annotation file
    if not os.path.exists(annotation):
        continue
    item = [os.path.join(images_path, image), annotation]
    if data_num % 10 == 0:
        # Use every 10th annotated image as test set
        test_list.append(item)
    else:
        # Use remaining images as training set
        trainval_list.append(item)
    data_num += 1

Shuffle the training data and save to files:

# Shuffle training data
random.shuffle(trainval_list)
# Save training list
with open(os.path.join(output_dir, 'trainval.txt'), 'w') as ftrainval:
    for item in trainval_list:
        ftrainval.write(item[0] + ' ' + item[1] + '\n')
# Save test list
with open(os.path.join(output_dir, 'test.txt'), 'w') as ftest:
    for item in test_list:
        ftest.write(item[0] + ' ' + item[1] + '\n')
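
The split rule used above (every 10th annotated image goes to the test list, the rest to the training list) can be isolated as a small pure function, which makes the resulting proportions easy to verify. This is only an illustrative sketch; split_every_nth is a hypothetical helper, not a function from the original project.

```python
def split_every_nth(items, n=10):
    """Send items at positions 0, n, 2n, ... to the test list and the rest to trainval."""
    trainval, test = [], []
    for index, item in enumerate(items):
        if index % n == 0:
            test.append(item)
        else:
            trainval.append(item)
    return trainval, test
```

With 25 annotated images, for example, this puts 3 images in the test list and 22 in the training list.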

Train the Model¶

With image data, annotations, and image lists, we can start training. Before training, modify the configuration file pascal_voc_conf.py to set the number of classes to 2 (license plate + background):

# Number of image classes
__C.CLASS_NUM = 2

Pre-trained Model Handling¶

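Training from scratch may fail with floating-point errors, so we start from a pre-trained model (download it from the official pre-trained model link). Because that model was trained with a different number of classes, remove the parameter files whose names contain "mbox" from the model archive before using it.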
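
Removing those files by hand works, but it can also be scripted. The following is a minimal sketch using Python's standard tarfile module. It assumes the pre-trained model is a tar archive (optionally gzip-compressed) with one member per parameter, and that a gzip-compressed tar is what the training script expects as init_model_path; neither assumption is confirmed by this article. The function name strip_mbox_parameters is hypothetical.

```python
import tarfile


def strip_mbox_parameters(src_path, dst_path, keyword='mbox'):
    """Copy a tar archive to dst_path, skipping members whose names contain keyword."""
    removed = []
    # 'r:*' lets tarfile detect the compression of the source archive;
    # the copy is written as a gzip-compressed tar (an assumption, see above).
    with tarfile.open(src_path, 'r:*') as src, tarfile.open(dst_path, 'w:gz') as dst:
        for member in src.getmembers():
            if keyword in member.name:
                removed.append(member.name)
                continue
            # Only regular files carry data; other entries are copied as headers.
            fileobj = src.extractfile(member) if member.isfile() else None
            dst.addfile(member, fileobj)
    return removed
```

The source and destination paths must differ. The function returns the names of the members it skipped, so you can print them to confirm that only the "mbox" parameters were removed.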

Start Training¶

The script below trains on 2 GPUs (trainer_count=2), which requires a CUDA environment. train_file_list is the training list, dev_file_list is the test list, and init_model_path is the pre-trained model:

if __name__ == "__main__":
    # Initialize PaddlePaddle
    paddle.init(use_gpu=True, trainer_count=2)
    # Set data parameters
    data_args = data_provider.Settings(
        data_dir='../data',
        label_file='../data/label_list',
        resize_h=cfg.IMG_HEIGHT,
        resize_w=cfg.IMG_WIDTH,
        mean_value=[104, 117, 124])
    # Start training
    train(
        train_file_list='../data/trainval.txt',
        dev_file_list='../data/test.txt',
        data_args=data_args,
        init_model_path='../models/vgg_model.tar.gz')

Sample training logs:

Pass 0, Batch 0, TrainCost 16.567970, Detection mAP=0.014627
......
Test with Pass 0, TestCost: 8.723172, Detection mAP=0.00609719

Pass 1, Batch 0, TrainCost 7.185760, Detection mAP=0.239866
......
Test with Pass 1, TestCost: 6.301503, Detection mAP=60.357

Pass 2, Batch 0, TrainCost 6.052617, Detection mAP=32.094097
......
Test with Pass 2, TestCost: 5.375503, Detection mAP=48.9882

Evaluate the Model¶

To evaluate the trained model, use the test dataset:

if __name__ == "__main__":
    paddle.init(use_gpu=True, trainer_count=2)
    # Set data parameters
    data_args = data_provider.Settings(
        data_dir='../data',
        label_file='../data/label_list',
        resize_h=cfg.IMG_HEIGHT,
        resize_w=cfg.IMG_WIDTH,
        mean_value=[104, 117, 124])
    # Start evaluation
    eval(eval_file_list='../data/test.txt',
         batch_size=4,
         data_args=data_args,
         model_path='../models/params_pass.tar.gz')

Sample evaluation output:

TestCost: 1.813083, Detection mAP=90.5595

Predict Data¶

Get Prediction Data¶

Download test images and place them in images/infer/, with paths listed in images/infer.txt:

infer/000001.jpg
infer/000002.jpg
infer/000003.jpg
infer/000004.jpg
infer/000005.jpg
infer/000006.jpg

Get Prediction Results¶

Use the prediction function to generate results saved in images/infer.res:

if __name__ == "__main__":
    paddle.init(use_gpu=True, trainer_count=2)
    # Set data parameters
    data_args = data_provider.Settings(
        data_dir='../images',
        label_file='../data/label_list',
        resize_h=cfg.IMG_HEIGHT,
        resize_w=cfg.IMG_WIDTH,
        mean_value=[104, 117, 124])
    # Start prediction (batch_size=1 to avoid data loss)
    infer(
        eval_file_list='../images/infer.txt',
        save_path='../images/infer.res',
        data_args=data_args,
        batch_size=1,
        model_path='../models/params_pass.tar.gz',
        threshold=0.3)

Each line of the prediction results contains the following fields: image path, label, score, xmin, ymin, xmax, ymax.

infer/000001.jpg        0       0.9999114       357.44736313819885 521.2164137363434 750.5996704101562 648.5584638118744
infer/000002.jpg        0       0.9970805       102.86840772628784 94.18213963508606 291.60091638565063 155.58562874794006
...
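
A result line can be split into typed values with a few lines of Python. The sketch below assumes that the image path, label, score, and coordinate group are separated by tab characters and that the four coordinates are separated by single spaces, which matches how the display code below splits the fourth field. The function name parse_result_line is illustrative and not part of the original project.

```python
def parse_result_line(line):
    """Split one infer.res line into (image_path, label, score, box)."""
    # Assumed layout: path <TAB> label <TAB> score <TAB> "xmin ymin xmax ymax"
    image_path, label, score, coords = line.rstrip('\n').split('\t')
    xmin, ymin, xmax, ymax = [float(value) for value in coords.split(' ')]
    return image_path, int(label), float(score), (xmin, ymin, xmax, ymax)
```

Keeping only boxes above a chosen score is then a simple comparison on the third element of the returned tuple.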

Display Prediction Results¶

Use OpenCV to draw bounding boxes on images:

# Read images
for img_path in all_img_paths:
    im = cv2.imread('../images/' + img_path)
    # Draw a box for each prediction that belongs to this image
    for label_1 in all_labels:
        label_img_path = label_1[0]
        if img_path == label_img_path:
            # label_1[3] holds the four box coordinates separated by spaces
            xmin, ymin, xmax, ymax = [int(float(v)) for v in label_1[3].split(' ')]
            # Green rectangle, 3 pixels thick
            cv2.rectangle(im, (xmin, ymin), (xmax, ymax), (0, 255, 0), 3)
    # Save the annotated image under its original file name
    name = img_path.strip().split('/')[-1]
    cv2.imwrite('../images/result/%s' % name, im)

Run the main function to save annotated images to images/result/:

if __name__ == '__main__':
    img_path_list = '../images/infer.txt'
    result_data_path = '../images/infer.res'
    save_path = '../images/result'
    show(img_path_list, result_data_path, save_path)

Before prediction:

After prediction:

Previous chapter: Notes on My PaddlePaddle Learning - Part 9: Object Detection Using VOC Dataset ¶

Next chapter: Notes on My PaddlePaddle Learning - Part 11: Using the New Fluid Version ¶

Project Code¶

GitHub: https://github.com/yeyupiaoling/LearnPaddle

References¶

http://paddlepaddle.org/
https://github.com/tzutalin/labelImg