Introduction

MTCNN, Multi-task convolutional neural network, combines face detection with facial landmark detection. It is divided into three network structures: P-Net, R-Net, and O-Net. Proposed by the Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences in 2016, this multi-task neural network model is used for face detection tasks. It mainly employs three cascaded networks with a candidate box plus classifier approach for fast and efficient face detection. These three cascaded networks are: P-Net (quickly generates candidate windows), R-Net (performs high-precision candidate window filtering), and O-Net (generates final bounding boxes and facial landmarks). Similar to many convolutional neural network models for image problems, MTCNN also uses techniques such as image pyramids, bounding box regression, and non-maximum suppression.

Source Code: https://github.com/yeyupiaoling/Pytorch-MTCNN

Environment

  • PyTorch 1.8.1
  • Python 3.7

File Introduction

  • models/Loss.py Loss functions used by MTCNN, including classification loss, bounding box loss, and landmark loss
  • models/PNet.py PNet network structure
  • models/RNet.py RNet network structure
  • models/ONet.py ONet network structure
  • utils/data_format_converter.py Merges multiple images into a single file
  • utils/data.py Training data reader
  • utils/utils.py Various utility functions
  • train_PNet/generate_PNet_data.py Generates training data for PNet
  • train_PNet/train_PNet.py Trains the PNet model
  • train_RNet/generate_RNet_data.py Generates training data for RNet
  • train_RNet/train_RNet.py Trains the RNet model
  • train_ONet/generate_ONet_data.py Generates training data for ONet
  • train_ONet/train_ONet.py Trains the ONet model
  • infer_path.py Predicts images using paths, detects face positions and key points in images, and displays results
  • infer_camera.py Predicts images, detects face positions and key points in real-time

Dataset Download

After extracting the datasets, the dataset directory should contain folders: lfw_5590, net_7876, WIDER_train, and annotation files: testImageList.txt, trainImageList.txt, wider_face_train.txt.

Training Models

Training the model involves three steps: training PNet, training RNet, and training ONet. Each step depends on the previous one.

Step 1: Train PNet Model

PNet (Proposal Network) is a fully convolutional network. It serves as a region proposal network for face detection. After three convolutional layers, the network uses a face classifier to determine if a region is a face and performs bounding box regression.


- cd train_PNet Switch to the train_PNet folder
- python3 generate_PNet_data.py First generate training data for PNet
- python3 train_PNet.py Start training the PNet model

Step 2: Train RNet Model

RNet (Refine Network) is a convolutional neural network with an additional fully connected layer compared to PNet, enabling stricter filtering of input data. After PNet, many prediction windows are generated; these are fed into RNet, which filters out low-quality candidates and optimizes predictions using Bounding-Box Regression and NMS.


- cd train_RNet Switch to the train_RNet folder
- python3 generate_RNet_data.py Generate training data for RNet using the trained PNet model
- python3 train_RNet.py Start training the RNet model

Step 3: Train ONet Model

ONet (Output Network) is a more complex convolutional neural network with an additional convolutional layer compared to RNet. It uses more supervision to identify facial regions and performs regression on facial landmarks, ultimately outputting five facial feature points.


- cd train_ONet Switch to the train_ONet folder
- python3 generate_ONet_data.py Generate training data for ONet using the trained PNet and RNet models
- python3 train_ONet.py Start training the ONet model

Inference

  • python3 infer_path.py Identify face boxes and key points in images using image paths and display results

  • python3 infer_camera.py Capture images from the camera, detect face boxes and key points, and display results in real-time

References

  1. https://github.com/AITTSMD/MTCNN-Tensorflow
  2. https://blog.csdn.net/qq_36782182/article/details/83624357
Xiaoye