Introduction

MTCNN (Multi-task Cascaded Convolutional Networks) combines face detection with facial landmark detection. It was proposed in 2016 by the Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences. It detects faces quickly and efficiently using three cascaded networks that follow a candidate-box-plus-classifier approach: P-Net quickly generates candidate windows, R-Net filters those candidates with high precision, and O-Net produces the final bounding boxes and facial landmarks. Like many convolutional neural network models for image tasks, MTCNN also uses image pyramids, bounding box regression, and non-maximum suppression (NMS).
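The image pyramid lets a network with a fixed 12×12 input find faces of different sizes: the image is rescaled repeatedly and P-Net runs on every level. The minimal sketch below computes those scales. It assumes a minimum face size of 20 pixels and a scale factor of 0.709 (values common in MTCNN implementations, not necessarily the ones this repository uses); `pyramid_scales` is a hypothetical helper name.

```python
def pyramid_scales(height, width, min_face_size=20, factor=0.709, net_size=12):
    """Return the resize scales at which P-Net is run (hypothetical helper).

    The first scale maps a face of min_face_size pixels onto P-Net's
    net_size x net_size input; each further level shrinks by `factor`
    (0.709 roughly halves the image area) until the shorter side of the
    resized image would fall below net_size.
    """
    scales = []
    scale = net_size / min_face_size
    min_side = min(height, width) * scale
    while min_side >= net_size:
        scales.append(scale)
        scale *= factor
        min_side *= factor
    return scales


if __name__ == "__main__":
    for s in pyramid_scales(480, 640):
        print(f"scale={s:.4f} -> {round(480 * s)}x{round(640 * s)}")
```

A smaller `min_face_size` adds larger pyramid levels and finds smaller faces, at the cost of more P-Net passes.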

Source Code: https://github.com/yeyupiaoling/Pytorch-MTCNN

Environment

PyTorch 1.8.1
Python 3.7

File Overview

- models/Loss.py: loss functions used by MTCNN (classification loss, bounding box loss, and landmark loss)
- models/PNet.py: PNet network structure
- models/RNet.py: RNet network structure
- models/ONet.py: ONet network structure
- utils/data_format_converter.py: merges multiple images into a single file
- utils/data.py: training data reader
- utils/utils.py: assorted utility functions
- train_PNet/generate_PNet_data.py: generates training data for PNet
- train_PNet/train_PNet.py: trains the PNet model
- train_RNet/generate_RNet_data.py: generates training data for RNet
- train_RNet/train_RNet.py: trains the RNet model
- train_ONet/generate_ONet_data.py: generates training data for ONet
- train_ONet/train_ONet.py: trains the ONet model
- infer_path.py: detects face positions and landmarks in an image given by its path and displays the results
- infer_camera.py: detects face positions and landmarks in camera frames and displays the results in real time
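models/Loss.py combines three losses. As a rough illustration of how the MTCNN paper weights them, the pure-Python sketch below sums a face/non-face cross-entropy, a squared-L2 box regression loss, and a squared-L2 landmark loss with per-stage weights (1, 0.5, 0.5) for P-Net and R-Net and (1, 0.5, 1) for O-Net. It also keeps only the hardest 70% of classification losses in each batch (online hard sample mining), applied here to the classification term only as an assumption. The label convention (1 positive, 0 negative, -1 part face, -2 landmark sample) and all function names are hypothetical; this is not the repository's Loss.py.

```python
import math

# Per-stage loss weights (det, box, landmark) reported in the MTCNN paper.
STAGE_WEIGHTS = {
    "pnet": (1.0, 0.5, 0.5),
    "rnet": (1.0, 0.5, 0.5),
    "onet": (1.0, 0.5, 1.0),
}

# Assumed sample labels: 1 positive, 0 negative, -1 part face, -2 landmark sample.
POS, NEG, PART, LANDMARK = 1, 0, -1, -2


def cls_loss(face_probs, labels, keep_ratio=0.7):
    """Face/non-face cross-entropy over positives and negatives only,
    averaged over the hardest `keep_ratio` fraction (online hard sample mining)."""
    losses = sorted(
        (-math.log(max(p if y == POS else 1.0 - p, 1e-10))
         for p, y in zip(face_probs, labels) if y in (POS, NEG)),
        reverse=True,
    )
    if not losses:
        return 0.0
    keep = max(1, int(len(losses) * keep_ratio))
    return sum(losses[:keep]) / keep


def l2_loss(preds, targets, labels, valid_labels):
    """Mean squared L2 distance over samples whose label is in valid_labels."""
    losses = [sum((a - b) ** 2 for a, b in zip(p, t))
              for p, t, y in zip(preds, targets, labels) if y in valid_labels]
    return sum(losses) / len(losses) if losses else 0.0


def total_loss(stage, face_probs, box_preds, box_targets,
               lmk_preds, lmk_targets, labels):
    """Weighted multi-task loss: boxes are supervised on positives and part
    faces, landmarks on landmark samples only."""
    w_det, w_box, w_lmk = STAGE_WEIGHTS[stage]
    return (w_det * cls_loss(face_probs, labels)
            + w_box * l2_loss(box_preds, box_targets, labels, {POS, PART})
            + w_lmk * l2_loss(lmk_preds, lmk_targets, labels, {LANDMARK}))
```

Masking by label is what lets one batch mix all four sample types: each sample only contributes to the losses it has ground truth for.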

Dataset Download

- WIDER Face: download the Training Images, extract them, and place the WIDER_train folder under dataset. Also download the Face annotations, extract them, and place the wider_face_train_bbx_gt.txt file in the dataset directory.
- Deep Convolutional Network Cascade for Facial Point Detection: download the Training set, extract it, and place the lfw_5590 and net_7876 folders under dataset.

After extraction, the dataset directory should contain the folders lfw_5590, net_7876, and WIDER_train, and the annotation files testImageList.txt, trainImageList.txt, and wider_face_train.txt.
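Before generating training data, a quick check of the dataset layout can save a failed run. The hypothetical helper below is not part of the repository; it only tests for the folder and file names listed in the previous paragraph and assumes it is run from the repository root.

```python
from pathlib import Path

# Entries the dataset directory should contain after extraction.
EXPECTED_DIRS = ["lfw_5590", "net_7876", "WIDER_train"]
EXPECTED_FILES = ["testImageList.txt", "trainImageList.txt", "wider_face_train.txt"]


def missing_entries(dataset_dir="dataset"):
    """Return the expected folders (with a trailing '/') and files that are absent."""
    root = Path(dataset_dir)
    missing = [d + "/" for d in EXPECTED_DIRS if not (root / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (root / f).is_file()]
    return missing


if __name__ == "__main__":
    problems = missing_entries()
    if problems:
        print("Missing from dataset/:", ", ".join(problems))
    else:
        print("dataset/ layout looks complete")
```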

Training Models

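Training involves three steps: training PNet, training RNet, and training ONet. Each step depends on the previous one, because RNet's training data is generated with the trained PNet, and ONet's training data is generated with the trained PNet and RNet.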

Step 1: Train PNet Model

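PNet (Proposal Network) is a fully convolutional network that serves as the region proposal network for face detection. After three convolutional layers, it uses a face classifier to decide whether each region is a face and applies bounding box regression to the candidate windows. Because it has no fully connected layers, it can run on images of any size, which is what allows it to scan every level of the image pyramid.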
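Because PNet is fully convolutional, running it on a larger image produces a map of face scores instead of a single score; each output cell corresponds to a 12×12 window in the resized image, shifted by 2 pixels from its neighbors. The sketch below assumes the commonly used P-Net layout (3×3 convolution, 2×2 max pooling with stride 2 and floor rounding as in PyTorch's default MaxPool2d, then two more 3×3 convolutions, all without padding); models/PNet.py may differ in padding or pooling details. The 0.6 score threshold and the helper names are illustrative assumptions.

```python
def pnet_output_size(side):
    """Spatial size of P-Net's score map for an input side length
    (assumed layout: conv3x3 -> maxpool 2x2/stride 2 -> conv3x3 -> conv3x3)."""
    side = side - 2   # conv 3x3, no padding
    side = side // 2  # max-pool 2x2, stride 2, floor rounding
    side = side - 2   # conv 3x3
    side = side - 2   # conv 3x3
    return side


def cells_to_boxes(face_map, scale, threshold=0.6, stride=2, cell_size=12):
    """Map score-map cells at or above `threshold` back to boxes
    (x1, y1, x2, y2, score) in original-image coordinates."""
    boxes = []
    for i, row in enumerate(face_map):
        for j, score in enumerate(row):
            if score >= threshold:
                # Cell (i, j) covers a 12x12 window starting at (2j, 2i) in the
                # resized image; dividing by `scale` undoes the pyramid resize.
                x1 = j * stride / scale
                y1 = i * stride / scale
                boxes.append((x1, y1, x1 + cell_size / scale,
                              y1 + cell_size / scale, score))
    return boxes
```

For example, a 12×12 input yields a single output cell, while a 24×24 input yields a 7×7 map.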

- cd train_PNet: switch to the train_PNet folder
- python3 generate_PNet_data.py: first generate the training data for PNet
- python3 train_PNet.py: start training the PNet model
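In the MTCNN paper, training crops are labeled by their best IoU with the ground-truth faces: above 0.65 is a positive, between 0.4 and 0.65 a part face, and below 0.3 a negative, while crops between 0.3 and 0.4 are discarded. The pure-Python sketch below illustrates that rule; generate_PNet_data.py may use different thresholds, and the function names are hypothetical.

```python
def iou(box, gt_boxes):
    """IoU between one box and each ground-truth box; boxes are (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    area = (x2 - x1) * (y2 - y1)
    result = []
    for gx1, gy1, gx2, gy2 in gt_boxes:
        iw = max(0.0, min(x2, gx2) - max(x1, gx1))
        ih = max(0.0, min(y2, gy2) - max(y1, gy1))
        inter = iw * ih
        union = area + (gx2 - gx1) * (gy2 - gy1) - inter
        result.append(inter / union if union > 0 else 0.0)
    return result


def label_crop(crop, gt_boxes, pos_thr=0.65, part_thr=0.4, neg_thr=0.3):
    """Label a crop by its best IoU with any ground-truth face.

    Returns "positive", "part", "negative", or None for ambiguous crops
    that should be discarded.
    """
    best = max(iou(crop, gt_boxes), default=0.0)
    if best >= pos_thr:
        return "positive"
    if best >= part_thr:
        return "part"
    if best < neg_thr:
        return "negative"
    return None
```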

Step 2: Train RNet Model

RNet (Refine Network) is a convolutional neural network that adds a fully connected layer compared to PNet, which lets it filter candidate windows more strictly. PNet produces a large number of candidate windows; RNet takes them as input, rejects low-quality candidates, and refines the remaining ones with bounding box regression and NMS.
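NMS keeps the highest-scoring box in each cluster of overlapping candidates and drops the rest. Below is a minimal greedy NMS in pure Python; the default IoU threshold of 0.7 is only an illustrative value, and the repository's own implementation may differ.

```python
def nms(boxes, iou_threshold=0.7):
    """Greedy non-maximum suppression.

    boxes: list of (x1, y1, x2, y2, score). Returns the kept boxes, highest
    score first. A box is dropped when its IoU with an already-kept box
    exceeds iou_threshold.
    """
    def area(b):
        return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])

    def iou(a, b):
        iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
        ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
        inter = iw * ih
        union = area(a) + area(b) - inter
        return inter / union if union > 0 else 0.0

    kept = []
    for box in sorted(boxes, key=lambda b: b[4], reverse=True):
        if all(iou(box, k) <= iou_threshold for k in kept):
            kept.append(box)
    return kept
```

Lower thresholds suppress more aggressively; in a cascade, each stage can use its own threshold.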

- cd train_RNet: switch to the train_RNet folder
- python3 generate_RNet_data.py: generate the training data for RNet using the trained PNet model
- python3 train_RNet.py: start training the RNet model
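When a trained stage feeds the next one, MTCNN implementations typically refine each candidate with the regression offsets, expand it to a square, then crop and resize it to the next network's input size (24×24 for R-Net in the paper). The sketch below shows the two box operations, assuming offsets are expressed as fractions of the box width and height, a common convention that this repository may or may not follow; cropping and resizing are left out, and the helper names are hypothetical.

```python
def calibrate(box, offsets):
    """Apply regression offsets (fractions of box width/height) to (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    dx1, dy1, dx2, dy2 = offsets
    return (x1 + dx1 * w, y1 + dy1 * h, x2 + dx2 * w, y2 + dy2 * h)


def to_square(box):
    """Grow the shorter side so the box becomes a square with the same center."""
    x1, y1, x2, y2 = box
    side = max(x2 - x1, y2 - y1)
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    return (cx - side / 2, cy - side / 2, cx + side / 2, cy + side / 2)
```

Squaring before cropping avoids distorting the face when the crop is resized to the network's square input.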

Step 3: Train ONet Model

ONet (Output Network) is a more complex convolutional neural network with one more convolutional layer than RNet. It uses more supervision to identify face regions, regresses the facial landmarks, and finally outputs five facial landmark points (two eyes, the nose, and two mouth corners).
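Landmark regression is usually done in coordinates relative to the face box rather than in raw pixels, so targets are comparable across faces of different sizes. A minimal sketch of that normalization and its inverse follows; the relative-coordinate convention is an assumption common to MTCNN implementations, and the function names are hypothetical.

```python
def normalize_landmarks(points, box):
    """Express (x, y) landmarks relative to a box (x1, y1, x2, y2);
    points inside the box land in [0, 1]."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    return [((px - x1) / w, (py - y1) / h) for px, py in points]


def denormalize_landmarks(rel_points, box):
    """Map normalized landmark predictions back to image coordinates."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    return [(x1 + rx * w, y1 + ry * h) for rx, ry in rel_points]


if __name__ == "__main__":
    box = (100, 50, 148, 98)
    # Left eye, right eye, nose, left mouth corner, right mouth corner.
    pts = [(112, 66), (136, 66), (124, 78), (115, 90), (133, 90)]
    rel = normalize_landmarks(pts, box)
    print(rel)
    print(denormalize_landmarks(rel, box))
```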

- cd train_ONet: switch to the train_ONet folder
- python3 generate_ONet_data.py: generate the training data for ONet using the trained PNet and RNet models
- python3 train_ONet.py: start training the ONet model
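Because each stage consumes the output of the previous one, the generate and train scripts of all three steps must run strictly in order. The hypothetical helper below (not part of the repository) runs them from the repository root and stops at the first failure; it assumes each script needs no command-line arguments, as the commands in the three steps suggest.

```python
import subprocess
import sys

# Order matters: RNet data comes from PNet detections, ONet data from PNet + RNet.
STAGES = [
    ("train_PNet", ["generate_PNet_data.py", "train_PNet.py"]),
    ("train_RNet", ["generate_RNet_data.py", "train_RNet.py"]),
    ("train_ONet", ["generate_ONet_data.py", "train_ONet.py"]),
]


def build_commands(python=sys.executable):
    """Return (working_dir, argv) pairs in the order the steps must run."""
    return [(folder, [python, script])
            for folder, scripts in STAGES for script in scripts]


def run_all():
    """Run every step; check=True raises on the first non-zero exit code,
    so a later stage never starts from missing or stale data."""
    for cwd, argv in build_commands():
        print(f"[{cwd}] {' '.join(argv)}")
        subprocess.run(argv, cwd=cwd, check=True)
```

Calling run_all() from the repository root runs the whole pipeline; passing each script's own directory as `cwd` mirrors the `cd` commands above.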

Inference

- python3 infer_path.py: detect face boxes and landmarks in an image given by its path and display the results
- python3 infer_camera.py: capture images from the camera, detect face boxes and landmarks, and display the results in real time
