## Introduction
This tutorial shows how to implement face recognition in TensorFlow using MTCNN and MobileFaceNet; it does not cover model training. To train the models, refer to the MTCNN-Tensorflow and MobileFaceNet_TF tutorials. Both models are lightweight and predict quickly even on a CPU. As an advocate of lightweight models, I prioritize speed over accuracy when choosing a model, especially since my main focus is deploying deep learning models on mobile and embedded devices. The following sections demonstrate three face recognition implementations: local images, camera capture, and an HTTP service.
Tutorial Source Code: https://github.com/yeyupiaoling/Tensorflow-FaceRecognition
## Local Image Face Recognition
Local image face recognition reads images from a local path for face registration or recognition; the corresponding code is in path_infer.py. First, load the two models: MTCNN for face and key-point detection, and MobileFaceNet for face recognition. Loading each model is wrapped in a utility function for convenience.
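The utility functions themselves are not listed in the tutorial. For TensorFlow 1.x, loading a frozen MobileFaceNet graph might look like the sketch below; the model path and the tensor names input:0 and embeddings:0 are assumptions that depend on how the graph was exported.

```python
import tensorflow as tf

def load_mobilefacenet(model_path='models/mobilefacenet.pb'):
    # Sketch only: the path and tensor names are assumptions
    graph = tf.Graph()
    with graph.as_default():
        graph_def = tf.GraphDef()
        with tf.gfile.GFile(model_path, 'rb') as f:
            graph_def.ParseFromString(f.read())
        tf.import_graph_def(graph_def, name='')
    sess = tf.Session(graph=graph)
    inputs_placeholder = graph.get_tensor_by_name('input:0')
    embeddings = graph.get_tensor_by_name('embeddings:0')
    return sess, inputs_placeholder, embeddings
```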
The add_faces() function reads manually added images from the temp folder into the face database. Suppose you have 100 images named after the people in them: they cannot be copied into the face database directly, because the database only stores faces already detected and aligned by MTCNN. For bulk additions, place the images in the temp folder and let the program add them automatically. Finally, load_faces() reads each image in the face database, predicts its feature vector with MobileFaceNet, and stores the features in a list for later comparison.
```python
# Load MTCNN face detection model
mtcnn_detector = load_mtcnn()
# Load MobileFaceNet face recognition model
face_sess, inputs_placeholder, embeddings = load_mobilefacenet()
# Add faces to database
add_faces(mtcnn_detector)
# Load registered faces from database
faces_db = load_faces(face_sess, inputs_placeholder, embeddings)
```
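load_faces() is likewise not shown. Below is a minimal sketch, assuming each file in face_db is an already-aligned 112×112 image named after the person; the dict layout is inferred from how faces_db is used later.

```python
import os
import cv2
import numpy as np
import sklearn.preprocessing

def load_faces(face_sess, inputs_placeholder, embeddings, face_db_path='face_db'):
    # Sketch only: builds the in-memory face database
    faces_db = []
    for file in os.listdir(face_db_path):
        name = os.path.splitext(file)[0]
        img = cv2.imdecode(np.fromfile(os.path.join(face_db_path, file), dtype=np.uint8), 1)
        # Same normalization as at prediction time: roughly [-1, 1]
        img = (img - 127.5) * 0.0078125
        emb = face_sess.run(embeddings, feed_dict={inputs_placeholder: img[np.newaxis, :]})
        emb = sklearn.preprocessing.normalize(emb).flatten()
        faces_db.append({"name": name, "feature": emb})
        print("loaded face: %s" % file)
    return faces_db
```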
Face registration reads an image from a path, detects the face with MTCNN, aligns it using the key points, crops and scales it to 112×112, and stores the result in the face database under the registered name.
```python
def face_register(img_path, name):
    # np.fromfile + imdecode handles paths with non-ASCII characters
    image = cv2.imdecode(np.fromfile(img_path, dtype=np.uint8), 1)
    faces, landmarks = mtcnn_detector.detect(image)
    if faces.shape[0] != 0:
        faces_sum = 0
        bbox = []
        points = []
        for i, face in enumerate(faces):
            # Column 4 of each detection is the confidence score
            if round(faces[i, 4], 6) > 0.95:
                bbox = faces[i, 0:4]
                points = landmarks[i, :].reshape((5, 2))
                faces_sum += 1
        if faces_sum == 1:
            # Align, crop and scale the face to 112x112, then save it
            nimg = face_preprocess.preprocess(image, bbox, points, image_size='112,112')
            cv2.imencode('.png', nimg)[1].tofile('face_db/%s.png' % name)
            print("Registration successful!")
        else:
            print('Registration failed: the image should contain exactly one face')
    else:
        print('Registration failed: the image should contain exactly one face')
```
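face_preprocess.preprocess() is not listed either. ArcFace-style pipelines typically implement it as a 5-point alignment: estimate a similarity transform from the detected landmarks to a fixed reference template, then warp the face into a 112×112 crop. A minimal sketch, assuming scikit-image is available:

```python
import cv2
import numpy as np
from skimage import transform as trans

# Reference landmark positions for a 112x112 crop (common ArcFace template)
REFERENCE_LANDMARKS = np.array([
    [38.2946, 51.6963],   # left eye
    [73.5318, 51.5014],   # right eye
    [56.0252, 71.7366],   # nose tip
    [41.5493, 92.3655],   # left mouth corner
    [70.7299, 92.2041]],  # right mouth corner
    dtype=np.float32)

def align_face(image, points):
    # Estimate the similarity transform from detected landmarks to the template
    tform = trans.SimilarityTransform()
    tform.estimate(points.astype(np.float32), REFERENCE_LANDMARKS)
    M = tform.params[0:2, :]  # 2x3 affine matrix for warpAffine
    return cv2.warpAffine(image, M, (112, 112), borderValue=0.0)
```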
Face recognition reads an image from a path, detects and aligns the faces with MTCNN, and predicts their features with MobileFaceNet. Each feature vector is compared against the face database to find the most similar registered face; if the best similarity exceeds the threshold, the face is labeled with that name, otherwise it is labeled unknown. Finally, bounding boxes and labels are drawn on the image.
```python
def face_recognition(img_path):
    image = cv2.imdecode(np.fromfile(img_path, dtype=np.uint8), 1)
    faces, landmarks = mtcnn_detector.detect(image)
    if faces.shape[0] != 0:
        faces_sum = 0
        for i, face in enumerate(faces):
            if round(faces[i, 4], 6) > 0.95:
                faces_sum += 1
        if faces_sum > 0:
            # Face information storage
            info_location = np.zeros(faces_sum)
            info_name = []
            probs = []
            # Extract faces from image
            input_images = np.zeros((faces.shape[0], 112, 112, 3))
            for i, face in enumerate(faces):
                if round(faces[i, 4], 6) > 0.95:
                    bbox = faces[i, 0:4]
                    points = landmarks[i, :].reshape((5, 2))
                    nimg = face_preprocess.preprocess(image, bbox, points, image_size='112,112')
                    # Normalize pixels to roughly [-1, 1]; 0.0078125 == 1/128
                    nimg = nimg - 127.5
                    nimg = nimg * 0.0078125
                    input_images[i, :] = nimg
            # Perform face recognition
            feed_dict = {inputs_placeholder: input_images}
            emb_arrays = face_sess.run(embeddings, feed_dict=feed_dict)
            emb_arrays = sklearn.preprocessing.normalize(emb_arrays)
            for i, embedding in enumerate(emb_arrays):
                embedding = embedding.flatten()
                temp_dict = {}
                # Compare with existing faces in database
                for com_face in faces_db:
                    ret, sim = feature_compare(embedding, com_face["feature"], 0.70)
                    temp_dict[com_face["name"]] = sim
                # Sort by similarity descending
                sorted_dict = sorted(temp_dict.items(), key=lambda d: d[1], reverse=True)
                if sorted_dict[0][1] > VERIFICATION_THRESHOLD:
                    name = sorted_dict[0][0]
                    probs.append(sorted_dict[0][1])
                    info_name.append(name)
                else:
                    probs.append(sorted_dict[0][1])
                    info_name.append("unknown")
            # Draw results on image
            for k in range(faces_sum):
                # Clamp the box to the image bounds
                x1, y1, x2, y2 = faces[k][0], faces[k][1], faces[k][2], faces[k][3]
                x1 = max(int(x1), 0)
                y1 = max(int(y1), 0)
                x2 = min(int(x2), image.shape[1])
                y2 = min(int(y2), image.shape[0])
                prob = '%.2f' % probs[k]
                label = "{}, {}".format(info_name[k], prob)
                # Draw the label with PIL so non-ASCII names render correctly
                cv2img = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
                pilimg = Image.fromarray(cv2img)
                draw = ImageDraw.Draw(pilimg)
                font = ImageFont.truetype('font/simfang.ttf', 18, encoding="utf-8")
                draw.text((x1, y1 - 18), label, (255, 0, 0), font=font)
                image = cv2.cvtColor(np.array(pilimg), cv2.COLOR_RGB2BGR)
                cv2.rectangle(image, (x1, y1), (x2, y2), (255, 0, 0), 2)
    cv2.imshow('image', image)
    cv2.waitKey(0)
    cv2.destroyAllWindows()
```
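feature_compare() is not shown in the tutorial. Since the embeddings are L2-normalized (sklearn.preprocessing.normalize above), a common choice is cosine similarity, which for unit vectors is just the dot product. A minimal sketch, with the (match, similarity) return convention inferred from the call site:

```python
import numpy as np

def feature_compare(feature1, feature2, threshold):
    # For L2-normalized vectors, the dot product equals cosine similarity
    sim = float(np.dot(feature1, feature2))
    return sim > threshold, sim
```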
The main block handles user interaction:
```python
if __name__ == '__main__':
    i = int(input("Select function: 1 for register, 2 for recognition: "))
    image_path = input("Enter image path: ")
    if i == 1:
        user_name = input("Enter registration name: ")
        face_register(image_path, user_name)
    elif i == 2:
        face_recognition(image_path)
    else:
        print("Invalid function selection")
```
Sample log output:
```
loaded face: 张伟.png
loaded face: 迪丽热巴.png
Select function: 1 for register, 2 for recognition: 1
Enter image path: test.png
Enter registration name: Yeyupiaoling
Registration successful!
```
## Camera-Based Face Recognition
camera_infer.py implements camera-based face recognition. As with the local-image version, first load the MTCNN and MobileFaceNet models, then add any faces from the temp folder to the database and load the registered faces.
Camera-based registration and recognition work on live camera frames. For registration, press 'y' to capture a frame; the detected face is processed and stored in the database. For recognition, faces are detected continuously and the results are displayed in real time.
```python
# Load MTCNN and MobileFaceNet models
mtcnn_detector = load_mtcnn()
face_sess, inputs_placeholder, embeddings = load_mobilefacenet()
add_faces(mtcnn_detector)
faces_db = load_faces(face_sess, inputs_placeholder, embeddings)
```
Camera face registration function:
```python
def face_register():
    print("Press 'y' to capture a face")
    cap = cv2.VideoCapture(0)
    while True:
        ret, frame = cap.read()
        if ret:
            cv2.imshow('camera', frame)
            if cv2.waitKey(1) & 0xFF == ord('y'):
                faces, landmarks = mtcnn_detector.detect(frame)
                if faces.shape[0] != 0:
                    faces_sum = 0
                    bbox = []
                    points = []
                    for i, face in enumerate(faces):
                        if round(faces[i, 4], 6) > 0.95:
                            bbox = faces[i, 0:4]
                            points = landmarks[i, :].reshape((5, 2))
                            faces_sum += 1
                    if faces_sum == 1:
                        nimg = face_preprocess.preprocess(frame, bbox, points, image_size='112,112')
                        user_name = input("Enter registration name: ")
                        cv2.imencode('.png', nimg)[1].tofile('face_db/%s.png' % user_name)
                        print("Registration successful!")
                    else:
                        print('Registration failed: image should contain exactly one face')
                else:
                    print('Registration failed: image should contain exactly one face')
                break
    cap.release()
    cv2.destroyAllWindows()
```
Camera face recognition function:
```python
def face_recognition():
    cap = cv2.VideoCapture(0)
    while True:
        ret, frame = cap.read()
        if ret:
            faces, landmarks = mtcnn_detector.detect(frame)
            if faces.shape[0] != 0:
                faces_sum = 0
                for i, face in enumerate(faces):
                    if round(faces[i, 4], 6) > 0.95:
                        faces_sum += 1
                if faces_sum > 0:
                    # Extract and preprocess faces
                    input_images = np.zeros((faces.shape[0], 112, 112, 3))
                    for i, face in enumerate(faces):
                        if round(faces[i, 4], 6) > 0.95:
                            bbox = faces[i, 0:4]
                            points = landmarks[i, :].reshape((5, 2))
                            nimg = face_preprocess.preprocess(frame, bbox, points, image_size='112,112')
                            # Normalize pixels to roughly [-1, 1]; 0.0078125 == 1/128
                            nimg = nimg - 127.5
                            nimg = nimg * 0.0078125
                            input_images[i, :] = nimg
                    # Perform recognition
                    feed_dict = {inputs_placeholder: input_images}
                    emb_arrays = face_sess.run(embeddings, feed_dict=feed_dict)
                    emb_arrays = sklearn.preprocessing.normalize(emb_arrays)
                    info_name = []
                    probs = []
                    for i, embedding in enumerate(emb_arrays):
                        embedding = embedding.flatten()
                        temp_dict = {}
                        for com_face in faces_db:
                            ret, sim = feature_compare(embedding, com_face["feature"], 0.70)
                            temp_dict[com_face["name"]] = sim
                        sorted_dict = sorted(temp_dict.items(), key=lambda d: d[1], reverse=True)
                        if sorted_dict[0][1] > VERIFICATION_THRESHOLD:
                            info_name.append(sorted_dict[0][0])
                            probs.append(sorted_dict[0][1])
                        else:
                            info_name.append("unknown")
                            probs.append(sorted_dict[0][1])
                    # Draw results on frame
                    for k in range(faces_sum):
                        x1, y1, x2, y2 = faces[k][0], faces[k][1], faces[k][2], faces[k][3]
                        x1 = max(int(x1), 0)
                        y1 = max(int(y1), 0)
                        x2 = min(int(x2), frame.shape[1])
                        y2 = min(int(y2), frame.shape[0])
                        prob = '%.2f' % probs[k]
                        label = "{}, {}".format(info_name[k], prob)
                        # Draw the label with PIL so non-ASCII names render correctly
                        cv2img = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
                        pilimg = Image.fromarray(cv2img)
                        draw = ImageDraw.Draw(pilimg)
                        font = ImageFont.truetype('font/simfang.ttf', 18, encoding="utf-8")
                        draw.text((x1, y1 - 18), label, (255, 0, 0), font=font)
                        frame = cv2.cvtColor(np.array(pilimg), cv2.COLOR_RGB2BGR)
                        cv2.rectangle(frame, (x1, y1), (x2, y2), (255, 0, 0), 2)
            cv2.imshow('camera', frame)
            if cv2.waitKey(1) & 0xFF == ord('q'):
                break
    cap.release()
    cv2.destroyAllWindows()
```
Main interaction:
```python
if __name__ == '__main__':
    i = int(input("Select function: 1 for register, 2 for recognition: "))
    if i == 1:
        face_register()
    elif i == 2:
        face_recognition()
    else:
        print("Invalid function selection")
```
Sample log output:
```
loaded face: 张伟.png
loaded face: 迪丽热巴.png
Select function: 1 for register, 2 for recognition: 1
Press 'y' to capture a face
Registration successful!
```
## HTTP Service-Based Face Recognition
server_main.py uses Flask to provide a web API. Cross-origin requests are enabled, and the service runs on localhost for browser compatibility. As before, load the MTCNN and MobileFaceNet models and the face database.
API endpoints:
- /register: Register a face by uploading an image and name.
- /recognition: Recognize a face from an uploaded image.
```python
app = Flask(__name__)
CORS(app)  # Enable cross-origin requests
# Load models and face database
mtcnn_detector = load_mtcnn()
face_sess, inputs_placeholder, embeddings = load_mobilefacenet()
faces_db = load_faces(face_sess, inputs_placeholder, embeddings)
```
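The service is started with Flask's app.run(). A minimal sketch follows; the tutorial only states that the service runs on localhost, so the port is an assumption:

```python
if __name__ == '__main__':
    # Port 5000 is an assumption (Flask's default); adjust as needed
    app.run(host='localhost', port=5000)
```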
Registration endpoint:
@app.route("/register", methods=['POST'])
def register():
global faces_db
upload_file = request.files['image']
user_name = request.values.get("name")
if upload_file:
try:
image = cv2.imdecode(np.frombuffer(upload_file.read(), np.uint8), cv2.IMREAD_UNCHANGED)
faces, landmarks = mtcnn_detector.detect(image)
if faces.shape[0] is not 0:
faces_sum = 0
for i, face in enumerate(faces):
if round(faces[i, 4], 6) > 0.95:
faces_sum += 1
if faces_sum == 1:
nimg = face_preprocess.preprocess(image, bbox, points, image_size='112,112')
cv2.imencode('.png', nimg)[1].tofile('face_db/%s.png' % user_name)
faces_db = load_faces(face_sess, inputs_placeholder, embeddings) # Update database
return json.dumps({"code": 0, "msg": "Registration successful"})
return json.dumps({"code": 3, "msg": "Image should contain exactly one face"})
except:
return json.dumps({"code": 2, "msg": "Invalid image format or no face detected"})
else:
return json.dumps({"code": 1, "msg": "No image uploaded"})
Recognition endpoint:
```python
@app.route("/recognition", methods=['POST'])
def recognition():
    upload_file = request.files['image']
    is_chrome_camera = request.values.get("is_chrome_camera")
    if upload_file:
        try:
            img = cv2.imdecode(np.frombuffer(upload_file.read(), np.uint8), cv2.IMREAD_UNCHANGED)
            # Images captured by the Chrome camera can fail to decode directly,
            # so they are re-encoded to PNG on disk and read back
            if is_chrome_camera == "True":
                cv2.imwrite('temp.png', img)
                img = cv2.imdecode(np.fromfile('temp.png', dtype=np.uint8), 1)
            # The rest of the endpoint mirrors face_recognition() above: detect faces,
            # compute embeddings, compare against faces_db, and return the matched
            # names and similarities as JSON.
        except Exception:
            return json.dumps({"code": 2, "msg": "Invalid image format or no face detected"})
    else:
        return json.dumps({"code": 1, "msg": "No image uploaded"})
```