MediaPipe is a framework for building cross-platform, multimodal applied ML pipelines, covering fast ML inference, classical computer vision, and media processing (e.g., video decoding). A typical MediaPipe graph for object detection and tracking consists of four computational nodes: a PacketResampler calculator; a previously published ObjectDetection subgraph; an ObjectTracking subgraph that wraps a BoxTracking subgraph; and a Renderer subgraph that draws the visualization effects.

The ObjectDetection subgraph runs only on request, for example at a fixed low frame rate or when triggered by a specific signal. More specifically, in this example the PacketResampler calculator temporally subsamples the incoming video frames to 0.5 fps before passing them to ObjectDetection; the frame rate is configurable in PacketResampler. Running detection sparsely in this way reduces temporal jitter during recognition and keeps object IDs consistent across frames.
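
For illustration, a PacketResampler node in a MediaPipe graph config (.pbtxt) is typically declared roughly as sketched below; the stream names are placeholders here, and the 0.5 fps value matches the example above:

node {
  calculator: "PacketResamplerCalculator"
  input_stream: "DATA:input_video"
  output_stream: "DATA:throttled_input_video"
  node_options: {
    [type.googleapis.com/mediapipe.PacketResamplerCalculatorOptions] {
      frame_rate: 0.5
    }
  }
}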

MediaPipe open-source address: https://github.com/google/mediapipe

Step 1: Install the MediaPipe Framework

Install the required dependencies:

sudo apt-get update && sudo apt-get install -y build-essential git python zip adb openjdk-8-jdk

Install the Bazel build environment, since MediaPipe is built with Bazel:

curl -sLO --retry 5 --retry-max-time 10 \
https://storage.googleapis.com/bazel/2.0.0/release/bazel-2.0.0-installer-linux-x86_64.sh && \
sudo mkdir -p /usr/local/bazel/2.0.0 && \
chmod 755 bazel-2.0.0-installer-linux-x86_64.sh && \
sudo ./bazel-2.0.0-installer-linux-x86_64.sh --prefix=/usr/local/bazel/2.0.0 && \
source /usr/local/bazel/2.0.0/lib/bazel/bin/bazel-complete.bash

/usr/local/bazel/2.0.0/lib/bazel/bin/bazel version && \
alias bazel='/usr/local/bazel/2.0.0/lib/bazel/bin/bazel'
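
To make the bazel alias available in new shell sessions as well, you could append it to your ~/.bashrc (the path assumes the install prefix used above):

echo "alias bazel='/usr/local/bazel/2.0.0/lib/bazel/bin/bazel'" >> ~/.bashrc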

Install the adb tool. If you connect the device through a Windows host, install the same version of ADB there; the matching Windows build can be downloaded from: https://dl.google.com/android/repository/platform-tools_r26.0.1-windows.zip

sudo apt-get install android-tools-adb
adb version

# Android Debug Bridge version 1.0.39

Clone the MediaPipe source code.

git clone https://github.com/google/mediapipe.git
cd mediapipe

Install the OpenCV environment with the following command:

sudo apt-get install libopencv-core-dev libopencv-highgui-dev \
libopencv-calib3d-dev libopencv-features2d-dev \
libopencv-imgproc-dev libopencv-video-dev

Execute the following command to test if the environment is installed successfully:

export GLOG_logtostderr=1

bazel run --define MEDIAPIPE_DISABLE_GPU=1 \
mediapipe/examples/desktop/hello_world:hello_world

If the environment was set up successfully, output similar to the following will be printed:

I20200707 09:21:50.275205 16138 hello_world.cc:56] Hello World!
I20200707 09:21:50.276554 16138 hello_world.cc:56] Hello World!
I20200707 09:21:50.276665 16138 hello_world.cc:56] Hello World!
I20200707 09:21:50.276768 16138 hello_world.cc:56] Hello World!
I20200707 09:21:50.276887 16138 hello_world.cc:56] Hello World!
I20200707 09:21:50.277523 16138 hello_world.cc:56] Hello World!
I20200707 09:21:50.278563 16138 hello_world.cc:56] Hello World!
I20200707 09:21:50.279263 16138 hello_world.cc:56] Hello World!
I20200707 09:21:50.279850 16138 hello_world.cc:56] Hello World!
I20200707 09:21:50.280354 16138 hello_world.cc:56] Hello World!

Step 2: Compile the MediaPipe Android AAR Package

Execute the following script in the mediapipe root directory to install the Android SDK and NDK. During installation you need to accept the license agreement by entering y. After the script finishes, verify that the SDK and NDK have been downloaded to the specified directories.

chmod +x ./setup_android_sdk_and_ndk.sh
bash ./setup_android_sdk_and_ndk.sh ~/Android/Sdk ~/Android/Ndk r18b

If you encounter a $'\r': command not found error (common when the repository was cloned on Windows, which leaves CRLF line endings in the script), convert the file to Unix line endings in vim:

vim setup_android_sdk_and_ndk.sh
:set ff=unix
:wq
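
Alternatively, stripping the carriage returns with sed achieves the same result (run from the mediapipe root directory):

sed -i 's/\r$//' setup_android_sdk_and_ndk.sh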

Add the SDK and NDK environment variables. Based on the paths passed to the script above, the directories are as follows:

vim ~/.bashrc

Add the following lines (replace test with your actual username):

export ANDROID_HOME=/home/test/Android/Sdk
export ANDROID_NDK_HOME=/home/test/Android/Ndk/android-ndk-r18b

Execute source ~/.bashrc to apply the changes.

Create a build file for MediaPipe to generate an Android AAR:

cd mediapipe/examples/android/src/java/com/google/mediapipe/apps/
mkdir build_aar && cd build_aar
vim BUILD

The content of the BUILD file is as follows:

load("//mediapipe/java/com/google/mediapipe:mediapipe_aar.bzl", "mediapipe_aar")

mediapipe_aar(
    name = "mediapipe_hand_tracking",
    calculators = ["//mediapipe/graphs/hand_tracking:mobile_calculators"],
)
  • name: Name of the generated AAR.
  • calculators: The calculator targets (models and computation units) to bundle into the AAR. Other available graphs and their calculators can be found under the mediapipe/graphs/ directory; the hand_tracking directory contains the hand tracking graph. To see which calculator targets a graph exposes, check the cc_library targets in that directory's BUILD file. For Android deployment, choose the mobile calculators, as in the example below.
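
For example, to generate an AAR for the face detection graph instead, the rule might look like the sketch below (in a BUILD file that loads mediapipe_aar.bzl as above); this assumes the mobile_calculators target defined in mediapipe/graphs/face_detection/BUILD:

mediapipe_aar(
    name = "mediapipe_face_detection",
    calculators = ["//mediapipe/graphs/face_detection:mobile_calculators"],
)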

Return to the mediapipe root directory and execute the following command to generate the Android AAR file:

chmod -R 755 mediapipe/

bazel build -c opt --fat_apk_cpu=arm64-v8a,armeabi-v7a \
//mediapipe/examples/android/src/java/com/google/mediapipe/apps/build_aar:mediapipe_hand_tracking

The generated AAR file will be located at:

bazel-bin/mediapipe/examples/android/src/java/com/google/mediapipe/apps/build_aar/mediapipe_hand_tracking.aar

Generate the MediaPipe binary graph with the following command (replace the binary graph name as needed):

bazel build -c opt mediapipe/graphs/hand_tracking:hand_tracking_mobile_gpu_binary_graph

The generated binary graph file will be located at:

bazel-bin/mediapipe/graphs/hand_tracking/hand_tracking_mobile_gpu.binarypb

Step 3: Build the Android Project

  1. Create a new “TestMediaPipe” project in Android Studio.

  2. Copy the generated AAR file from the previous step to the app/libs/ directory:

   bazel-bin/mediapipe/examples/android/src/java/com/google/mediapipe/apps/build_aar/mediapipe_hand_tracking.aar
  3. Copy the following files to the app/src/main/assets/ directory:
   bazel-bin/mediapipe/graphs/hand_tracking/hand_tracking_mobile_gpu.binarypb
   mediapipe/models/handedness.txt
   mediapipe/models/hand_landmark.tflite
   mediapipe/models/palm_detection.tflite
   mediapipe/models/palm_detection_labelmap.txt
  4. Download the OpenCV SDK from:
   https://github.com/opencv/opencv/releases/download/3.4.3/opencv-3.4.3-android-sdk.zip

After unzipping, copy the arm64-v8a and armeabi-v7a directories from OpenCV-android-sdk/sdk/native/libs/ to the app/src/main/jniLibs/ directory of the Android project.

  5. Add dependencies in app/build.gradle:
   dependencies {
       implementation fileTree(dir: "libs", include: ["*.jar", '*.aar'])
       implementation 'androidx.appcompat:appcompat:1.1.0'
       implementation 'androidx.constraintlayout:constraintlayout:1.1.3'
       testImplementation 'junit:junit:4.13'
       androidTestImplementation 'androidx.test.ext:junit:1.1.1'
       androidTestImplementation 'androidx.test.espresso:espresso-core:3.2.0'
       // MediaPipe dependencies
       implementation 'com.google.flogger:flogger:0.3.1'
       implementation 'com.google.flogger:flogger-system-backend:0.3.1'
       implementation 'com.google.code.findbugs:jsr305:3.0.2'
       implementation 'com.google.guava:guava:27.0.1-android'
       implementation 'com.google.protobuf:protobuf-java:3.11.4'
       // CameraX core library
       implementation "androidx.camera:camera-core:1.0.0-alpha06"
       implementation "androidx.camera:camera-camera2:1.0.0-alpha06"
   }
   // Set Java version to 1.8 (compileOptions goes inside the android { } block of app/build.gradle)
   compileOptions {
       targetCompatibility = 1.8
       sourceCompatibility = 1.8
   }
  6. Add camera permissions in AndroidManifest.xml:
   <!-- For camera access -->
   <uses-permission android:name="android.permission.CAMERA" />
   <uses-feature android:name="android.hardware.camera" />
   <uses-feature android:name="android.hardware.camera.autofocus" />
   <!-- For MediaPipe -->
   <uses-feature android:glEsVersion="0x00020000" android:required="true" />
  7. Modify the layout and logic code in activity_main.xml and MainActivity.java:

activity_main.xml:

<?xml version="1.0" encoding="utf-8"?>
<LinearLayout xmlns:android="http://schemas.android.com/apk/res/android"
    xmlns:app="http://schemas.android.com/apk/res-auto"
    xmlns:tools="http://schemas.android.com/tools"
    android:layout_width="match_parent"
    android:layout_height="match_parent">

    <FrameLayout
        android:id="@+id/preview_display_layout"
        android:layout_width="match_parent"
        android:layout_height="match_parent">

        <TextView
            android:id="@+id/no_camera_access_view"
            android:layout_width="match_parent"
            android:layout_height="match_parent"
            android:gravity="center"
            android:text="Camera connection failed" />
    </FrameLayout>
</LinearLayout>

MainActivity.java:
```java
// (your package declaration goes here)

import android.graphics.SurfaceTexture;
import android.os.Bundle;
import android.util.Log;
import android.util.Size;
import android.view.SurfaceHolder;
import android.view.SurfaceView;
import android.view.View;
import android.view.ViewGroup;
import androidx.annotation.NonNull;
import androidx.appcompat.app.AppCompatActivity;
import com.google.mediapipe.components.CameraHelper;
import com.google.mediapipe.components.CameraXPreviewHelper;
import com.google.mediapipe.components.ExternalTextureConverter;
import com.google.mediapipe.components.FrameProcessor;
import com.google.mediapipe.components.PermissionHelper;
import com.google.mediapipe.formats.proto.LandmarkProto.NormalizedLandmark;
import com.google.mediapipe.formats.proto.LandmarkProto.NormalizedLandmarkList;
import com.google.mediapipe.framework.AndroidAssetUtil;
import com.google.mediapipe.framework.PacketGetter;
import com.google.mediapipe.glutil.EglManager;
import com.google.protobuf.InvalidProtocolBufferException;

public class MainActivity extends AppCompatActivity {
private static final String TAG = "MainActivity";

// Resource names and stream outputs
private static final String BINARY_GRAPH_NAME = "hand_tracking_mobile_gpu.binarypb";
private static final String INPUT_VIDEO_STREAM_NAME = "input_video";
private static final String OUTPUT_VIDEO_STREAM_NAME = "output_video";
private static final String OUTPUT_HAND_PRESENCE_STREAM_NAME = "hand_presence";
private static final String OUTPUT_LANDMARKS_STREAM_NAME = "hand_landmarks";

private SurfaceTexture previewFrameTexture;
private SurfaceView previewDisplayView;
private EglManager eglManager;
private FrameProcessor processor;
private ExternalTextureConverter converter;
private CameraXPreviewHelper cameraHelper;
private boolean handPresence;
private static final boolean USE_FRONT_CAMERA = false;
private static final boolean FLIP_FRAMES_VERTICALLY = true;

static {
    System.loadLibrary("mediapipe_jni");
    System.loadLibrary("opencv_java3");
}

@Override
protected void onCreate(Bundle savedInstanceState) {
    super.onCreate(savedInstanceState);
    setContentView(R.layout.activity_main);

    previewDisplayView = new SurfaceView(this);
    setupPreviewDisplayView();
    PermissionHelper.checkAndRequestCameraPermissions(this);
    AndroidAssetUtil.initializeNativeAssetManager(this);

    eglManager = new EglManager(null);
    processor = new FrameProcessor(this,
            eglManager.getNativeContext(),
            BINARY_GRAPH_NAME,
            INPUT_VIDEO_STREAM_NAME,
            OUTPUT_VIDEO_STREAM_NAME);
    processor.getVideoSurfaceOutput().setFlipY(FLIP_FRAMES_VERTICALLY);

    processor.addPacketCallback(OUTPUT_HAND_PRESENCE_STREAM_NAME,
            (packet) -> {
                handPresence = PacketGetter.getBool(packet);
                Log.d(TAG, "[TS:" + packet.getTimestamp() + "] Hand presence: " + handPresence);
            });

    processor.addPacketCallback(OUTPUT_LANDMARKS_STREAM_NAME,
            (packet) -> {
                try {
                    NormalizedLandmarkList landmarks = NormalizedLandmarkList.parseFrom(PacketGetter.getProtoBytes(packet));
                    if (landmarks != null && handPresence) {
                        Log.d(TAG, "[TS:" + packet.getTimestamp() + "] #Landmarks: " + landmarks.getLandmarkCount());
                        Log.d(TAG, getLandmarksDebugString(landmarks));
                    }
                } catch (InvalidProtocolBufferException e) {
                    Log.e(TAG, "Error parsing landmarks: " + e);
                }
            });
}

@Override
protected void onResume() {
    super.onResume();
    converter = new ExternalTextureConverter(eglManager.getContext());
    converter.setFlipY(FLIP_FRAMES_VERTICALLY);
    converter.setConsumer(processor);
    if (PermissionHelper.cameraPermissionsGranted(this)) {
        startCamera();
    }
}

@Override
protected void onPause() {
    super.onPause();
    converter.close();
}

@Override
public void onRequestPermissionsResult(int requestCode, @NonNull String[] permissions, @NonNull int[] grantResults) {
    super.onRequestPermissionsResult(requestCode, permissions, grantResults);
    PermissionHelper.onRequestPermissionsResult(requestCode, permissions, grantResults);
}

protected void onPreviewDisplaySurfaceChanged(SurfaceHolder holder, int format, int width, int height) {
    Size viewSize = computeViewSize(width, height);
    Size displaySize = cameraHelper.computeDisplaySizeFromViewSize(viewSize);
    boolean isCameraRotated = cameraHelper.isCameraRotated();
    converter.setSurfaceTextureAndAttachToGLContext(
            previewFrameTexture,
            isCameraRotated ? displaySize.getHeight() : displaySize.getWidth(),
            isCameraRotated ? displaySize.getWidth() : displaySize.getHeight());
}

private void setupPreviewDisplayView() {
    previewDisplayView.setVisibility(View.GONE);
    ViewGroup viewGroup = findViewById(R.id.preview_display_layout);
    viewGroup.addView(previewDisplayView);

    previewDisplayView.getHolder().addCallback(new SurfaceHolder.Callback() {
        @Override
        public void surfaceCreated(SurfaceHolder holder) {
            processor.getVideoSurfaceOutput().setSurface(holder.getSurface());
        }

        @Override
        public void surfaceChanged(SurfaceHolder holder, int format, int width, int height) {
            onPreviewDisplaySurfaceChanged(holder, format, width, height);
        }

        @Override
        public void surfaceDestroyed(SurfaceHolder holder) {
            processor.getVideoSurfaceOutput().setSurface(null);
        }
    });
}

protected void onCameraStarted(SurfaceTexture surfaceTexture) {
    previewFrameTexture = surfaceTexture;
    previewDisplayView.setVisibility(View.VISIBLE);
}

protected Size cameraTargetResolution() {
    // Returning null lets the camera helper pick its default preview resolution.
    return null;
}
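
// The remaining helper methods are a minimal sketch based on the standard
// MediaPipe hand tracking example; adapt them to your own project as needed.
protected Size computeViewSize(int width, int height) {
    return new Size(width, height);
}

public void startCamera() {
    cameraHelper = new CameraXPreviewHelper();
    cameraHelper.setOnCameraStartedListener(
            surfaceTexture -> onCameraStarted(surfaceTexture));
    CameraHelper.CameraFacing cameraFacing =
            USE_FRONT_CAMERA ? CameraHelper.CameraFacing.FRONT : CameraHelper.CameraFacing.BACK;
    cameraHelper.startCamera(this, cameraFacing, /*surfaceTexture=*/ null, cameraTargetResolution());
}

// Formats the detected hand landmarks into a readable string for logcat.
private static String getLandmarksDebugString(NormalizedLandmarkList landmarks) {
    StringBuilder sb = new StringBuilder();
    int index = 0;
    for (NormalizedLandmark landmark : landmarks.getLandmarkList()) {
        sb.append("Landmark [").append(index).append("]: (")
                .append(landmark.getX()).append(", ")
                .append(landmark.getY()).append(", ")
                .append(landmark.getZ()).append(")\n");
        index++;
    }
    return sb.toString();
}
}
```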