
Facial data capture

The Facial Capture extension collects quantitative data such as facial feature points, head rotation, and head translation. You can use this data to drive 3D facial stickers, headgear, pendant applications, or digital humans, adding more vivid expressions to virtual images.

This guide is intended for scenarios where the Facial Capture extension is used independently to capture facial data, while a third-party rendering engine is used to animate a virtual human.

Information

To avoid collecting facial data through callbacks and building your own collection, encoding, and transmission framework, use the Agora MetaKit extension for facial capture.

Prerequisites

Ensure that you have:

  • Integrated Video SDK version 4.3.0, including the face capture extension dynamic library libagora_face_capture_extension.so.
  • Obtained the face capture authentication parameters authentication_information and company_id by contacting Agora technical support.

Implement the logic

This section shows you how to integrate the Facial Capture extension in your app to capture facial data.

Enable the extension

To enable the Facial Capture extension, call enableExtension:


rtcEngine.enableExtension(
    "agora_video_filters_face_capture",
    "face_capture",
    true,
    Constants.MediaSourceType.PRIMARY_CAMERA_SOURCE
);

Info

When you enable the Facial Capture extension for the first time, a delay may occur. To ensure smooth operation during a session, call enableExtension before joining a channel.
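
For reference, the following is a minimal sketch of the recommended call order, enabling the extension before joining the channel. The token, channelName, and uid variables are placeholders, not values defined in this guide:

// Enable the extension first, then join the channel.
rtcEngine.enableExtension(
    "agora_video_filters_face_capture",
    "face_capture",
    true,
    Constants.MediaSourceType.PRIMARY_CAMERA_SOURCE
);

// Placeholders: supply your own token, channel name, and uid.
ChannelMediaOptions options = new ChannelMediaOptions();
options.clientRoleType = Constants.CLIENT_ROLE_BROADCASTER;
rtcEngine.joinChannel(token, channelName, uid, options);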

Set authentication parameters

To ensure that the extension functions properly, call setExtensionProperty to pass the necessary authentication parameters.


rtcEngine.setExtensionProperty(
    "agora_video_filters_face_capture",
    "face_capture",
    "authentication_information",
    "{\"company_id\":\"xxxxx\",\"license\":\"xxxxx\"}",
    Constants.MediaSourceType.PRIMARY_CAMERA_SOURCE
);

Retrieve facial data

Retrieve the raw video data containing facial information through the onCaptureVideoFrame callback.


public boolean onCaptureVideoFrame(int sourceType, VideoFrame videoFrame) {
    if (null != videoFrame.getMetaInfo()) {
        VideoFrameMetaInfo metaInfo = videoFrame.getMetaInfo();
        SparseArray<IMetaInfo> customMetaInfo = metaInfo.getCustomMetaInfo("FaceCaptureInfo");
        if (null != customMetaInfo && customMetaInfo.size() >= 1) {
            String faceInfo = ((FaceCaptureInfo) customMetaInfo.get(0)).getInfoStr();
            Log.d(TAG, "Face Info: " + faceInfo);
        }
    }
    return true;
}

Important

Currently, the facial capture function outputs data for only one face at a time. After the callback is triggered, you must copy the facial data into separately allocated memory and process it on a separate thread. Otherwise, blocking the raw data callback may lead to frame loss.
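
For example, the following sketch copies the facial data string out of the callback and processes it on a single worker thread. The faceDataExecutor field and handleFaceInfo method are illustrative names, not part of the SDK:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Illustrative helper: process facial data off the raw-data callback thread.
private final ExecutorService faceDataExecutor = Executors.newSingleThreadExecutor();

private void handleFaceInfo(String faceInfo) {
    // String is immutable, so handing it to another thread is safe. Call this
    // from onCaptureVideoFrame instead of doing heavy work in the callback.
    faceDataExecutor.execute(() -> {
        // Parse and apply the facial data here.
        Log.d(TAG, "Processing face info on worker thread: " + faceInfo);
    });
}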

Use facial information to drive virtual humans

The output facial data is in JSON format and includes quantitative information such as facial feature points, head rotation, and head translation. This data follows the Blend Shape (BS) format in compliance with the ARKit standard. You can use a third-party 3D rendering engine to further process the BS data. The key elements are:

  • faces: An array of objects, each representing recognized facial information for one face.
    • detected: A float representing the confidence level of face recognition, ranging from 0.0 to 1.0.
    • blendshapes: An object containing the face capture coefficients. The keys follow the ARKit standard, with each key-value pair representing a blendshape coefficient, where the value is a float between 0.0 and 1.0.
    • rotation: An object representing head rotation, containing three key-value pairs. All values are floats between -180.0 and 180.0.
      • pitch: The pitch angle of the head. Positive values represent head lowering, negative values represent head raising.
      • yaw: The yaw angle of the head. Positive values represent left rotation, negative values represent right rotation.
      • roll: The tilt angle of the head. Positive values represent right tilt, negative values represent left tilt.
    • translation: An object representing head translation, with three key-value pairs: x, y, and z. The values are floats between 0.0 and 1.0.
    • faceState: An integer indicating the current face capture control state:
      • 0: The algorithm is in face capture control.
      • 1: The algorithm control returns to the center.
      • 2: The algorithm is restored and not in control.
  • timestamp: A string representing the output result's timestamp, in milliseconds.

This data can be used to animate virtual humans by applying the blendshape coefficients and head movement data to a 3D model. The following is a sample of the output:


{
  "faces": [{
    "detected": 0.98,
    "blendshapes": {
      "eyeBlinkLeft": 0.9, "eyeLookDownLeft": 0.0, "eyeLookInLeft": 0.0, "eyeLookOutLeft": 0.0, "eyeLookUpLeft": 0.0,
      "eyeSquintLeft": 0.0, "eyeWideLeft": 0.0, "eyeBlinkRight": 0.0, "eyeLookDownRight": 0.0, "eyeLookInRight": 0.0,
      "eyeLookOutRight": 0.0, "eyeLookUpRight": 0.0, "eyeSquintRight": 0.0, "eyeWideRight": 0.0, "jawForward": 0.0,
      "jawLeft": 0.0, "jawRight": 0.0, "jawOpen": 0.0, "mouthClose": 0.0, "mouthFunnel": 0.0, "mouthPucker": 0.0,
      "mouthLeft": 0.0, "mouthRight": 0.0, "mouthSmileLeft": 0.0, "mouthSmileRight": 0.0, "mouthFrownLeft": 0.0,
      "mouthFrownRight": 0.0, "mouthDimpleLeft": 0.0, "mouthDimpleRight": 0.0, "mouthStretchLeft": 0.0, "mouthStretchRight": 0.0,
      "mouthRollLower": 0.0, "mouthRollUpper": 0.0, "mouthShrugLower": 0.0, "mouthShrugUpper": 0.0, "mouthPressLeft": 0.0,
      "mouthPressRight": 0.0, "mouthLowerDownLeft": 0.0, "mouthLowerDownRight": 0.0, "mouthUpperUpLeft": 0.0, "mouthUpperUpRight": 0.0,
      "browDownLeft": 0.0, "browDownRight": 0.0, "browInnerUp": 0.0, "browOuterUpLeft": 0.0, "browOuterUpRight": 0.0,
      "cheekPuff": 0.0, "cheekSquintLeft": 0.0, "cheekSquintRight": 0.0, "noseSneerLeft": 0.0, "noseSneerRight": 0.0,
      "tongueOut": 0.0
    },
    "rotation": {"pitch": 30.0, "yaw": 25.5, "roll": -15.5},
    "rotationMatrix": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0],
    "translation": {"x": 0.5, "y": 0.3, "z": 0.5},
    "faceState": 1
  }],
  "timestamp": "654879876546"
}
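
As an illustration, the sketch below parses this JSON with the org.json classes available on Android and reads the first face's blendshape, rotation, and translation values. The FaceCaptureParser class and the commented avatar calls are hypothetical placeholders for your own rendering code:

import org.json.JSONArray;
import org.json.JSONException;
import org.json.JSONObject;

// Hypothetical helper: parse the facial data JSON and hand the values to a renderer.
public class FaceCaptureParser {
    public static void parse(String faceInfoJson) throws JSONException {
        JSONObject root = new JSONObject(faceInfoJson);
        JSONArray faces = root.getJSONArray("faces");
        if (faces.length() == 0) {
            return; // No face detected in this frame.
        }
        JSONObject face = faces.getJSONObject(0);

        // Blendshape coefficients (ARKit-style keys, values in [0.0, 1.0]).
        JSONObject blendshapes = face.getJSONObject("blendshapes");
        double eyeBlinkLeft = blendshapes.optDouble("eyeBlinkLeft", 0.0);

        // Head rotation in degrees, values in [-180.0, 180.0].
        JSONObject rotation = face.getJSONObject("rotation");
        double pitch = rotation.getDouble("pitch");
        double yaw = rotation.getDouble("yaw");
        double roll = rotation.getDouble("roll");

        // Head translation, values in [0.0, 1.0].
        JSONObject translation = face.getJSONObject("translation");
        double x = translation.getDouble("x");
        double y = translation.getDouble("y");
        double z = translation.getDouble("z");

        // Pass the values to your 3D engine here, for example:
        // avatar.setBlendShapeWeight("eyeBlinkLeft", (float) eyeBlinkLeft);
        // avatar.setHeadRotation((float) pitch, (float) yaw, (float) roll);
    }
}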

Disable the extension

To disable the Facial Capture extension, call enableExtension again, passing false:


rtcEngine.enableExtension(
    "agora_video_filters_face_capture",
    "face_capture",
    false,
    Constants.MediaSourceType.PRIMARY_CAMERA_SOURCE
);

Reference

This section contains content that completes the information in this page, or points you to documentation that explains other aspects of this product.

Sample project

Agora provides an open source sample project on GitHub for your reference. Download or view Face Capture for a more detailed example.

API reference

Interactive Live Streaming