Mojo Facial Expression Recognition is cloud-based facial expression recognition software. It estimates facial expressions and social cues in real time, based on facial keypoints. Working from keypoints rather than raw images is what protects user privacy.

The API for accessing facial expression recognition is split into two parts:

  • Anonymization, which runs locally in the frontend API
  • Emotion recognition, which runs in the cloud-based backend API

Architecture Overview

What does the sequence flow look like?

  1. Asking the end user for access to their local camera
  2. Accessing the camera
  3. Downloading the anonymization model
  4. Initializing the API with the API key
  5. Establishing a secure connection to the cloud API
  6. While the facial expression recognition service is needed:
    1. Processing camera images on-device with the anonymization model
    2. Sending anonymized facial keypoints to the cloud API on the fly
    3. Receiving estimations for each emotion and social cue through a secure channel
  7. Closing the channel
  8. Releasing access to the camera
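
The ordering of these steps can be sketched as a small state machine. The phase names and the `SessionFlow` class below are hypothetical illustrations, not part of the Mojo API. The sketch only enforces that the steps happen in the documented order and that the streaming step may repeat.

```typescript
// Hypothetical phase names mirroring steps 1 to 8 above; none come from the Mojo API.
type Phase =
  | "idle"
  | "permissionRequested" // step 1
  | "cameraOpen"          // step 2
  | "modelDownloaded"     // step 3
  | "apiInitialized"      // step 4
  | "connected"           // step 5
  | "streaming"           // step 6 (repeats while the service is needed)
  | "channelClosed"       // step 7
  | "cameraReleased";     // step 8

// The single phase that may follow each phase (null marks the end of the flow).
const NEXT_PHASE: Record<Phase, Phase | null> = {
  idle: "permissionRequested",
  permissionRequested: "cameraOpen",
  cameraOpen: "modelDownloaded",
  modelDownloaded: "apiInitialized",
  apiInitialized: "connected",
  connected: "streaming",
  streaming: "channelClosed",
  channelClosed: "cameraReleased",
  cameraReleased: null,
};

class SessionFlow {
  private current: Phase = "idle";

  currentPhase(): Phase {
    return this.current;
  }

  // Moves to `next`, rejecting any transition that skips or reorders a step.
  advance(next: Phase): void {
    const isStreamingLoop = this.current === "streaming" && next === "streaming";
    if (!isStreamingLoop && NEXT_PHASE[this.current] !== next) {
      throw new Error(`Cannot go from "${this.current}" to "${next}"`);
    }
    this.current = next;
  }
}
```

Modelling the flow this way makes it harder to start streaming before the connection exists, or to release the camera while the channel is still open.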


On-device, the API is in charge of:

  • Requesting permission to access the camera
  • Processing raw images to extract anonymized facial keypoints

Cloud connection

Using the user’s credentials, the frontend API sends the stream of facial keypoints to the backend API endpoint and requests predictions. If configured accordingly, the backend API sends the predictions back to the on-device frontend API for local use.

What is a session?

A session starts when someone begins using an application built on the Mojo Facial Expression API.

A typical session consists of an initialization phase, followed by streaming of anonymized data from the frontend API to the cloud API. In parallel, estimations are sent back from the backend API to the frontend API.

A session ends when the disconnect event is received from the server, or when no more data is sent for a certain period of time.
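
The two end conditions can be sketched as follows. The `SessionTracker` class and its `idleTimeoutMs` parameter are hypothetical: the actual inactivity period is not stated here, so the value is left to the caller. Timestamps are passed in explicitly to keep the logic easy to test.

```typescript
// Hypothetical helper: tracks whether a session should be considered ended.
class SessionTracker {
  private readonly idleTimeoutMs: number; // assumed value, not documented by Mojo
  private lastDataAtMs: number;
  private disconnected = false;

  constructor(idleTimeoutMs: number, startedAtMs: number) {
    this.idleTimeoutMs = idleTimeoutMs;
    this.lastDataAtMs = startedAtMs;
  }

  // Call whenever a batch of keypoints is sent.
  recordData(nowMs: number): void {
    this.lastDataAtMs = nowMs;
  }

  // Call when the disconnect event is received from the server.
  recordDisconnect(): void {
    this.disconnected = true;
  }

  // A session has ended once it was disconnected or has been idle for too long.
  hasEnded(nowMs: number): boolean {
    const idleForMs = nowMs - this.lastDataAtMs;
    return this.disconnected || idleForMs >= this.idleTimeoutMs;
  }
}
```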

Why are sessions important?

The number of simultaneous sessions is determined by your Mojo Facial Expression Recognition plan.
If too many sessions are opened, the first come, first served principle applies.
💡 You can change your plan at any time to match your actual needs and growth.
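
One possible reading of the first come, first served rule is that sessions are admitted in arrival order until the plan's limit is reached, and later requests are refused until a slot frees up. The sketch below models that reading only. `SessionLimiter`, `maxSessions`, and the session IDs are hypothetical, and the actual server-side behaviour (for example, whether refused sessions are queued) is not described here.

```typescript
// Hypothetical model of a plan-based limit on simultaneous sessions.
class SessionLimiter {
  private readonly maxSessions: number;
  private readonly active: string[] = []; // session IDs in arrival order

  constructor(maxSessions: number) {
    this.maxSessions = maxSessions;
  }

  // Admits the session if a slot is free; earlier sessions always keep their slot.
  tryOpen(sessionId: string): boolean {
    if (this.active.indexOf(sessionId) !== -1) {
      return true; // already admitted
    }
    if (this.active.length >= this.maxSessions) {
      return false; // limit reached: later arrivals are refused
    }
    this.active.push(sessionId);
    return true;
  }

  // Frees the slot held by the session, if any.
  close(sessionId: string): void {
    const index = this.active.indexOf(sessionId);
    if (index !== -1) {
      this.active.splice(index, 1);
    }
  }

  activeCount(): number {
    return this.active.length;
  }
}
```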

Is it compatible with smartphones? Laptops?

It’s compatible with any browser, on your laptop, smartphone or tablet.

How often are estimations made?

Once a session has started, each new series of facial keypoints received by the cloud recognition component is usually processed in less than 50 milliseconds. The response is sent directly to the frontend API.

How is it secured?

Each customer application has its own access token for sending data to Mojo Cloud and a separate key for accessing the SocketIO server. The architecture uses standard, state-of-the-art secure protocols with end-to-end data encryption.

More info

👉   You can find more information on the architecture and privacy principles in the "Concepts" section.

Are there some limitations?

The API works best in the following conditions:

  • Head orientation: within ±25° of facing the camera
  • Light facing the user

In the following situations, the API is less stable and accuracy is lower:

  • The person is wearing a mask
  • The light source is behind the person (backlighting)
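
An application that already has its own head-pose estimate could use the ±25° guideline as a pre-check before relying on the estimations. This is a minimal sketch under that assumption. `HeadPose`, `isWithinRecommendedOrientation`, and the choice to apply the limit to both yaw and pitch are illustrative, since the guideline does not say which rotation axes it covers or whether the API exposes head pose.

```typescript
// Hypothetical head-pose shape; angles are in degrees, 0 = facing the camera.
interface HeadPose {
  yawDeg: number;   // left/right rotation
  pitchDeg: number; // up/down rotation
}

// Limit taken from the "±25° facing the camera" guideline above.
const MAX_HEAD_ANGLE_DEG = 25;

// Returns true when both angles fall inside the recommended range (assumed to apply per axis).
function isWithinRecommendedOrientation(pose: HeadPose): boolean {
  return (
    Math.abs(pose.yawDeg) <= MAX_HEAD_ANGLE_DEG &&
    Math.abs(pose.pitchDeg) <= MAX_HEAD_ANGLE_DEG
  );
}
```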