React models

react-1 introduces the first performance control primitive for video editing. It can synchronize lip movements, facial expressions, and head movements to match a target audio while following an emotion prompt. The workflows described below are only possible with react-1.

Key Features

  • Model Modes: choose which facial region to edit: just the mouth, the facial expressions, or the head movements as well
  • Expressive Lipsync: react-1 operates on a much larger facial region, giving you the most expressive mouth movements that match the speech
  • Facial Expressions: facial expressions are rewritten using an emotion prompt, and every tiny micro-expression stays perfectly in sync with the speech
  • Head Movements: react-1 can also synchronize your head movements to match the pacing, prosody, and intonation of the new dialogue.

Model modes

The model can be controlled with three modes of operation: lips, face, and head.

This lets you specify the spatial region you want to edit: you can opt for lipsync only, or also include facial expressions or head movements. The default is face.

Mode    Lipsync    Facial Expressions    Head Movements
lips    ✓          ✗                     ✗
face    ✓          ✓                     ✗
head    ✓          ✓                     ✓

Emotion prompts

You can guide the facial expressions by specifying an emotion prompt. You can also choose not to specify one, in which case the model will follow the emotional context of the input video. See the usage section below for more details.

Usage

react-1 is available through the same /v2/generate API endpoint used for standard lipsync, with additional parameters to control the emotional and movement effects.

To use react-1 with the API, set the model parameter to react-1 and configure the additional options:

from sync import Sync
from sync.common import Audio, Video, GenerationOptions

sync = Sync()

response = sync.generations.create(
    input=[
        Video(url="https://assets.sync.so/docs/example-video.mp4"),
        Audio(url="https://assets.sync.so/docs/example-audio.wav")
    ],
    model="react-1",
    options=GenerationOptions(
        prompt="happy",
        model_mode="face"
    )
)
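
If you call the /v2/generate endpoint directly instead of going through the SDK, the same fields carry over. The request below is only an illustrative translation of the SDK call above; the base URL, the x-api-key header name, and the exact JSON layout are assumptions, so verify them against the API reference:

import requests

# Illustrative direct call to /v2/generate. The base URL, header name, and
# JSON layout below are assumptions mirroring the SDK parameters above.
response = requests.post(
    "https://api.sync.so/v2/generate",
    headers={"x-api-key": "YOUR_API_KEY"},
    json={
        "model": "react-1",
        "input": [
            {"type": "video", "url": "https://assets.sync.so/docs/example-video.mp4"},
            {"type": "audio", "url": "https://assets.sync.so/docs/example-audio.wav"},
        ],
        "options": {"prompt": "happy", "model_mode": "face"},
    },
)
response.raise_for_status()
print(response.json())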

API Parameters

model

Set to react-1 to use the react-1 model.

options.model_mode

Controls the edit region and movement scope for the model. Available options:

  • lips: Only lipsync using react-1 (minimal facial changes)
  • face (default): Lipsync + facial expressions without head movements
  • head: Lipsync + facial expressions + natural talking head movements

The model_mode parameter only works with the react-1 model. For other models, this parameter is ignored.
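
For example, a lips-only edit that keeps the original expressions and head motion largely untouched could look like the sketch below, which simply reuses the example call from the Usage section with a different model_mode and no emotion prompt:

from sync import Sync
from sync.common import Audio, Video, GenerationOptions

sync = Sync()

# Restrict the edit region to the mouth; the emotion prompt is omitted here (it is optional).
response = sync.generations.create(
    input=[
        Video(url="https://assets.sync.so/docs/example-video.mp4"),
        Audio(url="https://assets.sync.so/docs/example-audio.wav")
    ],
    model="react-1",
    options=GenerationOptions(model_mode="lips")
)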

options.prompt

Emotion prompt for the generation. Currently supports single-word emotions only.

Available options:

  • happy
  • angry
  • sad
  • neutral
  • disgusted
  • surprised

The prompt parameter only works with the react-1 model. For other models, this parameter is ignored.
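
For a quick side-by-side comparison, the sketch below requests one generation per supported emotion prompt, reusing the example assets from above (omit the prompt entirely if you want the model to follow the input video's own emotional context instead):

from sync import Sync
from sync.common import Audio, Video, GenerationOptions

sync = Sync()

# One face-mode generation per supported single-word emotion prompt.
emotions = ["happy", "angry", "sad", "neutral", "disgusted", "surprised"]

responses = {
    emotion: sync.generations.create(
        input=[
            Video(url="https://assets.sync.so/docs/example-video.mp4"),
            Audio(url="https://assets.sync.so/docs/example-audio.wav")
        ],
        model="react-1",
        options=GenerationOptions(prompt=emotion, model_mode="face")
    )
    for emotion in emotions
}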

react-1 is available in Sync Studio with an intuitive interface for controlling emotional expressions and head movements.

2. Select your assets

Upload or select your video and audio inputs. Remember that react-1 supports inputs up to 15 seconds in duration.

3. Choose react-1 model

Select react-1 from the model dropdown in the generation settings.

4. Configure model mode

Choose your desired model mode:

  • Lips: For lipsync only
  • Face: For lipsync with facial expressions (default)
  • Head: For lipsync with facial expressions and natural head movements

5. Change expression

Select an expression from the emotion wheel in the video player controls.

6. Generate

Click generate to create your lipsync with emotional expressions and optional head movements.

Best Practices

Choose the Right Mode

  • Use lips mode when you only need lipsync without emotional changes
  • Use face mode (default) for most use cases where you want natural expressions
  • Use head mode when you want the most dynamic and natural talking head movements

Select Appropriate Emotions

Choose emotion prompts that match the tone and context of your audio. The model will generate facial expressions that align with the selected emotion throughout the generation.

Input Duration

Keep your inputs under 15 seconds. For longer content, break your video into segments and process them separately, or use the standard lipsync models for longer durations.
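
A minimal sketch of that approach, assuming you have already split the video and audio into matching ≤15-second pieces and that the segment URLs below are placeholders for your own files:

from sync import Sync
from sync.common import Audio, Video, GenerationOptions

sync = Sync()

# Placeholder URLs for pre-split, <=15-second video/audio pairs.
segments = [
    ("https://example.com/clip-part1.mp4", "https://example.com/audio-part1.wav"),
    ("https://example.com/clip-part2.mp4", "https://example.com/audio-part2.wav"),
]

# Submit one react-1 generation per segment pair.
responses = [
    sync.generations.create(
        input=[Video(url=video_url), Audio(url=audio_url)],
        model="react-1",
        options=GenerationOptions(prompt="happy", model_mode="face")
    )
    for video_url, audio_url in segments
]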

Current Limitations

The following features are not yet supported for react-1:

  • Input Duration: inputs longer than 15 seconds are not supported. For longer content, consider breaking your video into segments.
  • Segments: Multi-segment generation with different audio inputs is not available. Process each segment separately if needed.
  • Speaker Selection: The active_speaker_detection option is not supported, including both automatic detection (auto_detect) and manual selection via bounding box or frame number. Ensure your input video contains a single, clearly visible speaker.
  • Occlusion detection: The occlusion_detection_enabled option for handling partially hidden faces is not available for react-1.

When to use react-1

react-1 is ideal for:

  • Short-form content (≤ 15 seconds) requiring emotional expressions
  • Videos where natural head movements enhance the result
  • Content that benefits from emotion-aware facial expressions
  • Projects where you want more dynamic and expressive lipsync results

For longer content (> 15 seconds) or when you only need standard lipsync, consider using lipsync-2 or lipsync-2-pro instead.