Media Content Tips

The quality of your input media has a direct impact on the final lipsync output. For optimal results, please follow these tips for preparing your video and audio content.

Tips for Video Content

Avoid profile views and obstructions

For best performance, avoid full profile (side-view) shots and obstructions covering the face.

Ensure natural talking motion

The model performs best when the character in the video already appears to be talking naturally. The model preserves the speaker’s style during lipsync.

Tip for AI-Generated Video: When creating videos with third-party AI video generation models, include this instruction in the text prompt: "the character should be speaking naturally". The generated video will then contain some random mouth movements, which our lipsync model needs to produce the best results.

Tips for Audio Content

Use clear audio

For best performance, avoid audio with music, background noise, or multiple simultaneous speakers.

Sync Mode Options

When your video and audio have different durations, you can choose how to handle the mismatch using the sync_mode parameter. Here’s a brief overview of each option:

bounce

When video is shorter than audio, the video will play in reverse once it reaches the end, extending it to match audio duration. Otherwise, video is cropped to match audio.

loop

When video is shorter than audio, the video will loop from the beginning to match audio duration. Otherwise, video is cropped to match audio.

cut_off

When audio is longer than video, the audio will be cut off to match video duration. Otherwise, video is cropped to match audio.

silence

When video is longer than audio, silence will be added to the audio to match video duration. Otherwise, video is cropped to match audio.

remap

The video playback speed will be adjusted (sped up or slowed down) to exactly match the audio duration, preserving all content from both.
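
To make the difference between bounce, loop, and remap concrete, here is a minimal sketch of how source frames could be mapped onto a longer output. It is an illustration only, not the model's actual implementation: the function names are hypothetical, and the exact behavior at the turnaround point in bounce (here the last frame is not repeated) is an assumption.

```python
from typing import List


def bounce_frames(n_video: int, n_target: int) -> List[int]:
    """Source frame index per output frame: play forward, then backward, repeating."""
    if n_video < 1:
        raise ValueError("video must have at least one frame")
    if n_video == 1:
        return [0] * n_target
    period = 2 * (n_video - 1)  # one forward pass plus one backward pass
    frames = []
    for i in range(n_target):
        pos = i % period
        # First half of the period goes forward, second half goes backward.
        frames.append(pos if pos < n_video else period - pos)
    return frames


def loop_frames(n_video: int, n_target: int) -> List[int]:
    """Source frame index per output frame: restart from frame 0 at the end."""
    if n_video < 1:
        raise ValueError("video must have at least one frame")
    return [i % n_video for i in range(n_target)]


def remap_speed(video_secs: float, audio_secs: float) -> float:
    """Playback speed factor that makes the video last exactly as long as the audio."""
    if video_secs <= 0 or audio_secs <= 0:
        raise ValueError("durations must be positive")
    return video_secs / audio_secs  # < 1 slows the video down, > 1 speeds it up


if __name__ == "__main__":
    print(bounce_frames(4, 10))   # [0, 1, 2, 3, 2, 1, 0, 1, 2, 3]
    print(loop_frames(4, 10))     # [0, 1, 2, 3, 0, 1, 2, 3, 0, 1]
    print(remap_speed(4.0, 8.0))  # 0.5
```

When the target is shorter than the video, both frame functions simply return the first n_target frames, which corresponds to the video being cropped to match the audio.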

Default Sync Mode: The default sync mode depends on whether you’re using segmented generations:

For non-segmented generations: bounce is the default
For segmented generations (using segments_secs or segments_frames parameters): remap is the default and is recommended to avoid abrupt cuts in the middle of the video.
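
The default rule above can be sketched as a small helper. This is a hypothetical client-side function, not part of any official SDK; only the parameter names segments_secs and segments_frames and the mode names come from this page. Treating an empty list as "no segments" is an assumption.

```python
from typing import Optional, Sequence


def default_sync_mode(
    segments_secs: Optional[Sequence] = None,
    segments_frames: Optional[Sequence] = None,
) -> str:
    """Return the sync mode that applies when sync_mode is not set explicitly."""
    # Assumption: a missing or empty segments value means a non-segmented generation.
    segmented = bool(segments_secs) or bool(segments_frames)
    return "remap" if segmented else "bounce"


if __name__ == "__main__":
    print(default_sync_mode())                             # bounce
    print(default_sync_mode(segments_secs=[[0.0, 5.0]]))   # remap
```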

Choosing the Right Sync Mode: Use bounce or loop for short videos with longer audio, cut_off when you want to prioritize video length, silence when you want to preserve the full video, and remap when you need to preserve all content from both video and audio.
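
The guidance above can also be written as a lookup table, which may be handy if you select sync_mode programmatically. The goal keys below are hypothetical labels invented for this sketch; only the mode names are taken from this page.

```python
from typing import Dict, Tuple

# Hypothetical goal labels mapped to the sync modes recommended above.
RECOMMENDED_SYNC_MODES: Dict[str, Tuple[str, ...]] = {
    "short_video_longer_audio": ("bounce", "loop"),
    "prioritize_video_length": ("cut_off",),
    "preserve_full_video": ("silence",),
    "preserve_all_content": ("remap",),
}


def recommend_sync_modes(goal: str) -> Tuple[str, ...]:
    """Return the recommended sync modes for a goal, or raise on an unknown goal."""
    try:
        return RECOMMENDED_SYNC_MODES[goal]
    except KeyError:
        known = ", ".join(sorted(RECOMMENDED_SYNC_MODES))
        raise ValueError(f"unknown goal {goal!r}; expected one of: {known}") from None


if __name__ == "__main__":
    print(recommend_sync_modes("preserve_all_content"))  # ('remap',)
```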