Media Content Tips
The quality of your input media has a direct impact on the final lipsync output. For optimal results, please follow these tips for preparing your video and audio content.
Tips for Video Content
For best performance, avoid full profile (side-view) shots and obstructions covering the face.
The model performs best when the character in the video appears to be talking naturally. It will preserve the speaker’s style during lipsync.
Tip for AI-Generated Video: When creating videos with third-party AI video generation models, include this instruction in the text prompt: "the character should be speaking naturally". The generated AI video will have some random mouth movements, which are necessary to get the best results from our lipsync model.
Tips for Audio Content
For best performance, avoid audio with music, background noise, or multiple simultaneous speakers.
Sync Mode Options
When your video and audio have different durations, you can choose how to handle the mismatch using the sync_mode parameter. Here’s a brief overview of each option:
bounce: When video is shorter than audio, the video will reverse playback at the end to match audio duration. Otherwise, video is cropped to match audio.
loop: When video is shorter than audio, the video will loop from the beginning to match audio duration. Otherwise, video is cropped to match audio.
cut_off: When audio is longer than video, the audio will be cut off to match video duration. Otherwise, video is cropped to match audio.
silence: When video is longer than audio, silence will be added to the audio to match video duration. Otherwise, video is cropped to match audio.
remap: The video playback speed will be adjusted (sped up or slowed down) to exactly match the audio duration, preserving all content from both.
Choosing the Right Sync Mode: Use bounce or loop for short videos with longer audio, cut_off when you want to prioritize video length, silence when you want to preserve the full video, and remap when you need to preserve all content from both video and audio.
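To make the duration rules concrete, here is a minimal sketch that estimates the output length for each sync_mode value (bounce, loop, cut_off, silence, remap) by reading the descriptions above literally. The function names expected_output_duration and remap_speed_factor are hypothetical helpers written for illustration; they are not part of any API, and the actual service may round or handle edge cases differently.

```python
# Illustrative only: these helpers are assumptions, not part of the lipsync API.
VALID_SYNC_MODES = ("bounce", "loop", "cut_off", "silence", "remap")


def expected_output_duration(video_s: float, audio_s: float, sync_mode: str) -> float:
    """Estimate the output duration in seconds for a given sync_mode."""
    if sync_mode not in VALID_SYNC_MODES:
        raise ValueError(f"unknown sync_mode: {sync_mode!r}")
    if video_s <= 0 or audio_s <= 0:
        raise ValueError("durations must be positive")

    if sync_mode == "cut_off":
        # Longer audio is cut to the video; longer video is cropped to the audio.
        return min(video_s, audio_s)
    if sync_mode == "silence":
        # Longer video keeps its full length (audio padded with silence);
        # otherwise the result follows the audio duration.
        return max(video_s, audio_s)
    # bounce, loop, remap: the video is extended, cropped, or retimed
    # so that the result matches the audio duration.
    return audio_s


def remap_speed_factor(video_s: float, audio_s: float) -> float:
    """Playback speed needed to fit the whole video into the audio duration."""
    if video_s <= 0 or audio_s <= 0:
        raise ValueError("durations must be positive")
    return video_s / audio_s


if __name__ == "__main__":
    # A 5-second clip paired with 12 seconds of audio.
    for mode in VALID_SYNC_MODES:
        print(mode, expected_output_duration(5.0, 12.0, mode))
    print("remap speed:", remap_speed_factor(5.0, 12.0))
```

With remap, a factor below 1 means the video is slowed down and a factor above 1 means it is sped up, so a very short clip paired with long audio may look unnaturally slow; bounce or loop is likely the better fit in that case.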