Best Practices & Tips

Supported Media Formats

Video Formats

The Sync API accepts the following video file formats:

MIME Type	File Extension	Format
`video/mp4`	`.mp4`	MP4
`video/quicktime`	`.mov`	QuickTime
`video/webm`	`.webm`	WebM
`video/x-msvideo`	`.avi`	AVI

Audio Formats

The Sync API accepts the following audio file formats:

MIME Type	File Extension	Format
`audio/wav`	`.wav`	WAV
`audio/mpeg`	`.mp3`	MP3
`audio/ogg`	`.ogg`	OGG
`audio/x-m4a`	`.m4a`	M4A
`audio/x-m3a`	`.m3a`	M3A
`audio/aac`	`.aac`	AAC
`audio/x-ms-wma`	`.wma`	WMA
`audio/flac`	`.flac`	FLAC
`audio/mp4`	`.mp4`	MP4 Audio

File Format Recommendation: While multiple formats are supported, we recommend using MP4 for video and WAV or MP3 for audio to ensure optimal compatibility and processing performance.
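
The two tables above can be turned into a small pre-upload check. The following is a minimal sketch, not part of the Sync API: the helper name `check_media_file` and the `RECOMMENDED` sets are hypothetical, and the check looks only at the file extension, not at the actual file contents.

```python
from pathlib import Path
from typing import Optional, Tuple

# Extension -> MIME type, copied from the tables above.
VIDEO_MIME_TYPES = {
    ".mp4": "video/mp4",
    ".mov": "video/quicktime",
    ".webm": "video/webm",
    ".avi": "video/x-msvideo",
}
AUDIO_MIME_TYPES = {
    ".wav": "audio/wav",
    ".mp3": "audio/mpeg",
    ".ogg": "audio/ogg",
    ".m4a": "audio/x-m4a",
    ".m3a": "audio/x-m3a",
    ".aac": "audio/aac",
    ".wma": "audio/x-ms-wma",
    ".flac": "audio/flac",
    ".mp4": "audio/mp4",
}
# Recommended formats: MP4 for video, WAV or MP3 for audio.
RECOMMENDED = {"video": {".mp4"}, "audio": {".wav", ".mp3"}}


def check_media_file(filename: str, kind: str) -> Tuple[Optional[str], bool]:
    """Return (mime_type, is_recommended); mime_type is None if unsupported."""
    if kind not in RECOMMENDED:
        raise ValueError("kind must be 'video' or 'audio'")
    table = VIDEO_MIME_TYPES if kind == "video" else AUDIO_MIME_TYPES
    ext = Path(filename).suffix.lower()
    return table.get(ext), ext in RECOMMENDED[kind]
```

For example, `check_media_file("clip.MOV", "video")` reports a supported but not recommended format, while an `.mkv` file yields `None` for the MIME type.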

Output Quality

Video Processing Overview

The Sync video pipeline uses the H.264 codec for internal processing, and all videos are re-encoded. While we strive to preserve the input video’s quality and properties, this process may change properties like the original codec, bitrate, and frame rate.

A Note on HDR Video: 10-bit HDR videos are not fully supported. HDR input is normalized to 8-bit SDR, which may change the color grading in the output.

Recommended Input Properties

Video

Property	Recommended Value
Codec	H.264 (High Profile)
Resolution	1920x1080
Average Bitrate	50 Mbps
Frame Rate (FPS)	24, 25, or 30 fps constant
Color Space	8-bit (SDR)
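
One way to compare a file against this table before uploading is to inspect its metadata locally. The sketch below assumes a dictionary shaped like one video stream entry from `ffprobe -print_format json -show_streams` (fields `codec_name`, `profile`, `width`, `height`, `avg_frame_rate`, `pix_fmt`). The function name `video_notes` is hypothetical, the bit-depth test is a heuristic based on pixel format names, and neither bitrate nor frame rate constancy is checked, since the table gives no tolerance for the former and the latter cannot be read from these fields alone.

```python
import re
from fractions import Fraction

RECOMMENDED_FPS = {Fraction(24), Fraction(25), Fraction(30)}


def video_notes(stream: dict) -> list:
    """List the ways a video stream differs from the recommended input properties."""
    notes = []

    if stream.get("codec_name") != "h264":
        notes.append("codec is not H.264")
    elif stream.get("profile") != "High":
        notes.append("H.264 profile is not High")

    if (stream.get("width"), stream.get("height")) != (1920, 1080):
        notes.append("resolution is not 1920x1080")

    # ffprobe reports frame rates as strings such as "30/1" or "30000/1001".
    try:
        fps = Fraction(stream.get("avg_frame_rate", ""))
    except (ValueError, ZeroDivisionError):
        fps = None
    if fps not in RECOMMENDED_FPS:
        notes.append("frame rate is not 24, 25, or 30 fps")

    # Heuristic: names such as "yuv420p10le" indicate more than 8 bits per sample.
    if re.search(r"1[0-6](le|be)$", stream.get("pix_fmt", "")):
        notes.append("pixel format looks deeper than 8-bit")

    return notes
```

An empty list means the stream matches every property the sketch checks. The frame rate comparison is exact, so rates such as 30000/1001 (29.97 fps) are reported as differing from the recommendation.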

Audio

For the best results, use a sampling rate of 44.1 kHz or 48 kHz. Audio with a higher sampling rate is downsampled to 48 kHz during lipsync, which can reduce quality.

If an input file contains multiple audio streams, only the first stream is processed. All other streams are discarded.
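
The two audio rules above can be previewed locally. This is a rough sketch under stated assumptions: the input is a list of stream dictionaries shaped like `ffprobe -print_format json -show_streams` output (fields `codec_type` and `sample_rate`), "first stream" is taken to mean the first audio stream in container order, and the function name `audio_stream_report` is hypothetical.

```python
from typing import Optional


def audio_stream_report(streams: list) -> Optional[dict]:
    """Summarize how the first audio stream would be treated, or None if there is no audio."""
    audio = [s for s in streams if s.get("codec_type") == "audio"]
    if not audio:
        return None

    first = audio[0]
    rate = int(first["sample_rate"])  # ffprobe emits the rate as a string, e.g. "48000"
    return {
        "sample_rate": rate,
        "recommended_rate": rate in (44100, 48000),
        "will_be_downsampled": rate > 48000,    # higher rates are reduced to 48 kHz
        "discarded_streams": len(audio) - 1,    # only the first audio stream is used
    }
```

A file with a 96 kHz track followed by a 44.1 kHz track would report `will_be_downsampled` as true and one discarded stream, which suggests resampling and selecting the intended track before upload.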

Input Video Codec Comparison

Processing speed is similar for all codecs because every input is transcoded to a standard format. However, some codecs experience greater quality loss during this process.

The following results are from our internal testing, where quality was measured using VMAF.

Input Codec	Quality Loss During Processing
H.264	Best (Least quality loss)
MPEG-2	Good (Up to 15% quality loss)
H.265	Good (Up to 15% quality loss)
VP9	Fair (Up to 20% quality loss)
AV1	Fair (Over 20% quality loss)
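
For tooling that wants to warn users before upload, the table above can be kept as a simple lookup. This is only an illustrative sketch: the keys are the codec names FFmpeg and ffprobe use (`h264`, `mpeg2video`, `hevc`, `vp9`, `av1`), and the function name `expected_quality_loss` is hypothetical.

```python
# Ratings copied from the codec comparison table; keys are FFmpeg codec names.
QUALITY_LOSS_BY_CODEC = {
    "h264": "Best (least quality loss)",
    "mpeg2video": "Good (up to 15% quality loss)",
    "hevc": "Good (up to 15% quality loss)",  # H.265
    "vp9": "Fair (up to 20% quality loss)",
    "av1": "Fair (over 20% quality loss)",
}


def expected_quality_loss(codec_name: str) -> str:
    """Return the published rating for a codec, or a note if it was not tested."""
    return QUALITY_LOSS_BY_CODEC.get(
        codec_name.lower(), "Not covered by the published test results"
    )
```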

Lipsync Quality (Lipsync-2 Model)

To achieve the best results with our Lipsync-2 model, keep the following tips in mind:

Provide High-Quality Inputs: For optimal performance, use a clear, well-lit video where the speaker’s face is front-facing and unobstructed. The model’s performance is directly tied to the quality of the source video.
Natural Talking Movements: The model performs best when the speaker appears to be talking naturally throughout the video. It preserves the speaker’s style during lipsync.

Tip for AI-Generated Video: When creating videos with AI, include a prompt instruction like "the character should be speaking naturally" to get the best results from our lipsync model.