Segments Guide

Overview

The segments feature lets you sync multiple video segments with different audio inputs in a single generation. Using segments, you can:

  • Lipsync different audio clips to different parts of your video
  • Use a specific portion of an audio input to lipsync a segment for precise timing
  • Combine audio and text-to-speech inputs so that segments with different input types are lipsynced in a single generation

Basic Concepts

To use the segments feature, provide a top-level segments array. Each item defines a video time range (a segment) along with its own audio configuration.

Segment

Each segment item takes the following properties:

startTime
double, required

Segment start time in seconds

endTime
double, required

Segment end time in seconds

audioInput
SegmentAudioInput, required

Audio configuration with refId and optional cropping

audioInput

Each segment requires exactly one audioInput. audioInput takes the following properties:

refId
string, required

Reference ID of the audio/text-to-speech input to use for this segment

startTime
double

Optional start time (in seconds) to crop the referenced audio. When specified, endTime must also be provided

endTime
double

Optional end time (in seconds) to crop the referenced audio. When specified, startTime must also be provided

The specified audioInput is used to lipsync the video between the segment's startTime and endTime.
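As a rough illustration of how the two property tables above fit together, the following sketch models a segment and its audioInput as plain Python dataclasses and serializes them to the camelCase shape described in the reference. The class names SegmentSpec and AudioInputSpec and the to_payload method are hypothetical helpers for this example, not part of the SDK.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AudioInputSpec:
    """Hypothetical stand-in for a segment's audioInput."""
    ref_id: str
    start_time: Optional[float] = None  # optional crop start, in seconds
    end_time: Optional[float] = None    # optional crop end, in seconds

    def to_payload(self) -> dict:
        payload = {"refId": self.ref_id}
        # Crop bounds are only emitted when they were set.
        if self.start_time is not None:
            payload["startTime"] = self.start_time
        if self.end_time is not None:
            payload["endTime"] = self.end_time
        return payload


@dataclass
class SegmentSpec:
    """Hypothetical stand-in for one item of the segments array."""
    start_time: float  # video range start, in seconds
    end_time: float    # video range end, in seconds
    audio_input: AudioInputSpec

    def to_payload(self) -> dict:
        return {
            "startTime": self.start_time,
            "endTime": self.end_time,
            "audioInput": self.audio_input.to_payload(),
        }


# Lipsync video seconds 2-5 using the first three seconds of "audio_1".
segment = SegmentSpec(2.0, 5.0, AudioInputSpec("audio_1", start_time=0.0, end_time=3.0))
payload = segment.to_payload()
```

Note that the outer startTime and endTime select a range of the video, while the inner pair crops the referenced audio; the two ranges are independent of each other.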

API Usage Examples

from sync import Sync
# GenerationSegment and SegmentAudioInput are assumed to be importable from
# sync.common alongside Audio and Video; adjust the path for your SDK version.
from sync.common import Audio, GenerationSegment, SegmentAudioInput, Video

sync = Sync()

response = sync.generations.create(
    input=[
        Video(url="https://assets.sync.so/docs/example-video.mp4"),
        Audio(url="https://assets.sync.so/docs/example-audio.wav", ref_id="audio_1"),
    ],
    model="lipsync-2",
    segments=[
        # Lipsync video seconds 2-5 with the whole of audio_1.
        GenerationSegment(
            start_time=2,
            end_time=5,
            audio_input=SegmentAudioInput(ref_id="audio_1"),
        ),
    ],
)

Multiple Segments with Single Audio Input

from sync import Sync
from sync.common import Audio, Video

sync = Sync()

response = sync.generations.create(
    input=[
        Video(url="https://assets.sync.so/docs/example-video.mp4"),
        Audio(url="https://assets.sync.so/docs/example-audio.wav", ref_id="audio_1")
    ],
    segments=[
        # Both segments crop different portions of the same audio input.
        {
            "startTime": 2,
            "endTime": 5,
            "audioInput": {"refId": "audio_1", "startTime": 2, "endTime": 5}
        },
        {
            "startTime": 6,
            "endTime": 8,
            "audioInput": {"refId": "audio_1", "startTime": 6, "endTime": 8}
        }
    ],
    model="lipsync-2"
)

Multiple Segments with Multiple Audio Inputs

from sync import Sync
from sync.common import Audio, Video

sync = Sync()

response = sync.generations.create(
    input=[
        Video(url="https://assets.sync.so/docs/example-video.mp4"),
        Audio(url="https://assets.sync.so/docs/example-audio.wav", ref_id="audio_1"),
        Audio(url="https://assets.sync.so/docs/example-audio.wav", ref_id="audio_2")
    ],
    segments=[
        # Each segment references a different audio input by its refId.
        {
            "startTime": 2,
            "endTime": 5,
            "audioInput": {"refId": "audio_1", "startTime": 2, "endTime": 5}
        },
        {
            "startTime": 6,
            "endTime": 8,
            "audioInput": {"refId": "audio_2", "startTime": 6, "endTime": 8}
        }
    ],
    model="lipsync-2"
)

Best Practices

Planning Your Segments

  1. Map your timeline: Identify video segments and corresponding audio needs
  2. Prepare audio files: Ensure audio quality and appropriate duration
  3. Test segment boundaries: Verify smooth transitions between segments
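One way to sanity-check a planned timeline before submitting it is to look for segments whose video ranges collide. The sketch below is a hypothetical local helper (find_overlaps is not an SDK function), and it assumes segments are dicts with startTime and endTime keys as in the reference above. This guide does not state how the API treats overlapping segments, so the helper only surfaces them for review.

```python
def find_overlaps(segments):
    """Flag each segment that starts before an earlier segment has ended.

    Returns a list of (earlier, later) pairs, where `earlier` is the
    furthest-reaching segment seen so far when `later` begins.
    """
    ordered = sorted(segments, key=lambda s: s["startTime"])
    overlaps = []
    furthest = None  # segment with the latest endTime seen so far
    for seg in ordered:
        if furthest is not None and seg["startTime"] < furthest["endTime"]:
            overlaps.append((furthest, seg))
        if furthest is None or seg["endTime"] > furthest["endTime"]:
            furthest = seg
    return overlaps


planned = [
    {"startTime": 6, "endTime": 8},
    {"startTime": 2, "endTime": 5},
]
clashing = [
    {"startTime": 0, "endTime": 10},
    {"startTime": 1, "endTime": 2},
    {"startTime": 3, "endTime": 4},
]
planned_overlaps = find_overlaps(planned)
clashing_overlaps = find_overlaps(clashing)
```

Segments that merely touch (one ends at 10, the next starts at 10) are not reported, since the comparison is strict.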

Audio Preparation

  • Use consistent audio quality across all segments and the video’s audio.
  • For best results, align audio timing with the video segments. If a segment's duration and the duration of its audio don’t match, the sync_mode option determines how the mismatch is handled.
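To spot duration mismatches before they are left to sync_mode, you could compare each segment's video length with the length of the audio it will use. The sketch below is a hypothetical helper, not an SDK call. It assumes you have already measured each input's duration yourself and passed it in as a refId-to-seconds mapping (audio_durations), and the 0.05 second tolerance is an arbitrary choice.

```python
def duration_mismatches(segments, audio_durations, tolerance=0.05):
    """Return (index, video_seconds, audio_seconds) for mismatched segments."""
    mismatches = []
    for index, seg in enumerate(segments):
        video_len = seg["endTime"] - seg["startTime"]
        audio = seg["audioInput"]
        if "startTime" in audio and "endTime" in audio:
            # A crop range is given, so only that portion of the audio is used.
            audio_len = audio["endTime"] - audio["startTime"]
        else:
            # No crop: fall back to the full duration measured by the caller.
            audio_len = audio_durations[audio["refId"]]
        if abs(video_len - audio_len) > tolerance:
            mismatches.append((index, video_len, audio_len))
    return mismatches


segments = [
    {"startTime": 2, "endTime": 5,
     "audioInput": {"refId": "audio_1", "startTime": 2, "endTime": 5}},
    {"startTime": 6, "endTime": 16, "audioInput": {"refId": "audio_2"}},
]
mismatched = duration_mismatches(segments, {"audio_1": 12.0, "audio_2": 7.5})
```

Here the second segment spans 10 seconds of video but its audio is only 7.5 seconds long, so it is the one reported.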

Troubleshooting

Common Errors

Provide a top-level segments array when using multiple audio or text inputs.

# ❌ This will fail
response = sync.generations.create(
    input=[
        Video(url="video.mp4"),
        Audio(url="audio1.wav"),  # Multiple audio without segments
        Audio(url="audio2.wav")
    ],
    model="lipsync-2"
)

# ✅ This will work
response = sync.generations.create(
    input=[
        Video(url="video.mp4"),
        Audio(url="audio1.wav", ref_id="a1"),
        Audio(url="audio2.wav", ref_id="a2")
    ],
    segments=[
        {"start_time": 0, "end_time": 10, "audio_input": {"refId": "a1"}},
        {"start_time": 10, "end_time": 20, "audio_input": {"refId": "a2"}}
    ],
    model="lipsync-2"
)

Ensure all audio inputs have valid url or assetId values and that referenced refId values exist in your audio or text inputs.

# ❌ Missing refId reference
response = sync.generations.create(
    input=[
        Video(url="video.mp4"),
        Audio(url="audio.wav", ref_id="audio1")
    ],
    segments=[
        {"start_time": 0, "end_time": 10, "audio_input": {"refId": "missing"}}  # Wrong refId
    ],
    model="lipsync-2"
)

# ✅ Correct refId reference
response = sync.generations.create(
    input=[
        Video(url="video.mp4"),
        Audio(url="audio.wav", ref_id="audio1")
    ],
    segments=[
        {"start_time": 0, "end_time": 10, "audio_input": {"refId": "audio1"}}  # Correct refId
    ],
    model="lipsync-2"
)

This error occurs when a segment’s audio_input is missing a refId or the refId is empty. Each segment must reference a valid audio or text input through its refId.

# ❌ Missing refId in segment
response = sync.generations.create(
    input=[
        Video(url="video.mp4"),
        Audio(url="audio.wav", ref_id="audio1")
    ],
    segments=[
        {
            "start_time": 0,
            "end_time": 10,
            "audio_input": {}  # Missing refId
        }
    ],
    model="lipsync-2"
)

# ✅ Include refId in segment
response = sync.generations.create(
    input=[
        Video(url="video.mp4"),
        Audio(url="audio.wav", ref_id="audio1")
    ],
    segments=[
        {
            "start_time": 0,
            "end_time": 10,
            "audio_input": {"refId": "audio1"}  # Valid refId
        }
    ],
    model="lipsync-2"
)

This error occurs when a segment references a refId that doesn’t exist in your audio or text inputs. Ensure all referenced refId values match exactly with those defined in your inputs.

# ❌ Segment references unknown refId
response = sync.generations.create(
    input=[
        Video(url="video.mp4"),
        Audio(url="audio.wav", ref_id="audio1")  # refId is "audio1"
    ],
    segments=[
        {
            "start_time": 0,
            "end_time": 10,
            "audio_input": {"refId": "nonexistent"}  # References unknown refId
        }
    ],
    model="lipsync-2"
)

# ✅ Segment references existing refId
response = sync.generations.create(
    input=[
        Video(url="video.mp4"),
        Audio(url="audio.wav", ref_id="audio1")  # refId is "audio1"
    ],
    segments=[
        {
            "start_time": 0,
            "end_time": 10,
            "audio_input": {"refId": "audio1"}  # References existing refId
        }
    ],
    model="lipsync-2"
)

Ensure both start_time and end_time are provided for audio cropping, and verify start_time < end_time for all crop ranges.

# ❌ Incomplete crop range
segments=[
    {
        "start_time": 0,
        "end_time": 10,
        "audio_input": {
            "refId": "audio1",
            "start_time": 5  # Missing end_time
        }
    }
]

# ✅ Complete crop range
segments=[
    {
        "start_time": 0,
        "end_time": 10,
        "audio_input": {
            "refId": "audio1",
            "start_time": 5,
            "end_time": 15
        }
    }
]
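The two crop rules (both bounds given together, and start before end) are easy to verify locally. Below is a small sketch of such a check. The helper crop_errors and its message strings are hypothetical, and it assumes the dict keys used in the examples above.

```python
def crop_errors(segments):
    """Return (segment_index, message) for each invalid audio crop range."""
    errors = []
    for index, seg in enumerate(segments):
        crop = seg["audio_input"]
        start, end = crop.get("start_time"), crop.get("end_time")
        if start is None and end is None:
            continue  # no cropping requested, nothing to check
        if start is None or end is None:
            errors.append((index, "start_time and end_time must be given together"))
        elif start >= end:
            errors.append((index, "start_time must be less than end_time"))
    return errors


segments = [
    {"start_time": 0, "end_time": 10,
     "audio_input": {"refId": "audio1", "start_time": 5}},
    {"start_time": 10, "end_time": 20,
     "audio_input": {"refId": "audio1", "start_time": 5, "end_time": 15}},
    {"start_time": 20, "end_time": 30,
     "audio_input": {"refId": "audio1", "start_time": 9, "end_time": 4}},
]
errors = crop_errors(segments)
```

Only the first and third segments are reported; the second has a complete, correctly ordered crop range.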

Ensure you have at least one audio input or text input with a valid refId when using segments.

# ❌ This will fail - no audio or text inputs
response = sync.generations.create(
    input=[
        Video(url="https://example.com/video.mp4")
    ],
    segments=[
        {"start_time": 0, "end_time": 10, "audio_input": {"refId": "missing"}}
    ],
    model="lipsync-2"
)

# ✅ This will work - includes text input
response = sync.generations.create(
    input=[
        Video(url="https://example.com/video.mp4"),
        TTS(
            provider={
                "name": "elevenlabs",
                "voiceId": "EXAVITQu4vr4xnSDxMaL",
                "script": "Hello world"
            },
            ref_id="text1"
        )
    ],
    segments=[
        {"start_time": 0, "end_time": 10, "audio_input": {"refId": "text1"}}
    ],
    model="lipsync-2"
)