Segments Guide

Overview

The segments feature lets you lipsync multiple video segments with different audio inputs in a single generation. Using segments, you can:

Lipsync different audio clips to different parts of your video
Use a specific portion of an audio input to lipsync a segment with precise timing
Combine audio and text-to-speech inputs to lipsync multiple segments in a single generation

Basic Concepts

To use the segments feature, provide a top-level segments array. Each item defines a video time range (a segment) with its own audio configuration.

Segment

Each segment item takes the following properties:

startTime (double, required)
Segment start time in seconds

endTime (double, required)
Segment end time in seconds

audioInput (SegmentAudioInput, required)
Audio configuration with refId and optional cropping

audioInput

Each segment requires exactly one audioInput. audioInput takes the following properties:

refId (string, required)
Reference ID of the audio or text-to-speech input to use for this segment

startTime (double, optional)
Start time (in seconds) to crop the referenced audio. When specified, endTime must also be provided

endTime (double, optional)
End time (in seconds) to crop the referenced audio. When specified, startTime must also be provided

The specified audioInput is used to lipsync the video between the segment’s startTime and endTime.
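The field reference above can be sketched as a small builder that produces one segment as a plain dict. This is only an illustration: `make_segment` is a hypothetical helper, not part of the SDK, and it assumes that segments are passed as dicts with the camelCase field names listed above. The check that a segment's start precedes its end is also an assumption; the guide states that rule only for crop ranges.

```python
from typing import Optional, Tuple


def make_segment(start: float, end: float, ref_id: str,
                 crop: Optional[Tuple[float, float]] = None) -> dict:
    """Build one segment dict (hypothetical helper, not an SDK function)."""
    # Assumption: a segment must start before it ends.
    if start >= end:
        raise ValueError("segment startTime must be less than endTime")
    audio_input = {"refId": ref_id}
    if crop is not None:
        # A crop range always carries both ends, as the reference requires.
        crop_start, crop_end = crop
        if crop_start >= crop_end:
            raise ValueError("crop startTime must be less than endTime")
        audio_input["startTime"] = crop_start
        audio_input["endTime"] = crop_end
    return {"startTime": start, "endTime": end, "audioInput": audio_input}


segments = [
    make_segment(2, 5, "audio_1"),               # uses the whole audio clip
    make_segment(6, 8, "audio_1", crop=(6, 8)),  # uses seconds 6-8 of the clip
]
```

Taking the crop as a single tuple means a range can never be supplied with only one of its two ends.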

API Usage Examples

Single Segment with Single Audio

from sync import Sync
# GenerationSegment and SegmentAudioInput are assumed to be exported from sync.common
from sync.common import Audio, GenerationSegment, SegmentAudioInput, Video

sync = Sync()

response = sync.generations.create(
    input=[
        Video(url="https://assets.sync.so/docs/example-video.mp4"),
        Audio(url="https://assets.sync.so/docs/example-audio.wav", ref_id="audio_1"),
    ],
    model="lipsync-2",
    segments=[
        # Lipsync seconds 2-5 of the video with the audio_1 input
        GenerationSegment(
            start_time=2,
            end_time=5,
            audio_input=SegmentAudioInput(ref_id="audio_1"),
        ),
    ],
)

Multiple Segments with Single Audio

from sync import Sync
from sync.common import Audio, Video

sync = Sync()

response = sync.generations.create(
    input=[
        Video(url="https://assets.sync.so/docs/example-video.mp4"),
        Audio(url="https://assets.sync.so/docs/example-audio.wav", ref_id="audio_1"),
    ],
    segments=[
        # Video seconds 2-5 use seconds 2-5 of audio_1
        {
            "startTime": 2,
            "endTime": 5,
            "audioInput": {"refId": "audio_1", "startTime": 2, "endTime": 5},
        },
        # Video seconds 6-8 use seconds 6-8 of the same audio
        {
            "startTime": 6,
            "endTime": 8,
            "audioInput": {"refId": "audio_1", "startTime": 6, "endTime": 8},
        },
    ],
    model="lipsync-2",
)

Multiple Segments with Multiple Audio

from sync import Sync
from sync.common import Audio, Video

sync = Sync()

response = sync.generations.create(
    input=[
        Video(url="https://assets.sync.so/docs/example-video.mp4"),
        Audio(url="https://assets.sync.so/docs/example-audio.wav", ref_id="audio_1"),
        Audio(url="https://assets.sync.so/docs/example-audio.wav", ref_id="audio_2"),
    ],
    segments=[
        # Video seconds 2-5 use seconds 2-5 of audio_1
        {
            "startTime": 2,
            "endTime": 5,
            "audioInput": {"refId": "audio_1", "startTime": 2, "endTime": 5},
        },
        # Video seconds 6-8 use seconds 6-8 of audio_2
        {
            "startTime": 6,
            "endTime": 8,
            "audioInput": {"refId": "audio_2", "startTime": 6, "endTime": 8},
        },
    ],
    model="lipsync-2",
)

Best Practices

Planning Your Segments

Map your timeline: Identify video segments and corresponding audio needs
Prepare audio files: Ensure audio quality and appropriate duration
Test segment boundaries: Verify smooth transitions between segments
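One way to test segment boundaries before submitting is to list the gaps and overlaps between consecutive segments. The sketch below is a local check only: `boundary_report` is a hypothetical helper, it assumes segments are dicts with startTime and endTime keys, and this guide does not state whether the API itself accepts or rejects gaps or overlaps.

```python
def boundary_report(segments):
    """Return (kind, from_seconds, to_seconds) for each gap or overlap found."""
    ordered = sorted(segments, key=lambda seg: seg["startTime"])
    report = []
    # Compare each segment with the one that follows it on the timeline.
    for prev, cur in zip(ordered, ordered[1:]):
        delta = cur["startTime"] - prev["endTime"]
        if delta > 0:
            report.append(("gap", prev["endTime"], cur["startTime"]))
        elif delta < 0:
            report.append(("overlap", cur["startTime"], prev["endTime"]))
    return report


timeline = [
    {"startTime": 2, "endTime": 5},
    {"startTime": 6, "endTime": 8},
]
print(boundary_report(timeline))  # [('gap', 5, 6)]
```

A gap may be intentional, for example a stretch of video that should not be lipsynced, so the report is meant for review rather than as a hard failure.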

Audio Preparation

Use consistent audio quality across all segments and the video’s audio.
For best results, align the timing of each audio input with its video segment. If a segment’s duration and the duration of its audio don’t match, sync_mode determines how the mismatch is handled.
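To see where sync_mode would come into play, you can compare each segment's length with the length of the audio it will use. The following is a minimal sketch under stated assumptions: `duration_mismatches` and the `audio_durations` mapping (refId to full clip length in seconds, measured by you) are hypothetical, and segments are assumed to be dicts with the camelCase fields described earlier.

```python
def duration_mismatches(segments, audio_durations, tolerance=0.05):
    """Return (index, segment_seconds, audio_seconds) for each mismatch."""
    mismatches = []
    for index, seg in enumerate(segments):
        seg_len = seg["endTime"] - seg["startTime"]
        audio = seg["audioInput"]
        if "startTime" in audio and "endTime" in audio:
            # A crop range is set, so only that part of the audio is used.
            audio_len = audio["endTime"] - audio["startTime"]
        else:
            # No crop range, so the full clip length applies.
            audio_len = audio_durations[audio["refId"]]
        if abs(seg_len - audio_len) > tolerance:
            mismatches.append((index, seg_len, audio_len))
    return mismatches


segments = [
    {"startTime": 0, "endTime": 10, "audioInput": {"refId": "a1"}},
    {"startTime": 10, "endTime": 20,
     "audioInput": {"refId": "a2", "startTime": 5, "endTime": 15}},
]
print(duration_mismatches(segments, {"a1": 12.0, "a2": 30.0}))  # [(0, 10, 12.0)]
```

Here the first segment is 10 seconds long but its audio is 12 seconds, so it is the one whose handling depends on sync_mode.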

Troubleshooting

Common Errors

"Multiple audio inputs are only allowed when using multi-segments"

Provide a top-level segments array when using multiple audio or text inputs.

# ❌ This will fail
response = sync.generations.create(
    input=[
        Video(url="video.mp4"),
        Audio(url="audio1.wav"),  # Multiple audio without segments
        Audio(url="audio2.wav")
    ],
    model="lipsync-2"
)

# ✅ This will work
response = sync.generations.create(
    input=[
        Video(url="video.mp4"),
        Audio(url="audio1.wav", ref_id="a1"),
        Audio(url="audio2.wav", ref_id="a2")
    ],
    segments=[
        {"start_time": 0, "end_time": 10, "audio_input": {"refId": "a1"}},
        {"start_time": 10, "end_time": 20, "audio_input": {"refId": "a2"}}
    ],
    model="lipsync-2"
)

"Unable to resolve audio input URL"

Ensure all audio inputs have valid url or assetId values and that referenced refId values exist in your audio or text inputs.

# ❌ Missing refId reference
response = sync.generations.create(
    input=[
        Video(url="video.mp4"),
        Audio(url="audio.wav", ref_id="audio1")
    ],
    segments=[
        {"start_time": 0, "end_time": 10, "audio_input": {"refId": "missing"}}  # Wrong refId
    ],
    model="lipsync-2"
)

# ✅ Correct refId reference
response = sync.generations.create(
    input=[
        Video(url="video.mp4"),
        Audio(url="audio.wav", ref_id="audio1")
    ],
    segments=[
        {"start_time": 0, "end_time": 10, "audio_input": {"refId": "audio1"}}  # Correct refId
    ],
    model="lipsync-2"
)

"Segment at index X is missing a valid audioInput.refId"

This error occurs when a segment’s audio_input is missing a refId or the refId is empty. Each segment must reference a valid audio or text input through its refId.

# ❌ Missing refId in segment
response = sync.generations.create(
    input=[
        Video(url="video.mp4"),
        Audio(url="audio.wav", ref_id="audio1")
    ],
    segments=[
        {
            "start_time": 0,
            "end_time": 10,
            "audio_input": {}  # Missing refId
        }
    ],
    model="lipsync-2"
)

# ✅ Include refId in segment
response = sync.generations.create(
    input=[
        Video(url="video.mp4"),
        Audio(url="audio.wav", ref_id="audio1")
    ],
    segments=[
        {
            "start_time": 0,
            "end_time": 10,
            "audio_input": {"refId": "audio1"}  # Valid refId
        }
    ],
    model="lipsync-2"
)

"Segment at index X references unknown refId"

This error occurs when a segment references a refId that doesn’t exist in your audio or text inputs. Ensure all referenced refId values match exactly with those defined in your inputs.

# ❌ Segment references unknown refId
response = sync.generations.create(
    input=[
        Video(url="video.mp4"),
        Audio(url="audio.wav", ref_id="audio1")  # refId is "audio1"
    ],
    segments=[
        {
            "start_time": 0,
            "end_time": 10,
            "audio_input": {"refId": "nonexistent"}  # References unknown refId
        }
    ],
    model="lipsync-2"
)

# ✅ Segment references existing refId
response = sync.generations.create(
    input=[
        Video(url="video.mp4"),
        Audio(url="audio.wav", ref_id="audio1")  # refId is "audio1"
    ],
    segments=[
        {
            "start_time": 0,
            "end_time": 10,
            "audio_input": {"refId": "audio1"}  # References existing refId
        }
    ],
    model="lipsync-2"
)
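Both refId errors can be caught locally before a request is sent. The sketch below assumes segments are dicts with camelCase keys (audioInput, refId) as in the field reference; `check_ref_ids` is a hypothetical helper, and its messages only imitate the wording of the errors above rather than reproduce the API's exact output.

```python
def check_ref_ids(input_ref_ids, segments):
    """Return a list of problems; an empty list means every refId resolves."""
    known = set(input_ref_ids)
    problems = []
    for index, seg in enumerate(segments):
        ref_id = seg.get("audioInput", {}).get("refId")
        if not ref_id:
            # Covers both a missing refId key and an empty string.
            problems.append(
                f"Segment at index {index} is missing a valid audioInput.refId")
        elif ref_id not in known:
            problems.append(
                f"Segment at index {index} references unknown refId {ref_id!r}")
    return problems


segments = [
    {"startTime": 0, "endTime": 10, "audioInput": {"refId": "nonexistent"}},
    {"startTime": 10, "endTime": 20, "audioInput": {}},
]
for problem in check_ref_ids(["audio1"], segments):
    print(problem)
```

The comparison is an exact string match, so "audio1" and "Audio1" count as different refIds.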

"Invalid audio_input crop range"

Ensure both start_time and end_time are provided for audio cropping, and verify start_time < end_time for all crop ranges.

# ❌ Incomplete crop range
segments=[
    {
        "start_time": 0,
        "end_time": 10,
        "audio_input": {
            "refId": "audio1",
            "start_time": 5  # Missing end_time
        }
    }
]

# ✅ Complete crop range
segments=[
    {
        "start_time": 0,
        "end_time": 10,
        "audio_input": {
            "refId": "audio1",
            "start_time": 5,
            "end_time": 15
        }
    }
]
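The two crop range rules (both ends present, start before end) can be expressed as a small local check. This is a sketch only: `crop_range_error` is a hypothetical helper, it assumes the audio input is a dict with camelCase startTime and endTime keys as in the field reference, and its messages are illustrative rather than the API's own.

```python
def crop_range_error(audio_input):
    """Return a description of the crop range problem, or None if it is valid."""
    has_start = "startTime" in audio_input
    has_end = "endTime" in audio_input
    if not has_start and not has_end:
        return None  # no cropping requested
    if has_start != has_end:
        return "startTime and endTime must be provided together"
    if audio_input["startTime"] >= audio_input["endTime"]:
        return "startTime must be less than endTime"
    return None


print(crop_range_error({"refId": "audio1", "startTime": 5}))
print(crop_range_error({"refId": "audio1", "startTime": 5, "endTime": 15}))
```

The first call reports the incomplete range and the second returns None.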

"When using multi-segments, please provide at least one audio or text input"

Ensure you have at least one audio input or text input with a valid refId when using segments.

# ❌ This will fail - no audio or text inputs
response = sync.generations.create(
    input=[
        Video(url="https://example.com/video.mp4")
    ],
    segments=[
        {"start_time": 0, "end_time": 10, "audio_input": {"refId": "missing"}}
    ],
    model="lipsync-2"
)

# ✅ This will work - includes text input
response = sync.generations.create(
    input=[
        Video(url="https://example.com/video.mp4"),
        TTS(
            provider={
                "name": "elevenlabs",
                "voiceId": "EXAVITQu4vr4xnSDxMaL",
                "script": "Hello world"
            },
            ref_id="text1"
        )
    ],
    segments=[
        {"start_time": 0, "end_time": 10, "audio_input": {"refId": "text1"}}
    ],
    model="lipsync-2"
)