What You’ll Learn
- How to implement audio ducking with the sidechaincompress filter
- How a sidechain compressor works and what each parameter does
- Example settings for lowering BGM automatically while narration plays
- How to use attack and release to achieve a natural ducking feel
Tested with: FFmpeg 6.1
Platform: Windows / macOS / Linux
What is Audio Ducking?
Audio ducking is the technique of automatically lowering background music whenever narration or voice is detected. It’s used everywhere in video productions, podcasts, and explainer videos. In FFmpeg, the sidechaincompress filter does the job.
How It Works
Narration (sidechain signal)
↓ level rises
Compressor attenuates the BGM
↓
BGM level drops automatically
sidechaincompress takes two audio inputs:
- Main signal (BGM): the track that gets compressed
- Sidechain signal (narration): the trigger that drives compression
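To get a feel for how much the BGM drops, the steady-state behaviour can be sketched with the textbook hard-knee compressor formula: for every dB the sidechain exceeds the threshold, the main signal is reduced by (1 - 1/ratio) dB. This is an approximation under stated assumptions, not FFmpeg's exact internals: it ignores knee, attack, release, and the detector mode. gain_reduction_db is a hypothetical helper name, and all levels are given in dB.

```shell
#!/bin/sh
# Approximate steady-state gain reduction of a hard-knee compressor, in dB.
# Assumption: knee, attack, release and detector mode are ignored, so this
# only estimates the level that sidechaincompress settles to.
# Usage: gain_reduction_db <sidechain_level_db> <threshold_db> <ratio>
gain_reduction_db() {
  awk -v level="$1" -v thr="$2" -v ratio="$3" 'BEGIN {
    over = level - thr                      # dB by which the sidechain exceeds the threshold
    if (over <= 0) { printf "%.2f\n", 0; exit }   # below threshold: no ducking
    printf "%.2f\n", over * (1 - 1 / ratio) # dB removed from the main signal
  }'
}

# Narration at -10 dB, threshold at -34 dB (roughly 0.02 linear), ratio 4:1
gain_reduction_db -10 -34 4
# Narration quieter than the threshold
gain_reduction_db -40 -34 4
```

With these inputs, the first call reports 18.00 dB of attenuation and the second reports 0.00, which is why a low threshold combined with a moderate ratio already gives a clearly audible duck.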
Basic Command
Duck the audio track of a video file using a separate narration file:
ffmpeg -i input.mp4 -i input.mp3 \
-filter_complex "[0:a]aformat=fltp:44100:stereo[bg];[1:a]aformat=fltp:44100:stereo[voice];[bg][voice]sidechaincompress=threshold=0.02:ratio=4:attack=200:release=1000[ducked]" \
-map 0:v -map "[ducked]" -c:v copy output_ducked.mp4
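In this example, the audio of input.mp4 acts as the BGM and input.mp3 as the narration. The BGM is attenuated whenever the narration is audible. Note that the output contains only the ducked BGM; the narration itself is not mixed in (see Mix Narration Back In After Ducking).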
Parameter Reference
| Parameter | Description | Default | Recommended range |
|---|---|---|---|
| threshold | Sidechain level above which compression starts (linear amplitude) | 0.125 | 0.01–0.05 |
| ratio | Compression ratio (e.g. 4 = 4:1) | 2 | 3–8 |
| attack | Time before the compressor kicks in (ms) | 20 | 100–300 |
| release | Time to return to the original level (ms) | 250 | 500–2000 |
| makeup | Gain factor applied after compression (linear multiplier, not dB) | 1 | - |
| knee | Knee width (softness of the threshold) | 2.82843 | - |
Tuning threshold
Adjust threshold to match the loudness of your narration. Smaller values make the compressor react to quieter speech; larger values, such as the 0.05 used here, require louder speech before ducking starts:
ffmpeg -i input.mp4 -i input.mp3 \
-filter_complex "[0:a]aformat=fltp:44100:stereo[bg];[1:a]aformat=fltp:44100:stereo[voice];[bg][voice]sidechaincompress=threshold=0.05:ratio=4:attack=200:release=1000[ducked]" \
-map 0:v -map "[ducked]" -c:v copy output.mp4
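Level meters and tools such as volumedetect report loudness in dB, while the threshold values used in these commands are small linear numbers. The sketch below converts between the two, assuming threshold is a linear amplitude where 1.0 is full scale (linear = 10^(dB/20)). db_to_linear and linear_to_db are hypothetical helper names, not FFmpeg tools.

```shell
#!/bin/sh
# Convert between decibels and linear amplitude.
# Assumption: the threshold option is a linear amplitude with 1.0 = full scale.

# Usage: db_to_linear <level_db>
db_to_linear() {
  awk -v db="$1" 'BEGIN { printf "%.4f\n", exp(db / 20 * log(10)) }'
}

# Usage: linear_to_db <linear_amplitude>
linear_to_db() {
  awk -v lin="$1" 'BEGIN { printf "%.2f\n", 20 * log(lin) / log(10) }'
}

db_to_linear -25    # a -25 dB target expressed as a threshold value
linear_to_db 0.02   # the threshold used in the basic command, in dB
linear_to_db 0.125  # the filter's default threshold, in dB
```

A practical way to use this is to measure the narration level first, pick a threshold a few dB below its typical level, and convert that figure to the linear value passed to the filter.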
Shaping a Natural Ducking Curve
attack controls how quickly the BGM drops once speech begins, and release controls how gradually it swells back up afterwards.
Slow, natural ducking
ffmpeg -i input.mp4 -i input.mp3 \
-filter_complex "[0:a]aformat=fltp:44100:stereo[bg];[1:a]aformat=fltp:44100:stereo[voice];[bg][voice]sidechaincompress=threshold=0.02:ratio=6:attack=300:release=1500[ducked]" \
-map 0:v -map "[ducked]" -c:v copy output_natural.mp4
Fast, tight ducking
ffmpeg -i input.mp4 -i input.mp3 \
-filter_complex "[0:a]aformat=fltp:44100:stereo[bg];[1:a]aformat=fltp:44100:stereo[voice];[bg][voice]sidechaincompress=threshold=0.02:ratio=8:attack=50:release=500[ducked]" \
-map 0:v -map "[ducked]" -c:v copy output_tight.mp4
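Because the two presets differ only in four numbers, it can be convenient to generate the filter graph from those values rather than editing the long string by hand. The following is a minimal sketch; duck_filter is a hypothetical helper, and the ffmpeg invocation is left as a comment since it needs the input files from the examples above.

```shell
#!/bin/sh
# Build the ducking filter graph from four tunables so presets are easy to swap.
# duck_filter is a hypothetical helper, not part of FFmpeg.
# Usage: duck_filter <threshold> <ratio> <attack_ms> <release_ms>
duck_filter() {
  fmt="aformat=fltp:44100:stereo"   # same normalization as the commands above
  printf '[0:a]%s[bg];[1:a]%s[voice];' "$fmt" "$fmt"
  printf '[bg][voice]sidechaincompress=threshold=%s:ratio=%s:attack=%s:release=%s[ducked]\n' \
    "$1" "$2" "$3" "$4"
}

# Print the two presets from this section.
duck_filter 0.02 6 300 1500   # slow, natural
duck_filter 0.02 8 50 500     # fast, tight

# To render with a preset (requires ffmpeg plus input.mp4 and input.mp3):
# ffmpeg -i input.mp4 -i input.mp3 \
#   -filter_complex "$(duck_filter 0.02 6 300 1500)" \
#   -map 0:v -map "[ducked]" -c:v copy output_natural.mp4
```

Keeping the graph in one place also makes it easier to render several variants and compare them by ear.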
Pre-Attenuate the BGM Before Ducking
If the BGM is already too loud, lower its level first and then apply ducking:
ffmpeg -i input.mp4 -i input.mp3 \
-filter_complex "[0:a]volume=0.5,aformat=fltp:44100:stereo[bg];[1:a]aformat=fltp:44100:stereo[voice];[bg][voice]sidechaincompress=threshold=0.02:ratio=4:attack=200:release=1000[ducked]" \
-map 0:v -map "[ducked]" -c:v copy output.mp4
Mix Narration Back In After Ducking
To produce a final mix that contains both the ducked BGM and the narration:
ffmpeg -i input.mp4 -i input.mp3 \
-filter_complex "[0:a]aformat=fltp:44100:stereo[bg];[1:a]aformat=fltp:44100:stereo,asplit=2[sc][voice];[bg][sc]sidechaincompress=threshold=0.02:ratio=4:attack=200:release=1000[ducked];[ducked][voice]amix=inputs=2:duration=longest[out]" \
-map 0:v -map "[out]" -c:v copy output_mixed.mp4
The narration is split with asplit because a labeled stream in a filter graph can be consumed only once: one copy drives the compressor, and the other is mixed with the ducked BGM for the final output.
Notes
sidechaincompress requires both inputs to share the same sample rate and audio format. If they differ, the safest approach is to normalize both with aformat before the compressor.
Example: aformat=fltp:44100:stereo
Frequently Asked Questions
What is sidechain compression in plain English?
A compressor that turns down one signal whenever a different signal gets loud — typically used to lower BGM under voiceover. FFmpeg implements it via sidechaincompress.
What threshold and ratio should I start with?
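A threshold around -25 dB (about 0.056 on the linear scale that the threshold option uses) and a ratio of 8:1 produce audible ducking for podcasts. Set attack to around 5 ms so the BGM drops promptly and release to around 200 ms so it recovers between sentences; lengthen both if the result pumps.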
Is ducking better than just lowering BGM volume?
Yes for narrated content — static volume cuts kill the music between sentences too, while ducking only reduces it during speech, keeping the energy higher overall.
Can I duck two BGM tracks with one voice?
Yes — feed both BGMs through separate sidechaincompress filters with the same voice as the sidechain input, then mix the ducked outputs together.
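One way to wire this up is sketched below. It assumes three hypothetical inputs (bgm1.mp3, bgm2.mp3, voice.mp3) and reuses the settings from the basic command. The voice is split with asplit because each labeled stream can feed only one filter input. The script only assembles and prints the filter graph; the ffmpeg call is left as a comment.

```shell
#!/bin/sh
# Duck two BGM tracks with one voice track.
# Assumptions: input 0 and 1 are the BGMs, input 2 is the voice;
# all file names here are hypothetical.
FMT="aformat=fltp:44100:stereo"
DUCK="sidechaincompress=threshold=0.02:ratio=4:attack=200:release=1000"

# Split the voice into two sidechain copies, one per compressor.
FILTER="[2:a]${FMT},asplit=2[sc1][sc2];"
# Normalize both BGM tracks to the same format.
FILTER="${FILTER}[0:a]${FMT}[bg1];[1:a]${FMT}[bg2];"
# Duck each BGM with its own copy of the voice.
FILTER="${FILTER}[bg1][sc1]${DUCK}[d1];[bg2][sc2]${DUCK}[d2];"
# Mix the two ducked tracks together.
FILTER="${FILTER}[d1][d2]amix=inputs=2:duration=longest[out]"

printf '%s\n' "$FILTER"

# To render (requires ffmpeg and the three input files):
# ffmpeg -i bgm1.mp3 -i bgm2.mp3 -i voice.mp3 \
#   -filter_complex "$FILTER" -map "[out]" output_two_bgm.wav
```

If the voice should also be audible in the result, the split would need a third copy (asplit=3) routed into the final amix with inputs=3.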
Why does my output have audible pumping?
Attack and release are too fast. Slow the attack to 10–20 ms and release to 300–500 ms; you can also drop the ratio to 4:1 to soften the effect.
Related Articles
- Loudness Normalization (loudnorm / LUFS)
- Volume Detection and Adjustment (volumedetect / volume)
- Audio Format Conversion
Tested with ffmpeg 6.1 / Ubuntu 24.04 (GitHub Actions runner)
Primary source: ffmpeg.org/ffmpeg-filters.html#sidechaincompress