Automatically Duck BGM Under Narration with sidechaincompress

What You’ll Learn

How to implement audio ducking with the sidechaincompress filter
How a sidechain compressor works and what each parameter does
Example settings for lowering BGM automatically while narration plays
How to use attack and release to achieve a natural ducking feel

Tested with: FFmpeg 6.1
Platform: Windows / macOS / Linux

What is Audio Ducking?

Audio ducking is the technique of automatically lowering background music whenever narration or voice is detected. It’s used everywhere in video productions, podcasts, and explainer videos. In FFmpeg, the sidechaincompress filter does the job.

How It Works

Narration (sidechain signal)
       ↓ level rises
Compressor attenuates the BGM
       ↓
BGM level drops automatically

sidechaincompress takes two audio inputs:

Main signal (BGM): the track that gets compressed
Sidechain signal (narration): the trigger that drives compression

Basic Command

Duck the audio track of a video file using a separate narration file:

ffmpeg -i input.mp4 -i input.mp3 \
  -filter_complex "[0:a]aformat=fltp:44100:stereo[bg];[1:a]aformat=fltp:44100:stereo[voice];[bg][voice]sidechaincompress=threshold=0.02:ratio=4:attack=200:release=1000[ducked]" \
  -map 0:v -map "[ducked]" -c:v copy output_ducked.mp4

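In this example, the audio track of input.mp4 is the BGM and input.mp3 is the narration. The BGM is attenuated whenever the narration is audible. The output contains only the ducked BGM: the narration acts as the trigger and is not mixed in.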

Parameter Reference

Parameter	Description	Default	Recommended range
`threshold`	Sidechain level above which compression starts (linear amplitude; 0.125 ≈ -18 dB)	0.125	0.01–0.05
`ratio`	Compression ratio (e.g. 4 = 4:1)	2	3–8
`attack`	Time before the compressor kicks in (ms)	20	100–300
`release`	Time to return to the original level (ms)	250	500–2000
`makeup`	Gain applied after compression (linear factor, not dB)	1	-
`knee`	Knee width (softness of the threshold)	2.82843	-
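
Because threshold is given as a linear amplitude rather than in dB, it can help to convert between the two scales when picking a value. The following is a minimal sketch using awk; db_to_linear and linear_to_db are illustrative helper names, not part of FFmpeg.

```shell
# Convert between dB and the linear amplitude scale used by threshold.
# db_to_linear and linear_to_db are hypothetical helper names.
db_to_linear() {
  awk -v db="$1" 'BEGIN { printf "%.4f\n", 10 ^ (db / 20) }'
}

linear_to_db() {
  awk -v lin="$1" 'BEGIN { printf "%.1f\n", 20 * log(lin) / log(10) }'
}

db_to_linear -25     # prints 0.0562, the threshold for a -25 dB trigger level
linear_to_db 0.125   # prints -18.1, the default threshold expressed in dB
```

The recommended range of 0.01–0.05 therefore corresponds to roughly -40 dB to -26 dB.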

Tuning threshold

Adjust threshold to match the loudness of your narration. Smaller values make the compressor react to quieter speech; larger values, such as the 0.05 used here, trigger only on louder passages:

ffmpeg -i input.mp4 -i input.mp3 \
  -filter_complex "[0:a]aformat=fltp:44100:stereo[bg];[1:a]aformat=fltp:44100:stereo[voice];[bg][voice]sidechaincompress=threshold=0.05:ratio=4:attack=200:release=1000[ducked]" \
  -map 0:v -map "[ducked]" -c:v copy output.mp4
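
To get a feel for how threshold and ratio interact, the steady-state attenuation can be estimated with simple arithmetic. This is a rough sketch that assumes a hard knee and ignores attack, release, and level detection, so the real filter will behave somewhat differently; duck_reduction_db is a hypothetical helper name.

```shell
# Rough estimate of how many dB the BGM is attenuated once ducking has settled.
# Assumption: hard knee; the filter's soft knee and timing are ignored.
# usage: duck_reduction_db <sidechain_level_db> <threshold_linear> <ratio>
duck_reduction_db() {
  awk -v level="$1" -v thr="$2" -v ratio="$3" 'BEGIN {
    thr_db = 20 * log(thr) / log(10)   # linear threshold converted to dB
    over = level - thr_db              # how far the voice exceeds the threshold
    if (over < 0) over = 0             # below the threshold there is no ducking
    printf "%.2f\n", over * (1 - 1 / ratio)
  }'
}

duck_reduction_db -14 0.02 4   # narration at -14 dB, threshold 0.02, ratio 4
duck_reduction_db -14 0.05 4   # same narration, higher threshold
```

With narration at around -14 dB, this estimate gives about 15 dB of ducking at threshold 0.02 and about 9 dB at threshold 0.05, which illustrates why raising the threshold produces a gentler effect.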

Shaping a Natural Ducking Curve

Attack and release shape the transitions around speech: attack controls how quickly the BGM drops once narration starts, and release controls how gradually it swells back afterwards.

Slow, natural ducking

ffmpeg -i input.mp4 -i input.mp3 \
  -filter_complex "[0:a]aformat=fltp:44100:stereo[bg];[1:a]aformat=fltp:44100:stereo[voice];[bg][voice]sidechaincompress=threshold=0.02:ratio=6:attack=300:release=1500[ducked]" \
  -map 0:v -map "[ducked]" -c:v copy output_natural.mp4

Fast, tight ducking

ffmpeg -i input.mp4 -i input.mp3 \
  -filter_complex "[0:a]aformat=fltp:44100:stereo[bg];[1:a]aformat=fltp:44100:stereo[voice];[bg][voice]sidechaincompress=threshold=0.02:ratio=8:attack=50:release=500[ducked]" \
  -map 0:v -map "[ducked]" -c:v copy output_tight.mp4
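
Since the commands above differ only in their compressor settings, a small helper can make it easier to try several combinations. The sketch below prints the same filtergraph used throughout this article for a given threshold, ratio, attack, and release; duck_graph is a hypothetical name introduced for illustration.

```shell
# duck_graph is a hypothetical helper that prints this article's filtergraph
# for one set of sidechaincompress settings.
# usage: duck_graph <threshold> <ratio> <attack_ms> <release_ms>
duck_graph() {
  fmt="aformat=fltp:44100:stereo"
  printf '[0:a]%s[bg];[1:a]%s[voice];' "$fmt" "$fmt"
  printf '[bg][voice]sidechaincompress=threshold=%s:ratio=%s:attack=%s:release=%s[ducked]\n' \
    "$1" "$2" "$3" "$4"
}

duck_graph 0.02 6 300 1500   # slow, natural
duck_graph 0.02 8 50 500     # fast, tight
```

The printed string can be passed to ffmpeg with -filter_complex "$(duck_graph 0.02 6 300 1500)", keeping the rest of the command unchanged.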

Pre-Attenuate the BGM Before Ducking

If the BGM is already too loud, lower its level first and then apply ducking:

ffmpeg -i input.mp4 -i input.mp3 \
  -filter_complex "[0:a]volume=0.5,aformat=fltp:44100:stereo[bg];[1:a]aformat=fltp:44100:stereo[voice];[bg][voice]sidechaincompress=threshold=0.02:ratio=4:attack=200:release=1000[ducked]" \
  -map 0:v -map "[ducked]" -c:v copy output.mp4

Mix Narration Back In After Ducking

To produce a final mix that contains both the ducked BGM and the narration:

ffmpeg -i input.mp4 -i input.mp3 \
  -filter_complex "[0:a]aformat=fltp:44100:stereo[bg];[1:a]aformat=fltp:44100:stereo,asplit=2[sc][voice];[bg][sc]sidechaincompress=threshold=0.02:ratio=4:attack=200:release=1000[ducked];[ducked][voice]amix=inputs=2:duration=longest[out]" \
  -map 0:v -map "[out]" -c:v copy output_mixed.mp4

The narration is duplicated with asplit because a labeled filter output can be consumed only once: one copy ([sc]) drives the sidechain, and the other ([voice]) is mixed with the ducked BGM for the final output.

Notes

sidechaincompress requires both inputs to share the same sample rate and audio format. The safest approach is to normalize each input with aformat before the compressor, for example aformat=fltp:44100:stereo (planar float samples, 44.1 kHz, stereo).
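
To see whether the two inputs already match, their audio parameters can be inspected with ffprobe before building the filtergraph. This is a minimal sketch; audio_params is a hypothetical helper name, and the file names are the ones used in the examples above.

```shell
# audio_params is a hypothetical helper: it prints the sample rate, sample
# format and channel count of the first audio stream in one file.
audio_params() {
  ffprobe -v error -select_streams a:0 \
    -show_entries stream=sample_rate,sample_fmt,channels \
    -of default=noprint_wrappers=1 "$1"
}

# Inspect both inputs, skipping any file that is not present.
for f in input.mp4 input.mp3; do
  if [ -f "$f" ]; then
    echo "== $f"
    audio_params "$f"
  fi
done
```

If the reported values differ between the two files, the aformat step is what brings them to a common format.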

Frequently Asked Questions

What is sidechain compression in plain English?

A compressor that turns down one signal whenever a different signal gets loud — typically used to lower BGM under voiceover. FFmpeg implements it via sidechaincompress.

What threshold and ratio should I start with?

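Threshold around -25 dB (about 0.056 on the linear scale that sidechaincompress expects) and ratio 8:1 produce audible ducking for podcasts. For timing, the ranges in the Parameter Reference (attack 100–300 ms, release 500–2000 ms) are a safe starting point. Faster settings such as attack 5 ms and release 200 ms duck more tightly but are more prone to pumping.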

Is ducking better than just lowering BGM volume?

Yes for narrated content — static volume cuts kill the music between sentences too, while ducking only reduces it during speech, keeping the energy higher overall.

Can I duck two BGM tracks with one voice?

Yes — feed both BGMs through separate sidechaincompress filters with the same voice as the sidechain input, then mix the ducked outputs together.
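
One way this could look is sketched below. The file names bgm1.mp3, bgm2.mp3, and voice.mp3 and the output name are hypothetical, and the compressor settings are simply the ones from the basic command. The voice is duplicated with asplit so that each compressor gets its own copy of the sidechain.

```shell
# Sketch: duck two music tracks with one narration. File names are
# hypothetical; inputs are 0 = bgm1.mp3, 1 = bgm2.mp3, 2 = voice.mp3.
fmt="aformat=fltp:44100:stereo"
sc="sidechaincompress=threshold=0.02:ratio=4:attack=200:release=1000"

# A labeled pad can feed only one filter, so asplit duplicates the voice.
graph="[0:a]${fmt}[bg1];[1:a]${fmt}[bg2];[2:a]${fmt},asplit=2[v1][v2]"
graph="${graph};[bg1][v1]${sc}[d1];[bg2][v2]${sc}[d2]"
graph="${graph};[d1][d2]amix=inputs=2:duration=longest[out]"

echo "$graph"

# Run only when ffmpeg and all three inputs are present.
if command -v ffmpeg >/dev/null 2>&1 &&
   [ -f bgm1.mp3 ] && [ -f bgm2.mp3 ] && [ -f voice.mp3 ]; then
  ffmpeg -i bgm1.mp3 -i bgm2.mp3 -i voice.mp3 \
    -filter_complex "$graph" -map "[out]" ducked_bgm.wav
fi
```

The result contains only the two ducked music tracks. To include the narration as well, split the voice into three copies and add the third to the final amix.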

Why does my output have audible pumping?

Usually because attack and release are too fast. Slow the attack to at least 10–20 ms and the release to 300–500 ms or longer; you can also drop the ratio to 4:1 to soften the effect.

Tested with ffmpeg 6.1 / Ubuntu 24.04 (verified by running a test script)
Primary source: ffmpeg.org/ffmpeg-filters.html#sidechaincompress