What You’ll Learn

  • How to implement audio ducking with the sidechaincompress filter
  • How a sidechain compressor works and what each parameter does
  • Example settings for lowering BGM automatically while narration plays
  • How to use attack and release to achieve a natural ducking feel

Tested with: FFmpeg 6.1
Platform: Windows / macOS / Linux


What is Audio Ducking?

Audio ducking is the technique of automatically lowering background music whenever narration or voice is detected. It’s used everywhere in video productions, podcasts, and explainer videos. In FFmpeg, the sidechaincompress filter does the job.


How It Works

Narration (sidechain signal)
       ↓ level rises
Compressor attenuates the BGM
       ↓
BGM level drops automatically

sidechaincompress takes two audio inputs:

  1. Main signal (BGM): the track that gets compressed
  2. Sidechain signal (narration): the trigger that drives compression

Basic Command

Duck the audio track of a video file using a separate narration file:

ffmpeg -i input.mp4 -i input.mp3 \
  -filter_complex "[0:a]aformat=fltp:44100:stereo[bg];[1:a]aformat=fltp:44100:stereo[voice];[bg][voice]sidechaincompress=threshold=0.02:ratio=4:attack=200:release=1000[ducked]" \
  -map 0:v -map "[ducked]" -c:v copy output_ducked.mp4

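In this example, the audio track of input.mp4 is the BGM and input.mp3 is the narration. The BGM is attenuated whenever the narration rises above the threshold. The output contains only the ducked BGM; the narration itself is not mixed in (see "Mix Narration Back In After Ducking"). The video stream is copied without re-encoding (-c:v copy).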


Parameter Reference

Parameter   Description                                            Default   Recommended range
threshold   Sidechain level above which compression starts         0.125     0.01–0.05
            (linear amplitude, not dB)
ratio       Compression ratio (e.g. 4 = 4:1)                       2         3–8
attack      Time before the compressor kicks in (ms)               20        100–300
release     Time to return to the original level (ms)              250       500–2000
makeup      Gain applied after compression (linear factor, 1–64)   1         -
knee        Knee width (softness of the threshold)                 2.82843   -
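To get a feel for how threshold and ratio interact, the sketch below computes the attenuation of a textbook hard-knee compressor for a given narration level. It is a simplified model and not FFmpeg's actual implementation: it ignores knee, attack, release, and the detection mode, and the helper name gain_reduction_db is made up for this illustration.

```shell
#!/usr/bin/env bash
# Simplified hard-knee compressor model (an illustration, not the FFmpeg code).
# Usage: gain_reduction_db <sidechain_level> <threshold> <ratio>
#   sidechain_level and threshold are linear amplitudes (1.0 = full scale).
# Prints how many dB such a model would attenuate the BGM.
gain_reduction_db() {
  awk -v level="$1" -v thr="$2" -v ratio="$3" 'BEGIN {
    if (level <= 0) { print "0.0"; exit }       # silence never triggers ducking
    over_db = 20 * log(level / thr) / log(10)    # overshoot above the threshold, in dB
    if (over_db <= 0) { print "0.0"; exit }      # below the threshold: no reduction
    # Above the threshold, every dB of overshoot is reduced to 1/ratio dB,
    # so the attenuation is the overshoot minus overshoot/ratio.
    printf "%.1f\n", over_db * (1 - 1 / ratio)
  }'
}

gain_reduction_db 0.2 0.02 4    # narration 20 dB over the threshold at 4:1, prints 15.0
```

In this model a higher ratio pushes the attenuation closer to the full overshoot, which is why ratio=8 sounds much more aggressive than ratio=4 at the same threshold.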

Tuning threshold

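threshold is a linear amplitude (1.0 = full scale), not a dB value. Adjust it to match the loudness of your narration: smaller values make the compressor react to quieter speech, larger values require louder speech before ducking starts. This example raises it to 0.05: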

ffmpeg -i input.mp4 -i input.mp3 \
  -filter_complex "[0:a]aformat=fltp:44100:stereo[bg];[1:a]aformat=fltp:44100:stereo[voice];[bg][voice]sidechaincompress=threshold=0.05:ratio=4:attack=200:release=1000[ducked]" \
  -map 0:v -map "[ducked]" -c:v copy output.mp4
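Because meters and audio editors usually report levels in dB while threshold expects a linear amplitude, a small conversion helper can be handy. This is a minimal sketch using awk; the function names db_to_linear and linear_to_db are hypothetical and not part of FFmpeg.

```shell
#!/usr/bin/env bash
# Convert between dBFS and the linear amplitude that threshold expects.
# Both helper names are hypothetical; they only wrap awk arithmetic.
db_to_linear() {
  awk -v db="$1" 'BEGIN { printf "%.4f\n", 10 ^ (db / 20) }'
}

linear_to_db() {
  awk -v lin="$1" 'BEGIN { printf "%.1f\n", 20 * log(lin) / log(10) }'
}

db_to_linear -34     # prints 0.0200, the threshold used in the basic command
linear_to_db 0.05    # prints -26.0, the threshold used in the example above
```

For reference, the default threshold of 0.125 corresponds to about -18 dB.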

Shaping a Natural Ducking Curve

attack sets how quickly the BGM drops once narration starts; release sets how quickly it swells back up after the narration stops. Longer values give a smoother, less noticeable transition.

Slow, natural ducking

ffmpeg -i input.mp4 -i input.mp3 \
  -filter_complex "[0:a]aformat=fltp:44100:stereo[bg];[1:a]aformat=fltp:44100:stereo[voice];[bg][voice]sidechaincompress=threshold=0.02:ratio=6:attack=300:release=1500[ducked]" \
  -map 0:v -map "[ducked]" -c:v copy output_natural.mp4

Fast, tight ducking

ffmpeg -i input.mp4 -i input.mp3 \
  -filter_complex "[0:a]aformat=fltp:44100:stereo[bg];[1:a]aformat=fltp:44100:stereo[voice];[bg][voice]sidechaincompress=threshold=0.02:ratio=8:attack=50:release=500[ducked]" \
  -map 0:v -map "[ducked]" -c:v copy output_tight.mp4
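If you switch between these two settings often, the filtergraph can be generated from a preset name. The following is one possible sketch: the duck_graph helper and the preset names "natural" and "tight" are hypothetical, while the parameter values are the ones from the two commands above.

```shell
#!/usr/bin/env bash
# Build the ducking filtergraph for a named preset.
# duck_graph and the preset names are hypothetical; the values come from
# the "slow, natural" and "fast, tight" commands in this article.
duck_graph() {
  preset="$1"
  case "$preset" in
    natural) params="threshold=0.02:ratio=6:attack=300:release=1500" ;;
    tight)   params="threshold=0.02:ratio=8:attack=50:release=500" ;;
    *) echo "unknown preset: $preset" >&2; return 1 ;;
  esac
  fmt="aformat=fltp:44100:stereo"
  printf '%s\n' "[0:a]${fmt}[bg];[1:a]${fmt}[voice];[bg][voice]sidechaincompress=${params}[ducked]"
}

# Example:
#   ffmpeg -i input.mp4 -i input.mp3 \
#     -filter_complex "$(duck_graph natural)" \
#     -map 0:v -map "[ducked]" -c:v copy output_natural.mp4
```

Keeping the graph in a function means the aformat normalization and stream labels are written once, and only the compressor parameters vary per preset.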

Pre-Attenuate the BGM Before Ducking

If the BGM is already too loud, lower its level first and then apply ducking. Here volume=0.5 halves the amplitude (about -6 dB) before the signal reaches the compressor:

ffmpeg -i input.mp4 -i input.mp3 \
  -filter_complex "[0:a]volume=0.5,aformat=fltp:44100:stereo[bg];[1:a]aformat=fltp:44100:stereo[voice];[bg][voice]sidechaincompress=threshold=0.02:ratio=4:attack=200:release=1000[ducked]" \
  -map 0:v -map "[ducked]" -c:v copy output.mp4

Mix Narration Back In After Ducking

To produce a final mix that contains both the ducked BGM and the narration:

ffmpeg -i input.mp4 -i input.mp3 \
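  -filter_complex "[0:a]aformat=fltp:44100:stereo[bg];[1:a]aformat=fltp:44100:stereo,asplit=2[sc][voice];[bg][sc]sidechaincompress=threshold=0.02:ratio=4:attack=200:release=1000[ducked];[ducked][voice]amix=inputs=2:duration=longest[out]" \
  -map 0:v -map "[out]" -c:v copy output_mixed.mp4

The narration is split with asplit because a filter output label can be consumed only once: one copy ([sc]) drives the compressor and the other ([voice]) goes into the mix. The BGM is ducked first, then mixed with the narration for the final output. Note that amix scales its inputs by default (normalize=1), so the mix may come out quieter than the sources.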


Notes

sidechaincompress needs both inputs at the same sample rate and in the same sample format. FFmpeg can often insert the conversion automatically, but the safest approach is to normalize both inputs explicitly with aformat, as the commands in this article do.

Example: aformat=fltp:44100:stereo (planar float samples, 44.1 kHz, stereo)
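To see whether the two inputs actually differ, you can inspect them with ffprobe before building the filtergraph. The wrapper below is a small sketch; audio_info is a hypothetical helper name, and it assumes the first audio stream (a:0) is the one you want to check.

```shell
#!/usr/bin/env bash
# Print sample format, sample rate, and channel count of the first audio stream.
# audio_info is a hypothetical helper name wrapping a plain ffprobe call.
audio_info() {
  ffprobe -v error -select_streams a:0 \
    -show_entries stream=sample_fmt,sample_rate,channels \
    -of default=noprint_wrappers=1 "$1"
}

# Example (file names as in the commands above):
#   audio_info input.mp4
#   audio_info input.mp3
```

If the two reports differ, the aformat step in the commands above brings both streams to a common format before they reach sidechaincompress.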

Frequently Asked Questions

What is sidechain compression in plain English?

A compressor that turns down one signal whenever a different signal gets loud — typically used to lower BGM under voiceover. FFmpeg implements it via sidechaincompress.

What threshold and ratio should I start with?

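A threshold of about 0.05 (roughly -26 dB; sidechaincompress takes a linear amplitude, not dB) with a ratio of 8:1 produces clearly audible ducking for podcasts. For a gentler effect, drop the ratio to 4:1 as in the basic command. Keep attack within 100–300 ms and release within 500–2000 ms so the BGM recovers naturally between sentences.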

Is ducking better than just lowering BGM volume?

Yes, for narrated content. A static volume cut also lowers the music between sentences, while ducking reduces it only during speech, so the mix keeps more energy overall.

Can I duck two BGM tracks with one voice?

Yes — feed both BGMs through separate sidechaincompress filters with the same voice as the sidechain input, then mix the ducked outputs together.
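One way to wire this up is sketched below. The input order and file names (bgm1.mp3, bgm2.mp3, voice.mp3) are assumptions for illustration, and the graph assumes all three inputs already share a sample rate and channel layout; otherwise add aformat to each input as in the earlier commands.

```shell
#!/usr/bin/env bash
# Filtergraph for ducking two BGM tracks with one narration track.
# Assumed input order (hypothetical file names):
#   0 = bgm1.mp3, 1 = bgm2.mp3, 2 = voice.mp3
# A filter output label can be consumed only once, so the narration is
# split into two copies with asplit, one per compressor.
sc="sidechaincompress=threshold=0.02:ratio=4:attack=200:release=1000"
graph="[2:a]asplit=2[v1][v2];"
graph="${graph}[0:a][v1]${sc}[d1];"
graph="${graph}[1:a][v2]${sc}[d2];"
graph="${graph}[d1][d2]amix=inputs=2:duration=longest[out]"
printf '%s\n' "$graph"

# To run it:
#   ffmpeg -i bgm1.mp3 -i bgm2.mp3 -i voice.mp3 \
#     -filter_complex "$graph" -map "[out]" output_two_bgm.wav
```

The result contains only the two ducked BGM tracks. To include the narration as well, split it three ways (asplit=3) and feed the third copy into amix with inputs=3.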

Why does my output have audible pumping?

Pumping usually means attack and release are too fast, so the BGM level jumps audibly with every word. Lengthen the attack toward 100–300 ms and the release toward 500–2000 ms; you can also drop the ratio to 4:1 to soften the effect.



Tested with ffmpeg 6.1 / Ubuntu 24.04 (GitHub Actions runner)
Primary source: ffmpeg.org/ffmpeg-filters.html#sidechaincompress