Recompiling FFmpeg: is it worth it?
Overview
This article documents a controlled benchmark study aimed at evaluating whether recompiling FFmpeg with different compilers and optimization flags provides measurable performance benefits on a first-generation AMD Ryzen CPU.
The focus is on CPU-bound, real-world video transcoding workloads, not synthetic microbenchmarks. All tests were performed on the same machine, using identical inputs and methodology, varying only the compiler and compilation flags.
Test System Configuration
Hardware
- CPU: AMD Ryzen 7 1700 (Zen 1, 8 cores / 16 threads)
- RAM: 16 GB
- Storage: SSD
- Architecture: x86_64
Software Environment
- Operating System: Linux Mint 22.3 (Cinnamon, 64-bit)
- Debian base: trixie/sid
- Kernel: 6.8.0-90-generic
- CPU governor: performance (fixed during benchmarks)
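A pinned governor is easy to lose between sessions, so a pre-run check can help. The following is a minimal sketch, assuming the standard Linux cpufreq sysfs layout; the function name `check_governor` is hypothetical and not part of the original setup.

```shell
# Hypothetical helper: succeed only if every CPU under the given sysfs root
# reports the expected governor (defaults: the real sysfs path, "performance").
check_governor() {
    root=${1:-/sys/devices/system/cpu}
    expected=${2:-performance}
    found=0
    for f in "$root"/cpu[0-9]*/cpufreq/scaling_governor; do
        [ -r "$f" ] || continue          # skip unmatched glob or unreadable entries
        found=1
        if [ "$(cat "$f")" != "$expected" ]; then
            echo "governor mismatch: $f" >&2
            return 1
        fi
    done
    [ "$found" -eq 1 ]                   # fail if no cpufreq entries were found at all
}

# Usage (not executed here):
#   check_governor || { echo "set the governor before benchmarking" >&2; exit 1; }
```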
Toolchain
- FFmpeg (system): 6.1.1-3ubuntu5
- GCC: 13.3.0
- Clang/LLVM: 18.1.3
- Assembler:
  - NASM 2.16.01
  - YASM 1.3.0
External Libraries
- x264: built locally from source and linked dynamically (system x264 package not used)
Test Material
All input files are unmodified Blender Foundation open movies. Files were not trimmed, re-encoded, or altered in any way. SHA-256 checksums were recorded to ensure bitwise-identical inputs across all tests.
| File | Resolution | FPS | Codec |
|---|---|---|---|
| Sintel.2010.1080p.mkv | 1920×818 | 24 | H.264 High |
| Big Buck Bunny 60fps 4K - Official Blender Foundation Short Film.mp4 | 1920×1080 | 60 | H.264 High |
| Tears of Steel - Blender VFX Open Movie.mp4 | 1728×720 | 24 | H.264 High |
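The checksum step could be scripted roughly as follows. This is a sketch that assumes GNU coreutils `sha256sum`; the function names and the manifest file name are illustrative and do not come from the original study.

```shell
# Record checksums once, before the first benchmark run.
# usage: record_checksums MANIFEST FILE...
record_checksums() {
    manifest=$1
    shift
    sha256sum -- "$@" > "$manifest"
}

# Re-check before each run; a non-zero exit status means an input changed.
# Must be run from the same directory the manifest was recorded in.
# usage: verify_checksums MANIFEST
verify_checksums() {
    sha256sum --check --quiet -- "$1"
}

# Usage (not executed here):
#   record_checksums inputs.sha256 Sintel.2010.1080p.mkv
#   verify_checksums inputs.sha256 || exit 1
```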
Benchmark Methodology
Transcoding Scenario
All benchmarks used a realistic end-to-end transcoding workflow:
- Video encoder: libx264
- Preset: slow
- Rate control: CRF 18
- Audio: copied bit-exact (-c:a copy)
- Threads: 16 (matching logical CPU threads)
Example command (simplified):
ffmpeg -i input \
-c:v libx264 -preset slow -crf 18 \
-c:a copy -threads 16 output.mkv
Measurement
- Tool: /usr/bin/time -v
- Primary metric: wall clock time
- Secondary metrics:
  - user CPU time
  - peak RSS memory
- Three runs per test, arithmetic mean reported
- Locale forced to LC_ALL=C to ensure numeric consistency
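The measurement loop described above might look like the sketch below. It assumes GNU time (its `-v` and `-o` options and its "Elapsed (wall clock) time" output line). All function names and the log-file naming are hypothetical, and the `-nostdin -y` flags in the usage comment are an addition for unattended runs, not part of the original command.

```shell
# Convert GNU time's "h:mm:ss" or "m:ss.cc" elapsed field to seconds.
elapsed_to_seconds() {
    echo "$1" | LC_ALL=C awk -F: '{ s = 0; for (i = 1; i <= NF; i++) s = s * 60 + $i; printf "%.2f\n", s }'
}

# Extract the wall-clock time, in seconds, from a /usr/bin/time -v log file.
wall_seconds() {
    t=$(sed -n 's/.*Elapsed (wall clock) time.*): //p' "$1")
    elapsed_to_seconds "$t"
}

# Arithmetic mean of the arguments, two decimal places.
mean() {
    printf '%s\n' "$@" | LC_ALL=C awk '{ s += $1 } END { if (NR) printf "%.2f\n", s / NR }'
}

# Run a command RUNS times under /usr/bin/time -v and print the mean wall time.
# usage: bench RUNS LOGPREFIX COMMAND...
bench() {
    runs=$1
    prefix=$2
    shift 2
    i=1
    times=""
    while [ "$i" -le "$runs" ]; do
        LC_ALL=C /usr/bin/time -v -o "$prefix.$i.log" "$@" || return 1
        times="$times $(wall_seconds "$prefix.$i.log")"
        i=$((i + 1))
    done
    mean $times   # intentionally unquoted: one argument per run
}

# Usage (not executed here):
#   bench 3 sintel-gcc-o2 ffmpeg -nostdin -y -i input \
#       -c:v libx264 -preset slow -crf 18 -c:a copy -threads 16 output.mkv
```

Keeping one log per run also preserves the secondary metrics (user time, maximum resident set size) for later inspection.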
Tested Configurations
Baseline
- System FFmpeg package (distribution build)
GCC Builds
- GCC -O2
- GCC -O2 -march=znver1 -mtune=znver1
- GCC -O3
- GCC -O3 -march=znver1 -mtune=znver1
Clang Builds
- Clang -O2
- Clang -O2 -march=znver1 -mtune=znver1
All FFmpeg builds:
- were linked against the same locally built x264
- used identical configure options except for compiler flags
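The article does not state how the flags were passed to the build, so the following is only one plausible way to generate the six configurations. It assumes FFmpeg's `configure` options `--cc`, `--optflags`, `--extra-cflags`, `--enable-gpl` and `--enable-libx264`. The install prefix, the labels and the function names are hypothetical, and pointing pkg-config at the locally built x264 is left out.

```shell
# Build matrix: label|compiler|optimization level|architecture flags
build_matrix() {
    cat <<'EOF'
gcc-O2|gcc|-O2|
gcc-O2-znver1|gcc|-O2|-march=znver1 -mtune=znver1
gcc-O3|gcc|-O3|
gcc-O3-znver1|gcc|-O3|-march=znver1 -mtune=znver1
clang-O2|clang|-O2|
clang-O2-znver1|clang|-O2|-march=znver1 -mtune=znver1
EOF
}

# Print one configure command per configuration; only the compiler and flags vary.
# usage: print_configure_commands [PREFIX_ROOT]
print_configure_commands() {
    prefix_root=${1:-/opt/ffmpeg-bench}   # hypothetical install location
    build_matrix | while IFS='|' read -r label cc opt arch; do
        printf './configure --prefix=%s/%s --cc=%s --optflags="%s" --extra-cflags="%s" --enable-gpl --enable-libx264\n' \
            "$prefix_root" "$label" "$cc" "$opt" "$arch"
    done
}

# Usage (not executed here):
#   print_configure_commands /opt/ffmpeg-bench
```

Printing the commands instead of running them makes it easy to review that every build differs only in compiler and flags before committing to a long build session.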
Results Summary
Average Encoding Time (wall clock, seconds)
| Video | System | GCC O2 | GCC O2 znver1 | Clang O2 |
|---|---|---|---|---|
| Sintel | 554.00 | 550.68 | 548.06 | 547.48 |
| Big Buck Bunny | 598.33 | 595.53 | 592.65 | 589.88 |
| Tears of Steel | 308.67 | 306.59 | 306.04 | 305.26 |
(Standard deviation across runs was consistently below 1 second.)
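To make the table easier to read as relative gains, the percentages can be derived from the averages with a few lines of shell and awk. This is a sketch: the function name `improvement_pct` is illustrative, and the values are copied from the table above.

```shell
# Percentage reduction in wall-clock time relative to a baseline.
# usage: improvement_pct BASELINE_SECONDS CANDIDATE_SECONDS
improvement_pct() {
    LC_ALL=C awk -v b="$1" -v c="$2" 'BEGIN { printf "%.2f\n", (b - c) / b * 100 }'
}

# Rows: video|system|GCC O2|GCC O2 znver1|Clang O2
while IFS='|' read -r video sys gcc gccz clang; do
    printf '%-15s GCC O2 %s%%  GCC O2 znver1 %s%%  Clang O2 %s%%\n' "$video" \
        "$(improvement_pct "$sys" "$gcc")" \
        "$(improvement_pct "$sys" "$gccz")" \
        "$(improvement_pct "$sys" "$clang")"
done <<'EOF'
Sintel|554.00|550.68|548.06|547.48
Big Buck Bunny|598.33|595.53|592.65|589.88
Tears of Steel|308.67|306.59|306.04|305.26
EOF
```

For Sintel, for example, this works out to roughly 0.60%, 1.07% and 1.18% faster than the system build.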
Analysis
GCC
- Recompiling FFmpeg with GCC -O2 provides a small but consistent improvement (~0.5–0.7%).
- Adding Zen 1-specific tuning (-march=znver1) yields an additional ~0.2–0.5%, with the smallest gain on Tears of Steel.
- -O3 does not consistently improve performance and may slightly degrade results depending on workload.
Best GCC configuration: -O2 -march=znver1 -mtune=znver1
Clang
- Clang -O2 outperforms all GCC configurations tested.
- CPU-specific tuning (znver1) does not improve results with Clang and can be marginally negative.
- Memory usage and stability remain comparable to GCC builds.
Best overall configuration: Clang -O2
Final Conclusion
On a first-generation AMD Zen processor (Ryzen 7 1700):
- Recompiling FFmpeg provides measurable but modest gains.
- The best results in this study were achieved using Clang -O2, with improvements of approximately 1–1.4% over the distribution build.
- GCC benefits slightly from CPU-specific tuning; Clang does not.
- No configuration produced dramatic gains, as most performance-critical paths in x264 are already hand-optimized in assembly.
Practical Recommendation
Recompiling FFmpeg is worthwhile only if:
- encoding is CPU-bound
- workloads are frequent or long-running
- maintaining a custom build is acceptable
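To put the "frequent or long-running" criterion into numbers, a rough estimate of the time saved can be computed from the measured speedup. This is a back-of-the-envelope sketch; the function name and the 8-hour daily workload are assumptions chosen for illustration.

```shell
# Minutes saved for a given encoding load (hours on the system build)
# and a given speedup (percent reduction in wall-clock time).
# usage: minutes_saved HOURS PERCENT
minutes_saved() {
    LC_ALL=C awk -v h="$1" -v p="$2" 'BEGIN { printf "%.1f\n", h * 60 * p / 100 }'
}

# Assumed workload: 8 hours of encoding per day at the best measured gain (1.4%).
minutes_saved 8 1.4   # prints 6.7
```

At this scale the saving is under seven minutes per eight hours of encoding, which only adds up for sustained, CPU-bound workloads.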
For general desktop usage or GPU-based encoding, the system FFmpeg package remains the most practical choice.