Sunday, 8 April 2018

ffmpeg and Nvidia GTX hardware encoding

ffmpeg video processing on an old Linux machine, like an i7 860 running Fedora 26, can be painful if you rely only on the CPU. With a recent Nvidia card supporting NVENC (GeForce 6xx/7xx/8xx/10xx), ffmpeg can be compiled to use the card directly.

To use a prebuilt, CUDA-enabled ffmpeg, use the version from Negativo 17 rather than the one from RPM Fusion, since the Negativo 17 build enables additional CUDA features, in particular the scaling and transpose functions:
$ dnf config-manager --add-repo=https://negativo17.org/repos/fedora-multimedia.repo
# add lines to rpmfusion to prevent them from offering ffmpeg
$ vi /etc/yum.repos.d/rpmfusion-free.repo /etc/yum.repos.d/rpmfusion-free-updates.repo
...
exclude=ffmpeg* mpv*
# install
$ dnf install ffmpeg ffmpeg-libs mpv
$ for i in encoders decoders filters; do echo $i:; ffmpeg -hide_banner -${i} | egrep -i "npp|cuvid|nvenc|cuda"; done
encoders:
 V..... h264_nvenc           NVIDIA NVENC H.264 encoder (codec h264)
 V..... nvenc                NVIDIA NVENC H.264 encoder (codec h264)
 V..... nvenc_h264           NVIDIA NVENC H.264 encoder (codec h264)
 V..... nvenc_hevc           NVIDIA NVENC hevc encoder (codec hevc)
 V..... hevc_nvenc           NVIDIA NVENC hevc encoder (codec hevc)
decoders:
 V..... h264_cuvid           Nvidia CUVID H264 decoder (codec h264)
 V..... hevc_cuvid           Nvidia CUVID HEVC decoder (codec hevc)
 V..... mjpeg_cuvid          Nvidia CUVID MJPEG decoder (codec mjpeg)
 V..... mpeg1_cuvid          Nvidia CUVID MPEG1VIDEO decoder (codec mpeg1video)
 V..... mpeg2_cuvid          Nvidia CUVID MPEG2VIDEO decoder (codec mpeg2video)
 V..... mpeg4_cuvid          Nvidia CUVID MPEG4 decoder (codec mpeg4)
 V..... vc1_cuvid            Nvidia CUVID VC1 decoder (codec vc1)
 V..... vp8_cuvid            Nvidia CUVID VP8 decoder (codec vp8)
 V..... vp9_cuvid            Nvidia CUVID VP9 decoder (codec vp9)
filters:
 ... hwupload_cuda     V->V  Upload a system memory frame to a CUDA device.
 ... scale_npp         V->V  NVIDIA Performance Primitives video scaling and format conversion
 ... transpose_npp     V->V  NVIDIA Performance Primitives video transpose
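If you want to test for a particular encoder or decoder from a script rather than by eye, a small filter over the same listing is enough. This is only a sketch: the has_codec helper is a hypothetical name, and it assumes the codec name is the second whitespace-separated column, as in the output above.

```shell
#!/bin/sh
# has_codec NAME: read `ffmpeg -encoders` (or -decoders) output on stdin and
# succeed if NAME appears in the codec-name column (assumed to be field 2).
has_codec() {
    awk -v name="$1" '$2 == name { found = 1 } END { exit !found }'
}

# Captured sample of the listing; in practice you would pipe in the real one:
#   ffmpeg -hide_banner -encoders | has_codec h264_nvenc
sample=' V..... h264_nvenc NVIDIA NVENC H.264 encoder (codec h264)
 V..... hevc_nvenc NVIDIA NVENC hevc encoder (codec hevc)'

if printf '%s\n' "$sample" | has_codec h264_nvenc; then
    echo "h264_nvenc available"
fi
```

Matching on the exact column avoids false positives from names that merely contain the string, such as nvenc_h264 when you asked for nvenc.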
If you wish to build locally, Nvidia provides detailed instructions on its dedicated ffmpeg page; they can be summarised and tailored as follows:
# install nasm and nvidia CUDA dev kit first (see Nvidia instructions)

# install the wrapper api
$ git clone https://github.com/FFmpeg/nv-codec-headers
$ cd nv-codec-headers && make install

# ... some dependencies
$ dnf install lame-devel pulseaudio-libs-devel libfdk-aac-devel

# ...and config/compile ffmpeg
$ git clone https://git.ffmpeg.org/ffmpeg.git
$ cd ffmpeg && \
PATH=/usr/local/cuda/bin/:$PATH \
PKG_CONFIG_PATH=/usr/local/lib/pkgconfig \
./configure --prefix=/usr/local/ffmpeg-nv \
--disable-debug --disable-static --enable-shared \
--enable-cuda --enable-cuvid --enable-nvenc \
--enable-nonfree --enable-libnpp \
--enable-libmp3lame --enable-libfdk-aac --enable-libpulse \
--extra-cflags=-I/usr/local/cuda/include \
--extra-ldflags="-L/usr/local/cuda/lib64 -Wl,-rpath=/usr/local/cuda/lib64:/usr/local/ffmpeg-nv/lib" && \
make install -j
The tailored items, such as the /usr/local/ffmpeg-nv prefix and the matching rpath, are there because I want to install this version alongside the system-installed version.

Surprisingly, the entire process outlined above took less than 20 minutes, even on an old i7 860 machine.

ffmpeg encoding Examples

ffmpeg's and Nvidia's dedicated h/w encoding pages give numerous details, but for those of us who just want to encode/decode with NVENC, the basics are listed below.

The Nvidia cards, in our instance a GTX1060, support hardware decode (known as NVDEC or CUVID) for mpeg2/vc1/mpeg4/h264/hevc/vp8/vp9 and hardware encode (NVENC) for h264 (aka MPEG-4 AVC) and hevc (aka h265), and the locally compiled ffmpeg can utilise the full capabilities.

To verify what nvenc capable encoders are supported:
$ ffmpeg -hide_banner -encoders | grep nvenc
 V..... h264_nvenc           NVIDIA NVENC H.264 encoder (codec h264)
 V..... nvenc                NVIDIA NVENC H.264 encoder (codec h264)
 V..... nvenc_h264           NVIDIA NVENC H.264 encoder (codec h264)
 V..... nvenc_hevc           NVIDIA NVENC hevc encoder (codec hevc)
 V..... hevc_nvenc           NVIDIA NVENC hevc encoder (codec hevc)
To verify what the h264_nvenc encoder is capable of:
$ ffmpeg -hide_banner -h encoder=h264_nvenc
Encoder nvenc [NVIDIA NVENC H.264 encoder]:
    General capabilities: delay hardware
    Threading capabilities: none
    Supported hardware devices: cuda cuda
    Supported pixel formats: yuv420p nv12 p010le yuv444p p016le yuv444p16le bgr0 rgb0 cuda
h264_nvenc AVOptions:
  -preset                         E..V...... Set the encoding preset (from 0 to 11) (default medium)
     default          0           E..V......
     slow             1           E..V...... hq 2 passes
     medium           2           E..V...... hq 1 pass
     fast             3           E..V...... hp 1 pass
     hp               4           E..V......
     hq               5           E..V......
     bd               6           E..V......
     ll               7           E..V...... low latency
     llhq             8           E..V...... low latency hq
     llhp             9           E..V...... low latency hp
     lossless         10          E..V......
     losslesshp       11          E..V......
  -profile                        E..V...... Set the encoding profile (from 0 to 3) (default main)
     baseline         0           E..V......
     main             1           E..V......
     high             2           E..V......
     high444p         3           E..V......
  -level                          E..V...... Set the encoding level restriction (from 0 to 51) (default auto)
     auto             0           E..V......
     1                10          E..V......
     1.0              10          E..V......
     1b               9           E..V......
     1.0b             9           E..V......
     1.1              11          E..V......
     1.2              12          E..V......
     1.3              13          E..V......
     2                20          E..V......
     2.0              20          E..V......
     2.1              21          E..V......
     2.2              22          E..V......
     3                30          E..V......
     3.0              30          E..V......
     3.1              31          E..V......
     3.2              32          E..V......
     4                40          E..V......
     4.0              40          E..V......
     4.1              41          E..V......
     4.2              42          E..V......
     5                50          E..V......
     5.0              50          E..V......
     5.1              51          E..V......
  -rc                             E..V...... Override the preset rate-control (from -1 to INT_MAX) (default -1)
     constqp          0           E..V...... Constant QP mode
     vbr              1           E..V...... Variable bitrate mode
     cbr              2           E..V...... Constant bitrate mode
     vbr_minqp        8388612     E..V...... Variable bitrate mode with MinQP (deprecated)
     ll_2pass_quality 8388616     E..V...... Multi-pass optimized for image quality (deprecated)
     ll_2pass_size    8388624     E..V...... Multi-pass optimized for constant frame size (deprecated)
     vbr_2pass        8388640     E..V...... Multi-pass variable bitrate mode (deprecated)
     cbr_ld_hq        8           E..V...... Constant bitrate low delay high quality mode
     cbr_hq           16          E..V...... Constant bitrate high quality mode
     vbr_hq           32          E..V...... Variable bitrate high quality mode
...
To understand profiles see this answer, which summarises that the high profile is the better choice for long-term storage, whereas the preset controls which further compression tools the encoder may enable while remaining within the target bitrate.

In the examples below, we demonstrate simple downscaling of a 1min 2.7k 23.976fps (GoPro) mp4 to a 1080 mp4, along with the differences in timings and the subtle differences in ffmpeg options.

Fully use the Nvidia card, using h/w decode/encode AND gpu RAM

top barely registers any usage as all the encoding work happens on the GPU.
NB: -hwaccel cuda -hwaccel_output_format cuda keeps the decoded frames in GPU memory (no copy between GPU and system memory), and the Nvidia-specific libnpp filter scale_npp is used so that both the transcode and the scaling happen in hardware.
$ time ffmpeg -y \
    -hwaccel cuda -hwaccel_output_format cuda \
    -c:v h264_cuvid \
    -i GOPR1535.MP4 \
    -preset hp -rc cbr \
    -vf scale_npp=-1:1080 \
    -c:a copy -c:v h264_nvenc \
    foo.mp4

real 0m6.943s
user 0m1.735s
sys 0m0.915s

h/w decode/encode with system RAM

$ time ffmpeg -y \
    -c:v h264_cuvid \
    -i GOPR1535.MP4 \
    -preset hp -rc cbr \
    -vf scale_npp=-1:1080 \
    -c:a copy -c:v h264_nvenc \
    foo.mp4

real 0m32.322s
user 0m31.352s
sys 0m0.792s

CPU s/w decode/encode

top reports the ffmpeg process, on the 8-thread CPU (4 cores with hyper-threading), at ~750% utilisation (full utilisation would show as 800%).
$ time ffmpeg -y \
    -i GOPR1535.MP4 \
    -cbr -vf scale=1920:1080 \
    -c:a copy -c:v libx264 \
    foo.mp4

real 1m38.173s
user 12m17.288s
sys 0m1.911s

As shown above, ffmpeg with NVENC provides a significant improvement over CPU-only encoding.
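To put a number on "significant", the wall-clock ("real") times above can be turned into speed-up factors. The snippet below is just worked arithmetic on the three timings already quoted; the to_secs and speedup helper names are made up for illustration.

```shell
#!/bin/sh
# to_secs: convert a `time`-style value such as 1m38.173s into seconds.
to_secs() {
    echo "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# speedup SLOW FAST: print how many times faster FAST is than SLOW.
speedup() {
    awk -v slow="$(to_secs "$1")" -v fast="$(to_secs "$2")" \
        'BEGIN { printf "%.1f\n", slow / fast }'
}

echo "CPU vs full GPU pipeline:   $(speedup 1m38.173s 0m6.943s)x"
echo "CPU vs GPU with system RAM: $(speedup 1m38.173s 0m32.322s)x"
```

On these figures the all-GPU pipeline comes out roughly 14 times faster than the CPU-only run, and the system-RAM variant roughly 3 times faster.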

ffmpeg cheat sheet

  • specify NV profile/presets/max rates
    ffmpeg -hwaccel cuda -hwaccel_output_format cuda -c:v h264_cuvid \
        -i GOPR1535.MP4 \
        -profile high -preset hq -rc vbr_hq -vb 4M -minrate 500k -maxrate 12M \
        -vf scale_npp=-1:720 -c:a copy -r 23.976 -cq 20 \
        -c:v h264_nvenc /tmp/foo.mp4

  • Convert h265/hevc .mov to h264 .mp4
    Recent Apple devices can record video as hevc (aka h265) wrapped in a .mov but some older devices don't have h265 h/w decode.
    ffmpeg -y -hide_banner -hwaccel cuda \
        -c:v hevc_cuvid -i foo.MOV \
        -map_metadata 0 \
        -map_metadata:s:v 0:s:v \
        -map_metadata:s:a 0:s:a \
        -movflags use_metadata_tags \
        -c:a copy \
        -profile:v main -preset ll \
        -rc vbr -vb 2.5M -minrate 500k -maxrate 6M -r 23.976 -cq 24 \
        -c:v h264_nvenc foo.mp4

    Note the -movflags use_metadata_tags is needed to force the metadata copy from the .mov container.

  • change the metadata rotation
    ffmpeg -i foo.mp4 -map_metadata 0 -metadata:s:v rotate="180" -codec copy bar.mp4

  • GPU decoder video scaling
    # scale 2.7k 16:9 (2704) into a 1080 output
    $ time \
      ffmpeg -hwaccel cuda -hwaccel_output_format cuda \
        -resize 1920x1080 \
        -c:v h264_cuvid -i foo2k.mp4 \
        -c:a copy -c:v h264_nvenc foo1080.mp4

    real 0m11.180s
    user 0m4.277s
    sys 0m0.818s
    The -vf scale_npp filter is useful when you need multiple output sizes from a single transcoding pipeline; if only one output is required, use -resize. Additionally, the filter allows you to specify the interpolation algorithm, which is useful for upscaling.
    $ ffmpeg -h filter=scale_npp
    $ time \
      ffmpeg -y -hwaccel cuda -hwaccel_output_format cuda \
        -c:v h264_cuvid -i foo2k.mp4 \
        -vf scale_npp=1920:1080 -c:a copy -c:v h264_nvenc foo1080.mp4

    real 0m11.772s
    user 0m4.600s
    sys 0m0.902s
    $ ffmpeg -y -hwaccel cuda -hwaccel_output_format cuda \
        -c:v h264_cuvid -i foo480p.mp4 \
        -vf scale_npp=w=1280:h=720:interp_algo=lanczos \
        -c:a copy -c:v h264_nvenc foo720lanczos.mp4 \
        -vf scale_npp=w=-1:h=1080:interp_algo=lanczos \
        -c:a copy -c:v h264_nvenc foo1080.mp4
    Note that you can use the shorthand scale_npp=-1:1080 with no other parameters to let the filter decide on the other value to preserve the aspect ratio.

  • GPU crop an area of video
    # crop the centre section of 2.7k 16:9 (2704x1520) into a 1920x1080 output
    # argument takes top x bottom x left x right
    # top/bottom = (1520-1080)/2 = 220, left/right = (2704-1920)/2 = 392
    ffmpeg -y -hwaccel cuda -hwaccel_output_format cuda \
        -crop 220x220x392x392 \
        -c:v h264_cuvid -i foo2k.mp4 \
        -c:a copy -c:v h264_nvenc foo1080centre.mp4
    # crop the centre section of 1080 into a 720 output
    ffmpeg -y -hwaccel cuda -hwaccel_output_format cuda \
        -crop 180x180x320x320 \
        -c:v h264_cuvid -i foo1080.mp4 \
        -c:a copy -c:v h264_nvenc foo720centre.mp4
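The top/bottom/left/right arithmetic in the comments above is easy to get wrong by hand, so here is a minimal sketch that derives the -crop argument for a centred crop. The centre_crop function name is hypothetical, and it assumes the size differences are even so that the crop is symmetric.

```shell
#!/bin/sh
# centre_crop IN_W IN_H OUT_W OUT_H
# Print the TOPxBOTTOMxLEFTxRIGHT value for the cuvid decoder's -crop option,
# removing equal amounts from opposite edges (assumes even differences).
centre_crop() {
    in_w=$1; in_h=$2; out_w=$3; out_h=$4
    tb=$(( (in_h - out_h) / 2 ))   # rows removed from top and from bottom
    lr=$(( (in_w - out_w) / 2 ))   # columns removed from left and from right
    echo "${tb}x${tb}x${lr}x${lr}"
}

centre_crop 2704 1520 1920 1080   # the 2.7k -> 1080 case above
centre_crop 1920 1080 1280 720    # the 1080 -> 720 case above
```

The result can be dropped straight into the command line, for example -crop "$(centre_crop 2704 1520 1920 1080)".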

  • (s/w) crop an area of video
    ffmpeg -i GOPR1535.MP4 \
        -filter:v "crop=out_w=1920:out_h=1000:x=0:y=0" \
        -c:a copy -c:v h264_nvenc /tmp/foo.mp4

  • GPU rotate video
    $ ffmpeg -hide_banner -h filter=transpose_npp
    # rotates 90
    $ ffmpeg -y -hwaccel cuvid -c:v h264_cuvid -i foo.mp4 -c copy \
        -vf scale_npp=format=yuv420p,transpose_npp=clock \
        -c:v h264_nvenc bar.mp4

  • GPU mirroring video
    $ ffmpeg -y -hwaccel cuvid -c:v h264_cuvid -i foo.mp4 -c copy \
        -vf scale_npp=format=yuv420p,\
    transpose_npp=clock,\
    transpose_npp=clock_flip \
        -c:v h264_nvenc bar.mp4
    # s/w: notice none of the usual cuvid/cuda items except on encode
    $ ffmpeg -i foo.mp4 -c:a copy \
        -vf hflip -c:v h264_nvenc bar.mp4

  • copy section of video
    # know exactly where to end, using -to
    ffmpeg -hwaccel cuvid -c:v h264_cuvid \
        -ss 1:01 \
        -i GOPR1535.MP4 \
        -to 4:38 \
        -c copy /tmp/foo.mp4
    # know duration using -t
    ffmpeg -hwaccel cuvid -c:v h264_cuvid \
        -i GOPR1535.MP4 \
        -ss 1:01 -t 0:3:37 \
        -c copy /tmp/foo.mp4

    This performs a copy of the video but in the past I've had to re-encode for video output due to problems with direct video stream copy (-c copy) with the resulting file being unplayable with some players due to missing headers etc.
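The -t value in the second command is simply the end time minus the start time (4:38 - 1:01 = 3:37). Depending on where -ss sits relative to -i, -to may be measured against timestamps that have been reset to zero, so passing an explicit duration can be the less ambiguous choice. The sketch below does that subtraction; to_seconds and duration_between are hypothetical helper names.

```shell
#!/bin/sh
# to_seconds: convert [[h:]m:]s into seconds, e.g. 4:38 -> 278.
to_seconds() {
    echo "$1" | awk -F: '{ s = 0; for (i = 1; i <= NF; i++) s = s * 60 + $i; print s }'
}

# duration_between START END: seconds between two timestamps, usable with -t.
duration_between() {
    echo $(( $(to_seconds "$2") - $(to_seconds "$1") ))
}

duration_between 1:01 4:38
```

The printed number of seconds can be passed directly, for example -ss 1:01 -t "$(duration_between 1:01 4:38)". Whole seconds are assumed here; fractional timestamps would need the subtraction done in awk as well.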

  • convert flac to mp3 but keeping metadata
    ffmpeg -i foo.flac \
        -ar 44100 -q:a 1 -map_metadata 0 -id3v2_version 3 \
        foo.mp3

    See mp3 encoding guide. The example above is VBR at roughly 190-250k but CBR can be selected with -ab 320k etc
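For a whole directory of files, the same options can be wrapped in a loop. The sketch below only prints the commands it would run (a dry run), so you can check the derived file names first; mp3_name and convert_cmd are hypothetical helpers, and the ffmpeg options are copied from the example above.

```shell
#!/bin/sh
# mp3_name: derive the output name by swapping the .flac suffix for .mp3.
mp3_name() {
    printf '%s\n' "${1%.flac}.mp3"
}

# convert_cmd: print (not run) the conversion command for one file.
convert_cmd() {
    printf 'ffmpeg -i "%s" -ar 44100 -q:a 1 -map_metadata 0 -id3v2_version 3 "%s"\n' \
        "$1" "$(mp3_name "$1")"
}

# Dry run over the current directory; pipe the output to sh to execute it.
for f in *.flac; do
    [ -e "$f" ] || continue   # skip the literal pattern when nothing matches
    convert_cmd "$f"
done
```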

  • change metadata to aac/m4a
    # extract the metadata from file
    ffmpeg -i foo.m4a -f ffmetadata foo.m4a.txt
    # update the metadata extracted into foo.m4a.txt
    vi foo.m4a.txt
    # create new data file with updated metadata
    ffmpeg -i foo.m4a -f ffmetadata -i foo.m4a.txt -map_metadata 1 -c:a copy bar.m4a

  • replacing/merging audio
    A typical case where you want to replace the original audio of a video with a modified version that you have extracted; this could be to clean up levels etc.

    First you need to determine the streams that are stored in the container.
    $ ffprobe -hide_banner foo.mp4
    Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'foo.mp4':
      Metadata:
        major_brand     : isom
        minor_version   : 512
        compatible_brands: isomiso2avc1mp41
        encoder         : Lavf58.12.100
      Duration: 00:17:01.01, start: 0.000000, bitrate: 859 kb/s
        Stream #0:0(und): Video: h264 (Main) (avc1 / 0x31637661), yuv420p, 1280x720 [SAR 1:1 DAR 16:9], 727 kb/s, 29.97 fps, 29.97 tbr, 30k tbn, 59.94 tbc (default)
        Metadata:
          handler_name    : VideoHandler
        Stream #0:1(und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 125 kb/s (default)
        Metadata:
          handler_name    : SoundHandler

    This file has 1 video and 1 audio stream; some files may contain multiple audio streams (think multi-language DVD/BluRay rips). It is important to specify the streams you want, which is done via ffmpeg's -map option.

    When extracting the audio you can either perform a direct extract (matching the output to the original audio stream's format) or ask ffmpeg to convert to another format:
    # extract the audio and convert to wav
    $ ffmpeg -i foo.mp4 foo.wav
    # extract the audio by specifying the stream, with no conversion; the (.aac) extension matches the original format
    $ ffmpeg -i foo.mp4 -map 0:1 -c copy foo.aac

    Now you can modify the audio file. To mux the video and new audio streams together, again use -map to identify which streams you require from the input files; notice -map 0:0 and -map 1:0 below. The form is -map N:x, meaning stream #x from input N, where input 0 is the first file, input 1 the second, and so on.

    This will perform the mux using direct copying of the streams.
    $ ffmpeg \
        -i foo.mp4 \
        -i foo.aac \
        -map 0:0 -map 1:0 \
        -c:a copy -c:v copy \
        -metadata:s:a:0 language=eng \
        /tmp/foo.mp4
    Note we have defined the audio stream as English. We can have multiple audio streams with different languages encoded too.

    This will perform the mux by copying the video stream but converting the audio to aac format:
    $ ffmpeg \
        -i foo.mp4 \
        -i foo.wav \
        -map 0:0 -map 1:0 \
        -c:a aac -c:v copy \
        /tmp/foo.mp4

  • screen grab with audio
    $ ffmpeg -hwaccel cuvid \
        -f x11grab -s 1920x1200 -framerate 60 \
        -i :${DISPLAY}.0 \
        -f pulse -ac 2 -i default -c:a aac \
        -pix_fmt yuv420p \
        -c:v h264_nvenc -cq 27 -preset llhq -rc vbr \
        -vb 3MB -minrate 500k -maxrate 8MB \
        /tmp/foo.mp4

    Note that if you have a combined virtual source (for example a USB headset/mic that you've set up for dual output on system speaker and headset) you will need to adjust the -i default above. The easiest way to select the pulse audio source is via pavucontrol: start recording using the command above, then use the recording tab to select your preferred device. Once this has been set, it is remembered even if you try to set it again to anything other than "default". I have taken to using a value from pactl list short sources | grep monitor, such as combined.monitor.

    Scaling, as used in the earlier examples, has problems with the bgr0 pixel format going into the scaler: hwupload_cuda,scale_npp=w=1280:h=720:format=nv12:interp_algo=lanczos,hwdownload,format=nv12

  • Removing specific streams
    If you have an input with multiple streams and you want to keep all of them except one, use the -map 0 -map -0:2 form, where in this case you are mapping everything (-map 0) but dropping the 3rd stream (-map -0:2; stream numbering starts at 0). This is useful, for example, when transcoding multi-language movie rips for mobile devices that don't allow audio stream selection:
    ffmpeg \
        -i foo.mp4 -map 0 -map -0:2 -c copy bar.mp4
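When more than one stream has to go, the map arguments can be generated rather than typed. A small sketch, assuming a single input (input 0); drop_maps is a hypothetical helper name and it only builds the argument string.

```shell
#!/bin/sh
# drop_maps IDX...: print "-map 0" followed by a negative map for each
# stream index of input 0 that should be left out.
drop_maps() {
    args="-map 0"
    for idx in "$@"; do
        args="$args -map -0:$idx"
    done
    echo "$args"
}

drop_maps 2        # same mapping as the example above
drop_maps 2 3      # drop the 3rd and 4th streams
```

Because the output contains no spaces inside any single argument, it can be used unquoted, for example ffmpeg -i foo.mp4 $(drop_maps 2 3) -c copy bar.mp4.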

  • Transcoding ProRes or >8bit encoded files
    The nvenc h264 encoder can't handle 10bit or higher bit depth, so you must force the format.
    ffmpeg \
        -hwaccel cuda -hwaccel_output_format cuda \
        -i foo.mov \
        -vf format=yuv420p ... \
        -c:v h264_nvenc foo.mp4

  • Adding chapters to existing mp4
    We need to prepare a file whose first line is ;FFMETADATA1, followed by the chapter definitions in time order:
    $ vi chapters.ffmeta
    ;FFMETADATA1
    title=2018 May, Trip to Mars
    artist=Me
    DATE_RECORDED=2018-05-01
    RECORDING_LOCATION=UK, London
    PRODUCER=Me
    ; time is in 1/10ths of seconds based on TIMEBASE
    ; title can be blank
    [CHAPTER]
    TIMEBASE=1/10
    START=0
    END=300
    title=Intro
    [CHAPTER]
    TIMEBASE=1/10
    START=301
    END=600
    title=
    [CHAPTER]
    TIMEBASE=1/10
    START=601
    END=6000
    title=Credits
    $ ffmpeg -i foo.mp4 \
        -i chapters.ffmeta -map_metadata 1 \
        -codec copy foo-chapters.mp4
    Extracting the metadata from an existing file can be achieved with: ffmpeg -i foo.mp4 -f ffmetadata chapters.ffmeta
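Writing the [CHAPTER] blocks by hand gets tedious for longer videos. Below is a minimal sketch that generates them from a list of "start-seconds title" lines; make_chapters is a hypothetical helper, and it makes two assumptions of its own that differ from the hand-written file above: it uses a timebase of whole seconds (TIMEBASE=1/1), and each chapter ends exactly where the next one starts.

```shell
#!/bin/sh
# make_chapters TOTAL_SECONDS
# Read "START_SECONDS TITLE..." lines on stdin and print an ffmetadata file.
# Each chapter ends where the next begins; the last one ends at TOTAL_SECONDS.
make_chapters() {
    awk -v total="$1" '
        BEGIN { print ";FFMETADATA1" }
        {
            start[NR] = $1
            $1 = ""; sub(/^ /, "")   # drop the first field, keep the title
            title[NR] = $0
        }
        END {
            for (i = 1; i <= NR; i++) {
                stop = (i < NR) ? start[i + 1] : total
                print "[CHAPTER]"
                print "TIMEBASE=1/1"
                print "START=" start[i]
                print "END=" stop
                print "title=" title[i]
            }
        }'
}

printf '0 Intro\n30 Main part\n' | make_chapters 600
```

Redirect the output to chapters.ffmeta and feed it to the ffmpeg command shown above.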

  • burning on-screen subtitles
    Whilst many players can accept the .srt file alongside the video file, sometimes you may want a hard/burned-in subtitle.
    # two stage, if you need further modification to the .ass file in another editor
    $ ffmpeg -i subtitles.srt subtitles.ass
    $ ffmpeg -hwaccel cuvid -c:v h264_cuvid \
        -i in.mp4 \
        -vf ass=subtitles.ass \
        -c:v h264_nvenc out.mp4
    # single stage, specify font size and offset from bottom of frame for subtitles
    $ ffmpeg -hwaccel cuda -c:v h264_cuvid -i foo.mp4 \
        -vf "subtitles=f=subs.srt:force_style='Fontsize=14,Fontname=Roboto Sans,MarginV=6'" \
        -c:a copy -c:v h264_nvenc bar.mp4
    # if there is no border (ie solid black bar) where the subtitles will sit, so they go directly onto the video, you may want to add a stroke to the font
    $ ffmpeg -hwaccel cuda -c:v h264_cuvid -i foo.mp4 \
        -vf "subtitles=f=subs.srt:force_style='Fontsize=28,Fontname=Roboto Sans,MarginV=6,Outline=1,OutlineColour=&H000000&'" \
        -c:a copy -c:v h264_nvenc bar.mp4

  • Blocking areas on video
    If you've made a mistake with your burned-in subtitles and at some point want to hide them with a solid colour bar, this can be achieved using the fillborders filter. One starting point is to grab a screenshot to work out the border offsets in pixels. Let's say we want the top and bottom 30 pixels turned to black:
    $ ffmpeg -hwaccel cuda -c:v h264_cuvid -i foo.mp4 \
        -vf fillborders=color=black:mode=fixed:bottom=30:top=30 \
        -c:a copy -c:v h264_nvenc bar.mp4
    This is relatively quick. If you have a specific png (with transparency etc) you want to overlay onto a video, this can be done but is slower.
    $ ffmpeg -hwaccel cuda -c:v h264_cuvid -i foo.mp4 \
        -i overlay.png -filter_complex "overlay" \
        -c:a copy -c:v h264_nvenc bar.mp4

  • removing all metadata/adding title
    $ ffmpeg -i foo.mp4 \
        -c copy -map_metadata -1 -metadata title="The Title" -map_chapters -1 \
        foo-newtitle.mp4

  • Mapping all audio streams on encode. Note the -map 0:a?
    $ ffmpeg -hwaccel cuvid -i in.mp4 \
        -map 0:v -map 0:a? \
        -c copy out.mp4
    See ffmpeg's advanced options. Conversely, to include only the 3rd (-map 0:a:2) audio stream:
    $ ffmpeg -hwaccel cuvid -i in.mp4 \
        -map 0:v -map 0:a:2 \
        -c copy out.mp4
    and to include all audio streams except the 2nd, map them all first and then exclude that one using the minus (-0:a:1); a negative map only removes streams that have already been mapped:
    $ ffmpeg -hwaccel cuvid -i in.mp4 \
        -map 0:v -map 0:a -map -0:a:1 \
        -c copy out.mp4
    A real world example that I often encounter: once I've completed a rip/transcode, I sometimes find that I forgot to bring across all the audio streams (such as the director's commentary). To save re-encoding you can use:
    # -map 0:v   the video stream from the first source (the already encoded mp4)
    # -map 1:a?  all audio streams from the second source
    # -c copy    explicitly no re-encoding
    $ ffmpeg -i encoded-but-missing.mp4 -i full.whatever \
        -map 0:v \
        -map 1:a? \
        -c copy \
        final.mp4

  • concatenate .ts files
    If you have files of the same codec and parameters you can use the simple method:
    $ cat foo1.ts foo2.ts | ffmpeg -i pipe: -c copy foo.mp4
    # alternatively using m3u8 playlist
    $ cat > foo.m3u8 << EOF
    foo1.ts
    foo2.ts
    EOF
    $ ffmpeg -i foo.m3u8 -c copy foo.mp4
    If, however, your files differ, you must use ffmpeg's concat filter.

  • normalising audio
    Historically we'd use either the sox or normalize utils to perform audio normalization (and ignoring this) but of course ffmpeg offers this functionality too:
    $ ffmpeg -i foo.flac -af volumedetect -vn -sn -dn -f null /dev/null
    ...
    [Parsed_volumedetect_0 @ 0x790c00] n_samples: 23755644
    [Parsed_volumedetect_0 @ 0x790c00] mean_volume: -26.7 dB
    [Parsed_volumedetect_0 @ 0x790c00] max_volume: -3.6 dB
    [Parsed_volumedetect_0 @ 0x790c00] histogram_3db: 45
    [Parsed_volumedetect_0 @ 0x790c00] histogram_4db: 142
    [Parsed_volumedetect_0 @ 0x790c00] histogram_5db: 381
    [Parsed_volumedetect_0 @ 0x790c00] histogram_6db: 859
    [Parsed_volumedetect_0 @ 0x790c00] histogram_7db: 1892
    [Parsed_volumedetect_0 @ 0x790c00] histogram_8db: 3580
    [Parsed_volumedetect_0 @ 0x790c00] histogram_9db: 6603
    [Parsed_volumedetect_0 @ 0x790c00] histogram_10db: 13503
    We need to know the max_volume, as we can raise the volume by up to this amount before clipping occurs. In this example we have 3.6dB of headroom, so we can raise the volume across the entire clip by up to that much; the command below uses 3.2dB, which leaves a small margin.
    $ ffmpeg -i foo.flac -af volume=3.2dB bar.flac
    For video files (and in particular movies) where the dynamic range of the audio is too wide, ffmpeg provides a filter that can raise the volume of quieter scenes but this will change the audio characteristics.
    $ ffmpeg -loglevel quiet -nostdin \
        -i foo.mp4 \
        -c:v copy -c:a aac -b:a 192k \
        -af "dynaudnorm=p=0.93:g=71" bar.mp4
    The dynaudnorm filter provides significantly more functionality and control than the example above, which only sets the peak and the window size to analyse, but for simple and quick audio adjustments on your ripped movies this is a good start. Furthermore, -loglevel quiet -nostdin allows this to be run from within a script that can be sent to the background.

    To perform this manually, we can also use sox via pipes and then re-add the new audio file back to the video file.
    $ ffmpeg -i foo.mp4 -c:a pcm_s16le -ar 44.1k -f wav - |\
        sox -t wav - -t wav - vol 5db 0.05 |\
        ffmpeg -i - -vbr 4 foo.aac
    $ ffmpeg -i foo.mp4 -i foo.aac \
        -map 0:0 -map 1:0 \
        -c:a copy -c:v copy \
        -metadata:s:a:0 language=eng \
        bar.mp4
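The simple peak-based adjustment earlier in this item (read max_volume from volumedetect, then raise the volume by the headroom) can be scripted. This is a sketch only: gain_db is a hypothetical helper, it expects the volumedetect log on stdin, and the optional safety margin is my own addition.

```shell
#!/bin/sh
# gain_db [MARGIN]: read volumedetect output on stdin and print the gain in dB
# that brings max_volume up to 0 dB, less an optional safety MARGIN.
gain_db() {
    awk -v margin="${1:-0}" \
        '/max_volume:/ { printf "%.1f\n", -$(NF - 1) - margin }'
}

# Sample line taken from the volumedetect output above.
log='[Parsed_volumedetect_0 @ 0x790c00] max_volume: -3.6 dB'

printf '%s\n' "$log" | gain_db        # full headroom
printf '%s\n' "$log" | gain_db 0.4    # leave 0.4 dB of margin
```

With a real file, note that ffmpeg writes this log to stderr, so something like ffmpeg -i foo.flac -af volumedetect -vn -sn -dn -f null /dev/null 2>&1 | gain_db 0.4 would be needed, and the result then goes into -af volume=...dB.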

  • generating random 5sec audio
    # brown noise
    $ ffmpeg -f lavfi -i 'anoisesrc=color=brown' -b:a 128k -ar 44100 -t 5 5sec.mp3
    # 300Hz sine tone, with 1.2kHz beeps at 1sec intervals
    $ ffmpeg -f lavfi -i 'sine=frequency=300:beep_factor=4:duration=5' -b:a 96k -ar 44100 -t 5 sine300hz_5s.mp3

  • generating random 5sec video
    Colour noise:
    $ ffmpeg -hwaccel cuda \
        -f rawvideo -video_size 1280x720 -pixel_format yuv420p -framerate 23.976 \
        -i /dev/random \
        -f lavfi -i 'anoisesrc=color=brown' -b:a 96k -ar 44100 \
        -c:v h264_nvenc -t 5 \
        foo.mp4
    Static SMPTE bars:
    $ ffmpeg -hwaccel cuda \
        -f lavfi -i smptebars=duration=5:size=1280x720:rate=1 \
        -c:v h264_nvenc \
        foo.mp4
    Moving test image:
    $ ffmpeg -hwaccel cuda \
        -f lavfi -i testsrc=duration=5:size=1280x720:rate=23.976 \
        -c:v h264_nvenc \
        foo.mp4

  • generating timecodes
    Burn in an h:m:s.ms timecode counter in a monospace font, centred and positioned 50 pixels from the bottom, using a bounding black box with 55% opacity and a 10 pixel internal border. This uses the default monospace font, but a specific font file can be specified, such as fontfile=/usr/share/fonts/google-roboto/Roboto-Thin.ttf
    $ ffmpeg -hwaccel cuda -c:v h264_cuvid -i foo.mp4 \
        -vf "drawtext=text='%{pts\:hms}':font=monospace:fontsize=24:fontcolor=white:box=1:boxborderw=10:boxcolor=black@0.55:x=(w-text_w)/2:y=h-text_h-50" \
        -c:a copy -c:v h264_nvenc \
        bar.mp4
    Generating a transparent, standalone 5 second timecode that can be overlaid in an editor. For specific timings, you can use -ss 0 -to 01:02:03.4
    $ ffmpeg -hwaccel cuda -c:v h264_cuvid \
        -f lavfi -i color=white@0.0:s=780x200,format=rgba \
        -vf "drawtext=text='%{pts\:hms}':font=monospace:fontsize=96:x=(w-text_w)/2:y=(h-text_h)/2" \
        -c:v prores_ks \
        -t 5 foo.mov

  • splitting an audio file based on silence
    See stackoverflow answer. We have to use the silencedetect filter and its start/end sections. In our example we specify -55dB as the threshold for silence, we expect the silence to last at least 1 second, and we'll accept a segment length of at least 5 seconds. ffmpeg would give us output such as:
    [silencedetect @ 0x148acc0] silence_start: 195.017ts/s speed= 175x
    [silencedetect @ 0x148acc0] silence_end: 197.8 | silence_duration: 2.783
    [silencedetect @ 0x148acc0] silence_start: 294.431ts/s speed= 179x
    [silencedetect @ 0x148acc0] silence_end: 296.464 | silence_duration: 2.03262
    ...
    This is not very easy to handle but, as the stackoverflow answer notes, it's possible to script:
    $ SEGMENTS="$(ffmpeg -hide_banner -v warning -i foo.mp3 \
        -af silencedetect=-55dB:d=1,ametadata=mode=print:file=-:key=lavfi.silence_start \
        -vn -sn -f s16le -y /dev/null \
        | grep lavfi.silence_start= \
        | cut -f 2-2 -d= \
        | perl -ne '
            our $prev;
            INIT { $prev = 0.0; }
            chomp;
            if (($_ - $prev) >= 5) {
                our $out = $_ + 1;
                print "$out,";
                $prev = $_;
            }
        ' \
        | sed 's!,$!!')"
    $ echo $SEGMENTS
    195.017,294.431,490.275,689.289,883.086,1094.36,1294.03,1517.1,1751.91,1840.58,2041.03,2185.16,2349.59,2558.09,2768.49,2989.99,3221.9,3296.51,3328.74,3507.47,3541.94,3778.77,3803.03,3821.83,3861.01,3889.2,3949.61,4035.18,4140.01,4573.94
    # using the output from above
    $ ffmpeg -i foo.mp3 \
        -c:a libmp3lame -ar 44100 -q:a 1 \
        -metadata album="foo" -metadata artist="Bar" \
        -map 0 -f segment -segment_times ${SEGMENTS} \
        bar-%02d.mp3

    We can also use -c copy instead of the mp3 encoding args if this file is a .wav or an h264.
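If you would rather not depend on perl, the filtering step can be done in awk alone. The following is a sketch rather than a drop-in replacement: segment_times is a hypothetical helper, it reads the lavfi.silence_start=... lines that ametadata prints, and unlike the perl version it emits the silence start times unchanged instead of adding one second.

```shell
#!/bin/sh
# segment_times [MIN_GAP]: read "lavfi.silence_start=SECONDS" lines on stdin
# and print a comma-separated list of split points that are at least
# MIN_GAP seconds apart (default 5), suitable for -segment_times.
segment_times() {
    awk -F= -v gap="${1:-5}" '
        /lavfi\.silence_start=/ {
            t = $2
            if (t - prev >= gap) {
                out = out (out == "" ? "" : ",") t
                prev = t
            }
        }
        END { print out }
    '
}

# Sample input in the format printed by ametadata=mode=print
printf 'lavfi.silence_start=195.017\nlavfi.silence_start=197.5\nlavfi.silence_start=294.431\n' \
    | segment_times 5
```

Here 197.5 is dropped because it falls within 5 seconds of the previous split point, which is the same rule the perl one-liner applies.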
  • slideshow video from images of different sizes with each
    $ ffmpeg \
        -framerate 1/1.5 \
        -pattern_type glob -i '*.jpg' \
        -vf "scale=1280:720:force_original_aspect_ratio=decrease,pad=1280:720:(ow-iw)/2:(oh-ih)/2:color=white:eval=frame,setsar=1" \
        -c:v h264_qsv \
        -r 24 -pix_fmt yuv420p \
        slideshow.mp4
    Additional information.
  • Visualisation to audio
    There are a number of options to generate the classic waveforms, with additional options: http://www.ffmpeg.org/ffmpeg-filters.html#showwaves
    $ ffmpeg \
        -i audio.wav \
        -filter_complex "[0:a]showwaves=s=hd720:mode=line:rate=25:colors=green|white,format=yuv420p[v]" \
        -map "[v]" \
        -map 0:a \
        -c:v h264_qsv \
        -c:a aac -ab 192k \
        audio.mp4
    # https://lukaprincic.si/development-log/ffmpeg-audio-visualization-tricks
    # with text
    $ ffmpeg \
        -i audio.wav \
        -filter_complex "[0:a]showwaves=s=hd720:mode=line:rate=25:colors=green|white,format=yuv420p[v];[v]drawtext=text='title of this audio':fontcolor=white:fontsize=30:x=(w-text_w)/5:y=(h-text_h)/5[out]" \
        -map "[out]" \
        -map 0:a \
        -c:v h264_qsv \
        -c:a aac -ab 192k \
        audio.mp4
