Creating WebMs with FFmpeg

From Rigged Wiki
Revision as of 10:23, 1 October 2015 by Koala

This article will focus on how to create WebM files of cup moments and other content in a way that 4chan accepts them, using only FFmpeg and a media player.

While WebM supports both VP8 and VP9 as video codecs, 4chan currently only allows VP8. Unfortunately, VP8 is a mediocre video codec and libvpx is not a good encoder, so if you're used to encoding H.264 with x264, expect a lot more knobs to tweak here and noticeably less satisfying results.

Glossary

A few expressions need to be expanded upon as this article will be making use of them. If you're in a hurry, skip this section and venture ahead, and only look back on it as a reference if a word comes up that you don't understand.

  • Bitrate: The amount of information (bits) a video stream contains per unit of time, usually given in kbit/s.
  • Variable Bitrate (VBR): A bitrate which varies with the content of the video over time, using fewer bits for easy-to-compress segments and more for hard-to-compress ones.
  • Constant Bitrate (CBR): A bitrate which remains the same over the entire span of the video, resulting in a predictable filesize but leaving the encoder no flexibility in how it allocates bits.
  • Average Bitrate (ABR): A bitrate which, averaged over a certain time segment, meets a target bitrate. This is a form of VBR which remains somewhat predictable, i.e. without sudden "bitrate spikes" in difficult segments, which is important for streaming.
  • Constant Quality Mode: An encoding mode which uses a Constant Rate Factor (CRF) to encode each frame at a similar quality.
  • Target Bitrate: A bitrate which the encoder aims to reach as its overall average bitrate, but may over- or undershoot to guarantee a constant quality.

General FFmpeg Syntax

To give you a good idea of what a usual FFmpeg command line looks like, here is an overview:

ffmpeg -ss starttime -i inputfile -t length -c:v videocodec other encoding options outputfile 

Seeking

First, we need to know how to encode only a certain part of the video, or in other words, how to "cut" it. We don't need an external video editor for this, as FFmpeg's built-in seeking capabilities are enough.

Finding the Timestamps

With your favourite media-player, seek out the timestamps in the video which will be your start and end. Note them down or remember them.

Note: If you're using mpv, pass --osd-fractions for millisecond-accurate timestamps.

Using them with FFmpeg

FFmpeg has two seeking modes: precise seeking and fast seeking. Fast seeking skips parsing the part of the file before the point you seek to, saving you a lot of time. In recent FFmpeg versions, fast seeking is just as precise as precise seeking when re-encoding. Its only pitfall is that it resets the timestamps to start at the point you sought to. Options that take a position, such as -to, then count from that point instead of from the start of the file, so -to effectively starts behaving just like -t.

This is usually not a big deal, unless you're trying to burn in subtitles, in which case it becomes one (see Hardsubbing below). For creating WebMs from cup footage, however, fast seeking is the way to go.

Precise Seeking:

ffmpeg -i inputfile -ss starttime

Fast Seeking:

ffmpeg -ss starttime -i inputfile

Since -to effectively behaves like -t when fast seeking, we subtract the start time from the end time to get the clip length and pass that to -t instead. For example, a clip from 1:00 to 1:30 would be:

ffmpeg -ss 1:00 -i inputfile -t 30
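Subtracting timestamps by hand gets tedious for longer clips. As a small sketch, the shell helpers below turn the timestamps you noted down into the value for -t. to_seconds and clip_length are hypothetical names made up for this example, not part of FFmpeg, and they assume timestamps in [[H:]M:]S[.fff] form, like the ones mpv shows.

```shell
# Hypothetical helper (not part of FFmpeg): convert a [[H:]M:]S[.fff]
# timestamp into plain seconds, e.g. 1:30 -> 90.
to_seconds() {
    printf '%s\n' "$1" | awk -F: '{
        s = 0
        # multiply what we have so far by 60, then add the next field
        for (i = 1; i <= NF; i++) s = s * 60 + $i
        print s
    }'
}

# Hypothetical helper: length in seconds between START and END, for use with -t.
clip_length() {
    local start end
    start=$(to_seconds "$1")
    end=$(to_seconds "$2")
    awk -v s="$start" -v e="$end" 'BEGIN { print e - s }'
}
```

With these defined, ffmpeg -ss 1:00 -i inputfile -t "$(clip_length 1:00 1:30)" passes -t 30, the same as the example above.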

If you want to make sure your seeking is correct without encoding anything, you can use ffplay. It acts like a video player but accepts the same kind of options as ffmpeg:

ffplay -ss 1:00 -i inputfile -t 30

Video Encoding

Next, we need to encode the video with the right parameters while discarding any audio. Note that our input has most likely already been lossily encoded, so there's only so much we can do quality-wise. Generally, the fewer times a video clip has been re-encoded, the better.

ffmpeg -ss starttime -i inputfile -t length -c:v libvpx -an -b:v 1000k -crf 10 hello.webm

The first thing you may notice is -c:v libvpx and -an. The former means "use the libvpx encoder to encode video", the latter "discard any audio".

Note: If your input file contains a subtitle track (e.g. the VODs from klaxa), be sure to also pass -sn to strip those.

Next come two encoding parameters for libvpx. -b:v sets the target bitrate for the video. -crf switches the encoder into constant quality mode and sets the constant rate factor. It also changes the meaning of -b:v: the bitrate now becomes the maximum allowed bitrate.

Note: You should always set both a bitrate and a constant rate factor, as libvpx is not as smart as x264 and can't derive target bitrate from crf.

Choosing the Right Options

CRF is a number between 4 and 63, where lower means better quality. CRF and target bitrate are tightly coupled: too low a CRF combined with too low a target bitrate gets you something akin to constant bitrate encoding, where you constantly hit the maximum allowed bitrate no matter the complexity of the frame. The proper course of action in this case is to either raise the CRF (decrease the quality) or raise the bitrate (increase the filesize).

Similarly, too high a bitrate combined with too high a CRF means your maximum bitrate will never be used, as the quality you request from the encoder is so low that it reaches it without fully utilising your bitrate. In that case, either lower your maximum allowed bitrate or lower your CRF so the encoder makes use of it.

Bitrate

Getting the best result is a process which requires fiddling with some knobs. A good starting point for your bitrate is simply the maximum 4chan allows. Most boards have a WebM filesize limit of 3072 KiB (exceptions are /gif/, /wsg/, /r9k/, /s4s/ and /b/). 3072 KiB translates into 3145728 bytes, and since each byte consists of 8 bits, that is 25165824 bits. Divide this by the length of your WebM in seconds and you get your maximum bitrate in bits per second. Of course, whether your file actually reaches those 3072 KiB depends on the content and your CRF; if it ends up smaller, you can set a higher bitrate to give the encoder more room to play with during hard-to-encode frames.
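The same arithmetic as a sketch: the hypothetical max_bitrate_k helper below takes a size limit in KiB and a clip length in seconds and prints the highest average bitrate in FFmpeg's "k" units (1000 bit/s), rounded down. It ignores container overhead, so treat the result as an upper bound rather than an exact target.

```shell
# Hypothetical helper: highest average bitrate (in kbit/s, where k = 1000,
# matching FFmpeg's "k" suffix) that fits LIMIT_KIB kibibytes into DURATION seconds.
max_bitrate_k() {
    awk -v limit="$1" -v dur="$2" 'BEGIN {
        bits = limit * 1024 * 8              # KiB -> bytes -> bits
        printf "%d\n", bits / dur / 1000     # bit/s -> kbit/s, truncated
    }'
}

# Example: a 30 second clip under the 3072 KiB limit.
printf '%s\n' "-b:v $(max_bitrate_k 3072 30)k"
```

For the 30 second example this prints -b:v 838k.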

Note: Windows file sizes claim to be in Mega- and Kilobyte (MB, kB), but are actually in Mebibyte and Kibibyte (MiB, KiB). The difference is that one Kibibyte is 1024 bytes whereas one Kilobyte is 1000 bytes. In the same way, one Mebibyte is 1024 Kibibytes, while one Megabyte is 1000 Kilobytes. This means the FFmpeg prefixes for bitrates like "500k" or "2M" are for the 1000s, not 1024s like Windows uses. You can make the encoder use the latter system by appending "i" to the bitrate, e.g. "500ki" or "2Mi".

CRF

Choosing the CRF is a more annoying process. Essentially, the best CRF is the highest one at which you still get acceptable results. A good starting point for libvpx is 10. If you don't care about min-maxing your quality per filesize, you can simply set it and should get acceptable results on the first try.

If you want to get the best out of it, watch the current bitrate ffmpeg tells you while it is encoding. If you see it often hitting the target bitrate, then your CRF is too low for the bitrate. If it never reaches the target bitrate, your CRF is too high for your bitrate.

Quantizer Parameters

Optionally, you can further fine-tune the behaviour of the encoder by setting the maximum and minimum quantizer parameters. These values range from 0 to 63, where the default minimum quantizer setting is 4 and the default maximum quantizer setting is 63. Again, lower means better quality.

The associated options are named -qmin and -qmax. If you want to give the encoder free rein to choose its quantizer parameter, set -qmin to 0 and -qmax to 63. If you want to force it to use quantizer parameters which generally result in better quality, lower the -qmax value. As with CRF, the tradeoff here is filesize: set the quantizer parameters too low and you'll end up saturating your target bitrate. According to the VP8 reference, the encoder then reverts to standard VBR behaviour, which in turn means the result will look bad.

Quality Setting

Optionally, there is also a -quality setting. The good thing about this one is that it's a tradeoff with time, not filesize, so if you increase the -quality setting the encoder will take a longer time but potentially produce better results.

You can set -quality to "good" (recommended) or to "best" (very slow, not much benefit).
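Putting the optional knobs from the last two sections together, a single-pass encode could look like the sketch below. The values -qmin 0, -qmax 40 and -quality good are illustrative picks, not tested recommendations, and encode_tuned is a hypothetical wrapper; setting FFMPEG=echo prints the command instead of running it.

```shell
# Hypothetical wrapper: encode_tuned INPUT START LENGTH BITRATE OUTPUT
# FFMPEG defaults to "ffmpeg"; set FFMPEG=echo for a dry run.
encode_tuned() {
    # -qmin/-qmax: illustrative quantizer range (CRF 10 sits inside it)
    # -quality good: slower than the default, usually worth it
    "${FFMPEG:-ffmpeg}" -ss "$2" -i "$1" -t "$3" \
        -c:v libvpx -an \
        -b:v "$4" -crf 10 \
        -qmin 0 -qmax 40 \
        -quality good \
        "$5"
}
```

For example, encode_tuned inputfile 1:00 30 838k hello.webm encodes the 30 second clip from earlier.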

Multi-Pass Encoding

libvpx is a bad encoder, so unlike x264, it actually benefits from 2-pass encoding, since it doesn't do any lookahead magic. In 2-pass encoding, the first pass gathers information about the video stream to be encoded, and the second pass does the actual encoding.

In practice, this means first running your FFmpeg command line with -pass 1, then running it again with -pass 2 and telling FFmpeg to overwrite the (still empty) output file from pass 1.
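A sketch of the two runs, wrapped in a hypothetical two_pass function so both passes are guaranteed to use identical settings. It follows the approach described above (pass 1 writes the output file, pass 2 overwrites it with -y); setting FFMPEG=echo prints both commands instead of running them.

```shell
# Hypothetical wrapper: two_pass INPUT START LENGTH BITRATE CRF OUTPUT
# FFMPEG defaults to "ffmpeg"; set FFMPEG=echo for a dry run.
two_pass() {
    local ff="${FFMPEG:-ffmpeg}"
    # Pass 1: analyse the clip. Statistics go to a log file
    # (ffmpeg2pass-0.log in the current directory by default).
    "$ff" -ss "$2" -i "$1" -t "$3" -c:v libvpx -an -b:v "$4" -crf "$5" \
        -pass 1 "$6" || return 1
    # Pass 2: the real encode using those statistics; -y overwrites
    # the output file written during pass 1.
    "$ff" -y -ss "$2" -i "$1" -t "$3" -c:v libvpx -an -b:v "$4" -crf "$5" \
        -pass 2 "$6"
}
```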

Whether you want to take the time to use 2-pass encoding is up to you. I've seen it produce fewer of the compression-soup artifacts I sometimes encounter with libvpx; especially when encoding a mostly static scene, some bug in the encoder makes the picture turn into a blurry mess at regular intervals when using only one pass.

A Note about Multi-Threaded Encoding

libvpx's VP8 encoder only supports multi-threaded encoding through slicing. This means it will slice the video up into different pieces and encode them individually, which reduces quality. Therefore I do not recommend using multi-threaded encoding with libvpx.

Advanced Video Manipulation

Now that we know how to cut and encode a video to produce a WebM file, it's also useful to know some other ways to manipulate it besides cutting. FFmpeg comes with a fuckload of filters for both audio and video, so it's time to take a look at some commonly useful ones. For simple video filtering (one input stream), the -vf option will do. I won't cover complex filtering (i.e. -filter_complex) in this article.

The general syntax for video filters is filtername=option1:option2:option3 or filtername=option1name=option1:option3name=option3. As you can see, the latter form allows you to skip options you don't want to set explicitly. The colon (:) separates options, so the first example actually expands to filtername=option1name=option1:option2name=option2:option3name=option3.

As before, you can use ffplay to see your manipulations without having to encode the video.

Cropping

Sometimes we only want to have one subsection of the video, i.e. we want to crop it. This is possible using the crop filter. Here's what most of us will use:

ffmpeg -i inputvideo -vf "crop=width:height:xoffset:yoffset" othershit

For example, if you want to extract a region of 300 by 300 pixels in the bottom right of a 1280 by 720 video, you'd write:

ffmpeg -i inputvideo -vf "crop=300:300:980:420" othershit

If no x and y offsets are specified, the crop defaults to being centred on the middle of the input video.
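The offsets for a corner are plain subtraction: the x offset is the input width minus the crop width, and the y offset is the input height minus the crop height. A hypothetical helper doing that sum is sketched below; it assumes you already know the input resolution, e.g. from your media player. (The crop filter can also work this out itself if you pass the expressions iw-ow and ih-oh as the x and y offsets.)

```shell
# Hypothetical helper: crop_bottom_right CROP_W CROP_H INPUT_W INPUT_H
# Prints a crop filter string for the bottom-right corner of the frame.
crop_bottom_right() {
    local w=$1 h=$2 in_w=$3 in_h=$4
    echo "crop=$w:$h:$((in_w - w)):$((in_h - h))"
}

crop_bottom_right 300 300 1280 720   # prints crop=300:300:980:420
```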

Scaling

FFmpeg also has a scaling filter, which allows you to scale the input video. Especially if you're working with a high resolution input video and simply want to extract a scene from it as a 4chan reaction image, you'll probably want to scale it down a bit.

Here is again the most common format we'll need:

ffmpeg -i inputvideo -vf "scale=width:height" othershit
Protip: You can specify -1 as either width or height, which means "calculate this value from the other one, preserving the aspect ratio".

For example, if you want to scale a video down to 360p, you'd use:

ffmpeg -i inputvideo -vf "scale=-1:360" othershit

If you want to scale a video by half, you can use the input width (iw) and input height (ih) values the filter provides:

ffmpeg -i inputvideo -vf "scale=iw/2:ih/2" othershit

Chaining Filters

Multiple filters can be chained by separating them with a comma. Here's an example:

ffmpeg -i inputvideo -vf "rotate=PI/3,scale=iw/2:ih/2" othershit

This would rotate an input video by 60 degrees and then scale it to half the resolution.

Of course you can go absolutely nuts with this:

ffmpeg -i inputvideo -vf "perspective=100:100:W/2,rotate=t,hue=h=t*30" othershit

Hardsubbing

While webplebs have developed their own WebVTT softsub format, 4chan currently doesn't allow it in WebM files. If you want to hardsub something, FFmpeg's "subtitles" filter can come in handy.

Because fast seeking resets the timestamps to start at the point you sought to, the subtitles will be out of sync. This can be corrected through clever use of -copyts and the setpts filter: -copyts keeps the original input timestamps so the subtitles are timed correctly, and setpts=PTS-STARTPTS makes sure the output file still starts at 0.

ffmpeg -ss 13:37 -i inputvideo -copyts -vf "subtitles=inputvideo,setpts=PTS-STARTPTS" -sn othershit

Note how the first parameter to the subtitles filter is the input video; this assumes the subtitles are a track inside the input video. If they are not, you can specify a subtitle file instead. Fonts attached to the input file will be used automatically as needed.

-sn is passed so that FFmpeg doesn't automatically convert the input subtitles to WebVTT and include them in the output.
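As a sketch, the command above can be wrapped so that it takes a clip length, the video settings from earlier, and an optional external subtitle file. hardsub is a hypothetical name, and the wrapper assumes file names without special characters such as colons, commas or quotes, which would need extra escaping inside the filter string. It also passes -t before -i, where it limits how much of the input is read; that placement is a choice made for this sketch, not something the original command does.

```shell
# Hypothetical wrapper: hardsub INPUT START LENGTH BITRATE OUTPUT [SUBTITLE_FILE]
# Without SUBTITLE_FILE, the subtitle track inside INPUT is used.
# FFMPEG defaults to "ffmpeg"; set FFMPEG=echo for a dry run.
hardsub() {
    local subs="${6:-$1}"
    # -t before -i limits how much of the input is read.
    # -copyts keeps the original timestamps so the subtitles line up;
    # setpts=PTS-STARTPTS then shifts the output back to start at 0.
    "${FFMPEG:-ffmpeg}" -ss "$2" -t "$3" -i "$1" -copyts \
        -vf "subtitles=$subs,setpts=PTS-STARTPTS" \
        -c:v libvpx -an -b:v "$4" -crf 10 -sn "$5"
}
```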

Audio Encoding

Some boards, notably /wsg/ and /gif/, also allow for audio in WebM files, and have a maximum file size of 4 MiB.

The WebM standard allows for both the Vorbis and Opus codecs for audio, though 4chan currently only allows the Vorbis codec. While Opus is superior to Vorbis (and as of the time of writing the best audio codec in the world), Vorbis itself is already superior to common audio codecs such as MPEG Layer-3 (commonly known as MP3).

General Considerations

Audio encoding is a bit different from video encoding, in that compressing human speech has been pretty much a solved problem for a while. While quite poor by today's standards, mobile phone speech codecs provide acceptable quality for telephone conversations at bitrates as low as 13 kbit/s. If you're messing with streaming on Twitch, you'll probably be using MP3 or AAC at bitrates from 128 kbit/s to 192 kbit/s. At those bitrates, with those codecs and proper encoders, many people already can't tell the difference from lossless audio.

In other words, don't needlessly waste bits on audio because some audiophile tells you to. Most codecs and encoders have been rigorously tested and designed around human speech, so even at low bitrates 4CC commentary should sound just fine.

Stream Copy

The first question you should ask yourself before encoding something is whether you actually even need to re-encode it. If the input file's audio stream is already in Vorbis format, and the bitrate works for the use-case, you can simply copy it.

ffmpeg -i inputvideo -c:a copy othershit

This has the advantage that objectively no quality is lost from the input audio stream. The obvious downside is that it only works for a very small subset of media files you may want to turn into WebM files. A tool such as mediainfo can tell you whether this is the case.
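Besides mediainfo, FFmpeg's own ffprobe can report the codec of the first audio stream. The sketch below picks stream copy when that codec is Vorbis and falls back to libvorbis otherwise; audio_codec and audio_args are hypothetical names, and the check only looks at the codec, so you still have to judge whether the bitrate suits your use-case.

```shell
# Hypothetical helper: print the codec name of the first audio stream, e.g. "vorbis".
audio_codec() {
    ffprobe -v error -select_streams a:0 -show_entries stream=codec_name \
        -of default=noprint_wrappers=1:nokey=1 "$1"
}

# Hypothetical helper: choose audio options for a given codec name.
audio_args() {
    if [ "$1" = vorbis ]; then
        printf '%s\n' "-c:a copy"        # already Vorbis: copy it untouched
    else
        printf '%s\n' "-c:a libvorbis"   # anything else has to be re-encoded
    fi
}
```

Usage would be ffmpeg -i inputvideo $(audio_args "$(audio_codec inputvideo)") othershit, with the outer substitution deliberately left unquoted so the options split into separate arguments.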

Encoding Vorbis

General syntax:

ffmpeg -i inputfile -c:a libvorbis othershit

The FFmpeg documentation specifies some other potentially useful options for libvorbis encoding, which this article won't go into. The reason for this omission is that for most use-cases they're redundant.

Warning: Do NOT use the vorbis encoder, but libvorbis. The former is garbage. I'm not sure why it's even still kept around.

VBR Encoding

The recommended way of encoding is to use a variable bitrate, so the encoder can be as flexible as possible when it comes to allocating bits.

The associated option for this method of encoding is -q:a, which sets the constant quality level. Higher means better quality, lower means worse quality, so it's exactly the other way around compared to the related options from the video encoding section.

The quality level can vary from -1.0 to 10.0, the default being 3.0. This is a floating point value, so you may use numbers such as 3.14 and so on. I recommend a level above 4.
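For example, a clip with VBR Vorbis audio at quality level 5 (an arbitrary pick above the recommended 4) and the video settings from earlier could look like the sketch below. encode_with_audio is a hypothetical wrapper; setting FFMPEG=echo prints the command instead of running it.

```shell
# Hypothetical wrapper: encode_with_audio INPUT START LENGTH VIDEO_BITRATE OUTPUT
# FFMPEG defaults to "ffmpeg"; set FFMPEG=echo for a dry run.
encode_with_audio() {
    # -q:a 5: VBR Vorbis, higher values mean better quality
    "${FFMPEG:-ffmpeg}" -ss "$2" -i "$1" -t "$3" \
        -c:v libvpx -b:v "$4" -crf 10 \
        -c:a libvorbis -q:a 5 \
        "$5"
}
```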

ABR Encoding

Sometimes you want to know in advance how big your file will get, or want a predictable bitrate in the output. In that case, ABR is useful.

The -b:a option sets the bitrate for ABR mode. While libvorbis is usable at 96 kbit/s, the recommended range starts at 128 kbit/s, which should already be good enough for 4CC VODs.
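With ABR audio the audio bitrate is known up front, so you can subtract it from the total budget (worked out the same way as in the Bitrate section) to get the room left for video. A sketch with a hypothetical video_budget_k helper, again ignoring container overhead:

```shell
# Hypothetical helper: video_budget_k LIMIT_KIB DURATION AUDIO_KBPS
# Prints the video bitrate (kbit/s, k = 1000) left after reserving AUDIO_KBPS.
video_budget_k() {
    awk -v limit="$1" -v dur="$2" -v audio="$3" 'BEGIN {
        total = limit * 1024 * 8 / dur / 1000   # whole budget in kbit/s
        printf "%d\n", total - audio            # what is left for video, truncated
    }'
}

# Example: 60 seconds on a 4 MiB (4096 KiB) board with 128 kbit/s audio.
printf '%s\n' "-b:v $(video_budget_k 4096 60 128)k -b:a 128k"
```

For this example the output is -b:v 431k -b:a 128k.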