Preparing videos for WordPress.tv with FFmpeg

This year I was one of the co-organisers of WordCamp Frankfurt 2016. WordCamps are conferences that focus on everything WordPress. An important goal is to make them accessible to as many people as possible. For example, tickets are cheap. Also, when possible, sessions are made available as videos on WordPress.tv. A number of camera kits are passed around so talks can be recorded. There is a short guide to editing videos, but there is no standardised way of doing this.

Usually, videos are trimmed to show only the relevant part. When possible, an intro is added. The videos will be available for download, so an intro is helpful to see who is presenting and on what topic. I do not really have experience with video editing software, but in the past I have used FFmpeg to automate video customisations, so I decided to use a similar process.

The videos were recorded with video cameras on SD cards as MPEG transport streams (.MTS files), the format used in AVCHD recording mode. The files are somewhat hidden and can be found under a path like this (or similar):


PRIVATE/AVCHD/BDMV/STREAM/00001.MTS

Recordings are broken up into multiple files depending on the maximum file size (usually 2 or 4 GB). To get a single file you have to concatenate them. For MPEG streams you can do this directly in the input parameter with the concat protocol: list the file names after a single concat: prefix, separated by |. This does not work for MP4 files; in that case you need the more complex concat filter. The value copy for -c copies the audio and video streams without transcoding them.


ffmpeg -i "concat:00001.MTS|00002.MTS" -c copy combined.MTS
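
With more than two segments, typing the input string by hand gets error-prone. A small shell helper could build it instead. This is only a sketch: the function name concat_input is made up for illustration, and it assumes the file names contain no | characters.

```shell
# Join file names with "|" and prepend the "concat:" prefix once.
# Hypothetical helper; assumes no file name contains a "|".
concat_input() {
    joined=""
    for f in "$@"; do
        if [ -z "$joined" ]; then
            joined="$f"
        else
            joined="${joined}|${f}"
        fi
    done
    printf 'concat:%s\n' "$joined"
}

# Example (needs ffmpeg and the segment files in the current directory):
#   ffmpeg -i "$(concat_input 00001.MTS 00002.MTS 00003.MTS)" -c copy combined.MTS
```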

Next, trim the beginning and the end. To do this, set the start time (-ss) and the duration (-t) in seconds.


ffmpeg -i combined.MTS -c copy -ss 52 -t 2230 wc-location-session.MTS
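
If you note the start and end of a session as timestamps from a video player, you still need to turn them into a start offset and a duration. The following sketch does that arithmetic. The function names to_seconds and trim_duration are hypothetical, and it assumes whole-second timestamps in MM:SS or HH:MM:SS form.

```shell
# Convert "HH:MM:SS" or "MM:SS" to whole seconds.
to_seconds() {
    echo "$1" | awk -F: '{ s = 0; for (i = 1; i <= NF; i++) s = s * 60 + $i; print s }'
}

# Print the duration in seconds between a start and an end timestamp,
# suitable as the value for -t.
trim_duration() {
    start=$(to_seconds "$1")
    end=$(to_seconds "$2")
    echo $((end - start))
}
```

For example, for a session that runs from 0:52 to 38:02, to_seconds 0:52 prints 52 and trim_duration 0:52 38:02 prints 2230.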

Some recordings from one session room had the wrong zoom setting. This happened because the camera was switched off during breaks, which reset the zoom factor. The videos were recorded in Full HD (1920x1080 pixels). Because the videos would be resized to HD (1280x720 pixels) on WordPress.tv and a large part of the room was visible, I decided to crop the video to the final resolution. This helped to focus more on the speaker and slides.

The crop filter takes width, height, horizontal position and vertical position. When cropping you cannot copy the video stream, so you have to set the video codec (libx264). I also set the constant rate factor (-crf) for the video quality: 0 is lossless, 51 is worst.

You can combine the concatenate, cut & crop actions into a single command.


ffmpeg -i "concat:00001.MTS|00002.MTS" \
       -c:v libx264 -crf 16 \
       -c:a copy \
       -vf crop=1280:720:398:214 \
       -ss 52 -t 2230 \
       wc-location-session.MTS
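
The crop window has to fit inside the source frame: for a 1920x1080 source and a 1280x720 window, the horizontal offset can be at most 640 and the vertical offset at most 360. Checking this before a long encode can catch a typo early. The helper below is a sketch with a made-up name (crop_filter); it is not part of FFmpeg.

```shell
# Print a crop filter of the form crop=w:h:x:y, or fail if the window
# would fall outside the source frame.
# Arguments: src_w src_h out_w out_h x y
crop_filter() {
    src_w=$1; src_h=$2; out_w=$3; out_h=$4; x=$5; y=$6
    if [ $((x + out_w)) -gt "$src_w" ] || [ $((y + out_h)) -gt "$src_h" ]; then
        echo "crop window exceeds ${src_w}x${src_h} frame" >&2
        return 1
    fi
    echo "crop=${out_w}:${out_h}:${x}:${y}"
}

# Example (needs ffmpeg):
#   ffmpeg -i combined.MTS -vf "$(crop_filter 1920 1080 1280 720 398 214)" ...
```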

The sound for some sessions was relatively poor, so these tracks were put through Auphonic. Two hours of audio processing a month are free of charge.

Videos usually start with an intro with information about the event, speaker name and session title. To do this you can take an image and turn it into a video. You will have to know the frame rate of the final video. Common frame rates are 24, 25 and 30. Newer video cameras often record at 50 or 60 frames per second. These map to 25 and 30, because online video does not really need a high frame rate unless you are making high-quality productions or videos with lots of fast movement. Also, older or low-end phones and tablets cannot play high frame rates.

In FFmpeg, -loop 1 ensures the image is repeated for the whole sequence rather than shown for just the first frame. The duration is set in seconds with the -t argument. You have to add a silent audio track using -f lavfi -i anullsrc. This is needed to concatenate multiple sequences afterwards, because FFmpeg can only concatenate sequences if each has the same number of tracks.


ffmpeg -loop 1 -i intro.png \
       -f lavfi -i anullsrc \
       -c:a libfdk_aac -ac 2 -b:a 128k -ar 48000 \
       -c:v libx264 -pix_fmt yuv420p -r 25 -crf 0 -s 1920x1080 \
       -t 2 \
       intro.mp4
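
The mapping from camera frame rate to output frame rate can be written down as a small rule: halve anything above 30 and keep the rest. A sketch, assuming integer frame rates (so rates such as 29.97 are not handled) and a hypothetical function name target_fps:

```shell
# Print the output frame rate for a given source frame rate.
# Rates above 30 are halved (50 -> 25, 60 -> 30); others are kept.
target_fps() {
    if [ "$1" -gt 30 ]; then
        echo $(( $1 / 2 ))
    else
        echo "$1"
    fi
}

# Example: pass the result to -r when rendering the intro.
#   ffmpeg ... -r "$(target_fps 50)" ...
```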

For WordCamp Frankfurt I added two intro sequences: the first with the logo and event name, and the second with the speaker name and session title. This information was added from text files using the drawtext filter. This way the videos could be prepared in advance.
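
A drawtext filter that reads its text from a file could be put together as follows. This is a sketch, not the script used for the event: the function name drawtext_filter, the font path, the font size and the centred placement are all assumptions for illustration. textfile, fontfile, fontsize, fontcolor, x and y are drawtext options, and w, h, text_w and text_h are variables the filter provides for positioning.

```shell
# Build a drawtext filter that centres the contents of a text file.
# $1: text file, $2: font file (hypothetical path), $3: font size
# Assumes neither path contains ":" or other characters that need escaping.
drawtext_filter() {
    echo "drawtext=textfile=${1}:fontfile=${2}:fontsize=${3}:fontcolor=white:x=(w-text_w)/2:y=(h-text_h)/2"
}

# Example (needs an FFmpeg build with drawtext support); the elided
# options are the same as in the intro command:
#   ffmpeg -loop 1 -i background.png ... \
#          -vf "$(drawtext_filter title.txt /path/to/font.ttf 72)" ... title.mp4
```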

After trimming the videos, I used a script to prepare all the pieces and concatenate them into the final video using the concat filter. I also made sure that the final video was compatible with mobile and tablet devices by rendering an MP4 file with appropriate video and audio settings.


ffmpeg -i intro-1.mp4 \
       -i intro-2.mp4 \
       -i session.MTS \
       -filter_complex "[0:0] [0:1] [1:0] [1:1] [2:0] [2:1]  concat=n=3:v=1:a=1 [v] [a]" -map "[v]" -map "[a]" \
       -c:a libfdk_aac \
       -ac 2 -b:a 128k -ar 48000 \
       -c:v libx264 -pix_fmt yuv420p -profile:v high -level 3.2 -movflags faststart -r 25 -crf 23 -s 1920x1080 \
       final.mp4
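
The filter graph string grows with every input, so for a variable number of pieces it can be generated. The sketch below uses a hypothetical helper, concat_graph, and addresses streams by type ([0:v:0], [0:a:0]) rather than by index, which assumes every input has at least one video and one audio stream.

```shell
# Print a concat filter graph for N inputs, each contributing its first
# video and first audio stream, with outputs labelled [v] and [a].
concat_graph() {
    n=$1
    pads=""
    i=0
    while [ "$i" -lt "$n" ]; do
        pads="${pads}[${i}:v:0][${i}:a:0]"
        i=$((i + 1))
    done
    echo "${pads}concat=n=${n}:v=1:a=1[v][a]"
}

# Example (needs ffmpeg):
#   ffmpeg -i intro-1.mp4 -i intro-2.mp4 -i session.MTS \
#          -filter_complex "$(concat_graph 3)" -map "[v]" -map "[a]" ... final.mp4
```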

The preparation script is available on GitLab. It includes configuration options and a fallback for intro sequences.

You can see the result in the WordCamp Frankfurt 2016 videos.