Adventures with gamma transfer functions and video colorspace conversions

October 18, 2020

Recently I wrote a video player from scratch for Subtitle Composer. It relies only on FFmpeg for decoding, OpenGL for rendering video and OpenAL for playing audio.

After using several external video players and various libraries over the years, I got fed up with annoying issues: code getting filled with nasty workarounds for simple stuff like seeks, timestamps and volume control, and time wasted debugging player/file issues that shouldn’t have been there in the first place.

The idea is simple:

I started by taking the example ffplay code from FFmpeg, cleaning it up a bit and replacing small bits with C++ and Qt5 classes and APIs. In no time I had a working video player that rendered video into an SDL window and played audio through SDL.

After a while the SDL window was replaced with a QOpenGLWidget. The YUV video frame was fed into a texture, a simple GLSL shader did the YUV -> RGB conversion and voilà - the frame was rendered inside the video widget. Everything seemed fine so far, but I knew that video frames don’t always have to be in YUV format and that they can have different colorspaces.
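To give an idea, here’s a minimal sketch of what such a shader can look like, assuming planar 8-bit YUV uploaded as three separate textures and approximate BT.601 limited-range coefficients - the names and numbers are illustrative, not the actual Subtitle Composer shader:

```glsl
#version 330 core
// Minimal sketch: planar 8-bit YUV -> RGB with approximate BT.601
// limited-range coefficients. Texture/uniform names are made up for
// illustration.
uniform sampler2D texY;
uniform sampler2D texU;
uniform sampler2D texV;
in vec2 uv;
out vec4 fragColor;

void main()
{
    float y = texture(texY, uv).r - 16.0 / 255.0;   // remove limited-range offset
    float u = texture(texU, uv).r - 128.0 / 255.0;  // center chroma around zero
    float v = texture(texV, uv).r - 128.0 / 255.0;

    float r = 1.164 * y + 1.596 * v;
    float g = 1.164 * y - 0.392 * u - 0.813 * v;
    float b = 1.164 * y + 2.017 * u;

    fragColor = vec4(r, g, b, 1.0);
}
```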

Afterwards SDL audio was replaced with OpenAL - that part went smoothly too and the audio side was done.

Then I went digging into the various frame formats and colorspaces, looking for information about them. Honestly, I was hoping I would find what I needed on Wikipedia or elsewhere online, or some nice source in an existing video player like MPV or VLC, do some copy/pasta, add some credits and be done with it.

What I ended up figuring out is that there is a whole science about image encoding and perception that needs to be taken into account. Then there is the science and history of transmitting and encoding analog/digital video signals and their respective standards (whose specs are nowhere to be found publicly - they’re hard to obtain and hidden behind paywalls), plus some linear algebra. I found out that most open source video players around did wrong colorspace conversions and only approximate gamma correction, which resulted in colors that weren’t quite right and dark video frames in which sometimes nothing could be made out.

Dark video frames in most software video players were very close to all black, and I had trouble making out shapes, objects and people in dark scenes without tweaking brightness/contrast, which would then make light scenes too light. Watching the same video on my smart TV was a completely different experience - I could make out people and objects in dark scenes, and light scenes had normal color and lightness.

This was all happening because of the gamma encoding of video pixels - to put it simply, they weren’t being properly decoded into RGB pixels. I won’t go into too much detail here and will explain it with somewhat imprecise terms, just to give the general idea. I have listed references to books that I’ve found very informative, useful and accurate at the bottom.

The thing that bothered me the most is that I was only able to find a bunch of incorrect and inaccurate matrices and examples online for colorspace conversion and gamma correction, while the math, science and accurate explanations are nowhere to be freely found. That sadly includes Wikipedia, which doesn’t explain much, and some of the information it does have is incorrect.

It all goes back to the human vision experiments that the CIE standardized in 1931 - the human eye has three different types of cells that are sensitive to different parts of the light spectrum. Some cells are more sensitive to light than others, and in general we can better perceive small differences in brightness in low light than in bright light conditions.

Light spectrum information is encoded into RGB values (red, green and blue - one for each type of receptor cell in the human eye), which can then be mathematically transformed into the XYZ colorspace. More or less, the Y component represents our perception of lightness, while the X and Z components carry the color (hue) information.
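As a concrete illustration, the commonly published matrix for converting linear RGB with sRGB/BT.709 primaries to CIE XYZ (D65 white point) can be written in GLSL like this - note how the Y output is dominated by the green channel, to which the eye is most sensitive:

```glsl
// Linear RGB (sRGB / BT.709 primaries, D65 white) -> CIE XYZ.
// GLSL mat3 constructors are column-major, so this is the transpose of the
// usual textbook row-major matrix.
const mat3 RGB_TO_XYZ = mat3(
    0.4124, 0.2126, 0.0193,   // contribution of R to X, Y, Z
    0.3576, 0.7152, 0.1192,   // contribution of G to X, Y, Z
    0.1805, 0.0722, 0.9505);  // contribution of B to X, Y, Z

vec3 rgbToXyz(vec3 rgb)
{
    return RGB_TO_XYZ * rgb;  // the .y component is the luminance Y
}
```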

Video is usually encoded into some YUV colorspace - the Y component was supposed to represent lightness (though it doesn’t, exactly) and U and V are supposed to represent color. As said before, the human eye is more sensitive to differences in low light than in bright light. For example, if Y were represented with values between 0 and 1000, with 0 being complete darkness, the human eye might notice the difference between 3 and 4 but might not notice the difference between 900 and 960.

Since each component of YUV is usually encoded into one byte (0-255 or -128 to 127), values are transformed using an exponential function. That way a Y of 2-9 would end up between 1.4 and 3.0, while 900-960 would change to 30.0-31.0 (this is just a crude example). This can then easily be stored into one byte, and when restored we will perceive lightness with little distortion. The problem with using an exponential function alone is that it distorts low (dark) values too much: as said before, the eye might notice the difference between a Y of 3 and 4, but the exponential function (after rounding to an integer) encodes both of those values to 2. So it was decided to keep low values, up to some threshold, roughly as they are, and encode only higher values using the exponential function. That way 2-9 would stay 2.0-9.0 and 900-960 would end up in the 40.0-41.0 range. And that, more or less, is what a gamma function does.
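That "linear segment near black plus a power curve above it" shape is exactly how the real transfer functions are defined. As a sketch, here is the BT.709 encoding function and its inverse as I understand them, with linear light normalized to the 0.0-1.0 range:

```glsl
// BT.709 opto-electronic transfer function (encoding): linear light -> signal.
// Values below the threshold stay on a linear segment; the rest go through a
// power curve.
float bt709Encode(float L)
{
    return (L < 0.018) ? 4.5 * L
                       : 1.099 * pow(L, 0.45) - 0.099;
}

// The inverse: decode a signal value back to linear light.
float bt709Decode(float V)
{
    return (V < 0.081) ? V / 4.5
                       : pow((V + 0.099) / 1.099, 1.0 / 0.45);
}
```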

Computer monitors and video cards also have a gamma function of their own (usually that of a CRT, as defined in sRGB), which is sometimes very similar to that of the encoded video.

To accurately display colors, a video renderer should first decode the pixels using the inverse gamma function and colorspace that were used on the source, and afterwards apply the gamma function and colorspace conversion that match your video card and monitor.
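Put together, a simplified sketch of such a rendering path could look like the shader snippet below. The matrices and function names are placeholders for whatever the source and the display actually use (bt709Decode is the function from the sketch above); the important part is the order of the operations:

```glsl
// Simplified sketch of a color-correct path: YUV -> source R'G'B' -> linear
// RGB -> display gamut -> display-encoded RGB. Matrix names are placeholders
// and limited-range offsets are omitted for brevity.
uniform mat3 yuvToRgb;          // YUV -> non-linear source R'G'B'
uniform mat3 srcToDisplayGamut; // linear source RGB -> linear display RGB

// sRGB encoding, for a typical computer monitor.
float srgbEncode(float L)
{
    return (L <= 0.0031308) ? 12.92 * L
                            : 1.055 * pow(L, 1.0 / 2.4) - 0.055;
}

vec3 renderPixel(vec3 yuv)
{
    vec3 rgb = yuvToRgb * yuv;             // still gamma-encoded
    vec3 lin = vec3(bt709Decode(rgb.r),    // undo the source transfer
                    bt709Decode(rgb.g),    // function (linearize)
                    bt709Decode(rgb.b));
    lin = srcToDisplayGamut * lin;         // convert primaries/gamut
    return vec3(srgbEncode(lin.r),         // re-encode for the display
                srgbEncode(lin.g),
                srgbEncode(lin.b));
}
```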

Since each video frame contains a lot of pixels (480,000 pixels for 800x600), each of them has to be gamma corrected within 40 ms (at 25 fps). What most software video players did was ignore the linear part of the gamma function and approximate the exponential part, in order to reduce the number of calculations on the CPU so they could display fluid video in realtime without dropping frames.
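The usual shortcut looks something like this - a single power curve with a rough exponent and no linear segment at all, which is cheap but crushes exactly the dark values the linear segment was meant to preserve:

```glsl
// Common approximation: one power curve for the whole range, ignoring the
// linear segment near black.
vec3 approxDecode(vec3 rgb)
{
    return pow(rgb, vec3(2.2));
}
```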

Nowadays most video players do GPU accelerated rendering, but a lot of them still do gamma and colorspace correction incorrectly, or simply ignore it.

The Subtitle Composer video player uses matrices for colorspace and gamma correction that were calculated by the script I’ve made here, from information taken from the ISO/IEC 23001-8:2016 standard (apparently the same document that FFmpeg references). The script generates colorspace conversion matrices and GLSL gamma functions that are used by the OpenGL renderer. The colorspace conversion matrices are more accurate than anything I’ve found around (some floating around are just copy pasted from a completely different/wrong colorspace) - or at least I believe so. The GLSL gamma correction (color transfer) functions are accurate; some of the inverse functions are wrong, since the inverses were calculated for me by the JavaScript script that did the math. They will be fixed once I manage to get my hands on video encoded using those gamma functions and can test them properly.

Here’s the literature I’ve used and recommend reading if you’re into the whole colorspace thing:

P.S. Several months after completing the color conversion and rendering work I found out about libplacebo, which does what I did with rendering, and likely better. So be sure to check it out - maybe it’ll be included into SC in the future - for now the current player works well.

— Mladen