Containers in the movies

I’ve recently been called in to perform surgery on a service that streams media across an entire network, globally. Users were complaining that this service was slow, which was leading to a lot of pain (bear in mind this is a media company).

[image: 3d_audience.jpg]

The service also gets used during group dailies with artists, and review sessions with clients; if it falls over, clients cannot review the latest changes. In fact no one can see anything: not an ideal situation. Upon inspecting the Prometheus dashboards, I noticed memory and swap usage slowly but persistently creeping upwards.

[images: memory_leak.png, swap_leak.png]

This service is complex, and complicated. There's an instance at each site. For viewing thumbnails it gets invoked by Thumbor, which also runs on the host. It uses nginx to route traffic either to that service or to a Flask application served by uWSGI, and to deliver images and stream mp4s (with support for byte ranges).
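
A stripped-down sketch of that routing, with paths and ports invented for illustration, might look something like:

    server {
        listen 80;

        # thumbnail requests are proxied to the sibling service on the host
        location /thumbs/ {
            proxy_pass http://127.0.0.1:8888;
        }

        # images and mp4s are served straight off disk; nginx answers
        # Range requests on static files, which is the byte-range support
        location /media/ {
            root /var/www;
        }

        # everything else lands on the flask app behind uwsgi
        location / {
            include uwsgi_params;
            uwsgi_pass unix:/tmp/uwsgi.sock;
        }
    }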

It queries the asset tracking service to decide whether the user is assigned to the movie the media belongs to. It also uses uid and gid information, MIME types, IP/subnet information, and hashed URLs with expiry times to implement the security policies my client must comply with in order to work with their client (Disney).
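
The hashed URLs deserve a quick sketch: the idea is an HMAC over the path plus an expiry timestamp, so a link stops working after a set window. This is a hypothetical illustration (the key handling and parameter names are invented, not the service's actual code):

    import hashlib
    import hmac
    import time

    SECRET = b"not-the-real-key"  # hypothetical; the real key lives in config

    def sign_url(path, ttl=300):
        # append an expiry timestamp and an HMAC over path + expiry
        expires = int(time.time()) + ttl
        token = hmac.new(SECRET, f"{path}{expires}".encode(), hashlib.sha256).hexdigest()
        return f"{path}?expires={expires}&token={token}"

    def verify_url(path, expires, token):
        # reject the request if the link has expired or the signature is wrong
        if time.time() > int(expires):
            return False
        expected = hmac.new(SECRET, f"{path}{expires}".encode(), hashlib.sha256).hexdigest()
        return hmac.compare_digest(expected, token)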

It had been noted that restarting the service would improve things, and I was asked to write some kind of cron job to restart it on a daily basis. The first thing that struck me early on was that the uWSGI application was configured to run with 10 processes of 10 threads each, which seemed heavy-handed.

[image: bad_nginx_config.png]
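
In uWSGI ini terms, the configuration I found amounted to roughly this (a reconstruction for illustration, not the verbatim file):

    [uwsgi]
    # a fixed pool: 10 processes x 10 threads, held alive whether needed or not
    processes = 10
    threads = 10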

I also noticed that after several days the service would consume all of the available memory on the host, which would then enter swapping hell. The service was very difficult to deploy, had no staging server, and due to timezones could only be worked on early in the morning or on weekends.

Docker image: uwsgi-nginx-flask

The first thing I did was move the Flask service to tiangolo's uwsgi-nginx-flask Docker image. I had to create a new service to handle uid and gid requests to the OS, since that information isn't available from inside the running container (the client uses authconfig, which I couldn't get to play well with my image without investing time in configuring SELinux with Docker).
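
A minimal sketch of that lookup service, assuming a small Flask app running on the host itself, where authconfig's users and groups are resolvable (the endpoint names are mine, not the real service's):

    import grp
    import pwd

    from flask import Flask, abort, jsonify

    app = Flask(__name__)

    @app.route("/user/<name>")
    def user_info(name):
        # resolve the user against the host's account database
        try:
            entry = pwd.getpwnam(name)
        except KeyError:
            abort(404)
        return jsonify(uid=entry.pw_uid, gid=entry.pw_gid)

    @app.route("/group/<name>")
    def group_info(name):
        try:
            entry = grp.getgrnam(name)
        except KeyError:
            abort(404)
        return jsonify(gid=entry.gr_gid, members=entry.gr_mem)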

uwsgi-nginx-flask's uWSGI config uses uWSGI's adaptive process spawning (the cheaper subsystem), which I suspected would solve the memory issues.
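
The knobs involved are uWSGI's cheaper options, which grow and shrink the worker pool with demand; something along these lines, with illustrative numbers rather than the image's exact defaults:

    [uwsgi]
    # hard ceiling on workers; uwsgi scales within this limit
    processes = 16
    # keep only this many workers alive when the service is idle
    cheaper = 2
    # start small and let the cheaper algorithm grow the pool on demand
    cheaper-initial = 2
    cheaper-step = 1

Compared with the fixed 10x10 pool above, an idle service now costs two workers' worth of memory rather than a hundred threads' worth.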

[image: dash_01.png]

As you can see, it certainly did. The first drop in memory came from restarting the service while I was working on the new one overnight. The memory plateau hovering around 1 GiB is the new containerised app. The CPU load is very low, since the application spends most of its time waiting on IO.

Swap usage also took a nosedive to near nothing, which dramatically reduced paging.

[image: dash_02.png]

Redis

Another major improvement was the integration of the Redis Docker image to cache my calls to the production tracking database. You can read more about this here.
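
In outline, the caching looks something like this; the function, key scheme and TTL are all illustrative rather than the production code:

    import json

    import redis

    cache = redis.Redis(host="localhost", port=6379)

    def query_tracking_database(movie_id):
        # stand-in for the real (expensive) production-tracking query
        raise NotImplementedError

    def movie_users(movie_id, ttl=600):
        # serve from redis when we can; hit the tracker at most once per TTL
        key = f"movie_users:{movie_id}"
        hit = cache.get(key)
        if hit is not None:
            return json.loads(hit)
        users = query_tracking_database(movie_id)
        cache.setex(key, ttl, json.dumps(users))
        return users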