ftrack_api redis caching

The ftrack_api ships with built-in cache classes that provide an excellent foundation for a versatile caching solution, but they can’t really be used out of the box: MemoryCache, for example, makes no sense in a multi-process environment, FileCache incurs severe disk I/O penalties and does no file locking, and ExpiringCache is naïve in that it only expires entries during a get call.
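For reference, the stock approach looks roughly like this — a single-process FileCache wrapped in a SerialisedCache (the cache path here is arbitrary):

import ftrack_api
import ftrack_api.cache

def file_cache_maker(session):
    # Per-process file cache. Subject to the caveats above: heavy disk
    # I/O and no file locking across processes.
    return ftrack_api.cache.SerialisedCache(
        ftrack_api.cache.FileCache('/tmp/ftrack_cache'),
        encode=session.encode,
        decode=session.decode
    )

session = ftrack_api.Session(cache=file_cache_maker)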

So a more realistic setup for production servers is an in-memory caching service such as Redis. We also get to choose an eviction policy: LRU (least recently used) or LFU (least frequently used).

If you would like to spin up an instance of ftrack, you can read our article on deploying ftrack on AWS.

We’ll be using this redis.conf, which makes Redis behave as an LFU cache capped at 10 MB:

maxmemory 10mb
maxmemory-policy allkeys-lfu

So let’s start by spinning up a Redis server as a Docker container:

$ docker run -d -p 6379:6379 -v `pwd`/redis.conf:/usr/local/etc/redis/redis.conf --name redis --restart always redis redis-server /usr/local/etc/redis/redis.conf
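Before moving on, it’s worth checking that the container is up and actually picked up our eviction policy:

$ docker exec redis redis-cli ping
PONG
$ docker exec redis redis-cli config get maxmemory-policy
1) "maxmemory-policy"
2) "allkeys-lfu"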

You’ll also want to set up a virtual environment:

$ virtualenv ftrack_api_redis
$ source ftrack_api_redis/bin/activate
$ pip install redis ftrack-python-api
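A quick sanity check that both packages import cleanly:

$ python -c "import ftrack_api, redis"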

Now let’s look at tying Redis into the ftrack API:

import sys
import time

import ftrack_api
import redis


class RedisCache(ftrack_api.cache.Cache):
    '''Cache backed by a Redis server.'''

    def __init__(self):
        self._cache = redis.StrictRedis(host='127.0.0.1', port=6379, db=0)
        super(RedisCache, self).__init__()

    def get(self, key):
        # redis-py's __getitem__ raises KeyError on a miss, which is
        # exactly what the ftrack_api cache interface expects.
        return self._cache[key]

    def set(self, key, value):
        # If we were not using an LFU policy, a simple timeout would do:
        # self._cache.setex(name=key, time=60, value=value)
        self._cache.set(name=key, value=value)

    def remove(self, key):
        self._cache.delete(key)

    def keys(self):
        return self._cache.keys()


def cache_maker(session):
    # Redis stores bytes, so wrap the cache in a SerialisedCache that
    # pushes entities through the session's encoder and decoder.
    return ftrack_api.cache.SerialisedCache(
        RedisCache(),
        encode=session.encode,
        decode=session.decode
    )


session = ftrack_api.Session(
    server_url=sys.argv[1],
    api_user=sys.argv[2],
    api_key=sys.argv[3],
    cache=cache_maker
)

start = time.time()
for component in session.query('Component'):
    print(component.get('name'))
print('time taken: %f' % (time.time() - start))

That was easy. Let’s run this on our cloud instance.

$ python ftrack_api_redis.py http://ec2-35-177-153-42.eu-west-2.compute.amazonaws.com root c2134e14-db4e-11e7-a1c9-0a414bb
image
ftrackreview-webm
drone_notes_v02
[...]
ftrackreview-webm
frame_021
time taken: 7.335973

That’s roughly 7.3 seconds to fetch all our components. On a re-run, however:

$ python ftrack_api_redis.py http://ec2-35-177-153-42.eu-west-2.compute.amazonaws.com root c2134e14-db4e-11e7-a1c9-0a414bb
image
ftrackreview-webm
drone_notes_v02
[...]
ftrackreview-webm
frame_021
time taken: 0.183186

That’s roughly a 40× speedup, and it paves the way for a much-reduced server load. Note that I’m aware projections would make these queries far more efficient, but that would defeat the purpose of this test.
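If you’re curious how the cache is behaving, Redis keeps hit and miss counters:

$ docker exec redis redis-cli info stats | grep keyspace

keyspace_hits should climb during the second run, while keyspace_misses barely moves.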

This post barely scratches the surface: we’d probably not want to simply cache all entities this way, especially if we’d like to stay current when data is mutated in ftrack. A potentially interesting topic for a future post would be investigating the use of the event hub for cache invalidation.
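To give a flavour, here’s a rough, untested sketch of that idea: subscribe to ftrack.update events and evict affected entries. The payload handling and the key matching are assumptions — ftrack_api’s default key maker embeds the entity’s primary key in the cache key, so a substring scan on the entity id is a blunt but plausible first pass:

import sys

import ftrack_api
import redis

cache = redis.StrictRedis(host='127.0.0.1', port=6379, db=0)

session = ftrack_api.Session(
    server_url=sys.argv[1],
    api_user=sys.argv[2],
    api_key=sys.argv[3],
    auto_connect_event_hub=True
)

def on_update(event):
    # Assumption: update events list affected entities, each carrying an
    # 'entityId'. Verify against the payloads your server actually sends.
    for entity in event['data'].get('entities', []):
        entity_id = entity.get('entityId')
        if not entity_id:
            continue
        # Blunt invalidation: drop any cached key mentioning this id.
        for key in cache.scan_iter('*%s*' % entity_id):
            cache.delete(key)

session.event_hub.subscribe('topic=ftrack.update', on_update)
session.event_hub.wait()  # block and dispatch events as they arrive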