Building a Service Virtualisation Server

While developing for the SkyQ Set Top Box (STB) in London back in 2015, I came across a severe bottleneck: the Electronic Programme Guide (EPG) communicated with the middleware via a web service.

That’s not a problem in itself. However, this RESTful web service was in constant flux, and the EPG developers were forever waiting on a tiny, overloaded team of three web service developers to implement much-needed functionality.

As the people at 37signals would say: scratch your own itch. So I decided to build a service virtualisation system. It was a game-changer: we could inject our own data into the box without having to wait on other teams.

The results were as expected: entire features that were waiting on this web service were unblocked.

Service Virtualisation

What I didn’t realise at the time was just how popular service virtualisation is, with many commercial and open source solutions already available, e.g. SoapUI, WireMock, and Mirage, to name a few.

They offer a plethora of features, but the real crux of Service Virtualisation is this:

  • Proxying requests
  • Mutating requests
  • Storing (caching) responses
  • Replaying cached responses
  • Generating data for features that are not yet implemented in your service
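
At its core, record-and-replay is little more than a cache keyed on the request. A toy in-memory sketch of the idea (`fetch_from_origin` stands in for a real HTTP call, and a plain dict stands in for persistent storage):

```python
cache = {}


def fetch_from_origin(uri):
    # placeholder for a real network request to the origin service
    return "response for %s" % uri


def handle(uri, record=False):
    if record:
        cache[uri] = fetch_from_origin(uri)  # store (cache) the response
        return cache[uri]
    if uri in cache:
        return cache[uri]                    # replay the cached response
    return fetch_from_origin(uri)            # otherwise simply proxy it


print(handle("/api/v1/project/1/", record=True))  # fetched and recorded
print(handle("/api/v1/project/1/"))               # replayed from the cache
```

The rest of this post is essentially this loop, with Tornado handling the requests and MongoDB playing the role of the dict.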

In this post I’ll explore how to build a simple service virtualisation server. The accompanying code can be found here.

Tornado

Tornado is a much under-appreciated web server. It’s Python-based, so it enjoys Python’s incredible flexibility, diverse module ecosystem, and development speed. Installation:

$ sudo pip install tornado

Now try running the hello world program hello.py.

Hello World

hello.py

import tornado.ioloop
import tornado.web


class MainHandler(tornado.web.RequestHandler):

    def get(self):
        self.write("Hello, world")


def make_app():
    return tornado.web.Application([
        (r"/", MainHandler),
    ])

if __name__ == "__main__":
    app = make_app()
    app.listen(8888)
    tornado.ioloop.IOLoop.current().start()

$ python hello.py

Output:

$ curl localhost:8888

Hello, world

Building our client app

For our client app, we’ll use a slightly modified version of the client used on specto.io’s great blog post on the very same topic.

Get the client here ./client.py.

Running the client:

./client.py --urls=1

This command caches the list of URLs to be queried into links.p. Now to query these URLs:

./client.py

Output:

url: http://readthedocs.org/api/v1/project/52519/, content: {"absolute_url": "/projects/-/", "allow_comments":
url: http://readthedocs.org/api/v1/project/42820/, status code: 200
url: http://readthedocs.org/api/v1/project/42820/, content: {"absolute_url": "/projects/0/", "allow_comments":
url: http://readthedocs.org/api/v1/project/42313/, status code: 200
url: http://readthedocs.org/api/v1/project/42313/, content: {"absolute_url": "/projects/001/", "allow_comments
url: http://readthedocs.org/api/v1/project/42069/, status code: 200
url: http://readthedocs.org/api/v1/project/42069/, content: {"absolute_url": "/projects/007-spectre/", "allow_
url: http://readthedocs.org/api/v1/project/42070/, status code: 200
[...]
url: http://readthedocs.org/api/v1/project/42962/, content: {"absolute_url": "/projects/14-12-2015-link-hd/",
17.2875459194

As you can see, at 17.3 seconds this is awfully slow. What’s more, if you depended on this service and it went down, or started serving you junk, your application would fall over through no fault of your own.

Building our Service Virtualisation Server

svs0.py

import tornado.ioloop
import tornado.web


class MainHandler(tornado.web.RequestHandler):

    def get(self):
        self.set_status(200)
        self.write(self.request.uri)

if __name__ == "__main__":
    app = tornado.web.Application([
        (r".*", MainHandler),
    ], debug=True)
    app.listen(8888)
    tornado.ioloop.IOLoop.current().start()

Above is our stub for the proxy server.

tornado.web.Application([
        (r".*", MainHandler),
    ], debug=True)

This is the part which tells Tornado to catch all requests. debug=True enables auto reloading, along with other debug features, which is incredibly useful.
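
As an aside, the handler tuples are tried in order, and any regex capture groups in a pattern are passed to the handler as positional arguments, so more specific routes can sit above the catch-all. A quick sketch (ProjectHandler and CatchAllHandler are hypothetical names, not part of the server we're building):

```python
import tornado.web


class ProjectHandler(tornado.web.RequestHandler):
    def get(self, project_id):
        # the capture group ([0-9]+) arrives here as an argument
        self.write("project %s" % project_id)


class CatchAllHandler(tornado.web.RequestHandler):
    def get(self):
        self.write("caught %s" % self.request.uri)


app = tornado.web.Application([
    (r"/api/v1/project/([0-9]+)/", ProjectHandler),  # more specific route first
    (r".*", CatchAllHandler),                        # catch-all route last
])
```

For our proxy, a single catch-all is all we need, since every request should be handled the same way.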

svs1.py

import tornado.ioloop
import tornado.web
import requests


class MainHandler(tornado.web.RequestHandler):

    def get(self):
        print("fetching", self.request.uri)
        r = requests.get(self.request.uri)
        print("done fetching")
        self.set_status(r.status_code)
        self.write(r.text)

if __name__ == "__main__":
    app = tornado.web.Application([
        (r".*", MainHandler),
    ], debug=True)
    app.listen(8888)
    tornado.ioloop.IOLoop.current().start()

And there we have it: we’ve built a very primitive proxy server. Try running it, then point your HTTP_PROXY environment variable at it:

$ export HTTP_PROXY=http://localhost:8888
$ python svs1.py

Output:

url: http://readthedocs.org/api/v1/project/52519/, status code: 200
url: http://readthedocs.org/api/v1/project/52519/, content: {"absolute_url": "/projects/-/", "allow_comments":
url: http://readthedocs.org/api/v1/project/42820/, status code: 200
url: http://readthedocs.org/api/v1/project/42820/, content: {"absolute_url": "/projects/0/", "allow_comments":
url: http://readthedocs.org/api/v1/project/42313/, status code: 200
url: http://readthedocs.org/api/v1/project/42313/, content: {"absolute_url": "/projects/001/", "allow_comments
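
Incidentally, if you’d rather not set an environment variable, requests also lets you pass a proxies mapping explicitly on each call. A small sketch (the actual request is commented out because it needs svs1.py running on port 8888):

```python
import requests

# route all plain-HTTP traffic through our local proxy server
proxies = {"http": "http://localhost:8888"}

# requires svs1.py to be listening on port 8888:
# r = requests.get("http://readthedocs.org/api/v1/project/52519/",
#                  proxies=proxies)
# print(r.status_code)
```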

Caching

Since communication over networks is relatively slow, let’s cache the results in a MongoDB collection and replay them when needed.

caching.py

import tornado.ioloop
import tornado.web
import requests
from pymongo import MongoClient
from argparse import ArgumentParser

client = MongoClient('localhost', 27017)
db = client.sv  # sv is the name of our database for caching requests
cache = db["gets"]  # this is our collection, we'll call it 'gets'

parser = ArgumentParser(description="Service Virtualisation Server")
parser.add_argument(
    "--record",
    help="record (cache) the responses from the host")
args = parser.parse_args()


class MainHandler(tornado.web.RequestHandler):

    def get(self):
        print("fetching", self.request.uri)
        # we are recording the request
        if args.record:
            # so pass the request to the actual target
            r = requests.get(self.request.uri)
            # if all went well
            if r.status_code == 200:
                print("done fetching")
                print("caching")
                # store the response into our database
cache.replace_one({"uri": self.request.uri},
                                  {"uri": self.request.uri,
                                   "text": r.text},
                                  upsert=True)
            # write the status and the actual response from the request back to
            # the client
            self.set_status(r.status_code)
            self.write(r.text)
        else:
            # we are not recording, so we are either
            # proxying the request, or playing it back if it's stored
            # check to see if this URL was already cached
            doc = cache.find_one({"uri": self.request.uri})

            if doc:
                # good we have a document for this URI, let's simply return it
                self.set_status(200)
                self.write(doc["text"])
            else:
                # we don't seem to have a record of it, so simply make the
                # request and proxy it
                r = requests.get(self.request.uri)
                self.set_status(r.status_code)
                self.write(r.text)

if __name__ == "__main__":
    app = tornado.web.Application([
        (r".*", MainHandler),
    ], debug=True)
    app.listen(8888)
    tornado.ioloop.IOLoop.current().start()

Above, we have successfully built a Service Virtualisation Server.

Run this updated version of the server:

$ python caching.py --record=1

And run the client application:

$ python client.py

This caches all of our requests into our MongoDB database. For viewing your local database, I strongly recommend RoboMongo.

/* 1 */
{
    "_id" : ObjectId("580f3b520b6737e8cbf31016"),
    "text" : "{\"absolute_url\": \"/projects/-/\", \"allow_comments\": false, \"allow_promos\": true, \"analytics_code\": null, \"build_queue\": null, \"canonical_url\": \"\", \"cdn_enabled\": false, \"comment_moderation\": false, \"conf_py_file\": \"\", \"container_image\": null, \"container_mem_limit\": null, \"container_time_limit\": null, \"copyright\": \"\", \"default_branch\": null, \"default_version\": \"latest\", \"description\": \"\", \"django_packages_url\": \"\", \"documentation_type\": \"sphinx\", \"downloads\": {\"epub\": \"//readthedocs.org/projects/-/downloads/epub/latest/\", \"htmlzip\": \"//readthedocs.org/projects/-/downloads/htmlzip/latest/\", \"pdf\": \"//readthedocs.org/projects/-/downloads/pdf/latest/\"}, \"enable_epub_build\": true, \"enable_pdf_build\": true, \"has_valid_clone\": false, \"has_valid_webhook\": true, \"id\": 52519, \"install_project\": false, \"language\": \"ru\", \"mirror\": false, \"modified_date\": \"2016-05-27T03:23:16.338560\", \"name\": \"Москоллектор - Техпортал\", \"num_major\": 2, \"num_minor\": 2, \"num_point\": 2, \"privacy_level\": \"public\", \"project_url\": \"\", \"pub_date\": \"2016-05-27T03:23:02.150498\", \"python_interpreter\": \"python\", \"repo\": \"https://gitlab.com/mikhailborodin/collector\", \"repo_type\": \"git\", \"requirements_file\": null, \"resource_uri\": \"/api/v1/project/52519/\", \"single_version\": false, \"skip\": false, \"slug\": \"-\", \"suffix\": \".rst\", \"theme\": \"default\", \"use_system_packages\": false, \"users\": [\"/api/v1/user/49818/\"], \"version\": \"\", \"version_privacy_level\": \"public\"}",
    "uri" : "http://readthedocs.org/api/v1/project/52519/"
}
/* 2 */
{
    "_id" : ObjectId("580f3b530b6737e8cbf31017"),
    "text" : "{\"absolute_url\": \"/projects/0/\", \"allow_comments\": false, \"allow_promos\": true, \"analytics_code\": null, \"build_queue\": null, \"canonical_url\": \"\", \"cdn_enabled\": false, \"comment_moderation\": false, \"conf_py_file\": \"\", \"container_image\": null, \"container_mem_limit\": null, \"container_time_limit\": null, \"copyright\": \"\", \"default_branch\": null, \"default_version\": \"latest\", \"description\": \"Contribution guide\", \"django_packages_url\": \"\", \"documentation_type\": \"sphinx\", \"downloads\": {\"epub\": \"//readthedocs.org/projects/0/downloads/epub/latest/\", \"htmlzip\": \"//readthedocs.org/projects/0/downloads/htmlzip/latest/\", \"pdf\": \"//readthedocs.org/projects/0/downloads/pdf/latest/\"}, \"enable_epub_build\": true, \"enable_pdf_build\": true, \"has_valid_clone\": false, \"has_valid_webhook\": true, \"id\": 42820, \"install_project\": false, \"language\": \"en\", \"mirror\": false, \"modified_date\": \"2016-03-20T23:43:12.300770\", \"name\": \"0\", \"num_major\": 2, \"num_minor\": 2, \"num_point\": 2, \"privacy_level\": \"public\", \"project_url\": \"https://github.com/ruslo/0\", \"pub_date\": \"2015-12-10T23:03:16.236664\", \"python_interpreter\": \"python\", \"repo\": \"https://github.com/ruslo/0\", \"repo_type\": \"git\", \"requirements_file\": null, \"resource_uri\": \"/api/v1/project/42820/\", \"single_version\": false, \"skip\": false, \"slug\": \"0\", \"suffix\": \".rst\", \"theme\": \"default\", \"use_system_packages\": false, \"users\": [\"/api/v1/user/33728/\"], \"version\": \"\", \"version_privacy_level\": \"public\"}",
    "uri" : "http://readthedocs.org/api/v1/project/42820/"
}
/* 3 */
{
    "_id" : ObjectId("580f3b540b6737e8cbf31018"),
    "text" : "{\"absolute_url\": \"/projects/001/\", \"allow_comments\": false, \"allow_promos\": true, \"analytics_code\": null, \"build_queue\": null, \"canonical_url\": \"\", \"cdn_enabled\": false, \"comment_moderation\": false, \"conf_py_file\": \"\", \"container_image\": null, \"container_mem_limit\": null, \"container_time_limit\": null, \"copyright\": \"\", \"default_branch\": null, \"default_version\": \"latest\", \"description\": \"更新中...\", \"django_packages_url\": \"\", \"documentation_type\": \"mkdocs\", \"downloads\": {\"epub\": \"//readthedocs.org/projects/001/downloads/epub/latest/\", \"htmlzip\": \"//readthedocs.org/projects/001/downloads/htmlzip/latest/\", \"pdf\": \"//readthedocs.org/projects/001/downloads/pdf/latest/\"}, \"enable_epub_build\": true, \"enable_pdf_build\": true, \"has_valid_clone\": false, \"has_valid_webhook\": false, \"id\": 42313, \"install_project\": false, \"language\": \"en\", \"mirror\": false, \"modified_date\": \"2015-12-02T08:18:27.018542\", \"name\": \"001\", \"num_major\": 2, \"num_minor\": 2, \"num_point\": 2, \"privacy_level\": \"public\", \"project_url\": \"\", \"pub_date\": \"2015-12-02T08:18:10.577487\", \"python_interpreter\": \"python\", \"repo\": \"https://github.com/it-andy-hou/001.git\", \"repo_type\": \"git\", \"requirements_file\": null, \"resource_uri\": \"/api/v1/project/42313/\", \"single_version\": false, \"skip\": false, \"slug\": \"001\", \"suffix\": \".rst\", \"theme\": \"default\", \"use_system_packages\": false, \"users\": [\"/api/v1/user/36121/\"], \"version\": \"\", \"version_privacy_level\": \"public\"}",
    "uri" : "http://readthedocs.org/api/v1/project/42313/"
}
[...]

Once we’ve successfully cached all of our requests, we can relaunch our server, in replay mode this time (the default).

$ python caching.py

Now rerunning our tests simply pulls everything from the collection, dramatically speeding up what previously took so long.

$ python client.py

Output:

url: http://readthedocs.org/api/v1/project/52519/, status code: 200
url: http://readthedocs.org/api/v1/project/52519/, content: {"absolute_url": "/projects/-/", "allow_comments":
url: http://readthedocs.org/api/v1/project/42820/, status code: 200
url: http://readthedocs.org/api/v1/project/42820/, content: {"absolute_url": "/projects/0/", "allow_comments":
url: http://readthedocs.org/api/v1/project/36271/, content: {"absolute_url": "/projects/1231/", "allow_comment
[...]
url: http://readthedocs.org/api/v1/project/65335/, content: {"absolute_url": "/projects/12-ka-4-dial-184441448
url: http://readthedocs.org/api/v1/project/45934/, status code: 200
url: http://readthedocs.org/api/v1/project/45934/, content: {"absolute_url": "/projects/12twenty-demo/", "allo
url: http://readthedocs.org/api/v1/project/42962/, status code: 200
url: http://readthedocs.org/api/v1/project/42962/, content: {"absolute_url": "/projects/14-12-2015-link-hd/",
0.119288921356

That’s a dramatic speedup, which is not surprising since all we’re really doing is querying a local database.

Going Async

Currently this code is still synchronous, so our next step is to make it asynchronous using Tornado’s gen module, its async HTTP client, and the async MongoDB driver, motor.

async.py

import tornado.ioloop
import tornado.web
import tornado.gen
import tornado.httpclient
from argparse import ArgumentParser
import motor.motor_tornado

# using motor's async mongo client
client = motor.motor_tornado.MotorClient('localhost', 27017)

parser = ArgumentParser(description="Service Virtualisation Server")
parser.add_argument(
    "--record",
    help="record (cache) the responses from the host")
args = parser.parse_args()


class MainHandler(tornado.web.RequestHandler):

    @tornado.gen.coroutine
    def get(self):
        print("fetching", self.request.uri)
        http_client = tornado.httpclient.AsyncHTTPClient()
        # we are recording the request
        if args.record:
            # so pass the request to the actual target
            r = yield http_client.fetch(self.request.uri, raise_error=False)
            # if all went well
            if r.code == 200:
                print("done fetching")
                print("caching")
                db = self.settings["db"]
                cache = db["gets"]
                # store the response into our database
                yield cache.replace_one({"uri": self.request.uri},
                                        {"uri": self.request.uri,
                                         "text": r.body},
                                        upsert=True)
            # write the status and the actual response from the request back to
            # the client
            self.set_status(r.code)
            self.write(r.body)
        else:
            # we are not recording, so we are either
            # proxying the request, or playing it back if it's stored
            db = self.settings["db"]
            cache = db["gets"]
            # check to see if this URL was already cached
            doc = yield cache.find_one({"uri": self.request.uri})
            if doc:
                # good we have a document for this URI, let's simply return it
                self.set_status(200)
                self.write(doc["text"])
            else:
                # we don't seem to have a record of it, so simply make the
                # request and proxy it
                r = yield http_client.fetch(self.request.uri, raise_error=False)
                self.set_status(r.code)
                self.write(r.body)
        self.finish()

if __name__ == "__main__":
    # passing the database in as a parameter this time
    app = tornado.web.Application([
        (r".*", MainHandler),
    ], debug=True, db=client.sv)
    app.listen(8888)
    tornado.ioloop.IOLoop.current().start()

This is an area that causes quite a lot of confusion for Tornado newcomers: Tornado is technically an asynchronous (non-blocking) web server, but some extra effort is still needed to make your own code asynchronous.

The all-important module is tornado.gen. If you want a solid grasp of async Tornado, I suggest reading through its documentation a couple of times and letting the content sink in.
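
To see the coroutine style in isolation, here’s a minimal, self-contained sketch (assuming a reasonably recent Tornado is installed; delayed_sum is just an illustrative name):

```python
from tornado import gen, ioloop


@gen.coroutine
def delayed_sum(a, b):
    # yield suspends this coroutine without blocking the IOLoop,
    # which is free to service other requests in the meantime
    yield gen.sleep(0.1)
    raise gen.Return(a + b)  # coroutine-style "return", Python 2 compatible

# run_sync spins the IOLoop just long enough to resolve the coroutine
result = ioloop.IOLoop.current().run_sync(lambda: delayed_sum(2, 3))
print(result)
```

This is exactly the shape of our handler above: each yield hands control back to the IOLoop while the slow operation (HTTP fetch, database query) completes.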

Conclusion

In this article we have covered:

  • the basics of the Tornado web server
  • building a simple proxy server for GET requests
  • building a caching proxy server using MongoDB
  • building an asynchronous caching proxy server using MongoDB