10x your deep learning serverless function using the OpenFaaS framework!
Intro
Recently I discovered a way to speed up serverless functions using the [OpenFaaS](https://www.openfaas.com/) framework. The idea is fairly simple: we add a "pre-load function". After a few attempts, and with the help of [alexellis](https://github.com/alexellis/), I managed to speed up my OpenFaaS production code from 10 seconds to under 1 second.
As we all know, to speed up a program we need to profile it: identify the slowest function calls and find out which part of the program is causing the delay. So I went ahead and profiled my AI machine learning models, and I discovered that most AI prediction models follow this pattern:
- Load the heavy libraries (e.g. TensorFlow, Keras, PyTorch)
- Reconstruct the AI model
- Load the pre-trained weights (disk I/O; this is usually the real bottleneck)
- Take the new input data provided by the user
- Transform the data (based on your model's input format)
- Run the prediction
- Parse the output and return a human-readable answer (JSON?)
By having a pre-load function, I can easily offload steps 1 to 3 to start-up time. At run time, we only need to take the new input data from the user and run steps 4 to 7. The sketch below shows the idea.
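To make this concrete, here is a rough profiling sketch of the seven steps. The test output later in this post has the shape of Keras's decode_predictions, so I am assuming a Keras ResNet50 image classifier here; the model choice and the "elephant.jpg" input path are placeholders for your own setup:

import time

start = time.time()
# Steps 1-3: heavy library import, model reconstruction, pre-trained weight loading (disk I/O)
from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input, decode_predictions
from tensorflow.keras.preprocessing import image
import numpy as np
model = ResNet50(weights="imagenet")
print("steps 1-3: %.2fs" % (time.time() - start))  # typically several seconds

start = time.time()
# Steps 4-7: take new input, transform it, predict, and parse the output
img = image.load_img("elephant.jpg", target_size=(224, 224))
x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
preds = decode_predictions(model.predict(x), top=3)[0]
print("steps 4-7: %.2fs" % (time.time() - start))  # typically well under a second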
My skeleton code
$ mkdir flask-preload
$ cd flask-preload
$ faas template pull https://github.com/openfaas-incubator/python-flask-template
$ faas new --lang python3-flask function
$ echo 'faas = "awesome\r\n"' > function/faasinit.py
$ cat function/faasinit.py
faas = "awesome\r\n"
$ cat template/python3-flask/index.py
# Copyright (c) Alex Ellis 2017. All rights reserved.
# Licensed under the MIT license. See LICENSE file in the project root for full license information.
from flask import Flask, request
from function import handler
#from gevent.wsgi import WSGIServer
from gevent.pywsgi import WSGIServer

try:
    from function.faasinit import *  # faasinit.py does the heavy pre-loading
except ImportError:
    pass

app = Flask(__name__)

@app.before_request
def fix_transfer_encoding():
    """
    Sets the "wsgi.input_terminated" environment flag, thus enabling
    Werkzeug to pass chunked requests as streams. The gunicorn server
    should set this, but it's not yet been implemented.
    """
    transfer_encoding = request.headers.get("Transfer-Encoding", None)
    if transfer_encoding == u"chunked":
        request.environ["wsgi.input_terminated"] = True

@app.route("/", defaults={"path": ""}, methods=["POST", "GET"])
@app.route("/<path:path>", methods=["POST", "GET"])
def main_route(path):
    #ret = handler.handle(request.get_data())
    #ret = handler.handle(faas)
    ret = handler.handle(request.get_data(), faas)
    return ret

if __name__ == '__main__':
    #app.run(host='0.0.0.0', port=5000, debug=False)
    http_server = WSGIServer(('', 5000), app)
    http_server.serve_forever()
$ cat function/handler.py
def handle(req, faas):
    """handle a request to the function
    Args:
        req (str): request body
    """
    return faas
$ faas up -f function.yml
$ curl -s https://endpoint/function/function -d "faas is"
awesome
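For a real model, faasinit.py does the heavy lifting instead of just defining a string. Here is a minimal sketch of how I would wire it up, again assuming the Keras ResNet50 classifier from earlier (the model choice and the baked-in "elephant.jpg" test image are my assumptions, not part of the template):

# function/faasinit.py -- imported once by index.py, so steps 1-3 run only at start-up
from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input, decode_predictions
from tensorflow.keras.preprocessing import image
import numpy as np

model = ResNet50(weights="imagenet")  # slow: library import + weight loading from disk

def faas(img_path):
    # steps 4-7: transform the input, predict, and parse the output
    img = image.load_img(img_path, target_size=(224, 224))
    x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
    return decode_predictions(model.predict(x), top=3)[0]

# function/handler.py -- runs on every request, receiving the preloaded predictor
import json

def handle(req, faas):
    """handle a request using the predictor preloaded by faasinit.py"""
    preds = faas("elephant.jpg")
    return json.dumps([[class_id, name, float(score)] for class_id, name, score in preds])

index.py stays untouched: its "from function.faasinit import *" line pulls in the preloaded faas object, so each request only pays for steps 4 to 7.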
Conclusion
Here is my test without using faasinit.py:
chenglim@chenglim-GL503VM:/amaris/faas/flask-preload$ time curl -s https://faas.amaris.ai/function/flask-preload
[["n02504458", "African_elephant", 0.5097971558570862], ["n01871265", "tusker", 0.45040571689605713], ["n02504013", "Indian_elephant", 0.03866980969905853]]
real 0m10.074s
user 0m0.010s
sys 0m0.005s
And here is my test with faasinit.py:
chenglim@chenglim-GL503VM:/amaris/faas/flask-preload$ time curl -s https://faas.amaris.ai/function/flask-preload
[["n02504458", "African_elephant", 0.5097971558570862], ["n01871265", "tusker", 0.45040571689605713], ["n02504013", "Indian_elephant", 0.03866980969905853]]
real 0m0.689s
user 0m0.014s
sys 0m0.000s
The response time drops from 10.074s to 0.689s, roughly a 14x speedup, just by moving steps 1 to 3 into the pre-load.