10x your deep learning serverless function using the OpenFaaS framework!
Intro
Recently I discovered a way to speed up serverless functions using the [OpenFaaS](https://www.openfaas.com/) framework. The idea is fairly simple: we add a "pre-load function". After a few attempts, and with the help of [alexellis](https://github.com/alexellis/), I managed to speed up my OpenFaaS production code from 10 seconds to under 1 second.
As we all know, to speed up a program we need to profile it: identify the slowest function calls and find out which part of the program is causing the delay. So I went ahead and profiled my AI machine learning models, and I discovered that most AI prediction models follow this pattern:
- Load the heavy libraries (e.g. TensorFlow, Keras, PyTorch)
- Reconstruct the AI model
- Load the pre-trained weights (disk I/O; this is usually the real bottleneck)
- Take the new input data provided by the user
- Transform the data (based on your model's input format)
- Run the prediction
- Parse the output and return a human-readable answer (JSON?)
By having a pre-load function, I can easily offload steps 1 to 3 to start-up time. At run time, we only need to take the new input data from the user and run steps 4 to 7. The sketch below shows the idea.
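To make this concrete, here is a rough profiling sketch of the seven steps. The test output later in this post has the shape of Keras's decode_predictions, so I am assuming a Keras ResNet50 image classifier here; the model choice and the "elephant.jpg" input path are placeholders for your own setup:

import time

start = time.time()
# Steps 1-3: heavy library import, model reconstruction, pre-trained weight loading (disk I/O)
from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input, decode_predictions
from tensorflow.keras.preprocessing import image
import numpy as np
model = ResNet50(weights="imagenet")
print("steps 1-3: %.2fs" % (time.time() - start))  # typically several seconds

start = time.time()
# Steps 4-7: take new input, transform it, predict, and parse the output
img = image.load_img("elephant.jpg", target_size=(224, 224))
x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
preds = decode_predictions(model.predict(x), top=3)[0]
print("steps 4-7: %.2fs" % (time.time() - start))  # typically well under a second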
My skeleton code
$ mkdir flask-preload
$ cd flask-preload
$ faas template pull https://github.com/openfaas-incubator/python-flask-template
$ faas new --lang python3-flask function
$ echo 'faas = "awesome\r\n"' > function/faasinit.py
$ cat function/faasinit.py
faas = "awesome\r\n"
$ cat template/python3-flask/index.py
# Copyright (c) Alex Ellis 2017. All rights reserved.
# Licensed under the MIT license. See LICENSE file in the project root for full license information.
from flask import Flask, request
from function import handler
#from gevent.wsgi import WSGIServer
from gevent.pywsgi import WSGIServer

try:
    from function.faasinit import *  # faasinit.py does the heavy pre-loading
except ImportError:
    pass

app = Flask(__name__)

@app.before_request
def fix_transfer_encoding():
    """
    Sets the "wsgi.input_terminated" environment flag, thus enabling
    Werkzeug to pass chunked requests as streams. The gunicorn server
    should set this, but it's not yet been implemented.
    """
    transfer_encoding = request.headers.get("Transfer-Encoding", None)
    if transfer_encoding == u"chunked":
        request.environ["wsgi.input_terminated"] = True

@app.route("/", defaults={"path": ""}, methods=["POST", "GET"])
@app.route("/<path:path>", methods=["POST", "GET"])
def main_route(path):
    #ret = handler.handle(request.get_data())
    #ret = handler.handle(faas)
    ret = handler.handle(request.get_data(), faas)
    return ret

if __name__ == '__main__':
    #app.run(host='0.0.0.0', port=5000, debug=False)
    http_server = WSGIServer(('', 5000), app)
    http_server.serve_forever()
$ cat function/handler.py
def handle(req, faas):
    """handle a request to the function
    Args:
        req (str): request body
    """
    return faas
$ faas up -f function.yml
$ curl -s https://endpoint/function/function -d "faas is"
awesome
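For a real model, faasinit.py does the heavy lifting instead of just defining a string. Here is a minimal sketch of how I would wire it up, again assuming the Keras ResNet50 classifier from earlier (the model choice and the baked-in "elephant.jpg" test image are my assumptions, not part of the template):

# function/faasinit.py -- imported once by index.py, so steps 1-3 run only at start-up
from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input, decode_predictions
from tensorflow.keras.preprocessing import image
import numpy as np

model = ResNet50(weights="imagenet")  # slow: library import + weight loading from disk

def faas(img_path):
    # steps 4-7: transform the input, predict, and parse the output
    img = image.load_img(img_path, target_size=(224, 224))
    x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
    return decode_predictions(model.predict(x), top=3)[0]

# function/handler.py -- runs on every request, receiving the preloaded predictor
import json

def handle(req, faas):
    """handle a request using the predictor preloaded by faasinit.py"""
    preds = faas("elephant.jpg")
    return json.dumps([[class_id, name, float(score)] for class_id, name, score in preds])

index.py stays untouched: its "from function.faasinit import *" line pulls in the preloaded faas object, so each request only pays for steps 4 to 7.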
Conclusion
Here is my test without using faasinit.py:
chenglim@chenglim-GL503VM:/amaris/faas/flask-preload$ time curl -s https://faas.amaris.ai/function/flask-preload
[["n02504458", "African_elephant", 0.5097971558570862], ["n01871265", "tusker", 0.45040571689605713], ["n02504013", "Indian_elephant", 0.03866980969905853]]
real 0m10.074s
user 0m0.010s
sys 0m0.005s
And here is my test with faasinit.py:
chenglim@chenglim-GL503VM:/amaris/faas/flask-preload$ time curl -s https://faas.amaris.ai/function/flask-preload
[["n02504458", "African_elephant", 0.5097971558570862], ["n01871265", "tusker", 0.45040571689605713], ["n02504013", "Indian_elephant", 0.03866980969905853]]
real 0m0.689s
user 0m0.014s
sys 0m0.000s
The response time drops from 10.074s to 0.689s, roughly a 14x speedup, just by moving steps 1 to 3 into the pre-load.