Vicente Rodríguez

April 24, 2019

How to deploy a Keras model to Google Cloud

In this tutorial we will see how to deploy a Keras model so that it is accessible through a web page. Because we will use a Linux server, some familiarity with the Linux ecosystem and the Linux terminal is recommended. We will use the Flask framework and JavaScript to build the web app, and Google Cloud as the server, since this service lets us use a GPU.

In this link you can see the final app running on Heroku, a free hosting service that we will also cover. We will use the Tesla app that we built in the Convolutional Neural Networks tutorial, but you can use a different model if you want: you only have to save the model in JSON format and the weights in h5 format.

The web app

First we will build the web app with Flask and JavaScript. We start by importing some libraries:




from flask import Flask, render_template, request, jsonify

from keras.models import model_from_json

from PIL import Image
from io import BytesIO
from keras.preprocessing.image import img_to_array
import numpy as np

We write the two main functions that the app will use:


app = Flask(__name__)
model = None

@app.route("/")
def index():
    return render_template("index.html")

@app.route("/predict", methods=["POST"])
def predict():
    image = request.files["image"].read()
    image = load_request_image(image)
    class_predicted = predict_class(image)
    image_class = { "image_class": class_predicted }

    return jsonify(image_class)

The index function handles requests to the index page "/" and renders an HTML page. The predict function is different: it expects an image sent from the HTML page, the model uses this image to make a prediction, and we return the prediction to the page as JSON.
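The predict function above trusts whatever the browser sends. As a rough sketch of a check that could run before the image reaches the model, the helper below rejects obviously unusable uploads. The name upload_error, the list of extensions and the size limit are all assumptions made for this example; they are not part of the original app.

```python
ALLOWED_EXTENSIONS = {"jpg", "jpeg", "png"}   # assumed list, adjust as needed
MAX_BYTES = 5 * 1024 * 1024                   # assumed 5 MB limit

def upload_error(filename, data):
    """Return an error message for a bad upload, or None if it looks usable."""
    if not filename or "." not in filename:
        return "missing file name or extension"
    extension = filename.rsplit(".", 1)[1].lower()
    if extension not in ALLOWED_EXTENSIONS:
        return "unsupported extension: " + extension
    if not data:
        return "empty file"
    if len(data) > MAX_BYTES:
        return "file too large"
    return None

print(upload_error("car.JPG", b"\xff\xd8"))   # None
print(upload_error("notes.txt", b"hello"))    # unsupported extension: txt
```

Inside predict, the file name would come from request.files["image"].filename and the data from the bytes already read; when the helper returns a message, the route could answer with a 400 status instead of calling the model.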

The HTML page looks like this:


<!DOCTYPE html>
<html lang="en" dir="ltr">
  <head>
    <meta charset="utf-8">
    <title>Tesla App</title>
    <link rel="stylesheet" href="{{ url_for('static', filename='style.css') }}">
    <script src="{{ url_for('static', filename='script.js') }}"></script>
    <meta name="viewport" content="width=device-width, initial-scale=1" />
  </head>
  <body>
    <section class="header">
      <h2>Tesla app</h2>
      <p>Upload an image of a Tesla car:</p>
      <input type="file" id="image" accept="image/*">
    </section>

    <section class="content">
      <div class="image_content">
        <img src="" alt="" id="imageContainer">
      </div>
      <p id="imageClass">
      </p>
    </section>
  </body>
</html>



index.html

The most important part of the HTML file is the file input: we will use it to select an image of a Tesla car and send it to the predict function.

The function below opens the image sent by the HTML page and preprocesses it so the model can use it.


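def load_request_image(image):
    # Decode the raw bytes sent by the browser into a PIL image
    image = Image.open(BytesIO(image))
    if image.mode != "RGB":
        image = image.convert("RGB")
    # Resize to 256x256, convert to an array and add a batch dimension
    image = image.resize((256, 256))
    image = img_to_array(image)
    image = np.expand_dims(image, axis=0)
    # Scale the pixel values to the range [0, 1]
    image = image.astype('float32') / 255

    return image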
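To see what the last two steps of load_request_image do to the data, here is a small sketch that uses only NumPy. The random array is a stand-in for a decoded 256x256 RGB image (an assumption made for this example), so neither PIL nor Keras is needed to run it.

```python
import numpy as np

# Stand-in for a decoded 256x256 RGB image (random pixels, not a real photo)
fake_image = np.random.randint(0, 256, size=(256, 256, 3), dtype=np.uint8)

batch = np.expand_dims(fake_image, axis=0)   # (256, 256, 3) -> (1, 256, 256, 3)
batch = batch.astype("float32") / 255        # pixel values are now in [0, 1]

print(batch.shape, batch.dtype)
```

The extra leading dimension is there because model.predict works on batches of images, even when the batch contains a single image.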

The predict_class function, as its name suggests, predicts the class of the image:

def predict_class(image_array):
    clases = ["Model 3", "Model S", "Model X"]
    # y_pred has one row per image and one score per class
    y_pred = model.predict(image_array, batch_size=None, verbose=0, steps=None)
    # Index of the highest score for the first (and only) image
    max_score = np.argmax(y_pred, axis=1)[0]
    image_class = clases[max_score]

    return image_class
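The mapping from scores to a class name can be tried without a trained model. In the sketch below the y_pred values are invented for illustration, and the confidence variable is an addition that the original function does not return.

```python
import numpy as np

classes = ["Model 3", "Model S", "Model X"]

# Hypothetical model output for a batch of one image: one score per class
y_pred = np.array([[0.10, 0.75, 0.15]])

best_index = int(np.argmax(y_pred, axis=1)[0])   # position of the highest score
best_class = classes[best_index]
confidence = float(y_pred[0, best_index])

print(best_class, confidence)
```

Returning the score together with the class would let the page show how sure the model is, at the cost of a slightly larger JSON response.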

We also need to load the model. We store it in the global variable model defined earlier so that every part of the code can access it.

def load_model():
    # Read the architecture from the JSON file
    with open('./model/model.json', 'r') as json_file:
        model_json = json_file.read()
    global model
    model = model_from_json(model_json)
    # Load the trained weights into the rebuilt model
    model.load_weights("./model/weights.h5")
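An alternative to loading the model at startup is to load it on the first request and reuse it afterwards. The sketch below shows that caching pattern with a stand-in loader so it runs without Keras; get_model and fake_loader are hypothetical names, and in the real app the loader would do what load_model does.

```python
_model = None

def get_model(loader):
    """Return the cached model, calling loader() only the first time."""
    global _model
    if _model is None:
        _model = loader()
    return _model

# Stand-in loader so the sketch runs without Keras
calls = []

def fake_loader():
    calls.append("loaded")
    return {"name": "fake-model"}

first = get_model(fake_loader)
second = get_model(fake_loader)
print(first is second, len(calls))   # True 1
```

Loading at startup, as the tutorial does, makes the first prediction faster; loading lazily makes the server start sooner. Either choice keeps a single copy of the model in memory.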

Finally, the code below loads the model when the web app starts:

if __name__ == "__main__":
    load_model()
    app.run(debug=False, threaded=False)

The complete code of app.py is:


from flask import Flask, render_template, request, jsonify

from keras.models import model_from_json

from PIL import Image
from io import BytesIO
from keras.preprocessing.image import img_to_array
import numpy as np

app = Flask(__name__)
model = None

def load_request_image(image):
    image = Image.open(BytesIO(image))
    if image.mode != "RGB":
        image = image.convert("RGB")
    image = image.resize((256, 256))
    image = img_to_array(image)
    image = np.expand_dims(image, axis=0)
    image = image.astype('float32') / 255

    return image

def load_model():
    with open('./model/model.json', 'r') as json_file:
        model_json = json_file.read()
    global model
    model = model_from_json(model_json)
    model.load_weights("./model/weights.h5")

def predict_class(image_array):
    clases = ["Model 3", "Model S", "Model X"]
    y_pred = model.predict(image_array, batch_size=None, verbose=0, steps=None)
    max_score = np.argmax(y_pred, axis=1)[0]
    image_class = clases[max_score]

    return image_class

@app.route("/")
def index():
    return render_template("index.html")

@app.route("/predict", methods=["POST"])
def predict():
    # print(request.headers)
    image = request.files["image"].read()
    image = load_request_image(image)
    class_predicted = predict_class(image)
    image_class = { "image_class": class_predicted }

    return jsonify(image_class)

if __name__ == "__main__":
    load_model()
    app.run(debug=False, threaded=False)

JavaScript

As we saw in the previous section, the HTML page sends an image to the predict function, and we need JavaScript to send it. If you don't know JavaScript, in the browser it works with events, and we want to listen to two of them: the first is triggered once the page is loaded, and the second is triggered when we use the file input to select an image:


document.addEventListener("DOMContentLoaded", ready);

function ready() {
    const inputFile = document.querySelector("#image");
    inputFile.addEventListener('change', imageUploaded);
}

Once the page is loaded, the ready function runs and the page starts listening to the file input.

function imageUploaded(event) {
    const target = event.target;
    const image = target.files[0];

    // Nothing to do if the user closed the dialog without choosing a file
    if (!image) return;

    // Show a preview of the selected image
    const imageContainer = document.querySelector("#imageContainer");
    imageContainer.src = window.URL.createObjectURL(image);

    get_prediction(image);
}

The imageUploaded function is triggered when we use the file input; it shows the selected image and calls the get_prediction function:

function get_prediction(image) {
    const formData = new FormData();
    formData.append('image', image);

    fetch("/predict", {
        method: "POST",
        body: formData
    })
    .then(response => response.json())
    .then(data => {
        const imageClassContainer = document.querySelector("#imageClass");
        const imageClass = data["image_class"];
        imageClassContainer.innerHTML = `Tesla: ${imageClass}`;
    })
    .catch(error => {
        console.error("There was an error :c", error);
    });
}

In this last function we send the image to the Flask app, wait for the response containing the class of the image, and finally show the class on the HTML page.

CSS

We will add some CSS to the HTML page:

body {
    font-family: helvetica;
    margin: 0;
    padding: 0;
}

.header {
    width: 70%;
    margin: 0 auto;
}

h2 {
    font-size: 2em;
}

.content {
    width: 70%;
    margin: 0 auto;
}

.image_content {
    margin: 20px auto;
    width: 300px;
    height: 200px;
}

#imageContainer {
    width: 300px;
    height: 200px;
}

style.css

Executing the web app

To execute the web app we will use the following command:


python3 app.py

Finally we can go to http://127.0.0.1:5000/ and test the app.
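The predict route can also be tested without a browser. The following is a minimal sketch of a client that uses only the Python standard library; build_multipart and predict_file are hypothetical helper names, and the file path in the usage comment is a placeholder. It sends the image under the form field image, the same field the Flask app reads.

```python
import json
import uuid
from urllib import request

def build_multipart(field, filename, data, content_type="application/octet-stream"):
    """Encode one file as multipart/form-data; returns (body, Content-Type value)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"

def predict_file(path, url="http://127.0.0.1:5000/predict"):
    """POST the image at `path` to the running app and return the predicted class."""
    with open(path, "rb") as f:
        body, header = build_multipart("image", "upload.jpg", f.read(), "image/jpeg")
    req = request.Request(url, data=body, headers={"Content-Type": header}, method="POST")
    with request.urlopen(req) as response:
        return json.loads(response.read().decode("utf-8"))["image_class"]

# Usage, with the app running locally and a placeholder file name:
# print(predict_file("tesla.jpg"))
```

A script like this is handy later on as well, since the url argument can point at the Heroku or Google Cloud address instead of the local one.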

Heroku

Once the web app is ready, we will first deploy it with Heroku, which is quite easy to use. We need to install the Heroku CLI from this link, and we need git as well. In the Heroku dashboard we create a new app; the name we give the app will appear in its URL.

In the web app folder we need to create a new folder called model, put the files model.json and weights.h5 inside it, and create two more files:


gunicorn==19.9.0

Keras==2.2.4

tensorflow==1.12.0

numpy==1.15.4

Flask==1.0.2

Pillow==3.0.0

requirements.txt

This file contains the libraries needed to run the web app.


web: gunicorn app:app

Procfile

This file indicates that we use gunicorn to run the web app.

Because gunicorn imports app.py as a module instead of running it as a script, the __main__ block is never executed, so we need to add two more lines of code to app.py to load the model:

if __name__ == "app":
    load_model()
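The reason the check compares against "app" is that an imported module sees its own module name in __name__, while a script run directly sees "__main__". The sketch below demonstrates this with a throwaway module written to a temporary folder; demo_app is a made-up name used only for the demonstration.

```python
import importlib
import os
import sys
import tempfile

# Write a tiny throwaway module that records its own __name__
with tempfile.TemporaryDirectory() as folder:
    with open(os.path.join(folder, "demo_app.py"), "w") as f:
        f.write("seen_name = __name__\n")
    sys.path.insert(0, folder)
    importlib.invalidate_caches()
    try:
        demo_app = importlib.import_module("demo_app")
    finally:
        sys.path.remove(folder)

print(demo_app.seen_name)   # demo_app
```

The same thing happens when gunicorn imports app.py with app:app, which is why the model is loaded under the name "app".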

Git

Heroku uses git to deploy apps to its servers, and we will also use git to upload the web app to Google Cloud.

We need to execute some commands:


git init

With git init we initialize a git repository that saves the changes we make to the code.


git add .

This command adds the changes we have made to the repository.


git commit -m "finished app"

git commit saves the added changes in the repository. Once the app is saved in the repository we can push the repository to Heroku:


heroku git:remote -a app

git:remote adds Heroku as a remote of the repository; replace app with the name of the app created in the dashboard.


git push heroku master

Finally we deploy the app to Heroku.

Google Cloud

Google offers 300 dollars of credit to use on its servers. Be aware that the GPU server we will use is expensive, so you should delete it once you finish the tutorial to avoid high charges.

First we go to the Google Cloud console, where we will use the service called Compute Engine to create our server.

We will use the following configuration:

configuration

We choose Ubuntu 16.04 with 15 GB and allow HTTP and HTTPS traffic.

We need to generate an SSH key to access the server, and then add the public key to the security configuration.

We will use the terminal to generate the SSH key:


cd ~/.ssh

This takes us to the folder where the SSH keys are stored.


ssh-keygen -t rsa -b 4096 -C "email@example.com"

Replace email@example.com with your own email; the email part will be the username used to access the server.

When we are asked for a file name we will use google_cloud, and we will leave the next two prompts (the passphrase and its confirmation) blank.

With the command below we print the contents of the public key file:


cat google_cloud.pub

You need to add this key to the security configuration.

The server

We will access the server with the command below:


ssh -i ~/.ssh/google_cloud [username]@[ip]

Here username is the value mentioned before and ip is the external IP that we can find under VM instances.

Once we are on the server we have to update the system:


sudo apt-get update

sudo apt-get upgrade

sudo apt-get install -y build-essential

We install Miniconda, which contains some necessary libraries:


curl -O https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh



bash Miniconda3-latest-Linux-x86_64.sh



source ~/.bashrc

We answer yes to every prompt of the installer.

We can use the following command to check if the GPU is available:


lspci | grep -i nvidia

00:04.0 3D controller: NVIDIA Corporation GK210GL Tesla K80 (rev a1)

CUDA

We need two libraries in order to use TensorFlow and Keras with the GPU. First we will install CUDA:


curl -O https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-repo-ubuntu1604_9.2.88-1_amd64.deb



sudo dpkg -i cuda-repo-ubuntu1604_9.2.88-1_amd64.deb

Sometimes we need to execute an extra command to add the repository key:


sudo apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/7fa2af80.pub

Then we run the dpkg command again:


sudo dpkg -i cuda-repo-ubuntu1604_9.2.88-1_amd64.deb

We update the package list and install CUDA:


sudo apt-get update

sudo apt-get install cuda

Finally we can use the command below to check that CUDA is installed correctly:


nvidia-smi

cuDNN

The second library we will install is cuDNN. In this case we need to download it from Nvidia's page, which requires an account. We need two files: cuDNN Runtime Library for Ubuntu16.04 (Deb) and cuDNN Developer Library for Ubuntu16.04 (Deb). (Right now the latest version is the one for CUDA 10.0.)


scp -i ~/.ssh/google_cloud libcudnn7_7.4.2.24-1+cuda10.0_amd64.deb [username]@[ip]:/home/[username]



scp -i ~/.ssh/google_cloud libcudnn7-dev_7.4.2.24-1+cuda10.0_amd64.deb [username]@[ip]:/home/[username]

With the commands above we upload the two files from our local PC to the server; you need to run them from the folder that contains the files.

Once both files are on the server we can use them to install cuDNN:


sudo dpkg -i libcudnn7_7.4.2.24-1+cuda10.0_amd64.deb



sudo dpkg -i libcudnn7-dev_7.4.2.24-1+cuda10.0_amd64.deb

TensorFlow

In order to use Keras we need to install TensorFlow. We create a folder called project in /home/[username]:


mkdir project

We need to install the version of TensorFlow that works with a GPU. First, inside the project folder, we create a conda environment with the following command:


conda create -n teslapp_conda python=3.5

We use Python 3.5 because it is a version compatible with the TensorFlow release used here.

This creates a virtual environment with Python 3.5, which we can activate with the command below:


source activate teslapp_conda

Finally we install tensorflow-gpu:


pip install tensorflow-gpu

We can verify the installation with the commands below:


python



import tensorflow as tf

from tensorflow.python.client import device_lib

print(device_lib.list_local_devices())

To exit python we use:


exit()
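The device list printed above can be long, so it may help to keep only the GPU entries. The sketch below does that with a small helper; gpu_device_names is a hypothetical name, and the fake devices are stand-ins with the same two attributes (name and device_type) as the entries returned by device_lib.list_local_devices(), so the example runs without TensorFlow.

```python
from types import SimpleNamespace

def gpu_device_names(devices):
    """Return the names of the devices whose device_type is "GPU"."""
    return [d.name for d in devices if d.device_type == "GPU"]

# Stand-in devices with made-up values, used here instead of the real list
fake_devices = [
    SimpleNamespace(name="/device:CPU:0", device_type="CPU"),
    SimpleNamespace(name="/device:GPU:0", device_type="GPU"),
]

gpus = gpu_device_names(fake_devices)
print(gpus)   # ['/device:GPU:0']
```

On the server, passing device_lib.list_local_devices() to the helper should return at least one name if TensorFlow can see the GPU; an empty list suggests that CUDA or cuDNN is not set up correctly.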

Tesla App

Once we have TensorFlow installed we need the files of the web app:


git clone https://github.com/vincent1bt/keras-teslapp.git

We also need the model and its weights. We can upload these files to the server with the command below:


scp -i ~/.ssh/google_cloud model.zip [username]@[ip]:/home/[username]/project/keras-teslapp

You can replace model.zip with the path to your own model file.

We use unzip to uncompress the file:


unzip model.zip

If unzip is not installed, we can install it first:


sudo apt-get install unzip

We install the libraries that the web app needs:


pip install Flask



pip install Pillow



pip install keras



pip install gunicorn

We will use Nginx so that the web app can be reached from an external computer:


sudo apt-get install nginx

We have to configure Nginx:


sudo vim /etc/nginx/sites-enabled/default

We look for the section called server and comment out everything except the location / block, which we edit next, and these two lines:


listen 80 default_server;

listen [::]:80 default_server;

Now we look for the location / section, comment out the try_files line and add a proxy_pass line:


location / {

        # First attempt to serve request as file, then

        # as directory, then fall back to displaying a 404.

        # try_files $uri $uri/ =404;

        proxy_pass http://127.0.0.1:8000;

}

We restart Nginx to apply the changes:


sudo service nginx restart

Finally we can execute our app:


gunicorn app:app -b localhost:8000 &

We don't have to change any code in the app: Keras will use the available GPU automatically because we installed the GPU version of TensorFlow.

We open the same IP that we used to connect to the server in a browser and we can see the app running.

As I mentioned before, you should delete the server after playing with it for a while, since it is really expensive.

We could also use Docker to automate some steps of this tutorial; Docker images already come with some of these libraries and tools installed, which makes everything smoother.