Vicente Rodríguez

April 23, 2019

How to build a neural network with Keras

Keras is a machine learning framework used to create neural networks. Keras is simple to understand and has a friendly syntax.

In this tutorial we are going to build a neural network to classify Tesla cars: Model S, Model X, and Model 3. I obtained the images from Google Images.

Keras comes with several preloaded datasets that are easy to import; we could use one of those:


from keras.datasets import cifar10

(X_train, y_train), (X_val, y_val) = cifar10.load_data()

With two lines of code we can have the training set and the validation set ready to be used. In real life, however, data is often hard to obtain, and we may find null, wrong, or corrupted values. In order to learn how to work with a real-life dataset we will download the set of images that I collected.

The images

You can use this notebook to see all the code of this tutorial.

You can download the images from Github.

We will import the libraries that we will use throughout the tutorial:


import numpy as np

import os

import shutil

from keras.preprocessing.image import load_img, img_to_array

from keras.utils import to_categorical



from keras.models import Sequential

from keras.layers import Dense, Activation



from matplotlib import pyplot as plt



from numpy.random import seed

import tensorflow as tf

from keras import backend as k

import os

We need to install Keras; alternatively, you can use Google Colab, where everything is already installed.
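If you are working locally instead of on Colab, Keras can be installed with pip (a minimal sketch; this assumes you want the TensorFlow backend):

!pip install tensorflow keras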

We also need some folders. If you are on macOS, Linux, or Google Colab, we can create them with the following commands:


!mkdir -p validation_images/tesla_model_3 && mkdir validation_images/tesla_model_s && mkdir validation_images/tesla_model_x

Since we don’t have a lot of images in the dataset, we will use only 30 images for each class in the validation set.


validation_set_size = 30



def move_images(from_path, to_path):

  files = os.listdir(from_path)

  folder_size = len(files)

  first_index = folder_size - validation_set_size

  files_to_move = files[first_index:]



  for file_name in files_to_move:

    source_file_name = from_path + file_name

    destination_file_name = to_path + file_name

    shutil.move(source_file_name, destination_file_name)

With the code above we can search folders and move files. The shutil and os libraries do the hard work for us.


move_images("./tesla-cars-dataset-master/tesla-model-3/", "./validation_images/tesla_model_3/")

move_images("./tesla-cars-dataset-master/tesla-model-s/", "./validation_images/tesla_model_s/")

move_images("./tesla-cars-dataset-master/tesla-model-x/", "./validation_images/tesla_model_x/")

Now we have a validation set with 90 images, 30 images for each class.

We could do all of this with a file explorer, but it's important to know how to achieve it with only Python code and terminal commands; sometimes we don't have access to a file explorer and must use the terminal to create or move folders.

We rename the folder tesla-cars-dataset-master to training_images:


!mv tesla-cars-dataset-master training_images



!mv training_images/tesla-model-3 training_images/tesla_model_3

!mv training_images/tesla-model-s training_images/tesla_model_s

!mv training_images/tesla-model-x training_images/tesla_model_x
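If shell commands are not available, the same renames can be done with pure Python (a minimal sketch equivalent to the mv commands above):

import os  # already imported earlier in the tutorial

# Rename the dataset folder and the class folders
os.rename("tesla-cars-dataset-master", "training_images")
os.rename("training_images/tesla-model-3", "training_images/tesla_model_3")
os.rename("training_images/tesla-model-s", "training_images/tesla_model_s")
os.rename("training_images/tesla-model-x", "training_images/tesla_model_x")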

Loading the images

Once we have the validation and training images in different folders, we need to load them into Python variables. This can be done with the following code:


img_height = 64

img_width = 64



def load_images(paths):

  X = []

  y = []



  for path in paths:

    images_paths = os.listdir(path)



    for image_path in images_paths:

      complete_path = path + image_path

      image = load_img(complete_path, target_size=(img_height, img_width))

      image_array = img_to_array(image)

      X.append(image_array)

      label = paths.index(path)

      y.append(label)



  return X, y

The load_img function loads an image from a path and resizes it to a 64x64 resolution. The img_to_array function converts this image into an array that Keras can use to train the neural network. The images we are using have a depth of 3, which means they have three color channels (RGB); each channel holds a value from 0 to 255 that indicates how strong that color is in each pixel. The final image will therefore have a size of 64x64x3.
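As a quick sanity check (the file name below is hypothetical; use any image from the dataset), we can load a single image and inspect the shape of its array:

image = load_img("training_images/tesla_model_3/example.jpg", target_size=(img_height, img_width))  # hypothetical file name
image_array = img_to_array(image)
print(image_array.shape)  # (64, 64, 3): height, width, RGB channels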


training_paths = ["training_images/tesla_model_3/", "training_images/tesla_model_s/", "training_images/tesla_model_x/"]

validation_paths = ["validation_images/tesla_model_3/", "validation_images/tesla_model_s/", "validation_images/tesla_model_x/"]



X_train, y_train = load_images(training_paths)

X_val, y_val = load_images(validation_paths)

The shape of X_train is 356x64x64x3, meaning we have 356 images of size 64x64x3, and the shape of X_val is 90x64x64x3. y_train and y_val hold the label of each image; since we have three classes (Model 3, Model S, Model X), if the image in X_train[0] is a Tesla Model 3 the value of y_train[0] will be 0, if it is a Model S it will be 1, and if it is a Model X it will be 2.

Finally, we have to transform these lists into NumPy arrays:


X_train = np.array(X_train)

X_val = np.array(X_val)



y_train = np.array(y_train)

y_val = np.array(y_val)

NumPy arrays are more convenient than plain Python lists; they support vectorized operations and attributes like shape:


X_train.shape, X_val.shape


((356, 64, 64, 3), (90, 64, 64, 3))


y_train.shape, y_val.shape


((356,), (90,))

Now we have to normalize the images, that is, change their range from (0-255) to (0-1). This helps the neural network train better and faster.


X_train = X_train.astype('float32') / 255

X_val = X_val.astype('float32') / 255

Currently y_train and y_val are similar to:


y = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]

Each position represents the class of the corresponding image in the X_train and X_val arrays. However, Keras needs a different format for these variables, and we can change the format with a Keras function:


y_train = to_categorical(y_train)

y_val = to_categorical(y_val)

Now our variables y are similar to:


y = [[1., 0., 0.],
     [1., 0., 0.],
     [1., 0., 0.],
     [1., 0., 0.],
     [0., 1., 0.],
     [0., 1., 0.],
     [0., 1., 0.],
     [0., 1., 0.],
     [0., 0., 1.],
     [0., 0., 1.],
     [0., 0., 1.],
     [0., 0., 1.]]

The value of y_train[0] is now [1, 0, 0]. The length of this array must match the number of classes, in this case 3. If the 1 appears in the first position, the class of X_train[0] is Model 3; if it appears in the second position [0, 1, 0], the class is Model S; and if it appears in the third position [0, 0, 1], the class is Model X.

In the end we have a different representation of the same information.
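As a quick check, we can map a one-hot label back to its class index with np.argmax (a minimal sketch):

print(np.argmax(y_train[0]))  # 0 -> tesla_model_3, the first class loaded
print(np.argmax(y_val[-1]))   # 2 -> tesla_model_x, the last class loaded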


y_train.shape, y_val.shape


((356, 3), (90, 3))

The shape is different as well.

We also have to reshape X_train and X_val. In a simple fully connected neural network like the one we will use, the input layer must have two dimensions: the first is the number of images and the second is the number of features per image. Therefore we have to flatten 64 * 64 * 3 into a single dimension of size 12288, and as a result the input layer will have 12288 neurons.


second_shape = 64 * 64 * 3



X_train = X_train.reshape(X_train.shape[0], second_shape)

X_val = X_val.reshape(X_val.shape[0], second_shape)

We can see the new shape of both variables X:


X_train.shape, X_val.shape


((356, 12288), (90, 12288))

Finally we have the variables X and y ready to be used in the neural network.

The neural network

We will build a neural network with two hidden layers:


model = Sequential()

model.add(Dense(units=64, input_shape=(12288,)))

model.add(Activation('relu'))

model.add(Dense(units=32))

model.add(Activation('relu'))

model.add(Dense(units=3, activation="softmax"))

With the code above we created a simple neural network. Now I will explain each part:


model = Sequential()

This line defines a new Sequential object, to which we can add several layers.


model.add(Dense(units=64, input_shape=(12288,)))

model.add(Activation('relu'))

Here we added one hidden layer with 64 neurons and relu as its activation function; in the same line we also indicate that the input data has a shape of 12288.


model.add(Dense(units=32))

model.add(Activation('relu'))

We can add more layers; this second hidden layer has 32 neurons and also uses the relu activation function. Note that Keras calls these fully connected layers Dense layers.


model.add(Dense(units=3, activation="softmax"))

At the end we have the output layer with 3 neurons, one for each class, and softmax as the activation function. We use this activation function in the output layer when we have more than 2 classes.
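To illustrate what softmax does (a minimal sketch with made-up scores), it turns the 3 raw output scores into probabilities that sum to 1:

scores = np.array([2.0, 1.0, 0.1])                        # hypothetical raw outputs of the 3 neurons
probabilities = np.exp(scores) / np.sum(np.exp(scores))   # softmax
print(probabilities, probabilities.sum())                 # approximately [0.659 0.242 0.099] 1.0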

We don't have to create the parameters W and b ourselves; the model does this on its own.
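We can check how many parameters (weights W and biases b) each layer creates with the summary method:

model.summary()
# Dense(64): 12288 * 64 + 64 = 786,496 parameters
# Dense(32): 64 * 32 + 32 = 2,080 parameters
# Dense(3):  32 * 3 + 3 = 99 parameters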


model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

With the compile method we define some additional parameters. Categorical crossentropy is the loss function we will use; previously we used binary crossentropy, but this time we have 3 classes. Adam is the optimizer the neural network will use; an optimizer defines how the gradient descent algorithm updates the parameters. With the metrics parameter, the model will compute the accuracy after each epoch.

All these parameters are called hyperparameters and we have to tune them to obtain a better model.

In order to train this model we need a few more lines:


epochs = 10

batch_size = 32



model_train = model.fit(X_train, y_train, batch_size=batch_size, epochs=epochs, verbose=1, validation_data=(X_val, y_val))

The fit method trains the neural network on the training data and measures the performance of the model on the validation data. Normally the epochs variable indicates how many times the model computes the forward propagation and backpropagation steps over the whole dataset, but the batch_size variable changes this behavior: if we had 960 images, in each epoch the model would still learn from all 960 images, but it would take 32 images at a time and compute the forward propagation and backpropagation steps for those 32 images, so it would compute these two steps 30 times per epoch (960 / 32 = 30). To sum up, the model would compute these steps 300 times throughout 10 epochs.
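For our dataset the same arithmetic looks like this (a minimal sketch; the last batch of each epoch is smaller than 32, hence the ceiling):

import math

steps_per_epoch = math.ceil(X_train.shape[0] / batch_size)  # 356 / 32 -> 12 batches per epoch
total_updates = steps_per_epoch * epochs                     # 12 * 10 = 120 weight updates
print(steps_per_epoch, total_updates)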

We can change the value of batch_size. With a larger value the model trains faster but performs fewer updates of the W and b values; with a smaller value the model trains slower but may reach a better accuracy. Sometimes we don't have good hardware and the images have to fit in RAM, so we can use a small batch size to load only, for example, 32 images into RAM at a time.

We can plot the accuracy and the loss values in each epoch with the matplotlib library:


def plot_loss_and_accuracy(model_train):

  accuracy = model_train.history['acc']

  val_accuracy = model_train.history['val_acc']

  loss = model_train.history['loss']

  val_loss = model_train.history['val_loss']

  epochs = range(len(accuracy))

  plt.plot(epochs, accuracy, 'b', label='Training accuracy')

  plt.plot(epochs, val_accuracy, 'r', label='Validation accuracy')

  plt.ylim(0, 1)

  plt.xlabel('Epochs', fontsize=16)

  plt.ylabel('Accuracy', fontsize=16)

  plt.title('Training and validation accuracy', fontsize = 20)

  plt.legend()

  plt.figure()

  plt.plot(epochs, loss, 'b', label='Training loss')

  plt.plot(epochs, val_loss, 'r', label='Validation loss')

  plt.xlabel('Epochs ',fontsize=16)

  plt.ylabel('Loss',fontsize=16)

  plt.title('Training and validation loss', fontsize= 20)

  plt.legend()

  plt.show()


plot_loss_and_accuracy(model_train)

Now we have two plots:

First model

We can check the accuracy of the model:


validation_acc = model_train.history['val_acc'][-1] * 100

training_acc = model_train.history['acc'][-1] * 100

print("Validation accuracy: {}%\nTraining Accuracy: {}%".format(validation_acc, training_acc))


Validation accuracy: 34.44444470935398%

Training Accuracy: 83.70786516853933%

Here we have an overfitting problem, and actually a few more problems. First, the dataset is too small: we have 446 images for 3 classes, almost 150 images per class. Second, the architecture we used is not optimal for this type of problem; there is a better architecture for images, the convolutional neural network (CNN), which also lets us use a technique called data augmentation to transform the existing images and obtain more. We will cover all of this in the convolutional neural network tutorial; for now we will create a new model and try to reduce the overfitting.

A second model

This model is more complex and has an extra hidden layer:


second_model = Sequential()

second_model.add(Dense(units=16, input_shape=(12288,)))

second_model.add(Activation('relu'))

second_model.add(Dense(units=32))

second_model.add(Activation('relu'))

second_model.add(Dense(units=64))

second_model.add(Activation('relu'))

second_model.add(Dense(units=3, activation="softmax"))

Now we train this new model:


second_model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])



second_model_train = second_model.fit(X_train, y_train, batch_size=batch_size, epochs=epochs, verbose=1, validation_data=(X_val, y_val))

We plot the loss and accuracy values:


plot_loss_and_accuracy(second_model_train)

Second model

Finally we check the accuracy of the model:


validation_acc = second_model_train.history['val_acc'][-1] * 100

training_acc = second_model_train.history['acc'][-1] * 100

print("Validation accuracy: {}%\nTraining Accuracy: {}%".format(validation_acc, training_acc))


Validation accuracy: 46.66666673289405%

Training Accuracy: 52.80898876404494%

Now the training and validation accuracies are much closer, but both are still quite low.
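We can also obtain these numbers directly with the evaluate method, which returns the loss and the metrics we defined in compile:

val_loss, val_acc = second_model.evaluate(X_val, y_val, verbose=0)
print("Validation accuracy: {:.2f}%".format(val_acc * 100))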

We can use three new images to make predictions with this model. First we load them into Python variables:


X = []



image = load_img("./Tesla-Model-3-4-720x550.jpg", target_size=(img_height, img_width))

image_array = img_to_array(image)

X.append(image_array)



image = load_img("./109102_source-2.jpg", target_size=(img_height, img_width))

image_array = img_to_array(image)

X.append(image_array)



image = load_img("./2017_Tesla_Model_X_100D_Front.jpg", target_size=(img_height, img_width))

image_array = img_to_array(image)

X.append(image_array)



X_test = np.array(X)

We should always normalize and reshape these images the same way we did with the training data:


X_test = X_test.astype('float32') / 255

X_test = X_test.reshape(X_test.shape[0], second_shape)

The predict method outputs the predictions for each image:


y_pred = second_model.predict(X_test, batch_size=None, verbose=1, steps=None)

Now we have an array with a prediction score for each class for each image; we choose the class with the highest score for each one:


y_pred = [[0.22741655, 0.4156091 , 0.35697436],

       [0.583717  , 0.3267301 , 0.08955295],

       [0.35106906, 0.5099873 , 0.13894367]]



np.argmax(y_pred, axis=1)


array([1, 0, 1])

The true labels are [0, 1, 2] and the predicted labels were [1, 0, 1], which means our model is pretty bad. But don't worry: we can improve it using more techniques like Dropout and Batch Normalization, or by using a convolutional neural network. We will see all of this in future tutorials.
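As a preview (a minimal sketch, not trained or tuned here), Dropout layers can be inserted between the Dense layers of the second model like this:

from keras.layers import Dropout

regularized_model = Sequential()
regularized_model.add(Dense(units=16, input_shape=(12288,)))
regularized_model.add(Activation('relu'))
regularized_model.add(Dropout(0.5))  # randomly zeroes 50% of the activations during training
regularized_model.add(Dense(units=32))
regularized_model.add(Activation('relu'))
regularized_model.add(Dropout(0.5))
regularized_model.add(Dense(units=3, activation="softmax"))
regularized_model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])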