How to Classify Star Wars Lego Images using CNN and Transfer Learning

This tutorial is about training deep learning (DL) models to classify Star Wars Lego images. We use the TensorFlow library to create and compare the image classifiers.

Are you looking for interesting deep learning projects that are suitable for beginners? Do not worry, this is not another MNIST image classification tutorial. Instead, we are going to classify some Star Wars Lego images using the TensorFlow library. This tutorial will sharpen your knowledge about convolutional neural networks and transfer learning. Intrigued? Let’s get started.

Install and Import Modules

Feel free to download the script for this tutorial from this GitHub repo. We will execute it in Google Colab and use its free GPU resources for model training. If you would like to try Google Colab, head over to the site and sign in with your Google account. It looks like Jupyter Notebook, but the notebooks are stored in your Google Drive. Upload the script to Google Colab and execute it as you read through this article.

Execute the following command on a terminal or command prompt to clone any GitHub repo:

$ git clone https://github.com/username/project_name.git

The next step is to enable the GPU resource in our Colab environment. There are two ways to do this:

  • Method 1: Click on the Edit tab. Under Notebook settings, choose GPU from the drop-down, and click Save.
  • Method 2: Click on the Runtime tab. Under Change Runtime Type, choose GPU, and click Save.

Feel free to check out the video version of this tutorial for more in-depth explanations.

Now, install the necessary packages using pip:

$ pip install numpy pandas matplotlib seaborn tensorflow

Then import all the required libraries:

import os
import math
import random
import shutil

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sn

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

The first four modules help us restructure the data folders. The NumPy and pandas libraries will be used for data manipulation. The Matplotlib and Seaborn libraries will be used to display plots and figures. The TensorFlow library is what we will use for machine learning modeling.

To check if GPU is enabled in our notebook environment, execute the following:

tf.test.gpu_device_name()

If a GPU is allocated for your notebook, you will see a printout like ‘/device:GPU:0’ instead of an empty string.

Execute the following line to check which GPU is assigned for you:

!nvidia-smi

Locate the GPU name in the printout. For example, ‘Tesla P4’.

Load Dataset and Preprocess Data

So far, so good. Now, we need to download the dataset, which is the Lego Minifigures dataset from Kaggle. We are going to use only the Star Wars folder for this tutorial. There are 15 subfolders of images in the Star Wars folder. To simplify things, we are going to use only the first five folders out of 15. So, go ahead and remove everything except for the first five folders of Star Wars. Then, upload this data folder to your Google Drive (the one that has the same Gmail account as your Colab).

Dataset uploaded. Now, go back to our Google Colab interface and mount the Google Drive to it so that we can access the data. To do that, click on the Google Drive icon at the left of the interface to mount it – as shown in Figure 1.

Figure 1: Mount Google Drive on Google Colab.

You will see a folder named “drive” appear in the file browser once it is mounted. Next, we are going to restructure our data folder. We want to create a train set, a validation set, and a test set for modeling and evaluation.

Execute the following code to restructure the data folder:

BASE_DIR = '/content/drive/MyDrive/star-wars/'
names = ["YODA", "LUKE SKYWALKER", "R2-D2", "MACE WINDU", "GENERAL GRIEVOUS"]
train_proportion = 0.6
val_proportion = 0.25
total_train = 0
total_val = 0
total_test = 0

In the code, we defined a base directory for the Google Drive folder. We also assigned readable names to the five data subfolders. We defined the proportion of images in each folder to be copied to the new sets. For example, in a subfolder of 10 images, 6 images will be copied to the train set, 3 to the validation set (2.5 rounded up), and the remaining 1 image to the test set. The variables total_train, total_val, and total_test are counters for the total number of images in each of the new sets.
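
The rounding in that example can be checked with a few lines of plain Python. This is only a sketch: split_counts is a hypothetical helper (it is not part of the tutorial script) that mirrors the int(x + 0.5) rule used in the copy loop later in this section.

```python
def split_counts(number_of_images, train_proportion=0.6, val_proportion=0.25):
    """Return (n_train, n_valid, n_test) using the tutorial's rounding rule."""
    # int(x + 0.5) rounds a non-negative value half up
    n_train = int(number_of_images * train_proportion + 0.5)
    n_valid = int(number_of_images * val_proportion + 0.5)
    # whatever is left over goes to the test set
    n_test = number_of_images - n_train - n_valid
    return n_train, n_valid, n_test


print(split_counts(10))  # (6, 3, 1)
```

Because the test count is computed as the remainder, the three numbers always add up to the folder size.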

Execute the following lines to create new folders:

if not os.path.isdir(f'{BASE_DIR}train/'):
    for name in names:
        os.makedirs(f'{BASE_DIR}train/{name}')
        os.makedirs(f'{BASE_DIR}val/{name}')
        os.makedirs(f'{BASE_DIR}test/{name}')

Executing the following code will copy the images over to the new folders:

orig_folders = ["0001/", "0002/", "0003/", "0004/", "0005/"]

for folder_idx, folder in enumerate(orig_folders):
    files = os.listdir(BASE_DIR + folder)

    folder_name = names[folder_idx]
    number_of_images = len(files)

    n_train = int((number_of_images * train_proportion) + 0.5)
    n_valid = int((number_of_images * val_proportion) + 0.5)
    n_test = number_of_images - n_train - n_valid

    total_train += n_train
    total_val += n_valid
    total_test += n_test

    print(f'Folder {folder_name} has {number_of_images} images in total:\n train - {n_train}, val - {n_valid}, test - {n_test}\n')

    # copy images from original folders to the new ones
    for idx, file in enumerate(files):
        file_name = BASE_DIR + folder + file
        if idx < n_train:
            shutil.copy(file_name, f'{BASE_DIR}train/{folder_name}')
        elif idx < n_train + n_valid:
            shutil.copy(file_name, f'{BASE_DIR}val/{folder_name}')
        else:
            shutil.copy(file_name, f'{BASE_DIR}test/{folder_name}')

The for loop iterates through each subfolder and copies the images to the new folders. You will see that three new data folders are created, each containing five image folders.
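
One thing to be aware of: os.listdir() returns file names in arbitrary order, and the loop assigns images to sets by their position in that list. If you would rather have a reproducible, shuffled split, something like the following could be used. This is an optional sketch and not what the tutorial script does; shuffled_split, the seed value, and the example file names are all made up for illustration.

```python
import random


def shuffled_split(files, n_train, n_valid, seed=42):
    """Shuffle file names reproducibly, then cut them into three lists."""
    files = sorted(files)               # fixed starting order
    random.Random(seed).shuffle(files)  # same shuffle on every run
    train = files[:n_train]
    val = files[n_train:n_train + n_valid]
    test = files[n_train + n_valid:]
    return train, val, test


# hypothetical file names standing in for os.listdir(BASE_DIR + folder)
example_files = [f"{i:03d}.jpg" for i in range(1, 11)]
train_files, val_files, test_files = shuffled_split(example_files, n_train=6, n_valid=3)
print(len(train_files), len(val_files), len(test_files))  # 6 3 1
```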

We can now proceed to the data pre-processing step. One of the common pre-processing techniques for image data is normalization. It rescales the pixel values of the input images from a range of 0-255 to a range of 0-1. That usually helps with model training and convergence.
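
To make the idea concrete, here is a tiny NumPy sketch of the same rescaling on a made-up 2x2 grayscale "image". It only illustrates the arithmetic; in the tutorial the data generators do this for us.

```python
import numpy as np

# a made-up 2x2 image with 8-bit pixel values
image = np.array([[0, 64], [128, 255]], dtype=np.uint8)

# divide by 255 so that every value lands in the range 0-1
normalized = image.astype(np.float32) / 255.0

print(normalized.min(), normalized.max())  # 0.0 1.0
```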

Execute the following lines to create three data generators:

train_gen = keras.preprocessing.image.ImageDataGenerator(
    rescale=1./255,
    rotation_range=20, horizontal_flip=True,
    width_shift_range=0.2, height_shift_range=0.2,
    shear_range=0.2, zoom_range=0.2)
valid_gen = keras.preprocessing.image.ImageDataGenerator(rescale=1./255)
test_gen = keras.preprocessing.image.ImageDataGenerator(rescale=1./255)

Data generators load data in batches, perform data pre-processing, and pass the batches to a machine learning model. The rescale=1./255 configuration is for image normalization. The train generator is also configured with data augmentation: random rotations, horizontal flips, shifts, shear, and zoom.

Next, execute the following lines to load the data in batches using data generators:

target_size = (256, 256)
batch_size = 4

train_batches = train_gen.flow_from_directory(
    f'{BASE_DIR}train',
    target_size=target_size,
    class_mode='sparse',
    batch_size=batch_size,
    shuffle=True,
    color_mode="rgb",
    classes=names   
)

val_batches = valid_gen.flow_from_directory(
    f'{BASE_DIR}val',
    target_size=target_size,
    class_mode='sparse',
    batch_size=batch_size,
    shuffle=False,
    color_mode="rgb",
    classes=names
)

test_batches = test_gen.flow_from_directory(
    f'{BASE_DIR}test',
    target_size=target_size,
    class_mode='sparse',
    batch_size=batch_size,
    shuffle=False,
    color_mode="rgb",
    classes=names
)

We define the target input size as (256, 256) with a batch size of 4. The flow_from_directory() function of the data generators pulls the data from the given directory one batch at a time.
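
With a batch size of 4, the number of batches per epoch is simply the image count divided by 4, rounded up. A small sketch of that arithmetic is below. The count of 90 images is a made-up number for illustration, and steps_per_epoch is a hypothetical helper, not a Keras function.

```python
import math


def steps_per_epoch(n_images, batch_size):
    """Number of batches needed to see every image once."""
    return math.ceil(n_images / batch_size)


# 90 is a made-up image count; use total_train from your own run
print(steps_per_epoch(90, 4))  # 23
```

In this example the last batch would hold only 2 images, since 90 is not a multiple of 4.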

The rest of the code in this section checks what the data batches look like. Execute the code in the script and see if the outcome makes sense.
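
The prediction code later in this tutorial calls a helper named show() that lives in the script. In case you do not have the script at hand, here is a rough guess at what such a helper could look like. Treat it as a hypothetical stand-in, not the author's implementation: it assumes a batch is an (images, labels) pair and that the pixel values are already in the range 0-1.

```python
import numpy as np
import matplotlib.pyplot as plt


def show(batch, pred_labels=None):
    """Plot one batch of images with true (and optionally predicted) labels."""
    images, labels = batch
    n = len(images)
    fig, axes = plt.subplots(1, n, figsize=(4 * n, 4))
    axes = np.atleast_1d(axes)  # also works for a batch of one image
    for i, ax in enumerate(axes):
        # clip so that imshow accepts the values as 0-1 floats
        ax.imshow(np.clip(images[i], 0.0, 1.0))
        title = f"label: {int(labels[i])}"
        if pred_labels is not None:
            title += f" / pred: {int(pred_labels[i])}"
        ax.set_title(title)
        ax.axis("off")
    plt.show()
    return fig
```

With the generators above, calling show(train_batches[0]) would display the first batch of four training images.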

Train and Evaluate A CNN Model

It seems like we spent a lot of time fiddling with the data. That is where most data scientists spend their time. Good data processing is crucial for training machine learning models. You will thank yourself for the effort later.

Now it’s time to get our hands dirty with machine learning! Execute the following lines to create a basic convolutional neural network model:

model = keras.models.Sequential()
model.add(layers.Conv2D(32, (3,3), strides=(1,1), padding="valid",
                        activation='relu', input_shape=(256, 256,3)))
model.add(layers.MaxPool2D((2,2)))
model.add(layers.Conv2D(64, 3, activation='relu'))
model.add(layers.MaxPool2D((2,2)))
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(5))

If you want to see what the model architecture looks like, execute this line:

model.summary()

From the printout, we see that the model consists of two convolution layers, each followed by a max-pooling layer, and then a flatten layer. These are followed by a dense layer and an output layer of 5 units. The number of units in the output layer determines the number of categories the model can predict.
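
If you are curious where the numbers in the summary come from, the shapes and parameter counts can be worked out by hand. The sketch below assumes "valid" padding (the Keras default) and the layer sizes from the code above; the helper names are made up for this example. Compare the result with your own model.summary() printout.

```python
def conv_out(size, kernel, stride=1):
    """Output width/height of a 'valid' convolution or pooling layer."""
    return (size - kernel) // stride + 1


def conv_params(kernel, in_channels, filters):
    """Weights plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters


def dense_params(inputs, units):
    """Weights plus one bias per unit."""
    return inputs * units + units


size = 256
size = conv_out(size, 3)       # Conv2D(32, 3x3)  -> 254
size = conv_out(size, 2, 2)    # MaxPool2D(2x2)   -> 127
size = conv_out(size, 3)       # Conv2D(64, 3x3)  -> 125
size = conv_out(size, 2, 2)    # MaxPool2D(2x2)   -> 62
flat = size * size * 64        # Flatten          -> 246016

total = (conv_params(3, 3, 32) + conv_params(3, 32, 64)
         + dense_params(flat, 64) + dense_params(64, 5))
print(size, flat, total)  # 62 246016 15764805
```

Almost all of the parameters sit in the first dense layer, because the flattened feature map is so large.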

Let’s compile the model with a loss function, an optimizer, and an accuracy metric:

loss = keras.losses.SparseCategoricalCrossentropy(from_logits=True)
optim = keras.optimizers.Adam(learning_rate=0.001)
metrics = ["accuracy"]

model.compile(optimizer=optim, loss=loss, metrics=metrics)

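Note that the labels are integer class indices rather than one-hot vectors, and that there is no Softmax at the output layer. The loss function is chosen based on that: SparseCategoricalCrossentropy accepts integer labels, and from_logits=True tells it to expect raw output scores.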
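
Here is a small NumPy sketch of what that loss computes, written from scratch for illustration (it is not the Keras implementation). The logits and labels are made-up values. The sketch also shows that integer labels give the same loss as the one-hot version.

```python
import numpy as np


def softmax(logits):
    """Turn raw scores into probabilities, row by row."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def sparse_categorical_crossentropy(labels, logits):
    """Mean negative log-probability of the true class (integer labels)."""
    probs = softmax(logits)
    picked = probs[np.arange(len(labels)), labels]
    return -np.log(picked).mean()


# made-up raw outputs for two images and five classes
logits = np.array([[2.0, 1.0, 0.1, 0.0, -1.0],
                   [0.5, 0.5, 3.0, 0.0, 0.0]])
labels = np.array([0, 2])  # integer class indices, no one-hot encoding

loss_sparse = sparse_categorical_crossentropy(labels, logits)

# the same loss computed from one-hot labels
one_hot = np.eye(5)[labels]
loss_one_hot = -(one_hot * np.log(softmax(logits))).sum(axis=1).mean()

print(np.isclose(loss_sparse, loss_one_hot))  # True
```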

Execute the following lines for the actual model training:

epochs = 30

early_stopping = keras.callbacks.EarlyStopping(monitor="val_loss",
                                               patience=5, verbose=2)

history = model.fit(train_batches, validation_data=val_batches,
                    callbacks=[early_stopping], epochs=epochs, verbose=2)

model.save(f"{BASE_DIR}lego_model.h5")

We set a maximum of 30 epochs and passed an EarlyStopping callback to the fit() function. The callback ends the model training early once the validation loss has not improved for 5 consecutive epochs (patience=5). The fit() function trains the model and validates it after each epoch. We also save the model as an H5 file once the training is completed.
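
The patience rule is easy to simulate in plain Python. The function below is a simplified approximation of the idea, not the Keras callback itself (it ignores options such as min_delta), and the validation losses are made-up numbers.

```python
def early_stopping_epoch(val_losses, patience=5):
    """Return the 1-based epoch at which training would stop, or None."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss   # new best validation loss, reset the counter
            wait = 0
        else:
            wait += 1     # one more epoch without improvement
            if wait >= patience:
                return epoch
    return None


# made-up validation losses: the best value is reached at epoch 3
val_losses = [1.0, 0.9, 0.8, 0.85, 0.9, 0.95, 1.0, 1.1, 1.2]
print(early_stopping_epoch(val_losses))  # 8
```

In this example, epochs 4 to 8 bring no improvement over epoch 3, so training stops after epoch 8 instead of running all the way to the end.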

Let’s check out how the model performs by plotting the loss and accuracy outcomes:

plt.figure(figsize=(16, 6))
plt.subplot(1, 2, 1)
plt.plot(history.history['loss'], label='train loss')
plt.plot(history.history['val_loss'], label='valid loss')
plt.grid()
plt.legend(fontsize=15)

plt.subplot(1, 2, 2)
plt.plot(history.history['accuracy'], label='train acc')
plt.plot(history.history['val_accuracy'], label='valid acc')
plt.grid()
plt.legend(fontsize=15)

Figure 2: Losses and accuracies of the basic CNN model.

As shown in Figure 2, most of the train and validation losses stay high, between 1.0 and 2.0. The train and validation accuracies fluctuate a lot. Overall, the model outcome is not optimal.

We can see the model performance more clearly with images and predictions. Execute the following lines to make predictions with test data and plot the outcome:

predictions = model.predict(test_batches)
predictions = tf.nn.softmax(predictions)
labels = np.argmax(predictions, axis=1)

print(test_batches[0][1])
print(labels[0:4])

show(test_batches[0], labels[0:4])

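Note that Softmax is applied here to turn the raw model outputs into probabilities, and argmax then picks the most likely class so that we can compare it with the respective label. This is what the predictions look like: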
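
For intuition, here is the same softmax-then-argmax step in NumPy on two made-up rows of raw outputs. It also shows that argmax picks the same class with or without softmax, since softmax keeps the ordering of the scores; the softmax step mainly makes the numbers readable as probabilities.

```python
import numpy as np

# made-up raw outputs for two images and five classes
logits = np.array([[1.2, -0.3, 2.5, 0.1, 0.0],
                   [0.2, 3.1, -1.0, 0.4, 0.3]])

# softmax, row by row
exp = np.exp(logits - logits.max(axis=1, keepdims=True))
probs = exp / exp.sum(axis=1, keepdims=True)

labels_from_probs = np.argmax(probs, axis=1)
labels_from_logits = np.argmax(logits, axis=1)

print(labels_from_probs)   # [2 1]
print(labels_from_logits)  # [2 1]
```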

Figure 3: Original labels vs predicted labels of the basic CNN model, with a sample batch of test data.

As we can see, the model predicted all four sample test images incorrectly. It categorized most of the inputs as R2-D2, which suggests that the model has not learned to tell the classes apart on unseen images. So, how do we go about improving the model performance?

Comparison with a Transfer Learning Model

There are many things that we can do to improve the model. An example would be to adjust the hyperparameters. We will instead replace the model architecture with a transfer learning model. Let’s see if this can yield a better model performance.

Create a transfer learning model based on the VGG16 architecture:

vgg_model = tf.keras.applications.vgg16.VGG16()

model = keras.models.Sequential()
for layer in vgg_model.layers[0:-1]:
    model.add(layer)

for layer in model.layers:
    layer.trainable = False

Here, we download the pre-trained weights of the VGG16 model. We add all layers except the output layer to a new model and freeze their weights so that they are not updated during training. Then, we add an output layer to the model, as follows:

model.add(layers.Dense(5))

That’s it. Now we have a transfer learning model with a custom output layer. We only need to train the last layer while all other layers stay the same. Compile the model with the same configurations as the basic CNN model:
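
To get a feeling for how little is actually trained, here is the parameter arithmetic as a quick sketch. The total of 138,357,544 parameters and the 4096 units of the last hidden layer are the figures I know from the Keras VGG16 summary; treat them as assumptions and check them against model.summary() in your own notebook.

```python
# assumed figures for the full Keras VGG16 (with its 1000-class output layer)
VGG16_TOTAL_PARAMS = 138_357_544
FC2_UNITS = 4096  # size of the last hidden dense layer

# the original output layer that we left out: weights + biases
removed_head = FC2_UNITS * 1000 + 1000

# everything else is copied over and frozen
frozen = VGG16_TOTAL_PARAMS - removed_head

# our new Dense(5) output layer is the only trainable part
new_head = FC2_UNITS * 5 + 5

print(frozen, new_head)  # 134260544 20485
```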

loss = keras.losses.SparseCategoricalCrossentropy(from_logits=True)
optim = keras.optimizers.Adam(learning_rate=0.001)
metrics = ["accuracy"]

model.compile(optimizer=optim, loss=loss, metrics=metrics)

The VGG16 module from TensorFlow comes with its own pre-processing function. We use it for all the data generators:

preprocess_input = tf.keras.applications.vgg16.preprocess_input

train_gen = keras.preprocessing.image.ImageDataGenerator(preprocessing_function=preprocess_input)
valid_gen = keras.preprocessing.image.ImageDataGenerator(preprocessing_function=preprocess_input)
test_gen = keras.preprocessing.image.ImageDataGenerator(preprocessing_function=preprocess_input)
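
As far as I know, the VGG16 pre-processing converts the images from RGB to BGR and subtracts the ImageNet mean of each channel, without scaling the values to 0-1. The NumPy sketch below imitates that behavior for illustration only. It is not the library code, the function name is made up, and the mean values are stated from memory, so please verify them against the Keras documentation.

```python
import numpy as np

# assumed ImageNet channel means in BGR order
IMAGENET_BGR_MEANS = np.array([103.939, 116.779, 123.68], dtype=np.float32)


def vgg16_style_preprocess(rgb_images):
    """Flip RGB to BGR, then zero-center each channel (no rescaling)."""
    bgr = rgb_images[..., ::-1].astype(np.float32)
    return bgr - IMAGENET_BGR_MEANS


# a single pure-red pixel as a batch of shape (1, 1, 1, 3)
red_pixel = np.array([[[[255.0, 0.0, 0.0]]]])
print(vgg16_style_preprocess(red_pixel))
```

This is also why the new generators do not use rescale=1./255: the pre-trained weights expect inputs prepared in this particular way.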

Train the model using the same workflow we defined earlier, as follows:

target_size = (224, 224)
batch_size = 4

train_batches = train_gen.flow_from_directory(
    f'{BASE_DIR}train',
    target_size=target_size,
    class_mode='sparse',
    batch_size=batch_size,
    shuffle=True,
    color_mode="rgb",
    classes=names   
)

val_batches = valid_gen.flow_from_directory(
    f'{BASE_DIR}val',
    target_size=target_size,
    class_mode='sparse',
    batch_size=batch_size,
    shuffle=False,
    color_mode="rgb",
    classes=names
)

test_batches = test_gen.flow_from_directory(
    f'{BASE_DIR}test',
    target_size=target_size,
    class_mode='sparse',
    batch_size=batch_size,
    shuffle=False,
    color_mode="rgb",
    classes=names
)

epochs = 30

# callbacks
early_stopping = keras.callbacks.EarlyStopping(
    monitor="val_loss",
    patience=5,
    verbose=2)

history = model.fit(train_batches, 
                    validation_data=val_batches,
                    callbacks=[early_stopping],
                    epochs=epochs, verbose=2)

model.save(f"{BASE_DIR}lego_model_transfer-learning.h5")

Done with model training. Now we plot its losses and accuracies, and test the model with the same batch of test data:

plt.figure(figsize=(16, 6))
plt.subplot(1, 2, 1)
plt.plot(history.history['loss'], label='train loss')
plt.plot(history.history['val_loss'], label='valid loss')
plt.grid()
plt.legend(fontsize=15)

plt.subplot(1, 2, 2)
plt.plot(history.history['accuracy'], label='train acc')
plt.plot(history.history['val_accuracy'], label='valid acc')
plt.grid()
plt.legend(fontsize=15)

Figure 4: Losses and accuracies of the transfer learning CNN model.

model.evaluate(test_batches, verbose=2)

# make some predictions
predictions = model.predict(test_batches)
predictions = tf.nn.softmax(predictions)
labels = np.argmax(predictions, axis=1)

print(test_batches[0][1])
print(labels[0:4])

show(test_batches[0], labels[0:4])

Figure 5: Original labels vs predicted labels of the transfer learning CNN model.

As shown in Figure 5, the model got three out of four samples right. The training loss in Figure 4 looks better than that of the previous model. Both the train and validation accuracies are higher and fluctuate less. So we can say that the transfer learning model performs better than the basic CNN model. Note that both models can be further optimized, so do not take this code example as an end-all result.

Conclusion

Yay! We learned about image classification by implementing a basic CNN and a transfer learning CNN. I hope this was a fun learning process for you! If you encounter any issues and would like an in-depth walkthrough of the code, the video explanation is there to help you out. Happy learning!