5 Best Ways to Preprocess Fashion MNIST Data in Python Using TensorFlow


💡 Problem Formulation: The Fashion MNIST dataset is a collection of 28×28 grayscale images of 10 fashion categories, often used for benchmarking machine learning algorithms. The preprocessing goal is to convert these images into a suitable format for training models, enhancing features, and improving network performance. Input consists of raw image data, and output is structured and normalized data tensors ready for model ingestion.
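Before applying any of the methods below, it helps to confirm what the raw data looks like. A minimal inspection sketch (the shapes in the comments are the dataset’s documented dimensions):

import tensorflow as tf
from tensorflow.keras.datasets import fashion_mnist

# Load the raw data once to inspect shapes, dtype, and label format
(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()

print(train_images.shape)  # (60000, 28, 28)
print(test_images.shape)   # (10000, 28, 28)
print(train_images.dtype)  # uint8, raw pixel intensities in 0-255
print(train_labels[:5])    # integer class IDs in 0-9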

Method 1: Normalizing Image Data

Normalizing image pixel values is crucial for convergence during training. TensorFlow offers utilities to scale the image data from the range of 0-255 to 0-1, which is often required for neural network inputs. This speeds up training because gradient descent converges faster and more stably when all inputs share a small, consistent scale.

Here’s an example:

import tensorflow as tf
from tensorflow.keras.datasets import fashion_mnist

(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()
# Scale pixel intensities from the uint8 range [0, 255] down to [0, 1]
train_images = train_images / 255.0
test_images = test_images / 255.0

Output: Normalized image tensors with values ranging from 0 to 1.

This code snippet demonstrates how to load the Fashion MNIST dataset and normalize the image pixels by dividing by the maximum value of the pixel intensity, 255. The ‘train_images’ and ‘test_images’ are arrays of grayscale image data that are now scaled between 0 and 1.
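If memory is a concern, a common variant is to cast the arrays to float32 (Keras’ default compute dtype) before scaling; note that this replaces the division above rather than following it, and is an optional tweak rather than part of the method itself:

import numpy as np

# Casting first keeps the result in float32; dividing a uint8 array by a
# Python float otherwise yields float64, doubling memory use
train_images = train_images.astype(np.float32) / 255.0
test_images = test_images.astype(np.float32) / 255.0
print(train_images.dtype, train_images.min(), train_images.max())  # float32 0.0 1.0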

Method 2: Reshaping Data for Convolutional Layers

Convolutional Neural Networks (CNNs) expect image inputs with an explicit channel dimension, even for grayscale images where that dimension is just 1. TensorFlow and Keras can reshape the dataset to include this channel dimension, which enables the use of CNNs for feature extraction.

Here’s an example:

# Add an explicit single-channel axis: (num_samples, 28, 28) -> (num_samples, 28, 28, 1)
train_images = train_images.reshape(train_images.shape[0], 28, 28, 1)
test_images = test_images.reshape(test_images.shape[0], 28, 28, 1)

Output: Reshaped image tensors suitable for convolutional networks, with an added channel dimension.

This snippet reshapes the input image tensors to include a channel dimension, transforming them from shape (60000, 28, 28) to (60000, 28, 28, 1) for the training set, making them suitable for CNNs.
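An equivalent alternative, assuming the arrays still have shape (num_samples, 28, 28), is np.expand_dims, which appends the channel axis without hard-coding the sample count:

import numpy as np

# Append a trailing channel axis: (num_samples, 28, 28) -> (num_samples, 28, 28, 1)
train_images = np.expand_dims(train_images, axis=-1)
test_images = np.expand_dims(test_images, axis=-1)
print(train_images.shape)  # (60000, 28, 28, 1)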

Method 3: Data Augmentation

Data augmentation is a technique to artificially expand the dataset by applying random transformations to the images, such as rotation, shifting, or flipping. TensorFlow’s ImageDataGenerator can be used to perform these operations on-the-fly during training, increasing the model’s robustness and reducing overfitting.

Here’s an example:

from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    rotation_range=10,
    width_shift_range=0.1,
    height_shift_range=0.1,
    horizontal_flip=True
)

# fit() only computes dataset statistics, needed for options such as
# featurewise_center; the random transforms are applied when flow() draws batches
datagen.fit(train_images)

Output: A configured ImageDataGenerator whose flow() method yields batches of augmented image data.

In the provided code, ImageDataGenerator augments the training images by randomly rotating, shifting, and flipping them. The model will learn from this varied data, which simulates different ways the fashion items might be seen in real-life scenarios.
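Note that datagen.fit() by itself does not train anything; the augmented batches come from datagen.flow(), which Keras models accept directly. A minimal training sketch, assuming the reshaped images from Method 2 and integer labels (the small CNN here is purely illustrative and not part of the original method):

from tensorflow.keras import layers, models

# Illustrative classifier; any Keras model taking (28, 28, 1) inputs works
model = models.Sequential([
    layers.Conv2D(32, 3, activation="relu", input_shape=(28, 28, 1)),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",  # integer labels at this stage
              metrics=["accuracy"])

# flow() yields freshly transformed batches on-the-fly during each epoch
model.fit(datagen.flow(train_images, train_labels, batch_size=32), epochs=5)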

Method 4: Encoding Categorical Labels

Many neural network architectures require labels to be in a one-hot encoded format. TensorFlow’s tf.keras.utils module contains the to_categorical function, which converts a class vector (integers) to a binary class matrix.

Here’s an example:

from tensorflow.keras.utils import to_categorical

# Convert integer labels 0-9 into rows of a 10-column binary matrix
train_labels_one_hot = to_categorical(train_labels)
test_labels_one_hot = to_categorical(test_labels)

Output: One-hot encoded label tensors for training and test datasets.

The code converts integer labels, such as 2 for the ‘Pullover’ category, into a binary matrix in which the index corresponding to 2 is set to 1 and all other entries are 0. This format is required for classification models trained with the categorical crossentropy loss (integer labels can instead be paired with sparse categorical crossentropy).
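Decoding goes the other way with a single argmax, which is also how softmax predictions are read. The class names below follow the standard Fashion MNIST label order:

import numpy as np

class_names = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
               "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]

# Recover integer labels from one-hot rows, then map them to names
decoded = np.argmax(train_labels_one_hot, axis=1)
print(decoded[0], class_names[decoded[0]])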

Bonus One-Liner Method 5: Using tf.data for Efficient Data Pipelining

TensorFlow’s tf.data API is designed to build complex input pipelines from simple, reusable pieces. It allows you to efficiently manage large datasets by leveraging prefetching, parallel processing, and optimized data formats.

Here’s an example:

# Slice, batch, and prefetch so data preparation overlaps with model execution
train_data = tf.data.Dataset.from_tensor_slices((train_images, train_labels_one_hot)).batch(32).prefetch(tf.data.AUTOTUNE)
test_data = tf.data.Dataset.from_tensor_slices((test_images, test_labels_one_hot)).batch(32).prefetch(tf.data.AUTOTUNE)

Output: A tf.data.Dataset object that represents an input pipeline for the training and test data.

This code utilizes the tf.data API to create a dataset pipeline that can be fed directly into a TensorFlow model. By batching the data and prefetching the next batch while the current one is being processed, the model trains more efficiently because data preparation overlaps with model execution.
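The resulting pipelines plug straight into Keras training and evaluation, as in the sketch below; model is assumed to be a compiled classifier whose loss matches the one-hot labels (e.g. categorical crossentropy). For the training set, adding a .shuffle(...) call before .batch(32) is a common refinement so samples are reordered each epoch.

# tf.data.Dataset objects are accepted directly by Keras training APIs
model.fit(train_data, epochs=5, validation_data=test_data)
model.evaluate(test_data)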

Summary/Discussion

  • Method 1: Normalizing Image Data. Essential for model training. Ensures faster convergence. However, it does not introduce additional information to the model.
  • Method 2: Reshaping for CNNs. Enables the use of powerful CNN architectures. It’s a simple, one-off preprocessing step but requires knowledge of the desired input shape for the specific model architecture.
  • Method 3: Data Augmentation. Increases model generalization by extending the dataset with transformed images. Can be computationally expensive and may introduce artifacts if not configured correctly.
  • Method 4: Encoding Categorical Labels. Transforms labels into a format for classification models with softmax activation. Straightforward process but adds an extra step in both preprocessing and postprocessing stages (decoding one-hot).
  • Method 5: Using tf.data. Enhances data loading efficiency and is scalable. Can be complex to configure for optimal performance across different hardware setups.