Building Your First Image Classifier with TensorFlow and VGG16
Have you ever wondered how your phone’s gallery app can recognize faces or objects in pictures? This technology is known as image classification, and in this blog post, we’ll walk you through how to build your own image classifier.
We’ll teach a computer to recognize different types of flowers (daisies, tulips, roses, etc.). Don’t worry if you’re new to this! We’re in the same boat. I only started learning about Deep Learning recently; see my post: Why I Went Back to School for AI (Even With a Full Plate).
We’ll be using a powerful technique called transfer learning, which is like getting a head start from an expert.
Our Tools:
- TensorFlow & Keras: Open-source machine learning libraries that make building models straightforward. They provide tools that hide much of the complexity of working with neural networks.
- VGG16: A pre-trained model developed by researchers at Oxford. It’s already an expert at understanding general features in images, and we’ll adapt its knowledge for our flower dataset.
By the end of this post, you’ll have a clear understanding of how to prepare data, build and train a sophisticated model, and evaluate its performance. Let’s get started!
Step 1: Setting Up Our Workspace in Google Colab
First, we need to set up our environment and import the necessary tools. If you’re using Google Colab, the first step is to connect it to your Google Drive, where we’ll store our dataset.
# Mount Google Drive folder
import os
from google.colab import drive
DRIVE_FOLDER = "/drive" # Your drive folder may be different
drive.mount(DRIVE_FOLDER, force_remount=True)
# Import all the packages we'll need
import tensorflow as tf
import pathlib
import shutil
import random
import matplotlib.pyplot as plt
import numpy as np
from collections import Counter
from PIL import Image
from tensorflow.keras.callbacks import EarlyStopping
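Before moving on, it’s worth a quick check that Colab actually sees a GPU (Runtime > Change runtime type > GPU); training a VGG16-based model on CPU is painfully slow. A minimal, optional sanity check:
# Optional: confirm the TensorFlow version and GPU availability
print("TensorFlow version:", tf.__version__)
print("GPU available:", len(tf.config.list_physical_devices('GPU')) > 0)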
Step 2: Downloading the Flower Dataset
We’ll use a popular flower photo dataset provided by TensorFlow. The following code will download and extract it into a specific folder in your Google Drive. If it’s already there, it will skip the download.
# Configuration for our dataset
DATASET_URL = "http://download.tensorflow.org/example_images/flower_photos.tgz"
# Make sure this path points to where you want the data in your Drive
DATA_DIR = f"{DRIVE_FOLDER}/MyDrive/Colab Notebooks/Flower_classifier/flower_photos"
# Download and Extract Dataset into the specified path
def download_and_extract(url, path):
    if not os.path.exists(path):
        print(f"Downloading and extracting dataset to {path}...")
        # Use Keras's utility to get the file
        tf.keras.utils.get_file(
            origin=url,
            fname='flower_photos',  # name for the downloaded file
            extract=True,
            cache_dir='.',  # This ensures the files are extracted inside the desired directory
            cache_subdir=os.path.dirname(path)
        )
        print("Download and extraction complete.")
    else:
        print(f"Dataset already exists at {path}.")
# Execute the download
download_and_extract(DATASET_URL, DATA_DIR)
# Get a list of all image file paths
data_root = pathlib.Path(DATA_DIR)
all_image_paths = [str(p) for p in data_root.glob('*/*') if p.is_file()]
random.shuffle(all_image_paths)
print(f"\nFound {len(all_image_paths)} images.")
Our dataset contains five types of flowers. The folder names from the downloaded dataset are used as the labels. Let’s get these labels.
# The label names are the directory names
label_names = sorted(item.name for item in data_root.glob('*/') if item.is_dir())
print(f"Classes: {label_names}")
Step 3: Exploring Our Data
A crucial step in any machine learning project is to understand the data you’re working with.
First, let’s see how many images we have for each flower. A balanced dataset (where each class has a similar number of images) is ideal.
# Get the labels for each image path
all_image_labels = [pathlib.Path(path).parent.name for path in all_image_paths]
# Count the occurrences of each label
label_counts = Counter(all_image_labels)
# Display the counts per label
print("\nNumber of instances per label:")
for label, count in label_counts.items():
    print(f"{label}: {count}")
Next, let’s visualize a few samples from each class to get a feel for what they look like.
def display_sample_images_per_class(image_paths, label_names, num_samples=3):
    images_by_class = {name: [] for name in label_names}
    for img_path in image_paths:
        class_name = os.path.basename(os.path.dirname(img_path))
        if len(images_by_class[class_name]) < num_samples:
            images_by_class[class_name].append(img_path)
    plt.figure(figsize=(num_samples * 3, len(label_names) * 3))
    for i, class_name in enumerate(label_names):
        for j, img_path in enumerate(images_by_class[class_name]):
            ax = plt.subplot(len(label_names), num_samples, i * num_samples + j + 1)
            plt.imshow(Image.open(img_path))
            plt.title(f"{class_name}\nSample {j+1}")
            plt.axis("off")
    plt.tight_layout()
    plt.show()
# Display 3 samples for each flower type
display_sample_images_per_class(all_image_paths, label_names, num_samples=3)
(Sample images from each of the five flower classes. Source: TensorFlow.)
Step 4: Preparing Data for Training
A model needs to be trained on one set of data, validated on another, and finally tested on a “secret” test set it has never seen before. This ensures the model is truly learning and not just memorizing.
We’ll split our data:
- Training Set (80% of the images, minus the validation split below): The model learns from this data. As the quick arithmetic after this list shows, this works out to roughly 68% of all images actually being used for training.
- Validation Set (15% of that 80% training split): Used to tune the model during training.
- Test Set (the remaining 20%): Used for the final grade, to see how well the model generalizes.
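Here is that arithmetic as a tiny optional sketch, assuming the dataset’s roughly 3,670 images and ignoring the per-class rounding done by the real split function below:
# Rough expected split sizes, assuming ~3,670 images in total
total = 3670
train_pool = int(total * 0.8)   # ~2936 images (80%)
test = total - train_pool       # ~734 images (20%)
val = int(train_pool * 0.15)    # ~440 images (15% of the training pool)
train = train_pool - val        # ~2496 images, about 68% of everything
print(train, val, test)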
def split_image_dataset(image_paths, train_split=0.8, val_split_from_train=0.15):
    train_paths, test_paths = [], []
    class_paths = {label: [] for label in label_names}
    # Group paths by class to ensure balanced splits
    for path in image_paths:
        class_name = os.path.basename(os.path.dirname(path))
        class_paths[class_name].append(path)
    # Split each class group into train and test
    for class_name, paths in class_paths.items():
        random.shuffle(paths)
        num_train = int(len(paths) * train_split)
        train_paths.extend(paths[:num_train])
        test_paths.extend(paths[num_train:])
    random.shuffle(train_paths)
    random.shuffle(test_paths)
    # Further split training into training and validation
    num_val = int(len(train_paths) * val_split_from_train)
    val_paths = train_paths[:num_val]
    train_paths = train_paths[num_val:]
    return train_paths, val_paths, test_paths
train_image_paths, val_image_paths, test_image_paths = split_image_dataset(all_image_paths)
print(f"\nTotal images: {len(all_image_paths)}")
print(f"Training images: {len(train_image_paths)}")
print(f"Validation images: {len(val_image_paths)}")
print(f"Test images: {len(test_image_paths)}")
Creating a Data Pipeline
Now we’ll create an efficient data pipeline using tf.data.Dataset. This pipeline will:
- Read the images from their file paths.
- Decode them into pixel data.
- Resize all images to a consistent size.
- Normalize the pixel values from a range of [0, 255] to [0, 1]. This helps the model train more effectively.
- Convert our text labels (e.g., “daisy”) into numbers (e.g., 0).
- Group the data into batches.
TARGET_IMAGE_SIZE = (200, 200)
BATCH_SIZE = 8
# Create a lookup table to map string labels to integer indices
table = tf.lookup.StaticHashTable(
    tf.lookup.KeyValueTensorInitializer(
        keys=tf.constant(label_names),
        values=tf.constant(list(range(len(label_names))))
    ),
    default_value=-1
)

def preprocess_image(image_path):
    image = tf.io.read_file(image_path)
    image = tf.image.decode_jpeg(image, channels=3)
    image = tf.image.resize(image, TARGET_IMAGE_SIZE)
    image = tf.cast(image, tf.float32) / 255.0  # Normalize to [0, 1]
    return image

def get_label(image_path):
    parts = tf.strings.split(image_path, os.path.sep)
    class_name = parts[-2]  # The folder name is the class name
    return table.lookup(class_name)

def create_dataset(image_paths, batch_size, shuffle=True):
    path_ds = tf.data.Dataset.from_tensor_slices(image_paths)
    # Map the processing functions to our paths
    image_label_ds = path_ds.map(
        lambda x: (preprocess_image(x), get_label(x)),
        num_parallel_calls=tf.data.AUTOTUNE
    )
    if shuffle:
        image_label_ds = image_label_ds.shuffle(buffer_size=len(image_paths))
    # Batch the data and use prefetch for performance
    image_label_ds = image_label_ds.batch(batch_size).prefetch(tf.data.AUTOTUNE)
    return image_label_ds
train_ds = create_dataset(train_image_paths, BATCH_SIZE, shuffle=True)
val_ds = create_dataset(val_image_paths, BATCH_SIZE, shuffle=False)
test_ds = create_dataset(test_image_paths, BATCH_SIZE, shuffle=False)
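Before building any models, it’s a good habit to pull a single batch out of the pipeline and confirm that the shapes and pixel range look right. A quick optional check:
# Optional sanity check: inspect one batch from the training pipeline
for images, labels in train_ds.take(1):
    print("Image batch shape:", images.shape)  # expect (8, 200, 200, 3)
    print("Label batch shape:", labels.shape)  # expect (8,)
    print("Pixel range:", float(tf.reduce_min(images)), "to", float(tf.reduce_max(images)))
    print("Labels in this batch:", labels.numpy())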
Step 5: The Fun Part — Building Our Models!
We will build and train our model in three stages to see how different transfer learning strategies affect performance.
Loading the VGG16 Base Model
First, we load our expert model, VGG16. We chop off its original classification layer (include_top=False) because we only need its feature-finding brain, not its original knowledge of 1,000 different classes.
IMG_SHAPE = TARGET_IMAGE_SIZE + (3,)
# Load the pre-trained VGG16 model
base_model = tf.keras.applications.VGG16(
input_shape=IMG_SHAPE,
include_top=False, # Don't include the final classification layer
weights='imagenet' # Use weights pre-trained on ImageNet
)
# Freeze the base model's layers so we don't change them initially
base_model.trainable = False
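With trainable set to False, Keras reports no trainable weights for the base, which is an easy way to confirm the freeze worked:
# Confirm the VGG16 base is frozen
print("Trainable weights in base model:", len(base_model.trainable_weights))  # expect 0
print("Layers in base model:", len(base_model.layers))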
Approach 1: Building a Strong Baseline with Feature Extraction
In our first approach, we use the VGG16 model as a fixed feature extractor. We’ll freeze all its layers and only train a new, custom classification “head” that we add on top.
The Strategy:
- Add a GlobalAveragePooling2D layer to summarize the features extracted by VGG16.
- Add a couple of Dense layers to learn patterns specific to our flowers.
- Use BatchNormalization to stabilize training and Dropout to prevent the model from simply memorizing the training images (a technique called regularization).
***A Quick Note:*** When you run the code below, your results might be slightly different from the numbers you see here. This is perfectly normal! Machine learning models have a degree of randomness (like how the data is shuffled or how the model’s initial weights are set). The final accuracy should be close, but don’t worry if it doesn’t match exactly.
# Create our custom classification head
model_baseline = tf.keras.Sequential([
    base_model,  # The frozen VGG16 base
    tf.keras.layers.GlobalAveragePooling2D(),
    # Add our own classification layers
    tf.keras.layers.Dense(512, activation='relu'),
    tf.keras.layers.BatchNormalization(),  # Helps stabilize training
    tf.keras.layers.Dropout(0.7),  # Prevents overfitting
    tf.keras.layers.Dense(256, activation='relu'),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dropout(0.5),
    # Final output layer: 5 neurons for 5 flower classes
    tf.keras.layers.Dense(len(label_names), activation='softmax')
])

# Compile the model
# A lower learning rate is good when adding layers on a pre-trained base
model_baseline.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.0001),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(),
    metrics=['accuracy']
)

model_baseline.summary()

# Train the model
print("\n--- Training the Baseline Model (VGG16 Frozen) ---")

# Stop training early if performance on the validation set stops improving
early_stopping = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)

history_baseline = model_baseline.fit(
    train_ds,
    epochs=50,
    validation_data=val_ds,
    callbacks=[early_stopping]
)
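fit() returns a History object, so before scoring the model we can optionally plot the learning curves. A training accuracy that keeps climbing while validation accuracy stalls is the classic sign of overfitting:
# Optional: plot training vs. validation accuracy for the baseline model
plt.figure(figsize=(8, 4))
plt.plot(history_baseline.history['accuracy'], label='train accuracy')
plt.plot(history_baseline.history['val_accuracy'], label='validation accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.show()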
After training, we evaluate our baseline model on the unseen test data.
# Evaluate the baseline model on the test dataset
loss, accuracy = model_baseline.evaluate(test_ds)
print(f"Baseline Model Test Accuracy: {accuracy:.2%}")
This model should achieve around 84–85% accuracy. This is a very strong start! By just training a few small layers, we’ve built a competent classifier, all thanks to the powerful, pre-trained features from VGG16. The output from my run:
Test Loss: 0.46328628063201904, Test Accuracy: 0.8478260636329651
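Aggregate accuracy is useful, but it’s also fun to see individual predictions. This small optional sketch runs one test batch through the model and maps the predicted indices back to flower names:
# Optional: inspect predictions for a single test batch
for images, labels in test_ds.take(1):
    probs = model_baseline.predict(images)
    predicted = np.argmax(probs, axis=1)
    for true_idx, pred_idx in zip(labels.numpy(), predicted):
        print(f"true: {label_names[true_idx]:<10} predicted: {label_names[pred_idx]}")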
Approach 2: Improving with Partial Fine-Tuning
For our second approach, we’ll try to improve performance by unfreezing the last few layers of the VGG16 model. The idea is that the earlier layers of VGG16 learn general features like edges and colors, while the later layers learn more complex patterns. By unfreezing those later layers, we allow them to adapt to the specific features of our flowers.
The Strategy:
- Keep the same architecture.
- Unfreeze the layers in the final convolutional block of VGG16 (block5). If you want to see exactly which layers those are, the optional snippet below lists the layer names.
- Use a very low learning rate. This is crucial! We want to gently nudge the existing expert weights, not erase them.
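Here is that optional peek; the last few entries should be the block5 layers (block5_conv1 through block5_pool), which is what the fine_tune_at index in the code below targets:
# Optional: list the VGG16 layers so you can see which ones belong to block5
for i, layer in enumerate(base_model.layers):
    print(i, layer.name)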
# Unfreeze the base model to allow fine-tuning
base_model.trainable = True
# Unfreeze just the last few layers (Block 5)
fine_tune_at = len(base_model.layers) - 4
for layer in base_model.layers[:fine_tune_at]:
    layer.trainable = False
# We'll build a new model for this approach
# For simplicity, we define the architecture again
model_fine_tuned = tf.keras.Sequential([
    base_model,  # Partially unfrozen base
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(512, activation='relu'),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dropout(0.7),
    tf.keras.layers.Dense(256, activation='relu'),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(len(label_names), activation='softmax')
])

# Compile with a very low learning rate for fine-tuning
model_fine_tuned.compile(
    loss=tf.keras.losses.SparseCategoricalCrossentropy(),
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.00001),  # 10x smaller
    metrics=['accuracy']
)
# Train the model
print("\n--- Training the Partially Fine-Tuned Model (Unfreeze Block 5) ---")
history_fine_tuned = model_fine_tuned.fit(
    train_ds,
    epochs=50,
    validation_data=val_ds,
    callbacks=[early_stopping]
)
Let’s see the result:
# Evaluate the fine-tuned model
loss_fine_tuned, accuracy_fine_tuned = model_fine_tuned.evaluate(test_ds)
print(f"Partially Fine-Tuned Model Test Accuracy: {accuracy_fine_tuned:.2%}")
You should see a nice bump in performance, likely to around 87–88% accuracy. By allowing the model to adjust its most specialized layers, we gained a few percentage points. This confirms that adapting the pre-trained weights to our specific dataset is beneficial. The output from my run:
Test Loss: 0.3447573482990265, Test Accuracy: 0.873641312122345
Approach 3: Maximum Performance with Full Fine-Tuning
For our final approach, let’s go all out and unfreeze the entire VGG16 base model. This gives the model maximum flexibility to adjust all its learned features to our flower dataset.
The Strategy:
- Keep the same architecture.
- Unfreeze all layers of the base_model.
- Use an extremely low learning rate to avoid destroying the valuable pre-trained knowledge.
# Unfreeze the entire base model
base_model.trainable = True
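# Optional sanity check: with the whole base unfrozen, we are now updating
# roughly 14.7 million VGG16 parameters (plus our custom head), which is why
# the tiny learning rate used below matters so much.
trainable_params = sum(int(tf.size(w)) for w in base_model.trainable_weights)
print(f"Trainable parameters in the VGG16 base: {trainable_params:,}")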
# We'll build one more model for this final approach
model_full_tune = tf.keras.Sequential([
    base_model,  # Fully unfrozen base
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(512, activation='relu'),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dropout(0.7),
    tf.keras.layers.Dense(256, activation='relu'),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(len(label_names), activation='softmax')
])

# Compile with an even lower learning rate
model_full_tune.compile(
    loss=tf.keras.losses.SparseCategoricalCrossentropy(),
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.000001),  # 10x smaller again!
    metrics=['accuracy']
)
# Train the model
print("\n--- Training the Fully Fine-Tuned Model (Unfreeze All) ---")
history_full_tune = model_full_tune.fit(
    train_ds,
    epochs=80,  # Allow more epochs for this delicate tuning
    validation_data=val_ds,
    callbacks=[early_stopping]
)
And the final result:
# Evaluate the fully fine-tuned model
loss_full_tune, accuracy_full_tune = model_full_tune.evaluate(test_ds)
print(f"Fully Fine-Tuned Model Test Accuracy: {accuracy_full_tune:.2%}")
This model should achieve the best result yet, likely reaching 88–89% accuracy. By giving the model the most freedom to adapt, while being very careful with a tiny learning rate, we squeezed out the best performance. The output from my run:
Test Loss: 0.34380343556404114, Test Accuracy: 0.885869562625885
Final Thoughts: Which Strategy Is Right for You?
Let’s quickly recap our journey and the results.
Baseline (Frozen)
- Typical Test Accuracy: ~85%
- **When to Use It:** **Fast and reliable.** Great for getting a strong baseline quickly, especially with smaller datasets or limited computing power.
Partial Fine-Tune
- Typical Test Accuracy: ~87%
- **When to Use It:** **The balanced approach.** Often gives you the best bang for your buck, improving accuracy without the high risk and cost of full tuning.
Full Fine-Tune
- Typical Test Accuracy: ~89%
- **When to Use It:** **For maximum performance.** The best option if you need the highest possible accuracy and have enough data and time to tune it carefully.
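Since all three test accuracies are still sitting in variables from the evaluate() calls above, a few lines of code turn this recap into a side-by-side comparison (an optional sketch reusing those variables):
# Optional: print a side-by-side comparison of the three approaches
results = {
    "Baseline (frozen)": accuracy,
    "Partial fine-tune": accuracy_fine_tuned,
    "Full fine-tune": accuracy_full_tune,
}
for name, acc in results.items():
    print(f"{name:<20} {acc:.2%}")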
This process clearly illustrates the power of transfer learning. We started with a strong baseline and iteratively improved it by giving the model more freedom to adapt to our specific problem.
For your projects, starting with a frozen baseline (Approach 1) is always a great first step. If you need more performance, you can then move on to partial or full fine-tuning, always remembering to use a low learning rate!
Congratulations! You’ve just walked through the entire process of building and fine-tuning a powerful deep learning model for image classification.