Cat Classification with Logistic Regression¶
Introduction¶
This post walks through implementing a logistic regression model that classifies images as cat or non-cat, with the goal of accurately predicting whether a given image contains a cat.
Importing Libraries¶
First, we need to import the necessary Python libraries:
# coding=utf-8
import matplotlib.pyplot as plt
import numpy as np
from PIL import Image  # Pillow, used later to load and resize custom images
from lr_utils import load_dataset # Custom utility to load the cat dataset
Loading and Preprocessing Data¶
The dataset consists of labeled training and test images (cat / non-cat). Before training we flatten each image into a column vector and scale the pixel values to the [0, 1] range:
# Load the dataset
train_set_x_orig, train_set_y, test_set_x_orig, test_set_y, classes = load_dataset()
# Get dimensions of the dataset
m_train = train_set_x_orig.shape[0] # Number of training examples
m_test = test_set_x_orig.shape[0] # Number of test examples
num_px = train_set_x_orig.shape[1] # Height/width of each image (64x64)
# Flatten each image: reshape from (m, num_px, num_px, 3) to (num_px*num_px*3, m)
train_set_x_flatten = train_set_x_orig.reshape(train_set_x_orig.shape[0], -1).T
test_set_x_flatten = test_set_x_orig.reshape(test_set_x_orig.shape[0], -1).T
# Normalize the data by dividing by 255 (pixel values range from 0-255)
train_set_x = train_set_x_flatten / 255.
test_set_x = test_set_x_flatten / 255.
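As a quick sanity check, we can print the array shapes before and after flattening (the number of examples depends on the dataset files that lr_utils loads):
# Sanity check on the preprocessing; shapes depend on the dataset lr_utils provides
print("train_set_x_orig shape:", train_set_x_orig.shape)  # (m_train, num_px, num_px, 3)
print("train_set_x shape:", train_set_x.shape)            # (num_px*num_px*3, m_train)
print("train_set_y shape:", train_set_y.shape)            # (1, m_train)
print("test_set_x shape:", test_set_x.shape)              # (num_px*num_px*3, m_test)
print("test_set_y shape:", test_set_y.shape)              # (1, m_test)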
General Architecture of the Learning Algorithm¶
The logistic regression model follows these steps:
1. Define the model structure (number of input features)
2. Initialize model parameters
3. Loop:
- Compute current loss (forward propagation)
- Compute current gradients (backward propagation)
- Update parameters (gradient descent)
1. Define the Sigmoid Function¶
The sigmoid function maps any real number to a value between 0 and 1:
def sigmoid(x):
"""
Compute the sigmoid of x
:param x: A scalar or numpy array of any size
:return: s -- sigmoid(x)
"""
s = 1 / (1 + np.exp(-x))
return s
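A quick check on a couple of inputs confirms the expected behaviour:
print(sigmoid(np.array([0, 2])))  # -> [0.5        0.88079708]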
2. Initialize Parameters¶
We initialize weights to zeros and bias to zero:
def initialize_with_zeros(dim):
"""
Initialize w as a zero vector of shape (dim, 1) and b as 0
:param dim: The size of the w vector
:return: w -- initialized vector of shape (dim, 1)
b -- initialized scalar (0)
"""
w = np.zeros((dim, 1))
b = 0
return w, b
3. Propagate (Forward and Backward Propagation)¶
This function computes the cost and the gradients in a single forward/backward pass.
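With the activations $A = \sigma(w^T X + b)$ (one value $a^{(i)}$ per example), the cross-entropy cost and its gradients are

$$J(w, b) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log a^{(i)} + (1 - y^{(i)})\log(1 - a^{(i)})\right]$$

$$dw = \frac{1}{m} X (A - Y)^T, \qquad db = \frac{1}{m}\sum_{i=1}^{m}\left(a^{(i)} - y^{(i)}\right)$$

which is exactly what the code below computes: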
def propagate(w, b, X, Y):
"""
Implement forward and backward propagation for the logistic regression cost function
:param w: Weights, shape (num_px*num_px*3, 1)
:param b: Bias, scalar
:param X: Data, shape (num_px*num_px*3, number of examples)
:param Y: True labels, shape (1, number of examples)
:return: grads -- dictionary containing dw and db
cost -- negative log-likelihood cost
"""
m = X.shape[1] # Number of examples
# Forward propagation
A = sigmoid(np.dot(w.T, X) + b) # Activation
    cost = -(np.dot(Y, np.log(A).T) + np.dot(1 - Y, np.log(1 - A).T)) / m  # Cross-entropy cost
    cost = float(np.squeeze(cost))  # convert the (1, 1) array to a plain float for logging
# Backward propagation
dw = np.dot(X, (A - Y).T) / m
db = np.sum(A - Y) / m
grads = {"dw": dw, "db": db}
return grads, cost
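As an optional sanity check (not part of the original code), the analytic gradients returned by propagate can be compared against numerical finite-difference estimates on a tiny made-up input; the two should agree to several decimal places:
# Gradient check on arbitrary toy inputs (illustrative values only)
w_t = np.array([[1.0], [2.0]])
b_t = 2.0
X_t = np.array([[1.0, 2.0], [3.0, 4.0]])
Y_t = np.array([[1.0, 0.0]])
grads_t, cost_t = propagate(w_t, b_t, X_t, Y_t)

eps = 1e-7
dw_num = np.zeros_like(w_t)
for j in range(w_t.shape[0]):
    w_plus, w_minus = w_t.copy(), w_t.copy()
    w_plus[j] += eps
    w_minus[j] -= eps
    # Central-difference approximation of dJ/dw_j
    dw_num[j] = (propagate(w_plus, b_t, X_t, Y_t)[1] - propagate(w_minus, b_t, X_t, Y_t)[1]) / (2 * eps)

print(np.allclose(grads_t["dw"], dw_num, atol=1e-5))  # expected: True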
4. Gradient Descent Optimization¶
We optimize the parameters with gradient descent.
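Each iteration applies the standard update rule

$$w := w - \alpha \, dw, \qquad b := b - \alpha \, db$$

where $\alpha$ is the learning rate: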
def optimize(w, b, X, Y, num_iterations, learning_rate, print_cost=False):
"""
Optimize w and b by running a gradient descent algorithm
:param w: Initial weights
:param b: Initial bias
:param X: Data
:param Y: True labels
:param num_iterations: Number of iterations of optimization loop
:param learning_rate: Learning rate of the gradient descent update rule
:param print_cost: If True, print the cost every 100 iterations
:return: params -- dictionary containing weights w and bias b
grads -- dictionary containing gradients dw and db
costs -- list of costs recorded during optimization
"""
costs = []
for i in range(num_iterations):
# Run forward/backward propagation
grads, cost = propagate(w, b, X, Y)
# Retrieve gradients
dw = grads["dw"]
db = grads["db"]
# Update parameters
w = w - learning_rate * dw
b = b - learning_rate * db
# Record cost every 100 iterations
        # Record (and optionally print) the cost every 100 iterations
        if i % 100 == 0:
            costs.append(cost)
            if print_cost:
                print(f"Cost after iteration {i}: {cost:.6f}")
params = {"w": w, "b": b}
grads = {"dw": dw, "db": db}
return params, grads, costs
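Reusing the toy inputs from the gradient check above, a short run of optimize should drive the cost down; this is only an illustrative smoke test:
# Smoke test: the final cost should be lower than the initial cost on the toy inputs
params_t, grads_t, costs_t = optimize(w_t, b_t, X_t, Y_t, num_iterations=100, learning_rate=0.009)
final_cost = propagate(params_t["w"], params_t["b"], X_t, Y_t)[1]
print(final_cost < costs_t[0])  # expected: True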
5. Prediction Function¶
Once we have optimized parameters, we use them to make predictions:
def predict(w, b, X):
"""
Predict whether the label is 0 or 1 using learned logistic regression parameters (w, b)
:param w: Weights
:param b: Bias
:param X: Data to predict
:return: Y_prediction -- a numpy array (vector) containing predictions (0/1)
"""
m = X.shape[1]
Y_prediction = np.zeros((1, m))
w = w.reshape(X.shape[0], 1)
A = sigmoid(np.dot(w.T, X) + b)
for i in range(A.shape[1]):
Y_prediction[0, i] = 1 if A[0, i] > 0.5 else 0
return Y_prediction
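A tiny made-up example shows the 0.5 threshold in action:
# Toy prediction: two features, two examples (illustrative values only)
w_t = np.array([[1.0], [1.0]])
b_t = 0.0
X_t = np.array([[2.0, -2.0], [1.0, -1.0]])
print(predict(w_t, b_t, X_t))  # -> [[1. 0.]]  since sigmoid(3) > 0.5 and sigmoid(-3) < 0.5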
6. Combine All Functions into a Model¶
This function integrates all the previous components:
def model(X_train, Y_train, X_test, Y_test, num_iterations=2000, learning_rate=0.5, print_cost=False):
"""
Build the logistic regression model by calling the helper functions
:param X_train: Training data
:param Y_train: Training labels
:param X_test: Test data
:param Y_test: Test labels
:param num_iterations: Number of iterations for gradient descent
:param learning_rate: Learning rate
:param print_cost: If True, print cost every 100 iterations
:return: d -- dictionary containing information about the model
"""
# Initialize parameters
w, b = initialize_with_zeros(X_train.shape[0])
# Optimize parameters
parameters, grads, costs = optimize(w, b, X_train, Y_train, num_iterations, learning_rate, print_cost)
# Retrieve optimized parameters
w = parameters["w"]
b = parameters["b"]
# Predict on test and training sets
Y_prediction_test = predict(w, b, X_test)
Y_prediction_train = predict(w, b, X_train)
# Calculate accuracy
train_accuracy = 100 - np.mean(np.abs(Y_prediction_train - Y_train)) * 100
test_accuracy = 100 - np.mean(np.abs(Y_prediction_test - Y_test)) * 100
print(f"train accuracy: {train_accuracy:.2f} %")
print(f"test accuracy: {test_accuracy:.2f} %")
d = {
"costs": costs,
"Y_prediction_test": Y_prediction_test,
"Y_prediction_train": Y_prediction_train,
"w": w,
"b": b,
"learning_rate": learning_rate,
"num_iterations": num_iterations
}
return d
Testing Different Learning Rates¶
The choice of learning rate significantly affects convergence:
def test_learning_rates():
learning_rates = [0.01, 0.001, 0.0001]
models = {}
for lr in learning_rates:
print(f"Learning rate: {lr}")
models[str(lr)] = model(
train_set_x, train_set_y, test_set_x, test_set_y,
num_iterations=1500, learning_rate=lr, print_cost=False
)
print("\n" + "="*50 + "\n")
# Plot cost vs iterations for different learning rates
for lr in learning_rates:
plt.plot(
np.squeeze(models[str(lr)]["costs"]),
label=f"LR={lr}"
)
plt.ylabel("Cost")
plt.xlabel("Iterations (hundreds)")
plt.legend(loc="upper right")
plt.show()
Predicting Custom Images¶
To test with your own images:
def predict_custom_image(image_path, model_info):
"""
Predict if a custom image contains a cat
:param image_path: Path to the image
:param model_info: Dictionary containing model parameters
"""
    # scipy.ndimage.imread and scipy.misc.imresize were removed from recent SciPy
    # releases, so we load and resize the image with Pillow instead
    image = np.array(Image.open(image_path).convert("RGB").resize((num_px, num_px)))
    plt.imshow(image)
    plt.show()
    # Flatten and normalize the image the same way as the training data
    x = image.reshape((1, num_px * num_px * 3)).T / 255.
    prediction = predict(model_info["w"], model_info["b"], x)
class_name = classes[int(np.squeeze(prediction))].decode("utf-8")
print(f"Prediction: {class_name}")
Main Execution¶
if __name__ == "__main__":
# Train the model
model_info = model(
train_set_x, train_set_y, test_set_x, test_set_y,
num_iterations=1000, learning_rate=0.005, print_cost=True
)
# Test with custom image (uncomment to use)
# predict_custom_image("images/cat.jpg", model_info)
# Test different learning rates (uncomment to use)
# test_learning_rates()
Expected Output¶
When running the main function, you should see:
- Cost values decreasing over iterations
- Training and test accuracy metrics
- Visualization of cost vs iterations for different learning rates (if tested)
References¶
- Deep Learning Specialization by Andrew Ng
- Custom lr_utils library for dataset loading
Notes¶
This implementation is a basic logistic regression approach for binary classification. For better performance, you might consider using neural networks or optimizing the learning rate further. The code provided assumes the lr_utils library is available to load the cat dataset.
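For completeness, here is a minimal sketch of what lr_utils.load_dataset might look like. The file names (datasets/train_catvnoncat.h5, datasets/test_catvnoncat.h5) and the HDF5 keys below are assumptions about how the dataset is packaged, not confirmed details; adjust them to match your copy.
# Hypothetical sketch of lr_utils.load_dataset -- file paths and HDF5 keys are assumed
import h5py
import numpy as np

def load_dataset():
    with h5py.File("datasets/train_catvnoncat.h5", "r") as train_file:
        train_set_x_orig = np.array(train_file["train_set_x"][:])  # training images
        train_set_y_orig = np.array(train_file["train_set_y"][:])  # training labels
    with h5py.File("datasets/test_catvnoncat.h5", "r") as test_file:
        test_set_x_orig = np.array(test_file["test_set_x"][:])     # test images
        test_set_y_orig = np.array(test_file["test_set_y"][:])     # test labels
        classes = np.array(test_file["list_classes"][:])           # e.g. [b'non-cat', b'cat']
    # Reshape the label vectors to row vectors of shape (1, m)
    train_set_y = train_set_y_orig.reshape((1, train_set_y_orig.shape[0]))
    test_set_y = test_set_y_orig.reshape((1, test_set_y_orig.shape[0]))
    return train_set_x_orig, train_set_y, test_set_x_orig, test_set_y, classes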
Explanation of Key Concepts¶
- Sigmoid Function: Maps any value to a probability between 0 and 1
- Gradient Descent: Minimizes the cost function by updating parameters
- Cost Function: Measures the error between predicted and actual values
- Propagation: Combines forward (prediction) and backward (gradient calculation) steps
- Hyperparameters: Settings chosen before training, such as the learning rate and the number of gradient-descent iterations
This implementation provides a solid foundation for understanding logistic regression and binary classification.