1. Why Loss Functions and Training Loops Are Needed
In machine learning, the goal of a model is to “learn” patterns in data, and loss functions and training loops are the core tools to achieve this.
- Loss Function: Measures the gap between a model’s predictions and the true labels, acting as an “error score” for the model. A smaller loss means the predictions are closer to the truth, indicating a better model.
- Training Loop: Adjusts model parameters to continuously reduce loss, similar to a student correcting answers based on mistakes (loss) to improve scores (model accuracy).
2. Loss Functions: The “Gap Measure” of Models
2.1 What is a Loss Function?
A loss function is a mathematical formula that takes a model’s predictions (e.g., classification probabilities or regression values) and the true labels, and outputs a non-negative “gap value.” PyTorch provides pre-defined loss functions tailored to different tasks (classification/regression).
2.2 Common Loss Functions (Must-Know for Beginners)
- MSE Loss (Mean Squared Error)
  - Use Case: Regression tasks (predicting continuous values like house prices or temperature).
  - Formula: \(L = \frac{1}{n}\sum_{i=1}^{n}(y_{\text{pred},i} - y_{\text{true},i})^2\)
  - PyTorch Call: `loss_fn = torch.nn.MSELoss()`
  - Example: Measures the average squared difference between predicted and actual house prices.
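To make the formula concrete, here is a minimal sketch (with made-up numbers) showing that `torch.nn.MSELoss` reproduces the hand-computed mean of squared differences:

```python
import torch
import torch.nn as nn

# Made-up predictions and targets for 4 samples (illustration only)
y_pred = torch.tensor([2.5, 0.0, 2.0, 8.0])
y_true = torch.tensor([3.0, -0.5, 2.0, 7.0])

loss_fn = nn.MSELoss()
loss = loss_fn(y_pred, y_true)

# The same value computed directly from the formula: mean of squared differences
manual = ((y_pred - y_true) ** 2).mean()
print(loss.item(), manual.item())  # both print 0.375
```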
- CrossEntropy Loss
  - Use Case: Classification tasks (predicting discrete categories like “cat/dog” or “0-9 digits”).
  - Formula: \(L = -\frac{1}{n}\sum_{i=1}^{n}\sum_{c=1}^{C}y_{\text{true},i,c} \log(y_{\text{pred},i,c})\)
  - PyTorch Call: `loss_fn = torch.nn.CrossEntropyLoss()`
  - Note: Inputs must be raw model outputs (logits, not softmax probabilities), because `CrossEntropyLoss` applies softmax internally; target labels are class indices (e.g., 0, 1, 2), not one-hot vectors.
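A small sketch (with made-up logits) showing the expected input format, and that the built-in loss matches a manual log-softmax computation:

```python
import torch
import torch.nn as nn

# Made-up raw model outputs (logits) for 2 samples and 3 classes
logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 1.5, 0.3]])
targets = torch.tensor([0, 2])  # class indices, not one-hot vectors

loss_fn = nn.CrossEntropyLoss()
loss = loss_fn(logits, targets)

# Equivalent manual computation: log-softmax, then average the true-class negative log-probabilities
log_probs = torch.log_softmax(logits, dim=1)
manual = -log_probs[torch.arange(2), targets].mean()
print(loss.item(), manual.item())  # identical values
```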
3. Training Loops: How Models “Learn”
A training loop is the step-by-step process where the model adjusts parameters to minimize loss. It follows 4 core steps: Forward Pass → Compute Loss → Backward Pass → Parameter Update.
3.1 Core Steps of the Training Loop
- Forward Pass: Feed input data into the model to generate predictions (`y_pred`).
  - Code: `y_pred = model(x)` (where `x` is input data and `model` is the defined network).
- Compute Loss: Use the loss function to compare `y_pred` with the true labels (`y_true`).
  - Code: `loss = loss_fn(y_pred, y_true)`
- Backward Pass: Calculate gradients of the loss with respect to model parameters (i.e., “where to adjust”).
  - Code: `loss.backward()` (PyTorch automatically computes gradients).
- Parameter Update: Adjust model parameters using an optimizer (e.g., SGD, Adam) based on gradients.
  - Code: `optimizer.step()` (e.g., `optimizer = torch.optim.Adam(model.parameters(), lr=0.01)`).
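Put together, one iteration looks like the minimal sketch below (a toy model and random data chosen only to show the order of the calls; the full worked example follows in Section 4):

```python
import torch
import torch.nn as nn

# Toy setup, chosen only to illustrate one training iteration
model = nn.Linear(3, 1)                                   # 3 input features -> 1 output
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(8, 3)                                     # 8 samples, 3 features
y_true = torch.randn(8, 1)

y_pred = model(x)                                         # 1. Forward pass
loss = loss_fn(y_pred, y_true)                            # 2. Compute loss
optimizer.zero_grad()                                     # clear stale gradients (see Section 5)
loss.backward()                                           # 3. Backward pass: compute gradients
optimizer.step()                                          # 4. Parameter update
print(loss.item())
```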
4. Complete Training Example (Linear Regression)
Let’s walk through a linear regression task (predicting house prices) in PyTorch, using synthetic data and a minimal model.
4.1 Prepare Data
Assume 100 samples with 1 feature, where the true relationship is y = 2x + 3 plus random noise.
```python
import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np

# Generate synthetic data
np.random.seed(42)
x = np.random.randn(100, 1)                    # 100 samples, 1 feature
y = 2 * x + 3 + 0.1 * np.random.randn(100, 1)  # True relation + noise

# Convert to PyTorch tensors
x = torch.from_numpy(x).float()
y = torch.from_numpy(y).float()
```
4.2 Define the Model (Simple Linear Layer)
A linear layer models house price prediction: 1 input feature → 1 output value.
```python
class LinearRegression(nn.Module):
    def __init__(self):
        super(LinearRegression, self).__init__()
        self.linear = nn.Linear(1, 1)  # y = w*x + b (input dim=1, output dim=1)

    def forward(self, x):
        return self.linear(x)  # Forward pass

model = LinearRegression()
```
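Before training, it can help to sanity-check the untrained model with a quick forward pass. The sketch below reuses the `model` and `x` defined above; the printed weight and bias are random initial values, so they will be far from 2 and 3:

```python
# Sanity check on the untrained model (reuses model and x from above)
with torch.no_grad():                                   # no gradients needed for a plain check
    sample_pred = model(x[:5])                          # forward pass on the first 5 samples
print("Initial w:", model.linear.weight.item())         # random initial value
print("Initial b:", model.linear.bias.item())           # random initial value
print("Sample predictions shape:", sample_pred.shape)   # torch.Size([5, 1])
```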
4.3 Initialize Loss Function and Optimizer
- Loss: MSE for regression.
- Optimizer: Adam with learning rate `lr=0.01`.
```python
loss_fn = nn.MSELoss()                               # Mean Squared Error
optimizer = optim.Adam(model.parameters(), lr=0.01)  # Adam optimizer
```
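If you want to try plain SGD instead (see the optimizer notes in Section 5), only this one line changes; the learning rate usually needs separate tuning:

```python
# Alternative: plain stochastic gradient descent instead of Adam
# (the rest of the training loop stays exactly the same)
optimizer = optim.SGD(model.parameters(), lr=0.01)
```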
4.4 Training Loop (Core!)
Iterate over epochs (full data passes), performing forward/backward passes and parameter updates:
```python
epochs = 1000  # Number of training rounds

for epoch in range(epochs):
    # 1. Forward pass: Compute predictions
    y_pred = model(x)

    # 2. Compute loss: Compare predictions with true labels
    loss = loss_fn(y_pred, y)

    # 3. Zero gradients (prevent accumulation)
    optimizer.zero_grad()

    # 4. Backward pass: Compute gradients
    loss.backward()

    # 5. Update parameters
    optimizer.step()

    # Print progress (every 100 epochs)
    if (epoch + 1) % 100 == 0:
        print(f'Epoch [{epoch+1}/{epochs}], Loss: {loss.item():.4f}')
```
4.5 Verify Training Results
After training, model parameters should approximate the true values (slope=2, intercept=3):
print("Model parameter w:", model.linear.weight.item()) # ~2
print("Model parameter b:", model.linear.bias.item()) # ~3
5. Key Details and Notes
- Gradient Clearing: `optimizer.zero_grad()` must be called before `loss.backward()` to avoid gradient accumulation across iterations.
- Model Mode: Use `model.train()` for training (enables Dropout) and `model.eval()` for inference (disables Dropout for stability).
- Optimizer Selection:
  - SGD: Basic, suitable for simple models.
  - Adam: More efficient; the default parameters work for most cases (recommended for beginners).
- Batch Size: For large datasets, use a `DataLoader` with a batch size such as `batch_size=32` to speed up training (see the sketch below).
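Below is a minimal mini-batch sketch, assuming the `x`, `y`, `model`, `loss_fn`, and `optimizer` from Section 4; the batch size and epoch count are example values:

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# Mini-batch training sketch (reuses x, y, model, loss_fn, optimizer from Section 4)
dataset = TensorDataset(x, y)                               # wrap tensors into a dataset
loader = DataLoader(dataset, batch_size=32, shuffle=True)   # shuffled mini-batches of 32

for epoch in range(100):
    model.train()                                           # training mode
    for x_batch, y_batch in loader:                         # iterate over mini-batches
        y_pred = model(x_batch)                             # forward pass on one batch
        loss = loss_fn(y_pred, y_batch)                     # loss on the batch
        optimizer.zero_grad()                               # clear old gradients
        loss.backward()                                     # backward pass
        optimizer.step()                                    # parameter update
```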
6. Summary
Loss functions are the model’s “measuring stick” for error, and the training loop is its process of self-improvement: by repeatedly updating parameters to minimize the loss, the model learns the patterns in the data.
- Regression (continuous values): Use MSE loss.
- Classification (discrete categories): Use CrossEntropy loss.
- Training Loop Core: Forward → Loss → Backward → Update.
With these basics, you can experiment with more complex models (e.g., CNNs, RNNs) or tasks (image classification, text generation). PyTorch’s strength lies in the reusability of these foundational components. Keep learning!