1. Introduction and Installation of PyTorch

PyTorch is a Python-based scientific computing library that excels at deep learning tasks. Its key advantage is the dynamic computation graph, which makes model debugging more intuitive and code more readable, so it is well suited to beginners.

The installation command is straightforward:

pip install torch

After installation, you can verify success by importing PyTorch:

import torch
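
If the import succeeds, you can also check the installed version and whether a GPU is visible; this is an optional sanity check:

print(torch.__version__)           # Installed PyTorch version
print(torch.cuda.is_available())   # True if a CUDA-capable GPU can be used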

2. PyTorch Basics: Tensors

Tensors are the core data structure in PyTorch, similar to NumPy arrays but with GPU acceleration support.

1. Creating Tensors

import torch

# 1D Tensor
x = torch.tensor([1, 2, 3])
print(x)  # Output: tensor([1, 2, 3])

# 2D Tensor (matrix)
y = torch.tensor([[1, 2], [3, 4]])
print(y)
# Output: tensor([[1, 2],
#                 [3, 4]])

# Random Tensor (mean=0, variance=1)
z = torch.randn(2, 3)  # Random tensor with shape (2,3)
print(z)

2. Tensor Operations

Tensors support basic operations like addition, subtraction, multiplication, and division, with syntax similar to NumPy:

a = torch.tensor([1, 2, 3])
b = torch.tensor([4, 5, 6])
print(a + b)  # Output: tensor([5, 7, 9])
print(a * b)  # Output: tensor([4, 10, 18])

# Matrix Multiplication
m1 = torch.tensor([[1, 2], [3, 4]])
m2 = torch.tensor([[5, 6], [7, 8]])
print(torch.matmul(m1, m2))  # Or m1 @ m2
# Output: tensor([[19, 22],
#                 [43, 50]])
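
Broadcasting also works much as it does in NumPy; a small illustration:

row = torch.tensor([[1.0, 2.0, 3.0]])   # shape (1, 3)
col = torch.tensor([[10.0], [20.0]])    # shape (2, 1)
print(row + col)  # Broadcast to shape (2, 3)
# Output: tensor([[11., 12., 13.],
#                 [21., 22., 23.]])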

3. Tensor-NumPy Conversion

import numpy as np

# Tensor to NumPy (the returned array shares memory with the CPU tensor)
n = z.numpy()
print(type(n))  # Output: <class 'numpy.ndarray'>

# NumPy to Tensor (torch.from_numpy also shares the underlying memory)
t = torch.from_numpy(n)
print(type(t))  # Output: <class 'torch.Tensor'>
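
As mentioned earlier, tensors can also live on the GPU. A minimal sketch that only uses the GPU if one is available:

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
g = torch.randn(2, 3, device=device)  # Created directly on the chosen device
print(g.device)  # cuda:0 if a GPU is available, otherwise cpu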

3. Automatic Differentiation (Autograd)

Model training requires gradient computation, and PyTorch’s autograd module automatically handles this.

Example: Calculating Derivatives

Given the function: y = x² + 3x, find the derivative dy/dx when x=2.

x = torch.tensor(2.0, requires_grad=True)  # Mark x for gradient computation
y = x**2 + 3 * x

# Backpropagation to compute gradients
y.backward()  # Calculates gradients of y with respect to all variables requiring gradients

print(x.grad)  # Output: tensor(7.)  (Since dy/dx = 2x + 3, which is 7 when x=2)

Key Point: requires_grad=True tells PyTorch to track operations on x for subsequent gradient calculations.
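
The same mechanism works with several inputs at once; a small sketch with two scalar variables (the values are chosen only for illustration):

a = torch.tensor(1.0, requires_grad=True)
b = torch.tensor(2.0, requires_grad=True)
f = a**2 * b + b**3   # f = a^2*b + b^3
f.backward()
print(a.grad)  # df/da = 2ab = 4
print(b.grad)  # df/db = a^2 + 3b^2 = 13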

4. Model Construction: Linear Regression (Basic Model)

Model construction is a core PyTorch task. We start with the simplest linear model: y = wx + b (where w is the weight and b is the bias).

1. Defining the Linear Model

Use the nn.Module base class to define the model, with the forward method implementing forward propagation:

import torch.nn as nn

class LinearRegression(nn.Module):
    def __init__(self, input_size, output_size):
        super(LinearRegression, self).__init__()
        self.linear = nn.Linear(input_size, output_size)  # Linear layer: y = wx + b

    def forward(self, x):
        return self.linear(x)  # Forward propagation

# Initialize the model (input dimension 1, output dimension 1)
model = LinearRegression(input_size=1, output_size=1)

Explanation:
- nn.Linear(in_features, out_features): Creates a linear layer with parameters w and b.
- forward(x): Defines the model’s forward computation logic.
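
Before training, it helps to confirm that a forward pass produces the expected shapes (the numbers themselves are random at this point):

sample = torch.randn(5, 1)           # A batch of 5 samples, 1 feature each
print(model(sample).shape)           # torch.Size([5, 1])
print(model.linear.weight.shape)     # torch.Size([1, 1])
print(model.linear.bias.shape)       # torch.Size([1])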

5. Data Preparation: Simulated Dataset

We generate a simple dataset to train the model. Assume the true relationship is y = 2x + 3 + noise.

1. Data Generation

import torch
import numpy as np

# Generate 100 samples with x in [0, 10] and added noise
np.random.seed(42)
x = np.random.rand(100, 1) * 10  # x: (100,1)
noise = np.random.randn(100, 1) * 1.5  # Gaussian noise
y = 2 * x + 3 + noise  # True relationship with noise

# Convert to PyTorch tensors
x_tensor = torch.tensor(x, dtype=torch.float32)
y_tensor = torch.tensor(y, dtype=torch.float32)
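
A quick shape check confirms that features and labels line up:

print(x_tensor.shape, y_tensor.shape)  # torch.Size([100, 1]) torch.Size([100, 1])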

2. Data Loading

Use TensorDataset and DataLoader for batch processing:

from torch.utils.data import TensorDataset, DataLoader

# Combine features and labels
dataset = TensorDataset(x_tensor, y_tensor)
dataloader = DataLoader(dataset, batch_size=10, shuffle=True)  # Batch size 10, shuffle data
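
To see what the loader yields, you can peek at a single batch (the shapes follow from batch_size=10 above):

batch_x, batch_y = next(iter(dataloader))
print(batch_x.shape, batch_y.shape)  # torch.Size([10, 1]) torch.Size([10, 1])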

6. Model Training: Loss Function and Optimizer

Training involves three core steps: computing loss, backpropagation, and parameter update.

1. Defining Loss Function and Optimizer

criterion = nn.MSELoss()  # Mean Squared Error loss
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)  # SGD optimizer with learning rate 0.01
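
For intuition, MSELoss is simply the mean of the squared differences; a hand check with toy numbers:

p = torch.tensor([1.0, 2.0, 3.0])
t = torch.tensor([1.0, 1.0, 5.0])
print(criterion(p, t))        # tensor(1.6667)
print(((p - t) ** 2).mean())  # Same value computed by hand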

2. Training Loop

epochs = 1000  # Number of full passes over the training set

for epoch in range(epochs):
    for batch_x, batch_y in dataloader:
        # Forward pass: model prediction
        pred = model(batch_x)

        # Compute loss
        loss = criterion(pred, batch_y)

        # Backpropagation: zero gradients → compute gradients → update parameters
        optimizer.zero_grad()  # Clear gradients (to prevent accumulation)
        loss.backward()        # Compute gradients via backpropagation
        optimizer.step()       # Update parameters (w and b)

    # Print the loss of the last batch every 100 epochs
    if (epoch + 1) % 100 == 0:
        print(f'Epoch [{epoch+1}/{epochs}], Loss: {loss.item():.4f}')

Key Steps:
- optimizer.zero_grad(): Must clear gradients in each iteration, because gradients otherwise accumulate across iterations (a short demonstration follows this list).
- loss.backward(): Computes gradients of the loss with respect to all trainable parameters.
- optimizer.step(): Updates model parameters (w and b) based on the computed gradients.
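
The accumulation behavior is easy to show in isolation; a tiny sketch:

w = torch.tensor(1.0, requires_grad=True)
(w * 2).backward()
print(w.grad)  # tensor(2.)
(w * 2).backward()
print(w.grad)  # tensor(4.)  (gradients add up unless cleared with zero_grad())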

7. Model Validation and Visualization

After training, we predict with the model and visualize the results:

import matplotlib.pyplot as plt

# Get predictions
predicted = model(x_tensor).detach().numpy()  # Detach from the computation graph and convert to NumPy

# Plot comparison
plt.scatter(x, y, label='Real Data')
plt.plot(x, predicted, 'r-', label='Predictions')
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.show()

# Print learned parameters
print("Learned weight w:", model.linear.weight.item())  # Close to 2
print("Learned bias b:", model.linear.bias.item())      # Close to 3

8. Summary and Extensions

Through this example, we’ve walked through PyTorch’s core workflow:
1. Tensor operations and autograd are fundamental.
2. nn.Module defines models, with forward implementing forward propagation.
3. DataLoader handles batch processing for efficient training.
4. Loss functions (MSE) and optimizers (SGD) drive parameter updates.

Extension Directions: Try more complex models (e.g., multi-layer networks), other optimizers (e.g., Adam; see the one-line example below), or other loss functions (e.g., cross-entropy). PyTorch’s flexibility lets you build virtually any deep learning model.
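
For example, swapping the optimizer above for Adam is a one-line change (the learning rate here is just a common default, not a tuned value):

optimizer = torch.optim.Adam(model.parameters(), lr=0.001)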

This beginner tutorial should help you quickly get started with PyTorch. Next, we can explore more complex models (e.g., CNNs, RNNs) or real datasets (e.g., MNIST).

Xiaoye