PyTorch Beginner Tutorial: Building Your First Neural Network Model

1. Introduction

PyTorch is an open-source machine learning library for Python, widely used in deep learning. Renowned for its dynamic computation graph, intuitive syntax, and strong performance, it is an excellent choice for beginners. This tutorial guides you through building your first neural network model step by step, from data loading to model training and evaluation, so you can quickly master PyTorch's core operations.
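
To see what "dynamic computation graph" means in practice, here is a minimal, standalone autograd example, separate from the MNIST model we build below (it runs once PyTorch is installed; see Section 2):

import torch

# Operations are recorded as they execute, and gradients are available immediately
x = torch.tensor(2.0, requires_grad=True)
y = x**2 + 3*x      # the graph for y = x^2 + 3x is built on the fly
y.backward()        # compute dy/dx via autograd
print(x.grad)       # tensor(7.) since dy/dx = 2x + 3 = 7 at x = 2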

2. Environment Setup

Before starting, make sure PyTorch is installed. If not, use one of the following commands (choose the version appropriate for your system):

# CPU version
pip install torch torchvision

# GPU version (requires a compatible NVIDIA driver; cu118 here targets CUDA 11.8)
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118

3. Data Preparation: Loading the MNIST Dataset

MNIST is a classic handwritten digit recognition dataset, containing 60,000 training images and 10,000 test images, each being a 28×28 grayscale image. We will use PyTorch’s torchvision library to load and preprocess the data.

3.1 Import Required Libraries

import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

3.2 Define Data Transformations and Data Loaders

# Data transformations: convert images to tensors and normalize
transform = transforms.Compose([
    transforms.ToTensor(),  # Convert to tensor and scale pixel values from 0-255 to 0-1
    transforms.Normalize((0.1307,), (0.3081,))  # Normalize with MNIST's precomputed mean and std
])

# Load training and test datasets
train_dataset = datasets.MNIST(
    root='./data', train=True, download=True, transform=transform
)
test_dataset = datasets.MNIST(
    root='./data', train=False, download=True, transform=transform
)

# Create data loaders (batch processing, shuffle data)
batch_size = 64  # Number of samples per training batch
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)
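
Optionally, you can pull one batch from the loader to confirm the shapes and value ranges the transforms produce (a quick sanity check, not required for training):

images, labels = next(iter(train_loader))
print(images.shape)   # torch.Size([64, 1, 28, 28]): batch, channel, height, width
print(labels.shape)   # torch.Size([64]): one digit label per image
print(images.mean().item(), images.std().item())  # roughly 0 and 1 after normalization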

4. Define the Neural Network Model

We will build a simple fully connected neural network (MLP) with:
- Input layer: 784 neurons (flattened 28×28 images)
- Hidden layer: 128 neurons (ReLU activation)
- Output layer: 10 neurons (digits 0-9; no explicit Softmax, since CrossEntropyLoss applies it internally)

class SimpleNN(nn.Module):
    def __init__(self):
        super(SimpleNN, self).__init__()
        # Define fully connected layers: input 784 → output 128
        self.fc1 = nn.Linear(28*28, 128)
        # Define fully connected layers: input 128 → output 10
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        # Forward pass: flatten image, pass through FC layers and activation
        x = x.view(-1, 28*28)  # Flatten images (-1 auto-calculates batch size)
        x = torch.relu(self.fc1(x))  # ReLU activation
        x = self.fc2(x)  # Output layer (no activation, handled by CrossEntropyLoss)
        return x

# Instantiate the model
model = SimpleNN()
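
Optionally, you can print the model to verify its structure and count its trainable parameters (the expected total follows directly from the layer sizes above):

print(model)  # shows the two Linear layers
num_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f'Trainable parameters: {num_params}')  # 784*128 + 128 + 128*10 + 10 = 101,770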

5. Define Loss Function and Optimizer

  • Loss Function: Cross-entropy loss (nn.CrossEntropyLoss), which applies LogSoftmax internally, so the model outputs raw logits.
  • Optimizer: Stochastic Gradient Descent (SGD) with learning rate 0.01.

criterion = nn.CrossEntropyLoss()  # Cross-entropy loss for classification
optimizer = optim.SGD(model.parameters(), lr=0.01)  # SGD optimizer
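
To make this division of labor concrete, here is a small illustration with made-up logits and labels of what CrossEntropyLoss expects: raw, unnormalized scores and integer class indices, which is exactly why fc2 has no activation:

dummy_logits = torch.randn(4, 10)           # fake model outputs for a batch of 4
dummy_targets = torch.tensor([3, 7, 0, 1])  # true digit classes as integers
print(criterion(dummy_logits, dummy_targets))  # a single scalar loss tensor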

6. Model Training

The training process includes:
1. Iterate over the training dataset (multiple epochs)
2. For each batch: forward pass → compute loss → backward pass → update parameters

epochs = 5  # Number of full passes over the training set

for epoch in range(epochs):
    model.train()  # Set model to training mode (affects layers like Dropout/BatchNorm)
    running_loss = 0.0

    for batch_idx, (data, target) in enumerate(train_loader):
        # Zero out gradients
        optimizer.zero_grad()

        # Forward pass: compute model outputs
        outputs = model(data)

        # Calculate loss
        loss = criterion(outputs, target)

        # Backward pass: compute gradients
        loss.backward()

        # Update model parameters
        optimizer.step()

        # Accumulate loss
        running_loss += loss.item()

        # Print progress every 100 batches
        if batch_idx % 100 == 99:
            print(f'Epoch {epoch+1}, Batch {batch_idx+1}, Loss: {running_loss/100:.4f}')
            running_loss = 0.0
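
The loop above runs on the CPU. If you installed the GPU build, a minimal optional change is to pick a device once and move the model and each batch to it; everything else stays the same:

# Select GPU if available, otherwise fall back to CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = model.to(device)

# Then, at the top of the batch loop (and likewise in the test loop below):
# data, target = data.to(device), target.to(device)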

7. Model Testing

Evaluate model performance on the test set by calculating accuracy:

model.eval()  # Set model to evaluation mode (disables training-only behavior like Dropout)
correct = 0
total = 0

with torch.no_grad():  # Disable gradient computation for memory efficiency
    for data, target in test_loader:
        outputs = model(data)
        _, predicted = torch.max(outputs, 1)  # Index of the largest logit is the predicted class
        total += target.size(0)
        correct += (predicted == target).sum().item()

print(f'Test Accuracy: {100 * correct / total:.2f}%')
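
To reuse the trained weights later, the usual pattern is to save the model's state_dict (the filename below is just an example):

# Save only the learned parameters (recommended over pickling the whole model)
torch.save(model.state_dict(), 'mnist_simple_nn.pth')  # example filename

# Later: rebuild the architecture, load the weights, and switch to eval mode
model = SimpleNN()
model.load_state_dict(torch.load('mnist_simple_nn.pth'))
model.eval()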

8. Summary and Extensions

You have now built your first PyTorch neural network! Key steps included:
- Data loading and preprocessing
- Model definition (inheriting nn.Module, defining layers and forward pass)
- Loss function and optimizer selection
- Training loop (forward → loss → backward → update)
- Testing and validation (accuracy calculation)

Possible Extensions:

  • Adjust hidden layer size or add more layers
  • Replace the optimizer (e.g., Adam, RMSprop; see the sketch after this list)
  • Use more complex datasets (e.g., Fashion-MNIST)
  • Add Dropout for regularization (also shown in the sketch below)
  • Visualize training loss and accuracy curves
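
As a sketch of the optimizer and Dropout extensions combined (the 0.2 dropout rate and 0.001 learning rate are common defaults, not tuned values):

class SimpleNNWithDropout(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(28*28, 128)
        self.dropout = nn.Dropout(0.2)  # randomly zeroes 20% of activations during training
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = x.view(-1, 28*28)
        x = torch.relu(self.fc1(x))
        x = self.dropout(x)  # active under model.train(), disabled under model.eval()
        x = self.fc2(x)
        return x

model = SimpleNNWithDropout()
optimizer = optim.Adam(model.parameters(), lr=0.001)  # Adam in place of SGD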

PyTorch’s strength lies in its flexibility and ease of use. With practice, you’ll master building more complex models!

Xiaoye