1. Why Choose PyTorch?
PyTorch has become one of the most popular deep learning frameworks in recent years, known for its flexibility, intuitiveness, and ease of use, which makes it particularly well suited to beginners. Compared with frameworks such as TensorFlow, PyTorch’s syntax stays close to native Python, so code is highly readable, easy to debug, and ideas can be turned into working code quickly.
2. Tensor: The Foundation of PyTorch
2.1 What is a Tensor?
A tensor can be understood as a multidimensional array, the basic unit of data in PyTorch. It is similar to NumPy arrays but supports GPU acceleration and has built-in automatic differentiation capabilities.
2.2 Creating Tensors
import torch
# 1. Create directly from data
x = torch.tensor([1, 2, 3])
print(x) # tensor([1, 2, 3])
# 2. Create all-zero/all-one tensors
zeros = torch.zeros(2, 3) # 2 rows, 3 columns, all zeros
ones = torch.ones(2, 3) # 2 rows, 3 columns, all ones
print(zeros, ones)
# 3. Create random tensors (normal distribution)
random = torch.randn(2, 3) # Standard normal distribution (mean 0, variance 1)
print(random)
# 4. Convert from/to NumPy arrays
import numpy as np
arr = np.array([[1, 2], [3, 4]])
tensor_from_np = torch.from_numpy(arr)
np_from_tensor = tensor_from_np.numpy()
2.3 Basic Tensor Operations
# 1. Shape operations
x = torch.tensor([[1, 2], [3, 4]])
print(x.shape) # torch.Size([2, 2])
x_reshaped = x.view(4, 1) # Reshape to 4 rows, 1 column (x.view(-1, 1) would infer the row count automatically)
print(x_reshaped)
# 2. Arithmetic operations
a = torch.tensor([1, 2, 3])
b = torch.tensor([4, 5, 6])
print(a + b) # Element-wise addition: tensor([5, 7, 9])
print(a * b) # Element-wise multiplication: tensor([4, 10, 18])
print(torch.matmul(a, b)) # Matrix multiplication (dot product): 1*4 + 2*5 + 3*6 = 32
# 3. Device conversion (CPU/GPU)
if torch.cuda.is_available():
    x = x.to('cuda') # Move the tensor to the GPU; use x.to('cpu') to move it back
3. Automatic Differentiation: The Core of PyTorch
PyTorch implements automatic differentiation through the autograd package, which automatically calculates gradients. This is crucial for training neural networks!
3.1 Key Concept: requires_grad
Tensors with requires_grad=True will track their computation history for subsequent gradient calculations.
# Define tensors that require gradient calculation
x = torch.tensor(2.0, requires_grad=True) # Initial value 2.0, requires gradient
y = x ** 2 + 3 * x - 5 # Define function y = x² + 3x -5
# Compute gradients (call backward())
y.backward() # Backpropagation to compute gradients for all tensors with requires_grad=True
# View gradient: x.grad stores dy/dx
print(x.grad) # Output: tensor(7.) (since dy/dx = 2x + 3, substituting x=2 gives 7)
Principle: y.backward() triggers backpropagation, calculating gradients for all tensors with requires_grad=True and storing them in the tensor’s .grad attribute.
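As a further illustrative example (not part of the original snippet), gradients flow to every leaf tensor that requires them, and repeated backward() calls accumulate into .grad, which is why training loops clear gradients each step:
a = torch.tensor(3.0, requires_grad=True)
b = torch.tensor(4.0, requires_grad=True)
z = a * b + b ** 2 # dz/da = b, dz/db = a + 2b
z.backward()
print(a.grad) # tensor(4.)
print(b.grad) # tensor(11.)
# .grad accumulates across backward() calls, so clear it before computing new gradients
a.grad.zero_()
b.grad.zero_()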
4. Building Neural Networks: From Layers to Models
PyTorch’s torch.nn module provides basic components for building neural networks.
4.1 Basic Components
- Linear Layer (nn.Linear): Fully connected layer, implementing \( y = Wx + b \)
- Activation Functions: ReLU, Sigmoid, Tanh, etc., introducing non-linearity
- Loss Functions: MSELoss (regression), CrossEntropyLoss (classification)
- Optimizers: SGD, Adam, etc., updating model parameters (a short combined example of these components follows this list)
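To make these components concrete, here is a minimal sketch that wires them together for a single update step; the layer sizes, inputs, and labels are illustrative assumptions rather than part of a real task:
import torch
import torch.nn as nn
layer = nn.Linear(4, 3) # fully connected layer: 4 inputs → 3 outputs
logits = layer(torch.randn(2, 4)) # 2 samples with 4 features each
targets = torch.tensor([0, 2]) # class labels for the 2 samples
loss = nn.CrossEntropyLoss()(logits, targets) # classification loss on raw logits
optimizer = torch.optim.SGD(layer.parameters(), lr=0.1)
loss.backward() # compute gradients
optimizer.step() # update W and b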
4.2 Defining a Simple Network
import torch.nn as nn
# Define a simple 2-layer neural network
class SimpleNet(nn.Module):
    def __init__(self):
        super(SimpleNet, self).__init__()
        # First layer: input layer (20) → hidden layer (10), using ReLU activation
        self.fc1 = nn.Linear(20, 10)
        self.relu = nn.ReLU()
        # Second layer: hidden layer (10) → output layer (1)
        self.fc2 = nn.Linear(10, 1)

    def forward(self, x):
        # Forward pass: x → fc1 → ReLU → fc2
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)
        return x
# Instantiate the model
model = SimpleNet()
print(model)
# Test forward pass
x_test = torch.randn(1, 20) # Input: 1 sample, 20 features
output = model(x_test)
print(output.shape) # Output: torch.Size([1, 1])
4.3 Composing Networks with nn.Sequential
For simple sequential networks, use nn.Sequential for quick composition:
model = nn.Sequential(
    nn.Linear(20, 10),
    nn.ReLU(),
    nn.Linear(10, 5),
    nn.Sigmoid(),
    nn.Linear(5, 1)
)
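A quick sanity check of the forward pass (the (1, 20) input shape here is just for illustration):
x_test = torch.randn(1, 20) # 1 sample, 20 features
print(model(x_test).shape) # torch.Size([1, 1])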
5. Hands-On: Training a Linear Regression Model
Now, we’ll train a simple linear regression model in PyTorch to fit \( y = 2x + 3 + \text{noise} \).
5.1 Preparing Data
import torch
import torch.nn as nn
import numpy as np
import matplotlib.pyplot as plt
# Generate simulated data: y = 2x + 3 + 0.5*noise
np.random.seed(42)
x = np.random.rand(100, 1) # 100 samples, 1 feature
y = 2 * x + 3 + 0.5 * np.random.randn(100, 1) # Add noise
# Convert to PyTorch tensors
x_tensor = torch.FloatTensor(x)
y_tensor = torch.FloatTensor(y)
5.2 Defining Model, Loss Function, and Optimizer
# 1. Define model: Linear regression with one fully connected layer
model = nn.Linear(in_features=1, out_features=1) # Input: 1D, Output: 1D
# 2. Define loss function: Mean Squared Error (MSE)
criterion = nn.MSELoss()
# 3. Define optimizer: Stochastic Gradient Descent (SGD), learning rate 0.01
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
5.3 Training the Model
epochs = 1000 # Number of training iterations
for epoch in range(epochs):
    # Forward pass: compute predictions
    y_pred = model(x_tensor)
    # Calculate loss
    loss = criterion(y_pred, y_tensor)
    # Backward pass + parameter update
    optimizer.zero_grad() # Clear gradients (avoid accumulation)
    loss.backward() # Backpropagation to compute gradients
    optimizer.step() # Update parameters (W and b)
    # Print training progress
    if (epoch + 1) % 100 == 0:
        print(f'Epoch [{epoch+1}/{epochs}], Loss: {loss.item():.4f}')
# After training, model parameters should approximate true values W=2, b=3
print("Model parameters:")
print("W =", model.weight.item()) # Close to 2
print("b =", model.bias.item()) # Close to 3
5.4 Visualizing Results
# Plot the fitted line
plt.scatter(x, y, label='True data')
plt.plot(x, model(x_tensor).detach().numpy(), 'r', label='Fitted line')
plt.legend()
plt.show()
6. Summary and Advanced Learning
Key Takeaways
- Tensor: Fundamental data structure in PyTorch, supporting GPU acceleration and automatic differentiation
- Automatic Differentiation: requires_grad=True + backward() for gradient calculation
- Neural Networks: nn.Linear (fully connected layers), nn.Sequential (composing networks), nn.Module (custom models)
- Training Pipeline: Forward pass → Loss calculation → Backward pass → Parameter update
Next Learning Directions
- Dataset Handling: Use torch.utils.data to load custom datasets (see the short sketch after this list)
- Advanced Optimizers: Try Adam, RMSprop, etc.
- Convolutional Neural Networks: Learn nn.Conv2d for image data
- RNN/LSTM: Process sequential data (text, time series)
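For the dataset item above, here is a minimal sketch of wrapping the tensors from Section 5 in a DataLoader; the batch size and shuffling are arbitrary choices for illustration:
from torch.utils.data import TensorDataset, DataLoader
dataset = TensorDataset(x_tensor, y_tensor) # pair each input with its target
loader = DataLoader(dataset, batch_size=16, shuffle=True)
for batch_x, batch_y in loader: # iterate over mini-batches
    pass # run the usual forward → loss → backward → step on each batch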
Recommended Resources
- Official Documentation: https://pytorch.org/docs/stable/
- Tutorials: PyTorch Official Tutorials (https://pytorch.org/tutorials/)
- Practice Platform: Google Colab (no GPU setup needed, run PyTorch code online)
PyTorch’s flexibility and ease of use make it an excellent choice for beginners in deep learning. Start with tensors, gradually master automatic differentiation and model building, and consolidate knowledge through practice to quickly get started with neural network training!