1. What is a Tensor?

In PyTorch, a Tensor is the fundamental unit for storing and manipulating data. It is similar to NumPy arrays but supports GPU acceleration and serves as the core data structure for neural network computations. Think of a tensor as a “container” holding numbers (integers, floats, etc.) on which you can perform various mathematical operations.

2. Creating Tensors

To start working with tensors, you need to learn how to create them. PyTorch provides multiple ways to create tensors, with the most common methods listed below:

1. From Python Lists/NumPy Arrays

Use torch.tensor() or torch.as_tensor() to directly create tensors from data:

import torch
import numpy as np

# From a Python list
data_list = [1, 2, 3, 4]
tensor_list = torch.tensor(data_list)  # Output: tensor([1, 2, 3, 4])

# From a NumPy array
np_array = np.array([1, 2, 3])
tensor_np = torch.as_tensor(np_array)  # Output: tensor([1, 2, 3])
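
A practical difference worth knowing: torch.tensor() always copies the data, while torch.as_tensor() (like torch.from_numpy()) shares memory with the source NumPy array when the dtype and device allow it. A quick sketch of the distinction:

np_array = np.array([1, 2, 3])
copied = torch.tensor(np_array)     # always copies the data
shared = torch.as_tensor(np_array)  # shares memory with np_array here

np_array[0] = 99
print(copied)  # tensor([1, 2, 3])  -- unaffected by the NumPy edit
print(shared)  # tensor([99, 2, 3]) -- sees the NumPy edit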

2. Using Constructor Functions

  • torch.zeros(*size): Creates a tensor filled with zeros (e.g., (2, 3) for a 2x3 matrix).
  • torch.ones(*size): Creates a tensor filled with ones.
  • torch.rand(*size): Creates a tensor with random values in [0, 1):
zeros_tensor = torch.zeros(2, 3)  # 2x3 matrix of zeros
ones_tensor = torch.ones(1, 4)    # 1x4 matrix of ones
rand_tensor = torch.rand(3, 3)    # 3x3 matrix of random numbers
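
These factory functions also accept dtype and device keyword arguments, so you can pick the element type (and placement) at creation time. A minimal sketch:

int_zeros = torch.zeros(2, 3, dtype=torch.int64)  # override the float32 default
print(int_zeros.dtype)  # torch.int64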

3. Basic Tensor Properties

After creating a tensor, understanding its properties is essential for subsequent operations:
- Shape: Use .shape or .size() to view the tensor’s dimensions (returns a torch.Size, which behaves like a tuple).
- Data Type (dtype): Use .dtype to check the data type (e.g., torch.float32, the default for floating-point tensors, or torch.int64).
- Device: Use .device to check which device the tensor lives on (default: CPU; it can be moved to a GPU with .to('cuda') when one is available).

tensor = torch.rand(2, 3)
print(tensor.shape)   # torch.Size([2, 3])
print(tensor.dtype)   # torch.float32
print(tensor.device)  # cpu

# Convert data type, then move to GPU (if one is available)
tensor = tensor.to(torch.float64)  # convert to double precision
if torch.cuda.is_available():
    tensor = tensor.to('cuda')     # transfer to the GPU

4. Tensor Operations

Tensor operations are central to PyTorch; mastering them enables flexible data manipulation.

1. Arithmetic Operations

Tensors support common arithmetic operations:
- Addition: + or torch.add()
- Subtraction: - or torch.sub()
- Multiplication: * or torch.mul() (elementwise; matrix multiplication uses @ or torch.matmul())
- Division: / or torch.div()

a = torch.tensor([1, 2, 3])
b = torch.tensor([4, 5, 6])

# Addition
print(a + b)        # tensor([5, 7, 9])
print(torch.add(a, b))  # Same as above

# Matrix multiplication (2x2 * 2x2)
a = torch.tensor([[1, 2], [3, 4]])
b = torch.tensor([[5, 6], [7, 8]])
print(a @ b)        # tensor([[19, 22], [43, 50]])
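
The bullet list above notes that * is elementwise while @ is the matrix product; on the same matrices the contrast looks like this:

print(a * b)  # elementwise: tensor([[ 5, 12], [21, 32]])
print(a @ b)  # matrix product, as above: tensor([[19, 22], [43, 50]])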

2. Indexing and Slicing

Tensors support indexing and slicing similar to Python lists, with multidimensional support:

tensor = torch.tensor([[1, 2, 3], [4, 5, 6]])

# First row (dimension 0)
print(tensor[0])                # tensor([1, 2, 3])
# Second column (dimension 1)
print(tensor[:, 1])             # tensor([2, 5])
# Rows 0-1 and columns 0-1 (the end index of a slice is exclusive)
print(tensor[0:2, 0:2])         # tensor([[1, 2], [4, 5]])
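
Negative indices and step slicing carry over from Python lists as well; a small illustration on the same tensor:

print(tensor[-1])      # last row: tensor([4, 5, 6])
print(tensor[0, ::2])  # every other element of row 0: tensor([1, 3])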

3. Reshaping Operations

These operations adjust tensor dimensions while preserving the total number of elements:
- reshape(): Changes the shape (raises an error if the new shape does not match the total element count); a reshape() sketch follows the example below.
- squeeze(): Removes axes of size 1 (e.g., (1, 3, 1) → (3,)).
- unsqueeze(): Adds a new axis of size 1 (e.g., (3,) → (1, 3)).

tensor = torch.rand(1, 2, 1, 4)  # Shape: (1, 2, 1, 4)
print(tensor.shape)  # torch.Size([1, 2, 1, 4])

# Remove size-1 dimensions
squeezed = tensor.squeeze()  # Shape: (2, 4)
print(squeezed.shape)  # torch.Size([2, 4])

# Add a size-1 dimension at position 1
unsqueezed = tensor.unsqueeze(1)  # Shape: (1, 1, 2, 1, 4)
print(unsqueezed.shape)  # torch.Size([1, 1, 2, 1, 4])
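
The list above also mentions reshape(); a minimal sketch (using torch.arange() to build sample data):

flat = torch.arange(6)       # tensor([0, 1, 2, 3, 4, 5])
matrix = flat.reshape(2, 3)  # total element count (6) is preserved
print(matrix.shape)          # torch.Size([2, 3])
same = matrix.reshape(-1)    # -1 lets PyTorch infer that dimension
print(same.shape)            # torch.Size([6])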

4. Concatenation and Splitting

  • torch.cat(): Concatenates tensors along an existing dimension (the other dimensions must match).
  • torch.stack(): Stacks tensors along a new dimension (all tensors must have identical shapes); see the sketch after the example below.
  • torch.split() / torch.chunk(): Splits a tensor into parts.
a = torch.tensor([[1, 2], [3, 4]])
b = torch.tensor([[5, 6], [7, 8]])

# Concatenate along dimension 0 (vertical stacking)
cat_dim0 = torch.cat([a, b], dim=0)  # Shape: (4, 2)
print(cat_dim0)
# Concatenate along dimension 1 (horizontal stacking)
cat_dim1 = torch.cat([a, b], dim=1)  # Shape: (2, 4)
print(cat_dim1)

# Split tensor into 2 parts of size 2 along dimension 0
split_tensor = torch.cat([a, b], dim=0)
split_parts = torch.split(split_tensor, split_size_or_sections=2, dim=0)
print(split_parts)  # (tensor([[1,2],[3,4]]), tensor([[5,6],[7,8]]))
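
torch.stack() and torch.chunk() were listed above but not shown; a short sketch using the same a, b, and split_tensor:

stacked = torch.stack([a, b], dim=0)  # new leading dimension -> shape (2, 2, 2)
print(stacked.shape)  # torch.Size([2, 2, 2])

chunks = torch.chunk(split_tensor, chunks=2, dim=0)  # split into 2 equal parts
print(chunks)  # (tensor([[1, 2], [3, 4]]), tensor([[5, 6], [7, 8]]))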

5. Automatic Differentiation (Autograd)

Autograd is PyTorch’s automatic differentiation engine and the foundation of backpropagation: it computes gradients for you.

1. Key Concepts

  • requires_grad: A tensor property; set to True to track gradients.
  • Computational Graph: Records the flow of operations to enable gradient computation.
  • backward(): Runs backpropagation through the graph and accumulates gradients into the leaf tensors.
  • grad: The accumulated gradient of a tensor (populated only for leaf tensors with requires_grad=True).

2. Autograd Example

Using the function y = x² to demonstrate gradient computation:

# Create a tensor to track gradients
x = torch.tensor(3.0, requires_grad=True)  # x=3.0, track gradients

# Build the computational graph: y = x²
y = x ** 2  

# Backpropagate to compute gradients
y.backward()  

# Check gradient: dy/dx = 2x = 6
print(x.grad)  # tensor(6.)
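
Gradients flow to every leaf that requires them, which is exactly how model parameters receive their updates during training. A minimal sketch with a toy squared-error loss (the names w, b, x_in, and target, and their values, are made up for illustration):

w = torch.tensor(1.0, requires_grad=True)
b = torch.tensor(0.0, requires_grad=True)
x_in, target = torch.tensor(2.0), torch.tensor(5.0)

loss = (w * x_in + b - target) ** 2  # squared error: (2 - 5)² = 9
loss.backward()
print(w.grad)  # tensor(-12.)  d(loss)/dw = 2*(w*x+b-target)*x = 2*(-3)*2
print(b.grad)  # tensor(-6.)   d(loss)/db = 2*(w*x+b-target)   = 2*(-3)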

3. Important Notes

  • Non-leaf gradients: .grad is populated only for leaf tensors; call .retain_grad() on an intermediate tensor if you need its gradient. The graph itself is freed after backward() (pass retain_graph=True to keep it for another pass).
  • Gradient accumulation: Multiple backward() calls accumulate gradients; reset with x.grad.zero_().
  • detach(): Detaches a tensor from the computation graph (returns a gradient-free tensor).
x = torch.tensor(2.0, requires_grad=True)
y = x ** 2
z = y ** 2  # z = (x²)² = x⁴

# First backward pass (y = x²); retain the graph so it can be reused below
y.backward(retain_graph=True)
print(x.grad)  # tensor(4.)  (dy/dx = 2x = 4)

# Second backward pass (z = x⁴); the new gradient accumulates into x.grad
z.backward()
print(x.grad)  # tensor(36.)  (4 + dz/dx = 4 + 4x³ = 4 + 32 = 36)
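
The remaining two notes can be sketched briefly as well: resetting an accumulated gradient with grad.zero_() and cutting a tensor out of the graph with detach():

x = torch.tensor(2.0, requires_grad=True)
y = x ** 2
y.backward()
print(x.grad)  # tensor(4.)

x.grad.zero_()               # reset before the next backward pass
print(x.grad)                # tensor(0.)

frozen = y.detach()          # same data as y, but no graph tracking
print(frozen.requires_grad)  # False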

6. Summary

  • Tensors are the fundamental data structure in PyTorch, supporting creation, arithmetic, indexing, reshaping, and more.
  • Autograd enables automatic gradient computation via requires_grad and backward(), critical for training neural networks.
  • Key considerations: shape matching, gradient accumulation, and device management (CPU/GPU).

With this foundation, you’re ready to explore more advanced PyTorch concepts like neural networks and optimization!

Xiaoye