Learning PyTorch from Scratch: A Basic Explanation of Activation Functions and Convolutional Layers
### Overview of Activation Functions and Convolutional Layers

**Activation Functions**: Neural networks require non-linear transformations to fit complex relationships, and activation functions introduce this non-linearity. Common functions include (minimal usage sketches follow after this section):

- **ReLU**: `y = max(0, x)`. Simple to compute, mitigates the vanishing gradient problem, and is the most widely used (PyTorch: `nn.ReLU()`).
- **Sigmoid**: `y = 1/(1+exp(-x))`. Outputs in (0, 1), well suited to binary classification, but suffers from vanishing gradients (PyTorch: `nn.Sigmoid()`).
- **Tanh**: `y = (exp(x)-exp(-x))/(exp(x)+exp(-x))`. Outputs in (-1, 1) with zero mean, which makes training easier, but it is still prone to vanishing gradients (PyTorch: `nn.Tanh()`).

**Convolutional Layers**: A core component of CNNs, convolutional layers extract local features via convolution kernels. Key concepts include: the input (e.g., RGB images with shape `(batch, in_channels, H, W)`), the convolution kernel (a small weight matrix), the stride (how many pixels the kernel slides per step), and padding (zero-padding at the edges to control the output size). Convolution is implemented in PyTorch via `nn.Conv2d`, whose critical parameters include `in_channels` (input channels), `out_channels` (output channels, i.e., the number of kernels), `kernel_size`, `stride`, and `padding` (a usage sketch follows below).
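As a quick illustration of the three activation functions above, here is a minimal sketch that applies `nn.ReLU()`, `nn.Sigmoid()`, and `nn.Tanh()` to a small hand-picked tensor; the input values are chosen purely for demonstration.

```python
import torch
import torch.nn as nn

# Toy input with negative, zero, and positive values
x = torch.tensor([-2.0, -0.5, 0.0, 0.5, 2.0])

relu = nn.ReLU()
sigmoid = nn.Sigmoid()
tanh = nn.Tanh()

print(relu(x))     # negatives clamped to 0 -> [0.0, 0.0, 0.0, 0.5, 2.0]
print(sigmoid(x))  # every value squashed into (0, 1)
print(tanh(x))     # every value squashed into (-1, 1), zero-centered
```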
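And here is a minimal `nn.Conv2d` sketch showing how `in_channels`, `out_channels`, `kernel_size`, `stride`, and `padding` affect the output shape; the batch size of 4 and the 32×32 image size are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Fake batch of 4 RGB images, shape (batch, in_channels, H, W)
images = torch.randn(4, 3, 32, 32)

# 3 input channels -> 16 output channels, 3x3 kernel;
# stride 1 with padding 1 keeps the spatial size unchanged
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, stride=1, padding=1)
print(conv(images).shape)      # torch.Size([4, 16, 32, 32])

# stride 2 with no padding roughly halves the spatial size:
# output = floor((H + 2*padding - kernel_size) / stride) + 1 = floor((32 - 3) / 2) + 1 = 15
conv_s2 = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, stride=2, padding=0)
print(conv_s2(images).shape)   # torch.Size([4, 16, 15, 15])
```

The printed shapes confirm the two roles described above: padding controls whether spatial resolution is preserved, while a larger stride downsamples the feature map.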