Description

This example trains an image classifier on the CIFAR-10 dataset using PaddlePaddle.

Question 1: Network Structure

Question: For each layer, work out the structure, the input/output dimensions, and the number of parameters. What changes without BN? Can the network be made deeper, and how do the dimensions change per layer? Can more structures be added?

import paddle.fluid as fluid


def convolutional_neural_network(img):
    print('Input layer shape:', img.shape)
    conv_pool_1 = fluid.nets.simple_img_conv_pool(
        input=img,
        filter_size=5,
        num_filters=20,
        pool_size=2,
        pool_stride=2,
        act="relu")
    print('First convolutional pooling layer output shape:', conv_pool_1.shape)
    conv_pool_1 = fluid.layers.batch_norm(conv_pool_1)
    conv_pool_2 = fluid.nets.simple_img_conv_pool(
        input=conv_pool_1,
        filter_size=5,
        num_filters=50,
        pool_size=2,
        pool_stride=2,
        act="relu")
    print('Second convolutional pooling layer output shape:', conv_pool_2.shape)
    conv_pool_2 = fluid.layers.batch_norm(conv_pool_2)
    conv_pool_3 = fluid.nets.simple_img_conv_pool(
        input=conv_pool_2,
        filter_size=5,
        num_filters=50,
        pool_size=2,
        pool_stride=2,
        act="relu")
    print('Third convolutional pooling layer output shape:', conv_pool_3.shape)
    prediction = fluid.layers.fc(input=conv_pool_3, size=10, act='softmax')
    print('Fully connected layer output shape:', prediction.shape)
    return prediction

Convolutional Layer Output Calculation Formula:

  • Input shape: \((N, C_{in}, H_{in}, W_{in})\)
  • Kernel shape: \((C_{out}, C_{in}, H_f, W_f)\)

\(H_{out}=\frac{(H_{in}+2*padding-(dilation*(H_f-1)+1))}{stride}+1\)

\(W_{out}=\frac{(W_{in}+2*padding-(dilation*(W_f-1)+1))}{stride}+1\)

  • Output shape: \((N, C_{out}, H_{out}, W_{out})\)
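The convolution formula can be checked with a few lines of Python. This is a minimal sketch for one spatial dimension; the function name `conv_out_size` is hypothetical, and the use of integer (floor) division for inputs that do not divide evenly is an assumption rather than something stated in the text.

```python
def conv_out_size(in_size, filter_size, stride=1, padding=0, dilation=1):
    """Output size of a convolution along one spatial dimension."""
    # Extent covered by the kernel once dilation is taken into account.
    effective_filter = dilation * (filter_size - 1) + 1
    # Assumption: integer division when the stride does not divide evenly.
    return (in_size + 2 * padding - effective_filter) // stride + 1


# First convolution of the network above: 32x32 input, 5x5 filter,
# stride 1, no padding.
print(conv_out_size(32, 5))  # 28
```

The same call applies to width and height separately, so a non-square input only needs two calls.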

Pooling Layer Output Calculation Formula:

  • Input shape: \((N, C, H_{in}, W_{in})\)
  • Pooling shape: \((1, 1, ksize, ksize)\)

\(H_{out}=\frac{H_{in}-ksize}{stride}+1\)

\(W_{out}=\frac{W_{in}-ksize}{stride}+1\)

  • Output shape: \((N, C, H_{out}, W_{out})\)
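The pooling formula can be sketched the same way. The helper name `pool_out_size` is hypothetical, and the sketch only covers the case where the pooling window fits inside the input (in_size >= ksize) and no padding is used.

```python
def pool_out_size(in_size, ksize, stride):
    """Output size of a pooling layer along one spatial dimension."""
    # Only meaningful when the window fits inside the input.
    if in_size < ksize:
        raise ValueError("pooling window is larger than the input")
    return (in_size - ksize) // stride + 1


# 2x2 pooling with stride 2, as used by every block in the network above.
print(pool_out_size(28, 2, 2))  # 14
print(pool_out_size(10, 2, 2))  # 5
```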

Parameter Size Calculation Formula:

\(psize=C_{out}*C_{in}*H_f*W_f\)
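As a quick check of the parameter formula, the sketch below counts the weights of one convolutional layer, assuming a square kernel (H_f = W_f = ksize). The function name and the `with_bias` flag are hypothetical additions; the formula above counts weights only, and a per-channel bias would add C_out more parameters.

```python
def conv_weight_count(c_out, c_in, ksize, with_bias=False):
    """Number of parameters in a convolution with a square ksize x ksize kernel."""
    count = c_out * c_in * ksize * ksize
    if with_bias:
        # Not part of the formula in the text: one bias per output channel.
        count += c_out
    return count


# First convolution of the network above: 3 input channels, 20 filters, 5x5.
print(conv_weight_count(20, 3, 5))                  # 1500
print(conv_weight_count(20, 3, 5, with_bias=True))  # 1520
```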

Answer 1: The input data shape of this network is (128, 3, 32, 32), i.e. a batch of 128 three-channel 32x32 images. The convolutions use stride 1 and no padding, so the outputs are as follows:

  • First convolutional layer output shape: (128, 20, 28, 28) with parameter size: \(20*3*5*5=1500\)
  • First pooling layer output shape: (128, 20, 14, 14)
  • Second convolutional layer output shape: (128, 50, 10, 10) with parameter size: \(50*20*5*5=25000\)
  • Second pooling layer output shape: (128, 50, 5, 5)
  • Third convolutional layer output shape: (128, 50, 1, 1) with parameter size: \(50*50*5*5=62500\)
  • Third pooling layer output shape: (128, 50, 1, 1), as reported by PaddlePaddle; the 2x2 pooling window is larger than the 1x1 input, so the pooling formula does not apply cleanly here
  • Final fully connected layer output shape: (128, 10) with parameter size: \(50*10=500\)
  • Total parameter size: \(1500+25000+62500+500=89500\) (weights only; biases and BN parameters are not counted)
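The whole calculation can be reproduced by walking the three conv-pool blocks in a loop. This is a rough sketch, not PaddlePaddle code: `trace_network` is a hypothetical helper, it counts weights only, and it assumes that pooling leaves a 1x1 feature map at 1x1, which is chosen to match the shape the framework reports rather than derived from the pooling formula.

```python
def trace_network(in_channels=3, in_size=32, num_classes=10):
    """Return per-block (conv_size, pool_size, weights), FC weights, and the total."""
    blocks = [(20, 5), (50, 5), (50, 5)]  # (num_filters, filter_size) per block
    channels, size, total = in_channels, in_size, 0
    rows = []
    for num_filters, k in blocks:
        conv_size = size - k + 1  # stride 1, padding 0, dilation 1
        weights = num_filters * channels * k * k
        # 2x2 pooling with stride 2. Assumption: if the window no longer fits,
        # the size is kept unchanged to match the reported output shape.
        pool_size = (conv_size - 2) // 2 + 1 if conv_size >= 2 else conv_size
        rows.append((conv_size, pool_size, weights))
        channels, size = num_filters, pool_size
        total += weights
    # The fully connected layer sees the flattened feature map.
    fc_weights = channels * size * size * num_classes
    total += fc_weights
    return rows, fc_weights, total


rows, fc_weights, total = trace_network()
for i, (conv_size, pool_size, weights) in enumerate(rows, start=1):
    print(f"block {i}: conv {conv_size}x{conv_size}, "
          f"pool {pool_size}x{pool_size}, weights {weights}")
print("fc weights:", fc_weights, "total:", total)
```

Changing the entries in `blocks` is a quick way to see how a different filter size or filter count would change the shapes and the parameter total before building the network.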

PaddlePaddle Network Output:

Input layer shape: (-1, 3, 32, 32)
First convolutional pooling layer output shape: (-1, 20, 14, 14)
Second convolutional pooling layer output shape: (-1, 50, 5, 5)
Third convolutional pooling layer output shape: (-1, 50, 1, 1)
Fully connected layer output shape: (-1, 10)

Answer 2:

Before using BN layers:

  • Parameter updates change the distribution of each layer's inputs during training, a phenomenon known as Internal Covariate Shift (ICS)
  • The shift accumulates as the network gets deeper
  • Training therefore requires smaller learning rates and more careful parameter initialization

After adding BN layers:

  • Allows larger learning rates
  • Reduces dependency on parameter initialization
  • Mitigates vanishing gradients
  • Acts as a form of regularization
  • Accelerates model convergence
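To make the effect of a BN layer concrete, here is a minimal pure-Python sketch of the training-time forward pass for a single channel. It is an illustration only, not the `fluid.layers.batch_norm` implementation: the function name, the sample activations, and the default `gamma`, `beta`, and `eps` values are assumptions, and the running statistics used at inference time are left out.

```python
import math


def batch_norm_1d(values, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one channel's activations across a batch, then scale and shift."""
    n = len(values)
    mean = sum(values) / n
    # Biased (population) variance over the batch.
    var = sum((v - mean) ** 2 for v in values) / n
    # gamma and beta are learnable in a real layer; fixed here for illustration.
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in values]


activations = [10.0, 12.0, 14.0, 16.0]  # hypothetical pre-normalization values
normalized = batch_norm_1d(activations)
print(normalized)
```

Whatever the scale of the incoming activations, the output has roughly zero mean and unit variance, which is why the following layer sees a more stable input distribution.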

Training results with BN:

Training results without BN:

From the graphs, the BN version achieves higher accuracy with smaller fluctuations in loss and accuracy during training.

Answer 3: The output of the third convolutional pooling layer is already 1x1, which is smaller than the 5x5 filter, so no further convolutional pooling layer with these settings (filter size 5, no padding) can be added. Attempting to add another results in the following error:

EnforceNotMet: Due to padding(0), filter_size(5), dilation(1), and stride(1) settings, the output size is negative. Please check the configuration again. Input size:1
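
The negative size in this message follows directly from the convolution output formula. As a worked check, plugging in the values from the error (input size 1, padding 0, filter size 5, dilation 1, stride 1) gives:

\(H_{out}=\frac{(1+2*0-(1*(5-1)+1))}{1}+1=-3\)

A negative height is not a valid tensor dimension, so the configuration is rejected.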
Xiaoye