Table of Contents

@[toc]

Common Activation Functions

We commonly use three activation functions: sigmoid, tanh, and ReLU. Let’s learn about each one.

Sigmoid Function

In deep learning, the sigmoid function is often used as an activation function, especially for binary classification. The formula for the sigmoid function is:
$$
sigmoid(x) = \frac{1}{1+e^{-x}}\tag{1}
$$

The sigmoid function produces an S-shaped curve that squashes any real input into the range (0, 1).

Python implementation of sigmoid:

import numpy as np

def sigmoid(x):
    s = 1 / (1 + np.exp(-x))
    return s

Because np.exp operates element-wise, this implementation works on scalars, vectors, and matrices alike. For a scalar:

if __name__ == '__main__':
    x = 3
    s = sigmoid(x)
    print(s)

Output:

0.952574126822

For vectors or matrices:
$$
sigmoid(x) = sigmoid\begin{pmatrix}
x_1 \\
x_2 \\
\vdots \\
x_n
\end{pmatrix} = \begin{pmatrix}
\frac{1}{1+e^{-x_1}} \\
\frac{1}{1+e^{-x_2}} \\
\vdots \\
\frac{1}{1+e^{-x_n}}
\end{pmatrix}\tag{2}
$$

Example usage:

if __name__ == '__main__':
    x = np.array([2, 3, 4])
    s = sigmoid(x)
    print(s)

Output:

[0.88079708 0.95257413 0.98201379]
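
As a sanity check, assuming SciPy is available in your environment, the same values can be obtained from scipy.special.expit, SciPy's built-in logistic sigmoid. A minimal sketch:

import numpy as np
from scipy.special import expit

if __name__ == '__main__':
    x = np.array([2, 3, 4])
    # compare SciPy's sigmoid with the manual formula
    print(np.allclose(expit(x), 1 / (1 + np.exp(-x))))  # expected: True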

Sigmoid Gradient

To use backpropagation for optimizing the loss function, we need to compute the gradient of the sigmoid function. The formula is:
$$
sigmoid\_derivative(x) = \sigma'(x) = \sigma(x) (1 - \sigma(x))\tag{3}
$$

Python implementation:

import numpy as np

def sigmoid_derivative(x):
    s = 1 / (1 + np.exp(-x))
    # sigma'(x) = sigma(x) * (1 - sigma(x))
    ds = s * (1 - s)
    return ds

Example usage:

if __name__ == '__main__':
    x = 3
    s = sigmoid_derivative(x)
    print(s)

Output:

0.0451766597309

For vectors or matrices:

if __name__ == '__main__':
    x = np.array([2, 3, 4])
    s = sigmoid_derivative(x)
    print(s)

Output:

[0.10499359 0.04517666 0.01766271]
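
One way to convince yourself that formula (3) is correct is to compare it against a numerical approximation of the derivative. The sketch below uses a central difference with a hypothetical step size eps, for verification only:

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

if __name__ == '__main__':
    x = np.array([2.0, 3.0, 4.0])
    eps = 1e-6  # small step for the central difference
    numeric = (sigmoid(x + eps) - sigmoid(x - eps)) / (2 * eps)
    analytic = sigmoid(x) * (1 - sigmoid(x))
    print(np.allclose(numeric, analytic))  # expected: True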

Tanh Function

Tanh is another commonly used activation function. Its formula is:
$$
tanh(x) = \frac{e^x-e^{-x}}{e^x+e^{-x}}\tag{4}
$$

The tanh function has a similar S-shaped curve, but it squashes inputs into the range (-1, 1) and is centered at zero.

Python implementation:

import numpy as np

def tanh(x):
    s1 = np.exp(x) - np.exp(-x)
    s2 = np.exp(x) + np.exp(-x)
    s = s1 / s2
    return s

Example usage for real numbers, vectors, and matrices:

if __name__ == '__main__':
    x = 3
    s = tanh(x)
    print(s)
    x = np.array([2, 3, 4])
    s = tanh(x)
    print(s)

Output:

0.995054753687
[0.96402758 0.99505475 0.9993293 ]
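
NumPy also ships a built-in np.tanh, which is numerically safer for large |x| (the hand-rolled version can overflow inside np.exp). A quick comparison, as a sketch:

import numpy as np

if __name__ == '__main__':
    x = np.array([2.0, 3.0, 4.0])
    print(np.tanh(x))       # same values as the manual implementation above
    print(np.tanh(1000.0))  # 1.0, while the exp-based version overflows and returns nan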

Tanh Gradient

The gradient of the tanh function is calculated as:
$$
tanh\_derivative(x) = \tanh'(x) = 1 - \tanh^2 x = 1- \left(\frac{e^x-e^{-x}}{e^x+e^{-x}}\right)^2\tag{5}
$$

Python implementation:

import numpy as np

def tanh_derivative(x):
    s1 = np.exp(x) - np.exp(-x)
    s2 = np.exp(x) + np.exp(-x)
    tanh = s1 / s2
    s = 1 - tanh * tanh
    return s

Example usage:

if __name__ == '__main__':
    x = 3
    s = tanh_derivative(x)
    print(s)
    x = np.array([2, 3, 4])
    s = tanh_derivative(x)
    print(s)

Output:

0.00986603716544
[0.07065082 0.00986604 0.00134095]
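
The same gradient can be written more compactly in terms of np.tanh; a minimal sketch:

import numpy as np

def tanh_derivative(x):
    # 1 - tanh(x)^2, using NumPy's built-in tanh
    return 1 - np.tanh(x) ** 2

if __name__ == '__main__':
    print(tanh_derivative(np.array([2, 3, 4])))  # matches the output above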

ReLU Function

ReLU is currently the most commonly used activation function in deep learning. Its formula is:
$$
relu(x) = max(0,x)=\left\{\begin{matrix}x,& \text{if} \quad x > 0 \\ 0,& \text{if} \quad x \leq 0\end{matrix}\right.\tag{6}
$$

The graph of ReLU is zero for all negative inputs and increases linearly (with slope 1) for positive inputs.

Python implementation:

import numpy as np

def relu(x):
    # keep positive entries, replace negative entries with 0
    s = np.where(x < 0, 0, x)
    return s

Example usage:

if __name__ == '__main__':
    x = -1
    s = relu(x)
    print(s)
    x = np.array([2, -3, 1])
    s = relu(x)
    print(s)

Output:

0
[2 0 1]
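
An equivalent one-liner uses np.maximum. The gradient of ReLU is 1 for positive inputs and 0 for negative inputs (at x = 0 it is undefined; taking 0 there is a common convention). A sketch in the same style as the derivative functions above:

import numpy as np

def relu(x):
    # element-wise maximum against 0
    return np.maximum(0, x)

def relu_derivative(x):
    # 1 where x > 0, else 0 (the value at exactly 0 is a convention)
    return np.where(x > 0, 1, 0)

if __name__ == '__main__':
    x = np.array([2, -3, 1])
    print(relu(x))             # [2 0 1]
    print(relu_derivative(x))  # [1 0 1]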

Image to Vector

Before an image is fed into a fully connected network, it is usually flattened ("unrolled") into a column vector. A 3-channel image of shape \((height, width, 3)\) becomes a vector of shape \((height \times width \times 3, 1)\). The example below uses a small \((3, 3, 2)\) array purely for illustration.

Python implementation:

import numpy as np

def image2vector(image):
    # flatten a (height, width, channels) array into a (height*width*channels, 1) column vector
    v = image.reshape((image.shape[0] * image.shape[1] * image.shape[2], 1))
    return v

Example usage:

if __name__ == '__main__':
    image = np.array([[[0.67826139, 0.29380381],
                       [0.90714982, 0.52835647],
                       [0.4215251, 0.45017551]],

                      [[0.92814219, 0.96677647],
                       [0.85304703, 0.52351845],
                       [0.19981397, 0.27417313]],

                      [[0.60659855, 0.00533165],
                       [0.10820313, 0.49978937],
                       [0.34144279, 0.94630077]]])
    vector = image2vector(image)
    print("image shape is :", image.shape)
    print("vector shape is :", vector.shape)
    print("vector is :" + str(image2vector(image)))

Output:

image shape is : (3, 3, 2)
vector shape is : (18, 1)
vector is :[[0.67826139]
 [0.29380381]
 [0.90714982]
 [0.52835647]
 [0.4215251 ]
 [0.45017551]
 [0.92814219]
 [0.96677647]
 [0.85304703]
 [0.52351845]
 [0.19981397]
 [0.27417313]
 [0.60659855]
 [0.00533165]
 [0.10820313]
 [0.49978937]
 [0.34144279]
 [0.94630077]]
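
The same flattening can be written with reshape(-1, 1), and the original image can be recovered by reshaping back, assuming the original shape is known. A minimal sketch with a placeholder random image:

import numpy as np

if __name__ == '__main__':
    image = np.random.rand(3, 3, 2)         # placeholder image with the same shape as above
    vector = image.reshape(-1, 1)           # -1 lets NumPy infer 3*3*2 = 18 automatically
    restored = vector.reshape(image.shape)  # undo the flattening
    print(vector.shape)                     # (18, 1)
    print(np.array_equal(image, restored))  # True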

Normalize Rows

Normalizing the rows of a matrix, i.e. dividing each row by its L2 norm so that every row has unit length, often helps gradient descent converge faster.

For a matrix \(x\):
$$
x = \begin{bmatrix}
0 & 3 & 4 \\
1 & 6 & 4 \\
\end{bmatrix}\tag{7}
$$

First, compute the L2 norm (length) of each row:
$$
\| x\| = np.linalg.norm(x, axis = 1, keepdims = True) = \begin{bmatrix}
5 \\
\sqrt{53} \\
\end{bmatrix}\tag{8}
$$

Then normalize each row by dividing by its norm:
$$
x\_normalized = \frac{x}{\| x\|} = \begin{bmatrix}
0 & \frac{3}{5} & \frac{4}{5} \\
\frac{1}{\sqrt{53}} & \frac{6}{\sqrt{53}} & \frac{4}{\sqrt{53}} \\
\end{bmatrix}\tag{9}
$$

Python implementation:

import numpy as np

def normalizeRows(x):
    # L2 norm of each row; keepdims=True keeps the shape (m, 1) so the division broadcasts
    x_norm = np.linalg.norm(x, axis=1, keepdims=True)
    print("x_norm = ", x_norm)
    x = x / x_norm
    return x

Example usage:

if __name__ == '__main__':
    x = np.array([
        [0, 3, 4],
        [1, 6, 4]])
    print("normalizeRows(x) = " + str(normalizeRows(x)))

Output:

x_norm =  [[5.        ]
 [7.28010989]]
normalizeRows(x) = [[0.         0.6        0.8       ]
 [0.13736056 0.82416338 0.54944226]]
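
Note that a row of all zeros has norm 0, so the division above would produce NaNs and a runtime warning. A defensive variant, sketched here with a small hypothetical epsilon, clips the norm away from zero:

import numpy as np

def normalize_rows_safe(x, eps=1e-12):
    # eps keeps all-zero rows from causing a division by zero
    x_norm = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(x_norm, eps)

if __name__ == '__main__':
    x = np.array([[0.0, 0.0, 0.0],
                  [1.0, 6.0, 4.0]])
    print(normalize_rows_safe(x))  # first row stays all zeros instead of becoming NaN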

Broadcasting and Softmax Function

Broadcasting is NumPy's mechanism for stretching a smaller array across a larger one so that element-wise operations work between arrays of compatible but different shapes. The softmax implementation below relies on it to divide every row of a matrix by that row's sum; a minimal illustration follows.
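
As a minimal illustration of broadcasting (the array values here are arbitrary), dividing a (2, 3) matrix by a (2, 1) column stretches the column across the three columns of the matrix:

import numpy as np

if __name__ == '__main__':
    a = np.array([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0]])  # shape (2, 3)
    b = np.array([[1.0],
                  [10.0]])           # shape (2, 1)
    print(a / b)                     # b is broadcast across the columns of a
    # [[1.  2.  3. ]
    #  [0.4 0.5 0.6]]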

Softmax for a vector:
$$
x \in \mathbb{R}^{1\times n} \text{, } softmax(x) = softmax(\begin{bmatrix}
x_1 & x_2 & \dots & x_n
\end{bmatrix}) = \begin{bmatrix}
\frac{e^{x_1}}{\sum_{j}e^{x_j}} & \frac{e^{x_2}}{\sum_{j}e^{x_j}} & \dots & \frac{e^{x_n}}{\sum_{j}e^{x_j}}
\end{bmatrix} \tag{10}
$$

Softmax for a matrix:
$$
x \in \mathbb{R}^{m \times n} \text{, } softmax(x) = softmax\begin{bmatrix}
x_{11} & x_{12} & x_{13} & \dots & x_{1n} \\
x_{21} & x_{22} & x_{23} & \dots & x_{2n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
x_{m1} & x_{m2} & x_{m3} & \dots & x_{mn}
\end{bmatrix} = \begin{bmatrix}
\frac{e^{x_{11}}}{\sum_{j}e^{x_{1j}}} & \frac{e^{x_{12}}}{\sum_{j}e^{x_{1j}}} & \frac{e^{x_{13}}}{\sum_{j}e^{x_{1j}}} & \dots & \frac{e^{x_{1n}}}{\sum_{j}e^{x_{1j}}} \\
\frac{e^{x_{21}}}{\sum_{j}e^{x_{2j}}} & \frac{e^{x_{22}}}{\sum_{j}e^{x_{2j}}} & \frac{e^{x_{23}}}{\sum_{j}e^{x_{2j}}} & \dots & \frac{e^{x_{2n}}}{\sum_{j}e^{x_{2j}}} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\frac{e^{x_{m1}}}{\sum_{j}e^{x_{mj}}} & \frac{e^{x_{m2}}}{\sum_{j}e^{x_{mj}}} & \frac{e^{x_{m3}}}{\sum_{j}e^{x_{mj}}} & \dots & \frac{e^{x_{mn}}}{\sum_{j}e^{x_{mj}}}
\end{bmatrix} = \begin{pmatrix}
softmax\text{(first row of x)} \\
softmax\text{(second row of x)} \\
\vdots \\
softmax\text{(last row of x)}
\end{pmatrix} \tag{11}
$$

Python implementation:

import numpy as np

def softmax(x):
    x_exp = np.exp(x)
    # row-wise sum; keepdims=True keeps the shape (m, 1) so the division broadcasts
    x_sum = np.sum(x_exp, axis=1, keepdims=True)
    print("x_sum = ", x_sum)
    s = x_exp / x_sum
    return s

Example usage:

if __name__ == '__main__':
    x = np.array([
        [9, 2, 5, 0, 0],
        [7, 5, 0, 0, 0]])
    print("softmax(x) = " + str(softmax(x)))

Output:

x_sum =  [[8260.88614278]
 [1248.04631753]]
softmax(x) = [[9.80897665e-01 8.94462891e-04 1.79657674e-02 1.21052389e-04 1.21052389e-04]
 [8.78679856e-01 1.18916387e-01 8.01252314e-04 8.01252314e-04 8.01252314e-04]]
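
With large inputs, np.exp can overflow. A common, numerically safer variant, sketched below rather than taken from the implementation above, subtracts each row's maximum before exponentiating; this does not change the result, because softmax is invariant to adding a constant to every element of a row:

import numpy as np

def softmax_stable(x):
    # subtracting the row-wise maximum keeps np.exp from overflowing
    shifted = x - np.max(x, axis=1, keepdims=True)
    x_exp = np.exp(shifted)
    return x_exp / np.sum(x_exp, axis=1, keepdims=True)

if __name__ == '__main__':
    x = np.array([[9, 2, 5, 0, 0],
                  [7, 5, 0, 0, 0]])
    print(softmax_stable(x))  # same values as softmax(x) above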

Numpy Matrix Operations

NumPy provides three products that are easy to confuse: np.dot() (matrix multiplication), np.outer() (outer product), and np.multiply() (element-wise product).

Dot product (matrix multiplication):

# coding=utf-8
import numpy as np

if __name__ == '__main__':
    s1 = [[1,2,3],[4,5,6]]
    s2 = [[2,2],[3,3],[4,4]]
    dot = np.dot(s1, s2)
    print('dot = ', dot)

Output:

dot =  [[20 20]
 [47 47]]

Outer product (flattens both inputs and forms the matrix of all pairwise products):

    outer = np.outer(s1, s2)
    print('outer = ', outer)

Output:

outer =  [[ 2  2  3  3  4  4]
 [ 4  4  6  6  8  8]
 [ 6  6  9  9 12 12]
 [ 8  8 12 12 16 16]
 [10 10 15 15 20 20]
 [12 12 18 18 24 24]]

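The third operation, np.multiply (equivalent to the * operator), performs the element-wise product and requires shapes that are equal or broadcastable; s1 and s2 above have incompatible shapes for it, so the sketch below multiplies s1 by itself:

import numpy as np

if __name__ == '__main__':
    s1 = np.array([[1, 2, 3], [4, 5, 6]])
    multiply = np.multiply(s1, s1)  # element-wise product, same as s1 * s1
    print('multiply = ', multiply)
    # multiply =  [[ 1  4  9]
    #  [16 25 36]]
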
Xiaoye