Preface

In this chapter, we will learn about TensorFlow. We will explore its basic operations and get familiar with them by computing a simple loss and a linear function. Finally, we will use TensorFlow to build a neural network for hand gesture recognition. The helper module used in this chapter, tf_utils, accompanies the course materials.

TensorFlow Basics

First, we import the necessary libraries. The most important one is TensorFlow, which we alias as tf.

import math
import numpy as np
import h5py
import tensorflow as tf
from tensorflow.python.framework import ops
from tf_utils import load_dataset, random_mini_batches, convert_to_one_hot, predict

Next, we use TensorFlow to compute a loss function. The loss function formula is:
$$loss = \mathcal{L}(\hat{y}, y) = (\hat{y}^{(i)} - y^{(i)})^2 \tag{1}$$

First, we define two constants corresponding to \(\hat{y}\) and \(y\) in the formula, assigning \(\hat{y} = 36\) and \(y = 39\):

y_hat = tf.constant(36, name='y_hat')
y = tf.constant(39, name='y')

Then, we define the computation from formula (1). The square is conveniently computed with the ** operator:

loss = tf.Variable((y - y_hat)**2, name='loss')

Before evaluating anything, we need to initialize TensorFlow's variables, and computations must be executed within a session:

init = tf.global_variables_initializer()

with tf.Session() as session:
    session.run(init)
    print(session.run(loss)) 

The output of the above code is 9, i.e. \((39 - 36)^2\).

The example above shows TensorFlow's core pattern: we first build a computation graph out of constants, variables, and operations, and only then execute it in a session. Writing the computation itself looks much like regular programming. For example:

a = tf.constant(2)
b = tf.constant(10)
c = tf.multiply(a, b)
print(c)

The output here is not 20 but a tensor, Tensor("Mul:0", shape=(), dtype=int32): c is a node in the graph, not a computed value. To get the value, we must run it in a session:

sess = tf.Session()
print(sess.run(c))

This will output the correct result 20.

In some cases, we don't know a value when the graph is built. TensorFlow provides placeholders for such scenarios:

x = tf.placeholder(tf.int64, name='x')
print(sess.run(2 * x, feed_dict={x: 3}))
sess.close()

Here, x has no value when it is defined; instead, we supply one through the feed_dict dictionary when running the session.

Common Computations

Linear Function

The formula for a linear function is:
$$Y = WX + b \tag{2}$$

We use the following functions:
- tf.matmul() for matrix multiplication
- tf.add() for addition
- np.random.randn() for random initialization

def linear_function():
    # Randomly initialize the input and the parameters:
    # X is (3, 1), W is (4, 3), b is (4, 1), so Y = WX + b is (4, 1)
    X = tf.constant(np.random.randn(3, 1), name="X")
    W = tf.constant(np.random.randn(4, 3), name="W")
    b = tf.constant(np.random.randn(4, 1), name="b")

    # Compute the linear function
    Y = tf.add(tf.matmul(W, X), b)

    # Run the session
    sess = tf.Session()
    result = sess.run(Y)
    sess.close()

    return result
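
A quick sanity check (hypothetical usage; the numbers depend on NumPy's random state, so seed the generator first if you want reproducible output):

np.random.seed(1)   # optional: makes the random draws reproducible
print("result = " + str(linear_function()))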

Sigmoid Function

We wrap TensorFlow's built-in tf.sigmoid, feeding the input value through a placeholder:

def sigmoid(z):
    x = tf.placeholder(tf.float32, name="x")
    sigmoid = tf.sigmoid(x)

    with tf.Session() as sess:
        result = sess.run(sigmoid, feed_dict={x: z})

    return result
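
A quick check with inputs whose outputs are known:

print("sigmoid(0)  = " + str(sigmoid(0)))    # exactly 0.5
print("sigmoid(12) = " + str(sigmoid(12)))   # very close to 1.0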

Loss Function

The cross-entropy loss formula is:
$$J = -\frac{1}{m} \sum_{i=1}^{m} \left( y^{(i)} \log a^{[2](i)} + \left(1 - y^{(i)}\right) \log\left(1 - a^{[2](i)}\right) \right) \tag{3}$$

We can use the built-in tf.nn.sigmoid_cross_entropy_with_logits, which takes the logits z directly and applies the sigmoid internally:

def cost(logits, labels):
    z = tf.placeholder(tf.float32, name="z")
    y = tf.placeholder(tf.float32, name="y")

    cost = tf.nn.sigmoid_cross_entropy_with_logits(logits=z, labels=y)

    sess = tf.Session()
    cost = sess.run(cost, feed_dict={z: logits, y: labels})
    sess.close()

    return cost
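
As a hypothetical check, we can feed in a few arbitrary logits and binary labels. Note that the function returns one loss value per example rather than their mean:

logits = np.array([0.2, 0.4, 0.7, 0.9])   # arbitrary example logits
labels = np.array([0., 0., 1., 1.])       # arbitrary binary labels
print("cost = " + str(cost(logits, labels)))   # prints four per-example losses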

One-Hot Encoding

One-hot encoding turns each class index into a vector that is 1 at that index and 0 elsewhere. It is implemented using tf.one_hot:

def one_hot_matrix(labels, C):
    C = tf.constant(C, name="C")
    # axis=0 places the classes along the rows, so each column is one example
    one_hot_matrix = tf.one_hot(labels, C, axis=0)

    sess = tf.Session()
    one_hot = sess.run(one_hot_matrix)
    sess.close()

    return one_hot

Testing the function:

labels = np.array([1, 2, 3, 0, 2, 1])
one_hot = one_hot_matrix(labels, C=4)
print("one_hot = " + str(one_hot))

Output:

one_hot = [[ 0.  0.  0.  1.  0.  0.]
           [ 1.  0.  0.  0.  0.  1.]
           [ 0.  1.  0.  0.  1.  0.]
           [ 0.  0.  1.  0.  0.  0.]]

Initializing a Matrix of Ones

We use tf.ones to create a matrix of ones:

def ones(shape):
    ones = tf.ones(shape)

    sess = tf.Session()
    ones = sess.run(ones)
    sess.close()

    return ones
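
For example:

print("ones = " + str(ones([3])))   # expected output: [1. 1. 1.]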

Building a Neural Network with TensorFlow

We will now build a three-layer neural network that recognizes hand gestures. There are six gesture classes, which is why the labels are one-hot encoded into vectors of length 6.

Load Data

X_train_orig, Y_train_orig, X_test_orig, Y_test_orig, classes = load_dataset()

Preprocess Data

Flatten and normalize the images, and convert labels to one-hot encoding:

X_train_flatten = X_train_orig.reshape(X_train_orig.shape[0], -1).T   # each column is one flattened 64x64x3 image
X_test_flatten = X_test_orig.reshape(X_test_orig.shape[0], -1).T
X_train = X_train_flatten / 255.   # scale pixel values to [0, 1]
X_test = X_test_flatten / 255.
Y_train = convert_to_one_hot(Y_train_orig, 6)   # 6 gesture classes
Y_test = convert_to_one_hot(Y_test_orig, 6)

Create Placeholders

def create_placeholders(n_x, n_y):
    # None lets the number of examples vary between minibatches
    X = tf.placeholder(tf.float32, shape=(n_x, None), name="X")
    Y = tf.placeholder(tf.float32, shape=(n_y, None), name="Y")

    return X, Y
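
For our data, n_x = 12288 (64 × 64 × 3 pixels per image) and n_y = 6. Printing the placeholders shows their shapes, with ? standing in for the yet-unknown number of examples:

X, Y = create_placeholders(12288, 6)
print("X = " + str(X))   # Tensor("X:0", shape=(12288, ?), dtype=float32)
print("Y = " + str(Y))   # Tensor("Y:0", shape=(6, ?), dtype=float32)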

Initialize Parameters

def initialize_parameters():
    # Layer sizes: 12288 -> 25 -> 12 -> 6 (input -> two hidden layers -> output)
    W1 = tf.get_variable("W1", [25, 12288], initializer=tf.contrib.layers.xavier_initializer(seed=1))
    b1 = tf.get_variable("b1", [25, 1], initializer=tf.zeros_initializer())
    W2 = tf.get_variable("W2", [12, 25], initializer=tf.contrib.layers.xavier_initializer(seed=1))
    b2 = tf.get_variable("b2", [12, 1], initializer=tf.zeros_initializer())
    W3 = tf.get_variable("W3", [6, 12], initializer=tf.contrib.layers.xavier_initializer(seed=1))
    b3 = tf.get_variable("b3", [6, 1], initializer=tf.zeros_initializer())

    parameters = {"W1": W1, "b1": b1, "W2": W2, "b2": b2, "W3": W3, "b3": b3}

    return parameters

Forward Propagation

def forward_propagation(X, parameters):
    W1 = parameters['W1']
    b1 = parameters['b1']
    W2 = parameters['W2']
    b2 = parameters['b2']
    W3 = parameters['W3']
    b3 = parameters['b3']

    Z1 = tf.add(tf.matmul(W1, X), b1)   # Z1 = W1 X + b1
    A1 = tf.nn.relu(Z1)                 # A1 = relu(Z1)
    Z2 = tf.add(tf.matmul(W2, A1), b2)  # Z2 = W2 A1 + b2
    A2 = tf.nn.relu(Z2)                 # A2 = relu(Z2)
    Z3 = tf.add(tf.matmul(W3, A2), b3)  # Z3 = W3 A2 + b3

    # No softmax here: it is applied inside the cost function
    return Z3

Compute Cost

def compute_cost(Z3, Y):
    # tf.nn.softmax_cross_entropy_with_logits expects shape (number of examples, number of classes),
    # so we transpose our (classes, examples) tensors first
    logits = tf.transpose(Z3)
    labels = tf.transpose(Y)

    # applies the softmax internally and averages the cross-entropy over the batch
    cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=labels))

    return cost

Backward Propagation and Parameter Update

One of TensorFlow's main advantages is that backpropagation and the parameter update are automatic: creating an optimizer op and running it performs both in a single step. The snippet below uses names (learning_rate, cost, sess, minibatch_X, minibatch_Y) from the model function defined next:

# minimizing the cost runs backpropagation and updates all trainable variables
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate).minimize(cost)
_, c = sess.run([optimizer, cost], feed_dict={X: minibatch_X, Y: minibatch_Y})

Build the Model

def model(X_train, Y_train, X_test, Y_test, learning_rate=0.0001,
          num_epochs=1500, minibatch_size=32, print_cost=True):
    ops.reset_default_graph()
    tf.set_random_seed(1)
    seed = 3
    (n_x, m) = X_train.shape
    n_y = Y_train.shape[0]
    costs = []

    X, Y = create_placeholders(n_x, n_y)
    parameters = initialize_parameters()
    Z3 = forward_propagation(X, parameters)
    cost = compute_cost(Z3, Y)
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)

    init = tf.global_variables_initializer()

    with tf.Session() as sess:
        sess.run(init)

        for epoch in range(num_epochs):
            epoch_cost = 0.
            num_minibatches = int(m / minibatch_size)
            seed += 1
            minibatches = random_mini_batches(X_train, Y_train, minibatch_size, seed)

            for minibatch in minibatches:
                (minibatch_X, minibatch_Y) = minibatch
                _, minibatch_cost = sess.run([optimizer, cost], feed_dict={X: minibatch_X, Y: minibatch_Y})
                epoch_cost += minibatch_cost / num_minibatches

            if print_cost and epoch % 100 == 0:
                print("Cost after epoch %i: %f" % (epoch, epoch_cost))
            if print_cost and epoch % 5 == 0:
                costs.append(epoch_cost)

        parameters = sess.run(parameters)
        print("Parameters have been trained!")

        correct_prediction = tf.equal(tf.argmax(Z3), tf.argmax(Y))
        accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))

        print("Train Accuracy:", accuracy.eval({X: X_train, Y: Y_train}))
        print("Test Accuracy:", accuracy.eval({X: X_test, Y: Y_test}))

        return parameters

Train the model:

parameters = model(X_train, Y_train, X_test, Y_test)

Make Predictions

Use the trained parameters to predict:

import scipy
from PIL import Image
from scipy import ndimage

my_image = "thumbs_up.jpg"
fname = "images/" + my_image
# Note: ndimage.imread and scipy.misc.imresize were removed from newer SciPy releases;
# with a modern SciPy, imageio.imread and PIL's Image.resize are possible alternatives.
image = np.array(ndimage.imread(fname, flatten=False))
my_image = scipy.misc.imresize(image, size=(64, 64)).reshape((1, 64*64*3)).T
my_image_prediction = predict(my_image, parameters)

print("Your algorithm predicts: y = " + str(np.squeeze(my_image_prediction)))

References

Andrew Ng's deep learning course: http://deeplearning.ai/

This note is based on my study of Andrew Ng's course. As a beginner, I welcome corrections if I have misunderstood anything!

Xiaoye