Preface¶
In this chapter, we will learn about TensorFlow. We will explore some of its basic libraries and get familiar with them by computing a linear function. Finally, we will use TensorFlow to build a neural network for hand gesture recognition. Some libraries used in this chapter can be downloaded here.
Basic Libraries of TensorFlow¶
First, we import the necessary libraries. The most important one is TensorFlow, which we alias as tf.
import math
import numpy as np
import h5py
import tensorflow as tf
from tensorflow.python.framework import ops
from tf_utils import load_dataset, random_mini_batches, convert_to_one_hot, predict
Next, we use TensorFlow to compute a loss function. The loss function formula is:
$$loss = \mathcal{L}(\hat{y}, y) = (\hat{y}^{(i)} - y^{(i)})^2 \tag{1}$$
First, we define two variables corresponding to \(\hat{y}\) and \(y\) in the formula. We assign \(\hat{y} = 36\) and \(y = 39\):
y_hat = tf.constant(36, name='y_hat')
y = tf.constant(39, name='y')
Then, we define the computation based on formula (1); the square is conveniently written with the ** operator:
loss = tf.Variable((y - y_hat)**2, name='loss')
Before the loss can be evaluated, we need to initialize the variables. Computations are executed within a session:
init = tf.global_variables_initializer()
with tf.Session() as session:
    session.run(init)
    print(session.run(loss))
The output of the above code is 9, since \((39 - 36)^2 = 9\).
From the above example, we can see that TensorFlow separates defining a computation from executing it: writing the computation graph looks like regular programming, but nothing is evaluated until a session runs it. For example:
a = tf.constant(2)
b = tf.constant(10)
c = tf.multiply(a, b)
print(c)
The output here is not 20 but the tensor Tensor("Mul:0", shape=(), dtype=int32). To get the value, we must run it in a session:
sess = tf.Session()
print(sess.run(c))
This will output the correct result 20.
In some cases, we don’t know the variable values upfront. We use placeholders for such scenarios:
x = tf.placeholder(tf.int64, name='x')
print(sess.run(2 * x, feed_dict={x: 3}))
sess.close()
Here, we don't give x a value when it is defined; instead, we supply one through the feed_dict dictionary when running the session. The code above therefore prints 6.
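A placeholder can also be given an explicit shape. Below is a minimal sketch (the names v and doubled are our own, not from the course code):

v = tf.placeholder(tf.float32, shape=(3,), name='v')
doubled = 2 * v
with tf.Session() as sess:
    # The fed value must match the declared shape (3,)
    print(sess.run(doubled, feed_dict={v: [1., 2., 3.]}))  # prints [2. 4. 6.]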
Common Computations¶
Linear Function¶
The formula for a linear function is:
$$Y = WX + b \tag{2}$$
We use the following functions:
- tf.matmul() for matrix multiplication
- tf.add() for addition
- np.random.randn() for random initialization
def linear_function():
    # Randomly generate input and parameter tensors
    X = tf.constant(np.random.randn(3, 1), name="X")
    W = tf.constant(np.random.randn(4, 3), name="W")
    b = tf.constant(np.random.randn(4, 1), name="b")
    # Compute the linear function
    Y = tf.add(tf.matmul(W, X), b)
    # Run the session
    sess = tf.Session()
    result = sess.run(Y)
    sess.close()
    return result
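A quick way to exercise the function (the exact numbers depend on NumPy's random state, so your output will differ):

result = linear_function()
print("result = " + str(result))  # a (4, 1) array of random floats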
Sigmoid Function¶
We use TensorFlow’s built-in sigmoid function:
def sigmoid(z):
    x = tf.placeholder(tf.float32, name="x")
    sigmoid = tf.sigmoid(x)
    with tf.Session() as sess:
        result = sess.run(sigmoid, feed_dict={x: z})
    return result
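As a sanity check, sigmoid(0) should return 0.5 and sigmoid(12) a value very close to 1:

print("sigmoid(0) = " + str(sigmoid(0)))    # 0.5
print("sigmoid(12) = " + str(sigmoid(12)))  # approximately 0.999994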
Loss Function¶
The cross-entropy loss formula is:
$$J = -\frac{1}{m} \sum_{i=1}^{m} \left( y^{(i)} \log a^{[2](i)} + (1 - y^{(i)}) \log\left(1 - a^{[2](i)}\right) \right) \tag{3}$$
We can use TensorFlow's built-in tf.nn.sigmoid_cross_entropy_with_logits, which applies the sigmoid and computes the cross-entropy loss in one step:
def cost(logits, labels):
    z = tf.placeholder(tf.float32, name="z")
    y = tf.placeholder(tf.float32, name="y")
    cost = tf.nn.sigmoid_cross_entropy_with_logits(logits=z, labels=y)
    sess = tf.Session()
    cost = sess.run(cost, feed_dict={z: logits, y: labels})
    sess.close()
    return cost
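To try it out, feed a few logits and matching labels (the values below are purely illustrative; the result is one loss value per example):

logits_values = np.array([0.2, 0.4, 0.7, 0.9])
labels_values = np.array([0., 0., 1., 1.])
print("cost = " + str(cost(logits_values, labels_values)))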
One-Hot Encoding¶
One-Hot encoding is implemented using tf.one_hot:
def one_hot_matrix(labels, C):
    C = tf.constant(C, name="C")
    one_hot_matrix = tf.one_hot(labels, C, axis=0)
    sess = tf.Session()
    one_hot = sess.run(one_hot_matrix)
    sess.close()
    return one_hot
Testing the function:
labels = np.array([1, 2, 3, 0, 2, 1])
one_hot = one_hot_matrix(labels, C=4)
print("one_hot = " + str(one_hot))
Output:
one_hot = [[ 0.  0.  0.  1.  0.  0.]
 [ 1.  0.  0.  0.  0.  1.]
 [ 0.  1.  0.  0.  1.  0.]
 [ 0.  0.  1.  0.  0.  0.]]
Initialize Matrix¶
We use tf.ones to create a matrix of ones:
def ones(shape):
    ones = tf.ones(shape)
    sess = tf.Session()
    ones = sess.run(ones)
    sess.close()
    return ones
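For example:

print("ones = " + str(ones([3])))  # ones = [ 1.  1.  1.]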
Building a Neural Network with TensorFlow¶
We will now build a three-layer neural network that recognizes hand gestures (six classes), using the one-hot encoding above for the labels.
Load Data¶
X_train_orig, Y_train_orig, X_test_orig, Y_test_orig, classes = load_dataset()
Preprocess Data¶
Flatten and normalize the images, and convert labels to one-hot encoding:
X_train_flatten = X_train_orig.reshape(X_train_orig.shape[0], -1).T
X_test_flatten = X_test_orig.reshape(X_test_orig.shape[0], -1).T
X_train = X_train_flatten / 255.
X_test = X_test_flatten / 255.
Y_train = convert_to_one_hot(Y_train_orig, 6)
Y_test = convert_to_one_hot(Y_test_orig, 6)
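Each image is 64×64×3, so the flattened feature dimension is 64·64·3 = 12288, matching the 12288 input columns of W1 below. A quick shape sanity check (the number of examples depends on what load_dataset returns):

print("X_train shape: " + str(X_train.shape))  # (12288, number of training examples)
print("Y_train shape: " + str(Y_train.shape))  # (6, number of training examples)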
Create Placeholders¶
def create_placeholders(n_x, n_y):
    X = tf.placeholder(tf.float32, shape=(n_x, None), name="Placeholder_1")
    Y = tf.placeholder(tf.float32, shape=(n_y, None), name="Placeholder_2")
    return X, Y
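Printing the returned placeholders shows tensors rather than values, since nothing has been fed yet:

X, Y = create_placeholders(12288, 6)
print("X = " + str(X))  # Tensor("Placeholder_1:0", shape=(12288, ?), dtype=float32)
print("Y = " + str(Y))  # Tensor("Placeholder_2:0", shape=(6, ?), dtype=float32)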
Initialize Parameters¶
def initialize_parameters():
    W1 = tf.get_variable("W1", [25, 12288], initializer=tf.contrib.layers.xavier_initializer(seed=1))
    b1 = tf.get_variable("b1", [25, 1], initializer=tf.zeros_initializer())
    W2 = tf.get_variable("W2", [12, 25], initializer=tf.contrib.layers.xavier_initializer(seed=1))
    b2 = tf.get_variable("b2", [12, 1], initializer=tf.zeros_initializer())
    W3 = tf.get_variable("W3", [6, 12], initializer=tf.contrib.layers.xavier_initializer(seed=1))
    b3 = tf.get_variable("b3", [6, 1], initializer=tf.zeros_initializer())
    parameters = {"W1": W1, "b1": b1, "W2": W2, "b2": b2, "W3": W3, "b3": b3}
    return parameters
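As a quick check (printing a tf.Variable shows its shape and dtype, not its values):

ops.reset_default_graph()
with tf.Session() as sess:
    parameters = initialize_parameters()
    print("W1 = " + str(parameters["W1"]))  # <tf.Variable 'W1:0' shape=(25, 12288) dtype=float32_ref>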
Forward Propagation¶
def forward_propagation(X, parameters):
    W1 = parameters['W1']
    b1 = parameters['b1']
    W2 = parameters['W2']
    b2 = parameters['b2']
    W3 = parameters['W3']
    b3 = parameters['b3']
    Z1 = tf.add(tf.matmul(W1, X), b1)
    A1 = tf.nn.relu(Z1)
    Z2 = tf.add(tf.matmul(W2, A1), b2)
    A2 = tf.nn.relu(Z2)
    Z3 = tf.add(tf.matmul(W3, A2), b3)
    return Z3
Compute Cost¶
Note that forward_propagation returns Z3 without a softmax activation; tf.nn.softmax_cross_entropy_with_logits applies the softmax internally. It also expects tensors of shape (number of examples, number of classes), which is why both Z3 and Y are transposed first.
def compute_cost(Z3, Y):
    # Transpose from (classes, batch) to the (batch, classes) layout TensorFlow expects
    logits = tf.transpose(Z3)
    labels = tf.transpose(Y)
    cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=labels))
    return cost
Backward Propagation and Parameter Update¶
TensorFlow computes the gradients automatically; we only define an optimizer on the cost and then run it, together with the cost, inside the training loop:
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate).minimize(cost)
_, c = sess.run([optimizer, cost], feed_dict={X: minibatch_X, Y: minibatch_Y})
Build the Model¶
def model(X_train, Y_train, X_test, Y_test, learning_rate=0.0001,
          num_epochs=1500, minibatch_size=32, print_cost=True):
    ops.reset_default_graph()
    tf.set_random_seed(1)
    seed = 3
    (n_x, m) = X_train.shape
    n_y = Y_train.shape[0]
    costs = []
    X, Y = create_placeholders(n_x, n_y)
    parameters = initialize_parameters()
    Z3 = forward_propagation(X, parameters)
    cost = compute_cost(Z3, Y)
    optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)
    init = tf.global_variables_initializer()
    with tf.Session() as sess:
        sess.run(init)
        for epoch in range(num_epochs):
            epoch_cost = 0.
            num_minibatches = int(m / minibatch_size)
            seed += 1
            minibatches = random_mini_batches(X_train, Y_train, minibatch_size, seed)
            for minibatch in minibatches:
                (minibatch_X, minibatch_Y) = minibatch
                _, minibatch_cost = sess.run([optimizer, cost], feed_dict={X: minibatch_X, Y: minibatch_Y})
                epoch_cost += minibatch_cost / num_minibatches
            if print_cost and epoch % 100 == 0:
                print("Cost after epoch %i: %f" % (epoch, epoch_cost))
            if print_cost and epoch % 5 == 0:
                costs.append(epoch_cost)
        parameters = sess.run(parameters)
        print("Parameters have been trained!")
        correct_prediction = tf.equal(tf.argmax(Z3), tf.argmax(Y))
        accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
        print("Train Accuracy:", accuracy.eval({X: X_train, Y: Y_train}))
        print("Test Accuracy:", accuracy.eval({X: X_test, Y: Y_test}))
    return parameters
Train the model:
parameters = model(X_train, Y_train, X_test, Y_test)
Make Predictions¶
Use the trained parameters to predict:
import scipy
from PIL import Image
from scipy import ndimage
my_image = "thumbs_up.jpg"
fname = "images/" + my_image
image = np.array(ndimage.imread(fname, flatten=False))
my_image = scipy.misc.imresize(image, size=(64, 64)).reshape((1, 64*64*3)).T
my_image_prediction = predict(my_image, parameters)
print("Your algorithm predicts: y = " + str(np.squeeze(my_image_prediction)))
References¶
http://deeplearning.ai/
This note is based on studying Andrew Ng’s course. As a beginner, I welcome corrections if there are any misunderstandings!