# 1. Introduction

The first step in learning TensorFlow is to understand its key feature, the "computational graph" approach. Basically, every TensorFlow program contains two important parts:

Part 1: building the graph which represents the data flow of the computations

Part 2: running a session which executes the operations in the graph

In other words, TensorFlow separates the definition of computations from their execution. These two parts are explained in more detail in the following sections. Before that, remember that the first step is to import the TensorFlow library!

In [1]:
# import the tensorflow library (the examples use the TensorFlow 1.x graph/session API)
import tensorflow as tf


# 2. Graph

The biggest idea in TensorFlow is that numeric computation is expressed as a computational graph. In other words, the backbone of any TensorFlow program is a graph.

The graph nodes are operations (ops for short), and each takes any number of tensors as input and produces any number of tensors as output.
The graph edges are tensors, which flow between nodes.

*Note: A tensor is a multi-dimensional array (0-D tensor: scalar, 1-D tensor: vector, 2-D tensor: matrix, and so on).
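
To make the rank terminology concrete, here is a small plain-Python sketch. The helper `shape_of` is hypothetical (it is not a TensorFlow function) and assumes regularly nested lists; it only illustrates how rank and shape relate.

```python
# Hypothetical helper (not part of TensorFlow): infer the shape of a
# nested Python list, assuming the nesting is regular (non-ragged).
def shape_of(value):
    shape = []
    while isinstance(value, list):
        shape.append(len(value))
        value = value[0] if value else None
    return tuple(shape)

scalar = 7                        # 0-D tensor
vector = [1, 2, 3]                # 1-D tensor
matrix = [[1, 2, 3], [4, 5, 6]]   # 2-D tensor

for name, t in [("scalar", scalar), ("vector", vector), ("matrix", matrix)]:
    s = shape_of(t)
    # The rank is simply the number of dimensions in the shape.
    print(name, "rank:", len(s), "shape:", s)
```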

You might wonder why we need such a complex(!) structure. Well, the advantage of using flow graphs as the backbone of a deep learning framework is that they let you build complex models out of small and simple operations. This also makes gradient calculations extremely simple, as we will see when we get to that.

Fig. 1- Sample data flow graph in TensorFlow

Now, let's make a simple graph to do some arithmetic (say, adding two constants) with TensorFlow.

In [2]:
# create constants
a = tf.constant(2, name="a")
b = tf.constant(3, name="b")
# create the addition op
c = tf.add(a, b, name="Add")

print(c)



As you see, when we print c, it prints out a Tensor with its shape and type information. It does not print the result of the addition. At this point, TensorFlow has only created a graph like this:

Fig. 2- Generated graph using the above code

As shown in the graph, the addition operation creates a node named "Add". It's important to remember that the constants a and b are also operations (by definition) and they have their own nodes in the graph (again, nodes=operations). To execute the graph and get the values, we need a session.

# 3. Session

To compute anything, a graph must be launched in a session. Technically speaking, a session places the graph ops onto devices, such as CPUs or GPUs, and provides methods to execute them. In our simple example, to run the graph and get the value for c:

In [3]:
# the graph (a, b and c) was created in the previous cell
# launch the graph in a session
with tf.Session() as sess:
    print(sess.run(c))
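
The two phases (build the graph, then run it) can be mimicked in plain Python. This is only a sketch of the idea and not how TensorFlow is implemented; the names `Node`, `constant`, `add` and `run` are hypothetical and chosen for illustration.

```python
import operator

class Node:
    """A toy graph node: an operation plus the nodes that feed it."""
    def __init__(self, op, inputs=(), name=""):
        self.op = op
        self.inputs = inputs
        self.name = name

    def __repr__(self):
        return "Node(%r)" % self.name

def constant(value, name):
    # A constant is itself a node: an op with no inputs that returns its value.
    return Node(lambda: value, name=name)

def add(x, y, name="Add"):
    return Node(operator.add, inputs=(x, y), name=name)

def run(node):
    # Evaluate the inputs first, then apply this node's op to the results.
    return node.op(*(run(i) for i in node.inputs))

# Phase 1: build the graph. Nothing is computed here.
a = constant(2, name="a")
b = constant(3, name="b")
c = add(a, b)
print(c)       # Node('Add')

# Phase 2: execute the graph.
print(run(c))  # 5
```

Printing `c` shows a description of the node, while `run(c)` is what actually performs the addition.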


# 4. Datatypes

The commonly used datatypes in TensorFlow fall into three categories:

## 4.1. Constant:

As the name suggests, constants are tensors with a fixed value. Each one creates a node that holds a constant value.

In [4]:
# create graph
a = tf.constant(2, name="a")
b = tf.constant(3, name="b")
c = a + b  # equivalent to tf.add(a, b)
# launch the graph in a session
with tf.Session() as sess:
    print(sess.run(c))


## 4.2. Variable:

Variables are stateful nodes which output their current value, meaning that they retain their value over multiple executions of a graph. They have a number of useful features, such as:

They can be saved to disk during and after training. This allows people from different companies and groups to save, restore, and send their model parameters to other people.

By default, gradient updates (used to train all neural networks) apply to all trainable variables in your graph. In fact, variables are the things that you want to tune in order to minimize the loss.

These features make variables suitable for use as the network parameters (i.e. weights and biases).

Again, it's important to remember that variables are operations. When we evaluate these operations in the session, we'll get the values of these variables.

*Note: Variables need to be initialized. The following toy example shows how we can add an op to initialize the variables.

In [5]:
# create graph
a = tf.get_variable(name="a", initializer=2)
b = tf.get_variable(name="b", initializer=3)
c = tf.add(a, b, name="Add")
# add an op to initialize the variables
init_op = tf.global_variables_initializer()
# launch the graph in a session
with tf.Session() as sess:
    # run the variable initializer
    sess.run(init_op)
    # now we can run the desired operation
    print(sess.run(c))
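
To see why stateful nodes matter for training, here is a minimal plain-Python sketch of a value that persists across repeated updates. It does not use TensorFlow; `ToyVariable`, the toy loss (w - 3)^2 and the learning rate are all assumptions made for illustration.

```python
class ToyVariable:
    """Hypothetical stand-in for a variable: a node that stores state."""
    def __init__(self, initial_value):
        self.value = initial_value

def loss_gradient(w):
    # Gradient of the assumed toy loss (w - 3)^2 with respect to w.
    return 2.0 * (w - 3.0)

w = ToyVariable(0.0)   # a variable must be initialized before it is used
learning_rate = 0.1    # assumed value, for illustration only

for step in range(100):
    # Each "execution" reads the current value and writes an updated one;
    # the value persists from one step to the next.
    w.value -= learning_rate * loss_gradient(w.value)

print(round(w.value, 4))  # ends up very close to 3.0, the minimizer of the loss
```

Because the stored value survives from one step to the next, repeated gradient updates gradually tune it toward the value that minimizes the loss.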


## 4.3. Placeholder:

Placeholders are nodes whose value is fed in at execution time. If you have inputs to your network that depend on some external data and you don't want your graph to depend on any real value, placeholders are the datatype you need. In fact, you can build the graph without needing the data. Therefore, they don't need any initial value; only a datatype (such as float32) and a tensor shape so the graph still knows what to compute even though it doesn't have any stored values yet.

In [6]:
# create a placeholder that takes three values (vector of size 3) and type float32
a = tf.placeholder(tf.float32, shape=[3], name="a")
# create a constant the same size and type as "a"
b = tf.constant([5, 5, 5], tf.float32, name="b")
c = tf.add(a, b, name="Add")
# launch the graph in a session (this fails: nothing is fed to "a")
with tf.Session() as sess:
    print(sess.run(c))


You must feed a value for placeholder tensor 'a' with dtype float and shape [3]

As you see, it doesn't work, because the placeholder is empty and there is no way to add an empty tensor to a constant tensor. We need to feed an input value to tensor "a". This is done by creating a dictionary ("d" in the following code) whose keys are the placeholders and whose values are the data to be passed to them, and passing it to the "feed_dict" argument of sess.run. In our example, say we want to pass [1, 2, 3] to the placeholder:

In [7]:
# create a placeholder that takes three values (vector of size 3) and type float32
a = tf.placeholder(tf.float32, shape=[3], name="a")
# create a constant the same size and type as "a"
b = tf.constant([5, 5, 5], tf.float32, name="b")
c = tf.add(a, b, name="Add")
# launch the graph in a session
with tf.Session() as sess:
    # create the dictionary:
    d = {a: [1, 2, 3]}
    # feed it to placeholder a via the dict
    print(sess.run(c, feed_dict=d))

[6. 7. 8.]
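
The feeding mechanism can be pictured with a plain-Python sketch. It is not TensorFlow code; `Placeholder` and `run_add` are hypothetical names, and the sketch only mirrors the idea that a placeholder's value is looked up in a dictionary at execution time.

```python
class Placeholder:
    """Hypothetical toy placeholder: it has a name but no stored value."""
    def __init__(self, name):
        self.name = name

def run_add(placeholder, constant, feed_dict=None):
    # Look up the placeholder's value at execution time; fail if it is missing.
    feed_dict = feed_dict or {}
    if placeholder not in feed_dict:
        raise ValueError("You must feed a value for placeholder %r" % placeholder.name)
    fed = feed_dict[placeholder]
    return [x + y for x, y in zip(fed, constant)]

a = Placeholder("a")
b = [5.0, 5.0, 5.0]

# With a fed value, the addition can be computed.
print(run_add(a, b, feed_dict={a: [1, 2, 3]}))  # [6.0, 7.0, 8.0]

# Without one, there is nothing to add to the constant.
try:
    run_add(a, b)
except ValueError as err:
    print("error:", err)
```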


So far so good?!

Now we know all we need to start building a toy feed-forward neural network with one hidden layer of 200 hidden units (neurons). The computational graph in TensorFlow will look like this:

Fig. 3- Data flow graph of a simple neural network with one hidden layer to be created in TensorFlow

How many operations (or nodes) do you see in this graph? Six, right? The three circles (X, W, b) and three rectangles. We'll go through them one by one and discuss the best way to implement each.

Let's start with the input, X. This can be an input of any type, such as images, signals, etc. The general approach is to feed all inputs to the network and train the trainable parameters (here, W and b) by backpropagating the error signal. Ideally, you need to feed all inputs together, compute the error, and update the parameters. This process is called "Gradient Descent".

*Side Note: In real-world problems, we have thousands or millions of inputs, which makes gradient descent computationally expensive. That's why we split the input set into several smaller pieces (called mini-batches), each containing B inputs (B is called the mini-batch size), and feed them one by one. This is called "Stochastic Gradient Descent". The process of feeding one mini-batch of size B to the network, back-propagating the errors, and updating the parameters (weights and biases) is called an iteration.
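
The splitting described in the side note can be sketched in plain Python. The data set size (1000) and mini-batch size (64) are assumed values for illustration, and `minibatches` is a hypothetical helper; in practice the inputs are usually shuffled before being split.

```python
import math

def minibatches(inputs, batch_size):
    """Yield consecutive mini-batches; the last one may be smaller."""
    for start in range(0, len(inputs), batch_size):
        yield inputs[start:start + batch_size]

inputs = list(range(1000))  # stand-in for 1000 training examples
B = 64                      # assumed mini-batch size

batches = list(minibatches(inputs, B))

# One iteration per mini-batch: ceil(1000 / 64) iterations per pass over the data.
print(len(batches))                       # 16
print(len(batches[0]), len(batches[-1]))  # 64 40
assert len(batches) == math.ceil(len(inputs) / B)
```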

We generally use Placeholders for inputs so that we can build the graph without depending on any real value. The only point is that you need to choose the proper size for the input. Here, we have a feed-forward neural network, and let's assume inputs of size 784 (similar to 28x28 images of MNIST data). The input placeholder can be written as:

In [8]:
# create the input placeholder
X = tf.placeholder(tf.float32, shape=[None, 784], name="X")


You might wonder what shape=[None, 784] means.

Well, that's the tricky part! Read the above side note one more time. We need to feed B images of size 784 to the network in each training iteration, so the placeholder needs to be of shape=[B, 784]. Defining the placeholder shape as [None, 784] means that you can feed any number of images of size 784 (not necessarily B images). This is especially helpful at evaluation time, when you need to feed all validation or test images to the network and compute the performance on all of them.
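
One way to picture the role of None is as a wildcard in a shape check. The following plain-Python sketch uses a hypothetical helper, `is_compatible`, which is not a TensorFlow function; it only illustrates the rule that None matches any size along that dimension.

```python
def is_compatible(declared, actual):
    """Return True if `actual` fits `declared`, where None matches any size."""
    if len(declared) != len(actual):
        return False
    return all(d is None or d == a for d, a in zip(declared, actual))

declared = (None, 784)
print(is_compatible(declared, (64, 784)))     # True: a training mini-batch
print(is_compatible(declared, (10000, 784)))  # True: a whole test set at once
print(is_compatible(declared, (64, 783)))     # False: wrong input size
```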

Enough with the placeholder. Let's continue with the network parameters, W and b. As explained in the Variable section above, they have to be defined as variables, since in TensorFlow gradient updates are applied to the graph variables by default. As mentioned, variables need to be initialized.

*Note: Generally, weights (W) are initialized randomly, in the simplest form from a normal distribution, say with zero mean and a standard deviation of 0.01. Biases (b) can be initialized with small constant values, such as 0.

Since the input dimension is 784 and we have 200 hidden units, the weight matrix will be of size [784, 200]. We also need 200 biases, one for each hidden unit. The code will be like:

In [9]:
# create weight matrix initialized randomly from a truncated normal (mean 0, stddev 0.01)
weight_initer = tf.truncated_normal_initializer(mean=0.0, stddev=0.01)
W = tf.get_variable(name="W", dtype=tf.float32, shape=[784, 200], initializer=weight_initer)

# create bias vector of size 200, all initialized as zero
bias_initer = tf.constant(0., shape=[200], dtype=tf.float32)
b = tf.get_variable(name="b", dtype=tf.float32, initializer=bias_initer)
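
For intuition about what this initialization produces, here is a plain-Python sketch. It is not TensorFlow's implementation; it assumes the documented behaviour of the truncated normal initializer, where samples more than two standard deviations from the mean are discarded and redrawn. The helper `truncated_normal` and the fixed seed are choices made for this illustration.

```python
import random

def truncated_normal(mean, stddev, rng):
    # Redraw until the sample lies within two standard deviations of the mean.
    while True:
        x = rng.gauss(mean, stddev)
        if abs(x - mean) <= 2 * stddev:
            return x

rng = random.Random(0)  # fixed seed, only to make the sketch repeatable
n_in, n_hidden = 784, 200

# Weight matrix of size [784, 200] and one zero bias per hidden unit.
W = [[truncated_normal(0.0, 0.01, rng) for _ in range(n_hidden)] for _ in range(n_in)]
b = [0.0] * n_hidden

print(len(W), len(W[0]), len(b))                      # 784 200 200
print(max(abs(w) for row in W for w in row) <= 0.02)  # True
```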


Now let's move on to the rectangle operations. We must multiply the input X[None, 784] by the weight matrix W[784, 200], which gives a tensor of size [None, 200], then add the bias vector b[200], and finally pass the result through a ReLU non-linearity:

In [10]:
# create MatMul node
x_w = tf.matmul(X, W, name="MatMul")
# create Add node
x_w_b = tf.add(x_w, b, name="Add")
# create ReLU node
h = tf.nn.relu(x_w_b, name="ReLU")
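
The arithmetic behind these three nodes can be checked by hand on a tiny example. The sketch below is plain Python, not TensorFlow, and uses assumed toy sizes (2 inputs, 3 features, 2 hidden units) instead of 784 and 200; the helper names are hypothetical.

```python
def matmul(X, W):
    # X: [batch, n_in], W: [n_in, n_out] -> [batch, n_out]
    return [[sum(x * w for x, w in zip(row, col)) for col in zip(*W)] for row in X]

def add_bias(Z, b):
    # The same bias vector is added to every row of Z.
    return [[z + bj for z, bj in zip(row, b)] for row in Z]

def relu(Z):
    # Negative entries become zero; positive entries pass through unchanged.
    return [[max(0.0, z) for z in row] for row in Z]

X = [[1.0, 2.0, 3.0],
     [0.0, -1.0, 1.0]]   # batch of 2 inputs, 3 features each
W = [[1.0, -1.0],
     [0.0, 1.0],
     [2.0, 0.0]]         # 3 features -> 2 hidden units
b = [0.5, -0.5]

h = relu(add_bias(matmul(X, W), b))
print(h)  # [[7.5, 0.5], [2.5, 0.0]]
```

The output has one row per input and one column per hidden unit, which is the same [batch, hidden units] layout as the [None, 200] tensor described above.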


Okay, we are all set. The created graph looks like this:

Fig. 4- Data flow graph of the neural network created in TensorFlow

But how can you visualize this graph? How was this figure created? That's the magic of TensorBoard. It's thoroughly explained in our next article.

Before closing, let's run a session on this graph (using 100 images generated from random pixel values) and get the output of the hidden units (h). The whole code looks like this:

In [11]:
# import the libraries
import tensorflow as tf
import numpy as np

# create the input placeholder
X = tf.placeholder(tf.float32, shape=[None, 784], name="X")

# create network parameters
weight_initer = tf.truncated_normal_initializer(mean=0.0, stddev=0.01)
W = tf.get_variable(name="W", dtype=tf.float32, shape=[784, 200], initializer=weight_initer)
bias_initer = tf.constant(0., shape=[200], dtype=tf.float32)
b = tf.get_variable(name="b", dtype=tf.float32, initializer=bias_initer)

# create MatMul node
x_w = tf.matmul(X, W, name="MatMul")
# create Add node
x_w_b = tf.add(x_w, b, name="Add")
# create ReLU node
h = tf.nn.relu(x_w_b, name="ReLU")

# add an op to initialize the variables
init_op = tf.global_variables_initializer()

# launch the graph in a session
with tf.Session() as sess:
    # initialize variables
    sess.run(init_op)
    # create the dictionary:
    d = {X: np.random.rand(100, 784)}
    # feed it to placeholder X via the dict
    print(sess.run(h, feed_dict=d))


Running this code will print out h[100, 200], the outputs of the 200 hidden units in response to 100 images; i.e. 200 features extracted from each of the 100 images.

We'll continue with constructing the loss function and creating the optimizer operations in the next articles. However, we need to learn TensorBoard first so that we can use its features in our neural network code.