## Issue

I am very new to Python and TensorFlow. Recently I ran into a problem while studying “MNIST For ML Beginners” (https://www.tensorflow.org/get_started/mnist/beginners).

In this tutorial, we use `y = tf.nn.softmax(tf.matmul(X, W) + b)` to get our outputs.

My question is: suppose X is a [100, 784] matrix, W is a [784, 10] matrix, and b is a [10] tensor (like a [10, 1] matrix?). After we call `tf.matmul(X, W)` we get a [100, 10] matrix. So how can a [100, 10] matrix be added to a [10] tensor? That does not make any sense to me.

I know why there are biases and why they need to be added. I just do not know how the “+” operator works in this case.

## Solution

This is because of a concept called **broadcasting**, which exists in both NumPy and TensorFlow. At a high level, this is how it works:

Suppose you’re working with an op that supports broadcasting (e.g. + or *) and has two input tensors, X and Y. To decide whether the shapes of X and Y are compatible, the op compares their dimensions in pairs, starting from the right. Two dimensions are considered compatible if:

- They are equal
- One of them is 1
- One of them is missing

Applying these rules to the add operation (+) and your inputs of shape [100, 10] and [10]:

- 10 and 10 are compatible
- 100 and ‘missing’ are compatible
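The pairwise check above can be sketched in a few lines of Python (the helper `broadcastable` is a hypothetical illustration, not a library function; NumPy is used to confirm the result, since it follows the same rules the answer describes):

```python
import numpy as np

# Right-aligned pairwise check mirroring the three rules above:
# dimensions are compatible if equal, if one is 1, or if one is missing.
def broadcastable(shape_x, shape_y):
    for dx, dy in zip(reversed(shape_x), reversed(shape_y)):
        if dx != dy and dx != 1 and dy != 1:
            return False
    return True  # any leftover ("missing") dimensions are always compatible

print(broadcastable((100, 10), (10,)))  # True: 10 vs 10 equal, 100 vs missing
print(broadcastable((100, 10), (5,)))   # False: 10 vs 5

# NumPy agrees: adding shapes [100, 10] and [10] succeeds.
print((np.zeros((100, 10)) + np.zeros(10)).shape)
```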

If the shapes are compatible and one of the dimensions of an input is 1 or missing, the op will essentially tile that input to match the shape of the other input.

In your example, the add op will effectively tile Y of shape [10] to shape [100, 10] before doing the addition.
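As a concrete sketch of the shapes in your example (written here in NumPy rather than TensorFlow, since the broadcasting rules are the same; the random data is purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 784))  # a batch of 100 flattened images
W = rng.standard_normal((784, 10))   # weight matrix
b = rng.standard_normal(10)          # bias vector of shape [10]

logits = X @ W + b   # [100, 10] + [10]: b is broadcast to [100, 10]
print(logits.shape)  # (100, 10)

# Broadcasting gives the same result as explicitly tiling b first:
tiled = X @ W + np.tile(b, (100, 1))
print(np.allclose(logits, tiled))  # True
```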

See the NumPy documentation on broadcasting for more detailed information (https://docs.scipy.org/doc/numpy-1.13.0/user/basics.broadcasting.html).

Answered By – Avishkar Bhoopchand

Answer Checked By – Robin (Easybugfix Admin)