
Machine Learning: Why xW+b instead of Wx+b?

I started to learn machine learning, and now I am playing around with TensorFlow.

I often see examples like this:

pred = tf.add(tf.mul(X, W), b)

I also saw such a line in a plain NumPy implementation. Why is x*W+b always used instead of W*x+b? Is there an advantage to multiplying the matrices in this order? I see that it is possible (if X, W, and b are transposed), but I do not see the advantage. In math class at school we always used Wx+b.
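For reference, the equivalence under transposition that the question mentions can be checked with a small NumPy sketch (the shapes here are illustrative, not from any particular tutorial):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))  # n=4 samples, d=3 features
W = rng.normal(size=(3, 2))  # maps 3 features to 2 outputs
b = rng.normal(size=(2,))    # one bias per output

left = X @ W + b                    # the xW+b convention, shape (4, 2)
right = (W.T @ X.T + b[:, None]).T  # the Wx+b convention, transposed back

# Both conventions produce the same predictions
assert np.allclose(left, right)
```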

Thank you very much

Kevin Meier asked Nov 16 '16 21:11

2 Answers

This is the reason:

  • By default, w is a vector of weights, and in maths a vector is considered a column, not a row.

  • X is a collection of data: an n x d matrix, where n is the number of data points and d the number of features. (Upper-case X is an n x d matrix; lower-case x is a single data point, a 1 x d matrix.)

To multiply them correctly, so that each weight is applied to its corresponding feature, you must use X*w+b:

  • With X*w you multiply every feature by its corresponding weight, and by adding b you add the bias term to every prediction.

If you multiply w*X instead, you multiply a (d x 1) vector by an (n x d) matrix, and the shapes do not conform, so it makes no sense.
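A quick NumPy sketch of the shapes involved (the numbers n=5, d=3 are just for illustration):

```python
import numpy as np

n, d = 5, 3                      # n data points, d features
X = np.ones((n, d))              # data matrix, one row per data point
w = np.arange(d, dtype=float)    # weight vector [0., 1., 2.], shape (d,)
b = 2.0                          # bias term

# X @ w pairs each feature with its weight; b is added to every prediction
pred = X @ w + b
assert pred.shape == (n,)        # one prediction per data point

# w @ X attempts (d,) @ (n, d): the inner dimensions do not match
try:
    w @ X
except ValueError:
    print("w @ X does not conform")
```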

Rob answered Nov 05 '22 01:11


I'm also confused by this, but I think it comes down to dimensions. For an n x m matrix W and an n-dimensional vector x, xW+b can naturally be read as mapping an n-dimensional feature to an m-dimensional one, i.e., you can easily think of W as an n-dimension -> m-dimension operation. With Wx+b (where x must now be an m-dimensional vector), the same W becomes an m-dimension -> n-dimension operation, which looks less comfortable in my opinion. :D
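The two readings can be sketched in NumPy (n=3, m=2 chosen just for illustration):

```python
import numpy as np

n, m = 3, 2
W = np.ones((n, m))   # the same n x m matrix in both readings

x = np.ones(n)        # n-dimensional input
y = x @ W             # xW: W acts as an n-dim -> m-dim map
assert y.shape == (m,)

x2 = np.ones(m)       # m-dimensional input
y2 = W @ x2           # Wx: the same W now acts as an m-dim -> n-dim map
assert y2.shape == (n,)
```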

Yugnaynehc answered Nov 05 '22 03:11