
Machine Learning: Why xW+b instead of Wx+b?

I started to learn machine learning, and now I am playing around with TensorFlow.

I often see examples like this:

pred = tf.add(tf.mul(X, W), b)

I also saw such a line in a plain NumPy implementation. Why is x*W+b always used instead of W*x+b? Is there an advantage to multiplying the matrices in this order? I see that it is possible (if X, W, and b are transposed), but I do not see the advantage. In math class at school we always used Wx+b.
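For reference, the equivalence under transposition that the question mentions can be checked with a small NumPy sketch (the shapes here are illustrative, not from any particular tutorial):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))  # n=4 samples, d=3 features
W = rng.normal(size=(3, 2))  # maps 3 features to 2 outputs
b = rng.normal(size=(2,))    # one bias per output

left = X @ W + b                    # the xW+b convention, shape (4, 2)
right = (W.T @ X.T + b[:, None]).T  # the Wx+b convention, transposed back

# Both conventions produce the same predictions
assert np.allclose(left, right)
```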

Thank you very much

Kevin Meier asked Nov 16 '16 21:11

2 Answers

This is the reason:

  • By default, w is a vector of weights, and in maths a vector is considered a column, not a row.

  • X is a collection of data: an n x d matrix, where n is the number of data points and d the number of features. (Upper-case X is an n x d matrix; lower-case x is a single data point, a 1 x d matrix.)

To multiply them correctly, so that each weight is applied to its corresponding feature, you must use X*w+b:

  • With X*w you multiply every feature by its corresponding weight, and by adding b you add the bias term to every prediction.

If you multiply w*X instead, you multiply a (d x 1) vector by an (n x d) matrix, and the shapes do not conform, so it makes no sense.
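A quick NumPy sketch of the shapes involved (the numbers n=5, d=3 are just for illustration):

```python
import numpy as np

n, d = 5, 3                      # n data points, d features
X = np.ones((n, d))              # data matrix, one row per data point
w = np.arange(d, dtype=float)    # weight vector [0., 1., 2.], shape (d,)
b = 2.0                          # bias term

# X @ w pairs each feature with its weight; b is added to every prediction
pred = X @ w + b
assert pred.shape == (n,)        # one prediction per data point

# w @ X attempts (d,) @ (n, d): the inner dimensions do not match
try:
    w @ X
except ValueError:
    print("w @ X does not conform")
```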

Rob answered Nov 05 '22 01:11


I'm also confused by this, but I think it comes down to dimensions. For an n x m matrix W and an n-dimensional vector x, xW+b can naturally be read as mapping an n-dimensional feature to an m-dimensional one, i.e., you can easily think of W as an n-dimension -> m-dimension operation. With Wx+b (where x must now be an m-dimensional vector), the same W becomes an m-dimension -> n-dimension operation, which looks less comfortable in my opinion. :D
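The two readings can be sketched in NumPy (n=3, m=2 chosen just for illustration):

```python
import numpy as np

n, m = 3, 2
W = np.ones((n, m))   # the same n x m matrix in both readings

x = np.ones(n)        # n-dimensional input
y = x @ W             # xW: W acts as an n-dim -> m-dim map
assert y.shape == (m,)

x2 = np.ones(m)       # m-dimensional input
y2 = W @ x2           # Wx: the same W now acts as an m-dim -> n-dim map
assert y2.shape == (n,)
```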

Yugnaynehc answered Nov 05 '22 03:11