
How to mix one-hot encoding and bag-of-words

I am looking for an encoding, a sort of one-hot encoding, that combines columns of the same category into a single vector.

  • Given the data [5,8,1,3]
  • it should give me: [0,1,0,1,0,1,0,0,1,0,0,0]

with an arbitrary size set at 12.

I looked at bag-of-words, but I did not find how to set the vector size independently of the input data.

If somebody can give me some clues, that would be fine.

Harvey asked Mar 02 '23 14:03



1 Answer

Note that bag-of-words models are used when dealing with text. For this simpler task you can just use np.bincount and specify a minlength:

import numpy as np

l = [5,8,1,3]

# Count occurrences of each index; minlength pads the result to at least 12 entries
np.bincount(l, minlength=12)
# array([0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0])
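Two caveats with np.bincount are worth keeping in mind: it counts occurrences, so a repeated value yields an entry greater than 1, and minlength is only a lower bound, so a value of 12 or more makes the result longer than 12. If a strictly fixed-size 0/1 vector is needed, something like the following might work. This is only a sketch: multi_hot is a hypothetical helper name, not a NumPy function, and raising an error on out-of-range values is an assumption about the desired behaviour.

```python
import numpy as np

def multi_hot(indices, size):
    """Hypothetical helper: fixed-length 0/1 vector with ones at `indices`."""
    indices = np.asarray(indices, dtype=np.intp)
    # Assumption: out-of-range values are an error rather than silently dropped
    if indices.size and (indices.min() < 0 or indices.max() >= size):
        raise ValueError(f"all indices must be in [0, {size})")
    vec = np.zeros(size, dtype=np.int64)
    vec[indices] = 1  # duplicates still give 1, unlike np.bincount
    return vec

print(multi_hot([5, 8, 1, 3], 12))
# [0 1 0 1 0 1 0 0 1 0 0 0]
```

For the data in the question both approaches give the same vector; they only differ when the input contains duplicates or values outside the chosen size.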
yatu answered Mar 05 '23 18:03
