Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

multilabel binarizer: float object not iterable

I have below dataframe

df[['row_num','set_id']].head()

row_num     path_id_set
988681      [31672, 0]
988680      [31965, 0]
988679      [0, 78464]

I'm trying to use multilabel binarizer, but failing with error code float object not iterable

from sklearn.preprocessing import MultiLabelBinarizer
mlb = MultiLabelBinarizer()
mlb.fit_transform(df['set_id'].str.split(','))

TypeError: 'float' object is not iterable
like image 589
Krishh Avatar asked Mar 15 '26 23:03

Krishh


1 Answers

I think problem is missing values, you can use:

print (df)
   row_num     set_id
0   988681        NaN
1   988680  [31965,0]
2   988679  [0,78464]

from sklearn.preprocessing import MultiLabelBinarizer
mlb = MultiLabelBinarizer()

#create boolean mask matched non NaNs values
mask = df['set_id'].notnull()

#filter by boolean indexing
arr = mlb.fit_transform(df.loc[mask, 'set_id'].dropna().str.strip('[]').str.split(','))

#create DataFrame and add missing (NaN)s index values
df = (pd.DataFrame(arr, index=df.index[mask], columns=mlb.classes_)
               .reindex(df.index, fill_value=0))

print (df)
   0  31965  78464
0  0      0      0
1  1      1      0
2  1      0      1
like image 195
jezrael Avatar answered Mar 18 '26 13:03

jezrael



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!