multilabel binarizer: float object not iterable

Question

I have the following DataFrame:

df[['row_num','set_id']].head()

row_num     set_id
988681      [31672, 0]
988680      [31965, 0]
988679      [0, 78464]

I'm trying to use MultiLabelBinarizer, but it fails with the error 'float' object is not iterable:

from sklearn.preprocessing import MultiLabelBinarizer
mlb = MultiLabelBinarizer()
mlb.fit_transform(df['set_id'].str.split(','))

TypeError: 'float' object is not iterable

jezrael · Accepted Answer

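I think the problem is missing values: .str.split leaves NaN (a float) in place, and MultiLabelBinarizer then tries to iterate over it. You can filter out the missing rows first and add them back afterwards: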

print (df)
   row_num     set_id
0   988681        NaN
1   988680  [31965,0]
2   988679  [0,78464]

import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

mlb = MultiLabelBinarizer()

# boolean mask that is True for non-missing values
mask = df['set_id'].notnull()

# binarize only the non-missing rows: strip the brackets, then split on commas
arr = mlb.fit_transform(df.loc[mask, 'set_id'].str.strip('[]').str.split(','))

# build a DataFrame and restore the missing rows as all-zero rows
df = (pd.DataFrame(arr, index=df.index[mask], columns=mlb.classes_)
               .reindex(df.index, fill_value=0))

print (df)
   0  31965  78464
0  0      0      0
1  1      1      0
2  1      0      1
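
The code above assumes set_id holds strings such as '[31965,0]'. If the column instead holds real Python lists (the question's output does not make this clear), .str.split would generally return NaN for those entries as well, so the string cleaning step is not needed. A minimal sketch under that assumption, using made-up sample data and a hypothetical variable name labels, replaces every non-list entry with an empty list before binarizing:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical sample: set_id holds real Python lists plus one missing value
df = pd.DataFrame({
    "row_num": [988681, 988680, 988679],
    "set_id": [np.nan, [31965, 0], [0, 78464]],
})

# Replace anything that is not a list (here: NaN) with an empty list,
# so every row is iterable and missing rows simply get no labels
labels = df["set_id"].apply(lambda v: v if isinstance(v, list) else [])

mlb = MultiLabelBinarizer()
out = pd.DataFrame(mlb.fit_transform(labels),
                   index=df.index,
                   columns=mlb.classes_)
print(out)
```

With this variant the class labels stay integers rather than strings, and no reindex step is needed because no rows were dropped.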

Tags: python, pandas, machine-learning, scikit-learn

Krishh
