I have the dataframe below:
df[['row_num','set_id']].head()
row_num set_id
988681 [31672, 0]
988680 [31965, 0]
988679 [0, 78464]
I'm trying to use MultiLabelBinarizer, but it fails with the error "'float' object is not iterable":
from sklearn.preprocessing import MultiLabelBinarizer
mlb = MultiLabelBinarizer()
mlb.fit_transform(df['set_id'].str.split(','))
TypeError: 'float' object is not iterable
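The failure can be reproduced without scikit-learn. The snippet below is only a minimal sketch on a made-up three-row Series (the name `s` and its values are assumptions, not the real data); it shows that `.str.split` leaves a missing entry as NaN, which is a float and cannot be iterated.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: one missing value among string-encoded lists.
s = pd.Series([np.nan, "[31965,0]", "[0,78464]"])

split = s.str.split(",")

# .str.split leaves the missing entry as NaN, and NaN is a float.
first_is_float = isinstance(split.iloc[0], float)
print(first_is_float)

# Iterating over that float raises the same kind of TypeError.
try:
    iter(split.iloc[0])
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print(error_message)
```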
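I think the problem is missing values: NaN is a float, and .str.split leaves it as NaN, so MultiLabelBinarizer receives a float where it expects an iterable of labels. You can binarize only the non-missing rows and add the missing ones back afterwards: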
print (df)
row_num set_id
0 988681 NaN
1 988680 [31965,0]
2 988679 [0,78464]
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer
mlb = MultiLabelBinarizer()
# boolean mask that is True for non-missing values
mask = df['set_id'].notnull()
# binarize only the non-missing rows (the mask already excludes NaN, so dropna() is not needed)
arr = mlb.fit_transform(df.loc[mask, 'set_id'].str.strip('[]').str.split(','))
# build a DataFrame and reindex to add the missing rows back, filled with 0
df = (pd.DataFrame(arr, index=df.index[mask], columns=mlb.classes_)
.reindex(df.index, fill_value=0))
print(df)
0 31965 78464
0 0 0 0
1 1 1 0
2 1 0 1
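An alternative, sketched here on assumed sample data, is to turn every missing value into an empty list so that MultiLabelBinarizer receives an iterable for each row and no reindexing is needed. The helper `to_labels` and the variable `out` are hypothetical names introduced for this illustration; stripping whitespace also handles values written as "[31965, 0]" with a space after the comma.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical sample mirroring the data above (an assumption, not the real frame).
df = pd.DataFrame({
    "row_num": [988681, 988680, 988679],
    "set_id": [np.nan, "[31965, 0]", "[0,78464]"],
})

def to_labels(value):
    """Turn '[31965, 0]' into ['31965', '0']; anything that is not a string becomes []."""
    if not isinstance(value, str):
        return []
    parts = (part.strip() for part in value.strip("[]").split(","))
    return [part for part in parts if part]

labels = df["set_id"].apply(to_labels)

mlb = MultiLabelBinarizer()
out = pd.DataFrame(mlb.fit_transform(labels), index=df.index, columns=mlb.classes_)
print(out)
```

Rows with a missing value come out as all zeros, the same result the mask-and-reindex approach gives.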