I have a column, 'col2', that has a list of strings. The current code I have is too slow, there's about 2000 unique strings (the letters in the example below), and 4000 rows. Ending up as 2000 columns and 4000 rows.
In [268]: df.head()
Out[268]:
col1 col2
0 6 A,B
1 15 C,G,A
2 25 B
Is there a fast way to make this in a get dummies format? Where each string has it's own column and in each string's column there is a 0 or 1 if it that row has that string in col2.
In [268]: def get_list(df):
d = []
for row in df.col2:
row_list = row.split(',')
for string in row_list:
if string not in d:
d.append(string)
return d
df_list = get_list(df)
def make_cols(df, lst):
for string in lst:
df[string] = 0
return df
df = make_cols(df, df_list)
for idx in range(0, len(df['col2'])):
row_list = df['col2'].iloc[idx].split(',')
for string in row_list:
df[string].iloc[idx]+= 1
Out[113]:
col1 col2 A B C G
0 6 A,B 1 1 0 0
1 15 C,G,A 1 0 1 1
2 25 B 0 1 0 0
This is my current code for it but it's too slow.
Thanks you any help!
drop_first. The drop_first parameter specifies whether or not you want to drop the first category of the categorical variable you're encoding. By default, this is set to drop_first = False . This will cause get_dummies to create one dummy variable for every level of the input categorical variable.
One-hot Encoder is a popular feature encoding strategy that performs similar to pd. get_dummies() with added advantages. It encodes a nominal or categorical feature by assigning one binary column per category per categorical feature. Scikit-learn comes with the implementation of the one-hot encoder.
get_dummies() is used for data manipulation. It converts categorical data into dummy or indicator variables.
We can create dummy variables in python using get_dummies() method. Parameters: data= input data i.e. it includes pandas data frame. list .
You can use:
>>> df['col2'].str.get_dummies(sep=',')
A B C G
0 1 1 0 0
1 1 0 1 1
2 0 1 0 0
To join the Dataframes:
>>> pd.concat([df, df['col2'].str.get_dummies(sep=',')], axis=1)
col1 col2 A B C G
0 6 A,B 1 1 0 0
1 15 C,G,A 1 0 1 1
2 25 B 0 1 0 0
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With