I've seen this before and simply can't remember the function.
Say I have a column "Speed" and each row has 1 of these values:
'Slow', 'Normal', 'Fast'
How do I create a new dataframe that keeps all my rows but replaces the "Speed" column with 3 columns, "Slow", "Normal" and "Fast", where each row has a 1 in the column matching its old "Speed" value and a 0 in the other two? So if I had:
print(df['Speed'].iloc[0])
> 'Normal'
I would expect this:
print(df['Normal'].iloc[0])
> 1
print(df['Slow'].iloc[0])
> 0
You can do this easily with pd.get_dummies (docs):
In [37]: df = pd.DataFrame(['Slow', 'Normal', 'Fast', 'Slow'], columns=['Speed'])

In [38]: df
Out[38]:
    Speed
0    Slow
1  Normal
2    Fast
3    Slow

In [39]: pd.get_dummies(df['Speed'])
Out[39]:
   Fast  Normal  Slow
0     0       0     1
1     0       1     0
2     1       0     0
3     0       0     1
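To get the full dataframe the question asks for (every other column kept, "Speed" replaced by the three indicator columns), one option is to drop "Speed" and concatenate the dummies back on. This is only a sketch: the "Distance" column is a made-up stand-in for whatever other columns your dataframe has, and dtype=int is passed because newer pandas releases may return True/False instead of 0/1 by default.

```python
import pandas as pd

# Hypothetical example frame: 'Speed' comes from the question,
# 'Distance' is invented here to stand in for the other columns.
df = pd.DataFrame({
    'Distance': [10, 25, 40, 5],
    'Speed': ['Slow', 'Normal', 'Fast', 'Slow'],
})

# One indicator column per distinct value of 'Speed'.
# dtype=int asks for 0/1 integers rather than booleans.
dummies = pd.get_dummies(df['Speed'], dtype=int)

# Drop the original column and attach the indicator columns side by side.
result = pd.concat([df.drop(columns='Speed'), dummies], axis=1)

print(result)
```

The original df is left untouched, so you can still compare result against it afterwards.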
Here is one solution:
df['Normal'] = df.Speed.apply(lambda x: 1 if x == "Normal" else 0)
df['Slow'] = df.Speed.apply(lambda x: 1 if x == "Slow" else 0)
df['Fast'] = df.Speed.apply(lambda x: 1 if x == "Fast" else 0)
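A possible variation on the same idea, if you would rather avoid apply() with a lambda: compare the whole column against each label at once and cast the boolean result to int. This is a sketch that assumes the three labels from the question are the only values you care about; it should produce the same 0/1 columns as the lines above.

```python
import pandas as pd

df = pd.DataFrame(['Slow', 'Normal', 'Fast', 'Slow'], columns=['Speed'])

# Compare the whole column against each label in one vectorized step,
# then convert the resulting True/False values to 1/0.
for label in ['Slow', 'Normal', 'Fast']:
    df[label] = (df['Speed'] == label).astype(int)

print(df)
```

Unlike get_dummies, this adds the columns to df in place and keeps the original "Speed" column.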