Creating new binary columns from single string column in pandas

Tags: python, pandas

I've seen this before and simply can't remember the function.

Say I have a column "Speed" and each row has 1 of these values:

'Slow', 'Normal', 'Fast'

How do I create a new DataFrame that keeps all my rows but replaces the "Speed" column with three columns, "Slow", "Normal" and "Fast", where each row has a 1 in the column matching its old "Speed" value and a 0 in the other two? So if I had:

print(df['Speed'].iloc[0])
> 'Normal'

I would now expect this:

print(df['Normal'].iloc[0])
> 1

print(df['Slow'].iloc[0])
> 0
user1610719 asked Mar 24 '14 22:03



2 Answers

You can do this easily with pd.get_dummies (docs):

In [37]: df = pd.DataFrame(['Slow', 'Normal', 'Fast', 'Slow'], columns=['Speed'])

In [38]: df
Out[38]:
    Speed
0    Slow
1  Normal
2    Fast
3    Slow

In [39]: pd.get_dummies(df['Speed'])
Out[39]:
   Fast  Normal  Slow
0     0       0     1
1     0       1     0
2     1       0     0
3     0       0     1
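
The question also asks for a new DataFrame that keeps everything except the original "Speed" column. A minimal sketch of one way to do that, assuming a hypothetical extra column "Id" (not in the original post) to stand in for the rest of the data: build the indicator columns, drop "Speed", and concatenate. Recent pandas versions return booleans from pd.get_dummies by default, so dtype=int is passed here to get the 0/1 values shown above.

```python
import pandas as pd

# Hypothetical frame: "Speed" from the question plus an illustrative "Id" column
df = pd.DataFrame({
    "Id": [10, 11, 12, 13],
    "Speed": ["Slow", "Normal", "Fast", "Slow"],
})

# dtype=int requests 0/1 integers instead of the boolean default in newer pandas
dummies = pd.get_dummies(df["Speed"], dtype=int)

# Drop the original column and attach the indicator columns (aligned on the index)
result = pd.concat([df.drop(columns="Speed"), dummies], axis=1)
print(result)
```

The original df is left untouched; result holds "Id" followed by the "Fast", "Normal" and "Slow" columns.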
joris answered Sep 22 '22 10:09


Here is one solution:

df['Normal'] = df.Speed.apply(lambda x: 1 if x == "Normal" else 0)
df['Slow'] = df.Speed.apply(lambda x: 1 if x == "Slow" else 0)
df['Fast'] = df.Speed.apply(lambda x: 1 if x == "Fast" else 0)
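
The same idea can probably be written without apply by comparing the whole column at once. This is only a sketch of that variation, not the answerer's code; the list of labels is taken from the question and the sample data is made up for illustration.

```python
import pandas as pd

# Illustrative data using the three values from the question
df = pd.DataFrame({"Speed": ["Slow", "Normal", "Fast", "Slow"]})

# Compare the entire column to each label, then cast the booleans to 0/1
for label in ["Slow", "Normal", "Fast"]:
    df[label] = (df["Speed"] == label).astype(int)

print(df)
```

Unlike pd.get_dummies, this requires knowing the labels up front, but it always creates a column for each listed label even if that value never occurs in the data.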
aha answered Sep 24 '22 10:09