
Python - Pandas - DataFrame - Explode single column into multiple boolean columns based on conditions

Good morning chaps,

Is there a Pythonic way to explode a DataFrame column into multiple columns of boolean flags, based on some condition (str.contains in this case)?

Let's say I have this:

Position Letter 
1        a      
2        b      
3        c      
4        b      
5        b

And I'd like to achieve this:

Position Letter is_a     is_b    is_c
1        a      TRUE     FALSE   FALSE
2        b      FALSE    TRUE    FALSE
3        c      FALSE    FALSE   TRUE
4        b      FALSE    TRUE    FALSE
5        b      FALSE    TRUE    FALSE

I can do this by looping over 'abc' and explicitly creating new DataFrame columns, but I'm wondering whether pandas already has a built-in method for it. The number of possible values, and hence the number of new columns, is variable.
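For reference, a minimal sketch of the loop-based approach described above. It assumes the sample data from this question and takes the distinct values from the column itself rather than hard-coding 'abc'; the is_ prefix follows the desired output shown earlier.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({"Position": [1, 2, 3, 4, 5],
                   "Letter": ["a", "b", "c", "b", "b"]})

# One boolean column per distinct value found in the column.
for letter in sorted(df["Letter"].unique()):
    # regex=False treats the value as a literal substring, not a pattern.
    df[f"is_{letter}"] = df["Letter"].str.contains(letter, regex=False)

print(df)
```

This works for a variable number of values, but it makes one pass over the column per value.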

Thanks and regards.

asked Nov 15 '17 12:11 by Trostis



1 Answer

Use Series.str.get_dummies():

In [31]: df.join(df.Letter.str.get_dummies())
Out[31]:
   Position Letter  a  b  c
0         1      a  1  0  0
1         2      b  0  1  0
2         3      c  0  0  1
3         4      b  0  1  0
4         5      b  0  1  0

Or, to get boolean values instead of 0/1:

In [32]: df.join(df.Letter.str.get_dummies().astype(bool))
Out[32]:
   Position Letter      a      b      c
0         1      a   True  False  False
1         2      b  False   True  False
2         3      c  False  False   True
3         4      b  False   True  False
4         5      b  False   True  False
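The outputs above name the new columns a, b, c, whereas the question asks for is_a, is_b, is_c. A minimal sketch of one way to get those names, assuming the same sample data: chain DataFrame.add_prefix onto the dummies before joining. The variable names flags, result and alt are only illustrative.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({"Position": [1, 2, 3, 4, 5],
                   "Letter": ["a", "b", "c", "b", "b"]})

# Build 0/1 indicator columns, cast to bool, then prefix the column names.
flags = df["Letter"].str.get_dummies().astype(bool).add_prefix("is_")
result = df.join(flags)
print(result)

# Alternative: the top-level pd.get_dummies accepts a prefix and dtype directly.
alt = df.join(pd.get_dummies(df["Letter"], prefix="is", dtype=bool))
print(alt)
```

Note that Series.str.get_dummies splits each string on '|' by default, so a cell such as 'a|b' would set both flags. pd.get_dummies treats each cell as a single value.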

answered Nov 01 '22 06:11 by MaxU - stop WAR against UA