
Pandas: Convert lists within a single column to multiple columns

I have a dataframe that includes columns with multiple attributes separated by commas:

import pandas as pd
df = pd.DataFrame({'id': [1, 2, 3], 'labels': ["a,b,c", "c,a", "d,a,b"]})

   id   labels
0   1   a,b,c
1   2   c,a
2   3   d,a,b

(I know this isn't an ideal situation, but the data originates from an external source.) I want to turn the multi-attribute columns into multiple columns, one for each label, so that I can treat them as categorical variables. Desired output:

    id  a       b       c       d   
0    1  True    True    True    False   
1    2  True    False   True    False   
2    3  True    True    False   True

I can get the set of all possible attributes ([a,b,c,d]) fairly easily, but cannot figure out a way to determine whether a given row has a particular attribute without row-by-row iteration for each attribute. Is there a better way to do this?

Silenced Temporarily asked May 16 '16 20:05


1 Answer

You can use str.get_dummies to build one indicator column per label, cast the resulting 1/0 values to boolean with astype, and finally use concat to put the id column back in front:

print(df['labels'].str.get_dummies(sep=',').astype(bool))
      a      b      c      d
0  True   True   True  False
1  True  False   True  False
2  True   True  False   True

print(pd.concat([df.id, df['labels'].str.get_dummies(sep=',').astype(bool)], axis=1))

   id     a      b      c      d
0   1  True   True   True  False
1   2  True  False   True  False
2   3  True   True  False   True
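
For reference, here is the same approach as one self-contained Python 3 script. It is only a sketch built on the sample data from the question; the variable names flags and result are mine and purely illustrative.

```python
import pandas as pd

# Sample data from the question: one comma-separated string of labels per row.
df = pd.DataFrame({'id': [1, 2, 3], 'labels': ["a,b,c", "c,a", "d,a,b"]})

# Split each string on ',' and build one 0/1 indicator column per distinct
# label, then cast the indicators to bool.
flags = df['labels'].str.get_dummies(sep=',').astype(bool)

# Put the id column back in front of the indicator columns.
result = pd.concat([df['id'], flags], axis=1)
print(result)
```

Note that str.get_dummies splits on the exact separator, so labels written with spaces after the commas (for example "a, b") would presumably need the whitespace stripped first.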
jezrael answered Oct 21 '22 11:10