exploding a pandas dataframe column

Question

I have a pandas DataFrame that looks something like this:

import pandas as pd

text = ["abcd", "efgh", "ijkl", "mnop", "qrst", "uvwx", "yz"]

labels = ["label_1, label_2", 
          "label_1, label_3, label_2", 
          "label_2, label_4", 
          "label_1, label_2, label_5", 
          "label_2, label_3", 
          "label_3, label_5, label_1, label_2", 
          "label_1, label_3"]

df = pd.DataFrame(dict(text=text, labels=labels))
df



   text                              labels
0  abcd                    label_1, label_2
1  efgh           label_1, label_3, label_2
2  ijkl                    label_2, label_4
3  mnop           label_1, label_2, label_5
4  qrst                    label_2, label_3
5  uvwx  label_3, label_5, label_1, label_2
6    yz                    label_1, label_3

I would like to format the dataframe into something like this:

text  label_1  label_2  label_3  label_4  label_5
abcd      1.0      1.0      0.0      0.0      0.0
efgh      1.0      1.0      1.0      0.0      0.0
ijkl      0.0      1.0      0.0      1.0      0.0
mnop      1.0      1.0      0.0      0.0      1.0
qrst      0.0      1.0      1.0      0.0      0.0
uvwx      1.0      1.0      1.0      0.0      1.0
yz        1.0      0.0      1.0      0.0      0.0

How can I accomplish this? I know I can split the strings in the labels column into lists with something like df.labels.str.split(","), but I am not sure how to proceed from there.

Basically, I'd like to turn each keyword in the labels column into its own column and fill in 1 wherever it appears, as shown in the expected output.

jpp · Accepted Answer

You can use pd.Series.str.get_dummies, after stripping the spaces so that ',' splits the labels cleanly, and join the result with the text column:

dummies = df['labels'].str.replace(' ', '').str.get_dummies(',')
res = df['text'].to_frame().join(dummies)

print(res)

   text  label_1  label_2  label_3  label_4  label_5
0  abcd        1        1        0        0        0
1  efgh        1        1        1        0        0
2  ijkl        0        1        0        1        0
3  mnop        1        1        0        0        1
4  qrst        0        1        1        0        0
5  uvwx        1        1        1        0        1
6    yz        1        0        1        0        0
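
If you would rather continue from the str.split step mentioned in the question, one possible route is to explode the lists and build the indicators from there. This is only a sketch: it assumes pandas 0.25 or later (for Series.explode), uses a shortened three-row version of the question's data, and casts to float to mimic the 1.0/0.0 values in the desired output.

```python
import pandas as pd

# Shortened sample of the question's data (first three rows only).
df = pd.DataFrame({
    "text": ["abcd", "efgh", "ijkl"],
    "labels": ["label_1, label_2",
               "label_1, label_3, label_2",
               "label_2, label_4"],
})

# Split each string into a list, then give every label its own row.
# The original index is repeated, so each label stays tied to its source row.
exploded = df["labels"].str.split(", ").explode()

# One indicator column per label, collapsed back to one row per original index.
indicators = pd.get_dummies(exploded).groupby(level=0).max()

# Attach the indicators to the text column; cast to float for 1.0/0.0 values.
res = df[["text"]].join(indicators.astype(float))
print(res)
```

For this particular input, str.get_dummies is shorter; the explode route may be handier when the column already holds lists rather than delimited strings.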

Tags:

python

string

text

pandas

dataframe

ultron
