python pandas: split comma-separated column into new columns - one per value

I have a dataframe like this:

import numpy as np
import pandas as pd

data = np.array([["userA", "event2, event3"],
                 ["userB", "event3, event4"],
                 ["userC", "event2"]])

data = pd.DataFrame(data)

        0         1
0   userA   "event2, event3"
1   userB   "event3, event4"
2   userC   "event2"

Now I would like to get a dataframe like this, with one column per distinct value:

       0    event2      event3      event4
0   userA     1           1
1   userB                 1           1
2   userC     1

Can anybody help, please?

funkfux asked Feb 16 '18 08:02


1 Answer

It seems you need Series.str.get_dummies to build one indicator column per value, followed by replace to turn the 0 values into empty strings:

df = data[[0]].join(data[1].str.get_dummies(', ').replace(0, ''))
print (df)
       0 event2 event3 event4
0  userA      1      1       
1  userB             1      1
2  userC      1              
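
For anyone who wants to run this without the setup from the question, here is a minimal self-contained sketch of the same approach. It builds the frame directly with pd.DataFrame instead of going through np.array (a simplification, not what the question does), and the intermediate name dummies is only there for readability.

```python
import pandas as pd

# Same sample data as in the question, built directly as a DataFrame
data = pd.DataFrame([["userA", "event2, event3"],
                     ["userB", "event3, event4"],
                     ["userC", "event2"]])

# One 0/1 indicator column per distinct value.
# The separator has to match the data exactly (comma followed by a space).
dummies = data[1].str.get_dummies(", ")

# Keep the user column, append the indicators, and blank out the zeros
df = data[[0]].join(dummies.replace(0, ""))
print(df)
```

Note that after replace(0, '') the event columns hold a mix of integers and empty strings, so they are object columns. That is fine for display, but it is probably better to keep the zeros if the columns are used in further calculations.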

Detail:

print (data[1].str.get_dummies(', '))
   event2  event3  event4
0       1       1       0
1       0       1       1
2       1       0       0
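
Since the intermediate result above is a plain 0/1 integer frame, it can be used directly for counting or filtering before the zeros are blanked out. The following is only a sketch of that idea; the names events, indicators, counts and rows_with_event3 are made up for illustration.

```python
import pandas as pd

# The comma-separated column from the question, as a standalone Series
events = pd.Series(["event2, event3", "event3, event4", "event2"])

# 0/1 indicator frame, as in the Detail output above
indicators = events.str.get_dummies(", ")

# Number of rows in which each event occurs
counts = indicators.sum()
print(counts)

# Row labels of all rows that contain event3
rows_with_event3 = indicators.index[indicators["event3"] == 1].tolist()
print(rows_with_event3)
```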

jezrael answered Sep 21 '22 15:09