Pandas, DataFrame: Splitting one column into multiple columns

I have the following DataFrame. I am wondering whether it is possible to break the data column into multiple columns. E.g., from this:

ID       Date       data
6       21/05/2016  A: 7, B: 8, C: 5, D: 5, A: 8
6       21/01/2014  B: 5, C: 5, D: 7
6       02/04/2013  A: 4, D:7
7       05/06/2014  C: 25
7       12/08/2014  D: 20
8       18/04/2012  A: 2, B: 3, C: 3, E: 5, B: 4
8       21/03/2012  F: 6, B: 4, F: 5, D: 6, B: 4  

into this:

ID       Date       data                            A   B   C   D   E   F
6       21/05/2016  A: 7, B: 8, C: 5, D: 5, A: 8    15  8   5   5   0   0
6       21/01/2014  B: 5, C: 5, D: 7                0   5   5   7   0   0     
6       02/04/2013  B: 4, D: 7, B: 6                0   10  0   7   0   0
7       05/06/2014  C: 25                           0   0   25  0   0   0
7       12/08/2014  D: 20                           0   0   0   20  0   0   
8       18/04/2012  A: 2, B: 3, C: 3, E: 5, B: 4    2   7   3   0   5   0
8       21/03/2012  F: 6, B: 4, F: 5, D: 6, B: 4    0   8   0   6   0   11

I have tried the approaches from "Split strings in tuples into columns, in Pandas" and "pandas: How do I split text in a column into multiple rows?", but they do not work in my case.

EDIT

There is a bit of complexity: the data column can contain duplicate keys. For example, in the first row A is repeated, so its values are summed under the A column (please see the second table).

user1124825 asked Jul 14 '16 20:07


1 Answer

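import pandas as pd

df = pd.DataFrame([
        [6, "a: 1, b: 2"],
        [6, "a: 1, b: 2"],
        [6, "a: 1, b: 2"],
        [6, "a: 1, b: 2"],
    ], columns=['ID', 'dictionary'])

def str2dict(s):
    # Turn "a: 1, b: 2" into {'a': '1', 'b': '2'}.
    # Values stay strings, and a repeated key keeps only its last value.
    d = {}
    for pair in s.strip().split(','):
        k, v = [part.strip() for part in pair.split(':')]
        d[k] = v
    return d

# Expand each parsed dict into one column per key.
df.dictionary.apply(str2dict).apply(pd.Series)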

Or, to keep the original columns next to the new ones:

pd.concat([df, df.dictionary.apply(str2dict).apply(pd.Series)], axis=1)
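
The parsing step above stores one string per key, so it does not by itself produce the sums described in the question's EDIT. A minimal sketch of one way to adapt it follows, assuming every value is an integer and every pair contains exactly one colon. The names sum_pairs, wide and result are illustrative and not part of the original answer; the sample rows are copied from the question.

```python
import pandas as pd

# A few rows copied from the question's input table.
df = pd.DataFrame(
    [
        [6, "21/05/2016", "A: 7, B: 8, C: 5, D: 5, A: 8"],
        [6, "21/01/2014", "B: 5, C: 5, D: 7"],
        [7, "05/06/2014", "C: 25"],
        [7, "12/08/2014", "D: 20"],
        [8, "18/04/2012", "A: 2, B: 3, C: 3, E: 5, B: 4"],
        [8, "21/03/2012", "F: 6, B: 4, F: 5, D: 6, B: 4"],
    ],
    columns=["ID", "Date", "data"],
)


def sum_pairs(s):
    """Parse 'K: v, K: v, ...' into a dict, adding up repeated keys."""
    totals = {}
    for pair in s.split(","):
        key, value = (part.strip() for part in pair.split(":"))
        totals[key] = totals.get(key, 0) + int(value)
    return totals


# One column per key; keys absent from a row become 0 instead of NaN.
wide = (
    pd.DataFrame(df["data"].map(sum_pairs).tolist(), index=df.index)
    .fillna(0)
    .astype(int)
    .sort_index(axis=1)
)

# Attach the new columns to the original frame.
result = pd.concat([df, wide], axis=1)
print(result)
```

Building the frame from a list of dicts avoids calling pd.Series once per row, which tends to be slow on large frames; the final sort_index(axis=1) only puts the new columns in alphabetical order.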

piRSquared answered Oct 21 '22 08:10