Pandas, DataFrame: Splitting one column into multiple columns

I have the following DataFrame. I am wondering whether it is possible to break the data column into multiple columns. E.g., from this:

ID       Date       data
6       21/05/2016  A: 7, B: 8, C: 5, D: 5, A: 8
6       21/01/2014  B: 5, C: 5, D: 7
6       02/04/2013  B: 4, D: 7, B: 6
7       05/06/2014  C: 25
7       12/08/2014  D: 20
8       18/04/2012  A: 2, B: 3, C: 3, E: 5, B: 4
8       21/03/2012  F: 6, B: 4, F: 5, D: 6, B: 4

into this:

ID       Date       data                            A   B   C   D   E   F
6       21/05/2016  A: 7, B: 8, C: 5, D: 5, A: 8    15  8   5   5   0   0
6       21/01/2014  B: 5, C: 5, D: 7                0   5   5   7   0   0     
6       02/04/2013  B: 4, D: 7, B: 6                0   10  0   7   0   0
7       05/06/2014  C: 25                           0   0   25  0   0   0
7       12/08/2014  D: 20                           0   0   0   20  0   0   
8       18/04/2012  A: 2, B: 3, C: 3, E: 5, B: 4    2   7   3   0   5   0
8       21/03/2012  F: 6, B: 4, F: 5, D: 6, B: 4    0   8   0   6   0   11

I have tried the approaches from "Split strings in tuples into columns, in Pandas" and "pandas: How do I split text in a column into multiple rows?", but neither works in my case.

EDIT

There is one complication: the data column can contain duplicate keys. For example, in the first row A appears twice, so its values are summed under the A column (please see the second table).

asked Jul 14 '16 20:07

user1124825

1 Answer

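import pandas as pd

# Small example frame; the 'dictionary' column holds "key: value" pairs as text.
df = pd.DataFrame([
        [6, "a: 1, b: 2"],
        [6, "a: 1, b: 2"],
        [6, "a: 1, b: 2"],
        [6, "a: 1, b: 2"],
    ], columns=['ID', 'dictionary'])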

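def str2dict(s):
    # Turn "a: 1, b: 2" into {'a': '1', 'b': '2'}. Values stay strings,
    # and a repeated key keeps only its last value.
    d = {}
    for pair in s.strip().split(','):
        k, v = [part.strip() for part in pair.split(':', 1)]
        d[k] = v
    return d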

df.dictionary.apply(str2dict).apply(pd.Series)

Or, to keep the original columns alongside the new ones:

pd.concat([df, df.dictionary.apply(str2dict).apply(pd.Series)], axis=1)
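Note that `str2dict` as written keeps values as strings and overwrites a repeated key, so it would not produce the summed columns shown in the question's second table. A minimal sketch of one way to adapt it, assuming every value parses as an integer and using a few rows copied from the question (the helper name `str2counts` and the variable names are illustrative, not part of the original answer):

```python
from collections import Counter

import pandas as pd

# A subset of the rows from the question (dates omitted for brevity).
df = pd.DataFrame({
    "ID": [6, 6, 7, 8],
    "data": [
        "A: 7, B: 8, C: 5, D: 5, A: 8",
        "B: 5, C: 5, D: 7",
        "C: 25",
        "F: 6, B: 4, F: 5, D: 6, B: 4",
    ],
})


def str2counts(s):
    """Parse 'A: 7, B: 8, A: 8' into {'A': 15, 'B': 8}, summing repeated keys."""
    totals = Counter()
    for pair in s.split(","):
        key, value = pair.split(":", 1)
        totals[key.strip()] += int(value)  # int() tolerates surrounding spaces
    return dict(totals)


# One dict per row, expanded to columns; keys absent from a row become 0.
expanded = (
    pd.DataFrame(df["data"].map(str2counts).tolist(), index=df.index)
    .fillna(0)
    .astype(int)
    .sort_index(axis=1)
)

result = pd.concat([df, expanded], axis=1)
print(result)
```

Building the frame from a list of dicts avoids calling `pd.Series` once per row, which tends to be slow on large frames. Columns only appear for keys that occur somewhere in the data, so a key that never shows up will not get an all-zero column unless it is added explicitly.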

answered Oct 21 '22 08:10

piRSquared

Tags: python, pandas, dataframe
