Split a pandas column of dictionaries into multiple columns

Tags: python, pandas

I have the following CSV, with the first row as the header:

id,data
a,"{'1': 0.7778, '3': 0.5882, '2': 0.9524, '4': 0.5556}"  
b,"{'1': 0.7778, '3': 0.5, '2': 0.7059, '4': 0.2222}"  
c,"{'1': 0.8182, '3': 0.2609, '2': 0.5882}"

I need to get to something like this:

id  1       2       3       4
a   0.7778  0.9524  0.5882  0.5556
b   0.7778  0.7059  0.5     0.2222
c   0.8182  0.5882  0.2609  NaN

where the keys of the dictionary are the columns.

How can I do this using pandas?

asked Jul 30 '16 05:07

frmo

1 Answer

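You can do this with Python's ast module. read_csv loads the data column as plain strings, and ast.literal_eval safely parses each string into a dictionary. Building a DataFrame from the resulting list of dictionaries turns every key into a column and fills in NaN wherever a row lacks that key:

import ast
import pandas as pd

df = pd.read_csv('/path/to/your.csv')

# Parse each string in the 'data' column into a dict; each key becomes a column.
dict_df = pd.DataFrame([ast.literal_eval(s) for s in df['data']])
# Recent pandas versions keep the keys in insertion order, so sort the columns.
dict_df = dict_df.sort_index(axis=1)

>>> dict_df
        1       2       3       4
0  0.7778  0.9524  0.5882  0.5556
1  0.7778  0.7059  0.5000  0.2222
2  0.8182  0.5882  0.2609     NaN

# Replace the original 'data' column with the expanded columns.
df = df.drop('data', axis=1)
final_df = pd.concat([df, dict_df], axis=1)

>>> final_df
  id       1       2       3       4
0  a  0.7778  0.9524  0.5882  0.5556
1  b  0.7778  0.7059  0.5000  0.2222
2  c  0.8182  0.5882  0.2609     NaN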
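
A possible variation on the same idea, as a minimal sketch rather than a definitive version: the parsing can be done while the file is being read by passing ast.literal_eval through the converters argument of read_csv. To keep the example self-contained, the CSV from the question is inlined as a string; the names CSV_TEXT, expanded, and result are illustrative and do not come from the original answer.

```python
import ast
import io

import pandas as pd

# The sample CSV from the question, inlined so the sketch runs on its own.
CSV_TEXT = '''id,data
a,"{'1': 0.7778, '3': 0.5882, '2': 0.9524, '4': 0.5556}"
b,"{'1': 0.7778, '3': 0.5, '2': 0.7059, '4': 0.2222}"
c,"{'1': 0.8182, '3': 0.2609, '2': 0.5882}"
'''

# Parse the 'data' strings into dicts at read time.
df = pd.read_csv(io.StringIO(CSV_TEXT), converters={'data': ast.literal_eval})

# Expand the dicts into columns; keys missing from a row become NaN.
expanded = pd.DataFrame(df['data'].tolist(), index=df.index)
# Sort the new columns by key so they read '1', '2', '3', '4'.
expanded = expanded.sort_index(axis=1)

# Swap the original 'data' column for the expanded columns.
result = df.drop(columns='data').join(expanded)
print(result)
```

Passing index=df.index when building expanded keeps the rows aligned with df even if df has been filtered or re-indexed first, which a plain concat on default indexes would not guarantee.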

answered Oct 01 '22 05:10

mechanical_meat
