I have a list of Python dicts, each with the same keys:
import numpy as np

dict_keys = ['k1', 'k2', 'k3', 'k4', 'k5', 'k6']  # more like 30 keys in practice
data = []
for i in range(20):  # more like 3000 in practice
    data.append({k: np.random.randint(100) for k in dict_keys})
and would like to use it to create a corresponding Pandas dataframe with a subset of the keys. My current approach is to take each dict from the list one at a time and append it to the dataframe using
df = pd.DataFrame(columns=['k1', 'k2', 'k5', 'k6'])
for d in data:
    df = df.append({k: d[k] for k in list(df.columns)}, ignore_index=True)
    # In practice, there are some calculations on some of the values here
but this is very slow (the actual list, and the dicts it contains, are both quite large).
Is there a better, faster (and more idiomatic) method for iterating through a list of dictionaries and adding them as rows to a Pandas dataframe?
When you create a DataFrame from a list of dictionaries, the matching keys become the columns and the corresponding values become the rows. Wherever a dictionary lacks a key that appears as a column, NaN is inserted into the resulting DataFrame. Because the column names are taken from the dictionary keys, there is no need to specify them explicitly. A single dictionary can also be converted with the pd.DataFrame.from_dict() class method.
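A minimal sketch of that behaviour (the keys `a` and `b` here are just illustrative):

```python
import pandas as pd

# Keys become columns; a row missing a key gets NaN in that column.
rows = [
    {'a': 1, 'b': 2},
    {'a': 3},  # no 'b' here
]
df = pd.DataFrame(rows)
print(df.columns.tolist())      # ['a', 'b']
print(df['b'].isna().tolist())  # [False, True]
```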
Simply pass data to DataFrame's __init__, or to DataFrame.from_records (either would work). You might also want to set an index, e.g. DataFrame.from_records(data, index='k1').
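As a sketch of the above, using a small synthetic data list standing in for the real one (the `rng` seed and sizes are just for reproducibility):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
dict_keys = ['k1', 'k2', 'k3', 'k4', 'k5', 'k6']
data = [{k: rng.integers(100) for k in dict_keys} for _ in range(20)]

# Build the frame in one call; `columns=` keeps only the wanted subset.
df = pd.DataFrame.from_records(data, columns=['k1', 'k2', 'k5', 'k6'])

# Or pull one of the keys out as the index at construction time:
df_indexed = pd.DataFrame.from_records(data, index='k1')
```

This replaces the entire append loop with a single constructor call, which pandas can vectorize internally instead of reallocating the frame on every iteration.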
If you also need to perform some calculations, it's usually easier and more convenient to do them on the DataFrame after creating it. Leverage pandas!
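For example, the per-row calculations from inside the loop can usually be expressed as vectorized operations on whole columns after construction (the `total` and `k1_pct` derived columns here are just illustrative):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
data = [{'k1': rng.integers(100), 'k2': rng.integers(100)} for _ in range(5)]
df = pd.DataFrame(data)

# One vectorized expression per derived column, instead of per-row work:
df['total'] = df['k1'] + df['k2']
df['k1_pct'] = df['k1'] / df['total']
```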