 

Converting a list of dicts to a Pandas dataframe

I have a list of Python dicts, each with the same keys,

import numpy as np

dict_keys = ['k1', 'k2', 'k3', 'k4', 'k5', 'k6']  # More like 30 keys in practice
data = []
for i in range(20):  # More like 3000 in practice
    data.append({k: np.random.randint(100) for k in dict_keys})

and would like to use it to create a corresponding Pandas dataframe with a subset of the keys. My current approach is to take each dict from the list one at a time and append it to the dataframe using

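import pandas as pd

df = pd.DataFrame(columns=['k1', 'k2', 'k5', 'k6'])
for d in data:
    # DataFrame.append copies the whole frame on every call
    # (and was removed in pandas 2.0, so this only runs on older versions)
    df = df.append({k: d[k] for k in list(df.columns)}, ignore_index=True)
    # In practice, there are some calculations on some of the values here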

but this is very slow (the actual list, and the dicts it contains, are both quite large).

Is there a better, faster (and more idiomatic) method for iterating through a list of dictionaries and adding them as rows to a Pandas dataframe?

orome asked Apr 26 '14 18:04





1 Answer

Simply pass data to DataFrame's __init__, or to DataFrame.from_records (either would work).
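
As a minimal sketch of both options applied to the question's data: the `columns=` argument of the constructor keeps only the listed keys, and plain column selection does the same after `from_records`. The `wanted` name and the seeded random generator are assumptions made for this illustration, not part of the original post.

```python
import numpy as np
import pandas as pd

# Rebuild sample data shaped like the question's (seeded for repeatability).
dict_keys = ['k1', 'k2', 'k3', 'k4', 'k5', 'k6']
rng = np.random.default_rng(0)
data = [{k: int(rng.integers(100)) for k in dict_keys} for _ in range(20)]

# Hypothetical name for the subset of keys to keep.
wanted = ['k1', 'k2', 'k5', 'k6']

# Option 1: the constructor; columns= selects and orders the keys.
df = pd.DataFrame(data, columns=wanted)

# Option 2: from_records, then select the columns of interest.
df_rec = pd.DataFrame.from_records(data)[wanted]

print(df.head())
```

Either call builds the whole frame in one pass, which avoids re-copying the frame once per row as the loop in the question does.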

You might also want to set an index, e.g. DataFrame.from_records(data, index='k1').
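
A small sketch of the index option, using two made-up rows rather than the question's random data. Passing `index='k1'` moves that key out of the columns and into the row index; `set_index` on an already-built frame is an equivalent route.

```python
import pandas as pd

# Two illustrative rows (made-up values).
data = [
    {'k1': 10, 'k2': 1, 'k5': 5},
    {'k1': 20, 'k2': 2, 'k5': 6},
]

# Use the 'k1' values as the row index while building the frame.
df = pd.DataFrame.from_records(data, index='k1')

# Equivalent: build first, then promote the column to the index.
df_alt = pd.DataFrame(data).set_index('k1')

print(df)
```

Rows can then be looked up by their `k1` value, for example `df.loc[20]`.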

If you need to also perform some calculations, it's usually easier and more convenient to do it on the DataFrame, after creating it. Leverage pandas!
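
For instance, per-row calculations can usually be written as whole-column operations once the frame exists. The two calculations below are hypothetical stand-ins, since the question does not say what its real calculations are.

```python
import pandas as pd

# Two illustrative rows (made-up values).
data = [
    {'k1': 1, 'k2': 2, 'k5': 3, 'k6': 4},
    {'k1': 5, 'k2': 6, 'k5': 7, 'k6': 8},
]
df = pd.DataFrame(data)

# Hypothetical calculations, applied to entire columns at once
# instead of dict by dict inside a Python loop.
df['k1_plus_k2'] = df['k1'] + df['k2']
df['k6_scaled'] = df['k6'] * 0.5

print(df)
```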

shx2 answered Sep 30 '22 16:09
