I have a list of Python dicts, each with the same keys:
import numpy as np

dict_keys = ['k1', 'k2', 'k3', 'k4', 'k5', 'k6']  # more like 30 keys in practice
data = []
for i in range(20):  # more like 3000 in practice
    data.append({k: np.random.randint(100) for k in dict_keys})
and would like to use it to create a corresponding Pandas dataframe with a subset of the keys. My current approach is to take each dict from the list one at a time and append it to the dataframe using
df = pd.DataFrame(columns=['k1', 'k2', 'k5', 'k6'])
for d in data:
    df = df.append({k: d[k] for k in list(df.columns)}, ignore_index=True)
    # In practice, there are some calculations on some of the values here
but this is very slow (the actual list, and the dicts it contains, are both quite large).
Is there a better, faster (and more idiomatic) method for iterating through a list of dictionaries and adding them as rows to a Pandas dataframe?
When you create a DataFrame from a list of dictionaries, the matching keys become the columns and the corresponding values become the rows. Wherever a dictionary lacks a key that appears as a column, NaN is inserted into the resulting DataFrame. Because the column names are taken from the dictionary keys, there is no need to specify them explicitly. A single dictionary can also be converted with the pd.DataFrame.from_dict() class method.
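A minimal sketch of that behaviour (the keys `a` and `b` here are just illustrative):

```python
import pandas as pd

# Keys become columns; a row missing a key gets NaN in that column.
rows = [
    {'a': 1, 'b': 2},
    {'a': 3},  # no 'b' here
]
df = pd.DataFrame(rows)
print(df.columns.tolist())      # ['a', 'b']
print(df['b'].isna().tolist())  # [False, True]
```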
Simply pass data to DataFrame's __init__, or to DataFrame.from_records (either would work). You might also want to set an index, e.g. DataFrame.from_records(data, index='k1').
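As a sketch of the above, using a small synthetic data list standing in for the real one (the `rng` seed and sizes are just for reproducibility):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
dict_keys = ['k1', 'k2', 'k3', 'k4', 'k5', 'k6']
data = [{k: rng.integers(100) for k in dict_keys} for _ in range(20)]

# Build the frame in one call; `columns=` keeps only the wanted subset.
df = pd.DataFrame.from_records(data, columns=['k1', 'k2', 'k5', 'k6'])

# Or pull one of the keys out as the index at construction time:
df_indexed = pd.DataFrame.from_records(data, index='k1')
```

This replaces the entire append loop with a single constructor call, which pandas can vectorize internally instead of reallocating the frame on every iteration.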
If you also need to perform some calculations, it's usually easier and more convenient to do them on the DataFrame after creating it. Leverage pandas!
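For example, the per-row calculations from inside the loop can usually be expressed as vectorized operations on whole columns after construction (the `total` and `k1_pct` derived columns here are just illustrative):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
data = [{'k1': rng.integers(100), 'k2': rng.integers(100)} for _ in range(5)]
df = pd.DataFrame(data)

# One vectorized expression per derived column, instead of per-row work:
df['total'] = df['k1'] + df['k2']
df['k1_pct'] = df['k1'] / df['total']
```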