Here is an MRE:
data = [
{'1':20},
{'1':10},
{'1':40},
{'1':14},
{'1':33}
]
What I am trying to do is loop through each dictionary and append each value to a column in a DataFrame.
Right now I am doing:
import pandas as pd
lst = []
for item in data:
    lst.append(item['1'])
df = pd.DataFrame({"col1":lst})
outputting:
col1
0 20
1 10
2 40
3 14
4 33
Yes, this is what I want; however, I have over 1M dictionaries in the list. Is this the most efficient way?
EDIT:
pd.DataFrame(data).rename(columns={'1':'col1'})
works perfectly for the above case; however, what if the data looks like this?
data = [
{'1':
{'value':20}},
{'1':
{'value':10}},
{'1':
{'value':40}},
{'1':
{'value':14}},
{'1':
{'value':33}}]
So I would use:
lst = []
for item in data:
    lst.append(item['1']['value'])
df = pd.DataFrame({"col1":lst})
Is there a more efficient way for a list of dictionaries that contain dictionaries?
Vectorization is generally the first and best choice. You can convert the DataFrame to a NumPy array or to a dictionary to speed up iteration. Iterating over the key-value pairs of a dictionary comes out as the fastest approach, with a speedup of around 280x for 20 million records.
When we create a DataFrame from a list of dictionaries, the keys become the columns and each dictionary's values become a row of the DataFrame. If a dictionary lacks a key that appears in another dictionary, NaN is inserted at that position in the resulting DataFrame.
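To illustrate the NaN behaviour, here is a minimal sketch. The list `records` and its keys `a`, `b`, `c` are made up for the example and are not part of the question's data.

```python
import pandas as pd

# Hypothetical records whose keys only partly overlap
records = [{"a": 1, "b": 2}, {"a": 3}, {"b": 4, "c": 5}]

# Each distinct key becomes a column; each dictionary becomes one row
df = pd.DataFrame(records)

# Positions where a dictionary had no such key are filled with NaN
print(df)
print(df.isna())
```

Note that a column containing NaN is stored as float, so integer values in it are shown as floats.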
You can simply iterate over the range of the list's length. In the outer loop, we use the range and len functions to produce the indices to iterate over, and we use each index to get the corresponding dictionary. In the inner loop, we use a key variable to iterate through the current dictionary.
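The nested loop described above could look roughly like this, using the data from the question. The names `values` and `current` are chosen here for illustration only.

```python
data = [{'1': 20}, {'1': 10}, {'1': 40}, {'1': 14}, {'1': 33}]

values = []
# Outer loop: walk over the indices of the list
for i in range(len(data)):
    current = data[i]  # the dictionary at position i
    # Inner loop: walk over the keys of the current dictionary
    for key in current:
        values.append(current[key])

print(values)
```

This is the most explicit form; for a single known key, a list comprehension such as `[x['1'] for x in data]` collects the same values more compactly.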
Values from a list of dictionaries or a pandas Series can also be appended to an already existing pandas DataFrame. In older pandas versions the DataFrame.append() method was sufficient for this purpose; it was deprecated in pandas 1.4 and removed in pandas 2.0, where pd.concat is the replacement.
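Since DataFrame.append is not available in pandas 2.0 and later, the same kind of appending can be sketched with pd.concat. The variables `new_rows` and `new_series` are hypothetical inputs invented for this example.

```python
import pandas as pd

# An existing DataFrame
df = pd.DataFrame({"col1": [20, 10]})

# Hypothetical new data: a list of dictionaries and a Series
new_rows = [{"col1": 40}, {"col1": 14}]
new_series = pd.Series({"col1": 33})

# Turn both into DataFrames and concatenate them in one call.
# The Series is transposed so that its index label becomes a column.
df = pd.concat(
    [df, pd.DataFrame(new_rows), new_series.to_frame().T],
    ignore_index=True,
)
print(df)
```

Concatenating once, rather than row by row in a loop, avoids copying the DataFrame on every iteration.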
One idea is to pass data to the DataFrame constructor and then use rename:
df = pd.DataFrame(data).rename(columns={'1':'col1'})
print (df)
col1
0 20
1 10
2 40
3 14
4 33
If filtering is necessary, use a list comprehension and pass the columns parameter:
df = pd.DataFrame([x['1'] for x in data], columns=['col1'])
print (df)
col1
0 20
1 10
2 40
3 14
4 33
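To check which variant is fastest on your own machine, a rough timing sketch with timeit might look like this. The function names and the size of 100,000 generated records are assumptions made for the example; actual timings depend on the pandas version and hardware, so no results are claimed here.

```python
import timeit
import pandas as pd

# Generated stand-in for the real list of dictionaries (size is an assumption)
data = [{'1': i} for i in range(100_000)]

def with_loop():
    # The explicit loop from the question
    lst = []
    for item in data:
        lst.append(item['1'])
    return pd.DataFrame({"col1": lst})

def with_constructor():
    # Pass the list of dictionaries straight to the constructor
    return pd.DataFrame(data).rename(columns={'1': 'col1'})

def with_comprehension():
    # Extract the values first, then build the DataFrame
    return pd.DataFrame([x['1'] for x in data], columns=['col1'])

for func in (with_loop, with_constructor, with_comprehension):
    seconds = timeit.timeit(func, number=3)
    print(f"{func.__name__}: {seconds / 3:.4f} s per call")
```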
EDIT: For the new data, use:
data = [
{'1':
{'value':20}},
{'1':
{'value':10}},
{'1':
{'value':40}},
{'1':
{'value':14}},
{'1':
{'value':33}}]
df = pd.DataFrame([x['1']['value'] for x in data], columns=['col1'])
print (df)
col1
0 20
1 10
2 40
3 14
4 33
Or:
df = pd.DataFrame([x['1'] for x in data]).rename(columns={'value':'col1'})
print (df)
col1
0 20
1 10
2 40
3 14
4 33
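Another option for nested dictionaries, not mentioned in the answer above, is pd.json_normalize, which flattens nested keys into dotted column names. This is only a sketch of an alternative; it has not been benchmarked here and may well be slower than the list comprehension on large inputs.

```python
import pandas as pd

data = [
    {'1': {'value': 20}},
    {'1': {'value': 10}},
    {'1': {'value': 40}},
    {'1': {'value': 14}},
    {'1': {'value': 33}},
]

# Nested keys are joined with "." so the column is named "1.value"
df = pd.json_normalize(data).rename(columns={'1.value': 'col1'})
print(df)
```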
@jezrael's answer is correct, but to be more specific with the col prefix, you can use add_prefix:
df = pd.DataFrame(data)
print(df.add_prefix('col'))
Output:
col1
0 20
1 10
2 40
3 14
4 33