Efficient way of looping through list of dictionaries and appending items into column in dataframe

Tags:

python

pandas

Here is an MRE:

data = [
    {'1':20},
    {'1':10},
    {'1':40},
    {'1':14},
    {'1':33}
]

What I am trying to do is loop through each dictionary and append each value to a column in a dataframe.

Right now I am doing:

import pandas as pd
lst = []
for item in data:
    lst.append(item['1'])

df = pd.DataFrame({"col1":lst})

This produces a dataframe with a single column, col1, containing those values.

That is what I want, but I have over 1M dictionaries in the list. Is this the most efficient way?

EDIT: pd.DataFrame(data).rename(columns={'1':'col1'}) works perfectly for the case above. However, what if the data looks like this?

data = [
    {'1':
     {'value':20}},
    {'1':
     {'value':10}},
    {'1':
      {'value':40}},
    {'1':
      {'value':14}},
    {'1':
      {'value':33}}]

Then I would use:

lst = []
for item in data:
    lst.append(item['1']['value'])

df = pd.DataFrame({"col1":lst})

Is there a more efficient way for a list of dictionaries that themselves contain dictionaries?

asked Dec 18 '19 06:12

haneulkim

2 Answers

One idea is to pass data to the DataFrame constructor and then use rename:

df = pd.DataFrame(data).rename(columns={'1':'col1'})
print (df)
   col1
0    20
1    10
2    40
3    14
4    33

If filtering is necessary, use a list comprehension and pass the columns parameter:

df = pd.DataFrame([x['1'] for x in data], columns=['col1'])
print (df)
   col1
0    20
1    10
2    40
3    14
4    33
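
As a minimal sketch of what filtering inside the comprehension could look like: the sample below is hypothetical (it is not the question's data) and assumes some records lack the '1' key, so they are skipped instead of raising KeyError.

```python
import pandas as pd

# Hypothetical sample: the second record has no '1' key.
sample = [{'1': 20}, {'2': 99}, {'1': 40}]

# Keep only records that contain the key, extracting the value in the same pass.
filtered = pd.DataFrame([x['1'] for x in sample if '1' in x], columns=['col1'])
print(filtered)
```

The condition after `if` can be any per-record test, so the same pattern covers value-based filters as well.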

EDIT: For the new data, use:

data = [
    {'1':
     {'value':20}},
    {'1':
     {'value':10}},
    {'1':
      {'value':40}},
    {'1':
      {'value':14}},
    {'1':
      {'value':33}}]

df = pd.DataFrame([x['1']['value'] for x in data], columns=['col1'])
print (df)
   col1
0    20
1    10
2    40
3    14
4    33

Or:

df = pd.DataFrame([x['1'] for x in data]).rename(columns={'value':'col1'})
print (df)
   col1
0    20
1    10
2    40
3    14
4    33
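
To check which variant is faster on your own machine, a rough timing sketch might look like the following. The synthetic records, the 100,000-row size, and the function names with_loop and with_comprehension are assumptions made for illustration; actual timings depend on the pandas version and hardware, so none are quoted here.

```python
import timeit
import pandas as pd

# Synthetic nested records shaped like the question's second example
# (100,000 rows is an arbitrary size chosen for a quick run).
records = [{'1': {'value': i}} for i in range(100_000)]

def with_loop(items):
    # The question's approach: append each value in an explicit loop.
    lst = []
    for item in items:
        lst.append(item['1']['value'])
    return pd.DataFrame({'col1': lst})

def with_comprehension(items):
    # The answer's approach: extract the values in a list comprehension.
    return pd.DataFrame([x['1']['value'] for x in items], columns=['col1'])

# Sanity check: both approaches must build the same frame.
assert with_loop(records).equals(with_comprehension(records))

# Time each approach over a few runs and report the average.
for fn in (with_loop, with_comprehension):
    seconds = timeit.timeit(lambda: fn(records), number=5)
    print(f"{fn.__name__}: {seconds / 5:.4f} s per run")
```

Both functions do the same per-record work in Python, so any difference comes mostly from the overhead of repeated append calls.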

151

answered Oct 18 '22 22:10

jezrael

@jezrael's answer is correct, but to be more specific about the col prefix:

df = pd.DataFrame(data)
print(df.add_prefix('col'))

Output (for the flat data from the question):

   col1
0    20
1    10
2    40
3    14
4    33

answered Oct 18 '22 23:10

U12-Forward
