
Efficient way of looping through list of dictionaries and appending items into column in dataframe

Tags: python, pandas

Here is a minimal reproducible example:

data = [
    {'1':20},
    {'1':10},
    {'1':40},
    {'1':14},
    {'1':33}
]

What I am trying to do is loop through each dictionary and append each value to a column in a dataframe.

Right now I am doing:

import pandas as pd
lst = []
for item in data:
    lst.append(item['1'])

df = pd.DataFrame({"col1":lst})

which outputs:

    col1
0   20
1   10
2   40
3   14
4   33

Yes, this is what I want. However, I have over 1M dictionaries in the list. Is this the most efficient way?

EDIT: pd.DataFrame(data).rename(columns={'1':'col1'}) works perfectly for the case above. However, what if the data looks like this?

data = [
    {'1':
     {'value':20}},
    {'1':
     {'value':10}},
    {'1':
      {'value':40}},
    {'1':
      {'value':14}},
    {'1':
      {'value':33}}]

Then I would use:

lst = []
for item in data:
    lst.append(item['1']['value'])

df = pd.DataFrame({"col1":lst})

Is there a more efficient way for a list of dictionaries that themselves contain dictionaries?

asked Dec 18 '19 06:12

haneulkim



2 Answers

One idea is to pass data to the DataFrame constructor and then use rename:

df = pd.DataFrame(data).rename(columns={'1':'col1'})
print (df)
   col1
0    20
1    10
2    40
3    14
4    33

If filtering is necessary, use a list comprehension and pass the columns parameter:

df = pd.DataFrame([x['1'] for x in data], columns=['col1'])
print (df)
   col1
0    20
1    10
2    40
3    14
4    33
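
As a supplementary sketch of what "filtering" could mean here: the input below is hypothetical (the question's data always has the key '1'), and it assumes that dictionaries lacking the key should simply be skipped.

```python
import pandas as pd

# Hypothetical input: the second dict lacks the key '1' (not the case in the question)
data = [{'1': 20}, {'2': 99}, {'1': 40}]

# Keep only the dicts that contain the key, then extract its value
df = pd.DataFrame([x['1'] for x in data if '1' in x], columns=['col1'])
print(df)
```

Without the `if '1' in x` condition, the comprehension would raise a KeyError on the second dictionary.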

EDIT: For the new data, use:

data = [
    {'1':
     {'value':20}},
    {'1':
     {'value':10}},
    {'1':
      {'value':40}},
    {'1':
      {'value':14}},
    {'1':
      {'value':33}}]

df = pd.DataFrame([x['1']['value'] for x in data], columns=['col1'])
print (df)
   col1
0    20
1    10
2    40
3    14
4    33

Or:

df = pd.DataFrame([x['1'] for x in data]).rename(columns={'value':'col1'})
print (df)
   col1
0    20
1    10
2    40
3    14
4    33
answered Oct 18 '22 22:10

jezrael
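
A supplementary sketch for the nested case, not taken from the answer above: pandas also provides `pd.json_normalize`, which flattens nested dictionaries into dotted column names. This assumes pandas 1.0 or later, where the function is available at the top level. It is shown as a convenience for deeper nesting; it should not be assumed to be faster than the list comprehension.

```python
import pandas as pd

data = [
    {'1': {'value': 20}},
    {'1': {'value': 10}},
    {'1': {'value': 40}},
    {'1': {'value': 14}},
    {'1': {'value': 33}},
]

# json_normalize flattens nested dicts; nested keys are joined with "." by default,
# so the single resulting column is named '1.value'
flat = pd.json_normalize(data)
df = flat.rename(columns={'1.value': 'col1'})
print(df)
```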


@jezrael's answer is correct, but if you specifically want the col prefix, you can use add_prefix:

df = pd.DataFrame(data)
print(df.add_prefix('col'))

Output:

   col1
0    20
1    10
2    40
3    14
4    33
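
One difference worth keeping in mind, sketched here with hypothetical data that has a second key '2' (not present in the question): `add_prefix` renames every column, while `rename` touches only the columns listed in the mapping.

```python
import pandas as pd

# Hypothetical rows with a second key '2', which is not in the question's data
data = [{'1': 20, '2': 5}, {'1': 10, '2': 7}]

# add_prefix applies the prefix to all column labels
prefixed = pd.DataFrame(data).add_prefix('col')
print(list(prefixed.columns))  # ['col1', 'col2']

# rename changes only the labels named in the mapping
renamed = pd.DataFrame(data).rename(columns={'1': 'col1'})
print(list(renamed.columns))  # ['col1', '2']
```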
answered Oct 18 '22 23:10

U12-Forward
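
Since the question is about efficiency, one way to compare the approaches on your own machine is a small `timeit` run. This is only a sketch: the function names, the synthetic data, and its size are assumptions made for illustration, and the timings will vary with the pandas version and hardware, so no result is claimed here.

```python
import timeit

import pandas as pd

# Synthetic data shaped like the question's list; the size is an arbitrary choice
data = [{'1': i} for i in range(50_000)]


def with_loop():
    # The question's original approach: append in an explicit loop
    lst = []
    for item in data:
        lst.append(item['1'])
    return pd.DataFrame({'col1': lst})


def with_comprehension():
    # List comprehension plus the columns parameter
    return pd.DataFrame([x['1'] for x in data], columns=['col1'])


def with_constructor():
    # Pass the list of dicts directly, then rename the column
    return pd.DataFrame(data).rename(columns={'1': 'col1'})


# Check that all three approaches build the same frame before timing them
assert with_loop().equals(with_comprehension())
assert with_loop().equals(with_constructor())

runs = 3
for fn in (with_loop, with_comprehension, with_constructor):
    seconds = timeit.timeit(fn, number=runs)
    print(f"{fn.__name__}: {seconds / runs:.4f} s per call")
```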