Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Best way to add dictionary to dataframe

Tags:

python

pandas

I have a Pandas Dataframe and want to add the data from a dictionary uniformly to all rows in my dataframe. Currently I loop over the dictionary and set the value to my new columns. Is there a more efficient way to do this?

notebook

# coding: utf-8    
import pandas as pd

df = pd.DataFrame({'age' : [1, 2, 3],'name' : ['Foo', 'Bar', 'Barbie']}) 
d = {"blah":42,"blah-blah":"bar"}
for k,v in d.items():
    df[k] = v
df
like image 934
Rutger Hofste Avatar asked Apr 13 '18 13:04

Rutger Hofste


People also ask

Can I store dictionary in pandas DataFrame?

A pandas DataFrame can be converted into a Python dictionary using the DataFrame instance method to_dict(). The output can be specified of various orientations using the parameter orient. In dictionary orientation, for each column of the DataFrame the column value is listed against the row label in a dictionary.

Can a DataFrame hold dictionary?

The keys of the dictionary are the DataFrame's column labels, and the dictionary values are the data values in the corresponding DataFrame columns. The values can be contained in a tuple, list, one-dimensional NumPy array, Pandas Series object, or one of several other data types.

Are pandas faster than dictionary?

For certain small, targeted purposes, a dict may be faster. And if that is all you need, then use a dict, for sure! But if you need/want the power and luxury of a DataFrame, then a dict is no substitute. It is meaningless to compare speed if the data structure does not first satisfy your needs.


2 Answers

Use assign if all keys are not numeric:

df = df.assign(**d)
print (df)
   age    name  blah blah-blah
0    1     Foo    42       bar
1    2     Bar    42       bar
2    3  Barbie    42       bar

If possible numeric join working nice:

d = {8:42,"blah-blah":"bar"}
df = df.join(pd.DataFrame(d, index=df.index))
print (df)

   age    name   8 blah-blah
0    1     Foo  42       bar
1    2     Bar  42       bar
2    3  Barbie  42       bar
like image 109
jezrael Avatar answered Oct 17 '22 19:10

jezrael


The answer in my opinion is no. Looping through key,values in a dict is already efficient and assigning columns with df[k] = v is more readable. Remember that in the future you just want to remember why you did something and you won't care much if you spare some microseconds. The only thing missing is a comment why you do it.

d = {"blah":42,"blah-blah":"bar"}

# Add columns to compensate for missing values in document XXX
for k,v in d.items():
    df[k] = v

Timings (but the error is too big... I'd say they are equivalent in speed):

Your solution:

809 µs ± 70 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

df.assign():

893 µs ± 24.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
like image 5
Anton vBR Avatar answered Oct 17 '22 19:10

Anton vBR