I have a pandas DataFrame and want to add the data from a dictionary uniformly to all rows: each key should become a new column holding the same value in every row. Currently I loop over the dictionary and assign each value to a new column. Is there a more efficient way to do this?
# coding: utf-8
import pandas as pd
df = pd.DataFrame({'age': [1, 2, 3], 'name': ['Foo', 'Bar', 'Barbie']})
d = {"blah": 42, "blah-blah": "bar"}
# Add one new column per dictionary key; the scalar value is broadcast to every row
for k, v in d.items():
    df[k] = v
df
A pandas DataFrame can be converted into a Python dictionary with the DataFrame method to_dict(). The orient parameter controls the shape of the output. With the default orientation, orient='dict', each column label maps to a dictionary that lists that column's values against their row labels.
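A minimal sketch of to_dict() with a few orient values, reusing the question's df (without the added columns); the variable names as_dict, as_list and as_records are just illustrative:

```python
import pandas as pd

df = pd.DataFrame({'age': [1, 2, 3], 'name': ['Foo', 'Bar', 'Barbie']})

# Default orient='dict': {column label -> {row label -> value}}
as_dict = df.to_dict()
# orient='list': {column label -> [values]}
as_list = df.to_dict(orient='list')
# orient='records': one {column label -> value} dict per row
as_records = df.to_dict(orient='records')

print(as_dict)     # {'age': {0: 1, 1: 2, 2: 3}, 'name': {0: 'Foo', 1: 'Bar', 2: 'Barbie'}}
print(as_list)     # {'age': [1, 2, 3], 'name': ['Foo', 'Bar', 'Barbie']}
print(as_records)  # [{'age': 1, 'name': 'Foo'}, {'age': 2, 'name': 'Bar'}, {'age': 3, 'name': 'Barbie'}]
```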
Conversely, a DataFrame can be built from a dictionary: the keys become the DataFrame's column labels, and the values supply the data for the corresponding columns. The values can be a tuple, a list, a one-dimensional NumPy array, a pandas Series, or one of several other data types.
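A small sketch of building a DataFrame from a dict whose values are of different container types; the column names (from_list, from_tuple, etc.) are made up for illustration. All values have length 3, and the Series uses the default 0..2 index, so everything lines up row by row:

```python
import numpy as np
import pandas as pd

data = {
    'from_list': [1, 2, 3],
    'from_tuple': (4, 5, 6),
    'from_array': np.array([7.0, 8.0, 9.0]),
    'from_series': pd.Series(['a', 'b', 'c']),
}
# Each key becomes a column label, each value becomes that column's data
frame = pd.DataFrame(data)

print(list(frame.columns))  # ['from_list', 'from_tuple', 'from_array', 'from_series']
print(frame.shape)          # (3, 4)
```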
For certain small, targeted purposes, a dict may be faster. And if that is all you need, then use a dict, for sure! But if you need/want the power and luxury of a DataFrame, then a dict is no substitute. It is meaningless to compare speed if the data structure does not first satisfy your needs.
Use DataFrame.assign if all keys are strings (keyword argument names cannot be numeric):
df = df.assign(**d)
print(df)
   age    name  blah blah-blah
0    1     Foo    42       bar
1    2     Bar    42       bar
2    3  Barbie    42       bar
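Why the keys must be strings: assign(**d) unpacks the dict into keyword arguments, and Python only accepts string keyword names (they need not be valid identifiers, which is why 'blah-blah' works). A numeric key raises TypeError. Converting the keys with str() is one possible workaround, assuming string column labels are acceptable for your use case:

```python
import pandas as pd

df = pd.DataFrame({'age': [1, 2, 3], 'name': ['Foo', 'Bar', 'Barbie']})

# String keys unpack fine, even 'blah-blah', which is not a valid identifier
ok = df.assign(**{"blah": 42, "blah-blah": "bar"})
print(list(ok.columns))  # ['age', 'name', 'blah', 'blah-blah']

# A numeric key cannot become a keyword argument name
d = {8: 42, "blah-blah": "bar"}
raised = False
try:
    df.assign(**d)
except TypeError as exc:  # raised by Python's argument handling, before assign() runs
    raised = True
    print("TypeError:", exc)

# Workaround (assumes string labels are fine): column 8 becomes column '8'
stringified = df.assign(**{str(k): v for k, v in d.items()})
print(list(stringified.columns))  # ['age', 'name', '8', 'blah-blah']
```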
If some keys may be numeric, use join instead, which works fine:
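df = pd.DataFrame({'age': [1, 2, 3], 'name': ['Foo', 'Bar', 'Barbie']})  # start again from the original frame
d = {8: 42, "blah-blah": "bar"}
df = df.join(pd.DataFrame(d, index=df.index))
print(df)
   age    name   8 blah-blah
0    1     Foo  42       bar
1    2     Bar  42       bar
2    3  Barbie  42       bar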
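The index=df.index argument is what does the broadcasting here: when every value in the dict is a scalar, the DataFrame constructor repeats each scalar once per index label, and without an index it raises ValueError. A minimal sketch that also checks the join gives the same frame as the question's loop (the names extra, joined and looped are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'age': [1, 2, 3], 'name': ['Foo', 'Bar', 'Barbie']})
d = {8: 42, "blah-blah": "bar"}

# All-scalar dict without an index: pandas cannot tell how many rows to build
raised_value_error = False
try:
    pd.DataFrame(d)
except ValueError as exc:
    raised_value_error = True
    print("ValueError:", exc)

# With index=df.index each scalar is repeated once per row label of df
extra = pd.DataFrame(d, index=df.index)
joined = df.join(extra)  # default how='left', matched on the shared index

# The question's loop, applied to a copy so df stays unchanged
looped = df.copy()
for k, v in d.items():
    looped[k] = v

print(joined.equals(looped))  # True
```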
In my opinion, the answer is no. Looping over the dict's key/value pairs is already efficient, and assigning columns with df[k] = v is more readable. In the future you will want to remember why you did something; you won't care much about saving a few microseconds. The only thing missing is a comment explaining why you do it.
d = {"blah": 42, "blah-blah": "bar"}
# Add columns to compensate for missing values in document XXX
for k, v in d.items():
    df[k] = v
Timings (the run-to-run spread is too large to separate them, so I'd say they are equivalent in speed):
Your solution:
809 µs ± 70 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
df.assign():
893 µs ± 24.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
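To reproduce a comparison like this outside a notebook, the standard-library timeit module can time both approaches. This is a sketch: absolute numbers depend on your machine and pandas version, and the helper names with_loop and with_assign are hypothetical. The loop version copies the frame first so every run starts from the same two columns; assign() already returns a new DataFrame, so both measurements include building a new frame.

```python
import timeit

import pandas as pd

base = pd.DataFrame({'age': [1, 2, 3], 'name': ['Foo', 'Bar', 'Barbie']})
d = {"blah": 42, "blah-blah": "bar"}


def with_loop():
    df = base.copy()  # fresh copy so repeated runs don't see earlier columns
    for k, v in d.items():
        df[k] = v  # one column assignment per key
    return df


def with_assign():
    return base.assign(**d)  # returns a new frame; base is untouched


# Both approaches should build the same frame before we compare speed
assert with_loop().equals(with_assign())

n = 100
for func in (with_loop, with_assign):
    # Best of 5 repeats, reported per single call
    best = min(timeit.repeat(func, number=n, repeat=5)) / n
    print(f"{func.__name__}: {best * 1e6:.0f} µs per call")
```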