I have a pandas DataFrame and want to add the data from a dictionary uniformly to all rows of my DataFrame. Currently I loop over the dictionary and assign each value to a new column. Is there a more efficient way to do this?
# coding: utf-8
import pandas as pd
df = pd.DataFrame({'age': [1, 2, 3], 'name': ['Foo', 'Bar', 'Barbie']})
d = {"blah": 42, "blah-blah": "bar"}
# Add every key of d as a new column holding the same value in all rows
for k, v in d.items():
    df[k] = v
print(df)
A pandas DataFrame can be converted into a Python dictionary with the DataFrame method to_dict(). The orient parameter selects the layout of the output. In the default 'dict' orientation, each column label maps to a dictionary of {row label: value} pairs.
The keys of the dictionary are the DataFrame's column labels, and the dictionary values are the data values in the corresponding DataFrame columns. The values can be contained in a tuple, list, one-dimensional NumPy array, Pandas Series object, or one of several other data types.
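A minimal sketch of both directions, assuming a small frame with hypothetical column names 'a' to 'd': building a DataFrame from a dict whose values are a tuple, a list, a 1-D NumPy array and a Series, then converting part of it back with to_dict() under a few orient settings.

```python
import numpy as np
import pandas as pd

# Building a DataFrame from a dict: each key becomes a column label and each
# value (tuple, list, 1-D NumPy array, Series, ...) supplies that column's data.
df = pd.DataFrame({
    'a': (1, 2, 3),
    'b': [4.0, 5.0, 6.0],
    'c': np.array(['x', 'y', 'z']),
    'd': pd.Series([True, False, True]),
})

# Going back with to_dict(); orient controls the shape of the result.
sub = df[['a', 'c']]
by_column = sub.to_dict()                   # {'a': {0: 1, 1: 2, 2: 3}, 'c': {0: 'x', 1: 'y', 2: 'z'}}
as_lists = sub.to_dict(orient='list')       # {'a': [1, 2, 3], 'c': ['x', 'y', 'z']}
as_records = sub.to_dict(orient='records')  # one dict per row: [{'a': 1, 'c': 'x'}, ...]
print(by_column)
```

'records' is often the handiest form when each row needs to be handled as its own mapping.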
For certain small, targeted purposes, a dict may be faster. And if that is all you need, then use a dict, for sure! But if you need or want the power and convenience of a DataFrame, then a dict is no substitute. It is meaningless to compare speed if the data structure does not first satisfy your needs.
Use assign if all keys are strings (none of them numeric):
df = df.assign(**d)
print(df)
   age    name  blah blah-blah
0    1     Foo    42       bar
1    2     Bar    42       bar
2    3  Barbie    42       bar
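The string-key requirement comes from Python itself: ** unpacking into keyword arguments only accepts string keys, although those strings need not be valid identifiers. A small sketch showing both cases (the names ok and error are just illustrative):

```python
import pandas as pd

df = pd.DataFrame({'age': [1, 2, 3], 'name': ['Foo', 'Bar', 'Barbie']})

# String keys work, even "blah-blah", which is not a valid Python identifier.
ok = df.assign(**{"blah": 42, "blah-blah": "bar"})

# A non-string key is rejected while Python binds the keyword arguments.
error = None
try:
    df.assign(**{8: 42})
except TypeError as exc:
    error = exc

print(list(ok.columns))
print(type(error).__name__)
```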
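If some keys may be numeric, join works nicely (starting again from the original df, because join raises an error when column names overlap):
d = {8: 42, "blah-blah": "bar"}
df = df.join(pd.DataFrame(d, index=df.index))
print(df)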
   age    name   8 blah-blah
0    1     Foo  42       bar
1    2     Bar  42       bar
2    3  Barbie  42       bar
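A self-contained sketch of the join approach: passing index=df.index makes pandas broadcast each scalar to every row, and join then aligns on that index. It also shows the overlap error that appears if the same columns are joined twice without lsuffix/rsuffix (the names extra, joined and overlap_error are illustrative).

```python
import pandas as pd

df = pd.DataFrame({'age': [1, 2, 3], 'name': ['Foo', 'Bar', 'Barbie']})
d = {8: 42, "blah-blah": "bar"}

# Scalar values are broadcast to every label of the supplied index.
extra = pd.DataFrame(d, index=df.index)

# join aligns on the index, so every row receives the same values.
joined = df.join(extra)

# Joining the same columns a second time fails: the labels overlap
# and no lsuffix/rsuffix was given.
overlap_error = None
try:
    joined.join(extra)
except ValueError as exc:
    overlap_error = exc

print(joined)
```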
In my opinion, no. Looping over the key/value pairs of a dict is already efficient, and assigning columns with df[k] = v is more readable. In the future you will want to remember why you did something, and you won't care much about saving a few microseconds. The only thing missing is a comment explaining why you do it.
d = {"blah":42,"blah-blah":"bar"}
# Add columns to compensate for missing values in document XXX
for k,v in d.items():
    df[k] = v
Timings (the measurement error is too large to tell them apart; I'd say they are equivalent in speed):
Your solution:
809 µs ± 70 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
df.assign():
893 µs ± 24.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
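A rough way to reproduce such a comparison outside IPython's %timeit, using the standard timeit module. This is a sketch: with_loop and with_assign are hypothetical helper names, both copy the base frame so they do the same amount of work, and the printed numbers will vary with machine and pandas version.

```python
import timeit

import pandas as pd

base = pd.DataFrame({'age': [1, 2, 3], 'name': ['Foo', 'Bar', 'Barbie']})
d = {"blah": 42, "blah-blah": "bar"}

def with_loop():
    df = base.copy()
    for k, v in d.items():
        df[k] = v
    return df

def with_assign():
    return base.copy().assign(**d)

# Both approaches must produce the same frame before timing them makes sense.
pd.testing.assert_frame_equal(with_loop(), with_assign())

for fn in (with_loop, with_assign):
    # Best of 3 repeats of 200 calls each, reported as time per call.
    best = min(timeit.repeat(fn, number=200, repeat=3)) / 200
    print(f"{fn.__name__}: {best * 1e6:.0f} µs per call")
```

Taking the minimum of the repeats rather than the mean reduces the influence of background noise, which matters here because the two timings are so close.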