I would like to set a cell in a pandas DataFrame equal to a dictionary for rows in which another column in that same row equals 1. I am using df.loc to filter the rows. Since my dictionary has two keys, it only works if the filtering done by df.loc also selects two rows. If it doesn't select two rows, I get ValueError: Must have equal len keys and value when setting with an iterable.
I don't see why these two things are related.
import pandas as pd
df = pd.DataFrame(data=[[1,2], [0,3], [3,4]], columns=['Col1', 'Col2'])
#df = pd.DataFrame(data=[[1,2], [1,3], [3,4]], columns=['Col1', 'Col2'])
df.loc[df["Col1"]==1, "Col2"] = {'key1': 'A',
'key2': 'B'}
print(df)
If I uncomment the third line of code, I would like to produce the below results.
Col1 Col2
0 1 {u'key2': u'B', u'key1': u'A'}
1 1 {u'key2': u'B', u'key1': u'A'}
2 3 4
Before this gets marked as a duplicate, I have seen other questions regarding this pandas error, but none seem to solve this issue specifically.
IIUC, wrap the dictionary in a list and pass it to loc:
df
Col1 Col2
0 1 2
1 1 3
2 3 4
m = df['Col1'].eq(1)
df.loc[m, 'Col2'] = [{'a' : 1, 'b' : 2}] * m.sum()
df
Col1 Col2
0 1 {'a': 1, 'b': 2}
1 1 {'a': 1, 'b': 2}
2 3 4
This should apply to any result equally well. Just keep in mind that [] * n replicates the references, so the same dict object is assigned to multiple cells!
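To see that sharing in isolation, here is a minimal plain-Python sketch with no DataFrame involved. The names template, n, and cells are made up for illustration; n stands in for m.sum().

```python
# Hypothetical stand-ins: `template` is the dict being assigned,
# `n` plays the role of m.sum() (the number of matching rows).
template = {'a': 1, 'b': 2}
n = 2

# List replication copies the reference, not the dict itself.
cells = [template] * n

# Both list slots point at the very same object...
same_object = cells[0] is cells[1]

# ...so a change made through one slot is visible through the other.
cells[0]['a'] = 99
seen_through_other_slot = cells[1]['a']

print(same_object)               # True
print(seen_through_other_slot)   # 99
```

If the cells are never mutated in place, the shared reference is harmless; it only matters once one of the dicts is modified.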
There's an alternative if you want to avoid duplicating references - you can build a list with a list comprehension.
i = {'a' : 1, 'b' : 2}
df.loc[m, 'Col2'] = [i.copy() for _ in range(m.sum())]
If you have a nested dictionary, copy only performs a shallow copy, so use the copy module's deepcopy function instead:
from copy import deepcopy
df.loc[m, 'Col2'] = [deepcopy(i) for _ in range(m.sum())]
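The difference between the two kinds of copy can be checked without pandas. The sketch below is only an illustration: the nested dictionary and the names n, shallow, and deep are assumptions, with n standing in for m.sum().

```python
from copy import deepcopy

# Hypothetical nested dictionary; `n` stands in for m.sum().
i = {'a': 1, 'nested': {'x': 10}}
n = 2

shallow = [i.copy() for _ in range(n)]    # new outer dicts, shared inner dict
deep = [deepcopy(i) for _ in range(n)]    # fully independent copies

# The outer dicts are distinct objects in both cases.
outer_distinct = shallow[0] is not shallow[1]

# Mutating the inner dict through one shallow copy affects the others
# (and the original), because they all hold the same inner object.
shallow[0]['nested']['x'] = 99
shallow_leaks = shallow[1]['nested']['x']

# The deep copies each own their inner dict, so they stay isolated.
deep[0]['nested']['x'] = -1
deep_isolated = deep[1]['nested']['x']

print(outer_distinct)   # True
print(shallow_leaks)    # 99
print(deep_isolated)    # 10
```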