How to assign a python object (such as a dictionary) to pandas column

Tags: python, dictionary, pandas, dataframe

I would like to set a cell in a pandas DataFrame equal to a dictionary for every row in which another column equals 1. I am using df.loc to filter the rows. Since my dictionary has two keys, the assignment only works if the df.loc filter also matches exactly two rows. If it matches any other number of rows, I get ValueError: Must have equal len keys and value when setting with an iterable.

I don't see why these two things are related.

import pandas as pd
df = pd.DataFrame(data=[[1,2], [0,3], [3,4]], columns=['Col1', 'Col2'])
#df = pd.DataFrame(data=[[1,2], [1,3], [3,4]], columns=['Col1', 'Col2'])

df.loc[df["Col1"]==1, "Col2"] = {'key1': 'A',
                                 'key2': 'B'}

print(df)

If I uncomment the third line of code, I would like to produce the below results.

   Col1                            Col2
0     1  {u'key2': u'B', u'key1': u'A'}
1     1  {u'key2': u'B', u'key1': u'A'}
2     3                               4

Before this gets marked as a duplicate, I have seen other questions regarding this pandas error, but none seem to solve this issue specifically.


asked Nov 10 '17 21:11

user2242044

1 Answer

IIUC, wrap the dictionary in a list, repeat it once per matching row, and assign it through loc:

df

   Col1  Col2
0     1     2
1     1     3
2     3     4

m = df['Col1'].eq(1)
df.loc[m, 'Col2'] = [{'a' : 1, 'b' : 2}] * m.sum()

df

   Col1              Col2
0     1  {'a': 1, 'b': 2}
1     1  {'a': 1, 'b': 2}
2     3                 4

This should work for any number of matching rows. Just keep in mind that [x] * n replicates references, so the same dict object is assigned to every matching cell: mutating it in place through one cell changes all of them.
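To make the shared-reference caveat concrete, here is a minimal pure-Python sketch (no pandas involved; the names cells, same_object, and values_of_a are purely illustrative) showing that list repetition copies references, not dictionaries:

```python
# [x] * n builds a list holding n references to the same object.
cells = [{'a': 1, 'b': 2}] * 3

# All three entries are the very same dict object.
same_object = cells[0] is cells[1] is cells[2]

# Mutating one entry in place is therefore visible through every entry.
cells[0]['a'] = 99
values_of_a = [c['a'] for c in cells]

print(same_object)   # True
print(values_of_a)   # [99, 99, 99]
```

Since the DataFrame cells receive the objects from this list, an in-place change made through one cell would be expected to show up in the others as well.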

There's an alternative if you want to avoid duplicating references: build the list with a list comprehension that copies the dictionary once per matching row.

i = {'a' : 1, 'b' : 2}
df.loc[m, 'Col2'] = [i.copy() for _ in range(m.sum())]

If you have a nested dictionary, copy only performs a shallow copy, so use the copy module's deepcopy function instead:

from copy import deepcopy
df.loc[m, 'Col2'] = [deepcopy(i) for _ in range(m.sum())]
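As a rough illustration of the shallow-versus-deep distinction, the following plain-Python sketch (the nested dictionary and variable names are made up for the example) mutates an inner dictionary and checks which copies are affected:

```python
from copy import deepcopy

# Hypothetical nested dictionary: the value under 'a' is itself a dict.
nested = {'a': {'x': 1}, 'b': 2}

shallow = nested.copy()   # new outer dict, but the inner dict is shared
deep = deepcopy(nested)   # inner dict is copied recursively as well

# Change the inner dict through the original.
nested['a']['x'] = 99

print(shallow['a']['x'])  # 99, the shallow copy shares the inner dict
print(deep['a']['x'])     # 1, the deep copy is fully independent
```

For a flat dictionary like {'a': 1, 'b': 2}, the two approaches behave the same, so i.copy() is enough there.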

answered Oct 26 '22 07:10

cs95
