 

How to assign a python object (such as a dictionary) to pandas column

I would like to set a cell in a pandas DataFrame equal to a dictionary for every row in which another column of that same row equals 1. I am using df.loc to filter the rows. Since my dictionary has two keys, it only works if the df.loc filter also selects exactly two rows. Otherwise, I get ValueError: Must have equal len keys and value when setting with an iterable.

I don't see why these two things are related.

import pandas as pd
df = pd.DataFrame(data=[[1,2], [0,3], [3,4]], columns=['Col1', 'Col2'])
#df = pd.DataFrame(data=[[1,2], [1,3], [3,4]], columns=['Col1', 'Col2'])

df.loc[df["Col1"]==1, "Col2"] = {'key1': 'A',
                                 'key2': 'B'}

print(df)

If I uncomment the third line of code, I would like to produce the results below.

   Col1                            Col2
0     1  {u'key2': u'B', u'key1': u'A'}
1     1  {u'key2': u'B', u'key1': u'A'}
2     3                               4

Before this gets marked as a duplicate, I have seen other questions regarding this pandas error, but none seem to solve this issue specifically.

asked Nov 10 '17 21:11 by user2242044

1 Answer

IIUC, wrap the dictionary in a list with one entry per selected row, and assign that list through loc:

df

   Col1  Col2
0     1     2
1     1     3
2     3     4

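m = df['Col1'].eq(1)
df['Col2'] = df['Col2'].astype(object)  # object dtype lets the column hold dicts
df.loc[m, 'Col2'] = [{'a' : 1, 'b' : 2}] * m.sum()  # one dict per selected row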

df

   Col1              Col2
0     1  {'a': 1, 'b': 2}
1     1  {'a': 1, 'b': 2}
2     3                 4
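
A rough sketch of why the lengths matter, assuming a setup like the question's: pandas' own is_list_like helper reports a dict as list-like, and len() of a dict is its number of keys, which is the length the error message compares with the number of selected rows. The exact internal handling of a bare dict differs between pandas versions, so this sketch only checks those lengths and then performs the list-wrapped assignment (on an object column, so it can hold dicts without an upcast).

```python
import pandas as pd
from pandas.api.types import is_list_like

value = {'key1': 'A', 'key2': 'B'}

# The question's active frame: only row 0 has Col1 == 1
df = pd.DataFrame(data=[[1, 2], [0, 3], [3, 4]], columns=['Col1', 'Col2'])
m = df['Col1'].eq(1)
n_rows = int(m.sum())

# pandas treats a dict as list-like; its length is its number of keys
print(is_list_like(value))  # True
print(len(value), n_rows)   # 2 1  -> the lengths do not match

# One list entry per selected row makes the lengths agree
wrapped = [value] * n_rows

# Object column so each cell can hold an arbitrary Python object
df['Col2'] = df['Col2'].astype(object)
df.loc[m, 'Col2'] = wrapped
print(df)
```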

This works for any number of matching rows. Just keep in mind that [] * n replicates the reference, so the same dict object is assigned to every selected cell; mutating it through one cell changes all of them.
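
To see the shared-reference effect concretely, here is a small sketch that applies the same technique to the frame with two matching rows; the mutation through cell 0 shows up in cell 1 because both cells hold one and the same dict.

```python
import pandas as pd

df = pd.DataFrame(data=[[1, 2], [1, 3], [3, 4]], columns=['Col1', 'Col2'])
df['Col2'] = df['Col2'].astype(object)  # object column so it can hold dicts
m = df['Col1'].eq(1)

# [d] * n puts the *same* dict object into every selected cell
df.loc[m, 'Col2'] = [{'a': 1, 'b': 2}] * int(m.sum())

print(df.at[0, 'Col2'] is df.at[1, 'Col2'])  # True

# Mutating the dict through one cell is visible through the other
df.at[0, 'Col2']['a'] = 99
print(df.at[1, 'Col2'])  # {'a': 99, 'b': 2}
```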

If you want to avoid sharing one dict between cells, build the list with a list comprehension that makes a fresh copy per row:

i = {'a' : 1, 'b' : 2}
df.loc[m, 'Col2'] = [i.copy() for _ in range(m.sum())]
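
A quick check, under the same two-matching-rows setup, that the list comprehension gives each cell its own dict, so mutating one cell leaves the other cell and the original dict i untouched:

```python
import pandas as pd

df = pd.DataFrame(data=[[1, 2], [1, 3], [3, 4]], columns=['Col1', 'Col2'])
df['Col2'] = df['Col2'].astype(object)  # object column so it can hold dicts
m = df['Col1'].eq(1)

i = {'a': 1, 'b': 2}
# One fresh shallow copy per selected row
df.loc[m, 'Col2'] = [i.copy() for _ in range(int(m.sum()))]

print(df.at[0, 'Col2'] is df.at[1, 'Col2'])  # False

df.at[0, 'Col2']['a'] = 99
print(df.at[1, 'Col2'])  # {'a': 1, 'b': 2}
print(i)                 # {'a': 1, 'b': 2}
```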

If you have a nested dictionary, dict.copy only performs a shallow copy (nested objects are still shared), so use the copy module's deepcopy function instead:

from copy import deepcopy
df.loc[m, 'Col2'] = [deepcopy(i) for _ in range(m.sum())]
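
The difference between the two kinds of copy only shows up with nested values. A small pure-Python sketch (the nested key 'inner' is just an illustration, not part of the original data):

```python
from copy import deepcopy

i = {'a': 1, 'inner': {'x': 0}}

shallow = i.copy()
deep = deepcopy(i)

# A shallow copy is a new outer dict, but nested objects are still shared
print(shallow['inner'] is i['inner'])  # True
# deepcopy recursively copies nested containers as well
print(deep['inner'] is i['inner'])     # False

i['inner']['x'] = 42
print(shallow['inner']['x'])  # 42
print(deep['inner']['x'])     # 0
```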

answered Oct 26 '22 07:10 by cs95