Convert elements of list in pandas series using a dict

I have the following Pandas dataframe:

1    ["Apple", "Banana"]
2    ["Kiwi"]
3    None
4    ["Apple"]
5    ["Banana", "Kiwi"]

and the following dict:

{1: ["Apple", "Banana"],
2: ["Kiwi"]}

I would now like to map all the entries in the lists in my dataframe using the dictionary. The result should be the following:

1    [1]
2    [2]
3    None
4    [1]
5    [1, 2]

How can this be done most efficiently?

Method 1: using unnesting, a user-defined helper function that is not part of pandas and is not defined in this answer

d={z :  x for x , y in d.items() for z in y }
s=unnesting(s.to_frame().dropna(),[0])[0]\
   .map(d).groupby(level=0).apply(set).reindex(s.index)
Out[260]: 
0       {1}
1       {2}
2       NaN
3       {1}
4    {1, 2}
Name: 0, dtype: object
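
The `unnesting` helper above is not defined in this answer. As a rough sketch of the same idea using only built-in pandas calls, and assuming pandas 0.25 or later (where `Series.explode` is available), the steps could look like the following. Note that `explode` is swapped in for `unnesting` here, so this is not the answer's own method, and the names `inverse` and `mapped` are made up for illustration.

```python
import pandas as pd

s = pd.Series([["Apple", "Banana"], ["Kiwi"], None, ["Apple"], ["Banana", "Kiwi"]])
d = {1: ["Apple", "Banana"], 2: ["Kiwi"]}

# Invert the dict so each fruit points at its key:
# {'Apple': 1, 'Banana': 1, 'Kiwi': 2}
inverse = {fruit: key for key, fruits in d.items() for fruit in fruits}

mapped = (
    s.dropna()                            # skip the missing row
     .explode()                           # one row per list element, label repeated
     .map(inverse)                        # fruit -> key
     .groupby(level=0)                    # regroup by the original row label
     .apply(lambda g: sorted(set(g)))     # de-duplicate within each row
     .reindex(s.index)                    # bring the missing row back as NaN
)
print(mapped)
```

Sorting the set gives a list with a stable order, which is closer to the output the question asks for than a bare `set`.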

Method 2: loop over the series

# d is the inverted dict built in Method 1 (fruit -> key)
[set(d.get(y) for y in x) if x is not None else None for x in s]
# assign the list back to s if you want to keep the result

Out[265]: [{1}, {2}, None, {1}, {1, 2}]

Data input

import pandas as pd

s = pd.Series([["Apple", "Banana"], ["Kiwi"], None, ["Apple"], ["Banana", "Kiwi"]])
d = {1: ["Apple", "Banana"],
     2: ["Kiwi"]}

One way is to first invert the dictionary, so that each value in its lists becomes a key that points back to its original key. You can then use a list comprehension to map the values in each of the lists in the dataframe.

It is necessary to take a set of the mapped values in each iteration before returning the result, in order to avoid repeated values. Also note that fillna('') turns the missing row into an empty string, which maps to an empty list, and or None then converts that empty list to None. On this data that has the same effect as if x is not None else None, but it also returns None when a list is genuinely empty. For a more detailed explanation of this you may check this post:

df = pd.DataFrame({'col1':[["Apple", "Banana"], ["Kiwi"], None, ["Apple"], ["Banana", "Kiwi"]]})
d = {1: ["Apple", "Banana"], 2: ["Kiwi"]}

d = {i:k for k, v in d.items() for i in v}
# {'Apple': 1, 'Banana': 1, 'Kiwi': 2}
out = [list(set(d[j] for j in i)) or None for i in df.col1.fillna('')]
# [[1], [2], None, [1], [1, 2]]
pd.DataFrame([out]).T

   0
0     [1]
1     [2]
2    None
3     [1]
4  [1, 2]
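
To make the `or None` remark concrete: the two spellings agree on the data in the question but differ for a genuinely empty list. The following is a small sketch in plain Python; the names `lookup`, `rows`, `with_or` and `with_check` are made up for illustration, and `row or []` stands in for the `fillna('')` step.

```python
lookup = {"Apple": 1, "Banana": 1, "Kiwi": 2}
rows = [["Apple", "Banana"], [], None]

# `or None`: anything that maps to an empty list becomes None.
with_or = [sorted({lookup[f] for f in (row or [])}) or None for row in rows]

# Explicit check: only a missing row becomes None; an empty list stays [].
with_check = [
    sorted({lookup[f] for f in row}) if row is not None else None
    for row in rows
]

print(with_or)     # [[1], None, None]
print(with_check)  # [[1], [], None]
```

Which behaviour is preferable depends on whether an empty list and a missing value should be treated as the same thing in your data.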

Option 1

Rebuild the dictionary

m = {v: k for k, V in d.items() for v in V}

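Rebuild the series

import numpy as np
import pandas as pd

x = s.dropna()                                   # drop the missing rows
v = [*map(m.get, np.concatenate(x.to_numpy()))]  # flatten the lists and map each item
i = x.index.repeat(x.str.len())                  # repeat each label once per list item
y = pd.Series(v, i)
y.groupby(level=0).unique().reindex(s.index)     # unique keys per row; missing rows become NaN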

0       [1]
1       [2]
2       NaN
3       [1]
4    [1, 2]
dtype: object
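
To see what each intermediate in the rebuild step holds, here is the same code with the setup included and the intermediates printed. This is only an illustration of the steps above; nothing is introduced beyond the imports and the `print` calls.

```python
import numpy as np
import pandas as pd

s = pd.Series([["Apple", "Banana"], ["Kiwi"], None, ["Apple"], ["Banana", "Kiwi"]])
d = {1: ["Apple", "Banana"], 2: ["Kiwi"]}
m = {v: k for k, V in d.items() for v in V}

x = s.dropna()                                   # rows 0, 1, 3, 4
v = [*map(m.get, np.concatenate(x.to_numpy()))]  # mapped value for every list item
i = x.index.repeat(x.str.len())                  # row label for every list item

print(v)           # [1, 1, 2, 1, 1, 2]
print(i.tolist())  # [0, 0, 1, 3, 4, 4]

# Pair each mapped value with its row label, then collapse per row.
y = pd.Series(v, i)
result = y.groupby(level=0).unique().reindex(s.index)
```

Because `v` and `i` have the same length, `pd.Series(v, i)` lines up every mapped value with the row it came from, which is what lets the `groupby(level=0)` step reassemble the rows.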

If you insist on None rather than NaN

y.groupby(level=0).unique().reindex(s.index).mask(pd.isna, None)

0       [1]
1       [2]
2      None
3       [1]
4    [1, 2]
dtype: object

Setup

s = pd.Series([
    ['Apple', 'Banana'],
    ['Kiwi'],
    None,
    ['Apple'],
    ['Banana', 'Kiwi']
])

d = {1: ['Apple', 'Banana'], 2: ['Kiwi']}
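
Since the question asks which approach is most efficient, one way to find out on your own data is to time the candidates. The sketch below is a hypothetical harness, not a benchmark result: `big`, `with_loop` and `with_explode` are names invented here, the input is just the five example rows repeated, and `Series.explode` assumes pandas 0.25 or later. Timings will vary with data size and pandas version.

```python
import timeit

import pandas as pd

s = pd.Series([["Apple", "Banana"], ["Kiwi"], None, ["Apple"], ["Banana", "Kiwi"]])
d = {1: ["Apple", "Banana"], 2: ["Kiwi"]}
m = {v: k for k, V in d.items() for v in V}

# Hypothetical larger input: the five example rows repeated many times.
big = pd.concat([s] * 500, ignore_index=True)

def with_loop(series):
    # List comprehension over the raw Python objects.
    return [sorted({m[v] for v in x}) if isinstance(x, list) else None
            for x in series]

def with_explode(series):
    # Flatten, map, and regroup with pandas operations.
    flat = series.dropna().explode().map(m)
    return flat.groupby(level=0).unique().reindex(series.index)

for fn in (with_loop, with_explode):
    seconds = timeit.timeit(lambda: fn(big), number=3)
    print(f"{fn.__name__}: {seconds:.3f} s for 3 runs")
```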

Tags: python, pandas, dagrun

3 Answers: BENY, yatu, piRSquared
