Convert elements of list in pandas series using a dict

Tags:

python

pandas

I have the following Pandas dataframe:

1    ["Apple", "Banana"]
2    ["Kiwi"]
3    None
4    ["Apple"]
5    ["Banana", "Kiwi"]

and the following dict:

{1: ["Apple", "Banana"],
2: ["Kiwi"]}

I would now like to map all the entries in the lists in my dataframe using the dictionary. The result should be the following:

1    [1]
2    [2]
3    None
4    [1]
5    [1, 2]

How can this be done most efficiently?

asked Jun 14 '19 13:06 by dagrun



3 Answers

Method 1: using unnesting, a custom helper function (not part of pandas) that expands list columns into one row per element

d = {z: x for x, y in d.items() for z in y}
s = unnesting(s.to_frame().dropna(), [0])[0]\
    .map(d).groupby(level=0).apply(set).reindex(s.index)
Out[260]: 
0       {1}
1       {2}
2       NaN
3       {1}
4    {1, 2}
Name: 0, dtype: object
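The unnesting helper is not shown in this answer. As a sketch of the same idea using only pandas, Series.explode (available from pandas 0.25) can do the unnesting step. The variable names below are illustrative, and like Method 1 this returns sets:

```python
import pandas as pd

s = pd.Series([["Apple", "Banana"], ["Kiwi"], None, ["Apple"], ["Banana", "Kiwi"]])
d = {1: ["Apple", "Banana"], 2: ["Kiwi"]}

# Invert the dict: each fruit becomes a key pointing at its original key
inv = {fruit: key for key, fruits in d.items() for fruit in fruits}

# explode() puts one list element per row and keeps the original index;
# the None row survives as a single missing value, so drop it before mapping
exploded = s.explode().dropna()

# Map each fruit, collect one set per original row, then restore the dropped rows as NaN
result = exploded.map(inv).groupby(level=0).apply(set).reindex(s.index)
print(result)
```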

Method 2: a plain loop (this uses the inverted d built in the first line of Method 1)

[set(d.get(y) for y in x) if x is not None else None for x in s]
# s = [set(d.get(y) for y in x) if x is not None else None for x in s]

Out[265]: [{1}, {2}, None, {1}, {1, 2}]

Data input

import pandas as pd

s = pd.Series([["Apple", "Banana"], ["Kiwi"], None, ["Apple"], ["Banana", "Kiwi"]])
d = {1: ["Apple", "Banana"],
     2: ["Kiwi"]}
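Method 2 as written depends on the inverted d from Method 1. Here is a self-contained sketch of the same loop with the inversion included, and with dict.fromkeys swapped in for set so that each row comes back as a list in first-seen order (closer to the list output asked for in the question). Names such as inv are illustrative:

```python
import pandas as pd

s = pd.Series([["Apple", "Banana"], ["Kiwi"], None, ["Apple"], ["Banana", "Kiwi"]])
d = {1: ["Apple", "Banana"], 2: ["Kiwi"]}

# Invert the dict first: fruit -> original key
inv = {fruit: key for key, fruits in d.items() for fruit in fruits}

# dict.fromkeys drops duplicate keys while keeping first-seen order; None rows pass through
out = [list(dict.fromkeys(inv[f] for f in row)) if row is not None else None for row in s]
result = pd.Series(out, index=s.index)
print(result)
```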
answered Oct 19 '22 16:10 by BENY


One way would be to first invert the dictionary, so that each value in its lists becomes a key mapped to its original key. Then you can use a list comprehension to map the values in each of the lists in the dataframe.

It's necessary to take a set of the mapped values in each iteration in order to avoid repeated values. Also note that or None does the same job here as if x is not None else None: fillna('') turns the missing entry into an empty string, iterating over it produces an empty list, and since an empty list is falsy, or None returns None. The same happens for any list that is empty. For a more detailed explanation on this you may check this post:

import pandas as pd

df = pd.DataFrame({'col1': [["Apple", "Banana"], ["Kiwi"], None, ["Apple"], ["Banana", "Kiwi"]]})
d = {1: ["Apple", "Banana"], 2: ["Kiwi"]}

d = {i: k for k, v in d.items() for i in v}
# {'Apple': 1, 'Banana': 1, 'Kiwi': 2}
out = [list(set(d[j] for j in i)) or None for i in df.col1.fillna('')]
# [[1], [2], None, [1], [1, 2]]
pd.DataFrame([out]).T

   0
0     [1]
1     [2]
2    None
3     [1]
4  [1, 2]
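The None handling in this answer relies on fillna('') plus the or None idiom. A quick check on a small made-up Series, which also includes an empty list to show that it ends up as None too:

```python
import pandas as pd

col = pd.Series([["Apple"], None, []])

# fillna('') replaces only the missing entry; the lists (even the empty one) are not NA
filled = col.fillna('')

# list('') and list([]) are both empty, hence falsy, so `or None` turns them into None
converted = [list(x) or None for x in filled]
print(converted)
```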
answered Oct 19 '22 16:10 by yatu


Option 1

Rebuild the dictionary (invert it so that each fruit maps to its key)

m = {v: k for k, V in d.items() for v in V}

Rebuild the Series

x = s.dropna()
v = [*map(m.get, np.concatenate(x.to_numpy()))]
i = x.index.repeat(x.str.len())
y = pd.Series(v, i)
y.groupby(level=0).unique().reindex(s.index)

0       [1]
1       [2]
2       NaN
3       [1]
4    [1, 2]
dtype: object

If you insist on None rather than NaN

y.groupby(level=0).unique().reindex(s.index).mask(pd.isna, None)

0       [1]
1       [2]
2      None
3       [1]
4    [1, 2]
dtype: object

Setup

import numpy as np
import pandas as pd

s = pd.Series([
    ['Apple', 'Banana'],
    ['Kiwi'],
    None,
    ['Apple'],
    ['Banana', 'Kiwi']
])

d = {1: ['Apple', 'Banana'], 2: ['Kiwi']}
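To see what the rebuild step lines up before the groupby, here is a sketch that prints the intermediate values for the Setup data (the name flat is illustrative). Every fruit is flattened into one array, and the index label of its original row is repeated once per list element so that the two stay aligned:

```python
import numpy as np
import pandas as pd

s = pd.Series([['Apple', 'Banana'], ['Kiwi'], None, ['Apple'], ['Banana', 'Kiwi']])
d = {1: ['Apple', 'Banana'], 2: ['Kiwi']}
m = {v: k for k, V in d.items() for v in V}

x = s.dropna()
# One flat array with every fruit, in row order
flat = np.concatenate(x.to_numpy())
# Each remaining index label repeated once per element of its list
i = x.index.repeat(x.str.len())
# Mapped keys, positionally aligned with both `flat` and `i`
v = [*map(m.get, flat)]

print(list(flat))
print(list(i))
print(v)
```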
answered Oct 19 '22 15:10 by piRSquared