I have the following Pandas dataframe:
1 ["Apple", "Banana"]
2 ["Kiwi"]
3 None
4 ["Apple"]
5 ["Banana", "Kiwi"]
and the following dict:
{1: ["Apple", "Banana"],
2: ["Kiwi"]}
I would now like to map all the entries in the lists in my dataframe using the dictionary. The result should be the following:
1 [1]
2 [2]
3 None
4 [1]
5 [1, 2]
How can this be done most efficiently?
Method 1
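I am using unnesting, a user-defined helper function (it is not part of pandas) that expands a column of lists into one row per element: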
d={z : x for x , y in d.items() for z in y }
s=unnesting(s.to_frame().dropna(),[0])[0]\
.map(d).groupby(level=0).apply(set).reindex(s.index)
Out[260]:
0 {1}
1 {2}
2 NaN
3 {1}
4 {1, 2}
Name: 0, dtype: object
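Since unnesting is not defined in this answer, here is a minimal sketch of the same idea using the built-in Series.explode instead. It assumes pandas 0.25 or later, and the name lookup is only an illustrative choice for the inverted dictionary. It returns sorted lists rather than sets, to match the output asked for in the question.

```python
import pandas as pd

s = pd.Series([["Apple", "Banana"], ["Kiwi"], None, ["Apple"], ["Banana", "Kiwi"]])
d = {1: ["Apple", "Banana"], 2: ["Kiwi"]}

# Invert the dictionary so each fruit points at its key.
lookup = {fruit: key for key, fruits in d.items() for fruit in fruits}

result = (
    s.dropna()                                # skip the missing rows
     .explode()                               # one row per list element
     .map(lookup)                             # fruit name -> key
     .groupby(level=0)                        # regroup by original row
     .apply(lambda g: sorted(set(g)))         # de-duplicate within each row
     .reindex(s.index)                        # restore the missing rows as NaN
)
print(result)
```

Rows that were None in the input come back as NaN after the reindex step.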
Method 2: loop over it
[set(d.get(y) for y in x) if x is not None else None for x in s ]
#s=[set(d.get(y) for y in x) if x is not None else None for x in s ]
Out[265]: [{1}, {2}, None, {1}, {1, 2}]
Data input
s=pd.Series([["Apple", "Banana"],["Kiwi"],None,["Apple"],["Banana", "Kiwi"]])
d={1: ["Apple", "Banana"],
2: ["Kiwi"]}
One way would be to first unnest the dictionary, setting its values as keys and the corresponding keys as values. You can then use a list comprehension to map the values in each of the lists in the dataframe.
It is necessary to take a set before returning the result of the mapping in each iteration, in order to avoid repeated values. Also note that "or None" does the same job here as "if x is not None else None": fillna('') turns the missing entries into empty strings, so the mapped list for those rows is empty, and "or None" returns None whenever the list is empty. For a more detailed explanation on this you may check this post:
df = pd.DataFrame({'col1':[["Apple", "Banana"], ["Kiwi"], None, ["Apple"], ["Banana", "Kiwi"]]})
d = {1: ["Apple", "Banana"], 2: ["Kiwi"]}
d = {i:k for k, v in d.items() for i in v}
# {'Apple': 1, 'Banana': 1, 'Kiwi': 2}
out = [list(set(d[j] for j in i)) or None for i in df.col1.fillna('')]
# [[1], [2], None, [1], [1, 2]]
pd.DataFrame([out]).T
0
0 [1]
1 [2]
2 None
3 [1]
4 [1, 2]
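If the goal is to keep the result next to the original data, a variation on the above is to write it back as a new column. This is only a sketch: the helper map_fruits and the column name mapped are made up for illustration, and silently skipping fruits that are missing from the dictionary is an assumption, not something the question specifies.

```python
import pandas as pd

df = pd.DataFrame({"col1": [["Apple", "Banana"], ["Kiwi"], None, ["Apple"], ["Banana", "Kiwi"]]})
d = {1: ["Apple", "Banana"], 2: ["Kiwi"]}

# Invert the dictionary: {'Apple': 1, 'Banana': 1, 'Kiwi': 2}
lookup = {fruit: key for key, fruits in d.items() for fruit in fruits}

def map_fruits(fruits):
    """Hypothetical helper: map one cell, passing missing cells through as None."""
    if not isinstance(fruits, list):
        return None  # covers both None and NaN
    # Assumption: fruits absent from the dictionary are ignored.
    return sorted({lookup[f] for f in fruits if f in lookup})

df["mapped"] = pd.Series(
    [map_fruits(x) for x in df["col1"]], index=df.index, dtype=object
)
print(df)
```

Using lookup[f] without the membership check would raise a KeyError on an unknown fruit, which may be preferable if unknown values should be treated as errors.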
Rebuild the dictionary
m = {v: k for k, V in d.items() for v in V}
Rebuild the series
import numpy as np  # needed for np.concatenate below
x = s.dropna()
v = [*map(m.get, np.concatenate(x.to_numpy()))]
i = x.index.repeat(x.str.len())
y = pd.Series(v, i)
y.groupby(level=0).unique().reindex(s.index)
0 [1]
1 [2]
2 NaN
3 [1]
4 [1, 2]
dtype: object
If you insist on None rather than NaN
y.groupby(level=0).unique().reindex(s.index).mask(pd.isna, None)
0 [1]
1 [2]
2 None
3 [1]
4 [1, 2]
dtype: object
s = pd.Series([
['Apple', 'Banana'],
['Kiwi'],
None,
['Apple'],
['Banana', 'Kiwi']
])
d = {1: ['Apple', 'Banana'], 2: ['Kiwi']}
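Since the question asks which approach is most efficient, one way to find out for your own data is to time the candidates. The harness below is a rough sketch: the function names with_loop and with_explode are invented here, the 2000x replication is an arbitrary size, Series.explode assumes pandas 0.25 or later, and the numbers it prints will depend on your machine and pandas version.

```python
import timeit
import pandas as pd

base = [["Apple", "Banana"], ["Kiwi"], None, ["Apple"], ["Banana", "Kiwi"]]
s_big = pd.Series(base * 2000)  # arbitrary size, just to get measurable timings
m = {"Apple": 1, "Banana": 1, "Kiwi": 2}  # the inverted dictionary

def with_loop(s):
    # Plain Python list comprehension over the series.
    return [sorted({m[v] for v in x}) if isinstance(x, list) else None for x in s]

def with_explode(s):
    # Vectorised reshaping: explode, map, then regroup by original row.
    return (s.dropna().explode().map(m)
             .groupby(level=0).unique().reindex(s.index))

for fn in (with_loop, with_explode):
    seconds = timeit.timeit(lambda: fn(s_big), number=3)
    print(f"{fn.__name__}: {seconds:.3f}s for 3 runs")
```

Note that the two functions do not return the same type: the loop returns a list of lists, while the explode version returns a Series of NumPy arrays with NaN for the missing rows.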