I have a DataFrame like this
import pandas as pd
df = pd.DataFrame({'A':list('bbcddee'), 'B': list('klmnnoi')})
A B
0 b k
1 b l
2 c m
3 d n
4 d n
5 e o
6 e i
and I would like to create a dictionary from the columns A and B using e.g.
dict(zip(df.A, df.B))
Before doing this, I would like to check whether each value in A maps to only one value in B; if not, an error should be raised. Above, that is not the case: b maps to both k and l, and e maps to both o and i.
One way of approaching it would be:
df[df.groupby('A', sort=False)['B'].transform(lambda x: len(set(x))) > 1]
which returns
A B
0 b k
1 b l
5 e o
6 e i
However, that requires a lambda, which might make it slow. Does anyone see a way to speed it up?
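For concreteness, here is a sketch of what I mean by "an error should be thrown", using the transform approach above (the function name and error message are just illustrative):

```python
import pandas as pd

def build_mapping(df):
    # rows whose 'A' value maps to more than one distinct 'B' value
    conflicts = df[df.groupby('A', sort=False)['B'].transform(lambda x: len(set(x))) > 1]
    if not conflicts.empty:
        raise ValueError(f"ambiguous keys in A: {sorted(conflicts['A'].unique())}")
    return dict(zip(df.A, df.B))
```

With the DataFrame above this raises, reporting b and e; with a DataFrame where the mapping is unique it returns the dict.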
You can use groupby with nunique to count how many unique values in 'B' belong to each unique value in 'A'.
df.groupby('A').B.nunique()
#A
#b 2
#c 1
#d 1
#e 2
#Name: B, dtype: int64
And so you can check if any of them have more than 1 mapping:
df.groupby('A').B.nunique().gt(1).any()
#True
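If you want the error raised automatically before building the dict, one way to wire the check in looks like this (the function name and error message are my own):

```python
import pandas as pd

def strict_mapping(df):
    # keys in 'A' that map to more than one distinct value in 'B'
    ambiguous = df.groupby('A')['B'].nunique().gt(1)
    if ambiguous.any():
        raise ValueError(f"non-unique mapping for: {list(ambiguous[ambiguous].index)}")
    return dict(zip(df.A, df.B))
```

For the sample DataFrame this raises with the offending keys ['b', 'e']; for a DataFrame with a one-to-one mapping it returns the dict directly.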
The above is conceptually no different from what you proposed. However, there is often a major performance gain when you can use a built-in, optimized groupby operation instead of a lambda, which forces a Python-level loop. As the DataFrame grows, the lambda version can become nearly 100x slower, which is a big deal once things start taking seconds to compute.
import perfplot
import pandas as pd
import numpy as np
def gb_lambda(df):
    return df.groupby('A')['B'].apply(lambda x: len(set(x))).gt(1)

def gb_nunique(df):
    return df.groupby('A')['B'].nunique().gt(1)

perfplot.show(
    setup=lambda n: pd.DataFrame({'A': np.random.randint(0, n // 2, n),
                                  'B': np.random.randint(0, n // 2, n)}),
    kernels=[gb_lambda, gb_nunique],
    labels=['groupby with lambda', 'GroupBy.nunique'],
    n_range=[2 ** k for k in range(2, 18)],
    equality_check=np.allclose,
    xlabel='~len(df)',
)