I have the three following dataframes:
df_A = pd.DataFrame( {'id_A': [1, 1, 1, 1, 2, 2, 3, 3], 
                    'Animal_A': ['cat','dog','fish','bird','cat','fish','bird','cat' ]})
df_B = pd.DataFrame( {'id_B': [1, 2, 2, 3, 4, 4, 5], 
                    'Animal_B': ['dog','cat','fish','dog','fish','cat','cat' ]})
df_P = pd.DataFrame( {'id_A': [1, 1, 2, 3], 
                      'id_B': [2, 3, 4, 5]})
df_A
    id_A    Animal_A
0   1       cat
1   1       dog
2   1       fish
3   1       bird
4   2       cat
5   2       fish
6   3       bird
7   3       cat
df_B
    id_B    Animal_B
0   1       dog
1   2       cat
2   2       fish
3   3       dog
4   4       fish
5   4       cat
6   5       cat
df_P
    id_A    id_B
0   1       2
1   1       3
2   2       4
3   3       5
And I would like to get an additional column to df_P that tells the number of Animals shared between id_A and id_B. What I'm doing is:
df_P["n_common"] = np.nan
for i in df_P.index.tolist():
    id_A = df_P["id_A"][i]
    id_B = df_P["id_B"][i]
    df_P.iloc[i,df_P.columns.get_loc('n_common')] = len(set(df_A['Animal_A'][df_A['id_A']==id_A]).intersection(df_B['Animal_B'][df_B['id_B']==id_B]))
The result being:
df_P
    id_A    id_B    n_common
0   1       2       2.0
1   1       3       1.0
2   2       4       2.0
3   3       5       1.0
Is there a faster, more pythonic, way to do this? Is there a way to avoid the for loop?
To find the common rows between two DataFrames with merge(), use the parameter “how” as “inner” since it works like SQL Inner Join and this is what we want to achieve.
To find the common elements between two columns of an R data frame, we can use intersect function.
Not sure if it is faster or more pythonic, but it avoids the for loop :)
import pandas as pd
df_A = pd.DataFrame( {'id_A': [1, 1, 1, 1, 2, 2, 3, 3], 
                      'Animal_A': ['cat','dog','fish','bird','cat','fish','bird','cat' ]})
df_B = pd.DataFrame( {'id_B': [1, 2, 2, 3, 4, 4, 5], 
                      'Animal_B': ['dog','cat','fish','dog','fish','cat','cat' ]})
df_P = pd.DataFrame( {'id_A': [1, 1, 2, 3], 
                      'id_B': [2, 3, 4, 5]})
df = pd.merge(df_A, df_P, on='id_A')
df = pd.merge(df_B, df, on='id_B')
df = df[df['Animal_A'] == df['Animal_B']].groupby(['id_A', 'id_B'])['Animal_A'].count().reset_index()
df.rename({'Animal_A': 'n_common'},inplace=True,axis=1)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With