
Compare multiple column values together using pandas

I know I can do the following when checking only two columns against each other.

df['flag'] = df['a_id'].isin(df['b_id'])

where df is a data frame, and a_id and b_id are two of its columns. It returns True or False for each row depending on whether the a_id value appears anywhere in b_id. But I need to compare multiple columns together.

For example, given the columns a_id, a_region, a_ip, b_id, b_region and b_ip, I want to compare them like below,


a_key = df['a_id'] + df['a_region'] + df['a_ip']
b_key = df['b_id'] + df['b_region'] + df['b_ip']

df['flag'] = a_key.isin(b_key)

Somehow the above code always returns False. In the expected output, the first row's flag should be True because there is a match: a_key becomes 2a10, which matches the last row of b_key (2a10).

asked Apr 28 '19 07:04 by Sakeer



2 Answers

You were headed in the right direction; just convert the id and ip columns to strings before concatenating:

a_key = df['a_id'].astype(str) + df['a_region'] + df['a_ip'].astype(str)
b_key = df['b_id'].astype(str) + df['b_region'] + df['b_ip'].astype(str)

a_key.isin(b_key)

This gives me the following result:

0     True
1    False
2    False
answered Oct 30 '22 05:10 by hacker315
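
The string-key approach above can be sketched end to end as follows. The sample data is hypothetical (the question's table is not available), and the make_key helper and the "|" separator are assumptions added for illustration, not part of the original answer. The separator is there because plain concatenation can produce the same key from different rows (for example 1 + "a1" + 0 and 1 + "a" + 10 both give "1a10").

```python
import pandas as pd

# Hypothetical sample data: the first a_* row equals the last b_* row.
df = pd.DataFrame({
    "a_id": [2, 3, 4],
    "a_region": ["a", "b", "c"],
    "a_ip": [10, 20, 30],
    "b_id": [5, 6, 2],
    "b_region": ["x", "y", "a"],
    "b_ip": [50, 60, 10],
})

def make_key(frame, cols, sep="|"):
    """Hypothetical helper: join the given columns into one string key per row."""
    # Cast every column to str first so numeric and text columns can be joined.
    return frame[cols].astype(str).agg(sep.join, axis=1)

a_key = make_key(df, ["a_id", "a_region", "a_ip"])
b_key = make_key(df, ["b_id", "b_region", "b_ip"])

# True where the a_* combination appears in any row of the b_* combinations.
df["flag"] = a_key.isin(b_key)
print(df["flag"].tolist())
```

With this sample data only the first row is flagged, which is consistent with the result shown in the answer.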


You can pass a DataFrame to isin as the values argument, but as per the docs:

If values is a DataFrame, then both the index and column labels must match

So this should work:

# Removing the prefixes from column names
df_a = df[['a_id', 'a_region', 'a_ip']].rename(columns=lambda x: x[2:])
df_b = df[['b_id', 'b_region', 'b_ip']].rename(columns=lambda x: x[2:])

# Find rows where all values are in the other
matched = df_a.isin(df_b).all(axis=1)

# Get actual rows with boolean indexing
df_a.loc[matched]

# ... or add boolean flag to dataframe
df['flag'] = matched
answered Oct 30 '22 06:10 by somiandras
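
One caveat worth checking with the DataFrame form of isin: because index and column labels must match, row i of df_a is compared only with row i of df_b, not with every row. The sketch below uses hypothetical sample data to show the difference, and then a position-independent alternative that compares whole rows as tuples. The tuple approach is an assumption swapped in for illustration, not something from the answer above.

```python
import pandas as pd

# Hypothetical sample data: the first a_* row equals the last b_* row.
df = pd.DataFrame({
    "a_id": [2, 3, 4],
    "a_region": ["a", "b", "c"],
    "a_ip": [10, 20, 30],
    "b_id": [5, 6, 2],
    "b_region": ["x", "y", "a"],
    "b_ip": [50, 60, 10],
})

# Strip the "a_" / "b_" prefixes so both frames share column labels.
df_a = df[["a_id", "a_region", "a_ip"]].rename(columns=lambda x: x[2:])
df_b = df[["b_id", "b_region", "b_ip"]].rename(columns=lambda x: x[2:])

# Label-aligned check: each row is compared with the same-index row only.
aligned = df_a.isin(df_b).all(axis=1)

# Position-independent check: is the whole row present anywhere in df_b?
b_rows = set(df_b.itertuples(index=False, name=None))
anywhere = pd.Series(
    [row in b_rows for row in df_a.itertuples(index=False, name=None)],
    index=df.index,
)

print(aligned.tolist())
print(anywhere.tolist())
df["flag"] = anywhere
```

Here the aligned check flags nothing, because the matching rows sit at different index positions, while the tuple check flags the first row. Which behaviour is wanted depends on whether matches must be on the same row.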