Say I have a pandas DataFrame with four columns: A,B,C,D.
my_df = pd.DataFrame({'A': [0,1,4,9], 'B': [1,7,5,7],'C':[1,1,1,1],'D':[2,2,2,2]})
I also have a list of tuples:
my_tuples = [(0,1),(4,5),(9,9)]
I want to keep only the rows of the dataframe where the value of (my_df['A'],my_df['B'])
is equal to one of the tuples in my_tuples.
In this example, this would be row#0 and row#2.
Is there a good way to do this? I'd appreciate any help.
During data analysis, one might need to compute the difference between two rows for comparison purposes. This can be done using pandas. DataFrame. diff() function.
To combine two columns as a single column of tuples, use the DataFrame's apply(tuple) method.
Pandas is one of those packages and makes importing and analyzing data much easier. Let’s discuss all different ways of selecting multiple columns in a pandas DataFrame. Given a dictionary which contains Employee entity as keys and list of those entity as values. Select Second to fourth column. Example 2: Select one to another columns.
We can use the following code to compare the number of goals by row and output the winner of the match in a third column: The results of the comparison are shown in the new column called winner. Here are a few things to keep in mind when comparing two columns in a pandas DataFrame: The number of conditions and choices should be equal.
As you can see, both data sets contain the same column names, but partly different rows. In this example, I’ll show how to compare two pandas DataFrames with different lengths and identify all different rows that are contained in only one of the two DataFrames.
After executing the previous Python programming code the two pandas DataFrames visualized in Tables 1 and 2 have been created. As you can see, both data sets contain the same column names, but partly different rows.
Use DataFrame.merge
with DataFrame
created by tuples, there is no on
parameter for default interecton of all columns in both DataFrames
, here A
and B
:
df = my_df.merge(pd.DataFrame(my_tuples, columns=['A','B']))
print (df)
A B C D
0 0 1 1 2
1 4 5 1 2
Or:
df = my_df[my_df.set_index(['A','B']).index.isin(my_tuples)]
print (df)
A B C D
0 0 1 1 2
2 4 5 1 2
We can also use DataFrame.loc
with map
.
my_df.loc[list(map(lambda x: x in my_tuples, zip(my_df['A'], my_df['B']))),:]
#my_df.loc[[row in my_tuples for row in zip(my_df['A'], my_df['B'])],:]
Time comparison
%%timeit
my_df.loc[list(map(lambda x: x in my_tuples, zip(my_df['A'], my_df['B']))),:]
394 µs ± 24.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%%timeit
df = my_df.merge(pd.DataFrame(my_tuples, columns=['A','B']))
3.56 ms ± 248 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%%timeit
df = my_df[my_df.set_index(['A','B']).index.isin(my_tuples)]
3.99 ms ± 139 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With