pandas: get rows by comparing two columns of dataframe to list of tuples

Tags: python, pandas

Say I have a pandas DataFrame with four columns: A,B,C,D.

import pandas as pd

my_df = pd.DataFrame({'A': [0,1,4,9], 'B': [1,7,5,7],'C':[1,1,1,1],'D':[2,2,2,2]})

I also have a list of tuples:

my_tuples = [(0,1),(4,5),(9,9)]

I want to keep only the rows of the dataframe where the value of (my_df['A'],my_df['B']) is equal to one of the tuples in my_tuples.

In this example, that would be rows 0 and 2.

Is there a good way to do this? I'd appreciate any help.


asked Mar 17 '20 10:03

abra

2 Answers

Use DataFrame.merge with a DataFrame built from the tuples. No on parameter is needed: by default, merge joins on the intersection of the columns present in both DataFrames, here A and B:

df = my_df.merge(pd.DataFrame(my_tuples, columns=['A','B']))
print(df)
   A  B  C  D
0  0  1  1  2
1  4  5  1  2
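
One thing to watch with the merge approach: the result gets a fresh 0..n-1 index, so the original row labels (0 and 2 here) are lost, and a tuple repeated in the list would repeat the matching rows. A minimal sketch of one way around both, assuming the original index is unnamed (so reset_index() creates a column called index); keys and kept are just illustrative names:

```python
import pandas as pd

my_df = pd.DataFrame({'A': [0, 1, 4, 9], 'B': [1, 7, 5, 7],
                      'C': [1, 1, 1, 1], 'D': [2, 2, 2, 2]})
my_tuples = [(0, 1), (4, 5), (9, 9)]

# Lookup table of wanted (A, B) pairs; drop_duplicates guards against
# repeated tuples multiplying the matched rows.
keys = pd.DataFrame(my_tuples, columns=['A', 'B']).drop_duplicates()

# Move the original index into a column, merge, then restore it.
kept = (
    my_df.reset_index()
         .merge(keys, on=['A', 'B'], how='inner')
         .set_index('index')
         .rename_axis(None)
)
print(kept)
```

Passing on=['A', 'B'] explicitly is optional here, but it keeps the join from silently changing if the frames ever share another column name.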

Or build a MultiIndex from A and B and test membership with isin, which keeps the original row labels:

df = my_df[my_df.set_index(['A','B']).index.isin(my_tuples)]
print(df)
   A  B  C  D
0  0  1  1  2
2  4  5  1  2
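
A variation on the isin idea, as a sketch rather than a recommendation: MultiIndex.from_frame builds the pair index directly from the two columns without creating a re-indexed copy of the whole frame, and the resulting boolean mask can be negated to get the non-matching rows. The names pairs, mask, matching and others are illustrative only:

```python
import pandas as pd

my_df = pd.DataFrame({'A': [0, 1, 4, 9], 'B': [1, 7, 5, 7],
                      'C': [1, 1, 1, 1], 'D': [2, 2, 2, 2]})
my_tuples = [(0, 1), (4, 5), (9, 9)]

# Build a MultiIndex of (A, B) pairs from just those two columns.
pairs = pd.MultiIndex.from_frame(my_df[['A', 'B']])

# Boolean NumPy array: True where the row's (A, B) pair is in my_tuples.
mask = pairs.isin(my_tuples)

matching = my_df[mask]    # rows whose pair is in the list
others = my_df[~mask]     # rows whose pair is not in the list
print(matching)
print(others)
```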

answered Oct 25 '22 19:10

jezrael

We can also use DataFrame.loc with a boolean mask built by map. The commented-out line is the equivalent list comprehension.

my_df.loc[list(map(lambda x: x in my_tuples, zip(my_df['A'], my_df['B']))),:]

#my_df.loc[[row in my_tuples for row in zip(my_df['A'], my_df['B'])],:]
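
If the list of tuples is long, each "in my_tuples" test scans the whole list. A small sketch of the same idea using a set for constant-time lookups, assuming the tuples are hashable (plain ints, as here); wanted, mask and result are illustrative names:

```python
import pandas as pd

my_df = pd.DataFrame({'A': [0, 1, 4, 9], 'B': [1, 7, 5, 7],
                      'C': [1, 1, 1, 1], 'D': [2, 2, 2, 2]})
my_tuples = [(0, 1), (4, 5), (9, 9)]

# Convert once so each membership test is a hash lookup, not a list scan.
wanted = set(my_tuples)

# One boolean per row: is this row's (A, B) pair among the wanted pairs?
mask = [pair in wanted for pair in zip(my_df['A'], my_df['B'])]

result = my_df.loc[mask]
print(result)
```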

Time comparison

%%timeit
my_df.loc[list(map(lambda x: x in my_tuples, zip(my_df['A'], my_df['B']))),:]
394 µs ± 24.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%%timeit
df = my_df.merge(pd.DataFrame(my_tuples, columns=['A','B']))
3.56 ms ± 248 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


%%timeit
df = my_df[my_df.set_index(['A','B']).index.isin(my_tuples)]
3.99 ms ± 139 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
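
These numbers come from a very small frame, where fixed overhead dominates, so the ranking may not hold for larger data. A rough sketch for re-running the comparison outside IPython with the standard timeit module; the frame size, the number of lookup pairs and all names (big_df, big_tuples, with_zip, with_merge, with_isin) are arbitrary assumptions, and no particular outcome is implied:

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical larger test data: 10,000 rows of random small integers.
rng = np.random.default_rng(0)
n_rows = 10_000
big_df = pd.DataFrame({
    'A': rng.integers(0, 100, n_rows),
    'B': rng.integers(0, 100, n_rows),
})

# 50 distinct (A, B) pairs taken from the data itself.
big_tuples = list(
    big_df[['A', 'B']].drop_duplicates().head(50)
                      .itertuples(index=False, name=None)
)


def with_zip():
    return big_df.loc[[row in big_tuples
                       for row in zip(big_df['A'], big_df['B'])]]


def with_merge():
    return big_df.merge(pd.DataFrame(big_tuples, columns=['A', 'B']))


def with_isin():
    return big_df[big_df.set_index(['A', 'B']).index.isin(big_tuples)]


# Time each approach; results depend on machine and pandas version.
for fn in (with_zip, with_merge, with_isin):
    seconds = timeit.timeit(fn, number=20)
    print(f'{fn.__name__}: {seconds / 20 * 1e3:.2f} ms per call')
```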

answered Oct 25 '22 20:10

ansev
