Filter Pandas dataframe based on combination of two columns

Tags:

python

pandas

Suppose I have a dataframe as,

   a  b
0  1  2
1  2  3
2  4  2
3  4  3

I want to filter the dataframe such that I get the result as,

   a  b
0  1  2
3  4  3

i.e., I want only the combinations (1,2) and (4,3), filtering the two columns together.

If I try this,

df1 = df[df['a'].isin([1,4]) & df['b'].isin([2,3])]

I get unwanted rows back as well, because the two conditions are checked independently, so combinations such as (1,3) and (4,2) also get included by the above method (here row 2, which is (4,2), is returned). But I need only the given combinations. I have a huge list of tuples for the two columns, and I want to filter the dataframe so that only rows matching one of those tuples are kept.

Also, I don't want to merge the two columns together as a single string and then filter.
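To make the problem reproducible, here is a minimal sketch that rebuilds the sample dataframe and runs the column-wise filter. The variable name `naive` is only for illustration. On this data it should return rows 0, 2 and 3, so the unwanted pair (4,2) slips through.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({"a": [1, 2, 4, 4], "b": [2, 3, 2, 3]})

# Each isin() checks its own column independently, so any pairing of an
# allowed 'a' value with an allowed 'b' value passes the combined mask
naive = df[df["a"].isin([1, 4]) & df["b"].isin([2, 3])]
print(naive)
```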

asked Dec 27 '18 13:12

mayank agrawal

2 Answers

Use -

df[df[['a', 'b']].apply(tuple, axis=1).isin([(1,2), (4,3)])]

Output

    a   b
0   1   2
3   4   3

Explanation

df[['a', 'b']].apply(tuple, axis=1) gives a series of tuples -

0    (1, 2)
1    (2, 3)
2    (4, 2)
3    (4, 3)

.isin([(1,2), (4,3)]) checks each row's tuple against the desired tuples and gives a boolean series, which is then used to index df
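Put together as a runnable sketch on the question's sample data, the steps look roughly like this. The names `wanted`, `row_tuples`, `mask` and `result` are illustrative and not part of the original answer.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({"a": [1, 2, 4, 4], "b": [2, 3, 2, 3]})
wanted = [(1, 2), (4, 3)]  # the (a, b) pairs to keep

# One tuple per row, built from the two columns
row_tuples = df[["a", "b"]].apply(tuple, axis=1)

# True where the row's (a, b) pair is one of the wanted pairs
mask = row_tuples.isin(wanted)

result = df[mask]
print(result)
```

Keeping the mask in its own variable makes it easy to invert with `~mask` if you need the rows that do not match.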

answered Sep 18 '22 07:09

Vivek Kalyanarangan

The tuple comparison approach outlined by @Vivek Kalyanarangan is the way to go, but for large dataframes it can be sped up significantly by building a MultiIndex instead of using apply to create the tuples:

For example, in your case:

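import pandas as pd
keep_tuples = [(1,2), (4,3)]
# Build a MultiIndex from the two columns; each entry acts as an (a, b) tuple
tuples_in_df = pd.MultiIndex.from_frame(df[["a","b"]])
# isin on the MultiIndex gives a boolean mask over the rows
df[tuples_in_df.isin(keep_tuples)]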

This gave a ~5X speed improvement on a 1,000,000 x 2 sized df compared to using the apply function.
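If you want to check this on your own machine, a rough sketch along these lines compares the two approaches. The random test frame `big`, its size (100,000 rows, smaller than the one mentioned above so it runs quickly) and the timing variables are assumptions for illustration; actual timings will depend on your data and pandas version.

```python
import time

import numpy as np
import pandas as pd

# Hypothetical benchmark data: random integer pairs in [0, 100)
rng = np.random.default_rng(0)
big = pd.DataFrame(rng.integers(0, 100, size=(100_000, 2)), columns=["a", "b"])
keep_tuples = [(1, 2), (4, 3)]

# Approach 1: build a tuple per row with apply, then isin
start = time.perf_counter()
mask_apply = big[["a", "b"]].apply(tuple, axis=1).isin(keep_tuples)
t_apply = time.perf_counter() - start

# Approach 2: build a MultiIndex from the columns, then isin
start = time.perf_counter()
mask_mi = pd.MultiIndex.from_frame(big[["a", "b"]]).isin(keep_tuples)
t_mi = time.perf_counter() - start

# Both masks should select exactly the same rows
same_rows = bool((mask_apply.to_numpy() == mask_mi).all())
print(f"same rows: {same_rows}, apply: {t_apply:.3f}s, MultiIndex: {t_mi:.3f}s")
```

Note that `MultiIndex.isin` returns a NumPy boolean array rather than a Series, which is why the comparison goes through `to_numpy()`.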

answered Sep 22 '22 07:09

Md Imbesat Hassan Rizvi
