pandas DataFrame filter regex

Setup

import pandas as pd

df = pd.DataFrame(
    [
        ['Hello', 'World'],
        ['Just', 'Wanted'],
        ['To', 'Say'],
        ['I\'m', 'Tired']
    ]
)

Problem

df.filter([0], regex=r'(Hel|Just)', axis=0)

I'd expect the [0] to specify the 1st column as the one to look at and axis=0 to specify filtering rows. What I get is this:

       0      1
0  Hello  World

I was expecting

       0       1
0  Hello   World
1   Just  Wanted

Question

What would have gotten me what I expected?


asked May 06 '16 20:05

piRSquared

3 Answers

This should work. It builds a boolean mask from column 0 and indexes the DataFrame with it:

df[df[0].str.contains('(Hel|Just)', regex=True)]
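A small sketch of the same mask, assuming the setup DataFrame from the question. As far as I know, recent pandas versions emit a UserWarning when the pattern given to str.contains has capturing groups, so this variant uses a non-capturing group (?:...) instead. The names mask and result are only illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [
        ['Hello', 'World'],
        ['Just', 'Wanted'],
        ['To', 'Say'],
        ["I'm", 'Tired'],
    ]
)

# (?:...) groups the alternatives without capturing them
mask = df[0].str.contains(r'(?:Hel|Just)', regex=True)

# Boolean indexing keeps only the rows where the mask is True
result = df[mask]
print(result)
```

The pattern is searched anywhere in the string. If only a prefix match is wanted, anchoring it as r'^(?:Hel|Just)' would be one option.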

answered Nov 08 '22 00:11

Max

Per the docs,

Arguments are mutually exclusive, but this is not checked for

So it appears that the first optional argument, items=[0], trumps the third optional argument, regex=r'(Hel|Just)'.

In [194]: df.filter([0], regex=r'(Hel|Just)', axis=0)
Out[194]: 
       0      1
0  Hello  World

is equivalent to

In [201]: df.filter([0], axis=0)
Out[201]: 
       0      1
0  Hello  World

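which merely selects the row(s) whose index labels are in [0] along axis 0. Note that filter only ever matches row or column labels, never cell values, so the regex would have been tested against the index labels 0 to 3 rather than against the strings in column 0.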
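To make the label-versus-value point concrete, here is a rough sketch using the question's DataFrame. It assumes that filter converts non-string labels to strings before applying the regex, and, as far as I can tell, newer pandas versions raise a TypeError when items and regex are passed together, so that call is wrapped in try/except. The set_index workaround at the end is my own illustration, not something from the docs quote above.

```python
import pandas as pd

df = pd.DataFrame(
    [
        ['Hello', 'World'],
        ['Just', 'Wanted'],
        ['To', 'Say'],
        ["I'm", 'Tired'],
    ]
)

# items selects rows by index label: only the row labelled 0 is kept
by_items = df.filter(items=[0], axis=0)

# regex is matched against the index labels (0..3), not the cell values,
# so no row matches here
by_regex = df.filter(regex=r'Hel|Just', axis=0)

# Passing both may be silently resolved (older pandas) or rejected (newer)
try:
    both = df.filter(items=[0], regex=r'Hel|Just', axis=0)
except TypeError:
    both = None

# Workaround sketch: make column 0 the index so filter can see its values
via_index = (
    df.set_index(0, drop=False)
      .filter(regex=r'Hel|Just', axis=0)
      .reset_index(drop=True)
)
print(via_index)
```

In practice the boolean mask approach is usually simpler than moving a column into the index just to filter on it.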

To get the desired result, you could use str.contains to create a boolean mask, and use df.loc to select rows:

In [210]: df.loc[df.iloc[:,0].str.contains(r'(Hel|Just)')]
Out[210]: 
       0       1
0  Hello   World
1   Just  Wanted
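One caveat with the mask approach, sketched below under the assumption that the column may contain missing values: str.contains can return NaN for those rows, and a mask with NaN in it may not be usable for indexing. Passing na=False treats missing entries as non-matches. The df_missing frame is made up for illustration.

```python
import pandas as pd

# Hypothetical data: same shape as the question, but with a missing value
df_missing = pd.DataFrame(
    [
        ['Hello', 'World'],
        [None, 'Wanted'],
        ['Just', 'Say'],
    ]
)

# na=False makes rows with a missing value count as "no match"
mask = df_missing.iloc[:, 0].str.contains(r'Hel|Just', na=False)

# Select the matching rows by boolean mask
result = df_missing.loc[mask]
print(result)
```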

answered Nov 08 '22 01:11

unutbu

Here is a chaining method (column_name and regex_pattern are placeholders for your own column label and pattern):

df.loc[lambda x: x['column_name'].str.contains(regex_pattern, regex=True)]
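A runnable version of the chained form might look like the sketch below. The column names 'first' and 'second' and the rename step are assumptions added only to show why the lambda helps: it receives the intermediate DataFrame, so the mask can refer to columns that exist only at that point in the chain.

```python
import pandas as pd

df = pd.DataFrame(
    [
        ['Hello', 'World'],
        ['Just', 'Wanted'],
        ['To', 'Say'],
        ["I'm", 'Tired'],
    ]
)

regex_pattern = r'Hel|Just'

result = (
    df.rename(columns={0: 'first', 1: 'second'})  # illustrative earlier step
      # x is the renamed frame, so 'first' is available here
      .loc[lambda x: x['first'].str.contains(regex_pattern, regex=True)]
)
print(result)
```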

answered Nov 07 '22 23:11

Ramin Melikov

Tags:

python

regex

pandas

filter