Search and filter pandas dataframe with regular expressions

Tags: python, regex, pandas

I'd appreciate your help. I have a pandas dataframe. I want to search 3 columns of the dataframe using a regular expression, then return all rows that meet the search criteria, sorted by one of my columns. I would like to write this as a function so I can implement this logic with other criteria if possible, but am not quite sure how to do this.

For example, I know how to pull the results of a search like this (with col1 being a column name):

import re
idx1 = df.col1.str.contains(r'vhigh|high', flags=re.IGNORECASE, regex=True, na=False)
# ~ negates the mask, so this prints the rows that do NOT match
print(df[~idx1])

but I can't figure out how to take this type of action, and perform it with multiple columns and then sort. Anyone have any tips?

586

asked Sep 16 '15 16:09

Daina

1 Answer

You can use apply to make the code more concise. For example, given this DataFrame:

import pandas as pd

df = pd.DataFrame(
    {
        'col1': ['vhigh', 'low', 'vlow'],
        'col2': ['eee', 'low', 'high'],
        'val': [100, 200, 300]
    }
)
print(df)

Input:

    col1  col2  val
0  vhigh   eee  100
1    low   low  200
2   vlow  high  300

You can select all the rows that contain the strings vhigh or high in columns col1 or col2 as follows:

mask = df[['col1', 'col2']].apply(
    lambda x: x.str.contains(
        'vhigh|high',
        regex=True
    )
).any(axis=1)
print(df[mask])

The apply function calls str.contains on each selected column (since by default axis=0), giving one Boolean column per searched column. Calling any(axis=1) then reduces across those columns, returning a Boolean mask in which True marks a row where at least one column met the search criteria. This mask can then be used to select rows from the original DataFrame.

Output:

    col1  col2  val
0  vhigh   eee  100
2   vlow  high  300
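
To make the two steps easier to follow, here is a small sketch that keeps the intermediate result instead of chaining everything. The name per_column is only an illustrative choice and is not part of the original answer; the data is the same example DataFrame.

```python
import pandas as pd

df = pd.DataFrame({
    'col1': ['vhigh', 'low', 'vlow'],
    'col2': ['eee', 'low', 'high'],
    'val': [100, 200, 300],
})

# Step 1: one Boolean column per searched column, True where the pattern matches.
per_column = df[['col1', 'col2']].apply(
    lambda x: x.str.contains('vhigh|high', regex=True)
)

# Step 2: collapse across columns, True if any searched column matched in that row.
mask = per_column.any(axis=1)

print(per_column)
print(mask)
```

Rows 0 and 2 end up selected because each has a match in at least one of the two searched columns.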

Then, to sort the result by a column, e.g. the val column, you can simply do (sort_values replaces DataFrame.sort, which was removed in pandas 0.20):

df[mask].sort_values('val')
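
Since the question asks for something reusable, the steps above could be wrapped in a function along these lines. This is only a sketch: the name search_and_sort and its parameters are hypothetical, and it assumes every searched column holds strings (a numeric column would fail on the .str accessor). It passes na=False, as in the question, so missing values count as non-matches.

```python
import re
import pandas as pd


def search_and_sort(df, columns, pattern, sort_by, flags=0):
    """Return rows where any of `columns` matches `pattern`, sorted by `sort_by`.

    Hypothetical helper: assumes the searched columns contain strings.
    """
    mask = df[columns].apply(
        # na=False treats missing values as "no match"
        lambda col: col.str.contains(pattern, flags=flags, regex=True, na=False)
    ).any(axis=1)
    return df[mask].sort_values(sort_by)


df = pd.DataFrame({
    'col1': ['vhigh', 'low', 'vlow'],
    'col2': ['eee', 'low', 'HIGH'],
    'val': [300, 200, 100],
})

# Case-insensitive search over two columns, sorted by val.
result = search_and_sort(df, ['col1', 'col2'], r'vhigh|high', 'val',
                         flags=re.IGNORECASE)
print(result)
```

Other criteria only need a different column list, pattern, or sort column. Note that high on its own already matches vhigh, so the pattern could be shortened to high without changing the result.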

167

answered Sep 20 '22 15:09

YS-L
