How to use regex non-capturing groups format in Python

Tags: python, regex, pandas

In the following code I want to get just the digits between '-' and 'u'. I thought I could use the regular expression non-capturing group syntax (?: … ) to ignore everything from '-' to the first digit, but the output always includes it. How can I use a non-capturing group to get the correct output?

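import pandas as pd

df = pd.DataFrame(
    {'a' : [1,2,3,4], 
     'b' : ['41u -428u', '31u - 68u', '11u - 58u', '21u - 318u']
    })

# The hyphen (and any spaces after it) still appears in the result:
# '-428', '- 68', '- 58', '- 318'
df['b'].str.extract('((?:-[ ]*)[0-9]*)', expand=True)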



asked May 18 '18 18:05

StackUser

2 Answers

It isn't included in the inner group, but it's still included as part of the outer group. A non-capturing group doesn't mean its text is left out of the match; it only means that group doesn't get its own entry in the output. Whatever it matches is still captured by any enclosing capturing group.
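A minimal sketch using only Python's standard `re` module shows the difference on the first sample string: the same `(?:-[ ]*)` group returns the hyphen when it sits inside the capturing parentheses, and drops it when it sits outside them. The variable names are illustrative.

```python
import re

text = "41u -428u"

# Non-capturing group nested inside a capturing group: the outer group
# still contains everything the inner (?:...) matched, hyphen included.
nested = re.search(r"((?:-[ ]*)[0-9]*)", text)
print(nested.groups())  # ('-428',)

# Non-capturing group placed before the capturing group: the hyphen is
# still part of the overall match, but it is not returned as a group.
outside = re.search(r"(?:-[ ]*)([0-9]+)", text)
print(outside.groups())  # ('428',)
print(outside.group(0))  # -428  (the whole match still includes the hyphen)
```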

Just keep the parts you don't want outside the parentheses that define the capturing group:

import pandas as pd

df = pd.DataFrame(
    {'a' : [1,2,3,4], 
     'b' : ['41u -428u', '31u - 68u', '11u - 58u', '21u - 318u']
    })

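# Only (\d+) is a capturing group; the hyphen, optional whitespace and 'u' are matched but not returned
df['b'].str.extract(r'-\s?(\d+)u', expand=True)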

     0
0  428
1   68
2   58
3  318

That way you match anything that has a '-' in front (maybe followed by a space), a 'u' behind and digits in between, but only the digits are returned.

Where:

-      # literal hyphen
\s?    # optional whitespace character; use \s* if you expect more than one
(\d+)  # capture one or more digits
u      # literal "u"
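The breakdown above is essentially a verbose regex. As a sketch, the same pattern can be written in `re.VERBOSE` mode and passed to `Series.str.extract` through its `flags` parameter, so the comments live inside the pattern itself. The names `pattern` and `digits` are illustrative.

```python
import re

import pandas as pd

df = pd.DataFrame(
    {'a': [1, 2, 3, 4],
     'b': ['41u -428u', '31u - 68u', '11u - 58u', '21u - 318u']}
)

# In verbose mode, unescaped whitespace in the pattern is ignored and '#' starts a comment.
pattern = r"""
    -       # literal hyphen (matched, not captured)
    \s?     # optional whitespace character (matched, not captured)
    (\d+)   # capture one or more digits
    u       # literal "u" (matched, not captured)
"""

digits = df['b'].str.extract(pattern, flags=re.VERBOSE, expand=True)
print(digits)
```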


answered Sep 25 '22 15:09

Patrick Artner

I think you're using an overly complicated regex. What about:

df['b'].str.extract(r'-(.*)u', expand=True)

      0
0   428
1    68
2    58
3   318
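One detail worth checking with `-(.*)u`: `.` matches any character, so for rows with a space after the hyphen the captured text includes that space (' 68' rather than '68'), which the printed DataFrame does not make visible. A small sketch comparing it with a digits-only group; the second pattern is an illustrative alternative, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame(
    {'a': [1, 2, 3, 4],
     'b': ['41u -428u', '31u - 68u', '11u - 58u', '21u - 318u']}
)

# (.*) is greedy and matches any character, so it also grabs the space
# after '-' (and would run up to the last 'u' if a string had several).
loose = df['b'].str.extract(r'-(.*)u', expand=True)[0]
print(loose.tolist())   # ['428', ' 68', ' 58', ' 318']

# Restricting the group to digits and leaving the whitespace outside it
# returns clean values.
strict = df['b'].str.extract(r'-\s*(\d+)u', expand=True)[0]
print(strict.tolist())  # ['428', '68', '58', '318']
```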

answered Sep 22 '22 15:09

sacuL
