Python regex findall

Tags:

python

regex

I am trying to extract all occurrences of tagged words from a string using regex in Python 2.7.2. Put simply, I want to extract every piece of text between the [P] and [/P] tags. Here is my attempt:

import re
regex = ur"[\u005B1P\u005D.+?\u005B\u002FP\u005D]+?"
line = "President [P] Barack Obama [/P] met Microsoft founder [P] Bill Gates [/P], yesterday."
person = re.findall(regex, line)

Printing person produces ['President [P]', '[/P]', '[P] Bill Gates [/P]']

What is the correct regex to get ['[P] Barack Obama [/P]', '[P] Bill Gates [/P]'] or ['Barack Obama', 'Bill Gates']?

521

asked Oct 13 '11 10:10

Ignatius

1 Answer

import re
regex = ur"\[P\] (.+?) \[/P\]+?"
line = "President [P] Barack Obama [/P] met Microsoft founder [P] Bill Gates [/P], yesterday."
person = re.findall(regex, line)
print(person)

yields

['Barack Obama', 'Bill Gates']

The regex ur"[\u005B1P\u005D.+?\u005B\u002FP\u005D]+?" is exactly the same unicode string as u'[[1P].+?[/P]]+?', just harder to read.
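A quick way to check that equivalence, sketched here for Python 3 (an assumption; the question targets Python 2.7.2, where the ur prefix still processes \u escapes, and that prefix is a syntax error in Python 3). The variable names escaped and readable are made up for this illustration:

```python
# Python 3 sketch: a normal (non-raw) string literal decodes \uXXXX escapes,
# much as the ur"..." prefix does in Python 2.
escaped = "[\u005B1P\u005D.+?\u005B\u002FP\u005D]+?"
readable = "[[1P].+?[/P]]+?"

# \u005B is "[", \u005D is "]", \u002F is "/", so both literals
# spell the very same pattern text.
print(escaped == readable)  # True
```

So the escapes add no protection for the brackets; by the time re sees the pattern, they are ordinary [ and ] characters.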

The first bracketed group [[1P] tells re that any one of the characters in the list ['[', '1', 'P'] should match, and similarly with the second bracketed group [/P]]. That's not what you want at all. So,

Remove the outer enclosing square brackets. (Also remove the stray 1 in front of P.)
To protect the literal brackets in [P], escape the brackets with a backslash: \[P\].
To return only the words inside the tags, place grouping parentheses around .+?.
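The points above can be sketched as follows. This is written for Python 3 (an assumption; there the ur prefix is replaced by a plain raw string), and the names single_chars, with_tags and names are made up for this illustration. The character set is written with an escaped [ only to keep newer Python versions from warning about a possible nested set; it holds the same three characters.

```python
import re

line = "President [P] Barack Obama [/P] met Microsoft founder [P] Bill Gates [/P], yesterday."

# A bracketed set matches ONE character out of the set,
# not the literal text "[1P]" or "[P]".
single_chars = re.findall(r"[\[1P]", "[P] x1")
print(single_chars)  # ['[', 'P', '1']

# Escaped brackets match the literal tags. With no group in the
# pattern, findall returns each whole match, tags included.
with_tags = re.findall(r"\[P\] .+? \[/P\]", line)
print(with_tags)  # ['[P] Barack Obama [/P]', '[P] Bill Gates [/P]']

# With parentheses around .+?, findall returns only the group.
names = re.findall(r"\[P\] (.+?) \[/P\]", line)
print(names)  # ['Barack Obama', 'Bill Gates']
```

Both result shapes asked for in the question come from the same pattern; the only difference is whether .+? is wrapped in a group.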

114

answered Sep 25 '22 15:09

unutbu
