Python regex to extract a portion of string

I want to extract a portion of a large string. There is a target word and an upper bound on the number of words before and after it. The extracted substring must therefore contain the target word together with up to that many words before and after it. The before and after parts can contain fewer words if the target word is close to the beginning or end of the text.

Example string:

"Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum."

Target word: laboris

words_before: 5

words_after: 2

Should return ['veniam, quis nostrud exercitation ullamco laboris nisi ut']

I thought of a couple of possible patterns, but none of them worked. I guess it could also be done by simply traversing the string forward and backward from the target word. However, a regex would definitely make things easier. Any help would be appreciated.

asked Oct 04 '15 01:10

user2963623

1 Answer

If splitting the text on whitespace is acceptable, you can use split() together with slice() instead of a regex. For example:

>>> text = "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum."
>>> words = text.split()  # split on whitespace; punctuation stays attached to words

>>> n = words.index('laboris')  # exact token match; raises ValueError if absent
>>> s = slice(max(0, n - 5), n + 3)  # clamp at 0 so a negative start cannot wrap around

>>> words[s]
['veniam,', 'quis', 'nostrud', 'exercitation', 'ullamco', 'laboris', 'nisi', 'ut']

>>> ' '.join(words[s])  # join back into a single string
'veniam, quis nostrud exercitation ullamco laboris nisi ut'
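
Since the question asks specifically for a regex, here is a minimal sketch of one possible pattern. It is not part of the original answer: the helper name extract_window and its parameters are hypothetical, and it assumes that a "word" is any run of non-whitespace characters. The bounded quantifiers {0,N} let the window shrink when the target is near the start or end of the text.

```python
import re

def extract_window(text, target, words_before, words_after):
    """Return each non-overlapping window of up to `words_before` words,
    the target word, and up to `words_after` words (hypothetical helper)."""
    pattern = re.compile(
        r"(?:\S+\s+){0,%d}\b%s\b(?:\s+\S+){0,%d}"
        % (words_before, re.escape(target), words_after)
    )
    # findall returns the full match text because the groups are non-capturing.
    return pattern.findall(text)

sample = ("Ut enim ad minim veniam, quis nostrud exercitation ullamco "
          "laboris nisi ut aliquip ex ea commodo consequat.")
print(extract_window(sample, "laboris", 5, 2))
```

Because \b matches at word boundaries, a target followed by punctuation (for example "laborum.") is still found, but the trailing punctuation is left out of the match. The pattern backtracks more as the word bounds grow, so the split() approach may be the better choice for very long texts.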

answered Sep 18 '22 14:09

Remi Crystal

Tags:

python

regex

python-2.7
