How can I remove numbers, and words with length below 2, from a sentence?

Tags: python, regex

I am trying to remove words that are shorter than 2 characters, as well as any word that consists only of digits. For example:

 s = " This is a test 1212 test2"

The desired output is:

" This is test test2"

I tried \w{2,}, which takes care of the words whose length is below 2. When I added \D+, it removed all the digits, but I don't want to lose the 2 in test2.

asked Oct 14 '20 18:10

Sam

2 Answers

You may use:

import re

s = re.sub(r'\b(?:\d+|\w)\b\s*', '', s)

Pattern Details:

\b: Match a word boundary
(?:\d+|\w): Match 1+ digits or a single word character
\b: Match a word boundary
\s*: Match 0 or more whitespace characters
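
As a quick check, the pattern above can be run against the sample string from the question. This is only a minimal sketch; the helper name remove_short_and_numeric is made up for illustration and is not part of the answer.

```python
import re

# Hypothetical helper name, chosen for this illustration only.
def remove_short_and_numeric(text):
    # \d+ drops all-digit words, \w drops one-character words;
    # the trailing \s* also consumes the whitespace after each removed word.
    return re.sub(r'\b(?:\d+|\w)\b\s*', '', text)

s = " This is a test 1212 test2"
result = remove_short_and_numeric(s)
print(repr(result))  # ' This is test test2'
```

Because \s* only consumes whitespace after a match, a removed word at the very end of the string leaves the space before it in place; calling .strip() or .rstrip() afterwards is one way to tidy that up if it matters.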

answered Sep 30 '22 07:09

anubhava

You can make use of word boundaries '\b' and remove anything that is 1 character long between boundaries, whether it is a digit or a letter. Also remove anything between boundaries that consists only of digits:

import re

s = " This is a test 1212 test2"

print(re.sub(r"\b([^ ]|\d+)\b", "", s))

Output:

 This is  test  test2

Explanation:

\b(           word boundary, followed by a group that matches
   [^ ]           any single character that is not a space
       |              or
        \d+       one or more digits
)\b           end of the group, followed by another word boundary

Each match is replaced with "" by re.sub(pattern, replacement, source).
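
One thing to be aware of with [^ ]: it matches any single non-space character, so a punctuation mark sitting between two word characters (such as the hyphen in well-known) also satisfies both \b checks and gets removed. The sketch below compares it with a \w-based variant and then collapses the leftover double spaces. The variant pattern and the variable names are assumptions made for illustration, not part of the original answer.

```python
import re

s = " This is a test 1212 test2"

original = r"\b([^ ]|\d+)\b"   # pattern from the answer
variant = r"\b(?:\w|\d+)\b"    # assumed alternative: word characters only

print(repr(re.sub(original, "", "well-known fact")))  # 'wellknown fact'
print(repr(re.sub(variant, "", "well-known fact")))   # 'well-known fact'

# Removing a word leaves the spaces around it behind; collapse runs of spaces.
cleaned = re.sub(r" {2,}", " ", re.sub(variant, "", s))
print(repr(cleaned))  # ' This is test test2'
```

If the input only ever contains letters, digits and spaces, the two patterns behave the same, so the difference matters mainly for text with punctuation.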

answered Sep 30 '22 08:09

Patrick Artner
