I have an input string that contains Unicode characters:
s = "Question1: a12 is the number of a, b1 is the number of cầu thủ"
I want to get all words that contain no digits and are at least 2 characters long. Desired output:
['is', 'the', 'number', 'of', 'is', 'the', 'number', 'of', 'cầu', 'thủ']
I've tried
re.compile(r'[\w]{2,}').findall(s)
and got
['Question1', 'a12', 'is', 'the', 'number', 'of', 'b1', 'is', 'the', 'number', 'of', 'cầu', 'thủ']
Is there any way to get only the words that contain no digits?
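A quick check shows why the attempt above also returns Question1, a12 and b1: in Python 3, str patterns are Unicode-aware by default, and \w accepts digits and the underscore as well as letters. This is a minimal sketch; the names word_char and letter_only are just illustrative.

```python
import re

# \w: letters (including non-ASCII ones such as "ầ"), digits and "_".
word_char = re.compile(r'\w')
# [^\W\d_]: word characters minus digits and the underscore.
letter_only = re.compile(r'[^\W\d_]')

for ch in ['a', 'ầ', '1', '_', ':']:
    print(repr(ch), bool(word_char.fullmatch(ch)), bool(letter_only.fullmatch(ch)))
# 'a' True True
# 'ầ' True True
# '1' True False
# '_' True False
# ':' False False
```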
You may use
import re
s = "Question1: a12 is the number of a, b1 is the number of cầu thủ"
print(re.compile(r'\b[^\W\d_]{2,}\b').findall(s))
# => ['is', 'the', 'number', 'of', 'is', 'the', 'number', 'of', 'cầu', 'thủ']
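If the minimum length needs to vary, the same pattern can be wrapped in a small helper. This is a sketch, not part of the original answer; words_without_digits and its min_len parameter are hypothetical names. Note that words joined by an underscore (snake_case) are also skipped, because _ is a word character that [^\W\d_] excludes.

```python
import re

def words_without_digits(text: str, min_len: int = 2) -> list[str]:
    """Hypothetical helper: letter-only words of at least min_len chars."""
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    # [^\W\d_] = a word char that is not a digit or "_";
    # \b on both sides prevents partial matches inside "Question1" or "a12".
    pattern = re.compile(r'\b[^\W\d_]{%d,}\b' % min_len)
    return pattern.findall(text)

s = "Question1: a12 is the number of a, b1 is the number of cầu thủ"
print(words_without_digits(s))
# => ['is', 'the', 'number', 'of', 'is', 'the', 'number', 'of', 'cầu', 'thủ']
print(words_without_digits(s, min_len=3))
# => ['the', 'number', 'the', 'number', 'cầu', 'thủ']
print(words_without_digits("snake_case and camelCase"))
# => ['and', 'camelCase']
```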
Or, if you want to limit matches to ASCII-only letter words with at least 2 letters:
print(re.compile(r'\b[a-zA-Z]{2,}\b').findall(s))
# => ['is', 'the', 'number', 'of', 'is', 'the', 'number', 'of']
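A tempting alternative for the ASCII-only case is the re.ASCII flag, but it changes \b as well as the letter class. The sketch below compares the two; the re.ASCII version is not from the answer and is shown only to explain why keeping [a-zA-Z] with the default Unicode-aware \b is the safer choice: with re.ASCII, "ủ" counts as a non-word character, so the fragment "th" of "thủ" gets matched.

```python
import re

s = "Question1: a12 is the number of a, b1 is the number of cầu thủ"

# The ASCII variant: ASCII letters, but \b is still Unicode-aware,
# so "cầu" and "thủ" are skipped entirely.
print(re.findall(r'\b[a-zA-Z]{2,}\b', s))
# => ['is', 'the', 'number', 'of', 'is', 'the', 'number', 'of']

# Alternative (not from the answer): re.ASCII also makes \b ASCII-only,
# so a boundary appears between "h" and "ủ" and "th" leaks through.
print(re.findall(r'\b[^\W\d_]{2,}\b', s, flags=re.ASCII))
# => ['is', 'the', 'number', 'of', 'is', 'the', 'number', 'of', 'th']
```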
Details
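- [^\W\d_] matches any word character except digits and the underscore, i.e. essentially any Unicode letter (use [a-zA-Z] for the ASCII-only variation)
- {2,} requires two or more such letters
- \b is a word boundary; the pattern is written as a raw string literal, r'...', so that \b reaches the regex engine instead of being turned into a backspace character
So, r'\b[^\W\d_]{2,}\b' defines a regex that matches a word boundary, two or more letters, and then asserts that there is no word character right after these letters.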
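Both details matter in practice. The sketch below (an illustration, not from the answer) drops the \b anchors, which lets the letter run "Question" be cut out of "Question1", and then drops the r prefix, which turns \b into a literal backspace character that never occurs in ordinary text.

```python
import re

s = "Question1: a12 is the number of a, b1 is the number of cầu thủ"

# Without \b anchors, letter runs are extracted from inside "Question1".
print(re.findall(r'[^\W\d_]{2,}', s))
# => ['Question', 'is', 'the', 'number', 'of', 'is', 'the', 'number', 'of', 'cầu', 'thủ']

# Without the r prefix, '\b' is the backspace character U+0008.
print('\b' == '\x08'))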
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With