How to get all words of a specific length that don't contain numbers?

Question

I have an input string that includes Unicode characters:

s = "Question1: a12 is the number of a, b1 is the number of cầu thủ"

I want to get all words that contain no digits and are at least 2 characters long. Desired output:

['is', 'the', 'number', 'of', 'is', 'the', 'number', 'of', 'cầu', 'thủ'].

I've tried

re.compile('[\w]{2,}').findall(s)

and got

['Question1', 'a12', 'is', 'the', 'number', 'of', 'b1', 'is', 'the', 'number', 'of', 'cầu', 'thủ']

Is there a way to get only the words that contain no digits?

Wiktor Stribiżew · Accepted Answer

You may use

import re
s = "Question1: a12 is the number of a, b1 is the number of cầu thủ"
print(re.compile(r'\b[^\W\d_]{2,}\b').findall(s))
# => ['is', 'the', 'number', 'of', 'is', 'the', 'number', 'of', 'cầu', 'thủ']
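If you prefer to avoid the [^\W\d_] class, a two-step sketch also works (this is a different technique from the regex above, shown only for comparison): grab every run of word characters with \w+, then keep the tokens for which str.isalpha() is true and whose length is at least 2. For ordinary text like the sample the results agree, but the two can disagree on some numeric characters that are not decimal digits (for example '²'), which \w accepts and \d does not reject, while str.isalpha() rejects them. The sample assumes the Vietnamese letters are stored in precomposed form.

```python
import re

s = "Question1: a12 is the number of a, b1 is the number of cầu thủ"

# Step 1: split the text into runs of word characters (letters, digits, "_").
tokens = re.findall(r"\w+", s)

# Step 2: keep tokens made only of letters (isalpha() rejects digits and "_")
# that are at least 2 characters long.
words = [t for t in tokens if t.isalpha() and len(t) >= 2]

print(words)
# => ['is', 'the', 'number', 'of', 'is', 'the', 'number', 'of', 'cầu', 'thủ']
```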

Or, if you want to limit the matches to ASCII-only letter words of at least 2 letters:

print(re.compile(r'\b[a-zA-Z]{2,}\b').findall(s))
# => ['is', 'the', 'number', 'of', 'is', 'the', 'number', 'of']
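It may be tempting to get the ASCII-only behaviour by passing the re.ASCII flag to the first pattern instead. The sketch below compares the two on the sample string. The catch is that re.ASCII also makes \b ASCII-only, so a non-ASCII letter such as 'ủ' counts as a word boundary and the prefix 'th' of 'thủ' slips through. The explicit [a-zA-Z] version keeps the Unicode-aware \b and avoids that partial match.

```python
import re

s = "Question1: a12 is the number of a, b1 is the number of cầu thủ"

# ASCII letter class, but \b is still Unicode-aware (the default for str patterns).
explicit = re.findall(r"\b[a-zA-Z]{2,}\b", s)

# re.ASCII turns [^\W\d_] into [a-zA-Z] AND makes \b ASCII-only,
# so "th" in "thủ" is now followed by a boundary and gets matched.
flagged = re.findall(r"\b[^\W\d_]{2,}\b", s, flags=re.ASCII)

print(explicit)  # ['is', 'the', 'number', 'of', 'is', 'the', 'number', 'of']
print(flagged)   # same list plus 'th'
```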


Details

To match only letters, use [^\W\d_] (or [a-zA-Z] for the ASCII-only variation): it is \w with digits and the underscore subtracted.
To match whole words, you need word boundaries, \b.
To make sure \b is read as a word boundary and not as a backspace character, use a raw string literal, r'...'.
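The two points above can be checked directly. The sketch below tests single characters against [^\W\d_] and then shows what happens when the r prefix is dropped: in a normal string literal "\b" is the backspace character (U+0008), so the pattern searches for literal backspaces and finds nothing. The escape "\u1ea7" is used for the precomposed letter 'ầ' so the example does not depend on how the source file is normalized.

```python
import re

s = "Question1: a12 is the number of a, b1 is the number of cầu thủ"

# [^\W\d_] = "not a non-word char, not a digit, not an underscore",
# which leaves letters (including non-ASCII ones such as U+1EA7 'ầ').
letter = re.compile(r"[^\W\d_]")
checks = {ch: bool(letter.fullmatch(ch)) for ch in ["a", "\u1ea7", "7", "_", " "]}
print(checks)

# Without the r prefix, "\b" is a backspace character, not a word boundary.
print("\b" == "\x08")                            # True
print(re.findall("\b[a-zA-Z]{2,}\b", s))        # []
print(re.findall(r"\b[a-zA-Z]{2,}\b", s)[:3])   # ['is', 'the', 'number']
```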

So, r'\b[^\W\d_]{2,}\b' defines a regex that matches a word boundary, then two or more letters, and then asserts (with the second \b) that no word character immediately follows those letters.
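Since the question asks about words of a specific length, the {2,} quantifier is the part to change: {3,} means "3 or more letters" and {3} (or {3,3}) means "exactly 3". A minimal sketch that builds the pattern from parameters follows; the helper name words_without_digits and its min_len/max_len parameters are hypothetical, chosen only for illustration.

```python
import re

def words_without_digits(text, min_len=2, max_len=None):
    """Hypothetical helper: whole words made only of letters, min_len..max_len long.

    max_len=None means no upper bound, i.e. the {min,} quantifier.
    """
    upper = "" if max_len is None else str(max_len)
    # In the rf-string, "{{" and "}}" produce literal braces for the quantifier.
    pattern = re.compile(rf"\b[^\W\d_]{{{min_len},{upper}}}\b")
    return pattern.findall(text)

s = "Question1: a12 is the number of a, b1 is the number of cầu thủ"
two_plus = words_without_digits(s)                        # 2 or more letters
three_plus = words_without_digits(s, min_len=3)           # 3 or more letters
exactly_three = words_without_digits(s, min_len=3, max_len=3)

print(three_plus)     # ['the', 'number', 'the', 'number', 'cầu', 'thủ']
print(exactly_three)  # ['the', 'the', 'cầu', 'thủ']
```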

Tags: python, regex

Ha Bom