I am learning regex but have not been able to find the right regex in Python for selecting words that do not start with a particular letter.
Example below:
import re
text = 'this is a test'
match = re.findall('(?!t)\w*', text)
# match returns
['his', '', 'is', '', 'a', '', 'est', '']
match = re.findall('[^t]\w+', text)
# match returns
['his', ' is', ' a', ' test']
Expected: ['is', 'a']
First, to negate a character class, you put the ^ inside the brackets, not before them. ^[0-9] means "any digit, at the start of the string"; [^0-9] means "anything except a digit". Second, [^0-9] will match anything that isn't a digit, not just letters and underscores.
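To make the difference concrete, here is a small sketch using a made-up sample string (the string and variable names are illustrative assumptions, not part of the question):

```python
import re

sample = "7 apples, 3_pears"  # hypothetical sample text

# ^[0-9]: the caret sits outside the brackets, so it is an anchor.
# This matches a digit only at the very start of the string.
digit_at_start = re.findall(r'^[0-9]', sample)

# [^0-9]+: the caret sits inside the brackets, so it negates the class.
# This matches runs of anything that is not a digit, including
# spaces, commas, and underscores.
non_digits = re.findall(r'[^0-9]+', sample)

print(digit_at_start)  # prints: ['7']
print(non_digits)      # prints: [' apples, ', '_pears']
```

Note how the second result includes spaces and punctuation, which is why [^t] in the question also picked up the leading spaces.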
The caret ^ and dollar $ characters have special meaning in a regexp. They are called "anchors". The caret ^ matches at the beginning of the text, and the dollar $ matches at the end.
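A minimal sketch of the two anchors, reusing the text from the question (the variable names are only for illustration):

```python
import re

text = 'this is a test'

# ^ anchors the pattern to the beginning of the text.
starts_with_this = re.search(r'^this', text) is not None

# $ anchors the pattern to the end of the text.
ends_with_test = re.search(r'test$', text) is not None

# 'test' occurs in the text, but not at the beginning.
starts_with_test = re.search(r'^test', text) is not None

print(starts_with_this)  # prints: True
print(ends_with_test)    # prints: True
print(starts_with_test)  # prints: False
```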
To match any character except a list of excluded characters, put the excluded characters between [^ and ]. The caret ^ must immediately follow the [ or else it stands for just itself.
How do you match letters in regex? To match a character that has a special meaning in regex, you need to escape it by prefixing it with a backslash ( \ ). E.g., \. matches "." ; regex \+ matches "+" ; and regex \( matches "(" .
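A short sketch of escaping in Python, assuming a made-up sample string. re.escape is the standard-library helper that adds the backslashes for you when the literal text is not known in advance:

```python
import re

sample = 'cost (approx.) 3+4'  # hypothetical sample text

# Escaped metacharacters match themselves literally.
dots = re.findall(r'\.', sample)
pluses = re.findall(r'\+', sample)
open_parens = re.findall(r'\(', sample)

# Unescaped, the dot matches any character (except a newline).
any_char = re.findall(r'.', 'a.b')

# re.escape builds a pattern that matches the given text literally.
literal = '(3+4)'
pattern = re.escape(literal)
matches_itself = re.fullmatch(pattern, literal) is not None

print(dots, pluses, open_parens)  # prints: ['.'] ['+'] ['(']
print(any_char)                   # prints: ['a', '.', 'b']
print(matches_itself)             # prints: True
```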
Use the negated set [^\Wt] to match any word character (letter, digit, or underscore) that is not t. To avoid matching parts of words, add the word boundary metacharacter, \b, at the beginning of your pattern.
Also, do not forget to use raw strings for regex patterns: in an ordinary string literal, '\b' is a backspace character rather than a word boundary.
import re
text = 'this is a test'
match = re.findall(r'\b[^\Wt]\w*', text)
print(match) # prints: ['is', 'a']
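The lookahead attempt from the question can also be made to work. It failed because nothing stopped it from starting in the middle of a word, and because \w* allowed empty matches. A sketch of one possible fix, adding \b and switching to \w+ (the mixed-case sample string is an assumption added for illustration):

```python
import re

text = 'this is a test'

# \b forces the match to start at a word boundary, (?!t) rejects words
# whose first character is t, and \w+ requires at least one character.
match = re.findall(r'\b(?!t)\w+', text)
print(match)  # prints: ['is', 'a']

# With re.IGNORECASE, words starting with an uppercase T are rejected too.
mixed = 'This is a Test'  # hypothetical mixed-case sample
match_ci = re.findall(r'\b(?!t)\w+', mixed, flags=re.IGNORECASE)
print(match_ci)  # prints: ['is', 'a']
```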
Note that this is also achievable without regex.
text = 'this is a test'
match = [word for word in text.split() if not word.startswith('t')]
print(match) # prints: ['is', 'a']