
Match words that don't start with a certain letter using regex

I am learning regex but have not been able to find the right regex in Python for selecting words that do not start with a particular letter.

Example below:

text='this is a test'
match=re.findall('(?!t)\w*',text)

# match returns
['his', '', 'is', '', 'a', '', 'est', '']

match=re.findall('[^t]\w+',text)

# match
['his', ' is', ' a', ' test']

Expected: ['is', 'a']

asked May 16 '18 at 15:05 by Priya




1 Answer

With regex

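Use the negated character class [^\Wt] to match any word character (letter, digit, or underscore) other than t. To avoid matching from the middle of a word, add the word boundary metacharacter, \b, at the beginning of the pattern.

Also, remember to use raw strings for regex patterns, so that backslash sequences such as \b reach the regex engine unchanged instead of being interpreted as string escapes.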

import re

text = 'this is a test'
match = re.findall(r'\b[^\Wt]\w*', text)

print(match) # prints: ['is', 'a']
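
As a variation, the lookahead from the question's first attempt also works once it is anchored with \b and \w* is changed to \w+, so that empty matches are dropped. The sketch below wraps that pattern in a helper; the function name words_not_starting_with and its ignore_case parameter are illustrative assumptions, not part of the original answer.

```python
import re

def words_not_starting_with(text, letter, ignore_case=False):
    # Hypothetical helper: return the words in text that do not start with letter.
    flags = re.IGNORECASE if ignore_case else 0
    # \b anchors at a word boundary, (?!...) rejects the unwanted first letter,
    # and \w+ requires at least one word character, so no empty matches appear.
    pattern = rf'\b(?!{re.escape(letter)})\w+'
    return re.findall(pattern, text, flags)

print(words_not_starting_with('this is a test', 't'))  # prints: ['is', 'a']
print(words_not_starting_with('This is a Test', 't', ignore_case=True))  # prints: ['is', 'a']
```

Without ignore_case=True the check is case-sensitive, so 'This' and 'Test' would be kept when the letter is a lowercase 't'.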


Without regex

Note that this is also achievable without regex.

text = 'this is a test'
match = [word for word in text.split() if not word.startswith('t')]

print(match) # prints: ['is', 'a']
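
The same idea extends to several letters and to case-insensitive checks, since str.startswith accepts a tuple of prefixes. This is a minimal sketch; the function name words_without_prefixes is an illustrative assumption.

```python
def words_without_prefixes(text, prefixes):
    # Hypothetical helper: keep whitespace-separated words that start with
    # none of the given prefixes, ignoring case.
    lowered = tuple(p.lower() for p in prefixes)
    return [word for word in text.split() if not word.lower().startswith(lowered)]

print(words_without_prefixes('This is a Test', ['t']))  # prints: ['is', 'a']
print(words_without_prefixes('this is a test', ['t', 'i']))  # prints: ['a']
```

Note that text.split() only splits on whitespace, so attached punctuation stays part of each word, unlike the \w-based regex.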

answered Sep 19 '22 at 16:09 by Olivier Melançon