 

regex line start in character set


Let's say I want to search for a 'b' that occurs either at the start of a line or immediately after an 'a'.

Why doesn't re.match('[\^a]b','b') match while re.match('^b','b') does?

Update: I realised I should have been using search instead of match. I want it to report a match for strings like 'b', 'cab', 'ab', 'bc', and 'abd'.

asked Nov 02 '12 16:11 by highBandWidth


1 Answer

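Inside a character set, \^ is a literal caret, not a start-of-line anchor, so the regex [\^a]b matches either ab or ^b. It therefore cannot match the string 'b', which has no character before the 'b'.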
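A minimal sketch of the behaviour described above, using Python's standard `re` module. The pattern is written as a raw string (r'...'); in a plain string literal, '\^' is an invalid escape that recent Python versions warn about, although the backslash is kept either way.

```python
import re

# Inside [...], an escaped caret is just the literal character '^'.
pattern = re.compile(r'[\^a]b')

print(pattern.match('b'))           # None: nothing precedes 'b' to satisfy [\^a]
print(pattern.match('ab').group())  # ab
print(pattern.match('^b').group())  # ^b
```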

Note that re.match() only matches at the beginning of the string: it behaves as if your pattern started with a start-of-string anchor (\A, or ^ when the re.MULTILINE flag is not set). Even with re.MULTILINE, re.match() does not try the start of later lines.
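A short sketch contrasting re.match() and re.search(); the sample string 'xb\nb' is an illustrative assumption, chosen so that one 'b' sits mid-line and another starts the second line.

```python
import re

text = 'xb\nb'  # 'b' at index 1 (mid-line) and at index 3 (start of line 2)

# re.match() only tries position 0, where the character is 'x'.
print(re.match('b', text))                          # None
# re.search() scans forward and finds the first 'b'.
print(re.search('b', text).start())                 # 1

# Even with re.MULTILINE, re.match() still only tries position 0 ...
print(re.match('^b', text, re.MULTILINE))           # None
# ... whereas re.search() with re.MULTILINE lets ^ match after the newline.
print(re.search('^b', text, re.MULTILINE).start())  # 3
```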

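So, to search for a 'b' that occurs either at the start of a line or right after an 'a', you need re.search() with the following regex (add the re.MULTILINE flag if ^ should match at the start of every line of multi-line text, not only at the start of the string):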

(^|a)b 
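A quick check of (^|a)b against the examples from the question. The negative examples ('cb', 'xbx', 'c') are assumptions added for contrast: in each, no 'b' is first or directly after an 'a'.

```python
import re

# Either the start of the string (^) or a literal 'a', followed by 'b'.
pattern = re.compile(r'(^|a)b')

should_match = ['b', 'cab', 'ab', 'bc', 'abd']  # examples from the question
should_not_match = ['cb', 'xbx', 'c']           # illustrative negatives

for s in should_match:
    assert pattern.search(s), s
for s in should_not_match:
    assert not pattern.search(s), s
print('all checks passed')
```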

Note that I am interpreting this slightly differently from the other answers: I think your requirement means you want to match the 'b' in 'bob' as well as the 'ab' in 'taboo', so the start-of-line requirement applies only to a 'b' that is not preceded by an 'a'.
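To make this interpretation concrete, a small sketch printing what (^|a)b actually matches in 'bob' and 'taboo'. Group 1 is empty when the ^ branch matched and 'a' otherwise.

```python
import re

pattern = re.compile(r'(^|a)b')

for s in ['bob', 'taboo']:
    m = pattern.search(s)
    # group(0) is the whole match, span() its position, group(1) the branch taken.
    print(s, repr(m.group(0)), m.span(), repr(m.group(1)))
# bob 'b' (0, 1) ''
# taboo 'ab' (1, 3) 'a'
```

In both cases m.end() - 1 is the index of the 'b' itself, which helps if you only care where the 'b' is.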

This method of alternation inside a group scales better than ^b|ab: to match a b at the start of the string or when preceded by a, x, 2, or 5, you could use the following:

(^|[ax25])b 
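A sketch comparing the grouped pattern with the flat alternation it replaces; the sample strings are arbitrary assumptions. The flat version has to repeat 'b' in every branch, which is what makes it scale poorly.

```python
import re

grouped = re.compile(r'(^|[ax25])b')
flat = re.compile(r'^b|ab|xb|2b|5b')  # equivalent, but 'b' is repeated per branch

samples = ['b', 'ab', 'xb', '2b', '5b', 'cb', '3b', 'zzz', 'q5b']
for s in samples:
    g, f = grouped.search(s), flat.search(s)
    # Both must agree: either both find nothing, or both match the same span.
    assert (g and g.span()) == (f and f.span()), s
print('patterns agree on all samples')
```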
answered Oct 19 '22 17:10 by Andrew Clark