How to match the first word after an expression with regex?

2 Answers

This sounds like a job for lookbehinds, though you should be aware that not all regex flavors support them. In your example:

(?<=\bipsum\s)(\w+)

This matches any sequence of word characters (letters, digits, and underscores) that follows "ipsum" as a whole word plus a single whitespace character. Because the lookbehind does not consume "ipsum" itself, you don't need to worry about reinserting it when doing, e.g., replacements.
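As a rough illustration in Java (one flavor that accepts this lookbehind), the sketch below finds the word after "ipsum" and then replaces it without touching "ipsum". The class name, method names, and sample string are made up for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookbehindDemo {
    // Matches the word right after "ipsum" plus one whitespace character,
    // without consuming "ipsum" itself.
    static final Pattern AFTER_IPSUM = Pattern.compile("(?<=\\bipsum\\s)(\\w+)");

    // Returns the first word following "ipsum", or null if there is none.
    static String firstWordAfterIpsum(String text) {
        Matcher m = AFTER_IPSUM.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    // Replaces only the word after "ipsum"; "ipsum" is left in place
    // because the lookbehind is not part of the match.
    static String replaceWordAfterIpsum(String text, String replacement) {
        return AFTER_IPSUM.matcher(text)
                .replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String s = "Lorem ipsum dolor sit amet";
        System.out.println(firstWordAfterIpsum(s));          // dolor
        System.out.println(replaceWordAfterIpsum(s, "XXX")); // Lorem ipsum XXX sit amet
    }
}
```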

As I said, though, some flavors (JavaScript engines before ES2018, for example) don't support lookbehind at all. Many others (most, in fact) only support "fixed width" lookbehinds, so you could use this example but not any of the repetition operators. (In other words, (?<=\b\w+\s+)(\w+) wouldn't work.)

answered Sep 17 '22 03:09

Ben Blank

Some of the other responders have suggested using a regex that doesn't depend on lookbehinds, but I think a complete, working example is needed to get the point across. The idea is that you match the whole sequence ("ipsum" plus the next word) in the normal way, then use a capturing group to isolate the part that interests you. For example:

String s = "Lorem ipsum dolor sit amet, consectetur " +
    "adipiscing elit. Nunc eu tellus vel nunc pretium " +
    "lacinia. Proin sed lorem. Cras sed ipsum. Nunc " +
    "a libero quis risus sollicitudin imperdiet.";

Pattern p = Pattern.compile("ipsum\\W+(\\w+)");
Matcher m = p.matcher(s);
while (m.find()) {
  System.out.println(m.group(1));
}

Note that this prints both "dolor" and "Nunc", because \W+ also skips the period and the space that follow the second "ipsum". To do that with the lookbehind version, you would have to do something hackish like:

Pattern p = Pattern.compile("(?<=ipsum\\W{1,2})(\\w+)");

That's in Java, which requires the lookbehind to have an obvious maximum length. Some flavors don't have even that much flexibility, and of course, some don't support lookbehinds at all.
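For comparison, here is a minimal runnable sketch of the bounded-lookbehind variant applied to the same sample text. The class and method names are only illustrative, and the {1,2} bound is an assumption that fits this particular text (at most a period plus a space between "ipsum" and the next word).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BoundedLookbehindDemo {
    // Collects every word preceded by "ipsum" and one or two non-word characters.
    static List<String> wordsAfterIpsum(String text) {
        Pattern p = Pattern.compile("(?<=ipsum\\W{1,2})(\\w+)");
        Matcher m = p.matcher(text);
        List<String> words = new ArrayList<>();
        while (m.find()) {
            words.add(m.group(1));
        }
        return words;
    }

    public static void main(String[] args) {
        String s = "Lorem ipsum dolor sit amet, consectetur "
                + "adipiscing elit. Nunc eu tellus vel nunc pretium "
                + "lacinia. Proin sed lorem. Cras sed ipsum. Nunc "
                + "a libero quis risus sollicitudin imperdiet.";
        System.out.println(wordsAfterIpsum(s)); // [dolor, Nunc]
    }
}
```

If the text could contain longer runs of punctuation or whitespace after "ipsum", the upper bound would have to grow with it, which is why the capturing-group version is usually the simpler choice.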

However, the biggest problem people seem to be having in their examples is not with lookbehinds, but with word boundaries. Both David Kemp and ck seem to expect \b to match the space character following the 'm', but it doesn't; it matches the position (or boundary) between the 'm' and the space.

It's a common mistake, one I've even seen repeated in a few books and tutorials, but the word-boundary construct, \b, never matches any characters. It's a zero-width assertion, like lookarounds and anchors (^, $, \z, etc.), and what it matches is a position that is either preceded by a word character and not followed by one, or followed by a word character and not preceded by one.
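A small Java sketch can make the zero-width behavior concrete. The class name, helper names, and sample string are invented for illustration. It shows that a \b match has length zero, and that a pattern relying on \b to consume the space finds nothing until the whitespace is matched explicitly.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordBoundaryDemo {
    // Length of the first \b match in the text, or -1 if there is no match.
    static int firstBoundaryMatchLength(String text) {
        Matcher m = Pattern.compile("\\b").matcher(text);
        return m.find() ? m.end() - m.start() : -1;
    }

    // Returns group 1 of the first match of regex in text, or null if no match.
    static String wordAfter(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String s = "Lorem ipsum dolor";

        // \b matches a position, so the match is empty.
        System.out.println(firstBoundaryMatchLength(s));        // 0

        // \b does not consume the space, so \w+ is tried at the space and fails.
        System.out.println(wordAfter("ipsum\\b(\\w+)", s));     // null

        // Matching the whitespace explicitly fixes it.
        System.out.println(wordAfter("ipsum\\b\\s(\\w+)", s));  // dolor
    }
}
```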

answered Sep 19 '22 03:09

Alan Moore

Tags:

regex

word-boundary

lookbehind

Matthew Taylor
