
regex optional word match

Tags: regex

I'm trying to create a regex for extracting singers and lyricists. I was wondering how to make the lyricists part of the search optional.

Sample Multiline String:

Fireworks Singer: Katy Perry
Vogue Singers: Madonna, Karen Lyricist: Madonna

Regex: /Singers?:(.*)\s?Lyricists?:(.*)/

This matches the second line correctly and extracts the singers (Madonna, Karen) and the lyricist (Madonna).

But it does not work with the first line, when there are no Lyricists.

How do I make Lyricists search optional?

Victor asked Mar 10 '11 02:03




2 Answers

You can enclose the part you want to make optional in a non-capturing group: (?:). The group is then treated as a single unit in the regex, so you can put a ? after it to make the whole thing optional. Example:

/Singers?:(.*)\s?(?:Lyricists?:(.*))?/

Note that the \s? here is useless: the greedy .* consumes the rest of the line, and since everything after it is optional, no backtracking is ever necessary. For the same reason, the (?:Lyricists?:(.*)) part will never match anything. You can fix this by using the non-greedy version of .*, which is .*?, along with the $ anchor:

/Singers?:(.*?)\s*(?:Lyricists?:(.*))?$/
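To see the difference between the greedy and non-greedy versions, here is a small sketch. Python's re module is an assumption on my part (the question doesn't name a language), and the variable names are only for illustration.

```python
import re

line = "Vogue Singers: Madonna, Karen Lyricist: Madonna"

# Greedy: (.*) swallows the rest of the line, so the optional
# Lyricists group is left with nothing and does not participate.
greedy = re.search(r"Singers?:(.*)\s?(?:Lyricists?:(.*))?", line)
print(greedy.groups())

# Non-greedy with $: (.*?) grows one character at a time until the
# remainder of the pattern can match through to the end of the line.
lazy = re.search(r"Singers?:(.*?)\s*(?:Lyricists?:(.*))?$", line)
print(lazy.groups())
```

In the greedy case the second group comes back as None, while the non-greedy case fills both groups (each still with a leading space).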

Some extra whitespace still ends up in the captures; adding \s* after each colon removes it, giving a final regex of:

/Singers?:\s*(.*?)\s*(?:Lyricists?:\s*(.*))?$/
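As a quick check of the final regex against the two sample lines from the question, here is a minimal sketch in Python (again an assumed language; the extract helper is hypothetical, not part of any library).

```python
import re

# Final pattern from above, written as a Python raw string.
PATTERN = re.compile(r"Singers?:\s*(.*?)\s*(?:Lyricists?:\s*(.*))?$")

def extract(line):
    """Return (singers, lyricists) for one line; lyricists is None if absent."""
    m = PATTERN.search(line)
    if m is None:
        return None
    return m.group(1), m.group(2)

first = extract("Fireworks Singer: Katy Perry")
second = extract("Vogue Singers: Madonna, Karen Lyricist: Madonna")
print(first)   # ('Katy Perry', None)
print(second)  # ('Madonna, Karen', 'Madonna')
```

Each line is matched on its own here, so no modifiers are needed; the names can then be split on commas if you need them individually.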
Cameron answered Sep 29 '22 14:09


Just to add to Cameron's solution: if the source string has multiple lines, each containing both Singers and Lyricists, you'll probably need to add the 'm' multi-line modifier so that the '$' matches at the end of each line. (You didn't say what language you are using; you may want to add the 'i' modifier as well.)
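For illustration, here is how that might look in Python, where the 'm' and 'i' modifiers are spelled re.MULTILINE and re.IGNORECASE. The language is an assumption, and the sample text is the two lines from the question.

```python
import re

text = "Fireworks Singer: Katy Perry\nVogue Singers: Madonna, Karen Lyricist: Madonna"
regex = r"Singers?:\s*(.*?)\s*(?:Lyricists?:\s*(.*))?$"

# Without the multi-line flag, $ only matches at the very end of the
# string, so the first line (no lyricist) cannot match at all.
without_m = [m.groups() for m in re.finditer(regex, text)]

# With re.MULTILINE, $ also matches just before each newline.
with_m = [m.groups() for m in re.finditer(regex, text, re.MULTILINE | re.IGNORECASE)]

print(without_m)  # [('Madonna, Karen', 'Madonna')]
print(with_m)     # [('Katy Perry', None), ('Madonna, Karen', 'Madonna')]
```

Other regex flavours spell the flags differently (for example /.../mi in Perl or JavaScript), but the effect on $ is the same idea.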

ridgerunner answered Sep 29 '22 16:09