
Regular Expression to match unlimited number of options

I want to be able to parse file paths like this one:

 /var/www/index.(htm|html|php|shtml)

into an ordered array:

 array("htm", "html", "php", "shtml")

and then produce a list of alternatives:

/var/www/index.htm
/var/www/index.html
/var/www/index.php
/var/www/index.shtml

Right now, I have a preg_match statement that can split two alternatives:

 preg_match_all ("/\(([^)]*)\|([^)]*)\)/", $path_resource, $matches);

Could somebody give me a pointer how to extend this to accept an unlimited number of alternatives (at least two)? Just regarding the regular expression, the rest I can deal with.

The rule is:

  • The list needs to start with a ( and close with a )

  • There must be at least one | in the list (i.e. at least two alternatives)

  • Any other occurrence(s) of ( or ) are to remain untouched.

Update: I need to be able to also deal with multiple bracket pairs such as:

 /var/(www|www2)/index.(htm|html|php|shtml)

Sorry I didn't say that straight away.

Update 2: If you're looking to do what I'm trying to do in the filesystem, then note that glob() already provides this functionality out of the box. There is no need to implement a custom solution. See @Gordon's answer below for details.
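For readers unfamiliar with that glob() feature, here is a minimal sketch of what it might look like. Note the assumptions: glob() uses brace syntax ({a,b}) rather than the (a|b) syntax from the question, the GLOB_BRACE flag is not available on every platform, and the function name find_index_files is hypothetical. glob() only returns paths that actually exist on disk.

```php
<?php
// Hypothetical helper: lists existing index files in $dir using brace expansion.
// Assumption: GLOB_BRACE is defined on this platform (it is missing on some systems).
function find_index_files(string $dir): array
{
    if (!defined('GLOB_BRACE')) {
        return []; // brace expansion unavailable here
    }
    // {htm,html,php,shtml} is glob's counterpart to (htm|html|php|shtml).
    $found = glob($dir . '/index.{htm,html,php,shtml}', GLOB_BRACE);
    return $found === false ? [] : $found;
}
```

If the paths do not need to exist on disk, glob() is not a fit and the pattern has to be expanded by hand.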

Pekka asked Mar 28 '10 20:03




2 Answers

I think you're looking for:

/\(([^|]+)(\|([^|]+))+\)/

Basically, put the splitter '|' into a repeating pattern.

Also, your words should be made up of 'not pipes' instead of 'not parens', per your third requirement.

Also, prefer + to * for this problem. + means 'at least one'. * means 'zero or more'.
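A minimal sketch of how the repeating-pipe idea might be applied in PHP follows. Two things here are assumptions rather than part of the answer: the character class is narrowed to [^()|] so that a match cannot run across a second bracket pair, and the function name find_alternative_groups is hypothetical. Because a repeated capture group only keeps its last repetition, the sketch captures the whole inside of the brackets and splits it afterwards.

```php
<?php
// Hypothetical helper: returns one array of alternatives per "(a|b|...)" group.
function find_alternative_groups(string $path): array
{
    // \( ... \)            literal brackets
    // [^()|]+              one alternative: no brackets, no pipes (assumed restriction)
    // (?:\|[^()|]+)+       at least one more alternative, each preceded by a pipe
    preg_match_all('/\(([^()|]+(?:\|[^()|]+)+)\)/', $path, $matches);

    $groups = [];
    foreach ($matches[1] as $inner) {
        // $inner is e.g. "htm|html|php|shtml"
        $groups[] = explode('|', $inner);
    }
    return $groups;
}

print_r(find_alternative_groups('/var/(www|www2)/index.(htm|html|php|shtml)'));
```

A bracket pair without a pipe, such as (www), does not match and is left alone, which is what the third rule in the question asks for.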

CWF answered Nov 14 '22 21:11


Not exactly what you are asking, but what's wrong with just using what you have to capture the whole list (ignoring the |s), putting it into a variable and then calling explode() on the |s? That would give you an array of however many items there were (including 1 if there wasn't a | present).
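The capture-then-explode approach could be sketched roughly as below, extended to build the full list of paths from the question. The function name expand_alternatives is hypothetical, and the sketch assumes that only bracket pairs containing at least one | should be expanded, so it differs from the suggestion above in leaving single-item brackets untouched.

```php
<?php
// Hypothetical helper: expands every "(a|b|...)" group into a list of full paths.
function expand_alternatives(string $path): array
{
    // Find the first bracket pair that contains at least one pipe.
    $pattern = '/\(([^()|]+(?:\|[^()|]+)+)\)/';
    if (!preg_match($pattern, $path, $m, PREG_OFFSET_CAPTURE)) {
        return [$path]; // nothing (left) to expand
    }

    $start  = $m[0][1];          // byte offset of "("
    $length = strlen($m[0][0]);  // length of the whole "(...)" group
    $before = substr($path, 0, $start);
    $after  = substr($path, $start + $length);

    $results = [];
    foreach (explode('|', $m[1][0]) as $option) {
        // Substitute one option, then expand any remaining groups recursively.
        foreach (expand_alternatives($before . $option . $after) as $expanded) {
            $results[] = $expanded;
        }
    }
    return $results;
}

print_r(expand_alternatives('/var/(www|www2)/index.(htm|html|php|shtml)'));
```

The recursion handles multiple bracket pairs by expanding one group at a time, so the results come out ordered by the first group, then the second.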

Blair McMillan answered Nov 14 '22 21:11