Conditional Regexp: return only one group

Tags: regex

I want to match two types of URLs:

(1) www.test.de/type1/12345/this-is-a-title.html
(2) www.test.de/category/another-title-oh-yes.html

In the first type, I want to match "12345". In the second type I want to match "category/another-title-oh-yes".

Here is what I came up with:

(?:(?:\.de\/type1\/([\d]*)\/)|\.de\/([\S]+)\.html)

This returns the following:

For type (1):

Match group 1: 12345
Match group 2: 

For type (2):

Match group 1: 
Match group 2: category/another-title-oh-yes

As you can see, it is working pretty well already. For various reasons I need the regex to return only one match-group, though. Is there a way to achieve that?
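
To make the problem concrete, here is a minimal JavaScript sketch that runs the pattern from the question against both sample URLs. The variable names are illustrative only, and the choice of JavaScript is an assumption since the question does not name a language. In JavaScript, a group that did not take part in the match is reported as undefined rather than as an empty string.

```javascript
// The pattern from the question, unchanged.
const questionPattern = /(?:(?:\.de\/type1\/([\d]*)\/)|\.de\/([\S]+)\.html)/;

const questionUrls = [
  "www.test.de/type1/12345/this-is-a-title.html",
  "www.test.de/category/another-title-oh-yes.html",
];

// Collect [group 1, group 2] for every URL.
const questionGroups = questionUrls.map((url) => {
  const m = url.match(questionPattern);
  return [m[1], m[2]];
});

// Each URL fills a different group, which is what the question wants to avoid.
console.log(questionGroups);
```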

Sven S asked Jul 07 '14 15:07


2 Answers

Java/PHP/Python

Capture the wanted value for both URL types in group 1 by combining a positive lookbehind with a negative lookahead.

((?<=\.de\/type1\/)\d+|(?<=\.de\/)(?!type1)[^\.]+)

The pattern consists of two alternatives joined by |.

The first alternative matches 12345.

The second alternative matches category/another-title-oh-yes.


Note:

  • Each alternative must match exactly one substring in each URL.
  • Wrap the whole alternation in one pair of parentheses (...|...) and drop the inner parentheses around [^\.]+ and \d+, where:

    [^\.]+   matches one or more characters up to the first dot
    \d+      matches one or more digits
    

Here is an online demo on regex101.


Input:

www.test.de/type1/12345/this-is-a-title.html
www.test.de/category/another-title-oh-yes.html

Output:

MATCH 1
1.  [18-23] `12345`
MATCH 2
1.  [57-86] `category/another-title-oh-yes`

JavaScript

Try this one. The ID of a type (1) URL is captured in group 2 and the path of a type (2) URL in group 3, so read whichever of the two groups is set.

((?:\.de\/type1\/)(\d+)|(?:\.de\/)(?!type1)([^\.]+))

Here is an online demo on regex101.

Input:

www.test.de/type1/12345/this-is-a-title.html
www.test.de/category/another-title-oh-yes.html

Output:

MATCH 1
1.  `.de/type1/12345`
2.  `12345`
MATCH 2
1.  `.de/category/another-title-oh-yes`
3.  `category/another-title-oh-yes`
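
A minimal sketch of how the JavaScript pattern might be wrapped so that the caller receives a single value. The helper name extractKey is hypothetical. Groups are numbered by their opening parenthesis, so (\d+) is group 2 and ([^\.]+) is group 3, and only one of them takes part in any given match.

```javascript
// Pattern from the JavaScript answer above.
const jsPattern = /((?:\.de\/type1\/)(\d+)|(?:\.de\/)(?!type1)([^\.]+))/;

// Hypothetical helper: returns the ID for type (1) URLs,
// the path for type (2) URLs, or null when nothing matches.
function extractKey(url) {
  const m = jsPattern.exec(url);
  if (m === null) {
    return null;
  }
  // The group that did not participate is undefined.
  return m[2] !== undefined ? m[2] : m[3];
}

console.log(extractKey("www.test.de/type1/12345/this-is-a-title.html"));
console.log(extractKey("www.test.de/category/another-title-oh-yes.html"));
```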

Braj answered Oct 03 '22 15:10


Maybe this:

^www\.test\.de\/(type1\/(\d+)\/.*|(.*))\.html$

Then for example:

// Group 2 holds the ID of a type1 URL, group 3 the path of any other URL.
var str = "www.test.de/type1/12345/this-is-a-title.html";
var regex = /^www\.test\.de\/(type1\/(\d+)\/.*|(.*))\.html$/;
console.log(str.match(regex));

This outputs an array: the first element is the whole matched string, the second is everything between the website address and .html, the third is the ID if the URL is of type1, and the fourth is the path otherwise. The group that did not take part in the match is undefined.

You can then do something like var matches = str.match(regex); return matches[2] || matches[3];

Mosho answered Oct 03 '22 13:10