How to avoid capturing groups if the captured match is empty?

I would like to prepend the word "custom" to hostnames whose parts may be separated by one of several separators.

Examples:

news.google.com   -> custom.news.google.com
news/google/com   -> custom/news/google/com

dev.maps.yahoo.fr -> custom.dev.maps.yahoo.fr
dev/maps/yahoo/fr -> custom/dev/maps/yahoo/fr

These strings appear inside a document with more content, so I am trying to solve this problem using regular expressions and JavaScript's string replace function.

The list of hostnames and separators is predefined and known in advance. For the sake of this example, I only show two hostnames (news.google.com and dev.maps.yahoo.fr) and two separators (. and /), but there are more.

The separator within a single string will always be the same, i.e. there won't be cases like dev/maps.yahoo/fr.

I want to be consistent and use the correct separator when prepending "custom".

I built this long regular expression:

const myRegex = /news\.google\.com|news\/google\/com|dev\.maps\.yahoo\.fr|dev\/maps\/yahoo\/fr/

(For readability purposes, this is the expression:

/news\.google\.com/ OR /news\/google\/com/ OR /dev\.maps\.yahoo\.fr/ OR /dev\/maps\/yahoo\/fr/ )

(Note: the list of hostnames is predefined and well known in advance, which is why I am 'hardcoding' the hostnames instead of using tokens such as \w+ or \S+. For example, I might want to replace news.google.com but leave news2.google.com intact.)
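Because the real lists are longer than the two hostnames and two separators shown here, the alternation does not have to be typed by hand. The following is a minimal sketch that generates an equivalent of the first myRegex from two arrays. HOSTNAMES, SEPARATORS, escapeRegExp and buildHostRegex are hypothetical names, and the hostnames are assumed to be listed in their dotted form. Like the hand-written version, the generated regex has no capture groups and no \b word boundaries.

```javascript
// Assumed inputs: only the two hostnames and two separators from the question.
const HOSTNAMES = ['news.google.com', 'dev.maps.yahoo.fr'];
const SEPARATORS = ['.', '/'];

// Escape characters that are special in a RegExp pattern (a common helper).
const escapeRegExp = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

// One alternative per hostname/separator pair, e.g. news\.google\.com and
// news/google/com, so a string that mixes separators never matches.
function buildHostRegex(hostnames, separators) {
  const alternatives = hostnames.flatMap((host) => {
    const labels = host.split('.').map(escapeRegExp);
    return separators.map((sep) => labels.join(escapeRegExp(sep)));
  });
  return new RegExp(alternatives.join('|'), 'g');
}

const myRegex = buildHostRegex(HOSTNAMES, SEPARATORS);

// Logs the two matches: news/google/com and dev.maps.yahoo.fr
console.log('I went from news/google/com to dev.maps.yahoo.fr'.match(myRegex));
```

Adding a hostname or a separator then only means adding an entry to one of the arrays.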

However, I am not sure how to capture the separator (whether ., /, or any other separator). I tried using capture groups like this:

const myRegex = /news(\.)google\.com|news(\/)google\/com|dev(\.)maps\.yahoo\.fr|dev(\/)maps\/yahoo\/fr/

However, this creates four capture groups for what is only one separator (and this is just a simple example). In any given match, three of the groups will be empty (undefined) and one will contain the separator. How can I know which capture group it is?

Ideally, I would like something like this:

const myString = 'I navigated to news.google.com'; // For example
const myCustomString = myString.replace(
  myRegex,
  // Placeholder: SEPARATOR_WRONG should be whichever group captured the separator
  (match, SEPARATOR_WRONG) => `custom${SEPARATOR_WRONG}${match}`,
);

console.log(myCustomString); 
// will log 'I navigated to custom.news.google.com'

Is there a way to skip captured groups if they are empty?

asked Jul 26 '20 22:07 by Hector Ricardo



1 Answer

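Capture the separator once with the group (\.|\/) and use the backreference \1 to refer to it afterwards. That way the separator doesn't have to be written out over and over, and every later separator must be the same as the first one.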

const text = `I navigated to news.google.com
I navigated to news/google/com
I navigated to dev.maps.yahoo.fr
I navigated to dev/maps/yahoo/fr`;

const re = /\w+(\.|\/)(\w+\1)?(google|yahoo)\1\w+/g;
console.log(text.replace(re, (url, separator) => `custom${separator}${url}`));
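To see the backreference at work, this sketch runs the regex above, unchanged, on a few sample strings chosen for illustration. The last one mixes separators (a case the question rules out) and is left untouched, because \1 must repeat the separator that was captured first. addCustom is a hypothetical helper name.

```javascript
// The regex from above: \1 must repeat whatever (\.|\/) captured.
const re = /\w+(\.|\/)(\w+\1)?(google|yahoo)\1\w+/g;
const addCustom = (s) => s.replace(re, (url, separator) => `custom${separator}${url}`);

// Illustrative samples; the last one mixes "/" and ".".
for (const s of ['news.google.com', 'news/google/com', 'dev/maps.yahoo/fr']) {
  console.log(`${s} -> ${addCustom(s)}`);
}
// news.google.com -> custom.news.google.com
// news/google/com -> custom/news/google/com
// dev/maps.yahoo/fr -> dev/maps.yahoo/fr
```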

Here's an alternate solution given the new requirement described in the comments (matching only the predefined hostnames rather than any \w+ label):

const text = `I navigated to news.google.com
I navigated to news/google/com
I navigated to dev.maps.yahoo.fr
I navigated to dev/maps/yahoo/fr`;

const re = /(news|dev)(\.|\/)(google|maps)\2(com|yahoo)(\2fr)?/g;

console.log(text.replace(re, (url, prefix, separator) => `custom${separator}${url}`));
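The regex above assembles hostnames from independent pieces, so it would also rewrite combinations that are not on the list, such as news.maps.com. The following is a sketch, not part of the original answer, that keeps the whole hardcoded hostnames but still has a single capture group: a lookahead captures the separator that follows the first label, and each full hostname reuses it through \1. The extra sample line news.maps.com is added for illustration, and the pattern assumes the first label of every hostname consists only of \w characters.

```javascript
const text = `I navigated to news.google.com
I navigated to news/google/com
I navigated to dev.maps.yahoo.fr
I navigated to dev/maps/yahoo/fr
I navigated to news.maps.com`;

// Group 1 is set inside the lookahead (the separator after the first label).
// Every listed hostname then reuses it via \1, so there is exactly one
// capture group however many hostnames the alternation contains.
const re = /\b(?=\w+([.\/]))(?:news\1google\1com|dev\1maps\1yahoo\1fr)\b/g;

const result = text.replace(re, (url, separator) => `custom${separator}${url}`);
console.log(result);
// The first four lines get the prefix; "news.maps.com" is left unchanged.
```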

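Yet another alternative, which keeps the four-group regex from the question and reads the separator out of the matched text instead: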

const text = `I navigated to news.google.com
I navigated to news/google/com
I navigated to dev.maps.yahoo.fr
I navigated to dev/maps/yahoo/fr`;

const re = /news(\.)google\.com|news(\/)google\/com|dev(\.)maps\.yahoo\.fr|dev(\/)maps\/yahoo\/fr/g;

console.log(text.replace(re, url => 'custom' + url.match(/\.|\//)[0] + url));
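The question's literal ask, skipping the empty groups, can also be answered with the same four-group regex. String.prototype.replace passes groups that did not take part in the match as undefined, followed by the match offset and the whole input string (a regex with named groups would add one more trailing argument, which this one does not have). The sketch below drops the last two arguments and takes the first group that is not undefined.

```javascript
const text = `I navigated to news.google.com
I navigated to news/google/com
I navigated to dev.maps.yahoo.fr
I navigated to dev/maps/yahoo/fr`;

// The question's regex: four groups, exactly one of which participates per match.
const re = /news(\.)google\.com|news(\/)google\/com|dev(\.)maps\.yahoo\.fr|dev(\/)maps\/yahoo\/fr/g;

const result = text.replace(re, (url, ...rest) => {
  // rest is [group1, group2, group3, group4, offset, wholeString];
  // non-participating groups are undefined, so skip them.
  const separator = rest.slice(0, -2).find((group) => group !== undefined);
  return `custom${separator}${url}`;
});
console.log(result);
```

This works for any number of alternatives, as long as each one contains exactly one capture group.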
answered Oct 05 '22 23:10 by GirkovArpa