I have a Regex that is able to detect URLs (Disclosure: I copied this Regex from the internet).
My goal is to split a string into an array of substrings, where each substring is either a complete URL or the plain text between URLs.
For example:
const detectUrls = // some magical Regex
const input = 'Here is a URL: https://google.com <- That was the URL to Google.';
console.log(input.split(detectUrls)); // This should output ['Here is a URL: ', 'https://google.com', ' <- That was the URL to Google.']
My current Regex solution is as follows: /(([a-z]+:\/\/)?(([a-z0-9\-]+\.)+([a-z]{2}|aero|arpa|biz|com|coop|edu|gov|info|int|jobs|mil|museum|name|nato|net|org|pro|travel|local|internal))(:[0-9]{1,5})?(\/[a-z0-9_\-.~]+)*(\/([a-z0-9_\-.]*)(\?[a-z0-9+_\-.%=&]*)?)?(#[a-zA-Z0-9!$&'()*+.=-_~:@/?]*)?)(\s+|$)/gi;
However, when I run the example code with my regex, I get a useless answer:
[ 'Here is a URL: ',
'https://google.com',
'https://',
'google.com',
'google.',
'com',
undefined,
undefined,
undefined,
undefined,
undefined,
undefined,
' ',
'<- That was the URL to Google.',
]
Would anyone be able to point me in the right direction? Thanks in advance.
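You are getting the extra entries because String.prototype.split() adds the text of every capturing group (the parts inside parentheses) to the resulting array, and inserts undefined for each group that did not take part in the match.
For the result you want, use non-capturing groups, (?:myRegex), for everything except the one outer group that wraps the whole URL.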
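A minimal sketch of that behaviour, using a deliberately simple digit pattern rather than the URL regex (the variable names are only for illustration):

```javascript
// A capturing group makes split() include the matched separator in the result.
const withCapture = 'a1b2c'.split(/(\d)/);

// A non-capturing group splits at the same places but drops the separator.
const withoutCapture = 'a1b2c'.split(/(?:\d)/);

// An optional capturing group that takes no part in a match shows up as undefined.
const withOptional = 'a1b'.split(/(\d)(x)?/);

console.log(withCapture);    // [ 'a', '1', 'b', '2', 'c' ]
console.log(withoutCapture); // [ 'a', 'b', 'c' ]
console.log(withOptional);   // [ 'a', '1', undefined, 'b' ]
```

This is why the original regex, with twelve capturing groups, adds twelve extra entries per URL.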
I modified your regex so that it should work:
/((?:[a-z]+:\/\/)?(?:(?:[a-z0-9\-]+\.)+(?:[a-z]{2}|aero|arpa|biz|com|coop|edu|gov|info|int|jobs|mil|museum|name|nato|net|org|pro|travel|local|internal))(?::[0-9]{1,5})?(?:\/[a-z0-9_\-.~]+)*(?:\/(?:[a-z0-9_\-.]*)(?:\?[a-z0-9+_\-.%=&]*)?)?(?:#[a-zA-Z0-9!$&'()*+.=-_~:@/?]*)?)(?:\s+|$)/gi
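One thing to watch for: the trailing (?:\s+|$) still consumes the whitespace after the URL, so the substring that follows loses its leading space. The sketch below is a variation, not the regex from this answer. It assumes a lookahead, (?=\s|$), is acceptable in place of the consuming group, and it wraps the call in a hypothetical helper named splitOnUrls.

```javascript
// Same pattern as above, under two assumptions: only the outer group captures,
// and the trailing whitespace is checked with a lookahead instead of being consumed.
const detectUrls = /((?:[a-z]+:\/\/)?(?:(?:[a-z0-9\-]+\.)+(?:[a-z]{2}|aero|arpa|biz|com|coop|edu|gov|info|int|jobs|mil|museum|name|nato|net|org|pro|travel|local|internal))(?::[0-9]{1,5})?(?:\/[a-z0-9_\-.~]+)*(?:\/(?:[a-z0-9_\-.]*)(?:\?[a-z0-9+_\-.%=&]*)?)?(?:#[a-zA-Z0-9!$&'()*+.=-_~:@/?]*)?)(?=\s|$)/i;

// Hypothetical helper: splits text into URL and non-URL substrings.
function splitOnUrls(text) {
  // Drop the empty strings that appear when the text starts or ends with a URL.
  return text.split(detectUrls).filter((part) => part !== '');
}

const input = 'Here is a URL: https://google.com <- That was the URL to Google.';
console.log(splitOnUrls(input));
// [ 'Here is a URL: ', 'https://google.com', ' <- That was the URL to Google.' ]
```

Because the lookahead matches without consuming anything, the space after the URL stays with the text that follows it.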
Tip: use an online website like https://regex101.com/ to test your regular expressions.
The answers to this related question also helped:
Use of capture groups in String.split()