I have a string which is the result of a join operation on an array of string tokens using | as separator:
['ab', 'abc', 'abcd'].join('|') => 'ab|abc|abcd'
I need a regex that matches an entire token, regardless of it being at the beginning/end of the string, or between separators.
I came up with a simple regex: /(^\|)?<token>(\|$)?/, but the problem is it also matches substrings of the tokens, e.g.
'ab|abc|abcd'.match(/(^\|)?a(\|$)?/g) => [ "a", "a", "a" ]
'ab|abc|abcd'.match(/(^\|)?c(\|$)?/g) => [ "c", "c" ]
while tokens a and c should not match at all.
I can't wrap my head around regexps... thanks in advance for any suggestions!
Whenever you need to restrict the context of your matches, you should never use optional constructs for that purpose. (^\|)? matches a | at the start of the string, or nothing. (\|$)? matches a | at the end of the string, or nothing (an empty string). Since both groups are allowed to match nothing, they restrict nothing: /(^\|)?a(\|$)?/g will match |a| in the string |a|, a| in xxxa|, |a in |axxx, and a in xxxaxxx.
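To see this concretely, here is a small sketch that runs the original pattern against the four sample strings mentioned above. The variable names are only for illustration.

```javascript
// The original pattern: both groups are optional, so neither restricts the context.
const loose = /(^\|)?a(\|$)?/g;

// Sample strings taken from the explanation above.
const samples = ['|a|', 'xxxa|', '|axxx', 'xxxaxxx'];

for (const s of samples) {
  // Every string yields a match, because a bare "a" already satisfies the pattern.
  console.log(s, '=>', s.match(loose));
}
// |a|     => [ '|a|' ]
// xxxa|   => [ 'a|' ]
// |axxx   => [ '|a' ]
// xxxaxxx => [ 'a' ]
```

The optional groups only decide whether a neighbouring | is included in the match; they never cause a match to be rejected.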
What you want to do is to match your "token" (an alternation pattern) when it appears in between pipe symbols or start/end positions.
The best way for this exact scenario is to use negative lookarounds with a negated character class inside them:
/(?<![^|])a(?![^|])/g
It means:
(?<![^|]) - (a negative lookbehind) immediately on the left, there must be no character other than | (so, either the start of the string or a | character).
a - matches an a character (note: if you need to match more than one token, you should consider grouping them with a non-capturing group, i.e. (?:abc|xyz|defn|.....)).
(?![^|]) - (a negative lookahead) immediately on the right, there must be no character other than | (so, either the end of the string or a | character).
Here is a JavaScript demo:
const tokens = ['ab', 'abc', 'abcd']
const reg = new RegExp(`(?<![^|])(?:${tokens.join('|')})(?![^|])`, 'g')
console.log('ab|xyzab|abc|abcdy|abcd'.match(reg))
// => ['ab', 'abc', 'abcd']
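If the tokens may contain regex metacharacters (., +, ( and so on), interpolating them directly into the pattern would change its meaning. A minimal sketch of one way to handle that, assuming the tokens should be matched literally; escapeRegExp and buildTokenRegex are hypothetical helper names, not part of any library:

```javascript
// Hypothetical helper: escape regex metacharacters so a token is matched literally.
const escapeRegExp = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

// Hypothetical helper: build a whole-token matcher for a pipe-joined string.
function buildTokenRegex(tokens) {
  const alternation = tokens.map(escapeRegExp).join('|');
  return new RegExp(`(?<![^|])(?:${alternation})(?![^|])`, 'g');
}

const joined = ['a.b', 'a+b', 'axb'].join('|'); // 'a.b|a+b|axb'

// With escaping, the dot is literal, so only the token 'a.b' matches.
console.log(joined.match(buildTokenRegex(['a.b'])));
// => [ 'a.b' ]
```

Without the escaping step, the unescaped . in a.b would match any character, so a+b and axb would be reported as well.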
Mind that (?<![^|]) is basically equivalent to (?<=^|\|) and (?![^|]) to (?=$|\|), but a bit more efficient because it avoids the alternation, which involves backtracking. Also mind that you need to add \n to the negated character classes (i.e. use [^|\n]) if you plan to test the regex against individual lines in a multiline text (as is the case in online regex testers).
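Both points can be checked with a short sketch. The sample strings below are made up for illustration, and the multiline layout (one pipe-joined list per line) is an assumption:

```javascript
// Negated-class form from the answer, and the positive-lookaround equivalent.
const negated = /(?<![^|])abc(?![^|])/g;
const positive = /(?<=^|\|)abc(?=$|\|)/g;

const line = 'ab|abc|abcd|abc';
console.log(line.match(negated));  // => [ 'abc', 'abc' ]
console.log(line.match(positive)); // => [ 'abc', 'abc' ]

// Multiline input: each line is its own pipe-joined list (assumed layout).
const text = 'ab|abc\nabc|abcd';

// Without \n in the class, a newline counts as "a character other than |",
// so tokens that touch a line break are rejected.
console.log(text.match(negated));  // => null

// With \n added, line breaks behave like token boundaries.
const perLine = /(?<![^|\n])abc(?![^|\n])/g;
console.log(text.match(perLine));  // => [ 'abc', 'abc' ]
```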