I'm trying to match words that consist only of characters in this character class: [A-z'\\/%], excluding cases where they appear between < and >, between [ and ], or between { and }.
So, say I've got this funny string:
[beginning]<start>How's {the} /weather (\\today%?)[end]
I need to match the following strings:
[ "How's", "/weather", "\\today%" ]
I've tried using this pattern:
/[A-z'/\\%]*(?![^{]*})(?![^\[]*\])(?![^<]*>)/gm
But for some reason, it matches:
[ "[beginning]", "", "How's", "", "", "", "/weather", "", "", "\\today%", "", "", "[end]", "" ]
I'm not sure why my pattern allows stuff between [ and ], since I used (?![^\[]*\]), and a similar approach seems to work for not matching {these cases} and <these cases>. I'm also not sure why it matches all the empty strings.
Any wisdom? :)
There are essentially two problems with your pattern:
1. Never use A-z in a character class if you intend to match only letters, because it will match more than just letters [1]. Instead, use a-zA-Z (or A-Za-z).
2. Using the * quantifier after the character class will allow empty matches. Use the + quantifier instead.
So, the fixed pattern should be:
[A-Za-z'/\\%]+(?![^{]*})(?![^\[]*\])(?![^<]*>)
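To see the difference in practice, here is a small sketch that runs both the original and the fixed pattern against the string from the question. The variable names are only illustrative, and it assumes the sample text contains a single backslash before "today" (written as "\\" in a JavaScript string literal).

```javascript
// Sample string from the question; "\\" in a JS string literal is one backslash.
const input = "[beginning]<start>How's {the} /weather (\\today%?)[end]";

// Original pattern: A-z range plus the * quantifier.
const original = /[A-z'/\\%]*(?![^{]*})(?![^\[]*\])(?![^<]*>)/g;

// Fixed pattern: explicit letter ranges plus the + quantifier.
const fixed = /[A-Za-z'/\\%]+(?![^{]*})(?![^\[]*\])(?![^<]*>)/g;

const originalMatches = input.match(original);
const fixedMatches = input.match(fixed);

// The original yields empty strings and bracketed text such as "[beginning]".
console.log(originalMatches);

// The fixed pattern yields only the three wanted words.
console.log(fixedMatches);
```

The lookaheads are unchanged between the two patterns; only the character class and the quantifier differ.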
[1] The [A-z] character class means "match any character with an ASCII code between 65 and 122". The problem is that codes 91 through 96 ([, \, ], ^, _ and `) are not letters, and that's why the original pattern matches characters like '[' and ']'.
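A quick way to check this yourself is to loop over the code range and collect whatever [A-z] accepts that a-zA-Z does not. This is just an illustrative sketch; the name `extras` is made up for the example.

```javascript
// Collect the characters in the 65..122 range that [A-z] matches
// but that are not ASCII letters.
const extras = [];
for (let code = 65; code <= 122; code++) {
  const ch = String.fromCharCode(code);
  if (/[A-z]/.test(ch) && !/[A-Za-z]/.test(ch)) {
    extras.push(ch);
  }
}

// Logs the six non-letter characters with codes 91 through 96.
console.log(extras);
```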
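Split it with a regular expression:
// "\\" in a JS string literal is a single backslash in the actual text.
let data = "[beginning]<start>How's {the} /weather (\\today%?)[end]";
// Split on <...>, [...], {...} and on lone parentheses, swallowing surrounding whitespace.
let matches = data.split(/\s*(?:<[^>]+>|\[[^\]]+\]|\{[^\}]+\}|[()])\s*/);
// Drop the empty strings left between adjacent delimiters.
// Note: "?" is not treated as a delimiter, so the last item keeps its trailing "?".
console.log(matches.filter(v => "" !== v));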
You can match all the cases that you don't want using an alternation and place the character class in a capturing group to capture what you want to keep.
The [^ syntax opens a negated character class, which matches any character except those specified.
(?:\[[^\][]*]|<[^<>]*>|{[^{}]*})|([A-Za-z'/\\%]+)
Explanation
(?: - Non-capturing group
  \[[^\][]*] - Match from an opening [ to the closing ]
  | - Or
  <[^<>]*> - Match from an opening < to the closing >
  | - Or
  {[^{}]*} - Match from an opening { to the closing }
) - Close the non-capturing group
| - Or
([A-Za-z'/\\%]+) - Repeat the character class 1+ times to prevent empty matches, and capture in group 1
const regex = /(?:\[[^\][]*]|<[^<>]*>|{[^{}]*})|([A-Za-z'/\\%]+)/g;
const str = `[beginning]<start>How's {the} /weather (\\\\today%?)[end]`;
let m;
while ((m = regex.exec(str)) !== null) {
  // Group 1 is only set when the character class alternative matched.
  if (m[1] !== undefined) console.log(m[1]);
}
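If you would rather get the words back as an array, the same pattern can be wrapped in a small helper built on String.prototype.matchAll. This is only a sketch: the function name `extractWords` is hypothetical, and the sample input assumes a single backslash before "today".

```javascript
// Hypothetical helper: returns the words captured in group 1,
// skipping anything enclosed in [], <> or {}.
function extractWords(text) {
  const pattern = /(?:\[[^\][]*]|<[^<>]*>|{[^{}]*})|([A-Za-z'/\\%]+)/g;
  return Array.from(text.matchAll(pattern), (m) => m[1])
    // Matches of the bracketed alternatives leave group 1 undefined.
    .filter((word) => word !== undefined);
}

const words = extractWords("[beginning]<start>How's {the} /weather (\\today%?)[end]");
console.log(words);
```

Because the bracketed alternatives consume their whole span before the character class gets a chance to match, this version does not rely on lookaheads scanning the rest of the string.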