
Match words that consist of specific characters, excluding between special brackets

I'm trying to match words that consist only of characters in this character class: [A-z'\\/%], excluding cases where:

  • they are between < and >
  • they are between [ and ]
  • they are between { and }

So, say I've got this funny string:

[beginning]<start>How's {the} /weather (\\today%?)[end]

I need to match the following strings:

[ "How's", "/weather", "\\today%" ]

I've tried using this pattern:

/[A-z'/\\%]*(?![^{]*})(?![^\[]*\])(?![^<]*>)/gm

But for some reason, it matches:

[ "[beginning]", "", "How's", "", "", "", "/weather", "", "", "\\today%", "", "", "[end]", "" ]

I'm not sure why my pattern allows stuff between [ and ], since I used (?![^\[]*\]), and a similar approach seems to work for not matching {these cases} and <these cases>. I'm also not sure why it matches all the empty strings.

Any wisdom? :)

pitamer asked Sep 16 '20 04:09


3 Answers

There are essentially two problems with your pattern:

  1. Never use A-z in a character class if you intend to match only letters, because it matches more than just letters [1]. Instead, use a-zA-Z (or A-Za-z).

  2. Using the * quantifier after the character class will allow empty matches. Use the + quantifier instead.
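
The difference between the two quantifiers can be seen on a small input. This is a minimal sketch with a made-up string and a simpler character class than the one in the question; the names `text`, `starMatches` and `plusMatches` are illustrative only.

```javascript
// Hypothetical sample text, not taken from the question.
const text = "ab 12 cd";

// With *, the class may match zero characters, so every position that
// cannot start a word still produces an empty match.
const starMatches = text.match(/[a-z]*/g);
console.log(starMatches); // contains "ab" and "cd" plus several empty strings

// With +, at least one character is required, so no empty matches appear.
const plusMatches = text.match(/[a-z]+/g);
console.log(plusMatches); // [ 'ab', 'cd' ]
```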

So, the fixed pattern should be:

[A-Za-z'/\\%]+(?![^{]*})(?![^\[]*\])(?![^<]*>)
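
As a quick check, the pattern above can be run against the string from the question. This is a sketch that assumes the sample contains a single backslash before "today" (written as `\\` in a JavaScript string literal); the variable names are illustrative.

```javascript
// Sample string from the question; "\\" in a string literal is one backslash.
const sample = "[beginning]<start>How's {the} /weather (\\today%?)[end]";

// Same three negative lookaheads as before, with A-Za-z and the + quantifier.
const fixedPattern = /[A-Za-z'/\\%]+(?![^{]*})(?![^\[]*\])(?![^<]*>)/g;

const fixedMatches = sample.match(fixedPattern);
console.log(fixedMatches); // [ "How's", '/weather', '\\today%' ]
```

Note that the lookaheads only check whether a closing bracket appears before the next opening one, so this approach assumes the brackets in the input are balanced and not nested.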



[1] The [A-z] character class means "match any character with an ASCII code between 65 and 122". The problem is that codes 91 through 96 ([, \, ], ^, _ and `) are not letters, which is why the original pattern matches characters like '[' and ']'.
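
The extra characters can be listed directly. The loop below is a small sketch (the name `extras` is illustrative) that collects every ASCII character accepted by [A-z] but rejected by [A-Za-z].

```javascript
// Collect the ASCII characters that /[A-z]/ accepts but /[A-Za-z]/ rejects.
const extras = [];
for (let code = 0; code < 128; code++) {
  const ch = String.fromCharCode(code);
  if (/[A-z]/.test(ch) && !/[A-Za-z]/.test(ch)) {
    extras.push(ch);
  }
}

console.log(extras.join(" ")); // [ \ ] ^ _ `
```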

41686d6564 stands w. Palestine answered Oct 19 '22 06:10


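Split the string on the bracketed parts with a regular expression, then filter out the empty strings. Note that this keeps everything outside the brackets and parentheses, so the last item is \today%? (including the trailing ?):

let data = "[beginning]<start>How's {the} /weather (\\today%?)[end]";
// Split on <...>, [...], {...} and on bare parentheses, swallowing surrounding whitespace.
let matches = data.split(/\s*(?:<[^>]+>|\[[^\]]+\]|\{[^\}]+\}|[()])\s*/);

// Drop the empty strings left between adjacent delimiters.
console.log(matches.filter(v => "" !== v));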
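
One way to also drop characters outside the allowed set, such as the trailing ?, is to run the character class over each piece after splitting. This is a sketch of a possible extension, not part of the original answer; `pieces` and `words` are illustrative names.

```javascript
const input = "[beginning]<start>How's {the} /weather (\\today%?)[end]";

// Same split as above: remove bracketed parts and bare parentheses.
const pieces = input.split(/\s*(?:<[^>]+>|\[[^\]]+\]|\{[^}]+\}|[()])\s*/);

// From each remaining piece, keep only runs of allowed characters.
// match() returns null when nothing matches, hence the ?? [] fallback.
const words = pieces.flatMap(piece => piece.match(/[A-Za-z'/\\%]+/g) ?? []);

console.log(words); // [ "How's", '/weather', '\\today%' ]
```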

Taufik Nurrohman answered Oct 19 '22 07:10


You can match all the cases that you don't want using an alternation and place the character class in a capturing group to capture what you want to keep.

[^...] is a negated character class: it matches any single character except those listed inside it.

(?:\[[^\][]*]|<[^<>]*>|{[^{}]*})|([A-Za-z'/\\%]+)

Explanation

  • (?: Non-capturing group
    • \[[^\][]*] Match from an opening [ to its closing ]
    • | Or
    • <[^<>]*> Match from an opening < to its closing >
    • | Or
    • {[^{}]*} Match from an opening { to its closing }
  • ) Close the non-capturing group
  • | Or
  • ([A-Za-z'/\\%]+) Repeat the character class 1+ times to prevent empty matches, and capture in group 1


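const regex = /(?:\[[^\][]*]|<[^<>]*>|{[^{}]*})|([A-Za-z'/\\%]+)/g;
// In a template literal, \\\\ produces two literal backslashes.
const str = `[beginning]<start>How's {the} /weather (\\\\today%?)[end]`;
let m;

// Bracketed parts match without a capture, so group 1 is undefined for them.
while ((m = regex.exec(str)) !== null) {
  if (m[1] !== undefined) console.log(m[1]);
}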
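
If an array is more convenient than logging each word, the same pattern can be used with matchAll. This is a sketch assuming an environment that supports String.prototype.matchAll; it uses a single backslash in the sample, and `pattern`, `subject` and `kept` are illustrative names.

```javascript
const pattern = /(?:\[[^\][]*]|<[^<>]*>|{[^{}]*})|([A-Za-z'/\\%]+)/g;
const subject = "[beginning]<start>How's {the} /weather (\\today%?)[end]";

// Keep only the matches where capture group 1 participated.
const kept = [...subject.matchAll(pattern)]
  .map(match => match[1])
  .filter(word => word !== undefined);

console.log(kept); // [ "How's", '/weather', '\\today%' ]
```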

The fourth bird answered Oct 19 '22 05:10