Match words that consist of specific characters, excluding between special brackets

I'm trying to match words that consist only of characters in this character class: [A-z'\\/%], excluding cases where:

- they are between < and >
- they are between [ and ]
- they are between { and }

So, say I've got this funny string:

[beginning]<start>How's {the} /weather (\\today%?)[end]

I need to match the following strings:

[ "How's", "/weather", "\\today%" ]

I've tried using this pattern:

/[A-z'/\\%]*(?![^{]*})(?![^\[]*\])(?![^<]*>)/gm

But for some reason, it matches:

[ "[beginning]", "", "How's", "", "", "", "/weather", "", "", "\\today%", "", "", "[end]", "" ]

I'm not sure why my pattern allows stuff between [ and ], since I used (?![^\[]*\]), and a similar approach seems to work for not matching {these cases} and <these cases>. I'm also not sure why it matches all the empty strings.

Any wisdom? :)

There are essentially two problems with your pattern:

1. Never use A-z in a character class if you intend to match only letters (because it will match more than just letters¹). Instead, use a-zA-Z (or A-Za-z).
2. Using the * quantifier after the character class will allow empty matches. Use the + quantifier instead.
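
Both problems are easy to reproduce in isolation. The following is a minimal sketch, not part of the original answer; the sample string "ab cd" and the variable names are made up for illustration.

```javascript
// Collect the characters that [A-z] accepts but [A-Za-z] does not.
const extras = [];
for (let code = 65; code <= 122; code++) {
  const ch = String.fromCharCode(code);
  if (/[A-z]/.test(ch) && !/[A-Za-z]/.test(ch)) extras.push(ch);
}
console.log(extras.join(" ")); // the non-letter characters hiding inside A-z

// With *, the regex also "succeeds" with a zero-length match wherever
// no allowed character is present; with + it needs at least one character.
const sample = "ab cd"; // hypothetical sample input
const withStar = sample.match(/[a-z]*/g);
const withPlus = sample.match(/[a-z]+/g);
console.log(withStar); // [ 'ab', '', 'cd', '' ]
console.log(withPlus); // [ 'ab', 'cd' ]
```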

So, the fixed pattern should be:

[A-Za-z'/\\%]+(?![^{]*})(?![^\[]*\])(?![^<]*>)
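
As a quick check, the fixed pattern can be run against the string from the question. This is only a sketch: it assumes the input contains two literal backslashes (hence String.raw), and the variable names are illustrative.

```javascript
// The fixed pattern from above, with the g flag so match() returns every hit.
const fixed = /[A-Za-z'/\\%]+(?![^{]*})(?![^\[]*\])(?![^<]*>)/g;

// String.raw keeps the backslashes exactly as typed (assumption: the real
// input has two literal backslashes before "today").
const input = String.raw`[beginning]<start>How's {the} /weather (\\today%?)[end]`;

const words = input.match(fixed);
console.log(words); // three matches: How's, /weather, and \\today%
```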


¹ The [A-z] character class means "match any character with an ASCII code between 65 and 122". The problem with that is that codes 91 through 96 ('[', '\', ']', '^', '_' and '`') are not letters, and that's why the original pattern matches characters like '[' and ']'.

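Split it with a regular expression that treats bracketed groups, and any character outside the allowed class, as separators:

let data = "[beginning]<start>How's {the} /weather (\\today%?)[end]";
// A separator is a run of <...>, [...], {...} groups or characters outside [A-Za-z'/\\%]
let matches = data.split(/(?:<[^>]+>|\[[^\]]+\]|\{[^\}]+\}|[^A-Za-z'/\\%])+/);

// Drop the empty strings left at the start and end of the input
console.log(matches.filter(v => "" !== v));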

You can match all the cases that you don't want using an alternation and place the character class in a capturing group to capture what you want to keep.

A character class that starts with [^ is negated: it matches any single character except the ones listed.

(?:\[[^\][]*]|<[^<>]*>|{[^{}]*})|([A-Za-z'/\\%]+)

Explanation

(?: Non capture group
- \[[^\][]*] Match from opening till closing []
- | Or
- <[^<>]*> Match from opening till closing <>
- | Or
- {[^{}]*} Match from opening till closing {}
) Close non capture group
| Or
([A-Za-z'/\\%]+) Repeat the character class 1+ times to prevent empty matches and capture in group 1


const regex = /(?:\[[^\][]*]|<[^<>]*>|{[^{}]*})|([A-Za-z'/\\%]+)/g;
const str = `[beginning]<start>How's {the} /weather (\\\\today%?)[end]`;
let m;

while ((m = regex.exec(str)) !== null) {
  if (m[1] !== undefined) console.log(m[1]);
}
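
One practical difference between this approach and the lookahead-based pattern shows up on unbalanced input. The sketch below assumes a stray closing brace can appear, which the question does not state; the helper name captureWords and the sample string are hypothetical.

```javascript
// Lookahead-based pattern (from the earlier answer) and the alternation pattern.
const lookaheadRe = /[A-Za-z'/\\%]+(?![^{]*})(?![^\[]*\])(?![^<]*>)/g;
const alternationRe = /(?:\[[^\][]*]|<[^<>]*>|{[^{}]*})|([A-Za-z'/\\%]+)/g;

// Hypothetical helper: keep only the matches where group 1 participated.
function captureWords(text) {
  const out = [];
  for (const m of text.matchAll(alternationRe)) {
    if (m[1] !== undefined) out.push(m[1]);
  }
  return out;
}

const stray = "foo } bar {baz}"; // made-up input with a stray closing brace
const viaLookahead = stray.match(lookaheadRe);
const viaCapture = captureWords(stray);

console.log(viaLookahead); // [ 'bar' ]        "foo" is rejected because a } follows it
console.log(viaCapture);   // [ 'foo', 'bar' ] only the real {baz} group is skipped
```

The lookaheads only check whether a closing bracket follows with no opening bracket of the same kind in between, so they cannot tell a real group from a stray closer. The alternation consumes complete groups first, which avoids that case.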

Tags: javascript, string, regex, match

pitamer

3 Answers: 41686d6564 stands w. Palestine, Taufik Nurrohman, The fourth bird
