 

Regular expression to remove anything but alphabets and ' (single quote)

How can I change this regular expression to remove everything from a string except alphabets and the ' (single quote)?

pattern = /\b(ma?c)?([a-z]+)/ig;
  1. This pattern removes unwanted spaces, capitalizes the first letter, and turns the rest into lower case.
  2. By alphabets I mean English letters a-z.
Sushan Ghimire asked Feb 21 '23 10:02


1 Answer

To remove characters, you need something that actually removes them, such as the string replace method, which accepts a regular expression as its first ("from") argument.
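A small illustration of why the regular expression (with its g flag) matters here: given a plain string as the "from" argument, replace only swaps the first occurrence, while a regular expression with the g (global) flag reaches every match. The sample string below is arbitrary:

```javascript
const input = "a-b-c";

// A plain string pattern: only the first "-" is replaced.
const firstOnly = input.replace("-", "");

// A regular expression with the g (global) flag: every "-" is replaced.
const everyMatch = input.replace(/-/g, "");

console.log(firstOnly);  // "ab-c"
console.log(everyMatch); // "abc"
```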

Then you're just dealing with a normal application of a character class, which in JavaScript (and most other regular expression flavors) is written as [...], where ... lists what belongs in the class. Putting ^ at the very beginning inverts the class, so it matches any character that is not listed.

In your case, it might be:

str = str.replace(/[^A-Za-z']/g, "");

...which replaces everything except the English characters A-Z (ABCDEFGHIJKLMNOPQRSTUVWXYZ), a-z (abcdefghijklmnopqrstuvwxyz), and the single quote with nothing (i.e., removes it).

let str = "This is a test with the numbers 123 and a '.";

console.log("before:", str);
str = str.replace(/[^A-Za-z']/g, "");
console.log("after: ", str);
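To see exactly which characters the ^ hands over to replace, one option is to run match with the class and with its negated form side by side. A minimal sketch (the sample string is arbitrary):

```javascript
const sample = "It's 9 o'clock!";

// Characters IN the class: English letters and the single quote.
const inClass = sample.match(/[A-Za-z']/g).join("");

// Characters NOT in the class: exactly what the replace above deletes.
const notInClass = sample.match(/[^A-Za-z']/g).join("");

console.log(inClass);    // "It'so'clock"
console.log(notInClass); // " 9 !"
```

The result of sample.replace(/[^A-Za-z']/g, "") is the same string as inClass.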

However, note that alphabetic characters not used in English will be removed too, and there are a lot of those in the various languages used on the web (and even, perversely, in English, in "borrowed" words like "voilà" and "naïve").
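A quick check of that limitation with the A-Z class: the accented letters are stripped along with the space. The sample uses \u escapes for "à" and "ï" only to sidestep source-file encoding issues; stripAscii is just an illustrative name:

```javascript
// The ASCII-only class: keeps A-Z, a-z and ' and removes everything else.
const stripAscii = (s) => s.replace(/[^A-Za-z']/g, "");

// "\u00E0" is "à" and "\u00EF" is "ï".
const borrowed = "voil\u00E0 na\u00EFve";

console.log(stripAscii(borrowed)); // "voilnave"
```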

You've said you're okay with just English A-Z, but for others coming to this: in environments that support Unicode property escapes (ES2018 and later), you can keep anything Unicode considers "alphabetic" instead of just A-Z by using the \p{Alpha} property. \p{...} means "matching this Unicode property" (as with \d and \D, the lowercase \p means "matching" and the uppercase \P means "not matching"), and {Alpha} means "alphabetic":

str = str.replace(/[^\p{Alpha}']/gu, "");

(Note that, again, \p{Alpha} means "alphabetic", but because it's inside a negated character class, alphabetic characters are excluded from the match, so they're kept.)
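If code has to run in engines that may predate ES2018, one possible approach is to feature-detect: building the Unicode-aware regular expression with new RegExp throws a SyntaxError where \p{...} isn't supported, and you can fall back to the A-Z class there. This is a sketch, and makeStripper is a hypothetical helper name, not an existing API:

```javascript
// Hypothetical helper: returns a function that removes everything except
// letters and the single quote. Prefers the Unicode-aware class and falls
// back to ASCII A-Z where \p{...} property escapes are unsupported.
function makeStripper() {
  let re;
  try {
    // Throws a SyntaxError in engines without Unicode property escapes.
    re = new RegExp("[^\\p{Alpha}']", "gu");
  } catch (e) {
    re = /[^A-Za-z']/g;
  }
  // replace() resets lastIndex on a global regex, so reusing `re` is safe.
  return (s) => s.replace(re, "");
}

const stripLetters = makeStripper();

// In an ES2018+ engine, Cyrillic letters are kept as well.
console.log(stripLetters("Привет, мир! d'accord 123")); // "Приветмирd'accord"
```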

Note the u flag in /[^\p{Alpha}']/gu: \p{...} is only treated as a Unicode property escape when that flag is present. The Unicode-aware version handles the "voilà" and "naïve" examples too:

let str = "This is a test with the numbers 123 and a ' and voilà and naïve.";

console.log("before:", str);
str = str.replace(/[^\p{Alpha}']/gu, "");
console.log("after: ", str);
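To see why the u flag can't be left off: browsers and Node implement the legacy (Annex B) regular expression syntax, where, without u, \p is just an escaped "p", so the same pattern silently turns into a class of literal characters instead of throwing an error. A sketch with an arbitrary sample string:

```javascript
const text = "Alpha beta 42";

// With the u flag, \p{Alpha} is a Unicode property escape.
const withU = text.replace(/[^\p{Alpha}']/gu, "");

// Without u, legacy parsing reads \p as a plain "p", so the class becomes
// "anything except p, {, A, l, h, a, } or '".
const withoutU = text.replace(/[^\p{Alpha}']/g, "");

console.log(withU);    // "Alphabeta"
console.log(withoutU); // "Alphaa"
```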
T.J. Crowder answered May 01 '23 09:05