Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Regex pattern for Pilcrow (¶) or Partial Differential (∂) character

I need to find find/replace or convert pilcrow / partial differential characters in a string as they currently show as �.

What I thought would work but doesn't:

const value = 'Javascript Regex pattern for Pilcrow (¶) or Partial Differential (∂) character';
const matches = value.match(/\u2029/gmi);
console.log(matches);

But returns empty.

To be honest, I'm not even sure how to achieve what I need to do.

like image 479
Sean Delaney Avatar asked Aug 19 '20 13:08

Sean Delaney


People also ask

What is the regex for special characters?

Special Regex Characters: These characters have special meaning in regex (to be discussed below): . , + , * , ? , ^ , $ , ( , ) , [ , ] , { , } , | , \ . Escape Sequences (\char): To match a character having special meaning in regex, you need to use a escape sequence prefix with a backslash ( \ ).

How do I match a pattern in regex?

Regular expressions, called regexes for short, are descriptions for a pattern of text. For example, a \d in a regex stands for a digit character — that is, any single numeral 0 to 9. Following regex is used in Python to match a string of three numbers, a hyphen, three more numbers, another hyphen, and four numbers.

What does regex 0 * 1 * 0 * 1 * Mean?

Basically (0+1)* mathes any sequence of ones and zeroes. So, in your example (0+1)*1(0+1)* should match any sequence that has 1. It would not match 000 , but it would match 010 , 1 , 111 etc. (0+1) means 0 OR 1.

What does * do in regex?

The Match-zero-or-more Operator ( * ) This operator repeats the smallest possible preceding regular expression as many times as necessary (including zero) to match the pattern. `*' represents this operator. For example, `o*' matches any string made up of zero or more `o' s.


3 Answers

The correct Unicode code points are U+00B6 and U+2202, not U+2029. You'll also want to use a [] character range in your expression:

const value = 'Javascript Regex pattern for Pilcrow (¶) or Partial Differential (∂) character';
const matches = value.match(/[\u00B6\u2202]/gmi);
console.log(matches);

Of course, you don't really need \u escapes in the first place:

const value = 'Javascript Regex pattern for Pilcrow (¶) or Partial Differential (∂) character';
const matches = value.match(/[¶∂]/gmi);
console.log(matches);

Last but not least, you say:

they currently show as �.

If that's the case, it's very likely that it isn't properly encoded to begin with. In other words, you won't find or because they aren't there. I suggest you address this first.

like image 57
Álvaro González Avatar answered Oct 20 '22 07:10

Álvaro González


Use String.prototype.codePointAt to extract the unicode UTF-16 code point and convert it into hex digits sequence.

const toUnicodeCodePointHex = (str) => {
    const codePoint = str.codePointAt(0).toString(16);
    return '\\u' + '0000'.substring(0, 4 - codePoint.length) + codePoint;
};

const value = 'Javascript Regex pattern for Pilcrow (¶) or Partial Differential (∂) character';

const re = new RegExp(['¶', '∂'].map((item) => toUnicodeCodePointHex(item)).join('|'), 'ig');

const matches = value.match(re);
console.log(matches);

See this very nice article by Mathias Bynens.

like image 34
Kunal Mukherjee Avatar answered Oct 20 '22 06:10

Kunal Mukherjee


You can find them by hex or octal value:

const matches = value.match(/\u00B6|\u2202/g);

Regex for each:

Pilcrow: \u00B6 or \xB6 or \266

Partial Differential: \u2202

like image 1
Jordan Stubblefield Avatar answered Oct 20 '22 05:10

Jordan Stubblefield