I need to find find/replace or convert pilcrow / partial differential characters in a string as they currently show as �.
What I thought would work but doesn't:
const value = 'Javascript Regex pattern for Pilcrow (¶) or Partial Differential (∂) character';
const matches = value.match(/\u2029/gmi);
console.log(matches);
But returns empty.
To be honest, I'm not even sure how to achieve what I need to do.
Special Regex Characters: These characters have special meaning in regex (to be discussed below): . , + , * , ? , ^ , $ , ( , ) , [ , ] , { , } , | , \ . Escape Sequences (\char): To match a character having special meaning in regex, you need to use a escape sequence prefix with a backslash ( \ ).
Regular expressions, called regexes for short, are descriptions for a pattern of text. For example, a \d in a regex stands for a digit character — that is, any single numeral 0 to 9. Following regex is used in Python to match a string of three numbers, a hyphen, three more numbers, another hyphen, and four numbers.
Basically (0+1)* mathes any sequence of ones and zeroes. So, in your example (0+1)*1(0+1)* should match any sequence that has 1. It would not match 000 , but it would match 010 , 1 , 111 etc. (0+1) means 0 OR 1.
The Match-zero-or-more Operator ( * ) This operator repeats the smallest possible preceding regular expression as many times as necessary (including zero) to match the pattern. `*' represents this operator. For example, `o*' matches any string made up of zero or more `o' s.
The correct Unicode code points are U+00B6 and U+2202, not U+2029. You'll also want to use a [] character range in your expression:
const value = 'Javascript Regex pattern for Pilcrow (¶) or Partial Differential (∂) character';
const matches = value.match(/[\u00B6\u2202]/gmi);
console.log(matches);
Of course, you don't really need \u escapes in the first place:
const value = 'Javascript Regex pattern for Pilcrow (¶) or Partial Differential (∂) character';
const matches = value.match(/[¶∂]/gmi);
console.log(matches);
Last but not least, you say:
they currently show as �.
If that's the case, it's very likely that it isn't properly encoded to begin with. In other words, you won't find ¶
or ∂
because they aren't there. I suggest you address this first.
Use String.prototype.codePointAt
to extract the unicode UTF-16 code point and convert it into hex digits sequence.
const toUnicodeCodePointHex = (str) => {
const codePoint = str.codePointAt(0).toString(16);
return '\\u' + '0000'.substring(0, 4 - codePoint.length) + codePoint;
};
const value = 'Javascript Regex pattern for Pilcrow (¶) or Partial Differential (∂) character';
const re = new RegExp(['¶', '∂'].map((item) => toUnicodeCodePointHex(item)).join('|'), 'ig');
const matches = value.match(re);
console.log(matches);
See this very nice article
by Mathias Bynens.
You can find them by hex or octal value:
const matches = value.match(/\u00B6|\u2202/g);
Regex for each:
Pilcrow: \u00B6
or \xB6
or \266
Partial Differential: \u2202
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With