I know that in regex there is \s to match all whitespace (spaces, tabs, ...), \d for any digit, etc.
Is there a similar shortcut to match all the different quotation marks: ' " “ ” ‘ ’ „ ” « » ?
And more on Wikipedia ...
I can write my own regex, but I will probably miss some quotation marks from other languages, so I would like a generic way to match all quotation marks.
But maybe they are considered different characters, so that this is impossible?
Try putting a backslash ( \ ) followed by " .
In order to use a literal ^ at the start or a literal $ at the end of a regex, the character must be escaped. Some flavors only use ^ and $ as metacharacters when they are at the start or end of the regex respectively. In those flavors, no additional escaping is necessary. It's usually just best to escape them anyway.
You can put a backslash character followed by a quote ( \" or \' ). This is called an escape sequence and Python will remove the backslash, and put just the quote in the string. Here is an example. The backslashes protect the quotes, but are not printed.
You can use the regex
['"“”‘’„”«»]
See the regex101 demo.
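As a minimal sketch of how that character class might be used from Java: the class name QuoteClassDemo and the helper methods stripQuotes and countQuotes are hypothetical, chosen only for illustration. The marks are written as regex-level Unicode escapes so the result does not depend on the source file encoding. It covers only the marks listed in the question, nothing more.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuoteClassDemo {
    // Explicit character class for: apostrophe, straight double quote,
    // both double angle quotes, curly single and double quotes, and the
    // double low-9 quote. Anything not listed here is NOT matched.
    static final Pattern QUOTES = Pattern.compile(
            "['\"\\u00AB\\u00BB\\u2018\\u2019\\u201C\\u201D\\u201E]");

    // Removes every listed quotation mark from the input.
    static String stripQuotes(String text) {
        return QUOTES.matcher(text).replaceAll("");
    }

    // Counts how many listed quotation marks occur in the input.
    static int countQuotes(String text) {
        Matcher m = QUOTES.matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Curly double quotes, double angle quotes, and plain apostrophes.
        String sample = "\u201CHello\u201D, \u00ABworld\u00BB, 'ok'";
        System.out.println(stripQuotes(sample));
        System.out.println(countQuotes(sample));
    }
}
```

An explicit list like this is easy to read and audit, at the cost of missing any quotation mark that was not typed in by hand.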
Java's Unicode support is very detailed and even classifies punctuation. However, it has no single category that covers all quotation marks, and some quotes are typed neither as initial nor as final quote punctuation. But you can collect them by character name and generate the code. Advantage: completeness.
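// Scan the Basic Multilingual Plane for characters whose Unicode name contains "QUOTATION".
for (int cp = 32; cp <= 0xFFFF; ++cp) {
    String name = Character.getName(cp);
    if (name != null && name.contains("QUOTATION")) {
        // Print the code point, its name, and whether it is typed as initial / final quote punctuation.
        System.out.printf("\\u%04x = %s (%s %s)%n",
                cp, name,
                Character.getType(cp) == Character.INITIAL_QUOTE_PUNCTUATION,
                Character.getType(cp) == Character.FINAL_QUOTE_PUNCTUATION);
    }
}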
This relies on code points in the Basic Multilingual Plane fitting in a single char. The loop therefore stops at U+FFFF and will not find characters in the supplementary planes. It results in:
\u0022 = QUOTATION MARK (false false)
\u00ab = LEFT-POINTING DOUBLE ANGLE QUOTATION MARK (true false)
\u00bb = RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK (false true)
\u2018 = LEFT SINGLE QUOTATION MARK (true false)
\u2019 = RIGHT SINGLE QUOTATION MARK (false true)
\u201a = SINGLE LOW-9 QUOTATION MARK (false false)
\u201b = SINGLE HIGH-REVERSED-9 QUOTATION MARK (true false)
\u201c = LEFT DOUBLE QUOTATION MARK (true false)
\u201d = RIGHT DOUBLE QUOTATION MARK (false true)
\u201e = DOUBLE LOW-9 QUOTATION MARK (false false)
\u201f = DOUBLE HIGH-REVERSED-9 QUOTATION MARK (true false)
\u2039 = SINGLE LEFT-POINTING ANGLE QUOTATION MARK (true false)
\u203a = SINGLE RIGHT-POINTING ANGLE QUOTATION MARK (false true)
\u275b = HEAVY SINGLE TURNED COMMA QUOTATION MARK ORNAMENT (false false)
\u275c = HEAVY SINGLE COMMA QUOTATION MARK ORNAMENT (false false)
\u275d = HEAVY DOUBLE TURNED COMMA QUOTATION MARK ORNAMENT (false false)
\u275e = HEAVY DOUBLE COMMA QUOTATION MARK ORNAMENT (false false)
\u275f = HEAVY LOW SINGLE COMMA QUOTATION MARK ORNAMENT (false false)
\u2760 = HEAVY LOW DOUBLE COMMA QUOTATION MARK ORNAMENT (false false)
\u276e = HEAVY LEFT-POINTING ANGLE QUOTATION MARK ORNAMENT (false false)
\u276f = HEAVY RIGHT-POINTING ANGLE QUOTATION MARK ORNAMENT (false false)
\u301d = REVERSED DOUBLE PRIME QUOTATION MARK (false false)
\u301e = DOUBLE PRIME QUOTATION MARK (false false)
\u301f = LOW DOUBLE PRIME QUOTATION MARK (false false)
\uff02 = FULLWIDTH QUOTATION MARK (false false)
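The "collect them, and generate code" idea could be taken one step further by building the regex character class directly from the same name scan. The following is only a sketch under assumptions: the class name QuotePatternBuilder and the method buildQuoteClass are hypothetical, the scan is limited to U+FFFF like the loop above, and the exact set of characters found depends on the Unicode version shipped with the JDK in use. Because U+0027 is named APOSTROPHE, the name scan does not find it, so it is added by an optional flag.

```java
import java.util.regex.Pattern;

class QuotePatternBuilder {
    // Builds a regex character class from every code point up to U+FFFF
    // whose Unicode name contains "QUOTATION". The apostrophe is not named
    // that way, so it is appended explicitly when requested.
    static String buildQuoteClass(boolean includeApostrophe) {
        StringBuilder sb = new StringBuilder("[");
        if (includeApostrophe) {
            sb.append('\'');
        }
        for (int cp = 32; cp <= 0xFFFF; ++cp) {
            String name = Character.getName(cp);
            if (name != null && name.contains("QUOTATION")) {
                // Emit a regex-level escape made of a backslash, 'u', and four hex digits.
                sb.append(String.format("\\u%04x", cp));
            }
        }
        return sb.append("]").toString();
    }

    public static void main(String[] args) {
        String quoteClass = buildQuoteClass(true);
        Pattern quotes = Pattern.compile(quoteClass);

        // French angle quotes and German low-9 / left double quotes.
        String sample = "\u00ABBonjour\u00BB \u201EHallo\u201C";
        System.out.println(quoteClass);
        System.out.println(quotes.matcher(sample).replaceAll(""));
    }
}
```

Generating the class at startup keeps it in step with the JDK's Unicode tables, while printing it once and pasting the result into the source gives a fixed pattern that behaves the same on every Java version.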