
Is there a regex to grab all quotation marks?

I know that regex has \s to match any whitespace (spaces, tabs ...), \d to match any digit, etc.

Is there a similar shortcut that matches all the different quotation marks: ' " “ ” ‘ ’ „ ” « »?

And more on Wikipedia ...

I can write my own regex, but I will probably miss some quotation marks from other languages, so I would like a generic way to match all of them.

Or are they simply treated as unrelated characters, which would make this impossible?

GaspardP asked Sep 13 '17 07:09



2 Answers

You can use a character class that lists the quotation marks explicitly:

['"“”‘’„”«»]

See the regex101 demo.
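As a minimal sketch of how that character class could be used from Java: the class name `QuoteCharClassDemo` and the helpers `countQuotes` and `normalize` are made up for illustration. The quotation marks are written as `\uXXXX` escapes so the example does not depend on the source file encoding, and the ” that appears twice in the class above is listed only once.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuoteCharClassDemo {
    // Same set as ['"“”‘’„”«»], spelled with regex \\uXXXX escapes:
    // ' " U+201C U+201D U+2018 U+2019 U+201E U+00AB U+00BB
    static final Pattern QUOTES = Pattern.compile(
            "['\"\\u201C\\u201D\\u2018\\u2019\\u201E\\u00AB\\u00BB]");

    // Counts how many characters in the text match the quotation-mark class.
    static int countQuotes(String text) {
        Matcher m = QUOTES.matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // Replaces every matched quotation mark with a plain ASCII double quote.
    static String normalize(String text) {
        return QUOTES.matcher(text).replaceAll("\"");
    }

    public static void main(String[] args) {
        String sample = "\u00ABBonjour\u00BB, \u201Chello\u201D, 'hi'";
        System.out.println(countQuotes(sample));
        System.out.println(normalize(sample));
    }
}
```

Any quotation mark that is not in the list (for example ‹ or ›) is silently ignored, which is the limitation the question is worried about.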

marvel308 answered Sep 25 '22 19:09


Java's Unicode support is very detailed and even classifies punctuation, but it has no single category for quotation marks. There are also quotation marks that are classified as neither initial nor final quote punctuation. You can, however, collect them by character name and generate the regex from the result. Advantage: completeness.

    // List every BMP code point whose Unicode name contains "QUOTATION",
    // with flags for the initial-quote (Pi) and final-quote (Pf) categories.
    for (int cp = 32; cp <= 0xFFFF; ++cp) {
        String name = Character.getName(cp);
        if (name != null && name.contains("QUOTATION")) {
            System.out.printf("\\u%04x = %s (%s %s)%n",
                    cp, name,
                    Character.getType(cp) == Character.INITIAL_QUOTE_PUNCTUATION,
                    Character.getType(cp) == Character.FINAL_QUOTE_PUNCTUATION);
        }
    }

The loop stops at U+FFFF, so it covers only the Basic Multilingual Plane, where every code point fits in a single char. Quotation marks in the supplementary planes (above U+FFFF) are therefore not listed. This results in:

\u0022 = QUOTATION MARK (false false)
\u00ab = LEFT-POINTING DOUBLE ANGLE QUOTATION MARK (true false)
\u00bb = RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK (false true)
\u2018 = LEFT SINGLE QUOTATION MARK (true false)
\u2019 = RIGHT SINGLE QUOTATION MARK (false true)
\u201a = SINGLE LOW-9 QUOTATION MARK (false false)
\u201b = SINGLE HIGH-REVERSED-9 QUOTATION MARK (true false)
\u201c = LEFT DOUBLE QUOTATION MARK (true false)
\u201d = RIGHT DOUBLE QUOTATION MARK (false true)
\u201e = DOUBLE LOW-9 QUOTATION MARK (false false)
\u201f = DOUBLE HIGH-REVERSED-9 QUOTATION MARK (true false)
\u2039 = SINGLE LEFT-POINTING ANGLE QUOTATION MARK (true false)
\u203a = SINGLE RIGHT-POINTING ANGLE QUOTATION MARK (false true)
\u275b = HEAVY SINGLE TURNED COMMA QUOTATION MARK ORNAMENT (false false)
\u275c = HEAVY SINGLE COMMA QUOTATION MARK ORNAMENT (false false)
\u275d = HEAVY DOUBLE TURNED COMMA QUOTATION MARK ORNAMENT (false false)
\u275e = HEAVY DOUBLE COMMA QUOTATION MARK ORNAMENT (false false)
\u275f = HEAVY LOW SINGLE COMMA QUOTATION MARK ORNAMENT (false false)
\u2760 = HEAVY LOW DOUBLE COMMA QUOTATION MARK ORNAMENT (false false)
\u276e = HEAVY LEFT-POINTING ANGLE QUOTATION MARK ORNAMENT (false false)
\u276f = HEAVY RIGHT-POINTING ANGLE QUOTATION MARK ORNAMENT (false false)
\u301d = REVERSED DOUBLE PRIME QUOTATION MARK (false false)
\u301e = DOUBLE PRIME QUOTATION MARK (false false)
\u301f = LOW DOUBLE PRIME QUOTATION MARK (false false)
\uff02 = FULLWIDTH QUOTATION MARK (false false)
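
The "generate code" step could look roughly like the sketch below, which turns the collected code points into a regex character class. The class name `QuotePatternBuilder` and the method `buildCharClass` are hypothetical. Two choices here are assumptions rather than part of the listing above: the scan runs up to `Character.MAX_CODE_POINT` instead of stopping at U+FFFF, and APOSTROPHE (U+0027) is added by hand because its Unicode name does not contain "QUOTATION". The exact set depends on the Unicode version shipped with the JDK, and the name filter may also pick up characters you do not want, so the generated class is worth reviewing before use.

```java
import java.util.regex.Pattern;

class QuotePatternBuilder {
    // Builds a regex character class from every code point whose Unicode
    // name contains "QUOTATION", scanning all planes rather than only the BMP.
    static String buildCharClass() {
        StringBuilder sb = new StringBuilder("[");
        // Assumption: APOSTROPHE (U+0027) is added manually, since the
        // name-based filter below would miss it.
        sb.append("\\x{27}");
        for (int cp = 0x20; cp <= Character.MAX_CODE_POINT; cp++) {
            String name = Character.getName(cp);
            if (name != null && name.contains("QUOTATION")) {
                // \x{h...h} is Java regex syntax for a code point in hex.
                sb.append(String.format("\\x{%X}", cp));
            }
        }
        return sb.append("]").toString();
    }

    public static void main(String[] args) {
        String charClass = buildCharClass();
        System.out.println(charClass);

        // Use the generated class to normalize quotes to ASCII double quotes.
        Pattern quotes = Pattern.compile(charClass);
        System.out.println(quotes.matcher("\u201Equoted\u201C").replaceAll("\""));
    }
}
```

Generating the class once and pasting the result into your code keeps the regex fixed across JDK upgrades; building it at startup instead lets it follow whatever Unicode data the running JDK provides.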

Joop Eggen answered Sep 23 '22 19:09