I'm translating code from Perl and I've come across the following line:
$text =~ s/([?!\.][\ ]*[\'\"\)\]\p{IsPf}]+) +([\'\"\(\[\¿\¡\p{IsPi}]*[\ ]*[\p{IsUpper}])/$1\n$2/g;
My question is: what do \p{IsPf} and \p{IsPi} match? I've tried searching online but haven't found anything...
\p{..} matches characters by their Unicode character properties: http://perldoc.perl.org/perlunicode.html#Unicode-Character-Properties
In particular, \p{IsPf} matches characters with the "final punctuation" property (general category Pf), and \p{IsPi} matches characters with the "initial punctuation" property (general category Pi). The Is prefix is optional, so \p{Pf} and \p{Pi} mean the same thing. These seem to be mostly closing and opening quotes.
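A quick way to check this yourself is to test a few characters against both properties. This is a minimal sketch; punct_kind is just a hypothetical helper name chosen for illustration.

```perl
use strict;
use warnings;
binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical helper: report whether a character is initial (Pi) or
# final (Pf) punctuation. \p{IsPi} and \p{Pi} are the same test in Perl;
# the "Is" prefix is optional.
sub punct_kind {
    my ($ch) = @_;
    return 'Pi' if $ch =~ /\p{IsPi}/;
    return 'Pf' if $ch =~ /\p{IsPf}/;
    return 'other';
}

# Curly quotes and guillemets, plus two ASCII characters for contrast.
for my $ch ("\x{201C}", "\x{201D}", "\x{AB}", "\x{BB}", '"', ')') {
    printf "%s U+%04X %s\n", $ch, ord $ch, punct_kind($ch);
}
```

The ASCII " and ) come out as "other": they belong to the general categories Po and Pe, not Pf, which is why the regex in the question lists \' \" \) \] explicitly next to \p{IsPf}.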
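The point of the substitution seems to be breaking sentences into separate lines by matching the end of one sentence and the beginning of the next, taking into account that a sentence may start and end with various types of punctuation. Note that the first group ends in [\'\"\)\]\p{IsPf}]+, so this rule only fires when the ?, ! or . is followed by at least one closing quote or bracket; the next sentence may then begin with opening quotes, brackets, ¿ or ¡ before its uppercase letter.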
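To see the rule in action, here is a minimal sketch that wraps the substitution from the question, unchanged, in a sub. The sub name split_after_closing_quote and the sample sentences are made up for illustration. use utf8 is needed because the pattern contains the literal characters ¿ and ¡.

```perl
use strict;
use warnings;
use utf8;   # the pattern below contains the literal characters ¿ and ¡
binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical wrapper around the substitution from the question.
sub split_after_closing_quote {
    my ($text) = @_;
    $text =~ s/([?!\.][\ ]*[\'\"\)\]\p{IsPf}]+) +([\'\"\(\[\¿\¡\p{IsPi}]*[\ ]*[\p{IsUpper}])/$1\n$2/g;
    return $text;
}

my @samples = (
    'He said "Stop!" Then he left.',
    "Is it over?\x{201D} \x{201C}Yes,\x{201D} she said.",   # curly quotes
    'Hello. World.',   # no closing quote or bracket after the period
);

print split_after_closing_quote($_), "\n---\n" for @samples;
```

The last sample is left alone: a plain ". " between sentences is not enough for this rule, because it needs at least one closing quote or bracket after the punctuation. That case would need a separate rule.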
Let's ask RegexBuddy: It's a Unicode character property.
You can find more documentation on Unicode character properties and Unicode scripts here.
As a bit of extra information, unichars from Unicode::Tussle can be used to list the matching characters:
$ unichars -au '\p{IsPi}' | cat
« U+000AB LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
‘ U+02018 LEFT SINGLE QUOTATION MARK
‛ U+0201B SINGLE HIGH-REVERSED-9 QUOTATION MARK
“ U+0201C LEFT DOUBLE QUOTATION MARK
‟ U+0201F DOUBLE HIGH-REVERSED-9 QUOTATION MARK
‹ U+02039 SINGLE LEFT-POINTING ANGLE QUOTATION MARK
⸂ U+02E02 LEFT SUBSTITUTION BRACKET
⸄ U+02E04 LEFT DOTTED SUBSTITUTION BRACKET
⸉ U+02E09 LEFT TRANSPOSITION BRACKET
⸌ U+02E0C LEFT RAISED OMISSION BRACKET
⸜ U+02E1C LEFT LOW PARAPHRASE BRACKET
⸠ U+02E20 LEFT VERTICAL BAR WITH QUILL
$ unichars -au '\p{IsPf}' | cat
» U+000BB RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
’ U+02019 RIGHT SINGLE QUOTATION MARK
” U+0201D RIGHT DOUBLE QUOTATION MARK
› U+0203A SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
⸃ U+02E03 RIGHT SUBSTITUTION BRACKET
⸅ U+02E05 RIGHT DOTTED SUBSTITUTION BRACKET
⸊ U+02E0A RIGHT TRANSPOSITION BRACKET
⸍ U+02E0D RIGHT RAISED OMISSION BRACKET
⸝ U+02E1D RIGHT LOW PARAPHRASE BRACKET
⸡ U+02E21 RIGHT VERTICAL BAR WITH QUILL
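If Unicode::Tussle isn't installed, a brute-force scan over every code point gives a similar listing using only core Perl. This is a rough sketch: code_points_matching is a hypothetical helper, it skips the surrogate range, and the result depends on the Unicode version bundled with your perl, so it may not line up exactly with the output above.

```perl
use strict;
use warnings;
use charnames ();   # for charnames::viacode
binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical helper: return every code point whose character matches $re.
sub code_points_matching {
    my ($re) = @_;
    my @found;
    for my $cp (0 .. 0x10FFFF) {
        next if $cp >= 0xD800 && $cp <= 0xDFFF;   # skip UTF-16 surrogates
        push @found, $cp if chr($cp) =~ $re;
    }
    return @found;
}

# Scan once per property; the patterns are written out literally.
my %found = (
    IsPi => [ code_points_matching(qr/\p{IsPi}/) ],
    IsPf => [ code_points_matching(qr/\p{IsPf}/) ],
);

# Print in the same "char U+XXXXX NAME" layout as unichars.
for my $prop (qw(IsPi IsPf)) {
    print "\\p{$prop}\n";
    printf "%s U+%05X %s\n", chr $_, $_, charnames::viacode($_)
        for @{ $found{$prop} };
}
```

Scanning all 1.1 million code points takes a moment, which is the trade-off for not needing a CPAN module.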