I was trying to find the fancy quote “ in a string using the following Perl regular expression, but the match returns false.
$text = "NBN “a joint venture with Telstra”";
if ($text =~ m/“/)
{
print "found";
}
I also tried using the "\x93" ASCII code, but it still does not work. I am stuck here.
Any help is appreciated.
Regards, Allen
Depending on the encoding of the string you are trying to match, you might need to do different things. See The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!).
If the input string is encoded in UTF-8, then you need to specify that encoding in your Perl script. One way to do that is with use encoding 'UTF-8' (note that the encoding pragma has been deprecated since Perl 5.18, so this is best suited to older Perls).
You can also specify use utf8 if you want the encoding of the script itself to be UTF-8. You are probably better off, though, knowing the code point of the character you are checking for, and specifying it directly:
use utf8;
use encoding 'UTF-8';
$text = "NBN “a joint venture with Telstra”"; # Make sure to quote this string properly
if ($text =~ m/\N{U+201C}/) # “ is the same as U+201C LEFT DOUBLE QUOTATION MARK
{
print "found";
}
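If the text arrives as raw UTF-8 bytes (from a file, socket, or database) rather than from a literal in the script, a rough sketch of the same check is to decode the bytes first and then match on the code point. This version uses the core Encode module instead of the encoding pragma, which is deprecated in newer Perls. The helper name has_left_double_quote is made up for illustration, and the bytes are written as hex escapes so the example does not depend on how the script file is saved:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: decode raw UTF-8 bytes into characters,
# then test for U+201C LEFT DOUBLE QUOTATION MARK.
sub has_left_double_quote {
    my ($utf8_bytes) = @_;
    my $text = decode('UTF-8', $utf8_bytes);
    return $text =~ /\x{201C}/ ? 1 : 0;
}

# U+201C is the byte sequence E2 80 9C in UTF-8; U+201D is E2 80 9D.
my $bytes = "NBN \xE2\x80\x9Ca joint venture with Telstra\xE2\x80\x9D";
print "found\n" if has_left_double_quote($bytes);
```

Matching /\x{201C}/ directly against the undecoded $bytes would fail, because the byte string contains the three separate bytes E2, 80 and 9C rather than the single character U+201C.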
See the "Demoroniser" and, for your specific problem, the discussion of just the "smart" quotes part of it on PerlMonks: Re^3: Reg Ex to strip MS smart quotes.
This advice is assuming - perhaps incorrectly - that your database's "fancy quotes" have come from some piece of Microsoft software producing Windows-1252 encoded text - if you've got UTF-8 instead, Avi's already pointed you in the right direction.
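For the Windows-1252 case, here is a minimal sketch of the idea (not the Demoroniser itself, and not the code from the PerlMonks thread): decode the bytes as cp1252 so that byte 0x93 becomes U+201C and 0x94 becomes U+201D, then replace the curly quotes with plain ASCII ones. The helper name straighten_cp1252_quotes is hypothetical, and the sample string assumes the input really is Windows-1252:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: decode Windows-1252 bytes, then map the four
# curly quote characters to their ASCII equivalents.
sub straighten_cp1252_quotes {
    my ($bytes) = @_;
    my $text = decode('cp1252', $bytes);   # 0x91-0x94 become U+2018, U+2019, U+201C, U+201D
    $text =~ tr/\x{201C}\x{201D}\x{2018}\x{2019}/""''/;
    return $text;
}

# Assumed sample input: the question's string as Windows-1252 bytes.
my $raw = "NBN \x93a joint venture with Telstra\x94";
print straighten_cp1252_quotes($raw), "\n";   # prints: NBN "a joint venture with Telstra"
```

This also suggests why matching "\x93" can fail: that byte only stands for the left curly quote while the string is still undecoded Windows-1252. In UTF-8 data the same character is the three bytes E2 80 9C.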