
Perl Regular Expression cannot find fancy quotes “

Tags:

regex

perl

I am trying to find the fancy quote “ in a string using the following Perl regular expression, but the match returns false.

$text = "NBN “a joint venture with Telstra”";

if ($text =~ m/“/)
{
  print "found";
}

I also tried matching the character code "\x93", but that does not work either. I am stuck here.

Any help is appreciated.

Regards, Allen

Allen asked Apr 04 '11 11:04


2 Answers

What you need to do depends on the encoding of the string you are trying to match. See The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!).

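If the input string is encoded in UTF-8, you need to decode it into characters before matching, for example with Encode::decode('UTF-8', ...) or an :encoding(UTF-8) layer on the filehandle. The use encoding 'UTF-8' pragma used to be suggested for this, but it has been deprecated since Perl 5.18 and its default mode no longer works as of Perl 5.26, so avoid it.

If the string is a literal in the script itself, specify use utf8 to tell perl that the script source is UTF-8. You are probably better off, though, knowing the code point of the character you are checking for, and specifying it directly:

use strict;
use warnings;
use utf8;                            # the script source itself is saved as UTF-8
binmode(STDOUT, ':encoding(UTF-8)'); # encode anything printed as UTF-8

my $text = "NBN “a joint venture with Telstra”"; # Make sure to quote this string properly

if ($text =~ m/\N{U+201C}/) # “ is the same as U+201C LEFT DOUBLE QUOTATION MARK
{
  print "found";
}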
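
If the text does not come from a literal in the script but from outside (a file, a database, a socket), it usually arrives as raw bytes and has to be decoded before a code point such as U+201C can match. A minimal sketch of that, assuming the input really is UTF-8; the `$bytes` value below is a made-up stand-in for whatever your real source returns:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical input: raw UTF-8 bytes, as they might arrive from a file or database.
# E2 80 9C and E2 80 9D are the UTF-8 encodings of U+201C and U+201D.
my $bytes = "NBN \xE2\x80\x9Ca joint venture with Telstra\xE2\x80\x9D";

# Matching against the undecoded bytes fails: no single element is U+201C.
my $found_in_bytes = ($bytes =~ /\x{201C}/) ? 1 : 0;

# Decode the bytes into a string of characters, then match again.
my $text = decode('UTF-8', $bytes);
my $found_in_text = ($text =~ /\x{201C}/) ? 1 : 0;

print "bytes: $found_in_bytes, decoded: $found_in_text\n";
# prints "bytes: 0, decoded: 1"
```

The same regex gives different answers on the two strings, which is why it matters to know whether you are holding bytes or characters at the point where you match.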

Avi answered Oct 03 '22 10:10


See the "Demoroniser" and, for your specific problem, the discussion of just its "smart" quotes handling on PerlMonks: Re^3: Reg Ex to strip MS smart quotes.

This advice assumes (perhaps incorrectly) that your database's "fancy quotes" came from some piece of Microsoft software producing Windows-1252 encoded text, where “ is the single byte 0x93 and ” is 0x94. If you've got UTF-8 instead, Avi's already pointed you in the right direction.
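
For the Windows-1252 case, here is a rough sketch of the same idea using the core Encode module rather than the Demoroniser itself. It assumes the data really is Windows-1252; the `$bytes` value is a hypothetical stand-in for a value read from the database:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical input: Windows-1252 bytes, where 0x93 and 0x94 are the curly double quotes.
my $bytes = "NBN \x93a joint venture with Telstra\x94";

# Decode to characters: 0x93 becomes U+201C and 0x94 becomes U+201D.
my $text = decode('cp1252', $bytes);

my $found = ($text =~ /\x{201C}/) ? 1 : 0;

# Optionally replace both curly double quotes with a plain ASCII quote.
# tr repeats the last replacement character when the replacement list is shorter.
(my $plain = $text) =~ tr/\x{201C}\x{201D}/"/;

print "found: $found\n";  # prints "found: 1"
print "$plain\n";         # prints: NBN "a joint venture with Telstra"
```

Decoding first means the rest of the program can work with one representation (Unicode characters) regardless of which encoding the text arrived in.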

bigiain answered Oct 03 '22 11:10