How do you reference Unicode characters in ColdFusion regex?

I'm trying to match the character ’, which I can type with Alt+0146. Word tells me it's Unicode 0x2019, but I can't seem to match it using regular expressions in ColdFusion. Here's the pattern I'm using to match between 2 and 10 letters, apostrophes, and this character:

[[:alpha:]'\x2019]{2,10}

but it's not working. Any ideas?

Trigger asked Feb 10 '09 08:02


2 Answers

It looks like the \x shorthand in CF regex only takes two hex digits, so it can only reach code points up to 255 (0xFF). To go above that, you need to build the character with the Chr() function and concatenate it into the pattern. 0x2019 is 8217 in decimal, so:

<cfscript>
   yourString = "’";
   result = refind("[[:alpha:]'" & chr(8217) & "]{2,10}", yourString);
   writeOutput(result);
</cfscript>

That should give you a match.

anopres answered Nov 22 '22 14:11


Another thing you could try is directly including the character:

[[:alpha:]'#Chr(8217)#]{2,10}


However, I'm not sure if that will work with a CF regex. If not, you still have the option of using Java regex within CF. This is easy to do and gives you a far wider range of regex functionality, almost certainly including Unicode support.

If you're doing replacements, you can call a Java regex method directly on a CF string, for example:

<cfset NewString = OrigString.replaceAll( 'ajavaregex' , 'replacement' )/>
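
To show what the Java side does with this character, here is a minimal sketch in plain Java rather than CFML. The class and method names (CurlyApostropheDemo, firstMatch, mask) are made up for illustration. It relies on java.util.regex accepting a backslash-u escape with four hex digits inside a pattern, so U+2019 can go straight into the character class. Note that \p{Alpha} in Java is ASCII-only by default.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of ColdFusion or of any library.
class CurlyApostropheDemo {
    // The doubled backslashes are Java string escaping only.
    // The regex engine itself receives a single-backslash escape for U+2019.
    static final String WORD_REGEX = "[\\p{Alpha}'\\u2019]{2,10}";
    static final Pattern WORD = Pattern.compile(WORD_REGEX);

    // Returns the first run of 2 to 10 letters/apostrophes, or null if none.
    static String firstMatch(String input) {
        Matcher m = WORD.matcher(input);
        return m.find() ? m.group() : null;
    }

    // Same call shape as the CF one-liner: String.replaceAll(regex, replacement).
    static String mask(String input) {
        return input.replaceAll(WORD_REGEX, "X");
    }

    public static void main(String[] args) {
        // Print only ASCII results to sidestep console encoding issues.
        System.out.println(firstMatch("it\u2019s fine").length());
        System.out.println(mask("it\u2019s 42"));
    }
}
```

From CF, the equivalent call would presumably pass the pattern with single backslashes, since CFML string literals do not treat the backslash as an escape character; treat that as an assumption to verify on your engine.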


For other functionality (e.g. getting an array of matches, or callback functions on replace), I have created Java RegEx Utilities, a single component that reduces each of these tasks to a single function call.

Peter Boughton answered Nov 22 '22 15:11
