I'm trying to match this character ’ which I can type with alt-0146. Word tells me it's Unicode 0x2019, but I can't seem to match it using regular expressions in ColdFusion. Here's a snippet I'm using to match between 2 and 10 letters, apostrophes, and this character:
[[:alpha:]'\x2019]{2,10}
but it's not working. Any ideas?
It looks like the \x shorthand in CF only supports character codes up to 255 (\x00 to \xFF), which is not enough for 0x2019 (decimal 8217). To go above that, you need to use the chr() function inline and concatenate the result into the pattern, like this:
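<cfscript>
// The test string needs at least 2 matching characters to satisfy {2,10}
yourString = "don’t";
// chr(8217) returns U+2019, the right single quotation mark
result = refind("[[:alpha:]'" & chr(8217) & "]{2,10}", yourString);
writeOutput(result);
</cfscript>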
That should give you a match.
Another thing you could try is directly including the character:
[[:alpha:]'#Chr(8217)#]{2,10}
However, I'm not sure if that will work with a CF regex. If not, you still have the option to use Java regex within CF. This is easy to do, and gives you a far wider range of regex functionality, almost certainly including Unicode support.
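For what it's worth, here is a minimal plain-Java sketch of the same pattern, just to show the Java regex engine accepting a four-digit \u escape. The class and method names (ApostropheMatch, containsWord) are made up for illustration, and \p{Alpha} is used as a stand-in for [[:alpha:]], since Java does not use the bracketed POSIX class syntax.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of ColdFusion or any library.
class ApostropheMatch {
    // The doubled backslashes pass the escapes through to the regex engine,
    // which then resolves the four-digit code point itself.
    static final Pattern WORD = Pattern.compile("[\\p{Alpha}'\\u2019]{2,10}");

    // True if the input contains a run of 2 to 10 letters, straight
    // apostrophes, or right single quotation marks.
    static boolean containsWord(String input) {
        return WORD.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(containsWord("don\u2019t")); // five matching characters
        System.out.println(containsWord("\u2019"));     // a single character is too short for {2,10}
    }
}
```

Note that a lone ’ does not match here, because the quantifier asks for at least two characters.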
If you're doing replacements, you can do a Java Regex directly on a CF string, for example:
<cfset NewString = OrigString.replaceAll( 'ajavaregex' , 'replacement' )/>
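Since a CF string is a java.lang.String underneath, the call above is the ordinary String.replaceAll method. As a rough sketch in plain Java (the class and method names are hypothetical, and swapping the curly apostrophe for a straight one is only an example use), it behaves like this:

```java
// Hypothetical helper for illustration only.
class ApostropheNormalizer {
    // Replaces every right single quotation mark (U+2019) with a straight apostrophe.
    static String straighten(String input) {
        return input.replaceAll("\\u2019", "'");
    }

    public static void main(String[] args) {
        System.out.println(straighten("don\u2019t"));
    }
}
```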
For other functionality (e.g. getting an array of matches, or callback functions on replace), I have created Java RegEx Utilities, a single component that wraps each of these tasks in a single function call.
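If you'd rather see what "getting an array of matches" involves underneath, here is a minimal sketch using java.util.regex directly. This is not the component mentioned above, and the names MatchCollector and allMatches are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; shows the plain java.util.regex loop only.
class MatchCollector {
    // Returns every non-overlapping match of regex in input, in order.
    static List<String> allMatches(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String regex = "[\\p{Alpha}'\\u2019]{2,10}";
        System.out.println(allMatches(regex, "it\u2019s a dog's life"));
    }
}
```

With the pattern from the question, the single-letter word "a" is skipped because it is shorter than the 2-character minimum.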