I'm trying to normalize a string in ColdFusion.
I want to use the Java class java.text.Normalizer for this, as CF doesn't have any similar function as far as I know.
Here's my current code:
<cfset normalizer = createObject( "java", "java.text.Normalizer" ) />
<cfset string = "äéöè" />
<cfset string = normalizer.normalize(string, createObject( "java", "java.text.Normalizer$Form" ).NFD) />
<cfset string = ReReplace(string, "\\p{InCombiningDiacriticalMarks}+", "") />
<cfoutput>#string#</cfoutput>
Any ideas why it always outputs äéöè and not a normalized string?
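In ColdFusion, unlike in Java, the backslash is not an escape character in string literals, so you don't need to double it. With "\\p{...}" the regex engine receives both backslashes and reads them as an escaped literal backslash followed by plain text. The pattern therefore only matches input that actually contains a backslash, so no replacement happens. Use a single backslash instead. Also note that ReReplace replaces only the first match unless you pass "all" as the fourth argument.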
Other than that, the normalization itself works: at the time of the output the string has a length of 8, not 4. This is the effect of the normalize call, which splits each accented letter into a base letter plus a combining mark.
However, remember that the decomposed (NFD) string is still an equivalent representation of the original string, so it is not surprising that you cannot tell the difference visually. This is correct Unicode rendering in action.
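To make the points above concrete, here is a minimal sketch of the question's snippet with a single backslash. The variable names (nfd, original, decomposed, stripped) are mine, not from the question. It also swaps ReReplace for the replaceAll() method of the underlying java.lang.String, so the pattern is handled by java.util.regex, which is known to support \p{InCombiningDiacriticalMarks}. Whether ReReplace's own regex engine accepts that Unicode block depends on the CFML engine, and I have not verified it, so treat this as one possible approach rather than the only fix.

```cfml
<!--- Sketch only: assumes the template is saved in an encoding the engine reads correctly (e.g. UTF-8) --->
<cfset normalizer = createObject( "java", "java.text.Normalizer" ) />
<cfset nfd = createObject( "java", "java.text.Normalizer$Form" ).NFD />
<cfset original = "äéöè" />

<!--- NFD splits each accented letter into a base letter followed by a combining mark --->
<cfset decomposed = normalizer.normalize( original, nfd ) />

<!--- Single backslash: CFML string literals do not treat "\" as an escape character.
      replaceAll() is the java.lang.String method and replaces every match. --->
<cfset stripped = decomposed.replaceAll( "\p{InCombiningDiacriticalMarks}+", "" ) />

<!--- Compare the lengths to see the effect of normalize(), then show the stripped result --->
<cfoutput>#len( original )# / #len( decomposed )# / #stripped#</cfoutput>
```

Printing the lengths next to the result is a quick way to confirm that normalize() did its job even though the decomposed string looks identical to the original on screen.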