
Is there a way to use NSString stringByFoldingWithOptions to unfold the single French 'œ' character into 'oe'?

For a diacritics-agnostic full-text search feature, I use the following code to convert accented characters like é or Ö into their lowercase, non-accented forms e and o:

[[inputString stringByFoldingWithOptions:
        NSCaseInsensitiveSearch
        | NSDiacriticInsensitiveSearch
        | NSWidthInsensitiveSearch
    locale:[NSLocale currentLocale]] lowercaseString];

This works. However, I found no way to convert special characters whose base form consists of multiple characters, like the French œ (as in "sœur") or the German ß (as in "Fluß"). I would like to convert them into oe and ss respectively. I found no suitable flag for stringByFoldingWithOptions and nothing on the web.

EDIT

ß is actually handled correctly by the above code. It converts to ss.

asked Nov 30 '25 04:11 by regular


1 Answer

Here are the options, from worst to best.

Solution 1 works only for æ and ß and fails for everything else (œ, ĳ, and other ligatures):

NSString *result = [[[NSString alloc] initWithData:[inputString dataUsingEncoding:NSASCIIStringEncoding allowLossyConversion:YES] encoding:NSASCIIStringEncoding] autorelease];

Solution 2 works for most ligatures and fails only for æ, œ and ĳ. I tried every available NSLocale, so the locale is not the issue here:

NSString *result = [inputString stringByFoldingWithOptions:NSCaseInsensitiveSearch | NSDiacriticInsensitiveSearch | NSWidthInsensitiveSearch locale:[NSLocale currentLocale]];

Solution 3 will work for most ligatures and only fails for œ:

NSString *result = [[[NSString alloc] initWithData:[[inputString precomposedStringWithCompatibilityMapping] dataUsingEncoding:NSASCIIStringEncoding allowLossyConversion:YES] encoding:NSASCIIStringEncoding] autorelease];

So œ always has to be handled manually, and the best approach is to combine solution 2 or 3 with a manual string replacement.
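If you want to check these claims on your own SDK, a small logging harness like the one below may help. It is only a sketch: the function names and the sample strings are made up for illustration, it assumes ARC (so there is no autorelease call), and the output can differ between OS versions, which is why no expected results are listed.

```objc
#import <Foundation/Foundation.h>

// Solution 2 from the answer: fold case, diacritics and width.
static NSString *FoldWithOptions(NSString *input) {
    NSStringCompareOptions options = NSCaseInsensitiveSearch
                                   | NSDiacriticInsensitiveSearch
                                   | NSWidthInsensitiveSearch;
    return [input stringByFoldingWithOptions:options
                                      locale:[NSLocale currentLocale]];
}

// Solution 3 from the answer: compatibility-normalize (NFKC), then
// lossy-convert to ASCII. Assumes ARC, so the result is not autoreleased.
static NSString *FoldViaLossyASCII(NSString *input) {
    NSData *ascii = [[input precomposedStringWithCompatibilityMapping]
                        dataUsingEncoding:NSASCIIStringEncoding
                     allowLossyConversion:YES];
    return [[NSString alloc] initWithData:ascii
                                 encoding:NSASCIIStringEncoding];
}

// Hypothetical helper: logs what each approach produces for a few samples.
static void LogFoldingResults(void) {
    NSArray *samples = @[ @"sœur", @"Æther", @"ĳs", @"Fluß", @"ﬁn" ];
    for (NSString *sample in samples) {
        NSLog(@"%@ -> folding: %@, lossy ASCII: %@",
              sample, FoldWithOptions(sample), FoldViaLossyASCII(sample));
    }
}
```

Call LogFoldingResults() once, for example from main or a unit test, and look for ligatures that survive or get dropped.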

Solution 2bis:

inputString = [inputString stringByReplacingOccurrencesOfString:@"æ" withString:@"ae" options:NSCaseInsensitiveSearch range:NSMakeRange(0, [inputString length])];
inputString = [inputString stringByReplacingOccurrencesOfString:@"œ" withString:@"oe" options:NSCaseInsensitiveSearch range:NSMakeRange(0, [inputString length])];
inputString = [inputString stringByReplacingOccurrencesOfString:@"ĳ" withString:@"ij" options:NSCaseInsensitiveSearch range:NSMakeRange(0, [inputString length])];
NSString *result = [inputString stringByFoldingWithOptions:NSCaseInsensitiveSearch | NSDiacriticInsensitiveSearch | NSWidthInsensitiveSearch locale:[NSLocale currentLocale]];

Solution 3bis:

inputString = [inputString stringByReplacingOccurrencesOfString:@"Œ" withString:@"OE"];
inputString = [inputString stringByReplacingOccurrencesOfString:@"œ" withString:@"oe"];
NSString *result = [[[NSString alloc] initWithData:[[inputString precomposedStringWithCompatibilityMapping] dataUsingEncoding:NSASCIIStringEncoding allowLossyConversion:YES] encoding:NSASCIIStringEncoding] autorelease];

Because solution 2bis may still miss some replacements and NSLocale behavior is unpredictable, the best solution is 3bis. It also preserves case if you need that.
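For reuse, solution 3bis could be wrapped in a small function, roughly as sketched here. The name ASCIIFoldedString is hypothetical, and the code assumes ARC; under manual reference counting you would autorelease the returned string, as in the snippets above.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper implementing solution 3bis: expand Œ/œ by hand,
// compatibility-normalize (NFKC), then lossy-convert to ASCII.
// Case is preserved. Assumes ARC.
static NSString *ASCIIFoldedString(NSString *input) {
    // Manual step: œ is not covered by the normalization or the
    // lossy conversion, so replace both cases explicitly.
    NSString *expanded =
        [input stringByReplacingOccurrencesOfString:@"Œ" withString:@"OE"];
    expanded =
        [expanded stringByReplacingOccurrencesOfString:@"œ" withString:@"oe"];

    NSData *ascii = [[expanded precomposedStringWithCompatibilityMapping]
                        dataUsingEncoding:NSASCIIStringEncoding
                     allowLossyConversion:YES];
    return [[NSString alloc] initWithData:ascii
                                 encoding:NSASCIIStringEncoding];
}
```

With this, ASCIIFoldedString(@"Sœur") gives "Soeur". For a case-insensitive search index, call lowercaseString on the result.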

answered Dec 01 '25 20:12 by Cœur