
How to handle non-standard Unicode characters in Core Data?

One string I came across on the web is this:

Rameeee! 👯👯

So it uses non-standard characters.

I tried to save that to Core Data:

    __block NSError *error = nil; // assumed: the declaration is not shown in the original snippet
    // The main parent context is not of NSMainQueueConcurrencyType, so blocking on it here is safe.
    NSManagedObjectContext *parentMoc = [self managedObjectContextMainContext];
    [parentMoc performBlockAndWait:^{
        if (![parentMoc save:&error])
        {
            CLog(@"Error in Saving %@", error); // handle error
        }
    }];
    NSAssert(error == nil, @"Error must be nil");

I got this error:

(lldb) po error
domain: @"NSCocoaErrorDomain" - code: 1671

Hmm... what should I do?

user4951 asked Jun 17 '14 10:06



1 Answer

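Error code 1671 is not documented. However, error codes 1660, 1670, and 1680 (NSValidationStringTooLongError, NSValidationStringTooShortError, and NSValidationStringPatternMatchingError) deal with string validation errors. So let's see what we can find...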

Valid strings work the same whether or not they contain emoji. As long as the string contains only valid characters, no special treatment is needed. The string that prompted this question, as posted, fits this description. This code works, and changes save without errors:

NSString *testNSString = @"Rameeee! 👯👯";
[newManagedObject setValue:testNSString forKey:@"name"];

A full round trip works exactly as expected, even displaying correctly in a UILabel in a text view cell.

[Screenshot: the string displayed in a UILabel]

So it's clear that the original question is leaving out crucial details somewhere, because the correct answer is that you don't need to do anything special to handle those characters. They just work.

The sample string from @DevFly provides a clue:

"\U05d4\U05d4\U05d9\U05ea\U05e8\U05d2\U05e9\U05d5\U05ea \U05db\U05dc \U05db\U05da \U05d2\U05d3\U05d5\U05dc\U05d4 \Ud83d"

You actually can't construct a string literal with these contents without significant difficulty. The compiler complains that the last character, \Ud83d, is an "invalid universal character", and compilation fails. The relevant code chart from unicode.org confirms this: \Ud83d is in the "high surrogate area" (U+D800 to U+DBFF), and the chart notes that:

Isolated surrogate code points have no interpretation; consequently, no character code charts or names lists are provided for this range.

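In other words, \Ud83d on its own is not a valid Unicode character. It does not represent any character and cannot be converted to encodings like UTF-8. A high surrogate is only meaningful in UTF-16 as the first half of a surrogate pair; for example, 👯 (U+1F46F) is stored as \Ud83d followed by \Udc6f. A lone \Ud83d at the end of a string therefore looks like the remains of a character that was cut in half.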
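To make the "isolated surrogate" idea concrete, here is a minimal sketch of a check for unpaired surrogates in a buffer of UTF-16 code units. It is plain C, so it should also compile inside an Objective-C file. The function name first_unpaired_surrogate is made up for this illustration and is not part of any Apple API; in a real app you would first copy the code units out of the NSString (for example with getCharacters:range:) before scanning them.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns the index of the first unpaired surrogate
   in a UTF-16 buffer, or -1 if every surrogate belongs to a valid
   high+low pair. */
static long first_unpaired_surrogate(const uint16_t *units, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        uint16_t u = units[i];
        if (u >= 0xD800 && u <= 0xDBFF) {
            /* High surrogate: must be followed by a low surrogate. */
            if (i + 1 < count && units[i + 1] >= 0xDC00 && units[i + 1] <= 0xDFFF) {
                i++; /* skip the low half of the pair */
            } else {
                return (long)i;
            }
        } else if (u >= 0xDC00 && u <= 0xDFFF) {
            /* Low surrogate with no high surrogate in front of it. */
            return (long)i;
        }
    }
    return -1;
}
```

Run against the code units of the string above, a check like this would stop at the trailing 0xD83D, while the same unit followed by 0xDC6F (the 👯 emoji) passes.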

If you drop the invalid character from the end, then just like above it works normally with no special handling:

char *testString = "\u05d4\u05d4\u05d9\u05ea\u05e8\u05d2\u05e9\u05d5\u05ea \u05db\u05dc \u05db\u05da \u05d2\u05d3\u05d5\u05dc\u05d4";
NSString *testNSString = [NSString stringWithUTF8String:testString];
[newManagedObject setValue:testNSString forKey:@"name"];

That saves without errors, and again makes a complete round trip and displays correctly in a UILabel:

[Screenshot: the string displayed in a UILabel]

To summarize:

  • This error means that you're somehow constructing a string that contains invalid bytes that don't represent any character.
  • This is not because the characters are Unicode; valid Unicode is fine. But not every numeric value represents a Unicode character, so it's possible to have a corrupt value that can't be used in a string.
  • Since neither @JimThio nor @DevFly nor @SharenEayrs seems to want to explain how they created their problematic byte vectors (I can't really call them "strings") it's impossible to say what originally caused the problem. But that data is corrupt, period, and it only looks like a Core Data problem because that's where you're using the data.
  • A likely cause is that at some point these strings were altered in code without considering that not every character uses the same number of bytes. Doing things like changing strings based on character indexes is likely to cause problems. It might be helpful to review Apple's "Characters and Grapheme Clusters" guide and maybe the NSHipster article on type encodings.
  • @mmarkov's suggestion of using NSData might work but probably not, unless you resort to bizarre code where you avoid using those bytes in a string at all (e.g. you don't use dataUsingEncoding: to convert to NSData). Even if it does, you would still have corrupt data, and it would bite you sooner or later.
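As an illustration of how index-based editing can produce this kind of corruption, and one way to avoid it, here is a small sketch in plain C (which also compiles as Objective-C). It assumes the string is being shortened to a fixed number of UTF-16 code units; nobody in this thread has confirmed that this is what actually happened. The name safe_truncated_length is hypothetical. With NSString itself, rangeOfComposedCharacterSequenceAtIndex: is the usual tool for finding a safe cut point.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns how many UTF-16 code units to keep so that
   the result is at most max_units long and does not end in the middle of
   a surrogate pair. */
static size_t safe_truncated_length(const uint16_t *units, size_t count, size_t max_units)
{
    if (count <= max_units) {
        return count; /* nothing to cut */
    }
    size_t keep = max_units;
    /* If the last kept unit is a high surrogate, its low half lies beyond
       the cut, so drop the high half as well. */
    if (keep > 0 && units[keep - 1] >= 0xD800 && units[keep - 1] <= 0xDBFF) {
        keep--;
    }
    return keep;
}
```

For "Rameeee! 👯👯", which is 13 UTF-16 code units long, cutting naively at 10 units would leave a lone 0xD83D at the end, exactly the kind of value seen in the failing string. The helper backs off to 9 units instead. Note that it only protects surrogate pairs; longer grapheme clusters such as emoji with skin-tone modifiers need the composed-character-sequence APIs.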

Update related to a string given in a comment:

NSString *testNSString = @"👦🏻 👧🏻 👨🏻 👩🏻 👮🏻 👰🏻 👱🏻 👲🏻 👳🏻 👴🏻 👵🏻 👶🏻 👷🏻 👸🏻 💂🏻 👼🏻 🎅🏻 🙇🏻 💁🏻 🙅🏻 🙆🏻 🙋🏻 🙎🏻 🙍🏻 💆🏻 💇🏻 💅🏻 👂🏻 👃🏻 👋🏻 👍🏻 👎🏻 ☝🏻 👆🏻 👇🏻 👈🏻 👉🏻 👌🏻 ✌🏻 👊🏻 ✊🏻 ✋🏻 💪🏻 👐🏻 🙌🏻 👏🏻 🙏🏻";
[newManagedObject setValue:testNSString forKey:@"name"];

Again this saves without errors, and comes back to the UI later exactly as shown above, including after killing the app and re-launching. If this is somehow breaking, it's not Core Data that's corrupting it.

Tom Harrington answered Oct 21 '22 05:10
