
Copyright/Registered symbol encoding not working

Tags:

ios

unicode

I’ve developed an iOS app that sends emojis from iOS to a web portal and vice versa. All emojis sent from iOS to the web portal display perfectly except © and ®.

Here is the emoji encoding piece of code.

NSData *data = [messageBody dataUsingEncoding:NSNonLossyASCIIStringEncoding]; 
NSString *encodedString = [[NSString alloc] initWithData:data encoding:NSUTF8StringEncoding];

// This code returns \251 and \256 for the copyright and registered symbols. These are not standard Unicode escapes, so the symbols don't display on the web portal.

So what should I do to convert them to standard Unicode escapes?

Test Run :

messageBody = @"Copy right symbol : © AND Registered Mark symbol : ®";

// The encoded string I get from the above encoding is

Copy right symbol : \\251 AND Registered Mark symbol : \\256

Whereas it should look like this (with standard Unicode escapes):

Copy right symbol : \\u00A9 AND Registered Mark symbol : \\u00AE
aqsa arshad asked Mar 16 '17 05:03



2 Answers

First, I will try to provide the solution. Then I will try to explain why.

Escaping non-ASCII chars

To escape Unicode chars in a string, you shouldn't rely on NSNonLossyASCIIStringEncoding. Below is the code that I use to escape every non-ASCII char in a string as \uXXXX:

// NSMutableString category
- (void)appendChar:(unichar)charToAppend {
    [self appendFormat:@"%C", charToAppend];
}

// NSString category
- (NSString *)UEscapedString {
    char const hexChar[] = "0123456789ABCDEF";
    NSMutableString *outputString = [NSMutableString string];
    for (NSInteger i = 0; i < self.length; i++) {
        unichar character = [self characterAtIndex:i];
        if ((character >> 7) > 0) {
            [outputString appendString:@"\\u"];
            [outputString appendChar:(hexChar[(character >> 12) & 0xF])]; // append the hex character for the left-most 4-bits
            [outputString appendChar:(hexChar[(character >> 8) & 0xF])];  // hex for the second group of 4-bits from the left
            [outputString appendChar:(hexChar[(character >> 4) & 0xF])];  // hex for the third group
            [outputString appendChar:(hexChar[character & 0xF])];         // hex for the last group, e.g., the right most 4-bits
        } else {
            [outputString appendChar:character];
        }
    }
    return [outputString copy];
}

(NOTE: I guess Jon Rose's method does the same but I didn't wanna share a method that I didn't test)

Now you have the following string: Copy right symbol : \u00A9 AND Registered Mark symbol : \u00AE
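For the opposite direction (portal to iOS), the portal would need to produce the same format. Here is a minimal sketch, assuming a Python backend (the question does not say what the portal runs on); `u_escape` is a hypothetical helper name. Like the category method above, it escapes UTF-16 code units, so an emoji outside the Basic Multilingual Plane becomes two \uXXXX escapes (a surrogate pair).

```python
import struct

def u_escape(text: str) -> str:
    """Escape every non-ASCII UTF-16 code unit as \\uXXXX (uppercase hex)."""
    raw = text.encode("utf-16-be")  # 2 bytes per code unit, no byte-order mark
    units = struct.unpack(">%dH" % (len(raw) // 2), raw)
    # ASCII units pass through unchanged; everything else is escaped.
    return "".join(chr(u) if u < 0x80 else "\\u%04X" % u for u in units)

print(u_escape("Copy right symbol : © AND Registered Mark symbol : ®"))
# prints: Copy right symbol : \u00A9 AND Registered Mark symbol : \u00AE
```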

Escaping unicode is done. Now let's convert it back to display the emojis.

Converting back

This is gonna be confusing at first but this is what it is:

NSData *data = [escapedString dataUsingEncoding:NSUTF8StringEncoding];
NSString *converted = [[NSString alloc] initWithData:data encoding:NSNonLossyASCIIStringEncoding];

Now you have your emojis (and other non-ASCIIs) back.

What is happening?

The problem

In your case, you are trying to create a common language between your server side and your app. However, NSNonLossyASCIIStringEncoding is a pretty bad choice for that purpose, because it is a black box created by Apple and we don't really know what exactly it does inside. As your output shows, it writes some non-ASCII characters, such as © and ®, as a backslash plus three digits (\251, \256), while it writes others, such as emojis, as \uXXXX. That is why you shouldn't rely on it to build a multi-platform system. There is no equivalent of it on backend platforms or Android.
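The \251 and \256 in the question are consistent with octal notation: 251 in octal is 169 (0xA9, the code point of ©) and 256 in octal is 174 (0xAE, ®). If the portal has to cope with strings that were already produced this way, the escapes can be rewritten on the server. This is a minimal sketch, assuming a Python backend and assuming the escapes are always a backslash plus exactly three octal digits; `octal_to_u_escapes` is a hypothetical helper name.

```python
import re

# A backslash followed by exactly three octal digits, e.g. \251.
OCTAL_ESCAPE = re.compile(r"\\([0-3][0-7]{2})")

def octal_to_u_escapes(text: str) -> str:
    """Rewrite \\ooo octal escapes as \\uXXXX escapes."""
    return OCTAL_ESCAPE.sub(lambda m: "\\u%04X" % int(m.group(1), 8), text)

print(octal_to_u_escapes(r"Copy right symbol : \251 AND Registered Mark symbol : \256"))
# prints: Copy right symbol : \u00A9 AND Registered Mark symbol : \u00AE
```

The sketch does not distinguish a real escape from a literal backslash that happens to be followed by digits, so it is only safe if the message text cannot contain such sequences.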

Mysteriously, NSNonLossyASCIIStringEncoding can still convert \u00AE back to ®, even though it encodes ® as \256 in the first place. I'm sure there are tools on other platforms to convert \uXXXX into Unicode characters, so that shouldn't be a problem for you.
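As one illustration of such a tool, here is a minimal sketch for a Python backend (an assumption, since the portal's platform is not stated); `u_unescape` is a hypothetical helper name. Because the escapes are UTF-16 code units, an emoji arrives as two escapes, and the sketch joins them back into one character.

```python
import re

# A backslash, "u", and exactly four hex digits, e.g. \u00A9.
U_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})")

def u_unescape(text: str) -> str:
    """Turn \\uXXXX escapes back into characters, joining surrogate pairs."""
    units = U_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)
    # An emoji arrives as two surrogate code units; a UTF-16 round trip
    # merges each pair into a single code point.
    return units.encode("utf-16", "surrogatepass").decode("utf-16")

print(u_unescape(r"Copy right symbol : \u00A9 AND Registered Mark symbol : \u00AE"))
# prints: Copy right symbol : © AND Registered Mark symbol : ®
```

An unpaired surrogate escape makes the final decode raise UnicodeDecodeError, which may be the desired behavior for malformed input.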

Mert Buran answered Nov 14 '22 19:11


messageBody is a string, so there is no reason to convert it to data only to convert it back to a string. Replace your code with

NSString *encodedString = messageBody;

If the messageBody object is incorrect, then the way to fix it is to change the way it was created. The server sends data, not strings. The data that the server sends is encoded in some agreed-upon way. Generally this encoding is UTF-8. If you know the encoding, you can convert the data to a string; if you don't, then the data is gibberish that cannot be read. If the messageBody is incorrect, the problem occurred when it was converted from the data that the server sent. It seems likely that you are parsing it with the incorrect encoding.

The code you posted is just plain wrong. It converts a string to data using one encoding (ASCII) and then reads that data with a different encoding (UTF-8). That is like translating a book to Spanish and then having a Portuguese speaker translate it back - it might work for some words, but it is still wrong.
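The general hazard of decoding bytes with a different encoding than the one that produced them can be seen in a few lines. This is only an illustrative sketch in Python, not the code path from the question; Latin-1 is chosen here as an arbitrary example of a mismatched encoding.

```python
text = "©"
utf8_bytes = text.encode("utf-8")      # two bytes: 0xC2 0xA9
wrong = utf8_bytes.decode("latin-1")   # each byte read as its own character
right = utf8_bytes.decode("utf-8")     # same encoding on both sides

print(utf8_bytes)  # prints: b'\xc2\xa9'
print(wrong)       # prints: Â©
print(right)       # prints: ©
```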

If you are still having trouble then you should share the code of where messageBody is created.

If your server expects an ASCII string with all Unicode characters changed to \u00xx, then you should first yell at your server guy because he is an idiot. But if that doesn't work, you can use the following code:

NSString* messageBody = @"Copy right symbol : © AND Registered Mark symbol : ®";
NSData* utf32Data = [messageBody dataUsingEncoding:NSUTF32StringEncoding];
uint32_t *bytes = (uint32_t *) [utf32Data bytes];
NSMutableString* escapedString = [[NSMutableString alloc] init];
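// Start at 1: the first 4 bytes are the byte-order mark (endianness)
for (NSUInteger index = 1; index < utf32Data.length / 4; index++) {
    uint32_t charValue = bytes[index];
    if (charValue <= 127) {
        // Plain ASCII: append unchanged
        [escapedString appendFormat:@"%C", (unichar)charValue];
    } else {
        // Non-ASCII: append two backslashes, "u", and the hex code point
        // (code points above U+FFFF print more than four hex digits)
        [escapedString appendFormat:@"\\\\u%04X", charValue];
    }
}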
Jon Rose answered Nov 14 '22 18:11