What's the CFString Equiv of NSString's UTF8String?

Tags:

objective-c

iphone

I'm stuck on stoopid today as I can't convert a simple piece of ObjC code to its Cpp equivalent. I have this:

  const UInt8 *myBuffer = [(NSString*)aRequest UTF8String];

And I'm trying to replace it with this:

  const UInt8 *myBuffer = (const UInt8 *)CFStringGetCStringPtr(aRequest, kCFStringEncodingUTF8);

This is all in a tight unit test that writes an example HTTP request over a socket with CFNetwork APIs. I have working ObjC code that I'm trying to port to C++. I'm gradually replacing NS API calls with their toll-free bridged equivalents. Everything has been one for one so far until this last line. This is the last piece that needs to be completed.


asked Oct 22 '09 19:10

Cliff

2 Answers

This is one of those things where Cocoa does all the messy stuff behind the scenes, and you never really appreciate just how complicated things can be until you have to roll up your sleeves and do it yourself.

The simple answer for why it's not 'simple' is that NSString (and CFString) handle all the complicated details of multiple character sets, Unicode, etc, while presenting a simple, uniform API for manipulating strings. It's object orientation at its best: the details of 'how' (NS|CF)String deals with strings that have different string encodings (UTF8, MacRoman, UTF16, ISO 2022 Japanese, etc) are a private implementation detail. It all 'just works'.

It helps to understand how [@"..." UTF8String] works. This is a private implementation detail, so this isn't gospel, but it is based on observed behavior. When you send a string a UTF8String message, the string does something approximating the following (not actually tested, so consider it pseudo-code; there are simpler ways to do the exact same thing, so this is overly verbose):

- (const char *)UTF8String
{
  NSUInteger utf8Length = [self lengthOfBytesUsingEncoding:NSUTF8StringEncoding];
  NSMutableData *utf8Data = [NSMutableData dataWithLength:utf8Length + 1UL];
  char *utf8Bytes = [utf8Data mutableBytes];
  [self     getBytes:utf8Bytes
           maxLength:utf8Length
          usedLength:NULL
            encoding:NSUTF8StringEncoding
             options:0UL
               range:NSMakeRange(0UL, [self length])
      remainingRange:NULL];
  return(utf8Bytes);
}

You don't have to worry about the memory management issues of dealing with the buffer that -UTF8String returns because the NSMutableData is autoreleased.

A string object is free to keep the contents of the string in whatever form it wants, so there's no guarantee that its internal representation is the one that would be most convenient for your needs (in this case, UTF8). If you're using just plain C, you're going to have to deal with managing some memory to hold any string conversions that might be required. What was once a simple -UTF8String method call is now much, much more complicated.

Most of NSString is actually implemented in/with CoreFoundation / CFString, so there's obviously a path from a CFStringRef -> -UTF8String. It's just not as neat and simple as NSString's -UTF8String. Most of the complication is with memory management. Here's how I've tackled it in the past:

void someFunction(void) {
  CFStringRef cfString; // Assumes 'cfString' points to a (NS|CF)String.

  const char *useUTF8StringPtr = NULL;
  UInt8 *freeUTF8StringPtr = NULL;

  CFIndex stringLength = CFStringGetLength(cfString), usedBytes = 0L;

  // Fast path: borrow CFString's internal buffer when it is already UTF8 compatible.
  if((useUTF8StringPtr = CFStringGetCStringPtr(cfString, kCFStringEncodingUTF8)) == NULL) {
    // Slow path: transcode into our own buffer. The cast keeps this valid as C++ too.
    // NOTE: stringLength counts UTF-16 code units, not UTF8 bytes, so a string with
    // non-ASCII characters will not fit completely in a buffer of this size.
    if((freeUTF8StringPtr = (UInt8 *)malloc(stringLength + 1L)) != NULL) {
      CFStringGetBytes(cfString, CFRangeMake(0L, stringLength), kCFStringEncodingUTF8, '?', false, freeUTF8StringPtr, stringLength, &usedBytes);
      freeUTF8StringPtr[usedBytes] = 0;
      useUTF8StringPtr = (const char *)freeUTF8StringPtr;
    }
  }

  long utf8Length = (long)((freeUTF8StringPtr != NULL) ? usedBytes : stringLength);

  if(useUTF8StringPtr != NULL) {
    // useUTF8StringPtr points to a NUL terminated UTF8 encoded string.
    // utf8Length contains the length of the UTF8 string.

    // ... do something with useUTF8StringPtr ...
  }

  // Only the slow path allocated memory, so only then is there something to free.
  if(freeUTF8StringPtr != NULL) { free(freeUTF8StringPtr); freeUTF8StringPtr = NULL; }
}

NOTE: I haven't tested this code, but it is modified from working code. So, aside from obvious errors, I believe it should work.

The above tries to get the pointer to the buffer that CFString uses to store the contents of the string. If CFString happens to have the string contents encoded in UTF8 (or a suitably compatible encoding, such as ASCII), then it's likely CFStringGetCStringPtr() will return non-NULL. This is obviously the best, and fastest, case. If it can't get that pointer for some reason, say if CFString has its contents encoded in UTF16, then it allocates a buffer with malloc() to hold the string when it is transcoded to UTF8. Then, at the end of the function, it checks to see if memory was allocated and free()s it if necessary.
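The borrow-or-allocate pattern described above can be sketched without CoreFoundation. This is only an illustration under stated assumptions, not the code from the answer: `Utf8View`, `encode_utf16_as_utf8`, `utf8_view_make`, and `utf8_view_free` are hypothetical names, `fast_ptr` stands in for whatever `CFStringGetCStringPtr()` returned (possibly NULL), and the UTF-16 array stands in for the string's contents. Unpaired surrogates are written as '?', mirroring the loss byte used above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical view over UTF-8 text: 'ptr' is what callers read;
   'owned' is non-NULL only when a converted copy had to be allocated. */
typedef struct {
    const char *ptr;
    char *owned;
    size_t length; /* bytes, excluding the terminating NUL */
} Utf8View;

/* Encodes UTF-16 code units as UTF-8 into 'out', which must hold
   3 * count + 1 bytes. Returns the number of bytes written (no NUL). */
static size_t encode_utf16_as_utf8(const uint16_t *units, size_t count, char *out)
{
    size_t used = 0;
    for (size_t i = 0; i < count; i++) {
        uint32_t cp = units[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < count &&
            units[i + 1] >= 0xDC00 && units[i + 1] <= 0xDFFF) {
            /* Valid surrogate pair: combine into one code point. */
            cp = 0x10000 + ((cp - 0xD800) << 10) + (units[i + 1] - 0xDC00);
            i++;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = '?'; /* unpaired surrogate */
        }
        if (cp < 0x80) {
            out[used++] = (char)cp;
        } else if (cp < 0x800) {
            out[used++] = (char)(0xC0 | (cp >> 6));
            out[used++] = (char)(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out[used++] = (char)(0xE0 | (cp >> 12));
            out[used++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            out[used++] = (char)(0x80 | (cp & 0x3F));
        } else {
            out[used++] = (char)(0xF0 | (cp >> 18));
            out[used++] = (char)(0x80 | ((cp >> 12) & 0x3F));
            out[used++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            out[used++] = (char)(0x80 | (cp & 0x3F));
        }
    }
    out[used] = '\0';
    return used;
}

/* Borrows 'fast_ptr' when it is available, otherwise allocates and converts. */
static Utf8View utf8_view_make(const char *fast_ptr, const uint16_t *units, size_t count)
{
    Utf8View view = { NULL, NULL, 0 };
    if (fast_ptr != NULL) {
        view.ptr = fast_ptr; /* fast path: nothing to free later */
        view.length = strlen(fast_ptr);
        return view;
    }
    view.owned = malloc(count * 3 + 1); /* worst case: 3 bytes per code unit */
    if (view.owned != NULL) {
        view.length = encode_utf16_as_utf8(units, count, view.owned);
        view.ptr = view.owned;
    }
    return view;
}

/* Releases the copy if one was made; safe to call on a borrowed view. */
static void utf8_view_free(Utf8View *view)
{
    assert(view != NULL);
    free(view->owned);
    view->owned = NULL;
    view->ptr = NULL;
    view->length = 0;
}
```

Keeping the borrowed pointer and the owned pointer in separate fields is what lets the cleanup be unconditional: `free(NULL)` is a no-op, so the caller never has to remember which path was taken.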

And now for a few tips and tricks... CFString 'tends to' (and this is a private implementation detail, so it can and does change between releases) keep 'simple' strings encoded as MacRoman, which is an 8-bit wide encoding. MacRoman, like UTF8, is a superset of ASCII, such that all characters < 128 are equivalent to their ASCII counterparts (or, in other words, any character < 128 is ASCII). In MacRoman, characters >= 128 are 'special' characters. They all have Unicode equivalents, and tend to be things like extra currency symbols and 'extended western' characters. See the Wikipedia article on MacRoman for more info. But just because a CFString says it's MacRoman (CFString encoding value of kCFStringEncodingMacRoman, NSString encoding value of NSMacOSRomanStringEncoding) doesn't mean that it has characters >= 128 in it. If a kCFStringEncodingMacRoman encoded string returned by CFStringGetCStringPtr() is composed entirely of characters < 128, then it is exactly equivalent to its ASCII (kCFStringEncodingASCII) encoded representation, which is also exactly equivalent to the string's UTF8 (kCFStringEncodingUTF8) encoded representation.

Depending on your requirements, you may be able to 'get by' using kCFStringEncodingMacRoman instead of kCFStringEncodingUTF8 when calling CFStringGetCStringPtr(). Even if you require strict UTF8 encoding for your strings, things 'may' (probably) be faster if you ask for kCFStringEncodingMacRoman and then check that the string returned by CFStringGetCStringPtr(string, kCFStringEncodingMacRoman) only contains characters that are < 128. If there are characters >= 128 in the string, then go the slow route by malloc()ing a buffer to hold the converted results. Example:

CFIndex stringLength = CFStringGetLength(cfString), usedBytes = 0L;

// Ask for the MacRoman buffer rather than the UTF8 one.
useUTF8StringPtr = CFStringGetCStringPtr(cfString, kCFStringEncodingMacRoman);

// A MacRoman buffer is only valid UTF8 if every byte is < 128, so reject it otherwise.
// The cast matters: plain 'char' may be signed, in which case it is never >= 128.
for(CFIndex idx = 0L; (useUTF8StringPtr != NULL) && (useUTF8StringPtr[idx] != 0); idx++) {
  if((unsigned char)useUTF8StringPtr[idx] >= 128) { useUTF8StringPtr = NULL; }
}

if((useUTF8StringPtr == NULL) && ((freeUTF8StringPtr = (UInt8 *)malloc(stringLength + 1L)) != NULL)) {
  CFStringGetBytes(cfString, CFRangeMake(0L, stringLength), kCFStringEncodingUTF8, '?', false, freeUTF8StringPtr, stringLength, &usedBytes);
  freeUTF8StringPtr[usedBytes] = 0;
  useUTF8StringPtr = (const char *)freeUTF8StringPtr;
}
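The 'only characters < 128' test can also be pulled out into a small helper, which makes it easy to check on its own. This is a sketch in plain C with no CoreFoundation dependency; `is_all_ascii` is a hypothetical name, and the input is assumed to be a NUL terminated 8-bit buffer such as the one `CFStringGetCStringPtr()` hands back. The bytes are read as `unsigned char` because plain `char` may be signed, in which case a comparison against 128 would never be true.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true when every byte before the terminating NUL is < 128,
   i.e. the buffer is pure ASCII and therefore also valid UTF-8. */
static bool is_all_ascii(const char *bytes)
{
    assert(bytes != NULL);
    for (const unsigned char *p = (const unsigned char *)bytes; *p != 0; p++) {
        if (*p >= 128) {
            return false;
        }
    }
    return true;
}
```

A caller would keep the borrowed pointer when this returns true and fall back to the malloc() route otherwise.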

Like I said, you don't really appreciate just how much work Cocoa does for you automatically until you have to do it all yourself. :)


answered Nov 11 '22 04:11

johne

In the sample code above, the following appears:

CFIndex stringLength = CFStringGetLength(cfString)

stringLength is then being used to malloc() a temporary buffer of that many bytes, plus 1.

But the header file for CFStringGetLength() expressly says it returns the number of 16-bit Unicode characters, not bytes. So if some of those Unicode characters are outside the ASCII range, the malloc() buffer won't be long enough to hold the UTF-8 conversion of the string.

Perhaps I'm missing something, but to be absolutely safe, the number of bytes needed to hold N arbitrary Unicode characters is at most 4*N, when they're all converted to UTF-8.
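To make the sizing concern concrete, here is a small sketch in plain C that counts the UTF-8 bytes needed for an array of UTF-16 code units. The function names are hypothetical, and the array stands in for the contents of a CFString. When the count is taken in UTF-16 code units, as CFStringGetLength() reports it, 3 bytes per unit is already enough: a character that needs 4 UTF-8 bytes occupies two code units (a surrogate pair). So 4 bytes per unit is safe but more than strictly required. CoreFoundation also provides CFStringGetMaximumSizeForEncoding() for computing such an upper bound.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Counts the UTF-8 bytes needed for 'count' UTF-16 code units (terminator
   excluded). An unpaired surrogate is counted as 3 bytes, the size of U+FFFD. */
static size_t utf8_length_of_utf16(const uint16_t *units, size_t count)
{
    assert(units != NULL || count == 0);
    size_t bytes = 0;
    for (size_t i = 0; i < count; i++) {
        uint16_t u = units[i];
        if (u < 0x80) {
            bytes += 1; /* ASCII */
        } else if (u < 0x800) {
            bytes += 2;
        } else if (u >= 0xD800 && u <= 0xDBFF && i + 1 < count &&
                   units[i + 1] >= 0xDC00 && units[i + 1] <= 0xDFFF) {
            bytes += 4; /* surrogate pair: two code units, four bytes */
            i++;
        } else {
            bytes += 3;
        }
    }
    return bytes;
}

/* Worst-case buffer size for 'count' code units, including the NUL. */
static size_t utf8_buffer_size_for_utf16(size_t count)
{
    return count * 3 + 1;
}
```

For example, the five code units for "a", "é", "€" and one emoji encoded as a surrogate pair need 1 + 2 + 3 + 4 = 10 bytes, which a buffer of 5 + 1 bytes cannot hold.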

answered Nov 11 '22 04:11

Doug
