
NSCharacterSet uses ints but I need unsigned short?

I am using MWFeedParser to add a feed to my app. The framework parses dates, and it produces a few warnings, mainly because of its older style of code.

There are 4 warnings left, all of them the same. Technically I can change the code so that the warnings go away, but then the app stops working properly.

The code in question is:

    // Character sets
NSCharacterSet *stopCharacters = [NSCharacterSet characterSetWithCharactersInString:[NSString stringWithFormat:@"< \t\n\r%C%C%C%C", 0x0085, 0x000C, 0x2028, 0x2029]];

The part that triggers the warning is:

\t\n\r%C%C%C%C", 0x0085, 0x000C, 0x2028, 0x2029]];

The warning is:

Format specifies type 'unsigned short' but the argument has type 'int'

So I changed it to:

\t\n\r%i%i%i%i", 0x0085, 0x000C, 0x2028, 0x2029]];

which indeed removed the warnings and gave me perfect code:-) (no warnings or errors)

When I then ran the app, it did not parse the date and it was not able to open the link. I am not sure if this is a C thing, but right now it is definitely outside my area of knowledge. Can anyone help me fix the warning and still have the app working?

Thank you in advance:-)

EDIT

     - (NSString *)stringByConvertingHTMLToPlainText {

// Pool
NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];

// Character sets
NSCharacterSet *stopCharacters = [NSCharacterSet characterSetWithCharactersInString:@"< \t\n\r\x0085\x000C\u2028\u2029"];    
NSCharacterSet *newLineAndWhitespaceCharacters = [NSCharacterSet characterSetWithCharactersInString:@"< \t\n\r\205\014\u2028\u2029"];


NSCharacterSet *tagNameCharacters = [NSCharacterSet characterSetWithCharactersInString:@"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"];

// Scan and find all tags
NSMutableString *result = [[NSMutableString alloc] initWithCapacity:self.length];
NSScanner *scanner = [[NSScanner alloc] initWithString:self];
[scanner setCharactersToBeSkipped:nil];
[scanner setCaseSensitive:YES];
NSString *str = nil, *tagName = nil;
BOOL dontReplaceTagWithSpace = NO;
do {

    // Scan up to the start of a tag or whitespace
    if ([scanner scanUpToCharactersFromSet:stopCharacters intoString:&str]) {
        [result appendString:str];
        str = nil; // reset
    }

    // Check if we've stopped at a tag/comment or whitespace
    if ([scanner scanString:@"<" intoString:NULL]) {

        // Stopped at a comment or tag
        if ([scanner scanString:@"!--" intoString:NULL]) {

            // Comment
            [scanner scanUpToString:@"-->" intoString:NULL]; 
            [scanner scanString:@"-->" intoString:NULL];

        } else {

            // Tag - remove and replace with space unless it's
            // a closing inline tag then dont replace with a space
            if ([scanner scanString:@"/" intoString:NULL]) {

                // Closing tag - replace with space unless it's inline
                tagName = nil; dontReplaceTagWithSpace = NO;
                if ([scanner scanCharactersFromSet:tagNameCharacters intoString:&tagName]) {
                    tagName = [tagName lowercaseString];
                    dontReplaceTagWithSpace = ([tagName isEqualToString:@"a"] ||
                                               [tagName isEqualToString:@"b"] ||
                                               [tagName isEqualToString:@"i"] ||
                                               [tagName isEqualToString:@"q"] ||
                                               [tagName isEqualToString:@"span"] ||
                                               [tagName isEqualToString:@"em"] ||
                                               [tagName isEqualToString:@"strong"] ||
                                               [tagName isEqualToString:@"cite"] ||
                                               [tagName isEqualToString:@"abbr"] ||
                                               [tagName isEqualToString:@"acronym"] ||
                                               [tagName isEqualToString:@"label"]);
                }

                // Replace tag with string unless it was an inline
                if (!dontReplaceTagWithSpace && result.length > 0 && ![scanner isAtEnd]) [result appendString:@" "];

            }

            // Scan past tag
            [scanner scanUpToString:@">" intoString:NULL];
            [scanner scanString:@">" intoString:NULL];

        }

    } else {

        // Stopped at whitespace - replace all whitespace and newlines with a space
        if ([scanner scanCharactersFromSet:newLineAndWhitespaceCharacters intoString:NULL]) {
            if (result.length > 0 && ![scanner isAtEnd]) [result appendString:@" "]; // Dont append space to beginning or end of result
        }

    }

} while (![scanner isAtEnd]);

// Cleanup
[scanner release];

// Decode HTML entities and return
NSString *retString = [[result stringByDecodingHTMLEntities] retain];
[result release];

// Drain
[pool drain];

// Return
return [retString autorelease];

}

asked Nov 25 '12 03:11 by jwknz


2 Answers

This is a total mess

This is a total mess because you are running into a compiler bug and an arbitrary limitation in the C spec.

Scroll to the bottom for the fix.

Compiler warning

Format specifies type 'unsigned short' but the argument has type 'int'

My conclusion is that this is a compiler bug in Clang. It is definitely safe to ignore this warning, because (unsigned short) arguments are always promoted to (int) before they are passed to vararg functions anyway. This is all stuff that is in the C standard (and it applies to Objective C, too).

printf("%hd", 1); // Clang generates warning. GCC does not.
                  // Clang is wrong, GCC is right.

printf("%hd", 1 << 16); // Clang generates warning.  GCC does not.
                        // Clang is right, GCC is wrong.

The problem here is that neither compiler looks deep enough.

Remember, it is actually impossible to pass a short to printf(), because it must get promoted to int. GCC never gives a warning for constants; Clang ignores the fact that you are passing a constant and always gives a warning because the type is wrong. Both options are wrong.
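
The promotion described above can be observed directly in plain C. The following is a minimal sketch, not part of MWFeedParser or Foundation: `sum_promoted` is a hypothetical helper, and it assumes a platform where `int` is wider than `unsigned short` (true on iOS and macOS). It reads every variadic argument as `int`, which is the only type a promoted `unsigned short` can arrive as.

```c
#include <stdarg.h>

/* Hypothetical helper: sums `count` variadic arguments.
   Each argument is fetched as int, because an unsigned short is
   promoted to int before a variadic function receives it
   (assuming int is wider than unsigned short). */
int sum_promoted(int count, ...)
{
    va_list ap;
    int total = 0;

    va_start(ap, count);
    for (int i = 0; i < count; i++) {
        total += va_arg(ap, int);   /* never va_arg(ap, unsigned short) */
    }
    va_end(ap);
    return total;
}
```

Calling `sum_promoted(2, (unsigned short)0x0085, (unsigned short)0x000C)` and `sum_promoted(2, 0x0085, 0x000C)` should give the same result, which is why the warning is harmless for small constants like these.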

I suspect nobody has noticed because -- why would you be passing a constant expression to printf() anyway?

In the short term, you can use the following hack:

#pragma GCC diagnostic ignored "-Wformat"

Universal character names

You can use \uXXXX notation. Except you can't, because the compiler won't let you use U+0085 this way. Why? See § 6.4.3 of C99:

A universal character name shall not specify a character whose short identifier is less than 00A0 other than 0024 ($), 0040 (@), or 0060 (`), nor one in the range D800 through DFFF inclusive.

This rules out \u0085.

There is a proposal to fix this part of the spec.

The fix

You really want a constant string, don't you? Use this:

[NSCharacterSet characterSetWithCharactersInString:
  @"\t\n\r\xc2\x85\x0c\u2028\u2029"]

This relies on the fact that the source encoding is UTF-8. Don't worry, that's not going to change any time soon.

The \xc2\x85 in the string is the UTF-8 encoding of U+0085. The appearance of 85 in both is a coincidence.
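
To check where `\xc2\x85` comes from, here is a small sketch of UTF-8 encoding for code points up to U+FFFF. The function name `utf8_encode_bmp` is made up for this illustration, and it does not reject surrogates; it only shows the bit layout.

```c
#include <stddef.h>

/* Hypothetical helper: encodes a code point in U+0000..U+FFFF as UTF-8.
   Writes 1 to 3 bytes into out and returns the byte count,
   or 0 if cp is outside the supported range. */
size_t utf8_encode_bmp(unsigned int cp, unsigned char out[3])
{
    if (cp < 0x80) {                 /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {              /* 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    return 0;                        /* not handled in this sketch */
}
```

For U+0085 this yields the two bytes C2 85, and for U+2028 the three bytes E2 80 A8. The low six bits of 0x85 are 0x05, and adding the 0x80 continuation marker gives 0x85 again, which explains the coincidence.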

answered Nov 15 '22 05:11 by Dietrich Epp


The problem is that 0x0085, etc. are int literals. So they don't match the %C format specifier, which expects a unichar, which is an unsigned short.

There's no direct way to specify a literal short in C and I'm not aware of any Objective-C extension. But you can use a brute-force approach:

NSCharacterSet *stopCharacters =
         [NSCharacterSet characterSetWithCharactersInString:
                  [NSString stringWithFormat:@"< \t\n\r%C%C%C%C", 
                               (unichar)0x0085, (unichar)0x000C,
                               (unichar)0x2028, (unichar)0x2029]];
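
It may also help to see why swapping %C for %i in the question silenced the warning but broke parsing: %i writes the number as decimal digits instead of inserting the character with that code. The sketch below is plain C, and it uses C's `%c` as a stand-in for Foundation's `%C`, which is an assumption made so the example compiles without Foundation. Both function names are hypothetical.

```c
#include <stdio.h>

/* Formats code with "%i": the output is the decimal digits of the number. */
int format_as_number(char *buf, size_t size, int code)
{
    return snprintf(buf, size, "%i", code);
}

/* Formats code with "%c": the output is the single character with that code. */
int format_as_char(char *buf, size_t size, int code)
{
    return snprintf(buf, size, "%c", code);
}
```

With `code` set to 0x000C, `format_as_number` produces the two characters "12", while `format_as_char` produces a single form feed. With %i, the character set in the question therefore ended up containing digits rather than the intended whitespace characters, so the scanner stopped in the wrong places.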

answered Nov 15 '22 05:11 by Tommy