I am using MWFeedParser to add a feed to my app. The framework parses dates, and it produces a few warnings, mainly because of older-style code.
There are 4 warnings left, all of the same kind. Technically I can change the code so that the warnings go away, but then the app no longer works properly.
The code in question is:
// Character sets
NSCharacterSet *stopCharacters = [NSCharacterSet characterSetWithCharactersInString:[NSString stringWithFormat:@"< \t\n\r%C%C%C%C", 0x0085, 0x000C, 0x2028, 0x2029]];
Now the bit that is the warning is:
\t\n\r%C%C%C%C", 0x0085, 0x000C, 0x2028, 0x2029]];
The warning is:
Format specifies type 'unsigned short' but the argument has type 'int'
So I changed it to:
\t\n\r%i%i%i%i", 0x0085, 0x000C, 0x2028, 0x2029]];
which indeed removed the warnings and gave me perfect code:-) (no warnings or errors)
When I then ran the app, it did not parse the date and it was not able to open the link. I am not sure if this is a C thing, but right now it is definitely outside my field of knowledge. Can anyone help me get rid of the warnings and still have the app working?
Thank you in advance:-)
- (NSString *)stringByConvertingHTMLToPlainText {
// Pool
NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];
// Character sets
NSCharacterSet *stopCharacters = [NSCharacterSet characterSetWithCharactersInString:@"< \t\n\r\x0085\x000C\u2028\u2029"];
NSCharacterSet *newLineAndWhitespaceCharacters = [NSCharacterSet characterSetWithCharactersInString:@"< \t\n\r\205\014\u2028\u2029"];
NSCharacterSet *tagNameCharacters = [NSCharacterSet characterSetWithCharactersInString:@"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"];
// Scan and find all tags
NSMutableString *result = [[NSMutableString alloc] initWithCapacity:self.length];
NSScanner *scanner = [[NSScanner alloc] initWithString:self];
[scanner setCharactersToBeSkipped:nil];
[scanner setCaseSensitive:YES];
NSString *str = nil, *tagName = nil;
BOOL dontReplaceTagWithSpace = NO;
do {
// Scan up to the start of a tag or whitespace
if ([scanner scanUpToCharactersFromSet:stopCharacters intoString:&str]) {
[result appendString:str];
str = nil; // reset
}
// Check if we've stopped at a tag/comment or whitespace
if ([scanner scanString:@"<" intoString:NULL]) {
// Stopped at a comment or tag
if ([scanner scanString:@"!--" intoString:NULL]) {
// Comment
[scanner scanUpToString:@"-->" intoString:NULL];
[scanner scanString:@"-->" intoString:NULL];
} else {
// Tag - remove and replace with space unless it's
// a closing inline tag then dont replace with a space
if ([scanner scanString:@"/" intoString:NULL]) {
// Closing tag - replace with space unless it's inline
tagName = nil; dontReplaceTagWithSpace = NO;
if ([scanner scanCharactersFromSet:tagNameCharacters intoString:&tagName]) {
tagName = [tagName lowercaseString];
dontReplaceTagWithSpace = ([tagName isEqualToString:@"a"] ||
[tagName isEqualToString:@"b"] ||
[tagName isEqualToString:@"i"] ||
[tagName isEqualToString:@"q"] ||
[tagName isEqualToString:@"span"] ||
[tagName isEqualToString:@"em"] ||
[tagName isEqualToString:@"strong"] ||
[tagName isEqualToString:@"cite"] ||
[tagName isEqualToString:@"abbr"] ||
[tagName isEqualToString:@"acronym"] ||
[tagName isEqualToString:@"label"]);
}
// Replace tag with string unless it was an inline
if (!dontReplaceTagWithSpace && result.length > 0 && ![scanner isAtEnd]) [result appendString:@" "];
}
// Scan past tag
[scanner scanUpToString:@">" intoString:NULL];
[scanner scanString:@">" intoString:NULL];
}
} else {
// Stopped at whitespace - replace all whitespace and newlines with a space
if ([scanner scanCharactersFromSet:newLineAndWhitespaceCharacters intoString:NULL]) {
if (result.length > 0 && ![scanner isAtEnd]) [result appendString:@" "]; // Dont append space to beginning or end of result
}
}
} while (![scanner isAtEnd]);
// Cleanup
[scanner release];
// Decode HTML entities and return
NSString *retString = [[result stringByDecodingHTMLEntities] retain];
[result release];
// Drain
[pool drain];
// Return
return [retString autorelease];
}
The reason this is a total mess is because you are running into a compiler bug and an arbitrary limitation in the C spec.
Scroll to the bottom for the fix.
Format specifies type 'unsigned short' but the argument has type 'int'
My conclusion is that this is a compiler bug in Clang. It is definitely safe to ignore this warning, because (unsigned short) arguments are always promoted to (int) before they are passed to vararg functions anyway. This is all stuff that is in the C standard (and it applies to Objective-C, too).
printf("%hd", 1); // Clang generates warning. GCC does not.
// Clang is wrong, GCC is right.
printf("%hd", 1 << 16); // Clang generates warning. GCC does not.
// Clang is right, GCC is wrong.
The problem here is that neither compiler looks deep enough.
Remember, it is actually impossible to pass a short to printf(), because it must get promoted to int. GCC never gives a warning for constants; Clang ignores the fact that you are passing a constant and always gives a warning because the type is wrong. Both options are wrong.
I suspect nobody has noticed because, well, why would you be passing a constant expression to printf() anyway?
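The promotion rule described here can be checked in plain C. The following is only a minimal sketch, not code from MWFeedParser: `sum_promoted` is a hypothetical helper, and it assumes `int` is wider than `short`, as it is on Apple platforms. It shows that an `unsigned short` argument reaches a variadic function as an `int`, which is why the callee reads it back with `va_arg(args, int)`.

```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical helper (not from MWFeedParser): sums `count` variadic
   arguments, reading each one back as int. */
static int sum_promoted(int count, ...) {
    va_list args;
    int total = 0;

    assert(count >= 0);
    va_start(args, count);
    for (int i = 0; i < count; i++) {
        /* Default argument promotions have already widened any
           unsigned short to int by the time it gets here, so int is
           the type to read back (assuming int is wider than short). */
        total += va_arg(args, int);
    }
    va_end(args);
    return total;
}
```

Calling `sum_promoted(2, (unsigned short)0x0085, 0x000C)` gives the same result whether or not the first argument carries the cast, because both forms arrive as `int`.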
In the short term, you can use the following hack:
#pragma GCC diagnostic ignored "-Wformat"
You can use \uXXXX notation. Except you can't, because the compiler won't let you use U+0085 this way. Why? See § 6.4.3 of C99:
"A universal character name shall not specify a character whose short identifier is less than 00A0 other than 0024 ($), 0040 (@), or 0060 (`), nor one in the range D800 through DFFF inclusive."
This rules out \u0085.
There is a proposal to fix this part of the spec.
You really want a constant string, don't you? Use this:
[NSCharacterSet characterSetWithCharactersInString:
@"\t\n\r\xc2\x85\x0c\u2028\u2029"]
This relies on the fact that the source encoding is UTF-8. Don't worry, that's not going to change any time soon.
The \xc2\x85 in the string is the UTF-8 encoding of U+0085. The appearance of 85 in both is a coincidence.
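To see where the bytes \xc2\x85 come from, here is a small sketch of UTF-8 encoding in plain C. `utf8_encode_bmp` is a hypothetical helper written for illustration. It only handles code points up to U+FFFF and does not reject surrogates, so treat it as a demonstration rather than a full encoder.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encodes a code point up to U+FFFF as UTF-8.
   Writes 1 to 3 bytes into `out` and returns how many were written.
   Surrogates (U+D800..U+DFFF) are not rejected in this sketch. */
static size_t utf8_encode_bmp(uint32_t cp, unsigned char out[3]) {
    assert(cp <= 0xFFFF);
    if (cp < 0x80) {
        /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    /* 1110xxxx 10xxxxxx 10xxxxxx */
    out[0] = (unsigned char)(0xE0 | (cp >> 12));
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (cp & 0x3F));
    return 3;
}
```

For U+0085 this writes the two bytes C2 85, and for U+2028 the three bytes E2 80 A8.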
The problem is that 0x0085, etc. are literal ints. So they don't match the %C format specifier, which expects a unichar, which is an unsigned short.
There's no direct way to specify a literal short in C and I'm not aware of any Objective-C extension. But you can use a brute-force approach:
NSCharacterSet *stopCharacters =
[NSCharacterSet characterSetWithCharactersInString:
[NSString stringWithFormat:@"< \t\n\r%C%C%C%C",
(unichar)0x0085, (unichar)0x000C,
(unichar)0x2028, (unichar)0x2029]];
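As for why switching to %i silenced the warning but broke the app: %i prints each value as decimal digits instead of producing the character with that code. The sketch below uses plain C `snprintf` to stand in for `stringWithFormat:`, and `in_percent_i_set` is a hypothetical helper. It suggests that the resulting character set ends up containing the digits 1, 2, 3 and 8 rather than the four Unicode separators. That is likely why dates and links no longer survived the scan, though I have not traced it through the parser itself.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 if `c` appears in the string that the
   %i version of the format produces, i.e. if `c` would end up in the
   character set built from that string. */
static int in_percent_i_set(char c) {
    char buf[64];
    /* Same format as the modified line in the question, with %i. */
    int n = snprintf(buf, sizeof buf, "< \t\n\r%i%i%i%i",
                     0x0085, 0x000C, 0x2028, 0x2029);

    assert(n > 0 && (size_t)n < sizeof buf);
    /* buf now ends in the decimal digits "1331282328233". */
    return c != '\0' && strchr(buf, c) != NULL;
}
```

With this, `in_percent_i_set('1')` and `in_percent_i_set('8')` are true while `in_percent_i_set('5')` is false, so any text containing those digits would be cut up by the scanner.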