
GCC compiler -- bug or unspecified behavior?

When I have conflicting definitions of the ivars of a class in Objective-C (not redeclaring the class in the same file, but rather declaring the same class with different ivars in different files), the compiler issues no warnings or, better yet, errors. However, both sets of ivars are usable by the appropriate methods in the respective files. For instance:

Foo.m:

@interface foo {
int a;
}
- (int)method;
@end

@implementation foo

- (int)method {
    return a;
}

@end

Bar.m:

@interface foo {
float baz;
}

@end

@implementation foo (category)
- (float)blah {
    return baz;
}
@end

This compiles without warnings or errors. Is this intentional? Is this an unchecked error? (For the record, a and baz are actually the same memory location.)

Edit: for the record, I'm talking about iPhone OS, which I believe uses the same runtime as 64-bit Mac OS X.

Jared Pochtar asked May 12 '10 16:05


1 Answer

Though obviously broken, that code should compile without warning in all cases simply because the compiler doesn't have enough information to know how to warn. When compiled correctly, it generates a completely different linker error only in 64-bit (which is fallout from the new Objective-C ABI, not directly from non-fragile ivars).

If you add int main() {} to foo.m and then compile it with the command line gcc -arch x86_64 foo.m -lobjc, the link errors go away because the objc runtime library provides the empty vtable symbols required to complete the link.

During compilation, think of each .m file as an isolated compilation unit. When the compiler compiles a .m file, it only has knowledge of what is in that .m file, what is provided by anything imported in that .m file, and -- if a project is configured for it -- what is defined in the project's precompiled header.

Thus, when you say in bar.m:

@interface foo {
float baz;
}

@end

@implementation foo (category)
- (float)blah {
    return baz;
}
@end
int main() {}

The compiler has no notion of the declaration in foo.m. The generated code describes a category on class foo that accesses the ivar baz. If that class doesn't exist at link time, an error may be tossed, depending on the ABI (compare the 32-bit and 64-bit results below).

Now, given your foo.m and bar.m with my addition of a main function as above, let's try some different compilations:

gcc -arch i386 foo.m -lobjc
Undefined symbols:
  "_main", referenced from:
      start in crt1.10.6.o
ld: symbol(s) not found
collect2: ld returned 1 exit status

Makes sense because we didn't define a main() function in foo.m. 64 bit compilation does the same.

gcc -arch i386 bar.m -lobjc

Compiles and links without warning. To understand why, look at the generated symbols (deleted about a dozen irrelevant ones):

nm -a a.out
00001f52 t -[foo(category) blah]
00000000 A .objc_category_name_foo_category

So, the binary contains a category named category on class foo. No link error because the linker doesn't actually try to resolve categories. It assumes that the class foo will magically appear before the category is resolved at runtime.

You can follow along with the runtime's class/category resolution by setting an environment variable:

env OBJC_PRINT_CLASS_SETUP=YES ./a.out 
objc[498]: CONNECT: pending category 'foo (category)'
objc[498]: CONNECT: class 'Object' now connected (root class)
objc[498]: CONNECT: class 'Protocol' now connected
objc[498]: CONNECT: class 'List' now connected

So, the category was marked as pending. The runtime will hook it up as soon as foo comes into existence!
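
The "pending" bookkeeping can be pictured with a toy model in plain C. This is only a sketch of the idea the log output suggests, not the Objective-C runtime's actual implementation; every name in it (toy_class, load_category, load_class, and so on) is hypothetical.

```c
#include <assert.h>
#include <string.h>

#define MAX_ENTRIES 8

/* Toy registry; names and structure are illustrative, not the runtime's. */
struct toy_class    { const char *name; int category_count; };
struct toy_category { const char *class_name; const char *name; int attached; };

static struct toy_class    classes[MAX_ENTRIES];
static struct toy_category pending[MAX_ENTRIES];
static int class_count, pending_count;

/* Look a class up by name; returns NULL if it has not been loaded yet. */
struct toy_class *find_class(const char *name) {
    for (int i = 0; i < class_count; i++)
        if (strcmp(classes[i].name, name) == 0) return &classes[i];
    return NULL;
}

/* Loading a category: attach it now if the class exists (returns 1),
   otherwise park it on the pending list (returns 0). */
int load_category(const char *class_name, const char *cat_name) {
    struct toy_class *cls = find_class(class_name);
    if (cls) { cls->category_count++; return 1; }
    assert(pending_count < MAX_ENTRIES);
    pending[pending_count++] = (struct toy_category){ class_name, cat_name, 0 };
    return 0;
}

/* Loading a class: register it, then attach every category that was
   waiting for it. Returns the number of categories attached. */
int load_class(const char *name) {
    assert(class_count < MAX_ENTRIES);
    struct toy_class *cls = &classes[class_count++];
    cls->name = name;
    cls->category_count = 0;
    int attached = 0;
    for (int i = 0; i < pending_count; i++) {
        if (!pending[i].attached && strcmp(pending[i].class_name, name) == 0) {
            pending[i].attached = 1;
            cls->category_count++;
            attached++;
        }
    }
    return attached;
}
```

In this model, calling load_category("foo", "category") before load_class("foo") parks the category, and the later load_class call attaches it. Note that nothing here checks whether the category's idea of the class layout matches the class that eventually shows up.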

Now, 64 bit...

gcc -arch x86_64 bar.m -lobjc
Undefined symbols:
  "_OBJC_IVAR_$_foo.baz", referenced from:
      -[foo(category) blah] in ccvX4uIk.o
  "_OBJC_CLASS_$_foo", referenced from:
      l_OBJC_$_CATEGORY_foo_$_category in ccvX4uIk.o
      objc-class-ref-to-foo in ccvX4uIk.o
ld: symbol(s) not found

The link errors occur because the modern Objective-C ABI causes proper symbols to be emitted for instance variables and categories for a variety of reasons, including adding metadata that can help validate programs (as it did in this case).

No compilation errors (which is correct behavior) and the link errors make sense. Now, how about linking the two together?

In the 32 bit case, everything compiles and links without error. Thus, we'll need to look at the symbols and at the ObjC debugging spew to see what is going on:

gcc -arch i386 bar.m foo.m -lobjc
nm -a a.out
00001e0f t -[foo method]
00001dea t -[foo(category) blah]
00000000 A .objc_category_name_foo_category
00003070 S .objc_class_name_foo
env OBJC_PRINT_CLASS_SETUP=YES ./a.out 
objc[530]: CONNECT: attaching category 'foo (category)'
objc[530]: CONNECT: class 'Object' now connected (root class)
objc[530]: CONNECT: class 'Protocol' now connected
objc[530]: CONNECT: class 'List' now connected
objc[530]: CONNECT: class 'foo' now connected (root class)

Aha! Now there is a class foo and the runtime connects the category to the class upon startup. Obviously, the method returning the baz ivar is going to fail spectacularly.
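
To see why the two ivars end up at the same memory location, here is a rough plain-C analogy. It assumes each file's accessor was compiled with a fixed offset taken from the layout that file saw, and that int and float have the same size. The struct and function names are hypothetical stand-ins for the two @interface declarations and their methods.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the two conflicting @interface declarations. */
struct foo_seen_by_foo_m { int a; };     /* layout foo.m was compiled against */
struct foo_seen_by_bar_m { float baz; }; /* layout bar.m was compiled against */

/* Analogue of -[foo method]: reads an int at the offset foo.m believes in. */
int foo_method(const void *obj) {
    int a;
    memcpy(&a, (const char *)obj + offsetof(struct foo_seen_by_foo_m, a), sizeof a);
    return a;
}

/* Analogue of -[foo(category) blah]: reads a float at the offset bar.m
   believes in. Assumes sizeof(float) == sizeof(int). */
float foo_blah(const void *obj) {
    float baz;
    assert(sizeof(float) == sizeof(int));
    memcpy(&baz, (const char *)obj + offsetof(struct foo_seen_by_bar_m, baz), sizeof baz);
    return baz;
}

/* Returns 1 when both "ivars" resolve to the same byte offset. */
int ivars_alias(void) {
    return offsetof(struct foo_seen_by_foo_m, a) ==
           offsetof(struct foo_seen_by_bar_m, baz);
}
```

Given an object whose a is 1, foo_method returns 1 while foo_blah returns the same bytes reinterpreted as a float, which is not 1.0 on typical IEEE-754 platforms. Neither function is "wrong" in isolation; they simply disagree about what lives at that offset.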

The 64 bit linker fails, though:

gcc -arch x86_64 bar.m foo.m -lobjc
Undefined symbols:
  "_OBJC_IVAR_$_foo.baz", referenced from:
      -[foo(category) blah] in ccBHNqzm.o
ld: symbol(s) not found
collect2: ld returned 1 exit status

With the addition of the symbols for instance variables, the linker can now catch situations where a class has been redeclared incorrectly (as was done in the @interface of bar.m).
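
As a loose plain-C analogy of why the extra symbols help: if every ivar access goes through an offset variable that only the real class definition emits, a reference to an ivar the class never declared has nothing to link against. The names below (foo_layout, ivar_offset_foo_a, read_int_ivar) are hypothetical, and this is a sketch of the idea rather than what the compiler literally generates.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* What the real @implementation declares. */
struct foo_layout { int a; };

/* Stand-in for a symbol like _OBJC_IVAR_$_foo.a: emitted once, by the
   translation unit that actually implements the class. */
const ptrdiff_t ivar_offset_foo_a = offsetof(struct foo_layout, a);

/* Access through the offset variable instead of a hard-coded constant. */
int read_int_ivar(const void *obj, const ptrdiff_t *offset_symbol) {
    int value;
    assert(offset_symbol != NULL);
    memcpy(&value, (const char *)obj + *offset_symbol, sizeof value);
    return value;
}

/* A unit compiled against the wrong layout would instead declare
 *     extern const ptrdiff_t ivar_offset_foo_baz;
 * and use it. No unit defines that symbol, so the link would fail, in the
 * same spirit as the undefined _OBJC_IVAR_$_foo.baz reported above. */
```

Reading the field of a struct foo_layout through read_int_ivar(&obj, &ivar_offset_foo_a) works because the symbol exists; the mismatched declaration is caught only because its symbol does not.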

bbum answered Sep 30 '22 23:09