
Explain "C fundamentally has a corrupt type system"

In the book Coders at Work (p355), Guy Steele says of C++:

I think the decision to be backwards-compatible with C is a fatal flaw. It’s just a set of difficulties that can’t be overcome. C fundamentally has a corrupt type system. It’s good enough to help you avoid some difficulties but it’s not airtight and you can’t count on it

What does he mean by describing the type system as "corrupt"?

Can you demonstrate with a simple example in C?

Edit:

  1. The quote sounds polemic, but I'm not trying to be. I simply want to understand what he means.

  2. Please give examples in C not C++. I'm interested in the "fundamentally" part too :)

Ian Mackinnon asked Nov 08 '10 14:11


6 Answers

The obvious examples of non-type-safety in C come simply from the fact that you can convert a void * to any other object pointer type without an explicit cast.

#include <stdio.h>

struct X
{
  int x;
};

struct Y
{
  double y;
};

int main(void)
{
  struct X xx;
  xx.x = 1;
  void * vv = &xx;
  struct Y * yy = vv; /* no need to cast explicitly */
  printf( "%f\n", yy->y ); /* undefined behavior: xx is not a struct Y */
  return 0;
}

Of course printf itself is not exactly typesafe.
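To make the printf remark concrete, here is a minimal sketch of a variadic function. The name sum_ints is hypothetical and only for illustration; the point is that arguments matching "..." carry no type information, so the callee simply trusts the caller, just as printf trusts its format string.

```c
#include <stdarg.h>

/* Hypothetical helper: adds up `count` int arguments.
   Like printf, it trusts the caller about the types that follow. */
int sum_ints(int count, ...)
{
    va_list args;
    int total = 0;

    va_start(args, count);
    for (int i = 0; i < count; i++) {
        total += va_arg(args, int); /* nothing verifies this really is an int */
    }
    va_end(args);
    return total;
}

/* Correct call: sum_ints(3, 1, 2, 3).
   A call such as sum_ints(2, 1.5, 2) is also accepted by the language,
   because arguments matching "..." are not type-checked; reading the
   double as an int is undefined behavior. */
```

Many compilers can warn about mismatched printf arguments, but that is an extra check keyed to the format string, not something the type system itself enforces.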

C++ is not totally typesafe either.

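#include <iostream>

struct Base
{
   int b;
};

struct Derived : Base
{
  int d;

  Derived() 
  {
     b = 1;
     d = 3;
  }
};

int main()
{
  Derived derivs[50];
  Base * bb = &derivs[0];
  // bb[3] steps by sizeof(Base), not sizeof(Derived): undefined behavior
  std::cout << bb[3].b << std::endl;
}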

It has no problem converting the Derived* to a Base*, but you run into problems when you use the Base* as an array: the pointer arithmetic steps by sizeof(Base) instead of sizeof(Derived), so although all the b values are 1 you may well get a 3 (the ints are laid out 1-3-1-3 and so on).

CashCow answered Nov 17 '22 11:11


Basically you can cast any data type to any data type

struct SomeStruct {
    void* data;
};

int main(void)
{
    struct SomeStruct object;
    *( (int*) &object ) = 10; /* compiles: stores an int over a struct that holds a void* */
    return 0;
}

and no one catches you.
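The same kind of reinterpretation can also be written without a pointer cast. The sketch below is only an illustration (float_bits is a hypothetical name, and it assumes float is a 32-bit IEEE 754 type): it copies the bytes of one type into an object of another, and the language accepts it because memcpy only ever sees untyped bytes.

```c
#include <stdint.h>
#include <string.h>

/* Returns the raw bits of a float as an unsigned integer.
   Assumes float is 32 bits wide (IEEE 754 single precision). */
uint32_t float_bits(float f)
{
    uint32_t bits = 0;
    memcpy(&bits, &f, sizeof bits); /* byte copy: no type check involved */
    return bits;
}
```

Under that assumption, float_bits(1.0f) yields 0x3F800000. Nothing in the types says that a float was just read as an integer.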

like image 38
sharptooth Avatar answered Nov 17 '22 11:11

sharptooth


char buffer[42];
FunctionThatDestroysTheStack(buffer);  // By writing 43 chars or more
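One reason the callee cannot know it was handed only 42 chars: when an array is passed to a function it is converted to a pointer to its first element, and the length drops out of the type. A minimal sketch, with hypothetical function names:

```c
#include <stddef.h>

/* Hypothetical callee: declaring the parameter as `char buffer[42]`
   would mean exactly the same as `char *buffer`, so the 42 is not
   part of the parameter's type. */
size_t size_seen_by_callee(char *buffer)
{
    return sizeof buffer; /* size of a pointer, not of the array */
}

/* In the caller, the array type still carries its length. */
size_t size_seen_by_caller(void)
{
    char buffer[42];
    return sizeof buffer; /* 42 */
}
```

The caller sees 42, the callee sees only sizeof(char *), so any bound has to be passed separately and is checked by nobody.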
Hans Passant answered Nov 17 '22 11:11


The C type system does have some problems. Things like implicit function declaration and implicit conversion from void* can SILENTLY break type safety.

C++ fixes pretty much all of these holes. The C++ type system is NOT backwards compatible with C, it's only compatible with well-written typesafe C code.

Furthermore, the people arguing against C++ typically point you to Java or C# as the "solution". Yet Java and C# do have holes in their type system (array covariance). C++ doesn't have this problem.

EDIT: Examples, in C++, attempting to use array covariance that would (improperly) be allowed by the Java and C# type systems.

#include <stdlib.h>

struct Base {};
struct Derived : Base {};

template<size_t N>
void func1( Base (&array)[N] );

void func2( Base** pArray );

void func3( Base*& refArray );

void test1( void )
{
  Base b[40];
  Derived d[40];

  func1(b); // ok
  func1(d); // error caught by C++ type system
}

void test2( void )
{
  Base* b[40] = {};
  Derived* d[40] = {};

  func2(b); // ok
  func2(d); // error caught by C++ type system

  func3(b[0]); // ok
  func3(d[0]); // error caught by C++ type system
}

Results:

Comeau C/C++ 4.3.10.1 (Oct  6 2008 11:28:09) for ONLINE_EVALUATION_BETA2
Copyright 1988-2008 Comeau Computing.  All rights reserved.
MODE:strict errors C++ C++0x_extensions

"ComeauTest.c", line 19: error: no instance of function template "func1" matches
          the argument list
            The argument types that you used are: (Derived [40])
        func1(d); // error caught by C++ type system
        ^

"ComeauTest.c", line 28: error: argument of type "Derived **" is incompatible with
          parameter of type "Base **"
        func2(d); // error caught by C++ type system
              ^

"ComeauTest.c", line 31: error: a reference of type "Base *&" (not const-qualified)
          cannot be initialized with a value of type "Derived *"
        func3(d[0]); // error caught by C++ type system
              ^

3 errors detected in the compilation of "ComeauTest.c".

This doesn't mean that there are no holes at all in the C++ type system, but it does show that you can't silently overwrite a pointer-to-Derived with a pointer-to-Base like Java and C# allow.

Ben Voigt answered Nov 17 '22 12:11


IMHO the "most broken" part of the C type system is that the concepts of

  • values/parameters that are optional
  • mutable values/pass-by-reference
  • arrays
  • non-POD function parameters

are all mapped to the single language concept "pointer". That means that if you get a function parameter of type X*, it might be an optional parameter; the function might be expected to change the value it points to; or there might be multiple instances of X after the one pointed to (how many is left open: the number could be passed as a separate parameter, or some kind of special "terminator" value might mark the end of the array, as in nul-terminated strings). Or the parameter might simply be a single structure that you're not expected to change, but that is cheaper to pass by reference.

If you get something of type X**, it might be an array of optional values, or it might be an array of simple values and you're expected to change it. Or it might be a 2d jagged array. Or an optional value passed by reference.

In contrast, take the ML family of languages (F#, OCaml, SML). Here these concepts map to separate language constructs:

  • values that are optional have the type X option
  • values that are mutable/pass by reference have the type X ref
  • arrays have the type X array
  • and non-POD types can be passed like PODs. Because they aren't mutable, the compiler can pass them by reference internally, but you don't need to know about that implementation detail

And you can of course combine those, e.g. int option ref is a mutable value that can be set to nothing or to some integer value. int ref option, on the other hand, is an optional mutable value; it can be nothing (and no one can change it) or it can be some mutable int (and you can change it to any other int value, but not to nothing).

These distinctions are very subtle, but you have to make them whether you program in ML or not. In C you have to make the same distinctions, but they're not explicitly stated in the type system. You have to read the documentation very carefully, or you might introduce subtle (read: hard to find) bugs if you misunderstand which kind of pointer usage is meant when.
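A rough C sketch of the four roles listed above. All function and type names here are made up for illustration. Note that const can hint at "read-only", but value_or_default and sum take the very same parameter type even though one means "optional single value" and the other means "array".

```c
#include <stddef.h>

/* 1. Optional parameter: NULL means "no value supplied". */
int value_or_default(const int *maybe)
{
    return maybe != NULL ? *maybe : -1;
}

/* 2. Mutable / pass-by-reference: the function writes through the pointer. */
void set_to_zero(int *out)
{
    *out = 0;
}

/* 3. Array: the pointer is the first of `count` elements. */
int sum(const int *items, size_t count)
{
    int total = 0;
    for (size_t i = 0; i < count; i++) {
        total += items[i];
    }
    return total;
}

/* 4. Larger struct passed by address only to avoid copying it. */
struct Big { int values[256]; };

int first_value(const struct Big *big)
{
    return big->values[0];
}
```

Passing NULL to sum, or a pointer to a single int with a count of 10, compiles just as happily as the intended calls; only the documentation says which usage is meant.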

Niki answered Nov 17 '22 10:11


You'd have to ask him what he meant to get a definitive answer, or perhaps provide more context for that quote.

However, it is pretty clear that if this is a fatal flaw for C++, the disease is chronic, not acute - C++ is thriving, and continually evolving as evidenced by ongoing Boost and C++0x efforts.

I don't even think about C and C++ as coupled any more - a few weeks on the respective fora here quickly cures one of any confusion over the fact that they are two different languages, each with its own strengths and foibles.

Steve Townsend answered Nov 17 '22 11:11