How can I safely ensure a char* type will be correctly implemented (on any platform) according to the OpenGL spec?

In trying to get my head around graphics programming using C++ and OpenGL 3+, I have come across a rather specialized problem concerning the char type, pointers to it, and implicit or explicit conversions to other char pointer types. I think I have found a solution, but I would like to double-check by asking for your take on it.

The current (October 2014) OpenGL 4.5 core profile specification (Table 2.2 in chapter 2.2, Command Syntax) lists the OpenGL data types and explicitly states:

GL types are not C types. Thus, for example, GL type int is referred to as GLint outside this document, and is not necessarily equivalent to the C type int. An implementation must use exactly the number of bits indicated in the table to represent a GL type.

The GLchar type in this table is specified as a type of bit width 8 that is used to represent characters which make up a string.
To further narrow down what GLchar has to provide, we can have a look at the GLSL Specification (OpenGL Shading Language 4.50, July 2014, Chapter 3.1 Character Set and Phases of Compilation):

The source character set used for the OpenGL shading languages is Unicode in the UTF-8 encoding scheme.

Now, the way this is implemented in every OpenGL library header I cared to look at is a simple

typedef char GLchar;

which of course flies in the face of the statement "GL types are not C types" I just quoted.

Normally, this wouldn't be a problem, seeing as typedefs are meant for just such a situation where the underlying type might change in the future.

The problem starts in user code.

Going through a few tutorials on OpenGL, I came across various ways to assign the GLSL source code to a GLchar array needed for processing it. (Please forgive me for not providing all the links. Currently, I do not have the reputation needed to do so.)

The site open.gl likes to do this:

const GLchar* vertexSource =
"#version 150 core\n"
"in vec2 position;"
"void main() {"
"   gl_Position = vec4(position, 0.0, 1.0);"
"}";

or this:

// Shader macro
#define GLSL(src) "#version 150 core\n" #src

// Vertex shader
const GLchar* vertexShaderSrc = GLSL(
  in vec2 pos;

  void main() {
      gl_Position = vec4(pos, 0.0, 1.0);
  }
);

On lazyfoo.net (Chapter 30 Loading Text File Shaders), the source code is read from a file (my preferred method) into a std::string shaderString variable, which is then used to initialize the GL string:

const GLchar* shaderSource = shaderString.c_str();
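
For reference, a minimal sketch of that file-loading pattern (the helper name loadShaderSource is my own, not lazyfoo's, and error handling is omitted):

#include <fstream>
#include <sstream>
#include <string>

// Read an entire shader file into a std::string (hypothetical helper).
std::string loadShaderSource(const char* path) {
  std::ifstream file(path);
  std::stringstream buffer;
  buffer << file.rdbuf(); // slurp the whole file into the buffer
  return buffer.str();
}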

The most adventurous approach I have seen yet is the first hit I get when I google "loading shader file" - the ClockworkCoders tutorial on shader loading hosted at the OpenGL SDK, which uses an explicit cast - not to GLchar* but to GLubyte* - like this:

GLchar** ShaderSource;
unsigned long len;
ifstream file;
// . . .
len = getFileLength(file);
// . . .
*ShaderSource = (GLubyte*) new char[len+1];

Any decent C++ compiler will give an invalid-conversion error here; g++ will let it pass with only a warning if the -fpermissive flag is set. Compiled that way, the code works, because GLubyte is in the end just a typedef alias of the fundamental type unsigned char, which has the same size as char, so the pointer conversion still does the right thing at runtime. It is nonetheless ill-formed C++: char*, signed char*, and unsigned char* are pointers to distinct, incompatible types with no implicit conversions between them, so doing it this way is bad practice.
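
If such a conversion is ever genuinely wanted, the conforming C++ way to spell it is an explicit reinterpret_cast. A sketch (not a recommendation), reusing len from the snippet above:

char* raw = new char[len + 1];
// An explicit cast between pointers to distinct 1-byte types;
// compiles without -fpermissive.
GLubyte* bytes = reinterpret_cast<GLubyte*>(raw);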

This brings me to my actual point: all these tutorials rely on the fact that the current implementation of the OpenGL specification is just window dressing in the form of typedefs for fundamental types. This assumption is in no way covered by the specification. Worse, the specification explicitly discourages thinking of GL types as C types.

If at any point in the future the OpenGL implementation should change - for whatever reason - so that GLchar is no longer a simple typedef alias of char, code like this will no longer compile, as there are no implicit conversions between pointers to incompatible types. While it is certainly possible in some cases to tell the compiler to ignore the invalid pointer conversion, opening the gates to bad programming like that can lead to all kinds of other problems in your code.

I have seen exactly one place that, to my understanding, does it right: the official opengl.org wiki example on Shader Compilation:

std::string vertexSource = //Get source code for vertex shader.
// . . .
const GLchar *source = (const GLchar *)vertexSource.c_str();

The sole difference from the other tutorials is an explicit cast to const GLchar* before the assignment. Ugly, I know, yet as far as I can see it makes the code safe against any valid future implementation of the OpenGL specification (summed up: a type of bit width 8 representing characters in the UTF-8 encoding scheme).

To illustrate my reasoning, I have written a simple class GLchar2 that fulfils this specification but no longer allows implicit pointer conversions to or from any fundamental type:

// GLchar2.h - a char type of 1 byte length

#ifndef GLCHAR2_H
#define GLCHAR2_H

#include <iostream>
#include <locale> // handle whitespaces

class GLchar2 {
  char element; // value of the GLchar2 variable
public:
  // default constructor
  GLchar2 () {}
  // user defined conversion from char to GLchar2
  GLchar2 (char element) : element(element) {}
  // copy constructor
  GLchar2 (const GLchar2& c) : element(c.element) {}
  // destructor
  ~GLchar2 () {}
  // assignment operator
  GLchar2& operator= (const GLchar2& c) {element = c.element; return *this;}
  // user defined conversion to integral c++ type char
  operator char () const {return element;}
};

// overloading the output operator to correctly handle GLchar2
// due to implicit conversion of GLchar2 to char, implementation is unnecessary
//std::ostream& operator<< (std::ostream& o, const GLchar2 character) {
//  char out = character;
//  return o << out;
//}

// overloading the output operator to correctly handle GLchar2*
inline std::ostream& operator<< (std::ostream& o, const GLchar2* output_string) {
  for (const GLchar2* string_it = output_string; *string_it != '\0'; ++string_it) {
    o << *string_it;
  }
  return o;
}

// overloading the input operator to correctly handle GLchar2
inline std::istream& operator>> (std::istream& i, GLchar2& input_char) {
  char in;
  if (i >> in) input_char = in; // this is where the magic happens
  return i;
}

// overloading the input operator to correctly handle GLchar2*
inline std::istream& operator>> (std::istream& i, GLchar2* input_string) {
  GLchar2* string_it;
  int width = i.width();
  std::locale loc;
  while (std::isspace((char)i.peek(),loc)) i.ignore(); // ignore leading whitespaces
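  // read until width-1 characters have been extracted (if a field width is
  // set), a whitespace is found, or the stream fails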
  for (string_it = input_string; (((i.width() == 0 || --width > 0) && !std::isspace((char)i.peek(),loc)) && i >> *string_it); ++string_it);
  *string_it = '\0'; // terminate with null character
  i.width(0); // reset width of i
  return i;
}

#endif // GLCHAR2_H

Note that in addition to writing the class, I have implemented overloads of the input and output stream operators to correctly handle reading and writing from the class as well as c-string style null-terminated GLchar2 arrays. This is possible without knowing the internal structure of the class, as long as it provides implicit conversions between the types char and GLchar2 (but not their pointers). No explicit conversions between char and GLchar2 or their pointer types are necessary.

I don't claim that this implementation of GLchar is worthwhile or complete, but it should do for the purpose of demonstration. Comparing it to a plain typedef char GLchar1;, here is what I can and cannot do with this type:

// program: test_GLchar.cpp - testing implementation of GLchar

#include <iostream>
#include <fstream>
#include <locale> // handle whitespaces
#include "GLchar2.h"

typedef char GLchar1;

int main () {
  // byte size comparison
  std::cout << "GLchar1 has a size of " << sizeof(GLchar1) << " byte.\n"; // 1
  std::cout << "GLchar2 has a size of " << sizeof(GLchar2) << " byte.\n"; // 1
  // char constructor
  const GLchar1 test_char1 = 'o';
  const GLchar2 test_char2 = 't';
  // default constructor
  GLchar2 test_char3;
  // char conversion
  test_char3 = '3';
  // assignment operator
  GLchar2 test_char4;
  GLchar2 test_char5;
  test_char5 = test_char4 = 65; // ASCII value 'A'
  // copy constructor
  GLchar2 test_char6 = test_char5;
  // pointer conversion
  const GLchar1* test_string1 = "test string one"; // compiles
  //const GLchar1* test_string1 = (const GLchar1*)"test string one"; // compiles
  //const GLchar2* test_string2 = "test string two"; // does *not* compile!
  const GLchar2* test_string2 = (const GLchar2*)"test string two"; // compiles

  std::cout << "A test character of type GLchar1: " << test_char1 << ".\n"; // o
  std::cout << "A test character of type GLchar2: " << test_char2 << ".\n"; // t
  std::cout << "A test character of type GLchar2: " << test_char3 << ".\n"; // 3
  std::cout << "A test character of type GLchar2: " << test_char4 << ".\n"; // A
  std::cout << "A test character of type GLchar2: " << test_char5 << ".\n"; // A
  std::cout << "A test character of type GLchar2: " << test_char6 << ".\n"; // A

  std::cout << "A test string of type GLchar1: " << test_string1 << ".\n";
  // OUT: A test string of type GLchar1: test string one.\n
  std::cout << "A test string of type GLchar2: " << test_string2 << ".\n";
  // OUT: A test string of type GLchar2: test string two.\n

  // input operator comparison
  // test_input_file.vert has the content
  //  If you can read this,
  //  you can read this.
  // (one whitespace before each line to test implementation)
  GLchar1* test_string3;
  GLchar2* test_string4;
  GLchar1* test_string5;
  GLchar2* test_string6;
  // read character by character
  std::ifstream test_file("test_input_file.vert");
  if (test_file) {
    test_file.seekg(0, test_file.end);
    int length = test_file.tellg();
    test_file.seekg(0, test_file.beg);

    test_string3 = new GLchar1[length+1];
    GLchar1* test_it = test_string3;
    std::locale loc;
    while (test_file >> *test_it) {
      ++test_it;
      while (std::isspace((char)test_file.peek(),loc)) {
        *test_it = test_file.peek(); // add whitespaces
        test_file.ignore();
        ++test_it;
      }
    }
    *test_it = '\0';
    std::cout << test_string3 << "\n";
    // OUT: If you can read this,\n you can read this.\n
    std::cout << length << " " << test_it - test_string3 << "\n";
    // OUT: 42 41\n
    delete[] test_string3;
    test_file.close();
  }
  std::ifstream test_file2("test_input_file.vert");
  if (test_file2) {
    test_file2.seekg(0, test_file2.end);
    int length = test_file2.tellg();
    test_file2.seekg(0, test_file2.beg);

    test_string4 = new GLchar2[length+1];
    GLchar2* test_it = test_string4;
    std::locale loc;
    while (test_file2 >> *test_it) {
      ++test_it;
      while (std::isspace((char)test_file2.peek(),loc)) {
        *test_it = test_file2.peek(); // add whitespaces
        test_file2.ignore();
        ++test_it;
      }
    }
    *test_it = '\0';
    std::cout << test_string4 << "\n";
    // OUT: If you can read this,\n you can read this.\n
    std::cout << length << " " << test_it - test_string4 << "\n";
    // OUT: 42 41\n
    delete[] test_string4;
    test_file2.close();
  }
  // read a word (until delimiter whitespace)
  test_file.open("test_input_file.vert");
  if (test_file) {
    test_file.seekg(0, test_file.end);
    int length = test_file.tellg();
    test_file.seekg(0, test_file.beg);

    test_string5 = new GLchar1[length+1];
    //test_file.width(2);
    test_file >> test_string5;
    std::cout << test_string5 << "\n";
    // OUT: If\n
    delete[] test_string5;
    test_file.close();
  }
  test_file2.open("test_input_file.vert");
  if (test_file2) {
    test_file2.seekg(0, test_file2.end);
    int length = test_file2.tellg();
    test_file2.seekg(0, test_file2.beg);

    test_string6 = new GLchar2[length+1];
    //test_file2.width(2);
    test_file2 >> test_string6;
    std::cout << test_string6 << "\n";
    // OUT: If\n
    delete[] test_string6;
    test_file2.close();
  }
  // read word by word
  test_file.open("test_input_file.vert");
  if (test_file) {
    test_file.seekg(0, test_file.end);
    int length = test_file.tellg();
    test_file.seekg(0, test_file.beg);

    test_string5 = new GLchar1[length+1];
    GLchar1* test_it = test_string5;
    std::locale loc;
    while (test_file >> test_it) {
      while (*test_it != '\0') ++test_it; // test_it points to null character
      while (std::isspace((char)test_file.peek(),loc)) {
        *test_it = test_file.peek(); // add whitespaces
        test_file.ignore();
        ++test_it;
      }
    }
    std::cout << test_string5 << "\n";
    // OUT: If you can read this,\n you can read this.\n
    delete[] test_string5;
    test_file.close();
  }
  test_file2.open("test_input_file.vert");
  if (test_file2) {
    test_file2.seekg(0, test_file2.end);
    int length = test_file2.tellg();
    test_file2.seekg(0, test_file2.beg);

    test_string6 = new GLchar2[length+1];
    GLchar2* test_it = test_string6;
    std::locale loc;
    while (test_file2 >> test_it) {
      while (*test_it != '\0') ++test_it; // test_it points to null character
      while (std::isspace((char)test_file2.peek(), loc)) {
        *test_it = test_file2.peek(); // add whitespaces
        test_file2.ignore();
        ++test_it;
      }
    }
    std::cout << test_string6 << "\n";
    // OUT: If you can read this,\n you can read this.\n
    delete[] test_string6;
    test_file2.close();
  }
  // read whole file with std::istream::getline
  test_file.open("test_input_file.vert");
  if (test_file) {
    test_file.seekg(0, test_file.end);
    int length = test_file.tellg();
    test_file.seekg(0, test_file.beg);

    test_string5 = new GLchar1[length+1];
    std::locale loc;
    while (std::isspace((char)test_file.peek(),loc)) test_file.ignore(); // ignore leading whitespaces
    test_file.getline(test_string5, length, '\0');
    std::cout << test_string5  << "\n";
    // OUT: If you can read this,\n you can read this.\n
    delete[] test_string5;
    test_file.close();
  }
  // no way to do this for a string of GLchar2 as far as I can see
  // the getline function that returns c-strings rather than std::string is
  // a member of istream and expects to return *this, so overloading is a no go
  // however, this works as above:

  // read whole file with std::getline
  test_file.open("test_input_file.vert");
  if (test_file) {
    std::locale loc;
    while (std::isspace((char)test_file.peek(),loc)) test_file.ignore(); // ignore leading whitespaces
    std::string test_stdstring1;
    std::getline(test_file, test_stdstring1, '\0');
    test_string5 = (GLchar1*) test_stdstring1.c_str(); // casts away const: read-only view, must not be modified or delete[]d
    std::cout << test_string5 << "\n";
    // OUT: If you can read this,\n you can read this.\n
    test_file.close();
  }

  test_file2.open("test_input_file.vert");
  if (test_file2) {
    std::locale loc;
    while (std::isspace((char)test_file2.peek(),loc)) test_file2.ignore(); // ignore leading whitespaces
    std::string test_stdstring2;
    std::getline(test_file2, test_stdstring2, '\0');
    test_string6 = (GLchar2*) test_stdstring2.c_str(); // casts away const: read-only view, must not be modified or delete[]d
    std::cout << test_string6 << "\n";
    // OUT: If you can read this,\n you can read this.\n
    test_file2.close();
  }

  return 0;
}

I conclude that there are at least two viable ways to write code that will always handle GLchar strings correctly without violating the C++ standard:

  1. Use an explicit conversion from a char array to a GLchar array (untidy, but doable).

    const GLchar* sourceCode = (const GLchar*)"some code";

    std::string sourceString = "some code"; // can also come from a file
    const GLchar* sourceCode = (const GLchar*) sourceString.c_str();

  2. Use the input stream operator to read the string from a file directly into a GLchar array.

The second method has the advantage that no explicit conversion is necessary, but to implement it, space for the string must be allocated dynamically. Another potential downside is that OpenGL won't necessarily provide overloads of the input and output stream operators for its character type or the corresponding pointer type. However, as I have shown, writing these overloads yourself is no great witchcraft, as long as conversions to and from char are provided.

So far, I have not found any other viable overload for input from files that provides exactly the same syntax as for c-strings.

Now my question is this: Have I thought this through correctly so that my code will remain safe against possible changes made by OpenGL and - no matter whether the answer is yes or no - is there a better (i.e. safer) way to ensure upward compatibility of my code?

Also, I have read this stackoverflow question and answer, but as far as I am aware, it does not cover strings, since they are not fundamental types.

I am also not asking how to write a class that does provide implicit pointer conversions (though that would be an interesting exercise). The point of this example class is to prohibit implicit pointer assignment, since there is no guarantee that OpenGL would provide such if they decided to change their implementation.

asked Jan 03 '15 by eSemmel

2 Answers

What the OpenGL spec means by the statement

"GL types are not C types"

is that an OpenGL implementation may use any type it sees fit for the purpose. It does not mean that the implementation is forbidden to use C types. It means that, when programming against the OpenGL API, you must not make assumptions about the nature of the OpenGL types.

OpenGL specifies that GLchar is 8 bits wide (with the signedness deliberately left unspecified). Period, no further discussion. So as long as you code your program in a way that treats GLchar as an 8-bit data type, everything is fine. If you are worried about validity, you can add a static assertion that CHAR_BIT == 8 to make the build fail on any platform that does not satisfy this.
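
A minimal sketch of such a guard (assuming a C++11 compiler and that an OpenGL header declaring GLchar has been included):

#include <climits> // CHAR_BIT

// Compilation fails on any platform where char is not exactly 8 bits wide,
// i.e. on any platform that could not host a conforming GLchar anyway.
static_assert(CHAR_BIT == 8, "GLchar requires an 8-bit char type");
static_assert(sizeof(GLchar) == 1, "GLchar must occupy exactly one byte");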

The typedefs in the OpenGL headers (the headers are not normative, by the way) are chosen so that the resulting types match the requirements of the underlying platform ABI. A slightly more portable gl.h might do a

#include <stdint.h>
typedef int8_t GLchar;

but this just boils down to the type definition of int8_t, which will likely just be

typedef signed char int8_t;

on typical compilers.

If at any point in the future the OpenGL implementation should change - for whatever reason - so that GLchar is no longer a simple typedef alias of char, code like this will no longer compile as there are no implicit conversions between pointers to incompatible types

OpenGL is not defined in terms of a C API or ABI. GLchar is 8 bits, and as long as the API bindings adhere to that, everything is fine. The OpenGL spec will never change to a different size for GLchar, because that would wreak havoc not only for existing code but also for OpenGL-over-network protocols like GLX.

Update

A note in case you care about signedness: the most important effect of signedness in C concerns the integer promotion rules. Many character operations in C in fact operate on int rather than char (using negative values as a side channel), and to be unsurprising with respect to integer promotion, char is signed on most common ABIs. That's it.
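
A small illustration of that int-based convention in the standard library:

#include <cctype>
#include <cstdio>

int c = std::fgetc(stdin); // returns int, not char: EOF is the negative side channel
if (c != EOF && std::isupper(c)) // the <cctype> functions take and return int as well
  std::putchar(c);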

Update 2

Note that you will be hard-pressed to find any C implementation whose platform ABI has CHAR_BIT != 8 and for which an OpenGL implementation exists – heck, I'm not even sure that there is or ever was a C implementation with CHAR_BIT != 8 at all. Unusual sizes for int and short? Sure! But char? I don't know.

Update 3

Regarding getting this whole thing into the C++ static type system, I'd suggest deriving a custom glstring class from std::basic_string, with the character type, traits, and allocator instantiated for GLchar. When it comes to pointer-type compatibility, in most ABIs GLchar aliases signed char and thus behaves like a standard C string.
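
A minimal sketch of that idea (using a plain typedef rather than derivation; with GLchar being an alias of char, this instantiation is just std::string itself):

#include <string>

typedef char GLchar; // as in common OpenGL headers
typedef std::basic_string<GLchar> glstring;

glstring vertexSource = "#version 150 core\n";
const GLchar *source = vertexSource.c_str(); // no cast needed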

answered by datenwolf


Extending @datenwolf's answer:

Regarding CHAR_BIT: C requires CHAR_BIT >= 8, char is the smallest addressable unit in C, and OpenGL has an 8-bit type. This implies that you cannot implement a conforming OpenGL on a system with CHAR_BIT != 8... which is consistent with the statement

... it is not possible to implement the GL API on an architecture which cannot satisfy the exact bit width requirements in table 2.2.

from the OpenGL 4.5 spec.

As for converting GLubyte* to char*: as far as I know, it is actually completely valid C and C++. char* is explicitly permitted to alias all other types, which is why code like

int x;
std::istream &is = ...;
is.read((char*)&x, sizeof(x));

is valid. Since sizeof(char) == sizeof(GLchar) == 1 by the OpenGL and C bit-width requirements combined, you can freely access arrays of GLchar as arrays of char.
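
For instance, a sketch of reading shader source straight into a GLchar buffer this way (hypothetical file name, error handling omitted):

#include <fstream>

std::ifstream file("shader.vert", std::ios::binary);
file.seekg(0, file.end);
std::streamsize len = file.tellg();
file.seekg(0, file.beg);

GLchar *src = new GLchar[len + 1];
file.read(reinterpret_cast<char*>(src), len); // char* may alias the GLchar buffer
src[len] = '\0';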

The paragraph you quote, with "GL types are not C types", refers to the fact that the OpenGL spec uses type names like "float" and "int" without the "GL" prefix; it is saying that, despite using these unprefixed names, the spec does not (necessarily) mean the corresponding C types. Rather, an OpenGL type named "int" may be an alias of the C type "long" in a concrete C language binding. In practice, though, any sane binding will use C types, so that you can write arithmetic expressions using OpenGL types (in C you can do that only with built-in types).

Have I thought this through correctly so that my code will remain safe against possible changes made by OpenGL and - no matter whether the answer is yes or no - is there a better (i.e. safer) way to ensure upward compatibility of my code?

I think that you are thinking too much about code portability from a language-lawyer point of view, rather than focusing on learning OpenGL and writing code that is portable in practice. The OpenGL spec does not define the language bindings, but no C binding will ever break what everybody expects to work, like assigning const GLchar *str = "hello world". Remember also that these are C bindings that you typically use from C++, so there will be no crazy classes and operator overloading in the headers, which practically restricts the implementation to fundamental types for Table 2.2.

Edit:

There are platforms with CHAR_BIT > 8; see "Exotic architectures the standards committees care about". Today, though, they are mostly limited to DSPs. POSIX requires CHAR_BIT == 8.

Never bother instantiating basic_string or the iostreams with types other than those required by the standard. If your type is an alias of one of those, you are fine, but then you could just use the former directly. If your type is genuinely different, you will enter a never-ending nightmare of traits, locales, codecvt states, etc., which cannot be portably resolved. In fact, never use anything other than char.

answered by Yakov Galka