C standard compliant way to access null pointer address?

Tags: c, language-lawyer, undefined-behavior, null-pointer

In C, dereferencing the null pointer is Undefined Behavior; however, the null pointer value has a bit representation that, on some architectures, makes it point to a valid address (e.g. the address 0).
Let's call this address the null pointer address, for the sake of clarity.

Suppose I want to write a piece of software in C, in an environment with unrestrained access to memory. Suppose further I want to write some data at the null pointer address: how would I achieve that in a standard compliant way?

Example case (IA32e):

```c
#include <stdint.h>

int main() {
    uintptr_t zero = 0;
    char* p = (char*)zero;
    return *p;
}
```

This code, when compiled with GCC at -O3 for IA32e, gets transformed into:

```asm
movzx eax, BYTE PTR [0]
ud2
```

due to UB (0 is the bit representation of the null pointer).

Since C is close to low level programming, I believe there must be a way to access the null pointer address and avoid UB.

Just to be clear:
I'm asking about what the standard has to say about this, NOT how to achieve this in an implementation-defined way.
I know the answer to the latter.


asked Feb 21 '16 14:02

Margaret Bloom

2 Answers

I read (part of) the C99 standard to clear my mind. I found the sections that are of interest for my own question and I'm writing this as a reference.

DISCLAIMER
I'm an absolute beginner; 90% or more of what I have written is wrong, makes no sense, or may break your toaster. I also try to derive a rationale from the standard, often with disastrous and naive results (as stated in the comment).
Don't read.
Consult @Olaf for a formal and professional answer.

In the following, the term architectural address designates a memory address as seen by the processor (logical, virtual, linear, physical or bus address). In other words, the addresses that you would use in assembly.

Section 6.3.2.3 reads:

An integer constant expression with the value 0, or such an expression cast to type void *, is called a null pointer constant. If a null pointer constant is converted to a pointer type, the resulting pointer, called a null pointer, is guaranteed to compare unequal to a pointer to any object or function.

and, regarding integer-to-pointer conversion:

An integer may be converted to any pointer type. Except as previously specified [i.e. for the case of a null pointer constant], the result is implementation-defined, might not be correctly aligned, might not point to an entity of the referenced type, and might be a trap representation†.

These imply that the compiler, to be compliant, need only implement a function int2ptr from integers to pointers such that:

1. int2ptr(0), where 0 is an integer constant expression, is by definition the null pointer.
   Note that int2ptr(0) is not mandated to be 0: it can have any bit representation.
2. For any other argument (including a zero that is not a constant expression), int2ptr has no constraints beyond being implementation-defined.
   Note that this means that int2ptr need not be the identity function, nor a function that returns valid pointers!
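The two rules above can be exercised in a short C sketch. It is only an illustration: the names an_object, a_function, null_pointer_constants_hold, runtime_zero_is_null and round_trip_holds are hypothetical, and uintptr_t is assumed to be available (it is optional in C99, although mainstream hosted implementations provide it). Only the asserted properties are guaranteed by the standard; the value returned by runtime_zero_is_null is implementation-defined.

```c
#include <assert.h>
#include <stddef.h>   /* NULL */
#include <stdint.h>   /* uintptr_t (optional in C99) */

static int an_object;
static int a_function(void) { return 0; }

/* Rule 1: a null pointer constant (an integer constant expression with
   value 0, possibly cast to void *) converted to a pointer type yields a
   null pointer, whatever bit pattern the implementation uses for it. */
static int null_pointer_constants_hold(void)
{
    char *p1 = 0;
    void *p2 = (void *)0;
    char *p3 = NULL;            /* NULL expands to a null pointer constant */
    int (*fp)(void) = 0;        /* works for function pointers too */

    assert(p1 == p2 && p2 == (void *)p3);      /* any two null pointers compare equal */
    assert((void *)p1 != (void *)&an_object);  /* unequal to any object pointer...    */
    assert(fp != a_function);                  /* ...and to any function pointer      */
    return 1;
}

/* Rule 2: any other integer-to-pointer conversion is implementation-defined.
   A runtime zero is not a null pointer constant, so whether this yields a
   null pointer is up to the implementation. */
static int runtime_zero_is_null(void)
{
    uintptr_t zero = 0;
    char *p = (char *)zero;
    return p == NULL;
}

/* The main portable guarantee beyond rule 1 is a round trip: a valid
   void * converted to uintptr_t and back compares equal to the original. */
static int round_trip_holds(void *p)
{
    uintptr_t n = (uintptr_t)p;
    return (void *)n == p;
}
```

On mainstream compilers such as GCC for x86-64, runtime_zero_is_null typically returns 1, but a conforming implementation with a non-zero null representation could return 0.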

Given the code below:

```c
char* p = (char*)241;
```

The standard makes absolutely no guarantee that the expression *p = 56; will write to the architectural address 241.
And so it gives no direct way to access any other architectural address either (including int2ptr(0), the address designated by a null pointer, if valid).

Simply put, the standard does not deal with architectural addresses, but with pointers: their comparison, conversions and operations‡.

When we write code like char* p = (char*)K, we are not telling the compiler to make p point to the architectural address K; we are telling it to make a pointer out of the integer K or, in other terms, to make p point to the (C abstract) address K.

The null pointer and the (architectural) address 0x0 are not the same thing (cit.), and the same is true for any other pointer made from the integer K and the (architectural) address K.
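To make the distinction concrete, here is a toy model (purely illustrative, not any real compiler) of the hypothetical 16-bit implementation from footnote ‡, where int2ptr(n) = 0x8000 | n. Pointers are modelled as plain 16-bit bit patterns. The names toy_int2ptr and toy_hw_address are hypothetical, and the tag-stripping step in toy_hw_address is an assumption about how such an implementation would dereference, since the footnote only specifies the conversion.

```c
#include <stdint.h>

/* Hypothetical implementation from footnote ‡: converting the integer n
   to a pointer yields the bit pattern 0x8000 | n. */
static uint16_t toy_int2ptr(uint16_t n)
{
    return (uint16_t)(0x8000u | n);
}

/* Assumption: when dereferencing, this toy implementation strips the tag
   bit to obtain the architectural address the hardware actually uses. */
static uint16_t toy_hw_address(uint16_t ptr_bits)
{
    return (uint16_t)(ptr_bits & 0x7FFFu);
}
```

With these definitions, toy_int2ptr(0xf1) is 0x80f1: the bit pattern of (char*)0xf1 (and of &a, for a variable a at architectural address 0xf1) is not 0xf1, even though toy_hw_address maps it back to 0xf1. Likewise the null pointer, int2ptr(0), is represented as 0x8000 rather than 0.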

For some reason (a childhood legacy), I thought that integer literals in C could be used to express architectural addresses. I was wrong: that only happens to be (sort of) correct in the compilers I was using.

The answer to my own question is simply: there is no standard way, because there are no (architectural) addresses in the C standard document. This is true for every (architectural) address, not only the int2ptr(0) one¹.

Note about return *(volatile char*)0;

The standard says that

If an invalid value [a null pointer value is an invalid value] has been assigned to the pointer, the behavior of the unary * operator is undefined.

and that

Therefore any expression referring to such an [volatile] object shall be evaluated strictly according to the rules of the abstract machine.

The abstract machine leaves * undefined for null pointer values, so that code shouldn't differ from this one:

```c
return *(char*)0;
```

which is also undefined.
Indeed, they don't differ, at least with GCC 4.9: both compile to the instructions shown in my question.

The implementation-defined way to access architectural address 0 is, for GCC, to use the -fno-isolate-erroneous-paths-dereference flag, which produces the "expected" assembly code.
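For completeness, here is a sketch of the implementation-specific route described in the paragraph above, assuming GCC. It is not standard C: the helper read_byte_at is hypothetical, and it relies on GCC documenting that an integer-to-pointer cast between types of the same width leaves the bits unchanged. volatile forces the access to be emitted as written but, as discussed above, it does not make a null-pointer dereference defined; whether GCC keeps a visible access to address 0 or turns it into a trap still depends on flags such as -fno-isolate-erroneous-paths-dereference.

```c
#include <stdint.h>

/* GCC-specific sketch, not portable C. Reads one byte at the architectural
   address addr, relying on GCC mapping the integer value of addr to the
   same bits in the pointer. Passing 0 is still undefined behavior in
   standard C; build with -fno-isolate-erroneous-paths-dereference if GCC
   would otherwise replace a visible null access with a trap. */
static unsigned char read_byte_at(uintptr_t addr)
{
    /* volatile: the load must be performed exactly as written. */
    return *(volatile unsigned char *)addr;
}
```

On a hosted system, reading address 0 will normally just crash, so the helper is only meaningful in the unrestrained-memory environment from the question; on a hosted system it can be exercised with the address of an ordinary object, e.g. read_byte_at((uintptr_t)(void *)&x).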

† The mapping functions for converting a pointer to an integer or an integer to a pointer are intended to be consistent with the addressing structure of the execution environment.

‡ Unfortunately, the standard says that & yields the address of its operand. I believe this is a bit improper; I would say that it yields a pointer to its operand. Consider a variable a that is known to reside at address 0xf1 in a 16-bit address space, and consider a compiler that implements int2ptr(n) = 0x8000 | n. &a would yield a pointer whose bit representation is 0x80f1, which is not the address of a.

¹ It was special to me because it was the only one, in my implementations, that couldn't be accessed.


answered Oct 08 '22 17:10

Margaret Bloom

As OP has correctly concluded in her answer to her own question:

There is no standard way because there are no (architectural) address in the C standard document. This is true for every (architectural) address, not only the int2ptr(0) one.

However, a situation where one would want to access memory directly is likely one where a custom linker script is employed (i.e. some kind of embedded systems stuff). So I would say the standard-compliant way of doing what OP asks would be to export a symbol for the (architectural) address in the linker script, and not bother with the exact address in the C code itself.

A variation of that scheme would be to define a symbol at address zero and simply use that to derive any other required address. To do that, add something like the following to the SECTIONS portion of the linker script (assuming GNU ld syntax):

```
_memory = 0;
```

And then in your C code:

```c
extern char _memory[];
```

Now it is possible to create a pointer to the zero address using, for example, char *p = &_memory[0]; (or simply char *p = _memory;), without ever converting an int to a pointer. Similarly, int addr = ...; char *p_addr = &_memory[addr]; will create a pointer to the address addr without technically casting an int to a pointer.

(This, of course, avoids the original question, because the linker is independent of the C standard and the C compiler, and every linker might have a different syntax for its linker script. Also, the generated code might be less efficient, because the compiler is not aware of the address being accessed. But I think this still adds an interesting perspective to the question, so please forgive the slightly off-topic answer.)
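A minimal sketch of the C side of the linker-symbol scheme, assuming a GNU ld script containing _memory = 0; as above. The macro USE_LINKER_MEMORY_SYMBOL, the stand-in array fake_memory, the MEMORY_BASE macro and the helper mem_ptr are all hypothetical; the stand-in exists only so the sketch also compiles, links and runs on a hosted system without the linker script.

```c
#include <stddef.h>

#ifdef USE_LINKER_MEMORY_SYMBOL
/* Real target: the linker script defines "_memory = 0;", so this
   array-of-unknown-size symbol sits at architectural address 0. */
extern char _memory[];
#define MEMORY_BASE _memory
#else
/* Hosted stand-in (hypothetical): an ordinary array plays the role of
   the linker-provided symbol so the sketch can be built and tested. */
static char fake_memory[64];
#define MEMORY_BASE fake_memory
#endif

/* Pointer to "address" addr, derived by indexing from the symbol instead
   of converting an integer to a pointer. addr is assumed to lie within
   memory the platform actually provides (within fake_memory here). */
static char *mem_ptr(size_t addr)
{
    return &MEMORY_BASE[addr];
}
```

With the stand-in, mem_ptr(5) points at fake_memory[5]; on the real target, under the assumptions of this answer, it would designate architectural address 5. Either way, addresses are derived by indexing from a symbol rather than by casting an integer.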

answered Oct 08 '22 17:10

CliffordVienna
