
How can a pointer be implemented except storing an address?

Recently I answered another question asking for questions every decent C++ programmer should be able to answer. My suggestion was

Q: How does a pointer point to an object?
A: The pointer stores the address of that object.

but user R.. disagrees with the answer I proposed. He says that the correct answer would be "it's implementation-specific": while present-day implementations store numeric addresses as pointers, there's no reason it couldn't be something much more elaborate.

I can't dispute that implementations other than storing an address could exist, and I don't want to disagree just for the sake of disagreeing. I'm genuinely interested in which other implementations are actually in use.

What other implementations of pointers are actually used in C++, apart from storing an address in an integer-type variable? How is casting (especially dynamic_cast) implemented?

sharptooth asked Oct 15 '10 06:10


3 Answers

On a conceptual level, I agree with you -- I define the address of an object as "the information needed to locate the object in memory". What the address looks like, though, can vary quite a bit.

A pointer value these days is usually represented as a simple, linear address... but there have been architectures where the address format isn't so simple, or varies depending on type. For example, programming in real mode on an x86 (e.g. under DOS), you sometimes have to store the address as a segment:offset pair.
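To make the segment:offset case concrete, here is a minimal sketch of how a real-mode x86 far pointer resolves to a linear address (the segment shifted left by four bits, plus the offset). The FarPointer struct and the function names are made up for illustration; they do not come from any compiler's headers.

```cpp
#include <cstdint>

// Hypothetical model of a real-mode x86 "far" pointer: two 16-bit halves.
struct FarPointer {
    std::uint16_t segment;
    std::uint16_t offset;
};

// Real-mode address translation: linear = segment * 16 + offset.
// The result can exceed 16 bits, so it is held in a 32-bit integer.
constexpr std::uint32_t linear_address(FarPointer p) {
    return (static_cast<std::uint32_t>(p.segment) << 4) + p.offset;
}

// Two far pointers name the same byte when their linear addresses match,
// even if the stored segment:offset pairs differ.
constexpr bool same_object(FarPointer a, FarPointer b) {
    return linear_address(a) == linear_address(b);
}
```

For example, 1234:0010 and 1235:0000 both resolve to linear address 0x12350, so comparing such pointers bit by bit is not the same as comparing the objects they refer to.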

See http://c-faq.com/null/machexamp.html for some more examples. I found the reference to the Symbolics Lisp machine intriguing.

Jander answered Oct 20 '22 01:10


I would call Boost.Interprocess as a witness.

In Boost.Interprocess, the interprocess pointers are offsets from the beginning of the mapped memory area. This lets a process receive a pointer from another process, map the memory area (whose base address might differ from the one in the process that passed the pointer), and still reach the same object.

Therefore, interprocess pointers are not represented as addresses, but they can be resolved to one.
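A minimal sketch of the idea, not Boost's actual implementation: the names Region and OffsetRef are hypothetical, and two ordinary buffers stand in for the same shared segment mapped at two different addresses. (As far as I know, Boost's real offset_ptr stores the distance from the pointer object itself rather than from the start of the segment, but the principle is the same.)

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for a mapped shared-memory segment.
struct Region {
    unsigned char* base;
    std::size_t size;
};

// Hypothetical "pointer" that stores a byte offset instead of an address.
struct OffsetRef {
    std::size_t offset;
};

// Turn a real address inside the region into a position-independent reference.
OffsetRef make_ref(const Region& r, const void* p) {
    const unsigned char* bytes = static_cast<const unsigned char*>(p);
    assert(bytes >= r.base && bytes < r.base + r.size);
    return OffsetRef{static_cast<std::size_t>(bytes - r.base)};
}

// Resolve the reference against whatever base address this process mapped.
void* resolve(const Region& r, OffsetRef ref) {
    assert(ref.offset < r.size);
    return r.base + ref.offset;
}

// Read an int through a reference (memcpy avoids alignment assumptions).
int load_int(const Region& r, OffsetRef ref) {
    int value;
    std::memcpy(&value, resolve(r, ref), sizeof value);
    return value;
}
```

The same OffsetRef resolves to different addresses in the two regions, yet reads back the same object contents in both.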

Thanks for watching :-)

Matthieu M. answered Oct 20 '22 00:10


If we are familiar with accessing array elements using pointer arithmetic, it is easy to understand how objects are laid out in memory and how dynamic_cast works. Consider the following simple class:

struct point
{
    point (int x, int y) : x_ (x), y_ (y) { }
    int x_;
    int y_;
};

point* p = new point(10, 20); 

Assume that p is assigned the memory location 0x01. As a simplified mental model (real compilers store the members inline, inside the object itself), imagine that its member variables are stored in their own separate locations, say x_ at 0x04 and y_ at 0x07. The object can then be visualized as an array of pointers, with p (0x01 in our case) pointing to the beginning of the array:

0x01
+-------+-------+
|       |       |
+---+---+----+--+
    |        |
    |        |
   0x04     0x07
 +-----+   +-----+
 |  10 |   | 20  |
 +-----+   +-----+

In this model, code that accesses the fields essentially becomes array access using pointer arithmetic:

p->x_; // => **p
p->y_; // => *(*(p + 1))
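For comparison, mainstream compilers store the members inline in the object, so p->y_ amounts to a single load from the object's address plus a fixed byte offset rather than a double dereference. A minimal sketch of that arithmetic follows; read_int_at is a hypothetical helper written for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

struct point {
    point(int x, int y) : x_(x), y_(y) {}
    int x_;
    int y_;
};

// Read an int located `offset` bytes past the start of the object,
// i.e. the address arithmetic a compiler typically emits for p->member.
int read_int_at(const point* p, std::size_t offset) {
    assert(offset + sizeof(int) <= sizeof(point));
    int value;
    std::memcpy(&value,
                reinterpret_cast<const unsigned char*>(p) + offset,
                sizeof value);
    return value;
}
```

Calling read_int_at(p, offsetof(point, y_)) yields the same value as p->y_; offsetof comes from <cstddef> and is usable here because point is a standard-layout type.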

If the language supports some kind of automatic memory management, like GC, additional fields may be added to the object array behind the scenes. Imagine a C++ implementation that collects garbage with the help of reference counting. Then the compiler might add an additional field (rc) to keep track of that count. The above array representation then becomes:

0x01
+-------+-------+-------+
|       |       |       |
+--+----+---+---+----+--+
   |        |        |
   |        |        |
  0x02     0x04     0x07
+--+---+  +-----+   +-----+
|  rc  |  |  10 |   | 20  |
+------+  +-----+   +-----+

The first cell holds the address of the reference count. The compiler will emit appropriate code to access the portions of p that should be visible to the outside world:

p->x_; // => *(*(p + 1))
p->y_; // => *(*(p + 2))
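The hidden-field idea can be imitated by hand in ordinary C++. The sketch below is only an illustration under assumptions: Header, make_counted, retain, release and use_count are hypothetical names, and the count is placed directly in front of the object in one allocation rather than behind a pointer as in the picture.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

struct point {
    point(int x, int y) : x_(x), y_(y) {}
    int x_;
    int y_;
};

// Hypothetical hidden header that a reference-counting runtime might prepend.
struct Header {
    std::size_t rc;
};

// The object is constructed directly after the header, so the header size
// must keep the object correctly aligned.
static_assert(sizeof(Header) % alignof(point) == 0, "header would misalign point");

// Allocate header + object in one block; hand out a pointer to the object only.
point* make_counted(int x, int y) {
    void* raw = ::operator new(sizeof(Header) + sizeof(point));
    Header* h = new (raw) Header{1};
    return new (h + 1) point(x, y);
}

// Step back from the visible object to its hidden header
// (relies on the allocation layout set up in make_counted).
Header* header_of(point* p) {
    return reinterpret_cast<Header*>(p) - 1;
}

std::size_t use_count(point* p) { return header_of(p)->rc; }

void retain(point* p) { ++header_of(p)->rc; }

// Drop one reference; destroy and free the block when the count reaches zero.
// Returns true if the object was destroyed.
bool release(point* p) {
    Header* h = header_of(p);
    assert(h->rc > 0);
    if (--h->rc != 0) return false;
    p->~point();
    ::operator delete(h);
    return true;
}
```

User code only ever sees a point*, so p->x_ and p->y_ work unchanged while the count lives outside the visible part of the object.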

Now it is easy to understand how dynamic_cast works. The compiler typically deals with polymorphic classes by adding an extra hidden pointer to the underlying representation. This pointer contains the address of the beginning of another 'array' called the vtable, which in turn contains the addresses of the implementations of the virtual functions of the class. In this picture the first entry of the vtable is special: it holds not a function address but the address of an object of a class called type_info (where exactly that entry lives is implementation-specific). This object contains the run-time type information of the object and pointers to the type_infos of its base classes. Consider the following example:

class Frame
{
public:
    virtual void render (Screen* s) = 0;
    // ....
};

class Window : public Frame
{ 
public:
    virtual void render (Screen* s)
    {
        // ...
    }
    // ....
private:
   int x_;
   int y_;
   int w_;
   int h_;
};

An object of Window will have the following memory layout:

window object (w)
+---------+
| &vtable +------------------+
|         |                  |
+----+----+                  |
+---------+     vtable       |            Window type_info    Frame type_info
|  &x_    |     +------------+-----+      +--------------+    +----------------+
+---------+     | &type_info       +------+              +----+                |
+---------+     |                  |      |              |    |                |
|  &y_    |     +------------------+      +--------------+    +----------------+
+---------+     +------------------+
+---------+     | &Window::render()|
|  &w_    |     +------------------+
+---------+
+---------+
|  &h_    |
+---------+

Now consider what will happen when we try to cast a Window* to a Frame*:

Frame* f = dynamic_cast<Frame*> (w);

dynamic_cast will follow the type_info links from the vtable of w, confirm that Frame is in its list of base classes, and assign w to f. If it cannot find Frame in the list, f is set to 0, indicating that the cast failed. (Strictly speaking, a compiler resolves an upcast like this one at compile time; the run-time lookup described here is what happens for a downcast such as dynamic_cast<Window*>(f).) The vtable provides an economical way to reach the type_info of a class. This is one reason why dynamic_cast works only for classes with virtual functions. Restricting dynamic_cast to polymorphic types also makes sense from a logical point of view: if an object has no virtual functions, it cannot safely be manipulated without knowledge of its exact type.
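The lookup just described can be imitated with plain structs. This is a toy model under stated assumptions, not how any particular compiler lays things out: every Toy* name is hypothetical, only single inheritance is modeled, and real ABIs store the type information differently.

```cpp
#include <cassert>

// Hypothetical stand-in for std::type_info: a name plus a link to one base.
struct ToyTypeInfo {
    const char* name;
    const ToyTypeInfo* base;   // nullptr when there is no base class
};

struct ToyObject;

// Hypothetical vtable: type information first, then the function slots.
struct ToyVTable {
    const ToyTypeInfo* type;
    int (*render)(const ToyObject*);
};

// Every "polymorphic" object starts with the hidden vtable pointer.
struct ToyObject {
    const ToyVTable* vptr;
};

struct ToyWindow {
    ToyObject header;   // must be first so a ToyObject* can address the window
    int x_, y_, w_, h_;
};

int window_render(const ToyObject* self) {
    const ToyWindow* w = reinterpret_cast<const ToyWindow*>(self);
    return w->w_ * w->h_;   // placeholder "rendering": report the area
}

const ToyTypeInfo frame_info  = {"Frame", nullptr};
const ToyTypeInfo window_info = {"Window", &frame_info};
const ToyTypeInfo button_info = {"Button", nullptr};   // unrelated type
const ToyVTable window_vtable = {&window_info, &window_render};

// Follow vptr -> type_info -> base links, looking for the target type.
const ToyObject* toy_dynamic_cast(const ToyObject* obj, const ToyTypeInfo* target) {
    assert(obj != nullptr && target != nullptr);
    for (const ToyTypeInfo* t = obj->vptr->type; t != nullptr; t = t->base) {
        if (t == target) return obj;   // found; same address in this toy
    }
    return nullptr;                    // not in the list of bases: cast fails
}
```

With a ToyWindow whose vptr is &window_vtable, casting to window_info or frame_info returns the object, while casting to button_info returns a null pointer.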

The target type of dynamic_cast need not be polymorphic. This allows us to wrap a concrete type in a polymorphic type:

// no virtual functions
class A 
{
};

class B
{
public:
    virtual void f() = 0;
};

class C : public A, public B
{
    virtual void f() { }
};


C* c = new C;
A* a = dynamic_cast<A*>(c); // OK
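The rule that only the source type has to be polymorphic also covers casts that really are checked at run time. The sketch below reuses A, B and C from above (with a virtual destructor and a public f added) and introduces a hypothetical class D and helper functions for illustration.

```cpp
#include <cassert>

class A { };                       // no virtual functions

class B {
public:
    virtual ~B() { }
    virtual void f() = 0;
};

class C : public A, public B {
public:
    virtual void f() { }
};

class D : public B {               // hypothetical class, unrelated to A and C
public:
    virtual void f() { }
};

// Cross-cast from the polymorphic base B to the sibling base A.
// Legal because the *source* type B is polymorphic; checked at run time.
A* as_A(B* b) {
    return dynamic_cast<A*>(b);
}

// Downcast that can fail: yields a null pointer when *b is not a C.
C* as_C(B* b) {
    return dynamic_cast<C*>(b);
}

// Reference form for callers that already know the dynamic type;
// the assert checks that assumption in debug builds.
C& require_C(B& b) {
    C* c = as_C(&b);
    assert(c != nullptr);
    return *c;
}

// By contrast, dynamic_cast<C*>(a) with an A* a would not compile,
// because the source type A has no virtual functions.
```

Passing a C through as_A or as_C succeeds, while passing a D yields a null pointer from both.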
Vijay Mathew answered Oct 20 '22 00:10