
C++ precise garbage collector using clang/llvm?

I want to write a precise 'mark and sweep' garbage collector in C++. I have made some design decisions that should help: all my pointers will be wrapped in a 'RelocObject', and the heap will be a single block of memory. It looks something like this:

// This class acts as an indirection to the actual object in memory so that it can be      
// relocated in the sweep phase of garbage collector
class MemBlock
{
public:
    void* Get( void ) { return m_ptr; }

private:
    MemBlock( void ) : m_ptr( NULL ){}

    void* m_ptr;
};

// This is of the same size as the above class and is directly cast to it, but is     
// typed so that we can easily debug the underlying object
template<typename _Type_>
class TypedBlock
{
public:
    _Type_* Get( void ) { return m_pObject; }

private:
    TypedBlock( void ) : m_pObject( NULL ){}

    // Pointer to actual object in memory
    _Type_* m_pObject;
};

// This is our wrapper class that every pointer is wrapped in 
template< typename _Type_ >
class RelocObject
{
public:

    RelocObject( void ) : m_pRef( NULL ) {}

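    static RelocObject New( void )
    {
        // A static member function has no 'this', so pass the address of the handle being built.
        // Note: the returned copy may live at a different address unless the copy is elided.
        RelocObject ref;
        ref.m_pRef = (TypedBlock<_Type_>*)Allocator()->Alloc( &ref, sizeof(_Type_), __alignof(_Type_) );
        assert( ref.m_pRef && "ERROR! Can't construct a null object\n" );
        new ( ref.m_pRef->Get() ) _Type_();
        return ref;
    }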

    ~RelocObject(){}

    _Type_*     operator->  ( void ) const 
    { 
        assert( m_pRef && "ERROR! Object is null\n" ); 
        return (_Type_*)m_pRef->Get(); 
    }

    // Equality
    bool operator ==(const RelocObject& rhs) const { return m_pRef->Get() == rhs.m_pRef->Get(); }
    bool operator !=(const RelocObject& rhs) const { return m_pRef->Get() != rhs.m_pRef->Get(); }

    RelocObject&    operator=   ( const RelocObject& rhs ) 
    {
        if(this == &rhs) return *this;
        m_pRef = rhs.m_pRef;
        return *this; 
    }

private:

    RelocObject( TypedBlock<_Type_>* pRef ) : m_pRef( pRef ) 
    {
        assert( m_pRef && "ERROR! Can't construct a null object\n");
    }

    RelocObject*    operator&   ( void ) { return this; }
    _Type_&     operator*   ( void ) const { return *(_Type_*)m_pRef->Get(); }

    // Pointer to the indirection block that holds the object's current address
    TypedBlock<_Type_>* m_pRef;
};

// We would use it like so...
typedef RelocObject<Impl::Foo> Foo;

int main( void )
{
    Foo foo = Foo::New();
}

To find the 'root' RelocObjects, I pass the address of the RelocObject being created to the allocator (the garbage collector) when I allocate in 'RelocObject::New'. The allocator then checks whether that address lies within the heap's memory block; if it does, I can assume it is not a root.
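The range check described above could look roughly like the sketch below. The names HeapRange and IsRoot are hypothetical and not part of my actual allocator; the only assumption carried over is that the heap is one contiguous block.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical descriptor for the single contiguous GC heap block.
struct HeapRange
{
    const unsigned char* begin;
    std::size_t          size;

    // True when 'p' points into [begin, begin + size).
    bool Contains( const void* p ) const
    {
        // Compare as integers: relational comparison of unrelated pointers is unspecified.
        const std::uintptr_t addr  = reinterpret_cast<std::uintptr_t>( p );
        const std::uintptr_t first = reinterpret_cast<std::uintptr_t>( begin );
        return addr >= first && addr - first < size;
    }
};

// A handle is treated as a root when it lives outside the GC heap.
inline bool IsRoot( const HeapRange& heap, const void* handleAddress )
{
    return !heap.Contains( handleAddress );
}
```

Under this test, handles in globals, statics and on the stack all count as roots, and so would a handle stored in memory obtained from some other allocator.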

The issue comes when I want to trace from the roots through the child objects, using the zero or more RelocObjects located inside each child object.

I want to find the RelocObjects in a class (i.e. a child object) using a 'precise' method. I could use a reflection approach and make the user register where the RelocObjects are in each class. However, this would be very error-prone, so I'd like to do it automatically.

Instead, I'm looking to use Clang to find the offsets of the RelocObjects within the classes at compile time, load this information at program start, and use it in the mark phase of the garbage collector to trace through and mark the child objects.
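To make the intended mark phase concrete, here is a minimal sketch of how a per-type offset table might be consumed once it exists. TypeLayout, Slot, Handle and Mark are all hypothetical names for illustration (Slot plays the role of MemBlock, Handle the role of RelocObject), and the sketch assumes each slot can reach the layout record of the object it points to.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-type layout record, e.g. produced by a compile-time tool.
struct TypeLayout
{
    std::vector<std::size_t> handleOffsets; // byte offset of each handle member
};

// Hypothetical indirection slot: one per heap object.
struct Slot
{
    void*             object; // current address of the object (may change on relocation)
    const TypeLayout* layout; // where the handles sit inside that object
    bool              marked;
};

// Hypothetical handle: what every member "pointer" is wrapped in.
struct Handle
{
    Slot* slot; // null when the handle is empty
};

// Marks everything reachable from 'root' using only the offset tables.
inline void Mark( Slot* root )
{
    std::vector<Slot*> worklist;
    if( root ) worklist.push_back( root );

    while( !worklist.empty() )
    {
        Slot* s = worklist.back();
        worklist.pop_back();
        if( s->marked ) continue; // already visited, which also breaks cycles
        s->marked = true;

        const unsigned char* base = static_cast<const unsigned char*>( s->object );
        for( std::size_t off : s->layout->handleOffsets )
        {
            // A Handle object really lives at this offset, so the cast is valid.
            const Handle* child = reinterpret_cast<const Handle*>( base + off );
            if( child->slot ) worklist.push_back( child->slot );
        }
    }
}
```

An explicit worklist is used instead of recursion so that long chains of objects do not exhaust the stack.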

So my question is: can Clang help? I've heard you can gather all kinds of type information during compilation using its compile-time hooks. If so, what should I look for in Clang, i.e. are there any examples of doing this kind of thing?

Just to be explicit: I want to use Clang to automatically find the offset of 'Foo' (which is a typedef of RelocObject) in FooB without the user providing any 'hints', i.e. they just write:

class FooB
{
public:
    int m_a;
    Foo m_ptr;
};

Thanks in advance for any help.

user176168 asked Nov 14 '22 07:11



1 Answer

Whenever a RelocObject is instantiated, its address can be recorded in a RelocObject ownership database along with sizeof(*derivedRelocObject), which will immediately identify which Foo belongs to which FooB. You don't need Clang for that. Also, since Foo will be created shortly after FooB, your ownership database can be very simple: the order of the "I've been created, here's my address and size" calls will place the owning RelocObject record directly before the records of the RelocObject instances that it owns.

Each RelocObject has an ownership_been_declared flag initialized to false. On first use (which would be after the constructors have completed, since no real work should be done in the constructor), a newly created object requests that the database update its ownership. The database goes through its queue of recorded addresses, identifies which objects belong to which, removes the resolved entries from its list, and sets their ownership_been_declared flags to true. You will then have the offsets too (if you still need them).
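A minimal sketch of the address-containment lookup such a database could perform is below. OwnershipDatabase, RegisterObject and FindOwner are hypothetical names, not taken from my collector, and the sketch assumes the allocator registers every object's address and size when it hands the object out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical ownership database: records object extents and resolves
// which object (if any) a given handle address falls inside.
class OwnershipDatabase
{
private:
    struct Extent
    {
        const void* address;
        std::size_t size;
    };

    std::vector<Extent> m_objects;

public:
    // Assumed to be called by the allocator for every object it hands out.
    void RegisterObject( const void* address, std::size_t size )
    {
        m_objects.push_back( Extent{ address, size } );
    }

    // Returns true and fills 'owner' and 'offset' when 'handle' lies inside a
    // registered object; returns false when it lies outside all of them (a root).
    bool FindOwner( const void* handle, const void*& owner, std::size_t& offset ) const
    {
        const std::uintptr_t addr = reinterpret_cast<std::uintptr_t>( handle );
        for( const Extent& e : m_objects )
        {
            const std::uintptr_t first = reinterpret_cast<std::uintptr_t>( e.address );
            if( addr >= first && addr - first < e.size )
            {
                owner  = e.address;
                offset = static_cast<std::size_t>( addr - first );
                return true;
            }
        }
        return false;
    }
};
```

The linear scan returns the first containing extent, which is enough for a sketch; a real version would probably keep the extents sorted by address and would need a rule for objects nested inside other objects.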


P.S. If you like, I can share my code for an incremental garbage collector I wrote many years ago, which you might find helpful.

Keldon Alleyne answered Dec 07 '22 23:12
