
Caching a const char * as a return type

I was reading up a bit on my C++ and found this article about RTTI (Runtime Type Identification): http://msdn.microsoft.com/en-us/library/70ky2y6k(VS.80).aspx. Well, that's another subject :) However, I stumbled upon an odd statement about the type_info class, namely about its name() method. It says: "The type_info::name member function returns a const char* to a null-terminated string representing the human-readable name of the type. The memory pointed to is cached and should never be directly deallocated."

How can you implement something like this yourself? I've struggled with this exact problem several times before, because I don't want to allocate a new char array that the caller has to delete, so I've stuck to std::string thus far.

So, for the sake of simplicity, let's say I want to make a method that returns "Hello World!". Let's call it

const char *getHelloString() const;

Personally, I would write it something like this (pseudocode):

const char *getHelloString() const
{
  char *returnVal = new char[13];
  strcpy(returnVal, "HelloWorld!");  // destination first, then source

  return returnVal;
}

.. But this would mean that the caller should do a delete[] on my return pointer :(

Thx in advance

Meeh asked Oct 14 '08 11:10


5 Answers

How about this:

const char *getHelloString() const
{
    return "HelloWorld!";
}

Returning a literal directly means the space for the string is allocated in static storage by the compiler and will be available throughout the duration of the program.
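A minimal sketch of why this is safe: a string literal has static storage duration, so the pointer is still valid after the function returns and the caller never has to free it. The free-function form (no trailing const) and the helper `helloLength` are assumptions made here for illustration only.

```cpp
#include <cstring>

// Returns a pointer to a string literal. Literals have static storage
// duration, so the caller must neither delete nor modify the result.
const char* getHelloString()
{
    return "HelloWorld!";
}

// Hypothetical caller: it copies nothing and frees nothing.
std::size_t helloLength()
{
    const char* s = getHelloString();
    return std::strlen(s);
}
```

The caller can keep the pointer around for as long as it likes, which is the same contract the type_info::name() documentation describes.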

Greg Rogers answered Nov 04 '22 04:11


I like all the answers about how the string could be statically allocated, but that's not necessarily true for all implementations, particularly the one whose documentation the original poster linked to. In this case, it appears that the decorated type name is stored statically in order to save space, and the undecorated type name is computed on demand and cached in a linked list.

If you're curious about how the Visual C++ type_info::name() implementation allocates and caches its memory, it's not hard to find out. First, create a tiny test program:

#include <cstdio>
#include <typeinfo>
#include <vector>

int main() {
    std::vector<int> v;
    const std::type_info& ti = typeid(v);
    const char* n = ti.name();  // the pointer to inspect in the debugger
    std::printf("%s\n", n);
    return 0;
}

Build it and run it under a debugger (I used WinDbg) and look at the pointer returned by type_info::name(). Does it point to a global structure? If so, WinDbg's ln command will tell the name of the closest symbol:

0:000> ?? n
char * 0x00000000`00857290
 "class std::vector<int,class std::allocator<int> >"
0:000> ln 0x00000000`00857290
0:000>

ln didn't print anything, which indicates that the string wasn't in the range of addresses owned by any specific module. It would be in that range if it was in the data or read-only data segment. Let's see if it was allocated on the heap, by searching all heaps for the address returned by type_info::name():

0:000> !heap -x 0x00000000`00857290
Entry             User              Heap              Segment               Size  PrevSize  Unused    Flags
-------------------------------------------------------------------------------------------------------------
0000000000857280  0000000000857290  0000000000850000  0000000000850000        70        40        3e  busy extra fill 

Yes, it was allocated on the heap. Putting a breakpoint at the start of malloc() and restarting the program confirms it.

Looking at the declaration in <typeinfo> gives a clue about where the heap pointers are getting cached:

struct __type_info_node {
    void *memPtr;
    __type_info_node* next;
};

extern __type_info_node __type_info_root_node;
...
_CRTIMP_PURE const char* __CLR_OR_THIS_CALL name(__type_info_node* __ptype_info_node = &__type_info_root_node) const;

If you find the address of __type_info_root_node and walk down the list in the debugger, you quickly find a node containing the same address that was returned by type_info::name(). The list seems to be related to the caching scheme.

The MSDN page linked in the original question seems to fill in the blanks: the name is stored in its decorated form to save space, and this form is accessible via type_info::raw_name(). When you call type_info::name() for the first time on a given type, it undecorates the name, stores it in a heap-allocated buffer, caches the buffer pointer, and returns it.

The linked list may also be used to deallocate the cached strings during program exit (however, I didn't verify whether that is the case). This would ensure that they don't show up as memory leaks when you run a memory debugging tool.
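The scheme described above can be sketched roughly as follows. This is not the actual Visual C++ implementation: `TypeName`, `CacheNode`, `g_cacheRoot`, `freeNameCache` and the fake "undecoration" step are all hypothetical names invented for illustration, and the sketch ignores thread safety entirely.

```cpp
#include <cstring>

// Hypothetical list node that remembers every heap buffer handed out.
struct CacheNode {
    char* memPtr;
    CacheNode* next;
};

static CacheNode* g_cacheRoot = nullptr;

// Hypothetical stand-in for a type_info-like object.
struct TypeName {
    const char* raw;             // compact form, stored statically
    mutable const char* cached;  // readable form, filled in on first use

    const char* name() const
    {
        if (cached == nullptr) {
            // Stand-in for undecoration: prefix the raw name with "class ".
            const char* prefix = "class ";
            std::size_t len = std::strlen(prefix) + std::strlen(raw) + 1;
            char* buf = new char[len];
            std::strcpy(buf, prefix);
            std::strcat(buf, raw);

            // Record the buffer in the list so it can be freed later.
            g_cacheRoot = new CacheNode{buf, g_cacheRoot};
            cached = buf;
        }
        return cached;  // owned by the cache; the caller must not free it
    }
};

// Frees every cached buffer, for example at program exit. After this call,
// pointers previously returned by name() are dangling.
void freeNameCache()
{
    while (g_cacheRoot != nullptr) {
        CacheNode* next = g_cacheRoot->next;
        delete[] g_cacheRoot->memPtr;
        delete g_cacheRoot;
        g_cacheRoot = next;
    }
}
```

The first call to name() pays for the allocation; later calls on the same object return the same pointer, and the list gives a cleanup routine something to walk.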

bk1e answered Nov 04 '22 05:11


Well gee, if we are talking about just a function that always returns the same value, it's quite simple.

const char * foo()
{
   // Static storage: the array outlives the call, so the pointer stays valid.
   static const char return_val[] = "HelloWorld!";
   return return_val;
}

The tricky bit is when you start caching a computed result: then you have to consider threading, what happens when the cache gets invalidated, and possibly storing things in thread-local storage. But if it's just a one-off output that is immediately copied, this should do the trick.
Alternatively, if the result doesn't have a fixed size, you either have to use a static buffer of some arbitrary size, which a result may eventually outgrow, or turn to a managed class such as std::string.

const char * foo()
{
   // One shared buffer; each call overwrites the result of the previous call.
   static std::string output;
   DoCalculation(output);
   return output.c_str();
}
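The threading concern mentioned above could be handled, under some assumptions, by making the buffer thread_local instead of static. This is only a sketch: the `DoCalculation` body and its extra `int n` parameter are hypothetical stand-ins, since the answer does not define that function.

```cpp
#include <string>

// Hypothetical computation standing in for DoCalculation from the answer.
static void DoCalculation(std::string& out, int n)
{
    out = "value-" + std::to_string(n);
}

// Each thread gets its own buffer, so concurrent callers do not race on it.
// The returned pointer is only valid until the same thread calls foo() again.
const char* foo(int n)
{
    thread_local std::string output;
    DoCalculation(output, n);
    return output.c_str();
}
```

A caller that needs the value beyond the next call has to copy it, for example into its own std::string.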

Also, the function signature

const char *getHelloString() const;

is only valid for member functions, because of the trailing const. At that point you don't need to deal with function-local static variables and could just use a member variable.

Dan answered Nov 04 '22 04:11


I think that since they know that there are a finite number of these, they just keep them around forever. It might be appropriate for you to do that in some instances, but as a general rule, std::string is going to be better.

On each new call they can also check whether that string has already been created and, if so, return the same pointer. Again, depending on what you are doing, this may be useful for you too.

Lou Franco answered Nov 04 '22 03:11


Be careful when implementing a function that allocates a chunk of memory and then expects the caller to deallocate it, as you do in the OP:

const char *getHelloString() const
{
  char *returnVal = new char[13];
  strcpy(returnVal, "HelloWorld!");  // destination first, then source

  return returnVal;
}

By doing this you are transferring ownership of the memory to the caller. If you call this code from some other function:

int main()
{
  const char * str = getHelloString();
  delete[] str;  // must match new[]; nothing in the signature says so
  return 0;
}

...the semantics of transferring ownership of the memory are not clear, creating a situation where bugs and memory leaks are more likely.

Also, at least under Windows, if the two functions are in two different modules you could potentially corrupt the heap. In particular, if main() is in hello.exe, compiled with VC9, and getHelloString() is in utility.dll, compiled with VC6, you'll corrupt the heap when you delete the memory. This is because the VC6 and VC9 runtimes each use their own heap, so you are allocating from one heap and deallocating from another.
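One common way to make the ownership explicit, and to keep allocation and deallocation in the same module, is to pair the allocating function with a matching release function and wrap the result in a smart pointer. The sketch below is an assumption about how that could look, not something from the question: `createHelloString`, `destroyHelloString`, `HelloDeleter` and `HelloPtr` are hypothetical names.

```cpp
#include <cstring>
#include <memory>

// These two functions would live in the same module (e.g. the DLL), so the
// allocation and the deallocation always go through the same heap.
char* createHelloString()
{
    char* buf = new char[12];  // 11 characters plus the terminating '\0'
    std::strcpy(buf, "HelloWorld!");
    return buf;
}

void destroyHelloString(char* p)
{
    delete[] p;
}

// Caller-side helper: ties the buffer to its matching release function.
struct HelloDeleter {
    void operator()(char* p) const { destroyHelloString(p); }
};

using HelloPtr = std::unique_ptr<char[], HelloDeleter>;

HelloPtr getHelloString()
{
    return HelloPtr(createHelloString());
}
```

The return type now documents that the caller owns the buffer, and the buffer is released through destroyHelloString() automatically when the HelloPtr goes out of scope.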

John Dibling answered Nov 04 '22 03:11