
How to distinguish between strings in heap or literals?

Tags:

c


I've had a similar case recently. Here's what I did:

If you're making an API that accepts a string pointer and then uses it to create an object (mylib_create_element), a good idea is to copy the string into a separate heap buffer that your library owns and frees at its own discretion. This way, the user remains responsible for freeing the string they passed to your API, which makes sense. It's their string, after all.

Note that this won't work if your API depends on the user changing the string after creating the object!
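A minimal sketch of that copy-on-create idea follows. Only the name mylib_create_element comes from the answer above; the mylib_element struct, its name field, and mylib_destroy_element are hypothetical and exist purely for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical element type; the real library's layout is not shown in the answer. */
typedef struct mylib_element {
    char *name;   /* private heap copy, owned by the library */
} mylib_element;

/* Copies `name`, so the caller may pass a literal, a stack buffer, or a heap string. */
mylib_element *mylib_create_element(const char *name) {
    mylib_element *e = malloc(sizeof *e);
    if (e == NULL)
        return NULL;

    size_t len = strlen(name) + 1;   /* include the terminating NUL */
    e->name = malloc(len);
    if (e->name == NULL) {
        free(e);
        return NULL;
    }
    memcpy(e->name, name, len);
    return e;
}

/* Hypothetical counterpart: the library frees only what it allocated itself. */
void mylib_destroy_element(mylib_element *e) {
    if (e != NULL) {
        free(e->name);
        free(e);
    }
}
```

Because the element keeps its own copy, the library never needs to know where the caller's string lives. The cost is one extra allocation per element.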


On most Unix systems, the linker defines the symbols 'etext' and 'edata', which mark the end of the text segment and the end of the initialized data segment. If your pointer lies between 'etext' and 'edata', it points to statically initialized data. These symbols are not mentioned in any standard, so relying on them is non-portable and at your own risk.

Example:

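#include <stdio.h>
#include <stdlib.h>

/* Linker-defined symbols (see end(3)); not part of any C standard. */
extern char etext;
extern char edata;

/* Nonzero if b lies between the end of text and the end of initialized data. */
#define IS_LITERAL(b) ((const char *)(b) >= &etext && (const char *)(b) < &edata)

int main(void) {
    const char *p1 = "static";
    char *p2 = malloc(10);
    printf("%d, %d\n", IS_LITERAL(p1), IS_LITERAL(p2));
    free(p2);
    return 0;
}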

The only reliable option is to ask the user to explicitly mark each input as either a literal or an allocated string.

However, as @Mints97 mentions in his answer, this approach is architecturally incorrect: you force the user of your library to take an explicit action, and if they forget, the result is most likely a memory leak (or even an application crash). So use it only if:

  1. You want to drastically reduce the number of allocations. In my case it was JSON node names, which never change during the lifetime of the program.
  2. You have good control over the code of your library's consumers. In my case, the libraries are shipped with the binaries and tightly bound to them.

Implementation example

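#include <stdlib.h>
#include <string.h>

/* One tag byte is stored immediately before the first character of every string. */
#define AAS_DYNAMIC             'D'
#define AAS_STATIC              'S'

/* Must contain the same character as AAS_STATIC. */
#define AAS_STATIC_PREFIX       "S"
/* Prepends the static tag to a literal and yields a pointer just past the tag. */
#define AAS_CONST_STR(str)      ((AAS_STATIC_PREFIX str) + 1)

/* Allocates room for count characters plus the tag byte and a terminating NUL. */
char* aas_allocate(size_t count) {
    char* buffer = malloc(count + 2);
    if(buffer == NULL)
        return NULL;

    *buffer = AAS_DYNAMIC;

    return buffer + 1;
}

/* Frees the string only if its tag byte marks it as dynamically allocated. */
void aas_free(char* aas) {
    if(aas != NULL) {
        if(*(aas - 1) == AAS_DYNAMIC) {
            free(aas - 1);
        }
    }
}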

...

char* s1 = AAS_CONST_STR("test1");
char* s2 = aas_allocate(10);

strcpy(s2, "test2");

aas_free(s1);
aas_free(s2);

Testing performance (note #1)

I benchmarked my libtsjson library with the following code (800k iterations):

    node = json_new_node(NULL);
    json_add_integer(node, NODE_NAME("i"), 10);
    json_add_string(node, NODE_NAME("s1"), json_str_create("test1"));
    json_add_string(node, NODE_NAME("s2"), json_str_create("test2"));
    json_node_destroy(node);

My CPU is an Intel Core i7 860. If NODE_NAME is just a macro, the time per iteration was 479 ns. If NODE_NAME is a memory allocation, the time per iteration was 609 ns.
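For readers who want to take a comparable measurement, here is a rough sketch of a per-iteration timer. It is not the benchmark used above and does not call libtsjson. The names ns_per_iteration, name_as_literal, and name_as_allocation are hypothetical, and the two bodies only stand in for "NODE_NAME as a macro" and "NODE_NAME as an allocation". It uses the standard clock() function, which measures CPU time at a coarse resolution, so absolute numbers will differ from the figures quoted.

```c
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Volatile sink so the compiler cannot discard the work being timed. */
static const char *volatile sink;

/* Stand-in for a name that is just a string literal: no allocation. */
void name_as_literal(void) {
    sink = "s1";
}

/* Stand-in for a name that is copied to the heap on every use. */
void name_as_allocation(void) {
    char *copy = malloc(3);
    if (copy != NULL) {
        memcpy(copy, "s1", 3);
        sink = copy;   /* pointer value is stored only, never dereferenced later */
        free(copy);
    }
}

/* Runs body `iterations` times and returns the average CPU time per call in ns. */
double ns_per_iteration(void (*body)(void), long iterations) {
    clock_t start = clock();
    for (long i = 0; i < iterations; i++)
        body();
    clock_t end = clock();
    return (double)(end - start) * 1e9 / CLOCKS_PER_SEC / (double)iterations;
}
```

Calling ns_per_iteration(name_as_literal, 800000) and ns_per_iteration(name_as_allocation, 800000) gives two figures to compare. A run this short is noisy, so repeat it a few times before drawing conclusions.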

Hinting user or compiler (note #2)

  • Add an annotation to all such pointers so that a static analyser, e.g. Sparse (the analyser used for the Linux kernel), can catch such issues:

    char __autostring* s1 = aas_copy("test"); /* OK */
    char __autostring* s2 = strdup("test");   /* Should fail? */
    char* s3 = s1;                            /* Confuses sparse */
    char* s4 = (char*) s1;                    /* Explicit conversion - OK */
    

(I am not completely sure what Sparse actually reports in these cases.)

  • Use a simple typedef to make the compiler raise a warning when you do something wrong:

    #ifdef AAS_STRICT
    typedef struct { char a; } *aas_t;
    #else
    typedef char *aas_t;
    #endif
    

This approach is one more step into the world of dirty C hacks: for example, aas_t now points to a struct, so the size of the pointed-to type is no longer guaranteed to be 1.

The full source with changes may be found here. If compiled with -DAAS_STRICT, it raises tons of errors: https://ideone.com/xxmSat. Even for correct code, it can complain about strcpy() (not reproduced on ideone).


The simple answer is that you cannot do this, since the C language does not demarcate the stack, heap, and data section.

If you want to guess, you could collect the address of the first variable on the stack, the address of the calling function, and the address of a byte of heap-allocated memory, and then compare them with your pointer. This is very bad practice with no guarantees.

It's best to restructure your code so that you don't run into this issue in the first place.
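One possible restructuring, sketched here under my own assumptions rather than taken from any answer above, is to carry the ownership information next to the pointer instead of guessing it from the address. The str_ref type and the functions str_borrow, str_copy, and str_release are hypothetical names used only for this illustration.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical string handle: ownership is recorded, never inferred from the address. */
typedef struct {
    const char *data;
    bool owned;        /* true if data came from malloc and must be freed */
} str_ref;

/* Wraps a literal, or any other string the handle must not free. */
str_ref str_borrow(const char *s) {
    str_ref r = { s, false };
    return r;
}

/* Makes a heap copy that the handle owns; data is NULL if allocation fails. */
str_ref str_copy(const char *s) {
    str_ref r = { NULL, false };
    size_t len = strlen(s) + 1;   /* include the terminating NUL */
    char *buf = malloc(len);
    if (buf != NULL) {
        memcpy(buf, s, len);
        r.data = buf;
        r.owned = true;
    }
    return r;
}

/* Frees only owned storage, then resets the handle so a second call is harmless. */
void str_release(str_ref *r) {
    if (r->owned)
        free((char *)r->data);
    r->data = NULL;
    r->owned = false;
}
```

The caller states its intent once, at the point where the string enters the API, and every later release is safe for both literals and heap strings.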