 

Differentiate String Literal from Char Array

I want to write some function that takes a string literal - and only a string literal:

template <size_t N>
void foo(const char (&str)[N]);

Unfortunately, that is too permissive: it will match any array of char, whether or not it's a true string literal. While it's impossible to tell the difference between these at compile time (short of requiring the caller to wrap the literal or array), at run time the two arrays will be in entirely different places in memory:

foo("Hello"); // at 0x400f81

const char msg[] = {'1', '2', '3'};
foo(msg); // at 0x7fff3552767f
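To make the problem concrete, here is a minimal sketch of the signature above, changed to return N so the deduction is visible (the name deduced_size is hypothetical). Both calls compile. The only difference the type system sees is the deduced size: a literal's N counts its terminating NUL, while msg has no terminator at all.

```cpp
#include <cstddef>

// The question's signature, made to return N. It binds to any array of
// const char, whether or not the argument is a string literal.
template <std::size_t N>
constexpr std::size_t deduced_size(const char (&)[N]) { return N; }

// "Hello" has type const char[6]: five characters plus the terminating '\0'.
static_assert(deduced_size("Hello") == 6, "a literal's N includes its NUL");

// msg has type const char[3] and no '\0'; it binds just as well.
const char msg[] = {'1', '2', '3'};
static_assert(deduced_size(msg) == 3, "a plain char array is accepted too");
```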

Is there a way to know where in memory the string data could live, so that I could at least assert that the function takes only string literals? (I'm using gcc 4.7.3, but a solution for any compiler would be great.)

Barry asked Feb 10 '15 17:02



3 Answers

You seem to assume that a necessary trait of a "true string literal" is that the compiler bakes it into the static storage of the executable.

This is not actually true. The C and C++ standards guarantee us that a string literal shall have static storage duration, so it must exist for the life of the program, but if a compiler can arrange this without placing the literal in static storage, it is free to do so, and some compilers sometimes do.

However, it's clear that the property you want to test, for a given string literal, is whether it is in fact in static storage. And since it need not be in static storage, as far as the language standards guarantee, there can't be any solution of your problem founded solely on portable C/C++.

Whether a given string literal actually is in static storage comes down to whether its address lies within one of the address ranges that your toolchain, when it builds your program, assigns to the linkage sections that qualify as static storage in that toolchain's nomenclature.

So the solution I suggest is that you enable your program to know the address ranges of those of its own linkage sections that qualify as static storage, and then it can test whether a given string literal is in static storage by obvious code.

Here is an illustration of this solution for a toy C++ project, prog, built with the GNU/Linux x86_64 toolchain (C++98 or later will do, and the approach is only slightly more fiddly for C). In this setting we link in ELF format, and the linkage sections we will deem static storage are .bss (zero-initialized static data), .rodata (read-only static data) and .data (read/write static data).

Here are our source files:

section_bounds.h

#ifndef SECTION_BOUNDS_H
#define SECTION_BOUNDS_H
// Export delimiting values for our `.bss`, `.rodata` and `.data` sections
extern unsigned long const section_bss_start;
extern unsigned long const section_bss_size;
extern unsigned long const section_bss_end;
extern unsigned long const section_rodata_start;
extern unsigned long const section_rodata_size;
extern unsigned long const section_rodata_end;
extern unsigned long const section_data_start;
extern unsigned long const section_data_size;
extern unsigned long const section_data_end;
#endif

section_bounds.cpp

// Assign either placeholder or pre-defined values to 
// the section delimiting globals.
#ifndef BSS_START
#define BSS_START 0x0
#endif
#ifndef BSS_SIZE
#define BSS_SIZE 0xffff
#endif
#ifndef RODATA_START
#define RODATA_START 0x0
#endif
#ifndef RODATA_SIZE
#define RODATA_SIZE 0xffff
#endif
#ifndef DATA_START
#define DATA_START 0x0
#endif
#ifndef DATA_SIZE
#define DATA_SIZE 0xffff
#endif
extern unsigned long const 
    section_bss_start = BSS_START;
extern unsigned long const section_bss_size = BSS_SIZE;
extern unsigned long const 
    section_bss_end = section_bss_start + section_bss_size;
extern unsigned long const 
    section_rodata_start = RODATA_START;
extern unsigned long const 
    section_rodata_size = RODATA_SIZE;
extern unsigned long const 
    section_rodata_end = section_rodata_start + section_rodata_size;
extern unsigned long const 
    section_data_start = DATA_START;
extern unsigned long const 
    section_data_size = DATA_SIZE;
extern unsigned long const 
    section_data_end = section_data_start + section_data_size;

cstr_storage_triage.h

#ifndef CSTR_STORAGE_TRIAGE_H
#define CSTR_STORAGE_TRIAGE_H

// Classify the storage type addressed by `s` and print it on `cout`
extern void cstr_storage_triage(const char *s);

#endif

cstr_storage_triage.cpp

#include "cstr_storage_triage.h"
#include "section_bounds.h"
#include <iostream>

using namespace std;

void cstr_storage_triage(const char *s)
{
    unsigned long addr = (unsigned long)s;
    cout << "When s = " << (void*)s << " -> \"" << s << '\"' << endl;
    if (addr >= section_bss_start && addr < section_bss_end) {
        cout << "then s is in static 0-initialized data\n";
    } else if (addr >= section_rodata_start && addr < section_rodata_end) {
        cout << "then s is in static read-only data\n";     
    } else if (addr >= section_data_start && addr < section_data_end){
        cout << "then s is in static read/write data\n";
    } else {
        cout << "then s is on the stack/heap\n";
    }       
}

main.cpp

// Demonstrate storage classification of various arrays of char 

#include "cstr_storage_triage.h"

static char in_bss[1];
static char const * in_rodata = "In static read-only data";
static char in_rwdata[] = "In static read/write data";  

int main()
{
    char on_stack[] = "On stack";
    cstr_storage_triage(in_bss);
    cstr_storage_triage(in_rodata);
    cstr_storage_triage(in_rwdata);
    cstr_storage_triage(on_stack);
    cstr_storage_triage("Where am I?");
    return 0;
}

Here is our makefile:

.PHONY: all clean

SRCS = main.cpp cstr_storage_triage.cpp section_bounds.cpp 
OBJS = $(SRCS:.cpp=.o)
TARG = prog
MAP_FILE = $(TARG).map

ifdef AGAIN
BSS_BOUNDS := $(shell grep -m 1 '^\.bss ' $(MAP_FILE))
BSS_START := $(word 2,$(BSS_BOUNDS))
BSS_SIZE := $(word 3,$(BSS_BOUNDS))
RODATA_BOUNDS := $(shell grep -m 1 '^\.rodata ' $(MAP_FILE))
RODATA_START := $(word 2,$(RODATA_BOUNDS))
RODATA_SIZE := $(word 3,$(RODATA_BOUNDS))
DATA_BOUNDS := $(shell grep -m 1 '^\.data ' $(MAP_FILE))
DATA_START := $(word 2,$(DATA_BOUNDS))
DATA_SIZE := $(word 3,$(DATA_BOUNDS))
CPPFLAGS += \
    -DBSS_START=$(BSS_START) \
    -DBSS_SIZE=$(BSS_SIZE) \
    -DRODATA_START=$(RODATA_START) \
    -DRODATA_SIZE=$(RODATA_SIZE) \
    -DDATA_START=$(DATA_START) \
    -DDATA_SIZE=$(DATA_SIZE)
endif

all: $(TARG)

clean:
	rm -f $(OBJS) $(MAP_FILE) $(TARG)

ifndef AGAIN
$(MAP_FILE): $(OBJS)
	g++ -o $(TARG) $(CXXFLAGS) -Wl,-Map=$@ $(OBJS) $(LDLIBS)
	touch section_bounds.cpp

$(TARG): $(MAP_FILE)
	$(MAKE) AGAIN=1
else
$(TARG): $(OBJS)
	g++ -o $@ $(CXXFLAGS) $(OBJS) $(LDLIBS)
endif

Here is what running make looks like:

$ make
g++    -c -o main.o main.cpp
g++    -c -o cstr_storage_triage.o cstr_storage_triage.cpp
g++    -c -o section_bounds.o section_bounds.cpp
g++ -o prog  -Wl,-Map=prog.map main.o cstr_storage_triage.o section_bounds.o 
touch section_bounds.cpp
make AGAIN=1
make[1]: Entering directory `/home/imk/develop/SO/string_lit_only'
g++  -DBSS_START=0x00000000006020c0 -DBSS_SIZE=0x118 -DRODATA_START=0x0000000000400bf0 -DRODATA_SIZE=0x120 -DDATA_START=0x0000000000602070 -DDATA_SIZE=0x3a   -c -o section_bounds.o section_bounds.cpp
g++ -o prog  main.o cstr_storage_triage.o section_bounds.o

And lastly, what prog does:

$ ./prog
When s = 0x6021d1 -> ""
then s is in static 0-initialized data
When s = 0x400bf4 -> "In static read-only data"
then s is in static read-only data
When s = 0x602090 -> "In static read/write data"
then s is in static read/write data
When s = 0x7fffa1b053a0 -> "On stack"
then s is on the stack/heap
When s = 0x400c0d -> "Where am I?"
then s is in static read-only data
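As a cross-check, the same half-open range tests can be redone by hand from the -D values in the make output. The sketch below (the names classify and Storage are hypothetical, and the bounds are specific to that one build of prog) encodes the tests as a constexpr function and checks, at compile time, each address that ./prog printed.

```cpp
#include <cstdint>

// Section bounds copied from the -D options in the make output above.
constexpr std::uint64_t bss_start = 0x6020c0, bss_size = 0x118;
constexpr std::uint64_t rodata_start = 0x400bf0, rodata_size = 0x120;
constexpr std::uint64_t data_start = 0x602070, data_size = 0x3a;

enum class Storage { Bss, Rodata, Data, Other };

// The same [start, start + size) tests as cstr_storage_triage(), as a pure function.
constexpr Storage classify(std::uint64_t a)
{
    return (a >= bss_start && a < bss_start + bss_size) ? Storage::Bss
         : (a >= rodata_start && a < rodata_start + rodata_size) ? Storage::Rodata
         : (a >= data_start && a < data_start + data_size) ? Storage::Data
         : Storage::Other;
}

// Each printed address lands in the section the program reported.
static_assert(classify(0x6021d1) == Storage::Bss, "in_bss");
static_assert(classify(0x400bf4) == Storage::Rodata, "in_rodata");
static_assert(classify(0x602090) == Storage::Data, "in_rwdata");
static_assert(classify(0x7fffa1b053a0) == Storage::Other, "on_stack");
static_assert(classify(0x400c0d) == Storage::Rodata, "\"Where am I?\"");
```

The end of each range is exclusive, so, for example, 0x400d10 (0x400bf0 + 0x120) is already outside .rodata.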

If it's obvious how this works, you need read no further.

The program will compile and link even before we know the addresses and sizes of its static storage sections. It would need to, wouldn't it? In that case, the global section_* variables that ought to hold these values all get built with placeholder values.

When make is run, the recipes:

$(TARG): $(MAP_FILE)
    $(MAKE) AGAIN=1

and

$(MAP_FILE): $(OBJS)
    g++ -o $(TARG) $(CXXFLAGS) -Wl,-Map=$@ $(OBJS) $(LDLIBS)
    touch section_bounds.cpp

are operative, because AGAIN is undefined. They tell make that in order to build prog it must first build prog's linker map file, as per the second recipe. That recipe links prog once (with the placeholder values) to produce the map, and then re-timestamps section_bounds.cpp. After that, make calls itself again with AGAIN=1.

Executing the makefile again, with AGAIN defined, make now finds that it must compute all the variables:

BSS_BOUNDS
BSS_START
BSS_SIZE
RODATA_BOUNDS
RODATA_START
RODATA_SIZE
DATA_BOUNDS
DATA_START
DATA_SIZE

For each static storage section S, it computes S_BOUNDS by grepping the linker map file for the first line that reports the address and size of S. From that line, it assigns the 2nd word (the section address) to S_START and the 3rd word (the section size) to S_SIZE. All the section-delimiting values are then appended, as -D options, to CPPFLAGS, which make automatically passes to compilations.
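To see what values that extraction yields, the 2nd-word/3rd-word step can be mimicked in C++. This is only an illustration: parse_map_line and SectionBounds are hypothetical names, and the sample line used below is reconstructed from the .rodata values in the make output, not copied from a real prog.map.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical holder for one section's start address and size.
struct SectionBounds {
    std::uint64_t start;
    std::uint64_t size;
    std::uint64_t end() const { return start + size; }  // one past the last byte
};

// Mimics $(word 2,...) and $(word 3,...): split the line on whitespace and
// read the 2nd and 3rd words as hexadecimal (a "0x" prefix is accepted).
SectionBounds parse_map_line(const std::string& line)
{
    std::istringstream words(line);
    std::string name, start, size;
    words >> name >> start >> size;
    SectionBounds b;
    b.start = std::stoull(start, nullptr, 16);
    b.size = std::stoull(size, nullptr, 16);
    return b;
}
```

For the line ".rodata         0x0000000000400bf0      0x120" this gives start 0x400bf0 and size 0x120, so the end is 0x400d10, the same range that section_bounds.cpp computes for .rodata.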

Because AGAIN is defined, the operative recipe for $(TARG) is now the customary:

$(TARG): $(OBJS)
    g++ -o $@ $(CXXFLAGS) $(OBJS) $(LDLIBS)

But we touched section_bounds.cpp in the parent make, so it has to be recompiled, and therefore prog has to be relinked. This time, when section_bounds.cpp is compiled, all the section-delimiting macros:

BSS_START
BSS_SIZE
RODATA_START
RODATA_SIZE
DATA_START
DATA_SIZE

will already be defined on the command line, so they will not take their placeholder values.

And those predefined values will be correct because the second linkage adds no symbols to the linkage and removes none, and does not alter the size or storage class of any symbol. It just assigns different values to symbols that were present in the first linkage. Consequently, the addresses and sizes of the static storage sections will be unaltered and are now known to your program.
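The answer computes the section ranges at build time. A different technique, not the author's method and Linux-only, is for a program to look up at run time which memory mapping contains an address by reading /proc/self/maps. Below is a minimal sketch. The helper name mapping_name_for is hypothetical. It assumes the usual /proc/self/maps line layout (address range, permissions, offset, device, inode, optional pathname) and, for brevity, pathnames without spaces. A string literal usually lies in a mapping backed by the executable file, whereas a local array in the main thread lies in [stack]. File-backed mappings also include writable data such as .data, so this distinguishes "in the program image" from "on the stack or heap". It cannot tell literals apart from every other static array.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <sstream>
#include <string>

// Linux-only sketch (hypothetical helper): return the pathname column of the
// /proc/self/maps entry whose address range contains p. Returns "" if no entry
// matches or the matching mapping is anonymous. Assumes no spaces in pathnames.
std::string mapping_name_for(const void* p)
{
    const std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(p);
    std::ifstream maps("/proc/self/maps");
    std::string line;
    while (std::getline(maps, line)) {
        // Line layout: start-end perms offset dev inode [pathname]
        std::istringstream fields(line);
        std::string range, perms, offset, dev, inode, path;
        fields >> range >> perms >> offset >> dev >> inode >> path;
        const std::size_t dash = range.find('-');
        if (dash == std::string::npos)
            continue;
        const std::uintptr_t lo = std::stoull(range.substr(0, dash), nullptr, 16);
        const std::uintptr_t hi = std::stoull(range.substr(dash + 1), nullptr, 16);
        if (addr >= lo && addr < hi)
            return path;  // e.g. the executable's path, "[stack]" or "[heap]"
    }
    return "";
}
```

For example, mapping_name_for("Where am I?") would typically name the executable, while mapping_name_for(on_stack) for a local char on_stack[] would typically return "[stack]".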

Mike Kinghan answered Nov 06 '22 14:11


Depending on what exactly you want, this may or may not work for you:

#include <cstdlib>

template <size_t N>
void foo(const char (&str)[N]) {}

template <char> struct check_literal {};

#define foo(arg) foo((check_literal<arg[0]>(),arg))    

int main()
{

    // This compiles
    foo("abc");

    // This does not
    static const char abc[] = "abc";
    foo(abc);
}

This works with g++ and clang++ in -std=c++11 mode only.
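The trick works because arg[0] is used as a non-type template argument, so it must be a constant expression. A consequence worth knowing (an observation, not something the answer states): any array whose first element is a constant expression gets through, so a constexpr char array is accepted even though it is not a literal. A minimal sketch, with foo changed to return N so the calls can be checked; not_a_literal is a hypothetical name.

```cpp
#include <cstddef>

// The answer's function, changed (for this sketch only) to return N.
template <std::size_t N>
std::size_t foo(const char (&)[N]) { return N; }

// A char can be a non-type template argument only if it is a constant expression.
template <char> struct check_literal {};

// Same guard as the answer: the comma expression evaluates to arg itself,
// but check_literal<arg[0]> fails to compile unless arg[0] is a constant.
#define foo(arg) foo((check_literal<(arg)[0]>(), arg))

// Not a string literal, yet its first element is a constant expression,
// so foo(not_a_literal) is accepted just like foo("abc").
constexpr char not_a_literal[] = "abc";
```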

n. 1.8e9-where's-my-share m. answered Nov 06 '22 16:11


You can use user-defined literals, which by definition can only be applied to literals:

#include <cstddef>   // std::size_t
#include <iostream>

struct literal_wrapper
{
    const char* const ptr;
private:
    constexpr literal_wrapper(const char* p) : ptr(p) {}
    friend constexpr literal_wrapper operator "" _lw(const char* p, std::size_t);
};
constexpr literal_wrapper operator "" _lw(const char* p, std::size_t){ return literal_wrapper(p); }

literal_wrapper f()
{
    std::cout << "f()" << std::endl;
    return "test"_lw;
}

void foo(const literal_wrapper& lw)
{
    std::cout << "foo:" << lw.ptr << " " << static_cast<const void*>(lw.ptr) << std::endl;
}

int main()
{
    auto x1 = f(), x2 = f(), x3 = f();
    const void* p1 = x1.ptr;
    const void* p2 = x2.ptr;
    const void* p3 = x3.ptr;
    std::cout << x1.ptr << " " << p1 << " " << p2 << " " << p3 << std::endl;

    foo(x1);
    foo(x2);
    foo("test"_lw);
    foo("test2"_lw);
}
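Reduced to its essentials, the mechanism is this: literal_wrapper has only a private constructor, and operator "" _lw is its sole friend, so user code can obtain a literal_wrapper only by writing "..."_lw. The sketch below keeps the answer's names. As a hypothetical addition, it also stores the length the compiler passes to the literal operator (the literal's length without the terminating NUL). length_of is likewise a hypothetical function. One caveat, an observation rather than part of the answer: a literal operator is still an ordinary function that can be called by name, as in operator"" _lw(buf, 3). The restriction therefore guards against accidental misuse, not deliberate circumvention.

```cpp
#include <cstddef>

struct literal_wrapper
{
    const char* const ptr;
    const std::size_t len;  // hypothetical addition: length without the '\0'
private:
    constexpr literal_wrapper(const char* p, std::size_t n) : ptr(p), len(n) {}
    friend constexpr literal_wrapper operator"" _lw(const char* p, std::size_t n);
};

// Called by the compiler for "..."_lw; n is the literal's length without '\0'.
constexpr literal_wrapper operator"" _lw(const char* p, std::size_t n)
{
    return literal_wrapper(p, n);
}

// Accepts only a literal_wrapper. Passing a plain char array does not compile,
// because there is no conversion from char[N] to literal_wrapper.
constexpr std::size_t length_of(literal_wrapper lw) { return lw.len; }
```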
Loghorn answered Nov 06 '22 16:11