
What is the right way to handle char* strings?

Tags: c++

I have a third-party library that uses char* (non-const) as a placeholder for string values. What is the right and safe way to assign values to those data types? I have the following benchmark that uses my own timer class to measure execution times:

#include <cstring>
#include <iostream>
#include <sj/timer_chrono.hpp>

using namespace std;

int main()
{
    sj::timer_chrono sw;

    int iterations = 1e7;

    // first method gives compiler warning:
    // conversion from string literal to 'char *' is deprecated [-Wdeprecated-writable-strings]
    cout << "creating c-strings unsafe(?) way..." << endl;
    sw.start();
    for (int i = 0; i < iterations; ++i)
    {
        char* str = "teststring";
    }   
    sw.stop();
    cout << sw.elapsed_ns() / (double)iterations << " ns" << endl;

    cout << "creating c-strings safe(?) way..." << endl;
    sw.start();
    for (int i = 0; i < iterations; ++i)
    {
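        // Allocate room for every character plus the terminating '\0'.
        // Note: the buffer is never released with delete[], so each iteration leaks.
        char* str = new char[strlen("teststring") + 1];
        strcpy(str, "teststring");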
    }   
    sw.stop();
    cout << sw.elapsed_ns() / (double)iterations << " ns" << endl;


    return 0;

}

Output:

creating c-strings unsafe(?) way...
1.9164 ns
creating c-strings safe(?) way...
31.7406 ns

While the "safe" way gets rid of the compiler warning, it makes the code about 15-20 times slower according to this benchmark (1.9 nanoseconds per iteration vs. 31.7 nanoseconds per iteration). What is the correct way, and what is so dangerous about the "deprecated" way?

asked May 02 '13 13:05 by user4157482


2 Answers

The C++ standard is clear:

An ordinary string literal has type “array of n const char” (section 2.14.5, paragraph 8 in C++11).

and

The effect of attempting to modify a string literal is undefined (section 2.14.5, paragraph 12 in C++11).

For a string known at compile time, the safe way of obtaining a non-const char* is to initialize a writable array from the literal:

char literal[] = "teststring";

You can then safely take a pointer to it:

char* ptr = literal;
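
As a minimal sketch of how the two lines above fit together, the following passes a writable local copy to a function that takes a non-const char*. The function legacy_length is a hypothetical stand-in for the third-party API, not something from the question, and it is assumed here that the API only reads the string.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for a third-party function that takes a non-const
// char*. In this sketch it only reads the string.
std::size_t legacy_length(char* text)
{
    return std::strlen(text);
}

// Copies the literal into a writable local array and hands that to the API.
std::size_t call_with_local_copy()
{
    char literal[] = "teststring"; // 11 bytes: 10 characters plus '\0'
    char* ptr = literal;           // array-to-pointer conversion, no cast needed
    return legacy_length(ptr);
}
```

The array lives on the stack, so there is no allocation and nothing to free. The pointer must not be used after the enclosing function returns.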

If at compile time you don't know the string but know its length you can use an array:

char str[STR_LENGTH + 1];

If you don't know the length then you will need to use dynamic allocation. Make sure you deallocate the memory when the strings are no longer needed.
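
One way to get dynamic allocation without having to remember the deallocation is to let a standard container own the buffer. The sketch below assumes the API does not keep or free the pointer. The function legacy_capitalize is a hypothetical stand-in for a library call that writes through its char* argument.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a third-party function that writes through a
// non-const char*: it upper-cases the first character in place.
void legacy_capitalize(char* text)
{
    if (text[0] >= 'a' && text[0] <= 'z')
        text[0] = static_cast<char>(text[0] - 'a' + 'A');
}

std::string capitalize_via_buffer(const std::string& input)
{
    // Copy the characters plus the terminating '\0' into a writable buffer
    // whose length is only known at run time.
    std::vector<char> buffer(input.c_str(), input.c_str() + input.size() + 1);
    legacy_capitalize(buffer.data());
    // The vector releases its memory automatically when it goes out of scope.
    return std::string(buffer.data());
}
```

With this approach, capitalize_via_buffer("teststring") yields "Teststring" and the caller never handles raw memory.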

All of the above works only if the API doesn't take ownership of the char* you pass.

If it tries to deallocate the strings internally then it should say so in the documentation and inform you on the proper way to allocate the strings. You will need to match your allocation method with the one used internally by the API.
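
To illustrate matching the allocation method, here is a sketch for the case where the documentation says the library releases the string with free(). Both functions are hypothetical: legacy_consume stands in for such an API, and make_c_copy is a helper name invented for this example.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical stand-in for an API that takes ownership of the string and
// releases it with free(). It returns the length it saw before freeing.
std::size_t legacy_consume(char* owned)
{
    std::size_t length = std::strlen(owned);
    std::free(owned);
    return length;
}

// Allocates with malloc so the allocation matches the free() in the API.
// Returns nullptr if the allocation fails.
char* make_c_copy(const char* source)
{
    std::size_t bytes = std::strlen(source) + 1; // +1 for the terminating '\0'
    char* copy = static_cast<char*>(std::malloc(bytes));
    if (copy != nullptr)
        std::memcpy(copy, source, bytes);
    return copy;
}
```

After handing the pointer to legacy_consume, the caller must not touch or free it again, because ownership has moved to the library.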

The declaration

char literal[] = "test";

creates a local, 5-character array with automatic storage (meaning the variable is destroyed when execution leaves the scope in which it is declared) and initializes it with the characters 't', 'e', 's', 't' and '\0'.

You can later edit these characters: literal[2] = 'x';
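
A small sketch of that edit, wrapped in a function (the name edit_local_copy is made up for this example) so the result can be inspected:

```cpp
#include <string>

std::string edit_local_copy()
{
    char literal[] = "test"; // writable array: 't', 'e', 's', 't', '\0'
    static_assert(sizeof(literal) == 5, "four characters plus the terminator");
    literal[2] = 'x';        // modifies the local array, not the string literal
    return std::string(literal);
}
```

The function returns "text": only the array's own copy of the characters changes.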

If you write this:

char* str1 = "test";
char* str2 = "test";

then, depending on the compiler, str1 and str2 may be the same value (i.e., point to the same string).

("Whether all string literals are distinct (that is, are stored in nonoverlapping objects) is implementation-defined." in section 2.14.5, paragraph 12 of the C++11 standard)

String literals may also be stored in a read-only section of memory, in which case any attempt to modify one will typically crash the program (formally, it is undefined behavior).

In reality, they are also of type array of const char (which converts to const char*), so this line:

char* str = "test";

silently drops the const-ness of the string through an implicit conversion, which is why the compiler issues the warning.

answered Oct 07 '22 18:10 by Andrei


The unsafe way is the way to go for all strings that are known at compile-time.

Your "safe" way leaks memory and is rather horrific.

Normally you'd have a sane C API which accepts const char *, so you could use a proper safe way in C++, i.e. std::string and its c_str() method.
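
A minimal sketch of that const-correct case, assuming a C-style function that accepts const char*. The names c_api_length and measure are hypothetical and only serve the illustration.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical C-style function with a const-correct signature.
std::size_t c_api_length(const char* text)
{
    return std::strlen(text);
}

std::size_t measure(const std::string& text)
{
    // c_str() returns a null-terminated const char* that stays valid as long
    // as the std::string is alive and not modified.
    return c_api_length(text.c_str());
}
```

Here std::string owns the memory, so there is neither a warning nor anything to free by hand.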

If your C API assumes ownership of the string, your "safe way" has another flaw: you can't mix new[] and free(). Passing memory allocated with the C++ new[] operator to a C API that expects to call free() on it is not allowed. If the C API doesn't call free() on the string later, it should be fine to use new[] on the C++ side.
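
For the case where the C API does not free the string, one possible way to keep new[] without leaking is to pair it with std::unique_ptr<char[]>, which calls delete[] automatically. This is a sketch under that assumption, and round_trip is a name invented for the example.

```cpp
#include <cstddef>
#include <cstring>
#include <memory>
#include <string>

std::string round_trip(const char* source)
{
    std::size_t bytes = std::strlen(source) + 1; // +1 for the terminating '\0'
    std::unique_ptr<char[]> copy(new char[bytes]);
    std::memcpy(copy.get(), source, bytes);
    // copy.get() is a writable char* that could be passed to an API that
    // does not take ownership of it.
    return std::string(copy.get());
    // delete[] runs automatically when copy goes out of scope.
}
```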

Also, this is a strange mixture of C++ and C.

answered Oct 07 '22 19:10 by unwind