Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Returning a C string from a function

Tags:

c

Your function signature needs to be:

const char * myFunction()
{
    return "my String";
}

Background:

It's so fundamental to C & C++, but little more discussion should be in order.

In C (& C++ for that matter), a string is just an array of bytes terminated with a zero byte - hence the term "string-zero" is used to represent this particular flavour of string. There are other kinds of strings, but in C (& C++), this flavour is inherently understood by the language itself. Other languages (Java, Pascal, etc.) use different methodologies to understand "my string".

If you ever use the Windows API (which is in C++), you'll see quite regularly function parameters like: "LPCSTR lpszName". The 'sz' part represents this notion of 'string-zero': an array of bytes with a null (/zero) terminator.

Clarification:

For the sake of this 'intro', I use the word 'bytes' and 'characters' interchangeably, because it's easier to learn this way. Be aware that there are other methods (wide-characters, and multi-byte character systems (mbcs)) that are used to cope with international characters. UTF-8 is an example of an mbcs. For the sake of intro, I quietly 'skip over' all of this.

Memory:

This means that a string like "my string" actually uses 9+1 (=10!) bytes. This is important to know when you finally get around to allocating strings dynamically.

So, without this 'terminating zero', you don't have a string. You have an array of characters (also called a buffer) hanging around in memory.

Longevity of data:

The use of the function this way:

const char * myFunction()
{
    return "my String";
}

int main()
{
    const char* szSomeString = myFunction(); // Fraught with problems
    printf("%s", szSomeString);
}

... will generally land you with random unhandled-exceptions/segment faults and the like, especially 'down the road'.

In short, although my answer is correct - 9 times out of 10 you'll end up with a program that crashes if you use it that way, especially if you think it's 'good practice' to do it that way. In short: It's generally not.

For example, imagine some time in the future, the string now needs to be manipulated in some way. Generally, a coder will 'take the easy path' and (try to) write code like this:

const char * myFunction(const char* name)
{
    char szBuffer[255];
    snprintf(szBuffer, sizeof(szBuffer), "Hi %s", name);
    return szBuffer;
}

That is, your program will crash because the compiler (may/may not) have released the memory used by szBuffer by the time the printf() in main() is called. (Your compiler should also warn you of such problems beforehand.)

There are two ways to return strings that won't barf so readily.

  1. returning buffers (static or dynamically allocated) that live for a while. In C++ use 'helper classes' (for example, std::string) to handle the longevity of data (which requires changing the function's return value), or
  2. pass a buffer to the function that gets filled in with information.

Note that it is impossible to use strings without using pointers in C. As I have shown, they are synonymous. Even in C++ with template classes, there are always buffers (that is, pointers) being used in the background.

So, to better answer the (now modified question). (There are sure to be a variety of 'other answers' that can be provided.)

Safer Answers:

Example 1, using statically allocated strings:

const char* calculateMonth(int month)
{
    static char* months[] = {"Jan", "Feb", "Mar" .... };
    static char badFood[] = "Unknown";
    if (month < 1 || month > 12)
        return badFood; // Choose whatever is appropriate for bad input. Crashing is never appropriate however.
    else
        return months[month-1];
}

int main()
{
    printf("%s", calculateMonth(2)); // Prints "Feb"
}

What the static does here (many programmers do not like this type of 'allocation') is that the strings get put into the data segment of the program. That is, it's permanently allocated.

If you move over to C++ you'll use similar strategies:

class Foo
{
    char _someData[12];
public:
    const char* someFunction() const
    { // The final 'const' is to let the compiler know that nothing is changed in the class when this function is called.
        return _someData;
    }
}

... but it's probably easier to use helper classes, such as std::string, if you're writing the code for your own use (and not part of a library to be shared with others).

Example 2, using caller-defined buffers:

This is the more 'foolproof' way of passing strings around. The data returned isn't subject to manipulation by the calling party. That is, example 1 can easily be abused by a calling party and expose you to application faults. This way, it's much safer (albeit uses more lines of code):

void calculateMonth(int month, char* pszMonth, int buffersize)
{
    const char* months[] = {"Jan", "Feb", "Mar" .... }; // Allocated dynamically during the function call. (Can be inefficient with a bad compiler)
    if (!pszMonth || buffersize<1)
        return; // Bad input. Let junk deal with junk data.
    if (month<1 || month>12)
    {
        *pszMonth = '\0'; // Return an 'empty' string
        // OR: strncpy(pszMonth, "Bad Month", buffersize-1);
    }
    else
    {
        strncpy(pszMonth, months[month-1], buffersize-1);
    }
    pszMonth[buffersize-1] = '\0'; // Ensure a valid terminating zero! Many people forget this!
}

int main()
{
    char month[16]; // 16 bytes allocated here on the stack.
    calculateMonth(3, month, sizeof(month));
    printf("%s", month); // Prints "Mar"
}

There are lots of reasons why the second method is better, particularly if you're writing a library to be used by others (you don't need to lock into a particular allocation/deallocation scheme, third parties can't break your code, and you don't need to link to a specific memory management library), but like all code, it's up to you on what you like best. For that reason, most people opt for example 1 until they've been burnt so many times that they refuse to write it that way anymore ;)

Disclaimer:

I retired several years back and my C is a bit rusty now. This demo code should all compile properly with C (it is OK for any C++ compiler though).


A C string is defined as a pointer to an array of characters.

If you cannot have pointers, by definition you cannot have strings.


Note this new function:

const char* myFunction()
{
    static char array[] = "my string";
    return array;
}

I defined "array" as static. Otherwise when the function ends, the variable (and the pointer you are returning) gets out of scope. Since that memory is allocated on the stack, and it will get corrupted. The downside of this implementation is that the code is not reentrant and not threadsafe.

Another alternative would be to use malloc to allocate the string in the heap, and then free on the correct locations of your code. This code will be reentrant and threadsafe.

As noted in the comment, this is a very bad practice, since an attacker can then inject code to your application (he/she needs to open the code using GDB, then make a breakpoint and modify the value of a returned variable to overflow and fun just gets started).

It is much more recommended to let the caller handle about memory allocations. See this new example:

char* myFunction(char* output_str, size_t max_len)
{
   const char *str = "my string";
   size_t l = strlen(str);
   if (l+1 > max_len) {
      return NULL;
   }
   strcpy(str, str, l);
   return input;
}

Note that the only content which can be modified is the one that the user. Another side effect - this code is now threadsafe, at least from the library point of view. The programmer calling this method should verify that the memory section used is threadsafe.


Your problem is with the return type of the function - it must be:

char *myFunction()

...and then your original formulation will work.

Note that you cannot have C strings without pointers being involved, somewhere along the line.

Also: Turn up your compiler warnings. It should have warned you about that return line converting a char * to char without an explicit cast.


Based on your newly-added backstory with the question, why not just return an integer from 1 to 12 for the month, and let the main() function use a switch statement or if-else ladder to decide what to print? It's certainly not the best way to go - char* would be - but in the context of a class like this I imagine it's probably the most elegant.


Your function return type is a single character (char). You should return a pointer to the first element of the character array. If you can't use pointers, then you are screwed. :(


You can create the array in the caller, which is the main function, and pass the array to the callee which is your myFunction(). Thus myFunction can fill the string into the array. However, you need to declare myFunction() as

char* myFunction(char * buf, int buf_len){
  strncpy(buf, "my string", buf_len);
  return buf;
}

And in main, myFunction should be called in this way:

char array[51];
memset(array, 0, 51); /* All bytes are set to '\0' */
printf("%s", myFunction(array, 50)); /* The buf_len argument  is 50, not 51. This is to make sure the string in buf is always null-terminated (array[50] is always '\0') */

However, a pointer is still used.