How does strchr implementation work

Tags:

c

pointers

constants

strchr

I tried to write my own implementation of the strchr() function.

It now looks like this:

char *mystrchr(const char *s, int c) {
    while (*s != (char) c) {
        if (!*s++) {
            return NULL;
        }
    }
    return (char *)s;
}

The last line originally was

return s;

But this didn't work because s is const. I found out that there needs to be this cast (char *), but I honestly don't know what I am doing there :( Can someone explain?

435

asked Jan 16 '13 20:01

Marc

2 Answers

I believe this is actually a flaw in the C Standard's definition of the strchr() function. (I'll be happy to be proven wrong.) (Replying to the comments, it's arguable whether it's really a flaw; IMHO it's still poor design. It can be used safely, but it's too easy to use it unsafely.)

Here's what the C standard says:

char *strchr(const char *s, int c);

The strchr function locates the first occurrence of c (converted to a char) in the string pointed to by s. The terminating null character is considered to be part of the string.

Which means that this program:

#include <stdio.h>
#include <string.h>

int main(void) {
    const char *s = "hello";
    char *p = strchr(s, 'l');
    *p = 'L';
    return 0;
}

even though it carefully defines the pointer to the string literal as a pointer to const char, has undefined behavior, since it modifies the string literal. gcc, at least, doesn't warn about this, and the program dies with a segmentation fault.

The problem is that strchr() takes a const char* argument, which means it promises not to modify the data that s points to -- but it returns a plain char*, which permits the caller to modify the same data.

Here's another example; ~~it doesn't have undefined behavior, but~~ it quietly modifies a const qualified object without any casts (which, on further thought, I believe has undefined behavior):

#include <stdio.h>
#include <string.h>

int main(void) {
    const char s[] = "hello";
    char *p = strchr(s, 'l');
    *p = 'L';
    printf("s = \"%s\"\n", s);
    return 0;
}

Which means, I think, (to answer your question) that a C implementation of strchr() has to cast its result to convert it from const char* to char*, or do something equivalent.
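
To make the role of that cast concrete, here is a minimal sketch based on the question's mystrchr. The helper check_cast_keeps_address is a hypothetical name added only for illustration. The point it tries to show is that the cast changes the pointer's type (const char * to char *) but not the address it holds, so writing through the result is fine only when the underlying object is actually modifiable.

```c
#include <assert.h>
#include <stddef.h>

/* Same logic as the question's mystrchr, written out step by step. */
char *mystrchr(const char *s, int c)
{
    const char ch = (char) c;   /* the search compares against c converted to char */

    while (*s != ch) {
        if (*s == '\0')
            return NULL;        /* reached the terminator without a match */
        ++s;
    }
    /* s has type const char *, but the return type is char *.
       The cast drops the qualifier and keeps the same address. */
    return (char *) s;
}

/* Hypothetical self-check, for illustration only. */
void check_cast_keeps_address(void)
{
    char buf[] = "hello";       /* a modifiable array, not a string literal */
    char *p = mystrchr(buf, 'l');

    assert(p == buf + 2);       /* same address as the first 'l' */
    *p = 'L';                   /* allowed: buf itself is not const */
    assert(buf[2] == 'L');
}
```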

This is why C++, in one of the few changes it makes to the C standard library, replaces strchr() with two overloaded functions of the same name:

const char * strchr ( const char * str, int character );
      char * strchr (       char * str, int character );

Of course C can't do this.

An alternative would have been to replace strchr by two functions, one taking a const char* and returning a const char*, and another taking a char* and returning a char*. Unlike in C++, the two functions would have to have different names, perhaps strchr and strcchr.
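
A rough sketch of what that two-function alternative could look like in user code. The names my_strcchr and my_strchr are hypothetical (the my_ prefix is used here only to stay out of the str* names reserved for the library), and this is an illustration, not how any real library is implemented.

```c
#include <assert.h>
#include <stddef.h>

/* Const-preserving variant: const in, const out, no cast needed. */
const char *my_strcchr(const char *s, int c)
{
    const char ch = (char) c;

    for (;; ++s) {
        if (*s == ch)
            return s;           /* also matches the terminator when ch is '\0' */
        if (*s == '\0')
            return NULL;
    }
}

/* Mutable variant: the caller already holds a char *, so handing back a
   char * into the same array grants no new permissions. The single cast
   is confined to this wrapper. */
char *my_strchr(char *s, int c)
{
    return (char *) my_strcchr(s, c);
}

/* Hypothetical self-check, for illustration only. */
void check_two_variants(void)
{
    char buf[] = "hello";
    const char *lit = "hello";

    char *p = my_strchr(buf, 'l');
    const char *q = my_strcchr(lit, 'l');

    assert(p == buf + 2);
    assert(q == lit + 2);
    *p = 'L';                   /* fine: buf is modifiable */
    assert(buf[2] == 'L');
    /* *q = 'L'; would be rejected at compile time: q points to const char */
}
```

With this split, passing a const char * to my_strchr draws a compiler diagnostic about the discarded qualifier, which is the check the single-function design gives up.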

(Historically, const was added to C after strchr() had already been defined. This was probably the only way to keep strchr() without breaking existing code.)

strchr() is not the only C standard library function that has this problem. The list of affected functions (I think this list is complete but I don't guarantee it) is:

void *memchr(const void *s, int c, size_t n);
char *strchr(const char *s, int c);
char *strpbrk(const char *s1, const char *s2);
char *strrchr(const char *s, int c);
char *strstr(const char *s1, const char *s2);

(all declared in <string.h>) and:

void *bsearch(const void *key, const void *base,
    size_t nmemb, size_t size,
    int (*compar)(const void *, const void *));

(declared in <stdlib.h>). All these functions take a pointer to const data that points to the initial element of an array, and return a non-const pointer to an element of that array.

101

answered Sep 21 '22 09:09

Keith Thompson

The practice of returning non-const pointers to const data from non-modifying functions is actually an idiom rather widely used in C. It is not always pretty, but it is rather well established.

The rationale here is simple: strchr by itself is a non-modifying operation. Yet we need strchr functionality for both constant and non-constant strings, and ideally the constness of the input would propagate to the constness of the output. Neither C nor C++ provides any elegant support for this concept, meaning that in both languages you have to write two virtually identical functions in order to avoid taking any risks with const-correctness.

In C++ you would be able to use function overloading by declaring two functions with the same name

const char *strchr(const char *s, int c);
char *strchr(char *s, int c);

In C you have no function overloading, so in order to fully enforce const-correctness in this case you would have to provide two functions with different names, something like

const char *strchr_c(const char *s, int c);
char *strchr(char *s, int c);

Although in some cases this might be the right thing to do, it is typically (and rightfully) considered too cumbersome and involved by C standards. You can resolve this situation in a more compact (albeit more risky) way by implementing only one function

char *strchr(const char *s, int c);

which returns a non-const pointer into the input string (by using a cast at the exit, exactly as you did). Note that this approach does not violate any rules of the language, although it gives the caller the means to violate them. By casting away the constness of the data, this approach simply delegates the responsibility for observing const-correctness from the function itself to the caller. As long as the caller is aware of what's going on and remembers to "play nice", i.e. uses a const-qualified pointer to point to const data, any temporary breach in the wall of const-correctness created by such a function is repaired instantly.
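
As an illustration of "playing nice" on the caller's side, here is a small sketch that uses the standard strchr on read-only data. The helper name count_char is made up for this example. The result is stored straight back into a const char *, so any attempt to write through it is rejected at compile time.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts how often c occurs in a read-only string. */
size_t count_char(const char *s, int c)
{
    size_t n = 0;

    if ((char) c == '\0')
        return 1;               /* only the terminator matches; do not step past it */

    /* Assigning the result to a const char * restores the qualifier at once.
       A statement like *p = 'x'; inside this loop would not compile. */
    for (const char *p = strchr(s, c); p != NULL; p = strchr(p + 1, c))
        ++n;

    return n;
}

/* Hypothetical self-check, for illustration only. */
void check_count_char(void)
{
    assert(count_char("hello", 'l') == 2);
    assert(count_char("hello", 'z') == 0);
}
```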

I see this trick as a perfectly acceptable approach to reducing unnecessary code duplication (especially in the absence of function overloading). The standard library uses it. You have no reason to avoid it either, assuming you understand what you are doing.

Now, as for your implementation of strchr, it looks weird to me from a stylistic point of view. I would use the loop header to iterate over the full range we are operating on (the full string), and use the inner if to catch the early termination condition

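for (; *s != '\0'; ++s)
  if (*s == (char) c)
    return (char *) s;

/* The terminator counts as part of the string, so searching for '\0' must find it. */
return (char) c == '\0' ? (char *) s : NULL;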

But things like that are always a matter of personal preference. Someone might prefer to just

for (; *s != '\0' && *s != (char) c; ++s)
  ;

return *s == (char) c ? (char *) s : NULL;

Some might say that modifying a function parameter (s) inside the function is bad practice.
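
For readers in that camp, a variant that leaves s untouched might look like the following sketch. The function name mystrchr_indexed and the check helper are hypothetical. It walks the string with a separate index instead of advancing the parameter.

```c
#include <assert.h>
#include <stddef.h>

/* Index-based variant: the parameter s is never modified. */
char *mystrchr_indexed(const char *s, int c)
{
    const char ch = (char) c;

    for (size_t i = 0; ; ++i) {
        if (s[i] == ch)
            return (char *) &s[i];  /* same cast as before, applied to the element's address */
        if (s[i] == '\0')
            return NULL;            /* end of string, no match */
    }
}

/* Hypothetical self-check, for illustration only. */
void check_mystrchr_indexed(void)
{
    const char text[] = "hello";
    const char *p = mystrchr_indexed(text, 'e');

    assert(p == text + 1);
    assert(mystrchr_indexed(text, 'z') == NULL);
}
```

Whether this reads better than advancing s is, again, a matter of taste; a compiler will typically generate similar code for both.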

answered Sep 19 '22 09:09

AnT
