Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

C strange array behaviour

After learning that both strncmp is not what it seems to be and strlcpy not being available on my operating system (Linux), I figured I could try and write it myself.

I found a quote from Ulrich Drepper, the libc maintainer, who posted an alternative to strlcpy using mempcpy. I don't have mempcpy either, but it's behaviour was easy to replicate. First of, this is the testcase I have

#include <stdio.h>
#include <string.h>

#define BSIZE 10

void insp(const char* s, int n)
{
   int i;

   for (i = 0; i < n; i++)
      printf("%c  ", s[i]);

   printf("\n");

   for (i = 0; i < n; i++)
      printf("%02X ", s[i]);

   printf("\n");

   return;
}

int copy_string(char *dest, const char *src, int n)
{
   int r = strlen(memcpy(dest, src, n-1));
   dest[r] = 0;

   return r;
}

int main()
{
   char b[BSIZE];
   memset(b, 0, BSIZE);

   printf("Buffer size is %d", BSIZE);

   insp(b, BSIZE);

   printf("\nFirst copy:\n");
   copy_string(b, "First", BSIZE);
   insp(b, BSIZE);
   printf("b = '%s'\n", b);

   printf("\nSecond copy:\n");
   copy_string(b, "Second", BSIZE);
   insp(b, BSIZE);

   printf("b = '%s'\n", b);

   return 0;
}

And this is its result:

Buffer size is 10                    
00 00 00 00 00 00 00 00 00 00 

First copy:
F  i  r  s  t     b     =    
46 69 72 73 74 00 62 20 3D 00 
b = 'First'

Second copy:
S  e  c  o  n  d          
53 65 63 6F 6E 64 00 00 01 00 
b = 'Second'

You can see in the internal representation (the lines insp() created) that there's some noise mixed in, like the printf() format string in the inspection after the first copy, and a foreign 0x01 in the second copy.

The strings are copied intact and it correctly handles too long source strings (let's ignore the possible issue with passing 0 as length to copy_string for now, I'll fix that later).

But why are there foreign array contents (from the format string) inside my destination? It's as if the destination was actually RESIZED to match the new length.

like image 426
LukeN Avatar asked May 14 '10 18:05

LukeN


3 Answers

The end of the string is marked by a \0 the memory after that can be anything, unless your OS deliberately blanks it then it's just whatever random junk was left there.

Note in this case the 'problem' isn't in the copy_string , you are exactly copying 10chars - but the memory after 'first' in your main code is just random.

like image 120
Martin Beckett Avatar answered Sep 28 '22 00:09

Martin Beckett


Because you are not stopping at the source size, you are stopping at the destiny size, which happens to be bigger than source, so you are copying the source string plus a bit of garbage past it.

You can easily see that you are copying your source string, with its null terminator. But since you are memcopying 10 bytes and both strings "First" and "Second" are shorter than 10 bytes, you are also copying the extra bytes past them.

like image 34
Francisco Soto Avatar answered Sep 27 '22 22:09

Francisco Soto


The use of memcpy(dest, src, n-1) invokes undefined behavior if dest and src are not both at least n-1 in length.

For example, First\0 is six characters in length, but you read n-1 (9) characters from it; the contents of the memory past the end of the string literal are undefined, as is the behavior of your program when you read that memory.

like image 30
James McNellis Avatar answered Sep 27 '22 22:09

James McNellis