 

Manipulating C-strings with multiple null characters in memory

Tags:

c

I need to search through a chunk of memory for a string of characters, but in several of these strings every character is separated by a null byte, like this:
"I. .a.m. .a. .s.t.r.i.n.g"
with all of the '.'s being null characters. My problem comes from actually getting this into memory. I've tried several ways, for instance:

 char* str2; 
 str2 = (char*)malloc(sizeof(char)*40);   
 memcpy((void*)str2, "123\0567\09abc", 12);    

This puts the following into the memory that str2 points to: 123.7.9abc..
Similarly, something like

 str2 = "123456789\0abcde\054321";

leaves str2 pointing to a block of memory that looks like 123456789.abcde,321, where the '.' is a null character and the ',' is an actual comma.

So inserting null characters into C strings clearly doesn't work as easily as I thought, the way inserting a newline character does. I ran into similar difficulties with the string library as well. I could do separate assignments, something like:

 char* str;    
 str = (char*)malloc(sizeof(char)*40);  
 strcpy(str, "123");  
 strcpy(str+4, "abc");  
 strcpy(str+8, "ABC");  

But that is certainly not preferable, and I believe the problem lies in my understanding of how C-style strings are stored in memory. Clearly "abc\0123" doesn't actually go into memory as 61 62 63 00 31 32 33 (in hex). How is it stored, and how can I store what I need to?

(I also apologize for not having set the code in blocks, this is my first time posting a question, and somehow "four spaced" is more difficult than I can handle apparently. Thank you, Luchian. I see more newlines were needed.)

asked Jun 13 '12 at 20:06 by Fulluphigh


People also ask

What is the use of null character in string manipulation?

The null character marks the end of a string in C (and is used as an end marker in a few other contexts). It is written '\0', a character with value zero. It is not the digit character '0', and it is not NULL, which is a null pointer constant.

Does C automatically null terminate strings?

C strings are null-terminated. That is, they are terminated by the null character, NUL . They are not terminated by the null pointer NULL , which is a completely different kind of value with a completely different purpose. NUL is guaranteed to have the integer value zero.

Does null character occupy a byte in string?

In all modern character sets, the null character has a code point value of zero. In most encodings, this is translated to a single code unit with a zero value. For instance, in UTF-8 it is a single zero byte. However, in Modified UTF-8 the null character is encoded as two bytes: 0xC0, 0x80.

What is the null character in C strings?

'\0' is defined to be a null character. It is a character with all bits set to zero. This has nothing to do with pointers. '\0' is (like all character literals) an integer constant with the value zero.


1 Answer

If every other char contains a null, then almost certainly you actually have UTF-16 encoded strings. Process them accordingly and your problems will disappear.

Assuming you are on Windows, where UTF-16 is common, you would use wchar_t* rather than char* to hold such strings (wchar_t is 16 bits on Windows but usually 32 bits on Linux and macOS, so this mapping is Windows-specific). You would also use the wide-character string functions to operate on such data: for example, wcscpy rather than strcpy, and so on.

answered Nov 23 '22 at 13:11 by David Heffernan