I need to search through a chunk of memory for a string of characters, but several of these strings have every character null separated, like this:
"I. .a.m. .a. .s.t.r.i.n.g"
with all of the '.'s being null characters. My problem comes from actually getting this into memory. I've tried several ways, for instance:
char* str2;
str2 = (char*)malloc(sizeof(char)*40);
memcpy((void*)str2, "123\0567\09abc", 12);
Will put the following into the memory that str2 points to: 123.7.9abc..
Something like
str2 = "123456789\0abcde\054321";
will have str2 pointing to a block of memory that looks like 123456789.abcde,321, wherein the '.' is a null character and the ',' is an actual comma.
So clearly inserting null characters into cstrings doesn't work as easily as I thought it did, like inserting a newline character. I encountered similar difficulties trying this with the string library as well. I could do separate assignments, something like:
char* str;
str = (char*)malloc(sizeof(char)*40);
strcpy(str, "123");
strcpy(str+4, "abc");
strcpy(str+8, "ABC");
But that is certainly not preferable, and I believe the problem lies in my understanding of how C-style strings are stored in memory. Clearly "abc\0123" doesn't actually go into memory as 61 62 63 00 31 32 33 (in hex). How is it stored, and how can I store what I need to?
(I also apologize for not having set the code in blocks, this is my first time posting a question, and somehow "four spaced" is more difficult than I can handle apparently. Thank you, Luchian. I see more newlines were needed.)
The null character marks the end of a character string in C. It is written '\0', or equivalently as the integer constant 0. It is not the digit character '0', whose value is nonzero (0x30 in ASCII).
C strings are null-terminated. That is, they are terminated by the null character, NUL. They are not terminated by the null pointer NULL, which is a completely different kind of value with a completely different purpose. NUL is guaranteed to have the integer value zero.
In all modern character sets, the null character has a code point value of zero. In most encodings, this is translated to a single code unit with a zero value. For instance, in UTF-8 it is a single zero byte. However, in Modified UTF-8 the null character is encoded as two bytes: 0xC0, 0x80.
'\0' is defined to be a null character. It is a character with all bits set to zero. This has nothing to do with pointers. Like all character literals in C, '\0' is an integer constant; its value is zero.
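The odd bytes in the question follow from how C parses octal escapes: a backslash followed by digits consumes up to three octal digits. So "\0567" is the escape \056 (a '.' character, 0x2E) followed by '7', and "\054321" is \054 (a ',' character, 0x2C) followed by "321". In "\09" the escape stops after the 0, because 9 is not an octal digit. Below is a minimal sketch of two ways to get a real null byte in front of a digit. The names greedy, padded, split and copy_padded are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The escape swallows "056": the bytes are 31 32 33 2E 37 00. */
const char greedy[] = "123\0567";

/* Writing all three octal digits ends the escape before the '5':
   the bytes are 31 32 33 00 35 36 37 00. */
const char padded[] = "123\000567";

/* Adjacent literals are joined after escapes are processed,
   so this yields the same bytes as padded. */
const char split[] = "123\0" "567";

/* Hypothetical helper: copies the whole array, embedded null included.
   Returns the number of bytes copied, or 0 if dst is too small. */
size_t copy_padded(char *dst, size_t cap)
{
    assert(dst != NULL);
    if (cap < sizeof padded)
        return 0;
    /* sizeof counts every byte of the array; strlen(padded) would
       stop at the first null and report 3. */
    memcpy(dst, padded, sizeof padded);
    return sizeof padded;
}
```

Taking the length from sizeof on the array (or tracking it separately) matters here, because strcpy and strlen stop at the first null byte. Hex escapes are not a safe substitute before a digit: "\x00567" is read as one long hex escape.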
If every other char contains a null, then almost certainly you actually have UTF-16 encoded strings. Process them accordingly and your problems will disappear.
Assuming you are on Windows, where UTF-16 is common, you would use wchar_t* rather than char* to hold such strings. And you would use wide char string processing functions to operate on such data. For example, use wcscpy rather than strcpy and so on.
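If the goal is to search a raw memory block rather than to hold the data in wchar_t strings, one option is to expand the search term into the same two-bytes-per-character layout and compare bytes directly. The sketch below assumes little-endian UTF-16 and an ASCII-only search term. It is an alternative to the wide-character functions, not a replacement for them, and the name find_ascii_as_utf16le is hypothetical. It avoids wchar_t because that type is 16 bits wide on Windows but commonly 32 bits elsewhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: looks for an ASCII needle stored as UTF-16LE
   (each character followed by a zero byte) inside a raw memory block.
   Returns the byte offset of the first match, or -1 if there is no
   match, the needle is empty, or allocation fails. */
long find_ascii_as_utf16le(const unsigned char *hay, size_t hay_len,
                           const char *needle)
{
    assert(hay != NULL && needle != NULL);

    size_t n = strlen(needle);
    size_t wide_len = n * 2;
    if (wide_len == 0 || wide_len > hay_len)
        return -1;

    /* Build the interleaved pattern: 'a' 00 'b' 00 ... */
    unsigned char *wide = malloc(wide_len);
    if (wide == NULL)
        return -1;
    for (size_t i = 0; i < n; i++) {
        wide[2 * i] = (unsigned char)needle[i];
        wide[2 * i + 1] = 0;
    }

    /* Plain byte-wise scan; memcmp does not stop at null bytes. */
    long found = -1;
    for (size_t off = 0; off + wide_len <= hay_len; off++) {
        if (memcmp(hay + off, wide, wide_len) == 0) {
            found = (long)off;
            break;
        }
    }

    free(wide);
    return found;
}
```

For example, given the bytes AA BB 'a' 00 'b' 00 'c' 00 CC, searching for "abc" returns offset 2. The scan checks every byte offset, so it also finds matches that are not aligned to a two-byte boundary.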