How to properly add hex escapes into a string-literal?

Tags:

c

c99

In a C string literal, you can embed byte values directly with hex escapes.

char str[] = "abcde"; // 'a', 'b', 'c', 'd', 'e', 0x00
char str2[] = "abc\x12\x34"; // 'a', 'b', 'c', 0x12, 0x34, 0x00

Both arrays occupy 6 bytes in memory. The problem arises when the character that should follow a hex escape is itself a hex digit ([a-fA-F0-9]).

//I want: 'a', 'b', 'c', 0x12, 'e', 0x00
//Error: the trailing e is parsed as part of the hex escape, giving 0x12e, which is out of range for a char
char problem[] = "abc\x12e";

A possible solution is to overwrite the byte after the definition.

//This works, but it is a bad idea
char solution[6] = "abcde";
solution[3] = 0x12;

This works, but it fails as soon as the array is declared const.

//This will not work
const char solution[6] = "abcde";
solution[3] = 0x12; //Compilation error!

How do I properly insert e after \x12 without triggering the error?


Why am I asking? When you build a UTF-8 string as a constant, you have to use hex values for any character outside the ASCII range.

tilz0R asked Aug 10 '17 11:08


3 Answers

Use 3 octal digits:

char problem[] = "abc\022e";

or split your string:

char problem[] = "abc\x12" "e";

Why these work:

  • Unlike hex escapes, which have no length limit, the standard caps an octal escape at 3 digits.

    6.4.4.4 Character constants

    ...

    octal-escape-sequence:
        \ octal-digit
        \ octal-digit octal-digit
        \ octal-digit octal-digit octal-digit
    

    ...

    hexadecimal-escape-sequence:
        \x hexadecimal-digit
        hexadecimal-escape-sequence hexadecimal-digit
    
  • String literal concatenation happens in a later translation phase than escape sequence conversion.

    5.1.1.2 Translation phases

    ...

    5. Each source character set member and escape sequence in character constants and string literals is converted to the corresponding member of the execution character set; if there is no corresponding member, it is converted to an implementation-defined member other than the null (wide) character.

    6. Adjacent string literal tokens are concatenated.

user694733 answered Oct 16 '22 12:10


Since adjacent string literals are concatenated early in the compilation process, but only after escape sequences have been converted, you can just use:

char problem[] = "abc\x12" "e";

though you may prefer full separation for readability:

char problem[] = "abc" "\x12" "e";

For the language lawyers amongst us, this is covered in C11 5.1.1.2 Translation phases (my emphasis):

  5. Each source character set member and escape sequence in character constants and string literals is converted to the corresponding member of the execution character set; if there is no corresponding member, it is converted to an implementation-defined member other than the null (wide) character.

  6. Adjacent string literal tokens are concatenated.

paxdiablo answered Oct 16 '22 13:10


Why am I asking? When you build a UTF-8 string as a constant, you have to use hex values for any character outside the ASCII range.

Well, no, you don't have to. As of C11, you can prefix the string literal with u8, which tells the compiler to encode the literal as UTF-8.

char solution[] = u8"no need to use hex-codes á駵";

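(C++11 supports the same prefix, by the way. Since C++20, however, a u8 literal has type const char8_t[] there and can no longer initialize a plain char array.)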

Damon answered Oct 16 '22 13:10