What do single quotes do in C++ when used on multiple characters?

Tags:

c++

quotes


It's a multi-character literal. 1952805748 is 0x74657374, which decomposes as

0x74 -> 't'
0x65 -> 'e'
0x73 -> 's'
0x74 -> 't'
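
The decomposition above can be checked with a few shifts and masks. This is a minimal sketch: `split_bytes` is a hypothetical helper name, and it assumes 8-bit bytes and ASCII character codes.

```cpp
#include <array>
#include <cstdint>

// Hypothetical helper: split a 32-bit value into its four bytes,
// most significant byte first.
constexpr std::array<char, 4> split_bytes(std::uint32_t value) {
    return {{
        static_cast<char>((value >> 24) & 0xFF),
        static_cast<char>((value >> 16) & 0xFF),
        static_cast<char>((value >> 8) & 0xFF),
        static_cast<char>(value & 0xFF),
    }};
}

// The decimal and hexadecimal spellings are the same number.
static_assert(1952805748u == 0x74657374u, "decimal and hex forms agree");

// Each byte is the ASCII code of one character of "test".
constexpr auto kBytes = split_bytes(1952805748u);
static_assert(kBytes[0] == 't', "first byte is 't'");
static_assert(kBytes[1] == 'e', "second byte is 'e'");
static_assert(kBytes[2] == 's', "third byte is 's'");
static_assert(kBytes[3] == 't', "fourth byte is 't'");
```

The checks run at compile time, so the snippet compiling at all confirms the arithmetic.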

Edit:

C++ standard, §2.14.3/1 - Character literals

(...) An ordinary character literal that contains more than one c-char is a multicharacter literal. A multicharacter literal has type int and implementation-defined value.


No, it's not an address. It's a so-called multi-character literal.

Typically, it's the ASCII values of the four characters combined.

't' == 0x74; 'e' == 0x65; 's' == 0x73; 't' == 0x74; 

So 0x74657374 is 1952805748.

But it can also be 0x74736574 on some other compiler. The C and C++ standards both say the value of a multi-character literal is implementation-defined, so its use is generally strongly discouraged.
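
If the same value is needed on every compiler, one option is to pack the characters explicitly instead of relying on the literal. This is only a sketch: `fourcc` is a hypothetical helper name, and the most-significant-first order is a choice made here, not something the standard prescribes.

```cpp
#include <cstdint>

// Hypothetical helper: pack four characters into a 32-bit value with a
// fixed, explicit order (first character in the most significant byte).
constexpr std::uint32_t fourcc(char a, char b, char c, char d) {
    return (static_cast<std::uint32_t>(static_cast<unsigned char>(a)) << 24) |
           (static_cast<std::uint32_t>(static_cast<unsigned char>(b)) << 16) |
           (static_cast<std::uint32_t>(static_cast<unsigned char>(c)) << 8) |
            static_cast<std::uint32_t>(static_cast<unsigned char>(d));
}

// Same value as the common result for 'test', but guaranteed by the code
// rather than by the compiler (assuming ASCII character codes).
static_assert(fourcc('t', 'e', 's', 't') == 0x74657374u, "explicit order");

// Packing the characters in the opposite order gives the other value
// mentioned above.
static_assert(fourcc('t', 's', 'e', 't') == 0x74736574u, "reversed order");
```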


An ordinary character literal that contains more than one c-char is a multicharacter literal. A multicharacter literal has type int and implementation-defined value.

Implementation-defined behavior is required to be documented by the implementation. For example, the GCC documentation describes it as follows:

The compiler evaluates a multi-character character constant a character at a time, shifting the previous value left by the number of bits per target character, and then or-ing in the bit-pattern of the new character truncated to the width of a target character. The final bit-pattern is given type int, and is therefore signed, regardless of whether single characters are signed or not.
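
The rule quoted above can be imitated in ordinary code. The following is a rough sketch, not GCC's actual implementation: `evaluate_multichar` is a hypothetical name, and it assumes 8-bit target characters and a 32-bit int.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical re-implementation of the documented rule: shift the previous
// value left by one character width, then OR in the next character.
constexpr std::int32_t evaluate_multichar(const char* chars, std::size_t count) {
    std::uint32_t value = 0;
    for (std::size_t i = 0; i < count; ++i) {
        // Truncate the character to 8 bits before combining it.
        value = (value << 8) | static_cast<unsigned char>(chars[i]);
    }
    // The final bit pattern is reinterpreted as a signed int.
    return static_cast<std::int32_t>(value);
}

// Four characters fill all four bytes.
static_assert(evaluate_multichar("test", 4) == 0x74657374, "four characters");

// Fewer characters only fill the low-order bytes; the rest stay zero.
static_assert(evaluate_multichar("/*|", 3) == 0x002F2A7C, "three characters");
static_assert(evaluate_multichar("d", 1) == 0x64, "one character");
```

Under this rule a literal with fewer than four characters ends up right-aligned in the int, with zero bytes in front.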

Check the explanation in this page for more details


They're really just ints. They're used extensively in the Core Audio API's enums, for example in the CoreAudioTypes.h header file:

enum
{
    kAudioFormatLinearPCM               = 'lpcm',
    kAudioFormatAC3                     = 'ac-3',
    kAudioFormat60958AC3                = 'cac3',
    kAudioFormatAppleIMA4               = 'ima4',
    kAudioFormatMPEG4AAC                = 'aac ',
    kAudioFormatMPEG4CELP               = 'celp',
} ;

There's a lot of chatter about this not being "platform independent", but when you're using an API that's made for a specific platform, who cares about portability? Checking for equality on the same platform will never fail. These enum values are easier to read, and they actually contain their identity in their value, which is pretty nice.

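What I've tried to do below is wrap a multi-character literal so it can be printed (this works on Mac). The strange thing is that if you don't use all 4 characters, the output looks wrong, presumably because a shorter literal only fills the low-order bytes of the int, so the leading bytes that get printed are zero.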

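#include <stdio.h>

// Extract byte number BYTEX (0 = least significant) from x.
// The cast to unsigned avoids shifting into the sign bit of an int.
#define MASK(x,BYTEX) (((unsigned)(x) >> (8*(BYTEX))) & 0xffu)

struct Multibyte
{
  union{
    int val ;
    char vals[4];
  };

  Multibyte() : val(0) { }
  Multibyte( int in )
  {
    // Store the most significant byte first, so vals[] is in reading order.
    vals[0] = MASK(in,3);
    vals[1] = MASK(in,2);
    vals[2] = MASK(in,1);
    vals[3] = MASK(in,0);
  }
  char operator[]( int i ) const {
    // vals[] was filled in reading order, so no endianness-dependent
    // shifting of val is needed here.
    return vals[i] ;
  }

  void println() const
  {
    for( int i = 0 ; i < 4 ; i++ )
      putc( vals[i], stdout ) ;
    puts( "" ) ;
  }
} ;

int main(int argc, const char * argv[])
{
  Multibyte( 'abcd' ).println() ;
  Multibyte( 'x097' ).println() ;
  Multibyte( '\"\\\'\'' ).println() ;
  Multibyte( '/*|' ).println() ;  // only 3 characters: a zero byte is printed first
  Multibyte( 'd' ).println() ;    // only 1 character: three zero bytes are printed first

  return 0;
}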

This kind of feature is really good when you are building parsers. Consider this:

byte* buffer = ...;
if(*(int*)buffer == 'GET ')
  invoke_get_method(buffer+4);

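This code will likely work only on a specific endianness and might break across different compilers, because it depends on both the byte order of the int read from the buffer and the implementation-defined value of 'GET '.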
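
A byte-by-byte comparison avoids both problems with the check above. This is a minimal sketch under stated assumptions: `starts_with_get` is a hypothetical name, and the buffer is treated as plain `const char*` with a known length rather than the `byte*` used above.

```cpp
#include <cstddef>

// Hypothetical helper: true if the buffer begins with the bytes "GET ".
// Comparing byte by byte does not depend on endianness, alignment, or the
// value the compiler assigns to a multi-character literal.
constexpr bool starts_with_get(const char* buffer, std::size_t length) {
    constexpr char kGet[] = {'G', 'E', 'T', ' '};
    if (length < 4) {
        return false;  // too short to contain the method name
    }
    for (std::size_t i = 0; i < 4; ++i) {
        if (buffer[i] != kGet[i]) {
            return false;
        }
    }
    return true;
}

static_assert(starts_with_get("GET /index.html", 15), "GET request matches");
static_assert(!starts_with_get("POST /form", 10), "other methods do not");
static_assert(!starts_with_get("GET", 3), "short input is rejected");
```

At run time, `std::memcmp(buffer, "GET ", 4) == 0` performs the same comparison, provided the buffer holds at least four bytes.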