Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

ASCII strings and endianness

An intern who works with me showed me an exam he had taken in computer science about endianness issues. There was a question that showed an ASCII string "My-Pizza", and the student had to show how that string would be represented in memory on a little endian computer. Of course, this sounds like a trick question because ASCII strings are not affected by endian issues.

But shockingly, the intern claims his professor insists that the string would be represented as:

P-yM azzi 

I know this can't be right. There is no way an ASCII string would be represented like that on any machine. But apparently, the professor is insisting on this. So, I wrote up a small C program and told the intern to give it to his professor.

#include <string.h> #include <stdio.h>  int main() {     const char* s = "My-Pizza";     size_t length = strlen(s);     for (const char* it = s; it < s + length; ++it) {         printf("%p : %c\n", it, *it);     } } 

This clearly demonstrates that the string is stored as "My-Pizza" in memory. A day later, the intern gets back to me and tells me the professor is now claiming that C is automagically converting the addresses to display the string in proper order.

I told him his professor is insane, and this is clearly wrong. But just to check my own sanity here, I decided to post this on stackoverflow so I could get others to confirm what I'm saying.

So, I ask : who is right here?

like image 730
Charles Salvia Avatar asked Oct 14 '09 18:10

Charles Salvia


People also ask

Are strings affected by endianness?

There was a question that showed an ASCII string "My-Pizza", and the student had to show how that string would be represented in memory on a little endian computer. Of course, this sounds like a trick question because ASCII strings are not affected by endian issues.

What determines the endianness?

Broadly speaking, the endianness in use is determined by the CPU. Because there are a number of options, it is unsurprising that different semiconductor vendors have chosen different endianness for their CPUs.

What is meant by endianness?

Endianness is a term that describes the order in which a sequence of bytes is stored in computer memory. Endianness can be either big or small, with the adjectives referring to which value is stored first.

What is endianness of the CPU?

Endianness is the term used to describe the byte order of data. Big-endian means data is being sent or stored with the Most Significant Byte (MSB) first. Little-endian means data is sent with the Least Significant Byte (LSB) first. A CPU always has an endianness, even for 8 bit CPUs.

Why are character strings represented as endian?

This, of course, has nothing to do with the string itself. And to say that the string itself is represented that way on a little-endian machine is utter nonsense. Show activity on this post. Endianness defines the order of bytes within multi-byte values. Character strings are arrays of single-byte values.

What is ASCII?

It's a 7-bit character code where every single bit represents a unique character. On this webpage you will find 8 bits, 256 characters, ASCII table according to Windows-1252 (code page 1252) which is a superset of ISO 8859-1 in terms of printable characters.

What is the extended ASCII table?

ASCII Code- The extended ASCII table ASCII, stands for American Standard Code for Information Interchange. It's a 7-bit character code where every single bit represents a unique character.

What is endianness in C++?

Endianness defines the order of bytes within multi-byte values. Character strings are arrays of single-byte values. So each value (character in the string) is the same on both little-endian and big-endian architectures, and endianness does not affect the order of values in a structure.


2 Answers

Without a doubt, you are correct.

ANSI C standard 6.1.4 specifies that string literals are stored in memory by "concatenating" the characters in the literal.

ANSI standard 6.3.6 also specifies the effect of addition on a pointer value:

When an expression that has integral type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integral expression.

If the idea attributed to this person were correct, then the compiler would also have to monkey around with integer math when the integers are used as array indices. Many other fallacies would also result which are left to the imagination.

The person may be confused, because (unlike a string initializer), multi-byte chacter constants such as 'ABCD' are stored in endian order.

There are many reasons a person might be confused about this. As others have suggested here, he may be misreading what he sees in a debugger window, where the contents have been byte-swapped for readability of int values.

like image 125
Heath Hunnicutt Avatar answered Oct 08 '22 22:10

Heath Hunnicutt


The professor is confused. In order to see something like 'P-yM azzi' you need to take some memory inspection tool that displays memory in '4-byte integer' mode and at the same time gives you a "character interpretation" of each integer in higher-order byte to lower-order byte mode.

This, of course, has nothing to do with the string itself. And to say that the string itself is represented that way on a little-endian machine is utter nonsense.

like image 42
AnT Avatar answered Oct 08 '22 20:10

AnT