 

Data transfer across big and small Endian machines

Tags: c, char, endianness

Suppose there are 3 strings: "cat", "bat", and "rat". I need to combine them into one string and send it through a socket to another computer of different endianness.

So, if the other machine is big endian, I'll pack the strings as:
memcpy(base, "cat", 3);
memcpy(base + 3, "bat", 3);
memcpy(base + 6, "rat", 3);

If the other machine is little endian, I'll pack the strings as:
memcpy(base, "rat", 3);
memcpy(base + 3, "bat", 3);
memcpy(base + 6, "cat", 3);

Is my method correct?

Secondly, will they actually be received on the other machine in reverse order? I mean, when the other machine's software starts extracting strings from the "base" string, will they actually come out in reverse order, like rat, bat, cat?

Aquarius_Girl asked May 24 '13 07:05



2 Answers

When you dump a memory buffer with two bytes 0x02 0x00 in it into a socket, the 0x02 is sent first, then 0x00 is sent. When a receiver reads from a socket, 0x02 will arrive first, and will be stored at the start of the buffer. 0x00 arrives second, and is stored right after 0x02. So, after you did send(sock, buffer, 2, 0), and the receiver did recv(sock, buffer, 2, 0), the contents of your buffer and the receiver's buffer are the same at the byte level.
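To make this concrete, here is a minimal sketch that pushes bytes through a local socket pair and reads them back. The helper name roundtrip is hypothetical, and it uses POSIX socketpair() as a stand-in for a real network connection between two machines; the byte order seen by the reader is the same either way.

```c
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: sends n bytes into one end of a local socket pair
   and reads them back from the other end into out.
   Returns 0 on success, -1 on error.
   Keep n small: everything is sent before anything is received. */
int roundtrip(const unsigned char *in, unsigned char *out, size_t n)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    int rc = 0;
    size_t done = 0;

    if (send(sv[0], in, n, 0) != (ssize_t)n)
        rc = -1;

    /* A stream socket may deliver fewer bytes than requested, so loop. */
    while (rc == 0 && done < n) {
        ssize_t got = recv(sv[1], out + done, n - done, 0);
        if (got <= 0)
            rc = -1;
        else
            done += (size_t)got;
    }

    close(sv[0]);
    close(sv[1]);
    return rc;
}
```

Calling roundtrip with the two bytes 0x02 0x00 should leave 0x02 in out[0] and 0x00 in out[1], in that order.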

But now a problem of interpretation comes in. Yeah, you have two bytes, 0x02 0x00, in memory, but so what? What do they mean? Oh, they mean an integer in the range 0..65535, you say? But there are two ways to store such a number. The first (big-endian) is to store the more significant bits in the first byte, so 512 (binary 10 00000000) is stored as 0x02 0x00. The second (little-endian) is to store the less significant bits in the first byte, so 512 is stored as 0x00 0x02, and 0x02 0x00 is a way to store 2, not 512.
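The two readings can be sketched in C as follows. The function names read_u16_be and read_u16_le are made up for illustration; they build the value with shifts, so the result does not depend on the endianness of the machine running the code.

```c
#include <stdint.h>

/* Hypothetical helper: first byte holds the more significant bits. */
uint16_t read_u16_be(const unsigned char *p)
{
    return (uint16_t)(((unsigned)p[0] << 8) | p[1]);
}

/* Hypothetical helper: first byte holds the less significant bits. */
uint16_t read_u16_le(const unsigned char *p)
{
    return (uint16_t)(((unsigned)p[1] << 8) | p[0]);
}
```

Given the bytes 0x02 0x00, read_u16_be yields 512 and read_u16_le yields 2.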

So, the important lesson is: when you send some data, you have to be sure that the receiver will interpret it just as you do. Integers that span multiple bytes can be interpreted differently, so the two sides have to agree on exactly one way to send them.
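One common agreement is network byte order (big-endian), using the POSIX functions htons() and ntohs() from <arpa/inet.h>. The sketch below assumes a POSIX system; put_u16 and get_u16 are hypothetical helper names, not part of any library.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: writes value into out[0..1] in network byte order. */
void put_u16(unsigned char *out, uint16_t value)
{
    uint16_t wire = htons(value);      /* host order -> network order */
    memcpy(out, &wire, sizeof wire);
}

/* Hypothetical helper: reads a network-order 16-bit value from in[0..1]. */
uint16_t get_u16(const unsigned char *in)
{
    uint16_t wire;
    memcpy(&wire, in, sizeof wire);
    return ntohs(wire);                /* network order -> host order */
}
```

If the sender calls put_u16 and the receiver calls get_u16, 512 travels as 0x02 0x00 and is read back as 512 on either kind of machine.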

Now, back to character strings. A string in C is a sequence of bytes on both the conceptual and the representational level, which is not so with integers! When you work with integers, I bet you don't much care that they are represented as a bunch of bytes, and the actual representation is not specified by C. Your compiler may store integers in whatever fashion it desires. A string, however, is a sequence of bytes in a certain order, and that's fixed in C. You have the first character, the second, and so on. So there is just one way to interpret 02 00 as a string: it's a 2-character string with the first character 0x02, and the second character 0x00. No confusion is possible.
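For the strings in the question, this means a single packing order works for every receiver. A minimal sketch, where pack_words is a hypothetical helper and the caller is assumed to supply a buffer of at least 10 bytes:

```c
#include <string.h>

/* Hypothetical helper: packs the three words in their natural order,
   whatever the endianness of the sender or the receiver.
   base must have room for at least 10 bytes.
   Returns the number of payload bytes (without the terminator). */
size_t pack_words(char *base)
{
    memcpy(base, "cat", 3);
    memcpy(base + 3, "bat", 3);
    memcpy(base + 6, "rat", 3);
    base[9] = '\0';   /* terminator for local use; it need not be sent */
    return 9;
}
```

The buffer then holds "catbatrat", and those 9 bytes arrive in that same order on the other machine.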

P.S. Of course, when you start thinking of strings not as a sequence of bytes, but as a sequence of characters, the problem of interpretation arises again: which byte means what character? But that's another story.

EDIT: In your comment to another answer, you said that you "have to make a provision for the other machine to know that what I have sent is actually an integer not a string". Yes. That's the main problem with exchanging data with other machines: what you send and what they see is just a sequence of bytes. All participants in this exchange must interpret this sequence of bytes in the same way, or they get confused. If you mean to send the number 512 and do this by sending bytes 0x02 0x00, the other side had better understand that by 0x02 0x00 you mean 512 and not 2, or, say, START OF TEXT. Or that when you send 0x31 0x32 0x33 0x00 you mean "123", and not 825373440, or 31323300.
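A small sketch of that last ambiguity: the same four bytes read as a C string and as a big-endian 32-bit integer. The name decode_u32_be is made up for this example.

```c
#include <stdint.h>

/* Hypothetical helper: reads 4 bytes as a big-endian unsigned integer. */
uint32_t decode_u32_be(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) |
           ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |
            (uint32_t)p[3];
}
```

For the bytes 0x31 0x32 0x33 0x00, treating them as a NUL-terminated ASCII string gives "123", while decode_u32_be gives 825373440. Nothing in the bytes themselves says which reading was intended; only the agreed protocol does.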

Still, the answer to the original question, "if I send "catbatrat", what will the receiver see?", is: "the receiver will see "catbatrat", regardless of endianness".

Joker_vD answered Nov 19 '22 20:11



Endianness makes no difference at the byte level, so you don't need to worry about it for strings of 8-bit characters or anything else where the data is just a stream of bytes.

For anything where the elements are larger than a byte, e.g. integers of 2 or more bytes, floating point values, etc., you do need to worry about endianness, or alternatively use a text-based format for data interchange.
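As a rough sketch of the text-based option, an integer can be sent as decimal digits followed by a newline and parsed back on the other side. The function names and the newline-terminated format are assumptions made for this example, not an established protocol.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: writes value as decimal text plus '\n' into out.
   Returns the number of characters written (excluding the terminator),
   as reported by snprintf. */
int encode_uint_text(char *out, size_t cap, unsigned long value)
{
    return snprintf(out, cap, "%lu\n", value);
}

/* Hypothetical helper: parses the decimal text back into a number. */
unsigned long decode_uint_text(const char *in)
{
    return strtoul(in, NULL, 10);
}
```

Here 512 travels as the four bytes '5', '1', '2', '\n', which read the same on big-endian and little-endian machines.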

Paul R answered Nov 19 '22 19:11
