I am doing some socket programming in C, and trying to wrestle with byte order problems. My request (send) is fine but when I receive data my bytes are all out of order. I start with something like this:
char * aResponse= (char *)malloc(512);
int total = recv(sock, aResponse, 511, 0);
When dealing with this response, each 16bit word seems to have it's bytes reversed (I'm using UDP). I tried to fix that by doing something like this:
unsigned short * _netOrder= (unsigned short *)aResponse;
unsigned short * newhostOrder= (unsigned short *)malloc(total);
for (i = 0; i < total; ++i)
{
newhostOrder[i] = ntohs(_netOrder[i]);
}
This works ok when I am treating the data as a short, however if I cast the pointer to a char again the bytes are reversed. What am I doing wrong?
The TCP/IP standard network byte order is big-endian. In order to participate in a TCP/IP network, little-endian systems usually bear the burden of conversion to network byte order.
Given this explanation, it's clear that endianness doesn't matter with C-style strings. Endianness does matter when you use a type cast that depends on a certain endian being in use.
Byte ordering doesn't matter. Byte ordering for Unicode strings depends on the type of encoding used. If encoding is UTF-8, ordering doesn't matter since encoding is a sequence of single bytes. If encoding is UTF-16, then byte ordering matters.
There are two common host byte order methods: Little-endian byte ordering places the least significant byte first. This method is used in Intel microprocessors, for example. Big-endian byte ordering places the most significant byte first.
Ok, there seems to be problems with what you are doing on two different levels. Part of the confusion here seems to stem for your use of pointers, what type of objects they point to, and then the interpretation of the encoding of the values in the memory pointed to by the pointer(s).
The encoding of multi-byte entities in memory is what is referred to as endianess. The two common encodings are referred to as Little Endian (LE) and Big Endian (BE). With LE, a 16-bit quantity like a short is encoded least significant byte (LSB) first. Under BE, the most significant byte (MSB) is encoded first.
By convention, network protocols normally encode things into what we call "network byte order" (NBO) which also happens to be the same as BE. If you are sending and receiving memory buffers on big endian platforms, then you will not run into conversion problems. However, your code would then be platform dependent on the BE convention. If you want to write portable code that works correctly on both LE and BE platforms, you should not assume the platform's endianess.
Achieving endian portability is the purpose of routines like ntohs(), ntohl(), htons(), and htonl(). These functions/macros are defined on a given platform to do the necessary conversions at the sending and receiving ends:
Understand that your comment about accessing the memory when cast back to characters has no affect on the actual order of entities in memory. That is, if you access the buffer as a series of bytes, you will see the bytes in whatever order they were actually encoded into memory as, whether you have a BE or LE machine. So if you are looking at a NBO encoded buffer after receive, the MSB is going to be first - always. If you look at the output buffer after your have converted back to host order, if you have BE machine, the byte order will be unchanged. Conversely, on a LE machine, the bytes will all now be reversed in the converted buffer.
Finally, in your conversion loop, the variable total
refers to bytes. However, you are accessing the buffer as shorts
. Your loop guard should not be total
, but should be:
total / sizeof( unsigned short )
to account for the double byte nature of each short
.
This works ok when I'm treating the data as a short, however if I cast the pointer to a char again the bytes are reversed.
That's what I'd expect.
What am I doing wrong?
You have to know what the sender sent: know whether the data is bytes (which don't need reversing), or shorts or longs (which do).
Google for tutorials associated with the ntohs
, htons
, and htons
APIs.
It's not clear what aResponse
represents (string of characters? struct?). Endianness is relevant only for numerical values, not char
s. You also need to make sure that at the sender's side, all numerical values are converted from host to network byte-order (hton*
).
Apart from your original question (which I think was already answered), you should have a look at your malloc statement. malloc allocates bytes and an unsigned short is most likely to be two bytes.
Your statement should look like:
unsigned short *ptr = (unsigned short*) malloc(total * sizeof(unsigned short));
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With