
Byte order with a large array of characters in C

I am doing some socket programming in C, and trying to wrestle with byte order problems. My request (send) is fine but when I receive data my bytes are all out of order. I start with something like this:

char * aResponse= (char *)malloc(512);
int total = recv(sock, aResponse, 511, 0);

When dealing with this response, each 16-bit word seems to have its bytes reversed (I'm using UDP). I tried to fix that by doing something like this:

    unsigned short * _netOrder= (unsigned short *)aResponse;
    unsigned short * newhostOrder= (unsigned short *)malloc(total);
    for (i = 0; i < total; ++i)
    {
         newhostOrder[i] = ntohs(_netOrder[i]);
    }

This works ok when I am treating the data as a short, however if I cast the pointer to a char again the bytes are reversed. What am I doing wrong?

ChrisDiRulli asked Feb 08 '09 17:02



4 Answers

Ok, there seem to be problems with what you are doing on two different levels. Part of the confusion here seems to stem from your use of pointers, what type of objects they point to, and then the interpretation of the encoding of the values in the memory pointed to by the pointer(s).

The encoding of multi-byte entities in memory is what is referred to as endianness. The two common encodings are referred to as Little Endian (LE) and Big Endian (BE). With LE, a 16-bit quantity like a short is encoded least significant byte (LSB) first. Under BE, the most significant byte (MSB) is encoded first.
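As a rough illustration of the two encodings, the sketch below inspects how the host stores the 16-bit value 0x1234. The function name host_is_little_endian is made up for this example and is not part of any library.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the host stores the least significant
   byte first (LE), 0 if it stores the most significant byte first (BE). */
int host_is_little_endian(void)
{
    uint16_t value = 0x1234;
    unsigned char bytes[sizeof value];

    /* Copy the object representation so it can be inspected byte by byte. */
    memcpy(bytes, &value, sizeof value);

    /* LE memory holds 0x34 0x12; BE memory holds 0x12 0x34. */
    return bytes[0] == 0x34;
}
```

Portable code normally should not branch on a check like this; it is shown here only to make the memory layout visible.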

By convention, network protocols normally encode things into what we call "network byte order" (NBO), which happens to be the same as BE. If you are sending and receiving memory buffers on big endian platforms, then you will not run into conversion problems. However, your code would then depend on the platform following the BE convention. If you want to write portable code that works correctly on both LE and BE platforms, you should not assume the platform's endianness.

Achieving endian portability is the purpose of routines like ntohs(), ntohl(), htons(), and htonl(). These functions/macros are defined on a given platform to do the necessary conversions at the sending and receiving ends:

  • htons() - Convert a 16-bit (short) value from host order to network order (before sending)
  • htonl() - Convert a 32-bit (long) value from host order to network order (before sending)
  • ntohs() - Convert a 16-bit (short) value from network order to host order (after receiving)
  • ntohl() - Convert a 32-bit (long) value from network order to host order (after receiving)
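A minimal sketch of how the four routines pair up, assuming a POSIX system that provides <arpa/inet.h>. The names pack_values and unpack_values and the 6-byte layout are invented for this example.

```c
#include <arpa/inet.h>  /* htons, htonl, ntohs, ntohl (POSIX) */
#include <stdint.h>
#include <string.h>

/* Hypothetical sender side: writes a 16-bit and a 32-bit value into buf in
   network byte order. Returns the number of bytes written (6). */
size_t pack_values(unsigned char *buf, uint16_t a, uint32_t b)
{
    uint16_t net_a = htons(a);
    uint32_t net_b = htonl(b);

    memcpy(buf, &net_a, sizeof net_a);
    memcpy(buf + sizeof net_a, &net_b, sizeof net_b);
    return sizeof net_a + sizeof net_b;
}

/* Hypothetical receiver side: reads the two values back and converts them
   to host order. */
void unpack_values(const unsigned char *buf, uint16_t *a, uint32_t *b)
{
    uint16_t net_a;
    uint32_t net_b;

    memcpy(&net_a, buf, sizeof net_a);
    memcpy(&net_b, buf + sizeof net_a, sizeof net_b);
    *a = ntohs(net_a);
    *b = ntohl(net_b);
}
```

Whatever the host's endianness, the buffer produced by pack_values holds the most significant byte of each value first, and unpack_values recovers the original numbers.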

Understand that casting the pointer back to char has no effect on the actual order of the bytes in memory. If you access the buffer as a series of bytes, you will see the bytes in whatever order they were actually stored, whether you have a BE or an LE machine. So if you look at an NBO-encoded buffer after the receive, the MSB of each value comes first, always. If you look at the output buffer after you have converted it to host order, then on a BE machine the byte order will be unchanged. On an LE machine, by contrast, the two bytes of each short will now be swapped in the converted buffer.
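To make this concrete, the following sketch converts one 16-bit word and then dumps the bytes of the converted value as they sit in memory. The function name convert_and_dump is an assumption made for this example.

```c
#include <arpa/inet.h>  /* ntohs (POSIX) */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: takes two bytes as they arrived from the network
   (MSB first), converts the 16-bit value to host order, and copies the
   in-memory bytes of the converted value into host_bytes.
   Returns the numeric value in host order. */
uint16_t convert_and_dump(const unsigned char net_bytes[2],
                          unsigned char host_bytes[2])
{
    uint16_t word;

    memcpy(&word, net_bytes, sizeof word);
    word = ntohs(word);
    memcpy(host_bytes, &word, sizeof word);
    return word;
}
```

Given the input bytes 0x12 0x34, the return value is 0x1234 on any host. The bytes written to host_bytes are 0x12 0x34 on a BE machine and 0x34 0x12 on an LE machine, which is the "reversal" seen when the converted buffer is viewed as chars.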

Finally, in your conversion loop, the variable total refers to bytes. However, you are accessing the buffer as shorts. Your loop guard should not be total, but should be:

total / sizeof( unsigned short )

to account for the double byte nature of each short.
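One possible shape for the corrected loop is sketched below. It assumes the whole response consists of 16-bit words in network order; the function name words_to_host and its count parameter are invented here, and memcpy is used instead of a pointer cast to sidestep alignment concerns.

```c
#include <arpa/inet.h>  /* ntohs (POSIX) */
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: converts `total` received bytes, assumed to hold
   16-bit network-order words, into a newly allocated host-order array.
   Stores the number of words in *count. A trailing odd byte is ignored.
   Returns NULL on allocation failure; the caller frees the result. */
uint16_t *words_to_host(const char *response, size_t total, size_t *count)
{
    size_t n = total / sizeof(uint16_t);  /* bytes -> number of words */
    /* Allocate at least one element so that n == 0 is not mistaken
       for an allocation failure. */
    uint16_t *host = malloc((n ? n : 1) * sizeof *host);
    size_t i;

    *count = 0;
    if (host == NULL)
        return NULL;

    for (i = 0; i < n; ++i) {
        uint16_t word;
        memcpy(&word, response + i * sizeof word, sizeof word);
        host[i] = ntohs(word);
    }
    *count = n;
    return host;
}
```

With the question's variables, a call might look like words_to_host(aResponse, (size_t)total, &count), after checking that recv did not return a negative value.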

Tall Jeff answered Sep 19 '22 00:09


"This works ok when I'm treating the data as a short, however if I cast the pointer to a char again the bytes are reversed."

That's what I'd expect.

"What am I doing wrong?"

You have to know what the sender sent: know whether the data is bytes (which don't need reversing), or shorts or longs (which do).
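As an illustration only, suppose the sender's datagram were laid out as a 16-bit id, a 16-bit text length, and then the text bytes. This layout, the struct message type, and the parse_message function are all assumptions for the sketch; the real protocol in the question is not known. Only the numeric fields go through ntohs, and the character data is copied as it arrived.

```c
#include <arpa/inet.h>  /* ntohs (POSIX) */
#include <stdint.h>
#include <string.h>

/* Hypothetical message layout, invented for this example. */
struct message {
    uint16_t id;        /* host order after parsing */
    uint16_t length;    /* number of text bytes, host order after parsing */
    char     text[64];  /* NUL-terminated copy of the payload */
};

/* Parses a datagram of `total` bytes. Returns 0 on success, -1 if the
   datagram is too short or the text does not fit. */
int parse_message(const unsigned char *buf, size_t total, struct message *out)
{
    uint16_t id, length;

    if (total < 4)
        return -1;

    /* Numeric fields: copy out, then convert from network to host order. */
    memcpy(&id, buf, sizeof id);
    memcpy(&length, buf + 2, sizeof length);
    out->id = ntohs(id);
    out->length = ntohs(length);

    if (out->length > total - 4 || out->length >= sizeof out->text)
        return -1;

    /* Character data: single bytes, so no byte swapping is needed. */
    memcpy(out->text, buf + 4, out->length);
    out->text[out->length] = '\0';
    return 0;
}
```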

Google for tutorials associated with the ntohs, ntohl, htons, and htonl APIs.

ChrisW answered Sep 22 '22 00:09


It's not clear what aResponse represents (string of characters? struct?). Endianness is relevant only for numerical values, not chars. You also need to make sure that at the sender's side, all numerical values are converted from host to network byte-order (hton*).

Zach Scrivena answered Sep 20 '22 00:09


Apart from your original question (which I think was already answered), you should have a look at your malloc statement. malloc allocates bytes and an unsigned short is most likely to be two bytes.

Your statement should look like:

unsigned short *ptr = (unsigned short*) malloc(total * sizeof(unsigned short));

olli-MSFT answered Sep 21 '22 00:09