
Why do we need endianness here?

I am reading source code that downloads a zip file and reads the data into a numpy array. The code is supposed to work on macOS and Linux, and here is the snippet I see:

import numpy

def _read32(bytestream):
    # Read 4 bytes and interpret them as one big-endian unsigned 32-bit integer.
    dt = numpy.dtype(numpy.uint32).newbyteorder('>')
    return numpy.frombuffer(bytestream.read(4), dtype=dt)

This function is used in the following context:

with gzip.open(filename) as bytestream:
    magic = _read32(bytestream)

It is not hard to see what happens here, but I am puzzled by the purpose of newbyteorder('>'). I read the documentation and know what endianness means, but I cannot understand why exactly the developer added newbyteorder (in my opinion it is not really needed).

Salvador Dali asked Nov 13 '15 12:11




2 Answers

That's because the downloaded data is in big-endian format, as described on the source page: http://yann.lecun.com/exdb/mnist/

All the integers in the files are stored in the MSB first (high endian) format used by most non-Intel processors. Users of Intel processors and other low-endian machines must flip the bytes of the header.
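A minimal sketch of what "flipping the bytes" means, using only the Python standard library. The four bytes below are an illustrative header value chosen for this example (they encode 2051 when read MSB first); they are not taken from the code in the question.

```python
# Four header bytes as they would appear in an MSB-first (big-endian) file.
# The value is an assumption for illustration only.
header = bytes([0x00, 0x00, 0x08, 0x03])

# Interpreting the bytes in the order the file format specifies.
as_big = int.from_bytes(header, byteorder="big")

# Interpreting the same bytes the way a little-endian machine would by default.
as_little = int.from_bytes(header, byteorder="little")

# Reversing the bytes first and then reading them little-endian
# recovers the intended value.
flipped = int.from_bytes(header[::-1], byteorder="little")

print(as_big, as_little, flipped)
```

Only the big-endian reading (or the flipped little-endian one) yields the value the file was meant to carry.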

HeyYO answered Nov 12 '22 04:11


It is just a way of ensuring that the bytes are interpreted from the resulting array in the correct order, regardless of a system's native byteorder.

By default, the built-in NumPy integer dtypes use the byte order that is native to your system. For example, my system is little-endian, so simply using the dtype numpy.dtype(numpy.uint32) means that values read into an array from a buffer whose bytes are in big-endian order will not be interpreted correctly.
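As a rough illustration of that difference, the sketch below reads the same four bytes with an explicit big-endian dtype, an explicit little-endian dtype, and the native dtype. The buffer contents are an assumption made up for this example.

```python
import sys
import numpy as np

# Four bytes in big-endian order (illustrative value, not from the question).
buf = bytes([0x00, 0x00, 0x08, 0x03])

# Explicit byte orders give the same result on every machine.
big = np.frombuffer(buf, dtype=np.dtype(np.uint32).newbyteorder(">"))
little = np.frombuffer(buf, dtype=np.dtype(np.uint32).newbyteorder("<"))

# The plain dtype follows the host, so its result depends on sys.byteorder.
native = np.frombuffer(buf, dtype=np.uint32)

print("host byte order:", sys.byteorder)
print("big-endian dtype:   ", int(big[0]))
print("little-endian dtype:", int(little[0]))
print("native dtype:       ", int(native[0]))
```

On a little-endian host the native result matches the little-endian one, which is why the explicit '>' is what makes the read portable.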

If np.frombuffer is meant to receive bytes that are known to be in a particular byte order, the best practice is to modify the dtype using newbyteorder. This is mentioned in the documentation for np.frombuffer:

Notes

If the buffer has data that is not in machine byte-order, this should be specified as part of the data-type, e.g.:

>>> dt = np.dtype(int)
>>> dt = dt.newbyteorder('>')
>>> np.frombuffer(buf, dtype=dt)

The data of the resulting array will not be byteswapped, but will be interpreted correctly.
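To make the "not byteswapped, but interpreted correctly" point concrete, here is a small sketch with a made-up four-byte buffer. It checks that the array still holds the original bytes, and then uses astype to obtain a copy in the host's native byte order, which is one possible way to convert and not necessarily what the original code does.

```python
import numpy as np

# Illustrative big-endian bytes (an assumption for this example).
buf = bytes([0x00, 0x00, 0x08, 0x03])

dt = np.dtype(np.uint32).newbyteorder(">")
arr = np.frombuffer(buf, dtype=dt)

# The underlying bytes are untouched; only their interpretation is big-endian.
same_bytes = arr.tobytes() == buf

# Converting to the native uint32 dtype makes a copy whose bytes are in host
# order while the numeric value stays the same.
native_arr = arr.astype(np.uint32)

print(same_bytes, int(arr[0]), int(native_arr[0]), native_arr.dtype.isnative)
```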

Alex Riley answered Nov 12 '22 04:11