I am reading some source code that downloads a zip file and reads the data into a NumPy array. The code is supposed to work on macOS and Linux, and here is the snippet that I see:
def _read32(bytestream):
    dt = numpy.dtype(numpy.uint32).newbyteorder('>')
    return numpy.frombuffer(bytestream.read(4), dtype=dt)
This function is used in the following context:
with gzip.open(filename) as bytestream:
    magic = _read32(bytestream)
It is not hard to see what happens here, but I am puzzled by the purpose of newbyteorder('>'). I read the documentation and know what endianness means, but I cannot understand why exactly the developer added newbyteorder (in my opinion it is not really needed).
The benefit of little-endianness is that a variable can be read at different widths from the same address, because the least significant byte always comes first. One benefit of big-endian is that you can read 16-bit and 32-bit values the way most humans write numbers: from left to right.
Another reason both orders exist is that byte order simply wasn't standardized back in the 1960s and 1970s; some companies (such as Intel with their x86 architecture) went with little-endian (possibly for the optimization reason above), whereas other companies selected big-endian.
So endianness comes into the picture when you are sending and receiving data across a network from one host to another. If the sender and the receiver have different endianness, the bytes need to be swapped so that the data is interpreted correctly.
Little-endian and big-endian are two ways of storing multibyte data types (int, float, etc.). On little-endian machines, the last byte of the binary representation of a multibyte value is stored first. On big-endian machines, the first byte is stored first.
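For instance, packing the same 32-bit value under each convention makes the storage order visible (a small sketch using Python's standard struct module):

```python
import struct

value = 0x12345678

# Big-endian: most significant byte first, read left to right.
print(struct.pack('>I', value).hex())  # 12345678

# Little-endian: least significant byte first.
print(struct.pack('<I', value).hex())  # 78563412
```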
That's because the downloaded data is in big-endian format, as described on the source page: http://yann.lecun.com/exdb/mnist/
All the integers in the files are stored in the MSB first (high endian) format used by most non-Intel processors. Users of Intel processors and other low-endian machines must flip the bytes of the header.
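Putting the two together, the header of such a file can be read with the _read32 helper from the question. The in-memory gzip buffer and the item count below are stand-ins for the real download, just so the sketch runs without a network fetch:

```python
import gzip
import io

import numpy

def _read32(bytestream):
    # Interpret 4 bytes as one big-endian unsigned 32-bit integer.
    dt = numpy.dtype(numpy.uint32).newbyteorder('>')
    return numpy.frombuffer(bytestream.read(4), dtype=dt)

# Stand-in for a downloaded MNIST file: the image-file magic number (2051)
# followed by an item count, both stored MSB-first as the format specifies.
payload = (2051).to_bytes(4, 'big') + (60000).to_bytes(4, 'big')
buf = io.BytesIO(gzip.compress(payload))

with gzip.open(buf) as bytestream:
    magic = _read32(bytestream)
    count = _read32(bytestream)

print(int(magic[0]), int(count[0]))  # 2051 60000
```

The same _read32 call works identically on a little-endian or big-endian host, because the byte order is pinned in the dtype rather than inherited from the machine.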
It is just a way of ensuring that the bytes in the resulting array are interpreted in the correct order, regardless of a system's native byte order.
By default, the built-in NumPy integer dtypes use the byte order that is native to your system. For example, my system is little-endian, so simply using the dtype numpy.dtype(numpy.uint32) would mean that values read into an array from a buffer whose bytes are in big-endian order will not be interpreted correctly.
If np.frombuffer is meant to receive bytes that are known to be in a particular byte order, the best practice is to modify the dtype using newbyteorder. This is mentioned in the documentation for np.frombuffer:
Notes
If the buffer has data that is not in machine byte-order, this should be specified as part of the data-type, e.g.:
>>> dt = np.dtype(int)
>>> dt = dt.newbyteorder('>')
>>> np.frombuffer(buf, dtype=dt)
The data of the resulting array will not be byteswapped, but will be interpreted correctly.
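A short sketch of what that note means in practice, using the MNIST image-file magic number (2051) as the sample bytes:

```python
import numpy as np

raw = b'\x00\x00\x08\x03'  # 2051 stored MSB-first, as in the MNIST files

# Explicit big-endian dtype: the bytes are interpreted correctly.
dt = np.dtype(np.uint32).newbyteorder('>')
print(int(np.frombuffer(raw, dtype=dt)[0]))  # 2051

# With an explicit little-endian dtype (what the native dtype resolves to on
# most desktop machines), the same bytes come out as a very different number.
dt_le = np.dtype(np.uint32).newbyteorder('<')
print(int(np.frombuffer(raw, dtype=dt_le)[0]))  # 50855936
```

Note that neither call copies or swaps any bytes; only the interpretation of the buffer changes.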