This may be a very silly question, but I tried googling keywords like "less-than and greater-than signs in numpy data type"
and found no reference.
In the NumPy documentation,
x = np.array([(1.0, 2), (3.0, 4)], dtype=[('x', float), ('y', int)])
outputs
array([(1.0, 2), (3.0, 4)],
dtype=[('x', '<f8'), ('y', '<i4')])
But on my PC, the output is
array([(1.0, 2), (3.0, 4)],
dtype=[('x', '>f8'), ('y', '>i4')])
What do < and > in the dtype mean, and why is there a difference?
In a dtype string, the character 'O' means (Python) objects. More generally, the first character specifies the kind of data and the remaining characters specify the number of bytes per item, except for Unicode, where the number is interpreted as the number of characters. The item size must correspond to an existing type, or an error will be raised.
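As a rough illustration of the kind-plus-size rule, the sketch below inspects a few dtype strings. The specific strings ('f8', 'i4', 'U5', 'O', 'f3') are chosen here only as examples, and the 4-bytes-per-character figure for Unicode reflects how NumPy stores 'U' data.

```python
import numpy as np

# First character = kind of data, remaining digits = bytes per item.
f8 = np.dtype('f8')
i4 = np.dtype('i4')
print(f8.kind, f8.itemsize)   # kind 'f' (float), 8 bytes per item
print(i4.kind, i4.itemsize)   # kind 'i' (signed integer), 4 bytes per item

# For Unicode the number counts characters, not bytes.
# NumPy stores each character in 4 bytes, so 'U5' occupies 20 bytes.
u5 = np.dtype('U5')
print(u5.kind, u5.itemsize)

# 'O' stands for arbitrary Python objects.
obj = np.dtype('O')
print(obj.kind)

# A size that matches no existing type is rejected.
try:
    np.dtype('f3')
    invalid_rejected = False
except TypeError:
    invalid_rejected = True
print(invalid_rejected)
```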
The 64 in a type name such as float64 refers to the number of bits allocated to hold each value. Floating-point types hold numbers with decimals. If a column contains numbers and NaNs, pandas will default to float64, in case the missing value has a decimal. Other types are meant to hold time data.
The characters < and > indicate byte order, also known as endianness. It is the order in which the bytes of a number are stored (when the number is composed of more than one byte, e.g. int16, int32, float32...). This page from the reference gives you all the information you need about it in numpy, but as a summary:
| : no byte order, because it would be redundant (single-byte numbers or strings)
< : little-endian
> : big-endian
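To make the summary concrete, here is a minimal sketch that stores the same integer with both byte orders and looks at the raw bytes. The value 1 and the variable names are chosen only for illustration.

```python
import numpy as np

# The same 32-bit integer, stored with explicit byte orders.
little = np.array([1], dtype='<i4')
big = np.array([1], dtype='>i4')

# tobytes() exposes the raw memory layout of each array.
little_bytes = little.tobytes()   # least significant byte first
big_bytes = big.tobytes()         # most significant byte first
print(little_bytes)
print(big_bytes)

# The numeric value is identical; only the storage order differs.
same_value = bool(little[0] == big[0])
print(same_value)

# A single-byte type has no byte order, so NumPy reports '|'.
single_byte_order = np.dtype('i1').byteorder
print(single_byte_order)
```

The two byte strings are mirror images of each other, which is all that < and > change.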
As @tobias_k and @RobertKern pointed out, the default endianness, if not specified, is system dependent.
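One way to check which order your own machine uses is sketched below. It assumes only the standard `sys.byteorder` attribute and NumPy's `dtype.str` and `dtype.byteorder` attributes; the variable names are illustrative.

```python
import sys
import numpy as np

# sys.byteorder is 'little' or 'big', depending on the machine.
native = '<' if sys.byteorder == 'little' else '>'

# A dtype created without an explicit byte order uses the native one.
default_dtype = np.dtype(float)

# .byteorder reports '=' (native) rather than '<' or '>' ...
print(default_dtype.byteorder)

# ... while .str spells the order out, e.g. '<f8' on a little-endian
# machine and '>f8' on a big-endian one.
print(default_dtype.str)
print(default_dtype.str == native + 'f8')
```

This is why the documentation, presumably generated on a little-endian machine, shows '<f8' while a big-endian machine prints '>f8' for the same code.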