I'm trying to read in a complicated data file that contains floating point values. Some C code has been supplied that handles this format (Met Office PP file); it does a lot of bit twiddling and byte swapping, and it doesn't work. It gets a lot right, such as the size of the data, but the numerical values in the returned matrix are nonsensical, with NaNs and values like 1e38 and -1e38 liberally sprinkled throughout.
However, I have a binary exe ("convsh") that can convert these to netCDF, and the netCDFs look fine - nice swirly maps of wind speed.
What I'm thinking is that the bytes of the PP file are being read in the wrong order. If I could compare the bytes of the floats returned correctly in the netCDF data with the bytes of the floats returned wrongly from the C code, then I might figure out the correct swappage.
So is there a plain R function to dump the four (or eight?) bytes of a floating point number? Something like:
> as.bytes(pi)
[1] 23 54 163 73 99 00 12 45 # made up values
Searches for "bytes", "float" and "binary" haven't helped.
It's trivial in C; I could probably have written it in the time it took me to write this...
Float to byte array conversion: in Java, a float is 32 bits, the same size as an int. So the floatToIntBits or floatToRawIntBits functions of the Float class can be used to get the bit pattern as an int, which is then shifted to produce a byte array.
Single-precision values of type float occupy 4 bytes, consisting of a sign bit, an 8-bit excess-127 binary exponent, and a 23-bit mantissa. For normalized values the mantissa has an implicit leading 1, so it represents a number from 1.0 up to, but not including, 2.0.
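To make that layout concrete, here is a minimal R sketch that splits a single-precision value into its three fields. The function name float_fields is made up for illustration, and the reconstruction formula only covers normalized values (not zero, subnormals, Inf or NaN).

```r
# float_fields() is a hypothetical helper, not part of base R.
float_fields <- function(x) {
  # Four bytes of the single-precision value, most significant byte first
  bytes <- writeBin(x, raw(), size = 4, endian = "big")
  # rawToBits() returns each byte's bits least significant first, so put
  # one byte per column and flip the rows to get most-significant-first bits
  bits <- as.vector(matrix(as.integer(rawToBits(bytes)), nrow = 8)[8:1, ])
  sign     <- bits[1]
  exponent <- sum(bits[2:9] * 2^(7:0))      # stored with a bias of 127
  mantissa <- sum(bits[10:32] * 2^(22:0))   # 23 stored fraction bits
  list(
    sign = sign,
    exponent = exponent,
    mantissa = mantissa,
    # Valid for normalized values only: implicit leading 1 on the mantissa
    value = (-1)^sign * (1 + mantissa / 2^23) * 2^(exponent - 127)
  )
}

f <- float_fields(pi)
f$exponent  # 128, i.e. 2^(128 - 127) = 2
f$value     # pi rounded to single precision
```

Garbage values near 1e38 are what you tend to see when the exponent byte comes from the wrong end of the word, so looking at the fields this way can hint at which bytes are misplaced.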
The size of a float, or of any other data type for that matter, depends on the system: it is determined by the hardware architecture and the compiler. The float 10498.429 would also be 4 bytes in memory. If a given computer system has a float size of 4 bytes, then all floats are 4 bytes.
rdyncall might give you what you're looking for:
library(rdyncall)
as.floatraw(pi)
# [1] db 0f 49 40
# attr(,"class")
# [1] "floatraw"
Or maybe writeBin(pi, raw(8))?
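A slightly fuller sketch of the writeBin idea, in case it helps: the size and endian arguments let you pick 4 or 8 bytes and the byte order explicitly, and readBin accepts a raw vector, so you can read the same bytes back under a different byte order. The wrapper name as_bytes is made up here; only writeBin and readBin are base R.

```r
# as_bytes() is a hypothetical helper name, not part of base R.
as_bytes <- function(x, size = 8, endian = .Platform$endian) {
  writeBin(x, raw(), size = size, endian = endian)
}

as_bytes(pi, endian = "big")               # 40 09 21 fb 54 44 2d 18
as_bytes(pi, size = 4, endian = "little")  # db 0f 49 40

# Read the same four bytes back under each byte order
b <- as_bytes(pi, size = 4, endian = "little")
readBin(b, what = "numeric", size = 4, endian = "little")  # close to pi
readBin(b, what = "numeric", size = 4, endian = "big")     # nonsense value
```

Doing this for a value that is known to be good in the netCDF output, and comparing against the bytes behind the bad value from the C code, should show whether the problem is a plain endian swap or something more exotic.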
Yes, that must exist in the serialization code, because R merrily sends stuff across the wire, taking care of endianness too. Did you look at, e.g., how Rserve uses it, or how digest passes the char representation to the chosen hash functions?
After a quick glance at digest.R:
R> serialize(pi, connection=NULL, ascii=TRUE)
[1] 41 0a 32 0a 31 33 34 39 31 34 0a 31 33 31 38 34 30 0a
[19] 31 34 0a 31 0a 33 2e 31 34 31 35 39 32 36 35 33 35 38
[37] 39 37 39 33 0a
and
R> serialize(pi, connection=NULL, ascii=FALSE)
[1] 58 0a 00 00 00 02 00 02 0f 02 00 02 03 00 00 00 00 0e
[19] 00 00 00 01 40 09 21 fb 54 44 2d 18
R>
That might get you going.
Come to think about it, this includes header meta-data.
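If you do go the serialize route, a rough way around the header is to keep only the trailing bytes. This sketch assumes a single double with no attributes and the default XDR (big-endian) binary format, in which case the value sits in the last 8 bytes, as in the output above.

```r
# Serialize a bare double; the header length varies between R versions,
# so take the payload from the end rather than skipping a fixed prefix.
s <- serialize(pi, connection = NULL)
payload <- tail(s, 8)
payload                                 # 40 09 21 fb 54 44 2d 18

# The same bytes, without any header, straight from writeBin
direct <- writeBin(pi, raw(), endian = "big")
identical(payload, direct)
```

Since writeBin gives the bytes directly, it is probably the simpler tool for this job; serialize is mostly useful here as a cross-check.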