Firstly, before this question gets marked as a duplicate: I'm aware others have asked similar questions, but there doesn't seem to be a clear explanation. I'm trying to read a binary file into a 2D array (the format is documented well here: http://nsidc.org/data/docs/daac/nsidc0051_gsfc_seaice.gd.html).
The header is a 300 byte array.
So far, I have:
import struct
with open("nt_197912_n07_v1.1_n.bin", mode='rb') as file:
    filecontent = file.read()
x = struct.unpack("iiii", filecontent[:300])
This throws an error about the string argument length.
rb: Opens the file as read-only in binary format and starts reading from the beginning of the file. Binary format can be used for different purposes, but it is usually used for non-text data such as images and videos. r+: Opens a file for reading and writing, placing the pointer at the beginning of the file.
By default, Python's built-in open() function opens a file in text mode. If you want to open a binary file, you need to add the 'b' character to the optional mode string argument. To open a file for reading in binary format, use mode='rb'. To open a file for writing in binary format, use mode='wb'.
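As a small illustration of the binary modes, here is a minimal round-trip sketch: 'wb' to write raw bytes and 'rb' to read them back. The scratch file name and the payload are made up for the example and are not part of the sea-ice data.

```python
import os
import tempfile

# Hypothetical scratch file, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "example.bin")

payload = bytes([0, 1, 2, 253, 254, 255])

# 'wb' creates (or truncates) the file and writes raw bytes.
with open(path, mode="wb") as f:
    f.write(payload)

# 'rb' returns the same bytes, with no text decoding or newline translation.
with open(path, mode="rb") as f:
    content = f.read()

print(type(content), len(content))
```

In binary mode, read() returns a bytes object, which is what struct.unpack and numpy.frombuffer expect.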
After you have determined the size of the grid (n_rows x n_cols = 448x304) from your header (see below), you can simply read the data using numpy.frombuffer.
import numpy as np
#...
#Get data from Numpy buffer
dt = np.dtype(('>u1', (n_rows, n_cols)))
x = np.frombuffer(filecontent[300:], dt) #we know the data starts from idx 300 onwards
#Remove unnecessary dimension that numpy gave us
x = x[0,:,:]
The '>u1' specifies the format of the data: in this case unsigned 1-byte integers in big-endian format (byte order makes no difference for single-byte values, so '<u1' or np.uint8 would read the same numbers).
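The same read can be tried without the real file. The sketch below builds a synthetic buffer (a zeroed 300-byte header followed by one byte per grid cell) and uses the offset= argument of numpy.frombuffer plus reshape, as an alternative to slicing and a sub-array dtype. The names fake_content and grid_bytes are placeholders for this example only.

```python
import numpy as np

n_rows, n_cols = 448, 304   # grid size quoted above
header_size = 300           # header length stated in the question

# Synthetic stand-in for the real file contents:
# a zeroed header followed by one byte per grid cell.
grid_bytes = bytes(i % 256 for i in range(n_rows * n_cols))
fake_content = bytes(header_size) + grid_bytes

# offset= skips the header; reshape turns the flat array into the 2D grid.
grid = np.frombuffer(fake_content, dtype=np.uint8, offset=header_size)
grid = grid.reshape(n_rows, n_cols)

print(grid.shape, grid.dtype)
```

An array created from a bytes object this way is read-only; call grid.copy() if you need to modify it.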
Plotting this with matplotlib.pyplot:
import matplotlib.pyplot as plt
#...
plt.imshow(x, extent=[0,3,-3,3], aspect="auto")
plt.show()
The extent= option simply specifies the axis values; you can change these to lat/lon, for example (parsed from your header).
From the docs for struct.unpack(fmt, string):
The string must contain exactly the amount of data required by the format (len(string) must equal calcsize(fmt)).
You can determine the size specified in the format string (fmt) by looking at the Format Characters section.
Your fmt in struct.unpack("iiii",filecontent[:300]) specifies 4 int types (you can also use 4i = iiii for simplicity), each of which has size 4, requiring a string of length 16.
Your string (filecontent[:300]) is of length 300, whilst your fmt is asking for a string of length 16, hence the error.
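The size mismatch can be checked directly with struct.calcsize. The following is a minimal sketch on a synthetic 300-byte buffer; the "<" prefix (little-endian, standard sizes, no padding) is an assumption made here so that the numbers are the same on every platform, not something taken from the data documentation.

```python
import struct

# "<" is an assumption for this sketch: standard sizes, no alignment padding.
fmt = "<4i"
needed = struct.calcsize(fmt)   # bytes required by the format

header = bytes(300)             # synthetic stand-in for filecontent[:300]

# Passing all 300 bytes fails because the format only describes 16 of them.
try:
    struct.unpack(fmt, header)
    mismatch_raised = False
except struct.error:
    mismatch_raised = True

# Passing exactly calcsize(fmt) bytes succeeds.
values = struct.unpack(fmt, header[:needed])

print(needed, mismatch_raised, values)
```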
As an example, reading your supplied document, I extracted the first 21*6 bytes, which have the format:
a 21-element array of 6-byte character strings that contain information such as polar stereographic grid characteristics
With:
x = struct.unpack("6s"*21, filecontent[:126])
This returns a tuple of 21 elements. Note the whitespace padding in some elements to meet the 6-byte requirement.
>>> print x
# ('00255\x00', '  304\x00', '  448\x00', '1.799\x00', '39.43\x00', '45.00\x00', '558.4\x00', '154.0\x00', '234.0\x00',
# ' SMMR\x00', '07 cn\x00', '  336\x00', ' 0000\x00', ' 0034\x00', '  364\x00', ' 0000\x00', ' 0046\x00', ' 1979\x00',
# '  336\x00', '  000\x00', '00250\x00')
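To try the same call without the data file, here is a sketch on a synthetic 126-byte header (21 fields of 5 characters plus a trailing NUL). It also contrasts the count prefix for s, which sets the string length, with the count prefix for another format character such as b, which repeats the item. The field contents and the names fake_header and cleaned are invented for the example; in Python 3 the unpacked items are bytes objects.

```python
import struct

# Synthetic stand-in for the first 126 header bytes:
# 21 right-justified 5-character fields, each followed by a NUL byte.
fields = [("%5d" % i).encode("ascii") + b"\x00" for i in range(21)]
fake_header = b"".join(fields)

fmt = "6s" * 21   # 21 strings of 6 bytes each
raw = struct.unpack(fmt, fake_header[:126])

# Drop the NUL terminator and the space padding from each field.
cleaned = [r.rstrip(b"\x00").decode("ascii").strip() for r in raw]

# "6s" is ONE 6-byte string; "6b" is SIX separate 1-byte integers.
size_6s = struct.calcsize("6s")
size_6b = struct.calcsize("6b")
count_6s = len(struct.unpack("6s", b"abcdef"))
count_6b = len(struct.unpack("6b", b"abcdef"))

print(len(raw), cleaned[:3], count_6s, count_6b)
```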
Notes:
The fmt, "6s"*21, is a string with 6s repeated 21 times. Each format character 6s represents one string of 6 bytes (see below); this matches the required format specified in your document.
The 126 in filecontent[:126] is calculated as 6*21 = 126.
For the s (string) specifier, the preceding number does not mean to repeat the format character 6 times (as it would normally for other format characters). Instead, it specifies the size of the string. s represents a 1-byte string, whilst 6s represents a 6-byte string.
Because the binary data must be manually specified, this may be tedious to do in source code. You can consider using a configuration file (like a .ini file).
This function will read the header and store it in a dictionary, where the structure is given by a .ini file:
# use configparser for Python 3.x
import struct
import ConfigParser

def read_header(data, config_file):
    """
    Read binary data specified by an INI file which specifies the structure
    """
    with open(config_file) as fd:
        # Init the config class
        conf = ConfigParser.ConfigParser()
        conf.readfp(fd)
        # Preallocate dictionary to store data
        header = {}
        # Iterate over the key-value pairs under the
        # 'structure' section
        for key in conf.options('structure'):
            # Determine the string properties
            start_idx, end_idx = [int(x) for x in conf.get('structure', key).split(',')]
            start_idx -= 1  # remember Python is zero-indexed!
            strLength = end_idx - start_idx
            # Get the data
            header[key] = struct.unpack("%is" % strLength, data[start_idx:end_idx])
            # Format the data: strip padding and the NUL terminator
            header[key] = [x.strip() for x in header[key]]
            header[key] = [x.replace('\x00', '') for x in header[key]]
        # Unmap from list-type
        # use .items() for Python 3.x
        header = {k: v[0] for k, v in header.iteritems()}
    return header
An example .ini file is below. The key is the name to use when storing the data, and the value is a comma-separated pair of numbers, the first being the starting index and the second being the ending index (1-based, inclusive). These values were taken from Table 1 in your document.
[structure]
missing_data: 1, 6
n_cols: 7, 12
n_rows: 13, 18
latitude_enclosed: 25, 30
This function can be used as follows:
header = read_header(filecontent, 'headerStructure.ini')
n_cols = int(header['n_cols'])
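For Python 3, a rough equivalent could look like the sketch below. It is an adaptation, not the function above: the name read_header_py3 is hypothetical, it takes the INI text directly instead of a file path so the example is self-contained, and it decodes the bytes as ASCII, which is an assumption about the header contents. The sample header is synthetic, built from the first five fields printed earlier.

```python
import configparser
import struct

# Same structure as the example .ini file (1-based, inclusive byte ranges).
INI_TEXT = """
[structure]
missing_data: 1, 6
n_cols: 7, 12
n_rows: 13, 18
latitude_enclosed: 25, 30
"""

def read_header_py3(data, config_text):
    """Extract fixed-width header fields described by INI text."""
    conf = configparser.ConfigParser()
    conf.read_string(config_text)
    header = {}
    for key, value in conf.items("structure"):
        start, end = (int(v) for v in value.split(","))
        start -= 1  # convert the 1-based start to a zero-based slice index
        (raw,) = struct.unpack("%ds" % (end - start), data[start:end])
        # Remove the NUL terminator, decode (ASCII assumed), strip padding.
        header[key] = raw.replace(b"\x00", b"").decode("ascii").strip()
    return header

# Synthetic 300-byte header using the first five fields shown earlier.
sample = b"00255\x00" + b"  304\x00" + b"  448\x00" + b"1.799\x00" + b"39.43\x00"
sample = sample.ljust(300, b"\x00")

header = read_header_py3(sample, INI_TEXT)
n_cols = int(header["n_cols"])
n_rows = int(header["n_rows"])
print(header, n_rows, n_cols)
```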