More Extensive Solution for Header Reading (Long)

Because the layout of the binary data must be specified by hand, hard-coding it in the source can become tedious. One option is to describe the header layout in a configuration file (for example an .ini file).

The function below reads the header and stores it in a dictionary, using the structure given in the .ini file:

import struct
import configparser  # Python 2: import ConfigParser as configparser

def read_header(data, config_file):
    """
    Read the binary header fields whose layout is described by an INI file.
    """
    conf = configparser.ConfigParser()
    with open(config_file) as fd:
        conf.read_file(fd)  # Python 2: conf.readfp(fd)

    # Dictionary that will hold the decoded header fields
    header = {}

    # Iterate over the key-value pairs under the 'structure' section
    for key in conf.options('structure'):
        # Each value is "start, end" (1-based, inclusive byte positions)
        start_idx, end_idx = [int(x) for x in conf.get('structure', key).split(',')]
        start_idx -= 1  # remember Python is zero-indexed!
        str_length = end_idx - start_idx

        # Unpack one fixed-width string; unpack() always returns a tuple
        (raw,) = struct.unpack("%is" % str_length, data[start_idx:end_idx])

        # Drop the NUL terminator and the whitespace padding
        header[key] = raw.replace(b'\x00', b'').strip().decode('ascii')

    return header

An example .ini file is shown below. Each key is the name under which the field is stored, and each value is a comma-separated pair: the starting index and the ending index of the field. These values were taken from Table 1 in your document.

[structure]
missing_data: 1, 6
n_cols: 7, 12
n_rows: 13, 18
latitude_enclosed: 25, 30

The function can be used as follows:

header = read_header(filecontent, 'headerStructure.ini')
n_cols = int(header['n_cols'])

Reading binary data in python

Firstly, before this question gets marked as a duplicate: I'm aware others have asked similar questions, but there doesn't seem to be a clear explanation. I'm trying to read a binary file into a 2D array (the format is documented well here: http://nsidc.org/data/docs/daac/nsidc0051_gsfc_seaice.gd.html).

The header is a 300-byte array.

So far, I have:

import struct

with open("nt_197912_n07_v1.1_n.bin",mode='rb') as file:
    filecontent = file.read()

x = struct.unpack("iiii",filecontent[:300])

This throws an error about the string argument length.

How do you read binary data?

To read a binary number, start with the rightmost digit and work leftward. The rightmost position has power zero, so a 1 there contributes two to the power of zero, or one, and a 0 there contributes nothing.

Which function is used to read records from a binary file in Python?

Use the built-in open() function with a binary mode. rb: opens the file read-only in binary format, starting at the beginning of the file. Binary mode is typically used for non-text data such as images and videos. r+: opens a file for reading and writing (in text mode unless combined with b, as in rb+), placing the pointer at the beginning of the file.

How do I read a binary csv file in Python?

By default, Python's built-in open() function opens a file in text mode. To open a binary file, add the 'b' character to the optional mode string argument. To open a file for reading in binary format, use mode='rb'. To open a file for writing in binary format, use mode='wb'.
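A minimal sketch of the two binary modes, assuming a throwaway file (the name demo.bin and the four byte values are made up for illustration):

```python
import os
import tempfile

# Hypothetical scratch file in a temporary directory
path = os.path.join(tempfile.mkdtemp(), "demo.bin")

# 'wb' opens the file for writing raw bytes
with open(path, mode="wb") as f:
    f.write(bytes([0, 1, 254, 255]))

# 'rb' opens the file for reading raw bytes; read() returns a bytes object
with open(path, mode="rb") as f:
    data = f.read()
```

In binary mode no newline translation or text decoding takes place, so the bytes read back are exactly the bytes written.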

Reading the Data (Short Answer)

After you have determined the size of the grid (n_rows x n_cols = 448 x 304) from your header (see below), you can read the data using numpy.frombuffer.

import numpy as np

#...

#Get data from Numpy buffer
dt = np.dtype(('>u1', (n_rows, n_cols)))
x = np.frombuffer(filecontent[300:], dt) #we know the data starts from idx 300 onwards

#Remove unnecessary dimension that numpy gave us
x = x[0,:,:]

The '>u1' string specifies the element format: unsigned integers ('u') of 1 byte each. The '>' prefix requests big-endian byte order, which has no effect on single-byte values but documents the intent.
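The same read can also be sketched with the offset argument of numpy.frombuffer followed by reshape. The snippet below uses a small synthetic buffer (a 300-byte dummy header plus a made-up 4 x 3 grid) in place of the real file, so the sizes and values are assumptions for illustration only:

```python
import numpy as np

# Synthetic stand-in for the real file: a 300-byte header followed by a
# small grid (the real grid is 448 x 304); all values here are made up.
n_rows, n_cols = 4, 3
header = b"\x00" * 300
body = bytes(range(n_rows * n_cols))
filecontent = header + body

# Read one unsigned byte per grid cell, skipping the header via offset.
grid = np.frombuffer(filecontent, dtype=np.uint8, offset=300)

# Guard against a size mismatch before reshaping.
if grid.size != n_rows * n_cols:
    raise ValueError("expected %d values, got %d" % (n_rows * n_cols, grid.size))
grid = grid.reshape(n_rows, n_cols)

# For 1-byte values the byte-order prefix makes no difference.
same = np.array_equal(grid.ravel(), np.frombuffer(body, dtype=">u1"))
```

Checking the element count first gives a clearer error than a failed reshape if the header size or grid dimensions were misread.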

Plotting this with matplotlib.pyplot:

import matplotlib.pyplot as plt

#...

plt.imshow(x, extent=[0,3,-3,3], aspect="auto")
plt.show()

The extent= option sets the axis ranges (left, right, bottom, top); you can change these to, for example, lat/lon values parsed from your header.

[Image: output plot, https://i.stack.imgur.com/TtoaP.png]

Explanation of Error from .unpack()

From the docs for struct.unpack(fmt, string):

The string must contain exactly the amount of data required by the format (len(string) must equal calcsize(fmt))

You can determine the size required by the format string (fmt) by looking at the Format Characters section of the struct documentation, or by calling struct.calcsize(fmt).

Your fmt in struct.unpack("iiii",filecontent[:300]) specifies 4 int values (you can also write 4i instead of iiii), each 4 bytes in size, so it requires a string of length 16.

Your string (filecontent[:300]) has length 300, whilst your fmt asks for a string of length 16, hence the error.
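The length mismatch can be reproduced without the data file. This sketch uses 300 zero bytes as a placeholder for filecontent[:300], so the unpacked values are not meaningful:

```python
import struct

# Placeholder for filecontent[:300]: 300 zero bytes (not real header data)
header = bytes(300)

# Number of bytes the format "iiii" requires (four native ints)
needed = struct.calcsize("iiii")

# Passing all 300 bytes fails because the length does not match the format
try:
    struct.unpack("iiii", header)
    failed = False
except struct.error:
    failed = True

# Slicing the buffer to exactly calcsize(fmt) bytes satisfies unpack()
values = struct.unpack("iiii", header[:needed])
```

Whether four native ints are the right interpretation of those bytes is a separate question; for this file the header consists of character strings, as the next section shows.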

Example Usage of .unpack()

As an example, following your supplied document, I extracted the first 21*6 bytes, which have the format:

a 21-element array of 6-byte character strings that contain information such as polar stereographic grid characteristics

With:

x = struct.unpack("6s"*21, filecontent[:126])

This returns a tuple of 21 elements. Note the whitespace padding in some elements to meet the 6-byte requirement. (The output below is as printed by Python 2; Python 3 shows each element with a b prefix, e.g. b'00255\x00'.)

>>> print x
('00255\x00', '  304\x00', '  448\x00', '1.799\x00', '39.43\x00', '45.00\x00', '558.4\x00', '154.0\x00', '234.0\x00',
 ' SMMR\x00', '07 cn\x00', '  336\x00', ' 0000\x00', ' 0034\x00', '  364\x00', ' 0000\x00', ' 0046\x00', ' 1979\x00',
 '  336\x00', '  000\x00', '00250\x00')

Notes:

The first argument fmt, "6s"*21, is a string with 6s repeated 21 times. Each format character 6s represents one string of 6 bytes (see below), which matches the format specified in your document.
The number 126 in filecontent[:126] is calculated as 6*21 = 126.
Note that for the s (string) specifier, the preceding number does not mean "repeat the format character 6 times" (as it would for other format characters). Instead, it specifies the size of the string: s represents a 1-byte string, whilst 6s represents a 6-byte string.
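The difference between a size prefix and a repeat count can be sketched on two of the fields shown above. The 12-byte buffer here is typed in by hand from that output rather than read from the file:

```python
import struct

# Two 6-byte fields copied from the output above (n_cols and n_rows)
raw = b"  304\x00  448\x00"

# "6s" consumes 6 bytes and yields ONE bytes object per field
fields = struct.unpack("6s6s", raw)

# "6c" is a repeat count: six separate 1-byte values
chars = struct.unpack("6c", raw[:6])

# Strip the NUL terminator and the padding before converting to a number
n_cols = int(fields[0].rstrip(b"\x00").strip().decode("ascii"))
n_rows = int(fields[1].rstrip(b"\x00").strip().decode("ascii"))
```

The numeric header fields need this cleanup step because they are stored as padded, NUL-terminated text rather than as binary integers.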

Tags: python, file, binary

J W


Jamie Phan
