How to read this ASCII data into Python lists or numpy arrays?

Tags: python, numpy

I have an ASCII data file in a format I haven't worked with before, and I'm not sure how best to read the data into a list or array in Python. The file is formatted like this:

line 0:          <month> <year>
lines 1 - 216:   12 integer values per line; each value occupies a seven-character field whose first character is always a space
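Because the first character of every seven-character field is always a space, adjacent values can never run together, so splitting a line on whitespace should give the same result as slicing it into fixed-width fields. A small check of that, assuming every data line holds exactly 12 × 7 = 84 characters before the newline (parse_fixed_width and parse_split are hypothetical helper names, not from the question):

```python
FIELD_WIDTH = 7        # from the format description: 7 characters per value
VALUES_PER_LINE = 12   # 12 values per data line


def parse_fixed_width(line):
    """Parse one data line by slicing it into fixed 7-character fields."""
    line = line.rstrip("\n")
    # int() ignores the leading spaces inside each field
    return [int(line[i:i + FIELD_WIDTH])
            for i in range(0, FIELD_WIDTH * VALUES_PER_LINE, FIELD_WIDTH)]


def parse_split(line):
    """Parse one data line by splitting on runs of whitespace."""
    return [int(x) for x in line.split()]


# Rebuild the first example data line: each value right-aligned in 7 chars.
values = [-32768, -32768, 790, -1457, -1367, -16,
          -575, 116, -32768, -32768, 1898, -32768]
sample = "".join("%7d" % v for v in values) + "\n"

assert parse_fixed_width(sample) == parse_split(sample)
print(parse_split(sample)[:4])  # [-32768, -32768, 790, -1457]
```

If a line could ever be shorter than 84 characters, the fixed-width version would fail on an empty slice, which is one reason plain split() is usually the simpler choice here.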

For example, the first record in the file looks like this:

    1 1900
 -32768 -32768    790  -1457  -1367    -16   -575    116 -32768 -32768   1898 -32768
 -32768  -1289 -32768 -32768 -32768 -32768 -32768 -32768 -32768 -32768 -32768 -32768
 -32768 -32768    -92 -32768 -32768 -32768    125 -32768 -32768 -32768 -32768 -32768
 -32768 -32768 -32768 -32768 -32768  -1656 -32768   -764 -32768 -32768 -32768 -32768
 <212 more lines like the above for this record, same spacing/separators/etc.>

I'll call the above a single record (all the data for a single month); there are about 1200 records in the file. The months increase sequentially from 1 to 12, then wrap back to 1 as the year increments. I want to read the records one at a time, something like this:

with open(data_file, 'r') as dataFile:
    # while file still has unread records
        # read month and year to use to create a datetime object
        # read the next 216 lines of 12 values into a list (or array) of 2592 values
        # process the record's list (or array) of data

What is an efficient, "Pythonic" way to loop over the records as above, and how should I best read each record's data into a list or array?

Asked by James Adams, Mar 18 '26 13:03


1 Answer

itertools.groupby can be used here.

from datetime import date
from itertools import groupby

key = None  # date of the most recent "<month> <year>" line seen by keyfunc

def keyfunc(line):
    global key
    row = [int(x) for x in line.split()]
    if len(row) == 2:  # a header line starts a new record
        month, year = row
        key = date(year, month, 1)
    return key

def read_file(fname):
    with open(fname, 'r') as f:
        for rec_date, lines in groupby(f, keyfunc):
            data = []
            for line in lines:
                values = [int(x) for x in line.split()]
                if len(values) == 2:  # skip the header line itself
                    continue
                data.extend(values)
            yield rec_date, data

for rec_date, data in read_file('data.txt'):
    print(rec_date, data[:5], '... (', len(data), ')')

keyfunc is the clever bit. It returns the key for each line of the file, and groupby produces an iterator over each run of consecutive lines that share the same key. keyfunc uses a global to track the date of the latest 2-value (month, year) line; this global might be avoidable with a bit more thought. When a new 2-value line is found, keyfunc updates the global, so that line and the data lines after it form a new group keyed by that date. The data are concatenated into a single list for each key; because the 2-value header line is also part of each group, it is skipped explicitly. The final result is a generator that yields a (date, data list) 2-tuple for each record in your data file.
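One possible way to avoid the global, sketched here as an illustration rather than the answer author's code, is to keep the current date on a small callable object and pass that object to groupby as the key function. RecordKey and read_records are hypothetical names; read_records takes any iterable of lines (an open file or io.StringIO) instead of a file name, which also makes it easy to try on a small sample.

```python
import io
from datetime import date
from itertools import groupby


class RecordKey:
    """Callable groupby key that remembers the date of the latest header.

    Plays the role of the global key, but the state lives on the
    instance, so each read_records() call starts from a clean slate.
    """

    def __init__(self):
        self.current = None  # no header seen yet

    def __call__(self, line):
        fields = line.split()
        if len(fields) == 2:  # "<month> <year>" header starts a new record
            month, year = (int(x) for x in fields)
            self.current = date(year, month, 1)
        return self.current


def read_records(lines):
    """Yield (date, list of ints) for each record in an iterable of lines."""
    for rec_date, group in groupby(lines, RecordKey()):
        data = []
        for line in group:
            values = [int(x) for x in line.split()]
            if len(values) != 2:  # the header is part of the group; skip it
                data.extend(values)
        yield rec_date, data


# Usage on a tiny in-memory sample (3 values per data line instead of 12):
sample = io.StringIO("1 1900\n 10 20 30\n 40 50 60\n2 1900\n -1 -2 -3\n")
for rec_date, data in read_records(sample):
    print(rec_date, data)
# 1900-01-01 [10, 20, 30, 40, 50, 60]
# 1900-02-01 [-1, -2, -3]
```

Like the original, this relies on data lines never containing exactly two values, which holds for the 12-value lines in the question.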

EDIT: Here's a simpler option that doesn't use itertools.groupby:

from datetime import date

def read_file2(fname):
    key = None
    data = []
    with open(fname, 'r') as f:
        for line in f:
            row = [int(x) for x in line.split()]
            if len(row) == 2:
                # header line: emit the previous record, then start a new one
                if data:
                    yield key, data
                month, year = row
                key = date(year, month, 1)
                data = []
            else:
                data.extend(row)
        if data:  # emit the final record
            yield key, data


for rec_date, data in read_file2('data.txt'):
    print(rec_date, data[:5], '... (', len(data), ')')
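Since the question also asks about numpy arrays and every record has exactly 216 data lines, a fixed-size read with itertools.islice that mirrors the loop sketched in the question is another option. This is a sketch, not part of the original answer: read_records_np is a hypothetical name, and treating -32768 as a missing-value marker (masked with numpy.ma) is an assumption based only on how often it appears in the sample.

```python
import io
from datetime import date
from itertools import islice

import numpy as np

LINES_PER_RECORD = 216  # from the question: 216 lines x 12 values = 2592
FILL_VALUE = -32768     # ASSUMPTION: -32768 marks a missing value


def read_records_np(lines, lines_per_record=LINES_PER_RECORD):
    """Yield (date, masked 2-D int array) for each record.

    Reads one "<month> <year>" header, then exactly lines_per_record
    data lines, like the loop sketched in the question.
    """
    it = iter(lines)  # one shared iterator for headers and data lines
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines between records
        month, year = (int(x) for x in header.split())
        rows = [[int(x) for x in line.split()]
                for line in islice(it, lines_per_record)]
        if len(rows) != lines_per_record:
            raise ValueError("truncated record for %d-%02d" % (year, month))
        arr = np.array(rows, dtype=int)  # shape (lines_per_record, 12)
        yield date(year, month, 1), np.ma.masked_equal(arr, FILL_VALUE)


# Usage on a tiny in-memory sample (2 data lines of 3 values per record):
sample = io.StringIO("1 1900\n 1 -32768 3\n 4 5 6\n"
                     "2 1900\n 7 8 9\n -32768 -32768 0\n")
for rec_date, arr in read_records_np(sample, lines_per_record=2):
    print(rec_date, arr.shape, arr.count())  # count() = non-masked values
# 1900-01-01 (2, 3) 5
# 1900-02-01 (2, 3) 4
```

For the real file, open it with open('data.txt') and pass the file object to read_records_np; call arr.ravel() if you want the flat 2592-value layout. Unlike the groupby versions, this raises an error on a truncated record instead of silently returning a short list.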
Answered by Graeme Stuart, Mar 20 '26 02:03


