I have an ASCII data file in a format that I'm not sure how best to read into a list or array in Python. The file is formatted like this:
line 0: <month> <year>
lines 1 - 216: 12 integer values per line, each value occupies seven characters, the first of which is always a space
For example, the first record in the file looks like this:
1 1900
 -32768 -32768    790  -1457  -1367    -16   -575    116 -32768 -32768   1898 -32768
 -32768  -1289 -32768 -32768 -32768 -32768 -32768 -32768 -32768 -32768 -32768 -32768
 -32768 -32768    -92 -32768 -32768 -32768    125 -32768 -32768 -32768 -32768 -32768
 -32768 -32768 -32768 -32768 -32768  -1656 -32768   -764 -32768 -32768 -32768 -32768
<212 more lines like the above for this record, same spacing/separators/etc.>
I'll call the above a single record (all data for a single month), and there are about 1200 records in the file. The months increase sequentially from 1 to 12 before starting over with an increment of the year value. I want to read the records one at a time, something like this:
with open(data_file, 'r') as dataFile:
    # while file still has unread records
        # read month and year to use to create a datetime object
        # read the next 216 lines of 12 values into a list (or array) of 2592 values
        # process the record's list (or array) of data
What is an efficient, "Pythonic" way to loop over the records like this, and how should I best read each record's data into a list or array?
itertools.groupby can be used here.
from datetime import date
from itertools import groupby

def keyfunc(line):
    # Group key: the date built from the most recent 2-value header row.
    global key
    row = list(map(int, line.strip().split()))
    if len(row) == 2:  # a 2-value row is a header and starts a new group
        month, year = row
        key = date(year, month, 1)
    return key

def read_file(fname):
    with open(fname, 'r') as f:
        for rec_date, lines in groupby(f, keyfunc):
            data = []
            for line in lines:
                row = list(map(int, line.strip().split()))
                if len(row) == 2:  # skip the header row itself
                    continue
                data.extend(row)
            yield rec_date, data

for rec_date, data in read_file('data.txt'):
    print(rec_date, data[:5], '... (', len(data), ')')
The keyfunc is the clever bit: it returns the key for each line of the file, and groupby produces an iterator for each run of contiguous lines that share the same key. keyfunc uses a global to remember the date built from the latest 2-value row (this global might be avoidable with a bit more thought). When a new 2-value row is found, the key changes, which starts a new group. Within each group the data rows are aggregated into a single list; the 2-value header row is skipped, because groupby returns it as part of the group too. The final result is a generator that yields a 2-tuple of date and data list for each record in your data file.
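One way the global could be avoided is to keep the "latest header date" in a closure instead. The following is only a sketch of that idea, not part of the original answer: make_keyfunc and read_records are names made up for illustration, and it assumes (as in the question) that only header lines contain exactly two values.

```python
from datetime import date
from itertools import groupby


def make_keyfunc():
    """Return a key function that remembers the most recent header date."""
    current = None  # date of the latest "<month> <year>" line seen so far

    def keyfunc(line):
        nonlocal current
        fields = line.split()
        if len(fields) == 2:  # a header line starts a new record
            month, year = map(int, fields)
            current = date(year, month, 1)
        return current

    return keyfunc


def read_records(lines):
    """Yield (date, values) for each record in an iterable of text lines."""
    for rec_date, group in groupby(lines, make_keyfunc()):
        values = []
        for line in group:
            fields = line.split()
            if len(fields) != 2:  # skip the header line itself
                values.extend(int(v) for v in fields)
        yield rec_date, values


# Small in-memory sample (3 values per line instead of 12, for brevity).
sample = [
    "1 1900\n",
    " -32768 -32768    790\n",
    "  -1457  -1367    -16\n",
    "2 1900\n",
    "    125 -32768   -764\n",
]
records = list(read_records(sample))
```

Because read_records accepts any iterable of lines, an open file object can be passed in directly, for example read_records(f) inside a with open(...) block.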
EDIT: Here's a simple option, without using itertools.groupby
from datetime import date

def read_file2(fname):
    data = []
    with open(fname, 'r') as f:
        for line in f:
            row = list(map(int, line.strip().split()))
            if len(row) == 2:  # header row: <month> <year>
                if data:
                    yield key, data  # emit the record that just ended
                month, year = row
                key = date(year, month, 1)
                data = []
            else:
                data.extend(row)
    if data:
        yield key, data  # emit the final record

for rec_date, data in read_file2('data.txt'):
    print(rec_date, data[:5], '... (', len(data), ')')
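Since every record in the question has the same length (a header line followed by 216 data lines), another option is to read a fixed number of lines per record with itertools.islice. This is only a sketch under that assumption: read_fixed_records and its lines_per_record parameter are hypothetical names, and the code does not try to recover from a file with stray or missing lines.

```python
import io
from datetime import date
from itertools import islice


def read_fixed_records(lines, lines_per_record=216):
    """Yield (date, values), assuming a fixed number of data lines per header."""
    lines = iter(lines)
    for header in lines:
        if not header.strip():
            continue  # tolerate blank lines between records
        # Raises ValueError if this line is not a 2-value header,
        # which signals that the file is not laid out as assumed.
        month, year = map(int, header.split())
        values = []
        for line in islice(lines, lines_per_record):
            values.extend(int(v) for v in line.split())
        yield date(year, month, 1), values


# Tiny in-memory stand-in for the real file: 2 data lines per record.
sample = io.StringIO(
    "1 1900\n"
    " -32768    790  -1457\n"
    "    -16   -575    116\n"
    "2 1900\n"
    "    125 -32768   -764\n"
    "  -1656 -32768   1898\n"
)
fixed_records = list(read_fixed_records(sample, lines_per_record=2))
```

With the real file, the default lines_per_record=216 would give 2592 values per record, and the resulting list could be handed to an array constructor if an array is preferred.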