Python: Reading complicated text files with numpy

Tags:

python

numpy

I am using Python with numpy to read data from a numerical model. The data is stored in a text file with a fairly complicated format.

Numpy's genfromtxt and fromfile functions work well, but only if the data is regularly structured. My data files look something like this:

------snip

[sitename] [dimension 1 size] [dimension 2 size]
[data for dim 1]
[data for dim 2]
[date/time]
[header data]
[data (dim1 * dim2)]
[header]
[data]
...
.  
.   
[date/time]
[header]
[data]
.
.
etc...

---- snip

So, I have a mixture of text and numbers and a complicated (but repeating) layout. What is the best way to read this in using numpy?

Cheers,

Chris

ccbunney asked Apr 12 '12 21:04


2 Answers

Numpy isn't good at generalized parsing, so you'd do well to look beyond it, and what you choose will depend mostly on how consistent the files are.

If they're extremely consistent, so that, say, you can extract numbers from known positions in known rows, then you can just read the file line by line as strings and index each string at the characters you want. (Step through the file, e.g., using file.readlines to get each line as a string.)
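As a rough sketch of that fixed-position approach: the sample lines and the column positions below are made up for illustration, and parse_fixed_width is a hypothetical helper, so the slices would need adjusting to whatever the real file uses.

```python
# Hypothetical fixed-width records: the column positions are assumptions
# chosen for illustration, not taken from the real model output.
SAMPLE_LINES = [
    "SITE01   12.50   3.25\n",
    "SITE02    7.00  10.75\n",
]

def parse_fixed_width(line):
    """Slice one line at known character positions."""
    name = line[0:6].strip()      # characters 0-5: site name
    first = float(line[6:14])     # characters 6-13: first value
    second = float(line[14:21])   # characters 14-20: second value
    return name, first, second

records = [parse_fixed_width(line) for line in SAMPLE_LINES]
print(records)
```

float() ignores surrounding whitespace, so the numeric slices do not need to be stripped first.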

The usual case (at least in my experience) is that the files are more varied than that, but simple string operations, such as str.split (which is almost always my first step), are still enough to parse each line.
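Here is a minimal sketch of how split-based parsing might look for a repeating layout like the one in the question. The sample text is invented, and it assumes one header line and one block of dim1 rows by dim2 columns per time step; parse_model_file is a hypothetical name, and the real files may well differ.

```python
import io

# Made-up sample in the spirit of the layout in the question. The exact
# format (one header and one dim1 x dim2 block per time step) is an assumption.
SAMPLE = """\
siteA 2 3
0.0 1.0
10.0 20.0 30.0
2012-04-12 00:00
temperature
1.0 2.0 3.0
4.0 5.0 6.0
2012-04-12 01:00
temperature
7.0 8.0 9.0
10.0 11.0 12.0
"""

def parse_model_file(handle):
    """Parse the assumed layout from an open text file handle."""
    lines = (line.strip() for line in handle)
    lines = (line for line in lines if line)   # ignore blank lines

    # First line: site name and the two dimension sizes.
    site, n1, n2 = next(lines).split()
    n1, n2 = int(n1), int(n2)
    dim1 = [float(x) for x in next(lines).split()]
    dim2 = [float(x) for x in next(lines).split()]

    steps = []
    for timestamp in lines:                    # each block starts with a date/time line
        header = next(lines)
        rows = [[float(x) for x in next(lines).split()] for _ in range(n1)]
        steps.append({"time": timestamp, "header": header, "data": rows})
    return site, dim1, dim2, steps

site, dim1, dim2, steps = parse_model_file(io.StringIO(SAMPLE))
print(site, len(steps), steps[0]["data"])
```

For a real file, the io.StringIO object would be replaced by an open file handle. Each "data" entry is a list of rows, which numpy.array can turn into a 2-D array.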

Beyond this, there are lots of parsing libraries in Python. I'm partial to pyparsing (but I don't know the others well, so it's not a fair comparison). Here's a summary of the various parsing libraries.

tom10 answered Sep 30 '22 19:09


I agree with the previous answer. The following chain of steps works best and is a lot easier than pyparsing or numpy.genfromtxt:

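# textfilename is the path to the data file
with open(textfilename) as f:
    inp = f.readlines()
my_list = []
for line in inp:
    item = line.split()                  # split on whitespace
    if item:                             # skip blank lines
        my_list.append(float(item[0]))   # first column as a float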

You can then easily convert the list into a numpy array (for example with numpy.array(my_list)) and proceed from there.
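A small sketch of that conversion step, assuming numpy is installed. The values and the 2 x 3 shape are placeholders for illustration; in practice the shape would come from the dimension sizes read from the file.

```python
import numpy as np

# Hypothetical values standing in for my_list from the loop above;
# the 2 x 3 shape is an assumption for illustration.
my_list = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

flat = np.array(my_list)     # 1-D array of floats
grid = flat.reshape(2, 3)    # same values arranged as 2 rows x 3 columns
print(grid)
```

reshape raises a ValueError if the number of values does not match the requested shape, which is a cheap check that a block was read completely.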

prrao answered Sep 30 '22 18:09