I am using python with numpy to read in data from a numerical model in a text file with a fairly complicated format.
Numpy's genfromtxt and fromfile functions work well, but only if the data is structured. My data files look something like this:
------snip
[sitename] [dimension 1 size] [dimension 2 size]
[data for dim 1]
[data for dim 2]
[date/time]
[header data]
[data (dim1 * dim2)]
[header]
[data]
...
.
.
[date/time]
[header]
[data]
.
.
etc...
---- snip
So, I have a mixture of text and numbers and a complicated (but repeating) layout. What is the best way to read this in using numpy?
Cheers,
Chris
Numpy isn't good at generalized parsing, so you'd do well to look beyond it, and what you choose will depend mostly on how consistent the files are.
If they're extremely consistent, so that, say, you can extract numbers from known positions in known rows, then you can just read in the file line by line as strings and index into each line at the characters you want. (Step through the file, e.g., using file.readlines to get each line as a string.)
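For example, here is a minimal sketch of that fixed-position approach; the filename and the slice bounds are made up for illustration, so substitute whatever positions your files actually use:

# Hypothetical sketch: pull values from fixed character positions.
# "model_output.txt" and the slice bounds are assumptions, not from the question.
with open("model_output.txt") as f:
    lines = f.readlines()

first = lines[0]
sitename = first[0:10].strip()   # characters 0-9: site name
dim1 = int(first[10:16])         # characters 10-15: dimension 1 size
dim2 = int(first[16:22])         # characters 16-21: dimension 2 size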
The usual case (at least in my experience) is that the files are more varied than that, but that simple string operations, such as str.split (which is almost always my first step), are enough to parse each line.
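For instance, a quick sketch of splitting the first header line from the layout in the question (the filename is just a placeholder, and the field order is assumed from that layout):

# Hypothetical sketch: whitespace-split the header line into its fields.
with open("model_output.txt") as f:
    header = f.readline()

fields = header.split()          # e.g. ["siteA", "3", "4"]
sitename = fields[0]
dim1, dim2 = int(fields[1]), int(fields[2])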
Beyond this, there are lots of parsing libraries in Python. I'm partial to pyparsing (but I don't know the others well, so it's not a fair comparison). Here's a summary of the various parsing libraries.
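As a rough illustration of what a pyparsing grammar for the header line might look like (a sketch under assumed field types, not a full parser for the file format):

# Hypothetical sketch: a pyparsing grammar for "[sitename] [dim 1 size] [dim 2 size]".
from pyparsing import Word, alphanums, nums

header_line = (Word(alphanums)("sitename")
               + Word(nums)("dim1")
               + Word(nums)("dim2"))

result = header_line.parseString("siteA 3 4")
print(result["sitename"], int(result["dim1"]), int(result["dim2"]))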
I agree with the previous answer. The following chain of steps works well and is a lot easier than pyparsing or numpy.genfromtxt:
# Read all lines of the file into a list of strings
inp = open(textfilename).readlines()

my_list = []
for line in inp:
    # Split on whitespace and keep the first field as a float
    item = line.split()
    my_list.append(float(item[0]))
You can then easily convert the list into a numpy array or matrix and proceed from there.
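For example (the commented reshape assumes the values really form a dim1 x dim2 grid, as in the question's layout; dim1 and dim2 would come from the header line):

import numpy as np

# Convert the accumulated Python list to a numpy array.
arr = np.array(my_list)
# If the values form a dim1 x dim2 block, reshape accordingly:
# arr = arr.reshape(dim1, dim2)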