I've volunteered to help someone convert a finite element mesh from one format to another (i-deas *.unv to Alberta). I've used NumPy to do some additional shaping of the mesh, but I'm having problems reading the raw text file data into NumPy arrays. I've tried genfromtxt and loadtxt with no success so far.
Some details:
1) All groups are delimited by the header and footer flag " -1" on its own line.
2) The NODE group has a header " 2411" on its own line. I only want to read alternate lines from this group, skipping each line with 4 integers, but reading the line with 3 Fortran double precision numbers.
3) The ELEMENT connectivity group has a header " 2412" on its own line. All data are integers and only the first 4 columns are required to be read. There will be some empty slots in the NumPy array due to missing values for 2 and 3 node elements.
4) The " 2477" node groups I think I can deal with myself using regular expressions that find which lines to read.
5) The real data file will have about 1 million lines of text, so I'm very keen for it to be vectorized if possible (or whatever NumPy does to read stuff quickly).
Sorry if I've given too much information, and thanks.
The lines below are a sample of parts of the *.unv text file format.
-1
2411
146303 1 1 11
6.9849462399269246D-001 8.0008842847097805D-002 6.6360238055630028D-001
146304 1 1 11
4.1854795755893875D-001 9.1256034628308313D-001 3.5725496189239300D-002
146305 1 1 11
7.5541258490349616D-001 3.7870257739063029D-001 2.0504544370783115D-001
146306 1 1 11
2.7637569971086767D-001 9.2829777518336010D-001 1.3757239038663285D-001
-1
-1
2412
9 21 1 0 7 2
0 0 0
1 9
10 21 1 0 7 2
0 0 0
9 10
1550 91 6 0 7 3
761 3685 2027
1551 91 6 0 7 3
761 2380 2067
39720 111 1 0 7 4
71854 59536 40323 73014
39721 111 1 0 7 4
45520 48908 133818 145014
-1
-1
2477
1 0 0 0 0 0 0 3022
PERMANENT GROUP1
7 2 0 0 7 3 0 0
7 8 0 0 7 7 0 0
7 147 0 0 7 148 0 0
2 0 0 0 0 0 0 2915
PERMANENT GROUP2
7 1 0 0 7 5 0 0
7 4 0 0 7 6 0 0
7 9 0 0 7 11 0 0
-1
The NumPy functions genfromtxt and loadtxt would be rather difficult to apply to the whole file, as your data has a very specific structure (which changes depending on which group you are in). Therefore, I'd suggest the following strategy:
Read the file line by line and determine which group you are in by analysing each line (the " -1" delimiters and the group headers such as " 2411").
If you are in a group that holds only a little data (or where, for example, you have to read alternating lines, so you can't read continuously), read it line by line and process each line.
When you reach a section with a lot of data (like the one with the "real data"), use NumPy's fromfile function to read it in one go, like this:
# fp is the already open file, positioned at the start of the data block
mydata = np.fromfile(fp, sep=" ", dtype=int, count=number_of_elements)
mydata = mydata.reshape(-1, 3)  # fromfile returns a 1D array; reshape to 3 columns
This way, you combine the flexibility of line-by-line processing with the ability to quickly read and convert large chunks of data.
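A minimal sketch of the line-by-line part of this strategy, assuming the layout shown in the question: every group is framed by lines containing only -1, followed by a header line such as 2411 or 2412. The names split_unv_groups, parse_elements and BEAM_TYPES are hypothetical. The element parser keeps at most 4 node columns and pads 2- and 3-node elements with -1 (an arbitrary fill value). The set of beam types that carry an extra orientation line is an assumption based on the sample (type 21 has an extra "0 0 0" line), so check it against the 2412 specification before relying on it.

```python
import io

import numpy as np

# ASSUMPTION: element types followed by an extra beam-orientation line
# (the sample's type 21 has one). Verify against the 2412 spec.
BEAM_TYPES = {11, 21, 22, 23, 24}


def split_unv_groups(fp):
    """Map each group header (e.g. 2411, 2412) to its stripped body lines."""
    groups, body, header, inside = {}, [], None, False
    for line in fp:
        stripped = line.strip()
        if stripped == "-1":
            if inside:  # closing delimiter: store the collected lines
                groups.setdefault(header, []).extend(body)
            else:       # opening delimiter: start a new group
                header, body = None, []
            inside = not inside
        elif inside and header is None:
            header = int(stripped)
        elif inside:
            body.append(stripped)
    return groups


def parse_elements(lines, max_nodes=4, fill=-1):
    """Return element labels and an (n, max_nodes) connectivity array."""
    labels, conn = [], []
    it = iter(lines)
    for record in it:
        fields = [int(v) for v in record.split()]
        label, fe_type, n_nodes = fields[0], fields[1], fields[-1]
        if fe_type in BEAM_TYPES:
            next(it)  # skip the beam orientation line
        nodes = []
        while len(nodes) < n_nodes:  # connectivity may span several lines
            nodes.extend(int(v) for v in next(it).split())
        # Keep the first max_nodes nodes, padding short elements with fill.
        labels.append(label)
        conn.append((nodes + [fill] * max_nodes)[:max_nodes])
    return np.array(labels), np.array(conn)


SAMPLE = """-1
2411
146303 1 1 11
6.9849462399269246D-001 8.0008842847097805D-002 6.6360238055630028D-001
-1
-1
2412
9 21 1 0 7 2
0 0 0
1 9
10 21 1 0 7 2
0 0 0
9 10
1550 91 6 0 7 3
761 3685 2027
39720 111 1 0 7 4
71854 59536 40323 73014
-1
"""

groups = split_unv_groups(io.StringIO(SAMPLE))
labels, conn = parse_elements(groups[2412])
print(labels)
print(conn)
```

For the largest group in a million-line file you would probably still switch to a single fromfile or fromstring call rather than these Python loops.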
UPDATE: The point is that you open the file, read it line by line, and when you arrive at a place with a big chunk of data, you pass the open file object to fromfile.
Below is a simplified example:
import numpy as np

with open("test.dat", "r") as fp:
    line = fp.readline()  # first line: number of integers to read
    ndata = int(line.strip())
    data = np.fromfile(fp, count=ndata, sep=" ", dtype=int)  # read the bulk in one call
This reads the data from a file test.dat with content like:
10
1 2 3 4 5
6 7 8 9 10
The first line is read explicitly with fp.readline() and processed (the number of integers to be read is determined); then np.fromfile() reads the appropriate chunk of data and stores it in the 1D array data.
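A self-contained version of the example above, assuming a trailing -1 line after the data (as in a .unv group), to show that ordinary line-by-line reading can continue once fromfile has consumed its chunk. The temporary file is only there for the demo. As far as I know, fromfile leaves the Python file object positioned right after the values it read; any leftover whitespace is handled here by ignoring blank lines.

```python
import os
import tempfile

import numpy as np

# Hypothetical test file: a count line, the bulk data, then a -1 marker.
content = "10\n1 2 3 4 5\n6 7 8 9 10\n-1\n"
path = os.path.join(tempfile.mkdtemp(), "test.dat")
with open(path, "w") as f:
    f.write(content)

with open(path, "r") as fp:
    ndata = int(fp.readline().strip())  # small header, read line by line
    data = np.fromfile(fp, count=ndata, sep=" ", dtype=int)  # bulk read
    # Continue reading line by line after the chunk, skipping blank lines.
    rest = [line.strip() for line in fp if line.strip()]

data = data.reshape(-1, 5)  # fromfile returns 1D; 5 values per row here
print(data)
print(rest)
```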
UPDATE2: Alternatively, you could read the entire text into a buffer, determine the start and end positions of the large chunk of data, and convert that chunk directly with np.fromstring:
import numpy as np

with open("test.dat", "r") as fp:
    txt = fp.read()
# Now determine the start and end positions (startpos, endpos)
# of the large chunk of data, e.g. with str.find()
# ..
# Pass only that portion of the text to fromstring.
data = np.fromstring(txt[startpos:endpos], dtype=int, sep=" ")
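Applied to the node group (2411) from the question, the UPDATE2 approach might look like the sketch below. It assumes the two-line layout from the sample (a record line of 4 integers followed by a line of 3 coordinates) and that the node group contains no other kinds of lines. NumPy's text parsing expects an E exponent, and as far as I know it does not accept Fortran's D, so D is swapped for E before calling fromstring. read_nodes is a hypothetical helper name.

```python
import re

import numpy as np


def read_nodes(txt):
    """Return (labels, coords) for the first 2411 group in txt.

    Assumes each node takes two lines: 4 integers (label first), then
    3 coordinates written with Fortran 'D' exponents.
    """
    # Start right after the "2411" header line, stop at the next "-1" line.
    header = re.search(r"^[ \t]*2411[ \t]*$", txt, flags=re.MULTILINE)
    start = header.end()
    end_marker = re.compile(r"^[ \t]*-1[ \t]*$", flags=re.MULTILINE)
    stop = end_marker.search(txt, start).start()
    lines = txt[start:stop].strip().splitlines()
    # Even lines are the integer records; keep only the node label.
    labels = np.array([int(line.split()[0]) for line in lines[0::2]])
    # Odd lines are the coordinates; convert D exponents to E first.
    coord_text = " ".join(lines[1::2]).replace("D", "E")
    coords = np.fromstring(coord_text, dtype=float, sep=" ").reshape(-1, 3)
    return labels, coords


SAMPLE = """-1
2411
146303 1 1 11
6.9849462399269246D-001 8.0008842847097805D-002 6.6360238055630028D-001
146304 1 1 11
4.1854795755893875D-001 9.1256034628308313D-001 3.5725496189239300D-002
-1
-1
2412
9 21 1 0 7 2
-1
"""

labels, coords = read_nodes(SAMPLE)
print(labels)
print(coords)
```

Slicing the list of lines with [1::2] is what replaces "read alternate lines", and the expensive float conversion happens in a single fromstring call.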
Or, if the data is easy to describe with a single regular expression, you could use np.fromregex() directly on the file.
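A sketch of the fromregex route for the node coordinates, under the same assumptions about the sample layout. The regular expression is my own: it matches lines that hold exactly three floats in exponent notation, so the integer record lines and the element group are skipped. Because fromregex converts the matched strings itself, the D exponents have to be rewritten to E first, so here the text goes in through io.StringIO instead of the file being read directly. fromregex needs a structured dtype, so the three fields are stacked into an (n, 3) array at the end. nodes_via_fromregex, FLOAT and COORD_LINE are hypothetical names.

```python
import io
import re

import numpy as np

# One float in exponent notation, e.g. 6.98E-001 (after the D -> E rewrite).
FLOAT = r"([-+]?\d+\.\d*E[-+]?\d+)"
# A line consisting of exactly three such floats.
COORD_LINE = re.compile(
    r"^[ \t]*" + r"[ \t]+".join([FLOAT] * 3) + r"[ \t]*$", flags=re.MULTILINE
)


def nodes_via_fromregex(txt):
    # Rewrite Fortran exponents (digit, D, sign/digit) as E exponents.
    converted = re.sub(r"(\d)D([-+]?\d)", r"\1E\2", txt)
    rec = np.fromregex(
        io.StringIO(converted),
        COORD_LINE,
        dtype=[("x", float), ("y", float), ("z", float)],
    )
    # Turn the structured result into a plain (n, 3) float array.
    return np.column_stack([rec["x"], rec["y"], rec["z"]])


SAMPLE = """-1
2411
146303 1 1 11
6.9849462399269246D-001 8.0008842847097805D-002 6.6360238055630028D-001
146304 1 1 11
4.1854795755893875D-001 9.1256034628308313D-001 3.5725496189239300D-002
146305 1 1 11
7.5541258490349616D-001 3.7870257739063029D-001 2.0504544370783115D-001
-1
-1
2412
1550 91 6 0 7 3
761 3685 2027
-1
"""

coords = nodes_via_fromregex(SAMPLE)
print(coords)
```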