
Importing big Tecplot block files in Python as fast as possible

I want to import some ASCII files into Python (exported from Tecplot, a CFD post-processing package). The rules for these files (at least for the ones I need to import) are:

  • The file is divided into several sections.

Each section has a two-line header like:

VARIABLES = "x" "y" "z" "ro" "rovx" "rovy" "rovz" "roE" "M" "p" "Pi" "tsta" "tgen" 
ZONE T="Window(s) : E_W_Block0002_ALL",  I=29,  J=17,  K=25, F=BLOCK
  • Each section has a set of variables given by the first header line. When a section ends, a new section starts with two similar lines.
  • For each variable there are I*J*K values.
  • Each variable is a contiguous block of values.
  • There is a fixed number of values per row (6).
  • When a variable ends, the next one starts on a new line.
  • Variables are "IJK-ordered data". The I-index varies the fastest, the J-index the next fastest, and the K-index the slowest. The I-index should be the inner loop, the K-index the outer loop, and the J-index the loop in between.
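The layout rules above can be checked on a tiny made-up zone. This is only a sketch: the sizes `I, J, K = 4, 3, 2` and the stand-in values are invented for illustration, and `flat_index` is a hypothetical helper. It shows how many rows one variable occupies and that a column-major (`order="F"`) reshape reproduces "I fastest, K slowest".

```python
import numpy as np

I, J, K = 4, 3, 2        # made-up zone size, for illustration only
values_per_row = 6       # fixed row width stated in the rules above

n_values = I * J * K
# Ceiling division: the last row of a variable may hold fewer than 6 values
n_lines = -(-n_values // values_per_row)

# Stand-in for the values of one variable, in the order they appear in the file
flat = np.arange(n_values, dtype=float)

# order="F" lets the first index (I) vary fastest and the last index (K) slowest
block = flat.reshape((I, J, K), order="F")

def flat_index(i, j, k):
    """Hypothetical helper: position of element (i, j, k) in file order."""
    return i + I * (j + J * k)
```

With this mapping, `block[i, j, k]` and `flat[flat_index(i, j, k)]` refer to the same value, so no explicit triple loop is needed to place the numbers.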

Here is an example of data:

VARIABLES = "x" "y" "z" "ro" "rovx" "rovy" "rovz" "roE" "M" "p" "Pi" "tsta" "tgen" 
ZONE T="Window(s) : E_W_Block0002_ALL",  I=29,  J=17,  K=25, F=BLOCK
-3.9999999E+00 -3.3327306E+00 -2.7760824E+00 -2.3117116E+00 -1.9243209E+00 -1.6011492E+00
[...]
0.0000000E+00 #end of first variable
-4.3532482E-02 -4.3584235E-02 -4.3627592E-02 -4.3663762E-02 -4.3693815E-02 -4.3718831E-02 #second variable, 'y'
[...]
1.0738781E-01 #end of second variable
[...]
[...]
VARIABLES = "x" "y" "z" "ro" "rovx" "rovy" "rovz" "roE" "M" "p" "Pi" "tsta" "tgen" #next zone
ZONE T="Window(s) : E_W_Block0003_ALL",  I=17,  J=17,  K=25, F=BLOCK
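As a sketch of how the two header lines shown in the example could be decoded, assuming every header looks like the ones above (quoted variable names, then `I=`, `J=`, `K=` on the ZONE line). The function name `parse_zone_header` is hypothetical and not part of any library.

```python
import re

def parse_zone_header(variables_line, zone_line):
    """Hypothetical helper: return (variable names, (I, J, K)) from the two header lines."""
    # Every double-quoted token on the VARIABLES line is a variable name
    names = re.findall(r'"([^"]*)"', variables_line)
    # Drop the quoted zone title so text inside it cannot be mistaken for I=, J= or K=
    unquoted = re.sub(r'"[^"]*"', "", zone_line)
    dims = tuple(
        int(re.search(r"\b%s\s*=\s*(\d+)" % letter, unquoted).group(1))
        for letter in "IJK"
    )
    return names, dims

names, dims = parse_zone_header(
    'VARIABLES = "x" "y" "z" "ro" "rovx" "rovy" "rovz" "roE" "M" "p" "Pi" "tsta" "tgen" ',
    'ZONE T="Window(s) : E_W_Block0002_ALL",  I=29,  J=17,  K=25, F=BLOCK',
)
```

Capturing the digits directly avoids relying on a fixed number of characters after `I=`, which matters once a dimension has more digits than expected.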

I am quite new to Python and I have written some code that imports the data into a dictionary, storing each variable as a 3D numpy.array. These files can be very big (up to gigabytes). How can I make this code faster? Or, more generally, how can I import such files as fast as possible?

import re
from numpy import zeros, array, prod
def vectorr(I,  J,  K):
    """Return every [i, j, k] index triple in file order (i varies fastest, k slowest)."""
    vect = []
    for k in range(0,  K):
        for j in range(0, J):
            for i in range(0, I):
                vect.append([i, j, k])
    return vect

# Raw string: in Python 3 a plain 'E:\u.dat' fails because '\u' starts a unicode escape
with open(r'E:\u.dat') as a:
    filelist = a.readlines()

NumberCol = 6
count = 0
data = dict()
leng = len(filelist)
countzone = 0
while count < leng:
    strVARIABLES = re.findall('VARIABLES', filelist[count])
    variables = re.findall(r'"(.*?)"',  filelist[count])
    countzone = countzone+1
    data[countzone] = {key:[] for key in variables}
    count = count+1
    # Read I, J and K from the ZONE line (raw strings avoid invalid-escape warnings for \d)
    strI = re.findall(r'I=....', filelist[count])
    strI = re.findall(r'\d+', strI[0])
    I = int(strI[0])
    ##
    strJ = re.findall(r'J=....', filelist[count])
    strJ = re.findall(r'\d+', strJ[0])
    J = int(strJ[0])
    ##
    strK = re.findall(r'K=....', filelist[count])
    strK = re.findall(r'\d+', strK[0])
    K = int(strK[0])
    data[countzone]['indmax'] = array([I, J, K])
    pr = prod(data[countzone]['indmax'])
    lin = pr // NumberCol
    if pr%NumberCol != 0:
        lin = lin+1
    vect = vectorr(I, J, K)
    for key in variables:
        init = zeros((I, J, K))
        for ii in range(0, lin):
            count = count+1
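            # list() so that len() and indexing also work on Python 3, where map() is lazy
            temp = list(map(float, filelist[count].split()))
            for iii in range(0, len(temp)):
                # Plain index assignment; ndarray.itemset is unavailable in recent NumPy releases
                init[tuple(vect[ii*NumberCol+iii])] = temp[iii]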
        data[countzone][key] = init
    count = count+1

P.S. Pure Python only, please: no Cython or other languages.
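For comparison, here is one possible direction, sketched under assumptions rather than measured: convert each variable's rows in a single NumPy call and reshape with `order="F"`, instead of placing values one by one. `read_block_zones` is a hypothetical name. The sketch assumes real files contain only numbers on the data rows (no `#...` annotations like those in the sample above) and that every zone follows the two-line header format described earlier.

```python
import re
import numpy as np

def read_block_zones(fileobj, values_per_row=6):
    """Hypothetical reader: one dict per zone, mapping variable name to an (I, J, K) array."""
    zones = []
    line = fileobj.readline()
    while line:
        if line.lstrip().startswith("VARIABLES"):
            names = re.findall(r'"([^"]*)"', line)
            # Remove the quoted zone title before searching for I=, J=, K=
            zone_line = re.sub(r'"[^"]*"', "", fileobj.readline())
            I, J, K = (
                int(re.search(r"\b%s\s*=\s*(\d+)" % c, zone_line).group(1))
                for c in "IJK"
            )
            # Rows per variable: ceiling of I*J*K / values_per_row
            n_lines = -(-(I * J * K) // values_per_row)
            zone = {}
            for name in names:
                # Join the rows of one variable and convert them in one call
                text = "".join(fileobj.readline() for _ in range(n_lines))
                flat = np.array(text.split(), dtype=float)
                # I varies fastest in the file, which is column-major order
                zone[name] = flat.reshape((I, J, K), order="F")
            zones.append(zone)
        line = fileobj.readline()
    return zones
```

It would be called as `zones = read_block_zones(open(path))` for some `path`. Reading line by line from the file object, rather than calling `readlines()` first, also avoids holding the whole text of a very large file in memory alongside the arrays.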