Read formatted data from part of a file fast (Gmsh mesh format)

Question

I maintain a little Python package that converts between different formats used for mesh representation.

Those files can grow pretty big, so when reading them with Python it's important to do it efficiently.

One of the most used formats is msh from Gmsh. Unfortunately, its data layout is arguably not the best. An example file:

$MeshFormat
2.2 0 8
$EndMeshFormat
$Nodes
8
1 -0.5 -0.5 -0.5
2  0.5 -0.5 -0.5
3 -0.5  0.5 -0.5
4  0.5  0.5 -0.5
5 -0.5 -0.5  0.5
6  0.5 -0.5  0.5
7 -0.5  0.5  0.5
8  0.5  0.5  0.5
$EndNodes
$Elements
2
1 4 2 1 11 1 2 3 5
2 4 2 1 11 2 5 6 8
$EndElements

For the $Nodes:

The first number (8) is the number of nodes to follow.

In each node line, the first number is the index (not actually needed but still part of the format, ugh), followed by the three spatial coordinates.

So far, I haven't come up with anything better than islices in a for loop, which is pretty slow.

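from itertools import islice
import numpy

# f is an open file positioned just after the '$Nodes' line.
# The first line is the number of nodes
line = next(islice(f, 1))
num_nodes = int(line)

points = numpy.empty((num_nodes, 3))
for k, line in enumerate(islice(f, num_nodes)):
    # Drop the leading node index, keep the three coordinates
    points[k, :] = numpy.array(line.split(), dtype=float)[1:]

# The block must be closed by '$EndNodes'
line = next(islice(f, 1))
assert line.strip() == '$EndNodes'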

For the $Elements:

The first number (2) is the number of elements to follow.

In each element line, the first number is the index, followed by an enum for the element type (4 is for tetrahedra). Next comes the number of integer tags for this element (2 in each case here, namely 1 and 11), followed by the tags themselves. Depending on the element type, the last few entries in the row are the $Nodes indices that form the element; for a tetrahedron, these are the last four entries.

Since the number of tags can vary from element to element (i.e., line to line), just like the element type and the number of node indices, each line may have a different number of integers.
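
To make the layout concrete, here is a minimal sketch that decodes a single element line as described above. The helper name parse_element_line is hypothetical and not part of Gmsh or any library. It relies only on the tag count in the third field, so it does not need to know how many nodes each element type has.

```python
def parse_element_line(line):
    """Split one $Elements line into (index, type, tags, node indices)."""
    fields = [int(tok) for tok in line.split()]
    index, elem_type, num_tags = fields[:3]
    # The tags follow the tag count
    tags = fields[3:3 + num_tags]
    # Everything after the tags is a node index
    node_ids = fields[3 + num_tags:]
    return index, elem_type, tags, node_ids


print(parse_element_line("1 4 2 1 11 1 2 3 5"))
# (1, 4, [1, 11], [1, 2, 3, 5])
```

This is a per-line Python loop body, so it illustrates the format rather than solving the speed problem.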

For both $Nodes and $Elements, any help for reading this data quickly is appreciated.

DYZ · Accepted Answer

Here's a somewhat weird implementation based on NumPy:

import numpy

f = open('foo.msh')
f.readline()  # '$MeshFormat\n'
f.readline()  # '2.2 0 8\n'
f.readline()  # '$EndMeshFormat\n'
f.readline()  # '$Nodes\n'
n_nodes = int(f.readline())  # '8\n'
# Each node line holds four numbers: the index and three coordinates
nodes = numpy.fromfile(f, count=n_nodes*4, sep=" ").reshape((n_nodes, 4))
# array([[ 1. , -0.5, -0.5, -0.5],
#        [ 2. ,  0.5, -0.5, -0.5],
#        [ 3. , -0.5,  0.5, -0.5],
#        [ 4. ,  0.5,  0.5, -0.5],
#        [ 5. , -0.5, -0.5,  0.5],
#        [ 6. ,  0.5, -0.5,  0.5],
#        [ 7. , -0.5,  0.5,  0.5],
#        [ 8. ,  0.5,  0.5,  0.5]])
f.readline()  # '$EndNodes\n'
f.readline()  # '$Elements\n'
n_elems = int(f.readline())  # '2\n'
elems = numpy.fromfile(f, sep=" ")[:-1]  # $EndElements read as -1
# This array must be reshaped based on the element type(s)
# array([  1.,   4.,   2.,   1.,  11.,   1.,   2.,   3.,   5.,   2.,   4.,
#          2.,   1.,  11.,   2.,   5.,   6.,   8.])
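
One possible way to do that reshaping is sketched below. The function split_elements and the table NODES_PER_TYPE are hypothetical names introduced for illustration. Only the entry for type 4 (tetrahedron, four nodes) comes from the question; the other node counts are assumptions that should be checked against the Gmsh documentation. The sketch walks the flat array record by record and groups the node indices by element type.

```python
import numpy

# Nodes per element type. Only type 4 (tetrahedron) is taken from the
# question; the remaining entries are assumptions to verify against Gmsh.
NODES_PER_TYPE = {1: 2, 2: 3, 3: 4, 4: 4, 5: 8, 15: 1}


def split_elements(flat, n_elems):
    """Group node indices from a flat $Elements array by element type."""
    flat = numpy.asarray(flat).astype(int)
    rows = {}
    pos = 0
    for _ in range(n_elems):
        # Record layout: index, type, number of tags, tags..., nodes...
        elem_type = int(flat[pos + 1])
        num_tags = int(flat[pos + 2])
        num_nodes = NODES_PER_TYPE[elem_type]
        start = pos + 3 + num_tags
        rows.setdefault(elem_type, []).append(flat[start:start + num_nodes])
        pos = start + num_nodes
    return {t: numpy.array(r) for t, r in rows.items()}


flat = numpy.array([1., 4., 2., 1., 11., 1., 2., 3., 5.,
                    2., 4., 2., 1., 11., 2., 5., 6., 8.])
cells = split_elements(flat, 2)
print(cells[4])
# [[1 2 3 5]
#  [2 5 6 8]]
```

The walk is still a Python-level loop over elements, so for very large meshes it may dominate the run time. If all elements share the same type and tag count, a single reshape of the flat array would avoid the loop entirely.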

Tags: python, io, numpy, mesh

Nico Schlömer
