I'm trying to read data that is not well structured. It looks something like this:
Generated by trjconv : P/L=1/400 t= 0.00000
11214
1P1 aP1 1 80.48 35.36 4.25
2P1 aP1 2 37.45 3.92 3.96
3P2 aP2 3 18.53 -9.69 4.68
4P2 aP2 4 55.39 74.34 4.60
5P3 aP3 5 22.11 68.71 3.85
6P3 aP3 6 -4.13 24.04 3.73
7P4 aP4 7 40.16 6.39 4.73
8P4 aP4 8 -5.40 35.73 4.85
9P5 aP5 9 36.67 22.45 4.08
10P5 aP5 10 -3.68 -10.66 4.18
Generated by trjconv : P/L=1/400 t= 1000.000
11214
1P1 aP1 1 80.48 35.36 4.25
2P1 aP1 2 37.45 3.92 3.96
3P2 aP2 3 18.53 -9.69 4.68
4P2 aP2 4 55.39 74.34 4.60
5P3 aP3 5 22.11 68.71 3.85
6P3 aP3 6 -4.13 24.04 3.73
7P4 aP4 7 40.16 6.39 4.73
8P4 aP4 8 -5.40 35.73 4.85
9P5 aP5 9 36.67 22.45 4.08
10P5 aP5 10 -3.68 -10.66 4.18
Generated by trjconv : P/L=1/400 t= 2000.000
11214
1P1 aP1 1 80.48 35.36 4.25
2P1 aP1 2 37.45 3.92 3.96
3P2 aP2 3 18.53 -9.69 4.68
4P2 aP2 4 55.39 74.34 4.60
5P3 aP3 5 22.11 68.71 3.85
6P3 aP3 6 -4.13 24.04 3.73
7P4 aP4 7 40.16 6.39 4.73
8P4 aP4 8 -5.40 35.73 4.85
9P5 aP5 9 36.67 22.45 4.08
10P5 aP5 10 -3.68 -10.66 4.18
Generated by trjconv : P/L=1/400 t= 3000.000
11214
1P1 aP1 1 80.48 35.36 4.25
2P1 aP1 2 37.45 3.92 3.96
3P2 aP2 3 18.53 -9.69 4.68
4P2 aP2 4 55.39 74.34 4.60
5P3 aP3 5 22.11 68.71 3.85
6P3 aP3 6 -4.13 24.04 3.73
7P4 aP4 7 40.16 6.39 4.73
8P4 aP4 8 -5.40 35.73 4.85
9P5 aP5 9 36.67 22.45 4.08
10P5 aP5 10 -3.68 -10.66 4.18
It consists of several frames, each with an updated time in its header. What I showed here is just a sample; the whole file is around 50 GB, so it would be better to read it line by line or in chunks. However, I could not figure out how to deal with the header of each frame. Is there a way to get rid of these headers? For now I used the following method:
import numpy as np
# define a np.dtype for the gro array/dataset (hard-coded for now)
gro_dt = np.dtype([('col1', 'S4'), ('col2', 'S4'), ('col3', int),
                   ('col4', float), ('col5', float), ('col6', float)])
file = np.genfromtxt('sample.gro', skip_header=2, dtype=gro_dt)
But it throws the following error when it reaches the next header:
ValueError: Some errors were detected !
Line #13 (got 7 columns instead of 6)
Line #14 (got 1 columns instead of 6)
Line #25 (got 7 columns instead of 6)
Line #26 (got 1 columns instead of 6)
Line #37 (got 7 columns instead of 6)
Line #38 (got 1 columns instead of 6)
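The column counts in that error can be reproduced by splitting the lines on whitespace, which is what genfromtxt does when no delimiter is given. A minimal sketch, using three lines copied from the sample above (the list name `lines` is only for illustration):

```python
# One header line, one atom-count line and one data row from the sample.
lines = [
    "Generated by trjconv : P/L=1/400 t= 1000.000",
    "11214",
    "1P1 aP1 1 80.48 35.36 4.25",
]

# Whitespace splitting gives the number of columns seen on each line.
counts = [len(line.split()) for line in lines]
print(counts)  # [7, 1, 6]
```

The header splits into 7 fields and the count line into 1, while the dtype expects 6, so every repeated header pair triggers the two messages shown above.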
Write an adaptor that strips the periodic headers.
def adapt(f):
    for line in f:
        if line.startswith("Generated"):
            print(line, end='')
            # Consume the following line as well.
            # If your data is well behaved, you can
            # assume the following line exists and should be
            # skipped, instead of using the try statement.
            try:
                print(next(f), end='')
            except StopIteration:
                pass
            continue
        yield line

with open('sample.gro') as f:
    file = np.genfromtxt(adapt(f), dtype=gro_dt)
Since genfromtxt accepts a generator, maybe a converter function like so? (This keeps the t= value from the headers intact as the first column.)
import numpy as np

def converter(inf):
    current_t = None
    for line in inf:
        if "trjconv" in line:
            # Header line: remember the time that follows "t=".
            current_t = line.partition("t=")[-1].strip()
        elif line.startswith(" "):
            # Data row: the row's leading space separates t from the first column.
            yield current_t + line

gro_dt = np.dtype(
    [
        ("t", "float"),
        ("col1", "S4"),
        ("col2", "S4"),
        ("col3", int),
        ("col4", float),
        ("col5", float),
        ("col6", float),
    ]
)

with open("sample.gro") as fp:
    file = np.genfromtxt(converter(fp), dtype=gro_dt)

print(file)
The output begins:
[( 0., b'1P1', b'aP1', 1, 80.48, 35.36, 4.25)
( 0., b'2P1', b'aP1', 2, 37.45, 3.92, 3.96)
( 0., b'3P2', b'aP2', 3, 18.53, -9.69, 4.68)
( 0., b'4P2', b'aP2', 4, 55.39, 74.34, 4.6 )
( 0., b'5P3', b'aP3', 5, 22.11, 68.71, 3.85)
( 0., b'6P3', b'aP3', 6, -4.13, 24.04, 3.73)
( 0., b'7P4', b'aP4', 7, 40.16, 6.39, 4.73)
( 0., b'8P4', b'aP4', 8, -5.4 , 35.73, 4.85)
( 0., b'9P5', b'aP5', 9, 36.67, 22.45, 4.08)
( 0., b'10P5', b'aP5', 10, -3.68, -10.66, 4.18)
(1000., b'1P1', b'aP1', 1, 80.48, 35.36, 4.25)
(1000., b'2P1', b'aP1', 2, 37.45, 3.92, 3.96)
(1000., b'3P2', b'aP2', 3, 18.53, -9.69, 4.68)
(1000., b'4P2', b'aP2', 4, 55.39, 74.34, 4.6 )
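To see why the converter only passes on lines that start with a space, here is a small sketch of the string handling on its own. The exact layout of `row` is an assumption: the sample in the question is shown without leading whitespace, and this sketch assumes the real file indents its data rows.

```python
# A header line as it appears in the sample.
header = "Generated by trjconv : P/L=1/400 t= 1000.000\n"

# Everything after "t=" is the time; strip() drops the space and newline.
current_t = header.partition("t=")[-1].strip()

# Assumed layout: a data row that begins with spaces.
row = "    1P1    aP1    1   80.48   35.36    4.25\n"

# Plain concatenation works because the row's own leading
# whitespace keeps the time apart from the first column.
joined = current_t + row
fields = joined.split()
print(fields)
```

If the rows in your file do not start with a space, the `startswith(" ")` test would drop them, and joining without a separator would fuse the time with the first column, so the condition and the join would both need adjusting.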
Assuming you want to collect the frame data (I'm not sure that is feasible for 50 GB), the code below does that.
def _is_interesting_line(line_str: str) -> bool:
    # Rows that start with whitespace are treated as frame data.
    return bool(line_str) and line_str[0].isspace()

data = []
with open('data.txt') as f:
    for line in f:
        if _is_interesting_line(line):
            data.append(line.strip())
        else:
            # Anything else is echoed and kept out of the data.
            print(line.strip())
print('result:')
print(data)
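For a 50 GB file, collecting everything in one list may not fit in memory. One possible alternative is to yield one frame at a time, so only a single frame is held at once. The sketch below uses only the standard library; the name `iter_frames` is hypothetical, and it assumes every frame starts with a "Generated by ... t=" line followed by one atom-count line, and that data rows have exactly six whitespace-separated fields (lines with any other field count are skipped).

```python
import io
from typing import Iterable, Iterator, List, Tuple

Atom = Tuple[str, str, int, float, float, float]

def iter_frames(lines: Iterable[str]) -> Iterator[Tuple[float, List[Atom]]]:
    """Yield (t, atoms) for each frame, holding one frame in memory at a time."""
    t = None
    atoms: List[Atom] = []
    skip_count_line = False
    for line in lines:
        if line.startswith("Generated by") and "t=" in line:
            if t is not None:
                yield t, atoms          # previous frame is complete
            t = float(line.partition("t=")[2])
            atoms = []
            skip_count_line = True      # assumed: next line is the atom count
        elif skip_count_line:
            skip_count_line = False
        else:
            f = line.split()
            if len(f) == 6:             # assumed: data rows have six fields
                atoms.append((f[0], f[1], int(f[2]),
                              float(f[3]), float(f[4]), float(f[5])))
    if t is not None:
        yield t, atoms                  # last frame

# Small in-memory sample built from the question's data.
sample = """Generated by trjconv : P/L=1/400 t= 0.00000
11214
1P1 aP1 1 80.48 35.36 4.25
2P1 aP1 2 37.45 3.92 3.96
Generated by trjconv : P/L=1/400 t= 1000.000
11214
1P1 aP1 1 80.48 35.36 4.25
2P1 aP1 2 37.45 3.92 3.96
"""

frames = list(iter_frames(io.StringIO(sample)))
for t, atoms in frames:
    print(t, len(atoms))
```

With a real file you would pass the open file object instead of `io.StringIO(sample)` and process each frame inside the loop rather than building the `frames` list.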