I'm new to Python and I'm trying to figure out how to load a data file that contains blocks of data, one block per timestep, like this:
TIME:,0
Q01 : A:,-10.7436,0.000536907,-0.00963283,0.00102934
Q02 : B:,0,0.0168694,-0.000413983,0.00345921
Q03 : C:,0.0566665
Q04 : D:,0.074456
Q05 : E:,0.077456
Q06 : F:,0.0744835
Q07 : G:,0.140448
Q08 : H:,-0.123968
Q09 : I:,0
Q10 : J:,0.00204377,0.0109621,-0.0539183,0.000708574
Q11 : K:,-2.86115e-17,0.00947104,0.0145645,1.05458e-16,-1.90972e-17,-0.00947859
Q12 : L:,-0.0036781,0.00161254
Q13 : M:,-0.00941257,0.000249692,-0.0046302,-0.00162387,0.000981709,-0.0135982,-0.0223496,-0.00872062,0.00548815,0.0114075,.........,-0.00196206
Q14 : N:,3797, 66558
Q15 : O:,0.0579981
Q16 : P:,0
Q17 : Q:,625
TIME:,0.1
Q01 : A:,-10.563,0.000636907,-0.00963283,0.00102934
Q02 : B:,0,0.01665694
Q03 : C:,0.786,-0.000666,0.6555
Q04 : D:,0.87,0.96
Q05 : E:,0.077456
Q06 : F:,0.07447835
Q07 : G:,0.140448
Q08 : H:,-0.123968
Q09 : I:,0
Q10 : J:,0.00204377,0.0109621,-0.0539183,0.000708574
Q11 : K:,-2.86115e-17,0.00947104,0.0145645,1.05458e-16,-1.90972e-17,-0.00947859
Q12 : L:,-0.0036781,0.00161254
Q13 : M:,-0.00941257,0.000249692,-0.0046302,-0.00162387,0.000981709,-0.0135982,-0.0223496,-0.00872062,0.00548815,0.0114075,.........,-0.00196206
Q14 : N:,3797, 66558
Q15 : O:,0.0579981
Q16 : P:,0,2,4
Q17 : Q:,786
Each block contains a number of variables, each of which may have a very different number of columns of data. The number of columns per variable may change from one timestep block to the next, but the number of variables per block is the same in every timestep, and it is always known how many variables were exported. There is no information on the number of blocks of data (timesteps) in the data file.
When the data has been read, it should be stored with one row per timestep and one column per variable:
Time: | A: | B:
0 | -10.7436,0.000536907,-0.00963283,0.00102934 | ........
0.1 | -10.563,0.000636907,-0.00963283,0.00102934 | ........
0.2 | ...... | ........
If the number of columns of data were the same in every timestep and the same for every variable, this would be a very simple problem.
I guess I need to read the file line by line in two loops, one over the blocks and one over the lines inside each block, and then store the values in an array (append?). The changing number of columns per line has me a little stumped at the moment, since I'm not very familiar with Python and NumPy yet.
If someone could point me in the right direction, such as what functions I should be using to do this relatively efficiently, that would be great.
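import pandas as pd

res = {}
TIME = None

# iterating over the file object reads it lazily, one line at a time
with open('file.txt') as f:
    for line in f:
        # 'Q01 : A:,1,2' -> ['Q01', 'A', ',1,2'] and 'TIME:,0' -> ['TIME', ',0']
        parts = [p.strip() for p in line.strip().split(':')]
        if parts[0] == 'TIME':
            TIME = parts[1].strip(',')
            res[TIME] = {}
            print('New time section start {}'.format(TIME))
            # here you can stop and work with the data from the previous period
            continue
        if len(parts) < 3 or TIME is None:
            # skip blank or malformed lines, and anything before the first TIME line
            continue
        # the values are kept as strings; convert with float() if numbers are needed
        res[TIME][parts[1]] = parts[2].strip(',').split(',')

# one column per timestep, one row per variable
df = pd.DataFrame.from_dict(res, orient='columns')
# for example, all variables for TIME 0
dfZero = df['0']
print(dfZero)

# one row per timestep, one column per variable
df = pd.DataFrame.from_dict(res, orient='index')
# variable A across all timesteps
dfA = df['A']
print(dfA)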
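If the whole file does not need to be held in memory, the same idea can be written as a generator that hands back one timestep block at a time, with the values already converted to floats. This is only a minimal sketch: `parse_blocks` is a made-up helper name, it assumes every data line has the form `label : NAME:,v1,v2,...` as in the sample, and it assumes every value parses with `float()` (the `.........` placeholder in the post would not).

```python
import io


def parse_blocks(lines):
    """Yield (time, variables) for each TIME block in an iterable of lines.

    variables maps a variable name (e.g. 'A') to a list of floats; the
    list length may differ between variables and between timesteps.
    """
    time, block = None, None
    for line in lines:
        # Split 'Q01 : A:,1,2' into 'Q01 : A' and '1,2' at the first ':,'
        label, sep, values = line.strip().partition(':,')
        if not sep:
            continue  # blank or malformed line
        if label == 'TIME':
            if block is not None:
                yield time, block  # previous block is complete
            time, block = float(values), {}
        elif block is not None:
            name = label.split(':')[-1].strip()
            block[name] = [float(v) for v in values.split(',') if v.strip()]
    if block is not None:
        yield time, block  # last block in the file


# Small in-memory sample in the same layout as the data file
sample = """TIME:,0
Q01 : A:,-10.7436,0.000536907
Q03 : C:,0.0566665
TIME:,0.1
Q01 : A:,-10.563
Q03 : C:,0.786,-0.000666,0.6555
"""

blocks = list(parse_blocks(io.StringIO(sample)))
for time, variables in blocks:
    print(time, variables)
```

For a real file, passing the open file object (`with open('file.txt') as f: ...`) in place of the `io.StringIO` should work the same way, since both are iterated line by line. Each list can be wrapped with `numpy.asarray` afterwards if arrays are preferred; the ragged lengths are not a problem because every variable keeps its own list.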