
Python import text file where each line has different number of columns

I'm new to Python and I'm trying to figure out how to load a data file that contains blocks of data, one block per timestep, like this:

TIME:,0
Q01 : A:,-10.7436,0.000536907,-0.00963283,0.00102934
Q02 : B:,0,0.0168694,-0.000413983,0.00345921
Q03 : C:,0.0566665
Q04 : D:,0.074456
Q05 : E:,0.077456
Q06 : F:,0.0744835
Q07 : G:,0.140448
Q08 : H:,-0.123968
Q09 : I:,0
Q10 : J:,0.00204377,0.0109621,-0.0539183,0.000708574
Q11 : K:,-2.86115e-17,0.00947104,0.0145645,1.05458e-16,-1.90972e-17,-0.00947859
Q12 : L:,-0.0036781,0.00161254
Q13 : M:,-0.00941257,0.000249692,-0.0046302,-0.00162387,0.000981709,-0.0135982,-0.0223496,-0.00872062,0.00548815,0.0114075,.........,-0.00196206
Q14 : N:,3797, 66558
Q15 : O:,0.0579981
Q16 : P:,0
Q17 : Q:,625

TIME:,0.1
Q01 : A:,-10.563,0.000636907,-0.00963283,0.00102934
Q02 : B:,0,0.01665694
Q03 : C:,0.786,-0.000666,0.6555
Q04 : D:,0.87,0.96
Q05 : E:,0.077456
Q06 : F:,0.07447835
Q07 : G:,0.140448
Q08 : H:,-0.123968
Q09 : I:,0
Q10 : J:,0.00204377,0.0109621,-0.0539183,0.000708574
Q11 : K:,-2.86115e-17,0.00947104,0.0145645,1.05458e-16,-1.90972e-17,-0.00947859
Q12 : L:,-0.0036781,0.00161254
Q13 : M:,-0.00941257,0.000249692,-0.0046302,-0.00162387,0.000981709,-0.0135982,-0.0223496,-0.00872062,0.00548815,0.0114075,.........,-0.00196206
Q14 : N:,3797, 66558
Q15 : O:,0.0579981
Q16 : P:,0,2,4
Q17 : Q:,786

Each block contains a number of variables that may have very different numbers of columns of data in it. The number of columns per variable may change in each timestep block, but the number of variables per block is the same in every timestep and it is always known how many variables were exported. There is no information on the number of blocks of data (timesteps) in the data file.

Once the data has been read, it should be stored as one entry per variable per timestep:

Time:  |  A:                                           |  B:
0      |  -10.7436,0.000536907,-0.00963283,0.00102934  |  ........
0.1    |  -10.563,0.000636907,-0.00963283,0.00102934   |  ........
0.2    |  ......                                       |  ........

If the number of columns of data were the same in every timestep and for every variable, this would be a very simple problem.

I guess I need to read the file line by line with two loops, an outer one over the blocks and an inner one over the lines within each block, and store the values in an array (append?). The changing number of columns per line has me a little stumped at the moment, since I'm not very familiar with Python and NumPy yet.

If someone could point me in the right direction, such as what functions I should be using to do this relatively efficiently, that would be great.

jpmorr asked Mar 12 '23 06:03


1 Answer

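import pandas as pd

res = {}
TIME = None

# iterating over the file object reads it lazily, one line at a time
with open('file.txt') as f:
    for line in f:
        # 'Q01 : A:,1,2' -> ['Q01', 'A', ',1,2'];  'TIME:,0' -> ['TIME', ',0']
        parts = [p.strip() for p in line.strip().split(':')]
        if parts[0] == 'TIME':
            TIME = parts[1].strip(',')
            res[TIME] = {}
            print('New time section start {}'.format(TIME))
            # here you can stop and work with the data from the previous period
            continue

        if len(parts) <= 2:
            # blank or malformed line
            continue
        # values are kept as strings; convert with float() if numbers are needed
        res[TIME][parts[1]] = [v.strip() for v in parts[2].strip(',').split(',')]

# one column per timestep, one row per variable
df = pd.DataFrame.from_dict(res, orient='columns')
# for example, for TIME 0
dfZero = df['0']
print(dfZero)

# one row per timestep, one column per variable
df = pd.DataFrame.from_dict(res, orient='index')

dfA = df['A']
print(dfA)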
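
Since the question also mentions NumPy, here is a minimal sketch of the same line-by-line idea without pandas. It keeps, for each variable, a list with one float array per timestep, so the arrays are free to have different lengths. The function name parse_blocks and the sample text are made up for illustration, and the sketch assumes that every block starts with a TIME line and that all values are plain numbers (no elided "........." placeholders).

```python
import io

import numpy as np


def parse_blocks(lines):
    """Return (times, data) from an iterable of text lines.

    times is a list of floats, one per TIME line.
    data maps a variable name to a list of 1-D float arrays,
    one array per timestep (lengths may differ).
    parse_blocks is a hypothetical helper, not a library function.
    """
    times = []
    data = {}
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # blank separator between blocks
        # 'Q01 : A:,1,2' -> label 'Q01 : A', values ',1,2'
        label, _, values = line.rpartition(':')
        name = label.split(':')[-1].strip()
        numbers = [float(v) for v in values.split(',') if v.strip()]
        if name == 'TIME':
            times.append(numbers[0])
        else:
            data.setdefault(name, []).append(np.array(numbers))
    return times, data


# Made-up sample in the same layout as the question's file
sample = """TIME:,0
Q01 : A:,-10.7436,0.000536907
Q02 : B:,0

TIME:,0.1
Q01 : A:,-10.563
Q02 : B:,0,0.01665694
"""

times, data = parse_blocks(io.StringIO(sample))
for t, a in zip(times, data['A']):
    print(t, a)
```

For a real file, pass the open file object instead of the io.StringIO wrapper, for example with open('file.txt') as f: times, data = parse_blocks(f). A list of arrays is used rather than one 2-D array because the rows are ragged.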

VelikiiNehochuha answered Apr 27 '23 04:04