
Splitting strings based on multiple delimiters does not yield consistent results

I have a file with many rows of the following form:

  P087 = ( 4.000000000000000E+001,-6.250000000000000E-001 )
  P088 = ( 4.000000000000000E+001, 0.000000000000000E+000 )

I'm reading this file line by line using

fo = open(FileName, 'r')
for line in fo:
    #do stuff to line

I'd like to see how to split each line to give lists as follows:

[87, 40.0,-0.625]
[88, 40.0, 0.0]

I tried Python's built-in .split() method, but it doesn't split the lines consistently: the resulting lists have different lengths from line to line.

I also tried re.split() with patterns such as re.split([ = ( ]|,), but that didn't work either. I don't use regular expressions much (though I know they are very powerful), which is why I'm having a hard time finding the right one.

I guess I need to delimit the lines by ' = ( ' and ',' though I'm really not sure how to do it such that the resulting lists are consistent. Any help would be much appreciated.

Thanks

user32882 asked Mar 11 '26 09:03


2 Answers

You can use ast.literal_eval() to parse the tuple part of each line:

import ast
import re

with open(FileName, 'r') as f:
    out = [
        # Index: the digits after 'P'; values: the "( x, y )" text parsed as a tuple.
        [int(re.findall(r'(?<=P)\d+', k)[0]), *ast.literal_eval(v.strip())]
        for k, v in [line.split('=') for line in f]
    ]
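
As a quick check of what this approach produces, here is a minimal sketch that applies the same idea to the two sample lines from the question, held in a list instead of a file. The names sample_lines and parse_line are made up for illustration, and the sketch assumes every line contains exactly one '=' followed by a parenthesised pair.

```python
import ast
import re

# The two sample rows from the question, standing in for the file contents.
sample_lines = [
    "  P087 = ( 4.000000000000000E+001,-6.250000000000000E-001 )\n",
    "  P088 = ( 4.000000000000000E+001, 0.000000000000000E+000 )\n",
]


def parse_line(line):
    """Turn one 'Pnnn = ( x, y )' row into [nnn, x, y] (illustrative helper)."""
    # Split once at '=': the key looks like '  P087 ', the value is the tuple text.
    key, value = line.split('=', 1)
    # The digits that follow 'P' give the integer index.
    index = int(re.search(r'(?<=P)\d+', key).group())
    # literal_eval evaluates the '( x, y )' text as a Python tuple of floats.
    return [index, *ast.literal_eval(value.strip())]


out = [parse_line(line) for line in sample_lines]
print(out)
```

The value has to be stripped before it is passed to ast.literal_eval(), since leading whitespace would otherwise be treated as unexpected indentation.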

eugenhu answered Mar 13 '26 22:03


This should do it:

import re

for line in fo:
    # Capture the index after 'P' and the two numbers inside the parentheses.
    parts = re.match(r'\s*P(\d+)\s*=\s*[(]\s*([^ ,]*)[ ,]+([^ ,]*)[ )]*',line).groups()
    print([int(parts[0]), float(parts[1]), float(parts[2])])

re.match extracts the three relevant parts of the line, and each part is then converted to the appropriate type (int or float) before printing.
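
To make the pattern easier to follow, the same regular expression can be sketched in verbose form with one comment per piece. This version runs on the sample lines from the question instead of a file; the names LINE_RE, parse_line and sample_lines are illustrative, and returning None for lines that don't match is an assumption about how you might want to handle them.

```python
import re

# Same pattern as above, written with re.VERBOSE so each piece can be commented.
# Whitespace inside a character class such as [^ ,] is still significant.
LINE_RE = re.compile(r'''
    \s*P(\d+)        # optional leading spaces, 'P', then the index digits
    \s*=\s*[(]\s*    # the ' = ( ' separator, with flexible spacing
    ([^ ,]*)         # first number: everything up to a space or comma
    [ ,]+            # the comma and any spaces around it
    ([^ ,]*)         # second number
    [ )]*            # trailing spaces and the closing parenthesis
''', re.VERBOSE)

# The two sample rows from the question, standing in for the file contents.
sample_lines = [
    "  P087 = ( 4.000000000000000E+001,-6.250000000000000E-001 )\n",
    "  P088 = ( 4.000000000000000E+001, 0.000000000000000E+000 )\n",
]


def parse_line(line):
    """Return [index, x, y] for a matching row, or None otherwise."""
    match = LINE_RE.match(line)
    if match is None:
        return None  # assumption: silently skip rows in another format
    index, x, y = match.groups()
    return [int(index), float(x), float(y)]


rows = [parse_line(line) for line in sample_lines]
print(rows)
```

Note that the pattern relies on a space before the closing parenthesis, as in the sample data; without it, the ')' would end up in the second number and float() would fail.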

Scott Hunter answered Mar 13 '26 22:03