I have the following string, which I am parsing from another file: "CHEM1(5GL) CH3M2(55LB) CHEM3954114(50KG)". What I want to do is split it up into individual values, which I achieve using the .split() function, so I get them as a list:
x = ['CHEM1(5GL)', 'CH3M2(55LB)','CHEM3954114(50KG)']
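For reference, a minimal sketch of that first step (the variable name raw is my own): called with no arguments, str.split() breaks the string on any run of whitespace, which produces the list above.

```python
raw = "CHEM1(5GL) CH3M2(55LB) CHEM3954114(50KG)"

# With no arguments, split() splits on runs of whitespace
x = raw.split()
print(x)  # ['CHEM1(5GL)', 'CH3M2(55LB)', 'CHEM3954114(50KG)']
```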
Now I want to further split each of them into 3 segments and store them in 3 other variables so I can write them to Excel, as follows:
a = CHEM1
b = 5
c = GL
for the first element; then I will loop back for the second element:
a = CH3M2
b = 55
c = LB
and finally:
a = CHEM3954114
b = 50
c = KG
I am unsure how to go about this, as I am still new to Python. As far as I know, I would have to call split multiple times, but I believe there has to be a better way to do it than that.
Thank you.
You should use the re module:
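```python
import re

x = ['CHEM1(5GL)', 'CH3M2(55LB)', 'CHEM3954114(50KG)']
# Raw string (r"...") so the backslashes reach the regex engine unchanged
pattern = re.compile(r"([^\(]+)\((\d+)(.+)\)")
for x1 in x:
    m = pattern.search(x1)
    if m:
        # name, quantity (converted to int), unit
        a, b, c = m.group(1), int(m.group(2)), m.group(3)
        print(a, b, c)
```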
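If you do not need the intermediate list, one variation (my own sketch, not part of the answer above) is to run re.finditer over the raw string directly. It assumes names never contain whitespace or ( and units never contain ). The character classes are tightened for that reason: on the whole string, the original ([^\(]+) would pick up the leading space of the next item, and the greedy (.+) would run on to the last closing bracket.

```python
import re

raw = "CHEM1(5GL) CH3M2(55LB) CHEM3954114(50KG)"

# Name: anything except whitespace or "(", so a match cannot spill into the next item.
# Unit: anything except ")", so the match stops at the first closing bracket.
pattern = re.compile(r"([^\s(]+)\((\d+)([^)]+)\)")

rows = []
for m in pattern.finditer(raw):
    a, b, c = m.group(1), int(m.group(2)), m.group(3)
    rows.append((a, b, c))

print(rows)  # [('CHEM1', 5, 'GL'), ('CH3M2', 55, 'LB'), ('CHEM3954114', 50, 'KG')]
```

Each tuple in rows then corresponds to one row you would write out.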
FOLLOW UP:
The regex topic is enormous and extremely well covered on this site - as Tim has highlighted above. I can share my thinking for this specific case. Essentially, there are 3 groups of characters you want to extract:
1. ([^\(]+) - match one or more characters which are not ( (the ^ is the negation; inside a character class the ( does not strictly need escaping, but escaping it is harmless). Note that characters may include not only letters and numbers but also special characters like $, £, - and so forth. I wanted to keep my options open here, but you can be more targeted if you need (for example, matching only letters, digits and underscores with \w+).
2. (\d+) - match 1 or more (expressed with +) digits (expressed with \d).
3. (.+) - match any remaining characters, with the final \) making sure that the match runs up to the closing bracket.

The brackets ( and ) around the quantity are not included in any group. A group is anything enclosed in parentheses () in the pattern. In this specific case that may become confusing because, as stressed above, the string itself contains brackets, which need to be escaped with a \ (as in \( and \)) to distinguish them from the ones used to define groups in the regular expression.
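To make the breakdown above easier to follow, here is a sketch of the same pattern written with re.VERBOSE, so each piece carries its own comment. The group names name, qty and unit are my own labels; the matching behaviour is the same as the numbered-group version.

```python
import re

# Same pattern as the answer, laid out one piece per line.
# re.VERBOSE ignores whitespace and allows # comments in the pattern.
pattern = re.compile(r"""
    (?P<name>[^\(]+)   # one or more characters that are not an opening bracket
    \(                 # literal opening bracket, not captured
    (?P<qty>\d+)       # one or more digits
    (?P<unit>.+)       # any remaining characters...
    \)                 # ...up to the literal closing bracket, not captured
""", re.VERBOSE)

m = pattern.search("CH3M2(55LB)")
print(m.group("name"), int(m.group("qty")), m.group("unit"))  # CH3M2 55 LB
```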