My text file (unfortunately) looks like this...
<amar>[amar-1000#Fem$$$_Y](1){india|1000#Fem$$$,mumbai|1000#Mas$$$}
<akbar>[akbar-1000#Fem$$$_Y](1){}
<john>[-0000#$$$_N](0){USA|0100#$avi$$,NJ|0100#$avi$$}
Each line contains the customer name followed by some information. The sequence is...
a text string followed by a list, a set and then a dictionary
<> [] () {}
This is not a Python-compatible file, so the data cannot be loaded as-is. I want to process the file and extract the following information:
amar 1000 | 1000 | 1000
akbar 1000
john 0000 | 0100 | 0100
1) the name between < and >
2) the number between - and # in the list
3 & 4) split the dictionary on commas and take the number between | and # in each entry (there can be more than 2 entries here)
I am open to using any tool best suited for this task.
The following Python script will read your text file and give you the desired results:
import re, itertools

with open("input.txt", "r") as f_input:
    for line in f_input:
        # Capture the name in <>, the contents of [] and the contents of {}
        reLine = re.match(r"<(\w+)>\[(.*?)\].*?\{(.*?)\}", line)
        # Collect every run of digits found in the [] and {} groups
        lNumbers = [re.findall(r"\d+", entry) for entry in reLine.groups()[1:]]
        lNumbers = list(itertools.chain.from_iterable(lNumbers))
        print(reLine.group(1), " | ".join(lNumbers))
This prints the following output:
amar 1000 | 1000 | 1000
akbar 1000
john 0000 | 0100 | 0100
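Note that the script above collects every run of digits inside the [] and {} parts, so a digit appearing elsewhere in those parts (for example in a name) would also be picked up. If that matters, a stricter variant can anchor on the delimiters described in the question: the number between - and # in the list, and the number between | and # in each dictionary entry. The following is only a sketch under the assumption that every line follows the <> [] () {} layout of the three sample lines; the names LINE_RE, extract and format_line are made up for illustration.

```python
import re

# Assumed layout: <name>[list](set){dict}, as in the sample lines.
LINE_RE = re.compile(
    r"<(?P<name>\w*)>\[(?P<lst>[^\]]*)\]\([^)]*\)\{(?P<dct>[^}]*)\}"
)

def extract(line):
    """Return (name, numbers) for one line, or None if the layout does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    # Rule 2: the number between - and # in the list part
    numbers = re.findall(r"-(\d+)#", m.group("lst"))
    # Rules 3 & 4: the number between | and # in each dictionary entry
    numbers += re.findall(r"\|(\d+)#", m.group("dct"))
    return m.group("name"), numbers

def format_line(line):
    """Format one line as 'name n1 | n2 | ...', or None if it does not match."""
    parsed = extract(line)
    if parsed is None:
        return None
    name, numbers = parsed
    return (name + " " + " | ".join(numbers)).rstrip()

if __name__ == "__main__":
    sample = [
        "<amar>[amar-1000#Fem$$$_Y](1){india|1000#Fem$$$,mumbai|1000#Mas$$$}",
        "<akbar>[akbar-1000#Fem$$$_Y](1){}",
        "<john>[-0000#$$$_N](0){USA|0100#$avi$$,NJ|0100#$avi$$}",
    ]
    for line in sample:
        print(format_line(line))
```

Lines that do not fit the assumed layout return None instead of raising an error, so they can be skipped or logged by the caller.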