 

Process a text file using various delimiters

Tags: python, grep, sed, awk

My text file (unfortunately) looks like this...

<amar>[amar-1000#Fem$$$_Y](1){india|1000#Fem$$$,mumbai|1000#Mas$$$}
<akbar>[akbar-1000#Fem$$$_Y](1){}
<john>[-0000#$$$_N](0){USA|0100#$avi$$,NJ|0100#$avi$$}

It contains the customer name followed by some information. The sequence is:

a text string, followed by a list, a set, and then a dictionary

<> [] () {}
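
As a rough illustration of that layout, one line can be split into its four bracketed sections with a single regular expression. This is only a sketch: the function name `split_sections` is hypothetical, and it assumes a section's closing bracket never appears inside that section.

```python
import re

# One capture group per section: <name>[list](set){dictionary}.
# Assumes the closing bracket of a section never occurs inside that section.
SECTION_RE = re.compile(r"<([^>]*)>\[([^\]]*)\]\(([^)]*)\)\{([^}]*)\}")

def split_sections(line):
    """Return (name, list_part, set_part, dict_part), or None if the line does not fit."""
    match = SECTION_RE.match(line.strip())
    return match.groups() if match else None

print(split_sections("<akbar>[akbar-1000#Fem$$$_Y](1){}"))
# ('akbar', 'akbar-1000#Fem$$$_Y', '1', '')
```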

This is not a Python-compatible file, so the data cannot be loaded directly. I want to process the file and extract some information, like this:

amar 1000 | 1000 | 1000
akbar 1000
john 0000 | 0100 | 0100

1) The name between < and >

2) The number between - and # in the list

3 & 4) Split the dictionary on commas and take the number between | and # in each entry (there can be more than 2 entries here)

I am open to using any tool best suited for this task.

shantanuo asked Dec 15 '22 12:12



1 Answer

The following Python script will read your text file and give you the desired results:

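import re

# <name>[list](set){dict}: capture the name, the list body, and the dictionary body.
LINE_RE = re.compile(r"<(\w+)>\[(.*?)\].*?\{(.*?)\}")

with open("input.txt", "r") as f_input:
    for line in f_input:
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip blank or malformed lines
        name, list_part, dict_part = match.groups()
        # Collect every run of digits from the list and dictionary parts.
        numbers = re.findall(r"\d+", list_part) + re.findall(r"\d+", dict_part)
        print(name, " | ".join(numbers))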

This prints the following output:

amar 1000 | 1000 | 1000
akbar 1000
john 0000 | 0100 | 0100
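
Note that the script above collects every run of digits in the list and dictionary parts, which works for the sample data. If a name could itself contain digits, a stricter variant might anchor on the delimiters from the question instead: the number between - and # in the list, and between | and # in each dictionary entry. The sketch below assumes that reading of the format; `extract_fields` is a hypothetical helper name.

```python
import re

# <name>[list](set){dictionary}; the set section is matched but not captured.
LINE_RE = re.compile(r"<([^>]*)>\[([^\]]*)\]\([^)]*\)\{([^}]*)\}")

def extract_fields(line):
    """Return [name, list_number, dict_number, ...], or None for a malformed line."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    name, list_part, dict_part = match.groups()
    # Rule 2: digits between '-' and '#' in the list section.
    fields = [name] + re.findall(r"-(\d+)#", list_part)
    # Rules 3 and 4: split the dictionary on commas, then take digits between '|' and '#'.
    for entry in dict_part.split(","):
        fields += re.findall(r"\|(\d+)#", entry)
    return fields

sample = "<amar>[amar-1000#Fem$$$_Y](1){india|1000#Fem$$$,mumbai|1000#Mas$$$}"
fields = extract_fields(sample)
print(fields[0], " | ".join(fields[1:]))
# amar 1000 | 1000 | 1000
```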
Martin Evans answered Jan 02 '23 19:01
