
How to extract lines after specific words?

I want to extract the date and specific items from a text using regular expressions in Python 3. Below is an example:

text = '''
190219 7:05:30 line1 fail
               line1 this is the 1st fail
               line2 fail
               line2 this is the 2nd fail
               line3 success 
               line3 this is the 1st success process
               line3 this process need 3sec
200219 9:10:10 line1 fail
               line1 this is the 1st fail
               line2 success 
               line2 this is the 1st success process
               line2 this process need 4sec
               line3 success 
               line3 this is the 2st success process
               line3 this process need 2sec

'''

In the example above, I would like to get all lines that follow a 'success' line. Here is the desired output:

[('190219','7:05:30','line3 this is the 1st success process', 'line3 this process need 3sec'),
('200219', '9:10:10', 'line2 this is the 1st success process', 'line2 this process need 4sec', 'line3 this is the 2st success process','line3 this process need 2sec')]

This is what I've tried:

>>> newLine = re.sub(r'\t|\n|\r|\s{2,}',' ', text)
>>> newLine
' 190219 7:05:30 line1 fail  line1 this is the 1st fail  line2 fail  line2 this is the 2nd fail  line3 success line3 this is the 1st success process  line3 this process need 3sec 200219 9:10:10 line1 fail  line1 this is the 1st fail  line2 success line2 this is the 1st success process  line2 this process need 4sec  line3 success line3 this is the 2st success process  line3 this process need 2sec  '

I don't know the proper way to get this result. I've tried this pattern to match the date line:

(\b\d{6}\b \d{1,}:\d{2}:\d{2})...

How do I solve this problem?
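
For reference, the date and time part of the pattern above can be checked on its own. This is only a minimal sketch: the two capture groups, the `^` anchor with `re.MULTILINE`, and the shortened `sample` string are assumptions added for illustration and are not part of the original attempt.

```python
import re

# Two date lines in the same layout as the sample text (most detail lines omitted).
sample = '''190219 7:05:30 line1 fail
               line1 this is the 1st fail
200219 9:10:10 line1 fail
'''

# Date/time part of the attempted pattern, split into two capture groups.
# The groups and the ^ anchor (with re.MULTILINE) are assumptions for this sketch.
pattern = re.compile(r'^(\d{6}) (\d{1,}:\d{2}:\d{2})', re.MULTILINE)

# With two groups, findall returns one (date, time) tuple per matching line.
stamps = pattern.findall(sample)
print(stamps)  # [('190219', '7:05:30'), ('200219', '9:10:10')]
```

This only locates the date lines; the lines that follow each 'success' line still have to be collected separately.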

Asked by elisa on May 24 '19 at 03:05



1 Answer

This is my solution using regex:

text = '''
190219 7:05:30 line1 fail
               line1 this is the 1st fail
               line2 fail
               line2 this is the 2nd fail
               line3 success 
               line3 this is the 1st success process
               line3 this process need 3sec
200219 9:10:10 line1 fail
               line1 this is the 1st fail
               line2 success 
               line2 this is the 1st success process
               line2 this process need 4sec
               line3 success 
               line3 this is the 2st success process
               line3 this process need 2sec
'''

import re

# find desired lines
# count is a small state flag:
#   0 = no date seen yet, 1 = inside a date block,
#   2 = success header just seen, 3 = collecting the lines after it
count = 0
data = []
for line in text.splitlines():
    # find date and time at the start of a block
    match_date = re.search(r'\d+\s\d+:\d\d:\d\d', line)
    if match_date is not None:
        count = 1
        data.extend(match_date.group().split(' '))
    # find a "lineN success" header line
    match = re.search(r'\w+\d\ssuccess', line)
    if match is not None:
        count = 2

    # collect the lines that follow a success header
    if count > 2:
        data.append(line.strip())

    # skip the header itself and start collecting from the next line
    if count == 2:
        count += 1

# split the flat list into one list per date
# the date is the only all-digit item, so it marks the start of each group
indexes = [i for i, x in enumerate(data) if x.isdigit()]
indexes.append(len(data))

# create list of lists
result = []
for i in range(len(indexes) - 1):
    result.append(data[indexes[i]:indexes[i + 1]])

Result:

[['190219', '7:05:30', 'line3 this is the 1st success process', 'line3 this process need 3sec'], ['200219', '9:10:10', 'line2 this is the 1st success process', 'line2 this process need 4sec', 'line3 this is the 2st success process', 'line3 this process need 2sec']]
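
The same idea can also be written so that each date block is built directly as a tuple, which is the shape asked for in the question. The following is a hedged sketch, not the method above: the function name `extract_success_lines`, the two compiled patterns, and the shortened sample are assumptions made for illustration. It also assumes that a `lineN fail` header ends the collection started by an earlier `lineN success` header, a case the sample text in the question does not contain.

```python
import re

# Shortened sample in the same layout as the question's text.
# The second block has a "fail" header after a "success" header.
text = '''
190219 7:05:30 line1 fail
               line1 this is the 1st fail
               line3 success
               line3 this is the 1st success process
               line3 this process need 3sec
200219 9:10:10 line1 fail
               line2 success
               line2 this is the 1st success process
               line2 fail
               line2 this is the 2nd fail
'''

# Date, time, and the rest of the line (hypothetical patterns for this sketch).
DATE_RE = re.compile(r'^(\d{6})\s+(\d{1,2}:\d{2}:\d{2})\s*(.*)$')
# A header line such as "line3 success" or "line2 fail".
HEADER_RE = re.compile(r'^line\d+\s+(success|fail)$')


def extract_success_lines(text):
    """Return one tuple per date block: (date, time, *lines after success headers)."""
    groups = []
    collecting = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        m = DATE_RE.match(line)
        if m:
            # A date line starts a new group and stops any collection in progress.
            groups.append([m.group(1), m.group(2)])
            collecting = False
            line = m.group(3)  # the rest of the line may itself be a header
        header = HEADER_RE.match(line)
        if header:
            # Collect only the lines that follow a "success" header.
            collecting = header.group(1) == 'success'
        elif collecting and groups:
            groups[-1].append(line)
    return [tuple(g) for g in groups]


result = extract_success_lines(text)
print(result)
```

On this shortened sample, the second tuple contains only the line after `line2 success`; the line after `line2 fail` is skipped because the fail header switches collection off.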
Answered by Zaraki Kenpachi on Sep 21 '22 at 17:09