How to extract lines after specific words?

I want to extract the date, the time, and specific lines from a text using regular expressions in Python 3. Below is an example:

text = '''
190219 7:05:30 line1 fail
               line1 this is the 1st fail
               line2 fail
               line2 this is the 2nd fail
               line3 success 
               line3 this is the 1st success process
               line3 this process need 3sec
200219 9:10:10 line1 fail
               line1 this is the 1st fail
               line2 success 
               line2 this is the 1st success process
               line2 this process need 4sec
               line3 success 
               line3 this is the 2st success process
               line3 this process need 2sec

'''

In the example above, I would like to get all the lines that follow a 'success' line. Here is the desired output:

[('190219','7:05:30','line3 this is the 1st success process', 'line3 this process need 3sec'),
('200219', '9:10:10', 'line2 this is the 1st success process', 'line2 this process need 4sec', 'line3 this is the 2st success process','line3 this process need 2sec')]

This is what I've tried:

>>> import re
>>> newLine = re.sub(r'\t|\n|\r|\s{2,}', ' ', text)
>>> newLine
' 190219 7:05:30 line1 fail  line1 this is the 1st fail  line2 fail  line2 this is the 2nd fail  line3 success line3 this is the 1st success process  line3 this process need 3sec 200219 9:10:10 line1 fail  line1 this is the 1st fail  line2 success line2 this is the 1st success process  line2 this process need 4sec  line3 success line3 this is the 2st success process  line3 this process need 2sec  '

I don't know the proper way to get the result. I've tried this pattern to get the line:

(\b\d{6}\b \d{1,}:\d{2}:\d{2})...

How do I solve this problem?

asked May 24 '19 03:05

elisa

1 Answer

This is my solution using regex:

import re

text = '''
190219 7:05:30 line1 fail
               line1 this is the 1st fail
               line2 fail
               line2 this is the 2nd fail
               line3 success 
               line3 this is the 1st success process
               line3 this process need 3sec
200219 9:10:10 line1 fail
               line1 this is the 1st fail
               line2 success 
               line2 this is the 1st success process
               line2 this process need 4sec
               line3 success 
               line3 this is the 2st success process
               line3 this process need 2sec
'''

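# find desired lines
count = 0
data = []
for item in text.splitlines():
    # find date and time at the start of a record
    match_date = re.search(r'\d+\s\d+:\d\d:\d\d', item)
    if match_date is not None:
        count = 1
        # store date and time as two separate items
        # (extend avoids overwriting the loop variable `item`)
        data.extend(match_date.group().split(' '))
    # find line with success
    match = re.search(r'\w+\d\ssuccess', item)
    # handle collecting next lines
    if match is not None:
        count = 2

    # count > 2 means a success line was seen earlier in this record
    if count > 2:
        data.append(item.strip())

    # skip the success line itself, start collecting from the next line
    if count == 2:
        count += 1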

# split the flat list into one sub-list per record
# flag the items that are plain integers (the dates)
numbers = [item.isdigit() for item in data]

# get positions of the dates, plus an end marker
indexes = [i for i, is_number in enumerate(numbers) if is_number]
indexes.append(len(data))

# create list of lists by slicing between consecutive date positions
result = []
for i in range(len(indexes) - 1):
    result.append(data[indexes[i]:indexes[i + 1]])

Result:

[['190219', '7:05:30', 'line3 this is the 1st success process', 'line3 this process need 3sec'], ['200219', '9:10:10', 'line2 this is the 1st success process', 'line2 this process need 4sec', 'line3 this is the 2st success process', 'line3 this process need 2sec']]
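The same idea can also be written as a single pass that builds one tuple per record, which is the shape the question asks for. The sketch below is only one possible variant, not the code above: the names `HEADER`, `STATUS` and `extract_success_lines` are made up for illustration, and it assumes that every record starts with a 6-digit date followed by a time, and that a status line consists of exactly a name followed by `success` or `fail`. Unlike the loop above, it also stops collecting when a `fail` line appears after a `success` block.

```python
import re

# Assumed formats (not confirmed by the question beyond its sample):
# a record header is "<6-digit date> <h:mm:ss> <rest of line>",
# a status line is exactly "<name> success" or "<name> fail".
HEADER = re.compile(r'^(\d{6})\s+(\d{1,2}:\d{2}:\d{2})\s+(.*)$')
STATUS = re.compile(r'^\w+\s+(success|fail)$')


def extract_success_lines(text):
    """Return one tuple per record: (date, time, *lines following a success line)."""
    records = []
    collecting = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        header = HEADER.match(line)
        if header:
            # start a new record and keep processing the rest of the header line
            date, time, line = header.groups()
            records.append([date, time])
            collecting = False
        if not records:
            continue  # ignore anything before the first header
        status = STATUS.match(line)
        if status:
            # a status line switches collecting on or off and is not stored itself
            collecting = status.group(1) == 'success'
        elif collecting:
            records[-1].append(line)
    return [tuple(record) for record in records]


sample = '''
190219 7:05:30 line1 fail
               line1 this is the 1st fail
               line3 success
               line3 this is the 1st success process
               line3 this process need 3sec
'''
result = extract_success_lines(sample)
```

Called on the full `text` from the question, this should give the two tuples listed as the desired output. Whether a `fail` line should end a `success` block is a design choice; the sample data never has a `fail` after a `success` within one record, so both behaviours agree there.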

answered Sep 21 '22 17:09

Zaraki Kenpachi
