I want to get the date and specific lines from a text using regular expressions in Python 3. Below is an example:
text = '''
190219 7:05:30 line1 fail
line1 this is the 1st fail
line2 fail
line2 this is the 2nd fail
line3 success
line3 this is the 1st success process
line3 this process need 3sec
200219 9:10:10 line1 fail
line1 this is the 1st fail
line2 success
line2 this is the 1st success process
line2 this process need 4sec
line3 success
line3 this is the 2st success process
line3 this process need 2sec
'''
In the example above, I would like to get all lines that come after a 'success' line, together with the date and time of the block they belong to. Here is the desired output:
[('190219','7:05:30','line3 this is the 1st success process', 'line3 this process need 3sec'),
('200219', '9:10:10', 'line2 this is the 1st success process', 'line2 this process need 4sec', 'line3 this is the 2st success process','line3 this process need 2sec')]
This is what I've tried:
>>> newLine = re.sub(r'\t|\n|\r|\s{2,}', ' ', text)
>>> newLine
' 190219 7:05:30 line1 fail line1 this is the 1st fail line2 fail line2 this is the 2nd fail line3 success line3 this is the 1st success process line3 this process need 3sec 200219 9:10:10 line1 fail line1 this is the 1st fail line2 success line2 this is the 1st success process line2 this process need 4sec line3 success line3 this is the 2st success process line3 this process need 2sec '
I don't know the proper way to get the result. I've tried this pattern to get the date and time:
(\b\d{6}\b \d{1,}:\d{2}:\d{2})...
How do I solve this problem?
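As a quick check, the date/time part of the attempted pattern (only the portion shown before the `...`) already matches the header lines when it runs on the original multi-line text, so collapsing the newlines first is not required for this step. A minimal sketch on a shortened copy of the sample; splitting the match into two capture groups is an assumption added for illustration:

```python
import re

# Shortened copy of the sample text from the question
text = '''
190219 7:05:30 line1 fail
line3 success
line3 this process need 3sec
200219 9:10:10 line1 fail
line2 success
'''

# Date/time part of the attempted pattern, with date and time captured
# as two separate groups (illustrative change, not from the question)
headers = re.findall(r'\b(\d{6})\b (\d{1,}:\d{2}:\d{2})', text)

# findall returns one (date, time) tuple per header line
print(headers)  # [('190219', '7:05:30'), ('200219', '9:10:10')]
```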
Using regular expressions to extract a specific word: we can use the search() method from the re module to find the first occurrence of the word and then obtain the word using slicing. re.search() takes the pattern for the word to be extracted and the string as input, and returns a match object (or None if there is no match).
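A minimal sketch of the search-then-slice idea, using one line of the sample text as the input string (`line` and `word` are illustrative names):

```python
import re

line = 'line3 success'

# re.search returns a Match object for the first occurrence, or None
m = re.search(r'\bsuccess\b', line)
if m is not None:
    # Slice the original string with the match's start/end offsets
    word = line[m.start():m.end()]
    print(word)  # success
```

The slice gives the same string as `m.group()`; slicing is mainly useful when you also need the text around the match.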
Using readlines() to read a range of lines from a file: the readlines() method reads all lines from a file and stores them in a list. You can then use an index (or a slice of indexes) as the line number to extract a set of lines from it. This is a straightforward way to read specific lines from a file in Python.
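A small sketch of slicing the readlines() result. io.StringIO stands in for a real file here (an assumption to keep the example self-contained); with an actual file you would use `with open(path) as f:` instead, where `path` is hypothetical:

```python
import io

# io.StringIO stands in for a file opened with open(path)
f = io.StringIO(
    "190219 7:05:30 line1 fail\n"
    "line3 success\n"
    "line3 this is the 1st success process\n"
    "line3 this process need 3sec\n"
)

# readlines() returns every line as a list element, newline included
lines = f.readlines()

# Slice the list to pick a range of lines (0-based, end index exclusive)
wanted = [line.rstrip('\n') for line in lines[2:4]]
print(wanted)
```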
This is my solution using regex:
text = '''
190219 7:05:30 line1 fail
line1 this is the 1st fail
line2 fail
line2 this is the 2nd fail
line3 success
line3 this is the 1st success process
line3 this process need 3sec
200219 9:10:10 line1 fail
line1 this is the 1st fail
line2 success
line2 this is the 1st success process
line2 this process need 4sec
line3 success
line3 this is the 2st success process
line3 this process need 2sec
'''
import re

# find desired lines
count = 0  # 1 after a date line, 3 while collecting lines after a success line
data = []
for item in text.splitlines():
    # find date
    match_date = re.search(r'\d+\s\d+:\d\d:\d\d', item)
    # get date
    if match_date is not None:
        count = 1
        date_time = match_date.group().split(' ')
        for part in date_time:  # separate name so 'item' is not overwritten
            data.append(part)
    # find line with success
    match = re.search(r'\w+\d\ssuccess', item)
    # handle collecting next lines
    if match is not None:
        count = 2
    if count > 2:
        data.append(item.strip())
    if count == 2:
        count += 1
# split list data
# find integers in list (the dates are the only all-digit entries)
numbers = []
for item in data:
    numbers.append(item.isdigit())
# get positions of integers
indexes = [i for i, x in enumerate(numbers) if x]
number_of_elements = len(data)
indexes = indexes + [number_of_elements]
# create list of list
result = []
for i in range(0, len(indexes) - 1):
    result.append(data[indexes[i]:indexes[i + 1]])
print(result)
Result:
[['190219', '7:05:30', 'line3 this is the 1st success process', 'line3 this process need 3sec'], ['200219', '9:10:10', 'line2 this is the 1st success process', 'line2 this process need 4sec', 'line3 this is the 2st success process', 'line3 this process need 2sec']]
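For comparison, a minimal alternative sketch (not the answer's code) that builds each block directly as a tuple, matching the desired output in the question. It assumes every block starts with a six-digit date and a time, and that status lines look like `lineN success` or `lineN fail`; the names `HEADER`, `STATUS` and `blocks` are illustrative:

```python
import re

text = '''
190219 7:05:30 line1 fail
line1 this is the 1st fail
line2 fail
line2 this is the 2nd fail
line3 success
line3 this is the 1st success process
line3 this process need 3sec
200219 9:10:10 line1 fail
line1 this is the 1st fail
line2 success
line2 this is the 1st success process
line2 this process need 4sec
line3 success
line3 this is the 2st success process
line3 this process need 2sec
'''

# Header: six-digit date and a time at the start of the line (assumed format)
HEADER = re.compile(r'^(\d{6}) (\d{1,2}:\d{2}:\d{2}) ')
# Bare status line such as "line3 success" or "line2 fail" (assumed format)
STATUS = re.compile(r'^\w+\d (success|fail)$')

blocks = []          # one list per dated block: [date, time, lines...]
collecting = False   # True while under a "success" status line
for line in text.splitlines():
    line = line.strip()
    header = HEADER.match(line)
    if header:
        blocks.append(list(header.groups()))  # start a new block
        collecting = False
        line = line[header.end():]  # header line also carries a status, e.g. "line1 fail"
    status = STATUS.match(line)
    if status:
        collecting = status.group(1) == 'success'
    elif collecting and blocks and line:
        blocks[-1].append(line)

result = [tuple(b) for b in blocks]
print(result)
```

Because it tracks state line by line instead of splitting the flat list on isdigit(), a collected line that happens to be all digits would not be mistaken for a new date, and a `fail` status line stops collection within a block.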