I am reading data from a txt file that contains time stamps. I need to process it and write the result to a different txt file, so I need to sort the data.
For example, the total time for XXXXXX is the difference between 2020-08-28T11:46:24.8419656Z and 2020-08-28T11:48:11.8418281Z. To calculate the "Execution" time, I need to subtract 2020-08-28T11:46:39.9417366Z from 2020-08-28T11:48:11.8418281Z. These are just examples of how to calculate the time differences. If there is an error, I need to print 1 in "Test Status". There is an error in YYYYYY, so for the steps that do not exist I just need to assign a time of 0. The values in the output below are only placeholders to show the format.
How can I calculate the time difference when there is a T in the middle of the time stamp? Another challenge is that I need to calculate the difference between two rows according to their label in the second column. To find the name of a block (e.g. XXXXXX), I need to look for the "#########" line that precedes it; otherwise I don't know which name is coming in the txt file.
from datetime import datetime

def time_diff(start, end):
    # Parse both stamps and return the difference in whole seconds
    start_dt = datetime.strptime(start, '%H:%M:%S')
    end_dt = datetime.strptime(end, '%H:%M:%S')
    diff = (end_dt - start_dt)
    return diff.seconds

scores = {}
with open('input.txt') as fin:
    for line in fin.readlines():
        values = line.split(',')
        scores[values[0]] = time_diff(values[0], values[0])

with open('result.txt', 'w') as fout:
    # Sort by value first, then by key
    for key, value in sorted(scores.items(), key=lambda kv: (kv[1], kv[0])):
        fout.write('%s,%s\n' % (key, value))
INPUT:
2020-08-28T11:46:24.8419656Z ################################################################################
2020-08-28T11:46:24.8419656Z XXXXXX
2020-08-28T11:46:39.9397372Z Execution 0
2020-08-28T11:46:39.9417366Z Creation 0
2020-08-28T11:46:41.4877509Z Build 0
2020-08-28T11:48:02.6957708Z Level 0
2020-08-28T11:48:02.7227683Z Converting file start
2020-08-28T11:48:11.7408315Z Converting done 0
2020-08-28T11:48:11.8148285Z Checking results
2020-08-28T11:48:11.8418281Z Test Status XXXXXX: Success
2020-08-28T11:48:11.8498273Z ################################################################################
2020-08-28T11:48:11.8498273Z YYYYYY
2020-08-28T11:48:27.1533026Z Execution 0
2020-08-28T11:48:27.1583035Z Creation 0
2020-08-28T11:48:28.6763028Z Build 0
2020-08-28T11:49:31.9180832Z Level 0
2020-08-28T11:49:31.9440848Z ##[error]
2020-08-28T11:49:31.9530839Z ################################################################################
2020-08-28T11:50:24.8419656Z ZZZZZZ
2020-08-28T11:50:39.9397372Z Execution 0
2020-08-28T11:50:39.9417366Z Creation 0
2020-08-28T11:50:41.4877509Z Build 0
2020-08-28T11:51:02.6957708Z Level 0
2020-08-28T11:51:02.7227683Z Converting file start
2020-08-28T11:51:11.7408315Z Converting done 0
2020-08-28T11:51:11.8148285Z Checking results
2020-08-28T11:51:11.8418281Z Test Status ZZZZZZ: Success
2020-08-28T11:51:31.9530839Z ################################################################################
OUTPUT:
Name Total Execution Creation Build Level Converting Checking results Test Status
XXXXXX 10 2 2 2 2 2 2 2 0
YYYYYY 10 2 2 2 2 0 0 0 1
ZZZZZZ 10 2 2 2 2 2 2 2 0
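On the T in the middle of the time stamp: it is the ISO 8601 separator between the date and the time, and the trailing Z means UTC, so the stamp can be parsed directly. The one catch is that the standard library's %f directive accepts at most 6 fractional digits, while these stamps have 7. A minimal sketch using only the standard library, assuming it is acceptable to truncate the fraction to microseconds (parse_stamp is a hypothetical helper name, not something from the question):

```python
from datetime import datetime, timezone


def parse_stamp(stamp):
    """Parse a stamp such as 2020-08-28T11:46:24.8419656Z into an aware datetime."""
    # Split off the fractional seconds; strptime's %f takes at most 6 digits,
    # so the 7-digit fraction is truncated (assumption: microseconds suffice).
    base, _, fraction = stamp.rstrip('Z').partition('.')
    micro = (fraction + '000000')[:6]
    parsed = datetime.strptime(base + '.' + micro, '%Y-%m-%dT%H:%M:%S.%f')
    return parsed.replace(tzinfo=timezone.utc)


start = parse_stamp('2020-08-28T11:46:24.8419656Z')
end = parse_stamp('2020-08-28T11:48:11.8418281Z')

# Subtracting two datetimes gives a timedelta; total_seconds() keeps the fraction
total_seconds = (end - start).total_seconds()
print(total_seconds)
```

Using total_seconds() rather than .seconds also stays correct if a block ever runs past midnight.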
import re
from dateutil import parser
import pandas as pd

with open('input.txt') as file:
    data = file.read()

# Timestamps of the "####..." separator lines that delimit each block
timestamps = re.findall(r'(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}.+Z)\s#{3,}', data)

def stamp(line):
    # The timestamp is the first space-separated field of a line
    return parser.isoparse(line.split(' ')[0])

text = []
dict_list = []
for i in range(len(timestamps) - 1):
    # Slice out everything between two consecutive separator lines
    text.append(data[data.index(timestamps[i]):data.index(timestamps[i + 1])])
    # Time between the two separators (computed but not written to the output)
    time_diff = parser.isoparse(timestamps[i + 1]) - parser.isoparse(timestamps[i])
    # print(text[-1])
    lines = text[-1].split('\n')
    record = {}  # renamed from "dict" to avoid shadowing the built-in
    record['name'] = lines[1].split(' ')[1]
    record['execution'] = (stamp(lines[3]) - stamp(lines[2])).seconds
    record['creation'] = (stamp(lines[4]) - stamp(lines[3])).seconds
    record['build'] = (stamp(lines[5]) - stamp(lines[4])).seconds
    record['level'] = (stamp(lines[6]) - stamp(lines[5])).seconds
    # lines[-1] is the empty string after the final newline, so the
    # last real row of the block is lines[-2]
    if "error" in lines[-2]:
        record['test_status'] = 1
        dict_list.append(record)
        continue
    elif "Success" in lines[-2]:
        record['test_status'] = 0
        record['converting'] = (stamp(lines[7]) - stamp(lines[6])).seconds
        record['checking'] = (stamp(lines[8]) - stamp(lines[7])).seconds
    dict_list.append(record)

df = pd.DataFrame(dict_list)
df.to_csv('output.csv')
You can get the timestamps of all the separator lines this way, and then get the data between two of them by slicing data. Let me know if there's any issue.
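The same slicing idea can also be sketched without dateutil or pandas, by walking the file line by line and starting a new block at every separator line. This is only an illustration under stated assumptions: the helper names (parse_stamp, split_blocks, summarise) are hypothetical, the fraction is truncated to microseconds, a row is assumed to last until the next row's timestamp, and the total is taken from the name row to the last row of the block.

```python
import re
from datetime import datetime

SEPARATOR = re.compile(r'^#{3,}$')  # a row of only '#' marks a block boundary


def parse_stamp(stamp):
    """Parse e.g. '2020-08-28T11:46:24.8419656Z', truncating to microseconds."""
    base, _, fraction = stamp.rstrip('Z').partition('.')
    micro = (fraction + '000000')[:6]
    return datetime.strptime(base + '.' + micro, '%Y-%m-%dT%H:%M:%S.%f')


def split_blocks(lines):
    """Group (timestamp, text) rows into one list per block between separators."""
    blocks, current = [], None
    for line in lines:
        stamp, _, text = line.strip().partition(' ')
        if not stamp:
            continue  # skip blank lines
        if SEPARATOR.match(text):
            if current:
                blocks.append(current)
            current = []  # start collecting a new block
        elif current is not None:
            current.append((parse_stamp(stamp), text))
    if current:
        blocks.append(current)
    return blocks


def summarise(block):
    """Return name, total seconds, per-row durations and error status of a block."""
    steps = []
    for (start, label), (end, _) in zip(block[1:], block[2:]):
        # Assumed convention: a row lasts until the next row's timestamp
        steps.append((label, round((end - start).total_seconds(), 3)))
    return {
        'name': block[0][1],  # the first row after a separator holds the name
        'total': round((block[-1][0] - block[0][0]).total_seconds(), 3),
        'steps': steps,
        'status': 1 if any('##[error]' in text for _, text in block) else 0,
    }
```

To use it on the file from the question, pass the open file to split_blocks (a file object iterates over its lines), call summarise on each block, and write one line per summary to result.txt. Steps that are missing after an error simply do not appear in 'steps', so they can be filled with 0 when the row is written.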