Consider the standard web log file in assets/logdata.txt. This file records the access a user makes when visiting a web page (like this one!). Each line of the log has the following items:
host (e.g., '146.204.224.152')
user_name (e.g., 'feest6811'; note: sometimes the user name is missing! In this case, use '-' as the value for the username.)
time (e.g., '21/Jun/2019:15:45:24 -0700')
request (e.g., 'POST /incentivize HTTP/1.1'; note: not everything is a POST!)
Your task is to convert this into a list of dictionaries, where each dictionary looks like the following:
example_dict = {"host":"146.204.224.152",
"user_name":"feest6811",
"time":"21/Jun/2019:15:45:24 -0700",
"request":"POST /incentivize HTTP/1.1"}
This is a sample of the txt data file.
I wrote these lines of code:
import re
def logs():
    with open("assets/logdata.txt", "r") as file:
        logdata = file.read()
    #print(logdata)
    pattern="""
    (?P<host>.*)
    (-\s)
    (?P<user_name>\w*)
    (\s)
    ([POST]*)
    (?P<time>\w*)
    """
    for item in re.finditer(pattern,logdata,re.VERBOSE):
        print(item.groupdict())
    return(item)
logs()
It helped me extract "host" and "user_name"; however, I can't work out how to get the rest of the required fields. Can anyone help, please?
Try this, my friend:
import re
def logs():
    logs = []
    # Raw string so the regex escapes (\d, \s, \S, ...) reach the re module unchanged
    w = r'(?P<host>(?:\d+\.){3}\d+)\s+(?:\S+)\s+(?P<user_name>\S+)\s+\[(?P<time>[-+\w\s:/]+)\]\s+"(?P<request>.+?)"'
    with open("assets/logdata.txt", "r") as f:
        logdata = f.read()
    for m in re.finditer(w, logdata):
        logs.append(m.groupdict())
    return logs
You're using \w to get the user names; however, \w doesn't match the - that can appear in the log (Common Log Format (CLF)), so as an alternative you could use \S+ (one or more of anything except whitespace). For the time, you can create a capturing group that allows only the expected characters for that field (a character class with \w and \s, - and + for the timezone, / for the date and : for the time), surrounded by literal square brackets. A similar capturing group can be made for the request, delimited by literal double quotes (").
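The points about \w, \S+ and the bracketed time class can be checked on a single line. This is only a minimal sketch: the sample line is assembled from the field values in the question with the user name set to "-", and the trailing status code and size (200 4745) are assumed for illustration.

```python
import re

# Sample line built from the field values in the question; the user name is
# set to "-" and the trailing "200 4745" is an assumption for illustration.
line = '146.204.224.152 - - [21/Jun/2019:15:45:24 -0700] "POST /incentivize HTTP/1.1" 200 4745'

# \w* cannot match "-", so a \w-based user_name group captures an empty string.
with_w = re.search(r'^\S+\s+\S+\s+(?P<user_name>\w*)', line)

# \S+ matches any run of non-whitespace characters, including a lone "-".
with_S = re.search(r'^\S+\s+\S+\s+(?P<user_name>\S+)', line)

# The time sits between literal square brackets; the class lists the allowed characters.
time_match = re.search(r'\[(?P<time>[-+\w\s:/]+)\]', line)

print(repr(with_w.group("user_name")))  # empty string
print(repr(with_S.group("user_name")))  # the "-" placeholder
print(time_match.group("time"))
```
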
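import re
regex = re.compile(
    r'(?P<host>(?:\d+\.){3}\d+)\s+'      # IPv4 host
    r'(?:\S+)\s+'                        # second field, not captured
    r'(?P<user_name>\S+)\s+'             # user name, or "-" when missing
    r'\[(?P<time>[-+\w\s:/]+)\]\s+'      # time between literal square brackets
    r'"(?P<request>.+?)"'                # request between quotes (any method, not only POST)
)
def logs():
    data = []
    with open("sample.txt", "r") as f:
        logdata = f.read()
    for m in regex.finditer(logdata):
        data.append(m.groupdict())
    return data
print(logs())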
(Replaced user_name from first line with "-" for testing on the second line)
[
    {
        "host":"146.204.224.152",
        "user_name":"feest6811",
        "time":"21/Jun/2019:15:45:24 -0700",
        "request":"POST /incentivize HTTP/1.1"
    },
    {
        "host":"146.204.224.152",
        "user_name":"-",
        "time":"21/Jun/2019:15:45:24 -0700",
        "request":"POST /incentivize HTTP/1.1"
    },
    {
        "host":"144.23.247.108",
        "user_name":"auer7552",
        "time":"21/Jun/2019:15:45:35 -0700",
        "request":"POST /extensible/infrastructures/one-to-one/enterprise HTTP/1.1"
    },
    ...
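Because the task notes that not everything is a POST, it is worth checking that a pattern also captures other methods. The following is a sketch under stated assumptions, not a definitive solution: the field values are taken from outputs shown on this page, the third user name is replaced with "-" to exercise the missing-user case, and the trailing status codes and sizes are invented.

```python
import re

# Hypothetical sample lines: hosts, users, times and requests come from outputs
# on this page; the trailing numbers are invented, and the third user name is
# replaced with "-" on purpose.
sample = (
    '146.204.224.152 - feest6811 [21/Jun/2019:15:45:24 -0700] "POST /incentivize HTTP/1.1" 302 4622\n'
    '197.109.77.178 - kertzmann3129 [21/Jun/2019:15:45:25 -0700] "DELETE /virtual/solutions/target/web+services HTTP/2.0" 203 26554\n'
    '168.95.156.240 - - [21/Jun/2019:15:45:31 -0700] "GET /engage HTTP/2.0" 201 9645\n'
)

pattern = re.compile(
    r'^(?P<host>\S+)\s+'           # host at the start of each line
    r'\S+\s+'                      # second field, not needed
    r'(?P<user_name>\S+)\s+'       # user name, or "-"
    r'\[(?P<time>[^\]]+)\]\s+'     # everything between the square brackets
    r'"(?P<request>[^"]+)"',       # everything between the quotes, any method
    re.MULTILINE,
)

records = [m.groupdict() for m in pattern.finditer(sample)]
methods = [r["request"].split()[0] for r in records]
print(methods)
```

The request group here matches whatever sits between the quotes, so it does not depend on the HTTP method.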
Please see the code below:
import re
regex = re.compile(
    r'(?P<host>(?:\d+\.){1,3}\d+)\s+-\s+'
    r'(?P<user_name>[\w+\-]+)?\s+'
    r'\[(?P<time>[-\w\s:/]+)\]\s+'
    r'"(?P<request>\w+.+?)"'
)
def logs():
    data = []
    with open("assets/logdata.txt", "r") as f:
        logdata = f.read()
    for item in regex.finditer(logdata):
        x = item.groupdict()
        if x["user_name"] is None:
            x["user_name"] = "-"
        data.append(x)
    return data
logs()
Part of the output is shown below:
[{'host': '146.204.224.152', 'user_name': 'feest6811', 'time': '21/Jun/2019:15:45:24 -0700', 'request': 'POST /incentivize HTTP/1.1'},
 {'host': '197.109.77.178', 'user_name': 'kertzmann3129', 'time': '21/Jun/2019:15:45:25 -0700', 'request': 'DELETE /virtual/solutions/target/web+services HTTP/2.0'},
 {'host': '156.127.178.177', 'user_name': 'okuneva5222', 'time': '21/Jun/2019:15:45:27 -0700', 'request': 'DELETE /interactive/transparent/niches/revolutionize HTTP/1.1'},
 {'host': '100.32.205.59', 'user_name': 'ortiz8891', 'time': '21/Jun/2019:15:45:28 -0700', 'request': 'PATCH /architectures HTTP/1.0'},
 {'host': '168.95.156.240', 'user_name': 'stark2413', 'time': '21/Jun/2019:15:45:31 -0700', 'request': 'GET /engage HTTP/2.0'},
 .....]
The full list has 979 dictionaries, one for each line of the text file.
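One way to confirm that every line produced a dictionary is to parse line by line and keep the lines that did not match. This is a rough sketch only: the helper name parse_lines and the input text are hypothetical, with field values borrowed from the output above and invented trailing numbers.

```python
import re

LINE_RE = re.compile(
    r'(?P<host>\S+)\s+\S+\s+(?P<user_name>\S+)\s+'
    r'\[(?P<time>[^\]]+)\]\s+"(?P<request>[^"]+)"'
)

def parse_lines(text):
    """Return (records, unmatched_lines) for the given log text."""
    records, unmatched = [], []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        m = LINE_RE.match(line)
        if m:
            records.append(m.groupdict())
        else:
            unmatched.append(line)  # keep the line for inspection
    return records, unmatched

# Hypothetical input: two well-formed lines and one malformed line.
text = (
    '100.32.205.59 - ortiz8891 [21/Jun/2019:15:45:28 -0700] "PATCH /architectures HTTP/1.0" 204 100\n'
    'not a log line\n'
    '168.95.156.240 - stark2413 [21/Jun/2019:15:45:31 -0700] "GET /engage HTTP/2.0" 201 9645\n'
)

records, unmatched = parse_lines(text)
print(len(records), len(unmatched))
```

If the unmatched list is empty for the real file, the number of dictionaries equals the number of non-blank lines.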
Thank you