I'm new to Python and stuck on the problem below. I have an Apache log file like this:
[01/Aug/1995:00:54:59 -0400] "GET /images/opf-logo.gif HTTP/1.0" 200 32511
[01/Aug/1995:00:55:04 -0400] "GET /images/ksclogosmall.gif HTTP/1.0" 200 3635
[01/Aug/1995:00:55:06 -0400] "GET /images/ksclogosmall.gif HTTP/1.0" 403 298
[01/Aug/1995:00:55:09 -0400] "GET /images/ksclogosmall.gif HTTP/1.0" 200 3635
[01/Aug/1995:00:55:18 -0400] "GET /images/opf-logo.gif HTTP/1.0" 200 32511
[01/Aug/1995:00:56:52 -0400] "GET /images/ksclogosmall.gif HTTP/1.0" 200 3635
I have to return the 10 most requested objects and their cumulative bytes transferred. Only GET requests with successful (HTTP 2xx) responses should be included.
So the above log would result in:
/images/ksclogosmall.gif 10905
/images/opf-logo.gif 65022
So far I have the following code:
import re
from collections import Counter, defaultdict
from operator import itemgetter
import itertools
import sys

log_file = "web.log"

pattern = re.compile(
    r'\[(?P<date>[^\[\]:]+):(?P<time>\d+:\d+:\d+) (?P<timezone>[\-+]?\d\d\d\d)\] '
    + r'"(?P<method>\w+) (?P<path>[\S]+) (?P<protocol>[^"]+)" (?P<status>\d+) (?P<bytes_xfd>-|\d+)')

dict_list = []

# The with block closes the file automatically, so no explicit f.close() is needed.
with open(log_file, "r") as f:
    for line in f:
        # Keep only GET requests that got a 2xx response.
        if re.search("GET", line) and re.search(r'HTTP/[\d.]+"\s[2]\d{2}', line):
            try:
                log_line_data = pattern.match(line)
                path = log_line_data["path"]
                bytes_transferred = int(log_line_data["bytes_xfd"])
                dict_list.append({path: bytes_transferred})
            except Exception:
                print("Unexpected Error: ", sys.exc_info()[0])
                raise

print(dict_list)
This code prints the following list of dictionaries:
[{'/images/opf-logo.gif': 32511},
{'/images/ksclogosmall.gif': 3635},
{'/images/ksclogosmall.gif': 3635},
{'/images/opf-logo.gif': 32511},
{'/images/ksclogosmall.gif': 3635}]
I don't know how to get from here to this result:
/images/ksclogosmall.gif 10905
/images/opf-logo.gif 65022
This result is basically the sum of the values for each key, sorted in descending order by the number of times that key occurred.
Note: I tried using collections.Counter to no avail. I'd like to sort by the number of times each key occurred.
Any help would be appreciated.
To sort a list of dictionaries by the value of a specific key, pass the key parameter to the list's sort() method or to the sorted() function. The function you give as key is applied to each element of the list, and the elements are ordered by its result.
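As a minimal sketch of the key parameter, the snippet below sorts a small list of dictionaries. The records list and its "path" and "bytes" field names are made up for illustration and are not taken from the log in the question.

```python
from operator import itemgetter

# Hypothetical records; the "path" and "bytes" field names are assumptions.
records = [
    {"path": "/a.gif", "bytes": 300},
    {"path": "/b.gif", "bytes": 100},
    {"path": "/c.gif", "bytes": 200},
]

# sorted() returns a new list and leaves the original untouched.
by_bytes = sorted(records, key=lambda r: r["bytes"])

# list.sort() sorts in place; itemgetter("bytes") is equivalent to the lambda above.
records.sort(key=itemgetter("bytes"), reverse=True)
```

Here by_bytes is ordered from smallest to largest, while records ends up ordered from largest to smallest.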
In Python, sorted() is a built-in function that sorts any iterable, including the keys, values, or items of a dictionary. It always returns a new list.
Sort a dictionary using the operator module and itemgetter(): the dictionary's items() method returns the key-value pairs as tuples. We can sort those tuples with itemgetter(1) as the key function, which pulls out the second element of each tuple, i.e. the value stored under each key.
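A short sketch of the itemgetter() approach follows. The totals dictionary is written out by hand from the expected result in the question; it is not computed here.

```python
from operator import itemgetter

# Per-path byte totals, copied by hand from the question's expected result.
totals = {
    "/images/opf-logo.gif": 65022,
    "/images/ksclogosmall.gif": 10905,
}

# items() yields (key, value) tuples; itemgetter(1) selects the value.
by_value = sorted(totals.items(), key=itemgetter(1))
```

Note that this orders by the byte total, which is not the same as ordering by the number of requests that the question asks for.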
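You can use a collections.Counter and its update() method to add up the bytes transferred for each object, then a second Counter to count how often each object was requested:
from collections import Counter

# Counter.update() adds values for matching keys, so this sums the bytes per path.
c = Counter()
for d in dict_list:
    c.update(d)

# Count how many times each path was requested.
occurrences = Counter([list(x.keys())[0] for x in dict_list])

# Sort the (path, total_bytes) pairs by request count, most requested first.
sorted(c.items(), key=lambda x: occurrences[x[0]], reverse=True)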
Output:
[('/images/ksclogosmall.gif', 10905), ('/images/opf-logo.gif', 65022)]
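To limit the result to the 10 most requested objects, one possible variation is to keep both counters in a single pass and let Counter.most_common() pick the top entries. This is only a sketch: top_objects is a hypothetical helper name, and sample repeats the dict_list printed in the question.

```python
from collections import Counter


def top_objects(dict_list, n=10):
    """Return up to n (path, total_bytes) pairs, most requested first."""
    hits = Counter()         # number of requests per path
    total_bytes = Counter()  # cumulative bytes per path
    for d in dict_list:
        for path, size in d.items():
            hits[path] += 1
            total_bytes[path] += size
    # most_common(n) orders paths by request count, highest first.
    return [(path, total_bytes[path]) for path, _ in hits.most_common(n)]


# Same data as the dict_list printed in the question.
sample = [
    {"/images/opf-logo.gif": 32511},
    {"/images/ksclogosmall.gif": 3635},
    {"/images/ksclogosmall.gif": 3635},
    {"/images/opf-logo.gif": 32511},
    {"/images/ksclogosmall.gif": 3635},
]

for path, size in top_objects(sample):
    print(path, size)
```

With the sample data this prints the two lines the question asks for, with /images/ksclogosmall.gif first because it was requested three times.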