
Add values of keys and sort it by occurrence of the keys in a list of dictionaries in Python

I'm really new to Python and I'm stuck on the problem below. I have an Apache log file like this:

[01/Aug/1995:00:54:59 -0400] "GET /images/opf-logo.gif HTTP/1.0" 200 32511
[01/Aug/1995:00:55:04 -0400] "GET /images/ksclogosmall.gif HTTP/1.0" 200 3635
[01/Aug/1995:00:55:06 -0400] "GET /images/ksclogosmall.gif HTTP/1.0" 403 298
[01/Aug/1995:00:55:09 -0400] "GET /images/ksclogosmall.gif HTTP/1.0" 200 3635
[01/Aug/1995:00:55:18 -0400] "GET /images/opf-logo.gif HTTP/1.0" 200 32511
[01/Aug/1995:00:56:52 -0400] "GET /images/ksclogosmall.gif HTTP/1.0" 200 3635

I have to return the 10 most requested objects and their cumulative bytes transferred, counting only GET requests with successful (HTTP 2xx) responses.

So the above log would result in:

/images/ksclogosmall.gif 10905
/images/opf-logo.gif 65022

So far I have the following code:

import re
from collections import Counter, defaultdict
from operator import itemgetter
import itertools
import sys

log_file = "web.log"
pattern = re.compile(
      r'\[(?P<date>[^\[\]:]+):(?P<time>\d+:\d+:\d+) (?P<timezone>[\-+]?\d\d\d\d)\] '
      + r'"(?P<method>\w+) (?P<path>[\S]+) (?P<protocol>[^"]+)" (?P<status>\d+) (?P<bytes_xfd>-|\d+)')

dict_list = []

with open(log_file, "r") as f:
    for line in f:  # iterate lazily; readlines() would load the whole file first
        # Keep only GET requests that received a 2xx response.
        if re.search("GET", line) and re.search(r'HTTP/[\d.]+"\s[2]\d{2}', line):
            try:
                log_line_data = pattern.match(line)
                path = log_line_data["path"]
                bytes_transferred = int(log_line_data["bytes_xfd"])
                dict_list.append({path: bytes_transferred})
            except Exception:
                print("Unexpected Error: ", sys.exc_info()[0])
                raise
# No explicit f.close() is needed: the with block closes the file.

print(dict_list)

This code prints the following list of dictionaries:

[{'/images/opf-logo.gif': 32511}, 
{'/images/ksclogosmall.gif': 3635}, 
{'/images/ksclogosmall.gif': 3635}, 
{'/images/opf-logo.gif': 32511}, 
{'/images/ksclogosmall.gif': 3635}]

I don't know how to get from here to this result:

/images/ksclogosmall.gif 10905
/images/opf-logo.gif 65022

The result is the sum of the values for each key, sorted in descending order by the number of times that key occurs.

Note: I tried using collections.Counter to no avail. I'd like to sort by the number of times each key occurred.

Any help would be appreciated.

leo_21 asked Jul 19 '17 17:07



1 Answer

You can use a collections.Counter and call update() on it with each dictionary to add up the bytes transferred for each object, then use a second Counter of key occurrences as the sort key:

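from collections import Counter

# Sum the bytes per path: Counter.update() adds values for matching keys.
c = Counter()
for d in dict_list:
    c.update(d)
# Count how many times each path appears in dict_list.
occurrences = Counter(next(iter(d)) for d in dict_list)
# Sort the (path, total bytes) pairs by request count, most requested first.
print(sorted(c.items(), key=lambda x: occurrences[x[0]], reverse=True))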

Output:

[('/images/ksclogosmall.gif', 10905), ('/images/opf-logo.gif', 65022)]
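
The snippet above sorts every object; the question asks for only the 10 most requested. As a rough sketch (not part of the original answer), the intermediate list of dictionaries can be skipped by keeping two counters while parsing and letting Counter.most_common(10) pick the top entries. The names LINE_RE, top_objects and sample are hypothetical, the regex is a trimmed version of the question's pattern, and treating a "-" byte count as 0 is an assumption.

```python
import re
from collections import Counter

# Trimmed version of the question's pattern: only the fields used here.
LINE_RE = re.compile(
    r'\[[^\]]+\] "(?P<method>\w+) (?P<path>\S+) [^"]+" '
    r'(?P<status>\d{3}) (?P<bytes_xfd>-|\d+)'
)

def top_objects(lines, n=10):
    """Return [(path, total_bytes)] for the n most requested paths."""
    hits = Counter()         # path -> number of successful GET requests
    total_bytes = Counter()  # path -> cumulative bytes transferred
    for line in lines:
        m = LINE_RE.match(line)
        # Skip unparseable lines, non-GET requests and non-2xx responses.
        if m is None or m["method"] != "GET" or not m["status"].startswith("2"):
            continue
        hits[m["path"]] += 1
        # Assumption: a "-" byte count contributes 0 bytes.
        if m["bytes_xfd"] != "-":
            total_bytes[m["path"]] += int(m["bytes_xfd"])
    # most_common(n) orders paths by request count, highest first.
    return [(path, total_bytes[path]) for path, _ in hits.most_common(n)]

# The six log lines from the question.
sample = [
    '[01/Aug/1995:00:54:59 -0400] "GET /images/opf-logo.gif HTTP/1.0" 200 32511',
    '[01/Aug/1995:00:55:04 -0400] "GET /images/ksclogosmall.gif HTTP/1.0" 200 3635',
    '[01/Aug/1995:00:55:06 -0400] "GET /images/ksclogosmall.gif HTTP/1.0" 403 298',
    '[01/Aug/1995:00:55:09 -0400] "GET /images/ksclogosmall.gif HTTP/1.0" 200 3635',
    '[01/Aug/1995:00:55:18 -0400] "GET /images/opf-logo.gif HTTP/1.0" 200 32511',
    '[01/Aug/1995:00:56:52 -0400] "GET /images/ksclogosmall.gif HTTP/1.0" 200 3635',
]

for path, size in top_objects(sample):
    print(path, size)
```

To run this on the real log, an open file object can be passed in place of sample, for example top_objects(open("web.log")), since the function only needs an iterable of lines.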
Imran answered Sep 20 '22 17:09