Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Coursera Course - Introduction of Data Science in Python Assignment 1

Tags:

python

regex

I'm taking this course on Coursera, and I'm running some issues while doing the first assignment. The task is to basically use regular expression to get certain values from the given file. Then, the function should output a dictionary containing these values:

example_dict = {"host":"146.204.224.152", 

                "user_name":"feest6811", 

                "time":"21/Jun/2019:15:45:24 -0700",

                "request":"POST /incentivize HTTP/1.1"} 

This is just a screenshot of the file. Due to some reasons, the link doesn't work if it's not open directly from Coursera. I apologize in advance for the bad formatting. One thing I must point out is that for some cases, as you can see in the first example, there's no username. Instead '-' is used.

159.253.153.40 - - [21/Jun/2019:15:46:10 -0700] "POST /e-business HTTP/1.0" 504 19845
136.195.158.6 - feeney9464 [21/Jun/2019:15:46:11 -0700] "HEAD /open-source/markets HTTP/2.0" 204 21149 

This is what I currently have right now. However, the output is None. I guess there's something wrong in my pattern.

import re
def logs():
    
    with open("assets/logdata.txt", "r") as file:
        logdata = file.read()
    # YOUR CODE HERE
        
        pattern = """ 
        (?P<host>\w*)
        (\d+\.\d+.\d+.\d+\ )
        (?P<user_name>\w*)
        (\ -\ [a-z]+[0-9]+\ )
        (?P<time>\w*)
        (\[(.*?)\])
        (?P<request>\w*)
        (".*")
        """
        for item in re.finditer(pattern,logdata,re.VERBOSE):
       
            print(item.groupdict())
like image 385
BryantHsiung Avatar asked Oct 28 '25 16:10

BryantHsiung


1 Answers

You can use the following expression:

(?P<host>\d+(?:\.\d+){3}) # 1+ digits and 3 occurrenses of . and 3 digits
\s+\S+\s+                 # 1+ whitespaces, 1+ non-whitespaces, 1+ whitespaces
(?P<user_name>\S+)\s+\[   # 1+ non-whitespaces (Group "user_name"), 1+ whitespaces and [
(?P<time>[^\]\[]*)\]\s+   # Group "time": 0+ chars other than [ and ], ], 1+ whitespaces
"(?P<request>[^"]*)"      # ", Group "request": 0+ non-" chars, "

See the regex demo. See the Python demo:

import re
logdata = r"""159.253.153.40 - - [21/Jun/2019:15:46:10 -0700] "POST /e-business HTTP/1.0" 504 19845
136.195.158.6 - feeney9464 [21/Jun/2019:15:46:11 -0700] "HEAD /open-source/markets HTTP/2.0" 204 21149"""
pattern = r'''
(?P<host>\d+(?:\.\d+){3}) # 1+ digits and 3 occurrenses of . and 3 digits
\s+\S+\s+                 # 1+ whitespaces, 1+ non-whitespaces, 1+ whitespaces
(?P<user_name>\S+)\s+\[   # 1+ non-whitespaces (Group "user_name"), 1+ whitespaces and [
(?P<time>[^\]\[]*)\]\s+   # Group "time": 0+ chars other than [ and ], ], 1+ whitespaces
"(?P<request>[^"]*)"      # ", Group "request": 0+ non-" chars, "
'''
for item in re.finditer(pattern,logdata,re.VERBOSE):
    print(item.groupdict())

Output:

{'host': '159.253.153.40', 'user_name': '-', 'time': '21/Jun/2019:15:46:10 -0700', 'request': 'POST /e-business HTTP/1.0'}
{'host': '136.195.158.6', 'user_name': 'feeney9464', 'time': '21/Jun/2019:15:46:11 -0700', 'request': 'HEAD /open-source/markets HTTP/2.0'}
like image 185
Wiktor Stribiżew Avatar answered Oct 31 '25 06:10

Wiktor Stribiżew