
Make a dictionary from a txt file using re

Tags:

python

regex

Consider the standard web log file in assets/logdata.txt. This file records the access a user makes when visiting a web page (like this one!). Each line of the log has the following items:

  • a host (e.g., '146.204.224.152')
  • a user_name (e.g., 'feest6811'; note: sometimes the user name is missing! In this case, use '-' as the value for the user name.)
  • the time a request was made (e.g., '21/Jun/2019:15:45:24 -0700')
  • the request type (e.g., 'POST /incentivize HTTP/1.1'; note: not everything is a POST!)

Your task is to convert this into a list of dictionaries, where each dictionary looks like the following:

example_dict = {"host":"146.204.224.152", 
                "user_name":"feest6811", 
                "time":"21/Jun/2019:15:45:24 -0700",
                "request":"POST /incentivize HTTP/1.1"}

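Whatever parser you end up with, a small check of the output shape can catch missing fields early. A minimal sketch, assuming the parser returns a list of dicts shaped like example_dict; check_records and EXPECTED_KEYS are hypothetical names, not part of the assignment:

```python
# Keys every record must have, taken from example_dict above.
EXPECTED_KEYS = {"host", "user_name", "time", "request"}


def check_records(records):
    """Return the indices of records whose keys differ from EXPECTED_KEYS.

    Hypothetical helper: an empty list means every dict has exactly the
    four required keys.
    """
    return [i for i, rec in enumerate(records) if set(rec) != EXPECTED_KEYS]


example_dict = {"host": "146.204.224.152",
                "user_name": "feest6811",
                "time": "21/Jun/2019:15:45:24 -0700",
                "request": "POST /incentivize HTTP/1.1"}

print(check_records([example_dict]))                 # []
print(check_records([example_dict, {"host": "x"}]))  # [1]
```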
This is a sample of the txt data file:

[Image: sample of the text file]

I wrote this code:

import re
def logs():
    with open("assets/logdata.txt", "r") as file:
        logdata = file.read()
        #print(logdata)
        pattern="""
        (?P<host>.*)        
        (-\s)   
        (?P<user_name>\w*)  
        (\s) 
        ([POST]*)
        (?P<time>\w*)               
                 """
        for item in re.finditer(pattern,logdata,re.VERBOSE):
            print(item.groupdict())
        return(item)
logs()

It helped me extract "host" and "user_name", but I can't work out how to get the remaining fields. Can anyone help, please? This is what I have done so far.
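A quick experiment on hand-typed strings (not the real log) suggests why the pattern above stops after user_name: [POST]* is a character class rather than the word POST, and \w* stops at the first / in the timestamp.

```python
import re

# [POST] is a character class: it matches ONE character that is P, O, S or T,
# so [POST]* accepts any run of those letters rather than the word "POST".
print(bool(re.fullmatch(r"[POST]*", "STOP")))  # True
print(bool(re.fullmatch(r"[POST]*", "GET")))   # False

# An uppercase run such as [A-Z]+ matches any method name instead.
print(re.findall(r"[A-Z]+", "GET DELETE PATCH"))  # ['GET', 'DELETE', 'PATCH']

# \w matches letters, digits and "_" only, so \w* stops at the first "/".
print(re.match(r"\w*", "21/Jun/2019:15:45:24 -0700").group())  # 21
```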

Ahmed Sharshar asked Sep 25 '20 22:09


3 Answers

Try this, my friend:

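import re


def logs():
    data = []  # renamed from "logs" so it doesn't shadow the function name
    # Raw string so "\d", "\s", "\[" reach the regex engine unchanged.
    # The request group stops at the closing quote, whatever the method is.
    w = r'(?P<host>(?:\d+\.){3}\d+)\s+(?:\S+)\s+(?P<user_name>\S+)\s+\[(?P<time>[-+\w\s:/]+)\]\s+"(?P<request>[^"]+)"'
    with open("assets/logdata.txt", "r") as f:
        logdata = f.read()
    for m in re.finditer(w, logdata):
        data.append(m.groupdict())
    return data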
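To try the pattern without assets/logdata.txt, here is a minimal sketch that runs the same regex (as a raw string, with the request group simplified to [^"]+) over two hypothetical lines. The lines are assumptions laid out as host, a dash, user name, [time], "request"; the second host and the trailing status/size numbers are invented for illustration.

```python
import re

# Same pattern as above, written as a raw string so "\d", "\s" and "\[" are
# not treated as (invalid) string escapes.
PATTERN = re.compile(
    r'(?P<host>(?:\d+\.){3}\d+)\s+(?:\S+)\s+(?P<user_name>\S+)\s+'
    r'\[(?P<time>[-+\w\s:/]+)\]\s+"(?P<request>[^"]+)"'
)

# Hypothetical sample lines (assumed layout; status/size numbers invented).
SAMPLE = (
    '146.204.224.152 - feest6811 [21/Jun/2019:15:45:24 -0700] '
    '"POST /incentivize HTTP/1.1" 302 4622\n'
    '192.0.2.10 - - [21/Jun/2019:15:45:30 -0700] "GET /engage HTTP/2.0" 200 1024\n'
)

# One dict per matched line, keyed by the named groups.
records = [m.groupdict() for m in PATTERN.finditer(SAMPLE)]
for rec in records:
    print(rec)
```

Because user_name uses \S+, the "-" on the second line is captured as the user name directly, which is the value the question asks for.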
Abd-elrhman Mohey answered Oct 12 '22 03:10


You're using \w to capture user_name, but \w doesn't match -, which can appear in the log (Common Log Format, CLF). As an alternative you could use \S+ (one or more characters that are not whitespace). For the time, you can create a capturing group that allows only the characters expected in that field (\w and \s, - and + for the time zone, / for the date and : for the time), surrounded by literal square brackets. The request can be captured the same way, between literal double quotes (").

import re

regex = re.compile(
    r'(?P<host>(?:\d+\.){3}\d+)\s+'
    r'(?:\S+)\s+'
    r'(?P<user_name>\S+)\s+'
    r'\[(?P<time>[-+\w\s:/]+)\]\s+'
    r'"(?P<request>[^"]+)"'  # any method, not only POST
)

def logs():
    data = []
    with open("sample.txt", "r") as f:
        logdata = f.read()
    for m in regex.finditer(logdata):
        data.append(m.groupdict())
    return data

print(logs())

(For testing, I duplicated the first line and replaced its user_name with "-" on the second line.)

[
   {
      "host":"146.204.224.152",
      "user_name":"feest6811",
      "time":"21/Jun/2019:15:45:24 -0700",
      "request":"POST /incentivize HTTP/l.l"
   },
   {
      "host":"146.204.224.152",
      "user_name":"-",
      "time":"21/Jun/2019:15:45:24 -0700",
      "request":"POST /incentivize HTTP/l.l"
   },
   {
      "host":"144.23.247.108",
      "user_name":"auer7552",
      "time":"21/Jun/2019:15:45:35 -0700",
      "request":"POST /extensible/infrastructures/one-to-one/enterprise HTTP/l.l"
   },
    ...
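The difference between \w and \S that this answer relies on can be checked directly. A minimal sketch on hand-typed strings; the +0200 timestamp is a hypothetical value used only to exercise the "+" in the time class:

```python
import re

# \w matches letters, digits and "_" only, so it cannot match a lone "-".
print(re.fullmatch(r"\w+", "feest6811") is not None)  # True
print(re.fullmatch(r"\w+", "-") is not None)          # False

# \S+ matches any run of non-whitespace, so both forms of user_name work.
print(re.fullmatch(r"\S+", "-") is not None)          # True

# The time class allows word chars, whitespace, "-", "+", ":" and "/",
# which covers both negative and positive UTC offsets.
TIME = re.compile(r"\[(?P<time>[-+\w\s:/]+)\]")
print(TIME.search("[21/Jun/2019:15:45:24 -0700]").group("time"))
print(TIME.search("[21/Jun/2019:15:45:24 +0200]").group("time"))
```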
n1colas.m answered Oct 12 '22 04:10


Please see the code below:

import re

regex = re.compile(
    r'(?P<host>(?:\d+\.){3}\d+)\s+-\s+'  # IPv4 host: four dot-separated numbers
    r'(?P<user_name>[\w+\-]+)?\s+'
    r'\[(?P<time>[-+\w\s:/]+)\]\s+'  # "+" allows positive time zones too
    r'"(?P<request>\w+.+?)"'
)

def logs():
    data = []
    with open("assets/logdata.txt", "r") as f:
        logdata = f.read()
        for item in regex.finditer(logdata):
            x = item.groupdict()
            if x["user_name"] is None:
                x["user_name"] = "-"
            data.append(x)
    return data

logs()

Below is part of the output:

[{'host': '146.204.224.152', 'user_name': 'feest6811', 'time': '21/Jun/2019:15:45:24 -0700', 'request': 'POST /incentivize HTTP/1.1'}, {'host': '197.109.77.178', 'user_name': 'kertzmann3129', 'time': '21/Jun/2019:15:45:25 -0700', 'request': 'DELETE /virtual/solutions/target/web+services HTTP/2.0'}, {'host': '156.127.178.177', 'user_name': 'okuneva5222', 'time': '21/Jun/2019:15:45:27 -0700', 'request': 'DELETE /interactive/transparent/niches/revolutionize HTTP/1.1'}, {'host': '100.32.205.59', 'user_name': 'ortiz8891', 'time': '21/Jun/2019:15:45:28 -0700', 'request': 'PATCH /architectures HTTP/1.0'}, {'host': '168.95.156.240', 'user_name': 'stark2413', 'time': '21/Jun/2019:15:45:31 -0700', 'request': 'GET /engage HTTP/2.0'}, .....]

The full result contains 979 dictionaries, one for each line of the text file.
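Running finditer over the whole file silently skips any line the pattern cannot match, so a count like 979 confirms full coverage only if the file really has 979 lines. A minimal sketch of a line-by-line variant that also reports unparsed lines; the function name parse_lines, the sample lines and the pattern tweaks (exactly four octets, "+" allowed in the time zone) are assumptions, not part of the answer above:

```python
import re

# Adapted from the pattern above (assumption: IPv4 hosts, "+" or "-" offsets).
REGEX = re.compile(
    r'(?P<host>(?:\d+\.){3}\d+)\s+-\s+'
    r'(?P<user_name>[\w+\-]+)?\s+'
    r'\[(?P<time>[-+\w\s:/]+)\]\s+'
    r'"(?P<request>\w+.+?)"'
)


def parse_lines(lines):
    """Return (records, bad_line_numbers) for an iterable of log lines."""
    records, bad = [], []
    for lineno, line in enumerate(lines, start=1):
        m = REGEX.match(line)
        if m is None:
            bad.append(lineno)  # line did not fit the pattern
            continue
        rec = m.groupdict()
        if rec["user_name"] is None:
            rec["user_name"] = "-"  # same default as the answer above
        records.append(rec)
    return records, bad


# Hypothetical input: two well-formed lines and one malformed line.
sample = [
    '146.204.224.152 - feest6811 [21/Jun/2019:15:45:24 -0700] "POST /incentivize HTTP/1.1" 302 4622',
    '192.0.2.10 - - [21/Jun/2019:15:45:30 -0700] "GET /engage HTTP/2.0" 200 1024',
    'not a log line',
]
records, bad = parse_lines(sample)
print(len(records), bad)  # 2 [3]
```

On the real file, something like `with open("assets/logdata.txt") as f: records, bad = parse_lines(f)` should behave the same way, since match only needs the start of each line to fit; an empty bad list means no line was skipped.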

Thank you

Shikha Khandelwal answered Oct 12 '22 05:10