
Open a file, read content, make content into a list using regex, then print list in python

Tags:

python

regex

I am using the re and sys modules.

When I type "1.py a.txt" in the terminal, I want the script to read "a.txt", which has this content:

17:18:42.525964 IP 66.185.85.146.80 > 192.168.0.15.34436: Flags [.], seq 1:1449, ack 2555, win 1320, options [nop,nop,TS val 3551057710 ecr 43002332], length 1448
17:18:42.526623 IP 66.185.85.146.80 > 192.168.0.15.34436: Flags [.], seq 1449:2897, ack 2555, win 1320, options [nop,nop,TS val 3551057710 ecr 43002332], length 1448
17:18:42.526900 IP 192.168.0.15.34436 > 66.185.85.146.80: Flags [.], ack 2897, win 1444, options [nop,nop,TS val 43002448 ecr 3551057710], length 0
17:18:42.527694 IP 66.185.85.146.80 > 192.168.0.15.34436: Flags [.], seq 2897:14481, ack 2555, win 1320, options [nop,nop,TS val 3551057710 ecr 43002332], length 11584
17:18:42.527716 IP 192.168.0.15.34436 > 66.185.85.146.80: Flags [.], ack 14481, win 1444, options [nop,nop,TS val 43002448 ecr 3551057710], length 0
17:18:42.528794 IP 66.185.85.146.80 > 192.168.0.15.34436: Flags [.], seq 14481:23169, ack 2555, win 1320, options [nop,nop,TS val 3551057710 ecr 43002332], length 8688
17:18:42.528813 IP 192.168.0.15.34436 > 66.185.85.146.80: Flags [.], ack 23169, win 1444, options [nop,nop,TS val 43002448 ecr 3551057710], length 0
17:18:42.545191 IP 192.168.0.15.60030 > 52.2.63.29.80: Flags [.], seq 4113773418:4113774866, ack 850072640, win 270, options [nop,nop,TS val 43002452 ecr 9849626], length 1448

Then I want to use a regex to strip everything except the IP addresses and the length (total), and print the result as:

source: 66.185.85.146 dest: 192.168.0.15 total:1448
source: 66.185.85.146 dest: 192.168.0.15 total:1448
source: 192.168.0.15 dest: 66.185.85.146 total:0

If there are duplicate source/dest pairs, they should be merged into one line with their totals added together:

source: 66.185.85.146 dest: 192.168.0.15 total:2896
source: 192.168.0.15 dest: 66.185.85.146 total:0

Furthermore, if I pass "-s" in the terminal like so:

"1.py -s a.txt"

or

"1.py a.txt -s 192.168.0.15"

the output should be sorted. With -s alone, it should sort and print the content; with -s followed by an IP, it should sort the IPs.

This is what I currently have for each piece; I want to know how to use them all together.

#!/usr/bin/python3
import re
import sys

file = sys.argv[1]
a = open(file, "r")

for line in a:
   line = line.rstrip()
   c = re.findall(r'^(?:[0-9]{1,3}\.){3}[0-9]{1,3}$',line) #Yes I know its not the best regex for this, but I am testing it out for now
   d = re.findall(r'\b(\d+)$\b',line)

   if len(c) > 0 and len(d) > 0:
      print("source:", c[0],"\t","dest:",c[1],"\t", "total:",d[0])

That is what I have so far. I do not know how to handle "-s", how to sort, or how to remove the duplicates and add up their totals.

Lightning asked Oct 29 '15 19:10




2 Answers

What you need is argparse's ArgumentParser to handle your -s parameter, something like:

import argparse
...
def main():
    parser = argparse.ArgumentParser()
    # positional argument so that "1.py a.txt ..." is accepted
    parser.add_argument('file', help='file to read')
    parser.add_argument('-s', '--sort', action='append', default=[],
                    help='sort specific IP')
    parser.add_argument('-s2', '--sortall', action='store_true',
                    help='sort all the IPs')

    args = parser.parse_args()
    if args.sortall:
        pass  # store all IPs

    for ip in args.sort:
        pass  # store by ip

if __name__ == '__main__':
    main()

Now you can use the script like:

1.py a.txt -s 192.168.0.15

or

1.py a.txt -s2

Apart from that, putting it all together looks like homework, so you should read more about Python to figure it out.

eLRuLL answered Oct 27 '22 02:10


To read the -s option you probably want a library to parse the arguments, like the standard argparse module. It lets you declare which arguments your script accepts, along with their descriptions, and it parses them and validates their format.

To sort a list, there's the built-in sorted(my_list) function, which returns a new sorted list.
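
As a small illustration of sorted, here is a sketch that orders some (source, dest, total) rows. The sample rows and the helper name ip_key are made up for the example. Plain string comparison puts "192..." before "66...", so the sketch also shows one way to get numeric order by converting each address with the standard ipaddress module.

```python
import ipaddress

# Hypothetical sample rows: (source, destination, total length).
rows = [
    ("66.185.85.146", "192.168.0.15", 2896),
    ("192.168.0.15", "66.185.85.146", 0),
    ("9.8.7.6", "192.168.0.15", 10),
]

def ip_key(row):
    """Sort key: compare source, then destination, as numeric addresses."""
    source, dest, _total = row
    return (ipaddress.ip_address(source), ipaddress.ip_address(dest))

by_text = sorted(rows)                 # plain string comparison
by_address = sorted(rows, key=ip_key)  # numeric comparison of the addresses

for source, dest, total in by_address:
    print("source:", source, "dest:", dest, "total:", total)
```

Whether text order or numeric order is the right one depends on what the -s option is supposed to mean, which the question leaves open.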

Finally, to ensure there are no duplicates you can use a set. This loses the list ordering, but since you are sorting it later it shouldn't be a problem.
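
A quick sketch of what a set does here, using made-up sample pairs: it collapses repeated (source, dest) pairs into one entry, but it only records that a pair was seen, so it cannot add up the lengths by itself.

```python
# Hypothetical (source, destination) pairs, as they might come out of the log.
pairs = [
    ("66.185.85.146", "192.168.0.15"),
    ("66.185.85.146", "192.168.0.15"),
    ("192.168.0.15", "66.185.85.146"),
]

unique_pairs = set(pairs)       # duplicates collapse to a single entry
ordered = sorted(unique_pairs)  # sets are unordered, so sort before printing

for source, dest in ordered:
    print("source:", source, "dest:", dest)
```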

Alternatively, there's collections.Counter, which can add up values per key and return them ordered by total:

import re
import sys
from collections import Counter

# Capture source IP, destination IP and length; the port after each IP is skipped.
pattern = re.compile(r'IP ((?:\d{1,3}\.){3}\d{1,3})\.\d+ > ((?:\d{1,3}\.){3}\d{1,3})\.\d+:.*length (\d+)$')

results = Counter()

with open(sys.argv[1]) as a:
    for line in a:
        match = pattern.search(line.rstrip())
        if match:
            source, destination, length = match.groups()
            # Duplicate (source, destination) pairs are summed under one key.
            results[(source, destination)] += int(length)

# Print the items, largest total first.
for (source, destination), length in results.most_common():
    print("source:", source, "\t", "dest:", destination, "\t", "total:", length)
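
To connect this with the -s option from the question, here is one possible sketch of the parsing, summing and filtering steps as functions. The names parse_line and summarize, the only_ip parameter, and the reading of "-s ip" as "keep only pairs where that IP is the source or the destination" are all assumptions; the question does not say exactly what the option should do. The regex assumes the tcpdump-style layout shown in the question.

```python
import re
from collections import Counter

# Assumed layout: "... IP <src>.<port> > <dst>.<port>: ... length <n>"
LINE_RE = re.compile(
    r"IP (\d{1,3}(?:\.\d{1,3}){3})\.\d+ > "
    r"(\d{1,3}(?:\.\d{1,3}){3})\.\d+:.*\blength (\d+)$"
)

def parse_line(line):
    """Return (source, dest, length) or None if the line doesn't match."""
    match = LINE_RE.search(line.rstrip())
    if match is None:
        return None
    source, dest, length = match.groups()
    return source, dest, int(length)

def summarize(lines, only_ip=None):
    """Sum lengths per (source, dest); optionally keep pairs involving only_ip."""
    totals = Counter()
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        source, dest, length = parsed
        if only_ip is None or only_ip in (source, dest):
            totals[(source, dest)] += length
    return sorted(totals.items())  # ordered by (source, dest) as text

# Lines copied from the sample in the question.
sample = [
    "17:18:42.525964 IP 66.185.85.146.80 > 192.168.0.15.34436: Flags [.], seq 1:1449, ack 2555, win 1320, options [nop,nop,TS val 3551057710 ecr 43002332], length 1448",
    "17:18:42.526623 IP 66.185.85.146.80 > 192.168.0.15.34436: Flags [.], seq 1449:2897, ack 2555, win 1320, options [nop,nop,TS val 3551057710 ecr 43002332], length 1448",
    "17:18:42.526900 IP 192.168.0.15.34436 > 66.185.85.146.80: Flags [.], ack 2897, win 1444, options [nop,nop,TS val 43002448 ecr 3551057710], length 0",
    "17:18:42.545191 IP 192.168.0.15.60030 > 52.2.63.29.80: Flags [.], seq 4113773418:4113774866, ack 850072640, win 270, options [nop,nop,TS val 43002452 ecr 9849626], length 1448",
]

for (source, dest), total in summarize(sample, only_ip="66.185.85.146"):
    print("source:", source, "dest:", dest, "total:", total)
```

In a script, the lines would come from the opened file and only_ip from the parsed -s argument instead of the hard-coded sample.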
BoppreH answered Oct 27 '22 02:10