
Python: Checking Header Format

I'm new to Python and need help with a problem. I need to open a file and read it, which I can do without trouble. The problem arises at line 0 (the first line of the file), where I need to check the header format.

The header needs to be in the format p wncf nvar nclauses hard, where nvar, nclauses and hard are all positive integers.

For example:

p wncf 1563 817439 186191

would be a valid header line.

Here is the code I have so far, thanks to answers to an earlier question:

import re 
filename = raw_input('Please enter the name of the WNCF file: ') 
f = open(filename, 'r') 

for line in f: 
    p = re.compile('p wncf \d+ \d+ \d+$') 
    if p.match(line[0]) == None: 
        print "incorrect format"

I still get "incorrect format" even when the file is in the correct format. Also, would it be possible to assign the integers to an object?

Thanks in advance.

harpalss asked Nov 18 '25 20:11


2 Answers

Alright, a few things.

  1. You only need to compile your regular expression once. In the example you gave above, you're recompiling it for every line in the file.

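  2. line[0] is just the first character of each line, so the pattern can never match it. Replace line[0] with line and the header line will match. Note that the loop still tests every line of the file, so any non-header line will also print "incorrect format"; to validate only the header, test just the first line.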
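
The two points above can be sketched roughly as follows. This is only an illustration, not the asker's script: the names HEADER_RE and is_valid_header are made up here, and print is called with parentheses so the snippet runs on Python 3 (the thread itself uses Python 2 syntax).

```python
import re

# Compile once, outside any loop. HEADER_RE is an illustrative name.
HEADER_RE = re.compile(r'p wncf \d+ \d+ \d+$')

def is_valid_header(line):
    """Return True if the whole line looks like 'p wncf nvar nclauses hard'."""
    # Drop the trailing newline (and a possible '\r') before matching
    return HEADER_RE.match(line.rstrip('\r\n')) is not None

header = 'p wncf 1563 817439 186191\n'

print(is_valid_header(header))     # True: the full line is tested
print(is_valid_header(header[0]))  # False: header[0] is just the character 'p'
```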

To assign the integers to an object, surround the parts of the pattern you want to capture in parentheses, which turns them into groups. In your case, use

p = re.compile(r"p wncf (\d+) (\d+) (\d+)")

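And instead of p.match(line), which returns a match object or None, you could use findall, which returns a list of the captured groups (a list of tuples when the pattern has more than one group). Check out the following as a replacement for what you have.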

p = re.compile(r"p wncf (\d+) (\d+) (\d+)") 
for line in f: 
    matches = p.findall(line)
    if len(matches) != 0:
        print matches[0][0], matches[0][1], matches[0][2]
    else:
        print "No matches."
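
If the goal is to keep the three values as integers on a single object, one possible approach is match() plus groups() and a namedtuple. This is a sketch under assumptions: Header and parse_header are hypothetical names that do not appear in the thread, the pattern is anchored with $ so trailing text is rejected, and the code is written for Python 3.

```python
import re
from collections import namedtuple

# Hypothetical container for the three header values
Header = namedtuple('Header', ['nvar', 'nclauses', 'hard'])

HEADER_RE = re.compile(r'p wncf (\d+) (\d+) (\d+)$')

def parse_header(line):
    """Return a Header of ints, or None if the line is not a valid header."""
    m = HEADER_RE.match(line.rstrip('\r\n'))
    if m is None:
        return None
    # groups() yields the three captured strings; convert each to int
    return Header(*(int(g) for g in m.groups()))

header = parse_header('p wncf 1563 817439 186191\n')
print(header.nvar, header.nclauses, header.hard)
```

The captured groups are strings, so the int() conversion is what makes the values usable as numbers afterwards.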

Edit: If your header values can contain negative numbers as well, you should replace r"p wncf (\d+) (\d+) (\d+)" with r"p wncf (-?\d+) (-?\d+) (-?\d+)".
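
Since the question says all three values are positive integers, the difference between the two patterns matters mainly for how a bad header gets reported. The sketch below compares them and adds a positivity check; it assumes "positive" means strictly greater than zero, which the thread does not state, and UNSIGNED, SIGNED and positive_values are illustrative names. It is written for Python 3.

```python
import re

UNSIGNED = re.compile(r'p wncf (\d+) (\d+) (\d+)$')
SIGNED = re.compile(r'p wncf (-?\d+) (-?\d+) (-?\d+)$')

def positive_values(line):
    """Return the three ints if all are > 0, else None."""
    m = SIGNED.match(line.rstrip('\r\n'))
    if m is None:
        return None
    values = [int(g) for g in m.groups()]
    # Assumption: zero does not count as a positive integer
    return values if all(v > 0 for v in values) else None

print(UNSIGNED.match('p wncf -1 2 3') is None)    # True: \d+ cannot match '-'
print(SIGNED.match('p wncf -1 2 3') is not None)  # True: -? accepts the sign
print(positive_values('p wncf -1 2 3'))           # None
print(positive_values('p wncf 1 2 3'))            # [1, 2, 3]
```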

dwlz answered Nov 20 '25 09:11



Something like this (where lines is a list of all the lines in the file, in order):

import re
# Only the first line is the header; $ rejects trailing junk after the third number
if re.match(r'p wncf \d+ \d+ \d+$', lines[0]) is None:
    print "Bad format"
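
For completeness, here is one way the lines list might be built from a file and only its first entry checked. It is a sketch, not a definitive implementation: file_has_valid_header is a hypothetical name, the temporary file and its dummy second line exist only to make the example self-contained, and the code targets Python 3.

```python
import os
import re
import tempfile

HEADER_RE = re.compile(r'p wncf \d+ \d+ \d+$')

def file_has_valid_header(path):
    """Check only the first line of the file at path."""
    with open(path, 'r') as f:
        lines = f.readlines()
    # An empty file has no header at all
    if not lines:
        return False
    return HEADER_RE.match(lines[0].rstrip('\r\n')) is not None

# Demo with a throwaway file; the second line is dummy content
fd, path = tempfile.mkstemp(suffix='.wncf')
with os.fdopen(fd, 'w') as tmp:
    tmp.write('p wncf 1563 817439 186191\n1 2 3\n')
try:
    print(file_has_valid_header(path))  # True
finally:
    os.remove(path)
```

Checking lines[0] alone avoids the original problem of every non-header line being reported as badly formatted.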
