 

Python: How to ignore #comment lines when reading in a file

In Python, I have just read a line from a text file, and I'd like to know how to write code that ignores comments with a hash (#) at the beginning of the line.

I think it should be something like this:

    for
        if line !contain #
            then ...process line
        else end for loop

But I'm new to Python and I don't know the syntax.

asked Nov 10 '09 at 07:11 by John




2 Answers

You can use startswith(). For example:

    with open("file") as f:
        for line in f:
            li = line.strip()
            # skip lines whose first non-blank character is '#'
            if not li.startswith("#"):
                print(line.rstrip())
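As a minimal sketch (the helper name non_comment_lines is hypothetical, not part of the answer), the same startswith() test can be wrapped in a generator that accepts any iterable of lines, so it can be tried on an in-memory io.StringIO instead of a real file. Because the test runs on the stripped line, indented comments are skipped too; this version additionally drops blank lines, which the loop above would still print.

```python
import io


def non_comment_lines(lines):
    """Yield lines (trailing whitespace removed) that are neither blank nor #-comments.

    Hypothetical helper: the startswith() test from the answer, plus blank-line skipping.
    """
    for line in lines:
        stripped = line.strip()
        # skip blank lines and lines whose first non-blank character is '#'
        if stripped and not stripped.startswith("#"):
            yield line.rstrip()


sample = io.StringIO("# header\nalpha\n    # indented comment\n\nbeta  # trailing\n")
for line in non_comment_lines(sample):
    print(line)
```

Note that "beta" keeps its trailing "# trailing" text: startswith() only catches whole-line comments.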
answered Sep 22 '22 at 21:09 by ghostdog74


I recommend that you don't ignore the whole line when you see a # character; just ignore the rest of the line. You can do that easily with the string method partition():

    with open("filename") as f:
        for line in f:
            line = line.partition('#')[0]
            line = line.rstrip()
            # ... do something with line ...

partition() returns a 3-tuple: everything before the separator, the separator itself, and everything after it. If the separator is not found, you get the whole string followed by two empty strings. So by indexing with [0] we take just the part before the '#', which is the whole line when there is no comment.
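A quick way to see those three pieces is to call partition() on a couple of sample strings (the strings are made up for illustration):

```python
# str.partition splits at the first occurrence of the separator
with_comment = "x = 1  # set x".partition("#")
no_comment = "y = 2".partition("#")

print(with_comment)  # ('x = 1  ', '#', ' set x')
print(no_comment)    # ('y = 2', '', '')

# taking [0] and stripping trailing whitespace gives the code part in both cases
print(with_comment[0].rstrip())  # x = 1
print(no_comment[0].rstrip())    # y = 2
```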

EDIT: If you are using a version of Python that doesn't have partition() (it was added in Python 2.5), here is code you could use:

    with open("filename") as f:
        for line in f:
            line = line.split('#', 1)[0]
            line = line.rstrip()
            # ... do something with line ...

This splits the string on a '#' character, then keeps everything before the split. The 1 argument makes the .split() method stop after one split; since we are just grabbing the 0th substring (by indexing with [0]), you would get the same answer without the 1 argument, but this might be a little bit faster. (Simplified from my original, needlessly messy code thanks to a comment from @gnr.)
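To illustrate the point about the 1 argument (the sample line is made up): with maxsplit set to 1 the split stops at the first '#', but element [0] is identical either way.

```python
line = "value = 3  # first # second"

full = line.split("#")     # splits at every '#'
once = line.split("#", 1)  # stops after the first '#'

print(full)  # ['value = 3  ', ' first ', ' second']
print(once)  # ['value = 3  ', ' first # second']

# the part before the first '#' is the same either way
assert full[0] == once[0]
print(once[0].rstrip())  # value = 3
```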

You could also just write your own version of partition(). Here is one called part():

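    def part(s, s_part):
        i0 = s.find(s_part)
        if i0 == -1:
            # separator not found: mimic str.partition and return the whole string first
            return (s, '', '')
        i1 = i0 + len(s_part)
        return (s[:i0], s[i0:i1], s[i1:])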
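If all you need is the text before the comment marker, a find()-based helper is enough. This is a minimal sketch; the name strip_comment is hypothetical. The one subtlety is that find() returns -1 when the marker is missing, so that case needs its own branch (slicing with [:-1] would silently drop the last character). The last sample also shows the limitation discussed next: a '#' inside quotes is treated as a comment.

```python
def strip_comment(line, marker="#"):
    """Return the part of `line` before the first `marker` (hypothetical helper)."""
    i = line.find(marker)
    # find() returns -1 when the marker is absent; keep the whole line then
    return line if i == -1 else line[:i]


samples = ["a = 1  # note", "b = 2", "# whole line", "c = '#'"]
for s in samples:
    # agrees with str.partition(...)[0] on every sample
    assert strip_comment(s) == s.partition("#")[0]
    print(repr(strip_comment(s)))
```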

@dalle noted that '#' can appear inside a quoted string. Handling that case correctly isn't easy, so I just ignored it, but I should have said something.

If your input file has simple enough rules for quoted strings, this isn't hard. It would be hard if you accepted any legal Python quoted string, because there are single-quoted strings, double-quoted strings, multiline strings with a backslash escaping the end-of-line, triple-quoted strings (using either single or double quotes), and even raw strings! Handling all of that correctly would require a complicated state machine.
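If the input really is Python source code, one alternative (not part of this answer) is the standard library's tokenize module, which already implements all of Python's string syntax and reports each comment as a COMMENT token with its position. A minimal sketch, assuming the text is valid, complete Python; the function name strip_python_comments is made up:

```python
import io
import tokenize


def strip_python_comments(source):
    """Return the lines of `source` with Python comments removed.

    Hypothetical helper; tokenize raises an error on malformed source.
    """
    # split lines the same way tokenize will read them (one readline() per row)
    lines = [ln.rstrip("\n") for ln in io.StringIO(source).readlines()]
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            row, col = tok.start  # rows are 1-based, columns 0-based
            lines[row - 1] = lines[row - 1][:col].rstrip()
    return lines


code = 'x = "a # b"  # real comment\n# whole-line comment\ny = 1\n'
for line in strip_python_comments(code):
    print(repr(line))
```

The '#' inside the string literal survives because tokenize sees it as part of a STRING token, not a COMMENT.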

But if we limit ourselves to just a simple quoted string, we can handle it with a simple state machine. We can even allow a backslash-escaped double quote inside the string.

    c_backslash = '\\'
    c_dquote = '"'
    c_comment = '#'


    def chop_comment(line):
        # a little state machine with two state variables:
        in_quote = False  # whether we are in a quoted string right now
        backslash_escape = False  # true if we just saw a backslash

        for i, ch in enumerate(line):
            if not in_quote and ch == c_comment:
                # not in a quote, saw a '#', it's a comment.  Chop it and return!
                return line[:i]
            elif backslash_escape:
                # we must have just seen a backslash; reset that flag and continue
                backslash_escape = False
            elif in_quote and ch == c_backslash:
                # we are in a quote and we see a backslash; escape next char
                backslash_escape = True
            elif ch == c_dquote:
                in_quote = not in_quote

        return line

I didn't really want to get this complicated in a question tagged "beginner", but this state machine is reasonably simple, and I hope it will be interesting.
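If your input also allows single-quoted strings, one possible extension (hypothetical, not part of the answer above) is to remember which quote character opened the current string instead of a simple in/out flag. A '"' inside a '...' string (or vice versa) then no longer ends the string. This is a sketch under that assumption; the name chop_comment_sq is made up.

```python
def chop_comment_sq(line, comment="#"):
    """Chop a trailing comment, respecting '...' and "..." strings (hypothetical helper)."""
    quote = None     # the quote character that opened the current string, or None
    escaped = False  # True right after a backslash inside a string

    for i, ch in enumerate(line):
        if quote is None:
            if ch == comment:
                # outside any string: this is the start of a comment
                return line[:i]
            if ch in ("'", '"'):
                quote = ch  # entering a string
        elif escaped:
            escaped = False  # this character was escaped; ignore it
        elif ch == "\\":
            escaped = True   # escape the next character
        elif ch == quote:
            quote = None     # the matching quote closes the string

    return line


print(repr(chop_comment_sq("x = 'a # b'  # c")))
print(repr(chop_comment_sq('s = "it\'s # ok" # c')))
```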

answered Sep 22 '22 at 21:09 by steveha