I have a file that contains something like
# comment
# comment
not a comment# comment
# comment
not a comment
I'm trying to read the file line by line and capture only the lines that do not start with #. What is wrong with my code/regex?
import re

def read_file():
    pattern = re.compile("^(?<!# ).*")
    with open('list') as f:
        for line in f:
            print(pattern.findall(line))
The original code captures every line instead of only the non-comment ones.
Use the match function here, since it only checks at the beginning of the string. The expression becomes \s*[^#]; the \s* is there to skip leading whitespace.
The OP's code then becomes:
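import re

def read_file():
    # Optional leading whitespace, then a first character that is not '#'.
    pattern = re.compile(r"\s*[^#]")
    with open(r"C:\test.txt") as f:
        for line in f:
            # match() only looks at the start of the line.
            if pattern.match(line):
                print(line)

read_file()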
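To try the check without a file on disk, here is a rough sketch that runs the same pattern over an in-memory list copied from the question. The helper name `non_comment_lines` and the `STRICT_PATTERN` variant are my own assumptions, not part of the answer. One caveat worth knowing: `\s*` can backtrack, so `\s*[^#]` still accepts an indented comment such as `"  # x"` (the `[^#]` ends up matching a space). The stricter `\s*[^#\s]` avoids that, at the cost of also dropping blank lines.

```python
import re

# Pattern from the answer: optional whitespace, then one character that is not '#'.
ANSWER_PATTERN = re.compile(r"\s*[^#]")
# Stricter variant (an assumption, not from the answer): the first
# non-whitespace character must not be '#'. Blank lines are dropped too.
STRICT_PATTERN = re.compile(r"\s*[^#\s]")


def non_comment_lines(lines, pattern=ANSWER_PATTERN):
    """Return the lines that `pattern` matches at their start."""
    return [line for line in lines if pattern.match(line)]


# Sample taken from the question.
sample = [
    "# comment\n",
    "# comment\n",
    "not a comment# comment\n",
    "# comment\n",
    "not a comment\n",
]

kept = non_comment_lines(sample)
indented = ["  # indented comment\n"]
```

On the sample from the question both patterns keep the same two lines; they only differ on indented comments and blank lines.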
EDIT:
A short explanation of why the OP's pattern does not work:
. matches any character except a line break, and that includes #. In ^(?<!# ).* the negative lookbehind (?<!# ) is evaluated at the start of the line, where nothing precedes the current position, so it always succeeds. .* then matches the rest of the line whatever its first character is, which is why every line is captured.
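A quick way to see this for yourself is to run the OP's pattern on a comment line and on a normal line. The lookahead variant at the end is an alternative I am adding for comparison, not something from the answer: a lookahead inspects what follows the current position, which is what the OP presumably wanted.

```python
import re

# The pattern from the question.
original = re.compile(r"^(?<!# ).*")

# At position 0 nothing precedes the match, so the negative lookbehind
# cannot fail, and '.*' then consumes the whole line, '#' included.
comment_result = original.findall("# comment\n")
plain_result = original.findall("not a comment\n")

# Alternative (an assumption, not from the answer): a negative lookahead
# checks the character that follows the start of the line.
lookahead = re.compile(r"^(?!#).*")
lookahead_comment = lookahead.findall("# comment\n")
lookahead_plain = lookahead.findall("not a comment\n")
```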
Solution:
Use a negated character class, such as ^(?<!# )[^#]. The lookbehind adds nothing at the start of a line, so ^[^#] is equivalent.
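As a small sanity check, the sketch below compares the pattern with and without the lookbehind on a few lines from the question. The variable names are just for illustration. Note that this pattern only consumes one character, so it is better used as a yes/no test with match than with findall, which would return just the first character of the line.

```python
import re

with_lookbehind = re.compile(r"^(?<!# )[^#]")
without_lookbehind = re.compile(r"^[^#]")

lines = ["# comment\n", "not a comment# comment\n", "not a comment\n"]

# True where the line does not start with '#'.
flags_with = [bool(with_lookbehind.match(line)) for line in lines]
flags_without = [bool(without_lookbehind.match(line)) for line in lines]

# findall() returns only what the pattern consumed: a single character.
first_char = without_lookbehind.findall("not a comment\n")
```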