How do I search for a pattern within a text file using Python combining regex & string/file operations and store instances of the pattern?

Tags: python, regex, string-parsing, file-io, text-mining

Essentially, I'm looking for a 4-digit code enclosed in angle brackets within a text file. I know that I need to open the text file and then parse it line by line, but I am not sure of the best way to structure my code after "for line in file".

I think I can either split it, strip it, or partition it, but I also wrote a regex and called compile on it, and if that returns a match object I don't think I can use it with those string-based operations. I'm also not sure whether my regex is greedy enough...

I'd like to store all instances of those found hits as strings within either a tuple or a list.

Here is my regex:

regex = re.compile("(<(\d{4,5})>)?")

I don't think I need to include much more code, considering it's fairly basic so far.

asked May 07 '12 05:05

Carl Carlson

2 Answers

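import re

# Raw string so \d is not read as a string escape; group 1 captures only the digits
pattern = re.compile(r"<(\d{4,5})>")

with open('test.txt') as f:
    for i, line in enumerate(f, start=1):
        for match in pattern.finditer(line):
            # group(1) is the number alone; group(0) would include the angle brackets
            print('Found on line %s: %s' % (i, match.group(1)))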

A couple of notes about the regex:

You don't need the ? at the end or the outer (...) if you want only the number itself rather than the number together with its angle brackets.
It matches either 4 or 5 digits between the angle brackets; use \d{4} instead if you want exactly 4.
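
To see why the trailing ? matters, here is a small sketch comparing the two patterns. The sample string and the variable names are made up for illustration; they are not from the question.

```python
import re

sample = "id <1234> and <56789>, not <12> or <abc>"  # hypothetical sample text

# Original pattern: the trailing ? makes the whole group optional, so the
# regex also matches the empty string at positions where there is no code.
optional = re.compile(r"(<(\d{4,5})>)?")
# Simplified pattern: only real hits match, and group 1 is the number alone.
strict = re.compile(r"<(\d{4,5})>")

optional_hits = optional.findall(sample)  # tuples, most of them ('', '')
strict_hits = strict.findall(sample)      # plain strings, digits only

print(len(optional_hits), len(strict_hits))
print(strict_hits)
```

With the optional pattern the real hits are buried among empty tuples, which is usually not what you want when collecting results into a list.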

Update: It's important to understand that the match and the capture in a regex can be quite different. The regex in my snippet above matches the whole pattern, angle brackets included, but captures only the number inside them.

More about regex in Python can be found in the Regular Expression HOWTO.
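
A minimal sketch of that difference, which also stores the hits in lists as the question asks. The input line and the list names are assumptions for illustration only.

```python
import re

pattern = re.compile(r"<(\d{4,5})>")
line = "start <1234> middle <56789> end"  # hypothetical line of input

whole_matches = []  # what the regex matched, brackets included
captured = []       # what group 1 captured, digits only
for m in pattern.finditer(line):
    whole_matches.append(m.group(0))
    captured.append(m.group(1))

print(whole_matches)
print(captured)
```

If a tuple is preferred over a list, tuple(captured) converts it once the loop is done.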

answered Oct 06 '22 01:10

Eli Bendersky

Doing it in one bulk read:

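import re

# filename is assumed to be defined elsewhere
with open(filename, 'r') as textfile:
    filetext = textfile.read()
# Trailing ? dropped so that findall returns only real hits, not empty matches
matches = re.findall(r"<(\d{4,5})>", filetext)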

Line by line:

import re

matches = []
# Trailing ? dropped so that findall returns only real hits, not empty matches
reg = re.compile(r"<(\d{4,5})>")
with open(filename, 'r') as textfile:
    for line in textfile:
        matches += reg.findall(line)

Note that the matches returned this way carry no position information, so they are of little use beyond collecting or counting the values unless you also keep an offset counter:

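import re

matches = []
offset = 0  # character offset of the current line's start within the file
reg = re.compile(r"<(\d{4,5})>")
with open(filename, 'r') as textfile:
    for line in textfile:
        # Pair this line's hits with the offset at which the line begins
        matches.append((reg.findall(line), offset))
        offset += len(line)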

Even so, it usually makes more sense to read the whole file in at once, provided it fits comfortably in memory.
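
Reading the whole file at once also makes per-match offsets easy to recover, since each match object reports its own start position. The following is only a sketch: it writes a throwaway temporary file so that it runs on its own, and the file contents and the name hits are assumptions for illustration.

```python
import os
import re
import tempfile

pattern = re.compile(r"<(\d{4,5})>")

# Write a small throwaway file so the sketch is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("first <1234>\nsecond <56789> and <4321>\n")
    filename = tmp.name

with open(filename, "r") as textfile:
    filetext = textfile.read()
os.remove(filename)

# Each entry is (captured digits, character offset of the match in the text).
hits = [(m.group(1), m.start()) for m in pattern.finditer(filetext)]
print(hits)
```

Here m.start() gives the offset of the opening angle bracket; m.start(1) would give the offset of the first digit instead.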

answered Oct 06 '22 01:10

Josiah
