I'm trying to match strings in the lines of a file and write each match, minus its first and last characters, to another file:
import os, re
infile=open("~/infile", "r")
out=open("~/out", "w")
pattern=re.compile("=[A-Z0-9]*>")
for line in infile:
    out.write( pattern.search(line)[1:-1] + '\n' )
The problem is that it says Match is not subscriptable. When I try to add .group(), it says that NoneType has no attribute group; with .groups(), .write complains about getting a tuple, etc.
Any idea how to get .search to return a string?
While re.findall() returns all matches of a pattern found anywhere in a text, re.match() only matches at the beginning of a string and returns a match object if it succeeds. If the pattern occurs only somewhere in the middle of the string, re.match() returns None.
re.search() finds the first match for a pattern. re.findall() finds *all* the matches and returns them as a list of strings, with each string representing one match (this holds when the pattern has at most one capturing group).
The re.search() function takes a pattern and a string (plus optional flags) and returns a match object if there is a match. If there is more than one match, only the first occurrence is returned. If no match is found, None is returned.
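As a rough illustration of the differences described above, the sketch below runs re.match, re.search and re.findall on the same string. The sample text and variable names are made up for the example; only the `re` calls come from the standard library.

```python
import re

text = "x=AB1> y=CD2>"          # hypothetical sample line
pattern = r"=[A-Z0-9]+>"

# re.match only succeeds if the pattern matches at position 0.
at_start = re.match(pattern, text)            # None: text starts with "x"

# re.search scans the string and returns a Match for the first hit.
first = re.search(pattern, text)
first_text = first.group(0) if first else None

# re.findall returns every non-overlapping match as a list of strings.
all_texts = re.findall(pattern, text)

# With no match, search gives None while findall gives an empty list.
nothing = re.search(pattern, "no tags here")
empty = re.findall(pattern, "no tags here")

print(at_start, first_text, all_texts, nothing, empty)
```

Note that only re.findall hands back plain strings directly; with re.search the text has to be pulled out of the Match object, and only after checking that it is not None.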
The re.search function returns a Match object. If the match fails, the re.search function will return None. To extract the matching text, use the Match.group method.
>>> import re
>>> match = re.search("a.", "abc")
>>> if match is not None:
...     print(match.group(0))
...
ab
>>> print(re.search("a.", "a"))
None
That said, it's probably a better idea to use groups to find the required section of the match:
>>> match = re.search("=([A-Z0-9]*)>", "=ABC>")  # Note the parentheses
>>> match.group(0)
'=ABC>'
>>> match.group(1)
'ABC'
This regex can then be used with findall as @WiktorStribiżew suggests.
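Putting the None check and the capturing group together, a small helper might look like the sketch below. The function name `extract_first` and the sample lines are hypothetical, chosen only for illustration, and the pattern uses + rather than * to skip empty matches.

```python
import re

PATTERN = re.compile(r"=([A-Z0-9]+)>")

def extract_first(line):
    """Return the text between '=' and '>' for the first match, or None."""
    match = PATTERN.search(line)
    if match is None:
        # search() found nothing, so there is no Match object to call .group() on
        return None
    return match.group(1)   # group(1) is the part inside the parentheses

# Hypothetical sample lines: one match, no match, two matches.
lines = ["id=ABC123> ok", "nothing to see", "a=X1> b=Y2>"]
results = [extract_first(line) for line in lines]
print(results)
```

Because re.search stops at the first hit, the second match on the last sample line is ignored; re.findall is the tool when every match on a line is wanted.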
You seem to need only the part of the strings between = and >. In this case, it is much easier to put a capturing group around the alphanumeric pattern and use it with re.findall, which never returns None, just an empty list when there is no match or a list of the captured texts otherwise. Also, I doubt you need empty matches, so use + instead of *:
pattern=re.compile(r"=([A-Z0-9]+)>")
                      ^         ^
and then
"\n".join(pattern.findall(line))
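For completeness, here is one way the whole loop from the question could be put together with findall. This is only a sketch: the function name `write_matches`, the temporary files and the sample data are assumptions made so the example runs on its own, not part of the original script.

```python
import os
import re
import tempfile

pattern = re.compile(r"=([A-Z0-9]+)>")

def write_matches(in_path, out_path):
    """Write every captured text from in_path to out_path, one per line."""
    with open(in_path, "r") as infile, open(out_path, "w") as out:
        for line in infile:
            found = pattern.findall(line)   # [] when the line has no match
            if found:
                out.write("\n".join(found) + "\n")

# Demonstration with throwaway files and made-up sample data.
workdir = tempfile.mkdtemp()
in_path = os.path.join(workdir, "infile")
out_path = os.path.join(workdir, "out")
with open(in_path, "w") as f:
    f.write("a=ABC> b=123>\nno match here\nc=X9>\n")

write_matches(in_path, out_path)
with open(out_path) as f:
    written = f.read()
print(written)
```

One side note on the paths in the question: open() does not expand "~" by itself, so something like os.path.expanduser("~/infile") would be needed to read from the home directory.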