
Python Regex to match a string as a pattern and return number

I have some lines that represent some data in a text file. They are all of the following format:

s = 'TheBears      SUCCESS Number of wins : 14'

Each line begins with the name, followed by whitespace, the text 'SUCCESS Number of wins : ', and finally the number of wins, n1. There are multiple strings, each with a different name and value. I am trying to write a program that can parse any of these strings and return the name of the dataset and the numerical value at the end of the string. I am trying to use regular expressions to do this, and I have come up with the following:

import re
def winnumbers(s):
    pattern = re.compile(r"""(?P<name>.*?)     #starting name
                             \s*SUCCESS        #whitespace and success
                             \s*Number\s*of\s*wins  #whitespace and strings
                             \s*\:\s*(?P<n1>.*?)""",re.VERBOSE)
    match = pattern.match(s)

    name = match.group("name")
    n1 = match.group("n1")

    return (name, n1)

So far, my program can return the name, but the trouble comes after that. They all have the text "SUCCESS Number of wins : " so my thinking was to find a way to match this text. But I realize that my method of matching an exact substring isn't correct right now. Is there any way to match a whole substring as part of the pattern? I have been reading quite a bit on regular expressions lately but haven't found anything like this. I'm still really new to programming and I appreciate any assistance.

Eventually, I will use float() to return n1 as a number, but I left that out because it doesn't properly find the number in the first place right now and would only return an error.

Simos Anderson asked Jun 16 '11 19:06



1 Answer

Try this one out:

((\S+)\s+SUCCESS Number of wins : (\d+))

These are the results:

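>>> import re
>>> # sample input, inferred from the output shown below
>>> string = u'TheBears SUCCESS Number of wins : 14'
>>> regex = re.compile(r"((\S+)\s+SUCCESS Number of wins : (\d+))")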
>>> r = regex.search(string)
>>> r
<_sre.SRE_Match object at 0xc827cf478a56b350>
>>> regex.match(string)
<_sre.SRE_Match object at 0xc827cf478a56b228>

# List the groups found
>>> r.groups()
(u'TheBears SUCCESS Number of wins : 14', u'TheBears', u'14')

# List the named dictionary objects found
>>> r.groupdict()
{}

# Run findall
>>> regex.findall(string)
[(u'TheBears SUCCESS Number of wins : 14', u'TheBears', u'14')]

# So you can do this for the name and number:
>>> fullstring, name, number = r.groups()

If you don't need the full string, just remove the surrounding parentheses.
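
For completeness, here is a minimal sketch of how the question's winnumbers function could be adapted to use the same idea while keeping the named groups. The reason the original returned nothing for n1 is that a lazy `.*?` at the very end of a pattern is satisfied by an empty match. The sketch assumes that names never contain whitespace and that the number of wins is a plain run of digits; WINS_PATTERN is just an illustrative name, and returning None for a non-matching line is one possible choice, not the only one.

```python
import re

# Named groups mirror the question's names ("name" and "n1").
# Assumption: the name has no whitespace and the count is digits only.
WINS_PATTERN = re.compile(
    r"""(?P<name>\S+)          # dataset name: a run of non-whitespace
        \s+SUCCESS             # whitespace, then the literal word SUCCESS
        \s+Number\s+of\s+wins  # the fixed text, tolerant of extra spaces
        \s*:\s*                # colon with optional surrounding whitespace
        (?P<n1>\d+)            # number of wins: one or more digits
    """,
    re.VERBOSE,
)


def winnumbers(s):
    """Return (name, wins) for a matching line, or None if it does not match."""
    match = WINS_PATTERN.match(s)
    if match is None:
        return None
    # float() is applied here, as the question planned to do eventually.
    return match.group("name"), float(match.group("n1"))


print(winnumbers("TheBears      SUCCESS Number of wins : 14"))
```

With re.VERBOSE, literal spaces in the pattern are ignored, which is why the fixed text is spelled out with `\s+` between the words.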

fijter answered Sep 25 '22 03:09
