 

Python: regex findall

Tags:

python

regex

I am using Python regex to extract certain values from a given string. This is my string:

mystring.txt

sometext
somemore    text here

some  other text

              course: course1
Id              Name                marks
____________________________________________________
1               student1            65
2               student2            75
3               MyName              69
4               student4            43

              course: course2
Id              Name                marks
____________________________________________________
1               student1            84
2               student2            73
8               student7            99
4               student4            32

              course: course4
Id              Name                marks
____________________________________________________
1               student1            97
3               MyName              60
8               student6            82

I need to extract the course name and the corresponding marks for a particular student. For example, I need the courses and marks for MyName from the above string.

I tried:

re.findall(r".*?course: (\w+).*?MyName\s+(\d+).*?", buff, re.DOTALL)

But this works only if MyName is present under every course. It fails when MyName is missing from some of the courses, as in my example string.

Here I get this output: [('course1', '69'), ('course2', '60')]

But what I actually want to achieve is: [('course1', '69'), ('course4', '60')]

What would be the correct regex for this?

#!/usr/bin/python
import re

# Read the whole file into a single string
with open("mystring.txt", "r") as buffer_fp:
    buff = buffer_fp.read()

print(re.findall(r".*?course: (\w+).*?MyName\s+(\d+).*?", buff, re.DOTALL))
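A minimal sketch for reproducing the problem without the file, assuming an abbreviated inline copy of the sample above (SAMPLE and PATTERN are hypothetical names). It shows why the pattern misbehaves: .*? is lazy, but lazy only means "as short as possible while the rest still matches". With re.DOTALL it can cross newlines, so when course2 has no MyName row, the match that starts at course2 simply stretches into the course4 block.

```python
import re

# Abbreviated inline copy of mystring.txt (assumption: same layout as above)
SAMPLE = """\
              course: course1
Id              Name                marks
1               student1            65
3               MyName              69

              course: course2
Id              Name                marks
1               student1            84

              course: course4
Id              Name                marks
1               student1            97
3               MyName              60
"""

# The original pattern from the question
PATTERN = r".*?course: (\w+).*?MyName\s+(\d+).*?"

# re.DOTALL lets "." match newlines, so the lazy ".*?" after course2
# keeps growing past the course4 header until it reaches a MyName row.
result = re.findall(PATTERN, SAMPLE, re.DOTALL)
print(result)  # [('course1', '69'), ('course2', '60')]
```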
Deepa asked Jun 03 '15 06:06


2 Answers

.*?course: (\w+)(?:(?!\bcourse\b).)*MyName\s+(\d+).*?
                ^^^^^^^^^^^^^^^^^^^^

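You can try this; see the demo. Just use a lookahead-based quantifier (the marked part), which only searches for MyName up to the next occurrence of the word course, so MyName is matched only when it appears in the same course block.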

https://regex101.com/r/pG1kU1/26
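A quick Python check of the pattern above, run against an abbreviated copy of the question's sample (SAMPLE and PATTERN are hypothetical names standing in for the file contents). One thing worth knowing: the marked part is greedy, so if MyName appeared twice in one course block, backtracking would capture the last of those rows.

```python
import re

# Abbreviated copy of the question's sample (assumption: same layout)
SAMPLE = """\
              course: course1
Id              Name                marks
1               student1            65
3               MyName              69

              course: course2
Id              Name                marks
1               student1            84

              course: course4
Id              Name                marks
1               student1            97
3               MyName              60
"""

# (?:(?!\bcourse\b).)* consumes one character at a time, but only where the
# whole word "course" does not start, so it cannot run into the next block.
PATTERN = r".*?course: (\w+)(?:(?!\bcourse\b).)*MyName\s+(\d+).*?"

result = re.findall(PATTERN, SAMPLE, re.DOTALL)
print(result)  # [('course1', '69'), ('course4', '60')]
```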

vks answered Oct 04 '22 04:10


I suspect this is impossible to do in a single regular expression. They are not all-powerful.

Even if you find a way, don't do this. Your non-working regex is already close to unreadable; a working solution is likely to be even more so. You can most likely do this in just a few lines of meaningful code. Pseudocode solution:

for line in buff.splitlines():
    if it is a course line:
        set the course variable
    if it is a MyName line:
        add (course, marks) to the list of matches

Note that this could (and probably should) involve regexes in each of those if blocks. It's not a case of choosing between the hammer and the screwdriver to the exclusion of the other, but rather using them both for what they do best.
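A runnable sketch of the pseudocode above, with one small regex per if block. It assumes the layout from the question (a course: header before each table, rows of Id, Name and marks separated by whitespace); marks_for, COURSE_RE and SAMPLE are hypothetical names. re.escape lets the student name be passed in as plain text.

```python
import re

# Abbreviated copy of the question's sample (assumption: same layout)
SAMPLE = """\
              course: course1
Id              Name                marks
1               student1            65
3               MyName              69

              course: course2
Id              Name                marks
1               student1            84

              course: course4
Id              Name                marks
1               student1            97
3               MyName              60
"""

# Matches a header such as "course: course1" anywhere in the line
COURSE_RE = re.compile(r"course:\s*(\w+)")


def marks_for(student, text):
    """Return (course, marks) pairs for every row belonging to student."""
    # Assumed row format: Id, Name, marks separated by whitespace
    row_re = re.compile(r"\s*\d+\s+" + re.escape(student) + r"\s+(\d+)\s*$")
    matches = []
    course = None
    for line in text.splitlines():
        m = COURSE_RE.search(line)
        if m:
            course = m.group(1)  # remember the current course
            continue
        m = row_re.match(line)
        if m:
            matches.append((course, m.group(1)))
    return matches


print(marks_for("MyName", SAMPLE))  # [('course1', '69'), ('course4', '60')]
```

Because the course is tracked in a variable rather than inside the pattern, a missing MyName row in one block cannot leak into the next one.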

Béla answered Oct 04 '22 02:10