
Python - Using regex to find multiple matches and print them out [duplicate]

Tags: python, regex

I need to extract the contents of forms from an HTML source file. After some searching I found a method that works well, but it prints only the first match. How can I loop over the matches and output the contents of every form, not just the first one?

    import re

    line = 'bla bla bla<form>Form 1</form> some text...<form>Form 2</form> more text?'
    matchObj = re.search('<form>(.*?)</form>', line, re.S)
    print(matchObj.group(1))
    # Output: Form 1
    # I need it to output the content of every form it found, not just the first one...
Stan asked Oct 11 '11 11:10




1 Answer

Do not use regular expressions to parse HTML.
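One way to follow that advice is sketched below, using the standard library's html.parser module. The class name FormTextCollector and its attributes are made up for illustration. It collects only the text inside each form, not nested markup, and assumes forms are not nested.

```python
from html.parser import HTMLParser


class FormTextCollector(HTMLParser):
    """Hypothetical helper: collects the text inside each <form> element."""

    def __init__(self):
        super().__init__()
        self.forms = []        # one string per completed form
        self._inside = False   # True while between <form> and </form>
        self._buffer = []      # text pieces of the form being read

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lowercase
        if tag == "form":
            self._inside = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "form" and self._inside:
            self.forms.append("".join(self._buffer))
            self._inside = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so accumulate it
        if self._inside:
            self._buffer.append(data)


line = 'bla bla bla<form>Form 1</form> some text...<form>Form 2</form> more text?'
collector = FormTextCollector()
collector.feed(line)
collector.close()
print(collector.forms)  # ['Form 1', 'Form 2']
```

Unlike a regex, the parser also copes with attributes on the form tag (for example `<form action="...">`) without any change to the code.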

But if you ever need to find all regexp matches in a string, use the findall function.

    import re

    line = 'bla bla bla<form>Form 1</form> some text...<form>Form 2</form> more text?'
    matches = re.findall('<form>(.*?)</form>', line, re.DOTALL)
    print(matches)  # Output: ['Form 1', 'Form 2']
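If you want to loop over the matches one at a time, as the question asks, re.finditer is a possible variant. It yields match objects instead of strings, so the position of each match is available as well. This is a minimal sketch on the same sample string; the variable names are illustrative.

```python
import re

line = 'bla bla bla<form>Form 1</form> some text...<form>Form 2</form> more text?'

# Compile once; re.DOTALL lets "." also match newlines inside a form
pattern = re.compile(r'<form>(.*?)</form>', re.DOTALL)

form_contents = []
for match in pattern.finditer(line):
    # group(1) is the captured form content, start() is the match offset
    form_contents.append(match.group(1))
    print(match.start(), match.group(1))
```

With a single capturing group, re.findall returns the same strings as collecting group(1) here; finditer is mainly worth it when you need offsets or several groups per match.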
Petr Viktorin answered Sep 29 '22 09:09