I need to extract the contents of forms from an HTML source file. After some searching I found a good method for this, but it only prints the first match. How can I loop through the input and output the contents of every form, not just the first one?
import re

line = 'bla bla bla<form>Form 1</form> some text...<form>Form 2</form> more text?'
matchObj = re.search('<form>(.*?)</form>', line, re.S)
print(matchObj.group(1))
# Output: Form 1
# I need it to output the content of every form it finds, not just the first one.
re.search() returns only the first match of the pattern in the target string.
I made this to find all matches for multiple regular expressions:

regex1 = r"your regex here"
regex2 = r"your regex here"
regex3 = r"your regex here"
regexList = [regex1, regex2, regex3]
for x in regexList:
    if re.findall(x, your_string):
        some_list = re.findall(x, your_string)
        for y in some_list:
            found_regex_list.
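The multi-pattern loop above breaks off before showing what is done with each match. Here is a minimal sketch, assuming the intent was to collect every match from every pattern into one list. The function name `find_all_patterns`, the sample patterns and the sample text are hypothetical and not from the original post.

```python
import re


def find_all_patterns(patterns, text):
    """Collect the matches of every pattern, in pattern order (illustrative helper)."""
    found = []
    for pattern in patterns:
        # re.findall returns an empty list when nothing matches,
        # so no separate "if" check is needed before extending.
        found.extend(re.findall(pattern, text, re.DOTALL))
    return found


# Hypothetical sample input, chosen only for illustration.
sample_patterns = [r"\d+", r"<form>(.*?)</form>"]
sample_text = 'id 42 <form>Form 1</form> id 7 <form>Form 2</form>'
print(find_all_patterns(sample_patterns, sample_text))
# ['42', '1', '7', '2', 'Form 1', 'Form 2']
```

Calling re.findall once per pattern and extending the result avoids scanning the string twice for each pattern.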
re.finditer() works like re.findall(), except that it returns an iterator yielding match objects instead of a list of strings. It scans the string from left to right and yields the matches in the order they are found.
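As a rough illustration of re.finditer() applied to the form example from the question, the sketch below loops over the match objects and reads both the captured text and its position. The helper name `form_matches` is hypothetical.

```python
import re

line = 'bla bla bla<form>Form 1</form> some text...<form>Form 2</form> more text?'


def form_matches(text):
    """Return (content, start, end) for every <form>...</form> match (illustrative helper)."""
    results = []
    for match in re.finditer(r'<form>(.*?)</form>', text, re.DOTALL):
        # group(1) is the captured form content; start()/end() span the whole match.
        results.append((match.group(1), match.start(), match.end()))
    return results


for content, start, end in form_matches(line):
    print(content, start, end)
```

Match objects are useful when the position of each match matters. If only the captured strings are needed, re.findall() is shorter.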
Python regex backreferences: a backreference lets you refer to an earlier capturing group within a regular expression. It is written \N, where N is 1, 2, 3, etc. and identifies the corresponding capturing group. Note that \g<0>, used in a replacement string, refers to the entire match, so it has the same value as the matched text.
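A short sketch of both forms mentioned above: \1 inside a pattern, and \g<0> inside a replacement string passed to re.sub(). The sample strings and the names `repeated_word`, `sentence`, `deduplicated` and `bracketed` are made up for illustration.

```python
import re

# \1 inside the pattern must match the same text that group 1 captured,
# so this pattern finds a word immediately repeated after a single space.
repeated_word = re.compile(r"\b(\w+) \1\b")

sentence = "this is is a test test sentence"
print(repeated_word.findall(sentence))  # ['is', 'test']

# In a replacement string, \1 inserts the text captured by group 1.
deduplicated = repeated_word.sub(r"\1", sentence)
print(deduplicated)  # this is a test sentence

# \g<0> in a replacement string stands for the entire match.
bracketed = re.sub(r"\d+", r"[\g<0>]", "a1 b22")
print(bracketed)  # a[1] b[22]
```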
Do not use regular expressions to parse HTML.
But if you ever need to find all regexp matches in a string, use the findall
function.
import re

line = 'bla bla bla<form>Form 1</form> some text...<form>Form 2</form> more text?'
matches = re.findall('<form>(.*?)</form>', line, re.DOTALL)
print(matches)
# Output: ['Form 1', 'Form 2']
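In line with the advice against parsing HTML with regular expressions, here is one possible alternative using the standard library's html.parser module. This is a sketch, not the answerer's method. The class name `FormTextCollector` and the function `extract_form_text` are hypothetical, and the sketch collects only the text inside each form, not any nested markup.

```python
from html.parser import HTMLParser


class FormTextCollector(HTMLParser):
    """Collect the text content of each <form> element (illustrative sketch)."""

    def __init__(self):
        super().__init__()
        self.forms = []       # finished form contents, in document order
        self._in_form = False
        self._chunks = []     # text pieces of the form currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            self._in_form = True
            self._chunks = []

    def handle_endtag(self, tag):
        if tag == "form" and self._in_form:
            self.forms.append("".join(self._chunks))
            self._in_form = False

    def handle_data(self, data):
        # Only text that appears between <form> and </form> is kept.
        if self._in_form:
            self._chunks.append(data)


def extract_form_text(html):
    parser = FormTextCollector()
    parser.feed(html)
    parser.close()
    return parser.forms


line = 'bla bla bla<form>Form 1</form> some text...<form>Form 2</form> more text?'
print(extract_form_text(line))  # ['Form 1', 'Form 2']
```

Unlike the regex, a parser also copes with attributes on the tag, such as <form action="...">, without any change to the code.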