I want to extract the data between <tr> tags from an HTML page. I used the following code, but I didn't get any results. The HTML between the <tr> tags spans multiple lines.
category = re.findall('<tr>(.*?)</tr>', data)
Please suggest a fix for this problem.
The MULTILINE search modifier forces the ^ symbol to match at the beginning of each line of text (and not just the first), and the $ symbol to match at the end of each line of text (and not just the last one).
The multiline option, or the m inline option, enables the regular expression engine to handle an input string that consists of multiple lines. It changes the interpretation of the ^ and $ language elements so that they match the beginning and end of a line, instead of the beginning and end of the input string. It does not change what the . character matches.
The DOTALL flag tells Python to make the '.' special character match all characters, including newline characters.
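To make the difference between the two flags concrete, here is a minimal sketch on a made-up two-line string. The string and the variable names are illustrative assumptions, not taken from the question:

```python
import re

# Hypothetical sample text, chosen only to illustrate the flags.
text = "first line\nsecond line"

# Without flags, '.' stops at the newline, so the pattern cannot span lines.
no_flag = re.findall(r"first.*second", text)

# re.M (MULTILINE) only changes ^ and $; '.' still does not match '\n'.
multiline = re.findall(r"first.*second", text, re.M)

# re.S (DOTALL) lets '.' match '\n', so the pattern can span lines.
dotall = re.findall(r"first.*second", text, re.S)

# What re.M does change: ^ now matches at the start of every line.
line_starts = re.findall(r"^\w+", text, re.M)

print(no_flag, multiline, dotall, line_starts)
```

Only the re.S call finds a match that crosses the line break, which is the situation in the question.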
Just to clear up the issue: despite all those links to re.M, it won't work here, as even a quick skim of its documentation reveals. You need re.S, if you don't want to actually parse the HTML, of course:
>>> import re
>>> doc = """<table border="1">
<tr>
<td>row 1, cell 1</td>
<td>row 1, cell 2</td>
</tr>
<tr>
<td>row 2, cell 1</td>
<td>row 2, cell 2</td>
</tr>
</table>"""
>>> re.findall('<tr>(.*?)</tr>', doc, re.S)
['\n<td>row 1, cell 1</td>\n<td>row 1, cell 2</td>\n',
'\n<td>row 2, cell 1</td>\n<td>row 2, cell 2</td>\n']
>>> re.findall('<tr>(.*?)</tr>', doc, re.M)
[]
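If you would rather parse the HTML than match it with a regular expression, the standard library's html.parser module is one option. The following is only a sketch: the RowCollector class and its attributes are hypothetical names introduced for illustration, and it assumes simple, non-nested tables like the one above.

```python
from html.parser import HTMLParser


class RowCollector(HTMLParser):
    """Hypothetical helper: collects the text of each <td>, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []          # one list of cell strings per <tr>
        self._in_cell = False   # True while inside a <td> element

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td" and self.rows:
            self._in_cell = True
            self.rows[-1].append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        # Only keep text that appears inside a <td>; whitespace between
        # tags is ignored.
        if self._in_cell:
            self.rows[-1][-1] += data


doc = """<table border="1">
<tr>
<td>row 1, cell 1</td>
<td>row 1, cell 2</td>
</tr>
<tr>
<td>row 2, cell 1</td>
<td>row 2, cell 2</td>
</tr>
</table>"""

parser = RowCollector()
parser.feed(doc)
parser.close()
rows = parser.rows
print(rows)
```

Unlike the regex, this yields the cell text directly rather than the raw markup between the <tr> tags, and it does not depend on how the HTML is broken across lines.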