Is there a library that can take a text (such as an HTML document) and a list of strings (such as the names of some products), find the pattern those strings share in the text, and generate a regular expression that extracts every string in the text matching that pattern?
For example, given the following HTML:
<table>
<tr>
<td>Product 1</td>
<td>Product 2</td>
<td>Product 3</td>
<td>Product 4</td>
<td>Product 5</td>
<td>Product 6</td>
<td>Product 7</td>
<td>Product 8</td>
</tr>
</table>
and the following list of strings:
['Product 1', 'Product 2', 'Product 3']
I'd like a function that would build a regex like the following:
'<td>(.*?)</td>'
and then extract everything in the HTML that matches the regex. In this case, the output would be:
['Product 1', 'Product 2', 'Product 3', 'Product 4', 'Product 5', 'Product 6', 'Product 7', 'Product 8']
CLARIFICATION:
I'd like the function to look at the surroundings of the samples, not at the samples themselves. So, for example, if the HTML were:
<tr>
<td>Word</td>
<td>More words</td>
<td>101</td>
<td>-1-0-1-</td>
</tr>
and the samples ['Word', 'More words']
I'd like it to extract:
['Word', 'More words', '101', '-1-0-1-']
Python has a built-in module called re, which can be used to work with regular expressions.
The Python "re" module provides regular expression support.
Regular expressions are supported by many programming languages, such as Python, Java, and JavaScript.
Your requirement is both very specific and very general.
I don't think you will find an existing library for this purpose; you would probably have to write your own.
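If you do write your own, a minimal sketch might look like the following. It assumes each sample sits directly between two tags (as in the question's examples), takes the most common pair of surrounding tags as the pattern, and builds a regex from it. The function names infer_pattern and extract_like are made up for illustration, and this is not a robust way to handle arbitrary HTML.

```python
import re
from collections import Counter


def infer_pattern(text, samples):
    """Guess a regex from the tags immediately surrounding each sample.

    Assumption: every sample is directly enclosed by one tag on each side.
    Returns None if no sample is found in that position.
    """
    contexts = Counter()
    for sample in samples:
        # Capture the tag right before and right after each occurrence.
        finder = re.compile(r"(<[^<>]+>)" + re.escape(sample) + r"(<[^<>]+>)")
        for match in finder.finditer(text):
            contexts[(match.group(1), match.group(2))] += 1
    if not contexts:
        return None
    # Use the most frequent (prefix, suffix) pair as the surrounding context.
    (prefix, suffix), _count = contexts.most_common(1)[0]
    return re.escape(prefix) + r"(.*?)" + re.escape(suffix)


def extract_like(text, samples):
    """Extract every string that shares the samples' surrounding tags."""
    pattern = infer_pattern(text, samples)
    return re.findall(pattern, text) if pattern else []


if __name__ == "__main__":
    html = "<tr>\n<td>Word</td>\n<td>More words</td>\n<td>101</td>\n<td>-1-0-1-</td>\n</tr>"
    print(extract_like(html, ["Word", "More words"]))
```

Because the pattern comes only from the surrounding tags, cells such as 101 are picked up even though they look nothing like the samples. Tags with attributes, nested markup, or samples that span several lines would need a more careful approach.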
On the other hand, if you spend a lot of time writing regexes, a GUI tool such as RegexMagic can help you build them: http://www.regular-expressions.info/regexmagic.html
However, if you only need to extract data from HTML documents, you should consider using an HTML parser; it should make things a lot easier.
I recommend BeautifulSoup for parsing HTML documents in Python: https://pypi.python.org/pypi/beautifulsoup4/4.2.1
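To illustrate the parser approach, here is a minimal sketch that collects the text of every td cell. It uses the standard library's html.parser rather than BeautifulSoup so that it runs without installing anything; the class name CellCollector and the helper cell_texts are made up for this example. With BeautifulSoup the same idea is roughly a find_all('td') call followed by get_text() on each result.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collect the text content of every occurrence of one tag."""

    def __init__(self, tag="td"):
        super().__init__()
        self.tag = tag
        self.cells = []
        self._buffer = None  # None while we are outside the target tag

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self._buffer = []

    def handle_data(self, data):
        # Text may arrive in several chunks, so accumulate it.
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self.tag and self._buffer is not None:
            self.cells.append("".join(self._buffer).strip())
            self._buffer = None


def cell_texts(html, tag="td"):
    """Return the stripped text of each <tag> element in the document."""
    collector = CellCollector(tag)
    collector.feed(html)
    collector.close()
    return collector.cells


if __name__ == "__main__":
    html = "<tr>\n<td>Word</td>\n<td>More words</td>\n<td>101</td>\n<td>-1-0-1-</td>\n</tr>"
    print(cell_texts(html))
```

Unlike a regex, the parser does not care whether a cell carries attributes or spans several lines, which is the main reason to prefer it for HTML.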