I am using Beautiful Soup in Python.
Here is an example URL:
http://www.locationary.com/place/en/US/Ohio/Middletown/McDonald%27s-p1013254580.jsp
In the HTML, there are a bunch of tags and the only way I can specify which ones to find is with their id. The only thing I want to find is the telephone number. The tag looks like this:
<td class="dispTxt" id="value_xxx_c_1_f_8_a_134242498">5134231582</td>
I have gone to other URLs on the same website and found almost the same id for the telephone number tag every time. The part that always stays the same is:
'value_xxx_c_1_f_8_a_'
However, the numbers that come after that always change. Is there a way to tell Beautiful Soup to match that fixed part of the id and allow the rest to be any digits, the way a regular expression could?
Also, once I have the tag, how can I extract the phone number without using regular expressions? I don't know whether Beautiful Soup can do that, but it would probably be simpler than a regex.
You can use regular expressions. Note that this example matches on tag names; you need to adjust it so that it matches on an element's id:
import re
for tag in soup.find_all(re.compile("^value_xxx_c_1_f_8_a_")):
    print(tag.name)
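One way to make that adjustment is to pass the compiled pattern as the `id` keyword argument instead of as the tag name, and then read the tag's text to get the number. This is only a minimal sketch: the `html` string is a made-up stand-in for the real page (only the telephone `<td>` comes from the question), and the assumption that the id always ends in digits is taken from the question's description.

```python
import re
from bs4 import BeautifulSoup

# Stand-in markup for the real page (hypothetical, apart from the phone <td>).
html = """
<table>
  <tr><td class="dispTxt" id="value_xxx_c_1_f_7_a_1">Some other field</td></tr>
  <tr><td class="dispTxt" id="value_xxx_c_1_f_8_a_134242498">5134231582</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# Fixed prefix from the question, followed by one or more digits.
id_pattern = re.compile(r"^value_xxx_c_1_f_8_a_\d+$")

# Passing the pattern as id= filters on the id attribute, not the tag name.
tag = soup.find("td", id=id_pattern)

# get_text() returns the tag's text content, so no regex is needed for the number.
phone = tag.get_text(strip=True) if tag is not None else None
print(phone)
```

The regular expression is used only to locate the tag; the phone number itself comes straight from the tag's text. If several tags could match, `soup.find_all("td", id=id_pattern)` returns all of them as a list.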