How to use Beautiful Soup to find a tag with changing id?

I am using Beautiful Soup in Python.

Here is an example URL:

http://www.locationary.com/place/en/US/Ohio/Middletown/McDonald%27s-p1013254580.jsp

The HTML contains many tags, and the only way I can tell them apart is by their id. The only thing I want to find is the telephone number. The tag looks like this:

<td class="dispTxt" id="value_xxx_c_1_f_8_a_134242498">5134231582</td> 

I have gone to other URLs on the same website and found almost the same id for the telephone number tag every time. The part that always stays the same is:

'value_xxx_c_1_f_8_a_'

However, the numbers that come after that always change. Is there a way to tell Beautiful Soup to match the fixed part of the id and accept any digits after it, the way a regular expression would?

Also, once I have the tag, how can I extract the phone number without using regular expressions? I don't know whether Beautiful Soup can do that, but it would probably be simpler than regex.

Marcus Johnson asked Aug 12 '12 17:08


1 Answer

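You can use regular expressions. Pass the compiled pattern as the id keyword argument of find_all so that it is matched against each element's id (a pattern passed as the first positional argument is matched against tag names instead). Once you have the tag, get_text() returns the phone number without any further regex:

import re
# soup is assumed to be a BeautifulSoup object already built from the page's HTML
for tag in soup.find_all("td", id=re.compile(r"^value_xxx_c_1_f_8_a_\d+$")):
    print(tag.get_text())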
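
For reference, here is a minimal self-contained sketch of matching on the id attribute and reading the number out as text. It assumes the bs4 package is installed and uses the built-in "html.parser". The markup is the tag from the question plus one made-up sibling cell (the f_7 id is hypothetical) to show that only the phone cell matches. A real script would fetch the page first.

```python
import re
from bs4 import BeautifulSoup

# Sample markup: the second <td> is copied from the question; the first one
# (with the f_7 id) is a hypothetical neighbour added for illustration only.
html = """
<table>
  <tr><td class="dispTxt" id="value_xxx_c_1_f_7_a_134242497">Some other field</td></tr>
  <tr><td class="dispTxt" id="value_xxx_c_1_f_8_a_134242498">5134231582</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# A compiled pattern passed as the id keyword is matched against each tag's id value.
phone_id = re.compile(r"^value_xxx_c_1_f_8_a_\d+$")
cell = soup.find("td", id=phone_id)

# get_text() returns the text inside the tag, so no regex is needed for the number itself.
phone = cell.get_text(strip=True) if cell is not None else None
print(phone)

# An alternative without the re module: a CSS "starts with" attribute selector.
css_cell = soup.select_one('td[id^="value_xxx_c_1_f_8_a_"]')
```

The regex form lets you require that only digits follow the fixed prefix. The CSS selector form is shorter but only checks the prefix.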

Simeon Visser answered Sep 28 '22 04:09