I have an HTML page full of various tags, and most of them have a sessionid GET parameter in their href attribute. Example:
...
<a href="struct_view_distrib.asp?sessionid=11692390">
...
<a href="SHOW_PARENT.asp?sessionid=11692390">
...
<a href="nakl_view.asp?sessionid=11692390">
...
<a href="move_sum_to_7300001.asp?sessionid=11692390&mode_id=0">
...
As you can see, the sessionid is the same everywhere. I just need to get its value into a variable, no matter which link it comes from: x=11692390. I'm a newbie at regex, and Google wasn't helpful. Thanks a lot!
This does not use regexes, but anyway, this is what you would do in Python 2.6:
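    from BeautifulSoup import BeautifulSoup
    import urlparse

    # html is assumed to hold the page source as a string
    soup = BeautifulSoup(html)
    # only consider <a> tags that actually have an href attribute
    links = soup.findAll('a', href=True)
    for link in links:
        href = link['href']
        url = urlparse.urlparse(href)
        # parse_qs maps each parameter name to a list of values
        params = urlparse.parse_qs(url.query)
        if 'sessionid' in params:
            print params['sessionid'][0]
            break  # every link carries the same sessionid, so the first match is enough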
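Since the question asked about regex, here is a minimal sketch of that route as well. It assumes the sessionid is always a run of digits, as in the sample links. The function name find_session_id and the sample page string are made up for illustration. It uses only the standard re module and should run under both Python 2.6+ and Python 3.

```python
import re

# Assumption: sessionid is purely numeric, as in the sample links.
# The prefix alternation also accepts "&amp;", in case the page
# escapes ampersands inside href attributes.
SESSION_RE = re.compile(r'(?:[?&]|&amp;)sessionid=(\d+)')

def find_session_id(html):
    """Return the first sessionid value found in html as a string, or None."""
    match = SESSION_RE.search(html)
    return match.group(1) if match else None

# Hypothetical sample input built from the links in the question
page = '''
<a href="struct_view_distrib.asp?sessionid=11692390">
<a href="move_sum_to_7300001.asp?sessionid=11692390&mode_id=0">
'''

x = find_session_id(page)
print(x)
```

The regex is shorter, but it scans the raw text and will also match a sessionid outside of a link (for example inside a script or a comment), so the parser-based version above is the safer choice if the markup is messy.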