I'm writing Python code that gets all the 'a' tags from a URL using Beautiful Soup, takes the link at position 3, and then follows that link. I need to repeat this process about 18 times. The code below repeats the process twice. I can't come up with a way to repeat the same process 18 times in a loop. Any help would be appreciated.
import re
import urllib
from BeautifulSoup import *
htm1= urllib.urlopen('https://pr4e.dr-chuck.com/tsugi/mod/python-data/data/known_by_Fikret.html').read()
soup =BeautifulSoup(htm1)
tags = soup('a')
list1=list()
for tag in tags:
    x = tag.get('href', None)
    list1.append(x)
M= list1[2]
htm2= urllib.urlopen(M).read()
soup =BeautifulSoup(htm2)
tags1 = soup('a')
list2=list()
for tag1 in tags1:
    x2 = tag1.get('href', None)
    list2.append(x2)
y= list2[2]
print y
OK, I just wrote this code. It runs, but I get the same 4 links in the results. It looks like something is wrong in the loop (note: I'm running the loop 4 times).
import re
import urllib
from BeautifulSoup import *
list1=list()
url = 'https://pr4e.dr-chuck.com/tsugi/mod/python-data/data/known_by_Fikret.html'
for i in range (4): # repeat 4 times
    htm2= urllib.urlopen(url).read()
    soup1=BeautifulSoup(htm2)
    tags1= soup1('a')
    for tag1 in tags1:
        x2 = tag1.get('href', None)
        list1.append(x2)
    y= list1[2]
    if len(x2) < 3: # no 3rd link
        break # exit the loop
    else:
        url=y
    print y
I can't come up with a way to repeat the same process 18 times in a loop.
To repeat something 18 times in Python, you could use a for _ in range(18) loop:
#!/usr/bin/env python2
from urllib2 import urlopen
from urlparse import urljoin
from bs4 import BeautifulSoup # $ pip install beautifulsoup4

url = 'http://example.com'
for _ in range(18): # repeat 18 times
    soup = BeautifulSoup(urlopen(url))
    a = soup.find_all('a', href=True) # all <a href> links
    if len(a) < 3: # no 3rd link
        break # exit the loop
    url = urljoin(url, a[2]['href']) # 3rd link, note: ignore <base href>
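As for the 4-iteration attempt in the question: list1 is created once, before the loop, so every page's links are appended to the same list and list1[2] is always the third link of the first page. That is likely why the same link keeps coming back. Below is a minimal Python 3 sketch of the same loop that builds a fresh list of links for each page. To stay dependency-free it uses the standard library's html.parser instead of BeautifulSoup, which is a swap on my part, not what the question uses. The names LinkCollector, fetch and follow_links are made up for illustration, and the fetch parameter is there only so the loop can be tried without network access.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collect the href of every <a href=...> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            href = dict(attrs).get('href')
            if href is not None:
                self.hrefs.append(href)


def fetch(url):
    """Download a page and return its text (hypothetical helper)."""
    with urlopen(url) as response:
        return response.read().decode('utf-8', errors='replace')


def follow_links(url, position=3, count=18, fetch=fetch):
    """Follow the link at `position` (1-based) up to `count` times.

    Returns the list of URLs visited, starting with the initial one.
    """
    visited = [url]
    for _ in range(count):
        collector = LinkCollector()  # fresh list of links for every page
        collector.feed(fetch(url))
        if len(collector.hrefs) < position:
            break  # this page has too few links, stop early
        # resolve relative links against the current page's URL
        url = urljoin(url, collector.hrefs[position - 1])
        visited.append(url)
    return visited
```

With network access, follow_links('https://pr4e.dr-chuck.com/tsugi/mod/python-data/data/known_by_Fikret.html') would follow the third link 18 times and the last element of the returned list would be the final page reached. Like the code above, this sketch ignores <base href>.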