I am trying to get a random sample of web pages. For various reasons, I don't want to scrape Google search results. Here is what I have tried:
import socket
from random import randint

def doesitserveawebpage(ip):
    ip = str(ip)
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # Succeeds if something accepts TCP connections on port 80
        s.connect((ip, 80))
        s.shutdown(socket.SHUT_RDWR)
        return True
    except socket.error:
        return False
    finally:
        s.close()  # release the socket on every attempt

def givemerandomwebsite():
    adrformat = "%d.%d.%d.%d"
    while True:
        adr = adrformat % tuple(randint(0, 255) for _ in range(4))
        try:
            print("Trying %s" % adr)
            name = socket.gethostbyaddr(adr)  # reverse DNS lookup
            if doesitserveawebpage(adr):
                return name
        except socket.herror:
            continue  # no reverse DNS entry for this address
Well, it doesn't work. First, it is too slow. Second, it gives me addresses that don't serve web pages. Is there any way I can make this code better, or would you suggest another way to solve this problem?
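A minimal sketch addressing both complaints, assuming Python 3. `serves_http` is a hypothetical helper name, and the 3-second timeout is an arbitrary choice. Most random addresses never answer, so a connect timeout bounds how long each dead address can stall the loop. Sending a real HTTP request and checking for an `HTTP/` status line separates a web server from a host that merely accepts connections on port 80.

```python
import socket

def serves_http(ip, port=80, timeout=3.0):
    """Return True if ip:port answers a plain HTTP/1.0 HEAD request.

    Sketch only: an open port 80 does not prove a web server is there,
    so this reads the start of the reply and looks for a status line.
    """
    try:
        # create_connection applies the timeout to connect() and to the
        # later send/recv calls on the returned socket.
        with socket.create_connection((ip, port), timeout=timeout) as s:
            request = b"HEAD / HTTP/1.0\r\nHost: " + ip.encode("ascii") + b"\r\n\r\n"
            s.sendall(request)
            reply = s.recv(16)
    except OSError:  # refused, unreachable, timed out, reset, ...
        return False
    # A real HTTP server replies with e.g. b"HTTP/1.1 200 OK\r\n..."
    return reply.startswith(b"HTTP/")
```

Calling `serves_http(adr)` in place of `doesitserveawebpage(adr)` keeps the rest of the loop unchanged.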
Assuming that most HTTP servers run on a host with a domain name (i.e. not just a bare IP address), you can further filter your random IP addresses with a reverse DNS lookup, e.g. `dig -x`.
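In Python, `socket.gethostbyaddr` performs the same kind of reverse (PTR) lookup as `dig -x`; the question's code already calls it. A minimal sketch wrapping it as a yes/no filter is below. `reverse_dns_name` and its `resolver` parameter are hypothetical names; the parameter exists only so the helper can be exercised without network access. Note that `gethostbyaddr` takes no timeout argument, so a slow DNS server can still hold up the loop.

```python
import socket

def reverse_dns_name(ip, resolver=socket.gethostbyaddr):
    """Return the PTR hostname for ip, or None if there is none.

    `resolver` defaults to socket.gethostbyaddr, which returns a
    (hostname, aliaslist, ipaddrlist) tuple.
    """
    try:
        hostname, _aliases, _addresses = resolver(ip)
    except OSError:  # socket.herror / socket.gaierror: no usable PTR record
        return None
    return hostname or None
```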
Also, your algorithm should not generate random IPs that fall in the private IP ranges (10.0.0.0/8, 172.16.0.0/12 and 192.168.0.0/16).
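One way to skip the private ranges, sketched with the standard `ipaddress` module (Python 3.4+ for `is_global`). `random_public_ipv4` is a hypothetical helper name. Besides the private blocks, `is_global` also rejects loopback, link-local, documentation and other special-purpose ranges. Multicast is excluded separately because `is_global` does not cover it.

```python
import ipaddress
import random

def random_public_ipv4(rng=random):
    """Draw random IPv4 addresses until one is globally routable unicast."""
    while True:
        # A random 32-bit integer maps directly onto an IPv4 address
        addr = ipaddress.IPv4Address(rng.getrandbits(32))
        if addr.is_global and not addr.is_multicast:
            return str(addr)
```

The returned string can replace the `adrformat % tuple(...)` line in `givemerandomwebsite`. Pass a seeded `random.Random(...)` as `rng` for reproducible samples.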