
How to make a random sample from the internet?

Tags: python, random

I am trying to get a random sample of internet pages, and for various reasons I don't want to scrape Google search results. Here is what I have tried:

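import socket
from random import randint

def doesitserveawebpage(ip):
    # True if something accepts a TCP connection on port 80.
    # An open port alone does not prove that a web page is served there.
    ip = str(ip)
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.connect((ip, 80))
        s.shutdown(socket.SHUT_RDWR)
        return True
    except socket.error:
        return False
    finally:
        s.close()

def givemerandomwebsite():
    adrformat = "%d.%d.%d.%d"
    while True:
        # Pick four random octets; this can also produce private or reserved addresses.
        adr = adrformat % tuple(randint(0, 255) for _ in range(4))
        try:
            print("Trying %s" % adr)
            # Reverse DNS lookup; raises socket.herror when the address has no name.
            name = socket.gethostbyaddr(adr)
            if doesitserveawebpage(adr):
                return name
        except socket.herror:
            continue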

Well, it doesn't work. First, it is too slow. Second, it gives me addresses that don't serve web pages. Is there any way I can make this code better, or would you suggest another way to solve this problem?

yasar asked Nov 13 '22 12:11



1 Answer

Assuming that most HTTP servers run on a host with a domain name (i.e. not just an IP address), you can further verify your random IP addresses by doing a DNS lookup, e.g. with dig.

Also, you should not allow your algorithm to generate a random IP address that falls within the private IP ranges (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16).
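
One way to skip the private ranges is sketched below. It assumes Python 3, where the standard library `ipaddress` module is available; the function names `is_public_address` and `random_public_address` are made up for this illustration. Besides the private ranges it also rejects loopback, link-local, multicast and reserved addresses, which is an extra assumption beyond what the answer asks for, and it is not meant as an exhaustive filter.

```python
import ipaddress
import random


def is_public_address(adr):
    """Return True if adr is outside the private and other special-purpose ranges."""
    ip = ipaddress.IPv4Address(adr)
    return not (
        ip.is_private         # includes 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16
        or ip.is_loopback     # 127.0.0.0/8
        or ip.is_link_local   # 169.254.0.0/16
        or ip.is_multicast    # 224.0.0.0/4
        or ip.is_reserved     # 240.0.0.0/4
        or ip.is_unspecified  # 0.0.0.0
    )


def random_public_address(rng=random):
    """Draw random 32-bit IPv4 addresses until one passes is_public_address."""
    while True:
        adr = str(ipaddress.IPv4Address(rng.getrandbits(32)))
        if is_public_address(adr):
            return adr
```

In the question's loop, the result of `random_public_address()` could stand in for the `adrformat % ...` line, so that no time is spent on reverse lookups or connection attempts for addresses that cannot be reached from the public internet.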

RipperDoc answered Nov 15 '22 06:11
