How many network ports does Linux allow Python to use?

So I have been trying to run some internet connections in parallel in Python. I have been using the multiprocessing module so I can get around the Global Interpreter Lock. But it seems that the system only gives Python one open connection port, or at least it only allows one connection to happen at once. Here is an example of what I mean.

*Note that this is running on a Linux server.

from multiprocessing import Process, Queue
import urllib
import random

# Generate 10,000 random urls to test and put them in the queue
queue = Queue()
for each in range(10000):
    rand_num = random.randint(1000,10000)
    url = ('http://www.' + str(rand_num) + '.com')
    queue.put(url)

# Main function for checking whether a generated url is active
def check(q):
    while True:
        try:
            url = q.get(False)
            try:
                request = urllib.urlopen(url)
                del request
                print url + ' is an active url!'
            except:
                print url + ' is not an active url!'
        except:
            if q.empty():
                break

# Then start all the worker processes (50)
for thread in range(50):
    task = Process(target=check, args=(queue,))
    task.start()

So if you run this you will notice that it starts 50 instances of the function but only runs one at a time. You may think that the Global Interpreter Lock is doing this, but it isn't. Try changing the function to a mathematical function instead of a network request and you will see that all fifty processes run simultaneously.
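For example, this CPU-bound variant (a quick sketch; the crunch function and its workload are arbitrary stand-ins, not from the original script) keeps all fifty processes busy at once:

from multiprocessing import Process

# A stand-in CPU-bound task: pure arithmetic, no network I/O
def crunch(n):
    total = 0
    i = 0
    while i < 10 ** 7:  # plain loop avoids allocating a huge range() list on Python 2
        total = (total + i * i) % 1000003
        i += 1
    print('worker %d finished: %d' % (n, total))

# Start 50 processes; watch htop and they all show as running
for n in range(50):
    Process(target=crunch, args=(n,)).start()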

So will I have to work with sockets directly? Or is there something I can do that will give Python access to more ports? Or is there something I am not seeing? Let me know what you think! Thanks!

*Edit

So I wrote this script to test things more thoroughly with the requests library. It seems I had not tested this very well before. (I had mainly used urllib and urllib2.)

from multiprocessing import Process, Queue
from threading import Thread
from Queue import Queue as Q
import requests
import time

# A main timestamp
main_time = time.time()

# Generate 100 urls to test and put them in the queue
queue = Queue()
for each in range(100):
    url = ('http://www.' + str(each) + '.com')
    queue.put(url)

# Timer queue
time_queue = Queue()

# Main function for checking whether a generated url is active
def check(q, t_q): # args are queue and time_queue
    while True:
        try:
            url = q.get(False)
            # Make a timestamp
            t = time.time()
            try:
                request = requests.head(url, timeout=5)
                t = time.time() - t
                t_q.put(t)
                del request
            except:
                t = time.time() - t
                t_q.put(t)
        except:
            break

# Then start all the worker processes (20)
thread_list = []
for thread in range(20):
    task = Process(target=check, args=(queue, time_queue))
    task.start()
    thread_list.append(task)

# Join all the processes so the main process doesn't quit
for each in thread_list:
    each.join()
main_time_end = time.time()

# Drain time_queue into a list so we can compute the average
time_queue_list = []
while True:
    try:
        time_queue_list.append(time_queue.get(False))
    except:
        break

# Timing results
average_response = sum(time_queue_list) / float(len(time_queue_list))
total_time = main_time_end - main_time
line =  "Multiprocessing: Average response time: %s sec. -- Total time: %s sec." % (average_response, total_time)
print line

# A main timestamp
main_time = time.time()

# Generate 100 urls to test and put them in the queue
queue = Q()
for each in range(100):
    url = ('http://www.' + str(each) + '.com')
    queue.put(url)

# Timer queue
time_queue = Queue()

# Main function for checking whether a generated url is active
def check(q, t_q): # args are queue and time_queue
    while True:
        try:
            url = q.get(False)
            # Make a timestamp
            t = time.time()
            try:
                request = requests.head(url, timeout=5)
                t = time.time() - t
                t_q.put(t)
                del request
            except:
                t = time.time() - t
                t_q.put(t)
        except:
            break

# Then start all the threads (20)
thread_list = []
for thread in range(20):
    task = Thread(target=check, args=(queue, time_queue))
    task.start()
    thread_list.append(task)

# Join all the threads so the main process doesn't quit
for each in thread_list:
    each.join()
main_time_end = time.time()

# Drain time_queue into a list so we can compute the average
time_queue_list = []
while True:
    try:
        time_queue_list.append(time_queue.get(False))
    except:
        break

# Timing results
average_response = sum(time_queue_list) / float(len(time_queue_list))
total_time = main_time_end - main_time
line =  "Standard Threading: Average response time: %s sec. -- Total time: %s sec." % (average_response, total_time)
print line

# Do the same thing all over again but this time do each url at a time
# A main timestamp
main_time = time.time()

# Generate 100 urls and test them
timer_list = []
for each in range(100):
    url = ('http://www.' + str(each) + '.com')
    t = time.time()
    try:
        request = requests.head(url, timeout=5)
        timer_list.append(time.time() - t)
    except:
        timer_list.append(time.time() - t)
main_time_end = time.time()

# Timing results
average_response = sum(timer_list) / float(len(timer_list))
total_time = main_time_end - main_time
line = "Not using threads: Average response time: %s sec. -- Total time: %s sec." % (average_response, total_time)
print line

As you can see, the requests are running in parallel very well. In fact, most of my tests show that the threading module is actually faster than the multiprocessing module. (I don't understand why!) Here are some of my results.

Multiprocessing: Average response time: 2.40511314869 sec. -- Total time: 25.6876308918 sec.
Standard Threading: Average response time: 2.2179402256 sec. -- Total time: 24.2941861153 sec.
Not using threads: Average response time: 2.1740363431 sec. -- Total time: 217.404567957 sec.
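One possible contributor (just a guess, not something these numbers isolate) is per-worker startup and teardown cost: processes are heavier to create and join than threads. A quick sketch that measures only that overhead:

from multiprocessing import Process
from threading import Thread
import time

def noop():
    pass

# Measure only the cost of creating, starting, and joining 20 workers
for kind in (Process, Thread):
    start = time.time()
    workers = [kind(target=noop) for _ in range(20)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print('%s startup/join: %.4f sec' % (kind.__name__, time.time() - start))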

This was done on my home network; the response time on my server is much faster. I think my question has been answered indirectly, since I was having these problems in a much more complex script. All of the suggestions helped me optimize it very well. Thanks to everyone!

asked Nov 09 '22 by TysonU


1 Answer

it starts 50 instances of the function but only runs one at a time

You have misinterpreted the results of htop. Only a few, if any, copies of python will be runnable at any given instant. Most of them will be blocked waiting for network I/O.

The processes are, in fact, running in parallel.

Try changing the function to a mathematical function instead of a network request and you will see that all fifty processes run simultaneously.

Changing the task to a mathematical function merely illustrates the difference between CPU-bound (e.g. math) and IO-bound (e.g. urlopen) processes. The former is always runnable; the latter is rarely runnable.
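A minimal sketch of the distinction, with the network wait simulated by time.sleep (the function names and numbers are illustrative):

from multiprocessing import Process
import time

def io_bound(n):
    # Blocked in a kernel wait, like urlopen waiting on a socket;
    # htop shows the process sleeping (state S) nearly the whole time.
    time.sleep(5)

def cpu_bound(n):
    # Needs the CPU continuously; htop shows it running (state R).
    total = 0
    i = 0
    while i < 10 ** 7:
        total += i * i
        i += 1

# Start 50 of either kind and watch htop: the io_bound workers barely
# register, while the cpu_bound workers peg every core.
for n in range(50):
    Process(target=io_bound, args=(n,)).start()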

it only prints one at a time. If it were actually running multiple processes it would print many out at once.

It prints one line at a time because you are writing lines to a terminal. Since the lines are indistinguishable, you cannot tell whether they are all written by one process, or each by a separate process in turn.
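One way to tell them apart (a sketch, not from the original post) is to tag each line with the worker's process id, assuming the Python 2 urllib from the question:

import os
import urllib  # Python 2 urllib, as used in the question

def check(q):
    while True:
        try:
            url = q.get(False)
            try:
                urllib.urlopen(url)
                print('%d: %s is an active url!' % (os.getpid(), url))
            except IOError:
                print('%d: %s is not an active url!' % (os.getpid(), url))
        except Exception:
            if q.empty():
                break

With a distinct pid on each line, the interleaved output of the different workers becomes visible.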

answered Nov 14 '22 by Robᵩ