
How to get a faster speed when using multi-threading in Python

I am studying how to fetch data from a website as fast as possible. To speed things up, I'm considering multi-threading. Here is the code I used to compare multi-threaded POSTs against plain sequential POSTs.

import threading
import time
import urllib
import urllib2


class Post:

    def __init__(self, website, data, mode):
        self.website = website
        self.data = data

        #mode is either "Simple"(Simple POST) or "Multiple"(Multi-thread POST)
        self.mode = mode

    def post(self):

        #post data
        req = urllib2.Request(self.website)
        open_url = urllib2.urlopen(req, self.data)

        if self.mode == "Multiple":
            time.sleep(0.001)

        #read HTMLData
        HTMLData = open_url.read()



        print "OK"

if __name__ == "__main__":

    current_post = Post("http://forum.xda-developers.com/login.php", "vb_login_username=test&vb_login_password&securitytoken=guest&do=login", \
                        "Simple")

    #save the time before post data
    origin_time = time.time()

    if(current_post.mode == "Multiple"):

        #multithreading POST

        for i in range(0, 10):
           thread = threading.Thread(target = current_post.post)
           thread.start()
           thread.join()

        #calculate the time interval
        time_interval = time.time() - origin_time

        print time_interval

    if(current_post.mode == "Simple"):

        #simple POST

        for i in range(0, 10):
            current_post.post()

        #calculate the time interval
        time_interval = time.time() - origin_time

        print time_interval

As you can see, the code is very simple. First I set the mode to "Simple" and got a time interval of 50 s (maybe my connection is a little slow :(). Then I set the mode to "Multiple" and got 35 s. So multi-threading does increase the speed, but the result isn't as good as I expected. I want it to be much faster.

From debugging, I found that the program mainly blocks at the line open_url = urllib2.urlopen(req, self.data); this line takes a long time to post and receive data from the specified website. I guess I could get a faster speed by adding time.sleep() and using multi-threading inside the urlopen function, but I cannot do that because it is Python's own function.

Leaving aside the possibility that the server limits the POST rate, what else can I do to get a faster speed? Is there any other code I can modify? Thanks a lot!

Searene asked Apr 14 '12 14:04



1 Answer

The biggest mistake, and the one hurting your throughput the most, is the way you call thread.start() and thread.join():

for i in range(0, 10):
   thread = threading.Thread(target = current_post.post)
   thread.start()
   thread.join()

Each time through the loop, you create a thread, start it, and then wait for it to finish before moving on to the next thread. You aren't doing anything concurrently at all!

What you should probably be doing instead is:

threads = []

# start all of the threads
for i in range(0, 10):
   thread = threading.Thread(target = current_post.post)
   thread.start()
   threads.append(thread)

# now wait for them all to finish
for thread in threads:
   thread.join()
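
To see the difference without touching a real server, here is a minimal sketch that compares the two patterns. It assumes time.sleep() is a fair stand-in for the blocking urlopen() call; fake_post, run_joined_in_loop and run_started_then_joined are hypothetical names made up for this illustration, not part of the question's code.

```python
import threading
import time


def fake_post(delay=0.2):
    # Assumption: sleeping stands in for the blocking urlopen() call,
    # so no real server is contacted.
    time.sleep(delay)


def run_joined_in_loop(n, delay=0.2):
    """Start and join each thread inside the same loop (the question's pattern)."""
    start = time.time()
    for _ in range(n):
        thread = threading.Thread(target=fake_post, args=(delay,))
        thread.start()
        thread.join()  # blocks here, so the next thread is not started yet
    return time.time() - start


def run_started_then_joined(n, delay=0.2):
    """Start every thread first, then wait for all of them."""
    start = time.time()
    threads = []
    for _ in range(n):
        thread = threading.Thread(target=fake_post, args=(delay,))
        thread.start()
        threads.append(thread)
    for thread in threads:
        thread.join()
    return time.time() - start


if __name__ == "__main__":
    print("join inside loop:     %.2fs" % run_joined_in_loop(5))
    print("start all, then join: %.2fs" % run_started_then_joined(5))
```

With five simulated requests of 0.2 s each, the first version should take roughly the sum of the delays and the second roughly one delay. With a real server the gain will likely be smaller, since it also depends on the network and on how the server handles parallel requests.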
SingleNegationElimination answered Sep 28 '22 15:09
