Now i am studying how to fetch data from website as fast as possible. To get faster speed, im considering using multi-thread. Here is the code i used to test the difference between multi-threaded and simple post.
import threading
import time
import urllib
import urllib2
class Post:
def __init__(self, website, data, mode):
self.website = website
self.data = data
#mode is either "Simple"(Simple POST) or "Multiple"(Multi-thread POST)
self.mode = mode
def post(self):
#post data
req = urllib2.Request(self.website)
open_url = urllib2.urlopen(req, self.data)
if self.mode == "Multiple":
time.sleep(0.001)
#read HTMLData
HTMLData = open_url.read()
print "OK"
if __name__ == "__main__":
current_post = Post("http://forum.xda-developers.com/login.php", "vb_login_username=test&vb_login_password&securitytoken=guest&do=login", \
"Simple")
#save the time before post data
origin_time = time.time()
if(current_post.mode == "Multiple"):
#multithreading POST
for i in range(0, 10):
thread = threading.Thread(target = current_post.post)
thread.start()
thread.join()
#calculate the time interval
time_interval = time.time() - origin_time
print time_interval
if(current_post.mode == "Simple"):
#simple POST
for i in range(0, 10):
current_post.post()
#calculate the time interval
time_interval = time.time() - origin_time
print time_interval
just as you can see, this is a very simple code. first i set the mode to "Simple", and i can get the time interval: 50s(maybe my speed is a little slow :(). then i set the mode to "Multiple", and i get the time interval: 35. from that i can see, multi-thread can actually increase the speed, but the result isnt as good as i imagine. i want to get a much faster speed.
from debugging, i found that the program mainly blocks at the line: open_url = urllib2.urlopen(req, self.data)
, this line of code takes a lot of time to post and receive data from the specified website. i guess maybe i can get a faster speed by adding time.sleep()
and using multi-threading inside the urlopen
function, but i cannot do that because its the python's own function.
if not considering the prossible limits that the server blocks the post speed, what else can i do to get the faster speed? or any other code i can modify? thx a lot!
The biggest thing you are doing wrong, that is hurting your throughput the most, is the way you are calling thread.start()
and thread.join()
:
for i in range(0, 10):
thread = threading.Thread(target = current_post.post)
thread.start()
thread.join()
Each time through the loop, you create a thread, start it, and then wait for it to finish Before moving on to the next thread. You aren't doing anything concurrently at all!
What you should probably be doing instead is:
threads = []
# start all of the threads
for i in range(0, 10):
thread = threading.Thread(target = current_post.post)
thread.start()
threads.append(thread)
# now wait for them all to finish
for thread in threads:
thread.join()
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With