Yesterday I wrote a simple Python program (really simple, as shown below) to validate the HTTP status responses of around 5000 URLs. The problem is that the program seems to get stuck after every 400 to 500 URLs. As I'm really new to programming, I have no idea how to track down the problem.
I added the "a = a + 1" piece to track how many URLs had been processed when it got stuck.
How can I find what the problem is? Thank you very much!
I'm using Ubuntu 11.10 and Python 2.7.
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import httplib

raw_url_list = open('url.txt', 'r')
url_list = raw_url_list.readlines()
result_file = open('result.txt', 'w')
a = 0
for url in url_list:
    url = url.strip()[23:]
    conn = httplib.HTTPConnection('www.123456789.cn')
    conn.request('HEAD', url)
    res = conn.getresponse()
    result_file.write('http://www.123456789.cn%s, %s, %s \n' % (url, res.status, res.reason))
    a = a + 1
    print a
raw_url_list.close()
result_file.close()
You need to close each connection after you are done with it; otherwise every iteration leaves a socket open, and after a few hundred lingering connections the script will stall. Just add this to the end of your for loop:
conn.close()
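For reference, here is a minimal sketch of the corrected loop. The conn.close() in the finally block is the actual fix; the timeout argument and the try/except around the request are optional additions of mine (not in the original code) so that a single slow or broken URL raises an error and gets logged instead of blocking the whole run:

import httplib
import socket

result_file = open('result.txt', 'w')
a = 0
for url in open('url.txt'):
    url = url.strip()[23:]  # strip the 23-character 'http://www.123456789.cn' prefix
    conn = httplib.HTTPConnection('www.123456789.cn', timeout=10)
    try:
        conn.request('HEAD', url)
        res = conn.getresponse()
        result_file.write('http://www.123456789.cn%s, %s, %s \n'
                          % (url, res.status, res.reason))
    except (httplib.HTTPException, socket.error) as e:
        # record the failure instead of letting one bad URL kill the run
        result_file.write('http://www.123456789.cn%s, ERROR, %s \n' % (url, e))
    finally:
        conn.close()  # release the socket so open connections do not pile up
    a = a + 1
    print a
result_file.close()

Closing the connection in a finally block guarantees the socket is released even when a request fails partway through.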