import re, urllib.request
textfile = open('depth_1.txt','wt')
print('enter the url you would like to crawl')
print('Usage - "http://phocks.org/stumble/creepy/" <-- with the double quotes')
my_url = input()
for i in re.findall(b'''href=["'](.[^"']+)["']''', urllib.request.urlopen(my_url).read(), re.I):
    print(i)
    for ee in re.findall(b'''href=["'](.[^"']+)["']''', urllib.request.urlopen(i).read(), re.I): #this is line 20!
        print(ee)
        textfile.write(ee+'\n')
textfile.close()
After looking around for a solution to my problem, I couldn't find a fix. The error occurs on line 20 (AttributeError: 'bytes' object has no attribute 'timeout'). I don't fully understand the error, so I'm looking for an answer and an explanation of what I did wrong. Thanks!
From the docs for urllib.request.urlopen:
urllib.request.urlopen(url[, data][, timeout])
Open the URL url, which can be either a string or a Request object.
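If urllib.request.urlopen doesn't receive a string, it assumes it is a Request object. Your regex pattern is a bytes literal applied to the bytes returned by read(), so re.findall returns bytes matches, and i is a bytestring by the time it reaches urlopen. That is why it's failing, e.g.: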
>>> a = urllib.request.urlopen('http://www.google.com').read() # success
>>> a = urllib.request.urlopen(b'http://www.google.com').read() # throws same error
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/urllib/request.py", line 153, in urlopen
    return opener.open(url, data, timeout)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/urllib/request.py", line 446, in open
    req.timeout = timeout
AttributeError: 'bytes' object has no attribute 'timeout'
To fix that, convert your bytestring back to a str by decoding it with the appropriate codec:
>>> a = urllib.request.urlopen(b'http://www.google.com'.decode('ASCII')).read()
Or don't use bytestrings in the first place.
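One way to avoid bytestrings altogether is to decode the page body once and run a str pattern over it, so every match is already a str that urlopen accepts. The following is only a sketch: the helper name extract_links, the UTF-8 default, and the sample HTML are assumptions for illustration, and a real page may declare a different encoding.

```python
import re

# Same pattern as in the question, but as a str pattern instead of bytes.
HREF_RE = re.compile(r'''href=["'](.[^"']+)["']''', re.I)

def extract_links(page_bytes, encoding='utf-8'):
    """Decode the raw page and return every href value as a str.

    The encoding default is an assumption; undecodable bytes are replaced
    rather than raising, so the crawl does not stop on a bad page.
    """
    text = page_bytes.decode(encoding, errors='replace')
    return HREF_RE.findall(text)

# Hypothetical sample standing in for urllib.request.urlopen(my_url).read()
sample = b'<a href="http://example.com/a">A</a> <a HREF=\'http://example.com/b\'>B</a>'
links = extract_links(sample)
print(links)
```

Because the matches are str, the inner loop could pass each one straight to urllib.request.urlopen, and textfile.write(link + '\n') would also work without mixing bytes and str.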
This error occurs because you can't use a bytestring as a URL. Check where your program produces bytes and decode them to str before passing them to urlopen.
Because it is an AttributeError, some code, either yours or in a library you use, tried to access the timeout attribute of an object it was passed. In your case a bytes object was passed, which is probably your problem: you are likely passing the wrong object type somewhere. If you're sure the objects you are passing are correct, follow the traceback to see exactly where timeout is accessed and check what type of object that code expects.
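The failing step can be reproduced without any network access. The traceback shows the library executing req.timeout = timeout on whatever it was handed; the sketch below does the same assignment on a bytes object directly. The variable names are illustrative, and the exact wording of the message may vary slightly between Python versions.

```python
# A bytes object, like the matches produced by a bytes regex pattern.
url = b'http://www.google.com'

try:
    # Mirrors the "req.timeout = timeout" line from the traceback:
    # bytes instances cannot take new attributes, so this raises.
    url.timeout = 10
    message = None
except AttributeError as exc:
    message = str(exc)

print(message)

# Decoding gives a str, which urlopen treats as a URL rather than a Request.
fixed_url = url.decode('ascii')
print(type(fixed_url).__name__, fixed_url)
```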