So, I have this code:
    url = 'http://google.com'
    linkregex = re.compile('<a\s*href=[\'|"](.*?)[\'"].*?>')
    m = urllib.request.urlopen(url)
    msg = m.read()
    links = linkregex.findall(msg)
But then Python raises this error:
    links = linkregex.findall(msg)
    TypeError: can't use a string pattern on a bytes-like object
What did I do wrong?
You used a string pattern on a bytes object. Use a bytes pattern instead:
    linkregex = re.compile(b'<a\s*href=[\'|"](.*?)[\'"].*?>')
                           ^
    Add the b there; it makes the pattern a bytes object.
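As a minimal sketch of the bytes-pattern fix, the snippet below runs the same regex against a hard-coded bytes value instead of a live page. The sample `msg` and the `text_links` name are assumptions made for illustration; the `rb` prefix is used so that `\s` reaches the regex engine unchanged.

```python
import re

# Hypothetical stand-in for the bytes returned by m.read(); no network needed.
msg = b'<a href="http://example.com/a">A</a> <a href=\'/b\'>B</a>'

# rb'' is a raw bytes literal: the pattern is bytes, and \s stays a regex escape.
linkregex = re.compile(rb'<a\s*href=[\'|"](.*?)[\'"].*?>')

# A bytes pattern on a bytes object works, and every match is itself bytes.
links = linkregex.findall(msg)
print(links)

# Decode the matches if str values are needed later (ASCII assumed for this sample).
text_links = [link.decode("ascii") for link in links]
print(text_links)
```

Note that the results are bytes, so they have to be decoded before being mixed with ordinary strings.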
(ps:
    >>> from disclaimer include dont_use_regexp_on_html
    "Use BeautifulSoup or lxml instead."
)
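To illustrate what parsing instead of regex matching can look like, here is a rough sketch that uses the standard library's `html.parser` rather than BeautifulSoup or lxml, so that it runs without extra packages. The `LinkCollector` class and the sample markup are hypothetical and only meant to show the idea.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper that collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


# Sample markup (an assumption); feed() expects str, so decode bytes first.
html = '<p><a class="x" href="/one">1</a> <A HREF=\'/two\'>2</A></p>'
collector = LinkCollector()
collector.feed(html)
print(collector.links)
```

Unlike the regex, this does not depend on `href` being the first attribute or on the tag being lowercase.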
If you are running Python 2.6, then urllib has no "request" submodule. So the third line becomes:
    m = urllib.urlopen(url)
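And in version 3, 'msg' is a bytes object and not the string that findall() expects from a string pattern. Wrapping it in str() makes the error go away:
    links = linkregex.findall(str(msg))
But str() on a bytes object returns its repr (text that starts with b' and contains escape sequences such as \n and \'), not the decoded page, so links can be missed or mangled. It is better to decode using the correct encoding. For instance, if "latin1" is the encoding then:
    links = linkregex.findall(msg.decode("latin1"))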
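A small sketch of the decode approach follows, again on hard-coded bytes rather than a live response. The `decode_body` helper, its `fallback` parameter, and the sample body are assumptions for illustration; with a real response, the charset can usually be read from the headers as shown in the comment.

```python
import re


def decode_body(body, charset=None, fallback="utf-8"):
    """Hypothetical helper: decode response bytes, using a fallback if no charset is known."""
    return body.decode(charset or fallback, errors="replace")


# With a real response m from urllib.request.urlopen(url), the charset
# declared by the server (or None) is available as:
#     charset = m.headers.get_content_charset()

linkregex = re.compile(r'<a\s*href=[\'|"](.*?)[\'"].*?>')

# Sample latin1-encoded body (an assumption standing in for m.read()).
body = 'caf\u00e9 <a href="/menu">menu</a>'.encode("latin1")

text = decode_body(body, "latin1")   # a real str, with the accent restored
links = linkregex.findall(text)
print(links)

# For comparison: str() on bytes gives the repr, not the decoded text.
print(str(b'abc'))
```

Passing `errors="replace"` keeps the sketch from raising on a wrong guess, at the cost of replacement characters in the text.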