urllib.request.urlopen not accepting query string with spaces

I am taking a Udacity course on Python in which we are supposed to check a document for profane words. I am using the website http://www.wdylike.appspot.com/?q= (text_to_be_checked_for_profanity). The text to check is passed as a query string in the above URL, and the website returns true or false after checking it for profane words. Below is my code.

import urllib.request

# Read the content from a document
def read_content():

    quotes = open("movie_quotes.txt")
    content = quotes.read()
    quotes.close()
    check_profanity(content)



def check_profanity(text_to_read):
    connection = urllib.request.urlopen("http://www.wdylike.appspot.com/?q="+text_to_read)
    result = connection.read()
    print(result)
    connection.close()

read_content()

It gives me the following error:

Traceback (most recent call last):
   File "/Users/Vrushita/Desktop/Rishit/profanity_check.py", line 21, in <module>
     read_content()
   File "/Users/Vrushita/Desktop/Rishit/profanity_check.py", line 11, in read_content
     check_profanity(content)
   File "/Users/Vrushita/Desktop/Rishit/profanity_check.py", line 16, in check_profanity
     connection = urllib.request.urlopen("http://www.wdylike.appspot.com/?q="+text_to_read)
   File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 163, in urlopen
     return opener.open(url, data, timeout)
   File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 472, in open
     response = meth(req, response)
   File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 582, in http_response
     'http', request, response, code, msg, hdrs)
   File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 510, in error
     return self._call_chain(*args)
   File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 444, in _call_chain
     result = func(*args)
   File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 590, in http_error_default
     raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 400: Bad Request

The document I am reading from contains the string "Hello world". However, if I change the string to "Hello+world", the same code works and returns the desired result. Can someone explain why this happens and what a workaround would be?

Rishit Shah asked Dec 15 '22 01:12


1 Answer

urllib accepts it; the server doesn't. Nor should it, because a space is not a valid character in a URL.

Escape your query string properly with urllib.parse.quote_plus(); it'll ensure your string is valid for use in query parameters. Or better still, use the urllib.parse.urlencode() function to encode all key-value pairs:

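import urllib.request
from urllib.parse import urlencode

# urlencode() escapes the value and encodes spaces as '+'
params = urlencode({'q': text_to_read})
# Plain concatenation also works on Python 3.5, which lacks f-strings
connection = urllib.request.urlopen("http://www.wdylike.appspot.com/?" + params)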
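
To see what the encoding does without contacting the server, here is a minimal sketch. The helper name build_profanity_url is hypothetical (it is not in the question's code), and BASE_URL is simply the endpoint quoted in the question.

```python
from urllib.parse import parse_qs, quote, quote_plus, urlencode

BASE_URL = "http://www.wdylike.appspot.com/"  # endpoint taken from the question


def build_profanity_url(text):
    """Hypothetical helper: build the request URL with `text` safely encoded."""
    return BASE_URL + "?" + urlencode({"q": text})


# quote_plus() encodes a space as '+', the form-encoding convention for query strings
print(quote_plus("Hello world"))   # Hello+world

# quote() encodes a space as '%20' instead
print(quote("Hello world"))        # Hello%20world

# urlencode() applies quote_plus() to each key and value and joins them with '='
print(build_profanity_url("Hello world"))

# Decoding a query string turns '+' back into a space
print(parse_qs("q=Hello+world"))   # {'q': ['Hello world']}
```

This is presumably why the hand-edited "Hello+world" worked: in a query string, '+' is the encoded form of a space, so a server that decodes it receives "Hello world". Encoding the text in code means the document itself does not have to be altered.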
Martijn Pieters answered Mar 06 '23 14:03