I am taking a Udacity course on Python in which we are supposed to check a document for profane words. I am using the website http://www.wdylike.appspot.com/?q= (text_to_be_checked_for_profanity). The text to be checked is passed as a query string in the above URL, and the website returns true or false after checking for profane words. Below is my code.
import urllib.request

# Read the content from a document
def read_content():
    quotes = open("movie_quotes.txt")
    content = quotes.read()
    quotes.close()
    check_profanity(content)

def check_profanity(text_to_read):
    connection = urllib.request.urlopen("http://www.wdylike.appspot.com/?q="+text_to_read)
    result = connection.read()
    print(result)
    connection.close()

read_content()
It gives me the following error:
Traceback (most recent call last):
  File "/Users/Vrushita/Desktop/Rishit/profanity_check.py", line 21, in <module>
    read_content()
  File "/Users/Vrushita/Desktop/Rishit/profanity_check.py", line 11, in read_content
    check_profanity(content)
  File "/Users/Vrushita/Desktop/Rishit/profanity_check.py", line 16, in check_profanity
    connection = urllib.request.urlopen("http://www.wdylike.appspot.com/?q="+text_to_read)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 163, in urlopen
    return opener.open(url, data, timeout)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 472, in open
    response = meth(req, response)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 582, in http_response
    'http', request, response, code, msg, hdrs)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 510, in error
    return self._call_chain(*args)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 444, in _call_chain
    result = func(*args)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 590, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 400: Bad Request
The document that I am trying to read contains the string "Hello world". However, if I change the string to "Hello+world", the same code works and returns the desired result. Can someone explain why this is happening and how to work around it?
urllib accepts it, the server doesn't. And well it should not, because a space is not a valid URL character.
Escape your query string properly with urllib.parse.quote_plus(); it'll ensure your string is valid for use in query parameters. Or better still, use the urllib.parse.urlencode() function to encode all key-value pairs:
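import urllib.request
from urllib.parse import urlencode

# urlencode() escapes the value, so the space in "Hello world" is sent as "+"
params = urlencode({'q': text_to_read})
connection = urllib.request.urlopen("http://www.wdylike.appspot.com/?" + params)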
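To see what the encoding actually does, here is a minimal sketch that only builds the URL and makes no network request. The base URL is the one from the question; the helper name build_check_url is hypothetical and used purely for illustration. It also shows that a trailing newline, which text read from a file will often have, gets escaped too.

```python
from urllib.parse import quote_plus, urlencode

# Base URL from the question; build_check_url is a hypothetical helper name.
BASE_URL = "http://www.wdylike.appspot.com/"


def build_check_url(text):
    """Return the check URL with `text` encoded as the q parameter."""
    return BASE_URL + "?" + urlencode({"q": text})


# quote_plus() escapes a single value: the space becomes "+"
print(quote_plus("Hello world"))         # Hello+world

# urlencode() escapes whole key-value pairs (it uses quote_plus by default)
print(urlencode({"q": "Hello world"}))   # q=Hello+world

# A trailing newline, as commonly left by file.read(), becomes %0A
print(build_check_url("Hello world\n"))
```

This is why changing the file to "Hello+world" appeared to fix the problem: "+" is how a space is written in an encoded query string, so the text happened to already be in the form the server expects.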