I am using urllib to get a string of html from a website and need to put each word in the html document into a list. Here is the code I have so far. I keep getting an error. I have also copied the error below. <pre class="prettyprint"><code>import urllib.request url = input("Please enter a URL: ") z=urllib.request.urlopen(url) z=str(z.read()) removeSpecialChars = str.replace("!@#$%^&*()[]{};:,./<>?\|`~-=_+", " ") words = removeSpecialChars.split() print ("Words list: ", words[0:20]) </code></pre> Here is the error. <pre class="prettyprint"><code>Please enter a URL: http://simleyfootball.com Traceback (most recent call last): File "C:\Users\jeremy.KLUG\My Documents\LiClipse Workspace\Python Project 2\Module2.py", line 7, in <module> removeSpecialChars = str.replace("!@#$%^&*()[]{};:,./<>?\|`~-=_+", " ") TypeError: replace() takes at least 2 arguments (1 given) </code></pre>

str.replace is the wrong function for what you want to do (apart from it being used incorrectly). You want to replace any character of a set with a space, not the whole set with a single space (the latter is what replace does). You can use translate like this: <pre class="prettyprint"><code>removeSpecialChars = z.translate ({ord(c): " " for c in "!@#$%^&*()[]{};:,./<>?\|`~-=_+"}) </code></pre> This creates a mapping which maps every character in your list of special characters to a space, then calls translate() on the string, replacing every single character in the set of special characters with a space.

replace special characters in a string python

I am using urllib to get a string of html from a website and need to put each word in the html document into a list.

Here is the code I have so far. I keep getting an error. I have also copied the error below.

import urllib.request

url = input("Please enter a URL: ")

z=urllib.request.urlopen(url)
z=str(z.read())
removeSpecialChars = str.replace("!@#$%^&*()[]{};:,./<>?\|`~-=_+", " ")

words = removeSpecialChars.split()

print ("Words list: ", words[0:20])

Here is the error.

Please enter a URL: http://simleyfootball.com
Traceback (most recent call last):
  File "C:\Users\jeremy.KLUG\My Documents\LiClipse Workspace\Python Project 2\Module2.py", line 7, in <module>
    removeSpecialChars = str.replace("!@#$%^&*()[]{};:,./<>?\|`~-=_+", " ")
TypeError: replace() takes at least 2 arguments (1 given)

How do you replace a special character in a string in Python?

Using 'str.replace() , we can replace a specific character. If we want to remove that specific character, replace that character with an empty string. The str. replace() method will replace all occurrences of the specific character mentioned.

How do you replace symbols in Python?

replace() method helps to replace the occurrence of the given old character with the new character or substring. The method contains the parameters like old(a character that you wish to replace), new(a new character you would like to replace with), and count(a number of times you want to replace the character).

One way is to use re.sub, that's my preferred way.

import re
my_str = "hey th~!ere"
my_new_string = re.sub('[^a-zA-Z0-9 \n\.]', '', my_str)
print my_new_string

Output:

hey there

Another way is to use re.escape:

import string
import re

my_str = "hey th~!ere"

chars = re.escape(string.punctuation)
print re.sub(r'['+chars+']', '',my_str)

Output:

hey there

Just a small tip about parameters style in python by PEP-8 parameters should be remove_special_chars and not removeSpecialChars

Also if you want to keep the spaces just change [^a-zA-Z0-9 \n\.] to [^a-zA-Z0-9\n\.]

str.replace is the wrong function for what you want to do (apart from it being used incorrectly). You want to replace any character of a set with a space, not the whole set with a single space (the latter is what replace does). You can use translate like this:

removeSpecialChars = z.translate ({ord(c): " " for c in "!@#$%^&*()[]{};:,./<>?\|`~-=_+"})

This creates a mapping which maps every character in your list of special characters to a space, then calls translate() on the string, replacing every single character in the set of special characters with a space.

You need to call replace on z and not on str, since you want to replace characters located in the string variable z

removeSpecialChars = z.replace("!@#$%^&*()[]{};:,./<>?\|`~-=_+", " ")

But this will not work, as replace looks for a substring, you will most likely need to use regular expression module re with the sub function:

import re
removeSpecialChars = re.sub("[!@#$%^&*()[]{};:,./<>?\|`~-=_+]", " ", z)

Don't forget the [], which indicates that this is a set of characters to be replaced.

replace special characters in a string python

Tags:

python

string

replace

list

urllib

user2363217

People also ask

3 Answers

Kobi K

rassahah

Danny M

Recent Activity

Donate For Us

replace special characters in a string python

Tags:

python

string

replace

list

urllib

user2363217

People also ask

3 Answers

Kobi K

rassahah

Danny M

Related questions

Recent Activity

Donate For Us