
How to remove special characters except space from a file in python?

I have a huge corpus of text (line by line) and I want to remove special characters while preserving the spaces and the line structure of the text.

hello? there A-Z-R_T(,**), world, welcome to python.
this **should? the next line#followed- by@ an#other %million^ %%like $this.

should be

hello there A Z R T world welcome to python
this should be the next line followed by another million like this
pythonlearn asked Apr 12 '17 01:04


People also ask

How do you ignore special characters and spaces in Python?

You can remove special characters from a string with the str.isalnum() method. str.isalnum() returns True if the string is non-empty and every character in it is alphanumeric, so testing each character individually lets you keep only the letters and digits.
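A minimal sketch of that idea on part of the question's input, keeping whitespace as well as alphanumeric characters. The helper name `keep_alnum_and_space` is made up for illustration. Note that this deletes special characters instead of replacing them with a space, so `A-Z` collapses to `AZ`, and that `str.isalnum()` also accepts non-ASCII letters and digits.

```python
def keep_alnum_and_space(text):
    # Keep a character only if it is alphanumeric or whitespace;
    # everything else is dropped (not replaced by a space).
    return "".join(ch for ch in text if ch.isalnum() or ch.isspace())


print(keep_alnum_and_space("hello? there A-Z-R_T(,**), world"))
# hello there AZRT world
```

If words separated only by punctuation must stay apart, as in the question's expected output, a replace-with-space approach is probably the better fit.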

How do I remove unwanted characters from a list in Python?

You can remove characters from a string with str.translate(), which replaces each character according to a translation table. The table maps the Unicode code point of each character to its replacement; mapping a code point to None removes that character from the result.
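A hedged sketch of the translation-table approach, adapted to the question: instead of mapping characters to None, it maps them to a space so that word boundaries survive. It assumes "special characters" means ASCII punctuation (`string.punctuation`); the function name `punctuation_to_spaces` is hypothetical.

```python
import string

# Assumption: only ASCII punctuation counts as "special" here.
# str.maketrans(x, y) maps each character in x to the character at the
# same position in y, so every punctuation mark maps to a space.
TABLE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def punctuation_to_spaces(text):
    # translate() swaps characters one-for-one, so a run of punctuation
    # becomes a run of spaces; split() and join() collapse those runs.
    return " ".join(text.translate(TABLE).split())


line = "this **should? the next line#followed- by@ an#other %million^ %%like $this."
print(punctuation_to_spaces(line))
# this should the next line followed by an other million like this
```

To delete the characters outright, `str.maketrans("", "", string.punctuation)` builds a table that maps each of them to None.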

How do I remove multiple special characters from a string in Python?

str.replace(old, new) replaces every occurrence of the substring old with new. It takes one substring per call, so to remove several different characters you call it once for each character, for example in a loop or as a chain of calls.
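As a rough illustration, the loop below calls `str.replace` once per unwanted character. The helper name `remove_chars` and the particular set of characters are assumptions chosen for this example.

```python
def remove_chars(text, unwanted):
    # str.replace handles one substring per call, so loop over the
    # characters that should be removed.
    for ch in unwanted:
        text = text.replace(ch, "")
    return text


print(remove_chars("%million^ %%like $this.", "%^$."))
# million like this
```

This is fine for a handful of characters; for a long list, a translation table or a regex is likely to be simpler.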


1 Answer

You can also do this with a regex that replaces each run of non-alphanumeric characters with a single space:

import re
a = '''hello? there A-Z-R_T(,**), world, welcome to python.
this **should? the next line#followed- by@ an#other %million^ %%like $this.'''

for k in a.split("\n"):
    print(re.sub(r"[^a-zA-Z0-9]+", ' ', k))
    # Or:
    # final = " ".join(re.findall(r"[a-zA-Z0-9]+", k))
    # print(final)

Output:

hello there A Z R T world welcome to python 
this should the next line followed by an other million like this 

Edit:

Alternatively, you can store the cleaned lines in a list:

final = [re.sub(r"[^a-zA-Z0-9]+", ' ', k) for k in a.split("\n")]
print(final)

Output:

['hello there A Z R T world welcome to python ', 'this should the next line followed by an other million like this ']
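Since the question mentions a huge corpus stored in a file, here is a sketch that applies the same pattern one line at a time instead of loading everything into a string. The function names and the file paths passed to `clean_file` are hypothetical, and UTF-8 encoding is an assumption. It uses the `findall` plus `join` variant, which also avoids the trailing space seen in the output above.

```python
import re

# Compile once; the pattern matches runs of ASCII letters and digits.
WORD = re.compile(r"[a-zA-Z0-9]+")


def clean_line(line):
    # Join the alphanumeric runs with single spaces; leading and
    # trailing punctuation (and the newline) simply disappear.
    return " ".join(WORD.findall(line))


def clean_file(src_path, dst_path):
    # Iterating over the file object reads one line at a time, so the
    # whole corpus never has to fit in memory.
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(clean_line(line) + "\n")
```

Calling `clean_file("corpus.txt", "corpus_clean.txt")` would write one cleaned line per input line, so the line structure is preserved.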
Chiheb Nexus answered Oct 04 '22 19:10