
How to remove this special character?

Tags: python, string

I was trying to unify the lines in my file when I observed the following:

word1 word2
word1 word2

I did not understand why these lines were not combined, so I opened the file in vim and used :set list to check for special characters. I found this:

 word1 <feff>word2
 word1 word2

I am not sure how to clean this word in Python. Any suggestions on what this character might be and how it can be removed?

Legend asked Jul 22 '11 06:07


2 Answers

U+FEFF is the Byte Order Mark (BOM) character, which should only occur at the start of a document. Anywhere else in a document, it should be treated as a ZERO WIDTH NON-BREAKING SPACE. If it causes issues, you can remove it like any other character:

>>> s = u'word1 \ufeffword2'
>>> s = s.replace(u'\ufeff', '')
>>> s
u'word1 word2'

(In Python 3, the u prefix is unnecessary. Python 3.0 to 3.2 reject it outright, so drop it there.)
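For the use case in the question (combining lines that differ only by this character), the same replace can be applied to each line before comparing. This is a minimal Python 3 sketch, not a definitive implementation; the helper names strip_bom and unique_lines are hypothetical, and the sample data is taken from the question.

```python
def strip_bom(text):
    """Remove every U+FEFF from text, wherever it appears."""
    return text.replace('\ufeff', '')


def unique_lines(lines):
    """Strip U+FEFF from each line and drop duplicates, keeping first-seen order."""
    seen = set()
    result = []
    for line in lines:
        cleaned = strip_bom(line)
        if cleaned not in seen:
            seen.add(cleaned)
            result.append(cleaned)
    return result


# Sample lines from the question: they look identical but differ by one U+FEFF.
lines = ['word1 \ufeffword2', 'word1 word2']
print(unique_lines(lines))  # ['word1 word2']
```

When reading from a file, opening it with encoding='utf-8-sig' strips only a BOM at the very start of the file, so the explicit replace is still needed for a U+FEFF in the middle of a line.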

phihag answered Nov 14 '22 06:11


Have you tried mytext.split(string.whitespace)?
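Before relying on that, it may be worth checking how Python classifies U+FEFF. This is a quick sketch, assuming Python 3 and the sample string from the question; the variable name mytext comes from the suggestion above.

```python
import string

mytext = 'word1 \ufeffword2'

# U+FEFF is a format character (Unicode category Cf), not whitespace.
print('\ufeff'.isspace())

# split() with no argument splits on runs of whitespace,
# so U+FEFF stays attached to the second word.
print(mytext.split())

# split(string.whitespace) treats the whole whitespace string as a single
# separator, which does not occur in mytext, so nothing is split.
print(mytext.split(string.whitespace))
```

Neither split call removes the character, so an explicit replace on U+FEFF is still needed.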

Matt N. answered Nov 14 '22 06:11
