I want to open my file.txt and split all the data from this file. Here is my file.txt:
some_data1 some_data2 some_data3 some_data4 some_data5
and here is my Python code:
>>> file_txt = open("file.txt", 'r')
>>> data = file_txt.read()
>>> data_list = data.split(' ')
>>> print data
some_data1 some_data2 some_data3 some_data4 some_data5
>>> print data_list
['\xef\xbb\xbfsome_data1', 'some_data2', 'some_data3', 'some_data4', 'some_data5\n']
As you can see, when I print my data_list, it adds this to my list: \xef\xbb\xbf and this: \n. What are these, and how can I clean my list of them? Thanks.
The \xef\xbb\xbf is a Byte Order Mark (BOM) for UTF-8 - the \x is an escape sequence indicating that the next two characters are a hex value representing the character code. The \n is a newline character. To remove the newline, you can use rstrip() (it returns a new string, so reassign the result):

data = data.rstrip()
data_list = data.split(' ')
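Note that rstrip() only removes the trailing newline; the BOM at the start of the first item stays. A minimal sketch (assuming Python 2 and the file.txt from the question) that removes both:

file_txt = open("file.txt", 'r')
data = file_txt.read()
data = data.rstrip()                 # drop the trailing '\n'
data = data.decode("utf-8-sig")      # drop the BOM; the result is unicode
data_list = data.split(' ')
print data_list                      # [u'some_data1', u'some_data2', u'some_data3', u'some_data4', u'some_data5']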
The UTF-8 file signature (commonly also called a "BOM") identifies the encoding format rather than the byte order of the document. UTF-8 is a linear sequence of bytes, not a sequence of 2-byte or 4-byte units where the byte order is important.

Encoding    Encoded BOM
UTF-8       EF BB BF
"sig" in "utf-8-sig" is the abbreviation of "signature" (i.e. signature utf-8 file). Using utf-8-sig to read a file will treat BOM as file info. instead of a string.
Your file contains a UTF-8 BOM at the beginning. To get rid of it, first decode your file contents to unicode:
fp = open("file.txt") data = fp.read().decode("utf-8-sig").encode("utf-8")
But better, don't encode it back to utf-8; work with unicode text instead. There is a good rule: decode all your input text data to unicode as soon as possible, and work only with unicode; encode the output data to the required encoding as late as possible. This will save you from many headaches.
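A sketch of that rule (assuming Python 2; the output filename is made up for illustration):

data = open("file.txt").read().decode("utf-8-sig")  # decode input as early as possible
words = data.rstrip().split(u' ')                   # work only with unicode in between
result = u' '.join(words)
out = open("output.txt", "w")                       # hypothetical output file
out.write(result.encode("utf-8"))                   # encode as late as possible
out.close()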
To read bigger files in a certain encoding, use io.open or codecs.open.
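For instance, a sketch using io.open (available since Python 2.6), which decodes incrementally instead of loading the whole file into memory:

import io
with io.open("file.txt", encoding="utf-8-sig") as fp:
    for line in fp:            # each line is already unicode, with the BOM stripped
        print line.rstrip()

codecs.open("file.txt", encoding="utf-8-sig") can be used the same way.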
Use str.strip() or str.rstrip() to get rid of the newline character \n.
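For example, on the last item of the list from the question:

last = 'some_data5\n'
print repr(last.rstrip())     # 'some_data5'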