
Python file.read() seeing junk characters at the beginning of a file

Tags: python, file-io

I'm trying to use Python to concatenate a few JavaScript files together before minifying them, basically like so:

outfile = open("output.js", "w")
for somefile in a_list_of_file_names:
    js = open(somefile)
    outfile.write(js.read())
    js.close()
outfile.close()

The minifier complains about illegal characters and syntax errors at the beginning of each file, so I did some diagnostics.

>>> r = open("output.js")
>>> somestring = r.readline()
>>> somestring
'\xef\xbb\xbfvar $j = jQuery.noConflict(),\n'
>>> print somestring
var $j = jQuery.noConflict(),

The first line of the file should, of course, be "var $j = jQuery.noConflict(),"

In case it makes a difference, I'm working from within Windows.

Any thoughts?

Edit: Here's what I'm getting from the minifier:

U:\>java -jar c:\path\yuicompressor-2.4.2.jar c:\path\somefile.js -o c:\path\bccsminified.js --type js -v

[INFO] Using charset Cp1252

[ERROR] 1:2:illegal character

[ERROR] 1:2:syntax error

[ERROR] 1:3:illegal character
Chris asked Aug 12 '10 15:08



2 Answers

That's a UTF-8 BOM (Byte Order Mark). You've probably edited the file with Notepad.
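To confirm which source files carry the BOM, one option is to read the first three bytes in binary mode and compare them with `codecs.BOM_UTF8`. This is only a minimal sketch: the helper name `starts_with_utf8_bom` and the throwaway demo file are made up for illustration and are not part of the original question.

```python
import codecs
import os
import tempfile


def starts_with_utf8_bom(path):
    """Return True if the file's first three bytes are the UTF-8 BOM."""
    # Binary mode, so no decoding or newline translation gets in the way.
    with open(path, "rb") as f:
        return f.read(3) == codecs.BOM_UTF8


# Self-contained demo on a temporary file (hypothetical content).
demo_path = os.path.join(tempfile.mkdtemp(), "demo.js")
with open(demo_path, "wb") as f:
    f.write(codecs.BOM_UTF8 + b"var $j = jQuery.noConflict(),\n")

has_bom = starts_with_utf8_bom(demo_path)
print(has_bom)
```

Running the helper over a_list_of_file_names from the question should show which files need attention.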

Ned Batchelder answered Nov 15 '22 04:11


EF BB BF is the UTF-8 encoding of the Unicode byte order mark (BOM), U+FEFF. Those bytes are actually in your files, which is why Python sees them.

Either ignore/discard the BOM or reencode the files to omit it.
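One way to discard the BOM while concatenating is to decode each input with the `utf-8-sig` codec, which skips a leading BOM if there is one and reads BOM-less UTF-8 unchanged. The sketch below assumes the source files really are UTF-8. The function name `concatenate_without_boms` and the demo files are hypothetical. `io.open` is used so the same code should run on Python 2.6+ and Python 3.

```python
import codecs
import io
import os
import tempfile


def concatenate_without_boms(input_paths, output_path):
    """Join text files into one, dropping a leading UTF-8 BOM from each."""
    # Plain "utf-8" on the output side never writes a BOM.
    # newline="" keeps the original line endings untouched.
    with io.open(output_path, "w", encoding="utf-8", newline="") as outfile:
        for path in input_paths:
            # "utf-8-sig" strips the BOM on read when it is present.
            with io.open(path, "r", encoding="utf-8-sig", newline="") as js:
                outfile.write(js.read())


# Self-contained demo: one file with a BOM, one without (made-up content).
tmp_dir = tempfile.mkdtemp()
first = os.path.join(tmp_dir, "a.js")
second = os.path.join(tmp_dir, "b.js")
with open(first, "wb") as f:
    f.write(codecs.BOM_UTF8 + b"var a = 1;\n")
with open(second, "wb") as f:
    f.write(b"var b = 2;\n")

out_path = os.path.join(tmp_dir, "output.js")
concatenate_without_boms([first, second], out_path)

with open(out_path, "rb") as f:
    combined = f.read()
print(repr(combined))
```

Reading everything as text and writing it back as plain UTF-8 also covers the "reencode" option, since the output file contains no BOM at all.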

Borealid answered Nov 15 '22 04:11