
Why does the "file" command get confused by .py files?

Tags:

python

I have a directory of Python modules that I've written. On a whim, I ran file on it and was really surprised by the result. Here's a count of what it thought the files were:

  1 ASCII Java program text, with very long lines
  1 a /bin/env python script text executable
  1 a python script text executable
  2 ASCII C++ program text
  4 ASCII English text
 18 ASCII Java program text

That's strange! Any idea what's going on, or why it so often thinks Python modules are Java files?

I'm using CentOS 5.2.

Edit: The question is driven more by my curiosity about why files that are obviously not Java or C++ were being classified as such. I certainly don't expect file to be perfect, but I was surprised by the choices it made. I would have guessed it would just give up and say "text file" rather than make very incorrect inferences.

pythonic metaphor asked Dec 13 '22 15:12



2 Answers

I just ran a test and in every case of incorrect identification, there was no shebang line.

For every file that had:

#!/usr/bin/env python

file correctly identified it.

Looking at the magic file, another thing that triggers recognition as a Python file is a triple quote on the first line.

$ echo '"""' | file -
/dev/stdin: python script text executable
$ echo '#!/usr/bin/python' | file -
/dev/stdin: python script text executable
$ echo '#!/usr/bin/env python' | file -
/dev/stdin: a python script text executable
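
To find which modules lack either cue, something like the following could be used. It is only a rough sketch: has_python_cue and first_line_of are hypothetical helper names, and the check imitates the two first-line cues described above rather than the actual rules in file's magic database.

```python
import re

# Assumption: a shebang counts as "Python" if the word python appears in it.
# This imitates the cues discussed above; it is not file's real rule set.
PYTHON_SHEBANG = re.compile(r"^#!.*\bpython")


def has_python_cue(first_line: str) -> bool:
    """Return True if the line is a Python shebang or starts with a triple quote."""
    line = first_line.strip()
    if line.startswith('"""'):
        return True
    return bool(PYTHON_SHEBANG.match(line))


def first_line_of(path: str) -> str:
    """Read only the first line of a file, tolerating odd bytes."""
    with open(path, "r", errors="replace") as handle:
        return handle.readline()
```

Calling has_python_cue(first_line_of(path)) for each module should list the files that are candidates for misidentification, assuming the two cues are the only ones that matter on your version of file.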
Dennis Williamson answered Dec 15 '22 05:12



From the file man page:

File tests each argument in an attempt to classify it. There are three sets of tests, performed in this order: filesystem tests, magic number tests, and language tests. The first test that succeeds causes the file type to be printed.

My guess is that some of your files happen to match the tests for other languages, so file identifies them incorrectly.
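
As a toy illustration of how a keyword-based language test could go wrong, here is a minimal sketch. The token table and the guess_language function are hypothetical, made up for this example; they are not file's real list or algorithm.

```python
# Hypothetical token table, for illustration only. Each entry maps a
# keyword to the label reported when that keyword is found.
TOKEN_LANGUAGES = [
    ("import", "Java program text"),
    ("class", "C++ program text"),
    ("the", "English text"),
]


def guess_language(source: str) -> str:
    """Return the label of the first table entry found as a whole word."""
    words = set(source.split())
    for token, label in TOKEN_LANGUAGES:
        if token in words:
            return label
    # No keyword matched, so fall back to a generic answer.
    return "text"


print(guess_language("import os\nprint(os.getcwd())\n"))
```

Under these assumptions, any Python module containing an import statement is labeled as Java, because the guess depends on a single shared keyword rather than on the syntax of the file as a whole.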

Also, file is generally intended for binary files, as the BUGS section of the man page indicates:

file uses several algorithms that favor speed over accuracy, thus it can be misled about the contents of text files.

The support for text files (primarily for programming languages) is simplistic, inefficient and requires recompilation to update.

Dan H answered Dec 15 '22 05:12
