 

Source Code detection with Java

Any idea how to detect source code (Java, C#, SQL and so on) in a text file with Java, without looking at the file extension or writing an extraordinarily long, self-made regular expression?

Maybe there are some tools doing this work already?

Thomas asked Oct 28 '11 12:10



2 Answers

Linguist

We use this library at GitHub to detect blob languages, highlight code, ignore binary files, suppress generated files in diffs and generate language breakdown graphs.

Unfortunately, it is written in Ruby; maybe JRuby can run it?

Tomasz Nurkiewicz answered Oct 06 '22 00:10



You should find a minimal set of keywords and define some logical rules. If you choose the right rules, the regular expression built from them will not be extraordinarily long. Note that the fewer keywords and rules you have, the higher the probability of a mistake (SourceCode = true for a file that is not source code, or SourceCode = false for a file that is). Conversely, the more keywords and rules you have, the more time it takes to check whether a file is source code.
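As a rough illustration of the keywords-plus-rules idea, here is a minimal sketch in plain Java. Everything in it is an assumption made for the example: the class and method names (SourceCodeDetector, detect, keywordHits, codeLineRatio), the keyword lists, and both thresholds are hypothetical and untuned. It uses two rules: enough distinct keywords from one language, or at least one keyword plus a high share of lines ending in ';', '{' or '}'. Matching is a case-sensitive substring check, which keeps it simple but misses, for example, lowercase SQL.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Minimal keyword-and-rule heuristic for guessing whether text is source code (sketch). */
public class SourceCodeDetector {

    // Hypothetical, untuned keyword lists. More keywords: fewer mistakes, but slower checks.
    private static final Map<String, List<String>> KEYWORDS = new LinkedHashMap<>();
    static {
        KEYWORDS.put("Java", List.of("import java.", "public class", "System.out.", "void main(", "private "));
        KEYWORDS.put("C#", List.of("using System", "namespace ", "Console.WriteLine", "static void Main(", "public class"));
        KEYWORDS.put("SQL", List.of("SELECT ", "FROM ", "WHERE ", "INSERT INTO", "CREATE TABLE"));
    }

    // Rule 1 (assumed threshold): distinct keywords of one language that must appear.
    private static final int MIN_KEYWORD_HITS = 2;

    // Rule 2 (assumed threshold): share of non-blank lines ending in ';', '{' or '}'.
    private static final double MIN_CODE_LINE_RATIO = 0.3;

    /** Counts how many of the given keywords occur at least once (case-sensitive substring match). */
    static int keywordHits(String text, List<String> keywords) {
        int hits = 0;
        for (String kw : keywords) {
            if (text.contains(kw)) {
                hits++;
            }
        }
        return hits;
    }

    /** Fraction of non-blank lines whose last character is ';', '{' or '}'. */
    static double codeLineRatio(String text) {
        int nonBlank = 0;
        int codeLike = 0;
        for (String line : text.split("\\R")) {
            String trimmed = line.strip();
            if (trimmed.isEmpty()) {
                continue;
            }
            nonBlank++;
            char last = trimmed.charAt(trimmed.length() - 1);
            if (last == ';' || last == '{' || last == '}') {
                codeLike++;
            }
        }
        return nonBlank == 0 ? 0.0 : (double) codeLike / nonBlank;
    }

    /** Returns the best-matching language name, or null if the text does not look like code. */
    public static String detect(String text) {
        String best = null;
        int bestHits = 0;
        for (Map.Entry<String, List<String>> e : KEYWORDS.entrySet()) {
            int hits = keywordHits(text, e.getValue());
            if (hits > bestHits) { // on a tie, the language listed first wins
                bestHits = hits;
                best = e.getKey();
            }
        }
        boolean looksLikeCode = bestHits >= MIN_KEYWORD_HITS
                || (bestHits >= 1 && codeLineRatio(text) >= MIN_CODE_LINE_RATIO);
        return looksLikeCode ? best : null;
    }

    public static void main(String[] args) {
        String java = "import java.util.List;\npublic class Foo {\n  void main() { System.out.println(1); }\n}";
        String sql = "SELECT name FROM users WHERE id = 1;";
        String prose = "Select a nice spot from the list where you like to sit.";
        System.out.println(detect(java));   // Java
        System.out.println(detect(sql));    // SQL
        System.out.println(detect(prose));  // null
    }
}
```

The trade-off described above shows up directly here: shrinking the keyword lists or lowering MIN_KEYWORD_HITS makes false positives on ordinary prose more likely, while adding keywords and rules makes each check slower.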

Lajos Arpad answered Oct 06 '22 01:10
