How can I determine if a file is binary or text in C#? [duplicate]

I need to determine, with about 80% accuracy, whether a file is binary or text. Is there any way to do it in C#, even a quick-and-dirty or ugly one?

asked May 26 '09 14:05 by Pablo Retyk


People also ask

How can you tell if a file is text or binary?

We can usually tell if a file is binary or text based on its file extension. This is because by convention the extension reflects the file format, and it is ultimately the file format that dictates whether the file data is binary or text.
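The extension convention can be sketched in C# as a simple lookup. This is only an illustration under assumptions: the class name `ExtensionGuess` and the list of "text" extensions are hypothetical, and only the file name is inspected, never its bytes.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class ExtensionGuess
{
    // Hypothetical, deliberately short list of extensions that are
    // conventionally text formats; adjust it for your own files.
    private static readonly HashSet<string> TextExtensions =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            ".txt", ".csv", ".xml", ".json", ".html", ".cs", ".md", ".log"
        };

    // True if the file name's extension is in the list above.
    // Only the name is examined; the file contents are never read.
    public static bool LooksLikeTextByExtension(string path)
    {
        string ext = Path.GetExtension(path);
        return !string.IsNullOrEmpty(ext) && TextExtensions.Contains(ext);
    }
}
```

A misnamed or extensionless file defeats this check, which is why the content-based approaches below are more robust.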

How do you tell if a file is binary or ASCII in Windows?

You can check that every byte is below 128 (7-bit ASCII), or that every byte belongs to a character set you define (e.g. 'a'-'z', 'A'-'Z', '0'-'9', ...), and treat the file as binary if it contains any other bytes. You could also look for regular line breaks (0x0A, or the pair 0x0D 0x0A) to detect text files.
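The character-set check above might look like this in C#. It is a minimal sketch assuming the allowed set is printable ASCII (0x20 to 0x7E) plus tab, LF and CR, and that sampling the first 4096 bytes is enough; the names `AsciiHeuristic`, `IsPlainAscii` and `LooksLikeAsciiText` are hypothetical.

```csharp
using System;
using System.IO;

public static class AsciiHeuristic
{
    // True if every byte is printable 7-bit ASCII (0x20..0x7E) or one of
    // tab (0x09), LF (0x0A) or CR (0x0D). This strict check reports
    // UTF-8 text containing accented letters as binary.
    public static bool IsPlainAscii(byte[] data, int count)
    {
        for (int i = 0; i < count; i++)
        {
            byte b = data[i];
            bool printable = b >= 0x20 && b <= 0x7E;
            bool whitespace = b == 0x09 || b == 0x0A || b == 0x0D;
            if (!printable && !whitespace)
                return false;
        }
        return true;
    }

    // Reads at most sampleSize bytes from the start of the file and
    // applies IsPlainAscii to that sample only.
    public static bool LooksLikeAsciiText(string path, int sampleSize = 4096)
    {
        byte[] buffer = new byte[sampleSize];
        int read = 0;
        using (FileStream fs = File.OpenRead(path))
        {
            int n;
            // Stream.Read may return fewer bytes than requested, so loop.
            while (read < buffer.Length && (n = fs.Read(buffer, read, buffer.Length - read)) > 0)
                read += n;
        }
        return IsPlainAscii(buffer, read);
    }
}
```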

Is a file a binary?

A binary file is a file whose content is in a binary format consisting of a series of sequential bytes, each of which is eight bits in length. The content must be interpreted by a program or a hardware processor that understands in advance exactly how that content is formatted and how to read the data.


2 Answers

There's a technique based on Markov chains. Scan a few model files of each kind and, for each byte value from 0 to 255, gather statistics (essentially probabilities) on which byte value follows it. This gives you a 256x256 profile (64K entries) that you can compare your files against at run time, within a percentage threshold.

Supposedly, this is how browsers' Auto-Detect Encoding feature works.
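A rough C# sketch of that idea follows, under stated assumptions: it builds one first-order byte-transition table per class (text and binary) from model files you supply, and instead of a percentage threshold it picks whichever profile gives the file the higher average log-likelihood. The class names and the add-one smoothing are illustrative choices, not part of the answer.

```csharp
using System;

// First-order Markov model over byte values: estimates P(next byte | current byte).
public sealed class ByteMarkovProfile
{
    // counts[prev, next] = number of times byte 'next' directly followed byte 'prev'.
    private readonly long[,] counts = new long[256, 256];
    // rowTotals[prev] = number of transitions that started at byte 'prev'.
    private readonly long[] rowTotals = new long[256];

    // Adds every adjacent byte pair of one model file to the profile.
    public void Train(byte[] sample)
    {
        for (int i = 1; i < sample.Length; i++)
        {
            counts[sample[i - 1], sample[i]]++;
            rowTotals[sample[i - 1]]++;
        }
    }

    // Mean log-probability per transition of 'data' under this profile.
    // Add-one (Laplace) smoothing gives unseen pairs a small non-zero probability.
    public double AverageLogLikelihood(byte[] data)
    {
        if (data.Length < 2) return 0.0;
        double sum = 0.0;
        for (int i = 1; i < data.Length; i++)
        {
            byte prev = data[i - 1];
            byte next = data[i];
            double p = (counts[prev, next] + 1.0) / (rowTotals[prev] + 256.0);
            sum += Math.Log(p);
        }
        return sum / (data.Length - 1);
    }
}

public static class MarkovClassifier
{
    // True if 'data' is better explained by the text profile than by the binary one.
    public static bool LooksLikeText(byte[] data, ByteMarkovProfile textProfile, ByteMarkovProfile binaryProfile)
    {
        return textProfile.AverageLogLikelihood(data) > binaryProfile.AverageLogLikelihood(data);
    }
}
```

Comparing two likelihoods avoids having to pick a threshold by hand, but the result is only as good as the model files used for training.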

answered Sep 22 '22 13:09 by Stop Putin Stop War


I would probably look for an abundance of control characters, which are typically present in a binary file but rare in a text file. Binary files tend to contain so many 0 (NUL) bytes that testing for a high count of zeros alone would probably catch most of them. If you care about localization, you'd need to test multi-byte patterns as well; UTF-16 text, for example, contains many 0 bytes.
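Here is a hedged C# sketch of the control-character count. It assumes "control character" means 0x00 to 0x1F or 0x7F excluding tab, LF, form feed and CR, that a 10% ratio is a reasonable cut-off (an arbitrary choice), and that the first 4096 bytes are representative; the class and method names are hypothetical.

```csharp
using System;
using System.IO;

public static class ControlCharHeuristic
{
    // True if more than maxControlRatio of the sampled bytes are control
    // characters (0x00-0x1F or 0x7F, which includes NUL) other than tab,
    // LF, form feed and CR. The 10% default threshold is an arbitrary assumption.
    // UTF-16 text contains many 0x00 bytes, so this reports it as binary.
    public static bool LooksBinary(byte[] data, int count, double maxControlRatio = 0.10)
    {
        if (count == 0) return false; // an empty sample is treated as text
        int control = 0;
        for (int i = 0; i < count; i++)
        {
            byte b = data[i];
            bool isControl = b < 0x20 || b == 0x7F;
            bool isTextWhitespace = b == 0x09 || b == 0x0A || b == 0x0C || b == 0x0D;
            if (isControl && !isTextWhitespace) control++;
        }
        return (double)control / count > maxControlRatio;
    }

    // Applies LooksBinary to at most sampleSize bytes from the start of the file.
    public static bool FileLooksBinary(string path, int sampleSize = 4096)
    {
        byte[] buffer = new byte[sampleSize];
        int read = 0;
        using (FileStream fs = File.OpenRead(path))
        {
            int n;
            // Stream.Read may return fewer bytes than requested, so loop.
            while (read < buffer.Length && (n = fs.Read(buffer, read, buffer.Length - read)) > 0)
                read += n;
        }
        return LooksBinary(buffer, read);
    }
}
```

Counting only NUL bytes, as suggested above, is an even simpler variant of the same check.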

As stated, though, you can always be unlucky and get a binary file that looks like text, or vice versa.

answered Sep 18 '22 13:09 by Ron Warholic