
A clear understanding of file, file encoding, file format

I lack a clear understanding of the concepts of file, file encoding and file format. Google helped up to a point. From what I understand so far, all the files are binary, i.e., each byte in such a file can contain any of the 256 possible strings of bits. ASCII files (and here's where we get to the encoding part) are a subset of binary files, where each byte uses only 7 bits.

And here's where things get mixed up. A file format seems to be a way to interpret the bytes in a file, and file extensions seem to be one of the most used ways of identifying a file format.

Does this mean there are formats defined for binary files and formats defined for ASCII files? Are formats like xml, pdf, doc, rtf, html, xls, sql, tex, java, cs "referring" to ASCII files? Whereas formats like jpg, mp3, avi, eps, obj, out, dll are a clue that we're talking about binary files?

N56 dH asked Dec 14 '12 11:12





1 Answer

I don't think the useful distinction is between ASCII and binary files; it is between text and binary files.

In that sense, these are text files: XML, HTML, RTF, SQL, TEXT, JAVA, CSS, EPS.

And these are binary files: PDF, DOC, XLS, JPG, MP3, AVI, OBJ, DLL.
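To make the text/binary split concrete, here is a rough sketch of how a program might guess which kind of data it is looking at. The function name `looks_like_text` and the rule it applies (valid UTF-8 with no NUL bytes) are my own assumptions for illustration, not a standard definition; real tools use more elaborate heuristics.

```python
def looks_like_text(data: bytes) -> bool:
    """Rough heuristic (an assumption, not a standard): treat data as text
    if it contains no NUL bytes and decodes cleanly as UTF-8."""
    if b"\x00" in data:
        return False
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# The bytes of a small HTML fragment, stored as UTF-8.
html_bytes = "<p>Señor</p>".encode("utf-8")

# The first few bytes of a typical JPEG file.
jpeg_start = bytes([0xFF, 0xD8, 0xFF, 0xE0, 0x00, 0x10])

print(looks_like_text(html_bytes))  # True
print(looks_like_text(jpeg_start))  # False
```

Both inputs are just sequences of bytes. The only difference is whether those bytes make sense when interpreted as characters.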

ASCII is just a table of characters used in the early days of computing to represent text. It is somewhat discouraged nowadays because it can't represent text in languages such as Chinese, Arabic, Spanish (words with ñ, Ñ, or accented vowels), French, and others. Other character encodings are encouraged instead. The best known is probably UTF-8, but there are others, such as ISO-8859-1 and ISO-8859-3. Take a look at this article by Joel Spolsky about Unicode. It's very enlightening.
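A small sketch of what "encoding" means in practice, using only Python's built-in `str.encode` and `bytes.decode`. The sample character is my own choice for illustration. The same character becomes different bytes under different encodings, and ASCII cannot represent it at all.

```python
text = "ñ"

utf8 = text.encode("utf-8")          # two bytes in UTF-8
latin1 = text.encode("iso-8859-1")   # one byte in ISO-8859-1

print(utf8)    # b'\xc3\xb1'
print(latin1)  # b'\xf1'

# ASCII has no code for this character, so encoding fails.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
print(ascii_ok)  # False

# Reading bytes with the wrong encoding gives garbage, not an error.
misread = utf8.decode("iso-8859-1")
print(misread)  # Ã±

# Plain English letters have the same bytes in ASCII, UTF-8 and ISO-8859-1.
same = "A".encode("ascii") == "A".encode("utf-8") == "A".encode("iso-8859-1")
print(same)  # True
```

This is why a text file is only readable if the program opening it assumes the same encoding that was used to write it.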

File formats are a separate issue. A file format is a convention that programs agree on for representing information. In that sense, a JPG file is an image with a certain well-known internal format that allows programs (browsers, spreadsheets, word processors) to use it as an image.
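One way programs recognise a binary format without trusting the file extension is to check the first few bytes against a known signature. Below is a minimal sketch; the function name `sniff_format` and the tiny signature table are hypothetical, and the table covers only three formats for illustration.

```python
# Leading byte signatures for a few formats (a deliberately tiny table).
SIGNATURES = {
    b"\xff\xd8\xff": "JPEG",
    b"\x89PNG\r\n\x1a\n": "PNG",
    b"%PDF-": "PDF",
}


def sniff_format(data: bytes) -> str:
    """Return a format name if the data starts with a known signature."""
    for magic, name in SIGNATURES.items():
        if data.startswith(magic):
            return name
    return "unknown"


print(sniff_format(b"\xff\xd8\xff\xe0\x00\x10JFIF"))  # JPEG
print(sniff_format(b"%PDF-1.4\n"))                    # PDF
print(sniff_format(b"hello world"))                   # unknown
```

To check a real file, you could pass it the first bytes read in binary mode, e.g. `open(path, "rb").read(16)`, where `path` is whatever file you want to inspect.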

Text files also have formats (e.g., there are specifications for text formats like XML and HTML). As with JPG and other binary formats, the format lets applications use the file in a coherent, specific way to achieve something, e.g., rendering a web page (the HTML and XHTML file formats).
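To illustrate how the layers stack for a text format, here is a short sketch using Python's standard `xml.etree.ElementTree` module. The sample document is made up. The bytes are first decoded into characters (the encoding), and those characters are then parsed according to the XML rules (the format).

```python
import xml.etree.ElementTree as ET

# Layer 1: raw bytes, as they would sit in a file on disk.
raw = b"<page><title>Hello</title></page>"

# Layer 2: the encoding turns bytes into characters.
as_text = raw.decode("utf-8")

# Layer 3: the format gives those characters structure.
root = ET.fromstring(as_text)

print(root.tag)                 # page
print(root.find("title").text)  # Hello
```

A plain text editor stops at layer 2 and shows the characters. An XML-aware program goes on to layer 3.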

Pablo Santa Cruz answered Oct 21 '22 21:10
