I lack a clear understanding of the concepts of file, file encoding and file format. Google helped up to a point. From what I understand so far, all the files are binary, i.e., each byte in such a file can contain any of the 256 possible strings of bits. ASCII files (and here's where we get to the encoding part) are a subset of binary files, where each byte uses only 7 bits.
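The "7 bits" in the question means that every byte value lies between 0 and 127, so the highest bit of each byte is never set. Here is a minimal Python sketch of that check. The function name `is_ascii` is just illustrative. Python 3.7+ also has a built-in `bytes.isascii()` that performs the same test.

```python
def is_ascii(data: bytes) -> bool:
    # ASCII uses only 7 bits, so the highest bit (0x80) of every byte must be 0,
    # i.e. every byte value is in the range 0..127.
    return all((b & 0x80) == 0 for b in data)

print(is_ascii(b"Hello, world"))              # True
print(is_ascii("Hello, Ñ".encode("utf-8")))   # False: Ñ becomes two bytes >= 128
```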
And here's where things get mixed up. A file format seems to be a way to interpret the bytes in a file, and file extensions seem to be one of the most used ways of identifying a file format.
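An extension is only a hint taken from the file's name. Python's standard `mimetypes` module is a convenient illustration of this, because `guess_type()` decides purely from the name and never opens the file. The file names below are made up.

```python
import mimetypes

# guess_type() looks only at the file name's extension; it never reads the file.
for name in ["photo.jpg", "report.pdf", "photo.txt"]:
    print(name, mimetypes.guess_type(name))
```

Renaming a JPEG to `photo.txt` would make it "text/plain" according to this guess, even though not a single byte inside the file changed.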
Does this mean there are formats defined for binary files and formats defined for ASCII files? Are formats like xml, pdf, doc, rtf, html, xls, sql, tex, java, cs "referring" to ASCII files? Whereas formats like jpg, mp3, avi, eps, obj, out, dll are a clue that we're talking about binary files?
Your computer translates the numeric values into visible characters. It does this by using an encoding standard: a numbering scheme that assigns a numeric value to each character in a character set. A character set can include letters, digits, and other symbols.
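A small Python sketch of that numbering scheme in both directions, using ASCII as the encoding standard:

```python
text = "Hi!"

# Encoding: each character is mapped to its number in the table and stored as a byte.
encoded = text.encode("ascii")
print(list(encoded))            # [72, 105, 33]

# ord() gives a character's code point (equal to its ASCII code for ASCII
# characters); chr() goes the other way.
print(ord("H"), chr(72))        # 72 H

# Decoding: the same table turns the numbers back into characters.
print(encoded.decode("ascii"))  # Hi!
```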
Note that character encoding does not keep your data safe. Encoding standards such as ASCII and UTF-8 are public, so anyone can turn the bytes back into text. Protecting files from theft is the job of encryption, which makes data unreadable without a secret key.
A file format refers to the way data are arranged logically within a file. Knowing the format allows a program to locate the data, interpret it correctly, and continue processing.
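To make "arranged logically" concrete, here is a sketch of a made-up binary format (hypothetical, not a real standard). It has a 4-byte tag `DEMO`, a 2-byte version number and a 4-byte record count, all little-endian. The writer and the reader only work together because they agree on this layout. It uses Python's standard `struct` module.

```python
import struct

# Hypothetical header layout: 4-byte tag, uint16 version, uint32 record count,
# little-endian ("<") with no padding -> 4 + 2 + 4 = 10 bytes.
HEADER = struct.Struct("<4sHI")

def write_header(version: int, count: int) -> bytes:
    return HEADER.pack(b"DEMO", version, count)

def read_header(data: bytes) -> tuple[int, int]:
    tag, version, count = HEADER.unpack_from(data)
    if tag != b"DEMO":
        raise ValueError("not a DEMO file")
    return version, count

raw = write_header(1, 300)
print(raw.hex(" "))      # 44 45 4d 4f 01 00 2c 01 00 00
print(read_header(raw))  # (1, 300)
```

The bytes alone mean nothing without the agreed layout. A program that wrongly assumed big-endian numbers would read the version field `01 00` as 256 instead of 1.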
I don't think it makes sense to talk about ASCII and BINARY files; the useful distinction is between TEXT and BINARY files.
In that sense, these are text files: XML, HTML, RTF, SQL, TEXT, JAVA, CSS, EPS.
And these are binary files: PDF, DOC, XLS, JPG, MP3, AVI, OBJ, DLL.
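Nothing inside a file flags it as "text" or "binary", so programs have to guess. Below is a minimal sketch of one common heuristic: treat the data as text if the first few kilobytes contain no NUL bytes and decode as UTF-8. The 8000-byte sample size is an arbitrary choice. The heuristic also misjudges some files. For example, UTF-16 text contains NUL bytes, and ISO-8859-1 text is often not valid UTF-8.

```python
import codecs

def looks_like_text(data: bytes, sample_size: int = 8000) -> bool:
    """Heuristic guess: no NUL bytes and valid UTF-8 in the first sample_size bytes."""
    sample = data[:sample_size]
    if b"\x00" in sample:
        return False  # NUL bytes are rare in text but common in binary formats
    # An incremental decoder with final=False tolerates a multi-byte character
    # that was cut in half at the end of the sample.
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        decoder.decode(sample, final=False)
    except UnicodeDecodeError:
        return False
    return True

print(looks_like_text(b"<html><body>Hi</body></html>"))  # True
print(looks_like_text(b"\xff\xd8\xff\xe0\x00\x10JFIF"))   # False (start of a JPEG)
```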
ASCII is just a table of characters used in the early days of computing to represent text, but it is nowadays somewhat discouraged since it can't represent text in languages such as Chinese, Arabic, Spanish (words with ñ, Ñ or accents), French and others. Nowadays other CHARACTER ENCODINGS are encouraged instead of ASCII. The best known is probably UTF-8, which can encode every Unicode character. There are others like ISO-8859-1, ISO-8859-3 and such, although each of these single-byte encodings covers only a small group of languages. Take a look at Joel Spolsky's article "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)". It's very enlightening.
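A short Python sketch showing how the same Spanish word comes out under different encodings, and what happens when the reader assumes the wrong one:

```python
word = "Año"

print(word.encode("utf-8"))       # b'A\xc3\xb1o'  (ñ takes two bytes)
print(word.encode("iso-8859-1"))  # b'A\xf1o'      (ñ takes one byte)

try:
    word.encode("ascii")
except UnicodeEncodeError:
    print("ASCII has no code for ñ")

# Reading UTF-8 bytes as if they were ISO-8859-1 garbles the text ("mojibake").
print(word.encode("utf-8").decode("iso-8859-1"))  # AÃ±o

# ISO-8859-1 cannot represent Chinese at all; UTF-8 can.
print("中".encode("utf-8"))        # b'\xe4\xb8\xad'
```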
File formats are a very different issue. File formats are conventions that programs agree on for representing information. In that sense, a JPG file is an image with a certain (well-known) internal format that allows programs (browsers, spreadsheets, word processors) to use it as an image.
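Many binary formats begin with a fixed "magic number" signature, and programs often check that signature instead of trusting the extension. The sketch below uses a small, incomplete sample of well-known signatures. The function name `sniff_format` is hypothetical.

```python
# A few well-known signatures ("magic numbers") found at the start of files.
SIGNATURES = [
    (b"\xff\xd8\xff", "JPEG image"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
    (b"%PDF-", "PDF document"),
    (b"PK\x03\x04", "ZIP archive (also .docx/.xlsx)"),
    (b"MZ", "Windows executable (EXE/DLL)"),
]

def sniff_format(header: bytes) -> str:
    """Return a format name based on the first bytes of a file, or 'unknown'."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return "unknown"

print(sniff_format(b"\xff\xd8\xff\xe0\x00\x10JFIF"))  # JPEG image
print(sniff_format(b"%PDF-1.7\n"))                    # PDF document
```

To check a real file, pass the function its first few bytes, e.g. `open(path, "rb").read(16)`. Unlike the extension, the result does not change when the file is renamed.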
Text files also have formats (e.g., there are specifications for text-based files like XML and HTML). As with JPG and other binary formats, a text format lets applications use the file in a coherent and specific way to achieve something, e.g., render a WEB PAGE (the HTML and XHTML file formats).
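A text format still has rules. Any text editor will open the file, but a parser for the format rejects text that breaks those rules. Here is a small sketch with Python's standard `xml.etree.ElementTree`. The document content is made up.

```python
import xml.etree.ElementTree as ET

# Plain text, but it must follow XML's rules (matching tags, one root element, ...).
doc = "<page><title>Hello</title><p>First paragraph</p></page>"

root = ET.fromstring(doc)
print(root.tag)                 # page
print(root.find("title").text)  # Hello

try:
    ET.fromstring("<page><title>Hello</page>")  # <title> is never closed
except ET.ParseError as err:
    print("not well-formed XML:", err)
```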