
Why are there different encoding types?

This is a noob question, but I want to know why there are different encoding types and what the differences between them are (e.g. ASCII, UTF-8 and UTF-16, Base64, etc.).

Coola asked Apr 10 '12 12:04




1 Answer

There are many reasons, I believe, but the main question is: how many characters do you need to represent (encode)? If you live in the US, for example, you can get pretty far with ASCII. But in many countries we need characters like ä, å, ü, etc. (If SO were ASCII-only, or if you tried to read this text as ASCII-encoded text, you'd see some weird characters in place of ä, å and ü.) Think also of China, Japan, Thailand and other "exotic" countries. Those weird figures you may have seen in photos from around the world just might be letters, not pretty pictures.
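
To make the "weird characters" point concrete, here is a minimal Python sketch (my own illustration, not from the linked pages). It assumes the text is stored as UTF-8 and then read back with the wrong encoding; Latin-1 is picked only as an example of a wrong guess.

```python
text = "ä, å, ü"

# ASCII has no codes for these letters, so a strict ASCII encode fails.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# UTF-8 stores each of these letters as two bytes.
utf8_bytes = text.encode("utf-8")

# Reading those bytes with the wrong encoding (Latin-1 here) turns every
# two-byte letter into two unrelated characters.
garbled = utf8_bytes.decode("latin-1")

print(ascii_ok)   # prints: False
print(garbled)    # prints: Ã¤, Ã¥, Ã¼
```

Decoding the same bytes as UTF-8 gives the original text back, so nothing is lost; the bytes are just being interpreted with the wrong table.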

As for the differences between the encoding types, you need to look at their specifications. Here's something on Unicode and UTF-8.

  • http://www.unicode.org/standard/standard.html
  • http://www.utf-8.com/
  • http://en.wikipedia.org/wiki/UTF-8#Compared_to_other_multi-byte_encodings

I'm not familiar with UTF-16. Here's some information about the differences.

  • http://en.wikipedia.org/wiki/Unicode
  • http://en.wikipedia.org/wiki/Unicode_plane
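
One visible difference between UTF-8 and UTF-16 is how many bytes each needs per character. The sketch below is only an illustration with a hypothetical helper name (`encoded_lengths`) and sample characters I picked myself; it uses Python's built-in codecs, and "utf-16-be" is chosen so that no byte order mark inflates the count.

```python
def encoded_lengths(ch):
    """Return (UTF-8 byte count, UTF-16 byte count) for one character."""
    # "utf-16-be" avoids the 2-byte byte order mark that plain "utf-16" adds.
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-be"))


# Sample characters: ASCII letter, Latin letter with diaeresis,
# a CJK character, and an emoji outside the Basic Multilingual Plane.
for ch in ["A", "ä", "日", "😀"]:
    utf8_len, utf16_len = encoded_lengths(ch)
    print(f"U+{ord(ch):04X}: UTF-8 = {utf8_len} bytes, UTF-16 = {utf16_len} bytes")
```

For these samples, UTF-8 uses 1 to 4 bytes per character while UTF-16 uses 2 or 4, so which one is more compact depends on the text.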

Base64 is used when binary data needs to be stored in or transferred over media that are designed to handle text. If you've ever built some sort of email system with PHP, you've probably encountered Base64.

  • http://en.wikipedia.org/wiki/Base64
  • http://www.phpeveryday.com/articles/PHP-Email-Using-Embedded-Images-in-HTML-Email-P113.html
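
A small sketch of the Base64 round trip, written in Python rather than PHP and using only the standard `base64` module. The sample bytes are arbitrary and stand in for something like an image attachment.

```python
import base64

# Arbitrary binary data, including bytes that are not printable text.
raw = bytes([0x00, 0xFF, 0x89, 0x50, 0x4E, 0x47])

# Encode to Base64: the result contains only ASCII letters, digits, '+', '/' and '='.
encoded = base64.b64encode(raw)

# Decode back to the original bytes on the receiving side.
decoded = base64.b64decode(encoded)

print(encoded.decode("ascii"))
print(decoded == raw)   # prints: True
```

The encoded form is about a third larger than the input (every 3 bytes become 4 characters), which is the price of passing binary data through a text-only channel.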

In short: to support localizing a program's user interface into many different languages. (Programming languages still mainly consist of characters found in ASCII, although in Java, for example, it's possible to use non-ASCII Unicode characters in variable names, and the source code file is usually stored as something other than ASCII-encoded text, for example UTF-8.)

In short, vol. 2: whenever different people try to solve a problem from their own point of view (or even without one, if that's possible), the results can be quite different. A quote from Joel's Unicode article (link below): "Because bytes have room for up to eight bits, lots of people got to thinking, 'gosh, we can use the codes 128-255 for our own purposes.' The trouble was, lots of people had this idea at the same time, and they had their own ideas of what should go where in the space from 128 to 255."
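
The effect Joel describes can still be seen today. This sketch decodes one and the same byte from the 128-255 range with four legacy encodings that ship with Python; the byte value and the choice of code pages are my own examples, not taken from the article.

```python
# A single byte from the "upper half" (128-255) that ASCII leaves undefined.
byte = bytes([0xE4])

# Four legacy 8-bit encodings, each with its own idea of what 0xE4 means.
codecs_to_try = ["latin-1", "cp437", "cp1251", "iso8859_7"]

decoded = {name: byte.decode(name) for name in codecs_to_try}

for name, ch in decoded.items():
    print(f"{name:10} -> {ch} (U+{ord(ch):04X})")
```

Each encoding maps the byte to a different character (a Western European letter, a Greek capital, a Cyrillic letter, a Greek lowercase letter), which is exactly why text written under one code page looked like garbage under another.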

Thanks to Joachim and tchrist for all the info and discussion. Here are two articles I just read. (Both links are on the page I linked to earlier.) I'd forgotten most of Joel's article since I last read it a few years back. It's a good introduction to the subject, I hope. Mark Davis goes a little deeper.

  • http://www.joelonsoftware.com/articles/Unicode.html
  • http://www.icu-project.org/docs/papers/forms_of_unicode/
ZZ-bb answered Oct 11 '22 19:10
