
How exactly are data types represented in a computer?

I'm a beginning programmer reading K&R, and I feel as if the book assumes a lot of previous knowledge. One aspect that confuses me is the actual representation, or should I say existence, of variables in memory. What exactly does a data type specify for a variable? I'm not too sure of how to word this question... but I'll ask a few questions and perhaps someone can come up with a coherent answer for me.

When using getchar(), I was told that it is better to use type "int" than type "char", because "int" can hold more values while "char" can hold only 256. Since the variable may also need to hold the EOF value, we need more than 256 possible values, or EOF will overlap with one of the 256 characters. In my mind, I picture this as a bunch of boxes with empty holes. Could someone give me a better representation? Do these "boxes" have index numbers? When EOF overlaps with one of the 256 available values, can we predict which value it will overlap with?
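The kind of loop I'm asking about is the standard one from K&R, roughly:

    #include <stdio.h>

    int main(void)
    {
        int c;  /* int, not char, so it can hold every possible character value plus EOF */

        while ((c = getchar()) != EOF) {
            putchar(c);
        }
        return 0;
    }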

Also, does this mean that the data type "char" is only fine to use when we are simply assigning a value to a variable manually, such as char c = 'a', when we definitely know that we will only have 256 possible ASCII characters?

Also, what is the actual important difference between "char" and "int"? If we can use "int" type instead of "char" type, why do we decide to use one over the other at certain times? Is it to save "memory" (I use quotes as I do not actually know how "memory" works)?

Lastly, how exactly are the 256 available values of type char obtained? I read something about modulo 2^n, where n = 8, but why does that work (something to do with binary?). What does the modulo portion of "modulo 2^n" mean (if it has any relevance to modular arithmetic, I can't see the relation...)?

asked Jan 09 '10 by withchemicals


1 Answer

Great questions. K&R was written back in the days when there was a lot less to know about computers, and so programmers knew a lot more about the hardware. Every programmer ought to be familiar with this stuff, but (understandably) many beginning programmers aren't.

At Carnegie Mellon University they developed an entire course to fill in this gap in knowledge, which I was a TA for. I recommend the textbook for that class: "Computer Systems: A Programmer's Perspective" http://amzn.com/013034074X/

The answers to your questions are longer than can really be covered here, but I'll give you some brief pointers for your own research.

Basically, computers store all information--whether in memory (RAM) or on disk--in binary, a base-2 number system (as opposed to decimal, which is base 10). One binary digit is called a bit. Computers tend to work with memory in 8-bit chunks called bytes.
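To make that concrete, here's a tiny sketch (my own illustration, not from the book) that prints the individual bits of one byte:

    #include <stdio.h>

    int main(void)
    {
        unsigned char byte = 'A';   /* 'A' is 65 in ASCII, i.e. 01000001 in binary */

        /* print bits from the most significant (7) down to the least significant (0) */
        for (int i = 7; i >= 0; i--) {
            putchar(((byte >> i) & 1) ? '1' : '0');
        }
        putchar('\n');              /* prints 01000001 */
        return 0;
    }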

A char in C is one byte. An int is typically four bytes (although it can be different on different machines). So a char can hold only 2^8 = 256 possible values, while a four-byte int can hold 2^32 different values.
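You can check the sizes and ranges on your own machine; this little program just prints what <limits.h> reports (the exact numbers depend on your platform):

    #include <stdio.h>
    #include <limits.h>

    int main(void)
    {
        printf("char: %zu byte(s), range %d to %d\n",
               sizeof(char), CHAR_MIN, CHAR_MAX);
        printf("int:  %zu byte(s), range %d to %d\n",
               sizeof(int), INT_MIN, INT_MAX);
        return 0;
    }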

For more, definitely read the book, or read a few Wikipedia pages:

  • http://en.wikipedia.org/wiki/Binary_numeral_system
  • http://en.wikipedia.org/wiki/Twos_complement

Best of luck!

UPDATE with info on modular arithmetic as requested:

First, read up on modular arithmetic: http://en.wikipedia.org/wiki/Modular_arithmetic

Basically, in a two's complement system, an n-bit number really represents an equivalence class of integers modulo 2^n.
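For example, with n = 8, the numbers 300 and 44 fall into the same equivalence class, because 300 mod 256 = 44; store 300 into an unsigned char and you'll read back 44.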

If that seems to make it more complicated instead of less, then the key things to know are simply:

  • An unsigned n-bit number holds values from 0 to 2^n-1. The values "wrap around", so e.g., when you add two numbers and get 2^n, you really get zero. (This is called "overflow".)
  • A signed n-bit number holds values from -2^(n-1) to 2^(n-1)-1. Numbers still wrap around, but the highest number wraps around to the most negative, and it starts counting up towards zero from there.

So, an unsigned byte (8-bit number) can be 0 to 255. 255 + 1 wraps around to 0. 255 + 2 ends up as 1, and so forth. A signed byte can be -128 to 127. 127 + 1 ends up as -128. (!) 127 + 2 ends up as -127, etc.
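If you want to watch the wrap-around happen, something like this works (unsigned wrap-around is fully defined in C; the signed case is implementation-defined, but typical two's-complement machines behave as shown):

    #include <stdio.h>

    int main(void)
    {
        unsigned char u = 255;
        u = u + 1;          /* unsigned arithmetic is defined modulo 2^8, so this wraps to 0 */
        printf("255 + 1 as unsigned char: %d\n", u);    /* prints 0 */

        signed char s = 127;
        s = s + 1;          /* implementation-defined, but wraps to -128 on the usual
                               two's-complement machines */
        printf("127 + 1 as signed char:   %d\n", s);    /* prints -128 on most systems */
        return 0;
    }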

answered Sep 29 '22 by jasoncrawford