If all values are nothing more than one or more bytes, and no byte can contain metadata, how does the system keep track of what sort of number a byte represents? Looking into two's complement and single-precision floating point on Wikipedia reveals how these numbers can be represented in base two, but I'm still left wondering how the compiler or processor (not sure which I'm really dealing with here) determines that a given byte must be treated as a signed integer.
It is analogous to receiving an encrypted letter and, looking at my shelf of cyphers, wondering which one to grab. Some indicator is necessary.
If I think about what I might do to solve this problem, two solutions come to mind. Either I would claim an additional byte and use it to store a description, or I would allocate sections of memory specifically for numerical representations; a section for signed numbers, a section for floats, etc.
I'm dealing primarily with C on a Unix system but this may be a more general question.
In C#, by contrast, type information is available at run time: the GetType() method (defined on System.Object) returns the Type of the current instance, e.g. Type tp = value.GetType();
In C itself, the basic (primary) data types are int, char, float, and double; unions, structures, arrays, and so on are derived types built from them. Each data type occupies a certain amount of memory and supports specific operations: int stores an integer value, char stores a single character, float stores a single-precision floating-point number, and double stores a double-precision one.
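A quick sketch of these types (the exact sizes are implementation-defined; the values in the comment are only what you would typically see on a 64-bit desktop system):

```c
#include <stdio.h>

int main(void)
{
    int    i = 42;        /* integer value                   */
    char   c = 'A';       /* single character                */
    float  f = 3.14f;     /* single-precision floating point */
    double d = 2.71828;   /* double-precision floating point */

    /* Typically 4, 1, 4 and 8 bytes on a 64-bit desktop system,
       but the standard does not fix these sizes. */
    printf("int: %zu, char: %zu, float: %zu, double: %zu\n",
           sizeof i, sizeof c, sizeof f, sizeof d);
    return 0;
}
```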
how does the system keep track of what sort of number a byte represents?
"The system" doesn't. During translation, the compiler knows the types of the objects it's dealing with, and generates the appropriate machine instructions for dealing with those values.
Ooh, good question. Let's start with the CPU - assuming an Intel x86 chip.
It turns out the CPU does not know whether a byte is "signed" or "unsigned." When you add two numbers - or perform most arithmetic operations - the CPU updates flags in a "status register."
Take a look at the "sign flag." When you add two numbers, the CPU does just that - adds the numbers and stores the result in a register. But it also asks, "if these numbers were interpreted as two's-complement signed integers, would the result be negative?" If so, the sign flag is set to 1.
So if your program cares about signed versus unsigned and you are writing in assembly, you would check that flag and branch to different code based on its value.
So when you use signed int versus unsigned int in C, you are basically telling the compiler how (or whether) to use that sign flag.
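A rough sketch of what that looks like in practice; the mnemonics in the comments are what a typical x86-64 compiler such as gcc tends to emit, shown here only for illustration:

```c
/* Signed and unsigned comparison compile to the same cmp instruction,
   but to different conditional branches, because the CPU's flags have
   to be interpreted differently in each case. */

int signed_less(int a, int b)
{
    /* typically: cmp %esi,%edi ; setl  (looks at the sign/overflow flags) */
    return a < b;
}

int unsigned_less(unsigned int a, unsigned int b)
{
    /* typically: cmp %esi,%edi ; setb  (looks at the carry flag) */
    return a < b;
}
```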
The code that is executed has no information about the types. The only tool that knows the types is the compiler at the time it compiles the code. Types in C are solely a restriction at compile time to prevent you from using the wrong type somewhere. While compiling, the C compiler keeps track of the type of each variable and therefore knows which type belongs to which variable.
This is the reason why you need to use format strings with printf, for example: printf has no way of knowing what types it will receive in its parameter list, because that information is gone by run time. In languages like Go or Java you have a runtime with reflection capabilities, which makes it possible to recover the type.
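A minimal example of that point - the conversion specifier in the format string is the only "type information" printf ever sees:

```c
#include <stdio.h>

int main(void)
{
    double x = 3.14;

    printf("%f\n", x);      /* correct: %f tells printf to expect a double */

    /* printf("%d\n", x);      wrong: printf would try to read the argument
                                as an int, which is undefined behaviour; it
                                cannot detect the mismatch at run time
                                because the type information is gone.      */
    return 0;
}
```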
Suppose your compiled C code still carried type information: then the resulting assembly language would need some way to check types. It turns out the only thing close to a type in assembly is the size of an instruction's operands, which is determined by suffixes (in GAS). So all that is left of your type information is the size, and nothing more.
One example of an assembly-like language that does support types is Java VM bytecode, which has typed instructions for the primitive types.
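To make the "only the size survives" point concrete, here is a tiny sketch; the GAS mnemonics in the comments are what gcc typically emits for x86-64 and are illustrative only:

```c
/* Stores into variables of different widths differ only in the
   operand-size suffix of the generated instruction. */
char  c;
short s;
int   i;
long  l;

void store_all(void)
{
    c = 1;   /* typically: movb $1, c(%rip)  - byte (8-bit)  */
    s = 1;   /* typically: movw $1, s(%rip)  - word (16-bit) */
    i = 1;   /* typically: movl $1, i(%rip)  - long (32-bit) */
    l = 1;   /* typically: movq $1, l(%rip)  - quad (64-bit) */
}
```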
It is important to remember that C and C++ are high-level languages. The compiler's job is to take the plain-text representation of the code and translate it into the platform-specific instructions the target platform expects to execute. For most people using PCs this tends to be x86 assembly.
This is why C and C++ are so loose about how they define the basic data types. For example, most people assume there are exactly 8 bits in a byte. The standard does not fix that number: it only requires that a byte be the smallest addressable unit of data and have at least 8 bits (CHAR_BIT >= 8), and some platforms, such as certain DSPs, use wider bytes.
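You can ask the implementation directly; CHAR_BIT from <limits.h> reports the number of bits in a byte on the platform the code was compiled for:

```c
#include <limits.h>
#include <stdio.h>

int main(void)
{
    /* 8 on mainstream desktop platforms; the standard only guarantees
       that CHAR_BIT is at least 8, and some DSPs report 16 or 32. */
    printf("bits per byte on this platform: %d\n", CHAR_BIT);
    return 0;
}
```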
So the interpretation of data is up to the instruction set of the processor. In many modern languages there is another abstraction on top of this, the Virtual Machine.
If you write your own scripting language it is up to you to define how you interpret your data in software.
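Many interpreters do essentially what the question suggests: they store an explicit type tag next to the raw bytes. A minimal sketch of that idea (the names here are made up for illustration):

```c
#include <stdio.h>

/* A hand-rolled "dynamic" value: the tag is the metadata that raw
   machine bytes do not carry on their own. */
enum tag { TAG_INT, TAG_DOUBLE };

struct value {
    enum tag tag;
    union {
        int    i;
        double d;
    } as;
};

static void print_value(struct value v)
{
    /* The interpreter consults the tag at run time to decide how to
       read the bytes - the decision the C compiler makes at compile time. */
    switch (v.tag) {
    case TAG_INT:    printf("int: %d\n", v.as.i);    break;
    case TAG_DOUBLE: printf("double: %f\n", v.as.d); break;
    }
}

int main(void)
{
    struct value a = { TAG_INT,    { .i = 42  } };
    struct value b = { TAG_DOUBLE, { .d = 3.5 } };

    print_value(a);
    print_value(b);
    return 0;
}
```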