I have to determine whether the MARS simulator is big or little endian as homework. This seemed pretty straightforward at first, but I am having some issues.
First I tried storing 4 bytes in memory with .byte 0, 0, 0, 1. In memory this appears as 0x01000000, i.e. in reverse order, which seems to indicate that the simulator is little endian. However, when I load the 4 bytes as an integer into a register, what appears in the register is 0x01000000 again; as I understand it, if it were little endian, what would be loaded is 0x00000001.
Also, when storing 4 bytes with .word 1, what is stored is 0x00000001, no bytes reversed this time.
I would like to know whether the simulator is big or little endian, and an explanation of this behaviour.
Since MIPS assumes a big-endian organization, the book will label the MSB as bit 0, and the LSB as bit 31 in a word, or bit 63 in a double word.
If it is little-endian, it would be stored as “01 00 00 00”. The program checks the first byte by dereferencing the cptr pointer. If it equals 0, the processor is big-endian (“00 00 00 01”); if it equals 1, the processor is little-endian (“01 00 00 00”).
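The pointer check described above can be sketched in Python with ctypes. This is only an illustration of the idea, not the program the text refers to: the function names are hypothetical, and only the name cptr is taken from the description. It reports the byte order of the host running Python, not of any simulated MIPS machine.

```python
import ctypes
import sys

def first_byte_of_one():
    """Store the 32-bit value 1 and return the byte at its lowest address."""
    value = ctypes.c_uint32(1)
    # View the same four bytes through a byte pointer (the role cptr plays in C).
    cptr = ctypes.cast(ctypes.pointer(value), ctypes.POINTER(ctypes.c_ubyte))
    return cptr[0]

def host_is_little_endian():
    # 1 -> "01 00 00 00" (little-endian), 0 -> "00 00 00 01" (big-endian)
    return first_byte_of_one() == 1

print("little-endian" if host_is_little_endian() else "big-endian")
```

The result should agree with sys.byteorder, which is how Python itself reports the host byte order.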
Intel-based processors are little-endian. ARM processors were little-endian. Current-generation ARM processors are bi-endian.
Solely big-endian architectures include the IBM z/Architecture and OpenRISC. Some instruction set architectures are "bi-endian" and allow running software of either endianness; these include Power ISA, SPARC, ARM AArch64, C-Sky, and RISC-V.
There are several layers involved in your question, so I will try to address them one by one.
The machine has byte-addressable memory. The first byte has address 0, the second has address 1, and so on. Whenever I write about the content of memory in this answer, I will use this formatting: 01 02 0E 0F 10 ..., using hexadecimal values with spaces between bytes, and with addresses running continuously from the starting address toward the ending address. I.e. if this content started at address 0x800000, the memory would be (all hex):
address | byte value
------- | ----------
800000 | 01
800001 | 02
800002 | 0E
800003 | 0F
800004 | 10
800005 | ...
So far it does not matter whether the target MIPS platform is little or big endian: as long as only byte-sized memory accesses are involved, the order of bytes is "normal".
If you load a byte from address 0x800000 into t0 (with the lb instruction), t0 will be equal to 1.
If you load a word from address 0x800000 into t0 (with the lw instruction), the endianness finally comes into play.
On a little-endian machine, t0 will be equal to 0x0F0E0201: the first byte of the word (in memory) is the coefficient of 256^0 (the lowest power), the second is the coefficient of 256^1, ... and the last one is the coefficient of 256^3.
On a big-endian machine, t0 will be equal to 0x01020E0F: the first byte of the word (in memory) is the coefficient of 256^3, the second is the coefficient of 256^2, ... and the last one is the coefficient of 256^0.
(256 is 2^8, and that magic number comes from "one byte is 8 bits": one bit can hold two values (0 or 1), and one byte has 8 bits, so one byte can hold 2^8 different values.)
In both cases the CPU will read the same four bytes from memory (at addresses 0x800000 to 0x800003), but the endianness defines in which order they will appear as the final 32 bits of word value.
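As a quick check of the two lw results above, here is a small Python sketch. It models memory as a plain bytes object, which is an assumption made for illustration and not how MARS works internally.

```python
# The four bytes at addresses 0x800000..0x800003 from the table above.
memory = bytes([0x01, 0x02, 0x0E, 0x0F])

# The same four bytes, combined in the two possible orders.
little = int.from_bytes(memory, "little")
big = int.from_bytes(memory, "big")

# The little-endian rule written out: byte i is the coefficient of 256**i.
manual_little = sum(b * 256**i for i, b in enumerate(memory))
# The big-endian rule: the first byte gets the highest power instead.
manual_big = sum(b * 256**i for i, b in enumerate(reversed(memory)))

print(f"little-endian lw: 0x{little:08X}")
print(f"big-endian lw:    0x{big:08X}")
```

Both spellings of each rule produce the same number, which is all endianness is: a convention for which end of the byte sequence gets the lowest power of 256.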
The register t0 is physically formed by 32 bits on the CPU chip; it has no address. When you want to refer to it in a CPU instruction (i.e. use the value stored in t0), you encode it into the instruction as register $8 ($8 has the alias $t0 for convenience in your assembler, so I'm using the t0 alias instead).
The endianness does not apply to those 32 bits of the register. They are already 32 bits b0-b31, and once the value 0x0F0E0201 is loaded, those 32 bits are set to 0000 1111 0000 1110 ... (I'm writing it from the top bit b31 down to the bottom bit b0, so that shift left/right instructions make sense and it also reads as a human-formatted binary number). There's no point in thinking about the endianness of a register or in which physical order the bits are stored on the chip; it's enough to think of it as a full 32-bit value, and in arithmetic instructions it will behave as such.
When loading a byte value into a register with lb, it lands in bits b0-b7, with b8-b31 containing copies of b7 (sign-extending the signed 8-bit value into a signed 32-bit value).
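The sign extension done by lb can be sketched like this. The helper name lb and the bytes-as-memory model are illustrative assumptions only; the function returns the register content as an unsigned 32-bit pattern.

```python
def lb(memory, address):
    """Model of MIPS lb: load one byte and sign-extend it to 32 bits."""
    byte = memory[address]
    if byte & 0x80:              # bit b7 set -> the byte is negative
        byte -= 0x100            # reinterpret as a signed 8-bit value
    return byte & 0xFFFFFFFF     # b8-b31 become copies of b7

memory = bytes([0x01, 0x80])
print(f"0x{lb(memory, 0):08X}")  # positive byte: upper bits stay 0
print(f"0x{lb(memory, 1):08X}")  # negative byte: upper bits are all 1
```

Loading 0x01 leaves the upper 24 bits clear, while loading 0x80 fills them with ones, giving 0xFFFFFF80.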
When storing the value of a register into memory, the endianness applies again, i.e. storing the word value 0x11223344 into memory on a little-endian machine will set the individual bytes to 44 33 22 11.
An assembler that is well configured for its target platform will hide the endianness from the programmer, to make the use of word values convenient.
So when you define memory value like:
myPreciousValue: .word 0x11223344
The assembler will parse the text "0x11223344" (10 bytes: 30 78 31 31 32 32 33 33 34 34), calculate the 32-bit word value 0x11223344 out of it, and then apply the target-platform endianness to produce four bytes of machine code. (Your source code is text (!), i.e. one character is one byte value in ASCII encoding. If you write the source in a UTF-8 text editor and use non-ASCII characters, they may be encoded across multiple bytes; the printable ASCII characters have the same encoding in both ASCII and UTF-8 and occupy a single byte only.) The result is either:
44 33 22 11 # little-endian target
or:
11 22 33 44 # big-endian target
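A rough Python model of what the assembler does with that .word literal might look as follows. The function assemble_word is hypothetical and only mimics the parse-then-emit step; it is not taken from any real assembler.

```python
import struct

def assemble_word(literal, little_endian):
    """Parse a hex literal from source text and emit the 4 target bytes."""
    value = int(literal, 16)                 # text "0x11223344" -> number
    fmt = "<I" if little_endian else ">I"    # choose the target byte order
    return struct.pack(fmt, value)

source_text = "0x11223344"
print(source_text.encode("ascii").hex(" "))                    # bytes of the source text
print(assemble_word(source_text, little_endian=True).hex(" "))  # little-endian target
print(assemble_word(source_text, little_endian=False).hex(" ")) # big-endian target
```

The first line printed is the ten bytes of the source text itself; the other two are the four bytes emitted for each kind of target.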
When you then use the lw instruction in your code to load myPreciousValue from memory into a register, the register will contain the expected word value 0x11223344 (as long as you didn't mix up your assembler configuration and use the wrong endianness; that can't happen in MARS/SPIM, as it supports only the little-endian configuration in everything (VM, assembler, debugger)).
So the programmer does not have to think about endianness every time he writes the 32 bit value somewhere in the source, the assembler will parse and process it to the target variant of byte values.
If the programmer wants to define the four bytes 01 02 03 04 in memory, she can be "smart" and use .word 0x04030201 for a little-endian target platform, but that obfuscates the original intent, so I suggest using .byte 1, 2, 3, 4 in such a case, as the programmer's intent was to define bytes, not a word.
When you declare byte values with the .byte directive, they are emitted in the order you write them; no endianness is applied to them.
And finally, the memory/register view of the debugger: this tool again tries hard to work in an intuitive and convenient way, so when you check the memory view and have it configured to bytes, the memory will be shown as:
0x800000: 31 32 33 34 41 42 43 44 | 1234ABCD
When you switch it to "word" view, it will use the configured endianness to concatenate bytes in the target platform order, i.e. in MARS/SPIM as little-endian platform it will show on the same memory:
0x800000: 34333231 44434241
(If the ASCII view is also included, is it "worded" too? If yes, then it will look like 4321 DCBA. I don't have MARS/SPIM installed at the moment to check what the memory view in the debugger actually looks like, sorry.)
So you as programmer can read the "word" value directly from display, without shuffling the bytes into "correct" order, you already see what the "word" value will be (from those four bytes of memory content).
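To make the two debugger views concrete, here is a sketch that renders the same eight bytes both ways. The functions byte_view and word_view are hypothetical helpers written for this illustration and say nothing about how MARS implements its display.

```python
memory = b"1234ABCD"   # the bytes 31 32 33 34 41 42 43 44

def byte_view(mem):
    """Show every byte in address order."""
    return " ".join(f"{b:02X}" for b in mem)

def word_view(mem, byteorder="little"):
    """Group bytes into 32-bit words using the given byte order."""
    words = [int.from_bytes(mem[i:i + 4], byteorder)
             for i in range(0, len(mem), 4)]
    return " ".join(f"{w:08X}" for w in words)

print(byte_view(memory))
print(word_view(memory, "little"))
print(word_view(memory, "big"))
```

With the little-endian setting, the word view prints the digits of each group of four bytes in reversed byte order, which is exactly the effect that confuses people when they compare it with the byte view.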
The register view usually shows hexadecimal word values by default, i.e. after loading a word from that address 0x800000 into t0, the register $8 will contain the value 0x34333231 (875770417 in decimal).
If you are interested what was the value of first byte in memory used for that load, at this point you have to apply your knowledge of endianness of that target platform, and look either at the first two digits "34" (big endian), or last two "31" (little endian) in the register view (or rather use the memory view in byte-view mode to avoid any mistake).
So with all that information above, the runtime detection code should be easy to understand (unfortunately I don't have MARS/SPIM at the moment, so I didn't verify it works, let me know):
.data
checkEndianness: .word 0    # temporary memory for function
                            # can be avoided by using stack memory instead (in function)
.text
main:
    jal IsLittleEndian
    # ... do something with v0 value ...
    li $v0, 10              # syscall 10: exit the program
    syscall
# --- functions ---
# returns (in v0) 0 for big-endian machine, and 1 for little-endian
IsLittleEndian:
    # set value of register to 1
    li $v0, 1
    # store the word value 1 into memory (4 bytes written)
    sw $v0, checkEndianness
    # memory contains "01 00 00 00" on little-endian machine
    # or "00 00 00 01" on big-endian machine
    # load only the first byte back
    lb $v0, checkEndianness
    jr $ra
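The logic of that routine can be traced in Python for both kinds of machine. This is a simulation under the assumption that memory is a bytearray and that the byte order is passed in as a parameter; the function is_little_endian is a stand-in for the MIPS routine, not a translation of it.

```python
def is_little_endian(byteorder):
    """Simulate IsLittleEndian on a machine with the given byte order."""
    memory = bytearray(4)                      # checkEndianness: .word 0
    memory[0:4] = (1).to_bytes(4, byteorder)   # sw: store word value 1
    return memory[0]                           # lb: load first byte back

print(is_little_endian("little"))   # memory was 01 00 00 00
print(is_little_endian("big"))      # memory was 00 00 00 01
```

The returned first byte is 1 when the simulated machine is little-endian and 0 when it is big-endian, matching the comments in the assembly.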
What is it good for? As long as you write your software for the single target platform, and you are storing/loading words by the target CPU, you don't need to care about endianness.
But if you have software which is multi-platform, and it saves binary files... To make the files work the same way on both big- and little-endian platforms, the specification of the file structure must also specify the endianness of the file data. Then, according to that spec, one type of target platform may read it as "native" word values, while the other will have to shuffle the bytes in word values to read the correct word value (plus the spec should also state how many bytes a "word" is :) ). Such a runtime test may then be handy if you include the shuffler in the save/load routines, using the endianness detection routine to decide whether it has to shuffle the word bytes or not. That makes the target platform endianness "transparent" to the remaining code, which simply sends its native "word" values to the save/load routine, and your save/load may use the same source on every platform (at least as long as you use some multi-platform portable programming language like C; of course the assembly for MIPS will not work on different CPUs at all, and would need to be rewritten from scratch).
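A minimal sketch of such a load routine, assuming a hypothetical file format whose spec says "32-bit words, little-endian". The names swap32 and load_word_le are made up for this example.

```python
import struct
import sys

def swap32(value):
    """Reverse the order of the four bytes in a 32-bit value."""
    return (((value & 0x000000FF) << 24) |
            ((value & 0x0000FF00) << 8) |
            ((value >> 8) & 0x0000FF00) |
            ((value >> 24) & 0x000000FF))

def load_word_le(raw):
    """Read one word from file data that is specified as little-endian."""
    native = struct.unpack("=I", raw)[0]   # read in the host's own order
    if sys.byteorder == "big":             # runtime endianness check
        native = swap32(native)            # shuffle only where needed
    return native

file_data = bytes.fromhex("44332211")      # as stored in the file
print(hex(load_word_le(file_data)))
```

In real Python code you would normally write struct.unpack("<I", raw) and let the library do the shuffling; the explicit swap is kept here only to mirror the approach described above.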
Also the network communication is often done with custom binary protocols (wrapped usually in the most common TCP/IP packets for the network layer, or even encrypted, but your application will extract the raw bytes content out of it at one point), and then endianness of sent/received data matters, and the "other" platforms have to shuffle the bytes to read correct values then.
On other platforms, pretty much everything from above applies; just check what a byte and a word are on the other platform (I think byte has been pretty much set in stone as 8 bits for the last 35+ years, but word may differ; for example, on x86 platforms a word is only 16 bits). A little-endian machine will still read "word" bytes in "reversed" order, with the first byte used as the coefficient of the smallest power, 256^0, and the last byte used as the coefficient of the highest power of 256 (256^1 on the x86 platform, as only two bytes form a word there; the MIPS "word" is called a "double word" or "dword" in the x86 world).
This is from the site: http://courses.missouristate.edu/KenVollmar/mars/Help/MarsHelpDebugging.html
Memory addresses and values, and register values, can be viewed in either
decimal or hexadecimal format. All data are stored in little-endian
byte order (each word consists of byte 3 followed by byte 2 then 1 then 0).
Note that each word can hold 4 characters of a string and those 4
characters will appear in the reverse order from that of the string literal
As you can see, it is little-endian.
According to the Patterson and Hennessy book (Computer Organization and Design: The Hardware/Software Interface), on page 70:
MIPS is in the big endian camp. Since the order matters only if you access the identical data both as a word and as four bytes, few need to be aware of the endianness.
Newer versions of the MIPS chip can support both big and little endian, unlike the previous versions.
As for the MARS simulator, it is little-endian.
You can test this. In the data segment (.data), write:
.data
store: .byte 0,0,0,1
store2: .byte 2,0,0,0 #I loaded with a 2 to avoid confusion.
When you assemble the code, you can see how they are stored in the data segment. Pay attention to Value+0 (0x01000000) and Value+4 (0x00000002).
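The values shown in those two columns can be reproduced with a short Python sketch, assuming the word view simply combines each group of four bytes in little-endian order. The variable names store and store2 are taken from the snippet above; everything else is illustrative.

```python
store = bytes([0, 0, 0, 1])    # store:  .byte 0,0,0,1
store2 = bytes([2, 0, 0, 0])   # store2: .byte 2,0,0,0

# What a little-endian word view (and lw) makes of each group of 4 bytes.
value_plus_0 = int.from_bytes(store, "little")
value_plus_4 = int.from_bytes(store2, "little")
print(f"Value+0: 0x{value_plus_0:08X}")
print(f"Value+4: 0x{value_plus_4:08X}")

# Conversely, .word 1 is emitted as these bytes on a little-endian target,
# and the word view turns them straight back into 0x00000001.
word_one_bytes = (1).to_bytes(4, "little")
print(word_one_bytes.hex(" "))
```

This also accounts for the observations in the question: the byte 1 written last ends up in the highest position of the displayed word, while .word 1 shows up unchanged because the display undoes the same reordering the assembler applied.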