
Why is the write speed of my RAM far less than its rated speed?

I used the command below to measure the RAM write speed, but it reports far less than what is printed on the RAM module.

time dd if=/dev/zero of=tes bs=100M count=10 oflag=dsync && sync
10+0 records in
10+0 records out
1048576000 bytes (1.0 GB) copied, 1.05167 s, 997 MB/s

real    0m1.056s
user    0m0.001s
sys     0m1.053s

I am using DDR3 and calculating the theoretical max RAM speed with the formula below:

Max transfer rate = clock x (no. of bits / 8)

A DIMM module transfers 64 bits, so:

Max theoretical transfer rate = clock x (64 / 8)
                              = 1333 x 8
                              = 10,664 MB/s

So the theoretical expected speed should be roughly 10 GB/s, but in reality it comes out far less. Can anyone please tell me why? Thanks in advance!

asked Sep 30 '22 by humanshu

2 Answers

dd measures filesystem speed, not RAM speed. Even if you were to dd to /dev/shm (on Linux systems, /dev/shm is a RAM-backed filesystem), you would still be measuring mostly filesystem overhead and very little of the memory write throughput.
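
For illustration, here is the same kind of dd test against the RAM-backed /dev/shm (just a sketch; the file name is arbitrary, and the result still includes syscall and filesystem overhead rather than raw RAM write speed):

# write to tmpfs instead of disk, then clean up
time dd if=/dev/zero of=/dev/shm/ddtest bs=100M count=10
rm /dev/shm/ddtest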

There are memory test tools for checking RAM speed, both Linux command-line tools and boot-into tools. I use the boot-into memtest86 when checking my systems.
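
For a quick command-line check, sysbench has a memory test (an illustration only, assuming sysbench is installed; it reports a rough sequential write bandwidth):

# sequential memory writes, 10 GB total in 1 MB blocks
sysbench memory --memory-block-size=1M --memory-total-size=10G run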

Your "max bandwidth" calculation does not account for address and cycle times; actual max throughput will be less. On my DDR3 AMD system I measure a little above 4GB/sec actual read throughput (Intel is higher I believe).

answered Oct 21 '22 by Andras


IMO, there are several wrong assumptions in the question, but it is interesting anyway.

The calculation of theoretical RAM speed proposed in the question seems to forget multi-channel architectures. I would use the following formula:

Max transfer rate = clock frequency x transfers per clock x interface width x number of interfaces
                    (divide by 8 to get the result in bytes/s)

In your example, the clock frequency is 667 MHz, transfers per clock = 2 (because it is DDR3-1333 memory), interface width = 64 bits, and the number of interfaces depends on your motherboard and the number of plugged-in memory modules. Most recent PCs provide 2 channels; recent servers provide 3 or 4 channels. The number of interfaces is min(number of modules per CPU, number of channels), as worked through below.
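
For example, with a hypothetical dual-channel DDR3-1333 configuration (two modules, one per channel):

Max transfer rate = 667 x 2 x 64 x 2 / 8
                  = 1333 x 64 x 2 / 8
                  = 21,328 MB/s (about 21 GB/s)

With a single module on one channel, the same formula gives the 10,664 MB/s figure from the question.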

Some information about the burst rate of DDR3 memory: http://en.wikipedia.org/wiki/DDR3_SDRAM

Now, you have to keep in mind that this bandwidth corresponds to a theoretical burst rate, generally only sustainable for brief periods of time. Furthermore, it only qualifies the memory module capabilities; it says nothing about the front-side bus or the CPU memory controllers. In other words, even with very fast memory modules, a slow CPU may not be able to saturate the memory bandwidth. Bottlenecks are not always in the memory modules.

On ccNUMA machines (most servers with 2 or 4 sockets), if a CPU core needs to access a block located on a memory bank attached to another CPU, the interconnection bus (QPI or HyperTransport) will be used. This bus can also be a bottleneck.
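
To observe this effect, a benchmark can be pinned to one socket while its memory is allocated on another (a sketch, assuming numactl is installed; ./membench is a hypothetical stand-in for any memory benchmark binary, e.g. STREAM mentioned below):

numactl --cpunodebind=0 --membind=0 ./membench   # local memory access
numactl --cpunodebind=0 --membind=1 ./membench   # remote access over QPI/HyperTransport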

Finally, I think the methodology of the test (using dd) is flawed, because:

  • It does not exercise only memory transfers, because dd uses the filesystem interface. Even assuming the resulting file is hosted in a memory-backed filesystem (such as tmpfs or /dev/shm), dd still makes system calls to perform the operation, which adds extra cost.

  • dd is a single-threaded process. A single core may not be enough to saturate the whole memory bandwidth. On a server with multiple sockets, this is guaranteed; on a single-socket system, it depends on the CPU itself (see the sketch after this list).
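
To illustrate the second point, one can run several dd writers against tmpfs in parallel and compare the aggregate throughput with a single writer (just a sketch; the file names are arbitrary, and tmpfs needs enough free space):

# launch 4 parallel writers; each dd prints its own throughput
for i in 1 2 3 4; do
    dd if=/dev/zero of=/dev/shm/ddtest$i bs=100M count=10 &
done
wait
rm /dev/shm/ddtest[1-4]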

If you really want to evaluate the actual memory bandwidth and compare it to the theoretical limit, I would suggest using a benchmark program designed for this purpose. For instance, the STREAM benchmark is often used to measure sustainable memory bandwidth.
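
Building and running STREAM might look like this (a sketch, assuming gcc with OpenMP support; STREAM_ARRAY_SIZE should be set much larger than the last-level cache):

# fetch, compile, and run the STREAM benchmark
wget https://www.cs.virginia.edu/stream/FTP/Code/stream.c
gcc -O3 -fopenmp -DSTREAM_ARRAY_SIZE=100000000 stream.c -o stream
OMP_NUM_THREADS=4 ./stream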

answered Oct 21 '22 by Didier Spezia