Can anyone give me the approximate time (in nanoseconds) to access L1, L2 and L3 caches, as well as main memory on Intel i7 processors?
While this isn't specifically a programming question, knowing these kinds of speed details is necessary for some low-latency programming challenges.
In order to be close to the processor, cache memory needs to be much smaller than main memory, so it offers far less storage space. It is also more expensive than main memory, because it is built from faster but more complex circuitry (SRAM rather than the DRAM used for main memory). What it sacrifices in size and price, it makes up for in speed.
More cache generally means fewer trips to main memory and therefore better performance. However, because of its high-speed design, cache memory is much more expensive per byte than RAM, so it tends to be very small.
The L2 cache size varies depending on the CPU, but it is typically between 256 KB and 8 MB. Most modern CPUs pack more than 256 KB of L2 cache, and that size is now considered small, while some of the most powerful modern CPUs exceed 8 MB of L2 cache.
Short answer: registers, cache and main memory are built with different technologies, so there is a trade-off between fast-but-expensive and slower-but-cheaper.
Numbers everyone should know
          0.5 ns - CPU L1 dCACHE reference
            1 ns - speed of light: a photon travels ~1 ft (30.5 cm)
            5 ns - CPU branch mispredict
            7 ns - CPU L2 CACHE reference
           71 ns - CPU cross-QPI/NUMA best case on XEON E5-46*
          100 ns - MUTEX lock/unlock
          100 ns - own DDR MEMORY reference
          135 ns - CPU cross-QPI/NUMA best case on XEON E7-*
          202 ns - CPU cross-QPI/NUMA worst case on XEON E7-*
          325 ns - CPU cross-QPI/NUMA worst case on XEON E5-46*
       10,000 ns - Compress 1K bytes with Zippy PROCESS
       20,000 ns - Send 2K bytes over 1 Gbps NETWORK
      250,000 ns - Read 1 MB sequentially from MEMORY
      500,000 ns - Round trip within the same DataCenter
   10,000,000 ns - DISK seek
   10,000,000 ns - Read 1 MB sequentially from NETWORK
   30,000,000 ns - Read 1 MB sequentially from DISK
  150,000,000 ns - Send a NETWORK packet CA -> Netherlands
  |   |   |   |
  |   |   | ns|
  |   | us|
  | ms|
Sources (originally compiled by Peter Norvig):
- http://norvig.com/21-days.html#answers
- http://surana.wordpress.com/2009/01/01/numbers-everyone-should-know/
- http://sites.google.com/site/io/building-scalable-web-applications-with-google-app-engine
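If you want rough numbers for your own machine rather than a canned table, a common trick is a pointer-chasing loop: walk a random cycle through buffers of increasing size, so every load depends on the previous one and the average time per iteration approximates the latency of whichever level of the hierarchy the buffer fits in. Here is a minimal sketch in C; the buffer sizes, iteration count and use of clock_gettime are my own illustrative choices (not taken from the sources above), and the largest size will also include TLB-miss costs.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Pointer-chasing probe: every load depends on the previous one, so the
 * CPU cannot overlap them, and the time per iteration approximates the
 * load-to-use latency of whichever cache level (or DRAM) the buffer fits in. */
static double chase_ns(size_t bytes, size_t iters)
{
    size_t n = bytes / sizeof(size_t);
    size_t *buf = malloc(n * sizeof(size_t));

    /* Sattolo's algorithm: one random cycle over all elements, so the
     * hardware prefetcher cannot guess the next address. */
    for (size_t i = 0; i < n; i++) buf[i] = i;
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)rand() % i;
        size_t tmp = buf[i]; buf[i] = buf[j]; buf[j] = tmp;
    }

    size_t idx = 0;
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t i = 0; i < iters; i++)
        idx = buf[idx];                          /* dependent load chain */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    if (idx == (size_t)-1) puts("unreachable");  /* keep idx live */

    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    free(buf);
    return ns / (double)iters;
}

int main(void)
{
    /* Illustrative sizes meant to land in L1, L2, L3 and DRAM on a
     * typical i7; adjust to your own cache sizes. */
    size_t sizes[] = { 16u << 10, 256u << 10, 4u << 20, 256u << 20 };
    for (int i = 0; i < 4; i++)
        printf("%9zu KiB : %5.1f ns per load\n",
               sizes[i] >> 10, chase_ns(sizes[i], 20u * 1000 * 1000));
    return 0;
}

Build with optimisations (e.g. gcc -O2 chase.c) and expect the smallest buffer to land near the L1 figure above and the largest near the ~100 ns DDR figure, give or take your particular hardware.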
Here is a Performance Analysis Guide for the i7 and Xeon range of processors. I should stress that it has what you need and more (for example, see page 22 for some timings and cycle counts).
Additionally, this page has some details on clock cycles etc. The second link provided the following numbers:
Core i7 Xeon 5500 Series Data Source Latency (approximate) [Pg. 22]
local L1 CACHE hit, ~4 cycles ( 2.1 - 1.2 ns )
local L2 CACHE hit, ~10 cycles ( 5.3 - 3.0 ns )
local L3 CACHE hit, line unshared ~40 cycles ( 21.4 - 12.0 ns )
local L3 CACHE hit, shared line in another core ~65 cycles ( 34.8 - 19.5 ns )
local L3 CACHE hit, modified in another core ~75 cycles ( 40.2 - 22.5 ns )
remote L3 CACHE (Ref: Fig.1 [Pg. 5]) ~100-300 cycles ( 160.7 - 30.0 ns )
local DRAM ~60 ns
remote DRAM ~100 ns
EDIT2:
The most important thing is the note under the cited table, which says:
"NOTE: THESE VALUES ARE ROUGH APPROXIMATIONS. THEY DEPEND ON CORE AND UNCORE FREQUENCIES, MEMORY SPEEDS, BIOS SETTINGS, NUMBERS OF DIMMS, ETC,ETC..YOUR MILEAGE MAY VARY."
EDIT: I should highlight that, as well as the timing/cycle information, the above Intel document covers many more (extremely useful) details of the i7 and Xeon range of processors from a performance point of view.