I wonder if there is a penalty for running Dalvik+JIT on a multi-core ARM chip vs a single core chip?
E.g., if I disable multi-core support in my Android system build and run the entire phone on a single CPU core, will I get higher performance when running a single-threaded Java benchmark?
How high is the cost of memory barriers and synchronization on multi-core?
I am asking because I vaguely remember seeing single-threaded benchmark scores from single-core phones vs dual-core phones. As long as the clock speed (MHz) was about the same, there was no big difference between the two phones. I had expected a slowdown on the dual-core phone.
The simple answer is "why don't you try it and find out?"
The complex answer is this: There are costs to doing multicore synchronization, but there are also benefits to having multiple cores. You can undoubtedly devise a pathological case where a program relies so heavily on synchronization primitives that their overhead dominates its performance. This is usually due to locking at too deep a level (inside your fast loop). But in the general case, the fact that the dozen other system programs can get CPU time on other cores, and that the kernel can service interrupts and IO there instead of interrupting your process, is likely to far outweigh the penalty incurred by MP synchronization.
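To make the "locking inside your fast loop" case concrete, here is a minimal sketch of the kind of test you could run on both builds. The class and method names (LockInLoopBench, sumPlain, sumLocked) are made up for illustration, and the timing harness is deliberately crude; a JIT may optimize either loop, so treat the numbers as a rough indication only.

```java
// Hypothetical micro-benchmark: the same loop with and without an
// uncontended lock taken on every iteration.
class LockInLoopBench {
    private static final Object LOCK = new Object();

    // Baseline: plain loop, no synchronization.
    static long sumPlain(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += i;
        }
        return sum;
    }

    // Pathological variant: acquires and releases a monitor per iteration.
    static long sumLocked(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            synchronized (LOCK) {
                sum += i;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        int n = 10000000;

        // Warm up so the JIT has a chance to compile both methods.
        for (int r = 0; r < 5; r++) {
            sumPlain(n);
            sumLocked(n);
        }

        long t0 = System.nanoTime();
        long a = sumPlain(n);
        long t1 = System.nanoTime();
        long b = sumLocked(n);
        long t2 = System.nanoTime();

        // Print the results too, so the loops cannot be discarded as dead code.
        System.out.println("plain:  " + (t1 - t0) / 1000000 + " ms (sum=" + a + ")");
        System.out.println("locked: " + (t2 - t1) / 1000000 + " ms (sum=" + b + ")");
    }
}
```

Running this once on the single-core build and once on the multi-core build, and comparing the gap between the two lines, should show how much of the difference is due to synchronization rather than to the loop itself.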
In answer to your question, a DMB can take dozens or hundreds of cycles, and a DSB is likely more costly, since it is the stronger barrier: it waits for outstanding memory accesses to complete rather than only ordering them. Depending on the implementation, exclusive load-store instructions (LDREX/STREX) can be very fast or very slow. WFE can consume several microseconds, though it shouldn't be needed if you are not experiencing contention.
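From Java you cannot issue these instructions directly, but you can get a rough feel for the costs mentioned above by comparing a plain field write, a volatile field write, and an atomic increment. This is only a sketch under assumptions: the class name BarrierCostSketch and its methods are hypothetical, and whether the runtime actually emits a barrier for the volatile write or an exclusive load-store pair for the atomic increment depends on the VM and on how it was built.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: three loops whose per-iteration cost differs mainly
// in the memory-ordering work the runtime may have to do.
class BarrierCostSketch {
    static long plainField;
    static volatile long volatileField;
    static final AtomicLong atomicField = new AtomicLong();

    // Plain writes: no ordering guarantees. A JIT may collapse this loop.
    static long writePlain(int n) {
        for (int i = 0; i < n; i++) {
            plainField = i;
        }
        return plainField;
    }

    // Volatile writes: on a multi-core build the runtime typically has to
    // order each store (assumption: via a memory barrier on ARM).
    static long writeVolatile(int n) {
        for (int i = 0; i < n; i++) {
            volatileField = i;
        }
        return volatileField;
    }

    // Atomic read-modify-write (assumption: implemented with exclusive
    // load/store instructions on ARM).
    static long incrementAtomic(int n) {
        atomicField.set(0);
        for (int i = 0; i < n; i++) {
            atomicField.incrementAndGet();
        }
        return atomicField.get();
    }

    static long timeNanos(Runnable r) {
        long start = System.nanoTime();
        r.run();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        final int n = 10000000;
        for (int warm = 0; warm < 5; warm++) { // crude JIT warm-up
            writePlain(n);
            writeVolatile(n);
            incrementAtomic(n);
        }
        long plain = timeNanos(new Runnable() { public void run() { writePlain(n); } });
        long vol = timeNanos(new Runnable() { public void run() { writeVolatile(n); } });
        long atomic = timeNanos(new Runnable() { public void run() { incrementAtomic(n); } });
        System.out.println("plain:    " + plain / 1000000 + " ms");
        System.out.println("volatile: " + vol / 1000000 + " ms");
        System.out.println("atomic:   " + atomic / 1000000 + " ms");
    }
}
```

If the volatile and atomic loops cost about the same on the single-core and multi-core builds, the barrier overhead is probably not what limits your benchmark.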