How do I programmatically disable hardware prefetching?

I would like to programmatically disable hardware prefetching.

According to the Intel articles "Optimizing Application Performance on Intel® Core™ Microarchitecture Using Hardware-Implemented Prefetchers" and "How to Choose between Hardware and Software Prefetch on 32-Bit Intel® Architecture", I need to update an MSR to disable hardware prefetching.

Here is a relevant snippet:

"DPL Prefetch and L2 Streaming Prefetch settings can also be changed programmatically by writing a device driver utility for changing the bits in the IA32_MISC_ENABLE register – MSR 0x1A0. Such a utility offers the ability to enable or disable prefetch mechanisms without requiring any server downtime.

The table below shows the bits in the IA32_MISC_ENABLE MSR that have to be changed in order to control the DPL and L2 Streaming Prefetch:

Prefetcher Type                              MSR (0x1A0) Bit   Value
DPL (Hardware Prefetch)                      Bit 9             0 = Enable, 1 = Disable
L2 Streamer (Adjacent Cache Line Prefetch)   Bit 19            0 = Enable, 1 = Disable"

I tried using http://etallen.com/msr.html but this did not work. I also tried using wrmsr in asm/msr.h directly but that segfaults. I tried doing this in a kernel module ... and killed the machine.

BTW, I am using kernel 2.6.18-92.el5, which has MSR support built into the kernel:

$ grep -i msr /boot/config-$(uname -r)
CONFIG_X86_MSR=y
...
Carlos asked Apr 23 '09 23:04



2 Answers

You can enable or disable the hardware prefetchers using msr-tools http://www.kernel.org/pub/linux/utils/cpu/msr-tools/.

The following enables the hardware prefetcher (by clearing bit 9):

[root@... msr-tools-1.2]# ./wrmsr -p 0 0x1a0 0x60628e2089
[root@... msr-tools-1.2]# ./rdmsr 0x1a0
60628e2089

The following disables the hardware prefetcher (by setting bit 9):

[root@... msr-tools-1.2]# ./wrmsr -p 0 0x1a0 0x60628e2289
[root@... msr-tools-1.2]# ./rdmsr 0x1a0
60628e2289

Programmatically, you can do this as root by opening /dev/cpu/<cpunumber>/msr and using pwrite to write the 8-byte value to the msr "file" at offset 0x1a0.
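As a rough illustration of the pread/pwrite approach, here is a minimal sketch. The helper names (msr_read, msr_write, msr_update_bit) and the macro names are made up for this example and are not part of msr-tools. It assumes the msr device accepts 8-byte reads and writes at the register offset, and it only touches the one CPU whose device file you pass in.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700            /* expose pread/pwrite under strict -std modes */
#endif
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/* Names below are illustrative; the numbers come from the quoted Intel table. */
#define MSR_IA32_MISC_ENABLE      0x1a0
#define DPL_PREFETCH_DISABLE_BIT  9

/* Read the 8-byte MSR at offset `reg` from an open msr device. 0 on success. */
static int msr_read(int fd, uint32_t reg, uint64_t *value)
{
    ssize_t n = pread(fd, value, sizeof *value, (off_t)reg);
    return n == (ssize_t)sizeof *value ? 0 : -1;
}

/* Write the 8-byte `value` to the MSR at offset `reg`. 0 on success. */
static int msr_write(int fd, uint32_t reg, uint64_t value)
{
    ssize_t n = pwrite(fd, &value, sizeof value, (off_t)reg);
    return n == (ssize_t)sizeof value ? 0 : -1;
}

/*
 * Read-modify-write a single bit so the other bits of the register keep
 * their current values. `path` would be e.g. "/dev/cpu/0/msr".
 * set != 0 sets the bit (prefetcher disabled), set == 0 clears it.
 */
static int msr_update_bit(const char *path, uint32_t reg, unsigned bit, int set)
{
    uint64_t value;
    int rc = -1;
    int fd = open(path, O_RDWR);

    if (fd < 0)
        return -1;                   /* not root, or no msr device */
    if (msr_read(fd, reg, &value) == 0) {
        if (set)
            value |= UINT64_C(1) << bit;
        else
            value &= ~(UINT64_C(1) << bit);
        rc = msr_write(fd, reg, value);
    }
    close(fd);
    return rc;
}
```

A call such as msr_update_bit("/dev/cpu/0/msr", MSR_IA32_MISC_ENABLE, DPL_PREFETCH_DISABLE_BIT, 1) would then correspond to the wrmsr command above that disables the prefetcher. Reading the register first avoids hard-coding a full 64-bit value that may differ between machines.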

Carlos answered Sep 21 '22 17:09


From the Intel reference for the RDMSR/WRMSR instructions:
This instruction must be executed at privilege level 0 or in real-address mode; otherwise, a general protection exception #GP(0) will be generated. Specifying a reserved or unimplemented MSR address in ECX will also cause a general protection exception.

...
The CPUID instruction should be used to determine whether MSRs are supported (EDX[5]=1) before using this instruction.

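So your fault might be caused by a CPU that doesn't support MSRs or by using the wrong MSR address. Note also that executing wrmsr directly from a user-space program will always fault, because user code does not run at privilege level 0.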
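To rule out the first possibility, the CPUID check can be done from user space. This is a minimal sketch, assuming GCC or Clang on x86 (it uses the compiler's <cpuid.h> helper __get_cpuid) and assuming the EDX[5] flag is read from CPUID leaf 1. The function and macro names are made up for this example.

```c
#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>                   /* GCC/Clang wrapper around the CPUID instruction */
#endif

#define CPUID_EDX_MSR_BIT 5          /* EDX[5] = 1 means RDMSR/WRMSR are supported */

/* Test the MSR feature flag in an EDX value returned by CPUID leaf 1. */
static int edx_has_msr(unsigned int edx)
{
    return (edx >> CPUID_EDX_MSR_BIT) & 1u;
}

/* Return 1 if the CPU reports MSR support, 0 otherwise (or when not on x86). */
static int cpu_has_msr(void)
{
#if defined(__i386__) || defined(__x86_64__)
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;                    /* leaf 1 not available */
    return edx_has_msr(edx);
#else
    return 0;
#endif
}
```

If cpu_has_msr() returns 1, a lack of MSR support is unlikely to be the cause, which leaves the privilege level or the MSR address.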

There are lots of examples of using MSRs in the kernel source. For a single CPU, arch/i386/kernel/cpu/intel.c demonstrates disabling prefetch for the Xeon in the function:

static void __cpuinit Intel_errata_workarounds(struct cpuinfo_x86 *c)

The rdmsr arguments are the MSR number and two 32-bit variables that receive the low and high words. rdmsr is a macro, so the variables are passed by name rather than by pointer.
The wrmsr arguments are the MSR number, the low 32-bit word value, and the high 32-bit word value.

On multi-core or SMP systems, use the _on_cpu variants, which take the CPU number as the first argument (note that rdmsr_on_cpu does take pointers):
void rdmsr_on_cpu(unsigned int cpu, u32 msr_no, u32 *l, u32 *h);
void wrmsr_on_cpu(unsigned int cpu, u32 msr_no, u32 l, u32 h);
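Because these interfaces work with two 32-bit halves, it can help to see how a 64-bit value such as the 0x60628e2289 from the other answer maps onto them. The helpers below are a plain user-space sketch with made-up names (msr_lo, msr_hi, msr_join), not kernel functions.

```c
#include <stdint.h>

/* Low 32 bits of a 64-bit MSR value (the `l` argument). */
static inline uint32_t msr_lo(uint64_t value)
{
    return (uint32_t)value;
}

/* High 32 bits of a 64-bit MSR value (the `h` argument). */
static inline uint32_t msr_hi(uint64_t value)
{
    return (uint32_t)(value >> 32);
}

/* Recombine the two halves into the full 64-bit MSR value. */
static inline uint64_t msr_join(uint32_t lo, uint32_t hi)
{
    return ((uint64_t)hi << 32) | lo;
}
```

For 0x60628e2289 the low word is 0x628e2289 and the high word is 0x60. Bits 9 and 19 both fall in the low word, so only the `l` argument needs to change when toggling the prefetchers.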

Chris answered Sep 24 '22 17:09