Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to calculate the frequency of CPU cores

I am trying to use RDTSC but it seems like my approach may be wrong to get the core speed:

#include "stdafx.h"
#include <windows.h>
#include <process.h>
#include <iostream>

using namespace std;

struct Core
{
    int CoreNumber;
};

static void startMonitoringCoreSpeeds(void *param)
{
    Core core = *((Core *)param);
    SetThreadAffinityMask(GetCurrentThread(), 1 << core.CoreNumber);
    while (true)
    {
        DWORD64 first = __rdtsc();
        Sleep(1000);
        DWORD64 second = __rdtsc();
        cout << "Core " << core.CoreNumber << " has frequency " << ((second - first)*pow(10, -6)) << " MHz" << endl;
    }
}

int GetNumberOfProcessorCores()
{
    DWORD process, system;
    if (GetProcessAffinityMask(GetCurrentProcess(), &process, &system))
    {
        int count = 0;
        for (int i = 0; i < 32; i++)
        {
            if (system & (1 << i))
            {
                count++;
            }
        }
        return count;
    }
    SYSTEM_INFO sysinfo;
    GetSystemInfo(&sysinfo);
    return sysinfo.dwNumberOfProcessors;
}

int _tmain(int argc, _TCHAR* argv[])
{
    for (int i = 0; i < GetNumberOfProcessorCores(); i++)
    {
        Core *core = new Core {0};
        core->CoreNumber = i;
        _beginthread(startMonitoringCoreSpeeds, 0, core);
    }
    cin.get();
}

It always prints out values around 3.3 GHz, which is wrong because things like Turbo Boost are on from time to time and my cores jump to 4.3 GHz for sure. Let me cross-reference some articles behind this idea.

Firstly (http://users.utcluj.ro/~ancapop/labscs/SCS2.pdf): "The TSCs on the processor’s cores are not synchronized. So it is not sure that if a process migrates during execution from one core to another, the measurement will not be affected. To avoid this problem, the measured process’s affinity has to be set to just one core, to prevent process migration." This tells me that RDTSC should return a different value per core my thread is on using the affinity mask I set, which is great.

Secondly, and please check this article (http://randomascii.wordpress.com/2011/07/29/rdtsc-in-the-age-of-sandybridge/): "If you need a consistent timer that works across cores and can be used to measure time then this is good news. If you want to measure actual CPU clock cycles then you are out of luck. If you want consistency across a wide range of CPU families then it sucks to be you. Update: section 16.11 of the Intel System Programming Guide documents this behavior of the Time-Stamp Counter. Roughly speaking it says that on older processors the clock rate changes, but on newer processors it remains uniform. It finishes by saying, of Constant TSC, “This is the architectural behavior moving forward." Okay, this tells me that RDTSC stays consistent, which makes my above results make sense since my CPU cores are rated at a standard 3.3 GHz...

Which REALLY begs the question, how do applications like Intel's Turbo Boost Technology Monitor and Piriform's Speccy and CPUID's CPU-Z measure a processor's clock speed while undergoing turbo boost, realtime?

like image 338
Alexandru Avatar asked Apr 23 '14 17:04

Alexandru


People also ask

How CPU cores are calculated?

To calculate the number of logical CPUs in vSphere Client, multiply the number of sockets by the number of cores. For example, if you need to configure a VM to use 2-processor sockets, each has 2 CPU cores, then the total number of logical CPUs is 2*2=4.

How many GHz is 2 cores?

Clock speeds For instance, a quad-core processor may support a clock speed of 3.0GHz, while a dual-core processor may hold a clock speed of 3.5 GHz for every processor. This means that a dual-core processor can run 14% faster. So, if you have a single-threaded program, the dual-core processor is indeed more efficient.

What is a core frequency?

The clock speed of the CPU, which is generally significantly higher than the speed of the buses that connect to it. See system bus.


1 Answers

Full solution follows. I've adapted the IOCTL sample driver on MSDN to do this. Note, the IOCTL sample is the only relative WDM sample skeleton driver I could find and also the closest thing I could find to a WDM template because most kernel mode templates out of the box in WDK are WDF-based drivers (any WDM driver template is actually blank with absolutely no source code), yet the only sample logic I've seen to do this input/output was through a WDM-based driver. Also, some fun facts I've learned along the way: kernel drivers don't like floating arithmetic and you can't use "windows.h" which really limits you to "ntddk.h", a special kernel-mode header. This also means I can't do all of my computations inside of kernel mode because I can't call functions like QueryPerformanceFrequency in there, so I had to get the mean performance ratio between timestamps and return them back to user mode for some computations (without QueryPerformanceFrequency, the values you get from CPU registers that store ticks like what QueryPerformanceCounter uses are useless because you don't know the step size; maybe there's a workaround to this but I opted to just use the mean since it works pretty damn well). Also, as per the one second sleep, the reason I used that is because otherwise you're almost spin-computing shit on multiple threads, which really messes up your calculations because your frequencies will go up per core constantly checking results from QueryPerformanceCounter (you drive your cores up as you do more computations) - NOT TO MENTION - its a ratio...so the delta time is not that important since its cycles per time...you can always increase the delta, it should still give you the same ratio relative to the step size. Furthermore, this is as minimalistic as I could get it to be. Good luck making it much smaller or shorter than this. Also, if you want to install the driver, you have two options unless you want to buy a Code Signing certificate from some third party, both suck, so pick one and suck it up. Let's start with the driver:

driver.c:

//
// Include files.
//

#include <ntddk.h>          // various NT definitions
#include <string.h>
#include <intrin.h>

#include "driver.h"

#define NT_DEVICE_NAME      L"\\Device\\KernelModeDriver"
#define DOS_DEVICE_NAME     L"\\DosDevices\\KernelModeDriver"

#if DBG
#define DRIVER_PRINT(_x_) \
                DbgPrint("KernelModeDriver.sys: ");\
                DbgPrint _x_;

#else
#define DRIVER_PRINT(_x_)
#endif

//
// Device driver routine declarations.
//

DRIVER_INITIALIZE DriverEntry;

_Dispatch_type_(IRP_MJ_CREATE)
_Dispatch_type_(IRP_MJ_CLOSE)
DRIVER_DISPATCH DriverCreateClose;

_Dispatch_type_(IRP_MJ_DEVICE_CONTROL)
DRIVER_DISPATCH DriverDeviceControl;

DRIVER_UNLOAD DriverUnloadDriver;

VOID
PrintIrpInfo(
    PIRP Irp
    );
VOID
PrintChars(
    _In_reads_(CountChars) PCHAR BufferAddress,
    _In_ size_t CountChars
    );

#ifdef ALLOC_PRAGMA
#pragma alloc_text( INIT, DriverEntry )
#pragma alloc_text( PAGE, DriverCreateClose)
#pragma alloc_text( PAGE, DriverDeviceControl)
#pragma alloc_text( PAGE, DriverUnloadDriver)
#pragma alloc_text( PAGE, PrintIrpInfo)
#pragma alloc_text( PAGE, PrintChars)
#endif // ALLOC_PRAGMA


NTSTATUS
DriverEntry(
    _In_ PDRIVER_OBJECT   DriverObject,
    _In_ PUNICODE_STRING      RegistryPath
    )
/*++

Routine Description:
    This routine is called by the Operating System to initialize the driver.

    It creates the device object, fills in the dispatch entry points and
    completes the initialization.

Arguments:
    DriverObject - a pointer to the object that represents this device
    driver.

    RegistryPath - a pointer to our Services key in the registry.

Return Value:
    STATUS_SUCCESS if initialized; an error otherwise.

--*/

{
    NTSTATUS        ntStatus;
    UNICODE_STRING  ntUnicodeString;    // NT Device Name "\Device\KernelModeDriver"
    UNICODE_STRING  ntWin32NameString;    // Win32 Name "\DosDevices\KernelModeDriver"
    PDEVICE_OBJECT  deviceObject = NULL;    // ptr to device object

    UNREFERENCED_PARAMETER(RegistryPath);

    RtlInitUnicodeString( &ntUnicodeString, NT_DEVICE_NAME );

    ntStatus = IoCreateDevice(
        DriverObject,                   // Our Driver Object
        0,                              // We don't use a device extension
        &ntUnicodeString,               // Device name "\Device\KernelModeDriver"
        FILE_DEVICE_UNKNOWN,            // Device type
        FILE_DEVICE_SECURE_OPEN,        // Device characteristics
        FALSE,                          // Not an exclusive device
        &deviceObject );                // Returned ptr to Device Object

    if ( !NT_SUCCESS( ntStatus ) )
    {
        DRIVER_PRINT(("Couldn't create the device object\n"));
        return ntStatus;
    }

    //
    // Initialize the driver object with this driver's entry points.
    //

    DriverObject->MajorFunction[IRP_MJ_CREATE] = DriverCreateClose;
    DriverObject->MajorFunction[IRP_MJ_CLOSE] = DriverCreateClose;
    DriverObject->MajorFunction[IRP_MJ_DEVICE_CONTROL] = DriverDeviceControl;
    DriverObject->DriverUnload = DriverUnloadDriver;

    //
    // Initialize a Unicode String containing the Win32 name
    // for our device.
    //

    RtlInitUnicodeString( &ntWin32NameString, DOS_DEVICE_NAME );

    //
    // Create a symbolic link between our device name  and the Win32 name
    //

    ntStatus = IoCreateSymbolicLink(
                        &ntWin32NameString, &ntUnicodeString );

    if ( !NT_SUCCESS( ntStatus ) )
    {
        //
        // Delete everything that this routine has allocated.
        //
        DRIVER_PRINT(("Couldn't create symbolic link\n"));
        IoDeleteDevice( deviceObject );
    }


    return ntStatus;
}


NTSTATUS
DriverCreateClose(
    PDEVICE_OBJECT DeviceObject,
    PIRP Irp
    )
/*++

Routine Description:

    This routine is called by the I/O system when the KernelModeDriver is opened or
    closed.

    No action is performed other than completing the request successfully.

Arguments:

    DeviceObject - a pointer to the object that represents the device
    that I/O is to be done on.

    Irp - a pointer to the I/O Request Packet for this request.

Return Value:

    NT status code

--*/

{
    UNREFERENCED_PARAMETER(DeviceObject);

    PAGED_CODE();

    Irp->IoStatus.Status = STATUS_SUCCESS;
    Irp->IoStatus.Information = 0;

    IoCompleteRequest( Irp, IO_NO_INCREMENT );

    return STATUS_SUCCESS;
}

VOID
DriverUnloadDriver(
    _In_ PDRIVER_OBJECT DriverObject
    )
/*++

Routine Description:

    This routine is called by the I/O system to unload the driver.

    Any resources previously allocated must be freed.

Arguments:

    DriverObject - a pointer to the object that represents our driver.

Return Value:

    None
--*/

{
    PDEVICE_OBJECT deviceObject = DriverObject->DeviceObject;
    UNICODE_STRING uniWin32NameString;

    PAGED_CODE();

    //
    // Create counted string version of our Win32 device name.
    //

    RtlInitUnicodeString( &uniWin32NameString, DOS_DEVICE_NAME );


    //
    // Delete the link from our device name to a name in the Win32 namespace.
    //

    IoDeleteSymbolicLink( &uniWin32NameString );

    if ( deviceObject != NULL )
    {
        IoDeleteDevice( deviceObject );
    }



}

NTSTATUS
DriverDeviceControl(
    PDEVICE_OBJECT DeviceObject,
    PIRP Irp
    )

/*++

Routine Description:

    This routine is called by the I/O system to perform a device I/O
    control function.

Arguments:

    DeviceObject - a pointer to the object that represents the device
        that I/O is to be done on.

    Irp - a pointer to the I/O Request Packet for this request.

Return Value:

    NT status code

--*/

{
    PIO_STACK_LOCATION  irpSp;// Pointer to current stack location
    NTSTATUS            ntStatus = STATUS_SUCCESS;// Assume success
    ULONG               inBufLength; // Input buffer length
    ULONG               outBufLength; // Output buffer length
    void                *inBuf; // pointer to input buffer
    unsigned __int64    *outBuf; // pointer to the output buffer

    UNREFERENCED_PARAMETER(DeviceObject);

    PAGED_CODE();

    irpSp = IoGetCurrentIrpStackLocation( Irp );
    inBufLength = irpSp->Parameters.DeviceIoControl.InputBufferLength;
    outBufLength = irpSp->Parameters.DeviceIoControl.OutputBufferLength;

    if (!inBufLength || !outBufLength || outBufLength != sizeof(unsigned __int64)*2)
    {
        ntStatus = STATUS_INVALID_PARAMETER;
        goto End;
    }

    //
    // Determine which I/O control code was specified.
    //

    switch ( irpSp->Parameters.DeviceIoControl.IoControlCode )
    {
    case IOCTL_SIOCTL_METHOD_BUFFERED:

        //
        // In this method the I/O manager allocates a buffer large enough to
        // to accommodate larger of the user input buffer and output buffer,
        // assigns the address to Irp->AssociatedIrp.SystemBuffer, and
        // copies the content of the user input buffer into this SystemBuffer
        //

        DRIVER_PRINT(("Called IOCTL_SIOCTL_METHOD_BUFFERED\n"));
        PrintIrpInfo(Irp);

        //
        // Input buffer and output buffer is same in this case, read the
        // content of the buffer before writing to it
        //

        inBuf = (void *)Irp->AssociatedIrp.SystemBuffer;
        outBuf = (unsigned __int64 *)Irp->AssociatedIrp.SystemBuffer;

        //
        // Read the data from the buffer
        //

        DRIVER_PRINT(("\tData from User :"));
        //
        // We are using the following function to print characters instead
        // DebugPrint with %s format because we string we get may or
        // may not be null terminated.
        //
        PrintChars(inBuf, inBufLength);

        //
        // Write to the buffer
        //

        unsigned __int64 data[sizeof(unsigned __int64) * 2];
        data[0] = __readmsr(232);
        data[1] = __readmsr(231);

        DRIVER_PRINT(("data[0]: %d", data[0]));
        DRIVER_PRINT(("data[1]: %d", data[1]));

        RtlCopyBytes(outBuf, data, outBufLength);

        //
        // Assign the length of the data copied to IoStatus.Information
        // of the Irp and complete the Irp.
        //

        Irp->IoStatus.Information = sizeof(unsigned __int64)*2;

        //
        // When the Irp is completed the content of the SystemBuffer
        // is copied to the User output buffer and the SystemBuffer is
        // is freed.
        //

       break;

    default:

        //
        // The specified I/O control code is unrecognized by this driver.
        //

        ntStatus = STATUS_INVALID_DEVICE_REQUEST;
        DRIVER_PRINT(("ERROR: unrecognized IOCTL %x\n",
            irpSp->Parameters.DeviceIoControl.IoControlCode));
        break;
    }

End:
    //
    // Finish the I/O operation by simply completing the packet and returning
    // the same status as in the packet itself.
    //

    Irp->IoStatus.Status = ntStatus;

    IoCompleteRequest( Irp, IO_NO_INCREMENT );

    return ntStatus;
}

VOID
PrintIrpInfo(
    PIRP Irp)
{
    PIO_STACK_LOCATION  irpSp;
    irpSp = IoGetCurrentIrpStackLocation( Irp );

    PAGED_CODE();

    DRIVER_PRINT(("\tIrp->AssociatedIrp.SystemBuffer = 0x%p\n",
        Irp->AssociatedIrp.SystemBuffer));
    DRIVER_PRINT(("\tIrp->UserBuffer = 0x%p\n", Irp->UserBuffer));
    DRIVER_PRINT(("\tirpSp->Parameters.DeviceIoControl.Type3InputBuffer = 0x%p\n",
        irpSp->Parameters.DeviceIoControl.Type3InputBuffer));
    DRIVER_PRINT(("\tirpSp->Parameters.DeviceIoControl.InputBufferLength = %d\n",
        irpSp->Parameters.DeviceIoControl.InputBufferLength));
    DRIVER_PRINT(("\tirpSp->Parameters.DeviceIoControl.OutputBufferLength = %d\n",
        irpSp->Parameters.DeviceIoControl.OutputBufferLength ));
    return;
}

VOID
PrintChars(
    _In_reads_(CountChars) PCHAR BufferAddress,
    _In_ size_t CountChars
    )
{
    PAGED_CODE();

    if (CountChars) {

        while (CountChars--) {

            if (*BufferAddress > 31
                 && *BufferAddress != 127) {

                KdPrint (( "%c", *BufferAddress) );

            } else {

                KdPrint(( ".") );

            }
            BufferAddress++;
        }
        KdPrint (("\n"));
    }
    return;
}

driver.h:

//
// Device type           -- in the "User Defined" range."
//
#define SIOCTL_TYPE 40000
//
// The IOCTL function codes from 0x800 to 0xFFF are for customer use.
//
#define IOCTL_SIOCTL_METHOD_IN_DIRECT \
    CTL_CODE( SIOCTL_TYPE, 0x900, METHOD_IN_DIRECT, FILE_ANY_ACCESS  )

#define IOCTL_SIOCTL_METHOD_OUT_DIRECT \
    CTL_CODE( SIOCTL_TYPE, 0x901, METHOD_OUT_DIRECT , FILE_ANY_ACCESS  )

#define IOCTL_SIOCTL_METHOD_BUFFERED \
    CTL_CODE( SIOCTL_TYPE, 0x902, METHOD_BUFFERED, FILE_ANY_ACCESS  )

#define IOCTL_SIOCTL_METHOD_NEITHER \
    CTL_CODE( SIOCTL_TYPE, 0x903, METHOD_NEITHER , FILE_ANY_ACCESS  )


#define DRIVER_FUNC_INSTALL     0x01
#define DRIVER_FUNC_REMOVE      0x02

#define DRIVER_NAME       "ReadMSRDriver"

Now, here is the application that loads up and uses the driver (Win32 Console Application):

FrequencyCalculator.cpp:

#include "stdafx.h"
#include <iostream>
#include <windows.h>
#include <winioctl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <strsafe.h>
#include <process.h>
#include "..\KernelModeDriver\driver.h"

using namespace std;

BOOLEAN
ManageDriver(
_In_ LPCTSTR  DriverName,
_In_ LPCTSTR  ServiceName,
_In_ USHORT   Function
);

HANDLE hDevice;
TCHAR driverLocation[MAX_PATH];

void InstallDriver()
{
    DWORD errNum = 0;
    GetCurrentDirectory(MAX_PATH, driverLocation);
    _tcscat_s(driverLocation, _T("\\KernelModeDriver.sys"));

    std::wcout << "Trying to install driver at " << driverLocation << std::endl;

    //
    // open the device
    //

    if ((hDevice = CreateFile(_T("\\\\.\\KernelModeDriver"),
        GENERIC_READ | GENERIC_WRITE,
        0,
        NULL,
        CREATE_ALWAYS,
        FILE_ATTRIBUTE_NORMAL,
        NULL)) == INVALID_HANDLE_VALUE) {

        errNum = GetLastError();

        if (errNum != ERROR_FILE_NOT_FOUND) {

            printf("CreateFile failed!  ERROR_FILE_NOT_FOUND = %d\n", errNum);

            return;
        }

        //
        // The driver is not started yet so let us the install the driver.
        // First setup full path to driver name.
        //

        if (!ManageDriver(_T(DRIVER_NAME),
            driverLocation,
            DRIVER_FUNC_INSTALL
            )) {

            printf("Unable to install driver. \n");

            //
            // Error - remove driver.
            //

            ManageDriver(_T(DRIVER_NAME),
                driverLocation,
                DRIVER_FUNC_REMOVE
                );

            return;
        }

        hDevice = CreateFile(_T("\\\\.\\KernelModeDriver"),
            GENERIC_READ | GENERIC_WRITE,
            0,
            NULL,
            CREATE_ALWAYS,
            FILE_ATTRIBUTE_NORMAL,
            NULL);

        if (hDevice == INVALID_HANDLE_VALUE){
            printf("Error: CreatFile Failed : %d\n", GetLastError());
            return;
        }
    }
}

void UninstallDriver()
{
    //
    // close the handle to the device.
    //

    CloseHandle(hDevice);

    //
    // Unload the driver.  Ignore any errors.
    //
    ManageDriver(_T(DRIVER_NAME),
        driverLocation,
        DRIVER_FUNC_REMOVE
        );
}

double GetPerformanceRatio()
{
    BOOL bRc;
    ULONG bytesReturned;

    int input = 0;
    unsigned __int64 output[2];
    memset(output, 0, sizeof(unsigned __int64) * 2);

    //printf("InputBuffer Pointer = %p, BufLength = %d\n", &input, sizeof(&input));
    //printf("OutputBuffer Pointer = %p BufLength = %d\n", &output, sizeof(&output));

    //
    // Performing METHOD_BUFFERED
    //

    //printf("\nCalling DeviceIoControl METHOD_BUFFERED:\n");

    bRc = DeviceIoControl(hDevice,
        (DWORD)IOCTL_SIOCTL_METHOD_BUFFERED,
        &input,
        sizeof(&input),
        output,
        sizeof(unsigned __int64)*2,
        &bytesReturned,
        NULL
        );

    if (!bRc)
    {
        //printf("Error in DeviceIoControl : %d", GetLastError());
        return 0;

    }
    //printf("    OutBuffer (%d): %d\n", bytesReturned, output);
    if (output[1] == 0)
    {
        return 0;
    }
    else
    {
        return (float)output[0] / (float)output[1];
    }
}

struct Core
{
    int CoreNumber;
};

int GetNumberOfProcessorCores()
{
    SYSTEM_INFO sysinfo;
    GetSystemInfo(&sysinfo);
    return sysinfo.dwNumberOfProcessors;
}

float GetCoreFrequency()
{
    // __rdtsc: Returns the processor time stamp which records the number of clock cycles since the last reset.
    // QueryPerformanceCounter: Returns a high resolution time stamp that can be used for time-interval measurements.
    // Get the frequency which defines the step size of the QueryPerformanceCounter method.
    LARGE_INTEGER frequency;
    QueryPerformanceFrequency(&frequency);
    // Get the number of cycles before we start.
    ULONG cyclesBefore = __rdtsc();
    // Get the Intel performance ratio at the start.
    float ratioBefore = GetPerformanceRatio();
    // Get the start time.
    LARGE_INTEGER startTime;
    QueryPerformanceCounter(&startTime);
    // Give the CPU cores enough time to repopulate their __rdtsc and QueryPerformanceCounter registers.
    Sleep(1000);
    ULONG cyclesAfter = __rdtsc();
    // Get the Intel performance ratio at the end.
    float ratioAfter = GetPerformanceRatio();
    // Get the end time.
    LARGE_INTEGER endTime;
    QueryPerformanceCounter(&endTime);
    // Return the number of MHz. Multiply the core's frequency by the mean MSR (model-specific register) ratio (the APERF register's value divided by the MPERF register's value) between the two timestamps.
    return ((ratioAfter + ratioBefore) / 2)*(cyclesAfter - cyclesBefore)*pow(10, -6) / ((endTime.QuadPart - startTime.QuadPart) / frequency.QuadPart);
}

struct CoreResults
{
    int CoreNumber;
    float CoreFrequency;
};

CRITICAL_SECTION printLock;

static void printResult(void *param)
{
    EnterCriticalSection(&printLock);
    CoreResults coreResults = *((CoreResults *)param);
    std::cout << "Core " << coreResults.CoreNumber << " has a speed of " << coreResults.CoreFrequency << " MHz" << std::endl;
    delete param;
    LeaveCriticalSection(&printLock);
}

bool closed = false;

static void startMonitoringCoreSpeeds(void *param)
{
    Core core = *((Core *)param);
    SetThreadAffinityMask(GetCurrentThread(), 1 << core.CoreNumber);
    while (!closed)
    {
        CoreResults *coreResults = new CoreResults();
        coreResults->CoreNumber = core.CoreNumber;
        coreResults->CoreFrequency = GetCoreFrequency();
        _beginthread(printResult, 0, coreResults);
        Sleep(1000);
    }
    delete param;
}

int _tmain(int argc, _TCHAR* argv[])
{
    InitializeCriticalSection(&printLock);
    InstallDriver();
    for (int i = 0; i < GetNumberOfProcessorCores(); i++)
    {
        Core *core = new Core{ 0 };
        core->CoreNumber = i;
        _beginthread(startMonitoringCoreSpeeds, 0, core);
    }
    std::cin.get();
    closed = true;
    UninstallDriver();
    DeleteCriticalSection(&printLock);
}

It uses install.cpp which you can get from the IOCTL sample. I will post a working, fully working and ready solution (with code, obviously) on my blog over the next few days, if not tonight.

Edit: Blogged it at http://www.dima.to/blog/?p=101 (full source code available there)...

like image 158
Alexandru Avatar answered Oct 02 '22 23:10

Alexandru