Why are size_t and unsigned int slower than int?

I was experimenting with different integer types in a Visual Studio project on Windows, using the simple exchange sort algorithm below. The processor is Intel. The code was compiled in Release x64 with the optimization setting "Maximize Speed (/O2)". The command line corresponding to the compilation settings is:

/permissive- /GS /GL /W3 /Gy /Zc:wchar_t /Zi /Gm- /O2 /sdl /Fd"x64\Release\vc141.pdb" /Zc:inline /fp:precise /D "NDEBUG" /D "_CONSOLE" /D "_UNICODE" /D "UNICODE" /errorReport:prompt /WX- /Zc:forScope /Gd /Oi /MD /Fa"x64\Release\" /EHsc /nologo /Fo"x64\Release\" /Fp"x64\Release\SpeedTestForIntegerTypes.pch" /diagnostics:classic 

The code itself:

#include <ctime>
#include <vector>
#include <iostream>

void sort(int N, int A[], int WorkArray[]) // exchange sort
{
    int i, j, index, val_min;
    for (j = 0; j < N; j++)
    {
        val_min = 500000;
        for (i = j; i < N; i++)
        {
            if (A[i] < val_min)
            {
                val_min = A[i];
                index = i;
            }
        }
        WorkArray[j] = A[j];
        A[j] = val_min;
        A[index] = WorkArray[j];
    }
}

int main()
{
    std::vector<int> A(400000), WorkArray(400000);
    for(size_t k = 0; k < 400000; k++)
        A[k] = 400000 - (k+1);

    clock_t begin = clock();

    sort(400000, &A[0], &WorkArray[0]);

    clock_t end = clock();
    double sortTime = double(end - begin) / CLOCKS_PER_SEC;
    std::cout << "Sort time: " << sortTime << std::endl;
    return 0;
}

WorkArray is only needed to save the vector before sorting. The point is that this sort took 22.3 seconds to complete. The interesting part is that if I change the type int to size_t for the arrays A and WorkArray (both in std::vector and in the argument list of the function sort), as well as for val_min, the time increases to 67.4 seconds. That is threefold slower! The new code is below:

#include <ctime>
#include <vector>
#include <iostream>

void sort(int N, size_t A[], size_t WorkArray[]) // exchange sort
{
    int i, j, index;
    size_t val_min;
    for (j = 0; j < N; j++)
    {
        val_min = 500000U;
        for (i = j; i < N; i++)
        {
            if (A[i] < val_min)
            {
                val_min = A[i];
                index = i;
            }
        }
        WorkArray[j] = A[j];
        A[j] = val_min;
        A[index] = WorkArray[j];
    }
}

int main()
{
    std::vector<size_t> A(400000), WorkArray(400000);
    for(size_t k = 0; k < 400000; k++)
        A[k] = 400000 - (k+1);

    clock_t begin = clock();

    sort(400000, &A[0], &WorkArray[0]);

    clock_t end = clock();
    double sortTime = double(end - begin) / CLOCKS_PER_SEC;
    std::cout << "Sort time: " << sortTime << std::endl;
    return 0;
}

Note that I still keep the type int for the local variables i, j, index, and N, so the only two arithmetic operations, i++ and j++, should take the same amount of time in both cases. Therefore, this slowdown must have other causes. Is it related to memory alignment, register sizes, or something else?

But the most outrageous part was when I changed int to unsigned int. Both unsigned int and int occupy the same number of bytes, namely 4 (sizeof confirmed that). Yet the runtime for unsigned int was 65.8 s! While the first outcome was somewhat acceptable, the second one totally confuses me. Why is there such a significant difference in the time it takes to run such a simple algorithm that does not even involve sign checks?

Thanks to everyone addressing both of these questions. Where could I start reading more about these hardware-level optimization peculiarities? I don't care about the sorting algorithm itself; it's here only to illustrate the problem.

UPDATE: once again, I stress that I use int for the array indices in all three cases.

asked Mar 16 '18 by MajinSaha

2 Answers

Inspecting the generated assembly for all three variants (int, unsigned, size_t), the big difference is that in the int case the loop in the sort function is unrolled and uses SSE instructions (working on 8 ints at a time), while in the other two cases it does neither. Interestingly, the sort function is called in the int case but inlined into main in the other two (likely because loop unrolling increases the function's size).
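For illustration, here is my own rough reconstruction of the kind of branch-free min search the compiler can emit in the int case (an assumption on my part, not the actual compiler output; the function name is mine, and I use a 4-wide loop where the compiler unrolls to 8 elements). A key detail is that SSE2 only provides signed 32-bit compares (_mm_cmplt_epi32, i.e. pcmpgtd); there is no unsigned counterpart, which plausibly contributes to the unsigned and size_t variants being left scalar.

#include <emmintrin.h> // SSE2

// mask ? a : b, lane by lane
static __m128i select_epi32(__m128i mask, __m128i a, __m128i b)
{
    return _mm_or_si128(_mm_and_si128(mask, a), _mm_andnot_si128(mask, b));
}

// Returns the index of the smallest element below 500000; assumes n % 4 == 0
// (tail handling omitted for brevity).
int vec_min_index(const int* a, int n)
{
    __m128i vmin = _mm_set1_epi32(500000);
    __m128i vidx = _mm_set1_epi32(0);
    __m128i cur  = _mm_setr_epi32(0, 1, 2, 3);
    const __m128i four = _mm_set1_epi32(4);
    for (int i = 0; i < n; i += 4) {
        __m128i v  = _mm_loadu_si128((const __m128i*)(a + i));
        __m128i lt = _mm_cmplt_epi32(v, vmin); // signed compare only in SSE2
        vmin = select_epi32(lt, v, vmin);      // keep the smaller values
        vidx = select_epi32(lt, cur, vidx);    // and their positions
        cur  = _mm_add_epi32(cur, four);
    }
    // reduce the 4 lanes to a single (min, index) pair
    alignas(16) int mins[4], idxs[4];
    _mm_store_si128((__m128i*)mins, vmin);
    _mm_store_si128((__m128i*)idxs, vidx);
    int best = 0;
    for (int k = 1; k < 4; ++k)
        if (mins[k] < mins[best]) best = k;
    return idxs[best];
}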

I'm compiling from the command line with cl /nologo /W4 /MD /EHsc /Zi /Ox, using dumpbin to get the disassembly. The toolset is Microsoft (R) C/C++ Optimizing Compiler Version 19.12.25830.2 for x64.
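For reference, a typical pair of commands (the source file name is taken from the question's project settings, the output file name is my own):

cl /nologo /W4 /MD /EHsc /Zi /Ox SpeedTestForIntegerTypes.cpp
dumpbin /disasm SpeedTestForIntegerTypes.exe > SpeedTestForIntegerTypes.asm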

I get execution times of around 30 seconds for int and 100 seconds for the other two.

answered Oct 13 '22 by 1201ProgramAlarm

I tried this code in VS2017 and succeeded in reproducing the problem.

I modified the code as follows so that the times come out almost the same.

The cause appears to be the implicit conversion used in the array indexing.

#include <ctime>
#include <vector>
#include <iostream>

using namespace std;

// exchange sort
template<typename elem_t, typename index_t>
void sort(index_t size, elem_t* a, elem_t* b)
{
    index_t index = 0, i, j;
    elem_t min;

    for (j = 0; j < size; j++)
    {
        min = 500000;
        for (i = j; i < size; i++)
        {
            if (a[i] < min)
            {
                min = a[i];
                index = i;
            }
        }
        b[j] = a[j];
        a[j] = min;
        a[index] = b[j];
    }
}

template<typename elem_t, typename index_t, index_t size>
void test() {
    //vector<elem_t> a(size);
    //vector<elem_t> b(size);

    elem_t a[size];
    elem_t b[size];

    for (index_t k = 0; k < size; k++)
        a[k] = (elem_t)(size - (k + 1));

    clock_t begin = clock();
    sort(size, &a[0], &b[0]);
    clock_t end = clock();

    double sortTime = double(end - begin) / CLOCKS_PER_SEC;
    cout << "Sort time: " << sortTime << endl;
}

int main()
{
    const int size = 40000;

    cout << "<size_t, int>" << endl;
    test<size_t, int, size>();
    cout << endl;

    cout << "<size_t, size_t>" << endl;
    test<size_t, size_t, size>();
    cout << endl;

    cout << "<int, int>" << endl;
    test<int, int, size>();
    cout << endl;

    cout << "<int, size_t>" << endl;
    test<int, size_t, size>();
    cout << endl;

    cout << "<uint, int>" << endl;
    test<unsigned int, int, size>();
    cout << endl;

    cout << "<uint, size_t>" << endl;
    test<unsigned int, size_t, size>();
    cout << endl;

    cout << "<uint, uint>" << endl;
    test<unsigned int, unsigned int, size>();
    cout << endl;
}

Personally, I do not like implicit conversions. For troubleshooting this kind of problem, raise the warning level to the maximum, resolve all warnings, and then convert the code to a generic form. This will help you identify the problem.
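To see the conversion concretely, here is a minimal sketch of my own (not taken from the answer's build): on x64, a 32-bit signed index has to be sign-extended to pointer width before it can enter the address computation, typically with a movsxd instruction, whereas a size_t index is already 64 bits wide.

// Indexing with a 32-bit int requires a sign extension (movsxd) of the
// index before the 64-bit address computation; a size_t index does not.
size_t load_with_int(const size_t* a, int i)      { return a[i]; }
size_t load_with_sizet(const size_t* a, size_t i) { return a[i]; }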

The results of this code come from the combination of several factors:

  • signed vs unsigned: In C, why is "signed int" faster than "unsigned int"? (see the sketch after this list)

  • type size (int32 vs int64)

  • array index assembly code

  • vc++ optimization: /O2 (Maximize Speed)

    • this makes the (int / int) combination fast.
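Regarding the first bullet, a minimal sketch of my own (resting only on the standard rules that signed overflow is undefined while unsigned arithmetic wraps modulo 2^32) of why a signed index can optimize better on x64:

// With a signed index the compiler may assume 'i' never wraps (signed
// overflow is undefined behavior), so 'i * 4' can be strength-reduced to a
// simple 64-bit pointer stride. With an unsigned index, 'i * 4' must wrap
// modulo 2^32, which blocks that transformation.
long long sum_signed(const int* a, int n)
{
    long long s = 0;
    for (int i = 0; i < n; ++i)
        s += a[i * 4];
    return s;
}

long long sum_unsigned(const int* a, unsigned n)
{
    long long s = 0;
    for (unsigned i = 0; i < n; ++i)
        s += a[i * 4]; // wraparound semantics must be preserved
    return s;
}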
answered Oct 13 '22 by Minos