I am currently writing a prime number generator in C++. I have made a single-threaded version first, and later a multi-threaded version.
I found out that if my program generates values less than 100'000
, the single-threaded version is faster than the multi-threaded one. Obviously I am doing something wrong.
My code below:
#include <iostream>
#include <fstream>
#include <set>
#include <string>
#include <thread>
#include <mutex>
#include <shared_mutex>
using namespace std;
set<unsigned long long> primeContainer;
shared_mutex m;
void checkPrime(const unsigned long long p)
{
if (p % 3 == 0)
return;
bool isPrime = true;
for (set<unsigned long long>::const_iterator it = primeContainer.cbegin(); it != primeContainer.cend(); ++it)
{
if (p % *it == 0)
{
isPrime = false;
break;
}
if (*it * *it > p) // check only up to square root
break;
}
if (isPrime)
primeContainer.insert(p);
}
void checkPrimeLock(const unsigned long long p)
{
if (p % 3 == 0)
return;
bool isPrime = true;
try
{
shared_lock<shared_mutex> l(m);
for (set<unsigned long long>::const_iterator it = primeContainer.cbegin(); it != primeContainer.cend(); ++it)
{
if (p % *it == 0)
{
isPrime = false;
break;
}
if (*it * *it > p)
break;
}
}
catch (exception& e)
{
cout << e.what() << endl;
system("pause");
}
if (isPrime)
{
try
{
unique_lock<shared_mutex> l(m);
primeContainer.insert(p);
}
catch (exception& e)
{
cout << e.what() << endl;
system("pause");
}
}
}
void runLoopThread(const unsigned long long& l)
{
for (unsigned long long i = 10; i < l; i += 10)
{
thread t1(checkPrimeLock, i + 1);
thread t2(checkPrimeLock, i + 3);
thread t3(checkPrimeLock, i + 7);
thread t4(checkPrimeLock, i + 9);
t1.join();
t2.join();
t3.join();
t4.join();
}
}
void runLoop(const unsigned long long& l)
{
for (unsigned long long i = 10; i < l; i += 10)
{
checkPrime(i + 1);
checkPrime(i + 3);
checkPrime(i + 7);
checkPrime(i + 9);
}
}
void printPrimes(const unsigned long long& l)
{
if (1U <= l)
cout << "1 ";
if (2U <= l)
cout << "2 ";
if (3U <= l)
cout << "3 ";
if (5U <= l)
cout << "5 ";
for (auto it = primeContainer.cbegin(); it != primeContainer.cend(); ++it)
{
if (*it <= l)
cout << *it << " ";
}
cout << endl;
}
void writeToFile(const unsigned long long& l)
{
string name = "primes_" + to_string(l) + ".txt";
ofstream f(name);
if (f.is_open())
{
if (1U <= l)
f << "1 ";
if (2U <= l)
f << "2 ";
if (3U <= l)
f << "3 ";
if (5U <= l)
f << "5 ";
for (auto it = primeContainer.cbegin(); it != primeContainer.cend(); ++it)
{
if (*it <= l)
f << *it << " ";
}
}
else
{
cout << "Error opening file." << endl;
system("pause");
}
}
int main()
{
unsigned int n = thread::hardware_concurrency();
std::cout << n << " concurrent threads are supported." << endl;
unsigned long long limit;
cout << "Please enter the limit of prime generation: ";
cin >> limit;
primeContainer.insert(7);
if (10 < limit)
{
//runLoop(limit); //single-threaded
runLoopThread(limit); //multi-threaded
}
printPrimes(limit);
//writeToFile(limit);
system("pause");
return 0;
}
In the main
function you will find comments about which function is single and multi threaded.
The main difference between those is the use of locks, shared for container iteration, and unique for insertion. If it matters, my CPU has 4 cores.
Why is the single thread version faster?
Every thread needs some overhead and system resources, so it also slows down performance. Another problem is the so called "thread explosion" when MORE thread are created than cores are on the system. And some waiting threads for the end of other threads is the worst idea for multi threading.
Therefore, multithreaded programs can run much faster than on a uniprocessor system. They can also be faster than a program using multiple processes, because threads require fewer resources and generate less overhead.
Disadvantages. Depending on the design and architecture of the processor, simultaneous multithreading can decrease performance if any of the shared resources are bottlenecks for performance.
If you're unsatisfied with the processing output of your computer, it might be time to hyper-thread your central processing unit's (CPU) cores. Hyper-threading can be a great way to improve the processing speed of your PC without having to go through a major hardware makeover.
You have several problems.
First, you keep creating and destroying threads needlessly. Have each thread loop doing work until there is no more work to do.
Second, your locks are way too fine and as a result, you're acquiring them way too often. Have each thread grab a block of 100 numbers to test rather than one at a time, and have them insert found primes from each block in one go.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With