
How to multithread reading a file in c++11?

I have a big file that I have to read in chunks. Each time I read a chunk, I have to do a time-consuming operation on it, so I think multithreaded reading might help: each thread reads a chunk in turn and performs its operation. Here is my code in C++11:

#include <iostream>
#include <fstream>
#include <condition_variable>
#include <mutex>
#include <thread>
using namespace std;
const int CHAR_PER_FILE = 1e8;
const int NUM_THREAD = 2;
int order = -1;
bool is_reading = false;
mutex mtx;
condition_variable file_not_reading;
void partition(ifstream& is)
{
    while (is.peek() != EOF)
    {
        unique_lock<mutex> lock(mtx);
        while (is_reading)
            file_not_reading.wait(lock);

        is_reading = true;
        char *c = new char[CHAR_PER_FILE];

        is.read(c, CHAR_PER_FILE);
        order++;

        is_reading = false;

        file_not_reading.notify_all();
        lock.unlock();

        char oc[3];
        sprintf(oc, "%d", order);
        this_thread::sleep_for(chrono::milliseconds(2000));//some operations that take long time
        ofstream os(oc, ios::binary);
        os.write(c, CHAR_PER_FILE);
        delete[] c;
        os.close();
    }
}

int main()
{
    ifstream is("bigfile.txt",ios::binary);
    thread threads[NUM_THREAD];
    for (int i = 0; i < NUM_THREAD; i++)
        threads[i] = thread(partition, ref(is));

    for (int i = 0; i < NUM_THREAD; i++)
        threads[i].join();

    is.close();
    system("pause");
    return 0;
}

But my code didn't work: it only created 4 files instead of `bigfilesize/CHAR_PER_FILE`, and the threads seem to get stuck. How can I make it work?

Is there any C++11 multithreaded file-reading implementation or example?

Thanks.

asked Dec 15 '22 by user1024

1 Answer

My advice:

  • Use one thread to read chunks from the file. Every time a chunk is read, post it to a request queue. Reading from multiple threads is not worth it: the stream is a shared resource, so the reads will be serialized by its internal locks anyway.
  • Use a pool of worker threads. Each one pops a chunk from the queue, executes the expensive operation, and goes back to wait for the next request.
  • The queue must be protected by a mutex.
  • Don't use more worker threads than the number of processing units (CPUs/cores/hyperthreads) you have.
  • The main caveat of the above is that it does not guarantee processing order. You will probably need to post the results to a central place that can reorder them (again a central place, so it must be mutex protected).
answered Dec 31 '22 by jsantander