 

Will using multiple threads with a RandomAccessFile help performance?

I am working on a database-like project where data is stored in a flat file. For reading and writing I'm using the RandomAccessFile class. Will I gain anything from multithreading and giving each thread its own RandomAccessFile instance, or will a single thread with a single instance be just as fast? Is there any difference between reading and writing, given that you can create instances that can only read and not write?

drRoflol asked Jun 23 '09 14:06



1 Answer

I ran a benchmark with the code below (sorry, it's in C++). The code reads a 5 MB text file using a number of threads passed as a command-line argument. Each thread opens the file itself and reads its own contiguous share of it.

The results show that a few threads speed up reading, but beyond a machine-dependent optimum (4 threads on Machine A, 2 on Machine B) additional threads slow it down again:

Update: It occurred to me that file caching plays quite a role here, so I made copies of the test data file, rebooted, and used a different file for each run. Updated results are below (old ones in brackets). The conclusion remains the same.

Runtime in Seconds

Machine A (Dual Quad Core XEON running XP x64 with 4 10k SAS Drives in RAID 5)

  • 1 Thread: 0.61s (0.61s)
  • 2 Threads: 0.44s (0.43s)
  • 4 Threads: 0.31s (0.28s) (Fastest)
  • 8 Threads: 0.53s (0.63s)

Machine B (Dual Core Laptop running XP with one fragmented 2.5 Inch Drive)

  • 1 Thread: 0.98s (1.01s)
  • 2 Threads: 0.67s (0.61s) (Fastest)
  • 4 Threads: 1.78s (0.63s)
  • 8 Threads: 2.06s (0.80s)
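
To make the figures above easier to compare, the usual measure is speedup: the single-thread runtime divided by the multi-thread runtime. The helper below is only an illustrative sketch; the name `speedup` is made up here and is not part of the benchmark.

```cpp
#include <cassert>

// Speedup relative to the single-thread run.
// A value above 1.0 means the multi-threaded run was faster,
// a value below 1.0 means it was slower.
double speedup(double singleThreadSecs, double multiThreadSecs)
{
    assert(multiThreadSecs > 0.0);
    return singleThreadSecs / multiThreadSecs;
}
```

With the updated numbers, `speedup(0.61, 0.31)` for Machine A with 4 threads is roughly 1.97, while `speedup(0.98, 1.78)` for Machine B with 4 threads is roughly 0.55, i.e. slower than a single thread.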

Source code (Windows):

// FileReadThreads.cpp : Defines the entry point for the console application.
//

#include <windows.h>
#include <stdio.h>
#include <stdlib.h>     // atoi, exit
#include <sys/timeb.h>  // ftime, struct timeb

///////////////////////////////////////////////////////////////////////////////
///////////////////////////////////////////////////////////////////////////////
int threadCount = 1;
char *fileName = 0;
int fileSize = 0;
double  GetSecs(void);

///////////////////////////////////////////////////////////////////////////////
///////////////////////////////////////////////////////////////////////////////

DWORD WINAPI FileReadThreadEntry(LPVOID lpThreadParameter)

{   char tx[255];

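    // Cast through INT_PTR so the pointer-sized parameter is narrowed safely on x64.
    int index = (int)(INT_PTR)lpThreadParameter;
    FILE *file = fopen(fileName, "rt");
    if(! file)
        return 1;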

    int start = (fileSize / threadCount) * index;
    int end   = (fileSize / threadCount) * (index + 1);

    fseek(file, start, SEEK_SET);

    printf("THREAD %4lu started: Bytes %d-%d\n", GetCurrentThreadId(), start, end);


    for(int i = 0;; i++)
    {
        if(! fgets(tx, sizeof(tx), file))
            break;
        if(ftell(file) >= end)
            break;
    }
    fclose(file);

    printf("THREAD %4lu done\n", GetCurrentThreadId());

    return 0;
}
///////////////////////////////////////////////////////////////////////////////
///////////////////////////////////////////////////////////////////////////////



int main(int argc, char* argv[])
{
    if(argc <= 1)
    {
        printf("Usage:  <InputFile> <threadCount>\n");
        exit(-1);
    }

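    if(argc > 2)
        threadCount = atoi(argv[2]);

    // WaitForMultipleObjects accepts at most MAXIMUM_WAIT_OBJECTS (64) handles,
    // and a count below 1 would divide by zero in the worker threads.
    if(threadCount < 1 || threadCount > MAXIMUM_WAIT_OBJECTS)
    {
        printf("threadCount must be between 1 and %d\n", MAXIMUM_WAIT_OBJECTS);
        exit(-1);
    }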

    fileName = argv[1];
    FILE *file = fopen(fileName, "rt");
    if(! file)
    {
        printf("Unable to open %s\n", argv[1]);
        exit(-1);
    }

    fseek(file, 0, SEEK_END);
    fileSize = ftell(file);
    fclose(file);


    printf("Starting to read file %s with %d threads\n", fileName, threadCount);
    ///////////////////////////////////////////////////////////////////////////
    // Start threads
    ///////////////////////////////////////////////////////////////////////////
    double start = GetSecs();

    HANDLE mWorkThread[255];        

    for(int i = 0; i < threadCount; i++)
    {
        mWorkThread[i] = CreateThread(
                  NULL,
                  0,
                  FileReadThreadEntry,
                  (LPVOID)(INT_PTR) i,
                  0, 
                  NULL);
    }
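    WaitForMultipleObjects(threadCount, mWorkThread, TRUE, INFINITE);

    // GetSecs() returns milliseconds, hence the division by 1000.
    printf("Runtime %.2f Secs\nDone\n", (GetSecs() - start) / 1000.);

    for(int i = 0; i < threadCount; i++)
        CloseHandle(mWorkThread[i]);
    return 0;
}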

///////////////////////////////////////////////////////////////////////////////
///////////////////////////////////////////////////////////////////////////////

// Despite its name, this returns the current time in milliseconds.
double  GetSecs(void)

{
        struct timeb timebuffer;
        ftime(&timebuffer);
        return (double)timebuffer.millitm + 
              ((double)timebuffer.time * 1000.) - // Timezone offset; cancels out when two readings are subtracted
              ((double)timebuffer.timezone * 60. * 1000.);
}
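
For readers who are not on Windows, the same idea (each thread opens its own handle and reads one contiguous slice of the file) can be sketched with standard C++ only. This is a rough approximation, not the benchmark above: the names `readRange` and `readWithThreads` are made up for illustration, it reads fixed-size binary blocks instead of text lines, and the last thread also picks up the remainder bytes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <fstream>
#include <string>
#include <thread>
#include <vector>

// Reads bytes [start, end) of the file through a private stream and
// returns how many bytes were actually read.
static std::uint64_t readRange(const std::string& path,
                               std::uint64_t start, std::uint64_t end)
{
    std::ifstream in(path, std::ios::binary);
    if (!in)
        return 0;
    in.seekg(static_cast<std::streamoff>(start));

    std::vector<char> buf(64 * 1024);
    std::uint64_t total = 0;
    while (total < end - start) {
        const std::uint64_t want =
            std::min<std::uint64_t>(buf.size(), end - start - total);
        in.read(buf.data(), static_cast<std::streamsize>(want));
        const std::streamsize got = in.gcount();
        if (got <= 0)
            break;  // end of file or read error
        total += static_cast<std::uint64_t>(got);
    }
    return total;
}

// Splits the file into threadCount contiguous ranges, reads each range on
// its own thread, and returns the total number of bytes read.
std::uint64_t readWithThreads(const std::string& path, unsigned threadCount)
{
    assert(threadCount > 0);

    std::ifstream probe(path, std::ios::binary | std::ios::ate);
    if (!probe)
        return 0;
    const std::uint64_t size = static_cast<std::uint64_t>(probe.tellg());
    probe.close();

    const std::uint64_t chunk = size / threadCount;
    std::vector<std::uint64_t> counts(threadCount, 0);
    std::vector<std::thread> workers;

    for (unsigned i = 0; i < threadCount; ++i) {
        const std::uint64_t start = chunk * i;
        // The last range absorbs the remainder of the integer division.
        const std::uint64_t end = (i + 1 == threadCount) ? size : chunk * (i + 1);
        workers.emplace_back([&counts, &path, i, start, end] {
            counts[i] = readRange(path, start, end);
        });
    }
    for (auto& w : workers)
        w.join();

    std::uint64_t total = 0;
    for (std::uint64_t c : counts)
        total += c;
    return total;
}
```

To time it, wrap a call such as `readWithThreads("testdata.txt", 4)` between two `std::chrono::steady_clock::now()` readings. As noted in the update above, use a fresh copy of the file for each run, otherwise the operating system's file cache will dominate the measurement.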
RED SOFT ADAIR answered Nov 15 '22 07:11
