
Optimizing disk IO

Tags: c, io, optimization

I have a piece of code that analyzes streams of data from very large (10-100GB) binary files. It works well, so it's time to start optimizing, and currently disk IO is the biggest bottleneck.

There are two types of files in use. The first consists of a stream of 16-bit integers, which must be scaled after I/O to convert them to physically meaningful floating point values. I read the file in chunks, reading one 16-bit code at a time, performing the required scaling, and then storing the result in an array. Code is below:

int64_t read_current_chimera(FILE *input, double *current,
                             int64_t position, int64_t length, chimera *daqsetup)
{
    int64_t test;
    uint16_t iv;

    int64_t i;
    int64_t read = 0;

    if (fseeko64(input, (off64_t)position * sizeof(uint16_t), SEEK_SET))
    {
        return 0;
    }

    for (i = 0; i < length; i++)
    {
        test = fread(&iv, sizeof(uint16_t), 1, input);
        if (test == 1)
        {
            read++;
            current[i] = chimera_gain(iv, daqsetup);
        }
        else
        {
            perror("End of file reached");
            break;
        }
    }
    return read;
}

The chimera_gain function just takes a 16-bit integer, scales it and returns the double for storage.

The second file type is a stream of 64-bit doubles arranged in two columns, of which I only need the first. To get these I fread pairs of doubles and discard the second of each pair. The doubles must also be endian-swapped before use. The code I use to do this is below:

int64_t read_current_double(FILE *input, double *current, int64_t position, int64_t length)
{
    int64_t test;
    double iv[2];

    int64_t i;
    int64_t read = 0;

    if (fseeko64(input, (off64_t)position * 2 * sizeof(double), SEEK_SET))
    {
        return 0;
    }

    for (i = 0; i < length; i++)
    {
        test = fread(iv, sizeof(double), 2, input);
        if (test == 2)
        {
            read++;
            swapByteOrder((int64_t *)&iv[0]);
            current[i] = iv[0];
        }
        else
        {
            perror("End of file reached: ");
            break;
        }
    }
    return read;
}

Can anyone suggest a method of reading these file types that would be significantly faster than what I am currently doing?

KBriggs asked Aug 19 '16 16:08




1 Answer

First off, it would be useful to use a profiler to identify the hot spots in your program. Based on your description of the problem, a lot of your overhead comes from the sheer number of fread calls. Since the files are large, there is a big benefit to increasing the amount of data you read per I/O.

Convince yourself of this by putting together two small programs that read the stream:

1) read it as you do in the example above, 2 doubles at a time.

2) read it the same way, but make it 10,000 doubles at a time.

Time both runs a few times, and odds are you will observe that #2 runs much faster.

Best of luck.

EvilTeach answered Sep 26 '22 15:09