
Merging files without spaces in their names is faster than with spaces

I wrote a C++ executable for merging files. I have 43 files of about 100 MB each, so about 4.3 GB in total.

Two cases:

One: If the files are named 1, 2, 3, 4, 5, 6, ..., 43, merging takes about 2 minutes.

Two: If the files are named This File.ova0, This File.ova1, ..., This File.ova42, merging takes about 7 minutes.

These are exactly the same files; I only renamed them. Any idea what's wrong?

This is the C++ code:

#include <cstdio>
#include <iostream>
#include <fstream>

#include <vector>
#include <string>

#include "boost/filesystem.hpp"

namespace bfs = boost::filesystem;

#pragma warning(disable : 4244)


typedef std::vector<std::string> FileVector;
int main(int argc, char **argv)
{

    int bucketSize = 3024 * 3024;

    FileVector Files;

    //Check all command-line params to see if they exist..
    for(int i = 1; i < argc; i++)
    {
        if(!bfs::exists(argv[i]))
        {
            std::cerr << "Failed to locate required part file: " << argv[i] << std::endl;
            return 1;
        }

        //Store this file and continue on..
        std::cout << "ADDING " << argv[i] << std::endl;
        Files.push_back(argv[i]);
    }

    //Prepare to combine all the files..
    FILE *FinalFile = fopen("abc def.ova", "ab");
    if(!FinalFile)
    {
        std::cerr << "Failed to open output file: abc def.ova" << std::endl;
        return 1;
    }

    for(size_t i = 0; i < Files.size(); i++)
    {
        FILE *ThisFile = fopen(Files[i].c_str(), "rb");
        if(!ThisFile)
        {
            std::cerr << "Failed to open part file: " << Files[i] << std::endl;
            fclose(FinalFile);
            return 1;
        }

        char *dataBucket = new char[bucketSize];

        std::cout << "Combining " << Files[i] << "..." << std::endl;

        //Read the file in chunks so we do not chew up all the memory..
        while(size_t read_size = fread(dataBucket, 1, bucketSize, ThisFile))
        {
            //FILE *FinalFile = fopen("abc def.ova", "ab");
            //::fseek(FinalFile, 0, SEEK_END);
            fwrite(dataBucket, 1, read_size, FinalFile);
            //fclose(FinalFile);
        }

        delete [] dataBucket;
        fclose(ThisFile);
    }
    fclose(FinalFile);

    return 0;
}

I run it through a .bat file like this:

@ECHO OFF

Combiner.exe "This File.ova0" "This File.ova1" "This File.ova2" 

PAUSE

or

@ECHO OFF

Combiner.exe 1 2 3

PAUSE

Both .bat files list every file name through the last one; I only show three files here because otherwise the line would be too long.

Thank you

asked Nov 12 '22 13:11 by Harts


1 Answer

By default, Windows caches file data that is read from and written to disk. Read operations are therefore served from an area of system memory known as the system file cache rather than from the physical disk. Correspondingly, write operations write file data to the system file cache rather than to the disk; this type of cache is referred to as a write-back cache. Caching is managed per file object. So the difference may come from whether the files were already in the cache when each run started, not from the file names.

More information: File Caching
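One way to check whether caching is a factor is to read the same part file twice in a row and compare the timings. The sketch below is only an illustration: `ReadTiming` and `timed_read` are hypothetical names that do not appear in the question, and the 1 MiB default chunk size is an arbitrary assumption.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper type (not from the question): result of one timed pass.
struct ReadTiming {
    std::size_t bytes;  // total bytes read from the file
    double seconds;     // wall-clock duration of the read loop
    bool ok;            // false if the file could not be opened
};

// Reads the whole file in fixed-size chunks and measures how long it takes.
// The default chunk size is an assumption chosen only for illustration.
ReadTiming timed_read(const std::string& path,
                      std::size_t bucket_size = 1024 * 1024)
{
    ReadTiming result{0, 0.0, false};

    std::FILE* file = std::fopen(path.c_str(), "rb");
    if (!file) {
        return result;  // ok stays false
    }

    std::vector<char> bucket(bucket_size);
    const auto start = std::chrono::steady_clock::now();

    // fread returns 0 at end of file (or on error), which ends the loop.
    while (std::size_t n = std::fread(bucket.data(), 1, bucket.size(), file)) {
        result.bytes += n;
    }

    const auto stop = std::chrono::steady_clock::now();
    std::fclose(file);

    result.seconds = std::chrono::duration<double>(stop - start).count();
    result.ok = true;
    return result;
}
```

Calling `timed_read` twice on the same file and comparing the two `seconds` values gives a rough indication: if the second pass is much faster, the first pass was probably served from disk and the second from the cache. That alone does not prove the cache explains the 2 minute versus 7 minute difference, but it shows whether the order of the test runs could have influenced the result.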

answered Nov 15 '22 06:11 by HMVC