Fastest way to check if memory is zeroed

Tags: c++, linux, windows

I have a program that needs to check whether a chunk of a file is zeroed or contains data. The algorithm runs over the whole file, for sizes up to a couple of gigabytes, and takes a while to run. Is there a better way to check whether a chunk is zeroed?

Platform: Linux and Windows

bool WGTController::isBlockCompleted(wgBlock* block)
{
    if (!block)
        return false;

    uint32 bufSize = (uint32)block->size;
    uint64 fileSize = UTIL::FS::UTIL_getFileSize(m_szFile);

    if (fileSize < (block->size + block->fileOffset))
        return false;

    char* buffer = new char[bufSize];

    FHANDLE fh=NULL;

    try
    {
        fh = UTIL::FS::UTIL_openFile(m_szFile, UTIL::FS::FILE_READ);
        UTIL::FS::UTIL_seekFile(fh, block->fileOffset);
        UTIL::FS::UTIL_readFile(fh, buffer, bufSize);
        UTIL::FS::UTIL_closeFile(fh);
    }
    catch (gcException &)
    {
        SAFE_DELETEA(buffer);
        UTIL::FS::UTIL_closeFile(fh);
        return false;
    }

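    // res becomes true as soon as any non-zero byte is found in the block.
    bool res = false;

    // Byte-by-byte scan with an early exit on the first non-zero byte.
    for (uint32 x=0; x<bufSize; x++)
    {
        if (buffer[x] != 0)
        {
            res = true;
            break;
        }
    }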

    SAFE_DELETEA(buffer);
    return res;
}
Lodle asked Mar 01 '23 16:03


2 Answers

How long is 'a while'? ... I'd say that comparing as many values in parallel as possible will help, perhaps by using SIMD instructions to compare more than 4 bytes at a time.
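
One portable way to try this without hand-written SIMD intrinsics is to test a whole machine word at a time. The sketch below is only an illustration: `isAllZero` is a hypothetical helper, not part of the code in the question, and it assumes the block has already been read into memory. An optimizing compiler may vectorize a loop like this further, but that is not guaranteed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: returns true if every byte in [data, data + size) is zero.
bool isAllZero(const char* data, std::size_t size)
{
    assert(data != nullptr || size == 0);

    std::size_t i = 0;

    // Compare 8 bytes per iteration. memcpy sidesteps alignment and
    // strict-aliasing issues; compilers typically reduce it to a single load.
    for (; i + sizeof(std::uint64_t) <= size; i += sizeof(std::uint64_t))
    {
        std::uint64_t word;
        std::memcpy(&word, data + i, sizeof(word));
        if (word != 0)
            return false;
    }

    // Check the remaining 0-7 tail bytes one at a time.
    for (; i < size; ++i)
    {
        if (data[i] != 0)
            return false;
    }

    return true;
}
```

In the question's function, the final loop could then become `bool res = !isAllZero(buffer, bufSize);`. Whether this is measurably faster depends on the compiler and on whether the disk is the real bottleneck.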

Keep in mind, though, that no matter how fast you make the comparison, the data still has to be read from the file. If the file is not already cached somewhere in memory, you may be limited to something on the order of 100-150 MB/s before the bandwidth of the disk is saturated. If you have already hit that limit, you may first need to look for an approach that avoids loading the file, or simply accept that it is not going to get faster than that.
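
One way to reduce how much of the file has to be loaded is to read the block in smaller chunks and stop at the first non-zero byte. This is a minimal sketch, assuming plain `std::ifstream` in place of the `UTIL::FS` wrappers from the question; `rangeHasData` and the 1 MiB default chunk size are hypothetical choices, not measured recommendations.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: returns true if any byte in [offset, offset + length)
// of the file is non-zero. Returns false if the range is all zero or if it
// cannot be read in full (mirroring the "return false" error paths in the question).
bool rangeHasData(const std::string& path, std::uint64_t offset,
                  std::uint64_t length, std::size_t chunkSize = 1 << 20)
{
    assert(chunkSize > 0);

    std::ifstream in(path, std::ios::binary);
    if (!in)
        return false;

    in.seekg(static_cast<std::streamoff>(offset));
    if (!in)
        return false;

    std::vector<char> chunk(chunkSize);
    std::uint64_t remaining = length;

    while (remaining > 0)
    {
        const std::size_t want = static_cast<std::size_t>(
            std::min<std::uint64_t>(remaining, chunk.size()));

        in.read(chunk.data(), static_cast<std::streamsize>(want));
        const std::size_t got = static_cast<std::size_t>(in.gcount());

        // Stop at the first non-zero byte; the rest of the range is never read.
        if (std::any_of(chunk.data(), chunk.data() + got,
                        [](char c) { return c != 0; }))
            return true;

        if (got < want)
            return false; // short read: the file ends before the range does

        remaining -= got;
    }

    return false;
}
```

This only saves I/O for blocks that contain data near the start of the range. A block that really is all zeros still has to be read in full, so the disk bandwidth limit described above still applies to it.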

jerryjvl answered Mar 03 '23 04:03


Are there places in the file/chunk that are more likely to hold non-zero values? You only have to find one non-zero value (your break condition), so look first in the places where you are most likely to find one, which doesn't have to be the beginning of the file/chunk. It might make sense to start at the end, or to check the middle third, depending on the actual application.

However, I would not recommend jumping randomly to different positions; reading from disk might become incredibly ;) ..
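
The idea can be sketched on a buffer that is already in memory. The order used here (last third, then middle third, then first third) is purely an assumption for illustration, taken from the examples above; the right order depends on where your application tends to write data. `hasDataLikelyFirst` and `anyNonZero` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// True if any byte in [data + begin, data + end) is non-zero.
static bool anyNonZero(const char* data, std::size_t begin, std::size_t end)
{
    return std::any_of(data + begin, data + end,
                       [](char c) { return c != 0; });
}

// Hypothetical helper: scans three large contiguous regions in an assumed
// "most likely first" order. Together the regions cover the whole buffer.
bool hasDataLikelyFirst(const char* data, std::size_t size)
{
    assert(data != nullptr || size == 0);

    const std::size_t third = size / 3;

    // 1. Last third of the buffer.
    if (anyNonZero(data, size - third, size))
        return true;

    // 2. Middle part (also absorbs the remainder when size is not divisible by 3).
    if (anyNonZero(data, third, size - third))
        return true;

    // 3. First third.
    return anyNonZero(data, 0, third);
}
```

Each region is scanned sequentially, so if the same ordering is applied to file reads it costs a few seeks rather than many random jumps.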

beef2k answered Mar 03 '23 05:03