I have a program that needs to check whether a chunk of a file is zeroed or contains data. The algorithm runs over the whole file, for sizes up to a couple of gigabytes, and takes a while to run. Is there a better way to check whether a chunk is zeroed?
Platform: Linux and Windows
bool WGTController::isBlockCompleted(wgBlock* block)
{
    if (!block)
        return false;

    uint32 bufSize = (uint32)block->size;
    uint64 fileSize = UTIL::FS::UTIL_getFileSize(m_szFile);

    if (fileSize < (block->size + block->fileOffset))
        return false;

    char* buffer = new char[bufSize];
    FHANDLE fh = NULL;

    try
    {
        fh = UTIL::FS::UTIL_openFile(m_szFile, UTIL::FS::FILE_READ);
        UTIL::FS::UTIL_seekFile(fh, block->fileOffset);
        UTIL::FS::UTIL_readFile(fh, buffer, bufSize);
        UTIL::FS::UTIL_closeFile(fh);
    }
    catch (gcException &)
    {
        SAFE_DELETEA(buffer);
        UTIL::FS::UTIL_closeFile(fh);
        return false;
    }

    bool res = false;

    for (uint32 x = 0; x < bufSize; x++)
    {
        if (buffer[x] != 0)
        {
            res = true;
            break;
        }
    }

    SAFE_DELETEA(buffer);
    return res;
}
How long is 'a while'? I'd say comparing as many values in parallel as possible will help; maybe use some SIMD instructions to compare more than 4 bytes at a time?
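As a rough illustration of comparing more than one byte per step, here is a minimal sketch that tests eight bytes at a time using plain 64-bit words. It is a portable stand-in for explicit SIMD intrinsics rather than SIMD itself, and `hasNonZeroByte` is a hypothetical name, not something from the question's code base.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: returns true if any byte in [data, data + size) is non-zero.
bool hasNonZeroByte(const char* data, std::size_t size)
{
    std::size_t i = 0;

    // Test eight bytes per iteration. memcpy into a local word sidesteps
    // alignment and strict-aliasing problems; compilers typically turn it
    // into a single load.
    for (; i + sizeof(std::uint64_t) <= size; i += sizeof(std::uint64_t))
    {
        std::uint64_t word;
        std::memcpy(&word, data + i, sizeof(word));
        if (word != 0)
            return true;
    }

    // Check the remaining tail (fewer than eight bytes) one byte at a time.
    for (; i < size; ++i)
    {
        if (data[i] != 0)
            return true;
    }

    return false;
}
```

In the question's function this would replace the byte loop, e.g. `bool res = hasNonZeroByte(buffer, bufSize);`. Whether it is measurably faster depends on the compiler and on whether the read itself dominates, so it is worth timing before and after.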
Do keep in mind, though, that no matter how fast you make the comparison, the data still has to be read from the file. If the file is not already cached somewhere in memory, you may be limited to something on the order of 100-150 MB/s before the bandwidth of the disk is saturated; at that rate a 2 GB file takes roughly 13 to 20 seconds just to read. If you have already hit this point, you may first need to look at an approach that avoids having to load the file, or just accept that it's not going to get faster than that.
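One modest way to read less, sketched here under assumptions: instead of loading the whole block into one big buffer, read it in fixed-size chunks and stop at the first non-zero byte. This uses standard `std::ifstream` rather than the `UTIL::FS` wrappers from the question, and `rangeHasData` plus the 64 KiB chunk size are illustrative choices only. Unlike the original, a file that is too short is simply scanned up to its end.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: returns true if a non-zero byte is found in
// [offset, offset + size) of the file; false if the range is all zero
// or the file cannot be opened.
bool rangeHasData(const std::string& path, std::uint64_t offset, std::uint64_t size)
{
    std::ifstream in(path, std::ios::binary);
    if (!in)
        return false;

    in.seekg(static_cast<std::streamoff>(offset));

    std::vector<char> chunk(64 * 1024); // assumed chunk size, tune as needed

    while (size > 0 && in)
    {
        const std::size_t want =
            static_cast<std::size_t>(std::min<std::uint64_t>(size, chunk.size()));
        in.read(chunk.data(), static_cast<std::streamsize>(want));

        const std::size_t got = static_cast<std::size_t>(in.gcount());
        if (got == 0)
            break;

        // Stop as soon as any byte in this chunk is non-zero, so the rest
        // of the range is never read.
        const auto end = chunk.begin() + static_cast<std::ptrdiff_t>(got);
        if (std::any_of(chunk.begin(), end, [](char c) { return c != 0; }))
            return true;

        size -= got;
    }

    return false;
}
```

This only helps for blocks that actually contain data near the start of the scan; a fully zeroed block still has to be read end to end, so the disk bandwidth limit above still applies.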
Are there places in the file/chunk that are more likely to hold non-zero values? You only have to find one non-zero value (your break condition), so look first in the places where you are most likely to find one, which doesn't have to be the beginning of the file/chunk. It might make sense to start at the end, or to check the middle third first, depending on the actual application.
However, I would not recommend jumping randomly between different positions; with all the seeking, reading from disk might become incredibly slow ;)
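To make the idea concrete, here is a small sketch of one possible probe order: check the last chunk, then the middle chunk, then everything else front to back. That keeps it to two extra seeks followed by a mostly sequential pass. Which spots are actually "likely" depends on your application, so the choice of end and middle here is purely an assumption, as is the name `probeOrder`.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: returns chunk start offsets (relative to the block)
// in the order they should be checked: last chunk, middle chunk, then the
// remaining chunks from front to back.
std::vector<std::uint64_t> probeOrder(std::uint64_t blockSize, std::uint64_t chunkSize)
{
    std::vector<std::uint64_t> order;
    if (blockSize == 0 || chunkSize == 0)
        return order;

    // Number of chunks, rounding up so a partial final chunk is included.
    const std::uint64_t chunks = (blockSize + chunkSize - 1) / chunkSize;
    const std::uint64_t last = chunks - 1;
    const std::uint64_t mid = chunks / 2;

    // Assumed hot spots first: the end, then the middle.
    order.push_back(last * chunkSize);
    if (mid != last)
        order.push_back(mid * chunkSize);

    // Then one sequential pass over whatever has not been probed yet.
    for (std::uint64_t c = 0; c < chunks; ++c)
    {
        if (c != last && c != mid)
            order.push_back(c * chunkSize);
    }

    return order;
}
```

For a 16-byte block with 4-byte chunks this yields the offsets 12, 8, 0, 4. In practice the chunks would be much larger (tens of kilobytes or more) so that each seek is amortised over a decent amount of sequential reading.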