I won't bore you with why, but my application can optionally perform integrity checking on very large files (up to 50 GB) using CRC. Because I don't want to kill users' machines if they turn this option on, I set the IoPriorityHintVeryLow hint on the file handle, and I was also putting the thread into background mode by passing THREAD_MODE_BACKGROUND_BEGIN to SetThreadPriority.
The time-consuming part of my code looks like this:
    //
    // Read one block of the changed data at a time, checking each CRC
    //
    DWORD blockNum = 0;
    vector<BYTE> changeBuffer(DIRTY_BLOCK_SIZE);
    outputDirtyBlockMap.reserve(crcList.size() / 8);
    while (::ReadFile(hChangedFile, changeBuffer.data(), DIRTY_BLOCK_SIZE, &bytesRead, NULL) && bytesRead > 0)
    {
        // Check for cancellation every 500 blocks; doing it every time reduces CPU performance by 50% since WaitForSingleObject is quite expensive
        if ((blockNum % 500 == 0) && IsCancelEventSignalled(hCancel))
        {
            RETURN_TRACED(ERROR_CANCELLED);
        }

        // Increase the size of the dirty block map?
        ULONG mapByte = blockNum / 8;
        if (mapByte == outputDirtyBlockMap.size())
        {
            outputDirtyBlockMap.resize(mapByte + 1);
        }

        DWORD mapBitNum = blockNum & 0x7L;
        UCHAR mapBit = 1 << (7 - mapBitNum);
        if (driverDirtyBlockMap.size() > mapByte && (driverDirtyBlockMap[mapByte] & mapBit))
        {
            //
            // The bit is already set in the driver's block map, so we don't have to bother generating and comparing CRCs for this block
            //
            outputDirtyBlockMap[mapByte] |= mapBit;
        }
        else
        {
            // Validate that the CRC hasn't changed; if it has, mark it as such in the dirty block map
            DWORD newCrc = CRC::Crc32(changeBuffer.data(), changeBuffer.size());
            if ((blockNum >= crcList.size() || newCrc != crcList[blockNum]))
            {
                OPTIONAL_DEBUG(DEBUG_DIRTY_BLOCK_MAP & DEBUG_VERBOSE, "Detected change at block [%u], CRC [new 0x%x != old 0x%x]", blockNum, newCrc, blockNum < crcList.size() ? crcList[blockNum] : 0x0);
                // The CRC has changed or the file has grown; mark it as such in the dirty block map
                outputDirtyBlockMap[mapByte] |= mapBit;
            }
        }
        ++blockNum;
    }
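For readers following the bitmap arithmetic in the loop above, here is a minimal standalone sketch of the same block-to-bit mapping (byte index is blockNum / 8, and the first block of each byte lands in the most significant bit). The names BlockBit, LocateBlockBit, MarkDirty, and IsDirty are hypothetical and are not part of the original code; this only illustrates the indexing and is not a replacement for the loop.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper type: where one block's flag lives in a byte-packed bitmap.
struct BlockBit {
    std::size_t byteIndex;  // which byte of the map holds the flag
    std::uint8_t mask;      // single-bit mask within that byte
};

// Same arithmetic as the loop: byte = blockNum / 8, bit counted from the MSB.
inline BlockBit LocateBlockBit(std::uint32_t blockNum) {
    const std::size_t byteIndex = blockNum / 8;
    const unsigned bitNum = blockNum & 0x7u;
    const std::uint8_t mask = static_cast<std::uint8_t>(1u << (7u - bitNum));
    return BlockBit{byteIndex, mask};
}

// Sets the flag for blockNum, growing the map when the block is past its end.
inline void MarkDirty(std::vector<std::uint8_t>& map, std::uint32_t blockNum) {
    const BlockBit bit = LocateBlockBit(blockNum);
    if (bit.byteIndex >= map.size()) {
        map.resize(bit.byteIndex + 1);  // new bytes are zero-initialized
    }
    map[bit.byteIndex] |= bit.mask;
}

// Returns true only if the map is long enough and the block's flag is set.
inline bool IsDirty(const std::vector<std::uint8_t>& map, std::uint32_t blockNum) {
    const BlockBit bit = LocateBlockBit(blockNum);
    return bit.byteIndex < map.size() && (map[bit.byteIndex] & bit.mask) != 0;
}
```

With this layout, block 0 maps to mask 0x80 of byte 0 and block 9 maps to mask 0x40 of byte 1.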
When I profiled this code, I was very surprised to find that with THREAD_MODE_BACKGROUND_BEGIN this loop takes 74 seconds to run over a 500 MB file, whereas with THREAD_PRIORITY_LOWEST it takes 2.7 seconds to run over a 500 MB file, roughly 27 times faster. (Both figures are averages over about 8 runs.)
In both cases the machine I was testing on was idle apart from running this loop. So, my questions:
Why does THREAD_MODE_BACKGROUND_BEGIN make this take so long? I'd have thought that if the machine isn't doing anything else, the loop should still run as quickly as at any other priority, because nothing is competing with it.
Is there something I should know about this priority that I haven't been able to figure out from the docs?
Setting background mode has the following effects: the thread's CPU priority drops to 4, its I/O priority becomes Very Low, and its memory priority becomes Background (1).
Setting the relative thread priority to LOWEST, by contrast, only lowers the CPU priority (to 6 in this case); the I/O priority stays Normal and the memory priority stays Foreground (5).
So, in general, especially if you're I/O bound (but even if you're CPU bound), you would definitely expect a thread at priority 4 with Very Low I/O priority and Background memory priority (1) to perform much worse than a thread at priority 6 with Normal I/O priority and Foreground memory priority (5).
That THREAD_MODE_* is different from THREAD_PRIORITY_* is maybe not that surprising?
I don't know whether the exact differences are documented anywhere, but it would not surprise me if background mode tried to run everything on a single core, at a lower frequency, when the CPU supports core parking.
The SetThreadPriority documentation also hints at changes to the I/O the thread performs:
The THREAD_PRIORITY_* values affect the CPU scheduling priority of the thread. For threads that perform background work such as file I/O, network I/O, or data processing, it is not sufficient to adjust the CPU scheduling priority; even an idle CPU priority thread can easily interfere with system responsiveness when it uses the disk and memory. Threads that perform background work should use the THREAD_MODE_BACKGROUND_BEGIN and THREAD_MODE_BACKGROUND_END values to adjust their resource scheduling priorities; threads that interact with the user should not use THREAD_MODE_BACKGROUND_BEGIN.
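The BEGIN/END pairing the documentation describes can be sketched as a small RAII wrapper, so that background mode is always left again when the work finishes or bails out early. This is only an illustration: the class name BackgroundModeScope is hypothetical, and the non-Windows branch is a no-op added purely so the sketch compiles elsewhere. Both mode values are valid only for the current thread's handle.

```cpp
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical RAII wrapper: enters background mode on construction and
// leaves it on destruction. On non-Windows builds it does nothing.
class BackgroundModeScope {
public:
    BackgroundModeScope() {
#ifdef _WIN32
        // The call fails if the thread is already in background mode
        // (GetLastError() reports ERROR_THREAD_MODE_ALREADY_BACKGROUND).
        active_ = ::SetThreadPriority(::GetCurrentThread(),
                                      THREAD_MODE_BACKGROUND_BEGIN) != FALSE;
#endif
    }

    ~BackgroundModeScope() {
#ifdef _WIN32
        // Only end background mode if this object was the one that began it.
        if (active_) {
            ::SetThreadPriority(::GetCurrentThread(), THREAD_MODE_BACKGROUND_END);
        }
#endif
    }

    BackgroundModeScope(const BackgroundModeScope&) = delete;
    BackgroundModeScope& operator=(const BackgroundModeScope&) = delete;

    // True if this object actually switched the thread into background mode.
    bool Active() const { return active_; }

private:
    bool active_ = false;
};
```

Declaring a BackgroundModeScope at the top of the function that runs the CRC loop would cover the whole scan, including early returns such as the cancellation path.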
Have you tried measuring whether the performance loss is in ReadFile or in the CRC calculation?
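One way to answer that is to accumulate the time spent in each phase separately. Here is a minimal sketch using std::chrono; the PhaseTimer class and its member names are hypothetical, and the commented usage assumes the loop from the question.

```cpp
#include <chrono>
#include <cstddef>

// Hypothetical helper: accumulates wall-clock time spent in one phase of a loop.
class PhaseTimer {
public:
    // Runs fn() and adds its elapsed time to the running total.
    template <typename Fn>
    void Time(Fn&& fn) {
        const auto start = std::chrono::steady_clock::now();
        fn();
        total_ += std::chrono::steady_clock::now() - start;
        ++calls_;
    }

    // Total accumulated time in seconds.
    double Seconds() const {
        return std::chrono::duration<double>(total_).count();
    }

    // Number of timed invocations.
    std::size_t Calls() const { return calls_; }

private:
    std::chrono::steady_clock::duration total_{};
    std::size_t calls_ = 0;
};

// Possible use inside the loop from the question (not compiled here):
//
//   PhaseTimer readTimer, crcTimer;
//   BOOL ok = FALSE;
//   for (;;) {
//       readTimer.Time([&] {
//           ok = ::ReadFile(hChangedFile, changeBuffer.data(),
//                           DIRTY_BLOCK_SIZE, &bytesRead, NULL);
//       });
//       if (!ok || bytesRead == 0) break;
//       crcTimer.Time([&] {
//           newCrc = CRC::Crc32(changeBuffer.data(), changeBuffer.size());
//       });
//   }
//   // Compare readTimer.Seconds() with crcTimer.Seconds() in both modes.
```

If nearly all of the 74 seconds shows up in the read timer, the slowdown comes from the I/O side rather than from CPU scheduling.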