How much memory does opening a file take up on a modern Windows system? Some application loads will need to open "a lot" of files. Windows is very capable of opening "a lot" of files, but what is the cost of keeping a single file open, so that one can decide when "a lot" becomes "too much"?
For sequential processing of large-ish datasets (hundreds of MB to a few GB) inside a 32-bit process, we need to come up with a buffer that stores its contents on disk instead of in memory.
We have fleshed out a little class without too much problem (using CreateFile with FILE_ATTRIBUTE_TEMPORARY and FILE_FLAG_DELETE_ON_CLOSE).
The problem is, the way these buffers will be used is such that each buffer (each temporary file) can potentially store from a few bytes up to a few GB of data, and we would like to keep the buffer class itself as minimal and as general as possible.
The use case ranges from 100 buffers with ~100 MB each to 100,000s of buffers with just a few bytes each. (And yes, it is important that each buffer in this sense has its own file.)
It would seem natural to include a threshold in the buffer class so that it only starts creating and using a temporary on-disk file once it is actually storing more bytes than the memory overhead of creating and referencing a temporary file costs - in-process as well as in physical machine memory.
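To make the idea concrete, here is a minimal sketch of such a threshold-based buffer. It is not the actual class; it assumes a CreateNewTempFile() wrapper like the one shown further down and a hypothetical MakeUniqueTempPath() helper that yields an unused file name, and it omits error handling and copy/move semantics.

#include <windows.h>
#include <string>
#include <vector>

// Assumed helpers (not part of the question):
//   CreateNewTempFile(path)  - the CreateFile wrapper shown further down
//   MakeUniqueTempPath()     - hypothetical: returns a path no other buffer uses
HANDLE CreateNewTempFile(LPCTSTR filePath);
std::basic_string<TCHAR> MakeUniqueTempPath();

class SpillBuffer {
public:
    explicit SpillBuffer(size_t spillThreshold)
        : m_threshold(spillThreshold), m_file(INVALID_HANDLE_VALUE) {}

    ~SpillBuffer() {
        if (m_file != INVALID_HANDLE_VALUE)
            ::CloseHandle(m_file); // FILE_FLAG_DELETE_ON_CLOSE removes the file here
    }

    void Append(const char* data, DWORD size) {
        if (m_file == INVALID_HANDLE_VALUE && m_memory.size() + size <= m_threshold) {
            m_memory.insert(m_memory.end(), data, data + size); // small: stay in RAM
            return;
        }
        if (m_file == INVALID_HANDLE_VALUE)
            SpillToDisk(); // first write past the threshold: switch to a temp file
        DWORD written = 0;
        ::WriteFile(m_file, data, size, &written, NULL);
    }

private:
    void SpillToDisk() {
        m_file = CreateNewTempFile(MakeUniqueTempPath().c_str());
        if (!m_memory.empty()) {
            DWORD written = 0;
            ::WriteFile(m_file, &m_memory[0], (DWORD)m_memory.size(), &written, NULL);
        }
        std::vector<char>().swap(m_memory); // release the in-memory copy
    }

    size_t            m_threshold; // the value this question is trying to pin down
    std::vector<char> m_memory;
    HANDLE            m_file;
};

Whether that threshold should be a few hundred bytes or a few kilobytes is exactly what the question below is asking.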
How much memory, in bytes, does opening a (temporary) file take up on a modern Windows system? (Here, "opening a file" means CreateFile with FILE_ATTRIBUTE_TEMPORARY and FILE_FLAG_DELETE_ON_CLOSE.)
That is, what is the threshold, in bytes, when you start seeing a net main memory gain (both in-process as well as physically) from storing data in a file instead of in-memory?
The open-file limit mentioned in a comment does not apply to CreateFile, only to the MS CRT file API. (Opening 10,000s of files via CreateFile is no problem at all on my system -- whether it's a good idea is an entirely different matter and not part of this question.)
Memory-mapped files: these are totally unsuitable for processing GBs of data in a 32-bit process, because you cannot reliably map such large datasets into the normal 2 GB address range of a 32-bit process. They are totally useless for my problem and do not, in any way, relate to the actual question. Plain files are just fine for the background problem.
Looked at http://blogs.technet.com/b/markrussinovich/archive/2009/09/29/3283844.aspx - which tells me that a HANDLE itself takes up 16 bytes on a 64 bit system, but that's just the handle.
Looked at STXXL and its docs, but this lib is neither appropriate for my task nor did I find any mention of a useful threshold before actually starting to use files.
Raymond writes: "The answer will vary depending on what antivirus software is installed, so the only way to know is to test it on the production configuration."
qwm writes: "I would care more about cpu overhead. Anyway, the best way to answer your question is to test it. All I can say is that size of _FILE_OBJECT alone (including _OBJECT_HEADER) is ~300b, and some of its fields are pointers to other related structures."
Damon writes: "One correct answer is: 10 bytes (on my Windows 7 machine). Since nobody else seemed it worthwhile to actually try, I did (measured difference in MEMORYSTATUSEX::ullAvailVirtual over 100k calls, nothing else running). Don't ask me why it isn't 8 or 16 bytes, I wouldn't know. Took around 17 seconds of kernel time, process had 100,030 handles open upon exiting. Private working set goes up by 412k during run whereas global available VM goes down by 1M, so roughly 60% of the memory overhead is inside the kernel. (...)"
"What's more stunning is the huge amount of kernel time (which is busy CPU time, not something like waiting on disk!) that CreateFile
obviously consumes. 17 seconds for 100k calls boils down to around 450,000 cycles for opening one handle on this machine. Compared to that, the mere 10 bytes of virtual memory going away are kind of negligible."
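For reference, a rough sketch of the kind of measurement Damon describes might look like this: sample MEMORYSTATUSEX::ullAvailVirtual before and after opening 100,000 temporary files and divide the difference by the number of handles. It assumes the CreateNewTempFile() wrapper shown further down and a hypothetical MakeTempPath(i) helper for unique names, with error handling omitted.

#include <windows.h>
#include <cstdio>
#include <string>
#include <vector>

HANDLE CreateNewTempFile(LPCTSTR filePath);    // the wrapper shown further down
std::basic_string<TCHAR> MakeTempPath(int i);  // hypothetical unique-name helper

ULONGLONG AvailVirtual() {
    MEMORYSTATUSEX ms = { sizeof(ms) };
    ::GlobalMemoryStatusEx(&ms);
    return ms.ullAvailVirtual; // unreserved virtual address space of this process
}

int main() {
    const int kCount = 100000;
    std::vector<HANDLE> handles;
    handles.reserve(kCount);

    ULONGLONG before = AvailVirtual();
    for (int i = 0; i < kCount; ++i)
        handles.push_back(CreateNewTempFile(MakeTempPath(i).c_str()));
    ULONGLONG after = AvailVirtual();

    printf("virtual bytes per open temp file: %llu\n",
           (unsigned long long)((before - after) / kCount));
    return 0; // handles are closed (and the files deleted) on process exit
}

If Damon's figures hold, the user-visible address-space cost per handle is tiny, and, as qwm notes, most of the per-file footprint sits in kernel structures such as _FILE_OBJECT.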
I now did some measurements:
The call to create a temporary file (and I keep its handle until the end) looks like this:
HANDLE CreateNewTempFile(LPCTSTR filePath) {
    return ::CreateFile(
        filePath,
        GENERIC_READ | GENERIC_WRITE,  // reading and writing
        FILE_SHARE_READ,               // Note: FILE_FLAG_DELETE_ON_CLOSE will also block readers, unless they specify FILE_SHARE_DELETE
        /*Security:*/NULL,
        CREATE_NEW,                    // only create if does not exist
        FILE_ATTRIBUTE_TEMPORARY |     // optimize access for temporary file
        FILE_FLAG_DELETE_ON_CLOSE,     // delete once the last handle has been closed
        NULL);
}
The results are:
Note that I also tracked paging, and the page file was not utilized at all (as I would hope, since this machine has 16 GB of RAM and at the lowest point I still had ~4 GB free).
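For completeness, a hedged sketch of how such a run might be instrumented, assuming the CreateNewTempFile() above and the same hypothetical MakeTempPath(i) helper: sample the process-private counters via GetProcessMemoryInfo and the machine-wide physical memory via GlobalMemoryStatusEx before and after creating the temporary files.

#include <windows.h>
#include <psapi.h>   // GetProcessMemoryInfo; link with psapi.lib
#include <cstdio>
#include <string>
#include <vector>

HANDLE CreateNewTempFile(LPCTSTR filePath);    // the wrapper shown above
std::basic_string<TCHAR> MakeTempPath(int i);  // hypothetical unique-name helper

int main() {
    const int kCount = 100000;
    std::vector<HANDLE> handles;
    handles.reserve(kCount);

    PROCESS_MEMORY_COUNTERS pmcBefore = { sizeof(pmcBefore) }, pmcAfter = { sizeof(pmcAfter) };
    MEMORYSTATUSEX msBefore = { sizeof(msBefore) }, msAfter = { sizeof(msAfter) };

    ::GetProcessMemoryInfo(::GetCurrentProcess(), &pmcBefore, sizeof(pmcBefore));
    ::GlobalMemoryStatusEx(&msBefore);

    for (int i = 0; i < kCount; ++i)
        handles.push_back(CreateNewTempFile(MakeTempPath(i).c_str()));

    ::GetProcessMemoryInfo(::GetCurrentProcess(), &pmcAfter, sizeof(pmcAfter));
    ::GlobalMemoryStatusEx(&msAfter);

    printf("working set delta:  %llu bytes\n",
           (unsigned long long)(pmcAfter.WorkingSetSize - pmcBefore.WorkingSetSize));
    printf("pagefile use delta: %llu bytes\n",
           (unsigned long long)(pmcAfter.PagefileUsage - pmcBefore.PagefileUsage));
    printf("physical RAM delta: %llu bytes\n",
           (unsigned long long)(msBefore.ullAvailPhys - msAfter.ullAvailPhys));
    return 0; // closing the handles deletes the temp files
}

As Raymond's comment already warns, the deltas depend heavily on what else is running (antivirus in particular), so the numbers only mean something on the production configuration.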