Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Accessing >2,3,4GB Files in 32bit Process on 64bit (or 32bit) Windows

Disclaimer: I apologize for the verbosity of this question (I think it's an interesting problem, though!), yet I cannot figure out how to more concisely word it.

I have done hours of research as to the apparently myriad of ways in which to solve the problem of accessing multi-GB files in a 32-bit process on 64-bit Windows 7, ranging from /LARGEADDRESSAWARE to VirtualAllocEx AWE. I am somewhat comfortable in writing a multi-view memory-mapped system in Windows (CreateFileMapping, MapViewOfFile, etc.), yet can't quite escape the feeling that there is a more elegant solution to this problem. Also, I'm quite aware of Boost's interprocess and iostream templates, although they appear to be rather lightweight, requiring a similar amount of effort to writing a system utilizing only Windows API calls (not to mention the fact that I already have a memory-mapped architecture semi-implemented using Windows API calls).

I'm attempting to process large datasets. The program depends on pre-compiled 32-bit libraries, which is why, for the moment, the program itself is also running in a 32-bit process, even though the system is 64-bit, with a 64-bit OS. I know there are ways in which I could add wrapper libraries around this, yet, seeing as it's part of a larger codebase, it would indeed be a bit of an undertaking. I set the binary headers to allow for /LARGEADDRESSAWARE (at the expense of decreasing my kernel space?), such that I get up to around 2-3 GB of addressable memory per process, give or take (depending on heap fragmentation, etc.).

Here's the issue: the datasets are 4+GB, and have DSP algorithms run upon them that require essentially random access across the file. A pointer to the object generated from the file is handled in C#, yet the file itself is loaded into memory (with this partial memory-mapped system) in C++ (it's P/Invoked). Thus, I believe the solution is unfortunately not as simple as simply adjusting the windowing to access the portion of the file I need to access, as essentially I want to still have the entire file abstracted into a single pointer, from which I can call methods to access data almost anywhere in the file.

Apparently, most memory mapped architectures rely upon splitting the singular process into multiple processes.. so, for example, I'd access a 6 GB file with 3x processes, each holding a 2 GB window to the file. I would then need to add a significant amount of logic to pull and recombine data from across these different windows/processes. VirtualAllocEx apparently provides a method of increasing the virtual address space, but I'm still not entirely sure if this is the best way of going about it.

But, let's say I want this program to function just as "easily" as a singular 64-bit proccess on a 64-bit system. Assume that I don't care about thrashing, I just want to be able to manipulate a large file on the system, even if only, say, 500 MB were loaded into physical RAM at any one time. Is there any way to obtain this functionality without having to write a somewhat ridiculous, manual memory system by hand? Or, is there some better way than what I have found through thusfar combing SO and the internet?

This lends itself to a secondary question: is there a way of limiting how much physical RAM would be used by this process? For example, what if I wanted to limit the process to only having 500 MB loaded into physical RAM at any one time (whilst keeping the multi-GB file paged on disk)?

I'm sorry for the long question, but I feel as though it's a decent summary of what appear to be many questions (with only partial answers) that I've found on SO and the net at large. I'm hoping that this can be an area wherein a definitive answer (or at least some pros/cons) can be fleshed out, and we can all learn something valuable in the process!

like image 487
Kadaj Nakamura Avatar asked Apr 03 '13 20:04

Kadaj Nakamura


People also ask

Can a 32-bit application use more than 4GB of RAM?

A 32-bit application can allocate more than 4GB of memory, and you don't need 64-bit Windows to do it.

How a 32-bit program running on 64-bit Windows can allocate more than 4GB of memory?

3 Answers. Show activity on this post. If your program is written as a 32-bit application, it can only use the 32-bit Windows Subsystem which is still in all versions of 64-bit Windows. To use more than 4GB (or 3GB) you need to recompile your program and target the 64-bit platform.

How much memory can a 32-bit application access?

The 2 GB limit refers to a physical memory barrier for a process running on a 32-bit operating system, which can only use a maximum of 2 GB of memory.

Are 32-bit programs limited to 4GB?

As about the 32-bit operating systems, contrary to the belief, there is no physical limit of 4GB for 32-bit operating systems.


1 Answers

You could write an accessor class which you give it a base address and a length. It returns data or throws exception (or however else you want to inform of error conditions) if error conditions arise (out of bounds, etc).

Then, any time you need to read from the file, the accessor object can use SetFilePointerEx() before calling ReadFile(). You can then pass the accessor class to the constructor of whatever objects you create when you read the file. The objects then use the accessor class to read the data from the file. Then it returns the data to the object's constructor which parses it into object data.

If, later down the line, you're able to compile to 64-bit, you can just change (or extend) the accessor class to read from memory instead.

As for limiting the amount of RAM used by the process.. that's mostly a matter of making sure that A) you don't have memory leaks (especially obscene ones) and B) destroying objects you don't need at the very moment. Even if you will need it later down the line but the data won't change... just destroy the object. Then recreate it later when you do need it, allowing it to re-read the data from the file.

like image 176
inetknght Avatar answered Oct 17 '22 08:10

inetknght