How to solve memory fragmentation and force FastMM to release memory to the OS?

Note: this is a 32-bit application, which is not planned to be migrated to 64-bit.

I'm working with a very memory-consuming application and have pretty much optimized all the relevant paths with respect to memory allocation/deallocation. (There are no memory leaks, no handle leaks, and no other kinds of leaks in the application itself, as far as I know and have tested. Third-party libs which I cannot touch are of course candidates, but unlikely in my scenario.)

The application frequently allocates large one- and two-dimensional dynamic arrays of single and of packed records of up to 4 singles. By large I mean that 5000x5000 of record(single, single, single, single) is normal, as is having 6 or 7 such arrays in use at a given time. This is needed because a lot of cross-computations are made on these arrays, and reading them from disk would be a real performance killer.
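
For illustration, this is roughly what one such array looks like (the type and variable names are invented for this sketch); a single 5000x5000 array of 16-byte records already accounts for about 381 MB:

type
  TSample = packed record
    A, B, C, D: Single; // 4 x 4 bytes = 16 bytes per record
  end;
  TSampleMatrix = array of array of TSample;

var
  M: TSampleMatrix;
begin
  SetLength(M, 5000, 5000); // 5000 * 5000 * 16 bytes = ~381 MB of payload
  // ... cross-computations on M and several similar arrays ...
  SetLength(M, 0);          // or M := nil / Finalize(M): the array is released,
                            // yet the process footprint does not shrink accordingly
end;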

Having clarified this: I am getting out-of-memory errors a lot because of these large dynamic arrays, which will not go away after releasing them, no matter whether I SetLength them to 0 or Finalize them. This is of course something FastMM does in order to be fast, I know that much.

I am tracking both the FastMM allocated blocks and the memory consumed by the process (working set + page file) by using:

// Requires Windows, PsAPI (GetProcessMemoryInfo, TProcessMemoryCounters)
// and Forms (Application) in the uses clause.
function CurrentProcessMemory(AWaitForConsistentRead: boolean): Cardinal;
var
  MemCounters: TProcessMemoryCounters;
  LastRead: Cardinal;
  maxCnt: integer;
begin
  Result := 0; // silences a spurious D2010 compiler warning
  maxCnt := 0;
  repeat
    Inc(maxCnt);
    // This is a stabilization loop:
    // in tight loops the system doesn't get much chance to release allocated
    // resources, which in turn get falsely reported by this function as still
    // being in use, resulting in a false-positive memory leak report in the
    // application. So we loop here, waiting, until the reported memory gets stable.
    LastRead := Result;
    MemCounters.cb := SizeOf(MemCounters);
    if GetProcessMemoryInfo(GetCurrentProcess,
        @MemCounters,
        SizeOf(MemCounters)) then
      Result := MemCounters.WorkingSetSize + MemCounters.PagefileUsage
    else
      RaiseLastOSError;
    if AWaitForConsistentRead and (LastRead <> 0) and (abs(LastRead - Result) > 1024) then
    begin
      Sleep(60);
      Application.ProcessMessages;
    end;
  until (not AWaitForConsistentRead) or (abs(LastRead - Result) < 1024) or (maxCnt > 1000);
  // maxCnt > 1000 caps the wait at roughly 60 seconds (1000 * 60 ms);
  // if the system is still that "unstable" after that, just give up.
end;

// Requires FastMM4 in the uses clause (GetMemoryManagerUsageSummary).
function CurrentFastMMMemory: Cardinal;
var
  mem: TMemoryManagerUsageSummary;
begin
  GetMemoryManagerUsageSummary(mem);
  Result := mem.AllocatedBytes + mem.OverheadBytes;
end;
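
For example, a quick (purely illustrative) way to log both figures side by side:

// Illustrative only; requires Windows (OutputDebugString) and SysUtils (Format).
procedure LogMemoryUsage;
begin
  OutputDebugString(PChar(Format('Process: %d KB / FastMM: %d KB',
    [Integer(CurrentProcessMemory(True) div 1024),
     Integer(CurrentFastMMMemory div 1024)])));
end;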

I am running the code on a 64-bit computer, and my top memory consumption before crashes is about 3.3 - 3.4 GB. After that, I get memory/resource related crashes anywhere in the application. It took me some time to pin it down to the large dynamic array usage, which was buried in a 3rd party library.

The way I am getting around this is that I made the application resume itself from where it left off, by restarting itself and closing with certain parameters. This is all nice and dandy if memory consumption is fair and the current operation finishes.

The big problem happens when the current memory usage is 1 GB and the next operation to process requires 2.5 GB of memory or more. My current code limits itself to an upper value of 1.5 GB of used memory before resuming, but in this situation I'd have to drop the limit below 1 GB, which would basically have the application resume itself after each operation, and not even that would guarantee that everything will be fine.

What if another operation has a larger data set to process and requires a total of 4 GB or more of memory?

Note that I am not talking about 4 GB of actual data in memory, but about memory consumed by allocating huge dynamic arrays which the OS doesn't get back once they are de-allocated, so it still sees them as consumed, and it adds up.

So, my next point of attack is to force FastMM to release all (or at least part of) the memory back to the OS. I'm specifically targeting the huge dynamic arrays here. Again, these are in a 3rd party library, so re-coding that is not really among the top options. It's much easier and faster to tinker with the FastMM code and write a proc to release the memory.

I can't switch from FastMM, as currently the entire application and some of the 3rd party libs are heavily coded around the use of PushAllocationGroup in order to quickly find and pinpoint any memory leaks. I know I could write a dummy FastMM unit to satisfy the compilation references, but I would be left without this quick and reliable leak detection.

In conclusion: is there any way I can force FastMM to release at least some of its large blocks to the OS? (Well, sure there is; the actual question is: did anybody already write it, and if so, do you mind sharing?)

Thanks

Later edit:

I will come up with a small, relevant test application soon. It doesn't appear to be that easy to mock one up.

asked Dec 18 '13 by ciuly


2 Answers

I doubt that the issue is actually down to FastMM. For huge memory blocks, FastMM will not do any sub-allocation. Your allocation request will be handled with a straight VirtualAlloc. And then deallocation is VirtualFree.

That's assuming that you are allocating those 380 MB objects in one contiguous block. I suspect that what you actually have are ragged 2D dynamic arrays, and those are not single allocations. A 5000x5000 ragged 2D dynamic array takes 5001 allocations to initialise: one for the row pointers, and 5000 for the rows. Those will be medium FastMM blocks, so there will be sub-allocation.
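
To illustrate (a sketch only; the type names are invented here): a ragged 2D dynamic array is thousands of medium-sized blocks, whereas a flat 1D array holding the same data is one huge block that goes straight to VirtualAlloc:

type
  TSample = packed record
    A, B, C, D: Single; // 16 bytes
  end;
  TRaggedMatrix = array of array of TSample; // 1 row-pointer block + 5000 row blocks
  TFlatMatrix   = array of TSample;          // 1 contiguous block

var
  Ragged: TRaggedMatrix;
  Flat: TFlatMatrix;
  Row, Col: Integer;
  Value: TSample;
begin
  SetLength(Ragged, 5000, 5000); // 5001 allocations (one ~20 KB pointer block + 5000 rows of ~78 KB)
  SetLength(Flat, 5000 * 5000);  // one ~381 MB allocation -> VirtualAlloc/VirtualFree

  Row := 10;
  Col := 20;
  Value := Ragged[Row][Col];       // two pointer dereferences
  Value := Flat[Row * 5000 + Col]; // manual 2D indexing into the flat block
end;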

I think you are asking too much. In my experience, any time you need over 3 GB of memory in a 32-bit process, it's game over. Fragmentation of the address space will stop you before you run out of memory. You cannot hope for this to work. Switch to 64 bit, or use a cleverer, less demanding allocation pattern. Do you really need dense 2D arrays? Can you use sparse storage?

If you cannot alleviate your memory demands that way, you could use memory-mapped files. This would allow you to make use of the extra memory that your 64-bit system has: the system's disk cache can be larger than 4 GB, so your app can traverse more than 4 GB of data without actually needing to hit the disk.
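
A rough sketch of that approach (the file name, sizes and the 64 MB window are arbitrary illustration values; requires the Windows and SysUtils units): back the data set with a file mapping and slide a view over the portion you are currently working on, letting the OS cache the rest:

procedure TraverseLargeDataSet;
const
  ViewSize = 64 * 1024 * 1024; // work on a 64 MB window at a time
var
  DataFile, Mapping: THandle;
  TotalSize, Offset: Int64;
  View: Pointer;
begin
  TotalSize := Int64(4) * 1024 * 1024 * 1024; // 4 GB backing store on disk
  DataFile := CreateFile('C:\temp\bigdata.bin', GENERIC_READ or GENERIC_WRITE, 0,
    nil, CREATE_ALWAYS, FILE_ATTRIBUTE_TEMPORARY, 0);
  if DataFile = INVALID_HANDLE_VALUE then
    RaiseLastOSError;
  Mapping := CreateFileMapping(DataFile, nil, PAGE_READWRITE,
    Int64Rec(TotalSize).Hi, Int64Rec(TotalSize).Lo, nil);
  if Mapping = 0 then
    RaiseLastOSError;
  try
    Offset := 0; // must be a multiple of the 64 KB allocation granularity
    View := MapViewOfFile(Mapping, FILE_MAP_ALL_ACCESS,
      Int64Rec(Offset).Hi, Int64Rec(Offset).Lo, ViewSize);
    if View = nil then
      RaiseLastOSError;
    try
      // treat View^ as an array of your records covering this 64 MB slice;
      // unmap and remap at a different Offset to walk the whole 4 GB
    finally
      UnmapViewOfFile(View);
    end;
  finally
    CloseHandle(Mapping);
    CloseHandle(DataFile);
  end;
end;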

You could certainly try different memory managers, but I honestly do not hold out any hope that it would help. You could write a trivial replacement memory manager that uses HeapAlloc, and enable the low-fragmentation heap (it is enabled by default from Vista on). But I sincerely doubt that it will help. I'm afraid there won't be a quick fix for you; to resolve this, you face a more fundamental modification to your code.
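
For completeness, such a trivial replacement might look roughly like this (a sketch only; the field signatures shown are the Delphi 2010 ones, later versions use NativeInt/NativeUInt, and the unit has to be the very first one in the project's uses clause):

unit HeapMM;
// Minimal memory manager that forwards everything to the Win32 process heap.
// The low-fragmentation heap is already the default from Vista onwards.

interface

implementation

uses
  Windows;

function HeapGetMem(Size: Integer): Pointer;
begin
  Result := HeapAlloc(GetProcessHeap, 0, Size);
end;

function HeapFreeMem(P: Pointer): Integer;
begin
  if HeapFree(GetProcessHeap, 0, P) then
    Result := 0 // 0 = success, as the memory manager contract requires
  else
    Result := 1;
end;

function HeapReallocMem(P: Pointer; Size: Integer): Pointer;
begin
  Result := HeapReAlloc(GetProcessHeap, 0, P, Size);
end;

function HeapAllocMem(Size: Cardinal): Pointer;
begin
  Result := HeapAlloc(GetProcessHeap, HEAP_ZERO_MEMORY, Size);
end;

function NoLeakTracking(P: Pointer): Boolean;
begin
  Result := False; // leak registration is not supported by this sketch
end;

const
  HeapMemoryManager: TMemoryManagerEx = (
    GetMem: HeapGetMem;
    FreeMem: HeapFreeMem;
    ReallocMem: HeapReallocMem;
    AllocMem: HeapAllocMem;
    RegisterExpectedMemoryLeak: NoLeakTracking;
    UnregisterExpectedMemoryLeak: NoLeakTracking);

initialization
  SetMemoryManager(HeapMemoryManager);

end.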

answered by David Heffernan


Your issue, as others have said, is most likely attributable to memory fragmentation. You could test this by using VirtualQuery to create a picture of how memory is allocated in your application. You will very likely find that although you may have more than enough total memory for a new array, you don't have enough contiguous memory.
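
For example, something along these lines (an untested sketch; requires the Windows unit) walks the address space and reports the largest free region, which is effectively the biggest single allocation that can still succeed:

// Walk the address space with VirtualQuery and return the size of the
// largest free (unreserved) region.
function LargestFreeRegion: NativeUInt;
var
  Info: TMemoryBasicInformation;
  Addr: NativeUInt;
begin
  Result := 0;
  Addr := 0;
  while VirtualQuery(Pointer(Addr), Info, SizeOf(Info)) = SizeOf(Info) do
  begin
    if (Info.State = MEM_FREE) and (Info.RegionSize > Result) then
      Result := Info.RegionSize;
    Inc(Addr, Info.RegionSize);
  end;
end;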

FastMM already does a lot to try to avoid problems due to memory fragmentation. "Small" allocations are done at the low end of the address space, whereas "large" allocations are done at the high end. This avoids a common problem where a series of large-then-small allocations, followed by all the large allocations being released, results in a large amount of fragmented memory that is almost unusable (certainly unusable by anything slightly larger than the original large allocations).

To see the benefits of FastMM's approach, imagine your memory laid out as follows:

Each digit represents a 100 MB block.
[0123456789012345678901234567890123456789]

Small allocations are represented by "s".
Large allocations are represented by capital letters.
[0sssss678901GGGGFFFFEEEEDDDDCCCCBBBBAAAA]

Now if you free all your large blocks, you should have no trouble performing similar large allocations later:
[0sssss6789012345678901234567890123456789]

The problem is that "large" and "small" are relative, and highly dependent on the nature of your application. FastMM defines a dividing line between "large" and "small". If you happen to have some allocations that you think of as small but that FastMM classifies as large, you may encounter the following problem:

[0sss4sGGGGsFFFFsEEEEsDDDDsCCCCsBBBBsAAAA]

Now if you free the large blocks, you're left with:
[0sss4s6789s1234s6789s1234s6789s1234s6789]

And an attempt to allocate anything larger than 400 MB will fail.


Options

  1. You may be able to tweak the FastMM settings so that all your "small" allocations are also considered small by FastMM. However, there are a few situations where this won't work:
    • Any DLLs you use that allocate memory to your application but bypass FastMem may still cause fragmentation.
    • If you don't release all your large blocks together, those that remain may induce fragmentation which will slowly get worse over time.
  2. You could take on the task of memory management yourself (see the sketch after this list).
    • Allocate one very large block, e.g. 3.5 GB, which you keep for the entire lifetime of the application.
    • Instead of using dynamic arrays, determine the pointer locations to use yourself when setting up a new array.
  3. Of course the simplest alternative would be to go 64-bit.
  4. You could consider alternate data structures.
    • Do you really need array lookup capability? If not, another structure that allocates in smaller chunks may suffice.
    • Even if you do need array lookup, consider a paged array. Paged arrays are a combination of arrays and linked lists: data is stored on pages, with a linked list chaining the pages together.
    • A simple variant (since you mentioned your arrays are two-dimensional) would be to leverage that: one dimension forms its own array, providing a lookup into one of multiple arrays for the second dimension.
  5. Related to the alternate data structures option, consider storing some data on disk. Yes, performance will be slower, but if an efficient caching mechanism can be found, then maybe not by much. It would be better to be a little slower but not crash.
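
As a very rough sketch of option 2 (the names are invented, and a real implementation would need alignment, free-space management and thread safety; requires Windows and SysUtils):

// Reserve one large block up front and hand out sub-blocks yourself.
var
  PoolBase: Pointer;
  PoolSize, PoolUsed: NativeUInt;

procedure InitPool(ASize: NativeUInt);
begin
  PoolBase := VirtualAlloc(nil, ASize, MEM_RESERVE or MEM_COMMIT, PAGE_READWRITE);
  if PoolBase = nil then
    RaiseLastOSError;
  PoolSize := ASize;
  PoolUsed := 0;
end;

function PoolAlloc(ABytes: NativeUInt): Pointer;
begin
  if PoolUsed + ABytes > PoolSize then
    raise Exception.Create('Pool exhausted');
  Result := Pointer(NativeUInt(PoolBase) + PoolUsed);
  Inc(PoolUsed, ABytes);
end;

procedure ResetPool;
begin
  PoolUsed := 0; // invalidates everything handed out by PoolAlloc
end;

Arrays are then carved out with PoolAlloc and an explicit indexing scheme instead of SetLength, and ResetPool reclaims the whole region in one go between operations.
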
answered by Disillusioned