If I am hosting a long running application such as a web server within a Common Lisp image, what strategy should I use to manage the garbage collector?
I'm assuming that, by default, the garbage collector is entitled to spend long periods of time sorting out the heap, at times I can't predict. This may impact a particular browser request in ways I don't want.
Is there a method in Common Lisp to control this? Perhaps by encouraging it to work in a 'little-and-often' manner?
Garbage collection was invented by John McCarthy around 1959 to solve problems in Lisp. Every Common Lisp implementation, must have garbage collection defined, since any standard implementation must comply to Common Lisp ANSI standard.
Using Runtime. getRuntime(). gc() method: Runtime class allows the application to interface with the JVM in which the application is running. Hence by using its gc() method, we can request JVM to run Garbage Collector.
Garbage-collection has always been necessary because the computer's supply of addressable space has always been much less than the total space used during execution of a list-processing program. Garbage-collection makes it possible to reuse the system's limited supply of addressable space.
Garbage collection (GC) is a memory recovery feature built into programming languages such as C# and Java. A GC-enabled programming language includes one or more garbage collectors (GC engines) that automatically free up memory space that has been allocated to objects no longer needed by the program.
Several Lisp implementations have excellent Garbage Collectors. A special problem is that Lisp applications often have a high allocation rate of small objects (conses, ...).
There are a few things to consider.
Precise vs. Conservative GC. I'm not a huge fan of conservative GCs (Boehm etc.) for Lisp. A problem of conservative GCs is that they don't find all garbage. This can be a problem for long running programs and lead to fragmentation and not reclaimed unused memory. Precise GCs are using the tag information of Lisp data and can identify every data type of every object. Conservative GCs were invented for programming language implementations that don't use tagged data (C++, ...).
copying GC, compacting GC. To work against memory fragmentation in long running Lisps, a GC that compacts and localizes objects can be useful. A problem sometimes comes up when hashtables need to be rehashed (because the location changes). A copying GC may need more memory (because there is a from and a to space of memory). But when the GC copies the objects from one memory space into another, it automatically makes it more compact. More advanced GCs (like on the Lisp Machine) can also sort objects and allocate objects of the same type near each other - assuming that this will speed up accessing objects.
ephemeral GC. This means that there is a first GC stage that exclusively runs in main memory and gets some support from a memory management unit to identify changed memory regions. Scanning main memory is fast than scanning virtual memory and scanning only changed memory regions reduces the amount of work further. When lots of objects are allocated and quickly turning into garbage, this gives very short GC pauses.
generational GC. Usually GCs nowadays are generational. There is more than one generation and objects that survive a few GCs are promoted to an older generation. Usually only the first generation is GCed very often.
Tuning. GCs of, say, LispWorks and Allegro CL have a lot of tuning knobs. Especially for long-running applications it makes sense to read the manual and for example tune the number of generations, their sizes and other things.
virtual memory. GC over virtual memory is usually very slow. Avoid that if possible - add more RAM to the machines.
manual memory management. For example the CL-HTTP web server does some manual memory management using resources. These are pools of pre-allocated objects that can be reinitialized very quickly. The Lisp Machines were using this a lot. Typical use of them is in read buffers for streams. Instead of creating new strings by every read operation, it is useful to use reusable buffers.
stack allocation. Some Common Lisp allow stack allocation of some data - leaving the block then automatically frees the memory. This assumes then that the memory is no longer referenced on leaving a block. See the declaration dynamic-extent
.
concurrent GC. None of the usual Common Lisp implementations has a concurrent GC AND support for concurrent Lisp threads. Some implementations have concurrent Lisp threads, but the GC will stop them all while it is working. If the Lisp implementation runs on the JVM, like ABCL, then it might be able to use a JVM concurrent/parallel GC.
profiling the memory allocation. If you are not sure where allocation happens and what the GC does, you need to find that out using profiling information.
If your Lisp has a precise generational GC which runs in main memory it is hard to get problems with long pauses. Clozure CL (a free Common Lisp implementation) for example has a very good GC implementation on some platforms. You want to avoid memory fragmentation and garbage collections in virtual memory. If necessary use a 64bit Lisp implementation with more main memory.
Pointers:
You can see from the documentation that LispWorks and Allegro CL have lots of knobs for tuning GC.
Common Lisp has a few functions that deal with the implementation environment. (ROOM) is a function that gives an overview of memory usage. (ROOM t) gives more detail (here LispWorks):
CL-USER 2 > (room t)
Generation 0: Total Size 1823K, Allocated 1090K, Free 725K
Segment 2008AAB8: Total Size 507K, Allocated 361K, Free 142K
minimum free space 64K,
Awaiting promotion = 0K, sweeps before promotion =10
Segment 217E7050: Total Size 1315K, Allocated 729K, Free 582K
minimum free space 0K,
Awaiting promotion = 0K, sweeps before promotion =2
Generation 1: Total Size 1397K, Allocated 513K, Free 871K
Segment 20CB9A50: Total Size 68K, Allocated 48K, Free 15K
minimum free space 0K,
Awaiting promotion = 0K, sweeps before promotion =4
Segment 216D7050: Total Size 1088K, Allocated 331K, Free 752K
minimum free space 0K,
Awaiting promotion = 0K, sweeps before promotion =4
Segment 2004E4F8: Total Size 241K, Allocated 133K, Free 103K
minimum free space 0K, static
Generation 2: Total Size 2884K, Allocated 1290K, Free 1585K
Segment 21417050: Total Size 2816K, Allocated 1227K, Free 1584K
minimum free space 0K,
Awaiting promotion = 0K, sweeps before promotion =4
Segment 20DA5DA0: Total Size 68K, Allocated 62K, Free 1K
minimum free space 117K,
Awaiting promotion = 0K, sweeps before promotion =4
Generation 3: Total Size 19373K, Allocated 19232K, Free 128K
Segment 20109A50: Total Size 11968K, Allocated 11963K, Free 0K
minimum free space 3K,
Awaiting promotion = 0K, sweeps before promotion =10
Segment 20DB6E18: Total Size 6528K, Allocated 6396K, Free 128K
minimum free space 0K,
Awaiting promotion = 0K, sweeps before promotion =10
Segment 20CCAAC8: Total Size 876K, Allocated 872K, Free 0K
minimum free space 0K,
Awaiting promotion = 0K, sweeps before promotion =10
Total Size 25792K, Allocated 22127K, Free 3310K
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With