On 2005-04-27 an e-mail conversation took place between John Richard Moser and Eduardo Pérez Ureta; it is mirrored here. The conversation was about memory allocators and their efficiency, and about a redesigned memory allocator independently designed by John. Keep in mind that John is not a well-known hacker, so take what he says with a grain of salt.
Explanation of the Topic
The problem, as John sees it, is that current memory allocators work from one giant area of memory called the heap, of which only the upper bound can move. This means they hold unused memory from the system whenever anything below the highest used area is free. The current allocator in glibc has the following characteristics.
- The "heap" is one giant memory allocation known as the "data" segment, with only the highest bound able to move.
- Single segments are created for single allocations larger than or equal to 128KiB. These are released to the system immediately when free()d.
- All other allocations either find an unused space of appropriate size in the heap and use it, or grow the heap to accommodate the new allocation.
- The heap shrinks when the highest page in the data segment is not used. Lower unused pages can't be released to the system until the higher ones are unused.
This produces the following effects:
- Single large allocations go back to the system immediately when free()d.
- The heap can shrink only to the highest used page.
- Large amounts of unused memory may be held by the program, especially if it runs for a long time.
- Unused but allocated pages may be swapped to disk. This exchanges the overhead of a system call to get more memory from the OS for a page fault, which the OS virtual memory system handles by reading the data back from disk into memory. A small demonstration of the shrink-from-the-top behavior follows this list.
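As a minimal sketch of that shrink-from-the-top behavior, the following Linux/glibc program (hypothetical, not from the conversation) watches the program break with sbrk(0). It uses the glibc extension malloc_trim() to force trimming; without it, whether the break drops depends on glibc's trim threshold (M_TRIM_THRESHOLD, 128KiB by default).

#include <malloc.h>   /* malloc_trim(), a glibc extension */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>   /* sbrk() */

int main(void)
{
    char *low  = malloc(100 * 1024);  /* below the 128KiB mmap threshold */
    char *high = malloc(100 * 1024);  /* sits above `low` in the heap    */

    free(low);                        /* leaves a hole below `high`      */
    malloc_trim(0);                   /* ask glibc to release what it can */
    void *brk_hole = sbrk(0);         /* break cannot drop past `high`   */

    free(high);                       /* now the top of the heap is free */
    malloc_trim(0);
    void *brk_trim = sbrk(0);         /* break can finally move down     */

    printf("break with hole:  %p\nbreak after trim: %p\n",
           brk_hole, brk_trim);
    return 0;
}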
John has theorized that a better design can be made, based on the assumption that one giant heap was useful only until 640KiB of memory was not enough for anything you wanted to do. To this end, the following describes a new allocator.
- The heap is made of a set of "miniheaps" created independently.
- Each miniheap should normally be of a predetermined size. This size should be as small as possible for optimal operation; that is, it should be just big enough that the OS doesn't choke tracking memory mappings and that the overhead of tracking the miniheaps is not significant.
- Allocations bigger than a miniheap should be allocated by themselves, in the same way the current allocator handles a 128KiB allocation. These should probably be tracked separately from other miniheaps, simply because they should never be used for anything else, so that they can return to the system as soon as they are free()d.
- Miniheaps should never shrink or grow. Shrinking a miniheap makes it appear more used and attract more usage; growing one recreates the issues of the current single-heap design. The exception is single-allocation miniheaps, which should shrink or grow with their single allocation using mremap().
- Miniheaps could shrink or grow physically while still being tracked as having their original size and usage. This would allow the unused memory at the end of a miniheap to go back to the system; however, other mmap() segments could then get in the way of mremap()ing the miniheap back up. Because miniheaps cannot move in memory, managing them this way would cause excessive allocator overhead.
- The allocations for a miniheap must be tracked inside the miniheap itself, with adjacent free allocations combining into single free allocations, such that an empty miniheap contains one giant free allocation.
- Miniheaps must be tracked using memory not reachable through the actual allocator interface, i.e. malloc(), free(), and realloc(). Otherwise, one miniheap could be held as used merely because it stores a small index for another miniheap, creating existence interdependencies that keep memory allocated simply because memory in another miniheap is in use.
- Allocations should be radically imbalanced; that is, heavily used miniheaps should become more heavily used and lightly used miniheaps should become less used. This is done by favoring more used miniheaps for new allocations and falling back to less used miniheaps only when an allocation doesn't fit elsewhere (see the sketch after this list).
- Allocating new miniheaps should be avoided.
- The realloc() function, which resizes allocations, should aid the radical imbalance of the heap by moving smaller allocations into more filled miniheaps when possible. Moving larger allocations around would produce too much overhead.
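As a rough illustration of the bookkeeping and the favoring policy described above, here is a minimal C sketch. The names (struct miniheap, mh_pick) and the field layout are assumptions for illustration, not John's actual code.

#include <stddef.h>   /* size_t */

#define MINIHEAP_SIZE (4u << 20)      /* the 4 MiB size discussed later  */

/* One miniheap: an independently mmap()ed region plus its bookkeeping,
 * kept in a list that lives outside malloc()'s own memory. */
struct miniheap {
    struct miniheap *next;
    size_t bytes_free;                /* drives the "radical imbalance"  */
    unsigned char *base;              /* start of the mmap()ed region    */
};

/* Favor the most-used miniheap (fewest free bytes) that can still hold
 * the request; return NULL so the caller mmap()s a fresh miniheap only
 * as a last resort.  bytes_free is a heuristic: a real allocator must
 * also check that a contiguous hole of `need` bytes actually exists.   */
static struct miniheap *mh_pick(struct miniheap *list, size_t need)
{
    struct miniheap *best = NULL;
    for (struct miniheap *mh = list; mh != NULL; mh = mh->next)
        if (mh->bytes_free >= need &&
            (best == NULL || mh->bytes_free < best->bytes_free))
            best = mh;
    return best;
}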
This has the following effects:
- Miniheaps will typically become either more used or less used, depending on how used they are relative to other miniheaps. The imbalance of allocations causes less used miniheaps to naturally starve: new allocations aren't placed in them, and old ones are eventually free()d. Thus, less used miniheaps starve off and go back to the system.
- Single huge allocations are the ultimate imbalance, their miniheaps being entirely used, and thus are released to the system when free()d.
- Adjacent free allocations combine into bigger free allocations (as sketched below), simplifying the code in the allocator implementation.
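A minimal sketch of that coalescing step, assuming a simple singly linked chunk header (a hypothetical layout, not glibc's):

#include <stddef.h>   /* size_t */

/* Hypothetical in-miniheap chunk header; real allocators usually use
 * boundary tags, but a forward link is enough to show the idea. */
struct chunk {
    size_t size;          /* payload bytes in this chunk           */
    int    in_use;        /* zero if the chunk is free             */
    struct chunk *next;   /* next chunk within the same miniheap   */
};

/* Merge a free chunk with a free right-hand neighbour.  Applied after
 * every free(), this makes an empty miniheap collapse back into one
 * giant free chunk, as described above. */
static void coalesce_right(struct chunk *c)
{
    if (!c->in_use && c->next != NULL && !c->next->in_use) {
        c->size += sizeof(struct chunk) + c->next->size;
        c->next  = c->next->next;
    }
}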
Miniheaps appear at random locations on kernels employing address space layout randomization, such as PaX, grsecurity, or the 2.6.12 Linux kernel. This makes for a better caching situation, and better obscurity in case attackers need data off the heap. The allocator itself doesn't care, because mmap() tells us where each segment starts anyway. People who can't find the GOT in a PIE executable they're trying to hijack are going to be very pissed, which is a good thing.
In other words, free space in memory should group itself, used space in memory should group itself, and large chunks of free space should go back to the system.
It should be noted that glibc's allocator logically splits up the heap and uses different arenas for different threads to implement thread safety. In the interest of memory efficiency, it would probably be better to simply use a read-write lock for the list of miniheaps and one for each miniheap, so that fine-grained locking can occur. Because pthreads may itself use malloc(), the allocator may need its own trivial read-write lock, which means it may need its own mutex code (from which a read-write lock can be built).
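To make the "own locking primitives" point concrete, here is a deliberately tiny spinning reader-writer lock. It uses C11 atomics purely for brevity (an anachronism relative to the 2005 conversation); the relevant property is that it allocates nothing, so it cannot recurse into malloc().

#include <stdatomic.h>

/* state < 0: one writer holds the lock; state > 0: that many readers. */
typedef struct { atomic_int state; } mh_rwlock;

static void mh_read_lock(mh_rwlock *l)
{
    int s;
    do {
        s = atomic_load(&l->state);   /* spin while a writer is active   */
    } while (s < 0 || !atomic_compare_exchange_weak(&l->state, &s, s + 1));
}

static void mh_read_unlock(mh_rwlock *l) { atomic_fetch_sub(&l->state, 1); }

static void mh_write_lock(mh_rwlock *l)
{
    int expected = 0;
    while (!atomic_compare_exchange_weak(&l->state, &expected, -1))
        expected = 0;                 /* spin until no readers or writer */
}

static void mh_write_unlock(mh_rwlock *l) { atomic_store(&l->state, 0); }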
E-Mail Conversation
Eduardo Pérez Ureta wrote:
I saw your reference to using mmap() in malloc() at: http://live.gnome.org/MemoryReduction
I was testing whether that was true, and with this simple example I saw that glibc libc6_2.3.5-1 from Debian is in fact already doing that:
#include <stdlib.h>

int main (int argc, char *argv[]) {
    char *mem = (char *) malloc (128 * 1024);  /* 128 KiB: at glibc's mmap threshold */
    return 0;
}
mmap2(NULL, 135168, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7fc8000
Could you be more specific on what is left to be done? And what are the problems with the current implementation?
John Richard Moser wrote:
Right now mmap() is used to get an exclusive allocation for allocations bigger than 128KiB, as you've shown. The problem is that all smaller allocations go into a single mapping (the BSS/Data segment is an mmap() segment in the eyes of the kernel) which only grows up and shrinks down (right side moves):
*****
*****++++
*****++
The current allocators are designed for situations where the right side (higher memory addresses) of the heap holds data while there are free areas elsewhere in the heap. They find an area big enough for a requested allocation and use it, growing the heap ONLY when there isn't a big enough area AND the allocation is less than 128KiB.
The problem is that certain usage patterns typical of long-running programs cause spikes in memory usage followed by sharp drop-offs, but leave a little bit of data at the top of the heap. For example, a web browser may need 40MiB of heap space, then grow to about 90MiB when 15 tabs are open viewing Flash sites. As entries are typed into the address bar, memory is allocated on the heap to hold the matching entries from history. Once all the tabs are closed, only about 5-10MiB of RAM may be freed, leaving an 80-90MiB heap that is about 50-60% free space.
For a redesign, I'm trying to work out an allocator that allocates 4MiB mmap() segments and manages those. The following characteristics seem optimal:
- Allocate "Miniheaps" of an arbitrary but small size, right now 4MiB, and manage those as you would the heap.
- Favor the "miniheaps" with the least free memory (most used) for new allocations so that the ones with the most free memory (least used) eventually naturally starve off. If they do not starve off by the time the others are filled, they will be used. For example, if miniheap A has 3000 bytes free, and miniheap B has 3MiB free, miniheap A will be favored for the next allocation, if it can fit in one of the free areas.
- Single allocations bigger than the normal "miniheap" size (4MiB) will be adjusted up to use a full, aligned mmap() segment, which will be freed as soon as that allocation is freed. This is just like the 128KiB+ allocations in glibc. These allocations could be heuristically identified because they span their entire miniheap; but it's easier to just flag them, because the field that indicates whether an allocation is free or in use already exists.
- Single allocation "miniheaps" should shrink and grow using mremap() when realloc() is used. Again, realloc() has to find the allocation, so it can simply check the flag.
- realloc() should actively float allocations from less used to more used miniheaps, even when the realloc() could succeed in place, provided the allocation is sufficiently small and an area in a more used miniheap is available (see the sketch after this list). This would empty out the less used miniheaps, allowing them to be freed to the system.
- A special area must be used to track the miniheaps. It's possible to allocate the structures that describe the allocation areas inside other allocation areas, but in the worst case this means the last allocation area must be freed before the earlier ones can be. It is most favorable to keep the structures that describe the system of miniheaps in their own separate allocation mechanism, or in the first miniheap. Storage for precisely one miniheap structure must be available in a static variable so that its location is already known.
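A hedged sketch of the floating step from the list above. mh_pick() is the hypothetical helper sketched earlier; mh_list, mh_owner(), mh_alloc(), and mh_free() are assumed helpers for illustration, not a real API:

#include <string.h>   /* memcpy, size_t */

#define FLOAT_LIMIT 4096   /* float only small blocks; copying big ones costs too much */

/* Try to move a small block into a fuller miniheap on realloc(), even if
 * an in-place resize would succeed; returns NULL when the caller should
 * fall back to the ordinary realloc() path instead. */
static void *mh_float(void *p, size_t old_size, size_t new_size)
{
    if (new_size > FLOAT_LIMIT)
        return NULL;
    struct miniheap *dst = mh_pick(mh_list, new_size);
    if (dst == NULL || dst == mh_owner(p))
        return NULL;                       /* nowhere better to go */
    void *q = mh_alloc(dst, new_size);
    if (q == NULL)
        return NULL;
    memcpy(q, p, old_size < new_size ? old_size : new_size);
    mh_free(p);
    return q;
}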
If I'm not mistaken, there are three new concepts here: using miniheaps, favoring the least free allocation blocks (miniheaps), and floating smaller allocations up to less free blocks when realloc() is used. I believe these would cause the natural arbitrary chunks of contiguous free space to occur in groups large enough to fill miniheaps, thus freeing memory back to the system. It is optimal in terms of memory usage to use smaller miniheaps.
My current code doesn't do all of this; in particular, it misses the last three points, and doesn't give larger allocations exclusive miniheaps as in the third point. It also segfaults when used, because I suck and coded it all buggy and shit. It runs for a while, but with constant free/alloc it eventually rapes itself. It also appears to push VM space up to the max, something it shouldn't do because I never allocate enough to fill even one miniheap and thus shouldn't be requesting more memory from the system. In short, it's REALLY buggy.
Eduardo Pérez Ureta wrote:
WOW, what a response!!! Let me take some time to digest it. Anyway, this should be put on the wiki page http://live.gnome.org/MemoryReduction if you haven't already done that, as I'm not the only one interested in this topic.
"* Single allocations bigger than the normal "miniheap" size (4MiB) will be adjusted up to use a full, aligned (...)"
Maybe separate mmap allocations should still be done for sizes above 128 KiB (or 256 KiB, 1024 KiB, or something). For example, allocating two 3 MiB blocks would then give two mmap()ed areas, but with tracking added.