This is a project to reduce the amount of memory required for the desktop and for applications. The following sections are available:
Overview - We describe the basic problem.
/ProgressTracker - Track our progress, and see which things have been fixed!
/KnownBugs - Pick one and fix it.
/Tools - What we have; what we don't have.
/Tasks - What needs to be done.
/HackerHelpNeeded - Are you an experienced hacker? Here's some stuff we'd like you to do.
Note that the following discussions have been made obsolete by the new g_slice API in GLib, which uses ideas from the slab allocator. (MatthiasClasen)
/JohnEduardoEmail - An e-mail between John and Eduardo about a possible redesign of the memory allocator that would group used and unused memory into separate mmap() segments, so that fully unused segments could be released back to the operating system.
/NewAlloc - A brief overview of a possible redesign of the memory allocator, prompted by input from JohnBerthels. Berthels supplies some insight that allows a much less invasive change to the allocator than the one JohnMoser and Eduardo discussed; JohnMoser predicts that Berthels' design may be made more effective by crowding allocations together.
Overview
The plan is to reduce the amount of memory that Gnome applications consume. Gnome is barely usable on a machine with 128 MB of RAM; contrast this with Windows XP, which is very snappy on such a configuration.
Why do you want to reduce memory consumption?
- It will allow you to postpone upgrading your machine.
- It will let Gnome run on the computers that are typical in developing countries (OLPC, or second-hand machines, low on memory, slow processors).
- Your machine will be faster since it has to swap less.
- All the cool kids are doing it.
- It opens up another interesting use: embedded applications.
To begin hacking on memory reduction, one must break the problem down into many steps. There are a number of areas where the memory problem is exposed:
At the application level. This is by far the most important factor: does a particular application use too much memory? How much memory is Evolution using for a heavy email user? How much memory does my web browser use if I have been surfing all day? What is the total memory usage for the typical desktop (base desktop + web + music + email + office)?
JohnMoser: Consider applications running for a long time to be impossible, especially if they spike resources.
The heap can't shrink if it is fragmented; heap fragmentation is evil, and we need a new drop-in malloc() replacement that doesn't use the heap. If the heap looks like UUxxxxxxxxxxxxxxxxxxxxxxxxxU (U = used, x = free), then all that free space is at best swapped out (very bad); only when the far-right used block is freed can the allocator finally shrink the heap and return the space to the system. I've seen this cause Nautilus to use 300M of memory (RSS) after it had to delete 6000 files; the good news is that it won't allocate any MORE memory until it has reused the empty area. This is likely also why Firefox takes about 40M at start-up and 100M later, then stops growing but won't shrink either. My suggestion is to rewrite malloc() to abuse mmap() segments and/or shared memory, so that big, unused areas that held small allocations can be returned to the system. Not much else can really be done, sorry. WLI has said before that the Linux kernel can handle "millions" of mmap() segments; the kernel devs will likely work on making that more efficient if it ever becomes a bottleneck, though that isn't on the horizon.
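The U/x picture above can be turned into a toy model. The sketch below is a simplification under stated assumptions: it is not how glibc's malloc is implemented (the real allocator also has a trim threshold, size bins and per-thread caches), and the function names are hypothetical. It only captures the rule described above: a brk()-style heap can give back the free run at its very top, while free chunks below a used chunk stay mapped.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy heap map: one character per chunk, 'U' = used, 'x' = free.
 * The heap grows to the right, so only the free run at the right-hand
 * end can be returned by lowering the program break. */
static size_t trimmable_chunks(const char *heap)
{
    assert(heap != NULL);
    size_t n = strlen(heap);
    size_t tail = 0;
    while (tail < n && heap[n - 1 - tail] == 'x')
        tail++;
    return tail;
}

/* Free chunks stranded below the topmost used chunk: they cost memory
 * (or swap) but cannot be handed back to the kernel. */
static size_t stranded_chunks(const char *heap)
{
    size_t free_total = 0;
    for (const char *p = heap; *p != '\0'; p++)
        if (*p == 'x')
            free_total++;
    return free_total - trimmable_chunks(heap);
}
```

For "UUxxxxU", trimmable_chunks() returns 0 and stranded_chunks() returns 4; once the last U is freed ("UUxxxxx"), all 5 free chunks become trimmable. That single pinned block at the top is the Nautilus and Firefox situation described above.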
The new allocator approach would not be "make an mmap() segment for each allocation"; it would need to create managed mmap() segments, each holding multiple small allocations or forming part of a large allocation span. Huge pages (4M pages) would make the smallest mmap() return a 4M area; this would be more efficient speed-wise, so an mmap()-based allocator could also push the kernel developers toward that goal. In the meantime, just allocate 1024 4K anonymous pages to manage 4M-wide mmap() segments, and consider multiples of 4M. Also, don't try to make the mappings adjacent or give them fixed offsets: address space layout randomization (kernel-level security) randomly chooses the base of each mmap() segment, so just allocate multi-page segments at whatever address the kernel gives you and treat them as "mini-heaps." Ideally, using mmap() with a fixed or even a suggested virtual address should be avoided. This would make it possible to free some areas of memory back to the OS, and thus to allocate fresh memory in many cases where you would otherwise have to swap in unused pages (DISK ACCESS!). This would be good for speed (responsiveness) as well as memory usage.
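A minimal sketch of the "mini-heap" idea, assuming Linux/glibc and a 4 MiB segment size taken from the text. The names (segment, seg_create, seg_alloc, seg_free) are hypothetical; this is not GLib's g_slice or any real allocator. Each segment comes from an anonymous mmap() with no address hint, hands out blocks with a bump pointer, counts live blocks, and is munmap()ed as soon as the count drops to zero. To stay short it never reuses freed space inside a segment that still has live blocks, and it is not thread-safe.

```c
#define _DEFAULT_SOURCE          /* glibc: expose MAP_ANONYMOUS */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

#define SEG_SIZE  ((size_t)4 * 1024 * 1024)   /* 1024 pages of 4 KiB */
#define ALIGNMENT _Alignof(max_align_t)
#define ALIGN_UP(n) (((n) + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT)

typedef struct segment {
    size_t used;   /* bytes consumed so far, including this header */
    size_t live;   /* blocks allocated and not yet freed */
} segment;

typedef union header {
    segment *seg;       /* owning segment, so seg_free() can find it */
    max_align_t align;  /* keeps the block that follows suitably aligned */
} header;

static segment *seg_create(void)
{
    /* No address hint: let the kernel (and ASLR) choose the location. */
    void *mem = mmap(NULL, SEG_SIZE, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return NULL;
    segment *seg = mem;
    seg->used = ALIGN_UP(sizeof(segment));
    seg->live = 0;
    return seg;
}

static void *seg_alloc(segment *seg, size_t size)
{
    if (size > SEG_SIZE)
        return NULL;
    size_t total = ALIGN_UP(sizeof(header) + size);
    if (SEG_SIZE - seg->used < total)
        return NULL;               /* segment full: caller opens a new one */
    header *h = (header *)((char *)seg + seg->used);
    h->seg = seg;
    seg->used += total;
    seg->live++;
    return h + 1;
}

/* Returns 1 if this was the segment's last live block and the whole
 * segment went back to the kernel, 0 otherwise. */
static int seg_free(void *p)
{
    header *h = (header *)p - 1;
    segment *seg = h->seg;
    assert(seg->live > 0);
    if (--seg->live == 0) {
        munmap(seg, SEG_SIZE);
        return 1;
    }
    return 0;
}
```

Usage: allocate two blocks with seg_alloc(s, 32); freeing the first returns 0, freeing the second returns 1 and the segment is gone. A real design would keep a list of segments, reuse freed space, and might follow the text's suggestion of crowding allocations together so that segments empty out sooner.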
(JohnMoser had a lot to say, didn't he?)
At the desktop level. This includes programs that run throughout the duration of the session. How much memory is each applet using? How is nautilus using its memory? How about gnome-settings-daemon? Since these processes "never die" from the viewpoint of the user, it is important to make them as small as possible, to free up memory for real applications.
At the toolkit level. This means everything from the X server to GLib, GObject, Fontconfig, Freetype, Pango, GTK+, and the various Gnome libraries. Using GTK+ from CVS HEAD on Feb. 21, 2005, a simple Hello World has a Valgrind-reported malloc() heap of about 500 KB. And this is just for Hello World! Given that users run many apps at a time, costs like this add up. Since the GTK 2.4 series, there has been a lot of great work that cut a few hundred KB off this figure. In addition, work in fontconfig has achieved even more savings. Continuing progress in this area will show up for everyone, because every application has a basic memory footprint due to the toolkit.
StephaneChauveau: Allocating a lot of small blocks with mmap instead of from one large heap is probably a good idea.
- However, that won't work for applications with bad malloc/free behavior. For example, consider an application that allocates thousands of 32-byte blocks managed in 4KB mmap pages (128 blocks per page) and then frees 99% of them. Most of the 4KB mmap pages will still contain 1 or 2 blocks. A solution could be to implement a 'memory defragmenter'. In practice, that means that allocated blocks would not have a constant address, so the application must be informed one way or another of every reallocation.
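StephaneChauveau's objection can be checked with a small simulation. The sketch below assumes the numbers from his example (32-byte blocks, 4 KiB pages, so 128 blocks per page) and a hypothetical helper releasable_pages() that counts pages holding no live block, i.e. pages an mmap()-based allocator could give back.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BLOCK_SIZE      32
#define PAGE_BYTES      4096
#define BLOCKS_PER_PAGE (PAGE_BYTES / BLOCK_SIZE)   /* 128 blocks per page */

/* live[i] says whether block i (blocks packed in allocation order)
 * is still in use. A page can be released only if none of its
 * blocks is live. */
static size_t releasable_pages(const bool *live, size_t nblocks)
{
    assert(nblocks % BLOCKS_PER_PAGE == 0);
    size_t free_pages = 0;
    for (size_t page = 0; page < nblocks / BLOCKS_PER_PAGE; page++) {
        bool any_live = false;
        for (size_t i = 0; i < BLOCKS_PER_PAGE && !any_live; i++)
            any_live = live[page * BLOCKS_PER_PAGE + i];
        if (!any_live)
            free_pages++;
    }
    return free_pages;
}
```

With 1000 pages (128000 blocks), keeping every 100th block frees 99% of the blocks, yet releasable_pages() returns 0: any run of 128 consecutive blocks contains a multiple of 100, so every page keeps one survivor. If the same 1% (1280 blocks) is packed into the first 10 pages, 990 pages become releasable, which is exactly what a defragmenter would try to achieve.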
How to measure the memory used?
First we need to know how to measure the memory used. Tools like ps are of little use for this purpose unless you know exactly what their numbers mean. See this blog entry about memory usage with smaps. Also, other interesting posts, especially for newbies, are this and this. Andy Wingo has a nice article about reducing the memory footprint of Python applications here.
How is memory used?
Here we only consider memory that is allocated for application data; we don't consider (for now) the memory used for code.
JohnMoser: Interesting consideration: at least with ET_DYN (PIEs and libraries), the code is actually mmap()ed into memory. As long as you don't alter it (and you shouldn't, ever, under any circumstances), that area will be file-backed and live in the disk cache; that is, it will be purged if not used and should be shared between processes mapping the same area in. This means that the code takes effectively zero memory. There's some overhead for the memory mapping in the kernel, and some for the Global Offset Table (GOT) entries. So we can ignore that, to a degree. (JohnGilmore: This isn't true. Any shared library page that gets relocated by the dynamic linker will have been made writable (copy-on-write) and will then have to either be kept in RAM forever (OLPC) or swapped out to paging space on a hard drive. Pages are relocated either because the library wasn't prelinked to the address where it ended up mapped, or because it refers to symbols from other libraries.)
The heap is what you malloc(). Most of an application's memory consumption is likely to be here. Tools like memprof and valgrind let you monitor the heap.
Gnome applications also use memory in other ways. Every GTK+ process has a block of shared memory that it uses to quickly communicate image data to the X server. See gnomecvs:gtk+/gdk/gdkrgb.c.
Also, all X clients use resources inside the server. A program like xrestop lets you monitor this usage. For example, Mozilla stores image data in web pages as pixmaps in the X server. That's why your X server balloons to 100 MB when viewing large, graphics-heavy web pages.
StephaneChauveau: There are also the writable (non-read-only) pages that each shared library allocates statically. They contain all non-constant global data and the shared library's address translation tables. Those pages are not shared between processes. I made a quick computation for gcalctool (on AMD64). That simple program uses 73 shared libraries that allocate a total of 13 MB of non-shared static data. I hope that I made a mistake somewhere.
JohnMoser: Please read the ld man page for the --as-needed switch. Apparently some people using or developing Ubuntu have discovered that they can shave a big chunk off the GNOME dependencies with this switch, which makes the linker record a dependency only on libraries that actually supply symbols the program uses. This is apparently experimental, but has something to do with different libraries being needed on different platforms.
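A computation like the gcalctool one can be approximated for any running process by reading /proc/<pid>/maps. The sketch below is an assumption-laden approximation, not StephaneChauveau's method: the helper name writable_library_bytes() is hypothetical, it sums the virtual size (not the resident size) of private, writable mappings whose path contains ".so", it probably misses .bss that the loader maps anonymously, and it ignores RELRO pages, which are dirtied by relocation and then made read-only (they show up as r--p).

```c
#define _DEFAULT_SOURCE   /* glibc: expose POSIX.1-2008 extras such as fmemopen() */
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Reads lines in /proc/<pid>/maps format, e.g.
 *   "00400000-00403000 rw-p 00010000 08:01 123 /usr/lib/libfoo.so.1"
 * and returns the total size in bytes of writable, private mappings
 * that belong to shared libraries. */
static unsigned long writable_library_bytes(FILE *f)
{
    char line[4096];
    unsigned long total = 0;
    assert(f != NULL);
    while (fgets(line, sizeof line, f) != NULL) {
        unsigned long start, end;
        char perms[5] = "";
        if (sscanf(line, "%lx-%lx %4s", &start, &end, perms) != 3)
            continue;
        const char *path = strchr(line, '/');
        /* perms is e.g. "rw-p": [1] == 'w' writable, [3] == 'p' private. */
        if (perms[1] == 'w' && perms[3] == 'p' && path != NULL &&
            strstr(path, ".so") != NULL)
            total += end - start;
    }
    return total;
}
```

Usage: open /proc/self/maps (or /proc/<pid>/maps for another process) with fopen(path, "r") and pass the FILE * to writable_library_bytes(); to try it on canned text, fmemopen() can wrap a string as a FILE *.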
Meeting logs
GTK+ team meeting log, 2005/Feb/21
Basic footprint
All applications in the desktop have a minimum memory footprint which is determined by the toolkit. For example, all GTK+ apps allocate memory for this:
- GObject: Bookkeeping of the type system. Class types, fundamental types, interfaces, object properties, signals.
- Pango: Font sets. FIXME: is this accurate?
- Freetype and Fontconfig: font lists and glyph data.
- GTK+: Xlib internal data, shared memory segments for GdkRGB, widget paraphernalia.
BenMaurer reports that a trivial "Hello World" program in GTK+ 2.6 generates the following data in Valgrind:
- ==7284== still reachable: 457933 bytes in 6262 blocks.
That's 450 KB of data for a program that does nothing but paint a button! And that doesn't even include the code size. It also does not take into account the GdkRGB shared memory segments, which are 96K pixels per application. On machines with 24-bit displays, that's 400 extra KB per process.
[ Isn't this just the data that was still allocated at exit? It doesn't count data that was used and then freed. If GTK cleaned up after itself properly, this value should be zero... so this value is just a lower bound on the memory consumption of Hello World. I guess massif would show a more accurate figure. ]
BenMaurer: Not really. First, I used Ctrl+C to exit, so valgrind did not do anything to clean up GTK+. Second, remember that the OS is about to free everything more efficiently than we ever could; "cleaning up" is useless. For a larger test case, massif is a better idea, however.
A typical Gnome desktop has around 20 processes running all the time: gnome-session, gnome-settings-daemon, Nautilus, the panel, a bunch of applets, etc. So that's 20 times 850 KB (roughly 450 KB of heap plus 400 KB of GdkRGB shared memory), which amounts to 17 MB just for the footprint of the basic desktop components. This of course doesn't take into account each component's real data: file lists in Nautilus, images for applets, the panel's menus, the Metacity theme, etc.
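The footprint estimate above can be written out as a worked calculation. This assumes that "96K pixels" means 96 × 1024 pixels and that each pixel of a 24-bit display occupies a 32-bit word in the GdkRGB buffer; the heap figure is the rounded Valgrind number (457933 bytes).

```latex
\begin{align*}
\text{GdkRGB buffer} &\approx 98\,304\ \text{pixels} \times 4\ \text{B/pixel} = 393\,216\ \text{B} \approx 400\ \text{KB} \\
\text{per process} &\approx 450\ \text{KB (heap)} + 400\ \text{KB (GdkRGB)} = 850\ \text{KB} \\
\text{desktop} &\approx 20 \times 850\ \text{KB} = 17\,000\ \text{KB} \approx 17\ \text{MB}
\end{align*}
```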
JohnMoser: Looking into Position Independent Executables (PIEs), which are used on some platforms to let address space layout randomization (ASLR) do a better job, it may be possible to reduce the applets' overhead to a degree. libc6/glibc is effectively a PIE: if you make its shared library (libc.so.6) executable, you can run it as a program! If the applets had an entry point as a library and were compiled as PIEs, it may be possible to craft each applet to run either as it does now (a separate process) or as a plug-in loaded by gnome-panel, depending on a user option (and, of course, on whether the applet supports that). This could shave off some data, assuming that the applets devote a significant amount of heap space to setting up basic GTK+ structures just to display a much smaller set of controls.
JohnGilmore: You can see what memory a running process is using with "cat /proc/nnn/smaps", where nnn is the process ID. This prints several lines for each region of memory, listing the range of memory addresses, the permissions, the name of the file mapped there (if any), and then various sizes. Here is one section of the output for a Gnome Terminal process:
00411000-004c4000 r-xp 00000000 fd:00 37009854 /usr/lib/libvte.so.9.1.5
Size:            716 kB
Rss:             188 kB
Shared_Clean:      0 kB
Shared_Dirty:      0 kB
Private_Clean:   188 kB
Private_Dirty:     0 kB
004c4000-004c7000 rwxp 000b3000 fd:00 37009854 /usr/lib/libvte.so.9.1.5
Size:             12 kB
Rss:               8 kB
Shared_Clean:      0 kB
Shared_Dirty:      0 kB
Private_Clean:     0 kB
Private_Dirty:     8 kB
This shows that the parts of this library used by Gnome Terminal have been mapped into two address ranges: first the read-only memory (r-xp), then the read-write memory (rwxp). There's 716 kB of read-only memory, of which 188 kB is resident and is private and clean (not written to). The rest of the 716 kB is not resident (either never read in or already discarded). There's 12 kB of writable memory, of which 8 kB is resident; it's private rather than shared with another process, and it's dirty because it has been modified since being read in from disk.
When reducing memory footprints for the OLPC, keeping track of Private_Dirty memory is key, because on the OLPC it can never be swapped out: the OLPC has no hard drive, and doesn't swap to flash (to avoid wearing out the flash rapidly). Clean pages can be discarded at any time, because they can be reread from the file when needed; dirty pages must be kept in RAM by the kernel.
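Adding up Private_Dirty by hand gets tedious for a process with hundreds of mappings. The sketch below, with a hypothetical helper smaps_total_kb(), sums one counter over every mapping in smaps-formatted text; it assumes the "Field:   N kB" line layout shown in this page. Newer kernels (4.14 and later) also provide /proc/<pid>/smaps_rollup, which reports such totals directly.

```c
#define _DEFAULT_SOURCE   /* glibc: expose POSIX.1-2008 extras such as fmemopen() */
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Sums one per-mapping counter (e.g. "Private_Dirty" or "Rss") over all
 * mappings in /proc/<pid>/smaps-formatted text read from f.
 * The result is in kB, the unit smaps uses. */
static unsigned long smaps_total_kb(FILE *f, const char *field)
{
    char line[512];
    unsigned long total = 0;
    assert(f != NULL && field != NULL);
    size_t flen = strlen(field);
    while (fgets(line, sizeof line, f) != NULL) {
        unsigned long kb;
        /* Counter lines look like "Private_Dirty:      4192 kB";
         * mapping header lines start with an address and never match. */
        if (strncmp(line, field, flen) == 0 && line[flen] == ':' &&
            sscanf(line + flen + 1, "%lu", &kb) == 1)
            total += kb;
    }
    return total;
}
```

Usage: FILE *f = fopen("/proc/self/smaps", "r"); then smaps_total_kb(f, "Private_Dirty") gives the memory that, on a machine without swap like the OLPC, has to stay in RAM. For the two libvte mappings and the heap mapping shown on this page, it returns 4200 kB of Private_Dirty and 4916 kB of Rss.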
In the Gnome Terminal output, the biggest chunk is this entry, which represents the heap:
089a9000-09f5d000 rwxp 089a9000 00:00 0
Size:          22224 kB
Rss:            4720 kB
Shared_Clean:      0 kB
Shared_Dirty:      0 kB
Private_Clean:   528 kB
Private_Dirty:  4192 kB
Here, 4.7 MB is resident and almost all of it (4.2 MB) is dirty; the region spans 22 MB of virtual memory, so roughly 17 MB of it is allocated but not resident.
Hmm, here are some big chunks too: though they aren't resident, they take up a lot of virtual memory and are not actually being used by this ordinary terminal emulator. Why is Gnome Terminal loading Korean and Japanese fonts?
b36a5000-b408d000 r--p 00000000 fd:00 37227941 /usr/share/fonts/korean/TrueType/gulim.ttf
Size:          10144 kB
Rss:              16 kB
Shared_Clean:     16 kB
Shared_Dirty:      0 kB
Private_Clean:     0 kB
Private_Dirty:     0 kB
b412f000-b4885000 r--p 00000000 fd:00 37227497 /usr/share/fonts/japanese/TrueType/sazanami-gothic.ttf
Size:           7512 kB
Rss:              44 kB
Shared_Clean:     36 kB
Shared_Dirty:      0 kB
Private_Clean:     8 kB
Private_Dirty:     0 kB
Please don't take out the fonts that allow me to read the names of the artists/songs in some of my music instead of seeing files "named" iAAA{!r!{{.ogg. Why use Unicode at all if you're only interested in 7-bit ASCII? (rhetorical question)