3.1 Virtual Memory

D3vil0p3r edited this page May 29, 2023 · 7 revisions

Dirty Ratio

The percentage of total available memory (free pages plus reclaimable pages) at which a process that is generating disk writes will itself start writing out dirty data.

In Athena we choose to set:

vm.dirty_ratio=40

Dirty Background Ratio

The percentage of total available memory (free pages plus reclaimable pages) at which the background kernel flusher threads will start writing out dirty data.

In Athena we choose to set:

vm.dirty_background_ratio=15
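
To make the two thresholds concrete, the sketch below converts a ratio into an approximate amount of dirty data. The function name dirty_threshold_kb and the 8000000 kB figure are illustrative assumptions; the kernel computes the real thresholds internally from its own count of free and reclaimable pages.

```shell
#!/bin/sh
# Hypothetical helper: approximate dirty threshold in kB.
#   $1 = available (free + reclaimable) memory in kB (assumed input)
#   $2 = ratio in percent (vm.dirty_ratio or vm.dirty_background_ratio)
dirty_threshold_kb() {
    echo $(( $1 * $2 / 100 ))
}

# Example with an assumed 8000000 kB of available memory:
dirty_threshold_kb 8000000 15   # background flushing starts: 1200000
dirty_threshold_kb 8000000 40   # writers start flushing themselves: 3200000
```

With these settings, background writeback would begin at roughly 1.2 GB of dirty data, and a writing process would be throttled into flushing its own data at roughly 3.2 GB.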

Source: https://medium.com/for-linux-users/fine-tuning-linux-virtual-memory-c64583cb8fcc

Drop Caches

Writing to this will cause the kernel to drop clean caches, as well as reclaimable slab objects like dentries and inodes. Once dropped, their memory becomes free.

To free pagecache:

echo 1 > /proc/sys/vm/drop_caches

To free reclaimable slab objects (includes dentries and inodes):

echo 2 > /proc/sys/vm/drop_caches

To free slab objects and pagecache:

echo 3 > /proc/sys/vm/drop_caches

This is a non-destructive operation and will not free any dirty objects. To increase the number of objects freed by this operation, the user may run sync prior to writing to /proc/sys/vm/drop_caches. This will minimize the number of dirty objects on the system and create more candidates to be dropped.

This file is not a means to control the growth of the various kernel caches (inodes, dentries, pagecache, etc.). These objects are automatically reclaimed by the kernel when memory is needed elsewhere on the system.

Use of this file can cause performance problems. Since it discards cached objects, it may cost a significant amount of I/O and CPU to recreate the dropped objects, especially if they were under heavy use. Because of this, use outside of a testing or debugging environment is not recommended.
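
For testing or debugging sessions, the sync-then-drop sequence described above could be wrapped as in the following sketch. The function name drop_caches and its optional second argument are assumptions made for illustration; writing to the real control file requires root.

```shell
#!/bin/sh
# Hypothetical helper: flush dirty data, then drop caches.
#   $1 = mode: 1 (pagecache), 2 (slab objects), 3 (both)
#   $2 = target file; defaults to the real control file, which needs root
drop_caches() {
    mode=$1
    target=${2:-/proc/sys/vm/drop_caches}
    case $mode in
        1|2|3) ;;
        *) echo "mode must be 1, 2 or 3" >&2; return 1 ;;
    esac
    sync                      # write out dirty data so more objects can be dropped
    echo "$mode" > "$target"
}
```

Running drop_caches 3 as root would then free both pagecache and reclaimable slab objects. The optional target argument exists only so the function can be tried against an ordinary file.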

Athena initially ran a cron job that executed echo 3 > /proc/sys/vm/drop_caches every 12 minutes. Because of its negative performance impact, the job has been removed.

OOM Dump Tasks

Enables a system-wide task dump (excluding kernel threads) to be produced when the kernel performs an OOM kill. The dump includes information such as pid, uid, tgid, vm size, rss, pgtables_bytes, swapents, oom_score_adj score, and name. This is helpful to determine why the OOM killer was invoked, to identify the rogue task that caused it, and to determine why the OOM killer chose the task it did to kill:

  • If this is set to zero, this information is suppressed. On very large systems with thousands of tasks it may not be feasible to dump the memory state information for each one. Such systems should not be forced to incur a performance penalty in OOM conditions when the information may not be desired.
  • If this is set to non-zero, this information is shown whenever the OOM killer actually kills a memory-hogging task.

In Athena we choose to set:

vm.oom_dump_tasks=0
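
With the task dump suppressed, the kernel log should still record that an OOM kill happened. The sketch below filters such lines from log text on standard input. The function name oom_lines is hypothetical, and the patterns are assumptions based on commonly seen kernel messages; the exact wording varies between kernel versions.

```shell
#!/bin/sh
# Hypothetical helper: keep only OOM-related lines from kernel log text on stdin.
# The patterns are assumptions based on common kernel messages.
oom_lines() {
    # "|| true" keeps the exit status zero when nothing matches
    grep -E 'invoked oom-killer|Out of memory|oom-kill' || true
}
```

A typical invocation would be journalctl -k | oom_lines or dmesg | oom_lines, either of which may require elevated privileges depending on the system configuration.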

OOM Kill Allocating Task

This enables or disables killing the OOM-triggering task in out-of-memory situations:

  • If this is set to zero, the OOM killer will scan through the entire tasklist and select a task based on heuristics to kill. This normally selects a rogue memory-hogging task that frees up a large amount of memory when killed.
  • If this is set to non-zero, the OOM killer simply kills the task that triggered the out-of-memory condition. This avoids the expensive tasklist scan.
  • If panic_on_oom is selected, it takes precedence over whatever value is used in oom_kill_allocating_task.
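
The precedence among these settings can be summarised in a small sketch. The function name oom_policy and the three labels it prints are made up for illustration, and the logic is a simplification of the rules listed above.

```shell
#!/bin/sh
# Hypothetical helper: describe which OOM behaviour the two settings select.
#   $1 = value of vm.panic_on_oom
#   $2 = value of vm.oom_kill_allocating_task
oom_policy() {
    if [ "$1" -ne 0 ]; then
        echo "panic"                  # panic_on_oom takes precedence
    elif [ "$2" -ne 0 ]; then
        echo "kill-allocating-task"   # no tasklist scan
    else
        echo "scan-tasklist"          # heuristic victim selection
    fi
}
```

On a running system the current values could be passed in with oom_policy "$(sysctl -n vm.panic_on_oom)" "$(sysctl -n vm.oom_kill_allocating_task)".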

In Athena we choose to set:

vm.oom_kill_allocating_task=1

Source: https://askubuntu.com/a/402940

Overcommit Memory

This value contains a flag that enables memory overcommitment:

  • When this flag is 0, the kernel attempts to estimate the amount of free memory left when user-space requests more memory.
  • When this flag is 1, the kernel pretends there is always enough memory until it actually runs out.
  • When this flag is 2, the kernel uses a "never overcommit" policy that attempts to prevent any overcommit of memory. Note that user_reserve_kbytes affects this policy.

This feature can be very useful because there are a lot of programs that malloc() huge amounts of memory "just-in-case" and don’t use much of it.
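
To see how much memory is currently promised to processes, the CommitLimit and Committed_AS fields of /proc/meminfo can be compared. The following sketch reads one field; the function name meminfo_kb and its optional file argument are assumptions for illustration. Note that CommitLimit is only enforced when the flag is 2.

```shell
#!/bin/sh
# Hypothetical helper: print a field (in kB) from a meminfo-style file.
#   $1 = field name, e.g. CommitLimit or Committed_AS
#   $2 = file to read; defaults to /proc/meminfo
meminfo_kb() {
    awk -v key="$1:" '$1 == key { print $2 }' "${2:-/proc/meminfo}"
}
```

For example, meminfo_kb Committed_AS prints the total memory currently committed, which under flag 1 may exceed what meminfo_kb CommitLimit reports.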

In Athena we choose to set:

vm.overcommit_memory=1

Source: https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/performance_tuning_guide/sect-red_hat_enterprise_linux-performance_tuning_guide-configuration_tools-configuring_system_memory_capacity

Swappiness

This control defines the rough relative IO cost of swapping and filesystem paging, as a value between 0 and 200. At 100, the VM assumes equal IO cost and will thus apply memory pressure to the page cache and swap-backed pages equally; lower values signify more expensive swap IO, higher values signify cheaper swap IO.

Keep in mind that filesystem IO patterns under memory pressure tend to be more efficient than swap’s random IO. An optimal value will require experimentation and will also be workload-dependent.

The default value is 60.

For in-memory swap, like zram or zswap, as well as hybrid setups that have swap on faster devices than the filesystem, values beyond 100 can be considered. For example, if the random IO against the swap device is on average 2x faster than IO from the filesystem, swappiness should be 133 (x + 2x = 200, 2x = 133.33).
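
The arithmetic in the example above generalises to swappiness = 200 * swap speed / (swap speed + filesystem speed). A minimal sketch, with a hypothetical function name and integer inputs expressing relative speeds:

```shell
#!/bin/sh
# Hypothetical helper: swappiness from relative IO speeds.
#   $1 = relative speed of the swap device
#   $2 = relative speed of the filesystem device
swappiness_for_speeds() {
    echo $(( 200 * $1 / ($1 + $2) ))   # integer division rounds down
}

swappiness_for_speeds 2 1   # swap 2x faster than the filesystem: 133
swappiness_for_speeds 1 1   # equal cost: 100
```

This is only a starting point; as noted above, the optimal value is workload-dependent and needs experimentation.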

At 0, the kernel will not initiate swap until the amount of free and file-backed pages is less than the high watermark in a zone.

Athena uses zram instead of zswap. For this reason, we choose to set:

vm.swappiness=180

as suggested in https://wiki.archlinux.org/title/Zram#Optimizing_swap_on_zram

Source: https://medium.com/for-linux-users/fine-tuning-linux-virtual-memory-c64583cb8fcc

VFS Cache Pressure

This percentage value controls the tendency of the kernel to reclaim the memory which is used for caching of directory and inode objects.

At the default value of vfs_cache_pressure=100 the kernel will attempt to reclaim dentries and inodes at a “fair” rate with respect to pagecache and swapcache reclaim. Decreasing vfs_cache_pressure causes the kernel to prefer to retain dentry and inode caches. When vfs_cache_pressure=0, the kernel will never reclaim dentries and inodes due to memory pressure and this can easily lead to out-of-memory conditions. Increasing vfs_cache_pressure beyond 100 causes the kernel to prefer to reclaim dentries and inodes.

Increasing vfs_cache_pressure significantly beyond 100 may have negative performance impact. Reclaim code needs to take various locks to find freeable directory and inode objects. With vfs_cache_pressure=1000, it will look for ten times more freeable objects than there are.

In Athena we choose to keep the default value:

vm.vfs_cache_pressure=100

Previously the value was set to 50. Under some intense workloads (like running LinPEAS) this caused the kernel's OOM killer to be invoked, killing processes like gnome-shell and forcing the user session to restart. For this reason, Athena now uses the default value.

Source: https://wiki.archlinux.org/title/Sysctl#VFS_cache

Notes

In Athena these parameters have been defined in /etc/sysctl.d/98-misc.conf with the following content:

vm.dirty_background_ratio=15
vm.dirty_ratio=40
vm.oom_dump_tasks=0
vm.oom_kill_allocating_task=1
vm.overcommit_memory=1
vm.swappiness=10
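
After editing the file, sysctl --system reloads all configuration files. To confirm that the running kernel matches a configuration file, a check along the following lines could be used. The function name check_sysctl_conf and its optional root argument are assumptions for illustration, and the parser only handles plain key=value lines without surrounding spaces, as in the file above.

```shell
#!/bin/sh
# Hypothetical helper: compare key=value lines in a sysctl file with running values.
#   $1 = configuration file
#   $2 = root of the sysctl tree; defaults to /proc/sys
check_sysctl_conf() {
    conf=$1
    root=${2:-/proc/sys}
    status=0
    while IFS='=' read -r key want || [ -n "$key" ]; do
        case $key in ''|'#'*|';'*) continue ;; esac   # skip blanks and comments
        path="$root/$(printf '%s' "$key" | tr '.' '/')"
        if [ ! -r "$path" ]; then
            echo "$key: not readable"
            status=1
            continue
        fi
        have=$(cat "$path")
        if [ "$have" = "$want" ]; then
            echo "$key: ok ($have)"
        else
            echo "$key: mismatch (running $have, file $want)"
            status=1
        fi
    done < "$conf"
    return $status
}
```

Calling check_sysctl_conf /etc/sysctl.d/98-misc.conf prints one line per setting and returns non-zero if any value differs from what the kernel is currently using.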