cgroups (control groups) have been around for a long time, providing various functions for resource management and the ability to segregate workloads with their own constraints. cgroups have provided the basis for container engines such as docker, which have been so aggressively adopted in enterprises over the last few years.
cgroups are enabled on all contemporary Linux distributions, with systemd being the API to manage them; the system will manage various cgroups organised in:
- slices - see systemd-cgls, where we will typically see system services under the system slice and user services, including the segregation for user sessions, in another. A slice encapsulates scopes and services.
- scopes - parent for a logical grouping of units/services which can be managed (killed/stopped/resource managed)
- services - logical grouping that provides a service, such as sshd, that is usually started based on configuration in unit files
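To check which top-level slice a process has landed in, the kernel lists each process's cgroup membership in /proc/<pid>/cgroup. The following is a minimal sketch, assuming the usual "ID:controllers:path" line format; top_slice is a hypothetical helper name, not a systemd tool.

```shell
#!/bin/sh
# top_slice (hypothetical helper): read /proc/<pid>/cgroup-style lines
# ("ID:controllers:path") on stdin and print the first path component,
# i.e. the top-level slice, once per distinct value.
top_slice() {
    cut -d: -f3- | cut -d/ -f2 | sort -u
}

# Usage on a live system (results differ per machine):
#   top_slice < /proc/self/cgroup
#   top_slice < "/proc/<pid of sshd>/cgroup"
```

On a typical desktop, a shell started from a login session should report user.slice while a daemon such as sshd should report system.slice.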
But how can cgroups and resource controls be useful for a developer?
It is not uncommon when debugging for devs to find themselves with runaway processes that consume so much CPU and RAM that the machine becomes unresponsive. Fortunately, in these situations we can normally ssh in from another machine to kill the rogue process - sshd will be protected from a user's runaway process because it lives in a different slice.
In such a dev scenario, it would be very useful to resource constrain the debugged process, and cgroups are perfect for this: it is of course possible to put constraints around the process via ulimit but there is not the same level of fine-grained control. We can control the amount of memory/swap/cpu/io etc. our process receives, so let's go with that.
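For comparison, the ulimit route mentioned above could look roughly like this. It is only a sketch: with_vmem_limit is a hypothetical wrapper, and the limit it sets (virtual address space, in KiB) applies to each process individually rather than to the whole group of processes, which is one reason it is coarser than a cgroup.

```shell
#!/bin/sh
# with_vmem_limit (hypothetical helper): run a command with its virtual
# memory capped at the given number of KiB. The limit is set in a subshell
# so the calling shell keeps its own limits.
with_vmem_limit() {
    kib="$1"
    shift
    ( ulimit -v "$kib" && exec "$@" )
}

# Example: show the limit as seen by the child process.
#   with_vmem_limit 4194304 sh -c 'ulimit -v'
```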
Before we show how to impose the constraints, we need to understand that there are 2 versions of cgroups available in the mainline kernel: version1 and version2. Fedora 28 ships with both but defaults to version1 - this has a subtle issue shown below.
cgroups v1
What we'd like to do as a non-privileged user is issue:
$ systemd-run --user --unit=dev-limit.service -t -p MemoryMax=64M -p MemorySwapMax=32M -p CPUQuota=50% ...
which will run our process in the user scope with an identifiable name (the unit name) and set mem/swap/cpu limits, letting the system throttle the job's CPU and kill it if it exceeds the memory constraints.
BUT this does NOT work due to restrictions on the unsafe delegation of controllers to unprivileged programs in cgroups v1. Instead, we need to do the following, which explicitly runs the job as the calling user (the dev) and puts it in the user's slice.
# remove any reference to previously failed item
sudo systemctl reset-failed dev-limit
sudo systemd-run -t \
--slice=user-$(id -u).slice --unit=dev-limit.service \
-p MemoryMax=128M -p MemorySwapMax=32M -p CPUQuota=50% \
--uid $(id -nu) \
"$@"
It'll work, but now we need to give escalated access to the dev. cgroups v2 will let us achieve our aim.
cgroups v2
Enabling cgroups v2 requires a kernel boot-time flag: systemd.unified_cgroup_hierarchy=1
NB: When enabled, current versions of docker will fail to run as it is built on cgroups v1.
Fedora
Edit /etc/default/grub and add the flag to the line GRUB_CMDLINE_LINUX=.. after which run (BIOS) grub2-mkconfig -o /boot/grub2/grub.cfg or (EFI) grub2-mkconfig -o /boot/efi/EFI/fedora/grub.cfg. If this fails with grub2-editenv: error: environment block too small simply remove the file with rm /boot/grub2/grubenv and try again - this file will be regenerated.
Raspberry Pi
The additional flags cgroup_enable=memory cgroup_memory=1 need to be added to /boot/cmdline.txt to enable memory resource management; this can be confirmed by looking at /proc/cgroups.
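After rebooting, it can be useful to confirm both that the unified hierarchy is active and that the memory controller is enabled. The sketch below assumes GNU stat, where the filesystem type of /sys/fs/cgroup is commonly reported as cgroup2fs on a unified (v2) setup and tmpfs on a v1 or hybrid setup, and assumes the four-column layout of /proc/cgroups. Both function names are hypothetical.

```shell
#!/bin/sh
# cgroup_mode (hypothetical helper): map the filesystem type of
# /sys/fs/cgroup to a human-readable cgroup mode.
cgroup_mode() {
    case "$1" in
        cgroup2fs) echo "v2 (unified)" ;;
        tmpfs)     echo "v1 (legacy or hybrid)" ;;
        *)         echo "unknown" ;;
    esac
}

# controller_state (hypothetical helper): read /proc/cgroups-format text on
# stdin (columns: subsys_name hierarchy num_cgroups enabled) and report
# whether the named controller is enabled, disabled or absent.
controller_state() {
    awk -v c="$1" '
        $1 == c { state = ($4 == 1) ? "enabled" : "disabled" }
        END     { print (state == "" ? "absent" : state) }
    '
}

# Usage on a live system:
#   cgroup_mode "$(stat -fc %T /sys/fs/cgroup)"
#   controller_state memory < /proc/cgroups
```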
There are a couple of ways to solve our original problem of resource constraining our process:
within an explicit slice
$ systemd-run --user --slice=foo --scope -p MemoryMax=128M -p MemorySwapMax=64M -p CPUQuota=50% ...
This will create a dynamic slice and scope with a name that you can track using:
$ systemd-cgtop /user.slice/user-$(id -u).slice/user@$(id -u).service/foo.slice
The slice remains after the process has gone and we can add other processes to it. To remove the slice we can issue systemctl --user stop foo.slice
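The cgroup path handed to systemd-cgtop follows a fixed pattern for units in the user's systemd instance, so it can be generated instead of typed. A small sketch based only on the path layout shown above; user_unit_cgroup is a hypothetical helper name.

```shell
#!/bin/sh
# user_unit_cgroup (hypothetical helper): print the cgroup path of a unit
# running under the calling user's systemd instance.
#   $1 = unit name (e.g. foo.slice or dev-limit.service)
#   $2 = uid (optional, defaults to the current user)
user_unit_cgroup() {
    unit="$1"
    uid="${2:-$(id -u)}"
    printf '/user.slice/user-%s.slice/user@%s.service/%s\n' "$uid" "$uid" "$unit"
}

# Usage:
#   systemd-cgtop "$(user_unit_cgroup foo.slice)"
```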
within an explicit unit
Without --slice we need to know the created unit in order to track it:
$ systemctl --user reset-failed dev-limit
$ systemd-run --user -t --unit=dev-limit.service -p MemoryMax=128M -p MemorySwapMax=64M -p CPUQuota=50% ...
and it is trackable with the explicitly named unit:
$ systemd-cgtop /user.slice/user-$(id -u).slice/user@$(id -u).service/dev-limit.service
Without the named unit the system generates a transient name that we use to track:
$ systemctl --user reset-failed dev-limit
$ systemd-run --user -p MemoryMax=128M -p MemorySwapMax=64M -p CPUQuota=50% ...
Running as unit: run-r45edf2288f1b4a5caae55057343decc9.service
$ systemd-cgtop /user.slice/user-$(id -u).slice/user@$(id -u).service/run-r45edf2288f1b4a5caae55057343decc9.service
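Copying the transient name by hand gets tedious, so it can be extracted from the "Running as unit:" line instead. This is a sketch that assumes the message format shown above (some systemd versions append further text after a semicolon, which the pattern drops); unit_from_run_output is a hypothetical helper, and redirecting stderr with 2>&1 in the usage line is an assumption about where systemd-run prints the message.

```shell
#!/bin/sh
# unit_from_run_output (hypothetical helper): read systemd-run's output on
# stdin and print the unit name from the first "Running as unit:" line.
unit_from_run_output() {
    sed -n 's/^Running as unit: \([^ ;]*\).*$/\1/p' | head -n 1
}

# Usage (my-command is a placeholder for the process to constrain):
#   unit=$(systemd-run --user -p MemoryMax=128M my-command 2>&1 | unit_from_run_output)
#   systemctl --user status "$unit"
```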
Verifying constraints within your launched scopes/units can be done via:
systemctl --user show foo.scope (or the relevant unit name, e.g. dev-limit.service)
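The show output is a long list of Key=Value lines, with memory limits given in bytes (or infinity when unset). A short sketch for pulling out one property and making it readable; show_value and to_mib are hypothetical helper names.

```shell
#!/bin/sh
# show_value (hypothetical helper): read "Key=Value" lines on stdin and
# print the value of the requested property.
show_value() {
    sed -n "s/^$1=//p"
}

# to_mib (hypothetical helper): convert a byte count to whole MiB,
# passing through the "infinity" value used for an unset limit.
to_mib() {
    case "$1" in
        infinity) echo "unlimited" ;;
        *)        echo "$(( $1 / 1048576 ))M" ;;
    esac
}

# Usage:
#   to_mib "$(systemctl --user show dev-limit.service | show_value MemoryMax)"
```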
Finally, once you have your constrained process running inside a scope or a unit, we can further tweak the resource constraints by explicitly referencing the container the process runs in:
$ systemctl --user set-property run-r45edf2288f1b4a5caae55057343decc9.service MemoryHigh=32M
$ systemctl --user set-property run-rabadb09ecbdc47e18fa64ed504869134.scope MemoryHigh=32M
# adjusting at the scope level affects all its children
$ systemctl --user set-property foo.scope MemoryHigh=32M
It is worthwhile restating that constraints are top down: meaning if a parent slice (for example) has been constrained then any child/grandchild etc. will be similarly constrained even if it requests a higher limit.
This leads nicely to the superuser's ability to dynamically limit hungry user processes using the same mechanisms.
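The top-down rule amounts to taking the smallest limit found anywhere between the root and the process's own cgroup. A minimal sketch of that arithmetic, assuming limits are given in bytes with the literal max meaning "no limit" (the notation cgroups v2 uses in its memory.max files); effective_limit is a hypothetical helper.

```shell
#!/bin/sh
# effective_limit (hypothetical helper): given the limits of a cgroup and
# all of its ancestors (bytes, or "max" for unlimited), print the limit
# that actually applies, which is the smallest one.
effective_limit() {
    eff=max
    for lim in "$@"; do
        [ "$lim" = max ] && continue
        if [ "$eff" = max ] || [ "$lim" -lt "$eff" ]; then
            eff=$lim
        fi
    done
    echo "$eff"
}

# Example: a slice capped at 32M containing a service that asks for 128M.
#   effective_limit 33554432 134217728
```

In that example the service is still held to the 32M of its parent slice, whatever it requested for itself.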
Limiting services
The above discussion is also applicable to system-wide services: assume we want to add memory limits to the forked-daapd service.
$ systemctl status forked-daapd
● forked-daapd.service - DAAP/DACP (iTunes) and MPD server, supporting AirPlay and Spotify
Loaded: loaded (/lib/systemd/system/forked-daapd.service; enabled; vendor preset: enabled)
Active: active (running) since Tue 2018-10-23 23:42:29 BST; 1 weeks 1 days ago
Docs: man:forked-daapd(8)
Main PID: 2876 (forked-daapd)
CGroup: /system.slice/forked-daapd.service
└─2876 /usr/sbin/forked-daapd -f
To enable this we have to add an override service file:
$ vi /etc/systemd/system/forked-daapd.service
.include /lib/systemd/system/forked-daapd.service
[Service]
MemoryMax=256M
MemorySwapMax=8M
# restart
$ systemctl daemon-reload && \
systemctl restart forked-daapd && \
systemctl status forked-daapd
● forked-daapd.service - DAAP/DACP (iTunes) and MPD server, supporting AirPlay and Spotify
Loaded: loaded (/etc/systemd/system/forked-daapd.service; enabled; vendor preset: enabled)
Active: active (running) since Thu 2018-11-01 20:08:37 GMT; 10s ago
Docs: man:forked-daapd(8)
Main PID: 28080 (forked-daapd)
Memory: 28.0M (max: 256.0M swap max: 8.0M)
CGroup: /system.slice/forked-daapd.service
└─28080 /usr/sbin/forked-daapd -f
And we're done.
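As an alternative to the full override file used above (this is a different technique, not the one this post followed), systemd also reads drop-in snippets from a <unit>.d directory, which avoids the .include line. A sketch under that assumption; write_memory_dropin is a hypothetical helper, and the optional fourth argument exists only so the function can be tried against a scratch directory.

```shell
#!/bin/sh
# write_memory_dropin (hypothetical helper): write a drop-in that sets
# memory limits for a unit.
#   $1 = unit name, $2 = MemoryMax, $3 = MemorySwapMax
#   $4 = root directory (optional, defaults to /etc/systemd/system)
write_memory_dropin() {
    unit="$1"
    mem_max="$2"
    swap_max="$3"
    root="${4:-/etc/systemd/system}"
    dir="$root/$unit.d"
    mkdir -p "$dir" &&
        printf '[Service]\nMemoryMax=%s\nMemorySwapMax=%s\n' \
            "$mem_max" "$swap_max" > "$dir/override.conf"
}

# Usage (as root), followed by the same reload and restart as above:
#   write_memory_dropin forked-daapd.service 256M 8M
#   systemctl daemon-reload && systemctl restart forked-daapd
```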