OpenVZ Forum: Devel » [PATCH 00/29] Rename Containers to Control Groups

[PATCH 01/29] task containersv11 basic task container framework [message #20092 is a reply to message #20064]

Tue, 11 September 2007 19:52

Paul Menage

From: Paul Menage <menage@google.com>

Generic Process Control Groups
------------------------------

There have recently been various proposals floating around for
resource management/accounting and other task grouping subsystems in
the kernel, including ResGroups, User BeanCounters, NSProxy
cgroups, and others.  These all need the basic abstraction of being
able to group together multiple processes in an aggregate, in order to
track/limit the resources permitted to those processes, or control
other behaviour of the processes, and all implement this grouping in
different ways.

This patchset provides a framework for tracking and grouping processes
into arbitrary "cgroups" and assigning arbitrary state to those
groupings, in order to control the behaviour of the cgroup as an
aggregate.

The intention is that the various resource management and
virtualization/cgroup efforts can also become task cgroup
clients, with the result that:

- the userspace APIs are (somewhat) normalised

- it's easier to test e.g. the ResGroups CPU controller in
 conjunction with the BeanCounters memory controller, or use either of
 them as the resource-control portion of a virtual server system.

- the additional kernel footprint of any of the competing resource
 management systems is substantially reduced, since it doesn't need
 to provide process grouping/containment, hence improving their
 chances of getting into the kernel



This patch:

Add the main task cgroups framework - the cgroup filesystem, and the
basic structures for tracking membership and associating subsystem state
objects to tasks.

Signed-off-by: Paul Menage <menage@google.com>
Cc: Serge E. Hallyn <serue@us.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Jackson <pj@sgi.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 Documentation/cgroups.txt     |  526 ++++++++++++
 include/linux/cgroup.h        |  214 +++++
 include/linux/cgroup_subsys.h |   10 
 include/linux/magic.h         |    1 
 include/linux/sched.h         |   34 
 init/Kconfig                  |    8 
 init/main.c                   |    3 
 kernel/Makefile               |    1 
 kernel/cgroup.c               | 1199 +++++++++++++++++++++++++++++
 9 files changed, 1995 insertions(+), 1 deletion(-)

diff -puN /dev/null Documentation/cgroups.txt
--- /dev/null
+++ a/Documentation/cgroups.txt
@@ -0,0 +1,526 @@
+				CGROUPS
+				-------
+
+Written by Paul Menage <menage@google.com> based on Documentation/cpusets.txt
+
+Original copyright statements from cpusets.txt:
+Portions Copyright (C) 2004 BULL SA.
+Portions Copyright (c) 2004-2006 Silicon Graphics, Inc.
+Modified by Paul Jackson <pj@sgi.com>
+Modified by Christoph Lameter <clameter@sgi.com>
+
+CONTENTS:
+=========
+
+1. Control Groups
+  1.1 What are cgroups ?
+  1.2 Why are cgroups needed ?
+  1.3 How are cgroups implemented ?
+  1.4 What does notify_on_release do ?
+  1.5 How do I use cgroups ?
+2. Usage Examples and Syntax
+  2.1 Basic Usage
+  2.2 Attaching processes
+3. Kernel API
+  3.1 Overview
+  3.2 Synchronization
+  3.3 Subsystem API
+4. Questions
+
+1. Control Groups
+=================
+
+1.1 What are cgroups ?
+----------------------
+
+Control Groups provide a mechanism for aggregating/partitioning sets of
+tasks, and all their future children, into hierarchical groups with
+specialized behaviour.
+
+Definitions:
+
+A *cgroup* associates a set of tasks with a set of parameters for one
+or more subsystems.
+
+A *subsystem* is a module that makes use of the task grouping
+facilities provided by cgroups to treat groups of tasks in
+particular ways. A subsystem is typically a "resource controller" that
+schedules a resource or applies per-cgroup limits, but it may be
+anything that wants to act on a group of processes, e.g. a
+virtualization subsystem.
+
+A *hierarchy* is a set of cgroups arranged in a tree, such that
+every task in the system is in exactly one of the cgroups in the
+hierarchy, and a set of subsystems; each subsystem has system-specific
+state attached to each cgroup in the hierarchy.  Each hierarchy has
+an instance of the cgroup virtual filesystem associated with it.
+
+At any one time there may be multiple active hierarchies of task
+cgroups. Each hierarchy is a partition of all tasks in the system.
+
+User level code may create and destroy cgroups by name in an
+instance of the cgroup virtual file system, specify and query to
+which cgroup a task is assigned, and list the task pids assigned to
+a cgroup. Those creations and assignments only affect the hierarchy
+associated with that instance of the cgroup file system.
+
+On their own, the only use for cgroups is for simple job
+tracking. The intention is that other subsystems hook into the generic
+cgroup support to provide new attributes for cgroups, such as
+accounting/limiting the resources which processes in a cgroup can
+access. For example, cpusets (see Documentation/cpusets.txt) allows
+you to associate a set of CPUs and a set of memory nodes with the
+tasks in each cgroup.
+
+1.2 Why are cgroups needed ?
+----------------------------
+
+There are multiple efforts to provide process aggregations in the
+Linux kernel, mainly for resource tracking purposes. Such efforts
+include cpusets, CKRM/ResGroups, UserBeanCounters, and virtual server
+namespaces. These all require the basic notion of a
+grouping/partitioning of processes, with newly forked processes ending
+in the same group (cgroup) as their parent process.
+
+The kernel cgroup patch provides the minimum essential kernel
+mechanisms required to efficiently implement such groups. It has
+minimal impact on the system fast paths, and provides hooks for
+specific subsystems such as cpusets to provide additional behaviour as
+desired.
+
+Multiple hierarchy support is provided to allow for situations where
+the division of tasks into cgroups is distinctly different for
+different subsystems - having parallel hierarchies allows each
+hierarchy to be a natural division of tasks, without having to handle
+complex combinations of tasks that would be present if several
+unrelated subsystems needed to be forced into the same tree of
+cgroups.
+
+At one extreme, each resource controller or subsystem could be in a
+separate hierarchy; at the other extreme, all subsystems
+would be attached to the same hierarchy.
+
+As an example of a scenario (originally proposed by vatsa@in.ibm.com)
+that can benefit from multiple hierarchies, consider a large
+university server with various users - students, professors, system
+tasks etc. The resource planning for this server could be along the
+following lines:
+
+       CPU :           Top cpuset
+                       /       \
+               CPUSet1         CPUSet2
+                  |              |
+               (Profs)         (Students)
+
+               In addition (system tasks) are attached to topcpuset (so
+               that they can run anywhere) with a limit of 20%
+
+       Memory : Professors (50%), students (30%), system (20%)
+
+       Disk : Prof (50%), students (30%), system (20%)
+
+       Network : WWW browsing (20%), Network File System (60%), others (20%)
+                               / \
+                       Prof (15%) students (5%)
+
+Browsers like firefox/lynx go into the WWW network class, while (k)nfsd go
+into the NFS network class.
+
+At the same time firefox/lynx will share an appropriate CPU/Memory class
+depending on who launched it (prof/student).
+
+With the ability to classify tasks differently for different resources
+(by putting those resource subsystems in different hierarchies), the
+admin can easily set up a script which receives exec notifications
+and, depending on who is launching the browser, runs:
+
+       # echo browser_pid > /mnt/<restype>/<userclass>/tasks
+
+With only a single hierarchy, he would potentially have to create
+a separate cgroup for every browser launched and associate it with the
+appropriate network and other resource classes.  This may lead to a
+proliferation of such cgroups.
+
+Also, let's say that the administrator would like to give enhanced network
+access temporarily to a student's browser (since it is night and the user
+wants to do online gaming :)  OR give one of the student's simulation
+apps enhanced CPU power.
+
+With the ability to write pids directly to resource classes, it's just a
+matter of:
+
+       # echo pid > /mnt/network/<new_class>/tasks
+       (after some time)
+       # echo pid > /mnt/network/<orig_class>/tasks
+
+Without this ability, he would have to split the cgroup into
+multiple separate ones and then associate the new cgroups with the
+new resource classes.
+
+
+
+1.3 How are cgroups implemented ?
+---------------------------------
+
+Control Groups extend the kernel as follows:
+
+ - Each task in the system has a reference-counted pointer to a
+   css_set.
+
+ - A css_set contains a set of reference-counted pointers to
+   cgroup_subsys_state objects, one for each cgroup subsystem
+   registered in the system. There is no direct link from a task to
+   the cgroup of which it's a member in each hierarchy, but this
+   can be determined by following pointers through the
+   cgroup_subsys_state objects. This is because accessing the
+   subsystem state is something that's expected to happen frequently
+   and in performance-critical code, whereas operations that require a
+   task's actual cgroup assignments (in particular, moving between
+   cgroups) are less common.
+
+ - A cgroup hierarchy filesystem can be mounted for browsing and
+   manipulation from user space.
+
+ - You

...
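
The css_set arrangement described in section 1.3 above can be modelled in
user space. What follows is only a minimal sketch: every type and function
name in it (struct task, SUBSYS_COUNT, task_subsys_state(), task_cgroup(),
task_set_css_set()) is a simplified assumption made for illustration, not a
definition taken from kernel/cgroup.c or include/linux/cgroup.h, and it
ignores locking entirely. It shows why reading subsystem state is a single
array index, while finding a task's cgroup takes one more pointer hop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space model of the structures described in 1.3. */
enum { SUBSYS_COUNT = 2 };  /* illustrative: 0 = "cpu", 1 = "memory" */

struct cgroup {
    const char *name;
    struct cgroup *parent;          /* NULL for the root of a hierarchy */
};

/* Per-cgroup, per-subsystem state; points back at its cgroup. */
struct cgroup_subsys_state {
    struct cgroup *cgroup;
    int refcount;
};

/* One pointer per registered subsystem, shared by many tasks. */
struct css_set {
    int refcount;                   /* number of tasks using this set */
    struct cgroup_subsys_state *subsys[SUBSYS_COUNT];
};

struct task {
    int pid;
    struct css_set *cgroups;        /* the task's reference-counted css_set */
};

/* Frequent operation: one array index away from the task. */
static struct cgroup_subsys_state *
task_subsys_state(const struct task *t, int subsys_id)
{
    assert(subsys_id >= 0 && subsys_id < SUBSYS_COUNT);
    return t->cgroups->subsys[subsys_id];
}

/* Less common operation: no direct task->cgroup link exists, so the
 * cgroup is reached through the subsystem state object. */
static struct cgroup *task_cgroup(const struct task *t, int subsys_id)
{
    return task_subsys_state(t, subsys_id)->cgroup;
}

/* Point a task at a css_set, taking a reference on the new set and
 * dropping the reference on the old one (no freeing in this sketch). */
static void task_set_css_set(struct task *t, struct css_set *cs)
{
    cs->refcount++;
    if (t->cgroups)
        t->cgroups->refcount--;
    t->cgroups = cs;
}
```

In this model, two tasks that belong to the same cgroups in every hierarchy
would share one css_set and differ only in their pid, which is the space
saving the shared-pointer design is aiming for.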


[Message index]

		[PATCH 00/29] Rename Containers to Control Groups By: Paul Menage on Tue, 11 September 2007 19:52
		[PATCH 14/29] add containerstats v3 By: Paul Menage on Tue, 11 September 2007 19:52
		[PATCH 09/29] task containersv11 shared container subsystem group arrays include fix By: Paul Menage on Tue, 11 September 2007 19:52
		[PATCH 16/29] containers implement namespace tracking subsystem By: Paul Menage on Tue, 11 September 2007 19:52
		[PATCH 24/29] memory controller task migration v7 By: Paul Menage on Tue, 11 September 2007 19:53
		[PATCH 26/29] memory controller add per container lru and reclaim v7 fix By: Paul Menage on Tue, 11 September 2007 19:53
		[PATCH 08/29] task containersv11 shared container subsystem group arrays avoid lockdep warning By: Paul Menage on Tue, 11 September 2007 19:52
		[PATCH 20/29] memory controller resource counters v7 fix By: Paul Menage on Tue, 11 September 2007 19:52
		[PATCH 18/29] memory controller add documentation By: Paul Menage on Tue, 11 September 2007 19:52
		[PATCH 03/29] task containersv11 add tasks file interface By: Paul Menage on Tue, 11 September 2007 19:52
		[PATCH 15/29] add containerstats v3 fix By: Paul Menage on Tue, 11 September 2007 19:52
		[PATCH 04/29] task containersv11 add fork exit hooks By: Paul Menage on Tue, 11 September 2007 19:52
		[PATCH 25/29] memory controller add per container lru and reclaim v7 By: Paul Menage on Tue, 11 September 2007 19:53
		[PATCH 02/29] task containersv11 basic task container framework fix By: Paul Menage on Tue, 11 September 2007 19:52
		[PATCH 23/29] memory controller memory accounting v7 By: Paul Menage on Tue, 11 September 2007 19:53
		Re: [PATCH 23/29] memory controller memory accounting v7 By: Peter Zijlstra on Wed, 12 September 2007 20:56
		Re: [PATCH 23/29] memory controller memory accounting v7 By: Balbir Singh on Thu, 13 September 2007 09:49
		Re: [PATCH 23/29] memory controller memory accounting v7 By: Peter Zijlstra on Thu, 13 September 2007 10:18
		Re: [PATCH 23/29] memory controller memory accounting v7 By: Balbir Singh on Thu, 13 September 2007 10:29
		[PATCH 05/29] task containersv11 add container_clone interface By: Paul Menage on Tue, 11 September 2007 19:52
		[PATCH 29/29] memory controller make page_referenced container aware v7 By: Paul Menage on Tue, 11 September 2007 19:53
		[PATCH 10/29] task containersv11 automatic userspace notification of idle containers By: Paul Menage on Tue, 11 September 2007 19:52
		[PATCH 13/29] task containersv11 simple task container debug info subsystem By: Paul Menage on Tue, 11 September 2007 19:52
		[PATCH 06/29] task containersv11 add procfs interface By: Paul Menage on Tue, 11 September 2007 19:52
		[PATCH 22/29] memory controller accounting setup v7 By: Paul Menage on Tue, 11 September 2007 19:53
		[PATCH 27/29] memory controller oom handling v7 By: Paul Menage on Tue, 11 September 2007 19:53
		[PATCH 17/29] containers implement namespace tracking subsystem fix order of container subsystems in By: Paul Menage on Tue, 11 September 2007 19:52
		[PATCH 28/29] memory controller add switch to control what type of pages to limit v7 By: Paul Menage on Tue, 11 September 2007 19:53
		[PATCH 12/29] task containersv11 example cpu accounting subsystem By: Paul Menage on Tue, 11 September 2007 19:52
		[PATCH 07/29] task containersv11 shared container subsystem group arrays By: Paul Menage on Tue, 11 September 2007 19:52
		[PATCH 01/29] task containersv11 basic task container framework By: Paul Menage on Tue, 11 September 2007 19:52
		Re: [PATCH 01/29] task containersv11 basic task container framework By: Paul Jackson on Sun, 30 September 2007 04:40
		Re: [PATCH 01/29] task containersv11 basic task container framework By: Paul Jackson on Sun, 30 September 2007 05:10
		Re: [PATCH 01/29] task containersv11 basic task container framework By: Paul Jackson on Sun, 30 September 2007 05:14
		Re: [PATCH 01/29] task containersv11 basic task container framework By: Paul Menage on Sun, 30 September 2007 07:10
		Re: [PATCH 01/29] task containersv11 basic task container framework By: akpm on Sun, 30 September 2007 09:03
		Re: [PATCH 01/29] task containersv11 basic task container framework By: Paul Jackson on Sun, 30 September 2007 09:15
		Re: [PATCH 01/29] task containersv11 basic task container framework By: akpm on Sun, 30 September 2007 09:29
		Re: [PATCH 01/29] task containersv11 basic task container framework By: Paul Jackson on Sun, 30 September 2007 09:36
		[PATCH 11/29] task containersv11 make cpusets a client of containers By: Paul Menage on Tue, 11 September 2007 19:52
		Re: [PATCH 11/29] task containersv11 make cpusets a client of containers By: Paul Jackson on Sun, 30 September 2007 06:25
		Re: [PATCH 11/29] task containersv11 make cpusets a client of containers By: Paul Menage on Sun, 30 September 2007 07:11
		Re: [PATCH 11/29] task containersv11 make cpusets a client of containers By: Paul Jackson on Sun, 30 September 2007 07:19
		[PATCH 21/29] memory controller containers setup v7 By: Paul Menage on Tue, 11 September 2007 19:53
		[PATCH 19/29] memory controller resource counters v7 By: Paul Menage on Tue, 11 September 2007 19:52
