Re: [RFC] Control Groups Roadmap ideas [message #29404 is a reply to message #29398]
From: Balbir Singh
Date: Sat, 12 April 2008 05:10

On Fri, Apr 11, 2008 at 8:18 PM, Serge E. Hallyn <serue@us.ibm.com> wrote:
>
> Quoting Paul Menage (menage@google.com):
>  > This is a list of some of the sub-projects that I'm planning for
>  > Control Groups, or that I know others are planning on or working on.
>  > Any comments or suggestions are welcome.
>  >
>  >
>  > 1) Stateless subsystems
>  > -----
>  >
>  > This was motivated by the recent "freezer" subsystem proposal, which
>  > included a facility for sending signals to all members of a cgroup.
>  > This wasn't specifically freezer-related, and wasn't even something
>  > that needed particular per-cgroup state - its only state is that set
>  > of processes, which is already tracked by cgroups. So it could
>  > theoretically be mounted on multiple hierarchies at once, and wouldn't
>  > need an entry in the css_set array.
>  >
>  > This would require a few internal plumbing changes in cgroups, in particular:
>  >
>  > - hashing css_set objects based on their cgroups rather than their css pointers
>  > - allowing stateless subsystems to be in multiple hierarchies
>  > - changing the way hierarchy ids are calculated - simply ORing
>  > together the subsystem bits would no longer work, since that could
>  > result in duplicates
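
The hashing change seems mechanical enough. As a rough sketch (the
function and layout below are illustrative, not the actual cgroup
internals), hashing the cgroup pointers instead of the css pointers
keeps css_sets distinct even when a stateless subsystem contributes no
css of its own:

/*
 * Sketch only: assumes a css_set that records one cgroup pointer
 * per hierarchy, with "n" hierarchies in total.
 */
static unsigned long css_set_hash(struct cgroup **cgroups, int n)
{
        unsigned long key = 0;
        int i;

        for (i = 0; i < n; i++)
                key += (unsigned long)cgroups[i];
        return key;
}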
>  >
>  > 2) More flexible binding/unbinding/rebinding
>  > -----
>  >
>  > Currently you can only add/remove subsystems to a hierarchy when it
>  > has just a single (root) cgroup. This is a bit inflexible, so I'm
>  > planning to support:
>  >
>  > - adding a subsystem to an existing hierarchy by automatically
>  > creating a subsys state object for the new subsystem for each existing
>  > cgroup in the hierarchy and doing the appropriate
>  > can_attach()/attach_tasks() callbacks for all tasks in the system
>  >
>  > - removing a subsystem from an existing hierarchy by moving all tasks
>  > to that subsystem's root cgroup and destroying the child subsystem
>  > state objects
>  >
>  > - merging two existing hierarchies that have identical cgroup trees
>  >
>  > - (maybe) splitting one hierarchy into two separate hierarchies
>  >
>  > Whether all these operations should be forced through the mount()
>  > system call, or whether they should be done via operations on cgroup
>  > control files, is something I've not figured out yet.
>
>  I'm tempted to ask what the use case is for this (I assume you have one,
>  you don't generally introduce features for no good reason), but it
>  doesn't sound like this would have any performance effect on the general
>  case, so it sounds good.
>
>  I'd stick with mount semantics.  Just
>         mount -t cgroup -o remount,devices,cpu none /devwh
>  should handle all cases, no?
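
Whichever interface wins, the interesting work is in the rebind path
itself. A very rough sketch of adding a subsystem to a populated
hierarchy (the iterator and lookup helpers here are hypothetical, and
error unwinding is omitted):

/*
 * Sketch only: create a css for every existing cgroup in the
 * hierarchy, then run the attach callbacks for every task.
 */
static int cgroup_bind_subsys(struct cgroupfs_root *root,
                              struct cgroup_subsys *ss)
{
        struct cgroup *cgrp;
        struct task_struct *p;

        for_each_cgroup(cgrp, root) {           /* hypothetical iterator */
                struct cgroup_subsys_state *css = ss->create(ss, cgrp);

                if (IS_ERR(css))
                        return PTR_ERR(css);    /* real code must unwind */
                cgrp->subsys[ss->subsys_id] = css;
        }

        for_each_process(p) {                   /* locking omitted */
                struct cgroup *cg = task_cgroup_in(p, root); /* hypothetical */

                if (ss->can_attach && ss->can_attach(ss, cg, p))
                        continue;               /* subsystem refused this task */
                if (ss->attach)
                        ss->attach(ss, cg, NULL, p);
        }
        return 0;
}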
>
>
>
>  > 3) Subsystem dependencies
>  > -----
>  >
>  > This would be a fairly simple change, essentially allowing one
>  > subsystem to require that it only be mounted on a hierarchy when some
>  > other subsystem was also present. The implementation would probably be
>  > a callback that allows a subsystem to confirm whether it's prepared to
>  > be included in a proposed hierarchy containing a specified subsystem
>  > bitmask; it would be able to prevent the hierarchy from being created
>  > by giving an error return. An example of a use for this would be a
>  > swap subsystem that is mostly independent of the memory controller,
>  > but uses the page-ownership tracking of the memory controller to
>  > determine which cgroup to charge swap pages to. Hence it would require
>  > that it only be mounted on a hierarchy that also included a memory
>  > controller. The memory controller would make no such requirement by
>  > itself, so could be used on its own without the swap controller.
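
For what it's worth, the dependency callback could be as small as a
bitmask test. A hypothetical shape (this callback does not exist
today; the name is made up):

/*
 * Sketch only: called while a hierarchy is being assembled, with
 * the proposed set of subsystem bits.  A swap controller could
 * refuse any hierarchy that lacks the memory controller whose
 * page-ownership tracking it relies on.
 */
static int swap_cgroup_can_bind(struct cgroup_subsys *ss,
                                unsigned long subsys_bits)
{
        if (!(subsys_bits & (1UL << mem_cgroup_subsys_id)))
                return -EINVAL;
        return 0;
}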
>  >
>  >
>  > 4) Subsystem Inheritance
>  > ------
>  >
>  > This is an idea that I've been kicking around for a while trying to
>  > figure out whether its usefulness is worth the in-kernel complexity,
>  > versus doing it in userspace. It comes from the idea that although
>  > cgroups supports multiple hierarchies so that different subsystems can
>  > see different task groupings, one of the more common uses of this is
>  > (I believe) to support a setup where say we have separate groups A, B
>  > and C for one resource X, but for resource Y we want a group
>  > consisting of A+B+C. E.g. we want individual CPU limits for A, B and
>  > C, but for disk I/O we want them all to share a common limit. This can
>  > be done from userspace by mounting two hierarchies, one for CPU and
>  > one for disk I/O, and creating appropriate groupings, but it could
>  > also be done in the kernel as follows:
>  >
>  > - each subsystem "foo" would have a "foo.inherit" file provided by
>  > (and handled by) cgroups in each group directory
>  >
>  > - setting the foo.inherit flag (i.e. writing 1 to it) would cause
>  > tasks in that cgroup to share the "foo" subsystem state with the
>  > parent cgroup
>  >
>  > - from the subsystem's point of view, it would only need to worry
>  > about its own foo_cgroup objects and which tasks were associated with
>  > each object; the subsystem wouldn't need to care about which tasks
>  > were part of each cgroup, and which cgroups were sharing state; that
>  > would all be taken care of by the cgroup framework
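
The framework's side of that could plausibly stay very small: with
foo.inherit set, it simply resolves a child group to its nearest
non-inheriting ancestor's state. A sketch (the inherit_mask field and
the helper are hypothetical):

/*
 * Sketch only: walk up past any cgroup whose "foo.inherit" flag is
 * set and return the first non-inheriting ancestor's subsys state.
 */
static struct cgroup_subsys_state *
cgroup_effective_css(struct cgroup *cgrp, int subsys_id)
{
        while (cgrp->parent && test_bit(subsys_id, &cgrp->inherit_mask))
                cgrp = cgrp->parent;
        return cgrp->subsys[subsys_id];
}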
>  >
>  > I've mentioned this a couple of times on the containers list as part
>  > of other random discussions; at one point Serge Hallyn expressed some
>  > interest but there's not been much noise about it either way. I
>  > figured I'd include it on this list anyway to see what people think of
>  > it.
>
>  I guess I'm hoping that if libcg goes well then a userspace daemon can
>  do all we need.  Of course the use case I envision is having a container
>  which is locked to some amount of ram, wherein the container admin wants
>  to lock some daemon to a subset of that ram.  If the host admin lets the
>  container admin edit a config file (or talk to a daemon through some
>  socket designated for the container) that will only create a child of the
>  container's cgroup, that's probably great.
>

I thought of doing something like this in libcg (having a daemon and a
client socket interface), but later dropped the idea. Once all
controllers support multiple hierarchy levels well, the plan is to
create a sub-directory in the cgroup hierarchy and give subtree
ownership to the application administrator.
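
The delegation itself needs nothing beyond ordinary filesystem
permissions. A minimal userspace sketch (the path and ids are only
examples):

#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Host admin creates a child group inside the container's cgroup
 * and hands ownership to the container admin, who can then create
 * sub-groups and move tasks within that subtree.
 */
static int delegate_subtree(const char *path, uid_t uid, gid_t gid)
{
        if (mkdir(path, 0755) != 0)
                return -1;
        return chown(path, uid, gid);
}

/* e.g. delegate_subtree("/cgroups/container1/web", 1000, 1000); */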

>  So I'm basically being quiet until I see whether libcg will suffice.
>

If you do have any specific requirements, we can cater to them now;
please do let us know. The biggest challenge at the moment is getting
a stable API.

>
>
>  > 5) "procs" control file
>  > -----
>  >
>  > This would be the equivalent of the "tasks" file, but acting/reporting
>  > on entire thread groups. Not sure exactly what the read semantics
>  > should be if a sub-thread of a process is in the cgroup, but not its
>  > thread group leader.
>  >
>  >
>  > 6) Statistics / binary API
>  > ----
>  >
>  > Balaji Rao is working on a generic way to gather per-subsystem
>  > statistics; it would also be interesting to construct an extensible
>  > binary API via taskstats. One possible way to do this (taken from my
>  > email earlier today) would be:
>  >
>  > With the taskstats interface, we could have operations to:
>  >
>  > - describe the API exported by a given subsystem (automatically
>  > generated, based on its registered control files and their access
>  > methods)
>  >
>  > - retrieve a specified set of stats in a binary format
>  >
>  > So as a concrete example, with the memory, cpuacct and cpu subsystems
>  > configured, the reported API might look something like (in pseudo-code
>  > form)
>  >
>  > 0 : memory.usage_in_bytes : u64
>  > 1 : memory.limit_in_bytes : u64
>  > 2 : memory.failcnt : u64
>  > 3 : memory.stat : map
>  > 4 : cpuacct.usage : u64
>  > 5 : cpu.shares : u64
>  > 6 : cpu.rt_runtime_ms : s64
>  > 7 : cpu.stat : map
>  >
>  > This list would be auto-generated by cgroups based on inspection of
>  > the control files.
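
The auto-generated descriptor records could be very small; something
like the following (illustrative only, not a proposed ABI):

#include <stdint.h>

/* Sketch of one auto-generated descriptor entry. */
enum stat_type { STAT_U64, STAT_S64, STAT_MAP };

struct stat_desc {
        uint16_t id;            /* 0, 1, 2, ... as listed above */
        enum stat_type type;
        const char *name;       /* e.g. "memory.usage_in_bytes" */
};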
>  >
>  > The user could then request stats 0, 3 and 7 for a cgroup to get the
>  > memory.usage_in_bytes, memory.stat and cpu.stat statistics.
>  >
>  > The stats could be returned in a binary format; the format for each
>  > individual stat would depend on the type of that stat, and these could
>  > be simply concatenated together.
>  >
>  > A u64 or s64 stat would simply be a 64-bit value in the data stream
>  >
>  > A map stat would be represented as a sequence of 64-bit values,
>  > representing the values in the map. There would be no need to include
>  > the size of the map or the key ordering in the binary format, since
>  > userspace could determine that by reading the ASCII version of the map
>  > control file once at startup.
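
Decoding on the userspace side would then be almost trivial. A sketch
under those assumptions (the cursor type is illustrative; the map
length and key order come from reading the ASCII file at startup):

#include <stdint.h>

struct stat_stream {
        const uint64_t *p;      /* next word in the binary reply */
};

static uint64_t read_u64(struct stat_stream *s)
{
        return *s->p++;         /* one 64-bit word per u64/s64 stat */
}

static void read_map(struct stat_stream *s, uint64_t *vals, int len)
{
        int i;

        for (i = 0; i < len; i++)
                vals[i] = read_u64(s);  /* "len" words per map stat */
}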
>  >
>  > So in the case of the request above for stats 0, 3 & 7, the binary
>  > stats stream would be a sequence of 64-bit values consisting of:
>  >
> 
...

 