OpenVZ Forum


Home » Mailing lists » Devel » Re: [ckrm-tech] [PATCH 7/7] containers (V7): Container interface to nsproxy subsystem
Re: [ckrm-tech] [PATCH 7/7] containers (V7): Container interface to nsproxy subsystem [message #11771 is a reply to message #11749] Thu, 05 April 2007 16:50 Go to previous messageGo to next message
Srivatsa Vaddagiri is currently offline  Srivatsa Vaddagiri
Messages: 241
Registered: August 2006
Senior Member
On Wed, Apr 04, 2007 at 11:57:18AM -0700, Paul Menage wrote:
> > Very true. That's a bug and can be rectified. Atm however in my patch
> > stack, I have dropped this whole find_nsproxy() and instead create a new
> > nsproxy whenever tasks move ..Not the best I agree on long run.
>
> Or even short term, I think - although it won't hurt too much for
> cases where a single process enters a container and all children stay
> in the container, it'll hurt a lot in cases where you ever move the
> contents of one container into another. (Possibly in an orthogonal
> hierarchy).
>
> BTW, what does rcfs do if a subsystem is registered when there are
> already mounted hierarchies? It looks as though it leaves the relevant
> pointers as NULLs.

The approach I am on currently doesnt deal with dynamically loaded
modules ..Partly because it allows subsystem ids to be compile-time
decided and also init_nsproxy.ctlr_data[] can be initialised to default
values at compile time itself.

For ex: init_nsproxy.ctlr_data[CPUSET_ID] can be set to &top_cpuset
at compile time.

If there is demand later, I agree we need to address the shortcoming you
point out ..

> I think you're only right if you're going to overload the nsproxy
> count field as the total refcount, rather than the number of tasks. If
> you do that then you risk having to allocate way more memory than you
> need in any routine that tries to allocate an array with one pointer
> per task in the container (e.g. when reading the tasks file, or moving
> a task into a new cpuset). My patch separates out the refcounts from
> the tasks counts for exactly that reason - there's a per-subsys_state
> refcount which can be cheaply manipulated via the subsystem. (I have a
> version with lower locking requirements that I've not yet posted).

We have been thr' this in a diff thread. I trust we have discussed all
that need to be discussed here?

> > Again note that I am not hell-bent on avoiding a container-like object
> > to store shared state of a group. My main desire was to avoid additional
> > pointer in task_struct and reuse nsproxy if possible. If the grand scheme of
> > things requires a 'struct container' object in addition to reusing ->nsproxy,
> > to store shared state like 'notify_on_release', 'hierarchical information'
> > then I have absolutely no objection.
>
> OK, but at that point you're basically back at my patches, but using
> nsproxy rather than container_group.

Ok ..by posting rcfs patches, I didn't mean to introduce a "yours" and
"mine" rift ..honestly. In fact you would notice that they have your
(sole) copyright still on them! It took me just two days to convert over the
patches to use nsproxy and come up with the rcfs patches and obviously I
couldnt have done that without your excellent patches to start with.

So if there is is larger interest in using nsproxy at some point, I hope
they serve as some reference patches to do the conversion. I would go ahead
and post what I have now (which incorporates several bug fixes) which
could be used as you deem necessary at a later point if/when considering
moving to nsproxy.

> > Why would it mean duplicating code? A generic function which takes a
> > dentry pointer and returns its vfs path will avoid this code
> > duplication?
>
> It's still adding boilerplate (extra pointers, extra /proc files,
> logic to hook them together) to every subsystem that needs this, that
> could be avoided. But I guess this aspect of it isn't a huge issue.

I agree that is not a big issue now (considering there are larger things
to go after!).

> > > > Did you mean to say "when the number of aggregators sharing the same
> > > > container object are more" ?
> > >
> > > Yes. Although having thought about the possibility of null groupings
> > > that I described above, I'm no longer convinced that argument is
> > > valid.
> >
> > I think the null grouping as defined so far is very fuzzy ..How would
> > the kernel use this grouping, which would require reserving N pointers
> > in 'struct container_group'/'struct nsproxy'?
>
> The kernel wouldn't use it, other than to act as an inescapable task
> grouping whose members could always be easily listed from userspace.

I am still trying to come to terms with this null groupings and how they
would be used in real life.

- Can you list a real world use of it?

- If they are "inescapable" task groups, how does the first task enter
such a group, using just the filesystem interface?

- If there is no real kernel use for such groups, can this be
implemented in userspace (user ids, session ids etc)

- If userspace soln is not feasible, why would you want such null
groupings in multiple hierarchies and not in one hierarchy?
If having them in one hierarchy satisfies the requirements,
then we could create empty cpusets at top level (or at an
appropriate subcpuset level) and use them as null groups?
By defining permissionss of the null-group cpuset directories,
we could restrict movement of tasks across groups ?

--
Regards,
vatsa
Re: [ckrm-tech] [PATCH 7/7] containers (V7): Container interface to nsproxy subsystem [message #11774 is a reply to message #11771] Thu, 05 April 2007 17:14 Go to previous messageGo to next message
Srivatsa Vaddagiri is currently offline  Srivatsa Vaddagiri
Messages: 241
Registered: August 2006
Senior Member
On Thu, Apr 05, 2007 at 10:27:52PM +0530, Srivatsa Vaddagiri wrote:
> Ok ..by posting rcfs patches, I didn't mean to introduce a "yours" and
> "mine" rift ..honestly. In fact you would notice that they have your
> (sole) copyright still on them! It took me just two days to convert over the
> patches to use nsproxy and come up with the rcfs patches and obviously I
> couldnt have done that without your excellent patches to start with.

By doing rcfs, I have been able to do a much better review of your
patches than by just browsing thr' the code. I hope you would find usefull some
of the feedback I have sent so far on the container patches ..

--
Regards,
vatsa
Re: [ckrm-tech] [PATCH 7/7] containers (V7): Container interface to nsproxy subsystem [message #11800 is a reply to message #11771] Fri, 06 April 2007 21:54 Go to previous message
Paul Menage is currently offline  Paul Menage
Messages: 642
Registered: September 2006
Senior Member
On 4/5/07, Srivatsa Vaddagiri <vatsa@in.ibm.com> wrote:
>
> The approach I am on currently doesnt deal with dynamically loaded
> modules ..Partly because it allows subsystem ids to be compile-time
> decided

Yes, that part is definitely a good idea, since it removes one of the
potential performance complaints that people have compared to
hard-coded pointers in a structure.

I've reworked my patches to require subsystems to be declared at compile time.

> and also init_nsproxy.ctlr_data[] can be initialised to default
> values at compile time itself.

> Ok ..by posting rcfs patches, I didn't mean to introduce a "yours" and
> "mine" rift ..honestly. In fact you would notice that they have your
> (sole) copyright still on them! It took me just two days to convert over the
> patches to use nsproxy and come up with the rcfs patches and obviously I
> couldnt have done that without your excellent patches to start with.

OK, sorry if I came across as possessive :-) There are definitely some
great ideas in your patches, some of which I've incorporated in my
patches as you'll see when I send them out later this afternoon

>
> I am still trying to come to terms with this null groupings and how they
> would be used in real life.
>
> - Can you list a real world use of it?

As a simple job tracking mechanism, without any other implications.

>
> - If they are "inescapable" task groups, how does the first task enter
> such a group, using just the filesystem interface?

Root would be able to move tasks around between containers, as normal.

>
> - If there is no real kernel use for such groups, can this be
> implemented in userspace (user ids, session ids etc)

Not cleanly. (Multiple jobs with the same user, session ids can be
changed by the user). Currently in the job-control system I'm working
on here, I was tagging any processes introduced in a job with a
job-unique extra gid, so we could identify which job a process was in
by looking at its group list. But that's a bit ugly.

In a more modern kernel we can just use cpusets without bothering to
make distinctions between the memory and cpus in different cpusets,
but it seems ugly to have to use a more heavyweight solution than you
really need.

In practice, this would be more of a toy/example, since anyone doing
job control probably is interested in at least some rudimentary kind
of resource tracking/control.

Paul
Re: [ckrm-tech] [PATCH 7/7] containers (V7): Container interface to nsproxy subsystem [message #18067 is a reply to message #11712] Tue, 03 April 2007 16:19 Go to previous message
Paul Menage is currently offline  Paul Menage
Messages: 642
Registered: September 2006
Senior Member
On 4/3/07, Kirill Korotaev <dev@sw.ru> wrote:
>
> Sounds reasonable.
> Pavel is preparing new RSS patches on top of your patches
> which take into account other comments and Andrew objections.
> Can we cooperate to send it altogher then?

Sure. Do you want to send me any core container changes that you have?

Paul
_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers
Re: [ckrm-tech] [PATCH 7/7] containers (V7): Container interface to nsproxy subsystem [message #18071 is a reply to message #18067] Wed, 04 April 2007 08:30 Go to previous message
xemul is currently offline  xemul
Messages: 248
Registered: November 2005
Senior Member
Paul Menage wrote:
> On 4/3/07, Kirill Korotaev <dev@sw.ru> wrote:
>>
>> Sounds reasonable.
>> Pavel is preparing new RSS patches on top of your patches
>> which take into account other comments and Andrew objections.
>> Can we cooperate to send it altogher then?
> 
> Sure. Do you want to send me any core container changes that you have?

Actually I don't have any right now. Everything is built at
the top of 2.6.20 patches you sent.

Maybe you will send me (give an URL) the preliminary patches
for more fresh kernel and I'll check what is still needed
for RSS container?

I remind that I had only two problems with containers:
1. fork hook was called to early - before the new task
   initializes completely
2. early need for rss containers (earlier than initcalls
   are called)

> Paul
> 
_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers
Re: [ckrm-tech] [PATCH 7/7] containers (V7): Container interface to nsproxy subsystem [message #18072 is a reply to message #18071] Thu, 05 April 2007 07:06 Go to previous message
Paul Menage is currently offline  Paul Menage
Messages: 642
Registered: September 2006
Senior Member
On 4/4/07, Pavel Emelianov <xemul@sw.ru> wrote:
> Paul Menage wrote:
> > On 4/3/07, Kirill Korotaev <dev@sw.ru> wrote:
> >>
> >> Sounds reasonable.
> >> Pavel is preparing new RSS patches on top of your patches
> >> which take into account other comments and Andrew objections.
> >> Can we cooperate to send it altogher then?
> >
> > Sure. Do you want to send me any core container changes that you have?
>
> Actually I don't have any right now. Everything is built at
> the top of 2.6.20 patches you sent.
>
> Maybe you will send me (give an URL) the preliminary patches
> for more fresh kernel and I'll check what is still needed
> for RSS container?
>
> I remind that I had only two problems with containers:
> 1. fork hook was called to early - before the new task
>    initializes completely

OK, looking at this, it's not clear where the best place to call it
is. "As late as possible" sounds like a reasonable idea, but that
messes up the current support for the nsproxy container subsystem,
which wants to be able to move the current task into a new container
based on nsproxy unsharing, after we've added the new task to its
parent. I can imagine other subsystems maybe wanting to clone the
current container at fork time too.

That sort of implies that we need to split the container fork
mechanism up into two parts, one early to add the refcount to the
parent's container_group, and one late to handle the callbacks if
desired. But that should be pretty straightforward.

> 2. early need for rss containers (earlier than initcalls
>    are called)

Couldn't you copy the way cpuset_init_early() works?

Paul
_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers
Re: [ckrm-tech] [PATCH 7/7] containers (V7): Container interface to nsproxy subsystem [message #18075 is a reply to message #18072] Thu, 05 April 2007 07:18 Go to previous message
xemul is currently offline  xemul
Messages: 248
Registered: November 2005
Senior Member
Paul Menage wrote:
> On 4/4/07, Pavel Emelianov <xemul@sw.ru> wrote:
>> Paul Menage wrote:
>> > On 4/3/07, Kirill Korotaev <dev@sw.ru> wrote:
>> >>
>> >> Sounds reasonable.
>> >> Pavel is preparing new RSS patches on top of your patches
>> >> which take into account other comments and Andrew objections.
>> >> Can we cooperate to send it altogher then?
>> >
>> > Sure. Do you want to send me any core container changes that you have?
>>
>> Actually I don't have any right now. Everything is built at
>> the top of 2.6.20 patches you sent.
>>
>> Maybe you will send me (give an URL) the preliminary patches
>> for more fresh kernel and I'll check what is still needed
>> for RSS container?
>>
>> I remind that I had only two problems with containers:
>> 1. fork hook was called to early - before the new task
>>    initializes completely
> 
> OK, looking at this, it's not clear where the best place to call it
> is. "As late as possible" sounds like a reasonable idea, but that
> messes up the current support for the nsproxy container subsystem,
> which wants to be able to move the current task into a new container
> based on nsproxy unsharing, after we've added the new task to its
> parent. I can imagine other subsystems maybe wanting to clone the
> current container at fork time too.
> 
> That sort of implies that we need to split the container fork
> mechanism up into two parts, one early to add the refcount to the
> parent's container_group, and one late to handle the callbacks if
> desired. But that should be pretty straightforward.

Splitting sounds good. I'll try to prepare an appropriate patch.

>> 2. early need for rss containers (earlier than initcalls
>>    are called)
> 
> Couldn't you copy the way cpuset_init_early() works?

I did this in my previous version of rss container,
but I think it can be generalized. This is initialization
code that must not be nice-looking, but if it can be it
should be.

> Paul
> 

_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers
Re: [ckrm-tech] [PATCH 7/7] containers (V7): Container interface to nsproxy subsystem [message #18077 is a reply to message #11766] Thu, 05 April 2007 14:13 Go to previous message
Srivatsa Vaddagiri is currently offline  Srivatsa Vaddagiri
Messages: 241
Registered: August 2006
Senior Member
On Thu, Apr 05, 2007 at 06:13:25PM +0530, Srivatsa Vaddagiri wrote:
> Lets go back to the f_bc example here for a moment. Lets say T1 was in C1 and 
> opened file f1. f1->f_bc points to C1->beancounter.
> 
> T1 moves from C1 -> C2, but f1 is not migrated. 
> C1->beancounter.count stays at 1 (to account for f1->f_bc).

Actually C1->beancounter.count should be at 2 (C1->beancounter and
f1->f_bc are pointing to it).

> File f1 is closed. C1->beancounter.count becomes zero.

C1->beancounter.count should go to 1 ..

> Now user issues rmdir C1. If rmdir finds (after taking manage_mutex that
> is)
> 
> 	- zero tasks in C1
> 	- zero refcount in C1->beancounter

s/zero refcount in C1->beancounter/exactly 1 refcount in C1->beancounter

> why is it not safe to assume that C1->beancounter.count will continue to
> stay zero?

s/zero/at one

> Basically I am struggling to answer "How can a zero refcount (beancounter) 
> object go non-zero when zero tasks are attached to it" ..

s/zero/one and s/non-zero/>1

Essentially bc_subsys->can_attach(struct bean_counter *b) can return
-EBUSY if (atomic_read(&b->count) > 1) ..

-- 
Regards,
vatsa
_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers
Previous Topic: Re: network namespace website
Next Topic: Re: [PATCH] net: Add etun driver
Goto Forum:
  


Current Time: Wed May 08 20:37:26 GMT 2024

Total time taken to generate the page: 0.02030 seconds