OpenVZ Forum


Home » Mailing lists » Devel » [PATCH 0/28] Pid namespaces (two models)
Re: [PATCH 16/28] [FLAT 1/6] Changes in data structures for flat model [message #19041 is a reply to message #18942] Wed, 20 June 2007 17:24 Go to previous messageGo to next message
ebiederm is currently offline  ebiederm
Messages: 1354
Registered: February 2006
Senior Member
Pavel Emelianov <xemul@openvz.org> writes:

> Eric W. Biederman wrote:
>> Pavel Emelianov <xemul@openvz.org> writes:
>> 
>
> [snip]
>
>>>> | --- ./include/linux/sched.h.flatdatast 2007-06-15 15:14:33.000000000 +0400
>>>> | +++ ./include/linux/sched.h	2007-06-15 15:19:14.000000000 +0400
>>>> | @@ -482,7 +482,10 @@ struct signal_struct {
>>>> |  		pid_t session __deprecated;
>>>> |  		pid_t __session;
>>>> |  	};
>>>> | -
>>>> | +#ifdef CONFIG_PID_NS_FLAT
>>>> | +	pid_t vpgrp;
>>>> | +	pid_t vsession;
>>>> | +#endif
>>>> |  	/* boolean value for session group leader */
>>>> |  	int leader;
>>>> | 
>>>> | @@ -944,6 +947,11 @@ struct task_struct {
>>>> |  	unsigned did_exec:1;
>>>> |  	pid_t pid;
>>>> |  	pid_t tgid;
>>>> | +#ifdef CONFIG_PID_NS_FLAT
>>>> | +	/* hash the virtual ids as well */
>>>> | +	pid_t vpid;
>>>> | +	pid_t vtgid;
>>>> | +#endif
>> 
>> Adding vpgrp, vsession, vpid, and vtgid is wrong.
>> 
>> A case can probably be made for caching the common case (users view),
>> but we already have fields for that.
>> 
>> For a global view we must use struct pid *, otherwise we are just asking
>> for trouble.
>
> Nope. If we have global unique numerical pid we're not asking for
> trouble. We're just making kernel work like it always did. Virtual
> pid makes sense *only* when interacting with user level.
>
> Making task->pid virtual is asking for trouble.

I'm not talking about making it virtual.  Virtualization is the wrong
concept.  We aren't virtualizing something that isn't already
virtualized.  We are changing the implementation of an abstraction.

I'm talking about keeping task->pid as what the user space sees.  Full stop.



The only point of even retaining task->pid or any of the other numeric
pid identifiers on struct task_struct, struct signal_struct in data
structures outside of struct pid is as an optimization.  To make it
clearer what we are doing we may want to rename the fields in
task_struct and signal_struct but in no sense can I see use needing
to cache anything except the common case which is what user space
sees.



None of the implementations in the kernel needs a global numeric
identifier for processes.  In all cases a pointer to a struct pid is
just as good from a uniqueness perspective.  And all of the heavy
lifting has already been done, to change the in kernel users to use
struct pid pointers.  There are only a few remaining cases that need to be
touched now and those cases are the cases where semantically things
matter.


There is very much a human interface issue with needing global numeric
process ids (as single dense namespace for things is easier for people
to wrap their heads around).   However we have global numeric ids that
live in struct pid, and show up in the init_pid_namespace.  So that
is not an issue.


Given this conversation I think it is time to investigate if the
optimization of reading pid values from the task struct instead of
struct pid is measurable.

Eric
_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers
Re: [PATCH 16/28] [FLAT 1/6] Changes in data structures for flat model [message #19042 is a reply to message #18942] Wed, 20 June 2007 17:36 Go to previous messageGo to next message
ebiederm is currently offline  ebiederm
Messages: 1354
Registered: February 2006
Senior Member
Pavel Emelianov <xemul@openvz.org> writes:

> Eric W. Biederman wrote:
>> Pavel Emelianov <xemul@openvz.org> writes:
>> 
>
> [snip]
>
>>>> | --- ./include/linux/sched.h.flatdatast 2007-06-15 15:14:33.000000000 +0400
>>>> | +++ ./include/linux/sched.h	2007-06-15 15:19:14.000000000 +0400
>>>> | @@ -482,7 +482,10 @@ struct signal_struct {
>>>> |  		pid_t session __deprecated;
>>>> |  		pid_t __session;
>>>> |  	};
>>>> | -
>>>> | +#ifdef CONFIG_PID_NS_FLAT
>>>> | +	pid_t vpgrp;
>>>> | +	pid_t vsession;
>>>> | +#endif
>>>> |  	/* boolean value for session group leader */
>>>> |  	int leader;
>>>> | 
>>>> | @@ -944,6 +947,11 @@ struct task_struct {
>>>> |  	unsigned did_exec:1;
>>>> |  	pid_t pid;
>>>> |  	pid_t tgid;
>>>> | +#ifdef CONFIG_PID_NS_FLAT
>>>> | +	/* hash the virtual ids as well */
>>>> | +	pid_t vpid;
>>>> | +	pid_t vtgid;
>>>> | +#endif
>> 
>> Adding vpgrp, vsession, vpid, and vtgid is wrong.
>> 
>> A case can probably be made for caching the common case (users view),
>> but we already have fields for that.
>> 
>> For a global view we must use struct pid *, otherwise we are just asking
>> for trouble.
>
> Nope. If we have global unique numerical pid we're not asking for
> trouble. We're just making kernel work like it always did. Virtual
> pid makes sense *only* when interacting with user level.
>
> Making task->pid virtual is asking for trouble.

I'm not talking about making it virtual.  Virtualization is the wrong
concept.  We aren't virtualizing something that isn't already
virtualized.  We are changing the implementation of an abstraction.

I'm talking about keeping task->pid as what the user space sees.  Full stop.



The only point of even retaining task->pid or any of the other numeric
pid identifiers on struct task_struct, struct signal_struct in data
structures outside of struct pid is as an optimization.  To make it
clearer what we are doing we may want to rename the fields in
task_struct and signal_struct but in no sense can I see use needing
to cache anything except the common case which is what user space
sees.



None of the implementations in the kernel needs a global numeric
identifier for processes.  In all cases a pointer to a struct pid is
just as good from a uniqueness perspective.  And all of the heavy
lifting has already been done, to change the in kernel users to use
struct pid pointers.  There are only a few remaining cases that need to be
touched now and those cases are the cases where semantically things
matter.


There is very much a human interface issue with needing global numeric
process ids (as single dense namespace for things is easier for people
to wrap their heads around).   However we have global numeric ids that
live in struct pid, and show up in the init_pid_namespace.  So that
is not an issue.


Given this conversation I think it is time to investigate if the
optimization of reading pid values from the task struct instead of
struct pid is measurable.

Eric
_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers
Re: [PATCH 13/28] [PREP 13/14] Miscellaneous preparations in pid namespaces [message #19047 is a reply to message #18939] Wed, 20 June 2007 21:10 Go to previous messageGo to next message
Sukadev Bhattiprolu is currently offline  Sukadev Bhattiprolu
Messages: 413
Registered: August 2006
Senior Member
Pavel Emelianov [xemul@openvz.org] wrote:
| The most important one is moving exit_task_namespaces behind exit_notify
| in do_exit() to make it possible to see the task's pid namespace to
| properly notify the parent.

Hmm. I think we tried this once a few months ago and got a crash in nfsd
See http://lkml.org/lkml/2007/1/17/75

            [<c01f6115>] lockd_down+0x125/0x190
            [<c01d26bd>] nfs_free_server+0x6d/0xd0
            [<c01d8e9c>] nfs_kill_super+0xc/0x20
            [<c0161c5d>] deactivate_super+0x7d/0xa0
            [<c0175e0e>] release_mounts+0x6e/0x80
            [<c0175e86>] __put_mnt_ns+0x66/0x80
            [<c0132b3e>] free_nsproxy+0x5e/0x60
                // exit_task_namespaces() after returning from exit_notify()
            [<c011f021>] do_exit+0x791/0x810
            [<c011f0c6>] do_group_exit+0x26/0x70
            [<c0103142>] sysenter_past_esp+0x5f/0x85

exit_notify() sets current->sighand to NULL and I think lockd_down() called
from exit_task_namespaces/__put_mnt_ns() was accesssing current->sighand.

Do your other patches in this set tweak something to prevent it ?

Thats one of the reasons we had to remove pid_ns from nsproxy and use
the pid_ns from pid->upid_list[i]->pid_ns.

Suka
_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers
Re: [PATCH 16/28] [FLAT 1/6] Changes in data structures for flat model [message #19049 is a reply to message #19019] Wed, 20 June 2007 10:53 Go to previous messageGo to next message
Pavel Emelianov is currently offline  Pavel Emelianov
Messages: 1149
Registered: September 2006
Senior Member
Eric W. Biederman wrote:
> Pavel Emelianov <xemul@openvz.org> writes:
> 

[snip]

>>> | --- ./include/linux/sched.h.flatdatast 2007-06-15 15:14:33.000000000 +0400
>>> | +++ ./include/linux/sched.h	2007-06-15 15:19:14.000000000 +0400
>>> | @@ -482,7 +482,10 @@ struct signal_struct {
>>> |  		pid_t session __deprecated;
>>> |  		pid_t __session;
>>> |  	};
>>> | -
>>> | +#ifdef CONFIG_PID_NS_FLAT
>>> | +	pid_t vpgrp;
>>> | +	pid_t vsession;
>>> | +#endif
>>> |  	/* boolean value for session group leader */
>>> |  	int leader;
>>> | 
>>> | @@ -944,6 +947,11 @@ struct task_struct {
>>> |  	unsigned did_exec:1;
>>> |  	pid_t pid;
>>> |  	pid_t tgid;
>>> | +#ifdef CONFIG_PID_NS_FLAT
>>> | +	/* hash the virtual ids as well */
>>> | +	pid_t vpid;
>>> | +	pid_t vtgid;
>>> | +#endif
> 
> Adding vpgrp, vsession, vpid, and vtgid is wrong.
> 
> A case can probably be made for caching the common case (users view),
> but we already have fields for that.
> 
> For a global view we must use struct pid *, otherwise we are just asking
> for trouble.

Nope. If we have global unique numerical pid we're not asking for
trouble. We're just making kernel work like it always did. Virtual
pid makes sense *only* when interacting with user level.

Making task->pid virtual is asking for trouble.

> Eric

Pavel
_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers
Re: [PATCH 13/28] [PREP 13/14] Miscellaneous preparations in pid namespaces [message #19082 is a reply to message #19047] Fri, 22 June 2007 01:10 Go to previous messageGo to next message
Pavel Emelianov is currently offline  Pavel Emelianov
Messages: 1149
Registered: September 2006
Senior Member
sukadev@us.ibm.com wrote:
> Pavel Emelianov [xemul@openvz.org] wrote:
> | The most important one is moving exit_task_namespaces behind exit_notify
> | in do_exit() to make it possible to see the task's pid namespace to
> | properly notify the parent.
> 
> Hmm. I think we tried this once a few months ago and got a crash in nfsd
> See http://lkml.org/lkml/2007/1/17/75
> 
>             [<c01f6115>] lockd_down+0x125/0x190
>             [<c01d26bd>] nfs_free_server+0x6d/0xd0
>             [<c01d8e9c>] nfs_kill_super+0xc/0x20
>             [<c0161c5d>] deactivate_super+0x7d/0xa0
>             [<c0175e0e>] release_mounts+0x6e/0x80
>             [<c0175e86>] __put_mnt_ns+0x66/0x80
>             [<c0132b3e>] free_nsproxy+0x5e/0x60
>                 // exit_task_namespaces() after returning from exit_notify()
>             [<c011f021>] do_exit+0x791/0x810
>             [<c011f0c6>] do_group_exit+0x26/0x70
>             [<c0103142>] sysenter_past_esp+0x5f/0x85
> 
> exit_notify() sets current->sighand to NULL and I think lockd_down() called
> from exit_task_namespaces/__put_mnt_ns() was accesssing current->sighand.

If sighand is set to NULL and then accessed then how is this related to pid namespace? 

> Do your other patches in this set tweak something to prevent it ?

I think no. I'll check it for my current patches.

> Thats one of the reasons we had to remove pid_ns from nsproxy and use
> the pid_ns from pid->upid_list[i]->pid_ns.
> 
> Suka
> 

_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers
Re: Re: [PATCH 13/28] [PREP 13/14] Miscellaneous preparations in pid namespaces [message #19083 is a reply to message #19047] Fri, 22 June 2007 01:27 Go to previous messageGo to next message
Pavel Emelianov is currently offline  Pavel Emelianov
Messages: 1149
Registered: September 2006
Senior Member
sukadev@us.ibm.com wrote:
> Pavel Emelianov [xemul@openvz.org] wrote:
> | The most important one is moving exit_task_namespaces behind exit_notify
> | in do_exit() to make it possible to see the task's pid namespace to
> | properly notify the parent.
> 
> Hmm. I think we tried this once a few months ago and got a crash in nfsd
> See http://lkml.org/lkml/2007/1/17/75
> 
>             [<c01f6115>] lockd_down+0x125/0x190
>             [<c01d26bd>] nfs_free_server+0x6d/0xd0
>             [<c01d8e9c>] nfs_kill_super+0xc/0x20
>             [<c0161c5d>] deactivate_super+0x7d/0xa0
>             [<c0175e0e>] release_mounts+0x6e/0x80
>             [<c0175e86>] __put_mnt_ns+0x66/0x80
>             [<c0132b3e>] free_nsproxy+0x5e/0x60
>                 // exit_task_namespaces() after returning from exit_notify()
>             [<c011f021>] do_exit+0x791/0x810
>             [<c011f0c6>] do_group_exit+0x26/0x70
>             [<c0103142>] sysenter_past_esp+0x5f/0x85
> 
> exit_notify() sets current->sighand to NULL and I think lockd_down() called
> from exit_task_namespaces/__put_mnt_ns() was accesssing current->sighand.

OK. I've got it. That's easy - no need to give up the nsproxy->pid_ns.
I will show it in the next patches. Hope to send them today.

Thank

> Do your other patches in this set tweak something to prevent it ?
> 
> Thats one of the reasons we had to remove pid_ns from nsproxy and use
> the pid_ns from pid->upid_list[i]->pid_ns.
> 
> Suka
> _______________________________________________
> Containers mailing list
> Containers@lists.linux-foundation.org
> https://lists.linux-foundation.org/mailman/listinfo/containers
> 
_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers
Re: [PATCH 13/28] [PREP 13/14] Miscellaneous preparations in pid namespaces [message #19087 is a reply to message #19082] Fri, 22 June 2007 16:32 Go to previous messageGo to next message
Sukadev Bhattiprolu is currently offline  Sukadev Bhattiprolu
Messages: 413
Registered: August 2006
Senior Member
Pavel Emelianov [xemul@openvz.org] wrote:
| sukadev@us.ibm.com wrote:
| > Pavel Emelianov [xemul@openvz.org] wrote:
| > | The most important one is moving exit_task_namespaces behind exit_notify
| > | in do_exit() to make it possible to see the task's pid namespace to
| > | properly notify the parent.
| > 
| > Hmm. I think we tried this once a few months ago and got a crash in nfsd
| > See http://lkml.org/lkml/2007/1/17/75
| > 
| >             [<c01f6115>] lockd_down+0x125/0x190
| >             [<c01d26bd>] nfs_free_server+0x6d/0xd0
| >             [<c01d8e9c>] nfs_kill_super+0xc/0x20
| >             [<c0161c5d>] deactivate_super+0x7d/0xa0
| >             [<c0175e0e>] release_mounts+0x6e/0x80
| >             [<c0175e86>] __put_mnt_ns+0x66/0x80
| >             [<c0132b3e>] free_nsproxy+0x5e/0x60
| >                 // exit_task_namespaces() after returning from exit_notify()
| >             [<c011f021>] do_exit+0x791/0x810
| >             [<c011f0c6>] do_group_exit+0x26/0x70
| >             [<c0103142>] sysenter_past_esp+0x5f/0x85
| > 
| > exit_notify() sets current->sighand to NULL and I think lockd_down() called
| > from exit_task_namespaces/__put_mnt_ns() was accesssing current->sighand.
| 
| If sighand is set to NULL and then accessed then how is this related to pid namespace? 

Switching the order of exit_notify() and exit_task_namespaces() is what
caused the problem when we did it before.

If you exit_task_namespaces() before exit_notify() as the mainline code
does, you won't see this bc nfsd would have freed its super by then.

| 
| > Do your other patches in this set tweak something to prevent it ?
| 
| I think no. I'll check it for my current patches.

Buried in that thread was a test case to repro the problem. Maybe that
will help.

| 
| > Thats one of the reasons we had to remove pid_ns from nsproxy and use
| > the pid_ns from pid->upid_list[i]->pid_ns.
| > 
| > Suka
| > 
_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers
Re: [PATCH 0/28] Pid namespaces (two models) [message #19132 is a reply to message #19027] Wed, 27 June 2007 04:01 Go to previous message
Herbert Poetzl is currently offline  Herbert Poetzl
Messages: 239
Registered: February 2006
Senior Member
On Tue, Jun 19, 2007 at 04:00:09PM -0700, Andrew Morton wrote:
> On Fri, 15 Jun 2007 19:55:43 +0400
> Pavel Emelianov <xemul@openvz.org> wrote:
> 
> > Long ago Sukadev and I sent two approaches for pid namespaces - the
> > hierarchical model in which namespaces are nested into each other,
> > and the flat model, where pids have only two values and creation of
> > level 3 namespace is prohibited.
> > 
> > After that I showed that multilevel model introduces a noticeable
> > overhead of approximately 1-2% to kernel standard operations like
> > fork() and getpid(). At the same time flat model showed no performance
> > hit on these tests.
> > 
> > Nevertheless multilevel model is worth living.
> > 
> > This set introduces booth models each under its config option. The
> > set is logically splitted into the following parts:
> 
> Making this configurable sounds like a very bad idea to me, from the
> maintainablility/testability/understandability POV.
> 
> We should just make up our minds and do it one way, do it right?
> 
> I assume that means hierarchical.  
> 
> > The following tests were run:
> > [1] nptl perf test
> > [2] getpid() speed
> > [3] ltp (not for speed, but for kernel API checks)
> > 
> > The testing results summary:
> > * Flat model provides zero overhead in init namespace for all the 
> > tests and less than 7% in the namespace for nptl test only.

why do we see 7% overhead in nptl tests?
any idea what actually causes that?

TIA,
Herbert

> > * Multilevel model provides up to 2% overhead in init namespace and
> > more than 10% for nptl test in the level 2 namespace.
> > 
> 
> So that means we take a 3% hit in these operations when PID_NS_MULTILEVEL
> is enabled but the system isn't using containers at all?
> 
> If so, I'm surprised that the cost is this high.  This should be the first
> thing we should optimise and I bet there's some quicky way of doing it.
> 
> _______________________________________________
> Containers mailing list
> Containers@lists.linux-foundation.org
> https://lists.linux-foundation.org/mailman/listinfo/containers
_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers
Previous Topic: [PATCH 1/2] containers: implement subsys->post_clone()
Next Topic: [PATCH 00/17] Pid-NS(V3) Enable multiple pid namespaces
Goto Forum:
  


Current Time: Sat Dec 07 18:56:33 GMT 2024

Total time taken to generate the page: 0.03721 seconds