| Home » Mailing lists » Devel » [PATCH 0/16] Pid namespaces Goto Forum:
	|  |  
	| 
		
			| Re: [PATCH 0/16] Pid namespaces [message #19259 is a reply to message #19238] | Tue, 10 July 2007 07:02   |  
			| 
				
				
					|  Pavel Emelianov Messages: 1149
 Registered: September 2006
 | Senior Member |  |  |  
	| sukadev@us.ibm.com wrote:
> I am not able to find a specific patch that this might be in,
> but what happens when the child-reaper of a container exits ?
The init namespace's init becomes this init's namespace's init :)
In other words:
        if (unlikely(tsk == child_reaper(tsk))) {
                if (tsk->nsproxy->pid_ns != &init_pid_ns)
                        tsk->nsproxy->pid_ns->child_reaper = init_pid_ns.child_reaper;
                else
                        panic("Attempted to kill init!");
        }
> Do you terminate all processes in the container ? I thought
> that was discussed earlier and the consensus was to terminate
> all processes in that container and its subordinate containers.
> 
> Is that not the case now ?
That's the case, but this code works without this change. We can do
it later.
> Suka
> 
> Pavel Emelianov [xemul@openvz.org] wrote:
> | This is "submition for inclusion" of hierarchical, not kconfig
> | configurable, zero overheaded ;) pid namespaces.
> | 
> | The overall idea is the following:
> | 
> | The namespace are organized as a tree - once a task is cloned
> | with CLONE_NEWPIDS (yes, I've also switched to it :) the new
> | namespace becomes the parent's child and tasks living in the
> | parent namespace see the tasks from the new one. The numerical
> | ids are used on the kernel-user boundary, i.e. when we export
> | pid to user we show the id, that should be used to address the
> | task in question from the namespace we're exporting this id to.
> | 
> | The main difference from Suka's patches are the following:
> | 
> | 0. Suka's patches change the kernel/pid.c code too heavy.
> |    This set keeps the kernel code look like it was without
> |    the patches. However, this is a minor issue. The major is:
> | 
> | 1. Suka's approach is to remove the notion of the task's 
> |    numerical pid from the kernel at all. The numbers are 
> |    used on the kernel-user boundary or within the kernel but
> |    with the namespace this nr belongs to. This results in 
> |    massive changes of struct's members fro int pid to struct
> |    pid *pid, task->pid becomes the virtual id and so on and
> |    so forth.
> |    My approach is to keep the good old logic in the kernel. 
> |    The task->pid is a global and unique pid, find_pid() finds
> |    the pid by its global id and so on. The virtual ids appear
> |    on the user-kernel boundary only. Thus drivers and other 
> |    kernel code may still be unaware of pids unless they do not
> |    communicate with the userspace and get/put numerical pids.
> | 
> | And some more minor differences:
> | 
> | 2. Suka's patches have the limit of pid namespace nesting. 
> |    My patches do not.
> | 
> | 3. Suka assumes that pid namespace can live without proc mount
> |    and tries to make the code work with pid_ns->proc_mnt change
> |    from NULL to not-NULL from times to times.
> |    My code calls the kern_mount() at the namespace creation and
> |    thus the pid_namespace always works with proc.
> | 
> | There are some small issues that I can describe if someone is
> | interested.
> | 
> | The tests like nptl perf, unixbench spawn, getpid and others
> | didn't reveal any performance degradation in init_namespace
> | with the RHEL5 kernel .config file. I admit, that different
> | .config-s may show that patches hurt the performance, but the
> | intention was *not* to make the kernel work worse with popular
> | distributions.
> | 
> | This set has some ways to move forward, but this is some kind
> | of a core, that do not change the init_pid_namespace behavior
> | (checked with LTP tests) and may require some hacking to do 
> | with the namespaces only.
> | 
> | Patches apply to 2.6.22-rc6-mm1.
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 
_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers |  
	|  |  |  
	| 
		
			| Re: [PATCH 4/16] Change data structures for pid namespaces [message #19260 is a reply to message #19239] | Tue, 10 July 2007 07:04   |  
			| 
				
				
					|  Pavel Emelianov Messages: 1149
 Registered: September 2006
 | Senior Member |  |  |  
	| sukadev@us.ibm.com wrote:
> Cedric Le Goater [clg@fr.ibm.com] wrote:
> | Pavel Emelianov wrote:
> | > struct pid_namespace will have the kmem_cache to allocate
> | > the pids from, the parent, as they are hierarchical, and
> | > the level of nesting value.
> | > 
> | > struct pid will have a variable length array of pid_number-s
> | > one for each namespace this pid lives in. The level value
> | > shows the level of the namespace this pid lives in and thus -
> | > the number of elements in the numbers array.
> | > 
> | > Signed-off-by: Pavel Emelianov <xemul@openvz.org>
> | > 
> | > ---
> | > 
> | >  include/linux/init_task.h     |    6 ++++++
> | >  include/linux/pid.h           |    9 +++++++++
> | >  include/linux/pid_namespace.h |    3 +++
> | >  kernel/pid.c                  |    3 ++-
> | >  4 files changed, 20 insertions(+), 1 deletion(-)
> | > 
> | > diff -upr linux-2.6.22-rc4-mm2.orig/include/linux/pid.h linux-2.6.22-rc4-mm2-2/include/linux/pid.h
> | > --- linux-2.6.22-rc4-mm2.orig/include/linux/pid.h	2007-06-14 12:14:29.000000000 +0400
> | > +++ linux-2.6.22-rc4-mm2-2/include/linux/pid.h	2007-07-04 19:00:38.000000000 +0400
> | > @@ -40,6 +40,13 @@ enum pid_type
> | >   * processes.
> | >   */
> | > 
> | > +struct pid_number {
> | > +	/* Try to keep pid_chain in the same cacheline as nr for find_pid */
> | > +	int nr;
> | > +	struct pid_namespace *ns;
> | > +	struct hlist_node pid_chain;
> | > +};
> 
> We meant to go back and look at removing the extra 'struct pid *' we had
> here. Looks like you did that. Cool.
> 
> | > +
> | >  struct pid
> | >  {
> | >  	atomic_t count;
> | > @@ -40,6 +40,8 @@ enum pid_type
> | >  	/* lists of tasks that use this pid */
> | >  	struct hlist_head tasks[PIDTYPE_MAX];
> | >  	struct rcu_head rcu;
> | > +	int level;
> | > +	struct pid_number numbers[1];
> | >  };
> | > 
> | >  extern struct pid init_struct_pid;
> | > diff -upr linux-2.6.22-rc4-mm2.orig/include/linux/pid_namespace.h linux-2.6.22-rc4-mm2-2/include/linux/pid_namespace.h
> | > --- linux-2.6.22-rc4-mm2.orig/include/linux/pid_namespace.h	2007-06-14 12:14:29.000000000 +0400
> | > +++ linux-2.6.22-rc4-mm2-2/include/linux/pid_namespace.h	2007-07-04 19:00:39.000000000 +0400
> | > @@ -16,7 +15,10 @@ struct pidmap {
> | >  	struct kref kref;
> | >  	struct pidmap pidmap[PIDMAP_ENTRIES];
> | >  	int last_pid;
> | > +	int level;
> | >  	struct task_struct *child_reaper;
> | > +	struct kmem_cache *pid_cachep;
> | 
> | so, that looks like a good idea to have the cache in the pidmap. could you 
> | push that independently to see how it all fits together ?
> 
> Yes. I like this idea too.
OK. I will collect the patches you Acked and send them to Andrew independently.
> | 
> | thanks,
> | 
> | C.
> 
_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers |  
	|  |  |  
	|  |  
	|  |  
	|  |  
	|  |  
	|  |  
	|  |  
	|  |  
	| 
		
			| Re: [PATCH 0/16] Pid namespaces [message #19278 is a reply to message #19270] | Tue, 10 July 2007 13:03   |  
			| 
				
				
					|  Pavel Emelianov Messages: 1149
 Registered: September 2006
 | Senior Member |  |  |  
	| Daniel Lezcano wrote:
> Pavel Emelianov wrote:
>> Cedric Le Goater wrote:
>>> Badari Pulavarty wrote:
>>>> On Fri, 2007-07-06 at 12:01 +0400, Pavel Emelianov wrote:
>>>>> This is "submition for inclusion" of hierarchical, not kconfig
>>>>> configurable, zero overheaded ;) pid namespaces.
>>>> Not able to boot my ppc64 machine with the patchset :(
>>> I can't boot either on a x86_64 but I don't even have logs to send :(
> 
> Did you tryed to append at the kernel command  line the parameter:
> 
>     earlyprintk=serial,ttyS0,keep
I have already faced the problem, thanks :)
The bad-guy is CONFIG_DMAR. I'll talk to intel developers about this :)
And a couple of patches to fix proc (I'll send them in a moment). i386 works 
without is and all the others crash. Kirill explained me why. It looks like 
a big problem for i386. He told, that during the boot first 32MiBs of memory
are mapped and thus all NULL pointer dereferences succeed. Is it good?
Thanks,
Pavel
>>
>> Neither can I. And I cannot boot clean 2.6.22-rc6-mm1 either :(
>> Does someone already know what the reason is?
>>
>>> C.
>>>
>>
>> _______________________________________________
>> Containers mailing list
>> Containers@lists.linux-foundation.org
>> https://lists.linux-foundation.org/mailman/listinfo/containers
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 
_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers |  
	|  |  |  
	|  |  
	| 
		
			| Re: [PATCH 0/16] Pid namespaces [message #19280 is a reply to message #19234] | Tue, 10 July 2007 13:08   |  
			| 
				
				
					|  Pavel Emelianov Messages: 1149
 Registered: September 2006
 | Senior Member |  |  |  
	| sukadev@us.ibm.com wrote:
> Pavel Emelianov [xemul@openvz.org] wrote:
> | This is "submition for inclusion" of hierarchical, not kconfig
> | configurable, zero overheaded ;) pid namespaces.
> | 
> | The overall idea is the following:
> | 
> | The namespace are organized as a tree - once a task is cloned
> | with CLONE_NEWPIDS (yes, I've also switched to it :) the new
> 
> Can you really clone() a pid namespace all by itself ?
> copy_namespaces() has the following:
> 
> 
>         if (!(flags & (CLONE_NEWNS | CLONE_NEWUTS | CLONE_NEWIPC | CLONE_NEWUSER)))
>                 return 0;
> 
> doesn't it mean you cannot create a pid namespace using clone() unless
> one of the above flags are also specified ?
> 
> unshare_nsproxy_namespaces() has the following correct check:
> 
>         if (!(unshare_flags & (CLONE_NEWNS | CLONE_NEWUTS | CLONE_NEWIPC |
>                                CLONE_NEWUSER | CLONE_NEWPIDS)))
>                 return 0;
I have lost a couple of hunks when I splitted the patch :(
That's the correct version, cap_set fix and the renamed CLONE_ flag.
--- ./include/linux/sched.h.fix	2007-07-06 11:09:33.000000000 +0400
+++ ./include/linux/sched.h	2007-07-10 13:48:19.000000000 +0400
@@ -26,7 +26,7 @@
 #define CLONE_NEWUTS		0x04000000	/* New utsname group? */
 #define CLONE_NEWIPC		0x08000000	/* New ipcs */
 #define CLONE_NEWUSER		0x10000000	/* New user namespace */
-#define CLONE_NEWPIDS		0x20000000	/* New pids */
+#define CLONE_NEWPID		0x20000000	/* New pids */
 
 /*
  * Scheduling policies
--- ./kernel/capability.c.fix	2007-07-06 11:09:33.000000000 +0400
+++ ./kernel/capability.c	2007-07-10 13:50:16.000000000 +0400
@@ -103,7 +103,7 @@ static inline int cap_set_pg(int pgrp_nr
 	int found = 0;
 	struct pid *pgrp;
 
-	pgrp = find_pid(pgrp_nr);
+	pgrp = find_pid_ns(pgrp_nr, current->nsproxy->pid_ns);
 	do_each_pid_task(pgrp, PIDTYPE_PGID, g) {
 		target = g;
 		while_each_thread(g, target) {
--- ./kernel/fork.c.fix	2007-07-06 11:09:33.000000000 +0400
+++ ./kernel/fork.c	2007-07-10 13:48:13.000000000 +0400
@@ -1267,7 +1267,7 @@ static struct task_struct *copy_process(
 			__ptrace_link(p, current->parent);
 
 		if (thread_group_leader(p)) {
-			if (clone_flags & CLONE_NEWPIDS) {
+			if (clone_flags & CLONE_NEWPID) {
 				p->nsproxy->pid_ns->child_reaper = p;
 				p->signal->tty = NULL;
 				p->signal->pgrp = p->pid;
@@ -1434,7 +1434,7 @@ long do_fork(unsigned long clone_flags,
 		else
 			p->state = TASK_STOPPED;
 
-		nr = (clone_flags & CLONE_NEWPIDS) ?
+		nr = (clone_flags & CLONE_NEWPID) ?
 			pid_nr_ns(task_pid(p), current->nsproxy->pid_ns) :
 				pid_vnr(task_pid(p));
 
--- ./kernel/nsproxy.c.fix	2007-07-06 11:09:33.000000000 +0400
+++ ./kernel/nsproxy.c	2007-07-10 13:48:13.000000000 +0400
@@ -132,7 +132,8 @@ int copy_namespaces(unsigned long flags,
 
 	get_nsproxy(old_ns);
 
-	if (!(flags & (CLONE_NEWNS | CLONE_NEWUTS | CLONE_NEWIPC | CLONE_NEWUSER)))
+	if (!(flags & (CLONE_NEWNS | CLONE_NEWUTS | CLONE_NEWIPC |
+					CLONE_NEWUSER | CLONE_NEWPID)))
 		return 0;
 
 	if (!capable(CAP_SYS_ADMIN)) {
@@ -184,7 +185,7 @@ int unshare_nsproxy_namespaces(unsigned 
 	int err = 0;
 
 	if (!(unshare_flags & (CLONE_NEWNS | CLONE_NEWUTS | CLONE_NEWIPC |
-			       CLONE_NEWUSER | CLONE_NEWPIDS)))
+			       CLONE_NEWUSER)))
 		return 0;
 
 	if (!capable(CAP_SYS_ADMIN))
--- ./kernel/pid.c.fix	2007-07-06 11:09:33.000000000 +0400
+++ ./kernel/pid.c	2007-07-10 13:48:19.000000000 +0400
@@ -523,7 +523,7 @@ struct pid_namespace *copy_pid_ns(unsign
 	BUG_ON(!old_ns);
 	get_pid_ns(old_ns);
 	new_ns = old_ns;
-	if (!(flags & CLONE_NEWPIDS))
+	if (!(flags & CLONE_NEWPID))
 		goto out;
 
 	new_ns = ERR_PTR(-EINVAL);
_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers |  
	|  |  |  
	|  |  
	| 
		
			| Re: [PATCH 0/16] Pid namespaces [message #19303 is a reply to message #19189] | Wed, 11 July 2007 15:14   |  
			| 
				
				
					|  Matt Mackall Messages: 9
 Registered: April 2007
 | Junior Member |  |  |  
	| On Wed, Jul 11, 2007 at 10:39:02AM +0400, Pavel Emelianov wrote:
> Matt Mackall wrote:
> > On Fri, Jul 06, 2007 at 12:01:59PM +0400, Pavel Emelianov wrote:
> >> This is "submition for inclusion" of hierarchical, not kconfig
> >> configurable, zero overheaded ;) pid namespaces.
> > 
> > How big is it?
> 
> Summary: 50 files changed, 877 insertions, 241 deletions. Besides,
> each patch has a diffstat output in the letter.
No, how much impact does it have on the compiled kernel's size?
> > Do I want it on my cell phone or on my wireless router?
> 
> I do not know, but those who do not need the namespaces at
> all won't even notice its presence. I tried to do such.
It seems likely that something that adds a net 600 lines of code will
have a noticeable footprint.
-- 
Mathematics is the supreme nostalgia of our time.
_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers |  
	|  |  |  
	| 
		
			| Re: [PATCH 0/16] Pid namespaces [message #19305 is a reply to message #19288] | Wed, 11 July 2007 06:39   |  
			| 
				
				
					|  Pavel Emelianov Messages: 1149
 Registered: September 2006
 | Senior Member |  |  |  
	| Matt Mackall wrote:
> On Fri, Jul 06, 2007 at 12:01:59PM +0400, Pavel Emelianov wrote:
>> This is "submition for inclusion" of hierarchical, not kconfig
>> configurable, zero overheaded ;) pid namespaces.
> 
> How big is it?
Summary: 50 files changed, 877 insertions, 241 deletions. Besides,
each patch has a diffstat output in the letter.
> Do I want it on my cell phone or on my wireless router?
> 
I do not know, but those who do not need the namespaces at
all won't even notice its presence. I tried to do such.
Pavel.
_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers |  
	|  |  |  
	| 
		
			| Re: [PATCH 10/16] Changes in copy_process() to work with pid namespaces [message #19317 is a reply to message #19199] | Thu, 12 July 2007 00:21   |  
			| 
				
				
					|  Sukadev Bhattiprolu Messages: 413
 Registered: August 2006
 | Senior Member |  |  |  
	| Pavel Emelianov [xemul@openvz.org] wrote:
| We must pass the namespace pointer to the alloc_pid() to
| show what namespace to allocate the pid from and we should
| call this *after* the namespace is copied.
| 
| Essentially, the task->pid etc initialization is done after
| the alloc_pid().
| 
| To do so I move the alloc_pid() inside copy_process() and
| introduce an argument to the alloc_pid() function.
| 
| Signed-off-by: Pavel Emelianov <xemul@openvz.org>
| 
| ---
| 
|  include/linux/pid.h |    2 +-
|  kernel/fork.c       |   29 +++++++++++++++++------------
|  kernel/pid.c        |    2 +-
|  3 files changed, 19 insertions(+), 14 deletions(-)
| 
| --- ./include/linux/pid.h.ve9	2007-07-06 11:03:55.000000000 +0400
| +++ ./include/linux/pid.h	2007-07-06 11:03:55.000000000 +0400
| @@ -116,7 +116,7 @@ extern struct pid *FASTCALL(find_pid_ns(
|  extern struct pid *find_get_pid(int nr);
|  extern struct pid *find_ge_pid(int nr, struct pid_namespace *);
| 
| -extern struct pid *alloc_pid(void);
| +extern struct pid *alloc_pid(struct pid_namespace *ns);
|  extern void FASTCALL(free_pid(struct pid *pid));
| 
|  /*
| --- ./kernel/fork.c.ve9	2007-07-06 11:03:55.000000000 +0400
| +++ ./kernel/fork.c	2007-07-06 11:04:07.000000000 +0400
| @@ -50,6 +50,7 @@
|  #include <linux/taskstats_kern.h>
|  #include <linux/random.h>
|  #include <linux/tty.h>
| +#include <linux/pid.h>
| 
|  #include <asm/pgtable.h>
|  #include <asm/pgalloc.h>
| @@ -1032,7 +1033,6 @@ static struct task_struct *copy_process(
|  	p->did_exec = 0;
|  	delayacct_tsk_init(p);	/* Must remain after dup_task_struct() */
|  	copy_flags(clone_flags, p);
| -	p->pid = pid_nr(pid);
|  	INIT_LIST_HEAD(&p->children);
|  	INIT_LIST_HEAD(&p->sibling);
|  	p->vfork_done = NULL;
| @@ -1107,10 +1107,6 @@ static struct task_struct *copy_process(
|  	p->blocked_on = NULL; /* not blocked yet */
|  #endif
| 
| -	p->tgid = p->pid;
| -	if (clone_flags & CLONE_THREAD)
| -		p->tgid = current->tgid;
| -
|  	if ((retval = security_task_alloc(p)))
|  		goto bad_fork_cleanup_policy;
|  	if ((retval = audit_alloc(p)))
| @@ -1132,9 +1128,14 @@ static struct task_struct *copy_process(
|  		goto bad_fork_cleanup_mm;
|  	if ((retval = copy_namespaces(clone_flags, p)))
|  		goto bad_fork_cleanup_keys;
| +	if (likely(pid == NULL)) {
| +		pid = alloc_pid(p->nsproxy->pid_ns);
| +		if (pid == NULL)
| +			goto bad_fork_cleanup_namespaces;
| +	}
|  	retval = copy_thread(0, clone_flags, stack_start, stack_size, p, regs);
|  	if (retval)
| -		goto bad_fork_cleanup_namespaces;
| +		goto bad_fork_cleanup_pid;
| 
|  	p->set_child_tid = (clone_flags & CLONE_CHILD_SETTID) ? child_tidptr : NULL;
|  	/*
| @@ -1255,6 +1256,11 @@ static struct task_struct *copy_process(
|  		}
|  	}
| 
| +	p->pid = pid_nr(pid);
| +	p->tgid = p->pid;
| +	if (clone_flags & CLONE_THREAD)
| +		p->tgid = current->tgid;
| +
|  	if (likely(p->pid)) {
|  		add_parent(p);
|  		if (unlikely(p->ptrace & PT_PTRACED))
| @@ -1288,6 +1294,8 @@ static struct task_struct *copy_process(
|  	proc_fork_connector(p);
|  	return p;
| 
| +bad_fork_cleanup_pid:
| +	free_pid(pid);
If copy_process() was called from fork_idle(), pid would refer
to &init_struct_pid. Don't you need something like:
	if (pid == &init_struct_pid)
		return;
in free_pid() to not do anything in that case ?
|  bad_fork_cleanup_namespaces:
|  	exit_task_namespaces(p);
|  bad_fork_cleanup_keys:
| @@ -1380,19 +1388,16 @@ long do_fork(unsigned long clone_flags,
|  {
|  	struct task_struct *p;
|  	int trace = 0;
| -	struct pid *pid = alloc_pid();
|  	long nr;
| 
| -	if (!pid)
| -		return -EAGAIN;
| -	nr = pid->nr;
|  	if (unlikely(current->ptrace)) {
|  		trace = fork_traceflag (clone_flags);
|  		if (trace)
|  			clone_flags |= CLONE_PTRACE;
|  	}
| 
| -	p = copy_process(clone_flags, stack_start, regs, stack_size, parent_tidptr, child_tidptr, pid);
| +	p = copy_process(clone_flags, stack_start, regs, stack_size,
| +			parent_tidptr, child_tidptr, NULL);
|  	/*
|  	 * Do this prior waking up the new thread - the thread pointer
|  	 * might get invalid after that point, if the thread exits quickly.
| @@ -1418,6 +1423,7 @@ long do_fork(unsigned long clone_flags,
|  		else
|  			p->state = TASK_STOPPED;
| 
| +		nr = pid_vnr(task_pid(p));
|  		if (unlikely (trace)) {
|  			current->ptrace_message = nr;
|  			ptrace_notify ((trace << 8) | SIGTRAP);
| @@ -1433,7 +1439,6 @@ long do_fork(unsigned long clone_flags,
|  			}
|  		}
|  	} else {
| -		free_pid(pid);
|  		nr = PTR_ERR(p);
|  	}
|  	return nr;
| --- ./kernel/pid.c.ve9	2007-07-06 11:03:55.000000000 +0400
| +++ ./kernel/pid.c	2007-07-06 11:03:55.000000000 +0400
| @@ -206,7 +206,7 @@ fastcall void free_pid(struct pid *pid)
|  	call_rcu(&pid->rcu, delayed_put_pid);
|  }
| 
| -struct pid *alloc_pid(void)
| +struct pid *alloc_pid(struct pid_namespace *pid_ns)
|  {
|  	struct pid *pid;
|  	enum pid_type type;
_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers |  
	|  |  |  
	| 
		
			| Re: [PATCH 0/16] Pid namespaces [message #19320 is a reply to message #19280] | Thu, 12 July 2007 03:19   |  
			| 
				
				
					|  Sukadev Bhattiprolu Messages: 413
 Registered: August 2006
 | Senior Member |  |  |  
	| Pavel Emelianov [xemul@openvz.org] wrote:
| sukadev@us.ibm.com wrote:
| > Pavel Emelianov [xemul@openvz.org] wrote:
| > | This is "submition for inclusion" of hierarchical, not kconfig
| > | configurable, zero overheaded ;) pid namespaces.
| > | 
| > | The overall idea is the following:
| > | 
| > | The namespace are organized as a tree - once a task is cloned
| > | with CLONE_NEWPIDS (yes, I've also switched to it :) the new
| > 
| > Can you really clone() a pid namespace all by itself ?
| > copy_namespaces() has the following:
| > 
| > 
| >         if (!(flags & (CLONE_NEWNS | CLONE_NEWUTS | CLONE_NEWIPC | CLONE_NEWUSER)))
| >                 return 0;
| > 
| > doesn't it mean you cannot create a pid namespace using clone() unless
| > one of the above flags are also specified ?
| > 
| > unshare_nsproxy_namespaces() has the following correct check:
| > 
| >         if (!(unshare_flags & (CLONE_NEWNS | CLONE_NEWUTS | CLONE_NEWIPC |
| >                                CLONE_NEWUSER | CLONE_NEWPIDS)))
| >                 return 0;
| 
| I have lost a couple of hunks when I splitted the patch :(
| That's the correct version, cap_set fix and the renamed CLONE_ flag.
| 
| --- ./include/linux/sched.h.fix	2007-07-06 11:09:33.000000000 +0400
| +++ ./include/linux/sched.h	2007-07-10 13:48:19.000000000 +0400
| @@ -26,7 +26,7 @@
|  #define CLONE_NEWUTS		0x04000000	/* New utsname group? */
|  #define CLONE_NEWIPC		0x08000000	/* New ipcs */
|  #define CLONE_NEWUSER		0x10000000	/* New user namespace */
| -#define CLONE_NEWPIDS		0x20000000	/* New pids */
| +#define CLONE_NEWPID		0x20000000	/* New pids */
| 
|  /*
|   * Scheduling policies
| --- ./kernel/capability.c.fix	2007-07-06 11:09:33.000000000 +0400
| +++ ./kernel/capability.c	2007-07-10 13:50:16.000000000 +0400
| @@ -103,7 +103,7 @@ static inline int cap_set_pg(int pgrp_nr
|  	int found = 0;
|  	struct pid *pgrp;
| 
| -	pgrp = find_pid(pgrp_nr);
| +	pgrp = find_pid_ns(pgrp_nr, current->nsproxy->pid_ns);
|  	do_each_pid_task(pgrp, PIDTYPE_PGID, g) {
|  		target = g;
|  		while_each_thread(g, target) {
| --- ./kernel/fork.c.fix	2007-07-06 11:09:33.000000000 +0400
| +++ ./kernel/fork.c	2007-07-10 13:48:13.000000000 +0400
| @@ -1267,7 +1267,7 @@ static struct task_struct *copy_process(
|  			__ptrace_link(p, current->parent);
| 
|  		if (thread_group_leader(p)) {
| -			if (clone_flags & CLONE_NEWPIDS) {
| +			if (clone_flags & CLONE_NEWPID) {
|  				p->nsproxy->pid_ns->child_reaper = p;
|  				p->signal->tty = NULL;
|  				p->signal->pgrp = p->pid;
| @@ -1434,7 +1434,7 @@ long do_fork(unsigned long clone_flags,
|  		else
|  			p->state = TASK_STOPPED;
| 
| -		nr = (clone_flags & CLONE_NEWPIDS) ?
| +		nr = (clone_flags & CLONE_NEWPID) ?
|  			pid_nr_ns(task_pid(p), current->nsproxy->pid_ns) :
|  				pid_vnr(task_pid(p));
| 
| --- ./kernel/nsproxy.c.fix	2007-07-06 11:09:33.000000000 +0400
| +++ ./kernel/nsproxy.c	2007-07-10 13:48:13.000000000 +0400
| @@ -132,7 +132,8 @@ int copy_namespaces(unsigned long flags,
| 
|  	get_nsproxy(old_ns);
| 
| -	if (!(flags & (CLONE_NEWNS | CLONE_NEWUTS | CLONE_NEWIPC | CLONE_NEWUSER)))
| +	if (!(flags & (CLONE_NEWNS | CLONE_NEWUTS | CLONE_NEWIPC |
| +					CLONE_NEWUSER | CLONE_NEWPID)))
|  		return 0;
| 
|  	if (!capable(CAP_SYS_ADMIN)) {
| @@ -184,7 +185,7 @@ int unshare_nsproxy_namespaces(unsigned 
|  	int err = 0;
| 
|  	if (!(unshare_flags & (CLONE_NEWNS | CLONE_NEWUTS | CLONE_NEWIPC |
| -			       CLONE_NEWUSER | CLONE_NEWPIDS)))
| +			       CLONE_NEWUSER)))
|  		return 0;
| 
|  	if (!capable(CAP_SYS_ADMIN))
| --- ./kernel/pid.c.fix	2007-07-06 11:09:33.000000000 +0400
| +++ ./kernel/pid.c	2007-07-10 13:48:19.000000000 +0400
| @@ -523,7 +523,7 @@ struct pid_namespace *copy_pid_ns(unsign
|  	BUG_ON(!old_ns);
|  	get_pid_ns(old_ns);
|  	new_ns = old_ns;
| -	if (!(flags & CLONE_NEWPIDS))
| +	if (!(flags & CLONE_NEWPID))
|  		goto out;
| 
|  	new_ns = ERR_PTR(-EINVAL);
I applied the above patch and the following patch to your
16 patches you first sent.
--- ./fs/proc/root.c.procfix    2007-07-10 13:52:08.000000000 +0400
+++ ./fs/proc/root.c    2007-07-10 15:23:20.000000000 +0400
@@ -111,7 +111,7 @@ void __init proc_root_init(void)
        err = register_filesystem(&proc_fs_type);
        if (err)
                return;
-       proc_mnt = kern_mount(&proc_fs_type);
+       proc_mnt = kern_mount_data(&proc_fs_type, &init_pid_ns);
My x86_64 system boots fine but crashes as below, when I run my
'pidns_exec' test with a simple program that prints getpid(), getppid()
etc of the process in the child pid ns.
Pls see 
http://www.geocities.com/sukadevb/Pidspace/2.6.22-rc6-mm1-pavel-1.tgz
for the patches I currently have applied and let me know if I need more
on top.
And see
http://www.geocities.com/sukadevb/Pidspace/Test1/
for the test programs. You may need to run the 'mypid-loop.x' script
to repro the crash. The pidns_exec.c program calls clone() with CLONE_NEWPID
and execs the given program (it was included in Patch 0 of the patchset I
posted to Containers).
Suka
login: Unable to handle kernel NULL pointer dereference at 00000000000002fc RIP:
 [<ffffffff802b9e5e>] proc_get_sb+0xfb/0x138
PGD 104d53067 PUD 104d4d067 PMD 0
Oops: 0002 [1] SMP
CPU 2
Modules linked in:
Pid: 3279, comm: pidns_exec Not tainted 2.6.22-rc6-mm1-ovz1 #10
RIP: 0010:[<ffffffff802b9e5e>]  [<ffffffff802b9e5e>] proc_get_sb+0xfb/0x138
RSP: 0018:ffff8101029d7d28  EFLAGS: 00010202
RAX: ffff810100651840 RBX: ffff810104461400 RCX: ffff810100651878
RDX: 0000000000000000 RSI: ffffffff806e5460 RDI: 0000000000000238
RBP: ffff810102d886f8 R08: ffff810104461400 R09: ffff810100026000
R10: 0000000000000000 R11: 0000000000000002 R12: ffff8101029c6000
R13: 0000000002000000 R14: ffffffff806ee920 R15: ffff810102088cc0
FS:  00002b0b499ec6f0(0000) GS:ffff81010069c3c0(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000000000002fc CR3: 000000010381b000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process pidns_exec (pid: 3279, threadinfo ffff8101029d6000, task ffff81010269e7f0)
Stack:  ffff810102088cc0 ffff810102088cc0 00000000fffffff4 ffffffff806ee920
 ffffffff8065f9d9 ffff8101029c6000 0000000002000000 ffffffff80287164
 00000000000000d0 ffff8101029c6000 ffffffff806e5460 ffff8101029c6000
Call Trace:
 [<ffffffff80287164>] vfs_kern_mount+0x4f/0x8b
 [<ffffffff802b9cf4>] pid_ns_prepare_proc+0x13/0x2e
 [<ffffffff80245be3>] copy_pid_ns+0xd7/0x164
 [<ffffffff8024af34>] create_new_namespaces+0xde/0x192
 [<ffffffff8024b0aa>] copy_namespaces+0x4b/0x85
 [<ffffffff802347e2>] copy_process+0xcb4/0x1439
 [<ffffffff8020bbee>] system_call+0x7e/0x83
 [<ffffffff8023556a>] do_fork+0x6c/0x1e7
 [<ffffffff8020bf07>] ptregscall_common+0x67/0xb0
_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers |  
	|  |  |  
	| 
		
			| Re: [PATCH 0/16] Pid namespaces [message #19324 is a reply to message #19320] | Thu, 12 July 2007 18:55   |  
			| 
				
				
					|  Badari Pulavarty Messages: 15
 Registered: September 2006
 | Junior Member |  |  |  
	| On Wed, 2007-07-11 at 20:19 -0700, sukadev@us.ibm.com wrote:
> Pavel Emelianov [xemul@openvz.org] wrote:
> | sukadev@us.ibm.com wrote:
> |..
> My x86_64 system boots fine but crashes as below, when I run my
> 'pidns_exec' test with a simple program that prints getpid(), getppid()
> etc of the process in the child pid ns.
> 
> Pls see 
> 
> http://www.geocities.com/sukadevb/Pidspace/2.6.22-rc6-mm1-pavel-1.tgz
> 
> for the patches I currently have applied and let me know if I need more
> on top.
> 
> And see
> 
> http://www.geocities.com/sukadevb/Pidspace/Test1/
> 
> for the test programs. You may need to run the 'mypid-loop.x' script
> to repro the crash. The pidns_exec.c program calls clone() with CLONE_NEWPID
> and execs the given program (it was included in Patch 0 of the patchset I
> posted to Containers).
> 
> Suka
> 
> login: Unable to handle kernel NULL pointer dereference at 00000000000002fc RIP:
Yes. I am noticing different set of problems on my ppc64 with above
tests and patchset.
Thanks,
Badari
Not cloning container for unused subsystem ns
Not cloning container for unused subsystem ns
Not cloning container for unused subsystem ns
Not cloning container for unused subsystem ns
Not cloning container for unused subsystem ns
Not cloning container for unused subsystem ns
Not cloning container for unused subsystem ns
Not cloning container for unused subsystem ns
------------[ cut here ]------------
kernel BUG at kernel/workqueue.c:258!
Oops: Exception in kernel mode, sig: 5 [#11]
SMP NR_CPUS=32 NUMA pSeries
Modules linked in:
NIP: c000000000062e6c LR: c000000000062e54 CTR: 0000000000000000
REGS: c00000000d157af0 TRAP: 0700   Tainted: G      D  (2.6.22-rc6-mm1)
MSR: 8000000000029032 <EE,ME,IR,DR>  CR: 24000084  XER: 2000000f
TASK = c00000000d0f7460[33] 'events/6' THREAD: c00000000d154000 CPU: 6
GPR00: 0000000000000001 c00000000d157d70 c00000000060fed8 0000000000000001
GPR04: c0000000005c2938 0000000000000000 c000000000064438 c00000000d005600
GPR08: c000000000684f20 7ffffffeffffffff 0000000003506570 c0000000f204fe88
GPR12: 0000000000004000 c000000000529980 0000000000000000 c00000000045e928
GPR16: 4000000001c00000 c00000000045d340 0000000000000000 0000000000000000
GPR20: c00000000050e1d8 000000000210e1d8 000000000210e448 c00000000050e448
GPR24: 0000000001a1fb80 c00000000050c2d8 c00000000d154000 c0000000005c2968
GPR28: c0000000f204fe80 c0000000f2665008 c000000000540eb8 c0000000f2665000
NIP [c000000000062e6c] .run_workqueue+0xe0/0x208
LR [c000000000062e54] .run_workqueue+0xc8/0x208
Call Trace:
[c00000000d157d70] [c000000000062ea4] .run_workqueue+0x118/0x208 (unreliable)
[c00000000d157e10] [c0000000000639bc] .worker_thread+0x114/0x138
[c00000000d157f00] [c000000000068ac8] .kthread+0x78/0xc4
[c00000000d157f90] [c0000000000231b4] .kernel_thread+0x4c/0x68
Instruction dump:
980d01cc 7c2004ac 38000000 38600001 901c0000 4bfa80f9 60000000 e81dfff8
78000764 7c00e278 3120ffff 7c090110 <0b000000> 38000001 7d20f8a8 7d290078
Not cloning container for unused subsystem ns
Not cloning container for unused subsystem ns
Unable to handle kernel paging request for data at address 0x00000008
Faulting instruction address: 0xc0000000000d9990
Oops: Kernel access of bad area, sig: 11 [#12]
SMP NR_CPUS=32 NUMA pSeries
Modules linked in:
NIP: c0000000000d9990 LR: c0000000000d9a28 CTR: c000000000048060
REGS: c00000000de9b870 TRAP: 0300   Tainted: G      D  (2.6.22-rc6-mm1)
MSR: 8000000000009032 <EE,ME,IR,DR>  CR: 24000428  XER: 2000000f
DAR: 0000000000000008, DSISR: 0000000042000000
TASK = c00000000d15cfa0[4871] 'pidns_exec' THREAD: c00000000de98000 CPU: 2
GPR00: c0000000000d9a28 c00000000de9baf0 c00000000060fed8 0000000000000000
GPR04: c00000000065d708 c0000000100b3f00 c0000000000d8eac c000000000631980
GPR08: c0000000100b3f00 0000000000000000 c000000000966158 c00000000de9bc00
GPR12: 0000000000000001 c000000000529180 00000000ffa0d094 0000000000000000
GPR16: 00000000100a0000 00000000100a92c8 00000000100a0000 0000000010080000
GPR20: 0000000000000000 0000000000000000 0000000000000000 00000000100a9568
GPR24: 00000000f7fddb60 c00000000065e2c8 c00000000de9bc00 c000000000634500
GPR28: 0000000000000000 c00000000d27a200 c00000000055ec28 c00000000d27a200
NIP [c0000000000d9990] .release_mounts+0x3c/0x108
LR [c0000000000d9a28] .release_mounts+0xd4/0x108
Call Trace:
[c00000000de9baf0] [c0000000000d9a28] .release_mounts+0xd4/0x108 (unreliable)
[c00000000de9bb90] [c0000000000d9b38] .__put_mnt_ns+0xdc/0x114
[c00000000de9bc50] [c00000000006cd74] .free_nsproxy+0x54/0xe8
[c00000000de9bce0] [c00000000005272c] .do_exit+0x948/0xa08
[c00000000de9bda0] [c0000000000528c0] .sys_exit_group+0x0/0x8
[c00000000de9be30] [c00000000000872c] syscall_exit+0x0/0x40
Instruction dump:
fb61ffd8 fb81ffe0 fba1ffe8 fbe1fff8 f8010010 f821ff61 ebc2c998 7c7a1b78
480000a8 e97f0008 e93f0000 f92b0000 <f9690008> fbff0000 fbff0008 e81f0010
Not cloning container for unused subsystem ns
Not cloning container for unused subsystem ns
Not cloning container for unused subsystem ns
Not cloning container for unused subsystem ns
Unable to handle kernel paging request for data at address 0x00000020
Faulting instruction address: 0xc000000000051e88
Oops: Kernel access of bad area, sig: 11 [#13]
SMP NR_CPUS=32 NUMA pSeries
Modules linked in:
NIP: c000000000051e88 LR: c000000000051e1c CTR: 800000000013f270
REGS: c00000000de9b3a0 TRAP: 0300   Tainted: G      D  (2.6.22-rc6-mm1)
MSR: 8000000000009032 <EE,ME,IR,DR>  CR: 28000444  XER: 00000002
DAR: 0000000000000020, DSISR: 0000000040000000
TASK = c00000000d15cfa0[4871] 'pidns_exec' THREAD: c00000000de98000 CPU: 2
GPR00: 0000000000000000 c00000000de9b620 c00000000060fed8 0000000000000000
GPR04: 0000000000000000 c00000000d15cfa0 ffffffffffffffff 0000000000000000
GPR08: 0000000000000000 0000000000000000 c00000000063e580 c00000000063e580
GPR12: 0000000000004000 c000000000529180 00000000ffa0d094 0000000000000000
GPR16: 00000000100a0000 00000000100a92c8 00000000100a0000 0000000010080000
GPR20: 0000000000000000 0000000000000000 0000000000000000 00000000100a9568
GPR24: 00000000f7fddb60 c00000000065e2c8 000000000000000b 000000000000000b
GPR28: c00000000045fad8 c00000000d15cfa0 c000000000540998 0000000000000001
NIP [c000000000051e88] .do_exit+0xa4/0xa08
LR [c000000000051e1c] .do_exit+0x38/0xa08
Call Trace:
[c00000000de9b620] [c000000000051e1c] .do_exit+0x38/0xa08 (unreliable)
[c00000000de9b6e0] [c000000000021478] .die+0x20c/0x210
[c00000000de9b780] [c00000000002971c] .bad_page_fault+0xb8/0xd4
[c00000000de9b800] [c000000000004e98] handle_page_fault+0x3c/0x58
--- Exception: 300 at .release_mounts+0x3c/0x108
    LR = .release_mounts+0xd4/0x108
[c00000000de9bb90] [c0000000000d9b38] .__put_mnt_ns+0xdc/0x114
[c00000000de9bc50] [c00000000006cd74] .free_nsproxy+0x54/0xe8
[c00000000de9bce0] [c00000000005272c] .do_exit+0x948/0xa08
[c00000000de9bda0] [c0000000000528c0] .sys_exit_group+0x0/0x8
[c00000000de9be30] [c00000000000872c] syscall_exit+0x0/0x40
Instruction dump:
801d0198 2f800000 40be0010 e87e8050 4bffb5c9 60000000 e80d01a0 7fa00278
3120ffff 7c090110 0b000000 e93d04a8 <e9290020> e8090828 7fbd0000 40be0048
Not cloning container for unused subsystem ns
Not cloning container for unused subsystem ns
Not cloning container for unused subsystem ns
Not cloning container for unused subsystem ns
Not cloning container for unused subsystem ns
Not cloning container for unused subsystem ns
Unable to handle kernel paging request for data at address 0x00000020
Faulting instruction address: 0xc000000000051e88
Oops: Kernel access of bad area, sig: 11 [#14]
SMP NR_CPUS=32 NUMA pSeries
Modules linked in:
NIP: c000000000051e88 LR: c000000000051e1c CTR: 800000000013f270
REGS: c00000000de9aed0 TRAP: 0300   Tainted: G      D  (2.6.22-rc6-mm1)
MSR: 8000000000009032 <EE,ME,IR,DR>  CR: 28000444  XER: 00000002
DAR: 0000000000000020, DSISR: 0000000040000000
TASK = c00000000d15cfa0[4871] 'pidns_exec' THREAD: c00000000de98000 CPU: 2
GPR00: 0000000000000000 c00000000de9b150 c00000000060fed8 0000000000000000
GPR04: 0000000000000000 c00000000d15cfa0 ffffffffffffffff 0000000000000000
GPR08: 0000000000000000 0000000000000000 c00000000063e580 c00000000063e580
GPR12: 0000000000004000 c000000000529180 00000000ffa0d094 0000000000000000
GPR16: 00000000100a0000 00000000100a92c8 00000000100a0000 0000000010080000
GPR20: 0000000000000000 0000000000000000 0000000000000000 00000000100a9568
GPR24: 00000000f7fddb60 c00000000065e2c8 000000000000000b 000000000000000b
GPR28: c00000000045fad8 c00000000d15cfa0 c000000000540998 0000000000000001
NIP [c000000000051e88] .do_exit+0xa4/0xa08
LR [c000000000051e1c] .do_exit+0x38/0xa08
Call Trace:
[c00000000de9b150] [c000000000051e1c] .do_exit+0x38/0xa08 (unreliable)
[c00000000de9b210] [c000000000021478] .die+0x20c/0x210
[c00000000de9b2b0] [c00000000002971c] .bad_page_fault+0xb8/0xd4
[c00000000de9b330] [c000000000004e98] handle_page_fault+0x3c/0x58
--- Exception: 300 at .do_exit+0xa4/0xa08
    LR = .do_exit+0x38/0xa08
[c00000000de9b6e0] [c000000000021478] .die+0x20c/0x210
[c00000000de9b780] [c00000000002971c] .bad_page_fault+0xb8/0xd4
[c00000000de9b800] [c000000000004e98] handle_page_fault+0x3c/0x58
--- Exception: 300 at .release_mounts+0x3c/0x108
    LR = .release_mounts+0xd4/0x108
[c00000000de9bb90] [c0000000000d9b38] .__put_mnt_ns+0xdc/0x114
[c00000000de9bc50] [c00000000006cd74] .free_nsproxy+0x54/0xe8
[c00000000de9bce0] [c00000000005272c] .do_exit+0x948/0xa08
[c00000000de9bda0] [c0000000000528c0] .sys_exit_group+0x0/0x8
[c00000000de9be30] [c00000000000872c] syscall_exit+0x0/0x40
Instruction dump:
801d0198 2f800000 40be0010 e87e8050 4bffb5c9 60000000 e80d01a0 7fa00278
3120ffff 7c090110 0b000000 e93d04a8 <e9290020> e8090828 7fbd0000 40be0048
Not cloning container for unused subsystem ns
Not cloning container for unused subsystem ns
Not cloning container for unused subsystem ns
Not cloning container for unused subsystem ns
Not cloning container for unused subsystem ns
Not cloning container for unused subsystem ns
Unable to handle kernel paging request for data at address 0x00000020
Faulting instruction address: 0xc000000000051e88
Oops: Kernel access of bad area, sig: 11 [#15]
SMP NR_CPUS=32 NUMA pSeries
Modules linked in:
NIP: c000000000051e88 LR: c000000000051e1c CTR: 800000000013f270
REGS: c00000000de9aa00 TRAP: 0300   Tainted: G      D  (2.6.22-rc6-mm1)
MSR: 8000000000009032 <EE,ME,IR,DR>  CR: 28000444  XER: 00000002
DAR: 0000000000000020, DSISR: 0000000040000000
TASK = c00000000d15cfa0[4871] 'pidns_exec' THREAD: c00000000de98000 CPU: 2
GPR00: 0000000000000000 c00000000de9ac80 c00000000060fed8 0000000000000000
GPR04: 0000000000000000 c00000000d15cfa0 ffffffffffffffff 0000000000000000
GPR08: 0000000000000000 0000000000000000 c00000000063e580 c00000000063e580
GPR12: 0000000000004000 c000000000529180 00000000ffa0d094 0000000000000000
GPR16: 00000000100a0000 00000000100a92c8 00000000100a0000 0000000010080000
GPR20: 0000000000000000 0000000000000000 0000000000000000 00000000100a9568
GPR24: 00000000f7fddb60 c00000000065e2c8 000000000000000b 000000000000000b
GPR28: c00000000045fad8 c00000000d15cfa0 c000000000540998 0000000000000001
NIP [c000000000051e88] .do_exit+0xa4/0xa08
LR [c000000000051e1c] .do_exit+0x38/0xa08
Call Trace:
[c00000000de9ac80] [c000000000051e1c] .do_exit+0x38/0xa08 (unreliable)
[c00000000de9ad40] [c000000000021478] .die+0x20c/0x210
[c00000000de9ade0] [c00000000002971c] .bad_page_fault+0xb8/0xd4
[c00000000de9ae60] [c000000000004e98] handle_page_fault+0x3c/0x58
--- Exception: 300 at .do_exit+0xa4/0xa08
    LR = .do_exit+0x38/0xa08
[c00000000de9b210] [c000000000021478] .die+0x20c/0x210
[c00000000de9b2b0] [c00000000002971c] .bad_page_fault+0xb8/0xd4
[c00000000de9b330] [c000000000004e98] handle_page_fault+0x3c/0x58
--- Exception: 300 at .do_exit+0xa4/0xa08
    LR = .do_exit+0x38/0xa08
[c00000000de9b6e0] [c000000000021478] .die+0x20c/0x210
[c00000000de9b780] [c00000000002971c] .bad_page_fault+0xb8/0xd4
[c00000000de9b800] [c000000000004e98] handle_page_fault+0x3c/0x58
--- Exception: 300 at .release_mounts+0x3c/0x108
    LR = .release_mounts+0xd4/0x108
[c00000000de9bb90] [c0000000000d9b38] .__put_mnt_ns+0xdc/0x114
[c00000000de9bc50] [c00000000006cd74] .free_nsproxy+0x54/0xe8
[c00000000de9bce0] [c00000000005272c] .do_exit+0x948/0xa08
[c00000000de9bda0] [c0000000000528c0] .sys_exit_group+0x0/0x8
[c00000000de9be30] [c00000000000872c] syscall_exit+0x0/0x40
Instruction dump:
801d0198 2f800000 40be0010 e87e8050 4bffb5c9 60000000 e80d01a0 7fa00278
3120ffff 7c090110 0b000000 e93d04a8 <e9290020> e8090828 7fbd0000 40be0048
Not cloning container for unused subsystem ns
Not cloning container for unused subsystem ns
Not cloning container for unused subsystem ns
Not cloning container for unused subsystem ns
Not cloning container for unused subsystem ns
Not cloning container for unused subsystem ns
Unable to handle kernel paging request for data at address 0x00000020
Faulting instruction address: 0xc000000000051e88
Oops: Kernel access of bad area, sig: 11 [#16]
SMP NR_CPUS=32 NUMA pSeries
Modules linked in:
NIP: c000000000051e88 LR: c000000000051e1c CTR: 800000000013f270
REGS: c00000000de9a530 TRAP: 0300   Tainted: G      D  (2.6.22-rc6-mm1)
MSR: 8000000000009032 <EE,ME,IR,DR>  CR: 28000444  XER: 00000002
DAR: 0000000000000020, DSISR: 0000000040000000
TASK = c00000000d15cfa0[4871] 'pidns_exec' THREAD: c00000000de98000 CPU: 2
GPR00: 0000000000000000 c00000000de9a7b0 c00000000060fed8 0000000000000000
GPR04: 0000000000000000 c00000000d15cfa0 ffffffffffffffff 0000000000000000
GPR08: 0000000000000000 0000000000000000 c00000000063e580 c00000000063e580
GPR12: 0000000000004000 c000000000529180 00000000ffa0d094 0000000000000000
GPR16: 00000000100a0000 00000000100a92c8 00000000100a0000 0000000010080000
GPR20: 0000000000000000 0000000000000000 0000000000000000 00000000100a9568
GPR24: 00000000f7fddb60 c00000000065e2c8 000000000000000b 000000000000000b
GPR28: c00000000045fad8 c00000000d15cfa0 c000000000540998 0000000000000001
NIP [c000000000051e88] .do_exit+0xa4/0xa08
LR [c000000000051e1c] .do_exit+0x38/0xa08
Call Trace:
[c00000000de9a7b0] [c000000000051e1c] .do_exit+0x38/0xa08 (unreliable)
[c00000000de9a870] [c000000000021478] .die+0x20c/0x210
[c00000000de9a910] [c00000000002971c] .bad_page_fault+0xb8/0xd4
[c00000000de9a990] [c000000000004e98] handle_page_fault+0x3c/0x58
--- Exception: 300 at .do_exit+0xa4/0xa08
    LR = .do_exit+0x38/0xa08
[c00000000de9ad40] [c000000000021478] .die+0x20c/0x210
[c00000000de9ade0] [c00000000002971c] .bad_page_fault+0xb8/0xd4
[c00000000de9ae60] [c000000000004e98] handle_page_fault+0x3c/0x58
--- Exception: 300 at .do_exit+0xa4/0xa08
    LR = .do_exit+0x38/0xa08
[c00000000de9b210] [c000000000021478] .die+0x20c/0x210
[c00000000de9b2b0] [c00000000002971c] .bad_page_fault+0xb8/0xd4
[c00000000de9b330] [c000000000004e98] handle_page_fault+0x3c/0x58
--- Exception: 300 at .do_exit+0xa4/0xa08
    LR = .do_exit+0x38/0xa08
[c00000000de9b6e0] [c000000000021478] .die+0x20c/0x210
[c00000000de9b780] [c00000000002971c] .bad_page_fault+0xb8/0xd4
[c00000000de9b800] [c000000000004e98] handle_page_fault+0x3c/0x58
--- Exception: 300 at .release_mounts+0x3c/0x108
    LR = .release_mounts+0xd4/0x108
[c00000000de9bb90] [c0000000000d9b38] .__put_mnt_ns+0xdc/0x114
[c00000000de9bc50] [c00000000006cd74] .free_nsproxy+0x54/0xe8
[c00000000de9bce0] [c00000000005272c] .do_exit+0x948/0xa08
[c00000000de9bda0] [c0000000000528c0] .sys_exit_group+0x0/0x8
[c00000000de9be30] [c00000000000872c] syscall_exit+0x0/0x40
Instruction dump:
801d0198 2f800000 40be0010 e87e8050 4bffb5c9 60000000 e80d01a0 7fa00278
3120ffff 7c090110 0b000000 e93d04a8 <e9290020> e8090828 7fbd0000 40be0048
...
                                                            
_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers |  
	|  |  |  
	| 
		
			| Re: [PATCH 0/16] Pid namespaces [message #19351 is a reply to message #19320] | Mon, 16 July 2007 08:47   |  
			| 
				
				
					|  Pavel Emelianov Messages: 1149
 Registered: September 2006
 | Senior Member |  |  |  
	| > My x86_64 system boots fine but crashes as below, when I run my
> 'pidns_exec' test with a simple program that prints getpid(), getppid()
> etc of the process in the child pid ns.
> 
> Pls see 
> 
> http://www.geocities.com/sukadevb/Pidspace/2.6.22-rc6-mm1-pavel-1.tgz
> 
> for the patches I currently have applied and let me know if I need more
> on top.
> 
> And see
> 
> http://www.geocities.com/sukadevb/Pidspace/Test1/
> 
> for the test programs. You may need to run the 'mypid-loop.x' script
> to repro the crash. The pidns_exec.c program calls clone() with CLONE_NEWPID
> and execs the given program (it was included in Patch 0 of the patchset I
> posted to Containers).
> 
> Suka
> 
> login: Unable to handle kernel NULL pointer dereference at 00000000000002fc RIP:
>  [<ffffffff802b9e5e>] proc_get_sb+0xfb/0x138
> PGD 104d53067 PUD 104d4d067 PMD 0
> Oops: 0002 [1] SMP
> CPU 2
> Modules linked in:
> Pid: 3279, comm: pidns_exec Not tainted 2.6.22-rc6-mm1-ovz1 #10
> RIP: 0010:[<ffffffff802b9e5e>]  [<ffffffff802b9e5e>] proc_get_sb+0xfb/0x138
> RSP: 0018:ffff8101029d7d28  EFLAGS: 00010202
> RAX: ffff810100651840 RBX: ffff810104461400 RCX: ffff810100651878
> RDX: 0000000000000000 RSI: ffffffff806e5460 RDI: 0000000000000238
> RBP: ffff810102d886f8 R08: ffff810104461400 R09: ffff810100026000
> R10: 0000000000000000 R11: 0000000000000002 R12: ffff8101029c6000
> R13: 0000000002000000 R14: ffffffff806ee920 R15: ffff810102088cc0
> FS:  00002b0b499ec6f0(0000) GS:ffff81010069c3c0(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 00000000000002fc CR3: 000000010381b000 CR4: 00000000000006e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process pidns_exec (pid: 3279, threadinfo ffff8101029d6000, task ffff81010269e7f0)
> Stack:  ffff810102088cc0 ffff810102088cc0 00000000fffffff4 ffffffff806ee920
>  ffffffff8065f9d9 ffff8101029c6000 0000000002000000 ffffffff80287164
>  00000000000000d0 ffff8101029c6000 ffffffff806e5460 ffff8101029c6000
> Call Trace:
>  [<ffffffff80287164>] vfs_kern_mount+0x4f/0x8b
>  [<ffffffff802b9cf4>] pid_ns_prepare_proc+0x13/0x2e
>  [<ffffffff80245be3>] copy_pid_ns+0xd7/0x164
>  [<ffffffff8024af34>] create_new_namespaces+0xde/0x192
>  [<ffffffff8024b0aa>] copy_namespaces+0x4b/0x85
>  [<ffffffff802347e2>] copy_process+0xcb4/0x1439
>  [<ffffffff8020bbee>] system_call+0x7e/0x83
>  [<ffffffff8023556a>] do_fork+0x6c/0x1e7
>  [<ffffffff8020bf07>] ptregscall_common+0x67/0xb0
> 
Here's the patch fixing the problem. So, Suka, I propose that you review
my patches, point out things that you don't like and would like to see
your code instead. After all I will re-split the set with your fixes, mark
some patches with your From: and send them to Andrew. What do you think?
---
--- ./fs/proc/root.c.procpidnsfix	2007-07-16 10:32:00.000000000 +0400
+++ ./fs/proc/root.c	2007-07-16 12:34:35.000000000 +0400
@@ -79,8 +79,6 @@ static int proc_get_sb(struct file_syste
 		if (!ei->pid)
 			ei->pid = find_get_pid(1);
 		sb->s_flags |= MS_ACTIVE;
-
-		mntput(ns->proc_mnt);
 		ns->proc_mnt = mnt;
 	}
 
_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers |  
	|  |  |  
	| 
		
			| Re: [PATCH 0/16] Pid namespaces [message #19358 is a reply to message #19351] | Tue, 17 July 2007 04:23   |  
			| 
				
				
					|  Sukadev Bhattiprolu Messages: 413
 Registered: August 2006
 | Senior Member |  |  |  
	| Pavel Emelianov [xemul@openvz.org] wrote:
| >My x86_64 system boots fine but crashes as below, when I run my
| >'pidns_exec' test with a simple program that prints getpid(), getppid()
| >etc of the process in the child pid ns.
| >
| >Pls see 
| >
| >http://www.geocities.com/sukadevb/Pidspace/2.6.22-rc6-mm1-pavel-1.tgz
| >
| >for the patches I currently have applied and let me know if I need more
| >on top.
| >
| >And see
| >
| >http://www.geocities.com/sukadevb/Pidspace/Test1/
| >
| >for the test programs. You may need to run the 'mypid-loop.x' script
| >to repro the crash. The pidns_exec.c program calls clone() with 
| >CLONE_NEWPID
| >and execs the given program (it was included in Patch 0 of the patchset I
| >posted to Containers).
| >
| >Suka
| >
| >login: Unable to handle kernel NULL pointer dereference at 
| >00000000000002fc RIP:
| > [<ffffffff802b9e5e>] proc_get_sb+0xfb/0x138
| >PGD 104d53067 PUD 104d4d067 PMD 0
| >Oops: 0002 [1] SMP
| >CPU 2
| >Modules linked in:
| >Pid: 3279, comm: pidns_exec Not tainted 2.6.22-rc6-mm1-ovz1 #10
| >RIP: 0010:[<ffffffff802b9e5e>]  [<ffffffff802b9e5e>] proc_get_sb+0xfb/0x138
| >RSP: 0018:ffff8101029d7d28  EFLAGS: 00010202
| >RAX: ffff810100651840 RBX: ffff810104461400 RCX: ffff810100651878
| >RDX: 0000000000000000 RSI: ffffffff806e5460 RDI: 0000000000000238
| >RBP: ffff810102d886f8 R08: ffff810104461400 R09: ffff810100026000
| >R10: 0000000000000000 R11: 0000000000000002 R12: ffff8101029c6000
| >R13: 0000000002000000 R14: ffffffff806ee920 R15: ffff810102088cc0
| >FS:  00002b0b499ec6f0(0000) GS:ffff81010069c3c0(0000) 
| >knlGS:0000000000000000
| >CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
| >CR2: 00000000000002fc CR3: 000000010381b000 CR4: 00000000000006e0
| >DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
| >DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
| >Process pidns_exec (pid: 3279, threadinfo ffff8101029d6000, task 
| >ffff81010269e7f0)
| >Stack:  ffff810102088cc0 ffff810102088cc0 00000000fffffff4 ffffffff806ee920
| > ffffffff8065f9d9 ffff8101029c6000 0000000002000000 ffffffff80287164
| > 00000000000000d0 ffff8101029c6000 ffffffff806e5460 ffff8101029c6000
| >Call Trace:
| > [<ffffffff80287164>] vfs_kern_mount+0x4f/0x8b
| > [<ffffffff802b9cf4>] pid_ns_prepare_proc+0x13/0x2e
| > [<ffffffff80245be3>] copy_pid_ns+0xd7/0x164
| > [<ffffffff8024af34>] create_new_namespaces+0xde/0x192
| > [<ffffffff8024b0aa>] copy_namespaces+0x4b/0x85
| > [<ffffffff802347e2>] copy_process+0xcb4/0x1439
| > [<ffffffff8020bbee>] system_call+0x7e/0x83
| > [<ffffffff8023556a>] do_fork+0x6c/0x1e7
| > [<ffffffff8020bf07>] ptregscall_common+0x67/0xb0
| >
| 
| Here's the patch fixing the problem.
Yes, I think it fixes the problem I was seeing before.
However, my next test failed. This time I run:
	$ cat ./lxc-wrap.sh
	#!/bin/sh
	if [ $$ -eq 1 ]; then
		echo "$$ (`basename $0`): Executing in nested pid ns..."
	fi
	port_num=$1;
	if [ $# -lt 1 ]; then
	    port_num=2709
	fi
	mount -t proc lxcproc /proc
	/usr/sbin/sshd -D -p $port_num
	umount /proc
	$ ./pidns_exec ./lxc-wrap.sh
This creates a new pid ns and an sshd in that ns. From another window
I ssh to the system with port number 2709 and I am in the new pid ns.
(a simple prog to print getpid(), getppid() etc seems to be fine).
But "ps -e" fails:
	
	$ ps -e
	Error: /proc must be mounted
	  To mount /proc at boot you need an /etc/fstab line like:
	      /proc   /proc   proc    defaults
	  In the meantime, mount /proc /proc -t proc
	$ mount -t proc none /proc
	mount: /proc is busy
If I comment out the "mount" command in the lxc-wrap.sh script above,
this mount seems to work, but "ps -e" still reports the same error. 
| So, Suka, I propose that you review my patches, point out things that you
| don't like and would like to see your code instead.
I am reviewing them and I thought we wanted to start with my patchset. I
started porting my patches to rc6-mm1 plus patches we already sent to -mm.
Well, I can hold off on that and review/test your patches.
| After all I will re-split the set with your fixes, mark
| some patches with your From: and send them to Andrew. What do you think?
I don't mind re-splitting the patches, but would that be more work for us
and would Andrew and other reviewers prefer patches split logically.  For
instance if you and I contributed to a patch, can we just put both our
Signed-off on one patch rather than splitting it ?
| 
| ---
| 
| --- ./fs/proc/root.c.procpidnsfix	2007-07-16 10:32:00.000000000 +0400
| +++ ./fs/proc/root.c	2007-07-16 12:34:35.000000000 +0400
| @@ -79,8 +79,6 @@ static int proc_get_sb(struct file_syste
| 		if (!ei->pid)
| 			ei->pid = find_get_pid(1);
| 		sb->s_flags |= MS_ACTIVE;
| -
| -		mntput(ns->proc_mnt);
| 		ns->proc_mnt = mnt;
| 	}
_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers |  
	|  |  |  
	| 
		
			| Re: [PATCH 13/16] Switch to operating with pid_numbers instead of pids [message #19447 is a reply to message #19202] | Wed, 25 July 2007 00:36   |  
			| 
				
				
					|  Sukadev Bhattiprolu Messages: 413
 Registered: August 2006
 | Senior Member |  |  |  
	| Pavel Emelianov [xemul@openvz.org] wrote:
| Make alloc_pid() initialize pid_numbers and hash them
| into the hashtable, not the struct pid itself.
| 
| Signed-off-by: Pavel Emelianov <xemul@openvz.org>
| 
| ---
| 
|  pid.c |   47 +++++++++++++++++++++++++++++++++--------------
|  1 files changed, 33 insertions(+), 14 deletions(-)
| 
| --- ./kernel/pid.c.ve12	2007-07-05 11:06:41.000000000 +0400
| +++ ./kernel/pid.c	2007-07-05 11:08:23.000000000 +0400
| @@ -28,8 +28,10 @@
|  #include <linux/hash.h>
|  #include <linux/pid_namespace.h>
|  #include <linux/init_task.h>
| +#include <linux/proc_fs.h>
| 
| -#define pid_hashfn(nr) hash_long((unsigned long)nr, pidhash_shift)
| +#define pid_hashfn(nr, ns)	\
| +	hash_long((unsigned long)nr + (unsigned long)ns, pidhash_shift)
|  static struct hlist_head *pid_hash;
|  static int pidhash_shift;
|  struct pid init_struct_pid = INIT_STRUCT_PID;
| @@ -194,7 +198,7 @@ fastcall void put_pid(struct pid *pid)
|  	if (!pid)
|  		return;
| 
| -	ns = pid->numbers[0].ns;
| +	ns = pid->numbers[pid->level].ns;
|  	if ((atomic_read(&pid->count) == 1) ||
|  	     atomic_dec_and_test(&pid->count))
|  		kmem_cache_free(ns->pid_cachep, pid);
| @@ -210,13 +214,17 @@ static void delayed_put_pid(struct rcu_h
|  fastcall void free_pid(struct pid *pid)
|  {
|  	/* We can be called with write_lock_irq(&tasklist_lock) held */
| +	int i;
|  	unsigned long flags;
| 
|  	spin_lock_irqsave(&pidmap_lock, flags);
| -	hlist_del_rcu(&pid->pid_chain);
| +	for (i = 0; i <= pid->level; i++)
| +		hlist_del_rcu(&pid->numbers[i].pid_chain);
|  	spin_unlock_irqrestore(&pidmap_lock, flags);
| 
| -	free_pidmap(&init_pid_ns, pid->nr);
| +	for (i = 0; i <= pid->level; i++)
| +		free_pidmap(pid->numbers[i].ns, pid->numbers[i].nr);
| +
|  	call_rcu(&pid->rcu, delayed_put_pid);
|  }
| 
| @@ -224,30 +232,43 @@ struct pid *alloc_pid(struct pid_namespa
|  {
|  	struct pid *pid;
|  	enum pid_type type;
| -	int nr = -1;
| +	struct pid_namespace *ns;
| +	int i, nr;
| 
| -	pid = kmem_cache_alloc(init_pid_ns.pid_cachep, GFP_KERNEL);
| +	pid = kmem_cache_alloc(pid_ns->pid_cachep, GFP_KERNEL);
|  	if (!pid)
|  		goto out;
| 
| -	nr = alloc_pidmap(current->nsproxy->pid_ns);
| -	if (nr < 0)
| -		goto out_free;
| +	ns = pid_ns;
| +	for (i = pid_ns->level; i >= 0; i--) {
| +		nr = alloc_pidmap(ns);
| +		if (nr < 0)
| +			goto out_free;
If pid_ns->level is say 3 and alloc_pidmap() succeeds when i=0,1
and fails when i=2, we would try to free_pidmap() even from 
pid->pid_number[2].pid_ns. This would incorrectly a)
drop reference count on that pid namespace, and incorrectly
increment pidmap->nr_free.
Should we use kmem_cache_zalloc() and check for a non-NULL pid_ns
before calling free_pidmap() below ?
| 
| +		pid->numbers[i].nr = nr;
| +		pid->numbers[i].ns = ns;
| +		ns = ns->parent;
| +	}
| +
| +	pid->level = pid_ns->level;
|  	atomic_set(&pid->count, 1);
| -	pid->nr = nr;
|  	for (type = 0; type < PIDTYPE_MAX; ++type)
|  		INIT_HLIST_HEAD(&pid->tasks[type]);
| 
|  	spin_lock_irq(&pidmap_lock);
| -	hlist_add_head_rcu(&pid->pid_chain, &pid_hash[pid_hashfn(pid->nr)]);
| +	for (i = pid->level; i >= 0; i--)
| +		hlist_add_head_rcu(&pid->numbers[i].pid_chain,
| +				&pid_hash[pid_hashfn(pid->numbers[i].nr,
| +					pid->numbers[i].ns)]);
|  	spin_unlock_irq(&pidmap_lock);
| -
|  out:
|  	return pid;
| 
|  out_free:
| -	kmem_cache_free(init_pid_ns.pid_cachep, pid);
| +	for (i++; i <= pid->level; i++)
| +		free_pidmap(pid->numbers[i].ns, pid->numbers[i].nr);
i.e all pid->numbers[] may not be initialized here right ?
_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers |  
	|  |  |  
	| 
		
			| Re: [PATCH 13/16] Switch to operating with pid_numbers instead of pids [message #19450 is a reply to message #19202] | Wed, 25 July 2007 19:13   |  
			| 
				
				
					|  Sukadev Bhattiprolu Messages: 413
 Registered: August 2006
 | Senior Member |  |  |  
	| Pavel Emelianov [xemul@openvz.org] wrote:
| sukadev@us.ibm.com wrote:
| >Pavel Emelianov [xemul@openvz.org] wrote:
| >| Make alloc_pid() initialize pid_numbers and hash them
| >| into the hashtable, not the struct pid itself.
| >| 
| >| Signed-off-by: Pavel Emelianov <xemul@openvz.org>
| >| 
| >| ---
| >| 
| >|  pid.c |   47 +++++++++++++++++++++++++++++++++--------------
| >|  1 files changed, 33 insertions(+), 14 deletions(-)
| >| 
| >| --- ./kernel/pid.c.ve12	2007-07-05 11:06:41.000000000 +0400
| >| +++ ./kernel/pid.c	2007-07-05 11:08:23.000000000 +0400
| >| @@ -28,8 +28,10 @@
| >|  #include <linux/hash.h>
| >|  #include <linux/pid_namespace.h>
| >|  #include <linux/init_task.h>
| >| +#include <linux/proc_fs.h>
| >| 
| >| -#define pid_hashfn(nr) hash_long((unsigned long)nr, pidhash_shift)
| >| +#define pid_hashfn(nr, ns)	\
| >| +	hash_long((unsigned long)nr + (unsigned long)ns, pidhash_shift)
| >|  static struct hlist_head *pid_hash;
| >|  static int pidhash_shift;
| >|  struct pid init_struct_pid = INIT_STRUCT_PID;
| >| @@ -194,7 +198,7 @@ fastcall void put_pid(struct pid *pid)
| >|  	if (!pid)
| >|  		return;
| >| 
| >| -	ns = pid->numbers[0].ns;
| >| +	ns = pid->numbers[pid->level].ns;
| >|  	if ((atomic_read(&pid->count) == 1) ||
| >|  	     atomic_dec_and_test(&pid->count))
| >|  		kmem_cache_free(ns->pid_cachep, pid);
| >| @@ -210,13 +214,17 @@ static void delayed_put_pid(struct rcu_h
| >|  fastcall void free_pid(struct pid *pid)
| >|  {
| >|  	/* We can be called with write_lock_irq(&tasklist_lock) held */
| >| +	int i;
| >|  	unsigned long flags;
| >| 
| >|  	spin_lock_irqsave(&pidmap_lock, flags);
| >| -	hlist_del_rcu(&pid->pid_chain);
| >| +	for (i = 0; i <= pid->level; i++)
| >| +		hlist_del_rcu(&pid->numbers[i].pid_chain);
| >|  	spin_unlock_irqrestore(&pidmap_lock, flags);
| >| 
| >| -	free_pidmap(&init_pid_ns, pid->nr);
| >| +	for (i = 0; i <= pid->level; i++)
| >| +		free_pidmap(pid->numbers[i].ns, pid->numbers[i].nr);
| >| +
| >|  	call_rcu(&pid->rcu, delayed_put_pid);
| >|  }
| >| 
| >| @@ -224,30 +232,43 @@ struct pid *alloc_pid(struct pid_namespa
| >|  {
| >|  	struct pid *pid;
| >|  	enum pid_type type;
| >| -	int nr = -1;
| >| +	struct pid_namespace *ns;
| >| +	int i, nr;
| >| 
| >| -	pid = kmem_cache_alloc(init_pid_ns.pid_cachep, GFP_KERNEL);
| >| +	pid = kmem_cache_alloc(pid_ns->pid_cachep, GFP_KERNEL);
| >|  	if (!pid)
| >|  		goto out;
| >| 
| >| -	nr = alloc_pidmap(current->nsproxy->pid_ns);
| >| -	if (nr < 0)
| >| -		goto out_free;
| >| +	ns = pid_ns;
| >| +	for (i = pid_ns->level; i >= 0; i--) {
| >| +		nr = alloc_pidmap(ns);
| >| +		if (nr < 0)
| >| +			goto out_free;
| >
| >If pid_ns->level is say 3 and alloc_pidmap() succeeds when i=0,1
| 
| It cannot :) If level is 3, then we'll allocate for 3, 2, 1, 0 sequence.
| The loop is descending, not ascending...
Aah descending - thats right. But I still think there is a problem.
Here is your code that I am referring to:
        pid = kmem_cache_alloc(pid_ns->pid_cachep, GFP_KERNEL);
        if (!pid)
                goto out;
        ns = pid_ns;
        for (i = pid_ns->level; i >= 0; i--) {
                nr = alloc_pidmap(ns);
                if (nr < 0)
                        goto out_free;
                pid->numbers[i].nr = nr;
                pid->numbers[i].ns = ns;
                ns = ns->parent;
        }
	pid->level = pid_ns->level;
	<snip>
out:
	return pid;
out_free:
        for (i++; i <= pid->level; i++)
                free_pidmap(pid->numbers[i].ns, pid->numbers[i].nr);
        kmem_cache_free(pid_ns->pid_cachep, pid);
        pid = NULL;
	goto out;
<end code>
Lets say initially pid_ns->level = 3 and alloc_pidmap() succeeds for
i=3 and i=2 but fails for i=1 and we execute "goto out_free".
But pid->level is uninitialized at this point right ?
Even if it were set to zero (using kmem_cache_zalloc()), we may not
free the two pidmap entries we allocated for i=3 and i=2.
Suka
_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers |  
	|  |  |  
	| 
		
			| Re: [PATCH 13/16] Switch to operating with pid_numbers instead of pids [message #19451 is a reply to message #19447] | Wed, 25 July 2007 10:07   |  
			| 
				
				
					|  Pavel Emelianov Messages: 1149
 Registered: September 2006
 | Senior Member |  |  |  
	| sukadev@us.ibm.com wrote:
> Pavel Emelianov [xemul@openvz.org] wrote:
> | Make alloc_pid() initialize pid_numbers and hash them
> | into the hashtable, not the struct pid itself.
> | 
> | Signed-off-by: Pavel Emelianov <xemul@openvz.org>
> | 
> | ---
> | 
> |  pid.c |   47 +++++++++++++++++++++++++++++++++--------------
> |  1 files changed, 33 insertions(+), 14 deletions(-)
> | 
> | --- ./kernel/pid.c.ve12	2007-07-05 11:06:41.000000000 +0400
> | +++ ./kernel/pid.c	2007-07-05 11:08:23.000000000 +0400
> | @@ -28,8 +28,10 @@
> |  #include <linux/hash.h>
> |  #include <linux/pid_namespace.h>
> |  #include <linux/init_task.h>
> | +#include <linux/proc_fs.h>
> | 
> | -#define pid_hashfn(nr) hash_long((unsigned long)nr, pidhash_shift)
> | +#define pid_hashfn(nr, ns)	\
> | +	hash_long((unsigned long)nr + (unsigned long)ns, pidhash_shift)
> |  static struct hlist_head *pid_hash;
> |  static int pidhash_shift;
> |  struct pid init_struct_pid = INIT_STRUCT_PID;
> | @@ -194,7 +198,7 @@ fastcall void put_pid(struct pid *pid)
> |  	if (!pid)
> |  		return;
> | 
> | -	ns = pid->numbers[0].ns;
> | +	ns = pid->numbers[pid->level].ns;
> |  	if ((atomic_read(&pid->count) == 1) ||
> |  	     atomic_dec_and_test(&pid->count))
> |  		kmem_cache_free(ns->pid_cachep, pid);
> | @@ -210,13 +214,17 @@ static void delayed_put_pid(struct rcu_h
> |  fastcall void free_pid(struct pid *pid)
> |  {
> |  	/* We can be called with write_lock_irq(&tasklist_lock) held */
> | +	int i;
> |  	unsigned long flags;
> | 
> |  	spin_lock_irqsave(&pidmap_lock, flags);
> | -	hlist_del_rcu(&pid->pid_chain);
> | +	for (i = 0; i <= pid->level; i++)
> | +		hlist_del_rcu(&pid->numbers[i].pid_chain);
> |  	spin_unlock_irqrestore(&pidmap_lock, flags);
> | 
> | -	free_pidmap(&init_pid_ns, pid->nr);
> | +	for (i = 0; i <= pid->level; i++)
> | +		free_pidmap(pid->numbers[i].ns, pid->numbers[i].nr);
> | +
> |  	call_rcu(&pid->rcu, delayed_put_pid);
> |  }
> | 
> | @@ -224,30 +232,43 @@ struct pid *alloc_pid(struct pid_namespa
> |  {
> |  	struct pid *pid;
> |  	enum pid_type type;
> | -	int nr = -1;
> | +	struct pid_namespace *ns;
> | +	int i, nr;
> | 
> | -	pid = kmem_cache_alloc(init_pid_ns.pid_cachep, GFP_KERNEL);
> | +	pid = kmem_cache_alloc(pid_ns->pid_cachep, GFP_KERNEL);
> |  	if (!pid)
> |  		goto out;
> | 
> | -	nr = alloc_pidmap(current->nsproxy->pid_ns);
> | -	if (nr < 0)
> | -		goto out_free;
> | +	ns = pid_ns;
> | +	for (i = pid_ns->level; i >= 0; i--) {
> | +		nr = alloc_pidmap(ns);
> | +		if (nr < 0)
> | +			goto out_free;
> 
> If pid_ns->level is say 3 and alloc_pidmap() succeeds when i=0,1
It cannot :) If level is 3, then we'll allocate for 3, 2, 1, 0 sequence.
The loop is descending, not ascending...
> and fails when i=2, we would try to free_pidmap() even from 
> pid->pid_number[2].pid_ns. This would incorrectly a)
> drop reference count on that pid namespace, and incorrectly
> increment pidmap->nr_free.
> 
> Should we use kmem_cache_zalloc() and check for a non-NULL pid_ns
> before calling free_pidmap() below ?
> 
> | 
> | +		pid->numbers[i].nr = nr;
> | +		pid->numbers[i].ns = ns;
> | +		ns = ns->parent;
> | +	}
> | +
> | +	pid->level = pid_ns->level;
> |  	atomic_set(&pid->count, 1);
> | -	pid->nr = nr;
> |  	for (type = 0; type < PIDTYPE_MAX; ++type)
> |  		INIT_HLIST_HEAD(&pid->tasks[type]);
> | 
> |  	spin_lock_irq(&pidmap_lock);
> | -	hlist_add_head_rcu(&pid->pid_chain, &pid_hash[pid_hashfn(pid->nr)]);
> | +	for (i = pid->level; i >= 0; i--)
> | +		hlist_add_head_rcu(&pid->numbers[i].pid_chain,
> | +				&pid_hash[pid_hashfn(pid->numbers[i].nr,
> | +					pid->numbers[i].ns)]);
> |  	spin_unlock_irq(&pidmap_lock);
> | -
> |  out:
> |  	return pid;
> | 
> |  out_free:
> | -	kmem_cache_free(init_pid_ns.pid_cachep, pid);
> | +	for (i++; i <= pid->level; i++)
> | +		free_pidmap(pid->numbers[i].ns, pid->numbers[i].nr);
> 
> i.e all pid->numbers[] may not be initialized here right ?
The numbers from i up to pid->level are initialized, so this
loop looks correct.
Thanks,
Pavel
_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers |  
	|  |  |  
	| 
		
			| Re: [PATCH 13/16] Switch to operating with pid_numbers instead of pids [message #19452 is a reply to message #19450] | Thu, 26 July 2007 06:42  |  
			| 
				
				
					|  Pavel Emelianov Messages: 1149
 Registered: September 2006
 | Senior Member |  |  |  
	| sukadev@us.ibm.com wrote:
> Pavel Emelianov [xemul@openvz.org] wrote:
> | sukadev@us.ibm.com wrote:
> | >Pavel Emelianov [xemul@openvz.org] wrote:
> | >| Make alloc_pid() initialize pid_numbers and hash them
> | >| into the hashtable, not the struct pid itself.
> | >| 
> | >| Signed-off-by: Pavel Emelianov <xemul@openvz.org>
> | >| 
> | >| ---
> | >| 
> | >|  pid.c |   47 +++++++++++++++++++++++++++++++++--------------
> | >|  1 files changed, 33 insertions(+), 14 deletions(-)
> | >| 
> | >| --- ./kernel/pid.c.ve12	2007-07-05 11:06:41.000000000 +0400
> | >| +++ ./kernel/pid.c	2007-07-05 11:08:23.000000000 +0400
> | >| @@ -28,8 +28,10 @@
> | >|  #include <linux/hash.h>
> | >|  #include <linux/pid_namespace.h>
> | >|  #include <linux/init_task.h>
> | >| +#include <linux/proc_fs.h>
> | >| 
> | >| -#define pid_hashfn(nr) hash_long((unsigned long)nr, pidhash_shift)
> | >| +#define pid_hashfn(nr, ns)	\
> | >| +	hash_long((unsigned long)nr + (unsigned long)ns, pidhash_shift)
> | >|  static struct hlist_head *pid_hash;
> | >|  static int pidhash_shift;
> | >|  struct pid init_struct_pid = INIT_STRUCT_PID;
> | >| @@ -194,7 +198,7 @@ fastcall void put_pid(struct pid *pid)
> | >|  	if (!pid)
> | >|  		return;
> | >| 
> | >| -	ns = pid->numbers[0].ns;
> | >| +	ns = pid->numbers[pid->level].ns;
> | >|  	if ((atomic_read(&pid->count) == 1) ||
> | >|  	     atomic_dec_and_test(&pid->count))
> | >|  		kmem_cache_free(ns->pid_cachep, pid);
> | >| @@ -210,13 +214,17 @@ static void delayed_put_pid(struct rcu_h
> | >|  fastcall void free_pid(struct pid *pid)
> | >|  {
> | >|  	/* We can be called with write_lock_irq(&tasklist_lock) held */
> | >| +	int i;
> | >|  	unsigned long flags;
> | >| 
> | >|  	spin_lock_irqsave(&pidmap_lock, flags);
> | >| -	hlist_del_rcu(&pid->pid_chain);
> | >| +	for (i = 0; i <= pid->level; i++)
> | >| +		hlist_del_rcu(&pid->numbers[i].pid_chain);
> | >|  	spin_unlock_irqrestore(&pidmap_lock, flags);
> | >| 
> | >| -	free_pidmap(&init_pid_ns, pid->nr);
> | >| +	for (i = 0; i <= pid->level; i++)
> | >| +		free_pidmap(pid->numbers[i].ns, pid->numbers[i].nr);
> | >| +
> | >|  	call_rcu(&pid->rcu, delayed_put_pid);
> | >|  }
> | >| 
> | >| @@ -224,30 +232,43 @@ struct pid *alloc_pid(struct pid_namespa
> | >|  {
> | >|  	struct pid *pid;
> | >|  	enum pid_type type;
> | >| -	int nr = -1;
> | >| +	struct pid_namespace *ns;
> | >| +	int i, nr;
> | >| 
> | >| -	pid = kmem_cache_alloc(init_pid_ns.pid_cachep, GFP_KERNEL);
> | >| +	pid = kmem_cache_alloc(pid_ns->pid_cachep, GFP_KERNEL);
> | >|  	if (!pid)
> | >|  		goto out;
> | >| 
> | >| -	nr = alloc_pidmap(current->nsproxy->pid_ns);
> | >| -	if (nr < 0)
> | >| -		goto out_free;
> | >| +	ns = pid_ns;
> | >| +	for (i = pid_ns->level; i >= 0; i--) {
> | >| +		nr = alloc_pidmap(ns);
> | >| +		if (nr < 0)
> | >| +			goto out_free;
> | >
> | >If pid_ns->level is say 3 and alloc_pidmap() succeeds when i=0,1
> | 
> | It cannot :) If level is 3, then we'll allocate for 3, 2, 1, 0 sequence.
> | The loop is descending, not ascending...
> 
> Aah descending - thats right. But I still think there is a problem.
> 
> Here is your code that I am referring to:
> 
>         pid = kmem_cache_alloc(pid_ns->pid_cachep, GFP_KERNEL);
>         if (!pid)
>                 goto out;
> 
>         ns = pid_ns;
>         for (i = pid_ns->level; i >= 0; i--) {
>                 nr = alloc_pidmap(ns);
>                 if (nr < 0)
>                         goto out_free;
> 
>                 pid->numbers[i].nr = nr;
>                 pid->numbers[i].ns = ns;
>                 ns = ns->parent;
>         }
> 
> 	pid->level = pid_ns->level;
> 
> 	<snip>
> 
> out:
> 	return pid;
> out_free:
>         for (i++; i <= pid->level; i++)
>                 free_pidmap(pid->numbers[i].ns, pid->numbers[i].nr);
> 
>         kmem_cache_free(pid_ns->pid_cachep, pid);
>         pid = NULL;
> 	goto out;
> 
> <end code>
> 
> Lets say initially pid_ns->level = 3 and alloc_pidmap() succeeds for
> i=3 and i=2 but fails for i=1 and we execute "goto out_free".
> 
> But pid->level is uninitialized at this point right ?
> 
> Even if it were set to zero (using kmem_cache_zalloc()), we may not
> free the two pidmap entries we allocated for i=3 and i=2.
Yes. I found this after detailed look at the code and (hope) fixed.
> Suka
> 
Thanks,
Pavel
_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers |  
	|  |  | 
 
 
 Current Time: Sat Oct 25 18:15:49 GMT 2025 
 Total time taken to generate the page: 0.11097 seconds |