OpenVZ Forum


Home » Mailing lists » Devel » [BUG]: Crash with CONFIG_FAIR_CGROUP_SCHED=y
[BUG]: Crash with CONFIG_FAIR_CGROUP_SCHED=y [message #22978] Thu, 08 November 2007 23:48 Go to next message
Sukadev Bhattiprolu is currently offline  Sukadev Bhattiprolu
Messages: 413
Registered: August 2006
Senior Member
With CONFIG_FAIR_CGROUP_SCHED=y, following commands on 2.6.24-rc1 crash
the system.

	$ mount -t cgroup none /cgroups
	$ ./ns_exec -cm /bin/ls

"ns_exec -cm" calls clone() to clone the mount namespace and then
executes the '/bin/ls' program in the cloned child.

Some observations that Serge and I made (we have been able to reproduce
reliably, but crash logs have not been very useful)

	a. If we skip the 'mount' command, there is no crash.

	b. If CONFIG_FAIR_CGROUP_SCHED=n again, there is no crash (even with 
	   'mount' command).

       	c. mounting just the cpu or just the ns subsystem does not
           lead to a crash.  You can even

           mount -t cgroup -o cpu none /mnt1
           mount -t cgroup -o ns none /mnt2

           and you still don't get a crash

        d. mount -t cgroup -o cpu,ns none /cgroup

           will always cause a crash with subsequent ns_exec

ns_exec.c and config file are attached. Let us know if you need more info.

Suka

---
Crash log:

Red Hat Enterprise Linux release 4.90 (Tikanga)
Kernel 2.6.24-rc1 on an x86_64

elm3a241 login: Unable to handle kernel NULL pointer dereferenceUnable to handle kernel NULL pointer dereference at 0000000000000000 RIP:
 at 0000000000000000 RIP:
 [<0000000000000000>]
 [<0000000000000000>]
PGD 102d4d067 PGD 102d4d067 PUD 102c88067 PUD 102c88067 PMD 0 PMD 0

Oops: 0000 [1] Oops: 0000 [1] SMP SMP

CPU 2 CPU 2

Modules linked in:Modules linked in:

Pid: 3273, comm: ns_exec Not tainted 2.6.24-rc1 #9
Pid: 3273, comm: ns_exec Not tainted 2.6.24-rc1 #9
RIP: 0010:[<0000000000000000>] RIP: 0010:[<0000000000000000>]  [<0000000000000000>]
 [<0000000000000000>]
RSP: 0018:ffff8101006a6af0  EFLAGS: 00055292
RSP: 0018:ffff8101006a6af0  EFLAGS: 00055292
RAX: 0000000000000000 RBX: ffff810100d11140 RCX: ffff810101de4000
RAX: 0000000000000000 RBX: ffff810100d11140 RCX: ffff810101de4000
RDX: 0000000000000000 RSI: ffff810100d1a880 RDI: ffff810001037c00
RDX: 0000000000000000 RSI: ffff810100d1a880 RDI: ffff810001037c00
RBP: ffff810102c136c0 R08: ffff810101de4000 R09: ffff810101d31bb8
RBP: ffff810102c136c0 R08: ffff810101de4000 R09: ffff810101d31bb8
R10: 0000000000000000 R11: 00000000ffffffff R12: ffff8101034075b8
R10: 0000000000000000 R11: 00000000ffffffff R12: ffff8101034075b8
R13: ffff8101029d6028 R14: ffff810103407500 R15: ffffffff80869d00
R13: ffff8101029d6028 R14: ffff810103407500 R15: ffffffff80869d00
FS:  00002b80c2a396f0(0000) GS:ffff81010068f3c0(0000) knlGS:0000000000000000
FS:  00002b80c2a396f0(0000) GS:ffff81010068f3c0(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 00000001028be000 CR4: 00000000000006e0
CR2: 0000000000000000 CR3: 00000001028be000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process ns_exec (pid: 3273, threadinfo ffff810101de4000, task ffff810100d11140)
Process ns_exec (pid: 3273, threadinfo ffff810101de4000, task ffff810100d11140)
Stack: Stack:  0000000000000000 0000000000000000 0000000000000001 0000000000000001 6c2f343662696c2f 6c2f343662696c2f 2d78756e696c2d64 2d78756e696c2d64

 732e34362d363878 732e34362d363878 ffffffff00322e6f ffffffff00322e6f 762f003561736376 762f003561736376 0000357363762f63 0000357363762f63

 0000000000000000 0000000000000000 0000000000000000 0000000000000000 762f73007665642f 762f73007665642f 0000317363762f63 0000317363762f63

Call Trace:
Call Trace:




Code: Code:  Bad RIP value. Bad RIP value.

RIP RIP  [<0000000000000000>]
 [<0000000000000000>]
 RSP <ffff8101006a6af0>
 RSP <ffff8101006a6af0>
CR2: 0000000000000000
CR2: 0000000000000000


#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.24-rc1
# Thu Nov  8 13:38:29 2007
#
CONFIG_X86_64=y
CONFIG_64BIT=y
CONFIG_X86=y
CONFIG_GENERIC_TIME=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_GENERIC_CMOS_UPDATE=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_ZONE_DMA32=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_SEMAPHORE_SLEEPERS=y
CONFIG_MMU=y
CONFIG_ZONE_DMA=y
CONFIG_RWSEM_GENERIC_SPINLOCK=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_X86_CMPXCHG=y
CONFIG_EARLY_PRINTK=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_ARCH_POPULATES_NODE_MAP=y
CONFIG_DMI=y
CONFIG_AUDIT_ARCH=y
CONFIG_GENERIC_BUG=y
# CONFIG_ARCH_HAS_ILOG2_U32 is not set
# CONFIG_ARCH_HAS_ILOG2_U64 is not set
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"

#
# General setup
#
CONFIG_EXPERIMENTAL=y
CONFIG_LOCK_KERNEL=y
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_LOCALVERSION=""
CONFIG_LOCALVERSION_AUTO=y
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
# CONFIG_BSD_PROCESS_ACCT is not set
# CONFIG_TASKSTATS is not set
CONFIG_USER_NS=y
# CONFIG_AUDIT is not set
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_LOG_BUF_SHIFT=18
CONFIG_CGROUPS=y
# CONFIG_CGROUP_DEBUG is not set
CONFIG_CGROUP_NS=y
# CONFIG_CGROUP_CPUACCT is not set
# CONFIG_CPUSETS is not set
CONFIG_FAIR_GROUP_SCHED=y
# CONFIG_FAIR_USER_SCHED is not set
CONFIG_FAIR_CGROUP_SCHED=y
CONFIG_SYSFS_DEPRECATED=y
CONFIG_RELAY=y
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE=""
CONFIG_CC_OPTIMIZE_FOR_SIZE=y
CONFIG_SYSCTL=y
# CONFIG_EMBEDDED is not set
CONFIG_UID16=y
CONFIG_SYSCTL_SYSCALL=y
CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL=y
# CONFIG_KALLSYMS_EXTRA_PASS is not set
CONFIG_HOTPLUG=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_ANON_INODES=y
CONFIG_EPOLL=y
CONFIG_SIGNALFD=y
CONFIG_EVENTFD=y
CONFIG_SHMEM=y
CONFIG_VM_EVENT_COUNTERS=y
CONFIG_SLAB=y
# CONFIG_SLUB is not set
# CONFIG_SLOB is not set
CONFIG_RT_MUTEXES=y
# CONFIG_TINY_SHMEM is not set
CONFIG_BASE_SMALL=0
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
CONFIG_MODULE_FORCE_UNLOAD=y
# CONFIG_MODVERSIONS is not set
# CONFIG_MODULE_SRCVERSION_ALL is not set
# CONFIG_KMOD is not set
CONFIG_STOP_MACHINE=y
CONFIG_BLOCK=y
# CONFIG_BLK_DEV_IO_TRACE is not set
# CONFIG_BLK_DEV_BSG is not set
CONFIG_BLOCK_COMPAT=y

#
# IO Schedulers
#
CONFIG_IOSCHED_NOOP=y
# CONFIG_IOSCHED_AS is not set
CONFIG_IOSCHED_DEADLINE=y
CONFIG_IOSCHED_CFQ=y
# CONFIG_DEFAULT_AS is not set
# CONFIG_DEFAULT_DEADLINE is not set
CONFIG_DEFAULT_CFQ=y
# CONFIG_DEFAULT_NOOP is not set
CONFIG_DEFAULT_IOSCHED="cfq"

#
# Processor type and features
#
# CONFIG_TICK_ONESHOT is not set
# CONFIG_NO_HZ is not set
# CONFIG_HIGH_RES_TIMERS is not set
CONFIG_GENERIC_CLOCKEVENTS_BUILD=y
CONFIG_X86_PC=y
# CONFIG_X86_VSMP is not set
# CONFIG_MK8 is not set
# CONFIG_MPSC is not set
# CONFIG_MCORE2 is not set
CONFIG_GENERIC_CPU=y
CONFIG_X86_L1_CACHE_BYTES=128
CONFIG_X86_L1_CACHE_SHIFT=7
CONFIG_X86_INTERNODE_CACHE_BYTES=128
CONFIG_X86_TSC=y
CONFIG_X86_GOOD_APIC=y
# CONFIG_MICROCODE is not set
CONFIG_X86_MSR=y
CONFIG_X86_CPUID=y
CONFIG_X86_HT=y
CONFIG_X86_IO_APIC=y
CONFIG_X86_LOCAL_APIC=y
CONFIG_MTRR=y
CONFIG_SMP=y
CONFIG_SCHED_SMT=y
CONFIG_SCHED_MC=y
# CONFIG_PREEMPT_NONE is not set
CONFIG_PREEMPT_VOLUNTARY=y
# CONFIG_PREEMPT is not set
CONFIG_PREEMPT_BKL=y
CONFIG_NUMA=y
CONFIG_K8_NUMA=y
CONFIG_NODES_SHIFT=6
CONFIG_X86_64_ACPI_NUMA=y
CONFIG_NUMA_EMU=y
CONFIG_ARCH_DISCONTIGMEM_ENABLE=y
CONFIG_ARCH_DISCONTIGMEM_DEFAULT=y
CONFIG_ARCH_SPARSEMEM_ENABLE=y
CONFIG_SELECT_MEMORY_MODEL=y
# CONFIG_FLATMEM_MANUAL is not set
CONFIG_DISCONTIGMEM_MANUAL=y
# CONFIG_SPARSEMEM_MANUAL is not set
CONFIG_DISCONTIGMEM=y
CONFIG_FLAT_NODE_MEM_MAP=y
CONFIG_NEED_MULTIPLE_NODES=y
# CONFIG_SPARSEMEM_STATIC is not set
CONFIG_SPARSEMEM_VMEMMAP_ENABLE=y
CONFIG_SPLIT_PTLOCK_CPUS=4
CONFIG_MIGRATION=y
CONFIG_RESOURCES_64BIT=y
CONFIG_ZONE_DMA_FLAG=1
CONFIG_BOUNCE=y
CONFIG_VIRT_TO_BUS=y
CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID=y
CONFIG_OUT_OF_LINE_PFN_TO_PAGE=y
CONFIG_NR_CPUS=32
CONFIG_PHYSICAL_ALIGN=0x200000
CONFIG_HOTPLUG_CPU=y
CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG=y
CONFIG_HPET_TIMER=y
CONFIG_HPET_EMULATE_RTC=y
CONFIG_IOMMU=y
# CONFIG_CALGARY_IOMMU is not set
CONFIG_SWIOTLB=y
CONFIG_X86_MCE=y
CONFIG_X86_MCE_INTEL=y
CONFIG_X86_MCE_AMD=y
# CONFIG_KEXEC is not set
# CONFIG_CRASH_DUMP is not set
# CONFIG_RELOCATABLE is not set
CONFIG_PHYSICAL_START=0x200000
CONFIG_SECCOMP=y
# CONFIG_CC_STACKPROTECTOR is not set
# CONFIG_HZ_100 is not set
CONFIG_HZ_250=y
# CONFIG_HZ_300 is not set
# CONFIG_HZ_1000 is not set
CONFIG_HZ=250
CONFIG_K8_NB=y
CONFIG_GENERIC_HARDIRQS=y
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_ISA_DMA_API=y
CONFIG_GENERIC_PENDING_IRQ=y

#
# Power management options
#
CONFIG_PM=y
# CONFIG_PM_LEGACY is not set
# CONFIG_PM_DEBUG is not set
CONFIG_PM_SLEEP_SMP=y
CONFIG_PM_SLEEP=y
CONFIG_SUSPEND_SMP_POSSIBLE=y
CONFIG_SUSPEND=y
CONFIG_HIBERNATION_SMP_POSSIBLE=y
CONFIG_HIBERNATION=y
CONFIG_PM_STD_PARTITION=""
CONFIG_ARCH_HIBERNATION_HEADER=y
CONFIG_ACPI=y
CONFIG_ACPI_SLEEP=y
CONFIG_ACPI_PROCFS=y
CONFIG_ACPI_PROC_EVENT=y
CONFIG_ACPI_BUTTON=y
CONFIG_ACPI_FAN=y
# CONFIG_ACPI_DOCK is not set
CONFIG_ACPI_PROCESSOR=y
CONFIG_ACPI_HOTPLUG_CPU=y
CONFIG_ACPI_THERMAL=y
CONFIG_ACPI_NUMA=y
# CONFIG_ACPI_ASUS is not set
# CONFIG_ACPI_TOSHIBA is not set
CONFIG_ACPI_BLACKLIST_YEAR=0
# CONFIG_ACPI_DEBUG is not set
CONFIG_ACPI_EC=y
CONFIG_ACPI_POWER=y
CONFIG_ACPI_SYSTEM=y
CONFIG_X86_PM_TIMER=y
CONFIG_ACPI_CONTAINER=y

#
# CPU Frequency scaling
#
CONFIG_CPU_FREQ=y
CONFIG_CPU_FREQ_TABLE=y
CONFIG_CPU_FREQ_DEBUG=y
CONFIG_CPU_FREQ_STAT=y
# CONFIG_CPU_FREQ_STAT_DETAILS is not set
CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE=y
# CONFIG_CPU_FREQ_DEFAULT_GOV_USERSPACE is not set
# CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND is not set
# CONFIG_CPU_FREQ_DEFAULT_GOV_CONSERVATIVE is not set
CONFIG_CPU_FREQ_GOV_PERFORMANCE=y
# CONFIG_CPU_FREQ_GOV_POWERSAVE is not
...

Re: [BUG]: Crash with CONFIG_FAIR_CGROUP_SCHED=y [message #22979 is a reply to message #22978] Fri, 09 November 2007 06:52 Go to previous messageGo to next message
Srivatsa Vaddagiri is currently offline  Srivatsa Vaddagiri
Messages: 241
Registered: August 2006
Senior Member
On Thu, Nov 08, 2007 at 03:48:05PM -0800, sukadev@us.ibm.com wrote:
> With CONFIG_FAIR_CGROUP_SCHED=y, following commands on 2.6.24-rc1 crash
> the system.

Thanks for reporting the problem. It was caused because of the fact that
current task isn't kept in its runqueue in case of sched_fair class
tasks.

With the patch below, I could run ns_exec w/o any crash. Can you pls
verify it works for you as well?

Ingo,
	Once Suka verifies that the patch fixes his crash, I would request you 
to include the same in your tree and route it to Linus.

--

current task is not present in its runqueue in case of sched_fair class
tasks. Take care of this fact in rt_mutex_setprio(),
sched_setscheduler() and sched_move_task() routines.

Signed-off-by : Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>


---
 kernel/sched.c |   45 +++++++++++++++++++++++++--------------------
 1 files changed, 25 insertions(+), 20 deletions(-)

Index: current/kernel/sched.c
===================================================================
--- current.orig/kernel/sched.c
+++ current/kernel/sched.c
@@ -3986,11 +3986,13 @@ void rt_mutex_setprio(struct task_struct
 	oldprio = p->prio;
 	on_rq = p->se.on_rq;
 	running = task_running(rq, p);
-	if (on_rq) {
+	if (on_rq)
 		dequeue_task(rq, p, 0);
-		if (running)
-			p->sched_class->put_prev_task(rq, p);
-	}
+	/* current task is not kept in its runqueue in case of sched_fair class.
+	 * Hence we need the 'on_rq?' and 'running?' tests to be separate.
+	 */
+	if (running)
+		p->sched_class->put_prev_task(rq, p);
 
 	if (rt_prio(prio))
 		p->sched_class = &rt_sched_class;
@@ -3999,9 +4001,9 @@ void rt_mutex_setprio(struct task_struct
 
 	p->prio = prio;
 
+	if (running)
+		p->sched_class->set_curr_task(rq);
 	if (on_rq) {
-		if (running)
-			p->sched_class->set_curr_task(rq);
 		enqueue_task(rq, p, 0);
 		inc_load(rq, p);
 		/*
@@ -4298,18 +4300,20 @@ recheck:
 	update_rq_clock(rq);
 	on_rq = p->se.on_rq;
 	running = task_running(rq, p);
-	if (on_rq) {
+	if (on_rq)
 		deactivate_task(rq, p, 0);
-		if (running)
-			p->sched_class->put_prev_task(rq, p);
-	}
+	/* current task is not kept in its runqueue in case of sched_fair class.
+	 * Hence we need the 'on_rq?' and 'running?' tests to be separate.
+	 */
+	if (running)
+		p->sched_class->put_prev_task(rq, p);
 
 	oldprio = p->prio;
 	__setscheduler(rq, p, policy, param->sched_priority);
 
+	if (running)
+		p->sched_class->set_curr_task(rq);
 	if (on_rq) {
-		if (running)
-			p->sched_class->set_curr_task(rq);
 		activate_task(rq, p, 0);
 		/*
 		 * Reschedule if we are currently running on this runqueue and
@@ -7036,19 +7040,20 @@ void sched_move_task(struct task_struct 
 	running = task_running(rq, tsk);
 	on_rq = tsk->se.on_rq;
 
-	if (on_rq) {
+	if (on_rq)
 		dequeue_task(rq, tsk, 0);
-		if (unlikely(running))
-			tsk->sched_class->put_prev_task(rq, tsk);
-	}
+	/* current task is not kept in its runqueue in case of sched_fair class.
+	 * Hence we need the 'on_rq?' and 'running?' tests to be separate.
+	 */
+	if (unlikely(running))
+		tsk->sched_class->put_prev_task(rq, tsk);
 
 	set_task_cfs_rq(tsk);
 
-	if (on_rq) {
-		if (unlikely(running))
-			tsk->sched_class->set_curr_task(rq);
+	if (unlikely(running))
+		tsk->sched_class->set_curr_task(rq);
+	if (on_rq)
 		enqueue_task(rq, tsk, 0);
-	}
 
 done:
 	task_rq_unlock(rq, &flags);


-- 
Regards,
vatsa
_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers
Re: [BUG]: Crash with CONFIG_FAIR_CGROUP_SCHED=y [message #22991 is a reply to message #22979] Fri, 09 November 2007 08:45 Go to previous messageGo to next message
Dmitry Adamushko is currently offline  Dmitry Adamushko
Messages: 20
Registered: June 2007
Junior Member
Hi Srivatsa,

> [ ... ]
> --
>
> current task is not present in its runqueue in case of sched_fair class
> tasks. Take care of this fact in rt_mutex_setprio(),
> sched_setscheduler() and sched_move_task() routines.
>
> Signed-off-by : Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
>
>
> ---
>  kernel/sched.c |   45 +++++++++++++++++++++++++--------------------
>  1 files changed, 25 insertions(+), 20 deletions(-)
>
> Index: current/kernel/sched.c
> ===================================================================
> --- current.orig/kernel/sched.c
> +++ current/kernel/sched.c
> @@ -3986,11 +3986,13 @@ void rt_mutex_setprio(struct task_struct
>         oldprio = p->prio;
>         on_rq = p->se.on_rq;
>         running = task_running(rq, p);
> -       if (on_rq) {
> +       if (on_rq)
>                 dequeue_task(rq, p, 0);
> -               if (running)
> -                       p->sched_class->put_prev_task(rq, p);
> -       }
> +       /* current task is not kept in its runqueue in case of sched_fair class.
> +        * Hence we need the 'on_rq?' and 'running?' tests to be separate.

Humm... the 'current' is not kept within the tree but
current->se.on_rq is supposed to be '1' ,
so the old code looks ok to me (at least for the 'leaf' elements).

Maybe you were able to get more useful oops on your site?


> --
> Regards,
> vatsa
>

-- 
Best regards,
Dmitry Adamushko
_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers
Re: [BUG]: Crash with CONFIG_FAIR_CGROUP_SCHED=y [message #22997 is a reply to message #22991] Fri, 09 November 2007 10:04 Go to previous messageGo to next message
Srivatsa Vaddagiri is currently offline  Srivatsa Vaddagiri
Messages: 241
Registered: August 2006
Senior Member
On Fri, Nov 09, 2007 at 09:45:21AM +0100, Dmitry Adamushko wrote:
> Humm... the 'current' is not kept within the tree but
> current->se.on_rq is supposed to be '1' ,
> so the old code looks ok to me (at least for the 'leaf' elements).

You are damned right! Sorry my mistake with the previous analysis and
(as I now find out) testing :(

There are couple of problems discovered by Suka's test:

- The test requires the cgroup filesystem to be mounted with
  atleast the cpu and ns options (i.e both namespace and cpu 
  controllers are active in the same hierarchy). 

	# mkdir /dev/cpuctl
	# mount -t cgroup -ocpu,ns none cpuctl
	(or simply)
	# mount -t cgroup none cpuctl -> Will activate all controllers
					 in same hierarchy.

- The test invokes clone() with CLONE_NEWNS set. This causes a a new child
  to be created, also a new group (do_fork->copy_namespaces->ns_cgroup_clone->
  cgroup_clone) and the child is attached to the new group (cgroup_clone->
  attach_task->sched_move_task). At this point in time, the child's scheduler 
  related fields are uninitialized (including its on_rq field, which it has
  inherited from parent). As a result sched_move_task thinks its on
  runqueue, when it isn't.

  As a solution to this problem, I moved sched_fork() call, which
  initializes scheduler related fields on a new task, before
  copy_namespaces(). I am not sure though whether moving up will
  cause other side-effects. Do you see any issue?

- The second problem exposed by this test is that task_new_fair()
  assumes that parent and child will be part of the same group (which 
  needn't be as this test shows). As a result, cfs_rq->curr can be NULL
  for the child.

  The solution is to test for curr pointer being NULL in
  task_new_fair().


With the patch below, I could run ns_exec() fine w/o a crash.

Suka, can you verify whether this patch fixes your problem?


--

Fix copy_namespace() <-> sched_fork() dependency in do_fork, by moving
up sched_fork().

Also introduce a NULL pointer check for 'curr' in task_new_fair().

Signed-off-by : Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>

---
 kernel/fork.c       |    6 +++---
 kernel/sched_fair.c |    2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

Index: current/kernel/fork.c
===================================================================
--- current.orig/kernel/fork.c
+++ current/kernel/fork.c
@@ -1121,6 +1121,9 @@ static struct task_struct *copy_process(
 	p->blocked_on = NULL; /* not blocked yet */
 #endif
 
+	/* Perform scheduler related setup. Assign this task to a CPU. */
+	sched_fork(p, clone_flags);
+
 	if ((retval = security_task_alloc(p)))
 		goto bad_fork_cleanup_policy;
 	if ((retval = audit_alloc(p)))
@@ -1210,9 +1213,6 @@ static struct task_struct *copy_process(
 	INIT_LIST_HEAD(&p->ptrace_children);
 	INIT_LIST_HEAD(&p->ptrace_list);
 
-	/* Perform scheduler related setup. Assign this task to a CPU. */
-	sched_fork(p, clone_flags);
-
 	/* Now that the task is set up, run cgroup callbacks if
 	 * necessary. We need to run them before the task is visible
 	 * on the tasklist. */
Index: current/kernel/sched_fair.c
===================================================================
--- current.orig/kernel/sched_fair.c
+++ current/kernel/sched_fair.c
@@ -1023,7 +1023,7 @@ static void task_new_fair(struct rq *rq,
 	place_entity(cfs_rq, se, 1);
 
 	if (sysctl_sched_child_runs_first && this_cpu == task_cpu(p) &&
-			curr->vruntime < se->vruntime) {
+			curr && curr->vruntime < se->vruntime) {
 		/*
 		 * Upon rescheduling, sched_class::put_prev_task() will place
 		 * 'current' within the tree based on its new key value.
_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers
Re: [BUG]: Crash with CONFIG_FAIR_CGROUP_SCHED=y [message #23000 is a reply to message #22997] Fri, 09 November 2007 10:25 Go to previous messageGo to next message
Ingo Molnar is currently offline  Ingo Molnar
Messages: 51
Registered: December 2005
Member
* Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> wrote:

> With the patch below, I could run ns_exec() fine w/o a crash.
> 
> Suka, can you verify whether this patch fixes your problem?

thanks, applied. I'll wait for confirmation from Suka before sending it 
to Linus.

	Ingo
_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers
Re: [BUG]: Crash with CONFIG_FAIR_CGROUP_SCHED=y [message #23001 is a reply to message #22997] Fri, 09 November 2007 10:59 Go to previous messageGo to next message
Dmitry Adamushko is currently offline  Dmitry Adamushko
Messages: 20
Registered: June 2007
Junior Member
On 09/11/2007, Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> wrote:
> [ ... ]
>
>   As a solution to this problem, I moved sched_fork() call, which
>   initializes scheduler related fields on a new task, before
>   copy_namespaces(). I am not sure though whether moving up will
>   cause other side-effects. Do you see any issue?

Should be ok (IMHO and at first glance :-)

> - The second problem exposed by this test is that task_new_fair()
>   assumes that parent and child will be part of the same group (which
>   needn't be as this test shows). As a result, cfs_rq->curr can be NULL
>   for the child.

Would it be better, logically-wise, to use is_same_group() instead?
Although, we can't have 2 groups with cfs_rq->curr != NULL on the same
CPU... so if the child belongs to another group, it's cfs_rq->curr is
automatically NULL indeed.



-- 
Best regards,
Dmitry Adamushko
_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers
Re: [BUG]: Crash with CONFIG_FAIR_CGROUP_SCHED=y [message #23006 is a reply to message #23001] Fri, 09 November 2007 12:01 Go to previous messageGo to next message
Srivatsa Vaddagiri is currently offline  Srivatsa Vaddagiri
Messages: 241
Registered: August 2006
Senior Member
On Fri, Nov 09, 2007 at 11:59:15AM +0100, Dmitry Adamushko wrote:
> > - The second problem exposed by this test is that task_new_fair()
> >   assumes that parent and child will be part of the same group (which
> >   needn't be as this test shows). As a result, cfs_rq->curr can be NULL
> >   for the child.
> 
> Would it be better, logically-wise, to use is_same_group() instead?
> Although, we can't have 2 groups with cfs_rq->curr != NULL on the same
> CPU... so if the child belongs to another group, it's cfs_rq->curr is
> automatically NULL indeed.

Yeah ..I feel safe with an explicit !curr check, perhaps with a comment like
below added to explain when curr can be NULL?


---
 kernel/sched_fair.c |    1 +
 1 files changed, 1 insertion(+)

Index: current/kernel/sched_fair.c
===================================================================
--- current.orig/kernel/sched_fair.c
+++ current/kernel/sched_fair.c
@@ -1022,6 +1022,7 @@ static void task_new_fair(struct rq *rq,
 	update_curr(cfs_rq);
 	place_entity(cfs_rq, se, 1);
 
+	/* 'curr' will be NULL if the child belongs to a different group */
 	if (sysctl_sched_child_runs_first && this_cpu == task_cpu(p) &&
 			curr && curr->vruntime < se->vruntime) {
 		/*



-- 
Regards,
vatsa
_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers
Re: [BUG]: Crash with CONFIG_FAIR_CGROUP_SCHED=y [message #23022 is a reply to message #22997] Fri, 09 November 2007 16:05 Go to previous messageGo to next message
serue is currently offline  serue
Messages: 750
Registered: February 2006
Senior Member
Quoting Srivatsa Vaddagiri (vatsa@linux.vnet.ibm.com):
> On Fri, Nov 09, 2007 at 09:45:21AM +0100, Dmitry Adamushko wrote:
> > Humm... the 'current' is not kept within the tree but
> > current->se.on_rq is supposed to be '1' ,
> > so the old code looks ok to me (at least for the 'leaf' elements).
> 
> You are damned right! Sorry my mistake with the previous analysis and
> (as I now find out) testing :(
> 
> There are couple of problems discovered by Suka's test:
> 
> - The test requires the cgroup filesystem to be mounted with
>   atleast the cpu and ns options (i.e both namespace and cpu 
>   controllers are active in the same hierarchy). 
> 
> 	# mkdir /dev/cpuctl
> 	# mount -t cgroup -ocpu,ns none cpuctl
> 	(or simply)
> 	# mount -t cgroup none cpuctl -> Will activate all controllers
> 					 in same hierarchy.
> 
> - The test invokes clone() with CLONE_NEWNS set. This causes a a new child
>   to be created, also a new group (do_fork->copy_namespaces->ns_cgroup_clone->
>   cgroup_clone) and the child is attached to the new group (cgroup_clone->
>   attach_task->sched_move_task). At this point in time, the child's scheduler 
>   related fields are uninitialized (including its on_rq field, which it has
>   inherited from parent). As a result sched_move_task thinks its on
>   runqueue, when it isn't.
> 
>   As a solution to this problem, I moved sched_fork() call, which
>   initializes scheduler related fields on a new task, before
>   copy_namespaces(). I am not sure though whether moving up will
>   cause other side-effects. Do you see any issue?
> 
> - The second problem exposed by this test is that task_new_fair()
>   assumes that parent and child will be part of the same group (which 
>   needn't be as this test shows). As a result, cfs_rq->curr can be NULL
>   for the child.
> 
>   The solution is to test for curr pointer being NULL in
>   task_new_fair().
> 
> 
> With the patch below, I could run ns_exec() fine w/o a crash.
> 
> Suka, can you verify whether this patch fixes your problem?

Works on my machine.  Thanks!

> --
> 
> Fix copy_namespace() <-> sched_fork() dependency in do_fork, by moving
> up sched_fork().
> 
> Also introduce a NULL pointer check for 'curr' in task_new_fair().
> 
> Signed-off-by : Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>

Tested-by: Serge Hallyn <serue@us.ibm.com>

> 
> ---
>  kernel/fork.c       |    6 +++---
>  kernel/sched_fair.c |    2 +-
>  2 files changed, 4 insertions(+), 4 deletions(-)
> 
> Index: current/kernel/fork.c
> ===================================================================
> --- current.orig/kernel/fork.c
> +++ current/kernel/fork.c
> @@ -1121,6 +1121,9 @@ static struct task_struct *copy_process(
>  	p->blocked_on = NULL; /* not blocked yet */
>  #endif
> 
> +	/* Perform scheduler related setup. Assign this task to a CPU. */
> +	sched_fork(p, clone_flags);
> +
>  	if ((retval = security_task_alloc(p)))
>  		goto bad_fork_cleanup_policy;
>  	if ((retval = audit_alloc(p)))
> @@ -1210,9 +1213,6 @@ static struct task_struct *copy_process(
>  	INIT_LIST_HEAD(&p->ptrace_children);
>  	INIT_LIST_HEAD(&p->ptrace_list);
> 
> -	/* Perform scheduler related setup. Assign this task to a CPU. */
> -	sched_fork(p, clone_flags);
> -
>  	/* Now that the task is set up, run cgroup callbacks if
>  	 * necessary. We need to run them before the task is visible
>  	 * on the tasklist. */
> Index: current/kernel/sched_fair.c
> ===================================================================
> --- current.orig/kernel/sched_fair.c
> +++ current/kernel/sched_fair.c
> @@ -1023,7 +1023,7 @@ static void task_new_fair(struct rq *rq,
>  	place_entity(cfs_rq, se, 1);
> 
>  	if (sysctl_sched_child_runs_first && this_cpu == task_cpu(p) &&
> -			curr->vruntime < se->vruntime) {
> +			curr && curr->vruntime < se->vruntime) {
>  		/*
>  		 * Upon rescheduling, sched_class::put_prev_task() will place
>  		 * 'current' within the tree based on its new key value.
> _______________________________________________
> Containers mailing list
> Containers@lists.linux-foundation.org
> https://lists.linux-foundation.org/mailman/listinfo/containers
_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers
Re: [BUG]: Crash with CONFIG_FAIR_CGROUP_SCHED=y [message #23039 is a reply to message #23022] Sat, 10 November 2007 23:13 Go to previous message
Sukadev Bhattiprolu is currently offline  Sukadev Bhattiprolu
Messages: 413
Registered: August 2006
Senior Member
Serge E. Hallyn [serue@us.ibm.com] wrote:
| Quoting Srivatsa Vaddagiri (vatsa@linux.vnet.ibm.com):
| > On Fri, Nov 09, 2007 at 09:45:21AM +0100, Dmitry Adamushko wrote:
| > > Humm... the 'current' is not kept within the tree but
| > > current->se.on_rq is supposed to be '1' ,
| > > so the old code looks ok to me (at least for the 'leaf' elements).
| > 
| > You are damned right! Sorry my mistake with the previous analysis and
| > (as I now find out) testing :(
| > 
| > There are couple of problems discovered by Suka's test:
| > 
| > - The test requires the cgroup filesystem to be mounted with
| >   atleast the cpu and ns options (i.e both namespace and cpu 
| >   controllers are active in the same hierarchy). 
| > 
| > 	# mkdir /dev/cpuctl
| > 	# mount -t cgroup -ocpu,ns none cpuctl
| > 	(or simply)
| > 	# mount -t cgroup none cpuctl -> Will activate all controllers
| > 					 in same hierarchy.
| > 
| > - The test invokes clone() with CLONE_NEWNS set. This causes a a new child
| >   to be created, also a new group (do_fork->copy_namespaces->ns_cgroup_clone->
| >   cgroup_clone) and the child is attached to the new group (cgroup_clone->
| >   attach_task->sched_move_task). At this point in time, the child's scheduler 
| >   related fields are uninitialized (including its on_rq field, which it has
| >   inherited from parent). As a result sched_move_task thinks its on
| >   runqueue, when it isn't.
| > 
| >   As a solution to this problem, I moved sched_fork() call, which
| >   initializes scheduler related fields on a new task, before
| >   copy_namespaces(). I am not sure though whether moving up will
| >   cause other side-effects. Do you see any issue?
| > 
| > - The second problem exposed by this test is that task_new_fair()
| >   assumes that parent and child will be part of the same group (which 
| >   needn't be as this test shows). As a result, cfs_rq->curr can be NULL
| >   for the child.
| > 
| >   The solution is to test for curr pointer being NULL in
| >   task_new_fair().
| > 
| > 
| > With the patch below, I could run ns_exec() fine w/o a crash.
| > 
| > Suka, can you verify whether this patch fixes your problem?
| 
| Works on my machine.  Thanks!

And mine too.  Thanks,


| 
| > --
| > 
| > Fix copy_namespace() <-> sched_fork() dependency in do_fork, by moving
| > up sched_fork().
| > 
| > Also introduce a NULL pointer check for 'curr' in task_new_fair().
| > 
| > Signed-off-by : Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
| 
| Tested-by: Serge Hallyn <serue@us.ibm.com>
Tested-by: Sukadev Bhattiprolu <sukadev@us.ibm.com>
_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers
Previous Topic: [RFC][PATCH][LLC] Use existing sock refcnt debugging
Next Topic: [PATCH][IPV4] Remove bugus goto-s from ip_route_input_slow
Goto Forum:
  


Current Time: Sun Oct 26 20:09:48 GMT 2025

Total time taken to generate the page: 0.14374 seconds