Re: [PATCH 1/2] namespaces: introduce sys_hijack (v10) [message #23834 is a reply to message #23812]

Tue, 27 November 2007 16:11

serue

Quoting Crispin Cowan (crispin@crispincowan.com):
> Just the name "sys_hijack" makes me concerned.
> 
> This post describes a bunch of "what", but doesn't tell us about "why"
> we would want this. What is it for?

Please see my response to Casey's email.

> And I second Casey's concern about careful management of the privilege
> required to "hijack" a process.

Absolutely.  We're definitely still in RFC territory.

Note that there are currently several proposed (but no upstream) ways to
accomplish entering a namespace:

	1. bind_ns() is a new pair of syscalls proposed by Cedric.  An
	nsproxy is given an integer id.  The id can be used to enter
	an nsproxy, basically a straight current->nsproxy = target_nsproxy;

	2. I had previously posted a patchset on top of the nsproxy
	cgroup which allowed entering a nsproxy through the ns cgroup
	interface.

There are objections to both those patchsets because simply switching a
task's nsproxy using a syscall or file write in the middle of running a
binary is quite unsafe.  Eric Biederman had suggested using ptrace or
something like it to accomplish the goal.

Just using ptrace is however not safe either.  You are inheriting *all*
of the target's context, so it shouldn't be difficult for a nefarious
container/vserver admin to trick the host admin into running something
which gives the container/vserver admin full access to the host.

That's where the hijack idea came from.  Yes, I called it hijack to make
sure alarm bells went off :) because it's definitely still worrisome.  But
at this point I believe it is the safest solution suggested so far.

-serge

> Crispin
> 
> Mark Nelson wrote:
> > Here's the latest version of sys_hijack.
> > Apologies for its lateness.
> >
> > Thanks!
> >
> > Mark.
> >
> > Subject: [PATCH 1/2] namespaces: introduce sys_hijack (v10)
> >
> > Move most of do_fork() into a new do_fork_task() which acts on
> > a new argument, task, rather than on current.  do_fork() becomes
> > a call to do_fork_task(current, ...).
> >
> > Introduce sys_hijack (for i386 and s390 only so far).  It is like
> > clone, but in place of a stack pointer (which is assumed null) it
> > accepts a pid.  The process identified by that pid is the one
> > which is actually cloned.  Some state - including the file
> > table, the signals and sighand (and hence tty), and the ->parent
> > are taken from the calling process.
> >
> > A process to be hijacked may be identified by process id, in the
> > case of HIJACK_PID.  Alternatively, in the case of HIJACK_CG an
> > open fd for a cgroup 'tasks' file may be specified.  The first
> > available task in that cgroup will then be hijacked.
> >
> > HIJACK_NS is implemented as a third hijack method.  The main
> > purpose is to allow entering an empty cgroup without having
> > to keep a task alive in the target cgroup.  When HIJACK_NS
> > is called, only the cgroup and nsproxy are copied from the
> > cgroup.  Security, user, and rootfs info is not retained
> > in the cgroups and so cannot be copied to the child task.
> >
> > In order to hijack a process, the calling process must be
> > allowed to ptrace the target.
> >
> > Sending SIGSTOP to the hijacked task can trick its parent shell
> > (if it is a shell foreground task) into thinking it should retake
> > its tty.
> >
> > So try not sending SIGSTOP, and instead hold the task_lock over
> > the hijacked task throughout the do_fork_task() operation.
> > This is really dangerous.  I've fixed cgroup_fork() to not
> > task_lock(task) in the hijack case, but there may well be other
> > code called during fork which can under "some circumstances"
> > task_lock(task).
> >
> > Still, this is working for me.
> >
> > The effect is a sort of namespace enter.  The following program
> > uses sys_hijack to 'enter' all namespaces of the specified task.
> > For instance in one terminal, do
> >
> > 	mount -t cgroup -ons cgroup /cgroup
> > 	hostname
> > 	  qemu
> > 	ns_exec -u /bin/sh
> > 	  hostname serge
> > 	  echo $$
> > 	    1073
> > 	  cat /proc/$$/cgroup
> > 	    ns:/node_1073
> >
> > In another terminal then do
> >
> > 	hostname
> > 	  qemu
> > 	cat /proc/$$/cgroup
> > 	  ns:/
> > 	hijack pid 1073
> > 	  hostname
> > 	    serge
> > 	  cat /proc/$$/cgroup
> > 	    ns:/node_1073
> > 	hijack cgroup /cgroup/node_1073/tasks
> >
> > Changelog:
> > 	Aug 23: send a stop signal to the hijacked process
> > 		(like ptrace does).
> > 	Oct 09: Update for 2.6.23-rc8-mm2 (mainly pidns)
> > 		Don't take task_lock under rcu_read_lock
> > 		Send hijacked process to cgroup_fork() as
> > 		the first argument.
> > 		Removed some unneeded task_locks.
> > 	Oct 16: Fix bug introduced into alloc_pid.
> > 	Oct 16: Add 'int which' argument to sys_hijack to
> > 		allow later expansion to use cgroup in place
> > 		of pid to specify what to hijack.
> > 	Oct 24: Implement hijack by open cgroup file.
> > 	Nov 02: Switch copying of task info: do full copy
> > 		from current, then copy relevant pieces from
> > 		hijacked task.
> > 	Nov 06: Verbatim task_struct copy now comes from current,
> > 		after which copy_hijackable_taskinfo() copies
> > 		relevant context pieces from the hijack source.
> > 	Nov 07: Move arch-independent hijack code to kernel/fork.c
> > 	Nov 07: powerpc and x86_64 support (Mark Nelson)
> > 	Nov 07: Don't allow hijacking members of same session.
> > 	Nov 07: introduce cgroup_may_hijack, and may_hijack hook to
> > 		cgroup subsystems.  The ns subsystem uses this to
> > 		enforce the rule that one may only hijack descendent
> > 		namespaces.
> > 	Nov 07: s390 support
> > 	Nov 08: don't send SIGSTOP to hijack source task
> > 	Nov 10: cache reference to nsproxy in ns cgroup for use in
> > 		hijacking an empty cgroup.
> > 	Nov 10: allow partial hijack of empty cgroup
> > 	Nov 13: don't double-get cgroup for hijack_ns
> > 		find_css_set() actually returns the set with a
> > 		reference already held, so cgroup_fork_fromcgroup()
> > 		by doing a get_css_set() was getting a second
> > 		reference.  Therefore after exiting the hijack
> > 		task we could not rmdir the cgroup.
> > 	Nov 22: temporarily remove x86_64 and powerpc support
> > 	Nov 27: rebased on 2.6.24-rc3
> >
> > ==============================================================
> > hijack.c
> > ==============================================================
> > /*
> >  * Your options are:
> >  *	hijack pid 1078
> >  *	hijack cgroup /cgroup/node_1078/tasks
> >  *	hijack ns /cgroup/node_1078/tasks
> >  */
> >
> > #define _BSD_SOURCE
> > #include <signal.h>
> > #include <unistd.h>
> > #include <sys/syscall.h>
> > #include <sys/types.h>
> > #include <sys/wait.h>
> > #include <sys/stat.h>
> > #include <fcntl.h>
> > #include <sched.h>
> > #include <stdio.h>
> > #include <stdlib.h>
> > #include <string.h>
> >
> > #if __i386__
> > #    define __NR_hijack		325
> > #elif __s390x__
> > #    define __NR_hijack		319
> > #else
> > #    error "Architecture not supported"
> > #endif
> >
> > #ifndef CLONE_NEWUTS
> > #define CLONE_NEWUTS 0x04000000
> > #endif
> >
> > void usage(char *me)
> > {
> > 	printf("Usage: %s pid <pid>\n", me);
> > 	printf("     | %s cgroup <cgroup_tasks_file>\n", me);
> > 	printf("     | %s ns <cgroup_tasks_file>\n", me);
> > 	exit(1);
> > }
> >
> > int exec_shell(void)
> > {
> > 	execl("/bin/sh", "/bin/sh", (char *)NULL);
> > 	/* execl() only returns on failure */
> > 	perror("execl");
> > 	return 1;
> > }
> >
> > #define HIJACK_PID 1
> > #define HIJACK_CG 2
> > #define HIJACK_NS 3
> >
> > int main(int argc, char *argv[])
> > {
> > 	int id;
> > 	int ret;
> > 	int status;
> > 	int which_hijack;
> >
> > 	if (argc < 3 || !strcmp(argv[1], "-h"))
> > 		usage(argv[0]);
> > 	if (strcmp(argv[1], "cgroup") == 0)
> > 		which_hijack = HIJACK_CG;
> > 	else if (strcmp(argv[1], "ns") == 0)
> > 		which_hijack = HIJACK_NS;
> > 	else
> > 		which_hijack = HIJACK_PID;
> >
> > 	switch(which_hijack) {
> > 		case HIJACK_PID:
> > 			id = atoi(argv[2]);
> > 			printf("hijacking pid %d\n", id);
> > 			break;
> > 		case HIJACK_CG:
> > 		case HIJACK_NS:
> > 			id = open(argv[2], O_RDONLY);
> > 			if (id == -1) {
> > 				perror("cgroup open");
> > 				return 1;
> > 			}
> > 			break;
> > 	}
> >
> > 	ret = syscall(__NR_hijack, SIGCHLD, which_hijack, (unsigned long)id);
> >
> > 	if (which_hijack != HIJACK_PID)
> > 		close(id);
> > 	if (ret == 0) {
> > 		return exec_shell();
> > 	} else if (ret < 0) {
> > 		perror("sys_hijack");
> > 	} else {
> > 		printf("waiting on cloned process %d\n", ret);
> > 		while(waitpid(-1, &status, __WALL) != -1)
> > 				;
> >

...


