OpenVZ Forum


Home » Mailing lists » Devel » [PATCH 1/2] namespaces: introduce sys_hijack (v10)
Re: [PATCH 1/2] namespaces: introduce sys_hijack (v10) [message #23812 is a reply to message #23792] Tue, 27 November 2007 06:58 Go to previous messageGo to previous message
Crispin Cowan is currently offline  Crispin Cowan
Messages: 8
Registered: October 2007
Junior Member
Just the name "sys_hijack" makes me concerned.

This post describes a bunch of "what", but doesn't tell us about "why"
we would want this. What is it for?

And I second Casey's concern about careful management of the privilege
required to "hijack" a process.

Crispin

Mark Nelson wrote:
> Here's the latest version of sys_hijack.
> Apologies for its lateness.
>
> Thanks!
>
> Mark.
>
> Subject: [PATCH 1/2] namespaces: introduce sys_hijack (v10)
>
> Move most of do_fork() into a new do_fork_task() which acts on
> a new argument, task, rather than on current.  do_fork() becomes
> a call to do_fork_task(current, ...).
>
> Introduce sys_hijack (for i386 and s390 only so far).  It is like
> clone, but in place of a stack pointer (which is assumed null) it
> accepts a pid.  The process identified by that pid is the one
> which is actually cloned.  Some state - including the file
> table, the signals and sighand (and hence tty), and the ->parent
> are taken from the calling process.
>
> A process to be hijacked may be identified by process id, in the
> case of HIJACK_PID.  Alternatively, in the case of HIJACK_CG an
> open fd for a cgroup 'tasks' file may be specified.  The first
> available task in that cgroup will then be hijacked.
>
> HIJACK_NS is implemented as a third hijack method.  The main
> purpose is to allow entering an empty cgroup without having
> to keep a task alive in the target cgroup.  When HIJACK_NS
> is called, only the cgroup and nsproxy are copied from the
> cgroup.  Security, user, and rootfs info is not retained
> in the cgroups and so cannot be copied to the child task.
>
> In order to hijack a process, the calling process must be
> allowed to ptrace the target.
>
> Sending sigstop to the hijacked task can trick its parent shell
> (if it is a shell foreground task) into thinking it should retake
> its tty.
>
> So try not sending SIGSTOP, and instead hold the task_lock over
> the hijacked task throughout the do_fork_task() operation.
> This is really dangerous.  I've fixed cgroup_fork() to not
> task_lock(task) in the hijack case, but there may well be other
> code called during fork which can under "some circumstances"
> task_lock(task).
>
> Still, this is working for me.
>
> The effect is a sort of namespace enter.  The following program
> uses sys_hijack to 'enter' all namespaces of the specified task.
> For instance in one terminal, do
>
> 	mount -t cgroup -ons cgroup /cgroup
> 	hostname
> 	  qemu
> 	ns_exec -u /bin/sh
> 	  hostname serge
>           echo $$
>             1073
> 	  cat /proc/$$/cgroup
> 	    ns:/node_1073
>
> In another terminal then do
>
> 	hostname
> 	  qemu
> 	cat /proc/$$/cgroup
> 	  ns:/
> 	hijack pid 1073
> 	  hostname
> 	    serge
> 	  cat /proc/$$/cgroup
> 	    ns:/node_1073
> 	hijack cgroup /cgroup/node_1073/tasks
>
> Changelog:
> 	Aug 23: send a stop signal to the hijacked process
> 		(like ptrace does).
> 	Oct 09: Update for 2.6.23-rc8-mm2 (mainly pidns)
> 		Don't take task_lock under rcu_read_lock
> 		Send hijacked process to cgroup_fork() as
> 		the first argument.
> 		Removed some unneeded task_locks.
> 	Oct 16: Fix bug introduced into alloc_pid.
> 	Oct 16: Add 'int which' argument to sys_hijack to
> 		allow later expansion to use cgroup in place
> 		of pid to specify what to hijack.
> 	Oct 24: Implement hijack by open cgroup file.
> 	Nov 02: Switch copying of task info: do full copy
> 		from current, then copy relevant pieces from
> 		hijacked task.
> 	Nov 06: Verbatim task_struct copy now comes from current,
> 		after which copy_hijackable_taskinfo() copies
> 		relevant context pieces from the hijack source.
> 	Nov 07: Move arch-independent hijack code to kernel/fork.c
> 	Nov 07: powerpc and x86_64 support (Mark Nelson)
> 	Nov 07: Don't allow hijacking members of same session.
> 	Nov 07: introduce cgroup_may_hijack, and may_hijack hook to
> 		cgroup subsystems.  The ns subsystem uses this to
> 		enforce the rule that one may only hijack descendent
> 		namespaces.
> 	Nov 07: s390 support
> 	Nov 08: don't send SIGSTOP to hijack source task
> 	Nov 10: cache reference to nsproxy in ns cgroup for use in
> 		hijacking an empty cgroup.
> 	Nov 10: allow partial hijack of empty cgroup
> 	Nov 13: don't double-get cgroup for hijack_ns
> 		find_css_set() actually returns the set with a
> 		reference already held, so cgroup_fork_fromcgroup()
> 		by doing a get_css_set() was getting a second
> 		reference.  Therefore after exiting the hijack
> 		task we could not rmdir the csgroup.
> 	Nov 22: temporarily remove x86_64 and powerpc support
> 	Nov 27: rebased on 2.6.24-rc3
>
> ==============================================================
> hijack.c
> ==============================================================
> /*
>  * Your options are:
>  *	hijack pid 1078
>  *	hijack cgroup /cgroup/node_1078/tasks
>  *	hijack ns /cgroup/node_1078/tasks
>  */
>
> #define _BSD_SOURCE
> #include <unistd.h>
> #include <sys/syscall.h>
> #include <sys/types.h>
> #include <sys/wait.h>
> #include <sys/stat.h>
> #include <fcntl.h>
> #include <sched.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
>
> #if __i386__
> #    define __NR_hijack		325
> #elif __s390x__
> #    define __NR_hijack		319
> #else
> #    error "Architecture not supported"
> #endif
>
> #ifndef CLONE_NEWUTS
> #define CLONE_NEWUTS 0x04000000
> #endif
>
> void usage(char *me)
> {
> 	printf("Usage: %s pid <pid>\n", me);
> 	printf("     | %s cgroup <cgroup_tasks_file>\n", me);
> 	printf("     | %s ns <cgroup_tasks_file>\n", me);
> 	exit(1);
> }
>
> int exec_shell(void)
> {
> 	execl("/bin/sh", "/bin/sh", NULL);
> }
>
> #define HIJACK_PID 1
> #define HIJACK_CG 2
> #define HIJACK_NS 3
>
> int main(int argc, char *argv[])
> {
> 	int id;
> 	int ret;
> 	int status;
> 	int which_hijack;
>
> 	if (argc < 3 || !strcmp(argv[1], "-h"))
> 		usage(argv[0]);
> 	if (strcmp(argv[1], "cgroup") == 0)
> 		which_hijack = HIJACK_CG;
> 	else if (strcmp(argv[1], "ns") == 0)
> 		which_hijack = HIJACK_NS;
> 	else
> 		which_hijack = HIJACK_PID;
>
> 	switch(which_hijack) {
> 		case HIJACK_PID:
> 			id = atoi(argv[2]);
> 			printf("hijacking pid %d\n", id);
> 			break;
> 		case HIJACK_CG:
> 		case HIJACK_NS:
> 			id = open(argv[2], O_RDONLY);
> 			if (id == -1) {
> 				perror("cgroup open");
> 				return 1;
> 			}
> 			break;
> 	}
>
> 	ret = syscall(__NR_hijack, SIGCHLD, which_hijack, (unsigned long)id);
>
> 	if (which_hijack != HIJACK_PID)
> 		close(id);
> 	if  (ret == 0) {
> 		return exec_shell();
> 	} else if (ret < 0) {
> 		perror("sys_hijack");
> 	} else {
> 		printf("waiting on cloned process %d\n", ret);
> 		while(waitpid(-1, &status, __WALL) != -1)
> 				;
> 		printf("cloned process exited with %d (waitpid ret %d)\n",
> 				status, ret);
> 	}
>
> 	return ret;
> }
> ==============================================================
>
> Signed-off-by: Serge Hallyn <serue@us.ibm.com>
> Signed-off-by: Mark Nelson <markn@au1.ibm.com>
> ---
>  Documentation/cgroups.txt          |    9 +
>  arch/s390/kernel/process.c         |   21 +++
>  arch/x86/kernel/process_32.c       |   24 ++++
>  arch/x86/kernel/syscall_table_32.S |    1 
>  include/asm-x86/unistd_32.h        |    3 
>  include/linux/cgroup.h             |   28 ++++-
>  include/linux/nsproxy.h            |   12 +-
>  include/linux/ptrace.h             |    1 
>  include/linux/sched.h              |   19 +++
>  include/linux/syscalls.h           |    2 
>  kernel/cgroup.c                    |  133 +++++++++++++++++++++++-
>  kernel/fork.c                      |  201 ++++++++++++++++++++++++++++++++++---
>  kernel/ns_cgroup.c                 |   88 +++++++++++++++-
>  kernel/nsproxy.c                   |    4 
>  kernel/ptrace.c                    |    7 +
>  15 files changed, 523 insertions(+), 30 deletions(-)
>
> Index: upstream/arch/s390/kernel/process.c
> ===================================================================
> --- upstream.orig/arch/s390/kernel/process.c
> +++ upstream/arch/s390/kernel/process.c
> @@ -321,6 +321,27 @@ asmlinkage long sys_clone(void)
>  		       parent_tidptr, child_tidptr);
>  }
>  
> +asmlinkage long sys_hijack(void)
> +{
> +	struct pt_regs *regs = task_pt_regs(current);
> +	unsigned long sp = regs->orig_gpr2;
> +	unsigned long clone_flags = regs->gprs[3];
> +	int which = regs->gprs[4];
> +	unsigned int fd;
> +	pid_t pid;
> +
> +	switch (which) {
> +	case HIJACK_PID:
> +		pid = regs->gprs[5];
> +		return hijack_pid(pid, clone_flags, *regs, sp);
> +	case HIJACK_CGROUP:
> +		fd = (unsigned int) regs->gprs[5];
> +		return hijack_cgroup(fd, clone_flags, *regs, sp);
> +	default:
> +		return -EINVAL;
> +	}
> +}
> +
>  /*
>   * This is trivial, and on the face of it looks like it
>   * could equally well be done in user mode.
> Index: upstream/arch/x86/kernel/process_32.c
> ===================================================================
> --- upstream.ori
...

 
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Previous Topic: [PATCH 2.6.25] net: removes unnecessary dependencies for net_namespace.h
Next Topic: [PATCH] AB-BA deadlock in drop_caches sysctl (resend, the one sent was for 2.6.18)
Goto Forum:
  


Current Time: Sat Sep 14 16:13:12 GMT 2024

Total time taken to generate the page: 0.03855 seconds