OpenVZ Forum


[PATCH 0/3] clone64() and unshare64() system calls [message #29261] Wed, 09 April 2008 22:26
Sukadev Bhattiprolu
This is a resend of the patch set Cedric sent earlier. I ported it to
2.6.25-rc8-mm1 and tested it on x86 and x86_64.
---

We have run out of the 32 bits in clone_flags!

This patch set introduces two new system calls that support 64-bit clone flags:

     long sys_clone64(unsigned long flags_high, unsigned long flags_low,
		unsigned long newsp);

     long sys_unshare64(unsigned long flags_high, unsigned long flags_low);
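A minimal user-space sketch of the flag encoding described above: a 64-bit flags value is split into the flags_high/flags_low pair and recombined the way the patches do, (u64) flags_high << 32 | flags_low. The helper names clone64_split_flags() and clone64_combine_flags() are hypothetical, and using uint32_t for each half is an assumption made so the example behaves the same on 32- and 64-bit hosts (the proposed syscalls declare both halves as unsigned long).

```c
#include <stdint.h>

/*
 * Hypothetical helper: split a 64-bit flags word into the
 * (flags_high, flags_low) pair the proposed syscalls expect.
 */
static void clone64_split_flags(uint64_t flags, uint32_t *flags_high,
				uint32_t *flags_low)
{
	*flags_high = (uint32_t)(flags >> 32);
	*flags_low = (uint32_t)(flags & 0xffffffffULL);
}

/* Hypothetical helper mirroring the kernel side: (u64) high << 32 | low. */
static uint64_t clone64_combine_flags(uint32_t flags_high, uint32_t flags_low)
{
	return (uint64_t)flags_high << 32 | flags_low;
}
```

Round-tripping any value through both helpers returns the original flags, which is the property the split interface relies on.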

The current version of clone64() does not support CLONE_PARENT_SETTID and
CLONE_CHILD_CLEARTID, because the extra pointer arguments those flags need
would exceed the six-register syscall argument limit of some architectures.
It's possible to work around this limitation, but we might not need to,
since clone() already supports these flags.
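The flag masking described above can be sketched in user space as follows. It mirrors the line clone_flags &= ~(CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID) that each sys_clone64() in patch 3/3 applies before calling do_fork(), so a caller that sets those bits gets no error; the bits are simply dropped. The function name clone64_effective_flags() is hypothetical, and the constant values are copied from <linux/sched.h> only when the build environment does not already define them.

```c
#include <stdint.h>

/* Flag values as in <linux/sched.h>; defined here only if not already visible. */
#ifndef CLONE_NEWNS
#define CLONE_NEWNS          0x00020000ULL
#endif
#ifndef CLONE_PARENT_SETTID
#define CLONE_PARENT_SETTID  0x00100000ULL
#endif
#ifndef CLONE_CHILD_CLEARTID
#define CLONE_CHILD_CLEARTID 0x00200000ULL
#endif

/*
 * Models the masking done by each sys_clone64() in patch 3/3: the two
 * tid-pointer flags are cleared because clone64() passes NULL for
 * parent_tidptr and child_tidptr. All other bits, including any in the
 * upper 32 bits, pass through unchanged.
 */
static uint64_t clone64_effective_flags(uint64_t requested)
{
	return requested & ~(uint64_t)(CLONE_PARENT_SETTID | CLONE_CHILD_CLEARTID);
}
```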

This is a work in progress, but it already includes support for x86, x86_64,
x86_64(32), ppc64, ppc64(32), s390x and s390x(31).

ia64 already supports 64-bit clone flags through the clone2() syscall.
Should we harmonize the name to clone2?
_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers
[PATCH 1/3] change clone_flags type to u64 [message #29262 is a reply to message #29261] Wed, 09 April 2008 22:32
Sukadev Bhattiprolu
From: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Subject: [lxc-dev] [patch -lxc 1/3] change clone_flags type to u64

This is a preliminary patch changing the clone_flags type to 64 bits
for all the routines called by do_fork().

It prepares the ground for the next patch, which introduces an enhanced
version of clone() supporting 64-bit flags.

This is a work in progress; not all conversions may be done yet.
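A minimal sketch of why the u64 conversion matters on 32-bit architectures, where unsigned long is only 32 bits wide: any helper on the do_fork() path that still takes a 32-bit flags argument silently discards bits 32 to 63. CLONE_HYPOTHETICAL_BIT32 and both helper functions are invented for illustration; no such flag exists in this patch set.

```c
#include <stdint.h>

/*
 * Hypothetical flag in the upper half of a 64-bit clone_flags word.
 * It only illustrates why do_fork() and its callees need a u64 parameter.
 */
#define CLONE_HYPOTHETICAL_BIT32 (1ULL << 32)

/* Models an unconverted helper whose flags argument is 32 bits wide. */
static int old_helper_sees_flag(uint32_t clone_flags, uint64_t flag)
{
	return (clone_flags & flag) != 0;
}

/* Models a converted helper that receives the full 64-bit value. */
static int new_helper_sees_flag(uint64_t clone_flags, uint64_t flag)
{
	return (clone_flags & flag) != 0;
}
```

Passing the same flags value to both helpers shows the old one losing the high bit during the conversion to 32 bits, which is the kind of silent truncation the type change is meant to rule out.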

Signed-off-by: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com>
---
 arch/alpha/kernel/process.c         |    2 +-
 arch/arm/kernel/process.c           |    2 +-
 arch/avr32/kernel/process.c         |    2 +-
 arch/blackfin/kernel/process.c      |    2 +-
 arch/cris/arch-v10/kernel/process.c |    2 +-
 arch/cris/arch-v32/kernel/process.c |    2 +-
 arch/frv/kernel/process.c           |    2 +-
 arch/h8300/kernel/process.c         |    2 +-
 arch/ia64/ia32/sys_ia32.c           |    2 +-
 arch/ia64/kernel/process.c          |    2 +-
 arch/m32r/kernel/process.c          |    2 +-
 arch/m68k/kernel/process.c          |    2 +-
 arch/m68knommu/kernel/process.c     |    2 +-
 arch/mips/kernel/process.c          |    2 +-
 arch/mn10300/kernel/process.c       |    2 +-
 arch/parisc/kernel/process.c        |    2 +-
 arch/powerpc/kernel/process.c       |    2 +-
 arch/s390/kernel/process.c          |    2 +-
 arch/sh/kernel/process_32.c         |    2 +-
 arch/sh/kernel/process_64.c         |    2 +-
 arch/sparc/kernel/process.c         |    2 +-
 arch/sparc64/kernel/process.c       |    2 +-
 arch/um/kernel/process.c            |    2 +-
 arch/v850/kernel/process.c          |    2 +-
 arch/x86/kernel/process_32.c        |    2 +-
 arch/x86/kernel/process_64.c        |    2 +-
 arch/xtensa/kernel/process.c        |    2 +-
 fs/namespace.c                      |    2 +-
 include/linux/ipc_namespace.h       |    4 ++--
 include/linux/key.h                 |    2 +-
 include/linux/mnt_namespace.h       |    2 +-
 include/linux/nsproxy.h             |    4 ++--
 include/linux/pid_namespace.h       |    4 ++--
 include/linux/sched.h               |    6 ++++--
 include/linux/security.h            |    6 +++---
 include/linux/sem.h                 |    4 ++--
 include/linux/user_namespace.h      |    4 ++--
 include/linux/utsname.h             |    4 ++--
 include/net/net_namespace.h         |    4 ++--
 ipc/namespace.c                     |    2 +-
 ipc/sem.c                           |    2 +-
 kernel/fork.c                       |   36 ++++++++++++++++++------------------
 kernel/nsproxy.c                    |    6 +++---
 kernel/pid_namespace.c              |    2 +-
 kernel/user_namespace.c             |    2 +-
 kernel/utsname.c                    |    2 +-
 net/core/net_namespace.c            |    4 ++--
 security/dummy.c                    |    2 +-
 security/keys/process_keys.c        |    2 +-
 security/security.c                 |    2 +-
 security/selinux/hooks.c            |    2 +-
 51 files changed, 83 insertions(+), 81 deletions(-)

Index: 2.6.25-rc2-mm1/arch/alpha/kernel/process.c
===================================================================
--- 2.6.25-rc2-mm1.orig/arch/alpha/kernel/process.c
+++ 2.6.25-rc2-mm1/arch/alpha/kernel/process.c
@@ -270,7 +270,7 @@ alpha_vfork(struct pt_regs *regs)
  */

 int
-copy_thread(int nr, unsigned long clone_flags, unsigned long usp,
+copy_thread(int nr, u64 clone_flags, unsigned long usp,
 	    unsigned long unused,
 	    struct task_struct * p, struct pt_regs * regs)
 {
Index: 2.6.25-rc2-mm1/arch/arm/kernel/process.c
===================================================================
--- 2.6.25-rc2-mm1.orig/arch/arm/kernel/process.c
+++ 2.6.25-rc2-mm1/arch/arm/kernel/process.c
@@ -331,7 +331,7 @@ void release_thread(struct task_struct *
 asmlinkage void ret_from_fork(void) __asm__("ret_from_fork");

 int
-copy_thread(int nr, unsigned long clone_flags, unsigned long stack_start,
+copy_thread(int nr, u64 clone_flags, unsigned long stack_start,
 	    unsigned long stk_sz, struct task_struct *p, struct pt_regs *regs)
 {
 	struct thread_info *thread = task_thread_info(p);
Index: 2.6.25-rc2-mm1/arch/avr32/kernel/process.c
===================================================================
--- 2.6.25-rc2-mm1.orig/arch/avr32/kernel/process.c
+++ 2.6.25-rc2-mm1/arch/avr32/kernel/process.c
@@ -325,7 +325,7 @@ int dump_fpu(struct pt_regs *regs, elf_f

 asmlinkage void ret_from_fork(void);

-int copy_thread(int nr, unsigned long clone_flags, unsigned long usp,
+int copy_thread(int nr, u64 clone_flags, unsigned long usp,
 		unsigned long unused,
 		struct task_struct *p, struct pt_regs *regs)
 {
Index: 2.6.25-rc2-mm1/arch/blackfin/kernel/process.c
===================================================================
--- 2.6.25-rc2-mm1.orig/arch/blackfin/kernel/process.c
+++ 2.6.25-rc2-mm1/arch/blackfin/kernel/process.c
@@ -168,7 +168,7 @@ asmlinkage int bfin_clone(struct pt_regs
 }

 int
-copy_thread(int nr, unsigned long clone_flags,
+copy_thread(int nr, u64 clone_flags,
 	    unsigned long usp, unsigned long topstk,
 	    struct task_struct *p, struct pt_regs *regs)
 {
Index: 2.6.25-rc2-mm1/arch/cris/arch-v10/kernel/process.c
===================================================================
--- 2.6.25-rc2-mm1.orig/arch/cris/arch-v10/kernel/process.c
+++ 2.6.25-rc2-mm1/arch/cris/arch-v10/kernel/process.c
@@ -115,7 +115,7 @@ int kernel_thread(int (*fn)(void *), voi
  */
 asmlinkage void ret_from_fork(void);

-int copy_thread(int nr, unsigned long clone_flags, unsigned long usp,
+int copy_thread(int nr, u64 clone_flags, unsigned long usp,
 		unsigned long unused,
 		struct task_struct *p, struct pt_regs *regs)
 {
Index: 2.6.25-rc2-mm1/arch/cris/arch-v32/kernel/process.c
===================================================================
--- 2.6.25-rc2-mm1.orig/arch/cris/arch-v32/kernel/process.c
+++ 2.6.25-rc2-mm1/arch/cris/arch-v32/kernel/process.c
@@ -131,7 +131,7 @@ kernel_thread(int (*fn)(void *), void * 
 extern asmlinkage void ret_from_fork(void);

 int
-copy_thread(int nr, unsigned long clone_flags, unsigned long usp,
+copy_thread(int nr, u64 clone_flags, unsigned long usp,
 	unsigned long unused,
 	struct task_struct *p, struct pt_regs *regs)
 {
Index: 2.6.25-rc2-mm1/arch/frv/kernel/process.c
===================================================================
--- 2.6.25-rc2-mm1.orig/arch/frv/kernel/process.c
+++ 2.6.25-rc2-mm1/arch/frv/kernel/process.c
@@ -204,7 +204,7 @@ void prepare_to_copy(struct task_struct 
 /*
  * set up the kernel stack and exception frames for a new process
  */
-int copy_thread(int nr, unsigned long clone_flags,
+int copy_thread(int nr, u64 clone_flags,
 		unsigned long usp, unsigned long topstk,
 		struct task_struct *p, struct pt_regs *regs)
 {
Index: 2.6.25-rc2-mm1/arch/h8300/kernel/process.c
===================================================================
--- 2.6.25-rc2-mm1.orig/arch/h8300/kernel/process.c
+++ 2.6.25-rc2-mm1/arch/h8300/kernel/process.c
@@ -192,7 +192,7 @@ asmlinkage int h8300_clone(struct pt_reg

 }

-int copy_thread(int nr, unsigned long clone_flags,
+int copy_thread(int nr, u64 clone_flags,
                 unsigned long usp, unsigned long topstk,
 		 struct task_struct * p, struct pt_regs * regs)
 {
Index: 2.6.25-rc2-mm1/arch/ia64/ia32/sys_ia32.c
===================================================================
--- 2.6.25-rc2-mm1.orig/arch/ia64/ia32/sys_ia32.c
+++ 2.6.25-rc2-mm1/arch/ia64/ia32/sys_ia32.c
@@ -734,7 +734,7 @@ __ia32_copy_pp_list(struct ia64_partial_

 int
 ia32_copy_ia64_partial_page_list(struct task_struct *p,
-				unsigned long clone_flags)
+				u64 clone_flags)
 {
 	int retval = 0;

Index: 2.6.25-rc2-mm1/arch/ia64/kernel/process.c
===================================================================
--- 2.6.25-rc2-mm1.orig/arch/ia64/kernel/process.c
+++ 2.6.25-rc2-mm1/arch/ia64/kernel/process.c
@@ -402,7 +402,7 @@ ia64_load_extra (struct task_struct *tas
  * so there is nothing to worry about.
  */
 int
-copy_thread (int nr, unsigned long clone_flags,
+copy_thread(int nr, u64 clone_flags,
 	     unsigned long user_stack_base, unsigned long user_stack_size,
 	     struct task_struct *p, struct pt_regs *regs)
 {
Index: 2.6.25-rc2-mm1/arch/m32r/kernel/process.c
===================================================================
--- 2.6.25-rc2-mm1.orig/arch/m32r/kernel/process.c
+++ 2.6.25-rc2-mm1/arch/m32r/kernel/process.c
@@ -242,7 +242,7 @@ int dump_fpu(struct pt_regs *regs, elf_f
 	return 0; /* Task didn't use the fpu at all. */
 }

-int copy_thread(int nr, unsigned long clone_flags, unsigned long spu,
+int copy_thread(int nr, u64 clone_flags, unsigned long spu,
 	unsigned long unused, struct task_struct *tsk, struct pt_regs *regs)
 {
 	struct pt_regs *childregs = task_pt_regs(tsk);
Index: 2.6.25-rc2-mm1/arch/m68k/kernel/process.c
===================================================================
--- 2.6.25-rc2-mm1.orig/arch/m68k/kernel/process.c
+++ 2.6.25-rc2-mm1/arch/m68k/kernel/process.c
@@ -235,7 +235,7 @@ asmlinkage int m68k_clone(struct pt_regs
 		       parent_tidptr, child_tidptr);
 }

-int copy_thread(int nr, unsigned long clone_flags, unsigned long usp,
+int copy_thread(int nr, u64 clone_flags, unsigned long usp,
 		 unsigned long unused,
 		 struct task_struct * p, struct pt_regs * regs)
 {
Index: 2.6.25-rc2-mm1/arch/m68knommu/kernel/process.c
===================================================================
--- 2.6.25-rc2-mm1.orig/arch/m68knommu/kernel/process.c
+++ 2.6.25-rc2-mm1/arch/m68knommu/kernel/process.c
@@ -200,7 +200,7 @@ asmlinkage int m68k_clone(struct pt_regs
         return do_fork(clone_flags, newsp, regs, 0, NULL, NULL);
 }

-int copy_thread(int nr, unsigned long clone_flags,
+int copy_thread(int nr, u64 clone_flags,
 		unsigned long usp, unsigned long topstk,
 		struct task_struct * p, struct pt_regs * regs)
 {
Index: 2.6.25-rc2-mm1/arch/mips/kernel/process.c
===========================================
...

[PATCH 2/3] add do_unshare() [message #29263 is a reply to message #29261] Wed, 09 April 2008 22:34
Sukadev Bhattiprolu
From: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Subject: [PATCH 2/3] add do_unshare()

This patch adds a do_unshare() routine, which will be common
to the unshare() and unshare64() syscalls.

Signed-off-by: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com>
---
 kernel/fork.c |    7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

Index: 2.6.25-rc2-mm1/kernel/fork.c
===================================================================
--- 2.6.25-rc2-mm1.orig/kernel/fork.c
+++ 2.6.25-rc2-mm1/kernel/fork.c
@@ -1696,7 +1696,7 @@ static int unshare_semundo(u64 unshare_f
  * constructed. Here we are modifying the current, active,
  * task_struct.
  */
-asmlinkage long sys_unshare(unsigned long unshare_flags)
+static long do_unshare(u64 unshare_flags)
 {
 	int err = 0;
 	struct fs_struct *fs, *new_fs = NULL;
@@ -1790,3 +1790,8 @@ bad_unshare_cleanup_thread:
 bad_unshare_out:
 	return err;
 }
+
+asmlinkage long sys_unshare(unsigned long unshare_flags)
+{
+	return do_unshare(unshare_flags);
+}
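The wrapper layering from patches 2/3 and 3/3, with two thin entry points funnelling into one u64-based do_unshare(), can be modelled in user space as below. This is a sketch, not kernel code: the stand-in do_unshare() only records the flags it receives so the forwarding can be checked, and last_unshare_flags is a hypothetical test hook.

```c
#include <stdint.h>

/* Hypothetical hook: the flags most recently seen by do_unshare(). */
static uint64_t last_unshare_flags;

/* Stand-in for the kernel routine; it records instead of unsharing. */
static long do_unshare(uint64_t unshare_flags)
{
	last_unshare_flags = unshare_flags;
	return 0;
}

/* Legacy entry point: one word of flags, widened on the call. */
static long sys_unshare(unsigned long unshare_flags)
{
	return do_unshare(unshare_flags);
}

/* New entry point: flags split across two arguments and recombined. */
static long sys_unshare64(unsigned long flags_high, unsigned long flags_low)
{
	return do_unshare((uint64_t)flags_high << 32 | flags_low);
}
```

The legacy sys_unshare() ABI stays untouched while all the real work operates on the widened type.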

[PATCH 3/3] add the clone64() and unshare64() syscalls [message #29264 is a reply to message #29261] Wed, 09 April 2008 22:34
Sukadev Bhattiprolu
From: Cedric Le Goater <clg@fr.ibm.com>
Subject: [PATCH 3/3] add the clone64() and unshare64() syscalls

This patch adds two new syscalls:

     long sys_clone64(unsigned long flags_high, unsigned long flags_low,
		unsigned long newsp);

     long sys_unshare64(unsigned long flags_high, unsigned long flags_low);

The current version of clone64() does not support CLONE_PARENT_SETTID and
CLONE_CHILD_CLEARTID, because the extra pointer arguments those flags need
would exceed the six-register syscall argument limit of some architectures.
It's possible to work around this limitation, but we might not need to,
since clone() already supports these flags.

Signed-off-by: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com>

---
 arch/powerpc/kernel/entry_32.S     |    8 ++++++++
 arch/powerpc/kernel/entry_64.S     |    5 +++++
 arch/powerpc/kernel/process.c      |   15 +++++++++++++++
 arch/s390/kernel/compat_linux.c    |   16 ++++++++++++++++
 arch/s390/kernel/compat_wrapper.S  |    6 ++++++
 arch/s390/kernel/process.c         |   15 +++++++++++++++
 arch/s390/kernel/syscalls.S        |    2 ++
 arch/x86/ia32/ia32entry.S          |    4 ++++
 arch/x86/ia32/sys_ia32.c           |   12 ++++++++++++
 arch/x86/kernel/entry_64.S         |    1 +
 arch/x86/kernel/process_32.c       |   14 ++++++++++++++
 arch/x86/kernel/process_64.c       |   15 +++++++++++++++
 arch/x86/kernel/syscall_table_32.S |    2 ++
 include/asm-powerpc/systbl.h       |    2 ++
 include/asm-powerpc/unistd.h       |    4 +++-
 include/asm-s390/unistd.h          |    4 +++-
 include/asm-x86/unistd_32.h        |    2 ++
 include/asm-x86/unistd_64.h        |    4 ++++
 include/linux/syscalls.h           |    3 +++
 kernel/fork.c                      |    7 +++++++
 kernel/sys_ni.c                    |    3 +++
 21 files changed, 142 insertions(+), 2 deletions(-)

Index: 2.6.25-rc2-mm1/arch/s390/kernel/syscalls.S
===================================================================
--- 2.6.25-rc2-mm1.orig/arch/s390/kernel/syscalls.S	2008-02-27 15:17:34.000000000 -0800
+++ 2.6.25-rc2-mm1/arch/s390/kernel/syscalls.S	2008-03-06 22:08:49.000000000 -0800
@@ -330,3 +330,5 @@ SYSCALL(sys_eventfd,sys_eventfd,sys_even
 SYSCALL(sys_timerfd_create,sys_timerfd_create,sys_timerfd_create_wrapper)
 SYSCALL(sys_timerfd_settime,sys_timerfd_settime,compat_sys_timerfd_settime_wrapper) /* 320 */
 SYSCALL(sys_timerfd_gettime,sys_timerfd_gettime,compat_sys_timerfd_gettime_wrapper)
+SYSCALL(sys_clone64,sys_clone64,sys32_clone64)
+SYSCALL(sys_unshare64,sys_unshare64,sys_unshare64_wrapper)
Index: 2.6.25-rc2-mm1/arch/x86/kernel/syscall_table_32.S
===================================================================
--- 2.6.25-rc2-mm1.orig/arch/x86/kernel/syscall_table_32.S	2008-02-27 15:17:35.000000000 -0800
+++ 2.6.25-rc2-mm1/arch/x86/kernel/syscall_table_32.S	2008-03-06 22:08:49.000000000 -0800
@@ -326,3 +326,5 @@ ENTRY(sys_call_table)
 	.long sys_fallocate
 	.long sys_timerfd_settime	/* 325 */
 	.long sys_timerfd_gettime
+	.long sys_clone64
+	.long sys_unshare64
Index: 2.6.25-rc2-mm1/include/asm-powerpc/systbl.h
===================================================================
--- 2.6.25-rc2-mm1.orig/include/asm-powerpc/systbl.h	2008-02-27 15:18:12.000000000 -0800
+++ 2.6.25-rc2-mm1/include/asm-powerpc/systbl.h	2008-03-06 22:08:49.000000000 -0800
@@ -316,3 +316,5 @@ COMPAT_SYS(fallocate)
 SYSCALL(subpage_prot)
 COMPAT_SYS_SPU(timerfd_settime)
 COMPAT_SYS_SPU(timerfd_gettime)
+PPC_SYS(clone64)
+SYSCALL_SPU(unshare64)
Index: 2.6.25-rc2-mm1/include/asm-powerpc/unistd.h
===================================================================
--- 2.6.25-rc2-mm1.orig/include/asm-powerpc/unistd.h	2008-02-27 15:18:12.000000000 -0800
+++ 2.6.25-rc2-mm1/include/asm-powerpc/unistd.h	2008-03-06 22:08:49.000000000 -0800
@@ -335,10 +335,12 @@
 #define __NR_subpage_prot	310
 #define __NR_timerfd_settime	311
 #define __NR_timerfd_gettime	312
+#define __NR_clone64		313
+#define __NR_unshare64		314
 
 #ifdef __KERNEL__
 
-#define __NR_syscalls		313
+#define __NR_syscalls		315
 
 #define __NR__exit __NR_exit
 #define NR_syscalls	__NR_syscalls
Index: 2.6.25-rc2-mm1/include/asm-s390/unistd.h
===================================================================
--- 2.6.25-rc2-mm1.orig/include/asm-s390/unistd.h	2008-02-27 15:18:13.000000000 -0800
+++ 2.6.25-rc2-mm1/include/asm-s390/unistd.h	2008-03-06 22:08:49.000000000 -0800
@@ -259,7 +259,9 @@
 #define __NR_timerfd_create	319
 #define __NR_timerfd_settime	320
 #define __NR_timerfd_gettime	321
-#define NR_syscalls 322
+#define __NR_clone64		322
+#define __NR_unshare64		323
+#define NR_syscalls 324
 
 /* 
  * There are some system calls that are not present on 64 bit, some
Index: 2.6.25-rc2-mm1/include/asm-x86/unistd_32.h
===================================================================
--- 2.6.25-rc2-mm1.orig/include/asm-x86/unistd_32.h	2008-02-27 15:18:16.000000000 -0800
+++ 2.6.25-rc2-mm1/include/asm-x86/unistd_32.h	2008-03-06 22:08:49.000000000 -0800
@@ -332,6 +332,8 @@
 #define __NR_fallocate		324
 #define __NR_timerfd_settime	325
 #define __NR_timerfd_gettime	326
+#define __NR_clone64		327
+#define __NR_unshare64		328
 
 #ifdef __KERNEL__
 
Index: 2.6.25-rc2-mm1/include/asm-x86/unistd_64.h
===================================================================
--- 2.6.25-rc2-mm1.orig/include/asm-x86/unistd_64.h	2008-02-27 15:18:16.000000000 -0800
+++ 2.6.25-rc2-mm1/include/asm-x86/unistd_64.h	2008-03-06 22:08:49.000000000 -0800
@@ -639,6 +639,10 @@ __SYSCALL(__NR_fallocate, sys_fallocate)
 __SYSCALL(__NR_timerfd_settime, sys_timerfd_settime)
 #define __NR_timerfd_gettime			287
 __SYSCALL(__NR_timerfd_gettime, sys_timerfd_gettime)
+#define __NR_clone64		288
+__SYSCALL(__NR_clone64, stub_clone64)
+#define __NR_unshare64		289
+__SYSCALL(__NR_unshare64,	sys_unshare64)
 
 
 #ifndef __NO_STUBS
Index: 2.6.25-rc2-mm1/include/linux/syscalls.h
===================================================================
--- 2.6.25-rc2-mm1.orig/include/linux/syscalls.h	2008-02-27 15:18:18.000000000 -0800
+++ 2.6.25-rc2-mm1/include/linux/syscalls.h	2008-03-06 22:08:49.000000000 -0800
@@ -615,6 +615,9 @@ asmlinkage long sys_timerfd_gettime(int 
 asmlinkage long sys_eventfd(unsigned int count);
 asmlinkage long sys_fallocate(int fd, int mode, loff_t offset, loff_t len);
 
+asmlinkage long sys_unshare64(unsigned long clone_flags_high,
+			      unsigned long clone_flags_low);
+
 int kernel_execve(const char *filename, char *const argv[], char *const envp[]);
 
 #endif
Index: 2.6.25-rc2-mm1/kernel/sys_ni.c
===================================================================
--- 2.6.25-rc2-mm1.orig/kernel/sys_ni.c	2008-02-27 15:18:23.000000000 -0800
+++ 2.6.25-rc2-mm1/kernel/sys_ni.c	2008-03-06 22:08:49.000000000 -0800
@@ -161,3 +161,6 @@ cond_syscall(sys_timerfd_gettime);
 cond_syscall(compat_sys_timerfd_settime);
 cond_syscall(compat_sys_timerfd_gettime);
 cond_syscall(sys_eventfd);
+
+cond_syscall(sys_clone64);
+cond_syscall(sys_unshare64);
Index: 2.6.25-rc2-mm1/arch/x86/kernel/process_32.c
===================================================================
--- 2.6.25-rc2-mm1.orig/arch/x86/kernel/process_32.c	2008-03-06 22:08:49.000000000 -0800
+++ 2.6.25-rc2-mm1/arch/x86/kernel/process_32.c	2008-03-06 22:08:49.000000000 -0800
@@ -771,6 +771,20 @@ asmlinkage int sys_clone(struct pt_regs 
 	return do_fork(clone_flags, newsp, &regs, 0, parent_tidptr, child_tidptr);
 }
 
+asmlinkage int sys_clone64(struct pt_regs regs)
+{
+	u64 clone_flags;
+	unsigned long newsp;
+
+	clone_flags = ((u64) regs.bx << 32 | regs.cx);
+	clone_flags &= ~(CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID);
+
+	newsp = regs.dx;
+	if (!newsp)
+		newsp = regs.sp;
+	return do_fork(clone_flags, newsp, &regs, 0, NULL, NULL);
+}
+
 /*
  * This is trivial, and on the face of it looks like it
  * could equally well be done in user mode.
Index: 2.6.25-rc2-mm1/arch/x86/kernel/process_64.c
===================================================================
--- 2.6.25-rc2-mm1.orig/arch/x86/kernel/process_64.c	2008-03-06 22:08:49.000000000 -0800
+++ 2.6.25-rc2-mm1/arch/x86/kernel/process_64.c	2008-03-06 22:08:49.000000000 -0800
@@ -775,6 +775,21 @@ sys_clone(unsigned long clone_flags, uns
 	return do_fork(clone_flags, newsp, regs, 0, parent_tid, child_tid);
 }
 
+asmlinkage long
+sys_clone64(unsigned long clone_flags_high, unsigned long clone_flags_low,
+	  unsigned long newsp, struct pt_regs *regs)
+{
+	u64 clone_flags;
+
+	clone_flags = ((u64) clone_flags_high << 32 | clone_flags_low);
+	clone_flags &= ~(CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID);
+
+	if (!newsp)
+		newsp = regs->sp;
+	return do_fork(clone_flags, newsp, regs, 0, NULL, NULL);
+}
+
+
 /*
  * This is trivial, and on the face of it looks like it
  * could equally well be done in user mode.
Index: 2.6.25-rc2-mm1/arch/s390/kernel/compat_linux.c
===================================================================
--- 2.6.25-rc2-mm1.orig/arch/s390/kernel/compat_linux.c	2008-01-26 09:48:58.000000000 -0800
+++ 2.6.25-rc2-mm1/arch/s390/kernel/compat_linux.c	2008-03-06 22:08:49.000000000 -0800
@@ -940,6 +940,22 @@ asmlinkage long sys32_clone(void)
 		       parent_tidptr, child_tidptr);
 }
 
+asmlinkage long sys32_clone64(void)
+{
+	struct pt_regs *regs = task_pt_regs(current);
+	u64 clone_flags;
+	unsigned long newsp;
+
+	clone_flags = ((u64) (regs->orig_gpr2  & 0xffffffffUL) << 32 |
+		       (regs->gprs[3]  & 0xffffffffUL));
+	clone_flags &= ~(CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID);
+
+	newsp = regs->gprs[4] & 0x7fffffffUL;
+	if (!newsp)
+		newsp = regs->gprs[15];
+	return do_fork(clone_flags, newsp, regs, 0, NULL, NULL);
+}
+
 /*
  * 31 bit emulation wrapper functions for sys_fadvise64/fadvise64_64.
  * These need to rewrite the advise values for POSIX_FADV_{DONTNEED,NOREUSE}
Index: 2.6.25-rc2-mm1/arch/s390/kernel/process.c
===================================================================
--- 2.6.25-rc2-mm1.orig/arch/s390/kernel/process.c	2008-03-06 22:08:49.000000000 -0800
+++ 2.6.25-rc2-mm1/arch/s390/kernel/process.c	2008-03-06 22:08:49.000000000 -0800
@@ -325,6 +325,21 @@ asmlinkage long sys_clone(void)
 		       parent_tidptr, child_tidptr);
 }
 
+asmlinkage long sys_clone64(void)
+{
+	struct pt_regs *regs = task_pt_regs(current);
+	u64 clone_flags;
+	unsigned long newsp;
+
+	clone_flags = ((u64) regs->orig_gpr2 << 32 |  regs->gprs[3]);
+	clone_flags &= ~(CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID);
+
+	newsp = regs->gprs[4];
+	if (!newsp)
+		newsp = regs->gprs[15];
+	return do_fork(clone_flags, newsp, regs, 0, NULL, NULL);
+}
+
 /*
  * This is trivial, and on the face of it looks like it
  * could equally well be done in user mode.
Index: 2.6.25-rc2-mm1/arch/powerpc/kernel/process.c
===================================================================
--- 2.6.25-rc2-mm1.orig/arch/powerpc/kernel/process.c	2008-03-06 22:08:49.000000000 -0800
+++ 2.6.25-rc2-mm1/arch/powerpc/kernel/process.c	2008-03-06 22:08:49.000000000 -0800
@@ -829,6 +829,21 @@ int sys_clone(unsigned long clone_flags,
  	return do_fork(clone_flags, usp, regs, 0, parent_tidp, child_tidp);
 }
 
+int sys_clone64(unsigned long clone_flags_high, unsigned long clone_flags_low,
+		unsigned long usp, unsigned long p4, unsigned long p5,
+		unsigned long p6, struct pt_regs *regs)
+{
+	u64 clone_flags;
+
+	clone_flags = ((u64) clone_flags_high << 32 | clone_flags_low);
+	clone_flags &= ~(CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID);
+
+	CHECK_FULL_REGS(regs);
+	if (usp == 0)
+		usp = regs->gpr[1];	/* stack pointer for child */
+	return do_fork(clone_flags, usp, regs, 0, NULL, NULL);
+}
+
 int sys_fork(unsigned long p1, unsigned long p2, unsigned long p3,
 	     unsigned long p4, unsigned long p5, unsigned long p6,
 	     struct pt_regs *regs)
Index: 2.6.25-rc2-mm1/arch/x86/kernel/entry_64.S
===================================================================
--- 2.6.25-rc2-mm1.orig/arch/x86/kernel/entry_64.S	2008-02-27 16:07:43.000000000 -0800
+++ 2.6.25-rc2-mm1/arch/x86/kernel/entry_64.S	2008-03-06 22:08:49.000000000 -0800
@@ -527,6 +527,7 @@ END(\label)
 	PTREGSCALL stub_rt_sigsuspend, sys_rt_sigsuspend, %rdx
 	PTREGSCALL stub_sigaltstack, sys_sigaltstack, %rdx
 	PTREGSCALL stub_iopl, sys_iopl, %rsi
+	PTREGSCALL stub_clone64, sys_clone64, %rcx
 
 ENTRY(ptregscall_common)
 	popq %r11
Index: 2.6.25-rc2-mm1/arch/powerpc/kernel/entry_32.S
===================================================================
--- 2.6.25-rc2-mm1.orig/arch/powerpc/kernel/entry_32.S	2008-01-26 09:48:57.000000000 -0800
+++ 2.6.25-rc2-mm1/arch/powerpc/kernel/entry_32.S	2008-03-06 22:08:49.000000000 -0800
@@ -452,6 +452,14 @@ ppc_clone:
 	stw	r0,_TRAP(r1)		/* register set saved */
 	b	sys_clone
 
+	.globl	ppc_clone64
+ppc_clone64:
+	SAVE_NVGPRS(r1)
+	lwz	r0,_TRAP(r1)
+	rlwinm	r0,r0,0,0,30		/* clear LSB to indicate full */
+	stw	r0,_TRAP(r1)		/* register set saved */
+	b	sys_clone64
+
 	.globl	ppc_swapcontext
 ppc_swapcontext:
 	SAVE_NVGPRS(r1)
Index: 2.6.25-rc2-mm1/arch/powerpc/kernel/entry_64.S
===================================================================
--- 2.6.25-rc2-mm1.orig/arch/powerpc/kernel/entry_64.S	2008-01-26 09:48:57.000000000 -0800
+++ 2.6.25-rc2-mm1/arch/powerpc/kernel/entry_64.S	2008-03-06 22:08:49.000000000 -0800
@@ -298,6 +298,11 @@ _GLOBAL(ppc_clone)
 	bl	.sys_clone
 	b	syscall_exit
 
+_GLOBAL(ppc_clone64)
+	bl	.save_nvgprs
+	bl	.sys_clone64
+	b	syscall_exit
+
 _GLOBAL(ppc32_swapcontext)
 	bl	.save_nvgprs
 	bl	.compat_sys_swapcontext
Index: 2.6.25-rc2-mm1/arch/s390/kernel/compat_wrapper.S
===================================================================
--- 2.6.25-rc2-mm1.orig/arch/s390/kernel/compat_wrapper.S	2008-02-27 15:17:33.000000000 -0800
+++ 2.6.25-rc2-mm1/arch/s390/kernel/compat_wrapper.S	2008-03-06 22:08:49.000000000 -0800
@@ -1732,3 +1732,9 @@ compat_sys_timerfd_gettime_wrapper:
 	lgfr	%r2,%r2			# int
 	llgtr	%r3,%r3			# struct compat_itimerspec *
 	jg	compat_sys_timerfd_gettime
+
+	.globl sys_unshare64_wrapper
+sys_unshare64_wrapper:
+	llgfr	%r2,%r2			# unsigned long
+	llgfr	%r3,%r3			# unsigned long
+	jg	sys_unshare64
Index: 2.6.25-rc2-mm1/kernel/fork.c
===================================================================
--- 2.6.25-rc2-mm1.orig/kernel/fork.c	2008-03-06 22:08:49.000000000 -0800
+++ 2.6.25-rc2-mm1/kernel/fork.c	2008-03-10 20:47:10.000000000 -0700
@@ -1795,3 +1795,10 @@ asmlinkage long sys_unshare(unsigned lon
 {
 	return do_unshare(unshare_flags);
 }
+
+asmlinkage long sys_unshare64(unsigned long flags_high, unsigned long flags_low)
+{
+	u64 unshare_flags = ((u64) flags_high << 32 | flags_low);
+
+	return do_unshare(unshare_flags);
+}
Index: 2.6.25-rc2-mm1/arch/x86/ia32/sys_ia32.c
===================================================================
--- 2.6.25-rc2-mm1.orig/arch/x86/ia32/sys_ia32.c	2008-02-27 15:17:35.000000000 -0800
+++ 2.6.25-rc2-mm1/arch/x86/ia32/sys_ia32.c	2008-03-06 22:08:49.000000000 -0800
@@ -824,6 +824,18 @@ asmlinkage long sys32_clone(unsigned int
 	return do_fork(clone_flags, newsp, regs, 0, parent_tid, child_tid);
 }
 
+asmlinkage long sys32_clone64(unsigned int flags_high, unsigned int flags_low,
+			      unsigned int newsp, struct pt_regs *regs)
+{
+	u64 clone_flags = ((u64) flags_high << 32 | flags_low);
+
+	clone_flags &= ~(CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID);
+
+	if (!newsp)
+		newsp = regs->sp;
+	return do_fork(clone_flags, newsp, regs, 0, NULL, NULL);
+}
+
 /*
  * Some system calls that need sign extended arguments. This could be
  * done by a generic wrapper.
Index: 2.6.25-rc2-mm1/arch/x86/ia32/ia32entry.S
===================================================================
--- 2.6.25-rc2-mm1.orig/arch/x86/ia32/ia32entry.S	2008-02-27 15:17:35.000000000 -0800
+++ 2.6.25-rc2-mm1/arch/x86/ia32/ia32entry.S	2008-03-06 22:08:49.000000000 -0800
@@ -373,6 +373,7 @@ quiet_ni_syscall:
 	PTREGSCALL stub32_vfork, sys_vfork, %rdi
 	PTREGSCALL stub32_iopl, sys_iopl, %rsi
 	PTREGSCALL stub32_rt_sigsuspend, sys_rt_sigsuspend, %rdx
+	PTREGSCALL stub32_clone64, sys32_clone64, %rcx
 
 ENTRY(ia32_ptregs_common)
 	popq %r11
@@ -727,4 +728,7 @@ ia32_sys_call_table:
 	.quad sys32_fallocate
 	.quad compat_sys_timerfd_settime	/* 325 */
 	.quad compat_sys_timerfd_gettime
+	.quad stub32_clone64
+	.quad sys_unshare64
+
 ia32_syscall_end:
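The s390 31-bit compat entry (sys32_clone64() in compat_linux.c above) cleans its register arguments before use: each flags half is masked to 32 bits and newsp to 31 bits, presumably because the upper register halves of a 31-bit task cannot be relied on, the same concern the llgfr/llgtr wrappers in compat_wrapper.S address. The sketch below reproduces only that arithmetic on plain integers; the function names are hypothetical and no pt_regs access is modelled.

```c
#include <stdint.h>

/* Hypothetical: combine two 64-bit register values, trusting only their low 32 bits. */
static uint64_t compat_clone64_flags(uint64_t reg_high, uint64_t reg_low)
{
	return (reg_high & 0xffffffffULL) << 32 | (reg_low & 0xffffffffULL);
}

/* Hypothetical: keep a 31-bit stack pointer, falling back to the caller's. */
static uint64_t compat_clone64_newsp(uint64_t reg_newsp, uint64_t current_sp)
{
	uint64_t newsp = reg_newsp & 0x7fffffffULL;

	/* As in the patch: a zero stack pointer means "reuse the current one". */
	return newsp ? newsp : current_sp;
}
```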
Re: [PATCH 0/3] clone64() and unshare64() system calls [message #29266 is a reply to message #29261] Thu, 10 April 2008 00:00
hpa
sukadev@us.ibm.com wrote:
> This is a resend of the patch set Cedric had sent earlier. I ported
> the patch set to 2.6.25-rc8-mm1 and tested on x86 and x86_64.
> ---
> 
> We have run out of the 32 bits in clone_flags !
> 
> This patchset introduces 2 new system calls which support 64bit clone-flags.
> 
>      long sys_clone64(unsigned long flags_high, unsigned long flags_low,
> 		unsigned long newsp);
> 
>      long sys_unshare64(unsigned long flags_high, unsigned long flags_low);
> 
> The current version of clone64() does not support CLONE_PARENT_SETTID and 
> CLONE_CHILD_CLEARTID because we would exceed the 6 registers limit of some 
> arches. It's possible to get around this limitation but we might not
> need it as we already have clone()
> 

I really dislike this interface.

If you're going to make it 64-bit, pass it in as a 64-bit number
instead of breaking it into two numbers.  Better yet, IMO, would be to
pass a pointer to a structure like:

struct shared {
	unsigned long nwords;
	unsigned long flags[];
};

... which can be expanded indefinitely.
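A minimal user-space sketch of the extensible structure hpa proposes. The struct layout is his; the helpers shared_alloc(), shared_set() and shared_test() are hypothetical, as is the convention that bits beyond the caller's nwords read as clear, which is one plausible way such an interface could stay compatible as flags are added.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* The structure from the mail: a word count followed by that many flag words. */
struct shared {
	unsigned long nwords;
	unsigned long flags[];
};

#define SHARED_BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical: allocate a zeroed descriptor large enough for nbits flags. */
static struct shared *shared_alloc(size_t nbits)
{
	size_t nwords = (nbits + SHARED_BITS_PER_WORD - 1) / SHARED_BITS_PER_WORD;
	struct shared *s = calloc(1, sizeof(*s) + nwords * sizeof(s->flags[0]));

	if (s)
		s->nwords = nwords;
	return s;
}

/* Hypothetical: set one flag; caller keeps bit < nwords * SHARED_BITS_PER_WORD. */
static void shared_set(struct shared *s, size_t bit)
{
	s->flags[bit / SHARED_BITS_PER_WORD] |= 1UL << (bit % SHARED_BITS_PER_WORD);
}

/* Hypothetical: bits past the caller's nwords are treated as clear. */
static bool shared_test(const struct shared *s, size_t bit)
{
	size_t word = bit / SHARED_BITS_PER_WORD;

	if (word >= s->nwords)
		return false;
	return (s->flags[word] >> (bit % SHARED_BITS_PER_WORD)) & 1UL;
}
```

Under this convention, a binary built when fewer flags existed passes a smaller nwords, and a newer kernel could treat every flag beyond that as unset instead of reading past the caller's buffer.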

	-hpa
Re: [PATCH 0/3] clone64() and unshare64() system calls [message #29268 is a reply to message #29266] Thu, 10 April 2008 01:07
Sukadev Bhattiprolu
H. Peter Anvin [hpa@zytor.com] wrote:
> sukadev@us.ibm.com wrote:
>> This is a resend of the patch set Cedric had sent earlier. I ported
>> the patch set to 2.6.25-rc8-mm1 and tested on x86 and x86_64.
>> ---
>> We have run out of the 32 bits in clone_flags !
>> This patchset introduces 2 new system calls which support 64bit clone-flags.
>>      long sys_clone64(unsigned long flags_high, unsigned long flags_low,
>> 		unsigned long newsp);
>>      long sys_unshare64(unsigned long flags_high, unsigned long flags_low);
>> The current version of clone64() does not support CLONE_PARENT_SETTID and 
>> CLONE_CHILD_CLEARTID because we would exceed the 6 registers limit of some 
>> arches. It's possible to get around this limitation but we might not
>> need it as we already have clone()
>
> I really dislike this interface.
>
> If you're going to make it a 64-bit pass it in as a 64-bit number, instead 
> of breaking it into two numbers.

Maybe I am missing your point. The glibc interface could take a 64bit
parameter, but don't we need to pass 32-bit values into the system call 
on 32 bit systems ?

> Better yet, IMO, would be to pass a pointer to a structure like:
>
> struct shared {
> 	unsigned long nwords;
> 	unsigned long flags[];
> };
>
> ... which can be expanded indefinitely.

Yes, this was discussed before in the context of Pavel Emelyanov's patch

	http://lkml.org/lkml/2008/1/16/109

along with sys_indirect().  While there was no consensus, it looked like
adding a new system call was better than open ended interfaces.
Re: [PATCH 0/3] clone64() and unshare64() system calls [message #29269 is a reply to message #29268] Thu, 10 April 2008 01:10
hpa
sukadev@us.ibm.com wrote:
>>
>> If you're going to make it a 64-bit pass it in as a 64-bit number, instead 
>> of breaking it into two numbers.
> 
> Maybe I am missing your point. The glibc interface could take a 64bit
> parameter, but don't we need to pass 32-bit values into the system call 
> on 32 bit systems ?

Not as such, no.  The ABI handles that.  To make the ABI clean on some 
architectures, it's good to put 64-bit values only in positions 
where they map to an even:odd register pair once slotted in.

> Yes, this was discussed before in the context of Pavel Emelyanov's patch
> 
> 	http://lkml.org/lkml/2008/1/16/109
> 
> along with sys_indirect().  While there was no consensus, it looked like
> adding a new system call was better than open ended interfaces.

That's not really an open-ended interface, it's just an expandable bitmap.

	-hpa
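The even:odd rule can be illustrated with a toy register allocator. This sketch assumes a 32-bit ABI that, like some real ones, places each 64-bit argument in an even:odd register pair and skips a register to get there; `enum arg_kind` and `regs_needed` are hypothetical names, and no specific architecture is modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Width of an argument in 32-bit registers. */
enum arg_kind { ARG32 = 1, ARG64 = 2 };

static_assert(ARG64 == 2 * ARG32, "a 64-bit argument spans two registers");

/* Count the registers used by an argument list on a hypothetical 32-bit
 * ABI that aligns each 64-bit argument to an even register, leaving a
 * hole in the sequence when the next free register is odd. */
static size_t regs_needed(const enum arg_kind *args, size_t nargs)
{
	size_t reg = 0;

	for (size_t i = 0; i < nargs; i++) {
		if (args[i] == ARG64 && reg % 2 != 0)
			reg++;	/* skip to the next even register */
		reg += (size_t)args[i];
	}
	return reg;
}
```

Under this model, a single u64 flags argument followed by newsp costs three registers, the same as the posted (flags_high, flags_low, newsp) form, while putting the u64 second costs four because of the skipped odd register.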
Re: [PATCH 0/3] clone64() and unshare64() system calls [message #29272 is a reply to message #29269] Thu, 10 April 2008 02:38
Sukadev Bhattiprolu
H. Peter Anvin [hpa@zytor.com] wrote:
>> Yes, this was discussed before in the context of Pavel Emelyanov's patch
>> 	http://lkml.org/lkml/2008/1/16/109
>> along with sys_indirect().  While there was no consensus, it looked like
>> adding a new system call was better than open ended interfaces.
>
> That's not really an open-ended interface, it's just an expandable bitmap.

Yes, we liked such an approach earlier too and it's conceivable that we
will run out of the 64 bits too :-)

But as Jon Corbet pointed out in the thread above, it looked like
adding a new system call has been the "traditional" way of solving this
in Linux so far and there has been no consensus on a newer approach.

Sukadev
Re: [PATCH 0/3] clone64() and unshare64() system calls [message #29273 is a reply to message #29272] Thu, 10 April 2008 02:43
Paul Menage
On Wed, Apr 9, 2008 at 7:38 PM,  <sukadev@us.ibm.com> wrote:
>
>  But as Jon Corbet pointed out in the thread above, it looked like
>  adding a new system call has been the "traditional" way of solving this
>  in Linux so far and there has been no consensus on a newer approach.
>

I thought that the consensus was that adding a new system call was
better than trying to force extensibility on to the existing
non-extensible system call.

But if we are adding a new system call, why not make the new one
extensible to reduce the need for yet another new call in the future?

Paul
Re: [PATCH 0/3] clone64() and unshare64() system calls [message #29282 is a reply to message #29266] Thu, 10 April 2008 06:48
Cedric Le Goater
H. Peter Anvin wrote:
> sukadev@us.ibm.com wrote:
>> This is a resend of the patch set Cedric had sent earlier. I ported
>> the patch set to 2.6.25-rc8-mm1 and tested on x86 and x86_64.
>> ---
>>
>> We have run out of the 32 bits in clone_flags !
>>
>> This patchset introduces 2 new system calls which support 64bit clone-flags.
>>
>>      long sys_clone64(unsigned long flags_high, unsigned long flags_low,
>>         unsigned long newsp);
>>
>>      long sys_unshare64(unsigned long flags_high, unsigned long flags_low);
>>
>> The current version of clone64() does not support CLONE_PARENT_SETTID
>> and CLONE_CHILD_CLEARTID because we would exceed the 6 registers limit
>> of some arches. It's possible to get around this limitation but we
>> might not need it as we already have clone()
>>
> 
> I really dislike this interface.
> 
> If you're going to make it a 64-bit pass it in as a 64-bit number,
> instead of breaking it into two numbers.  Better yet, IMO, would be to
> pass a pointer to a structure like:
> 
> struct shared {
>     unsigned long nwords;
>     unsigned long flags[];
> };
> 
> ... which can be expanded indefinitely.

ok.

What about the copy_from_user() overhead? Is this something we care 
about for a clone-like syscall?

If not, this would certainly make our life simpler to extend clone flags.
I'm ready to implement anything if someone would just tell me which
direction to take.

Thanks !

C. 
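On the copy_from_user() question: with a bounded `nwords`, the kernel side needs two small copies, one for the count and one for the words. The userspace sketch below assumes a hypothetical per-kernel limit `CLONE_FLAG_WORDS`, and `fake_copy_from_user()` is a memcpy stand-in that only mimics the real copy_from_user() return convention (bytes not copied, 0 on success).

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical: number of flag words this kernel understands. */
#define CLONE_FLAG_WORDS 2

static_assert(CLONE_FLAG_WORDS >= 1, "need at least one flag word");

struct shared {
	unsigned long nwords;
	unsigned long flags[];
};

/* Userspace stand-in for copy_from_user(): returns the number of bytes
 * that could NOT be copied, so 0 means success. */
static unsigned long fake_copy_from_user(void *to, const void *from,
					 unsigned long n)
{
	memcpy(to, from, n);
	return 0;
}

/* Copy the caller's bitmap into a fixed-size kernel-side buffer.
 * Words the caller did not supply are treated as zero; more words than
 * this kernel knows about are rejected, which also bounds the copy. */
static int fetch_clone_flags(const struct shared *uptr,
			     unsigned long out[CLONE_FLAG_WORDS])
{
	unsigned long nwords;

	if (fake_copy_from_user(&nwords, &uptr->nwords, sizeof(nwords)))
		return -EFAULT;
	if (nwords > CLONE_FLAG_WORDS)
		return -EINVAL;

	memset(out, 0, CLONE_FLAG_WORDS * sizeof(out[0]));
	if (fake_copy_from_user(out, uptr->flags, nwords * sizeof(out[0])))
		return -EFAULT;
	return 0;
}
```

Rejecting an over-long array outright is one possible policy. Another is to accept extra words as long as they are all zero, so newer userspace only fails on an older kernel when it actually requests an unknown flag.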
Re: [PATCH 1/3] change clone_flags type to u64 [message #29293 is a reply to message #29262] Thu, 10 April 2008 08:25
Andi Kleen
sukadev@us.ibm.com writes:

> From: Sukadev Bhattiprolu <sukadev@us.ibm.com>
> Subject: [lxc-dev] [patch -lxc 1/3] change clone_flags type to u64
>
> This is a preliminary patch changing the clone_flags type to 64bits
> for all the routines called by do_fork(). 

I must admit I was always a little sceptical of giving every tiny
namespaceable kernel feature its own CLONE flag (and its own 
CONFIG option). What was the rationale for that again?

With your current strategy are you sure that even 64bit will
be enough in the end? For me it rather looks like you'll
go through those quickly too as more and more of the kernel
is namespaced.

Also I think the user interface is very unfriendly. How
is a non kernel hacker supposed to make sense of these 
myriads of flags? You'll be creating another 
CreateProcess123_extra_args_extended() 
in the end I fear.

Wouldn't it be better to just partition all this into
fewer more understandable larger feature groups?  I think
that would be much nicer from pretty much all perspectives
(kernel maintenance, user interface sanity, not needing
clone128/256 in the end etc.) 

Some consolidation on the CONFIGs would be good too. I just
cannot imagine it really makes sense to configure everything
so fine grained and this is just asking for random compile
breakage on randconfig.

-Andi
Re: [PATCH 1/3] change clone_flags type to u64 [message #29303 is a reply to message #29293] Thu, 10 April 2008 12:25
Cedric Le Goater
Hello Andi,

Andi Kleen wrote:
> sukadev@us.ibm.com writes:
> 
>> From: Sukadev Bhattiprolu <sukadev@us.ibm.com>
>> Subject: [lxc-dev] [patch -lxc 1/3] change clone_flags type to u64
>>
>> This is a preliminary patch changing the clone_flags type to 64bits
>> for all the routines called by do_fork(). 
> 
> I must admit I was always a little sceptical of giving every tiny
> namespaceable kernel feature its own CLONE flag (and its own 
> CONFIG option). What was the rationale for that again?

I guess that was a development rationale. Most of the namespaces are in 
use in the container projects like openvz, vserver and probably others 
and we needed a way to activate the code.

Not perfect I agree.
 
> With your current strategy are you sure that even 64bit will
> be enough in the end? For me it rather looks like you'll
> go through those quickly too as more and more of the kernel
> is namespaced.

well, we're reaching the end. I hope ! devpts is in progress and
mq is just waiting for a clone flag.
 
> Also I think the user interface is very unfriendly. How
> is a non kernel hacker supposed to make sense of these 
> myriads of flags? You'll be creating another 
> CreateProcess123_extra_args_extended() 
> in the end I fear.

well, the clone interface is not a friendly interface anyway. glibc wraps 
it and most users just use fork().

We will need a user library, like we have a libpthread or a libaio, to
effectively use the namespaces features. This is being worked on but
it's another topic.
 
> Wouldn't it be better to just partition all this into
> fewer more understandable larger feature groups?  I think
> that would be much nicer from pretty much all perspectives
> (kernel maintenance, user interface sanity, not needing
> clone128/256 in the end etc.) 

Yes, this makes sense. Most of the namespaces have dependencies between
each other.

> Some consolidation on the CONFIGs would be good too. I just
> cannot imagine it really makes sense to configure everything
> so fine grained and this is just asking for random compile
> breakage on randconfig.

yes. definitely agree.

but we still need a way to extend the clone flags because none are left.
Would you say that clone64 is the right way to go, or should we rather 
go in the direction hpa proposed:

http://lkml.org/lkml/2008/4/9/318 :

> If you're going to make it a 64-bit pass it in as a 64-bit number, 
> instead of breaking it into two numbers.  Better yet, IMO, would 
> be to pass a pointer to a structure like:
>  
> struct shared {
>     unsigned long nwords;
>     unsigned long flags[];
> };
> 
> ... which can be expanded indefinitely.

if we could agree on some new interface, we could then make sure we
are not abusing it.

Thanks,

C.

Re: [PATCH 0/3] clone64() and unshare64() system calls [message #29304 is a reply to message #29269] Thu, 10 April 2008 12:33
Cedric Le Goater
H. Peter Anvin wrote:
> sukadev@us.ibm.com wrote:
>>>
>>> If you're going to make it a 64-bit pass it in as a 64-bit number,
>>> instead of breaking it into two numbers.
>>
>> Maybe I am missing your point. The glibc interface could take a 64bit
>> parameter, but don't we need to pass 32-bit values into the system
>> call on 32 bit systems ?
> 
> Not as such, no.  The ABI handles that.  To make the ABI clean on some
> architectures, it's good to consider a 64-bit value only in positions
> where they map to an even:odd register pair once slotted in.

OK. I didn't know that. I took sys_llseek() as an example of an interface 
to follow when coding clone64(). 

Thanks,

C. 
Re: [PATCH 1/3] change clone_flags type to u64 [message #29305 is a reply to message #29303] Thu, 10 April 2008 12:50
Andi Kleen
> I guess that was a development rationale. 

But what rationale? It just doesn't make much sense to me.

> Most of the namespaces are in 
> use in the container projects like openvz, vserver and probably others 
> and we needed a way to activate the code.

You could just have added it to feature groups over time.

> 
> Not perfect I agree.
>  
> > With your current strategy are you sure that even 64bit will
> > be enough in the end? For me it rather looks like you'll
> > go through those quickly too as more and more of the kernel
> > is namespaced.
> 
> well, we're reaching the end. I hope ! devpts is in progress and
> mq is just waiting for a clone flag.

Are you sure?

>  
> > Also I think the user interface is very unfriendly. How
> > is a non kernel hacker supposed to make sense of these 
> > myriads of flags? You'll be creating another 
> > CreateProcess123_extra_args_extended() 
> > in the end I fear.
> 
> well, the clone interface is not a friendly interface anyway. glibc wraps 
> it

But only for the stack setup which is just a minor detail. 

The basic clone() flags interface used to be pretty sane and usable 
before it got overloaded with so many tiny features.

I especially worry about how user land should keep track of the changing
kernel here. If you add a new feature flag for lots of kernel features, it
is reasonable to expect that there will often be new features in the future.

Does this mean user land needs to be updated all the time? Will this
end up like another udev? 

> We will need a user library, like we have a libpthread or a libaio, to

That doesn't make sense. The basic kernel syscalls should be usable,
not require some magic library that would likely need intimate 
knowledge of specific kernel versions to do any good.

> but we still need a way to extend the clone flags because none are left.

Can we just take out some again that were added in the .25 cycle and
re-add them once there is a properly thought out interface?  That would
leave at least one.

-Andi
Re: [PATCH 1/3] change clone_flags type to u64 [message #29306 is a reply to message #29305] Thu, 10 April 2008 13:11
Kirill Korotaev
There was no real rationale, except that some people saw "clone" functionality
as the natural fit, and the fact that FS_NAMESPACE was done this way made them believe it was a good way to go.
And I warned about the flags limitation at the beginning.
Both OpenVZ and vserver suggested using a special syscall for handling this.
Maybe this is a good point to finally switch to it and stop worrying about all this?

Andi Kleen wrote:
>> I guess that was a development rationale. 
> 
> But what rationale? It just doesn't make much sense to me.
> 
>> Most of the namespaces are in 
>> use in the container projects like openvz, vserver and probably others 
>> and we needed a way to activate the code.
> 
> You could just have added it to feature groups over time.
> 
>> Not perfect I agree.
>>  
>>> With your current strategy are you sure that even 64bit will
>>> be enough in the end? For me it rather looks like you'll
>>> go through those quickly too as more and more of the kernel
>>> is namespaced.
>> well, we're reaching the end. I hope ! devpts is in progress and
>> mq is just waiting for a clone flag.
> 
> Are you sure?
> 
>>  
>>> Also I think the user interface is very unfriendly. How
>>> is a non kernel hacker supposed to make sense of these 
>>> myriads of flags? You'll be creating another 
>>> CreateProcess123_extra_args_extended() 
>>> in the end I fear.
>> well, the clone interface is not a friendly interface anyway. glibc wraps 
>> it
> 
> But only for the stack setup which is just a minor detail. 
> 
> The basic clone() flags interface used to be pretty sane and usable 
> before it got overloaded with so many tiny features.
> 
> I especially worry about how user land should keep track of the changing
> kernel here. If you add a new feature flag for lots of kernel features, it
> is reasonable to expect that there will often be new features in the future.
> 
> Does this mean user land needs to be updated all the time? Will this
> end up like another udev? 
> 
>> We will need a user library, like we have a libpthread or a libaio, to
> 
> That doesn't make sense. The basic kernel syscalls should be usable,
> not require some magic library that would likely need intimate 
> knowledge of specific kernel versions to do any good.
> 
>> but we still need a way to extend the clone flags because none are left.
> 
> Can we just take out some again that were added in the .25 cycle and
> re-add them once there is a properly thought out interface?  That would
> leave at least one.
> 
> -Andi
Re: [PATCH 1/3] change clone_flags type to u64 [message #29307 is a reply to message #29305] Thu, 10 April 2008 13:18
Cedric Le Goater
Andi Kleen wrote:
>> I guess that was a development rationale. 
> 
> But what rationale? It just doesn't make much sense to me.

Let's add Eric in Cc:

>> Most of the namespaces are in 
>> use in the container projects like openvz, vserver and probably others 
>> and we needed a way to activate the code.
> 
> You could just have added it to feature groups over time.

Yes if the feature group had existed, that would have been a good
option. 

Don't get me wrong. I agree with this group direction. Most 
namespaces can't be safely decoupled from each other with a clone 
flag.

>> Not perfect I agree.
>>  
>>> With your current strategy are you sure that even 64bit will
>>> be enough in the end? For me it rather looks like you'll
>>> go through those quickly too as more and more of the kernel
>>> is namespaced.
>> well, we're reaching the end. I hope ! devpts is in progress and
>> mq is just waiting for a clone flag.
> 
> Are you sure?

I'm never sure! :) That's what we have planned for the moment.

>>> Also I think the user interface is very unfriendly. How
>>> is a non kernel hacker supposed to make sense of these 
>>> myriads of flags? You'll be creating another 
>>> CreateProcess123_extra_args_extended() 
>>> in the end I fear.
>> well, the clone interface is not a friendly interface anyway. glibc wraps 
>> it
> 
> But only for the stack setup which is just a minor detail. 
> 
> The basic clone() flags interface used to be pretty sane and usable 
> before it got overloaded with so many tiny features.
> 
> I especially worry about how user land should keep track of the changing
> kernel here. If you add a new feature flag for lots of kernel features, it
> is reasonable to expect that there will often be new features in the future.
> 
> Does this mean user land needs to be updated all the time? Will this
> end up like another udev? 
> 
>> We will need a user library, like we have a libpthread or a libaio, to
> 
> That doesn't make sense. The basic kernel syscalls should be usable,
> not require some magic library that would likely need intimate 
> knowledge of specific kernel versions to do any good.

No magic there, but running a container will require some userland code 
to set it up properly. 

>> but we still need a way to extend the clone flags because none are left.
> 
> Can we just take out some again that were added in the .25 cycle and
> re-add them once there is a properly thought out interface?  That would
> leave at least one.

Well, CLONE_STOPPED is being recycled in 2.6.26, so we could use that one
to group namespaces.

and CLONE_NEWPID would probably be a good candidate to group namespaces.

That would be fine with me, but it would still leave clone with one or zero
flags left. 

Thanks,


C.
Re: [PATCH 1/3] change clone_flags type to u64 [message #29308 is a reply to message #29306] Thu, 10 April 2008 13:23
Cedric Le Goater
Kirill Korotaev wrote:
> There was no real rationale, except that some people saw "clone" functionality
> as the natural fit, and the fact that FS_NAMESPACE was done this way made them believe it 
> was a good way to go.
> And I warned about the flags limitation at the beginning.

yes and now, we've hit the clone wall but that's a good thing.

> Both OpenVZ and vserver suggested using a special syscall for handling this.

most projects do it that way.

> Maybe this is a good point to finally switch to it and stop worrying about all 
> this?

what would be the interface ? 

C. 

 
Re: [PATCH 0/3] clone64() and unshare64() system calls [message #29325 is a reply to message #29304] Thu, 10 April 2008 16:00
hpa
Cedric Le Goater wrote:
> 
> OK. I didn't know that. I took sys_llseek() as an example of an interface 
> to follow when coding clone64(). 
> 

llseek() was the first system call that took a doublewidth argument. 
It's not the one you want to mimic.

	-hpa
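For reference, sys_llseek() receives its 64-bit offset as two unsigned long halves and rebuilds it in the kernel as `((loff_t) offset_high << 32) | offset_low`, the model Cedric says clone64() followed. A sketch of both sides of that convention using fixed-width types; `split_flags` and `join_flags` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

static_assert(sizeof(uint64_t) == 2 * sizeof(uint32_t),
	      "a 64-bit flag word splits into two 32-bit halves");

/* Caller side: split a 64-bit flag word into the two halves that the
 * posted (flags_high, flags_low) prototype expects. */
static void split_flags(uint64_t flags, uint32_t *high, uint32_t *low)
{
	*high = (uint32_t)(flags >> 32);
	*low = (uint32_t)flags;
}

/* Kernel side: put the halves back together, the same way sys_llseek()
 * combines offset_high and offset_low. */
static uint64_t join_flags(uint32_t high, uint32_t low)
{
	return ((uint64_t)high << 32) | low;
}
```

With a single u64 parameter instead, the 32-bit ABI performs this split itself, which is the point hpa makes earlier in the thread.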
Re: [PATCH 1/3] change clone_flags type to u64 [message #29330 is a reply to message #29305] Thu, 10 April 2008 17:14
serue
Quoting Andi Kleen (andi@firstfloor.org):
> > I guess that was a development rationale. 
> 
> But what rationale? It just doesn't make much sense to me.
> 
> > Most of the namespaces are in 
> > use in the container projects like openvz, vserver and probably others 
> > and we needed a way to activate the code.
> 
> You could just have added it to feature groups over time.
> 
> > 
> > Not perfect I agree.
> >  
> > > With your current strategy are you sure that even 64bit will
> > > be enough in the end? For me it rather looks like you'll
> > > go through those quickly too as more and more of the kernel
> > > is namespaced.
> > 
> > well, we're reaching the end. I hope ! devpts is in progress and
> > mq is just waiting for a clone flag.
> 
> Are you sure?

Well for one thing we can take a somewhat different approach to new
clone flags.  I.e. we could extend CLONE_NEWIPC to do mq instead of
introducing a new clone flag.  The name doesn't have 'sysv' in it,
and globbing all ipc resources together makes some amount of sense.
Similarly, as hpa and Eric pointed out earlier, suka could use
CLONE_NEWDEV for ptys.  If we have net, pid, ipc, devices, that's a
pretty reasonable split imo.  Perhaps we tie user to devices and get
rid of CLONE_NEWUSER which I suspect no one is using atm (since only
Dave has run into the CONFIG_USER_SCHED problem).  Or not.  We could
roll uts into net, and give CLONE_NEWUTS a deprecation period.

-serge
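The grouping proposed above amounts to a small table from user-visible flags to the namespaces each one unshares. A sketch with hypothetical names and illustrative bit values (none are the real CLONE_* constants), assuming the mount namespace keeps its own flag and taking the "tie user to devices" and "roll uts into net" options as stated, even though both are tentative.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-namespace bits; values are illustrative only. */
enum ns_bit {
	NS_MOUNT   = 1u << 0,
	NS_UTS     = 1u << 1,
	NS_SYSVIPC = 1u << 2,
	NS_MQUEUE  = 1u << 3,
	NS_USER    = 1u << 4,
	NS_PID     = 1u << 5,
	NS_NET     = 1u << 6,
	NS_DEVPTS  = 1u << 7,
};

/* Hypothetical user-visible groups. */
enum ns_group {
	GRP_MOUNT   = 1u << 0,
	GRP_NET     = 1u << 1,	/* network stack, plus uts */
	GRP_PID     = 1u << 2,
	GRP_IPC     = 1u << 3,	/* SysV IPC and POSIX mq */
	GRP_DEVICES = 1u << 4,	/* ptys, plus the user namespace */
};

#define GRP_ALL (GRP_MOUNT | GRP_NET | GRP_PID | GRP_IPC | GRP_DEVICES)

static const struct {
	unsigned int group;
	unsigned int namespaces;
} group_map[] = {
	{ GRP_MOUNT,   NS_MOUNT },
	{ GRP_NET,     NS_NET | NS_UTS },
	{ GRP_PID,     NS_PID },
	{ GRP_IPC,     NS_SYSVIPC | NS_MQUEUE },
	{ GRP_DEVICES, NS_DEVPTS | NS_USER },
};

/* Expand a set of group flags into the namespaces to unshare.  A real
 * system call would return -EINVAL for unknown bits; this sketch asserts. */
static unsigned int expand_groups(unsigned int groups)
{
	unsigned int ns = 0;

	assert((groups & ~(unsigned int)GRP_ALL) == 0);
	for (size_t i = 0; i < sizeof(group_map) / sizeof(group_map[0]); i++)
		if (groups & group_map[i].group)
			ns |= group_map[i].namespaces;
	return ns;
}
```

Keeping the fine-grained bits internal means a group can later be split or merged by editing the table, without spending another user-visible flag.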
Re: [PATCH 0/3] clone64() and unshare64() system calls [message #29333 is a reply to message #29273] Thu, 10 April 2008 18:26
Sukadev Bhattiprolu
Paul Menage [menage@google.com] wrote:
| On Wed, Apr 9, 2008 at 7:38 PM,  <sukadev@us.ibm.com> wrote:
| >
| >  But as Jon Corbet pointed out in the thread above, it looked like
| >  adding a new system call has been the "traditional" way of solving this
| >  in Linux so far and there has been no consensus on a newer approach.
| >
| 
| I thought that the consensus was that adding a new system call was
| better than trying to force extensibility on to the existing
| non-extensible system call.

There were a couple of objections to extensible system calls like
sys_indirect() and to Pavel's approach.

| 
| But if we are adding a new system call, why not make the new one
| extensible to reduce the need for yet another new call in the future?

hypothetically, can we make a variant of clone() extensible to the point
of requiring a copy_from_user() ?

| 
| Paul
Re: [PATCH 0/3] clone64() and unshare64() system calls [message #29334 is a reply to message #29333] Thu, 10 April 2008 18:31
hpa
sukadev@us.ibm.com wrote:
> | 
> | I thought that the consensus was that adding a new system call was
> | better than trying to force extensibility on to the existing
> | non-extensible system call.
> 
> There were a couple of objections to extensible system calls like
> sys_indirect() and to Pavel's approach.
> 

This is a very different thing, though.  sys_indirect is pretty much a 
mechanism for having a sideband channel -- a second ABI -- into each and 
every system call, making it extremely hard to analyze what the full set 
of impact of a specific system call is.  Worse, as it was being proposed 
to have been used, it would have set state variables inside the kernel 
in a very opaque manner.

> | But if we are adding a new system call, why not make the new one
> | extensible to reduce the need for yet another new call in the future?
> 
> hypothetically, can we make a variant of clone() extensible to the point
> of requiring a copy_from_user() ?

The only issue is whether or not it's acceptable from a performance 
standpoint.  clone() is reasonably expensive, though.

	-hpa
Re: [PATCH 1/3] change clone_flags type to u64 [message #29340 is a reply to message #29330] Thu, 10 April 2008 22:13
Daniel Hokka Zakrisson
Serge E. Hallyn wrote:
> Quoting Andi Kleen (andi@firstfloor.org):
>> > I guess that was a development rationale.
>>
>> But what rationale? It just doesn't make much sense to me.
>>
>> > Most of the namespaces are in
>> > use in the container projects like openvz, vserver and probably others
>> > and we needed a way to activate the code.
>>
>> You could just have added it to feature groups over time.
>>
>> >
>> > Not perfect I agree.
>> >
>> > > With your current strategy are you sure that even 64bit will
>> > > be enough in the end? For me it rather looks like you'll
>> > > go through those quickly too as more and more of the kernel
>> > > is namespaced.
>> >
>> > well, we're reaching the end. I hope ! devpts is in progress and
>> > mq is just waiting for a clone flag.
>>
>> Are you sure?
>
> Well for one thing we can take a somewhat different approach to new
> clone flags.  I.e. we could extend CLONE_NEWIPC to do mq instead of
> introducing a new clone flag.  The name doesn't have 'sysv' in it,
> and globbing all ipc resources together makes some amount of sense.
> Similarly, as hpa and Eric pointed out earlier, suka could use
> CLONE_NEWDEV for ptys.  If we have net, pid, ipc, devices, that's a
> pretty reasonable split imo.  Perhaps we tie user to devices and get
> rid of CLONE_NEWUSER which I suspect no one is using atm (since only
> Dave has run into the CONFIG_USER_SCHED problem).  Or not.  We could
> roll uts into net, and give CLONE_NEWUTS a deprecation period.

Please don't. Then we'd need to re-add it in Linux-VServer to support
guests where network namespaces aren't used...

> -serge

-- 
Daniel Hokka Zakrisson
Re: [PATCH 1/3] change clone_flags type to u64 [message #29341 is a reply to message #29340] Thu, 10 April 2008 22:49
serue
Quoting Daniel Hokka Zakrisson (daniel@hozac.com):
> Serge E. Hallyn wrote:
> > Quoting Andi Kleen (andi@firstfloor.org):
> >> > I guess that was a development rationale.
> >>
> >> But what rationale? It just doesn't make much sense to me.
> >>
> >> > Most of the namespaces are in
> >> > use in the container projects like openvz, vserver and probably others
> >> > and we needed a way to activate the code.
> >>
> >> You could just have added it to feature groups over time.
> >>
> >> >
> >> > Not perfect I agree.
> >> >
> >> > > With your current strategy are you sure that even 64bit will
> >> > > be enough in the end? For me it rather looks like you'll
> >> > > go through those quickly too as more and more of the kernel
> >> > > is namespaced.
> >> >
> >> > well, we're reaching the end. I hope ! devpts is in progress and
> >> > mq is just waiting for a clone flag.
> >>
> >> Are you sure?
> >
> > Well for one thing we can take a somewhat different approach to new
> > clone flags.  I.e. we could extend CLONE_NEWIPC to do mq instead of
> > introducing a new clone flag.  The name doesn't have 'sysv' in it,
> > and globbing all ipc resources together makes some amount of sense.
> > Similarly, as hpa and Eric pointed out earlier, suka could use
> > CLONE_NEWDEV for ptys.  If we have net, pid, ipc, devices, that's a
> > pretty reasonable split imo.  Perhaps we tie user to devices and get
> > rid of CLONE_NEWUSER which I suspect no one is using atm (since only
> > Dave has run into the CONFIG_USER_SCHED problem).  Or not.  We could
> > roll uts into net, and give CLONE_NEWUTS a deprecation period.
> 
> Please don't. Then we'd need to re-add it in Linux-VServer to support
> guests where network namespaces aren't used...

So these are networked vservers with a different hostname?  Just
curious, what would be a typical use for these?

Anyway then I guess we won't :)  Do you have other suggestions for
ns clone flags which ought to be combined?  Do the rest of what I
listed make sense to you?  (If not, then I guess I'll step out of the
way and let you and Andi fight it out :)

thanks,
-serge
Re: [PATCH 1/3] change clone_flags type to u64 [message #29359 is a reply to message #29341] Fri, 11 April 2008 08:45
Daniel Hokka Zakrisson
Messages: 22
Registered: January 2007
Junior Member
Serge E. Hallyn wrote:
> Quoting Daniel Hokka Zakrisson (daniel@hozac.com):
>> Serge E. Hallyn wrote:
>> > Quoting Andi Kleen (andi@firstfloor.org):
>> >> > I guess that was a development rationale.
>> >>
>> >> But what rationale? It just doesn't make much sense to me.
>> >>
>> >> > Most of the namespaces are in
>> >> > use in the container projects like openvz, vserver and probably others
>> >> > and we needed a way to activate the code.
>> >>
>> >> You could just have added it to feature groups over time.
>> >>
>> >> >
>> >> > Not perfect I agree.
>> >> >
>> >> > > With your current strategy are you sure that even 64bit will
>> >> > > be enough in the end? For me it rather looks like you'll
>> >> > > go through those quickly too as more and more of the kernel
>> >> > > is namespaced.
>> >> >
>> >> > Well, we're reaching the end, I hope! devpts is in progress and
>> >> > mq is just waiting for a clone flag.
>> >>
>> >> Are you sure?
>> >
>> > Well for one thing we can take a somewhat different approach to new
>> > clone flags.  I.e. we could extend CLONE_NEWIPC to do mq instead of
>> > introducing a new clone flag.  The name doesn't have 'sysv' in it,
>> > and globbing all ipc resources together makes some amount of sense.
>> > Similarly, as hpa+eric pointed out earlier, suka could use
>> > CLONE_NEWDEV for ptys.  If we have net, pid, ipc, devices, that's a
>> > pretty reasonable split imo.  Perhaps we tie user to devices and get
>> > rid of CLONE_NEWUSER which I suspect no one is using atm (since only
>> > Dave has run into the CONFIG_USER_SCHED problem).  Or not.  We could
>> > roll uts into net, and give CLONE_NEWUTS a deprecation period.
>>
>> Please don't. Then we'd need to re-add it in Linux-VServer to support
>> guests where network namespaces aren't used...
>
> So these are networked vservers with a different hostname?  Just
> curious, what would be a typical use for these?

Layer 3 isolation will continue to be the default for Linux-VServer.

> Anyway then I guess we won't :)  Do you have other suggestions for
> ns clone flags which ought to be combined?  Do the rest of what I
> listed make sense to you?  (If not, then I guess I'll step out of the
> way and let you and Andi fight it out :)

I think putting mq under CLONE_NEWIPC makes sense, as well as using
CLONE_NEWDEV for the ptys. If CLONE_NEWUSER is to be combined with
anything, I think it makes more sense to combine it with CLONE_NEWPID than
CLONE_NEWDEV.

> thanks,
> -serge
>

-- 
Daniel Hokka Zakrisson
Re: [PATCH 3/3] add the clone64() and unshare64() syscalls [message #29657 is a reply to message #29264] Wed, 09 April 2008 23:07
Jakub Jelinek
Messages: 1
Registered: April 2008
Junior Member
On Wed, Apr 09, 2008 at 03:34:59PM -0700, sukadev@us.ibm.com wrote:
> From: Cedric Le Goater <clg@fr.ibm.com>
> Subject: [PATCH 3/3] add the clone64() and unshare64() syscalls
> 
> This patch adds 2 new syscalls :
> 
>      long sys_clone64(unsigned long flags_high, unsigned long flags_low,
> 		unsigned long newsp);
> 
>      long sys_unshare64(unsigned long flags_high, unsigned long flags_low);

Can you explain why you are adding it for 64-bit arches too?  unsigned long
is already 64-bit there, and both sys_clone and sys_unshare take unsigned
long flags rather than unsigned int.

	Jakub
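Jakub's objection is easiest to see in code. The following is a hedged C sketch, not code from the patch set: the helpers join_clone_flags(), split_clone_flags() and ulong_holds_64_flag_bits() are hypothetical names, and treating each of flags_high/flags_low as carrying exactly 32 bits is an assumption based on the sys_clone64() signature quoted in the patch.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/*
 * Hypothetical helper: rebuild one 64-bit flags word from the two halves
 * of the proposed sys_clone64(flags_high, flags_low, newsp) signature.
 * Assumption: each half carries 32 bits; the mask discards anything above.
 */
static uint64_t join_clone_flags(unsigned long flags_high, unsigned long flags_low)
{
	return ((uint64_t)(flags_high & 0xffffffffUL) << 32) |
	       (uint64_t)(flags_low & 0xffffffffUL);
}

/* Hypothetical caller side: split a 64-bit word into two register-sized halves. */
static void split_clone_flags(uint64_t flags, unsigned long *high, unsigned long *low)
{
	*high = (unsigned long)(flags >> 32);
	*low = (unsigned long)(flags & 0xffffffffUL);
}

/*
 * Jakub's point: where unsigned long is 64 bits wide (LP64), the existing
 * sys_clone(unsigned long flags, ...) and sys_unshare(unsigned long flags)
 * can already carry all 64 flag bits, so the split only matters for
 * 32-bit ABIs.
 */
static int ulong_holds_64_flag_bits(void)
{
	return sizeof(unsigned long) * CHAR_BIT >= 64;
}
```

On x86_64, ulong_holds_64_flag_bits() returns 1, which is the crux of the question above; on i386 it returns 0, and splitting the word across two registers is what lets 64 flag bits through the syscall ABI.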