OpenVZ Forum


userns: targeted capabilities v5 [message #41743] Thu, 17 February 2011 15:02
serge
Messages: 72
Registered: January 2007
Member
Here is a repost of my previous user namespace patch, ported onto
last night's git head.

It fixes several things I was doing wrong in the last (v4)
posting, in particular:

1. don't set uts_ns->user_ns to current's when !CLONE_NEWUTS
2. add an ipc_ns->user_ns which owns ipc_ns, and use that to
decide CAP_IPC_OWNER
3. fix logic flaw caused by bad parentheses
4. allow do_prlimit to current
5. don't always give root full privs to init_user_ns

The expected course of development for user namespaces is laid out
at https://wiki.ubuntu.com/UserNamespace. Bugs aside, this
patchset should not affect systems that are not actively using
user namespaces at all; it only restricts what tasks in child
user namespaces can do. The patches begin to confine privilege
to a user namespace, so that root in a container cannot kill or
ptrace tasks in the parent user namespace, and gets only world
access rights to files. Since all files currently belong to the
initial user namespace, tasks in child user namespaces get only
world access rights to *all* files.
While this temporarily makes user namespaces bad for system
containers, it starts to get useful for some sandboxing.

I've run the 'runltplite.sh' script with and without this patchset and
found no difference. So all in all, this is the first version
of this patchset for which I feel comfortable asking: please
consider applying.

thanks,
-serge
_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers
[PATCH 2/9] security: Make capabilities relative to the user namespace. [message #41744 is a reply to message #41743] Thu, 17 February 2011 15:03
serge
- Introduce ns_capable to test for a capability in a non-default
user namespace.
- Teach cap_capable to handle capabilities in a non-default
user namespace.

The motivation is to get to the unprivileged creation of new
namespaces. It looks like this gets us 90% of the way there, with
only potential uid confusion issues left.

I still need to handle getting all caps after creation, but otherwise I
think I have a good starter patch that achieves all of your goals.

Changelog:
11/05/2010: [serge] add apparmor
12/14/2010: [serge] fix capabilities to created user namespaces
Without this, if user serge creates a user_ns, he won't have
capabilities to the user_ns he created. This is because we
were first checking whether his effective caps had the caps
he needed and returning -EPERM if not, and THEN checking whether
he was the creator. Reverse those checks.
12/16/2010: [serge] security_real_capable needs ns argument in !security case
01/11/2011: [serge] add task_ns_capable helper
01/11/2011: [serge] add nsown_capable() helper per Bastian Blank's suggestion
02/16/2011: [serge] fix a logic bug: the root user is always creator of
init_user_ns, but should not always have capabilities to
it! Fix the check in cap_capable().

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
---
include/linux/capability.h | 10 ++++++++--
include/linux/security.h | 25 ++++++++++++++-----------
kernel/capability.c | 32 ++++++++++++++++++++++++++++++--
security/apparmor/lsm.c | 5 +++--
security/commoncap.c | 40 +++++++++++++++++++++++++++++++++-------
security/security.c | 16 ++++++++++------
security/selinux/hooks.c | 14 +++++++++-----
7 files changed, 107 insertions(+), 35 deletions(-)

diff --git a/include/linux/capability.h b/include/linux/capability.h
index fb16a36..cb3d2d9 100644
--- a/include/linux/capability.h
+++ b/include/linux/capability.h
@@ -544,7 +544,7 @@ extern const kernel_cap_t __cap_init_eff_set;
*
* Note that this does not set PF_SUPERPRIV on the task.
*/
-#define has_capability(t, cap) (security_real_capable((t), (cap)) == 0)
+#define has_capability(t, cap) (security_real_capable((t), &init_user_ns, (cap)) == 0)

/**
* has_capability_noaudit - Determine if a task has a superior capability available (unaudited)
@@ -558,9 +558,15 @@ extern const kernel_cap_t __cap_init_eff_set;
* Note that this does not set PF_SUPERPRIV on the task.
*/
#define has_capability_noaudit(t, cap) \
- (security_real_capable_noaudit((t), (cap)) == 0)
+ (security_real_capable_noaudit((t), &init_user_ns, (cap)) == 0)

+struct user_namespace;
+extern struct user_namespace init_user_ns;
extern int capable(int cap);
+extern int ns_capable(struct user_namespace *ns, int cap);
+extern int task_ns_capable(struct task_struct *t, int cap);
+
+#define nsown_capable(cap) (ns_capable(current_user_ns(), (cap)))

/* audit system wants to get cap info from files as well */
struct dentry;
diff --git a/include/linux/security.h b/include/linux/security.h
index b2b7f97..6bbee08 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -46,13 +46,14 @@

struct ctl_table;
struct audit_krule;
+struct user_namespace;

/*
* These functions are in security/capability.c and are used
* as the default capabilities functions
*/
extern int cap_capable(struct task_struct *tsk, const struct cred *cred,
- int cap, int audit);
+ struct user_namespace *ns, int cap, int audit);
extern int cap_settime(struct timespec *ts, struct timezone *tz);
extern int cap_ptrace_access_check(struct task_struct *child, unsigned int mode);
extern int cap_ptrace_traceme(struct task_struct *parent);
@@ -1254,6 +1255,7 @@ static inline void security_free_mnt_opts(struct security_mnt_opts *opts)
* credentials.
* @tsk contains the task_struct for the process.
* @cred contains the credentials to use.
+ * @ns contains the user namespace we want the capability in
* @cap contains the capability <include/linux/capability.h>.
* @audit: Whether to write an audit message or not
* Return 0 if the capability is granted for @tsk.
@@ -1382,7 +1384,7 @@ struct security_operations {
const kernel_cap_t *inheritable,
const kernel_cap_t *permitted);
int (*capable) (struct task_struct *tsk, const struct cred *cred,
- int cap, int audit);
+ struct user_namespace *ns, int cap, int audit);
int (*sysctl) (struct ctl_table *table, int op);
int (*quotactl) (int cmds, int type, int id, struct super_block *sb);
int (*quota_on) (struct dentry *dentry);
@@ -1662,9 +1664,9 @@ int security_capset(struct cred *new, const struct cred *old,
const kernel_cap_t *effective,
const kernel_cap_t *inheritable,
const kernel_cap_t *permitted);
-int security_capable(const struct cred *cred, int cap);
-int security_real_capable(struct task_struct *tsk, int cap);
-int security_real_capable_noaudit(struct task_struct *tsk, int cap);
+int security_capable(struct user_namespace *ns, const struct cred *cred, int cap);
+int security_real_capable(struct task_struct *tsk, struct user_namespace *ns, int cap);
+int security_real_capable_noaudit(struct task_struct *tsk, struct user_namespace *ns, int cap);
int security_sysctl(struct ctl_table *table, int op);
int security_quotactl(int cmds, int type, int id, struct super_block *sb);
int security_quota_on(struct dentry *dentry);
@@ -1856,28 +1858,29 @@ static inline int security_capset(struct cred *new,
return cap_capset(new, old, effective, inheritable, permitted);
}

-static inline int security_capable(const struct cred *cred, int cap)
+static inline int security_capable(struct user_namespace *ns,
+ const struct cred *cred, int cap)
{
- return cap_capable(current, cred, cap, SECURITY_CAP_AUDIT);
+ return cap_capable(current, cred, ns, cap, SECURITY_CAP_AUDIT);
}

-static inline int security_real_capable(struct task_struct *tsk, int cap)
+static inline int security_real_capable(struct task_struct *tsk, struct user_namespace *ns, int cap)
{
int ret;

rcu_read_lock();
- ret = cap_capable(tsk, __task_cred(tsk), cap, SECURITY_CAP_AUDIT);
+ ret = cap_capable(tsk, __task_cred(tsk), ns, cap, SECURITY_CAP_AUDIT);
rcu_read_unlock();
return ret;
}

static inline
-int security_real_capable_noaudit(struct task_struct *tsk, int cap)
+int security_real_capable_noaudit(struct task_struct *tsk, struct user_namespace *ns, int cap)
{
int ret;

rcu_read_lock();
- ret = cap_capable(tsk, __task_cred(tsk), cap,
+ ret = cap_capable(tsk, __task_cred(tsk), ns, cap,
SECURITY_CAP_NOAUDIT);
rcu_read_unlock();
return ret;
diff --git a/kernel/capability.c b/kernel/capability.c
index 9e9385f..916658c 100644
--- a/kernel/capability.c
+++ b/kernel/capability.c
@@ -14,6 +14,7 @@
#include <linux/security.h>
#include <linux/syscalls.h>
#include <linux/pid_namespace.h>
+#include <linux/user_namespace.h>
#include <asm/uaccess.h>

/*
@@ -301,15 +302,42 @@ error:
*/
int capable(int cap)
{
+ return ns_capable(&init_user_ns, cap);
+}
+EXPORT_SYMBOL(capable);
+
+/**
+ * ns_capable - Determine if the current task has a superior capability in effect
+ * @ns: The user namespace we want the capability in
+ * @cap: The capability to be tested for
+ *
+ * Return true if the current task has the given superior capability currently
+ * available for use, false if not.
+ *
+ * This sets PF_SUPERPRIV on the task if the capability is available on the
+ * assumption that it's about to be used.
+ */
+int ns_capable(struct user_namespace *ns, int cap)
+{
if (unlikely(!cap_valid(cap))) {
printk(KERN_CRIT "capable() called with invalid cap=%u\n", cap);
BUG();
}

- if (security_capable(current_cred(), cap) == 0) {
+ if (security_capable(ns, current_cred(), cap) == 0) {
current->flags |= PF_SUPERPRIV;
return 1;
}
return 0;
}
-EXPORT_SYMBOL(capable);
+EXPORT_SYMBOL(ns_capable);
+
+/*
+ * does current have capability 'cap' to the user namespace of task
+ * 't'. Return true if it does, false otherwise.
+ */
+int task_ns_capable(struct task_struct *t, int cap)
+{
+ return ns_capable(task_cred_xxx(t, user)->user_ns, cap);
+}
+EXPORT_SYMBOL(task_ns_capable);
diff --git a/security/apparmor/lsm.c b/security/apparmor/lsm.c
index b7106f1..b37c2cd 100644
--- a/security/apparmor/lsm.c
+++ b/security/apparmor/lsm.c
@@ -22,6 +22,7 @@
#include <linux/ctype.h>
#include <linux/sysctl.h>
#include <linux/audit.h>
+#include <linux/user_namespace.h>
#include <net/sock.h>

#include "include/apparmor.h"
@@ -136,11 +137,11 @@ static int apparmor_capget(struct task_struct *target, kernel_cap_t *effective,
}

static int apparmor_capable(struct task_struct *task, const struct cred *cred,
- int cap, int audit)
+ struct user_namespace *ns, int cap, int audit)
{
struct aa_profile *profile;
/* cap_capable returns 0 on success, else -EPERM */
- int error = cap_capable(task, cred, cap, audit);
+ int error = cap_capable(task, cred, ns, cap, audit);
if (!error) {
profile = aa_cred_profile(cred);
if (!unconfined(profile))
diff --git a/security/commoncap.c b/security/commoncap.c
index 64c2ed9..51fa9ec 100644
--- a/security/commoncap.c
+++ b/security/commoncap.c
@@ -27,6 +27,7 @@
#include <linux/sched.h>
#include <linux/prctl.h>
#include <linux/securebits.h>
+#include <linux/user_namespace.h>

/*
* If a non-root user executes a setuid-root binary in
@@ -68,6 +69,7 @@ EXPORT_SYMBOL(cap_netlink_recv);
* cap_capable - Determine whether a task has a particular effective capability
* @tsk: The task to query
* @cred: The credentials to use
+ * @ns: The user namespace in which we need the capability
* @cap: The capability to check for
* @audit: Whether to write an audit message or not
*
@@ -79,10 +81,32 @@ EXPORT_SYMBOL(cap_netlink_recv);
* cap_has_capability() returns 0 when a task has a capability, but the
* kernel's capable() and has_capability() returns 1 for this case.
*/
-int cap_capable(struct task_struct *tsk, const struct cred *cred, int cap,
- int audit)
+int cap_capable(struct task_struct *tsk, const struct cred *cred,
+ struct user_namespace *targ_ns, int cap, int audit)
{
- return cap_raised(cred->cap_effective, cap) ? 0 : -EPERM;
+ for (;;) {
+ /* The creator of the user namespace has all caps. */
+ if (targ_ns != &init_user_ns && targ_ns->creator == cred->user)
+ return 0;
+
+ /* Do we have the necessary capabilities? */
+ if (targ_ns == cred->user->user_ns)
+ return cap_raised(cred->cap_effective, cap) ? 0 : -EPERM;
+
+ /* Have we tried all of the parent namespaces? */
+ if (targ_ns == &init_user_ns)
+ return -EPERM;
+
+ /* If you have the capability in a parent user ns you have it
+ * in all of its child user namespaces as well, so see
+ * if this process has the capability in the parent user
+ * namespace.
+ */
+ targ_ns = targ_ns->creator->user_ns;
+ }
+
+ /* We never get here */
+ return -EPERM;
}

/**
@@ -177,7 +201,8 @@ static inline int cap_inh_is_capped(void)
/* they are so limited unless the current task has the CAP_SETPCAP
* capability
*/
- if (cap_capable(current, current_cred(), CAP_SETPCAP,
+ if (cap_capable(current, current_cred(),
+ current_cred()->user->user_ns, CAP_SETPCAP,
SECURITY_CAP_AUDIT) == 0)
return 0;
return 1;
@@ -829,7 +854,8 @@ int cap_task_prctl(int option, unsigned long arg2, unsigned long arg3,
& (new->securebits ^ arg2)) /*[1]*/
|| ((new->securebits & SECURE_ALL_LOCKS & ~arg2)) /*[2]*/
|| (arg2 & ~(SECURE_ALL_LOCKS | SECURE_ALL_BITS)) /*[3]*/
- || (cap_capable(current, current_cred(), CAP_SETPCAP,
+ || (cap_capable(current, current_cred(),
+ current_cred()->user->user_ns, CAP_SETPCAP,
SECURITY_CAP_AUDIT) != 0) /*[4]*/
/*
* [1] no changing of bits that are locked
@@ -894,7 +920,7 @@ int cap_vm_enough_memory(struct mm_struct *mm, long pages)
{
int cap_sys_admin = 0;

- if (cap_capable(current, current_cred(), CAP_SYS_ADMIN,
+ if (cap_capable(current, current_cred(), &init_user_ns, CAP_SYS_ADMIN,
SECURITY_CAP_NOAUDIT) == 0)
cap_sys_admin = 1;
return __vm_enough_memory(mm, pages, cap_sys_admin);
@@ -921,7 +947,7 @@ int cap_file_mmap(struct file *file, unsigned long reqprot,
int ret = 0;

if (addr < dac_mmap_min_addr) {
- ret = cap_capable(current, current_cred(), CAP_SYS_RAWIO,
+ ret = cap_capable(current, current_cred(), &init_user_ns, CAP_SYS_RAWIO,
SECURITY_CAP_AUDIT);
/* set PF_SUPERPRIV if it turns out we allow the low mmap */
if (ret == 0)
diff --git a/security/security.c b/security/security.c
index 7b7308a..7a6a0d0 100644
--- a/security/security.c
+++ b/security/security.c
@@ -154,29 +154,33 @@ int security_capset(struct cred *new, const struct cred *old,
effective, inheritable, permitted);
}

-int security_capable(const struct cred *cred, int cap)
+int security_capable(struct user_namespace *ns, const struct cred *cred,
+ int cap)
{
- return security_ops->capable(current, cred, cap, SECURITY_CAP_AUDIT);
+ return security_ops->capable(current, cred, ns, cap,
+ SECURITY_CAP_AUDIT);
}

-int security_real_capable(struct task_struct *tsk, int cap)
+int security_real_capable(struct task_struct *tsk, struct user_namespace *ns,
+ int cap)
{
const struct cred *cred;
int ret;

cred = get_task_cred(tsk);
- ret = security_ops->capable(tsk, cred, cap, SECURITY_CAP_AUDIT);
+ ret = security_ops->capable(tsk, cred, ns, cap, SECURITY_CAP_AUDIT);
put_cred(cred);
return ret;
}

-int security_real_capable_noaudit(struct task_struct *tsk, int cap)
+int security_real_capable_noaudit(struct task_struct *tsk,
+ struct user_namespace *ns, int cap)
{
const struct cred *cred;
int ret;

cred = get_task_cred(tsk);
- ret = security_ops->capable(tsk, cred, cap, SECURITY_CAP_NOAUDIT);
+ ret = security_ops->capable(tsk, cred, ns, cap, SECURITY_CAP_NOAUDIT);
put_cred(cred);
return ret;
}
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index c8d6992..6dcda48 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -77,6 +77,7 @@
#include <linux/mutex.h>
#include <linux/posix-timers.h>
#include <linux/syslog.h>
+#include <linux/user_namespace.h>

#include "avc.h"
#include "objsec.h"
@@ -1423,6 +1424,7 @@ static int current_has_perm(const struct task_struct *tsk,
/* Check whether a task is allowed to use a capability. */
static int task_has_capability(struct task_struct *tsk,
const struct cred *cred,
+ struct user_namespace *ns,
int cap, int audit)
{
struct common_audit_data ad;
@@ -1851,15 +1853,15 @@ static int selinux_capset(struct cred *new, const struct cred *old,
*/

static int selinux_capable(struct task_struct *tsk, const struct cred *cred,
- int cap, int audit)
+ struct user_namespace *ns, int cap, int audit)
{
int rc;

- rc = cap_capable(tsk, cred, cap, audit);
+ rc = cap_capable(tsk, cred, ns, cap, audit);
if (rc)
return rc;

- return task_has_capability(tsk, cred, cap, audit);
+ return task_has_capability(tsk, cred, ns, cap, audit);
}

static int selinux_sysctl_get_sid(ctl_table *table, u16 tclass, u32 *sid)
@@ -2012,7 +2014,8 @@ static int selinux_vm_enough_memory(struct mm_struct *mm, long pages)
{
int rc, cap_sys_admin = 0;

- rc = selinux_capable(current, current_cred(), CAP_SYS_ADMIN,
+ rc = selinux_capable(current, current_cred(),
+ &init_user_ns, CAP_SYS_ADMIN,
SECURITY_CAP_NOAUDIT);
if (rc == 0)
cap_sys_admin = 1;
@@ -2829,7 +2832,8 @@ static int selinux_inode_getsecurity(const struct inode *inode, const char *name
* and lack of permission just means that we fall back to the
* in-core context value, not a denial.
*/
- error = selinux_capable(current, current_cred(), CAP_MAC_ADMIN,
+ error = selinux_capable(current, current_cred(),
+ &init_user_ns, CAP_MAC_ADMIN,
SECURITY_CAP_NOAUDIT);
if (!error)
error = security_sid_to_context_force(isec->sid, &context,
--
1.7.0.4
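To make the namespace walk in the new cap_capable() easier to follow, here is a minimal userspace sketch of the same loop. It is an illustration, not kernel code: the model_* types are hypothetical stand-ins for struct user_struct, struct user_namespace and struct cred, the effective set is a plain 64-bit mask, the initial namespace is passed in rather than being the global &init_user_ns, and -1 stands in for -EPERM.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace model of the patched cap_capable(); not kernel code. */

struct model_userns;

/* Stand-in for struct user_struct: the user and the namespace it lives in. */
struct model_user {
    unsigned int uid;
    struct model_userns *user_ns;
};

/* Stand-in for struct user_namespace: only the creator is modeled. */
struct model_userns {
    struct model_user *creator;
};

/* Stand-in for struct cred: owning user plus effective capability bits. */
struct model_cred {
    struct model_user *user;
    uint64_t cap_effective;
};

/* Returns 0 if cap is granted in targ_ns, -1 (standing in for -EPERM) if not. */
int model_cap_capable(const struct model_cred *cred,
                      struct model_userns *targ_ns,
                      const struct model_userns *init_ns, int cap)
{
    assert(cap >= 0 && cap < 64);  /* mirrors the cap_valid() BUG() check */

    for (;;) {
        /* The creator of a non-initial user namespace has all caps in it. */
        if (targ_ns != init_ns && targ_ns->creator == cred->user)
            return 0;

        /* In the caller's own namespace, consult the effective set. */
        if (targ_ns == cred->user->user_ns)
            return ((cred->cap_effective >> cap) & 1) ? 0 : -1;

        /* Walked all the way to the top without a match. */
        if (targ_ns == init_ns)
            return -1;

        /* Caps held in a parent namespace also apply in its children. */
        targ_ns = targ_ns->creator->user_ns;
    }
}
```

The order of the checks follows the changelog above: the creator test runs before the effective-set test (12/14/2010), and it is skipped for the initial namespace, so root, as creator of init_user_ns, gets only the capabilities in its effective set there (02/16/2011).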

[PATCH 9/9] userns: check user namespace for task->file uid equivalence checks [message #41745 is a reply to message #41743] Thu, 17 February 2011 15:04
serge
Cheat for now and say all files belong to init_user_ns. Next
step will be to let superblocks belong to a user_ns, and derive
inode_userns(inode) from inode->i_sb->s_user_ns. Finally we'll
introduce more flexible arrangements.

Changelog:
Feb 15: make is_owner_or_cap take const struct inode

Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
---
fs/inode.c | 17 +++++++++++++++++
fs/namei.c | 20 +++++++++++++++-----
include/linux/fs.h | 9 +++++++--
3 files changed, 39 insertions(+), 7 deletions(-)

diff --git a/fs/inode.c b/fs/inode.c
index da85e56..1930b45 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -25,6 +25,7 @@
#include <linux/async.h>
#include <linux/posix_acl.h>
#include <linux/ima.h>
+#include <linux/cred.h>

/*
* This is needed for the following functions:
@@ -1722,3 +1723,19 @@ void inode_init_owner(struct inode *inode, const struct inode *dir,
inode->i_mode = mode;
}
EXPORT_SYMBOL(inode_init_owner);
+
+/*
+ * return 1 if current either has CAP_FOWNER to the
+ * file, or owns the file.
+ */
+int is_owner_or_cap(const struct inode *inode)
+{
+ struct user_namespace *ns = inode_userns(inode);
+
+ if (current_user_ns() == ns && current_fsuid() == inode->i_uid)
+ return 1;
+ if (ns_capable(ns, CAP_FOWNER))
+ return 1;
+ return 0;
+}
+EXPORT_SYMBOL(is_owner_or_cap);
diff --git a/fs/namei.c b/fs/namei.c
index 9e701e2..cfac5b4 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -176,6 +176,9 @@ static int acl_permission_check(struct inode *inode, int mask, unsigned int flag

mask &= MAY_READ | MAY_WRITE | MAY_EXEC;

+ if (current_user_ns() != inode_userns(inode))
+ goto other_perms;
+
if (current_fsuid() == inode->i_uid)
mode >>= 6;
else {
@@ -189,6 +192,7 @@ static int acl_permission_check(struct inode *inode, int mask, unsigned int flag
mode >>= 3;
}

+other_perms:
/*
* If the DACs are ok we don't need any capability check.
*/
@@ -230,7 +234,7 @@ int generic_permission(struct inode *inode, int mask, unsigned int flags,
* Executable DACs are overridable if at least one exec bit is set.
*/
if (!(mask & MAY_EXEC) || execute_ok(inode))
- if (capable(CAP_DAC_OVERRIDE))
+ if (ns_capable(inode_userns(inode), CAP_DAC_OVERRIDE))
return 0;

/*
@@ -238,7 +242,7 @@ int generic_permission(struct inode *inode, int mask, unsigned int flags,
*/
mask &= MAY_READ | MAY_WRITE | MAY_EXEC;
if (mask == MAY_READ || (S_ISDIR(inode->i_mode) && !(mask & MAY_WRITE)))
- if (capable(CAP_DAC_READ_SEARCH))
+ if (ns_capable(inode_userns(inode), CAP_DAC_READ_SEARCH))
return 0;

return -EACCES;
@@ -675,6 +679,7 @@ force_reval_path(struct path *path, struct nameidata *nd)
static inline int exec_permission(struct inode *inode, unsigned int flags)
{
int ret;
+ struct user_namespace *ns = inode_userns(inode);

if (inode->i_op->permission) {
ret = inode->i_op->permission(inode, MAY_EXEC, flags);
@@ -687,7 +692,7 @@ static inline int exec_permission(struct inode *inode, unsigned int flags)
if (ret == -ECHILD)
return ret;

- if (capable(CAP_DAC_OVERRIDE) || capable(CAP_DAC_READ_SEARCH))
+ if (ns_capable(ns, CAP_DAC_OVERRIDE) || ns_capable(ns, CAP_DAC_READ_SEARCH))
goto ok;

return ret;
@@ -1940,11 +1945,15 @@ static inline int check_sticky(struct inode *dir, struct inode *inode)

if (!(dir->i_mode & S_ISVTX))
return 0;
+ if (current_user_ns() != inode_userns(inode))
+ goto other_userns;
if (inode->i_uid == fsuid)
return 0;
if (dir->i_uid == fsuid)
return 0;
- return !capable(CAP_FOWNER);
+
+other_userns:
+ return !ns_capable(inode_userns(inode), CAP_FOWNER);
}

/*
@@ -2635,7 +2644,8 @@ int vfs_mknod(struct inode *dir, struct dentry *dentry, int mode, dev_t dev)
if (error)
return error;

- if ((S_ISCHR(mode) || S_ISBLK(mode)) && !capable(CAP_MKNOD))
+ if ((S_ISCHR(mode) || S_ISBLK(mode)) &&
+ !ns_capable(inode_userns(dir), CAP_MKNOD))
return -EPERM;

if (!dir->i_op->mknod)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index bd32159..c84417a 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1446,8 +1446,13 @@ enum {
#define put_fs_excl() atomic_dec(&current->fs_excl)
#define has_fs_excl() atomic_read(&current->fs_excl)

-#define is_owner_or_cap(inode) \
- ((current_fsuid() == (inode)->i_uid) || capable(CAP_FOWNER))
+/*
+ * until VFS tracks user namespaces for inodes, just make all files
+ * belong to init_user_ns
+ */
+extern struct user_namespace init_user_ns;
+#define inode_userns(inode) (&init_user_ns)
+extern int is_owner_or_cap(const struct inode *inode);

/* not quite ready to be deprecated, but... */
extern void lock_super(struct super_block *);
--
1.7.0.4
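A minimal userspace sketch of the mode-bit selection this patch adds to acl_permission_check(), under stated simplifications: ACLs, supplementary groups (in_group_p()) and the CAP_DAC_OVERRIDE / CAP_DAC_READ_SEARCH fallbacks in generic_permission() are left out, and the hypothetical model_inode.ns field plays the role of inode_userns(inode), which for now is always &init_user_ns. The point it shows is the early jump to other_perms: a uid or gid from a different user namespace never selects the owner or group bits.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_MAY_EXEC  1  /* same bit values as the kernel's MAY_* flags */
#define MODEL_MAY_WRITE 2
#define MODEL_MAY_READ  4

/* Hypothetical stand-in; only a namespace's identity matters here. */
struct model_ns { int id; };

struct model_inode {
    unsigned int mode;          /* permission bits, e.g. 0640 */
    unsigned int uid, gid;
    const struct model_ns *ns;  /* plays the role of inode_userns(inode) */
};

struct model_task {
    unsigned int fsuid, fsgid;
    const struct model_ns *ns;  /* plays the role of current_user_ns() */
};

/* Returns true if the mode bits alone grant every access in mask. */
bool model_dac_ok(const struct model_task *t, const struct model_inode *inode,
                  unsigned int mask)
{
    unsigned int mode = inode->mode;

    assert(t && inode);
    mask &= MODEL_MAY_READ | MODEL_MAY_WRITE | MODEL_MAY_EXEC;

    /* A uid/gid from another user namespace must not match: use "other". */
    if (t->ns != inode->ns)
        goto other_perms;

    if (t->fsuid == inode->uid)
        mode >>= 6;             /* owner bits */
    else if (t->fsgid == inode->gid)
        mode >>= 3;             /* group bits (supplementary groups ignored) */

other_perms:
    return (mask & ~mode) == 0;
}
```

A task in a child namespace whose fsuid happens to equal the file's owner uid is therefore still limited to the world bits, which is the "world access rights to *all* files" behaviour described in the cover letter.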

[PATCH 7/9] add a user namespace owner of ipc ns [message #41746 is a reply to message #41743] Thu, 17 February 2011 15:03
serge
Changelog:
Feb 15: Don't set new ipc->user_ns if we didn't create a new
ipc_ns.

Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
---
include/linux/ipc_namespace.h | 3 +++
ipc/msgutil.c | 3 +++
ipc/namespace.c | 9 +++++++--
kernel/nsproxy.c | 5 +++++
4 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/include/linux/ipc_namespace.h b/include/linux/ipc_namespace.h
index 5195298..46d2eb4 100644
--- a/include/linux/ipc_namespace.h
+++ b/include/linux/ipc_namespace.h
@@ -24,6 +24,7 @@ struct ipc_ids {
struct idr ipcs_idr;
};

+struct user_namespace;
struct ipc_namespace {
atomic_t count;
struct ipc_ids ids[3];
@@ -56,6 +57,8 @@ struct ipc_namespace {
unsigned int mq_msg_max; /* initialized to DFLT_MSGMAX */
unsigned int mq_msgsize_max; /* initialized to DFLT_MSGSIZEMAX */

+ /* user_ns which owns the ipc ns */
+ struct user_namespace *user_ns;
};

extern struct ipc_namespace init_ipc_ns;
diff --git a/ipc/msgutil.c b/ipc/msgutil.c
index f095ee2..d91ff4b 100644
--- a/ipc/msgutil.c
+++ b/ipc/msgutil.c
@@ -20,6 +20,8 @@

DEFINE_SPINLOCK(mq_lock);

+extern struct user_namespace init_user_ns;
+
/*
* The next 2 defines are here bc this is the only file
* compiled when either CONFIG_SYSVIPC and CONFIG_POSIX_MQUEUE
@@ -32,6 +34,7 @@ struct ipc_namespace init_ipc_ns = {
.mq_msg_max = DFLT_MSGMAX,
.mq_msgsize_max = DFLT_MSGSIZEMAX,
#endif
+ .user_ns = &init_user_ns,
};

atomic_t nr_ipc_ns = ATOMIC_INIT(1);
diff --git a/ipc/namespace.c b/ipc/namespace.c
index a1094ff..aa18899 100644
--- a/ipc/namespace.c
+++ b/ipc/namespace.c
@@ -11,10 +11,11 @@
#include <linux/slab.h>
#include <linux/fs.h>
#include <linux/mount.h>
+#include <linux/user_namespace.h>

#include "util.h"

-static struct ipc_namespace *create_ipc_ns(void)
+static struct ipc_namespace *create_ipc_ns(struct ipc_namespace *old_ns)
{
struct ipc_namespace *ns;
int err;
@@ -43,6 +44,9 @@ static struct ipc_namespace *create_ipc_ns(void)
ipcns_notify(IPCNS_CREATED);
register_ipcns_notifier(ns);

+ ns->user_ns = old_ns->user_ns;
+ get_user_ns(ns->user_ns);
+
return ns;
}

@@ -50,7 +54,7 @@ struct ipc_namespace *copy_ipcs(unsigned long flags, struct ipc_namespace *ns)
{
if (!(flags & CLONE_NEWIPC))
return get_ipc_ns(ns);
- return create_ipc_ns();
+ return create_ipc_ns(ns);
}

/*
@@ -105,6 +109,7 @@ static void free_ipc_ns(struct ipc_namespace *ns)
* order to have a correct value when recomputing msgmni.
*/
ipcns_notify(IPCNS_REMOVED);
+ put_user_ns(ns->user_ns);
}

/*
diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c
index 034dc2e..b6dbff2 100644
--- a/kernel/nsproxy.c
+++ b/kernel/nsproxy.c
@@ -85,6 +85,11 @@ static struct nsproxy *create_new_namespaces(unsigned long flags,
err = PTR_ERR(new_nsp->ipc_ns);
goto out_ipc;
}
+ if (new_nsp->ipc_ns != tsk->nsproxy->ipc_ns) {
+ put_user_ns(new_nsp->ipc_ns->user_ns);
+ new_nsp->ipc_ns->user_ns = task_cred_xxx(tsk, user)->user_ns;
+ get_user_ns(new_nsp->ipc_ns->user_ns);
+ }

new_nsp->pid_ns = copy_pid_ns(flags, task_active_pid_ns(tsk));
if (IS_ERR(new_nsp->pid_ns)) {
--
1.7.0.4
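The reference handling in this patch follows a get-on-assign, put-on-replace pattern: create_ipc_ns() pins the old owner, the nsproxy hunk swaps in the creating task's user namespace only when a new ipc_ns was actually created (the Feb 15 changelog entry), and free_ipc_ns() drops the owner. A hypothetical userspace model of that bookkeeping, with plain ints in place of the kernel's reference counters and no locking, might look like this:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical model of patch 7's owner references; not kernel code. */

struct model_userns { int refcount; };

struct model_ipcns {
    int refcount;
    struct model_userns *user_ns;  /* owner; holds one reference */
};

static void get_ns(struct model_userns *ns) { ns->refcount++; }

static void put_ns(struct model_userns *ns)
{
    assert(ns->refcount > 0);      /* a put without a matching get is a bug */
    ns->refcount--;
}

/* Like create_ipc_ns(old_ns): inherit the old owner and pin it. */
struct model_ipcns *model_create_ipc_ns(const struct model_ipcns *old)
{
    struct model_ipcns *ns = malloc(sizeof(*ns));

    if (!ns)
        return NULL;
    ns->refcount = 1;
    ns->user_ns = old->user_ns;
    get_ns(ns->user_ns);
    return ns;
}

/* Like the nsproxy hunk: only a newly created ipc ns is re-owned by the
 * creating task's user ns; a shared one keeps its owner untouched. */
void model_reown_if_new(struct model_ipcns *new_ns,
                        const struct model_ipcns *old_ns,
                        struct model_userns *task_userns)
{
    if (new_ns == old_ns)
        return;
    put_ns(new_ns->user_ns);
    new_ns->user_ns = task_userns;
    get_ns(new_ns->user_ns);
}

/* Like free_ipc_ns(): drop the owner reference along with the namespace. */
void model_free_ipc_ns(struct model_ipcns *ns)
{
    put_ns(ns->user_ns);
    free(ns);
}
```

Checking new_ns against old_ns before swapping is what keeps a plain clone() without CLONE_NEWIPC from silently re-owning the parent's ipc namespace.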

[PATCH 1/9] Add a user_namespace as creator/owner of uts_namespace [message #41747 is a reply to message #41743] Thu, 17 February 2011 15:02
serge
copy_process() handles CLONE_NEWUSER before the rest of the
namespaces. So in the case of clone(CLONE_NEWUSER|CLONE_NEWUTS)
the new uts namespace will have the new user namespace as its
owner. That is what we want, since we want root in that new
userns to be able to have privilege over it.

Changelog:
Feb 15: don't set uts_ns->user_ns if we didn't create
a new uts_ns.

Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
---
include/linux/utsname.h | 3 +++
init/version.c | 2 ++
kernel/nsproxy.c | 5 +++++
kernel/user.c | 8 ++++++--
kernel/utsname.c | 4 ++++
5 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/include/linux/utsname.h b/include/linux/utsname.h
index 69f3997..85171be 100644
--- a/include/linux/utsname.h
+++ b/include/linux/utsname.h
@@ -37,9 +37,12 @@ struct new_utsname {
#include <linux/nsproxy.h>
#include <linux/err.h>

+struct user_namespace;
+
struct uts_namespace {
struct kref kref;
struct new_utsname name;
+ struct user_namespace *user_ns;
};
extern struct uts_namespace init_uts_ns;

diff --git a/init/version.c b/init/version.c
index adff586..97bb86f 100644
--- a/init/version.c
+++ b/init/version.c
@@ -21,6 +21,7 @@ extern int version_string(LINUX_VERSION_CODE);
int version_string(LINUX_VERSION_CODE);
#endif

+extern struct user_namespace init_user_ns;
struct uts_namespace init_uts_ns = {
.kref = {
.refcount = ATOMIC_INIT(2),
@@ -33,6 +34,7 @@ struct uts_namespace init_uts_ns = {
.machine = UTS_MACHINE,
.domainname = UTS_DOMAINNAME,
},
+ .user_ns = &init_user_ns,
};
EXPORT_SYMBOL_GPL(init_uts_ns);

diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c
index f74e6c0..034dc2e 100644
--- a/kernel/nsproxy.c
+++ b/kernel/nsproxy.c
@@ -74,6 +74,11 @@ static struct nsproxy *create_new_namespaces(unsigned long flags,
err = PTR_ERR(new_nsp->uts_ns);
goto out_uts;
}
+ if (new_nsp->uts_ns != tsk->nsproxy->uts_ns) {
+ put_user_ns(new_nsp->uts_ns->user_ns);
+ new_nsp->uts_ns->user_ns = task_cred_xxx(tsk, user)->user_ns;
+ get_user_ns(new_nsp->uts_ns->user_ns);
+ }

new_nsp->ipc_ns = copy_ipcs(flags, tsk->nsproxy->ipc_ns);
if (IS_ERR(new_nsp->ipc_ns)) {
diff --git a/kernel/user.c b/kernel/user.c
index 5c598ca..9e03e9c 100644
--- a/kernel/user.c
+++ b/kernel/user.c
@@ -17,9 +17,13 @@
#include <linux/module.h>
#include <linux/user_namespace.h>

+/*
+ * userns count is 1 for root user, 1 for init_uts_ns,
+ * and 1 for... ?
+ */
struct user_namespace init_user_ns = {
.kref = {
- .refcount = ATOMIC_INIT(2),
+ .refcount = ATOMIC_INIT(3),
},
.creator = &root_user,
};
@@ -47,7 +51,7 @@ static struct kmem_cache *uid_cachep;
*/
static DEFINE_SPINLOCK(uidhash_lock);

-/* root_user.__count is 2, 1 for init task cred, 1 for init_user_ns->creator */
+/* root_user.__count is 2, 1 for init task cred, 1 for init_user_ns->user_ns */
struct user_struct root_user = {
.__count = ATOMIC_INIT(2),
.processes = ATOMIC_INIT(1),
diff --git a/kernel/utsname.c b/kernel/utsname.c
index 8a82b4b..a7b3a8d 100644
--- a/kernel/utsname.c
+++ b/kernel/utsname.c
@@ -14,6 +14,7 @@
#include <linux/utsname.h>
#include <linux/err.h>
#include <linux/slab.h>
+#include <linux/user_namespace.h>

static struct uts_namespace *create_uts_ns(void)
{
@@ -40,6 +41,8 @@ static struct uts_namespace *clone_uts_ns(struct uts_namespace *old_ns)

down_read(&uts_sem);
memcpy(&ns->name, &old_ns->name, sizeof(ns->name));
+ ns->user_ns = old_ns->user_ns;
+ get_user_ns(ns->user_ns);
up_read(&uts_sem);
return ns;
}
@@ -71,5 +74,6 @@ void free_uts_ns(struct kref *kref)
struct uts_namespace *ns;

ns = container_of(kref, struct uts_namespace, kref);
+ put_user_ns(ns->user_ns);
kfree(ns);
}
--
1.7.0.4
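The ordering argument at the top of this patch can be illustrated with a small userspace model. It is a sketch under assumptions: the M_CLONE_* values are hypothetical and are not the kernel's CLONE_* bits, namespaces are caller-provided structs rather than allocations, and reference counting is omitted. It shows that because the user namespace is handled first, a uts namespace created in the same clone() ends up owned by the new user namespace, while a shared uts namespace keeps its owner.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag values for the model; not the kernel's CLONE_* bits. */
#define M_CLONE_NEWUSER 0x1u
#define M_CLONE_NEWUTS  0x2u

struct model_userns { const struct model_userns *parent; };
struct model_utsns  { const struct model_userns *user_ns; /* owner */ };

struct model_task {
    const struct model_userns *user_ns;
    const struct model_utsns  *uts_ns;
};

/*
 * Build the child's namespaces from the parent's. new_userns and
 * new_utsns are caller-provided storage standing in for freshly
 * allocated namespaces; each is used only if its flag is set.
 */
void model_clone(struct model_task *child, const struct model_task *parent,
                 unsigned int flags, struct model_userns *new_userns,
                 struct model_utsns *new_utsns)
{
    *child = *parent;  /* by default everything is shared */

    /* Step 1: the user namespace is handled first, with the credentials. */
    if (flags & M_CLONE_NEWUSER) {
        assert(new_userns != NULL);
        new_userns->parent = parent->user_ns;
        child->user_ns = new_userns;
    }

    /* Step 2: a new uts ns is owned by the child's user ns as it stands
     * after step 1. A shared uts ns keeps its existing owner. */
    if (flags & M_CLONE_NEWUTS) {
        assert(new_utsns != NULL);
        new_utsns->user_ns = child->user_ns;
        child->uts_ns = new_utsns;
    }
}
```

With both flags set, root in the new user namespace owns the new uts namespace, which is what lets sethostname() (patch 3/9) succeed there without granting anything over the parent's uts namespace.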

[PATCH 3/9] allow sethostname in a container [message #41748 is a reply to message #41743] Thu, 17 February 2011 15:03
serge
Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
---
kernel/sys.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/sys.c b/kernel/sys.c
index 18da702..7a1bbad 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -1177,7 +1177,7 @@ SYSCALL_DEFINE2(sethostname, char __user *, name, int, len)
int errno;
char tmp[__NEW_UTS_LEN];

- if (!capable(CAP_SYS_ADMIN))
+ if (!ns_capable(current->nsproxy->uts_ns->user_ns, CAP_SYS_ADMIN))
return -EPERM;
if (len < 0 || len > __NEW_UTS_LEN)
return -EINVAL;
--
1.7.0.4

[PATCH 4/9] allow killing tasks in your own or child userns [message #41749 is a reply to message #41743] Thu, 17 February 2011 15:03
serge
Changelog:
Dec 8: Fixed bug in my check_kill_permission pointed out by
Eric Biederman.
Dec 13: Apply Eric's suggestion to pass target task into kill_ok_by_cred()
for clarity
Dec 31: address comment by Eric Biederman:
don't need cred/tcred in check_kill_permission.
Jan 1: use const cred struct.
Jan 11: Per Bastian Blank's advice, clean up kill_ok_by_cred().
Feb 16: kill_ok_by_cred: fix bad parentheses

Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
---
kernel/signal.c | 30 ++++++++++++++++++++++--------
1 files changed, 22 insertions(+), 8 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index 4e3cff1..ffe4bdf 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -636,13 +636,33 @@ static inline bool si_fromuser(const struct siginfo *info)
}

/*
+ * called with RCU read lock from check_kill_permission()
+ */
+static inline int kill_ok_by_cred(struct task_struct *t)
+{
+ const struct cred *cred = current_cred();
+ const struct cred *tcred = __task_cred(t);
+
+ if (cred->user->user_ns == tcred->user->user_ns &&
+ (cred->euid == tcred->suid ||
+ cred->euid == tcred->uid ||
+ cred->uid == tcred->suid ||
+ cred->uid == tcred->uid))
+ return 1;
+
+ if (ns_capable(tcred->user->user_ns, CAP_KILL))
+ return 1;
+
+ return 0;
+}
+
+/*
* Bad permissions for sending the signal
* - the caller must hold the RCU read lock
*/
static int check_kill_permission(int sig, struct siginfo *info,
struct task_struct *t)
{
- const struct cred *cred, *tcred;
struct pid *sid;
int error;

@@ -656,14 +676,8 @@ static int check_kill_permission(int sig, struct siginfo *info,
if (error)
return error;

- cred = current_cred();
- tcred = __task_cred(t);
if (!same_thread_group(current, t) &&
- (cred->euid ^ tcred->suid) &&
- (cred->euid ^ tcred->uid) &&
- (cred->uid ^ tcred->suid) &&
- (cred->uid ^ tcred->uid) &&
- !capable(CAP_KILL)) {
+ !kill_ok_by_cred(t)) {
switch (sig) {
case SIGCONT:
sid = task_session(t);
--
1.7.0.4

[PATCH 6/9] user namespaces: convert all capable checks in kernel/sys.c [message #41750 is a reply to message #41743] Thu, 17 February 2011 15:03
serge
This allows setuid/setgid in containers. It also fixes some
corner cases where kernel logic forgoes capability checks when
uids are equivalent. The latter fix will eventually need to be
applied throughout the whole kernel.

Changelog:
Jan 11: Use nsown_capable() as suggested by Bastian Blank.
Jan 11: Fix logic errors in uid checks pointed out by Bastian.
Feb 15: allow prlimit to current (was regression in previous version)

Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
---
kernel/sys.c | 74 ++++++++++++++++++++++++++++++++++++---------------------
1 files changed, 47 insertions(+), 27 deletions(-)

diff --git a/kernel/sys.c b/kernel/sys.c
index 7a1bbad..075370d 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -118,17 +118,29 @@ EXPORT_SYMBOL(cad_pid);

void (*pm_power_off_prepare)(void);

+/* called with rcu_read_lock, creds are safe */
+static inline int set_one_prio_perm(struct task_struct *p)
+{
+ const struct cred *cred = current_cred(), *pcred = __task_cred(p);
+
+ if (pcred->user->user_ns == cred->user->user_ns &&
+ (pcred->uid == cred->euid ||
+ pcred->euid == cred->euid))
+ return 1;
+ if (ns_capable(pcred->user->user_ns, CAP_SYS_NICE))
+ return 1;
+ return 0;
+}
+
/*
* set the priority of a task
* - the caller must hold the RCU read lock
*/
static int set_one_prio(struct task_struct *p, int niceval, int error)
{
- const struct cred *cred = current_cred(), *pcred = __task_cred(p);
int no_nice;

- if (pcred->uid != cred->euid &&
- pcred->euid != cred->euid && !capable(CAP_SYS_NICE)) {
+ if (!set_one_prio_perm(p)) {
error = -EPERM;
goto out;
}
@@ -502,7 +514,7 @@ SYSCALL_DEFINE2(setregid, gid_t, rgid, gid_t, egid)
if (rgid != (gid_t) -1) {
if (old->gid == rgid ||
old->egid == rgid ||
- capable(CAP_SETGID))
+ nsown_capable(CAP_SETGID))
new->gid = rgid;
else
goto error;
@@ -511,7 +523,7 @@ SYSCALL_DEFINE2(setregid, gid_t, rgid, gid_t, egid)
if (old->gid == egid ||
old->egid == egid ||
old->sgid == egid ||
- capable(CAP_SETGID))
+ nsown_capable(CAP_SETGID))
new->egid = egid;
else
goto error;
@@ -546,7 +558,7 @@ SYSCALL_DEFINE1(setgid, gid_t, gid)
old = current_cred();

retval = -EPERM;
- if (capable(CAP_SETGID))
+ if (nsown_capable(CAP_SETGID))
new->gid = new->egid = new->sgid = new->fsgid = gid;
else if (gid == old->gid || gid == old->sgid)
new->egid = new->fsgid = gid;
@@ -613,7 +625,7 @@ SYSCALL_DEFINE2(setreuid, uid_t, ruid, uid_t, euid)
new->uid = ruid;
if (old->uid != ruid &&
old->euid != ruid &&
- !capable(CAP_SETUID))
+ !nsown_capable(CAP_SETUID))
goto error;
}

@@ -622,7 +634,7 @@ SYSCALL_DEFINE2(setreuid, uid_t, ruid, uid_t, euid)
if (old->uid != euid &&
old->euid != euid &&
old->suid != euid &&
- !capable(CAP_SETUID))
+ !nsown_capable(CAP_SETUID))
goto error;
}

@@ -670,7 +682,7 @@ SYSCALL_DEFINE1(setuid, uid_t, uid)
old = current_cred();

retval = -EPERM;
- if (capable(CAP_SETUID)) {
+ if (nsown_capable(CAP_SETUID)) {
new->suid = new->uid = uid;
if (uid != old->uid) {
retval = set_user(new);
@@ -712,7 +724,7 @@ SYSCALL_DEFINE3(setresuid, uid_t, ruid, uid_t, euid, uid_t, suid)
old = current_cred();

retval = -EPERM;
- if (!capable(CAP_SETUID)) {
+ if (!nsown_capable(CAP_SETUID)) {
if (ruid != (uid_t) -1 && ruid != old->uid &&
ruid != old->euid && ruid != old->suid)
goto error;
@@ -776,7 +788,7 @@ SYSCALL_DEFINE3(setresgid, gid_t, rgid, gid_t, egid, gid_t, sgid)
old = current_cred();

retval = -EPERM;
- if (!capable(CAP_SETGID)) {
+ if (!nsown_capable(CAP_SETGID)) {
if (rgid != (gid_t) -1 && rgid != old->gid &&
rgid != old->egid && rgid != old->sgid)
goto error;
@@ -836,7 +848,7 @@ SYSCALL_DEFINE1(setfsuid, uid_t, uid)

if (uid == old->uid || uid == old->euid ||
uid == old->suid || uid == old->fsuid ||
- capable(CAP_SETUID)) {
+ nsown_capable(CAP_SETUID)) {
if (uid != old_fsuid) {
new->fsuid = uid;
if (security_task_fix_setuid(new, old, LSM_SETID_FS) == 0)
@@ -869,7 +881,7 @@ SYSCALL_DEFINE1(setfsgid, gid_t, gid)

if (gid == old->gid || gid == old->egid ||
gid == old->sgid || gid == old->fsgid ||
- capable(CAP_SETGID)) {
+ nsown_capable(CAP_SETGID)) {
if (gid != old_fsgid) {
new->fsgid = gid;
goto change_okay;
@@ -1177,8 +1189,11 @@ SYSCALL_DEFINE2(sethostname, char __user *, name, int, len)
int errno;
char tmp[__NEW_UTS_LEN];

- if (!ns_capable(current->nsproxy->uts_ns->user_ns, CAP_SYS_ADMIN))
+ if (!ns_capable(current->nsproxy->uts_ns->user_ns, CAP_SYS_ADMIN)) {
+ printk(KERN_NOTICE "%s: did not have CAP_SYS_ADMIN\n", __func__);
return -EPERM;
+ }
+ printk(KERN_NOTICE "%s: did have CAP_SYS_ADMIN\n", __func__);
if (len < 0 || len > __NEW_UTS_LEN)
return -EINVAL;
down_write(&uts_sem);
@@ -1226,7 +1241,7 @@ SYSCALL_DEFINE2(setdomainname, char __user *, name, int, len)
int errno;
char tmp[__NEW_UTS_LEN];

- if (!capable(CAP_SYS_ADMIN))
+ if (!ns_capable(current->nsproxy->uts_ns->user_ns, CAP_SYS_ADMIN))
return -EPERM;
if (len < 0 || len > __NEW_UTS_LEN)
return -EINVAL;
@@ -1341,6 +1356,8 @@ int do_prlimit(struct task_struct *tsk, unsigned int resource,
rlim = tsk->signal->rlim + resource;
task_lock(tsk->group_leader);
if (new_rlim) {
+ /* Keep the capable check against init_user_ns until
+ cgroups can contain all limits */
if (new_rlim->rlim_max > rlim->rlim_max &&
!capable(CAP_SYS_RESOURCE))
retval = -EPERM;
@@ -1384,19 +1401,22 @@ static int check_prlimit_permission(struct task_struct *task)
{
const struct cred *cred = current_cred(), *tcred;

- tcred = __task_cred(task);
- if (current != task &&
- (cred->uid != tcred->euid ||
- cred->uid != tcred->suid ||
- cred->uid != tcred->uid ||
- cred->gid != tcred->egid ||
- cred->gid != tcred->sgid ||
- cred->gid != tcred->gid) &&
- !capable(CAP_SYS_RESOURCE)) {
- return -EPERM;
- }
+ if (current == task)
+ return 0;

- return 0;
+ tcred = __task_cred(task);
+ if (cred->user->user_ns == tcred->user->user_ns &&
+ (cred->uid == tcred->euid &&
+ cred->uid == tcred->suid &&
+ cred->uid == tcred->uid &&
+ cred->gid == tcred->egid &&
+ cred->gid == tcred->sgid &&
+ cred->gid == tcred->gid))
+ return 0;
+ if (ns_capable(tcred->user->user_ns, CAP_SYS_RESOURCE))
+ return 0;
+
+ return -EPERM;
}

SYSCALL_DEFINE4(prlimit64, pid_t, pid, unsigned int, resource,
--
1.7.0.4

[PATCH 5/9] Allow ptrace from non-init user namespaces [message #41751 is a reply to message #41743] Thu, 17 February 2011 15:03
serge
ptrace is allowed to tasks in the same user namespace according to
the usual rules (i.e. the same rules as for two tasks in the init
user namespace). ptrace is also allowed to a user namespace over
which the current task has the CAP_SYS_PTRACE capability.

Changelog:
Dec 31: Address feedback by Eric:
. Correct ptrace uid check
. Rename may_ptrace_ns to ptrace_capable
. Also fix the cap_ptrace checks.
Jan 1: Use const cred struct
Jan 11: use task_ns_capable() in place of ptrace_capable().

Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
---
include/linux/capability.h | 2 +
include/linux/user_namespace.h | 9 +++++++
kernel/ptrace.c | 27 ++++++++++++----------
kernel/user_namespace.c | 16 +++++++++++++
security/commoncap.c | 48 +++++++++++++++++++++++++++++++++------
5 files changed, 82 insertions(+), 20 deletions(-)

diff --git a/include/linux/capability.h b/include/linux/capability.h
index cb3d2d9..bc0f262 100644
--- a/include/linux/capability.h
+++ b/include/linux/capability.h
@@ -546,6 +546,8 @@ extern const kernel_cap_t __cap_init_eff_set;
*/
#define has_capability(t, cap) (security_real_capable((t), &init_user_ns, (cap)) == 0)

+#define has_ns_capability(t, ns, cap) (security_real_capable((t), (ns), (cap)) == 0)
+
/**
* has_capability_noaudit - Determine if a task has a superior capability available (unaudited)
* @t: The task in question
diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
index faf4679..862fc59 100644
--- a/include/linux/user_namespace.h
+++ b/include/linux/user_namespace.h
@@ -39,6 +39,9 @@ static inline void put_user_ns(struct user_namespace *ns)
uid_t user_ns_map_uid(struct user_namespace *to, const struct cred *cred, uid_t uid);
gid_t user_ns_map_gid(struct user_namespace *to, const struct cred *cred, gid_t gid);

+int same_or_ancestor_user_ns(struct task_struct *task,
+ struct task_struct *victim);
+
#else

static inline struct user_namespace *get_user_ns(struct user_namespace *ns)
@@ -66,6 +69,12 @@ static inline gid_t user_ns_map_gid(struct user_namespace *to,
return gid;
}

+static inline int same_or_ancestor_user_ns(struct task_struct *task,
+ struct task_struct *victim)
+{
+ return 1;
+}
+
#endif

#endif /* _LINUX_USER_H */
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 1708b1e..cde4655 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -134,21 +134,24 @@ int __ptrace_may_access(struct task_struct *task, unsigned int mode)
return 0;
rcu_read_lock();
tcred = __task_cred(task);
- if ((cred->uid != tcred->euid ||
- cred->uid != tcred->suid ||
- cred->uid != tcred->uid ||
- cred->gid != tcred->egid ||
- cred->gid != tcred->sgid ||
- cred->gid != tcred->gid) &&
- !capable(CAP_SYS_PTRACE)) {
- rcu_read_unlock();
- return -EPERM;
- }
+ if (cred->user->user_ns == tcred->user->user_ns &&
+ (cred->uid == tcred->euid &&
+ cred->uid == tcred->suid &&
+ cred->uid == tcred->uid &&
+ cred->gid == tcred->egid &&
+ cred->gid == tcred->sgid &&
+ cred->gid == tcred->gid))
+ goto ok;
+ if (ns_capable(tcred->user->user_ns, CAP_SYS_PTRACE))
+ goto ok;
+ rcu_read_unlock();
+ return -EPERM;
+ok:
rcu_read_unlock();
smp_rmb();
if (task->mm)
dumpable = get_dumpable(task->mm);
- if (!dumpable && !capable(CAP_SYS_PTRACE))
+ if (!dumpable && !task_ns_capable(task, CAP_SYS_PTRACE))
return -EPERM;

return security_ptrace_access_check(task, mode);
@@ -198,7 +201,7 @@ int ptrace_attach(struct task_struct *task)
goto unlock_tasklist;

task->ptrace = PT_PTRACED;
- if (capable(CAP_SYS_PTRACE))
+ if (task_ns_capable(task, CAP_SYS_PTRACE))
task->ptrace |= PT_PTRACE_CAP;

__ptrace_link(task, current);
diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index 9da289c..0ef2258 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -129,6 +129,22 @@ gid_t user_ns_map_gid(struct user_namespace *to, const struct cred *cred, gid_t
return overflowgid;
}

+int same_or_ancestor_user_ns(struct task_struct *task,
+ struct task_struct *victim)
+{
+ struct user_namespace *u1 = task_cred_xxx(task, user)->user_ns;
+ struct user_namespace *u2 = task_cred_xxx(victim, user)->user_ns;
+ for (;;) {
+ if (u1 == u2)
+ return 1;
+ if (u1 == &init_user_ns)
+ return 0;
+ u1 = u1->creator->user_ns;
+ }
+ /* We never get here */
+ return 0;
+}
+
static __init int user_namespaces_init(void)
{
user_ns_cachep = KMEM_CACHE(user_namespace, SLAB_PANIC);
diff --git a/security/commoncap.c b/security/commoncap.c
index 51fa9ec..12ff65c 100644
--- a/security/commoncap.c
+++ b/security/commoncap.c
@@ -130,18 +130,34 @@ int cap_settime(struct timespec *ts, struct timezone *tz)
* @child: The process to be accessed
* @mode: The mode of attachment.
*
+ * If we are in the same or an ancestor user_ns and have all the target
+ * task's capabilities, then ptrace access is allowed.
+ * If we have the ptrace capability to the target user_ns, then ptrace
+ * access is allowed.
+ * Else denied.
+ *
* Determine whether a process may access another, returning 0 if permission
* granted, -ve if denied.
*/
int cap_ptrace_access_check(struct task_struct *child, unsigned int mode)
{
int ret = 0;
+ const struct cred *cred, *tcred;

rcu_read_lock();
- if (!cap_issubset(__task_cred(child)->cap_permitted,
- current_cred()->cap_permitted) &&
- !capable(CAP_SYS_PTRACE))
- ret = -EPERM;
+ cred = current_cred();
+ tcred = __task_cred(child);
+ /*
+ * The ancestor user_ns check may be gratuitous, as I think
+ * we've already guaranteed that in kernel/ptrace.c.
+ */
+ if (same_or_ancestor_user_ns(current, child) &&
+ cap_issubset(tcred->cap_permitted, cred->cap_permitted))
+ goto out;
+ if (ns_capable(tcred->user->user_ns, CAP_SYS_PTRACE))
+ goto out;
+ ret = -EPERM;
+out:
rcu_read_unlock();
return ret;
}
@@ -150,18 +166,34 @@ int cap_ptrace_access_check(struct task_struct *child, unsigned int mode)
* cap_ptrace_traceme - Determine whether another process may trace the current
* @parent: The task proposed to be the tracer
*
+ * If parent is in the same or an ancestor user_ns and has all current's
+ * capabilities, then ptrace access is allowed.
+ * If parent has the ptrace capability to current's user_ns, then ptrace
+ * access is allowed.
+ * Else denied.
+ *
* Determine whether the nominated task is permitted to trace the current
* process, returning 0 if permission is granted, -ve if denied.
*/
int cap_ptrace_traceme(struct task_struct *parent)
{
int ret = 0;
+ const struct cred *cred, *tcred;

rcu_read_lock();
- if (!cap_issubset(current_cred()->cap_permitted,
- __task_cred(parent)->cap_permitted) &&
- !has_capability(parent, CAP_SYS_PTRACE))
- ret = -EPERM;
+ cred = __task_cred(parent);
+ tcred = current_cred();
+ /*
+ * The ancestor user_ns check may be gratuitous, as I think
+ * we've already guaranteed that in kernel/ptrace.c.
+ */
+ if (same_or_ancestor_user_ns(parent, current) &&
+ cap_issubset(tcred->cap_permitted, cred->cap_permitted))
+ goto out;
+ if (has_ns_capability(parent, tcred->user->user_ns, CAP_SYS_PTRACE))
+ goto out;
+ ret = -EPERM;
+out:
rcu_read_unlock();
return ret;
}
--
1.7.0.4
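With this patch, cap_ptrace_access_check() allows access in two cases. The first is when the tracer is in the same or an ancestor user namespace and its permitted set is a superset of the target's. The second is when the tracer holds CAP_SYS_PTRACE over the target's namespace. The sketch below models that with hypothetical parent pointers, 64-bit capability masks, and the namespaced capability result passed in as a flag.

Note that the sketch walks up from the target's namespace, which is what the comment describes. The loop in same_or_ancestor_user_ns() as posted walks up from its first argument (the tracer). That reads as testing the opposite direction, so it is worth double-checking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: each namespace points at its creator's namespace. */
struct model_ns {
    const struct model_ns *parent;   /* NULL for the initial namespace */
};

/* True if @anc is @ns itself or one of its ancestors (walks up from @ns). */
static bool model_same_or_ancestor(const struct model_ns *anc,
                                   const struct model_ns *ns)
{
    for (; ns != NULL; ns = ns->parent)
        if (ns == anc)
            return true;
    return false;
}

/* cap_issubset()-style test: every bit set in @a is also set in @b. */
static bool model_cap_issubset(uint64_t a, uint64_t b)
{
    return (a & ~b) == 0;
}

/* Shape of the ptrace decision described in the comment above. */
static bool model_ptrace_ok(const struct model_ns *tracer_ns, uint64_t tracer_caps,
                            const struct model_ns *target_ns, uint64_t target_caps,
                            bool cap_ptrace_in_target_ns)
{
    assert(tracer_ns != NULL && target_ns != NULL);
    if (model_same_or_ancestor(tracer_ns, target_ns) &&
        model_cap_issubset(target_caps, tracer_caps))
        return true;
    return cap_ptrace_in_target_ns;
}
```

Under this model, a host task with a superset of a container task's capabilities may trace it. A container task with full capabilities still may not trace a host task unless it has CAP_SYS_PTRACE over the host's namespace.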

[PATCH 8/9] user namespaces: convert several capable() calls [message #41752 is a reply to message #41743] Thu, 17 February 2011 15:03
serge
CAP_IPC_OWNER and CAP_IPC_LOCK can be checked against current_user_ns(),
because the resource comes from current's own ipc namespace.

setuid/setgid are to uids in own namespace, so again checks can be
against current_user_ns().

Changelog:
Jan 11: Use task_ns_capable() in place of sched_capable().
Jan 11: Use nsown_capable() as suggested by Bastian Blank.
Jan 11: Clarify (hopefully) some logic in futex and sched.c
Feb 15: use ns_capable for ipc, not nsown_capable

Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
---
ipc/shm.c | 2 +-
ipc/util.c | 5 +++--
kernel/futex.c | 11 ++++++++++-
kernel/futex_compat.c | 11 ++++++++++-
kernel/groups.c | 2 +-
kernel/sched.c | 9 ++++++---
kernel/uid16.c | 2 +-
7 files changed, 32 insertions(+), 10 deletions(-)

diff --git a/ipc/shm.c b/ipc/shm.c
index 7d3bb22..e91e2e9 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -773,7 +773,7 @@ SYSCALL_DEFINE3(shmctl, int, shmid, int, cmd, struct shmid_ds __user *, buf)

audit_ipc_obj(&(shp->shm_perm));

- if (!capable(CAP_IPC_LOCK)) {
+ if (!ns_capable(ns->user_ns, CAP_IPC_LOCK)) {
uid_t euid = current_euid();
err = -EPERM;
if (euid != shp->shm_perm.uid &&
diff --git a/ipc/util.c b/ipc/util.c
index 69a0cc1..8e7ec6a 100644
--- a/ipc/util.c
+++ b/ipc/util.c
@@ -627,7 +627,7 @@ int ipcperms (struct kern_ipc_perm *ipcp, short flag)
granted_mode >>= 3;
/* is there some bit set in requested_mode but not in granted_mode? */
if ((requested_mode & ~granted_mode & 0007) &&
- !capable(CAP_IPC_OWNER))
+ !ns_capable(current->nsproxy->ipc_ns->user_ns, CAP_IPC_OWNER))
return -1;

return security_ipc_permission(ipcp, flag);
@@ -800,7 +800,8 @@ struct kern_ipc_perm *ipcctl_pre_down(struct ipc_ids *ids, int id, int cmd,

euid = current_euid();
if (euid == ipcp->cuid ||
- euid == ipcp->uid || capable(CAP_SYS_ADMIN))
+ euid == ipcp->uid ||
+ ns_capable(current->nsproxy->ipc_ns->user_ns, CAP_SYS_ADMIN))
return ipcp;

err = -EPERM;
diff --git a/kernel/futex.c b/kernel/futex.c
index b766d28..1e876f1 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -2421,10 +2421,19 @@ SYSCALL_DEFINE3(get_robust_list, int, pid,
goto err_unlock;
ret = -EPERM;
pcred = __task_cred(p);
+ /* If victim is in different user_ns, then uids are not
+ comparable, so we must have CAP_SYS_PTRACE */
+ if (cred->user->user_ns != pcred->user->user_ns) {
+ if (!ns_capable(pcred->user->user_ns, CAP_SYS_PTRACE))
+ goto err_unlock;
+ goto ok;
+ }
+ /* If victim is in same user_ns, then uids are comparable */
if (cred->euid != pcred->euid &&
cred->euid != pcred->uid &&
- !capable(CAP_SYS_PTRACE))
+ !ns_capable(pcred->user->user_ns, CAP_SYS_PTRACE))
goto err_unlock;
+ok:
head = p->robust_list;
rcu_read_unlock();
}
diff --git a/kernel/futex_compat.c b/kernel/futex_compat.c
index a7934ac..5f9e689 100644
--- a/kernel/futex_compat.c
+++ b/kernel/futex_compat.c
@@ -153,10 +153,19 @@ compat_sys_get_robust_list(int pid, compat_uptr_t __user *head_ptr,
goto err_unlock;
ret = -EPERM;
pcred = __task_cred(p);
+ /* If victim is in different user_ns, then uids are not
+ comparable, so we must have CAP_SYS_PTRACE */
+ if (cred->user->user_ns != pcred->user->user_ns) {
+ if (!ns_capable(pcred->user->user_ns, CAP_SYS_PTRACE))
+ goto err_unlock;
+ goto ok;
+ }
+ /* If victim is in same user_ns, then uids are comparable */
if (cred->euid != pcred->euid &&
cred->euid != pcred->uid &&
- !capable(CAP_SYS_PTRACE))
+ !ns_capable(pcred->user->user_ns, CAP_SYS_PTRACE))
goto err_unlock;
+ok:
head = p->compat_robust_list;
rcu_read_unlock();
}
diff --git a/kernel/groups.c b/kernel/groups.c
index 253dc0f..1cc476d 100644
--- a/kernel/groups.c
+++ b/kernel/groups.c
@@ -233,7 +233,7 @@ SYSCALL_DEFINE2(setgroups, int, gidsetsize, gid_t __user *, grouplist)
struct group_info *group_info;
int retval;

- if (!capable(CAP_SETGID))
+ if (!nsown_capable(CAP_SETGID))
return -EPERM;
if ((unsigned)gidsetsize > NGROUPS_MAX)
return -EINVAL;
diff --git a/kernel/sched.c b/kernel/sched.c
index 18d38e4..dc12bc2 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -4761,8 +4761,11 @@ static bool check_same_owner(struct task_struct *p)

rcu_read_lock();
pcred = __task_cred(p);
- match = (cred->euid == pcred->euid ||
- cred->euid == pcred->uid);
+ if (cred->user->user_ns == pcred->user->user_ns)
+ match = (cred->euid == pcred->euid ||
+ cred->euid == pcred->uid);
+ else
+ match = false;
rcu_read_unlock();
return match;
}
@@ -5088,7 +5091,7 @@ long sched_setaffinity(pid_t pid, const struct cpumask *in_mask)
goto out_free_cpus_allowed;
}
retval = -EPERM;
- if (!check_same_owner(p) && !capable(CAP_SYS_NICE))
+ if (!check_same_owner(p) && !task_ns_capable(p, CAP_SYS_NICE))
goto out_unlock;

retval = security_task_setscheduler(p);
diff --git a/kernel/uid16.c b/kernel/uid16.c
index 4192098..51c6e89 100644
--- a/kernel/uid16.c
+++ b/kernel/uid16.c
@@ -189,7 +189,7 @@ SYSCALL_DEFINE2(setgroups16, int, gidsetsize, old_gid_t __user *, grouplist)
struct group_info *group_info;
int retval;

- if (!capable(CAP_SETGID))
+ if (!nsown_capable(CAP_SETGID))
return -EPERM;
if ((unsigned)gidsetsize > NGROUPS_MAX)
return -EINVAL;
--
1.7.0.4
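The ipcperms() hunk only changes which user namespace CAP_IPC_OWNER is checked against; the bit test it guards is unchanged. The minimal model below assumes granted_mode has already been shifted so that the applicable owner, group, or other rwx triplet sits in the low three bits, as the `granted_mode >>= 3` context line suggests. It also reduces the capability check to a flag. The function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of the final DAC test in ipcperms(). Assumes both modes
 * already carry the relevant rwx bits in their low three bits.
 * @cap_ipc_owner stands in for the namespaced CAP_IPC_OWNER check.
 */
static int model_ipcperms(int requested_mode, int granted_mode,
                          bool cap_ipc_owner)
{
    assert(requested_mode >= 0 && granted_mode >= 0);
    /* Is there some bit set in requested_mode but not in granted_mode? */
    if ((requested_mode & ~granted_mode & 0007) && !cap_ipc_owner)
        return -1;
    return 0;
}
```

For a segment with mode 0640, a group member asking for write gets -1 unless the capability check succeeds, while the owner asking for read gets 0.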

Re: userns: targeted capabilities v5 [message #41758 is a reply to message #41743] Fri, 18 February 2011 00:21
akpm
On Thu, 17 Feb 2011 15:02:24 +0000
"Serge E. Hallyn" <serge@hallyn.com> wrote:

> Here is a repost of my previous user namespace patch, ported onto
> last night's git head.
>
> It fixes several things I was doing wrong in the last (v4)
> posting, in particular:
>
> 1. don't set uts_ns->user_ns to current's when !CLONE_NEWUTS
> 2. add a ipc_ns->user_ns which owns ipc_ns, and use that to
> decide CAP_IPC_OWNER
> 3. fix logic flaw caused by bad parantheses
> 4. allow do_prlimit to current
> 5. don't always give root full privs to init_user_ns
>
> The expected course of development for user namespaces is laid out
> at https://wiki.ubuntu.com/UserNamespace.

Seems like a nice feature to be developing.

I worry about the maturity of it all at this stage. How far along is
it *really*?

Is anyone else working with you on developing and reviewing this work?

Re: [PATCH 9/9] userns: check user namespace for task->file uid equivalence checks [message #41760 is a reply to message #41745] Fri, 18 February 2011 01:29
ebiederm
"Serge E. Hallyn" <serge@hallyn.com> writes:

> Cheat for now and say all files belong to init_user_ns. Next
> step will be to let superblocks belong to a user_ns, and derive
> inode_userns(inode) from inode->i_sb->s_user_ns. Finally we'll
> introduce more flexible arrangements.

This looks good. I am a little worried that a concept like
inode_user_ns will imply that there is only ever one.

However, this looks like a good place to start. Only strange
filesystems will implement a notion of permissions that is
namespace aware, so I don't expect the generic code to need to
handle that case, other than by allowing the permission checks
to be overridden.

Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>

>
> Changelog:
> Feb 15: make is_owner_or_cap take const struct inode
>
> Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
> ---
> fs/inode.c | 17 +++++++++++++++++
> fs/namei.c | 20 +++++++++++++++-----
> include/linux/fs.h | 9 +++++++--
> 3 files changed, 39 insertions(+), 7 deletions(-)
>
> diff --git a/fs/inode.c b/fs/inode.c
> index da85e56..1930b45 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -25,6 +25,7 @@
> #include <linux/async.h>
> #include <linux/posix_acl.h>
> #include <linux/ima.h>
> +#include <linux/cred.h>
>
> /*
> * This is needed for the following functions:
> @@ -1722,3 +1723,19 @@ void inode_init_owner(struct inode *inode, const struct inode *dir,
> inode->i_mode = mode;
> }
> EXPORT_SYMBOL(inode_init_owner);
> +
> +/*
> + * return 1 if current either has CAP_FOWNER to the
> + * file, or owns the file.
> + */
> +int is_owner_or_cap(const struct inode *inode)
> +{
> + struct user_namespace *ns = inode_userns(inode);
> +
> + if (current_user_ns() == ns && current_fsuid() == inode->i_uid)
> + return 1;
> + if (ns_capable(ns, CAP_FOWNER))
> + return 1;
> + return 0;
> +}
> +EXPORT_SYMBOL(is_owner_or_cap);
> diff --git a/fs/namei.c b/fs/namei.c
> index 9e701e2..cfac5b4 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -176,6 +176,9 @@ static int acl_permission_check(struct inode *inode, int mask, unsigned int flag
>
> mask &= MAY_READ | MAY_WRITE | MAY_EXEC;
>
> + if (current_user_ns() != inode_userns(inode))
> + goto other_perms;
> +
> if (current_fsuid() == inode->i_uid)
> mode >>= 6;
> else {
> @@ -189,6 +192,7 @@ static int acl_permission_check(struct inode *inode, int mask, unsigned int flag
> mode >>= 3;
> }
>
> +other_perms:
> /*
> * If the DACs are ok we don't need any capability check.
> */
> @@ -230,7 +234,7 @@ int generic_permission(struct inode *inode, int mask, unsigned int flags,
> * Executable DACs are overridable if at least one exec bit is set.
> */
> if (!(mask & MAY_EXEC) || execute_ok(inode))
> - if (capable(CAP_DAC_OVERRIDE))
> + if (ns_capable(inode_userns(inode), CAP_DAC_OVERRIDE))
> return 0;
>
> /*
> @@ -238,7 +242,7 @@ int generic_permission(struct inode *inode, int mask, unsigned int flags,
> */
> mask &= MAY_READ | MAY_WRITE | MAY_EXEC;
> if (mask == MAY_READ || (S_ISDIR(inode->i_mode) && !(mask & MAY_WRITE)))
> - if (capable(CAP_DAC_READ_SEARCH))
> + if (ns_capable(inode_userns(inode), CAP_DAC_READ_SEARCH))
> return 0;
>
> return -EACCES;
> @@ -675,6 +679,7 @@ force_reval_path(struct path *path, struct nameidata *nd)
> static inline int exec_permission(struct inode *inode, unsigned int flags)
> {
> int ret;
> + struct user_namespace *ns = inode_userns(inode);
>
> if (inode->i_op->permission) {
> ret = inode->i_op->permission(inode, MAY_EXEC, flags);
> @@ -687,7 +692,7 @@ static inline int exec_permission(struct inode *inode, unsigned int flags)
> if (ret == -ECHILD)
> return ret;
>
> - if (capable(CAP_DAC_OVERRIDE) || capable(CAP_DAC_READ_SEARCH))
> + if (ns_capable(ns, CAP_DAC_OVERRIDE) || ns_capable(ns, CAP_DAC_READ_SEARCH))
> goto ok;
>
> return ret;
> @@ -1940,11 +1945,15 @@ static inline int check_sticky(struct inode *dir, struct inode *inode)
>
> if (!(dir->i_mode & S_ISVTX))
> return 0;
> + if (current_user_ns() != inode_userns(inode))
> + goto other_userns;
> if (inode->i_uid == fsuid)
> return 0;
> if (dir->i_uid == fsuid)
> return 0;
> - return !capable(CAP_FOWNER);
> +
> +other_userns:
> + return !ns_capable(inode_userns(inode), CAP_FOWNER);
> }
>
> /*
> @@ -2635,7 +2644,8 @@ int vfs_mknod(struct inode *dir, struct dentry *dentry, int mode, dev_t dev)
> if (error)
> return error;
>
> - if ((S_ISCHR(mode) || S_ISBLK(mode)) && !capable(CAP_MKNOD))
> + if ((S_ISCHR(mode) || S_ISBLK(mode)) &&
> + !ns_capable(inode_userns(dir), CAP_MKNOD))
> return -EPERM;
>
> if (!dir->i_op->mknod)
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index bd32159..c84417a 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -1446,8 +1446,13 @@ enum {
> #define put_fs_excl() atomic_dec(&current->fs_excl)
> #define has_fs_excl() atomic_read(&current->fs_excl)
>
> -#define is_owner_or_cap(inode) \
> - ((current_fsuid() == (inode)->i_uid) || capable(CAP_FOWNER))
> +/*
> + * until VFS tracks user namespaces for inodes, just make all files
> + * belong to init_user_ns
> + */
> +extern struct user_namespace init_user_ns;
> +#define inode_userns(inode) (&init_user_ns)
> +extern int is_owner_or_cap(const struct inode *inode);
>
> /* not quite ready to be deprecated, but... */
> extern void lock_super(struct super_block *);
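With PATCH 9/9, acl_permission_check() sends a caller whose user namespace differs from the inode's straight to the "other" permission bits. As a result, a numeric uid match across namespaces no longer grants owner access. The simplified sketch below models that mode selection. It ignores POSIX ACLs and passes the namespace comparison and group membership in as flags; the function and macro names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_MAY_EXEC  1   /* same values as MAY_EXEC, MAY_WRITE, MAY_READ */
#define MODEL_MAY_WRITE 2
#define MODEL_MAY_READ  4

/*
 * Simplified model of acl_permission_check() after PATCH 9/9, ignoring
 * POSIX ACLs. @same_ns stands in for current_user_ns() == inode_userns(inode)
 * and @in_group for in_group_p(inode->i_gid).
 */
static bool model_dac_ok(unsigned int mode, unsigned int i_uid,
                         unsigned int fsuid, bool in_group,
                         bool same_ns, unsigned int mask)
{
    assert(mode <= 07777);
    mask &= MODEL_MAY_READ | MODEL_MAY_WRITE | MODEL_MAY_EXEC;
    if (same_ns) {
        if (fsuid == i_uid)
            mode >>= 6;          /* owner triplet */
        else if (in_group)
            mode >>= 3;          /* group triplet */
    }
    /* A caller from another namespace falls through to the "other" bits. */
    return (mask & ~mode) == 0;
}
```

Because inode_userns() currently returns &init_user_ns for every inode, any caller in a child user namespace always lands on the other bits. This matches the cover letter's note that child namespaces only get world access rights to files.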
Re: [PATCH 8/9] user namespaces: convert several capable() calls [message #41761 is a reply to message #41752] Fri, 18 February 2011 01:51
ebiederm
"Serge E. Hallyn" <serge@hallyn.com> writes:

> CAP_IPC_OWNER and CAP_IPC_LOCK can be checked against current_user_ns(),
> because the resource comes from current's own ipc namespace.
>
> setuid/setgid are to uids in own namespace, so again checks can be
> against current_user_ns().

Some nits below. But this generally looks good, if a little bit all over
the map for a single patch.

Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>


> Changelog:
> Jan 11: Use task_ns_capable() in place of sched_capable().
> Jan 11: Use nsown_capable() as suggested by Bastian Blank.
> Jan 11: Clarify (hopefully) some logic in futex and sched.c
> Feb 15: use ns_capable for ipc, not nsown_capable
>
> Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
> ---
> ipc/shm.c | 2 +-
> ipc/util.c | 5 +++--
> kernel/futex.c | 11 ++++++++++-
> kernel/futex_compat.c | 11 ++++++++++-
> kernel/groups.c | 2 +-
> kernel/sched.c | 9 ++++++---
> kernel/uid16.c | 2 +-
> 7 files changed, 32 insertions(+), 10 deletions(-)
>
> diff --git a/ipc/shm.c b/ipc/shm.c
> index 7d3bb22..e91e2e9 100644
> --- a/ipc/shm.c
> +++ b/ipc/shm.c
> @@ -773,7 +773,7 @@ SYSCALL_DEFINE3(shmctl, int, shmid, int, cmd, struct shmid_ds __user *, buf)
>
> audit_ipc_obj(&(shp->shm_perm));
>
> - if (!capable(CAP_IPC_LOCK)) {
> + if (!ns_capable(ns->user_ns, CAP_IPC_LOCK)) {
> uid_t euid = current_euid();
> err = -EPERM;
> if (euid != shp->shm_perm.uid &&
> diff --git a/ipc/util.c b/ipc/util.c
> index 69a0cc1..8e7ec6a 100644
> --- a/ipc/util.c
> +++ b/ipc/util.c
> @@ -627,7 +627,7 @@ int ipcperms (struct kern_ipc_perm *ipcp, short flag)
> granted_mode >>= 3;
> /* is there some bit set in requested_mode but not in granted_mode? */
> if ((requested_mode & ~granted_mode & 0007) &&
> - !capable(CAP_IPC_OWNER))
> + !ns_capable(current->nsproxy->ipc_ns->user_ns, CAP_IPC_OWNER))
> return -1;

Serge, can we please modify the code to pass the ns down from
ipcget_public to ipcperms? It is already passed in, and dropping the
value only to go back to current to get it again just feels wrong.

Strictly speaking this code is correct, but knowing that requires an
audit of all of the callers, which is unfortunate.

> return security_ipc_permission(ipcp, flag);
> @@ -800,7 +800,8 @@ struct kern_ipc_perm *ipcctl_pre_down(struct ipc_ids *ids, int id, int cmd,
>
> euid = current_euid();
> if (euid == ipcp->cuid ||
> - euid == ipcp->uid || capable(CAP_SYS_ADMIN))
> + euid == ipcp->uid ||
> + ns_capable(current->nsproxy->ipc_ns->user_ns, CAP_SYS_ADMIN))
> return ipcp;

Like the other ipc call, can we please pass the ipc_ns into ipcctl_pre_down?

The code as constructed appears correct, but because we are already
passing the namespace into the caller of this function, always using
current to get the ipc_ns seems confusing and unnecessary.

> err = -EPERM;
> diff --git a/kernel/futex.c b/kernel/futex.c
> index b766d28..1e876f1 100644
> --- a/kernel/futex.c
> +++ b/kernel/futex.c
> @@ -2421,10 +2421,19 @@ SYSCALL_DEFINE3(get_robust_list, int, pid,
> goto err_unlock;
> ret = -EPERM;
> pcred = __task_cred(p);
> + /* If victim is in different user_ns, then uids are not
> + comparable, so we must have CAP_SYS_PTRACE */
> + if (cred->user->user_ns != pcred->user->user_ns) {
> + if (!ns_capable(pcred->user->user_ns, CAP_SYS_PTRACE))
> + goto err_unlock;
> + goto ok;
> + }
> + /* If victim is in same user_ns, then uids are comparable */
> if (cred->euid != pcred->euid &&
> cred->euid != pcred->uid &&
> - !capable(CAP_SYS_PTRACE))
> + !ns_capable(pcred->user->user_ns, CAP_SYS_PTRACE))
> goto err_unlock;
> +ok:
> head = p->robust_list;
> rcu_read_unlock();
> }
> diff --git a/kernel/futex_compat.c b/kernel/futex_compat.c
> index a7934ac..5f9e689 100644
> --- a/kernel/futex_compat.c
> +++ b/kernel/futex_compat.c
> @@ -153,10 +153,19 @@ compat_sys_get_robust_list(int pid, compat_uptr_t __user *head_ptr,
> goto err_unlock;
> ret = -EPERM;
> pcred = __task_cred(p);
> + /* If victim is in different user_ns, then uids are not
> + comparable, so we must have CAP_SYS_PTRACE */
> + if (cred->user->user_ns != pcred->user->user_ns) {
> + if (!ns_capable(pcred->user->user_ns, CAP_SYS_PTRACE))
> + goto err_unlock;
> + goto ok;
> + }
> + /* If victim is in same user_ns, then uids are comparable */
> if (cred->euid != pcred->euid &&
> cred->euid != pcred->uid &&
> - !capable(CAP_SYS_PTRACE))
> + !ns_capable(pcred->user->user_ns, CAP_SYS_PTRACE))
> goto err_unlock;
> +ok:
> head = p->compat_robust_list;
> rcu_read_unlock();
> }
> diff --git a/kernel/groups.c b/kernel/groups.c
> index 253dc0f..1cc476d 100644
> --- a/kernel/groups.c
> +++ b/kernel/groups.c
> @@ -233,7 +233,7 @@ SYSCALL_DEFINE2(setgroups, int, gidsetsize, gid_t __user *, grouplist)
> struct group_info *group_info;
> int retval;
>
> - if (!capable(CAP_SETGID))
> + if (!nsown_capable(CAP_SETGID))
> return -EPERM;
> if ((unsigned)gidsetsize > NGROUPS_MAX)
> return -EINVAL;
> diff --git a/kernel/sched.c b/kernel/sched.c
> index 18d38e4..dc12bc2 100644
> --- a/kernel/sched.c
> +++ b/kernel/sched.c
> @@ -4761,8 +4761,11 @@ static bool check_same_owner(struct task_struct *p)
>
> rcu_read_lock();
> pcred = __task_cred(p);
> - match = (cred->euid == pcred->euid ||
> - cred->euid == pcred->uid);
> + if (cred->user->user_ns == pcred->user->user_ns)
> + match = (cred->euid == pcred->euid ||
> + cred->euid == pcred->uid);
> + else
> + match = false;
> rcu_read_unlock();
> return match;
> }
> @@ -5088,7 +5091,7 @@ long sched_setaffinity(pid_t pid, const struct cpumask *in_mask)
> goto out_free_cpus_allowed;
> }
> retval = -EPERM;
> - if (!check_same_owner(p) && !capable(CAP_SYS_NICE))
> + if (!check_same_owner(p) && !task_ns_capable(p, CAP_SYS_NICE))
> goto out_unlock;
>
> retval = security_task_setscheduler(p);
> diff --git a/kernel/uid16.c b/kernel/uid16.c
> index 4192098..51c6e89 100644
> --- a/kernel/uid16.c
> +++ b/kernel/uid16.c
> @@ -189,7 +189,7 @@ SYSCALL_DEFINE2(setgroups16, int, gidsetsize, old_gid_t __user *, grouplist)
> struct group_info *group_info;
> int retval;
>
> - if (!capable(CAP_SETGID))
> + if (!nsown_capable(CAP_SETGID))
> return -EPERM;
> if ((unsigned)gidsetsize > NGROUPS_MAX)
> return -EINVAL;
Re: [PATCH 6/9] user namespaces: convert all capable checks in kernel/sys.c [message #41763 is a reply to message #41750] Fri, 18 February 2011 01:57
ebiederm
Messages: 1354
Registered: February 2006
Senior Member
"Serge E. Hallyn" <serge@hallyn.com> writes:

> This allows setuid/setgid in containers. It also fixes some
> corner cases where kernel logic foregoes capability checks when
> uids are equivalent. The latter will need to be done throughout
> the whole kernel.

Except for the extra printk in sethostname, this looks good.

Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
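The "corner cases" mentioned in the patch description come from uid comparisons that grant access without any capability check. A minimal model of the rule that set_one_prio_perm() in the patch below applies: uids are compared only when both creds belong to the same user namespace, and otherwise the caller needs the capability over the target's namespace. The futex, prlimit, ptrace and signal hunks in this series use the same shape. Names are hypothetical, and the simplified model_ns_capable() ignores capabilities inherited from ancestor namespaces.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model; not kernel API. */
struct model_user_ns { int id; };

struct model_cred {
	unsigned int uid, euid;
	const struct model_user_ns *user_ns;	/* cred->user->user_ns */
	bool cap_sys_nice;			/* CAP_SYS_NICE in that ns */
};

/* Simplified ns_capable(): ignores caps inherited from ancestor ns. */
static bool model_ns_capable(const struct model_cred *cred,
			     const struct model_user_ns *ns)
{
	return cred->user_ns == ns && cred->cap_sys_nice;
}

/* Same shape as set_one_prio_perm(): compare uids only within one user
 * namespace, otherwise require the capability over the target's ns. */
static bool model_set_one_prio_perm(const struct model_cred *cred,
				    const struct model_cred *pcred)
{
	assert(cred && pcred);
	if (pcred->user_ns == cred->user_ns &&
	    (pcred->uid == cred->euid || pcred->euid == cred->euid))
		return true;
	return model_ns_capable(cred, pcred->user_ns);
}
```

Uid 1000 in a child namespace and uid 1000 in the initial namespace are different users, which is why the equality test is gated on the namespace check.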

>
> Changelog:
> Jan 11: Use nsown_capable() as suggested by Bastian Blank.
> Jan 11: Fix logic errors in uid checks pointed out by Bastian.
> Feb 15: allow prlimit to current (was regression in previous version)
>
> Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
> ---
> kernel/sys.c | 74 ++++++++++++++++++++++++++++++++++++---------------------
> 1 files changed, 47 insertions(+), 27 deletions(-)
>
> diff --git a/kernel/sys.c b/kernel/sys.c
> index 7a1bbad..075370d 100644
> --- a/kernel/sys.c
> +++ b/kernel/sys.c
> @@ -118,17 +118,29 @@ EXPORT_SYMBOL(cad_pid);
>
> void (*pm_power_off_prepare)(void);
>
> +/* called with rcu_read_lock, creds are safe */
> +static inline int set_one_prio_perm(struct task_struct *p)
> +{
> + const struct cred *cred = current_cred(), *pcred = __task_cred(p);
> +
> + if (pcred->user->user_ns == cred->user->user_ns &&
> + (pcred->uid == cred->euid ||
> + pcred->euid == cred->euid))
> + return 1;
> + if (ns_capable(pcred->user->user_ns, CAP_SYS_NICE))
> + return 1;
> + return 0;
> +}
> +
> /*
> * set the priority of a task
> * - the caller must hold the RCU read lock
> */
> static int set_one_prio(struct task_struct *p, int niceval, int error)
> {
> - const struct cred *cred = current_cred(), *pcred = __task_cred(p);
> int no_nice;
>
> - if (pcred->uid != cred->euid &&
> - pcred->euid != cred->euid && !capable(CAP_SYS_NICE)) {
> + if (!set_one_prio_perm(p)) {
> error = -EPERM;
> goto out;
> }
> @@ -502,7 +514,7 @@ SYSCALL_DEFINE2(setregid, gid_t, rgid, gid_t, egid)
> if (rgid != (gid_t) -1) {
> if (old->gid == rgid ||
> old->egid == rgid ||
> - capable(CAP_SETGID))
> + nsown_capable(CAP_SETGID))
> new->gid = rgid;
> else
> goto error;
> @@ -511,7 +523,7 @@ SYSCALL_DEFINE2(setregid, gid_t, rgid, gid_t, egid)
> if (old->gid == egid ||
> old->egid == egid ||
> old->sgid == egid ||
> - capable(CAP_SETGID))
> + nsown_capable(CAP_SETGID))
> new->egid = egid;
> else
> goto error;
> @@ -546,7 +558,7 @@ SYSCALL_DEFINE1(setgid, gid_t, gid)
> old = current_cred();
>
> retval = -EPERM;
> - if (capable(CAP_SETGID))
> + if (nsown_capable(CAP_SETGID))
> new->gid = new->egid = new->sgid = new->fsgid = gid;
> else if (gid == old->gid || gid == old->sgid)
> new->egid = new->fsgid = gid;
> @@ -613,7 +625,7 @@ SYSCALL_DEFINE2(setreuid, uid_t, ruid, uid_t, euid)
> new->uid = ruid;
> if (old->uid != ruid &&
> old->euid != ruid &&
> - !capable(CAP_SETUID))
> + !nsown_capable(CAP_SETUID))
> goto error;
> }
>
> @@ -622,7 +634,7 @@ SYSCALL_DEFINE2(setreuid, uid_t, ruid, uid_t, euid)
> if (old->uid != euid &&
> old->euid != euid &&
> old->suid != euid &&
> - !capable(CAP_SETUID))
> + !nsown_capable(CAP_SETUID))
> goto error;
> }
>
> @@ -670,7 +682,7 @@ SYSCALL_DEFINE1(setuid, uid_t, uid)
> old = current_cred();
>
> retval = -EPERM;
> - if (capable(CAP_SETUID)) {
> + if (nsown_capable(CAP_SETUID)) {
> new->suid = new->uid = uid;
> if (uid != old->uid) {
> retval = set_user(new);
> @@ -712,7 +724,7 @@ SYSCALL_DEFINE3(setresuid, uid_t, ruid, uid_t, euid, uid_t, suid)
> old = current_cred();
>
> retval = -EPERM;
> - if (!capable(CAP_SETUID)) {
> + if (!nsown_capable(CAP_SETUID)) {
> if (ruid != (uid_t) -1 && ruid != old->uid &&
> ruid != old->euid && ruid != old->suid)
> goto error;
> @@ -776,7 +788,7 @@ SYSCALL_DEFINE3(setresgid, gid_t, rgid, gid_t, egid, gid_t, sgid)
> old = current_cred();
>
> retval = -EPERM;
> - if (!capable(CAP_SETGID)) {
> + if (!nsown_capable(CAP_SETGID)) {
> if (rgid != (gid_t) -1 && rgid != old->gid &&
> rgid != old->egid && rgid != old->sgid)
> goto error;
> @@ -836,7 +848,7 @@ SYSCALL_DEFINE1(setfsuid, uid_t, uid)
>
> if (uid == old->uid || uid == old->euid ||
> uid == old->suid || uid == old->fsuid ||
> - capable(CAP_SETUID)) {
> + nsown_capable(CAP_SETUID)) {
> if (uid != old_fsuid) {
> new->fsuid = uid;
> if (security_task_fix_setuid(new, old, LSM_SETID_FS) == 0)
> @@ -869,7 +881,7 @@ SYSCALL_DEFINE1(setfsgid, gid_t, gid)
>
> if (gid == old->gid || gid == old->egid ||
> gid == old->sgid || gid == old->fsgid ||
> - capable(CAP_SETGID)) {
> + nsown_capable(CAP_SETGID)) {
> if (gid != old_fsgid) {
> new->fsgid = gid;
> goto change_okay;
> @@ -1177,8 +1189,11 @@ SYSCALL_DEFINE2(sethostname, char __user *, name, int, len)
> int errno;
> char tmp[__NEW_UTS_LEN];
>
> - if (!ns_capable(current->nsproxy->uts_ns->user_ns, CAP_SYS_ADMIN))
> + if (!ns_capable(current->nsproxy->uts_ns->user_ns, CAP_SYS_ADMIN)) {
> + printk(KERN_NOTICE "%s: did not have CAP_SYS_ADMIN\n", __func__);
> return -EPERM;
> + }
> + printk(KERN_NOTICE "%s: did have CAP_SYS_ADMIN\n", __func__);

Ouch! This new print statement could be really annoying if an
unprivileged user calls sethostname. Could you remove it?

> if (len < 0 || len > __NEW_UTS_LEN)
> return -EINVAL;
> down_write(&uts_sem);
> @@ -1226,7 +1241,7 @@ SYSCALL_DEFINE2(setdomainname, char __user *, name, int, len)
> int errno;
> char tmp[__NEW_UTS_LEN];
>
> - if (!capable(CAP_SYS_ADMIN))
> + if (!ns_capable(current->nsproxy->uts_ns->user_ns, CAP_SYS_ADMIN))
> return -EPERM;
> if (len < 0 || len > __NEW_UTS_LEN)
> return -EINVAL;
> @@ -1341,6 +1356,8 @@ int do_prlimit(struct task_struct *tsk, unsigned int resource,
> rlim = tsk->signal->rlim + resource;
> task_lock(tsk->group_leader);
> if (new_rlim) {
> + /* Keep the capable check against init_user_ns until
> + cgroups can contain all limits */
> if (new_rlim->rlim_max > rlim->rlim_max &&
> !capable(CAP_SYS_RESOURCE))
> retval = -EPERM;
> @@ -1384,19 +1401,22 @@ static int check_prlimit_permission(struct task_struct *task)
> {
> const struct cred *cred = current_cred(), *tcred;
>
> - tcred = __task_cred(task);
> - if (current != task &&
> - (cred->uid != tcred->euid ||
> - cred->uid != tcred->suid ||
> - cred->uid != tcred->uid ||
> - cred->gid != tcred->egid ||
> - cred->gid != tcred->sgid ||
> - cred->gid != tcred->gid) &&
> - !capable(CAP_SYS_RESOURCE)) {
> - return -EPERM;
> - }
> + if (current == task)
> + return 0;
>
> - return 0;
> + tcred = __task_cred(task);
> + if (cred->user->user_ns == tcred->user->user_ns &&
> + (cred->uid == tcred->euid &&
> + cred->uid == tcred->suid &&
> + cred->uid == tcred->uid &&
> + cred->gid == tcred->egid &&
> + cred->gid == tcred->sgid &&
> + cred->gid == tcred->gid))
> + return 0;
> + if (ns_capable(tcred->user->user_ns, CAP_SYS_RESOURCE))
> + return 0;
> +
> + return -EPERM;
> }
>
> SYSCALL_DEFINE4(prlimit64, pid_t, pid, unsigned int, resource,
...

Re: [PATCH 5/9] Allow ptrace from non-init user namespaces [message #41764 is a reply to message #41751] Fri, 18 February 2011 02:59
ebiederm
"Serge E. Hallyn" <serge@hallyn.com> writes:

> ptrace is allowed to tasks in the same user namespace according to
> the usual rules (i.e. the same rules as for two tasks in the init
> user namespace). ptrace is also allowed to a user namespace to
> which the current task has the CAP_SYS_PTRACE capability.


I don't see how it can go wrong at the moment, but
same_or_ancestor_user_ns is too permissive and potentially inefficient.
Can you please replace it with a simple user namespace equality check?

Eric
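To make the two alternatives concrete, here is a hypothetical model. Each namespace points at its creator's namespace (u1->creator->user_ns in the patch below), and the loop mirrors same_or_ancestor_user_ns(). The walk starts from the first argument's namespace, so it also succeeds when the victim lives in an ancestor of the tracer's namespace. The equality check has no such case and costs one comparison regardless of nesting depth.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: parent is the namespace of the ns creator
 * (u->creator->user_ns in the patch); init_user_ns has NULL. */
struct model_user_ns { const struct model_user_ns *parent; };

/* Same walk as same_or_ancestor_user_ns(task, victim): start at the
 * task's namespace and climb until the victim's namespace is found. */
static bool model_same_or_ancestor(const struct model_user_ns *task_ns,
				   const struct model_user_ns *victim_ns)
{
	assert(task_ns && victim_ns);
	for (const struct model_user_ns *u = task_ns; u != NULL; u = u->parent)
		if (u == victim_ns)
			return true;
	return false;
}

/* The check requested above: one pointer comparison, no walk. */
static bool model_same_user_ns(const struct model_user_ns *a,
			       const struct model_user_ns *b)
{
	return a == b;
}
```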


> Changelog:
> Dec 31: Address feedback by Eric:
> . Correct ptrace uid check
> . Rename may_ptrace_ns to ptrace_capable
> . Also fix the cap_ptrace checks.
> Jan 1: Use const cred struct
> Jan 11: use task_ns_capable() in place of ptrace_capable().
>
> Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
> ---
> include/linux/capability.h | 2 +
> include/linux/user_namespace.h | 9 +++++++
> kernel/ptrace.c | 27 ++++++++++++----------
> kernel/user_namespace.c | 16 +++++++++++++
> security/commoncap.c | 48 +++++++++++++++++++++++++++++++++------
> 5 files changed, 82 insertions(+), 20 deletions(-)
>
> diff --git a/include/linux/capability.h b/include/linux/capability.h
> index cb3d2d9..bc0f262 100644
> --- a/include/linux/capability.h
> +++ b/include/linux/capability.h
> @@ -546,6 +546,8 @@ extern const kernel_cap_t __cap_init_eff_set;
> */
> #define has_capability(t, cap) (security_real_capable((t), &init_user_ns, (cap)) == 0)
>
> +#define has_ns_capability(t, ns, cap) (security_real_capable((t), (ns), (cap)) == 0)
> +
> /**
> * has_capability_noaudit - Determine if a task has a superior capability available (unaudited)
> * @t: The task in question
> diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
> index faf4679..862fc59 100644
> --- a/include/linux/user_namespace.h
> +++ b/include/linux/user_namespace.h
> @@ -39,6 +39,9 @@ static inline void put_user_ns(struct user_namespace *ns)
> uid_t user_ns_map_uid(struct user_namespace *to, const struct cred *cred, uid_t uid);
> gid_t user_ns_map_gid(struct user_namespace *to, const struct cred *cred, gid_t gid);
>
> +int same_or_ancestor_user_ns(struct task_struct *task,
> + struct task_struct *victim);
> +
> #else
>
> static inline struct user_namespace *get_user_ns(struct user_namespace *ns)
> @@ -66,6 +69,12 @@ static inline gid_t user_ns_map_gid(struct user_namespace *to,
> return gid;
> }
>
> +static inline int same_or_ancestor_user_ns(struct task_struct *task,
> + struct task_struct *victim)
> +{
> + return 1;
> +}
> +
> #endif
>
> #endif /* _LINUX_USER_H */
> diff --git a/kernel/ptrace.c b/kernel/ptrace.c
> index 1708b1e..cde4655 100644
> --- a/kernel/ptrace.c
> +++ b/kernel/ptrace.c
> @@ -134,21 +134,24 @@ int __ptrace_may_access(struct task_struct *task, unsigned int mode)
> return 0;
> rcu_read_lock();
> tcred = __task_cred(task);
> - if ((cred->uid != tcred->euid ||
> - cred->uid != tcred->suid ||
> - cred->uid != tcred->uid ||
> - cred->gid != tcred->egid ||
> - cred->gid != tcred->sgid ||
> - cred->gid != tcred->gid) &&
> - !capable(CAP_SYS_PTRACE)) {
> - rcu_read_unlock();
> - return -EPERM;
> - }
> + if (cred->user->user_ns == tcred->user->user_ns &&
> + (cred->uid == tcred->euid &&
> + cred->uid == tcred->suid &&
> + cred->uid == tcred->uid &&
> + cred->gid == tcred->egid &&
> + cred->gid == tcred->sgid &&
> + cred->gid == tcred->gid))
> + goto ok;
> + if (ns_capable(tcred->user->user_ns, CAP_SYS_PTRACE))
> + goto ok;
> + rcu_read_unlock();
> + return -EPERM;
> +ok:
> rcu_read_unlock();
> smp_rmb();
> if (task->mm)
> dumpable = get_dumpable(task->mm);
> - if (!dumpable && !capable(CAP_SYS_PTRACE))
> + if (!dumpable && !task_ns_capable(task, CAP_SYS_PTRACE))
> return -EPERM;
>
> return security_ptrace_access_check(task, mode);
> @@ -198,7 +201,7 @@ int ptrace_attach(struct task_struct *task)
> goto unlock_tasklist;
>
> task->ptrace = PT_PTRACED;
> - if (capable(CAP_SYS_PTRACE))
> + if (task_ns_capable(task, CAP_SYS_PTRACE))
> task->ptrace |= PT_PTRACE_CAP;
>
> __ptrace_link(task, current);
> diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
> index 9da289c..0ef2258 100644
> --- a/kernel/user_namespace.c
> +++ b/kernel/user_namespace.c
> @@ -129,6 +129,22 @@ gid_t user_ns_map_gid(struct user_namespace *to, const struct cred *cred, gid_t
> return overflowgid;
> }
>
> +int same_or_ancestor_user_ns(struct task_struct *task,
> + struct task_struct *victim)
> +{
> + struct user_namespace *u1 = task_cred_xxx(task, user)->user_ns;
> + struct user_namespace *u2 = task_cred_xxx(victim, user)->user_ns;
> + for (;;) {
> + if (u1 == u2)
> + return 1;
> + if (u1 == &init_user_ns)
> + return 0;
> + u1 = u1->creator->user_ns;
> + }
> + /* We never get here */
> + return 0;
> +}
> +
> static __init int user_namespaces_init(void)
> {
> user_ns_cachep = KMEM_CACHE(user_namespace, SLAB_PANIC);
> diff --git a/security/commoncap.c b/security/commoncap.c
> index 51fa9ec..12ff65c 100644
> --- a/security/commoncap.c
> +++ b/security/commoncap.c
> @@ -130,18 +130,34 @@ int cap_settime(struct timespec *ts, struct timezone *tz)
> * @child: The process to be accessed
> * @mode: The mode of attachment.
> *
> + * If we are in the same or an ancestor user_ns and have all the target
> + * task's capabilities, then ptrace access is allowed.
> + * If we have the ptrace capability to the target user_ns, then ptrace
> + * access is allowed.
> + * Else denied.
> + *
> * Determine whether a process may access another, returning 0 if permission
> * granted, -ve if denied.
> */
> int cap_ptrace_access_check(struct task_struct *child, unsigned int mode)
> {
> int ret = 0;
> + const struct cred *cred, *tcred;
>
> rcu_read_lock();
> - if (!cap_issubset(__task_cred(child)->cap_permitted,
> - current_cred()->cap_permitted) &&
> - !capable(CAP_SYS_PTRACE))
> - ret = -EPERM;
> + cred = current_cred();
> + tcred = __task_cred(child);
> + /*
> + * The ancestor user_ns check may be gratuitous, as I think
> + * we've already guaranteed that in kernel/ptrace.c.
> + */
> + if (same_or_ancestor_user_ns(current, child) &&
> + cap_issubset(tcred->cap_permitted, cred->cap_permitted))
> + goto out;

I have commented on this before, but I took a good hard look this time
and can comment more intelligently.

The cap_issubset check is for the case where we don't use the
CAP_SYS_PTRACE capability, and as such it is only valid in the same user
namespace. Furthermore, capabilities really are not comparable between
different user namespaces. So can you please replace the
same_or_ancestor_user_ns with a simple namespace equality check?
Having the wrong logic in here will just be confusing in the future.

Also, could you name tcred child_cred? I think that would be clearer in
the test below.
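Here is a sketch of cap_ptrace_access_check() with both review comments applied. The cap_issubset() shortcut is taken only when the two creds share a user namespace, and the target's cred is named child_cred. Capability sets are modeled as 64-bit masks. The model_* names, MODEL_EPERM and the simplified ns_capable() (no ancestor walk) are hypothetical. CAP_SYS_PTRACE is bit 19 in linux/capability.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_EPERM 1
#define MODEL_CAP_SYS_PTRACE 19	/* CAP_SYS_PTRACE in linux/capability.h */

struct model_user_ns { int id; };

struct model_cred {
	uint64_t cap_permitted;
	uint64_t cap_effective;
	const struct model_user_ns *user_ns;
};

/* True if every capability in a is also in set. */
static bool model_cap_issubset(uint64_t a, uint64_t set)
{
	return (a & ~set) == 0;
}

/* Simplified ns_capable(): no walk through parent namespaces. */
static bool model_ns_capable(const struct model_cred *cred,
			     const struct model_user_ns *ns, int cap)
{
	return cred->user_ns == ns && ((cred->cap_effective >> cap) & 1u);
}

static int model_ptrace_access_check(const struct model_cred *cred,
				     const struct model_cred *child_cred)
{
	assert(cred && child_cred);
	/* Capability sets are only comparable within one user namespace. */
	if (cred->user_ns == child_cred->user_ns &&
	    model_cap_issubset(child_cred->cap_permitted, cred->cap_permitted))
		return 0;
	/* Otherwise CAP_SYS_PTRACE must be held over the child's namespace. */
	if (model_ns_capable(cred, child_cred->user_ns, MODEL_CAP_SYS_PTRACE))
		return 0;
	return -MODEL_EPERM;
}
```

In this model, taken in isolation, the ancestor walk would have let a fully privileged task in a child namespace pass the subset test against a capability-less task in the parent namespace. An empty permitted set is a subset of anything. The equality check refuses that case.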

> + if (ns_capable(tcred->user->user_ns, CAP_SYS_PTRACE))
> + goto out;
> + ret = -EPERM;
> +out:
> rcu_read_unlock();
> return ret;

I also find it strange that we allow CAP_SYS_PTRACE to allow us to
ptrace processes with more capabilities than ourselves. But that is an
entirely different issue.

> }
> @@ -150,18 +166,34 @@ int cap_ptrace_access_check(struct task_struct *child, unsigned int mode)
> * cap_ptrace_traceme - Determine whether another process may trace the current
> * @parent: The task proposed to be the tracer
> *
> + * If parent is in the same or an ancestor user_ns and has all current's
> + * capabilities, then ptrace access is allowed.
> + * If parent has the ptrace capability to current's user_ns, then ptrace
> + * access is allowed.
> + * Else denied.
> + *
> * Determine whether the nominated task is permitted to trace the current
> * process, returning 0 if permission is granted, -ve if denied.
> */
> int cap_ptrace_traceme(struct task_struct *parent)
> {
> int ret = 0;
> + const struct cred *cred, *tcred;
>
> rcu_read_lock();
> - if (!cap_issubset(current_cred()->cap_permitted,
> - __task_cred(parent)->cap_permitted) &&
> - !has_capability(parent, CAP_SYS_PTRACE))
> - ret = -EPERM;
> + cred = __task_cred(parent);
> + tcred = current_cred();
> + /*
> + * The ancestor user_ns check may be gratuitous, as I think
> + * we've already guaranteed that in kernel/ptrace.c.
> + */
> + if (same_or_ancestor_user_ns(parent, current) &&
> + cap_issubset(tcred->cap_permitted, cred->cap_permitted))
> + goto out;
> + if (has_ns_capability(parent, tcred->user->user_ns, CAP_SYS_PTRACE))
> + goto out;
> + ret = -EPERM;
> +out:
> rcu_read_unlock();
> return ret;
> }
...

Re: [PATCH 4/9] allow killing tasks in your own or child userns [message #41765 is a reply to message #41749] Fri, 18 February 2011 03:00
ebiederm
"Serge E. Hallyn" <serge@hallyn.com> writes:

> Changelog:
> Dec 8: Fixed bug in my check_kill_permission pointed out by
> Eric Biederman.
> Dec 13: Apply Eric's suggestion to pass target task into kill_ok_by_cred()
> for clarity
> Dec 31: address comment by Eric Biederman:
> don't need cred/tcred in check_kill_permission.
> Jan 1: use const cred struct.
> Jan 11: Per Bastian Blank's advice, clean up kill_ok_by_cred().
> Feb 16: kill_ok_by_cred: fix bad parentheses

Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
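The kill_ok_by_cred() rewrite in the hunk below replaces the old XOR chain with a positive "any pair matches" test. In the old chain, a nonzero XOR means the uids differ, and the signal is refused only if all four pairs differ. Ignoring the namespace and CAP_KILL terms, the two forms should be exact complements by De Morgan's law. The following small hypothetical harness checks this exhaustively over a tiny uid range. Parameters map as e = cred->euid, u = cred->uid, ts = tcred->suid, tu = tcred->uid.

```c
#include <assert.h>
#include <stdbool.h>

/* Old check_kill_permission() uid test: a nonzero XOR means "differs", so
 * the signal was refused (absent CAP_KILL) only if all four pairs differ. */
static bool model_old_uids_refuse(unsigned e, unsigned u,
				  unsigned ts, unsigned tu)
{
	return (e ^ ts) && (e ^ tu) && (u ^ ts) && (u ^ tu);
}

/* kill_ok_by_cred() uid test, for two tasks in the same user namespace. */
static bool model_new_uids_allow(unsigned e, unsigned u,
				 unsigned ts, unsigned tu)
{
	return e == ts || e == tu || u == ts || u == tu;
}

/* Check allow == !refuse for every combination of small uid values. */
static bool model_forms_are_complements(unsigned max_uid)
{
	assert(max_uid < 16);	/* keep the 4-deep loop cheap */
	for (unsigned e = 0; e <= max_uid; e++)
		for (unsigned u = 0; u <= max_uid; u++)
			for (unsigned ts = 0; ts <= max_uid; ts++)
				for (unsigned tu = 0; tu <= max_uid; tu++)
					if (model_new_uids_allow(e, u, ts, tu) ==
					    model_old_uids_refuse(e, u, ts, tu))
						return false;
	return true;
}
```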

>
> Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
> ---
> kernel/signal.c | 30 ++++++++++++++++++++++--------
> 1 files changed, 22 insertions(+), 8 deletions(-)
>
> diff --git a/kernel/signal.c b/kernel/signal.c
> index 4e3cff1..ffe4bdf 100644
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -636,13 +636,33 @@ static inline bool si_fromuser(const struct siginfo *info)
> }
>
> /*
> + * called with RCU read lock from check_kill_permission()
> + */
> +static inline int kill_ok_by_cred(struct task_struct *t)
> +{
> + const struct cred *cred = current_cred();
> + const struct cred *tcred = __task_cred(t);
> +
> + if (cred->user->user_ns == tcred->user->user_ns &&
> + (cred->euid == tcred->suid ||
> + cred->euid == tcred->uid ||
> + cred->uid == tcred->suid ||
> + cred->uid == tcred->uid))
> + return 1;
> +
> + if (ns_capable(tcred->user->user_ns, CAP_KILL))
> + return 1;
> +
> + return 0;
> +}
> +
> +/*
> * Bad permissions for sending the signal
> * - the caller must hold the RCU read lock
> */
> static int check_kill_permission(int sig, struct siginfo *info,
> struct task_struct *t)
> {
> - const struct cred *cred, *tcred;
> struct pid *sid;
> int error;
>
> @@ -656,14 +676,8 @@ static int check_kill_permission(int sig, struct siginfo *info,
> if (error)
> return error;
>
> - cred = current_cred();
> - tcred = __task_cred(t);
> if (!same_thread_group(current, t) &&
> - (cred->euid ^ tcred->suid) &&
> - (cred->euid ^ tcred->uid) &&
> - (cred->uid ^ tcred->suid) &&
> - (cred->uid ^ tcred->uid) &&
> - !capable(CAP_KILL)) {
> + !kill_ok_by_cred(t)) {
> switch (sig) {
> case SIGCONT:
> sid = task_session(t);
Re: [PATCH 3/9] allow sethostname in a container [message #41766 is a reply to message #41748] Fri, 18 February 2011 03:05
ebiederm
"Serge E. Hallyn" <serge@hallyn.com> writes:

Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>


> Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
> ---
> kernel/sys.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/kernel/sys.c b/kernel/sys.c
> index 18da702..7a1bbad 100644
> --- a/kernel/sys.c
> +++ b/kernel/sys.c
> @@ -1177,7 +1177,7 @@ SYSCALL_DEFINE2(sethostname, char __user *, name, int, len)
> int errno;
> char tmp[__NEW_UTS_LEN];
>
> - if (!capable(CAP_SYS_ADMIN))
> + if (!ns_capable(current->nsproxy->uts_ns->user_ns, CAP_SYS_ADMIN))
> return -EPERM;
> if (len < 0 || len > __NEW_UTS_LEN)
> return -EINVAL;
Re: [PATCH 7/9] add a user namespace owner of ipc ns [message #41767 is a reply to message #41746] Fri, 18 February 2011 03:19
ebiederm
"Serge E. Hallyn" <serge@hallyn.com> writes:

> Changelog:
> Feb 15: Don't set new ipc->user_ns if we didn't create a new
> ipc_ns.

Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
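The patch below touches three reference-count sites. Here is a hypothetical model of them. create_ipc_ns() pins the old owner. The nsproxy fix-up for a freshly created ipc namespace swaps in the task's user namespace (put old, get new). free_ipc_ns() drops the final pin. A plain int stands in for the kref in struct user_namespace, and all names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: an int stands in for the kref in user_namespace. */
struct model_user_ns { int refcount; };
struct model_ipc_ns { struct model_user_ns *user_ns; };

static void model_get_user_ns(struct model_user_ns *ns)
{
	ns->refcount++;
}

static void model_put_user_ns(struct model_user_ns *ns)
{
	assert(ns->refcount > 0);	/* an underflow would mean a bug */
	ns->refcount--;
}

/* create_ipc_ns(old_ns): inherit the old owner and pin it. */
static void model_create_ipc_ns(struct model_ipc_ns *ns,
				const struct model_ipc_ns *old_ns)
{
	ns->user_ns = old_ns->user_ns;
	model_get_user_ns(ns->user_ns);
}

/* nsproxy fix-up for a freshly created ipc ns: drop the inherited owner,
 * then pin the task's user namespace as the new owner. */
static void model_set_ipc_owner(struct model_ipc_ns *ns,
				struct model_user_ns *task_user_ns)
{
	model_put_user_ns(ns->user_ns);
	ns->user_ns = task_user_ns;
	model_get_user_ns(ns->user_ns);
}

/* free_ipc_ns(): release the owner reference. */
static void model_free_ipc_ns(struct model_ipc_ns *ns)
{
	model_put_user_ns(ns->user_ns);
	ns->user_ns = NULL;
}
```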

>
> Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
> ---
> include/linux/ipc_namespace.h | 3 +++
> ipc/msgutil.c | 3 +++
> ipc/namespace.c | 9 +++++++--
> kernel/nsproxy.c | 5 +++++
> 4 files changed, 18 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/ipc_namespace.h b/include/linux/ipc_namespace.h
> index 5195298..46d2eb4 100644
> --- a/include/linux/ipc_namespace.h
> +++ b/include/linux/ipc_namespace.h
> @@ -24,6 +24,7 @@ struct ipc_ids {
> struct idr ipcs_idr;
> };
>
> +struct user_namespace;
> struct ipc_namespace {
> atomic_t count;
> struct ipc_ids ids[3];
> @@ -56,6 +57,8 @@ struct ipc_namespace {
> unsigned int mq_msg_max; /* initialized to DFLT_MSGMAX */
> unsigned int mq_msgsize_max; /* initialized to DFLT_MSGSIZEMAX */
>
> + /* user_ns which owns the ipc ns */
> + struct user_namespace *user_ns;
> };
>
> extern struct ipc_namespace init_ipc_ns;
> diff --git a/ipc/msgutil.c b/ipc/msgutil.c
> index f095ee2..d91ff4b 100644
> --- a/ipc/msgutil.c
> +++ b/ipc/msgutil.c
> @@ -20,6 +20,8 @@
>
> DEFINE_SPINLOCK(mq_lock);
>
> +extern struct user_namespace init_user_ns;
> +
> /*
> * The next 2 defines are here bc this is the only file
> * compiled when either CONFIG_SYSVIPC and CONFIG_POSIX_MQUEUE
> @@ -32,6 +34,7 @@ struct ipc_namespace init_ipc_ns = {
> .mq_msg_max = DFLT_MSGMAX,
> .mq_msgsize_max = DFLT_MSGSIZEMAX,
> #endif
> + .user_ns = &init_user_ns,
> };
>
> atomic_t nr_ipc_ns = ATOMIC_INIT(1);
> diff --git a/ipc/namespace.c b/ipc/namespace.c
> index a1094ff..aa18899 100644
> --- a/ipc/namespace.c
> +++ b/ipc/namespace.c
> @@ -11,10 +11,11 @@
> #include <linux/slab.h>
> #include <linux/fs.h>
> #include <linux/mount.h>
> +#include <linux/user_namespace.h>
>
> #include "util.h"
>
> -static struct ipc_namespace *create_ipc_ns(void)
> +static struct ipc_namespace *create_ipc_ns(struct ipc_namespace *old_ns)
> {
> struct ipc_namespace *ns;
> int err;
> @@ -43,6 +44,9 @@ static struct ipc_namespace *create_ipc_ns(void)
> ipcns_notify(IPCNS_CREATED);
> register_ipcns_notifier(ns);
>
> + ns->user_ns = old_ns->user_ns;
> + get_user_ns(ns->user_ns);
> +
> return ns;
> }
>
> @@ -50,7 +54,7 @@ struct ipc_namespace *copy_ipcs(unsigned long flags, struct ipc_namespace *ns)
> {
> if (!(flags & CLONE_NEWIPC))
> return get_ipc_ns(ns);
> - return create_ipc_ns();
> + return create_ipc_ns(ns);
> }
>
> /*
> @@ -105,6 +109,7 @@ static void free_ipc_ns(struct ipc_namespace *ns)
> * order to have a correct value when recomputing msgmni.
> */
> ipcns_notify(IPCNS_REMOVED);
> + put_user_ns(ns->user_ns);
> }
>
> /*
> diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c
> index 034dc2e..b6dbff2 100644
> --- a/kernel/nsproxy.c
> +++ b/kernel/nsproxy.c
> @@ -85,6 +85,11 @@ static struct nsproxy *create_new_namespaces(unsigned long flags,
> err = PTR_ERR(new_nsp->ipc_ns);
> goto out_ipc;
> }
> + if (new_nsp->ipc_ns != tsk->nsproxy->ipc_ns) {
> + put_user_ns(new_nsp->ipc_ns->user_ns);
> + new_nsp->ipc_ns->user_ns = task_cred_xxx(tsk, user)->user_ns;
> + get_user_ns(new_nsp->ipc_ns->user_ns);
> + }
>
> new_nsp->pid_ns = copy_pid_ns(flags, task_active_pid_ns(tsk));
> if (IS_ERR(new_nsp->pid_ns)) {
Re: [PATCH 1/9] Add a user_namespace as creator/owner of uts_namespace [message #41768 is a reply to message #41747] Fri, 18 February 2011 03:31
ebiederm
"Serge E. Hallyn" <serge@hallyn.com> writes:

> copy_process() handles CLONE_NEWUSER before the rest of the
> namespaces. So in the case of clone(CLONE_NEWUSER|CLONE_NEWUTS)
> the new uts namespace will have the new user namespace as its
> owner. That is what we want, since we want root in that new
> userns to be able to have privilege over it.

Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
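The ordering argument in the patch description can be modeled in a few lines. In this hypothetical sketch, clone handles the new user namespace before the new uts namespace. The new uts namespace therefore ends up owned by the new user namespace, and the child passes a simplified ns_capable(uts_ns->user_ns, CAP_SYS_ADMIN) check in sethostname. The model makes three simplifications. It collapses clone_uts_ns() and the nsproxy fix-up into one step. It assumes the creator holds CAP_SYS_ADMIN in the namespace it just created. It uses made-up flag values rather than the real CLONE_* constants.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Made-up flag values for this model; not the real CLONE_* constants. */
#define MODEL_NEWUSER 0x1u
#define MODEL_NEWUTS  0x2u

struct model_user_ns { const struct model_user_ns *parent; };
struct model_uts_ns { const struct model_user_ns *user_ns; };	/* owner */

struct model_task {
	const struct model_user_ns *user_ns;
	const struct model_uts_ns *uts_ns;
	bool cap_sys_admin;	/* effective CAP_SYS_ADMIN in user_ns */
};

/* Namespace part of clone(); the caller supplies storage for new objects. */
static struct model_task model_clone(const struct model_task *parent,
				     unsigned int flags,
				     struct model_user_ns *new_user,
				     struct model_uts_ns *new_uts)
{
	struct model_task child = *parent;

	assert(parent && new_user && new_uts);
	/* The user namespace is handled first ... */
	if (flags & MODEL_NEWUSER) {
		new_user->parent = parent->user_ns;
		child.user_ns = new_user;
		child.cap_sys_admin = true;	/* assumed: creator is privileged */
	}
	/* ... so a new uts namespace is owned by the child's user ns. */
	if (flags & MODEL_NEWUTS) {
		new_uts->user_ns = child.user_ns;
		child.uts_ns = new_uts;
	}
	return child;
}

/* Simplified ns_capable(uts_ns->user_ns, CAP_SYS_ADMIN): no parent walk. */
static bool model_may_sethostname(const struct model_task *t)
{
	return t->user_ns == t->uts_ns->user_ns && t->cap_sys_admin;
}
```

A task that creates only a new user namespace keeps sharing the parent's uts namespace, which stays owned by the parent's user namespace. This matches the Feb 15 changelog entry below.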

>
> Changelog:
> Feb 15: don't set uts_ns->user_ns if we didn't create
> a new uts_ns.
>
> Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
> ---
> include/linux/utsname.h | 3 +++
> init/version.c | 2 ++
> kernel/nsproxy.c | 5 +++++
> kernel/user.c | 8 ++++++--
> kernel/utsname.c | 4 ++++
> 5 files changed, 20 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/utsname.h b/include/linux/utsname.h
> index 69f3997..85171be 100644
> --- a/include/linux/utsname.h
> +++ b/include/linux/utsname.h
> @@ -37,9 +37,12 @@ struct new_utsname {
> #include <linux/nsproxy.h>
> #include <linux/err.h>
>
> +struct user_namespace;
> +
> struct uts_namespace {
> struct kref kref;
> struct new_utsname name;
> + struct user_namespace *user_ns;
> };
> extern struct uts_namespace init_uts_ns;
>
> diff --git a/init/version.c b/init/version.c
> index adff586..97bb86f 100644
> --- a/init/version.c
> +++ b/init/version.c
> @@ -21,6 +21,7 @@ extern int version_string(LINUX_VERSION_CODE);
> int version_string(LINUX_VERSION_CODE);
> #endif
>
> +extern struct user_namespace init_user_ns;
> struct uts_namespace init_uts_ns = {
> .kref = {
> .refcount = ATOMIC_INIT(2),
> @@ -33,6 +34,7 @@ struct uts_namespace init_uts_ns = {
> .machine = UTS_MACHINE,
> .domainname = UTS_DOMAINNAME,
> },
> + .user_ns = &init_user_ns,
> };
> EXPORT_SYMBOL_GPL(init_uts_ns);
>
> diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c
> index f74e6c0..034dc2e 100644
> --- a/kernel/nsproxy.c
> +++ b/kernel/nsproxy.c
> @@ -74,6 +74,11 @@ static struct nsproxy *create_new_namespaces(unsigned long flags,
> err = PTR_ERR(new_nsp->uts_ns);
> goto out_uts;
> }
> + if (new_nsp->uts_ns != tsk->nsproxy->uts_ns) {
> + put_user_ns(new_nsp->uts_ns->user_ns);
> + new_nsp->uts_ns->user_ns = task_cred_xxx(tsk, user)->user_ns;
> + get_user_ns(new_nsp->uts_ns->user_ns);
> + }
>
> new_nsp->ipc_ns = copy_ipcs(flags, tsk->nsproxy->ipc_ns);
> if (IS_ERR(new_nsp->ipc_ns)) {
> diff --git a/kernel/user.c b/kernel/user.c
> index 5c598ca..9e03e9c 100644
> --- a/kernel/user.c
> +++ b/kernel/user.c
> @@ -17,9 +17,13 @@
> #include <linux/module.h>
> #include <linux/user_namespace.h>
>
> +/*
> + * userns count is 1 for root user, 1 for init_uts_ns,
> + * and 1 for... ?
> + */
> struct user_namespace init_user_ns = {
> .kref = {
> - .refcount = ATOMIC_INIT(2),
> + .refcount = ATOMIC_INIT(3),
> },
> .creator = &root_user,
> };
> @@ -47,7 +51,7 @@ static struct kmem_cache *uid_cachep;
> */
> static DEFINE_SPINLOCK(uidhash_lock);
>
> -/* root_user.__count is 2, 1 for init task cred, 1 for init_user_ns->creator */
> +/* root_user.__count is 2, 1 for init task cred, 1 for init_user_ns->user_ns */
> struct user_struct root_user = {
> .__count = ATOMIC_INIT(2),
> .processes = ATOMIC_INIT(1),
> diff --git a/kernel/utsname.c b/kernel/utsname.c
> index 8a82b4b..a7b3a8d 100644
> --- a/kernel/utsname.c
> +++ b/kernel/utsname.c
> @@ -14,6 +14,7 @@
> #include <linux/utsname.h>
> #include <linux/err.h>
> #include <linux/slab.h>
> +#include <linux/user_namespace.h>
>
> static struct uts_namespace *create_uts_ns(void)
> {
> @@ -40,6 +41,8 @@ static struct uts_namespace *clone_uts_ns(struct uts_namespace *old_ns)
>
> down_read(&uts_sem);
> memcpy(&ns->name, &old_ns->name, sizeof(ns->name));
> + ns->user_ns = old_ns->user_ns;
> + get_user_ns(ns->user_ns);
> up_read(&uts_sem);
> return ns;
> }
> @@ -71,5 +74,6 @@ void free_uts_ns(struct kref *kref)
> struct uts_namespace *ns;
>
> ns = container_of(kref, struct uts_namespace, kref);
> + put_user_ns(ns->user_ns);
> kfree(ns);
> }
Re: [PATCH 2/9] security: Make capabilities relative to the user namespace. [message #41769 is a reply to message #41744] Fri, 18 February 2011 03:46
ebiederm
"Serge E. Hallyn" <serge@hallyn.com> writes:

> - Introduce ns_capable to test for a capability in a non-default
> user namespace.
> - Teach cap_capable to handle capabilities in a non-default
> user namespace.
>
> The motivation is to get to the unprivileged creation of new
> namespaces. It looks like this gets us 90% of the way there, with
> only potential uid confusion issues left.
>
> I still need to handle getting all caps after creation but otherwise I
> think I have a good starter patch that achieves all of your goals.

Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
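The changelog below describes the order of checks in cap_capable(). Here is a hypothetical user-space model of that order. The creator of a non-init namespace has every capability in it. A task in the target namespace is judged by its effective set. Otherwise the check moves up to the parent namespace until init_user_ns is exhausted. The model tracks a single capability as a bool and returns -1 for -EPERM. It sketches the described logic and is not the patch's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_user;
struct model_user_ns {
	const struct model_user *creator;	/* user that created the ns */
	const struct model_user_ns *parent;	/* creator->user_ns in the kernel */
};
struct model_user { const struct model_user_ns *user_ns; };

struct model_cred {
	const struct model_user *user;
	bool has_cap;		/* capability present in cap_effective */
};

/* init_user_ns is created by root_user, which lives inside it. */
static struct model_user_ns model_init_ns;
static const struct model_user model_root_user = { &model_init_ns };
static struct model_user_ns model_init_ns = { &model_root_user, NULL };

/* 0 if the capability is granted in targ_ns, -1 (for -EPERM) otherwise. */
static int model_cap_capable(const struct model_cred *cred,
			     const struct model_user_ns *targ_ns)
{
	assert(cred && targ_ns);
	for (;;) {
		/* The creator of a namespace other than init_user_ns has
		 * all capabilities in it (checked first, per 12/14/2010). */
		if (targ_ns != &model_init_ns && targ_ns->creator == cred->user)
			return 0;
		/* Inside its own namespace a task is judged by its caps. */
		if (targ_ns == cred->user->user_ns)
			return cred->has_cap ? 0 : -1;
		/* No more ancestors to try. */
		if (targ_ns == &model_init_ns)
			return -1;
		/* A capability in a parent ns also covers its children. */
		targ_ns = targ_ns->parent;
	}
}
```

The `targ_ns != &model_init_ns` guard corresponds to the 02/16/2011 fix. Without it, root_user, as creator of init_user_ns, would be granted every capability there even with an empty effective set.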

>
> Changelog:
> 11/05/2010: [serge] add apparmor
> 12/14/2010: [serge] fix capabilities to created user namespaces
> Without this, if user serge creates a user_ns, he won't have
> capabilities to the user_ns he created. This is because we
> were first checking whether his effective caps had the caps
> he needed and returning -EPERM if not, and THEN checking whether
> he was the creator. Reverse those checks.
> 12/16/2010: [serge] security_real_capable needs ns argument in !security case
> 01/11/2011: [serge] add task_ns_capable helper
> 01/11/2011: [serge] add nsown_capable() helper per Bastian Blank suggestion
> 02/16/2011: [serge] fix a logic bug: the root user is always creator of
> init_user_ns, but should not always have capabilities to
> it! Fix the check in cap_capable().
>
> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
> Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
> ---
> include/linux/capability.h | 10 ++++++++--
> include/linux/security.h | 25 ++++++++++++++-----------
> kernel/capability.c | 32 ++++++++++++++++++++++++++++++--
> security/apparmor/lsm.c | 5 +++--
> security/commoncap.c | 40 +++++++++++++++++++++++++++++++++-------
> security/security.c | 16 ++++++++++------
> security/selinux/hooks.c | 14 +++++++++-----
> 7 files changed, 107 insertions(+), 35 deletions(-)
>
> diff --git a/include/linux/capability.h b/include/linux/capability.h
> index fb16a36..cb3d2d9 100644
> --- a/include/linux/capability.h
> +++ b/include/linux/capability.h
> @@ -544,7 +544,7 @@ extern const kernel_cap_t __cap_init_eff_set;
> *
> * Note that this does not set PF_SUPERPRIV on the task.
> */
> -#define has_capability(t, cap) (security_real_capable((t), (cap)) == 0)
> +#define has_capability(t, cap) (security_real_capable((t), &init_user_ns, (cap)) == 0)
>
> /**
> * has_capability_noaudit - Determine if a task has a superior capability available (unaudited)
> @@ -558,9 +558,15 @@ extern const kernel_cap_t __cap_init_eff_set;
> * Note that this does not set PF_SUPERPRIV on the task.
> */
> #define has_capability_noaudit(t, cap) \
> - (security_real_capable_noaudit((t), (cap)) == 0)
> + (security_real_capable_noaudit((t), &init_user_ns, (cap)) == 0)
>
> +struct user_namespace;
> +extern struct user_namespace init_user_ns;
> extern int capable(int cap);
> +extern int ns_capable(struct user_namespace *ns, int cap);
> +extern int task_ns_capable(struct task_struct *t, int cap);
> +
> +#define nsown_capable(cap) (ns_capable(current_user_ns(), (cap)))
>
> /* audit system wants to get cap info from files as well */
> struct dentry;
> diff --git a/include/linux/security.h b/include/linux/security.h
> index b2b7f97..6bbee08 100644
> --- a/include/linux/security.h
> +++ b/include/linux/security.h
> @@ -46,13 +46,14 @@
>
> struct ctl_table;
> struct audit_krule;
> +struct user_namespace;
>
> /*
> * These functions are in security/capability.c and are used
> * as the default capabilities functions
> */
> extern int cap_capable(struct task_struct *tsk, const struct cred *cred,
> - int cap, int audit);
> + struct user_namespace *ns, int cap, int audit);
> extern int cap_settime(struct timespec *ts, struct timezone *tz);
> extern int cap_ptrace_access_check(struct task_struct *child, unsigned int mode);
> extern int cap_ptrace_traceme(struct task_struct *parent);
> @@ -1254,6 +1255,7 @@ static inline void security_free_mnt_opts(struct security_mnt_opts *opts)
> * credentials.
> * @tsk contains the task_struct for the process.
> * @cred contains the credentials to use.
> + * @ns contains the user namespace we want the capability in
> * @cap contains the capability <include/linux/capability.h>.
> * @audit: Whether to write an audit message or not
> * Return 0 if the capability is granted for @tsk.
> @@ -1382,7 +1384,7 @@ struct security_operations {
> const kernel_cap_t *inheritable,
> const kernel_cap_t *permitted);
> int (*capable) (struct task_struct *tsk, const struct cred *cred,
> - int cap, int audit);
> + struct user_namespace *ns, int cap, int audit);
> int (*sysctl) (struct ctl_table *table, int op);
> int (*quotactl) (int cmds, int type, int id, struct super_block *sb);
> int (*quota_on) (struct dentry *dentry);
> @@ -1662,9 +1664,9 @@ int security_capset(struct cred *new, const struct cred *old,
> const kernel_cap_t *effective,
> const kernel_cap_t *inheritable,
> const kernel_cap_t *permitted);
> -int security_capable(const struct cred *cred, int cap);
> -int security_real_capable(struct task_struct *tsk, int cap);
> -int security_real_capable_noaudit(struct task_struct *tsk, int cap);
> +int security_capable(struct user_namespace *ns, const struct cred *cred, int cap);
> +int security_real_capable(struct task_struct *tsk, struct user_namespace *ns, int cap);
> +int security_real_capable_noaudit(struct task_struct *tsk, struct user_namespace *ns, int cap);
> int security_sysctl(struct ctl_table *table, int op);
> int security_quotactl(int cmds, int type, int id, struct super_block *sb);
> int security_quota_on(struct dentry *dentry);
> @@ -1856,28 +1858,29 @@ static inline int security_capset(struct cred *new,
> return cap_capset(new, old, effective, inheritable, permitted);
> }
>
> -static inline int security_capable(const struct cred *cred, int cap)
> +static inline int security_capable(struct user_namespace *ns,
> + const struct cred *cred, int cap)
> {
> - return cap_capable(current, cred, cap, SECURITY_CAP_AUDIT);
> + return cap_capable(current, cred, ns, cap, SECURITY_CAP_AUDIT);
> }
>
> -static inline int security_real_capable(struct task_struct *tsk, int cap)
> +static inline int security_real_capable(struct task_struct *tsk, struct user_namespace *ns, int cap)
> {
> int ret;
>
> rcu_read_lock();
> - ret = cap_capable(tsk, __task_cred(tsk), cap, SECURITY_CAP_AUDIT);
> + ret = cap_capable(tsk, __task_cred(tsk), ns, cap, SECURITY_CAP_AUDIT);
> rcu_read_unlock();
> return ret;
> }
>
> static inline
> -int security_real_capable_noaudit(struct task_struct *tsk, int cap)
> +int security_real_capable_noaudit(struct task_struct *tsk, struct user_namespace *ns, int cap)
> {
> int ret;
>
> rcu_read_lock();
> - ret = cap_capable(tsk, __task_cred(tsk), cap,
> + ret = cap_capable(tsk, __task_cred(tsk), ns, cap,
> SECURITY_CAP_NOAUDIT);
> rcu_read_unlock();
> return ret;
> diff --git a/kernel/capability.c b/kernel/capability.c
> index 9e9385f..916658c 100644
> --- a/kernel/capability.c
> +++ b/kernel/capability.c
> @@ -14,6 +14,7 @@
> #include <linux/security.h>
> #include <linux/syscalls.h>
> #include <linux/pid_namespace.h>
> +#include <linux/user_namespace.h>
> #include <asm/uaccess.h>
>
> /*
> @@ -301,15 +302,42 @@ error:
> */
> int capable(int cap)
> {
> + return ns_capable(&init_user_ns, cap);
> +}
> +EXPORT_SYMBOL(capable);
> +
> +/**
> + * ns_capable - Determine if the current task has a superior capability in effect
> + * @ns: The usernamespace we want the capability in
> + * @cap: The capability to be tested for
> + *
> + * Return true if the current task has the given superior capability currently
> + * available for use, false if not.
> + *
> + * This sets PF_SUPERPRIV on the task if the capability is available on the
> + * assumption that it's about to be used.
> + */
> +int ns_capable(struct user_namespace *ns, int cap)
> +{
> if (unlikely(!cap_valid(cap))) {
> printk(KERN_CRIT "capable() called with invalid cap=%u\n", cap);
> BUG();
> }
>
> - if (security_capable(current_cred(), cap) == 0) {
> + if (security_capable(ns, current_cred(), cap) == 0) {
> current->flags |= PF_SUPERPRIV;
> return 1;
> }
> return 0;
> }
> -EXPORT_SYMBOL(capable);
> +EXPORT_SYMBOL(ns_capable);
> +
> +/*
> + * does current have capability 'cap' to the user namespace of task
> + * 't'. Return true if it does, false otherwise.
> + */
> +int task_ns_capable(struct task_struct *t, int cap)
> +{
> + return ns_capable(task_cred_xxx(t, user)->user_ns, cap);
> +}
> +EXPORT_SYMBOL(task_ns_capable);
> diff --git a/security/apparmor/lsm.c b/security/apparmor/lsm.c
> index b7106f1..b37c2cd 100644
> --- a/security/apparmor/lsm.c
> +++ b/security/apparmor/lsm.c
> @@ -22,6 +22,7 @@
> #include <linux/ctype.h>
> #include <linux/sysctl.h>
> #include <linux/audit.h>
> +#include <linux/user_namespace.h>
> #include <net/sock.h>
>
> #include &qu
...

Re: userns: targeted capabilities v5 [message #41770 is a reply to message #41758] Fri, 18 February 2011 03:53
ebiederm
Andrew Morton <akpm@linux-foundation.org> writes:

> On Thu, 17 Feb 2011 15:02:24 +0000
> "Serge E. Hallyn" <serge@hallyn.com> wrote:
>
>> Here is a repost of my previous user namespace patch, ported onto
>> last night's git head.
>>
>> It fixes several things I was doing wrong in the last (v4)
>> posting, in particular:
>>
>> 1. don't set uts_ns->user_ns to current's when !CLONE_NEWUTS
>> 2. add a ipc_ns->user_ns which owns ipc_ns, and use that to
>> decide CAP_IPC_OWNER
>> 3. fix logic flaw caused by bad parentheses
>> 4. allow do_prlimit to current
>> 5. don't always give root full privs to init_user_ns
>>
>> The expected course of development for user namespaces is laid out
>> at https://wiki.ubuntu.com/UserNamespace.
>
> Seems like a nice feature to be developing.
>
> I worry about the maturity of it all at this stage. How far along is
> it *really*?
>
> Is anyone else working with you on developing and reviewing this work?

I did a lot of the initial design and I have been reviewing as I have
time.

Andrew, at a practical level we have to merge this incrementally.
Anything much bigger than Serge's current patchset will be too big to
review. The first really big step is making capabilities local to the
user namespace, and that is what this patchset does, along with using
that locality in some good places.

Eric

Re: userns: targeted capabilities v5 [message #41771 is a reply to message #41758] Fri, 18 February 2011 04:28
serge
Quoting Andrew Morton (akpm@linux-foundation.org):
> On Thu, 17 Feb 2011 15:02:24 +0000
> "Serge E. Hallyn" <serge@hallyn.com> wrote:
>
> > Here is a repost of my previous user namespace patch, ported onto
> > last night's git head.
> >
> > It fixes several things I was doing wrong in the last (v4)
> > posting, in particular:
> >
> > 1. don't set uts_ns->user_ns to current's when !CLONE_NEWUTS
> > 2. add a ipc_ns->user_ns which owns ipc_ns, and use that to
> > decide CAP_IPC_OWNER
> > 3. fix logic flaw caused by bad parentheses
> > 4. allow do_prlimit to current
> > 5. don't always give root full privs to init_user_ns
> >
> > The expected course of development for user namespaces is laid out
> > at https://wiki.ubuntu.com/UserNamespace.
>
> Seems like a nice feature to be developing.
>
> I worry about the maturity of it all at this stage. How far along is
> it *really*?
>
> Is anyone else working with you on developing and reviewing this work?

Thanks, Andrew. I'm not sure what definition of 'maturity' you were
looking for here. If you meant completeness of the feature, it's
definitely not there. Of the goals for user namespaces, sandboxing
will be the quickest to mature. Completing that will largely be an
exercise in running through the breadth of the kernel looking for simple
uid/gid comparisons and making them namespace aware.

The design has been kicked around (publicly) on and off for many
years by Eric and me. This particular patchset has gotten some great
reviews from Eric Biederman and Bastian Blank (to whom, unfortunately,
I still cannot send direct email; it always bounces).

As Eric said, this feature will have to go in incrementally.
Furthermore, each piece touches scary code, so it's likely to go
pretty slowly. My hope is less than a year for sandboxing, and
two years for containers. It might go way faster, but experience
tells me that's unlikely :)

thanks,
-serge
Re: [PATCH 5/9] Allow ptrace from non-init user namespaces [message #41772 is a reply to message #41764] Fri, 18 February 2011 04:36
serge
Quoting Eric W. Biederman (ebiederm@xmission.com):
> "Serge E. Hallyn" <serge@hallyn.com> writes:
>
> > ptrace is allowed to tasks in the same user namespace according to
> > the usual rules (i.e. the same rules as for two tasks in the init
> > user namespace). ptrace is also allowed to a user namespace to
> > which the current task has the CAP_SYS_PTRACE capability.
>
>
> I don't see how it can go wrong at the moment, but
> same_or_ancestor_user_ns is too permissive and potentially inefficient.
> Can you please replace it with a simple user namespace equality check.
>
> Eric
>
>
> > Changelog:
> > Dec 31: Address feedback by Eric:
> > . Correct ptrace uid check
> > . Rename may_ptrace_ns to ptrace_capable
> > . Also fix the cap_ptrace checks.
> > Jan 1: Use const cred struct
> > Jan 11: use task_ns_capable() in place of ptrace_capable().
> >
> > Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
> > ---
> > include/linux/capability.h | 2 +
> > include/linux/user_namespace.h | 9 +++++++
> > kernel/ptrace.c | 27 ++++++++++++----------
> > kernel/user_namespace.c | 16 +++++++++++++
> > security/commoncap.c | 48 +++++++++++++++++++++++++++++++++------
> > 5 files changed, 82 insertions(+), 20 deletions(-)
> >
> > diff --git a/include/linux/capability.h b/include/linux/capability.h
> > index cb3d2d9..bc0f262 100644
> > --- a/include/linux/capability.h
> > +++ b/include/linux/capability.h
> > @@ -546,6 +546,8 @@ extern const kernel_cap_t __cap_init_eff_set;
> > */
> > #define has_capability(t, cap) (security_real_capable((t), &init_user_ns, (cap)) == 0)
> >
> > +#define has_ns_capability(t, ns, cap) (security_real_capable((t), (ns), (cap)) == 0)
> > +
> > /**
> > * has_capability_noaudit - Determine if a task has a superior capability available (unaudited)
> > * @t: The task in question
> > diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
> > index faf4679..862fc59 100644
> > --- a/include/linux/user_namespace.h
> > +++ b/include/linux/user_namespace.h
> > @@ -39,6 +39,9 @@ static inline void put_user_ns(struct user_namespace *ns)
> > uid_t user_ns_map_uid(struct user_namespace *to, const struct cred *cred, uid_t uid);
> > gid_t user_ns_map_gid(struct user_namespace *to, const struct cred *cred, gid_t gid);
> >
> > +int same_or_ancestor_user_ns(struct task_struct *task,
> > + struct task_struct *victim);
> > +
> > #else
> >
> > static inline struct user_namespace *get_user_ns(struct user_namespace *ns)
> > @@ -66,6 +69,12 @@ static inline gid_t user_ns_map_gid(struct user_namespace *to,
> > return gid;
> > }
> >
> > +static inline int same_or_ancestor_user_ns(struct task_struct *task,
> > + struct task_struct *victim)
> > +{
> > + return 1;
> > +}
> > +
> > #endif
> >
> > #endif /* _LINUX_USER_H */
> > diff --git a/kernel/ptrace.c b/kernel/ptrace.c
> > index 1708b1e..cde4655 100644
> > --- a/kernel/ptrace.c
> > +++ b/kernel/ptrace.c
> > @@ -134,21 +134,24 @@ int __ptrace_may_access(struct task_struct *task, unsigned int mode)
> > return 0;
> > rcu_read_lock();
> > tcred = __task_cred(task);
> > - if ((cred->uid != tcred->euid ||
> > - cred->uid != tcred->suid ||
> > - cred->uid != tcred->uid ||
> > - cred->gid != tcred->egid ||
> > - cred->gid != tcred->sgid ||
> > - cred->gid != tcred->gid) &&
> > - !capable(CAP_SYS_PTRACE)) {
> > - rcu_read_unlock();
> > - return -EPERM;
> > - }
> > + if (cred->user->user_ns == tcred->user->user_ns &&
> > + (cred->uid == tcred->euid &&
> > + cred->uid == tcred->suid &&
> > + cred->uid == tcred->uid &&
> > + cred->gid == tcred->egid &&
> > + cred->gid == tcred->sgid &&
> > + cred->gid == tcred->gid))
> > + goto ok;
> > + if (ns_capable(tcred->user->user_ns, CAP_SYS_PTRACE))
> > + goto ok;
> > + rcu_read_unlock();
> > + return -EPERM;
> > +ok:
> > rcu_read_unlock();
> > smp_rmb();
> > if (task->mm)
> > dumpable = get_dumpable(task->mm);
> > - if (!dumpable && !capable(CAP_SYS_PTRACE))
> > + if (!dumpable && !task_ns_capable(task, CAP_SYS_PTRACE))
> > return -EPERM;
> >
> > return security_ptrace_access_check(task, mode);
> > @@ -198,7 +201,7 @@ int ptrace_attach(struct task_struct *task)
> > goto unlock_tasklist;
> >
> > task->ptrace = PT_PTRACED;
> > - if (capable(CAP_SYS_PTRACE))
> > + if (task_ns_capable(task, CAP_SYS_PTRACE))
> > task->ptrace |= PT_PTRACE_CAP;
> >
> > __ptrace_link(task, current);
> > diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
> > index 9da289c..0ef2258 100644
> > --- a/kernel/user_namespace.c
> > +++ b/kernel/user_namespace.c
> > @@ -129,6 +129,22 @@ gid_t user_ns_map_gid(struct user_namespace *to, const struct cred *cred, gid_t
> > return overflowgid;
> > }
> >
> > +int same_or_ancestor_user_ns(struct task_struct *task,
> > + struct task_struct *victim)
> > +{
> > + struct user_namespace *u1 = task_cred_xxx(task, user)->user_ns;
> > + struct user_namespace *u2 = task_cred_xxx(victim, user)->user_ns;
> > + for (;;) {
> > + if (u1 == u2)
> > + return 1;
> > + if (u1 == &init_user_ns)
> > + return 0;
> > + u1 = u1->creator->user_ns;
> > + }
> > + /* We never get here */
> > + return 0;
> > +}
> > +
> > static __init int user_namespaces_init(void)
> > {
> > user_ns_cachep = KMEM_CACHE(user_namespace, SLAB_PANIC);
> > diff --git a/security/commoncap.c b/security/commoncap.c
> > index 51fa9ec..12ff65c 100644
> > --- a/security/commoncap.c
> > +++ b/security/commoncap.c
> > @@ -130,18 +130,34 @@ int cap_settime(struct timespec *ts, struct timezone *tz)
> > * @child: The process to be accessed
> > * @mode: The mode of attachment.
> > *
> > + * If we are in the same or an ancestor user_ns and have all the target
> > + * task's capabilities, then ptrace access is allowed.
> > + * If we have the ptrace capability to the target user_ns, then ptrace
> > + * access is allowed.
> > + * Else denied.
> > + *
> > * Determine whether a process may access another, returning 0 if permission
> > * granted, -ve if denied.
> > */
> > int cap_ptrace_access_check(struct task_struct *child, unsigned int mode)
> > {
> > int ret = 0;
> > + const struct cred *cred, *tcred;
> >
> > rcu_read_lock();
> > - if (!cap_issubset(__task_cred(child)->cap_permitted,
> > - current_cred()->cap_permitted) &&
> > - !capable(CAP_SYS_PTRACE))
> > - ret = -EPERM;
> > + cred = current_cred();
> > + tcred = __task_cred(child);
> > + /*
> > + * The ancestor user_ns check may be gratuitous, as I think
> > + * we've already guaranteed that in kernel/ptrace.c.
> > + */
> > + if (same_or_ancestor_user_ns(current, child) &&
> > + cap_issubset(tcred->cap_permitted, cred->cap_permitted))
> > + goto out;
>
> I have commented on this before but I took a good hard look this time,
> and can comment more intelligently.

Thanks, Eric.

> The cap_issubset check is for the case where we don't use the
> CAP_SYS_PTRACE capability, and as such is only valid in the same user
> namespace. Furthermore capabilities really are not comparable between
> different user namespaces. So can you please replace the
> same_or_ancestor_user_ns with a simple namespace equality check.
> Having the wrong logic in here will just be confusing in the future.

I see. You're right: what's there is conceptually not quite right.
So I think we can just do:

	if (current_user_ns() == tcred->user->user_ns &&
	    cap_issubset(tcred->cap_permitted, cred->cap_permitted))
		goto out;
	if (ns_capable(tcred->user->user_ns, CAP_SYS_PTRACE))
		goto out;

This suffices since:
 - root in an ancestor ns will have CAP_SYS_PTRACE;
 - the user who created the user_ns will have CAP_SYS_PTRACE.

Any user in an ancestor ns who does not have CAP_SYS_PTRACE will
be denied.
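The revised check can be modeled in user space as follows. This sketch makes two simplifying assumptions. Capability sets are plain 64-bit masks, and the result of ns_capable(child's user_ns, CAP_SYS_PTRACE) is passed in as a flag (`ptrace_cap_in_child_ns`) instead of being computed. `cred_model`, `capset_is_subset()` and `ptrace_access_model()` are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical simplification: a capability set is a 64-bit mask. */
typedef uint64_t capset_t;

struct user_namespace { int id; };

struct cred_model {
	struct user_namespace *user_ns;
	capset_t cap_permitted;
};

/* True when every bit in @a is also set in @b (the cap_issubset idea). */
static bool capset_is_subset(capset_t a, capset_t b)
{
	return (a & ~b) == 0;
}

/*
 * Models the proposed check.  @ptrace_cap_in_child_ns stands for
 * ns_capable(child's user_ns, CAP_SYS_PTRACE), which is not modeled here.
 */
static int ptrace_access_model(const struct cred_model *cred,
			       const struct cred_model *child_cred,
			       bool ptrace_cap_in_child_ns)
{
	assert(cred && child_cred);
	/* Capability sets are only comparable within one user namespace. */
	if (cred->user_ns == child_cred->user_ns &&
	    capset_is_subset(child_cred->cap_permitted, cred->cap_permitted))
		return 0;
	if (ptrace_cap_in_child_ns)
		return 0;
	return -EPERM;
}
```

Under these rules, a tracer whose permitted set covers the child's is allowed only when both tasks are in the same user namespace. Across namespaces, only the CAP_SYS_PTRACE check can grant access.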

> Also, could you name tcred child_cred? I think that would be clearer in
> the test below.

Yes, will do, that'll be clearer. Thanks.

> > + if (ns_capable(tcred->user->user_ns, CAP_SYS_PTRACE))
> > + goto out;
> > + ret = -EPERM;
> > +out:
> > rcu_read_unlock();
> > return ret;
>
> I also find it strange that we allow CAP_SYS_PTRACE to allow us to
> ptrace processes with more capabilities than ourselves. But that is an
> entirely different issue.

I concur. I'm pretty sure it must stay, but it does feel icky.

I'll send out a new version.

thanks
-serge
...

Re: [PATCH 1/9] Add a user_namespace as creator/owner of uts_namespace [message #41774 is a reply to message #41747] Fri, 18 February 2011 16:57
Daniel Lezcano
On 02/17/2011 04:02 PM, Serge E. Hallyn wrote:
> copy_process() handles CLONE_NEWUSER before the rest of the
> namespaces. So in the case of clone(CLONE_NEWUSER|CLONE_NEWUTS)
> the new uts namespace will have the new user namespace as its
> owner. That is what we want, since we want root in that new
> userns to be able to have privilege over it.
>
> Changelog:
> Feb 15: don't set uts_ns->user_ns if we didn't create
> a new uts_ns.
>
> Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>

Acked-by: Daniel Lezcano <daniel.lezcano@free.fr>

A couple of comments.

> ---
> include/linux/utsname.h | 3 +++
> init/version.c | 2 ++
> kernel/nsproxy.c | 5 +++++
> kernel/user.c | 8 ++++++--
> kernel/utsname.c | 4 ++++
> 5 files changed, 20 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/utsname.h b/include/linux/utsname.h
> index 69f3997..85171be 100644
> --- a/include/linux/utsname.h
> +++ b/include/linux/utsname.h
> @@ -37,9 +37,12 @@ struct new_utsname {
> #include <linux/nsproxy.h>
> #include <linux/err.h>
>
> +struct user_namespace;
> +
> struct uts_namespace {
> struct kref kref;
> struct new_utsname name;
> + struct user_namespace *user_ns;
> };
> extern struct uts_namespace init_uts_ns;
>
> diff --git a/init/version.c b/init/version.c
> index adff586..97bb86f 100644
> --- a/init/version.c
> +++ b/init/version.c
> @@ -21,6 +21,7 @@ extern int version_string(LINUX_VERSION_CODE);
> int version_string(LINUX_VERSION_CODE);
> #endif
>
> +extern struct user_namespace init_user_ns;
> struct uts_namespace init_uts_ns = {
> .kref = {
> .refcount = ATOMIC_INIT(2),
> @@ -33,6 +34,7 @@ struct uts_namespace init_uts_ns = {
> .machine = UTS_MACHINE,
> .domainname = UTS_DOMAINNAME,
> },
> + .user_ns = &init_user_ns,
> };
> EXPORT_SYMBOL_GPL(init_uts_ns);
>
> diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c
> index f74e6c0..034dc2e 100644
> --- a/kernel/nsproxy.c
> +++ b/kernel/nsproxy.c
> @@ -74,6 +74,11 @@ static struct nsproxy *create_new_namespaces(unsigned long flags,
> err = PTR_ERR(new_nsp->uts_ns);
> goto out_uts;
> }
> + if (new_nsp->uts_ns != tsk->nsproxy->uts_ns) {
> + put_user_ns(new_nsp->uts_ns->user_ns);
> + new_nsp->uts_ns->user_ns = task_cred_xxx(tsk, user)->user_ns;
> + get_user_ns(new_nsp->uts_ns->user_ns);
> + }

IMO you should add a comment saying that this code assumes create_user_ns
was called beforehand (via copy_creds).
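This sketch models the ordering assumption Daniel asks to document. The task's credentials, and therefore any new user namespace, are set up before the other namespaces. Because of that, a freshly cloned uts namespace can simply be re-pointed at the task's current user namespace, while an inherited one is left alone. Everything here is a hypothetical user-space model: `new_uts_for_task()` stands in for the nsproxy hunk and `clone_uts()` for clone_uts_ns(). `get_ns()` returns its argument, which allows the one-line `x = get(...)` form.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct user_namespace { int refcount; };
struct uts_namespace { struct user_namespace *user_ns; };

static struct user_namespace *get_ns(struct user_namespace *ns)
{
	ns->refcount++;
	return ns;
}

static void put_ns(struct user_namespace *ns)
{
	assert(ns->refcount > 0);
	ns->refcount--;
}

/* Models clone_uts_ns(): the copy starts out owned by the old owner. */
static struct uts_namespace *clone_uts(struct uts_namespace *old)
{
	struct uts_namespace *ns = malloc(sizeof(*ns));

	if (!ns)
		return NULL;
	ns->user_ns = get_ns(old->user_ns);
	return ns;
}

/*
 * Models the nsproxy hunk.  Assumes the task's credentials were copied
 * first, so @task_user_ns is already the new user namespace when
 * CLONE_NEWUSER was requested.
 */
static struct uts_namespace *new_uts_for_task(struct uts_namespace *old,
					      bool clone_newuts,
					      struct user_namespace *task_user_ns)
{
	struct uts_namespace *ns = clone_newuts ? clone_uts(old) : old;

	if (ns && ns != old) {
		/* Only a freshly created uts ns changes owner. */
		put_ns(ns->user_ns);
		ns->user_ns = get_ns(task_user_ns);
	}
	return ns;
}
```

Without a new uts namespace, nothing changes. With one, the inherited owner reference is swapped for a reference on the task's user namespace.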

>
> new_nsp->ipc_ns = copy_ipcs(flags, tsk->nsproxy->ipc_ns);
> if (IS_ERR(new_nsp->ipc_ns)) {

[ ... ]

> static struct uts_namespace *create_uts_ns(void)
> {
> @@ -40,6 +41,8 @@ static struct uts_namespace *clone_uts_ns(struct uts_namespace *old_ns)
>
> down_read(&uts_sem);
> memcpy(&ns->name, &old_ns->name, sizeof(ns->name));
> + ns->user_ns = old_ns->user_ns;
> + get_user_ns(ns->user_ns);

ns->user_ns = get_user_ns(old_ns->user_ns);

Re: [PATCH 2/9] security: Make capabilities relative to the user namespace. [message #41776 is a reply to message #41744] Fri, 18 February 2011 23:44
Daniel Lezcano
On 02/17/2011 04:03 PM, Serge E. Hallyn wrote:
> - Introduce ns_capable to test for a capability in a non-default
> user namespace.
> - Teach cap_capable to handle capabilities in a non-default
> user namespace.
>
> The motivation is to get to the unprivileged creation of new
> namespaces. It looks like this gets us 90% of the way there, with
> only potential uid confusion issues left.
>
> I still need to handle getting all caps after creation but otherwise I
> think I have a good starter patch that achieves all of your goals.
>
> Changelog:
> 11/05/2010: [serge] add apparmor
> 12/14/2010: [serge] fix capabilities to created user namespaces
> Without this, if user serge creates a user_ns, he won't have
> capabilities to the user_ns he created. This is because we
> were first checking whether his effective caps had the caps
> he needed and returning -EPERM if not, and THEN checking whether
> he was the creator. Reverse those checks.
> 12/16/2010: [serge] security_real_capable needs ns argument in !security case
> 01/11/2011: [serge] add task_ns_capable helper
> 01/11/2011: [serge] add nsown_capable() helper per Bastian Blank suggestion
> 02/16/2011: [serge] fix a logic bug: the root user is always creator of
> init_user_ns, but should not always have capabilities to
> it! Fix the check in cap_capable().
>
> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
> Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
> ---

Acked-by: Daniel Lezcano <daniel.lezcano@free.fr>

Re: [PATCH 3/9] allow sethostname in a container [message #41777 is a reply to message #41748] Fri, 18 February 2011 23:46
Daniel Lezcano
On 02/17/2011 04:03 PM, Serge E. Hallyn wrote:
> Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
> ---
Acked-by: Daniel Lezcano <daniel.lezcano@free.fr>
Re: [PATCH 1/9] Add a user_namespace as creator/owner of uts_namespace [message #41778 is a reply to message #41747] Fri, 18 February 2011 23:59
akpm
On Thu, 17 Feb 2011 15:02:57 +0000
"Serge E. Hallyn" <serge@hallyn.com> wrote:

> +/*
> + * userns count is 1 for root user, 1 for init_uts_ns,
> + * and 1 for... ?
> + */

?
Re: [PATCH 2/9] security: Make capabilities relative to the user namespace. [message #41779 is a reply to message #41744] Fri, 18 February 2011 23:59
akpm
On Thu, 17 Feb 2011 15:03:06 +0000
"Serge E. Hallyn" <serge@hallyn.com> wrote:

> - Introduce ns_capable to test for a capability in a non-default
> user namespace.
> - Teach cap_capable to handle capabilities in a non-default
> user namespace.
>
> The motivation is to get to the unprivileged creation of new
> namespaces. It looks like this gets us 90% of the way there, with
> only potential uid confusion issues left.
>
> I still need to handle getting all caps after creation but otherwise I
> think I have a good starter patch that achieves all of your goals.
>
>
> ...
>
> --- a/include/linux/capability.h
> +++ b/include/linux/capability.h
> @@ -544,7 +544,7 @@ extern const kernel_cap_t __cap_init_eff_set;
> *
> * Note that this does not set PF_SUPERPRIV on the task.
> */
> -#define has_capability(t, cap) (security_real_capable((t), (cap)) == 0)
> +#define has_capability(t, cap) (security_real_capable((t), &init_user_ns, (cap)) == 0)
>
> /**
> * has_capability_noaudit - Determine if a task has a superior capability available (unaudited)
> @@ -558,9 +558,15 @@ extern const kernel_cap_t __cap_init_eff_set;
> * Note that this does not set PF_SUPERPRIV on the task.
> */
> #define has_capability_noaudit(t, cap) \
> - (security_real_capable_noaudit((t), (cap)) == 0)
> + (security_real_capable_noaudit((t), &init_user_ns, (cap)) == 0)
>
> +struct user_namespace;
> +extern struct user_namespace init_user_ns;

Two icky-should-be-written-in-C macros which reference init_user_ns,
followed by the declaration of init_user_ns and its type. Declarations
which duplicate those in other header files. It's ripe for some
upcleaning, methinks?

Also, please ensure that the forward struct declarations are all at
top-of-file (as in include/linux/security.h). Otherwise we can end up
accumulating multiple forward declarations of the same thing in the one
file.

> extern int capable(int cap);
> +extern int ns_capable(struct user_namespace *ns, int cap);
> +extern int task_ns_capable(struct task_struct *t, int cap);
> +
> +#define nsown_capable(cap) (ns_capable(current_user_ns(), (cap)))

macroitis!
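One way to address the macro comments is to turn the macros into static inline functions that return bool, with all forward struct declarations grouped at the top of the header. The sketch below uses stubbed versions of the security hook, ns_capable() and current_user_ns() (suffixed `_stub`) so that it compiles outside the kernel. It only illustrates the shape of such a conversion. The stubs and the `cap_ns` field are assumptions, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Forward declarations grouped at the top of the header. */
struct task_struct;
struct user_namespace;

/* Hypothetical stand-ins so the sketch compiles outside the kernel. */
struct user_namespace { int id; };
struct task_struct { struct user_namespace *cap_ns; };

static struct user_namespace init_user_ns;
static struct task_struct current_task_stub = { &init_user_ns };

/* Stub: 0 when @t holds capabilities in @ns, -1 otherwise. */
static int security_real_capable_stub(struct task_struct *t,
				      struct user_namespace *ns, int cap)
{
	(void)cap;
	return t->cap_ns == ns ? 0 : -1;
}

static bool ns_capable_stub(struct user_namespace *ns, int cap)
{
	return security_real_capable_stub(&current_task_stub, ns, cap) == 0;
}

static struct user_namespace *current_user_ns_stub(void)
{
	return current_task_stub.cap_ns;
}

/* The macros from the patch, rewritten as type-checked inline functions. */
static inline bool has_ns_capability(struct task_struct *t,
				     struct user_namespace *ns, int cap)
{
	return security_real_capable_stub(t, ns, cap) == 0;
}

static inline bool has_capability(struct task_struct *t, int cap)
{
	return has_ns_capability(t, &init_user_ns, cap);
}

static inline bool nsown_capable(int cap)
{
	return ns_capable_stub(current_user_ns_stub(), cap);
}
```

As functions, the arguments are type-checked and the bool return type documents the true/false contract that the kerneldoc comment describes.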

> @@ -301,15 +302,42 @@ error:
> */
> int capable(int cap)
> {
> + return ns_capable(&init_user_ns, cap);
> +}
> +EXPORT_SYMBOL(capable);
> +
> +/**
> + * ns_capable - Determine if the current task has a superior capability in effect
> + * @ns: The usernamespace we want the capability in
> + * @cap: The capability to be tested for
> + *
> + * Return true if the current task has the given superior capability currently
> + * available for use, false if not.

Actually it doesn't return true or false - it returns 1 or 0. Using a
`bool' return type would fix the comment :)

> + * This sets PF_SUPERPRIV on the task if the capability is available on the
> + * assumption that it's about to be used.
> + */
> +int ns_capable(struct user_namespace *ns, int cap)
> +{
> if (unlikely(!cap_valid(cap))) {
> printk(KERN_CRIT "capable() called with invalid cap=%u\n", cap);
> BUG();
> }
>
> - if (security_capable(current_cred(), cap) == 0) {
> + if (security_capable(ns, current_cred(), cap) == 0) {
> current->flags |= PF_SUPERPRIV;
> return 1;
> }
> return 0;
> }
> -EXPORT_SYMBOL(capable);
> +EXPORT_SYMBOL(ns_capable);
> +
> +/*
> + * does current have capability 'cap' to the user namespace of task
> + * 't'. Return true if it does, false otherwise.
> + */

Other comments were kerneldocified.

> +int task_ns_capable(struct task_struct *t, int cap)
> +{
> + return ns_capable(task_cred_xxx(t, user)->user_ns, cap);
> +}
> +EXPORT_SYMBOL(task_ns_capable);

Could return bool.

>
> ...
>
> +int cap_capable(struct task_struct *tsk, const struct cred *cred,
> + struct user_namespace *targ_ns, int cap, int audit)
> {
> - return cap_raised(cred->cap_effective, cap) ? 0 : -EPERM;
> + for (;;) {
> + /* The creator of the user namespace has all caps. */
> + if (targ_ns != &init_user_ns && targ_ns->creator == cred->user)
> + return 0;
> +
> + /* Do we have the necessary capabilities? */
> + if (targ_ns == cred->user->user_ns)
> + return cap_raised(cred->cap_effective, cap) ? 0 : -EPERM;
> +
> + /* Have we tried all of the parent namespaces? */
> + if (targ_ns == &init_user_ns)
> + return -EPERM;
> +
> + /* If you have the capability in a parent user ns you have it
> + * in all of the child user namespaces as well, so see
> + * if this process has the capability in the parent user
> + * namespace.
> + */
> + targ_ns = targ_ns->creator->user_ns;
> + }
> +
> + /* We never get here */
> + return -EPERM;

So delete the code? Or does the compiler warn? If so, it's pretty busted.

> }
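Every path through the loop already returns, which answers the question about the trailing return. The sketch below reproduces the loop in user space with no statement after it. It uses pared-down, hypothetical structs (`cred_model`, a 64-bit `cap_effective` mask, a zero-initialised `init_user_ns`). GCC does not normally warn about a missing return after a `for (;;)` loop that contains no `break`.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical, pared-down stand-ins for the kernel structures. */
struct user_namespace;

struct user_struct {
	struct user_namespace *user_ns;	/* namespace this user lives in */
};

struct user_namespace {
	struct user_struct *creator;	/* user that created this namespace */
};

struct cred_model {
	struct user_struct *user;
	uint64_t cap_effective;		/* one bit per capability */
};

static struct user_namespace init_user_ns;	/* creator unset here */

/* Same walk as the patched cap_capable(); the loop is the only exit path. */
static int cap_capable_model(const struct cred_model *cred,
			     struct user_namespace *targ_ns, int cap)
{
	assert(cap >= 0 && cap < 64);
	for (;;) {
		/* The creator of a child namespace has all caps in it. */
		if (targ_ns != &init_user_ns && targ_ns->creator == cred->user)
			return 0;
		/* In our own namespace, the effective set decides. */
		if (targ_ns == cred->user->user_ns)
			return ((cred->cap_effective >> cap) & 1) ? 0 : -EPERM;
		/* No parent namespaces left to try. */
		if (targ_ns == &init_user_ns)
			return -EPERM;
		/* Caps held in a parent namespace also apply to its children. */
		targ_ns = targ_ns->creator->user_ns;
	}
}
```

The `targ_ns != &init_user_ns` guard keeps the creator rule from applying to the initial namespace, which matches the 02/16/2011 changelog entry.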
>
> /**
>
> ...
>

Re: [PATCH 4/9] allow killing tasks in your own or child userns [message #41780 is a reply to message #41749] Fri, 18 February 2011 23:59
akpm
On Thu, 17 Feb 2011 15:03:25 +0000
"Serge E. Hallyn" <serge@hallyn.com> wrote:

> /*
> + * called with RCU read lock from check_kill_permission()
> + */
> +static inline int kill_ok_by_cred(struct task_struct *t)
> +{
> + const struct cred *cred = current_cred();
> + const struct cred *tcred = __task_cred(t);
> +
> + if (cred->user->user_ns == tcred->user->user_ns &&
> + (cred->euid == tcred->suid ||
> + cred->euid == tcred->uid ||
> + cred->uid == tcred->suid ||
> + cred->uid == tcred->uid))
> + return 1;
> +
> + if (ns_capable(tcred->user->user_ns, CAP_KILL))
> + return 1;
> +
> + return 0;
> +}

The compiler will inline this for us.
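A minimal sketch of the same predicate as an ordinary `static bool` function without the `inline` keyword: a small static helper with a single caller is normally inlined by the compiler anyway. The credential struct is a hypothetical simplification, and the result of `ns_capable(..., CAP_KILL)` is passed in as a flag rather than computed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical simplification of the fields kill_ok_by_cred() reads. */
struct kill_cred_model {
	const void *user_ns;	/* identity of the owning user namespace */
	uid_t uid, euid, suid;
};

/*
 * Plain "static bool": the compiler may still inline it.
 * @cap_kill_in_target_ns stands in for ns_capable(target ns, CAP_KILL).
 */
static bool kill_ok_by_cred_model(const struct kill_cred_model *cred,
				  const struct kill_cred_model *tcred,
				  bool cap_kill_in_target_ns)
{
	assert(cred != NULL && tcred != NULL);

	/* uid matches only count inside the same user namespace */
	if (cred->user_ns == tcred->user_ns &&
	    (cred->euid == tcred->suid ||
	     cred->euid == tcred->uid ||
	     cred->uid == tcred->suid ||
	     cred->uid == tcred->uid))
		return true;

	return cap_kill_in_target_ns;
}
```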

> +/*
> * Bad permissions for sending the signal
> * - the caller must hold the RCU read lock
> */
> static int check_kill_permission(int sig, struct siginfo *info,
> struct task_struct *t)
> {
> - const struct cred *cred, *tcred;
> struct pid *sid;
> int error;
>
> @@ -656,14 +676,8 @@ static int check_kill_permission(int sig, struct siginfo *info,
> if (error)
> return error;
>
> - cred = current_cred();
> - tcred = __task_cred(t);
> if (!same_thread_group(current, t) &&
> - (cred->euid ^ tcred->suid) &&
> - (cred->euid ^ tcred->uid) &&
> - (cred->uid ^ tcred->suid) &&
> - (cred->uid ^ tcred->uid) &&
> - !capable(CAP_KILL)) {
> + !kill_ok_by_cred(t)) {
> switch (sig) {
> case SIGCONT:
> sid = task_session(t);

Re: [PATCH 5/9] Allow ptrace from non-init user namespaces [message #41781 is a reply to message #41751] Fri, 18 February 2011 23:59
akpm
On Thu, 17 Feb 2011 15:03:33 +0000
"Serge E. Hallyn" <serge@hallyn.com> wrote:

> ptrace is allowed to tasks in the same user namespace according to
> the usual rules (i.e. the same rules as for two tasks in the init
> user namespace). ptrace is also allowed to a user namespace to
> which the current task has the CAP_SYS_PTRACE capability.
>
>
> ...
>
> --- a/include/linux/capability.h
> +++ b/include/linux/capability.h
> @@ -546,6 +546,8 @@ extern const kernel_cap_t __cap_init_eff_set;
> */
> #define has_capability(t, cap) (security_real_capable((t), &init_user_ns, (cap)) == 0)
>
> +#define has_ns_capability(t, ns, cap) (security_real_capable((t), (ns), (cap)) == 0)

macroitis.
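As a sketch of the alternative, the macro can be written as a `static inline` function, which type-checks its arguments and evaluates each one exactly once. `security_real_capable_stub()` is a hypothetical stand-in for the LSM hook so the example compiles in user space; it pretends the task holds only capability 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct task_struct;		/* opaque in this sketch */
struct user_namespace;		/* opaque in this sketch */

/* Hypothetical stub for security_real_capable(): 0 means "granted". */
static int security_real_capable_stub(struct task_struct *t,
				      struct user_namespace *ns, int cap)
{
	(void)t;
	(void)ns;
	return cap == 0 ? 0 : -1;
}

/* Function replacement for the has_ns_capability() macro. */
static inline bool has_ns_capability(struct task_struct *t,
				     struct user_namespace *ns, int cap)
{
	assert(cap >= 0);
	return security_real_capable_stub(t, ns, cap) == 0;
}
```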

> /**
> * has_capability_noaudit - Determine if a task has a superior capability available (unaudited)
> * @t: The task in question
> diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
> index faf4679..862fc59 100644
> --- a/include/linux/user_namespace.h
> +++ b/include/linux/user_namespace.h
> @@ -39,6 +39,9 @@ static inline void put_user_ns(struct user_namespace *ns)
> uid_t user_ns_map_uid(struct user_namespace *to, const struct cred *cred, uid_t uid);
> gid_t user_ns_map_gid(struct user_namespace *to, const struct cred *cred, gid_t gid);
>
> +int same_or_ancestor_user_ns(struct task_struct *task,
> + struct task_struct *victim);

bool.

> #else
>
> static inline struct user_namespace *get_user_ns(struct user_namespace *ns)
>
> ...
>
> --- a/kernel/user_namespace.c
> +++ b/kernel/user_namespace.c
> @@ -129,6 +129,22 @@ gid_t user_ns_map_gid(struct user_namespace *to, const struct cred *cred, gid_t
> return overflowgid;
> }
>
> +int same_or_ancestor_user_ns(struct task_struct *task,
> + struct task_struct *victim)
> +{
> + struct user_namespace *u1 = task_cred_xxx(task, user)->user_ns;
> + struct user_namespace *u2 = task_cred_xxx(victim, user)->user_ns;
> + for (;;) {
> + if (u1 == u2)
> + return 1;
> + if (u1 == &init_user_ns)
> + return 0;
> + u1 = u1->creator->user_ns;
> + }
> + /* We never get here */
> + return 0;

Remove?
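A sketch of the same walk with a `bool` return and no unreachable tail, using hypothetical simplified structs and taking the two namespaces directly instead of two tasks. As in the patch, the loop starts from @task's namespace and climbs through the creators, so it returns true when @victim's namespace is @task's own namespace or one of its ancestors.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-ins for the kernel structures. */
struct user_namespace;

struct user_struct {
	struct user_namespace *user_ns;
};

struct user_namespace {
	struct user_struct *creator;	/* NULL only for init_user_ns */
};

static struct user_namespace init_user_ns = { .creator = NULL };

/* u1: task's namespace, u2: victim's namespace. */
static bool same_or_ancestor_user_ns_model(struct user_namespace *u1,
					   const struct user_namespace *u2)
{
	assert(u1 != NULL && u2 != NULL);
	for (;;) {
		if (u1 == u2)
			return true;
		if (u1 == &init_user_ns)
			return false;
		/* climb to the namespace of u1's creator */
		u1 = u1->creator->user_ns;
	}
}
```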

> +}
> +
> static __init int user_namespaces_init(void)
> {
> user_ns_cachep = KMEM_CACHE(user_namespace, SLAB_PANIC);
>
> ...
>
> int cap_ptrace_access_check(struct task_struct *child, unsigned int mode)
> {
> int ret = 0;
> + const struct cred *cred, *tcred;
>
> rcu_read_lock();
> - if (!cap_issubset(__task_cred(child)->cap_permitted,
> - current_cred()->cap_permitted) &&
> - !capable(CAP_SYS_PTRACE))
> - ret = -EPERM;
> + cred = current_cred();
> + tcred = __task_cred(child);
> + /*
> + * The ancestor user_ns check may be gratuitous, as I think
> + * we've already guaranteed that in kernel/ptrace.c.
> + */

?

> + if (same_or_ancestor_user_ns(current, child) &&
> + cap_issubset(tcred->cap_permitted, cred->cap_permitted))
> + goto out;
> + if (ns_capable(tcred->user->user_ns, CAP_SYS_PTRACE))
> + goto out;
> + ret = -EPERM;
> +out:
> rcu_read_unlock();
> return ret;
> }
>
> ...
>
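The decision in the new cap_ptrace_access_check() can be modelled with plain bitmasks. In this hypothetical sketch, capability sets are 64-bit masks, `cap_issubset_model()` mirrors the subset test, and both the namespace relation and the `ns_capable(..., CAP_SYS_PTRACE)` result are passed in as flags.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* True if every capability in @a is also in @set. */
static bool cap_issubset_model(uint64_t a, uint64_t set)
{
	return (a & ~set) == 0;
}

/*
 * @ns_related: stands in for same_or_ancestor_user_ns(current, child)
 * @ptrace_cap: stands in for ns_capable(child's user_ns, CAP_SYS_PTRACE)
 */
static int ptrace_access_model(bool ns_related, uint64_t tracee_permitted,
			       uint64_t tracer_permitted, bool ptrace_cap)
{
	/* Ordinary rule: the tracee may not hold caps the tracer lacks. */
	if (ns_related && cap_issubset_model(tracee_permitted, tracer_permitted))
		return 0;
	/* Otherwise CAP_SYS_PTRACE over the tracee's namespace is required. */
	if (ptrace_cap)
		return 0;
	return -EPERM;
}
```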

Re: [PATCH 7/9] add a user namespace owner of ipc ns [message #41782 is a reply to message #41746] Fri, 18 February 2011 23:59
akpm
On Thu, 17 Feb 2011 15:03:49 +0000
"Serge E. Hallyn" <serge@hallyn.com> wrote:

>
> ...
>
> --- a/include/linux/ipc_namespace.h
> +++ b/include/linux/ipc_namespace.h
> @@ -24,6 +24,7 @@ struct ipc_ids {
> struct idr ipcs_idr;
> };
>
> +struct user_namespace;

Move to top of file.

> struct ipc_namespace {
> atomic_t count;
> struct ipc_ids ids[3];
> @@ -56,6 +57,8 @@ struct ipc_namespace {
> unsigned int mq_msg_max; /* initialized to DFLT_MSGMAX */
> unsigned int mq_msgsize_max; /* initialized to DFLT_MSGSIZEMAX */
>
> + /* user_ns which owns the ipc ns */
> + struct user_namespace *user_ns;
> };
>
> extern struct ipc_namespace init_ipc_ns;
> diff --git a/ipc/msgutil.c b/ipc/msgutil.c
> index f095ee2..d91ff4b 100644
> --- a/ipc/msgutil.c
> +++ b/ipc/msgutil.c
> @@ -20,6 +20,8 @@
>
> DEFINE_SPINLOCK(mq_lock);
>
> +extern struct user_namespace init_user_ns;

Should be declared in .h, not in .c.

> /*
> * The next 2 defines are here bc this is the only file
> * compiled when either CONFIG_SYSVIPC and CONFIG_POSIX_MQUEUE
>
> ...
>

Re: [PATCH 6/9] user namespaces: convert all capable checks in kernel/sys.c [message #41783 is a reply to message #41750] Fri, 18 February 2011 23:59
akpm
On Thu, 17 Feb 2011 15:03:42 +0000
"Serge E. Hallyn" <serge@hallyn.com> wrote:

> This allows setuid/setgid in containers. It also fixes some
> corner cases where kernel logic foregoes capability checks when
> uids are equivalent. The latter will need to be done throughout
> the whole kernel.
>
>
> ...
>
> --- a/kernel/sys.c
> +++ b/kernel/sys.c
> @@ -118,17 +118,29 @@ EXPORT_SYMBOL(cad_pid);
>
> void (*pm_power_off_prepare)(void);
>
> +/* called with rcu_read_lock, creds are safe */
> +static inline int set_one_prio_perm(struct task_struct *p)
> +{
> + const struct cred *cred = current_cred(), *pcred = __task_cred(p);
> +
> + if (pcred->user->user_ns == cred->user->user_ns &&
> + (pcred->uid == cred->euid ||
> + pcred->euid == cred->euid))
> + return 1;
> + if (ns_capable(pcred->user->user_ns, CAP_SYS_NICE))
> + return 1;
> + return 0;
> +}

uninline. Document return value?
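A sketch of the helper uninlined, returning `bool`, with a kernel-doc comment that spells out the return value. The struct and the `_model` name are hypothetical simplifications, and `cap_sys_nice` stands in for `ns_capable(pcred->user->user_ns, CAP_SYS_NICE)`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical simplification of the credential fields used here. */
struct prio_cred_model {
	const void *user_ns;	/* identity of the owning user namespace */
	uid_t uid, euid;
};

/**
 * set_one_prio_perm_model - may the caller change the target's priority?
 * @cred: credentials of the calling task
 * @pcred: credentials of the target task
 * @cap_sys_nice: stands in for ns_capable(target user_ns, CAP_SYS_NICE)
 *
 * Return: true if the caller's euid matches the target's uid or euid
 * within the same user namespace, or if the caller holds CAP_SYS_NICE
 * over the target's namespace; false otherwise.
 */
static bool set_one_prio_perm_model(const struct prio_cred_model *cred,
				    const struct prio_cred_model *pcred,
				    bool cap_sys_nice)
{
	assert(cred != NULL && pcred != NULL);
	if (pcred->user_ns == cred->user_ns &&
	    (pcred->uid == cred->euid || pcred->euid == cred->euid))
		return true;
	return cap_sys_nice;
}
```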

>
> ...
>

Re: [PATCH 9/9] userns: check user namespace for task->file uid equivalence checks [message #41784 is a reply to message #41745] Fri, 18 February 2011 23:59
akpm
On Thu, 17 Feb 2011 15:04:07 +0000
"Serge E. Hallyn" <serge@hallyn.com> wrote:

> Cheat for now and say all files belong to init_user_ns. Next
> step will be to let superblocks belong to a user_ns, and derive
> inode_userns(inode) from inode->i_sb->s_user_ns. Finally we'll
> introduce more flexible arrangements.
>
>
> ...
>
> +
> +/*
> + * return 1 if current either has CAP_FOWNER to the
> + * file, or owns the file.
> + */
> +int is_owner_or_cap(const struct inode *inode)
> +{
> + struct user_namespace *ns = inode_userns(inode);
> +
> + if (current_user_ns() == ns && current_fsuid() == inode->i_uid)
> + return 1;
> + if (ns_capable(ns, CAP_FOWNER))
> + return 1;
> + return 0;
> +}

bool?

> +EXPORT_SYMBOL(is_owner_or_cap);

There's a fairly well adhered to convention that global symbols (and
often static symbols) have a prefix which identifies the subsystem to
which they belong. This patchset rather scorns that convention.

Most of these identifiers are pretty obviously from the capability
subsystem, but still...
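Taking both suggestions together, a sketch might return `bool` and carry an `inode_` prefix. The name `inode_owner_or_capable_model()` and the simplified inode struct are illustrative assumptions; `cap_fowner` stands in for `ns_capable(ns, CAP_FOWNER)`, and the caller's namespace and fsuid are passed in rather than read from current.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical simplification: only the fields the check reads. */
struct inode_model {
	const void *user_ns;	/* stands in for inode_userns(inode) */
	uid_t i_uid;
};

/* Returns true if the caller owns @inode or holds CAP_FOWNER over it. */
static bool inode_owner_or_capable_model(const struct inode_model *inode,
					 const void *caller_ns,
					 uid_t caller_fsuid, bool cap_fowner)
{
	assert(inode != NULL);
	/* fsuid equality only counts inside the inode's own namespace */
	if (caller_ns == inode->user_ns && caller_fsuid == inode->i_uid)
		return true;
	return cap_fowner;
}
```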

>
> ...
>

Re: [PATCH 6/9] user namespaces: convert all capable checks in kernel/sys.c [message #41785 is a reply to message #41750] Sat, 19 February 2011 00:01
akpm
On Thu, 17 Feb 2011 15:03:42 +0000
"Serge E. Hallyn" <serge@hallyn.com> wrote:

> @@ -1177,8 +1189,11 @@ SYSCALL_DEFINE2(sethostname, char __user *, name, int, len)
> int errno;
> char tmp[__NEW_UTS_LEN];
>
> - if (!ns_capable(current->nsproxy->uts_ns->user_ns, CAP_SYS_ADMIN))
> + if (!ns_capable(current->nsproxy->uts_ns->user_ns, CAP_SYS_ADMIN)) {
> + printk(KERN_NOTICE "%s: did not have CAP_SYS_ADMIN\n", __func__);
> return -EPERM;
> + }
> + printk(KERN_NOTICE "%s: did have CAP_SYS_ADMIN\n", __func__);
> if (len < 0 || len > __NEW_UTS_LEN)
> return -EINVAL;
> down_write(&uts_sem);

Left over debugging printks?

Re: [PATCH 4/9] allow killing tasks in your own or child userns [message #41787 is a reply to message #41749] Sat, 19 February 2011 10:55
Daniel Lezcano
On 02/17/2011 04:03 PM, Serge E. Hallyn wrote:
> Changelog:
> Dec 8: Fixed bug in my check_kill_permission pointed out by
> Eric Biederman.
> Dec 13: Apply Eric's suggestion to pass target task into kill_ok_by_cred()
> for clarity
> Dec 31: address comment by Eric Biederman:
> don't need cred/tcred in check_kill_permission.
> Jan 1: use const cred struct.
> Jan 11: Per Bastian Blank's advice, clean up kill_ok_by_cred().
> Feb 16: kill_ok_by_cred: fix bad parentheses
>
> Signed-off-by: Serge E. Hallyn<serge.hallyn@canonical.com>

Acked-by: Daniel Lezcano <daniel.lezcano@free.fr>
Re: [PATCH 5/9] Allow ptrace from non-init user namespaces [message #41788 is a reply to message #41751] Sat, 19 February 2011 17:49
Daniel Lezcano
On 02/17/2011 04:03 PM, Serge E. Hallyn wrote:
> ptrace is allowed to tasks in the same user namespace according to
> the usual rules (i.e. the same rules as for two tasks in the init
> user namespace). ptrace is also allowed to a user namespace to
> which the current task has the CAP_SYS_PTRACE capability.
>
> Changelog:
> Dec 31: Address feedback by Eric:
> . Correct ptrace uid check
> . Rename may_ptrace_ns to ptrace_capable
> . Also fix the cap_ptrace checks.
> Jan 1: Use const cred struct
> Jan 11: use task_ns_capable() in place of ptrace_capable().
>
> Signed-off-by: Serge E. Hallyn<serge.hallyn@canonical.com>
Acked-by: Daniel Lezcano <daniel.lezcano@free.fr>

Re: [PATCH 6/9] user namespaces: convert all capable checks in kernel/sys.c [message #41789 is a reply to message #41750] Sat, 19 February 2011 17:52
Daniel Lezcano
On 02/17/2011 04:03 PM, Serge E. Hallyn wrote:
> This allows setuid/setgid in containers. It also fixes some
> corner cases where kernel logic foregoes capability checks when
> uids are equivalent. The latter will need to be done throughout
> the whole kernel.
>
> Changelog:
> Jan 11: Use nsown_capable() as suggested by Bastian Blank.
> Jan 11: Fix logic errors in uid checks pointed out by Bastian.
> Feb 15: allow prlimit to current (was regression in previous version)
>
> Signed-off-by: Serge E. Hallyn<serge.hallyn@canonical.com>

Acked-by: Daniel Lezcano <daniel.lezcano@free.fr>

>
> - if (!ns_capable(current->nsproxy->uts_ns->user_ns, CAP_SYS_ADMIN))
> + if (!ns_capable(current->nsproxy->uts_ns->user_ns, CAP_SYS_ADMIN)) {
> + printk(KERN_NOTICE "%s: did not have CAP_SYS_ADMIN\n", __func__);
> return -EPERM;
> + }
> + printk(KERN_NOTICE "%s: did have CAP_SYS_ADMIN\n", __func__);

A couple of debugging printks were left here.


Re: [PATCH 7/9] add a user namespace owner of ipc ns [message #41790 is a reply to message #41746] Sat, 19 February 2011 17:57
Daniel Lezcano
On 02/17/2011 04:03 PM, Serge E. Hallyn wrote:
> Changelog:
> Feb 15: Don't set new ipc->user_ns if we didn't create a new
> ipc_ns.
>
> Signed-off-by: Serge E. Hallyn<serge.hallyn@canonical.com>
> ---

Acked-by: Daniel Lezcano <daniel.lezcano@free.fr>


[ ... ]

> + ns->user_ns = old_ns->user_ns;
> + get_user_ns(ns->user_ns);

A trivial simplification:

ns->user_ns = get_user_ns(old_ns->user_ns);
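The one-liner works because get_user_ns() takes a reference and hands back the pointer it was given, as the `static inline struct user_namespace *get_user_ns(...)` declaration quoted in patch 5/9 indicates. A user-space sketch, with a hypothetical plain counter in place of the kernel's reference count and hypothetical `_model` types:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: a plain counter instead of the kernel's refcount. */
struct user_namespace_model {
	int count;
};

struct ipc_namespace_model {
	struct user_namespace_model *user_ns;
};

/* Take a reference and return the same pointer, so calls can be chained. */
static inline struct user_namespace_model *
get_user_ns_model(struct user_namespace_model *ns)
{
	if (ns)
		ns->count++;
	return ns;
}

/* Copy the owning user namespace from @old_ns into @ns in one statement. */
static void inherit_user_ns(struct ipc_namespace_model *ns,
			    const struct ipc_namespace_model *old_ns)
{
	assert(ns != NULL && old_ns != NULL);
	ns->user_ns = get_user_ns_model(old_ns->user_ns);
}
```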


Re: [PATCH 8/9] user namespaces: convert several capable() calls [message #41791 is a reply to message #41752] Sat, 19 February 2011 19:07
Daniel Lezcano
On 02/17/2011 04:03 PM, Serge E. Hallyn wrote:
> CAP_IPC_OWNER and CAP_IPC_LOCK can be checked against current_user_ns(),
> because the resource comes from current's own ipc namespace.
>
> setuid/setgid are to uids in own namespace, so again checks can be
> against current_user_ns().
>
> Changelog:
> Jan 11: Use task_ns_capable() in place of sched_capable().
> Jan 11: Use nsown_capable() as suggested by Bastian Blank.
> Jan 11: Clarify (hopefully) some logic in futex and sched.c
> Feb 15: use ns_capable for ipc, not nsown_capable
>
> Signed-off-by: Serge E. Hallyn<serge.hallyn@canonical.com>
> ---

Acked-by: Daniel Lezcano <daniel.lezcano@free.fr>
Re: [PATCH 9/9] userns: check user namespace for task->file uid equivalence checks [message #41792 is a reply to message #41745] Sat, 19 February 2011 19:22
Daniel Lezcano
On 02/17/2011 04:04 PM, Serge E. Hallyn wrote:
> Cheat for now and say all files belong to init_user_ns. Next
> step will be to let superblocks belong to a user_ns, and derive
> inode_userns(inode) from inode->i_sb->s_user_ns. Finally we'll
> introduce more flexible arrangements.
>
> Changelog:
> Feb 15: make is_owner_or_cap take const struct inode
>
> Signed-off-by: Serge E. Hallyn<serge.hallyn@canonical.com>
> ---

Acked-by: Daniel Lezcano <daniel.lezcano@free.fr>