OpenVZ Forum


Re: containers access control 'roadmap' [message #19955 is a reply to message #19949] Thu, 06 September 2007 21:10
serue
Quoting Herbert Poetzl (herbert@13thfloor.at):
> On Thu, Sep 06, 2007 at 01:26:11PM -0500, Serge E. Hallyn wrote:
> > Quoting Herbert Poetzl (herbert@13thfloor.at):
> > > On Thu, Sep 06, 2007 at 11:55:34AM -0500, Serge E. Hallyn wrote:
> > > > Roadmap is a bit of an exaggeration, but here is a list of the
> > > > next bit of work i expect to do relating to containers and access
> > > > control. The list gets more vague toward the end, with the intent
> > > > of going far enough ahead to show what the final result would
> > > > hopefully look like.
> > > >
> > > > Please review and tell me where I'm unclear, inconsistent,
> > > > glossing over important details, or completely on drugs.
> > 
> > Thanks for looking this over, Herbert.
> > 
> > > > 1. introduce CAP_HOST_ADMIN
> > > > 
> > > > 	acts like a mask.  If set, all capabilities apply across
> > > > 	namespaces.
> > > > 
> > > > 	is that ok, or do we insist on duplicates for all caps?
> > > > 
> > > > 	brings us into 64-bit caps, so associated patches come
> > > > 	along
> > > > 
> > > > 	As an example, CAP_DAC_OVERRIDE by itself will mean within
> > > > 	the same user namespace, while CAP_DAC_OVERRIDE|CAP_HOST_ADMIN
> > > > 	will override userns equivalence checks.
> > > 
> > > what does that mean? 
> > > guest spaces need to be limited to a certain (mutable)
> > > subset of capabilities to work properly, please explain
> > 
> > (note that the mutable subset of caps for guest spaces is what item
> > #2, the per-process cap_bset, implements)
> 
> how is per-process supposed to handle things like
> suid-root properly?

Simple.  It's inherited at fork, and you can take caps out but not put
them back in.
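That rule can be sketched in a few lines (a Python sketch of the proposed semantics, not kernel code; the Task class and bit values are purely illustrative):

```python
# Illustrative sketch of a per-process capability bounding set that is
# inherited at fork and can only ever shrink: bits can be removed, never
# re-added, so even a suid-root exec cannot regain a dropped capability.
CAP_MKNOD = 1 << 27
CAP_HOST_ADMIN = 1 << 32   # hypothetical new capability

class Task:
    def __init__(self, cap_bset):
        self.cap_bset = cap_bset

    def fork(self):
        # child starts with an exact copy of the parent's bounding set
        return Task(self.cap_bset)

    def drop_from_bset(self, cap):
        # removal is allowed ...
        self.cap_bset &= ~cap

    def add_to_bset(self, cap):
        # ... but putting a capability back is not
        raise PermissionError("cap_bset bits can only be cleared")

host = Task(CAP_MKNOD | CAP_HOST_ADMIN)
guest = host.fork()
guest.drop_from_bset(CAP_HOST_ADMIN)
assert guest.cap_bset == CAP_MKNOD
assert host.cap_bset == CAP_MKNOD | CAP_HOST_ADMIN   # parent unaffected
```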

> > > how this relates?
> > 
> > capabilities will give you privileged access within your own
> > container. Also having CAP_HOST_ADMIN will mean that the capabilities
> > you have can also be used against objects in other containers.
> 
> also, please make sure that you extend the capability
> set to 64 bit first, as this would be using up the
> last capability (which is not a good idea IMHO)

Of course - unless you talk me out of defining the capability :)
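The arithmetic behind Herbert's point is simple (a Python sketch; the exact bit positions of existing capabilities are beside the point, only that the 32-bit mask is nearly full):

```python
# Why a new capability pushes the set past 32 bits: the legacy capability
# mask is 32 bits wide and nearly exhausted, so CAP_HOST_ADMIN would have
# to live in an extended 64-bit mask.  Bit values are illustrative.
CAP_DAC_OVERRIDE = 1 << 1
CAP_HOST_ADMIN = 1 << 32       # only representable once caps are 64-bit

caps = CAP_DAC_OVERRIDE | CAP_HOST_ADMIN
assert caps > 0xFFFFFFFF                        # no longer fits in 32 bits
assert caps & 0xFFFFFFFF == CAP_DAC_OVERRIDE    # low word unchanged for old code
```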

> > Now maybe you prefer a model where a "container" is owned by some
> > user in some namespaces. All capabilities apply purely within their
> > own namespace, and a container owner has full rights to the owned
> > containers. That makes container vms more like a qemu vm.
> > 
> > Or maybe I just punt this for now altogether, and we address
> > cross-namespace privileged access if/when we really need it.
> > 
> > > > 2. introduce per-process cap_bset
> > > > 	
> > > > 	Idea is you can start a container with cap-bset not containing
> > > > 	CAP_HOST_ADMIN, for instance.
> > > > 
> > > > 	As namespaces are fleshed out and proper behavior for
> > > > 	cross-namespace access is figured out (see step 7) I
> > > > 	expect behavior under !CAP_HOST_ADMIN with certain
> > > > 	capabilities will change.  I.e. if we get a device
> > > > 	namespace, CAP_MKNOD will be different from
> > > > 	CAP_HOST_ADMIN|CAP_MKNOD, and people will want to
> > > > 	start keeping CAP_MKNOD in their container cap_bsets.
> > > 
> > > doesn't sound like a good idea to me, ignoring caps
> > > or disallowing them seems okay, but changing the meaning
> > > between caps (depending on host or guest space) seems
> > > just wrong ...
> > 
> > Ok, your 'doesn't sound like a good idea' is to my blabbing though,
> > not to the per-process cap_bset. Right? So you're again objecting
> > to CAP_HOST_ADMIN, item #1?
> 
> no, actually it is to the idea of having capabilities
> which mean different things depending on whether they
> are available on the host or inside a guest (because
> that would mean handling them differently in userspace
> software and for administration)

Whoa - no, I am not saying caps would be handled differently based on
whether you're in a container or not.  In fact, from what I've introduced
there is no such thing as a 'host' or 'admin' container.

Rather, the single capability, CAP_HOST_ADMIN, just means that your
capabilities will also apply to actions on objects in namespaces other
than your own.  If you don't have CAP_HOST_ADMIN, then capabilities will
only give you privileged status with respect to objects in your own
namespaces.

So in theory you could have a child container where admin has
CAP_HOST_ADMIN, while the initial set of namespaces, or what some might
be tempted to call the 'host container', has taken CAP_HOST_ADMIN
out of its cap_bset (after spawning off the child container with the
CAP_HOST_ADMIN bit in its cap_bset).

Is that clearer?  Is it less objectionable to you?
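Concretely, the intended check could look something like this (a Python sketch of the proposed semantics; `capable`, the task dicts, and namespace objects are all illustrative stand-ins, not existing kernel interfaces):

```python
# Sketch of the proposed CAP_HOST_ADMIN semantics: a capability grants
# privilege over an object in a foreign user namespace only when
# CAP_HOST_ADMIN is also held.  All names and bit values illustrative.
CAP_DAC_OVERRIDE = 1 << 1
CAP_HOST_ADMIN = 1 << 32

def capable(task, obj_userns, cap):
    if not task["caps"] & cap:
        return False                   # capability not held at all
    if task["userns"] is obj_userns:
        return True                    # same namespace: cap applies as today
    # cross-namespace: the cap only counts together with CAP_HOST_ADMIN
    return bool(task["caps"] & CAP_HOST_ADMIN)

ns_init, ns_guest = object(), object()
admin = {"userns": ns_init, "caps": CAP_DAC_OVERRIDE | CAP_HOST_ADMIN}
guest_root = {"userns": ns_guest, "caps": CAP_DAC_OVERRIDE}

assert capable(guest_root, ns_guest, CAP_DAC_OVERRIDE)    # own namespace
assert not capable(guest_root, ns_init, CAP_DAC_OVERRIDE) # foreign namespace
assert capable(admin, ns_guest, CAP_DAC_OVERRIDE)         # with CAP_HOST_ADMIN
```

Note that nothing here distinguishes a 'host' task from a 'guest' task; the only asymmetry is which tasks still hold CAP_HOST_ADMIN.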

> > > > 3. audit driver code etc for any and all uid==0 checks.  Fix those
> > > >    immediately to take user namespaces into account.
> > > 
> > > okay, sounds good ...
> > 
> > Ok maybe I should make that '#1' and get going as it's the least
> > controversial :)
> > 
> > Though I think I still prefer to start with #2.
> > 
> > > > 4. introduce inode->user_ns, as per my previous userns patchset from
> > > >    April (I guess posted in June, according to:
> > > >    https://lists.linux-foundation.org/pipermail/containers/2007-June/005342.html)
> > > > 
> > > > 	For now, enforce roughly the following access checks when
> > > > 	inode->user_ns is set:
> > > > 
> > > > 	if capable(CAP_HOST_ADMIN|CAP_DAC_OVERRIDE)
> > > > 		allow
> > > > 	if current->userns==inode->userns {
> > > > 		if capable(CAP_DAC_OVERRIDE)
> > > > 			allow
> > > > 		if current->uid==inode->i_uid
> > > > 			allow as owner
> > > > 		inode->i_uid is in current's keychain
> > > > 			allow as owner
> > > > 	inode->i_gid in current's groups
> > > > 			allow as group
> > > > 	}
> > > > 	treat as user 'other' (i.e. usually read-only access)
> > > 
> > > what about inodes belonging to several contexts?
> > 
> > There's no such thing in the way I was envisioning it.
> > 
> > An inode belongs to one context.  A user can belong to several.
> 
> well, at least in Linux-VServer, inodes are shared
> on a per inode basis between guests, which drastically
> reduces the memory and disk overhead if you have more
> than one guest of similar nature ...

And I believe the same can be done with what I am suggesting.

> > > (which is a major resource conserving feature of OS
> > > level isolation)
> > 
> > Sure. Let's say you want to share /usr among many servers. 
> > It exists in the host user namespace.
> > In guest user namespaces, anyone, including root, will have
> > access to it as though they were user 'other', i.e.
> > if a directory has 751 perms, you'll get '1'.
> 
> no,

Well, yes: I'm describing my proposal :)

> the inodes are shared in a way that the guest has
> (almost) full control over them, including copy on
> write functionality when inode contents or properties
> change (see unification for details)

In my proposal, the assignment of values to inode->userns, and
enforcement, is left to the filesystem.  So a filesystem can be written
that understands and interprets global user ids, or, to mimic what you
have, a simple stackable cow filesystem could be used.
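That "simple stackable cow filesystem" could behave roughly like this toy overlay (a Python sketch, purely illustrative; it ignores inode metadata and ownership and only shows contents being shared until written):

```python
# Toy copy-on-write overlay: guests share the host's inode contents until
# they write, at which point they get a private copy.  This mimics the
# resource-sharing Herbert describes without inodes belonging to several
# contexts: the shared lower layer stays in one namespace.
class CowOverlay:
    def __init__(self, shared):
        self.shared = shared    # read-only lower layer, shared by all guests
        self.private = {}       # this guest's upper layer

    def read(self, path):
        return self.private.get(path, self.shared.get(path))

    def write(self, path, data):
        self.private[path] = data    # break sharing only for this guest

host_usr = {"/usr/bin/ls": b"ELF..."}
guest_a = CowOverlay(host_usr)
guest_b = CowOverlay(host_usr)

guest_a.write("/usr/bin/ls", b"patched")
assert guest_a.read("/usr/bin/ls") == b"patched"   # private copy
assert guest_b.read("/usr/bin/ls") == b"ELF..."    # still shared
```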

> i.e. for us, the ability to share inodes between
> completely different process _and_ user spaces is
> essential because of resource consumption.
> 
> > > > 5. Then comes the piece where users can get credentials 
> > > > as users in other namespaces to store in their keychain.
> > > 
> > > does that make sense? wouldn't it be better to have
> > > the keychains 'per context'?
> > 
> > Either you misunderstood me, or I misunderstand you.
> > 
> > What I am saying is that there is a 'uid' keychain, which 
> > holds things like (usernamespace 3, uid 5), meaning that 
> > even though I am uid 1000 in usernamespace 1, I am allowed 
> > access to usernamespace 3 as though I were uid 5.
> > 
> > I expect the two common use cases of this to be:
> > 
> > 	1. uid 5 on the host system created a virtual server, 
> >          and gives himself a (usernamespace 2, uid 0) key 
> >          so he is root in the virtual server without having 
> >          to enter it.  (Meaning he can signal all processes, 
> >	   access all files, etc)
> > 
> > 	2. uid 3000 on the host system is given (usernamespace 
> >	   2, uid 1001) in a virtual server so he can access 
> >	   uid 1001's files in the virtual server which has 
> >	   usernamespace 2.
> 
> do you mean files here or actually inodes or both?
> why shouldn't the host context be able to access
> any of them without acquiring any credentials?

Because there is no 'host context', just an initial namespace.
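The keychain lookup in the two use cases above might reduce to something like this (a Python sketch; the (usernamespace, uid) key representation and all names are illustrative):

```python
# Sketch of the proposed 'uid' keychain: a task has its native
# (usernamespace, uid) identity plus optional keys granting it a uid in
# other namespaces.  Mirrors use cases 1 and 2 above.
def effective_uid(task, target_ns):
    if task["userns"] == target_ns:
        return task["uid"]                  # native identity
    return task["keychain"].get(target_ns)  # acquired credential, or None

# use case 1: uid 5 in namespace 1 holds a (usernamespace 2, uid 0) key,
# so he is root in the virtual server without entering it
owner = {"userns": 1, "uid": 5, "keychain": {2: 0}}
assert effective_uid(owner, 1) == 5
assert effective_uid(owner, 2) == 0

# a task with no key for namespace 2 has no identity there at all,
# and falls through to being treated as user 'other'
other = {"userns": 1, "uid": 3000, "keychain": {}}
assert effective_uid(other, 2) is None
```

Note there is no special case for an initial namespace here: it gets cross-namespace access only via keys (or CAP_HOST_ADMIN), exactly like any other namespace.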

> > > > 6. enforce other 
...

 