			| Re: [patch 2/6] [Network namespace] Network device sharing by view [message #4035] | Tue, 27 June 2006 09:38  |  
			| 
				
				
					|  Andrey Savochkin Messages: 47
 Registered: December 2005
 | Member |  |  |  
	| On Tue, Jun 27, 2006 at 11:34:36AM +0200, Daniel Lezcano wrote:
 > Andrey Savochkin wrote:
 > > Daniel,
 > >
 > > On Mon, Jun 26, 2006 at 05:49:41PM +0200, Daniel Lezcano wrote:
 > >
 > >>>Then you lose the ability for each namespace to have its own routing entries.
 > >>>Which implies that you'll have difficulties with devices that should exist
 > >>>and be visible in one namespace only (like tunnels), as they require IP
 > >>>addresses and route.
 > >>
 > >>I mean instead of having the route tables private to the namespace, the
 > >>routes have the information to which namespace they are associated.
 > >
 > >
 > > I think I understand what you're talking about: you want to make routing
 > > responsible for determining destination namespace ID in addition to route
 > > type (local, unicast etc), nexthop information, and so on.  Right?
 >
 > Yes.
 >
 > >
 > > My point is that if you make namespace tagging at routing time, and
 > > your packets are being routed only once, you lose the ability
 > > to have separate routing tables in each namespace.
 >
 > Right. What is the advantage of having separate routing tables?
 
 Routing is everything.
 For example, I want namespaces to have their private tunnel devices.
 It means that namespaces should be allowed to have private routes of local type,
 private default routes, and so on...
 
 Andrey
 |  
	|  |  |  
	| 
		
			| Re: [patch 2/6] [Network namespace] Network device sharing by view [message #4039 is a reply to message #4035] | Tue, 27 June 2006 11:21   |  
			| 
				
				
					|  Daniel Lezcano Messages: 417
 Registered: June 2006
 | Senior Member |  |  |  
	| >>>My point is that if you make namespace tagging at routing time, and
 >>>your packets are being routed only once, you lose the ability
 >>>to have separate routing tables in each namespace.
 >>
 >>Right. What is the advantage of having separate routing tables?
 >
 >
 > Routing is everything.
 > For example, I want namespaces to have their private tunnel devices.
 > It means that namespaces should be allowed to have private routes of local type,
 > private default routes, and so on...
 >
 
 Ok, we are talking about the same things. We do it only in a different way:
 
 * separate routing table:
     namespace
         |
         \--- route_tables
                  |
                  \--- routes

 * tagged routing table:
     route_tables
         |
         \--- routes
                  |
                  \--- namespace
 
 When using routes private to the namespace, the overall logic of the IP
 stack is not changed; it only manipulates different variables. It is
 cleaner than tagging the routes, for the reasons mentioned by Eric.
 
 When using route tagging, the logic is changed because when doing lookup
 on the routes table which is global, the namespace is used to match the
 route and make it visible.
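
To make the two layouts above concrete, here is a minimal sketch in Python. It is only a toy model, not kernel code: the names `separate`, `tagged`, `lookup_separate` and `lookup_tagged` are made up for illustration, and lookup is an exact match plus a default route rather than real longest-prefix matching.

```python
# Separate tables: each namespace owns its own routing table, and the
# lookup code itself never looks at a namespace tag.
separate = {
    "ns1": {"10.0.0.1": "tun0", "default": "eth0"},
    "ns2": {"default": "eth0"},
}


def lookup_separate(ns, dst):
    table = separate[ns]
    return table.get(dst, table.get("default"))


# Tagged table: one global table; every route carries the namespace it
# belongs to, and the lookup filters on that tag.
tagged = [
    ("ns1", "10.0.0.1", "tun0"),
    ("ns1", "default", "eth0"),
    ("ns2", "default", "eth0"),
]


def lookup_tagged(ns, dst):
    fallback = None
    for tag, prefix, dev in tagged:
        if tag != ns:
            continue  # route is invisible to this namespace
        if prefix == dst:
            return dev
        if prefix == "default":
            fallback = dev
    return fallback
```

In this toy model both functions return the same device for the same (namespace, destination) pair; the difference is where the namespace enters: as the owner of the table, or as a match key inside a shared table.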
 
 I use the second method, because I think it is more efficient and reduces
 the overhead. But the isolation is minimalist and only aims to keep an
 application from using resources outside of its container (aka namespace),
 without taking care of the system. For example, I didn't take care of
 network devices, because as far as I can see, I can't imagine an administrator
 wanting to change a network device name while there are hundreds of
 containers running. Concerning tunnel devices, for example, they should
 be created inside the container.
 
 I think the private network resources method is more elegant and involves
 more network resources, but there is probably a significant overhead
 and some difficulties in having a __lightweight__ container (aka application
 container), making nfs work well, etc. I did some tests with tbench
 and the loopback: with the private namespace there is roughly an
 overhead of 4 % without the isolation, whereas with the tagging method
 there is 1 % with the isolation.
 
 The network namespace aims at isolation for now, but containers
 based on the namespaces will probably need checkpoint/restart and
 migration ability. Migration is needed not only for servers but for
 HPC jobs too.
 
 So I don't know what level of isolation/virtualization is really needed
 by users, or what should be acceptable (strong isolation and overhead /
 weak isolation and efficiency). I don't know whether people wanting strong
 isolation would not prefer Xen (clearly with much more overhead than your
 patches ;) )
 
 
 
 Regards
 -- Daniel
 |  
	|  |  |  
	| 
		
			| Re: [patch 2/6] [Network namespace] Network device sharing by view [message #4040 is a reply to message #4039] | Tue, 27 June 2006 11:52   |  
			| 
				
				
					|  ebiederm Messages: 1354
 Registered: February 2006
 | Senior Member |  |  |  
	| Daniel Lezcano <dlezcano@fr.ibm.com> writes: 
 >>>>My point is that if you make namespace tagging at routing time, and
 >>>>your packets are being routed only once, you lose the ability
 >>>>to have separate routing tables in each namespace.
 >>>
 >>>Right. What is the advantage of having separate routing tables?
 >> Routing is everything.
 >> For example, I want namespaces to have their private tunnel devices.
 >> It means that namespaces should be allowed to have private routes of local type,
 >> private default routes, and so on...
 >>
 >
 > Ok, we are talking about the same things. We do it only in a different way:
 >
 > 	* separate routing table :
 > 		 namespace
 > 			|
 > 			\--- route_tables
 > 				|
 > 				\---routes
 >
 > 	* tagged routing table :
 > 		route_tables
 > 			|
 > 			\---routes
 > 				|
 > 				\---namespace
 
 There is a third possibility that falls in between these two, if local
 communication is really the bottleneck.
 
 We have the dst cache for caching routes, and it can cache the multiple
 transformations that happen on a packet.
 
 With a little extra knowledge it is possible to have the separate
 routing tables but have special logic that recognizes the local tunnel
 device that connects namespaces and have it look into the next
 namespace's routes, and build up a complete stack of dst entries of
 where the packet needs to go.
 
 I keep forgetting about that possibility.  But as long as everything
 is done at the routing layer that should work.
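
A rough sketch of this third possibility, written in Python purely for illustration. Nothing here is an existing kernel interface: `TABLES`, `LINKS`, `resolve` and the device name `veth-up` are hypothetical, and the cache is a plain dictionary standing in for the dst cache.

```python
# Each namespace keeps its own routing table (destination -> device).
TABLES = {
    "guest0": {"192.168.1.5": "veth-up"},
    "host": {"192.168.1.5": "eth0"},
}

# Local tunnel devices that connect one namespace to another:
# (namespace, device) -> peer namespace.
LINKS = {("guest0", "veth-up"): "host"}

_dst_cache = {}


def resolve(ns, dst):
    """Return the cached chain of (namespace, device) hops for dst."""
    key = (ns, dst)
    if key in _dst_cache:
        return _dst_cache[key]
    chain = []
    seen = set()
    cur = ns
    while cur is not None and cur not in seen:
        seen.add(cur)  # guard against routing loops between namespaces
        dev = TABLES[cur].get(dst)
        if dev is None:
            break  # no route in this namespace
        chain.append((cur, dev))
        # Follow a local tunnel into the next namespace's routes;
        # a real device ends the chain (LINKS has no entry for it).
        cur = LINKS.get((cur, dev))
    _dst_cache[key] = chain
    return chain
```

The first lookup walks every namespace on the path; later lookups for the same (namespace, destination) pair reuse the stacked result, which is the point of doing it at the routing layer.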
 
 > I use the second method, because I think it is more efficient and reduces the
 > overhead. But the isolation is minimalist and only aims to keep an application
 > from using resources outside of its container (aka namespace), without taking care of
 > the system. For example, I didn't take care of network devices, because as far
 > as I can see, I can't imagine an administrator wanting to change a network device
 > name while there are hundreds of containers running. Concerning tunnel devices,
 > for example, they should be created inside the container.
 
 Inside the containers I want all network devices named eth0!
 
 > I think the private network resources method is more elegant and involves more
 > network resources, but there is probably a significant overhead and some
 > difficulties in having a __lightweight__ container (aka application container), making
 > nfs work well, etc. I did some tests with tbench and the loopback: with the
 > private namespace there is roughly an overhead of 4 % without the isolation,
 > whereas with the tagging method there is 1 % with the isolation.
 
 The overhead went down?
 
 > The network namespace aims at isolation for now, but containers based on the
 > namespaces will probably need checkpoint/restart and migration ability.
 > Migration is needed not only for servers but for HPC jobs too.
 
 Yes.
 
 > So I don't know what level of isolation/virtualization is really needed by
 > users, or what should be acceptable (strong isolation and overhead / weak isolation
 > and efficiency). I don't know whether people wanting strong isolation would not prefer
 > Xen (clearly with much more overhead than your patches ;) )
 
 We need a clean abstraction that optimizes well.
 
 However local communication between containers is not what we should
 benchmark.  That can always be improved later.  So long as the
 performance is reasonable.  What needs to be benchmarked is the
 overhead of namespaces when connected to physical networking devices
 and on their own local loopback, and comparing that to a kernel
 without namespace support.
 
 If we don't hurt that core case we have an implementation we can
 merge.  There are a lot of optimization opportunities for local
 communications and we can do that after we have a correct and accepted
 implementation.  Anything else is optimizing too soon, and will
 just be muddying the waters.
 
 Eric
 |  
	|  |  |  
	| 
		
			| Re: [patch 2/6] [Network namespace] Network device sharing by view [message #4041 is a reply to message #4039] | Tue, 27 June 2006 11:55   |  
			| 
				
				
					|  Andrey Savochkin Messages: 47
 Registered: December 2005
 | Member |  |  |  
	| Daniel, 
 On Tue, Jun 27, 2006 at 01:21:02PM +0200, Daniel Lezcano wrote:
 > >>>My point is that if you make namespace tagging at routing time, and
 > >>>your packets are being routed only once, you lose the ability
 > >>>to have separate routing tables in each namespace.
 > >>
 > >>Right. What is the advantage of having separate routing tables?
 > >
 > >
 > > Routing is everything.
 > > For example, I want namespaces to have their private tunnel devices.
 > > It means that namespaces should be allowed to have private routes of local type,
 > > private default routes, and so on...
 > >
 >
 > Ok, we are talking about the same things. We do it only in a different way:
 
 We are not talking about the same things.
 
 Whether route lookup is performed before or after the namespace change
 is not a mere technical detail.
 It is a fundamental question that determines the functionality of network namespaces.
 We are talking about the capabilities namespaces provide.
 
 Your proposal essentially denies namespaces the ability to have their own tunnel
 or other devices.  There is no point in having a device inside a namespace if the
 namespace owner can't route all or some specific outgoing packets through
 that device.  You don't allow system administrators to completely delegate
 management of network configuration to namespace owners.
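
The capability argument can be illustrated with a small sketch. This is a toy model in Python under stated assumptions, not the patch set's code: the class `NetNamespace` and its methods are invented here, and a route is reduced to a destination string mapped to a device name.

```python
class NetNamespace:
    """Toy namespace with private devices and a private routing table."""

    def __init__(self, name):
        self.name = name
        self.devices = set()
        self.routes = {}  # destination -> device name

    def add_device(self, dev):
        self.devices.add(dev)

    def add_route(self, dst, dev):
        # The owner may only route through devices it can actually see.
        if dev not in self.devices:
            raise ValueError(f"{dev} is not visible in {self.name}")
        self.routes[dst] = dev

    def route(self, dst):
        return self.routes.get(dst, self.routes.get("default"))


guest0 = NetNamespace("guest0")
guest0.add_device("eth0")
guest0.add_device("tun0")
guest0.add_route("default", "tun0")  # owner sends everything via its tunnel

guest1 = NetNamespace("guest1")
guest1.add_device("eth0")
guest1.add_route("default", "eth0")
```

Here `guest0` chooses its own default route through its private `tun0` without the host administrator being involved, and `guest1` is unaffected; that delegation is what a single shared lookup performed before the namespace change cannot express.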
 
 Andrey
 |  
	|  |  |  
	| 
		
			| Re: [patch 2/6] [Network namespace] Network device sharing by view [message #4047 is a reply to message #4040] | Tue, 27 June 2006 16:02   |  
			| 
				
				
					|  Herbert Poetzl Messages: 239
 Registered: February 2006
 | Senior Member |  |  |  
	| On Tue, Jun 27, 2006 at 05:52:52AM -0600, Eric W. Biederman wrote:
 > Daniel Lezcano <dlezcano@fr.ibm.com> writes:
 >
 > >>>>My point is that if you make namespace tagging at routing time,
 > >>>>and your packets are being routed only once, you lose the ability
 > >>>>to have separate routing tables in each namespace.
 > >>>
 > >>>Right. What is the advantage of having separate routing tables?
 > >> Routing is everything. For example, I want namespaces to have their
 > >> private tunnel devices. It means that namespaces should be allowed
 > >> to have private routes of local type, private default routes, and so
 > >> on...
 > >>
 > >
 > > Ok, we are talking about the same things. We do it only in a different way:
 > >
 > > 	* separate routing table :
 > > 		 namespace
 > > 			|
 > > 			\--- route_tables
 > > 				|
 > > 				\---routes
 > >
 > > 	* tagged routing table :
 > > 		route_tables
 > > 			|
 > > 			\---routes
 > > 				|
 > > 				\---namespace
 >
 > There is a third possibility that falls in between these two, if local
 > communication is really the bottleneck.
 >
 > We have the dst cache for caching routes, and it can cache the multiple
 > transformations that happen on a packet.
 >
 > With a little extra knowledge it is possible to have the separate
 > routing tables but have special logic that recognizes the local
 > tunnel device that connects namespaces and have it look into the next
 > namespace's routes, and build up a complete stack of dst entries of
 > where the packet needs to go.
 >
 > I keep forgetting about that possibility. But as long as everything is
 > done at the routing layer that should work.
 >
 > > I use the second method, because I think it is more efficient and
 > > reduces the overhead. But the isolation is minimalist and only aims
 > > to keep an application from using resources outside of its container
 > > (aka namespace), without taking care of the system. For example, I
 > > didn't take care of network devices, because as far as I can see, I can't
 > > imagine an administrator wanting to change a network device name
 > > while there are hundreds of containers running. Concerning tunnel
 > > devices, for example, they should be created inside the container.
 >
 > Inside the containers I want all network devices named eth0!
 
 huh? even if there are two of them? also tun?
 
 I think you meant, you want to be able to have eth0 in
 _more_ than one guest where eth0 in a guest can also
 be/use/relate to eth1 on the host, right?
 
 > > I think the private network resources method is more elegant
 > > and involves more network resources, but there is probably a
 > > significant overhead and some difficulties in having a __lightweight__
 > > container (aka application container), making nfs work well,
 > > etc. I did some tests with tbench and the loopback: with the
 > > private namespace there is roughly an overhead of 4 % without
 > > the isolation, whereas with the tagging method there is 1 % with the
 > > isolation.
 >
 > The overhead went down?
 
 yes, this might actually happen, because the guest
 has only to look at a certain subset of entries
 but this needs a lot more testing, especially with
 a lot of guests
 
 > > The network namespace aims at isolation for now, but containers
 > > based on the namespaces will probably need checkpoint/restart and
 > > migration ability. Migration is needed not only for servers but
 > > for HPC jobs too.
 >
 > Yes.
 >
 > > So I don't know what level of isolation/virtualization is really
 > > needed by users, or what should be acceptable (strong isolation and
 > > overhead / weak isolation and efficiency). I don't know whether people
 > > wanting strong isolation would not prefer Xen (clearly with much more
 > > overhead than your patches ;) )
 
 well, Xen claims something below 2% IIRC, and would
 be clearly the better choice if you want strict
 separation with the complete functionality, especially
 with hardware support
 
 > We need a clean abstraction that optimizes well.
 >
 > However local communication between containers is not what we
 > should benchmark. That can always be improved later. So long as
 > the performance is reasonable. What needs to be benchmarked is the
 > overhead of namespaces when connected to physical networking devices
 > and on their own local loopback, and comparing that to a kernel
 > without namespace support.
 
 well, for me (obviously advocating the lightweight case)
 it seems important that the following conditions are met:
 
  - loopback traffic inside a guest is insignificantly
    slower than on a normal system

  - loopback traffic on the host is insignificantly
    slower than on a normal system

  - inter guest traffic is faster than on-wire traffic,
    and should be within a small tolerance of the
    loopback case (as it really isn't different)

  - network (on-wire) traffic should be as fast as without
    the namespace (i.e. within 1% or so, better not really
    measurable)

  - all this should be true in a setup with a significant
    number of guests, when only one guest is active, but
    all other guests are ready/configured

  - all this should scale well with a few hundred guests
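
The conditions above could be turned into a mechanical check along the following lines. This is only a sketch in Python: the function name `meets_criteria`, the dictionary keys, and the 5% default for the "small tolerance" are assumptions made here; only the 1% figure for on-wire traffic comes from the list itself. No measurements are implied.

```python
def meets_criteria(m, base, tol=0.05, wire_tol=0.01):
    """Check throughput numbers against the conditions listed above.

    m    -- throughput measured on a kernel with namespace support
    base -- throughput measured on a kernel without it
    Both use the same (arbitrary) unit. tol is an assumed value for the
    "small tolerance"; wire_tol is the "within 1% or so" from the list.
    """
    def close(value, reference, tolerance):
        return value >= reference * (1.0 - tolerance)

    return {
        "guest_loopback": close(m["guest_loopback"], base["loopback"], tol),
        "host_loopback": close(m["host_loopback"], base["loopback"], tol),
        # Faster than the wire, and close to the guest's own loopback.
        "inter_guest": m["inter_guest"] > m["wire"]
        and close(m["inter_guest"], m["guest_loopback"], tol),
        "wire": close(m["wire"], base["wire"], wire_tol),
    }
```

Running the same check with one active guest while a few hundred others are configured would cover the last two items in the list.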
 
 > If we don't hurt that core case we have an implementation we can
 > merge.  There are a lot of optimization opportunities for local
 > communications and we can do that after we have a correct and accepted
 > implementation.  Anything else is optimizing too soon, and will
 > just be muddying the waters.
 
 what I fear is that once something is in, the kernel will
 just become slower (as it already did in some areas) and
 nobody will care/be-able to fix that later on ...
 
 best,
 Herbert
 
 > Eric
 |  
	|  |  |  
	| 
		
			| Re: [patch 2/6] [Network namespace] Network device sharing by view [message #4054 is a reply to message #4047] | Tue, 27 June 2006 16:47   |  
			| 
				
				
					|  ebiederm Messages: 1354
 Registered: February 2006
 | Senior Member |  |  |  
	| Herbert Poetzl <herbert@13thfloor.at> writes: 
 > On Tue, Jun 27, 2006 at 05:52:52AM -0600, Eric W. Biederman wrote:
 >>
 >> Inside the containers I want all network devices named eth0!
 >
 > huh? even if there are two of them? also tun?
 >
 > I think you meant, you want to be able to have eth0 in
 > _more_ than one guest where eth0 in a guest can also
 > be/use/relate to eth1 on the host, right?
 
 Right, I want to have an eth0 in each guest, where eth0 is
 its own network device and need have no relationship to
 eth0 on the host.
 
 >> We need a clean abstraction that optimizes well.
 >>
 >> However local communication between containers is not what we
 >> should benchmark. That can always be improved later. So long as
 >> the performance is reasonable. What needs to be benchmarked is the
 >> overhead of namespaces when connected to physical networking devices
 >> and on their own local loopback, and comparing that to a kernel
 >> without namespace support.
 >
 > well, for me (obviously advocating the lightweight case)
 > it seems important that the following conditions are met:
 >
 >  - loopback traffic inside a guest is insignificantly
 >    slower than on a normal system
 >
 >  - loopback traffic on the host is insignificantly
 >    slower than on a normal system
 >
 >  - inter guest traffic is faster than on-wire traffic,
 >    and should be within a small tolerance of the
 >    loopback case (as it really isn't different)
 >
 >  - network (on-wire) traffic should be as fast as without
 >    the namespace (i.e. within 1% or so, better not really
 >    measurable)
 >
 >  - all this should be true in a setup with a significant
 >    number of guests, when only one guest is active, but
 >    all other guests are ready/configured
 >
 >  - all this should scale well with a few hundred guests
 
 Ultimately I agree.  However, only host performance should be
 a merge blocker, allowing us to go back later and reclaim the few
 percentage points we lost.
 
 >> If we don't hurt that core case we have an implementation we can
 >> merge.  There are a lot of optimization opportunities for local
 >> communications and we can do that after we have a correct and accepted
 >> implementation.  Anything else is optimizing too soon, and will
 >> just be muddying the waters.
 >
 > what I fear is that once something is in, the kernel will
 > just become slower (as it already did in some areas) and
 > nobody will care/be-able to fix that later on ...
 
 If nobody cares it doesn't matter.
 
 If no one can fix it that is a problem.  Which is why we need
 high standards and clean code, not early optimizations.
 
 But on that front each step of the way must be justified on
 its own merits.  Not because it will give us some holy grail.
 
 The way to keep the inter guest performance from degrading is
 to measure it and complain.  But the linux network stack is too
 big to get in one pass.
 
 Eric
 |  
	|  |  |  
	| 
		
			| Re: [patch 2/6] [Network namespace] Network device sharing by view [message #4055 is a reply to message #4047] | Tue, 27 June 2006 16:49   |  
			| 
				
				
					|  Alexey Kuznetsov Messages: 18
 Registered: February 2006
 | Junior Member |  |  |  
	| On Tue, Jun 27, 2006 at 06:02:42PM +0200, Herbert Poetzl wrote: 
 >  - loopback traffic inside a guest is insignificantly
 >    slower than on a normal system
 >
 >  - loopback traffic on the host is insignificantly
 >    slower than on a normal system
 >
 >  - inter guest traffic is faster than on-wire traffic,
 >    and should be within a small tolerance of the
 >    loopback case (as it really isn't different)
 
 I do not follow what you people are arguing about.
 
 Intra-guest, guest-guest and host-guest paths have _no_ differences
 from host-host loopback. Only the device is different:
 * virtual loopback for intra-guest
 * virtual interface for guest-guest and host-guest
 
 But the work is exactly the same, only the place where packets are
 looped back is different. How could this be an issue to break a lance over? :-)
 
 Alexey
 
 
 PS. The only thing I can imagine is the "optimized"-out ip_route_input()
 in the case of loopback. But this optimization was an obvious design mistake
 (mine, sorry) and apparently will die together with the removal of the current
 deficiencies of the routing cache. Actually, it is one of the deficiencies.
 |  
	|  |  |  
	| 
		
			| Re: [patch 2/6] [Network namespace] Network device sharing by view [message #4063 is a reply to message #4057] | Tue, 27 June 2006 22:52   |  
			| 
				
				
					|  Herbert Poetzl Messages: 239
 Registered: February 2006
 | Senior Member |  |  |  
	| On Tue, Jun 27, 2006 at 10:19:23AM -0700, Ben Greear wrote:
 > Eric W. Biederman wrote:
 > >Herbert Poetzl <herbert@13thfloor.at> writes:
 > >
 > >
 > >>On Tue, Jun 27, 2006 at 05:52:52AM -0600, Eric W. Biederman wrote:
 > >>
 > >>>Inside the containers I want all network devices named eth0!
 > >>
 > >>huh? even if there are two of them? also tun?
 > >>
 > >>I think you meant, you want to be able to have eth0 in
 > >>_more_ than one guest where eth0 in a guest can also
 > >>be/use/relate to eth1 on the host, right?
 > >
 > >
 > >Right I want to have an eth0 in each guest where eth0 is
 > >its own network device and need have no relationship to
 > >eth0 on the host.
 >
 > How does that help anything?  Do you envision programs
 > that make special decisions on whether the interface is
 > called eth0 v/s eth151?
 
 well, those poor folks who do not have ethernet
 devices for networking :)
 
 seriously, what I think Eric meant was that it
 might be nice (especially for migration purposes)
 to keep the device namespace completely virtualized
 and not just isolated ...
 
 I'm fine with that, as long as it does not add
 overhead or complicate handling, and as far as I
 can tell, it should not do that ...
 
 best,
 Herbert
 
 > Ben
 >
 >
 > --
 > Ben Greear <greearb@candelatech.com>
 > Candela Technologies Inc  http://www.candelatech.com
 |  
	|  |  |  
	| 
		
			| Re: [patch 2/6] [Network namespace] Network device sharing by view [message #4088 is a reply to message #4067] | Wed, 28 June 2006 13:36   |  
			| 
				
				
					|  Herbert Poetzl Messages: 239
 Registered: February 2006
 | Senior Member |  |  |  
	| On Tue, Jun 27, 2006 at 09:38:14PM -0600, Eric W. Biederman wrote:
 > Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> writes:
 >
 > > Hello!
 > >
 > >> It may look weird, but do applications really *need* to see eth0 rather
 > >> than eth858354?
 > >
 > > Applications do not care, humans do. :-)
 > >
 > > As for applications, they just need to see exactly the same
 > > device after migration. Not only the name, but e.g. also its ifindex.
 > > If you do not create a separate namespace for netdevices, you will
 > > inevitably end up with some strange hack, a sort of VPIDs, to translate
 > > (or to partition) ifindices, or to declare that "ping -I eth858354 xxx"
 > > is too complicated an application to survive migration.
 >
 >
 > Actually there are applications with peculiar licensing practices that
 > do look at devices like eth0 to verify you have the appropriate mac, and
 > do really weird things if you don't have an eth0.
 >
 > Plus there are other cases where it can be simpler to hard code things
 > if it is allowable. (The human factor)  Otherwise your configuration
 > must be done through hotplug scripts.
 >
 > But yes there are misguided applications that care.
 
 last time I pointed to such 'misguided' apps which
 made assumptions that are not necessarily true
 inside a virtual environment (e.g. pstree, initpid)
 the general? position was that those apps should
 be fixed instead of adding a 'workaround'
 
 note: personally I'm absolutely not against virtualizing
 the device names so that each guest can have a separate
 name space for devices, but there should be a way to
 'see' _and_ 'identify' the interfaces from outside
 (i.e. host or spectator context)
 
 best,
 Herbert
 
 > Eric
 |  
	|  |  |  
	| 
		
			| strict isolation of net interfaces [message #4147 is a reply to message #4146] | Thu, 29 June 2006 22:14   |  
			| 
				
				
					|  Cedric Le Goater Messages: 443
 Registered: February 2006
 | Senior Member |  |  |  
	| Sam Vilain wrote:
 > jamal wrote:
 >>> note: personally I'm absolutely not against virtualizing
 >>> the device names so that each guest can have a separate
 >>> name space for devices, but there should be a way to
 >>> 'see' _and_ 'identify' the interfaces from outside
 >>> (i.e. host or spectator context)
 >>>
 >>>
 >> Makes sense for the host side to have naming convention tied
 >> to the guest. Example as a prefix: guest0-eth0. Would it not
 >> be interesting to have the host also manage these interfaces
 >> via standard tools like ip or ifconfig etc? i.e if i admin up
 >> guest0-eth0, then the user in guest0 will see its eth0 going
 >> up.
 >
 > That particular convention only works if you have network namespaces and
 > UTS namespaces tightly bound.  We plan to have them separate - so for
 > that to work, each network namespace could have an arbitrary "prefix"
 > that determines what the interface name will look like from the outside
 > when combined.  We'd have to be careful about length limits.
 >
 > And guest0-eth0 doesn't necessarily make sense; it's not really an
 > ethernet interface, more like a tun or something.
 >
 > So, an equally good convention might be to use sequential prefixes on
 > the host, like "tun", "dummy", or a new prefix - then a property of that
 > is what the name of the interface is perceived to be to those who are in
 > the corresponding network namespace.
 >
 > Then the pragmatic question becomes how to correlate what you see from
 > `ip addr list' to guests.
 
 
 we could work on virtualizing the net interfaces in the host, map them to
 eth0 or something in the guest and let the guest handle upper network layers ?
 
 lo0 would just be exposed relying on skbuff tagging to discriminate traffic
 between guests.
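
A minimal sketch of what tagging on a shared loopback could look like, in Python and purely as an illustration. The `Packet` class, the `sockets` table and both functions are hypothetical stand-ins; in particular `ns_tag` only models the idea of a per-skbuff namespace tag and is not an existing field.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    ns_tag: str  # namespace of the sender, stamped at transmit time
    dport: int
    payload: bytes


# One receive queue per (namespace, port); two guests may both use port 80.
sockets = {("guest0", 80): [], ("guest1", 80): []}


def loopback_rcv(pkt):
    """Deliver only to a listener in the namespace named by the tag."""
    queue = sockets.get((pkt.ns_tag, pkt.dport))
    if queue is None:
        return False  # no listener in the sender's namespace
    queue.append(pkt.payload)
    return True


def loopback_xmit(sender_ns, dport, payload):
    """Stamp the packet with the sender's namespace and loop it back."""
    return loopback_rcv(Packet(ns_tag=sender_ns, dport=dport, payload=payload))
```

With this shape a single shared loopback device serves every guest, and the tag alone keeps traffic sent by `guest0` from reaching a socket in `guest1`.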
 
 
 
 host                  |  guest 0  |  guest 1  |  guest2
 ----------------------+-----------+-----------+--------------
   |                   |           |           |
   |-> l0      <-------+-> lo0 ... | lo0       | lo0
   |                   |           |           |
   |-> bar0   <--------+-> eth0    |           |
   |                   |           |           |
   |-> foo0   <--------+-----------+-----------+-> eth0
   |                   |           |           |
   `-> foo0:1  <-------+-----------+-> eth0    |
                       |           |           |
 
 
 is that clear ? stupid ? reinventing the wheel ?
 
 thanks,
 
 C.
 |  
	|  |  |  
	| 
		
			| Re: [patch 2/6] [Network namespace] Network device sharing by view [message #4148 is a reply to message #4146] | Fri, 30 June 2006 00:15   |  
			| 
				
				
					|  jamal Messages: 12
 Registered: June 2006
 | Junior Member |  |  |  
	| On Fri, 2006-30-06 at 09:07 +1200, Sam Vilain wrote:
 > jamal wrote:
 
 > > Makes sense for the host side to have naming convention tied
 > > to the guest. Example as a prefix: guest0-eth0. Would it not
 > > be interesting to have the host also manage these interfaces
 > > via standard tools like ip or ifconfig etc? i.e if i admin up
 > > guest0-eth0, then the user in guest0 will see its eth0 going
 > > up.
 >
 > That particular convention only works if you have network namespaces and
 > UTS namespaces tightly bound.
 
 that would be one approach. Another less sophisticated approach is to
 have no binding whatsoever, rather some translation table to map two
 unrelated devices.
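
Such a translation table might look like the following sketch (Python, illustrative only). The class `NameMap` and its fields are invented here; the one hard fact used is that Linux interface names are limited by `IFNAMSIZ`, which is 16 bytes including the trailing NUL, so a host-side name like guest0-eth0 has to fit in 15 characters.

```python
IFNAMSIZ = 16  # Linux limit on interface names, including the trailing NUL


class NameMap:
    """Two-way table between host-side and guest-side interface names."""

    def __init__(self):
        self.by_host = {}   # host name -> (guest, guest-side name)
        self.by_guest = {}  # (guest, guest-side name) -> host name

    def bind(self, host_name, guest, guest_name):
        for name in (host_name, guest_name):
            if len(name) >= IFNAMSIZ:
                raise ValueError(f"{name!r} exceeds {IFNAMSIZ - 1} characters")
        self.by_host[host_name] = (guest, guest_name)
        self.by_guest[(guest, guest_name)] = host_name
```

Because the two names are unrelated strings, the host side is free to use a prefix convention, sequential names, or anything else, as long as the table can answer "which guest device is this?" in both directions.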
 
 >  We plan to have them separate - so for
 > that to work, each network namespace could have an arbitrary "prefix"
 > that determines what the interface name will look like from the outside
 > when combined.  We'd have to be careful about length limits.
 >
 > And guest0-eth0 doesn't necessarily make sense; it's not really an
 > ethernet interface, more like a tun or something.
 >
 
 it wouldn't quite fit as a tun device. More like a mirror side of the
 guest eth0 created on the host side,
 i.e. a sort of passthrough device with one side visible on the host (a send
 on guest0-eth0 is received on eth0 in the guest and vice versa).
 
 Note this is radically different from what i have heard Andrey and co
 talk about and i dont wanna disturb any shit because there seems to be
 some agreement. But if you address me i respond because it is very
 interesting a topic;->
 
 > So, an equally good convention might be to use sequential prefixes on
 > the host, like "tun", "dummy", or a new prefix - then a property of that
 > is what the name of the interface is perceived to be to those who are in
 > the corresponding network namespace.
 >
 > Then the pragmatic question becomes how to correlate what you see from
 > `ip addr list' to guests.
 
 on the host ip addr and the one seen on the guest side are the same.
 Except one is seen (on the host) on guest0-eth0 and another is seen
 on eth0 (on guest).
 Anyways, ignore what i am saying if it is disrupting the discussion.
 
 cheers,
 jamal
 |  
	|  |  |  
	| 
		
			| Re: strict isolation of net interfaces [message #4152 is a reply to message #4147] | Fri, 30 June 2006 02:39   |  
			| 
				
				
					|  serue Messages: 750
 Registered: February 2006
 | Senior Member |  |  |  
	| Quoting Cedric Le Goater (clg@fr.ibm.com):
 > Sam Vilain wrote:
 > > jamal wrote:
 > >>> note: personally I'm absolutely not against virtualizing
 > >>> the device names so that each guest can have a separate
 > >>> name space for devices, but there should be a way to
 > >>> 'see' _and_ 'identify' the interfaces from outside
 > >>> (i.e. host or spectator context)
 > >>>
 > >>>
 > >> Makes sense for the host side to have naming convention tied
 > >> to the guest. Example as a prefix: guest0-eth0. Would it not
 > >> be interesting to have the host also manage these interfaces
 > >> via standard tools like ip or ifconfig etc? i.e if i admin up
 > >> guest0-eth0, then the user in guest0 will see its eth0 going
 > >> up.
 > >
 > > That particular convention only works if you have network namespaces and
 > > UTS namespaces tightly bound.  We plan to have them separate - so for
 > > that to work, each network namespace could have an arbitrary "prefix"
 > > that determines what the interface name will look like from the outside
 > > when combined.  We'd have to be careful about length limits.
 > >
 > > And guest0-eth0 doesn't necessarily make sense; it's not really an
 > > ethernet interface, more like a tun or something.
 > >
 > > So, an equally good convention might be to use sequential prefixes on
 > > the host, like "tun", "dummy", or a new prefix - then a property of that
 > > is what the name of the interface is perceived to be to those who are in
 > > the corresponding network namespace.
 > >
 > > Then the pragmatic question becomes how to correlate what you see from
 > > `ip addr list' to guests.
 >
 >
 > we could work on virtualizing the net interfaces in the host, map them to
 > eth0 or something in the guest and let the guest handle upper network layers ?
 >
 > lo0 would just be exposed relying on skbuff tagging to discriminate traffic
 > between guests.
 
 This seems to me the preferable way.  We create a full virtual net
 device for each new container, and fully virtualize the device
 namespace.
 
 > host                  |  guest 0  |  guest 1  |  guest2
 >  ----------------------+-----------+-----------+------------- -
 >   |                   |           |           |
 >   |-> l0      <-------+-> lo0 ... | lo0       | lo0
 >   |                   |           |           |
 >   |-> bar0   <--------+-> eth0    |           |
 >   |                   |           |           |
 >   |-> foo0   <--------+-----------+-----------+-> eth0
 >   |                   |           |           |
 >   `-> foo0:1  <-------+-----------+-> eth0    |
 >                       |           |           |
 >
 >
 > is that clear ? stupid ? reinventing the wheel ?
 
 The last one in your diagram confuses me - why foo0:1?  I would
 have thought it'd be
 
 host                  |  guest 0  |  guest 1  |  guest2
 ----------------------+-----------+-----------+------------- -
 |                   |           |           |
 |-> l0      <-------+-> lo0 ... | lo0       | lo0
 |                   |           |           |
 |-> eth0            |           |           |
 |                   |           |           |
 |-> veth0  <--------+-> eth0    |           |
 |                   |           |           |
 |-> veth1  <--------+-----------+-----------+-> eth0
 |                   |           |           |
 |-> veth2   <-------+-----------+-> eth0    |
 
 I think we should avoid using device aliases, as trying to do
 something like giving eth0:1 to guest1 and eth0:2 to guest2
 while hiding eth0:1 from guest2 requires some uglier code (as
 I recall) than working with full devices.  In other words,
 if a namespace can see eth0, and eth0:2 exists, it should always
 see eth0:2.
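
 The visibility rule stated above (an alias is seen exactly when its base device is seen) can be illustrated with a small sketch. This is only a toy model under stated assumptions: the function names and the per-guest device sets are hypothetical and do not come from any patch in this thread.

```python
def base_device(name):
    """Return the underlying device for an alias such as 'eth0:2'."""
    return name.split(":", 1)[0]


def visible(name, namespace_devices):
    """An alias is visible exactly when its base device is visible.

    namespace_devices is a hypothetical set of full device names
    that one namespace is allowed to see.
    """
    return base_device(name) in namespace_devices


# Hypothetical device sets for two guests.
guest1 = {"lo", "eth0"}
guest2 = {"lo", "veth1"}

print(visible("eth0:2", guest1))
print(visible("eth0:1", guest2))
```

 Under this rule there is no way to show eth0:1 to one guest and hide it from another guest that also sees eth0, which is the argument for handing out full devices instead.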
 
 So conceptually using a full virtual net device per container
 certainly seems cleaner to me, and it seems like it should be
 simpler by way of statistics gathering etc, but are there actually
 any real gains?  Or is the support for multiple IPs per device
 actually enough?
 
 Herbert, is this basically how ngnet is supposed to work?
 
 -serge
 |  
	|  |  |  
	| 
		
			| Re: [patch 2/6] [Network namespace] Network device sharing by view [message #4153 is a reply to message #4148] | Fri, 30 June 2006 03:35   |  
			| 
				
				
					|  Herbert Poetzl Messages: 239
 Registered: February 2006
 | Senior Member |  |  |  
	| On Thu, Jun 29, 2006 at 08:15:52PM -0400, jamal wrote:
 > On Fri, 2006-30-06 at 09:07 +1200, Sam Vilain wrote:
 > > jamal wrote:
 >
 > > > Makes sense for the host side to have naming convention tied
 > > > to the guest. Example as a prefix: guest0-eth0. Would it not
 > > > be interesting to have the host also manage these interfaces
 > > > via standard tools like ip or ifconfig etc? i.e if i admin up
 > > > guest0-eth0, then the user in guest0 will see its eth0 going
 > > > up.
 > >
 > > That particular convention only works if you have network namespaces
 > > and UTS namespaces tightly bound.
 >
 > that would be one approach. Another less sophisticated approach is to
 > have no binding whatsoever, rather some translation table to map two
 > unrelated devices.
 >
 > >  We plan to have them separate - so for
 > > that to work, each network namespace could have an arbitrary
 > > "prefix" that determines what the interface name will look like from
 > > the outside when combined. We'd have to be careful about length
 > > limits.
 > >
 > > And guest0-eth0 doesn't necessarily make sense; it's not really an
 > > ethernet interface, more like a tun or something.
 >
 > it wouldn't quite fit as a tun device. More like a mirror side of the
 > guest eth0 created on the host side
 > i.e a sort of passthrough device with one side visible on the host (send
 > from guest0-eth0 is received on eth0 in the guest and vice-versa).
 >
 > Note this is radically different from what i have heard Andrey and co
 > talk about and i dont wanna disturb any shit because there seems to be
 > some agreement. But if you address me i respond because it is very
 > interesting a topic;->
 
 thing is, we have several things we should care about
 and some of them 'look' or 'sound' similar, although
 they are not really ... I'll try to clarify
 
 first, we want to have 'per guest' interfaces, which
 do not clash with any interfaces on the host or in
 other guests
 
 then, we want to 'connect' them, implicitly or
 explicitly with 'other' interfaces or devices inside
 other guests or on the host, here we have the following
 cases (some are a little special):
 
 - lo interface, guest and host private (by default)
 - tap/tun interfaces, again host/guest private
 - tun like interfaces between host and guests
 - tun like interfaces between guests
 - 'normal' interfaces mapped into guests
 
 on the traffic side we have the following cases:
 
 - local traffic on the host
 - local traffic on the guest
 - local traffic between host and guest
 - local traffic between guests
 - routed traffic from guest via host
 - bridged traffic from guest via host
 
 special cases here would be tun/tap traffic inside
 a guest, but that can be considered local too
 
 > > So, an equally good convention might be to use sequential prefixes
 > > on the host, like "tun", "dummy", or a new prefix - then a property
 > > of that is what the name of the interface is perceived to be to
 > > those who are in the corresponding network namespace.
 > >
 > > Then the pragmatic question becomes how to correlate what you see
 > > from `ip addr list' to guests.
 >
 > on the host ip addr and the one seen on the guest side are the same.
 > Except one is seen (on the host) on guest0-eth0 and another is seen
 > on eth0 (on guest).
 
 this depends on the way the interfaces are handled
 and how they actually work, means:
 
 if the interfaces _solely_ work via routing or
 bridging, then the 'host' end has to exist and be
 visible similar to 'normal' interfaces
 
 if the traffic is (magically) mapped from guest
 interfaces to real (outside) host interfaces, we
 might want the same view as the guest has
 (i.e. basically a 'copy' which is not real)
 
 > Anyways, ignore what i am saying if it is disrupting the discussion.
 
 IMHO input is always welcome .. helps the folks to
 do better thinking :)
 
 > cheers,
 > jamal
 >
 >
 >
 |  
	|  |  |  
	| 
		
			| Re: [patch 2/6] [Network namespace] Network device sharing by view [message #4155 is a reply to message #4148] | Fri, 30 June 2006 07:45   |  
			| 
				
				
					|  Andrey Savochkin Messages: 47
 Registered: December 2005
 | Member |  |  |  
	| Hi Jamal, 
 On Thu, Jun 29, 2006 at 08:15:52PM -0400, jamal wrote:
 > On Fri, 2006-30-06 at 09:07 +1200, Sam Vilain wrote:
 [snip]
 > >  We plan to have them separate - so for
 > > that to work, each network namespace could have an arbitrary "prefix"
 > > that determines what the interface name will look like from the outside
 > > when combined.  We'd have to be careful about length limits.
 > >
 > > And guest0-eth0 doesn't necessarily make sense; it's not really an
 > > ethernet interface, more like a tun or something.
 > >
 >
 > it wouldn't quite fit as a tun device. More like a mirror side of the
 > guest eth0 created on the host side
 > i.e a sort of passthrough device with one side visible on the host (send
 > from guest0-eth0 is received on eth0 in the guest and vice-versa).
 >
 > Note this is radically different from what i have heard Andrey and co
 > talk about and i dont wanna disturb any shit because there seems to be
 > some agreement. But if you address me i respond because it is very
 > interesting a topic;->
 
 I do not have anything against guest-eth0 - eth0 pairs _if_ they are set up
 by the host administrators explicitly for some purpose.
 For example, if these guest-eth0 and eth0 devices stay as pure virtual ones,
 i.e. they don't have any physical NIC, host administrator may route traffic
 to guestXX-eth0 interfaces to pass it to the guests.
 
 However, I oppose the idea of automatic mirroring of _all_ devices appearing
 inside some namespaces ("guests") to another namespace (the "host").
 This clearly goes against the concept of namespaces as independent realms,
 and creates a lot of problems with applications running in the host, hotplug
 scripts and so on.
 
 >
 > > So, an equally good convention might be to use sequential prefixes on
 > > the host, like "tun", "dummy", or a new prefix - then a property of that
 > > is what the name of the interface is perceived to be to those who are in
 > > the corresponding network namespace.
 > >
 > > Then the pragmatic question becomes how to correlate what you see from
 > > `ip addr list' to guests.
 >
 > on the host ip addr and the one seen on the guest side are the same.
 > Except one is seen (on the host) on guest0-eth0 and another is seen
 > on eth0 (on guest).
 
 Then what to do if the host system has 10.0.0.1 as a private address on eth3,
 and then interfaces guest1-tun0 and guest2-tun0 both get address 10.0.0.1
 when each guest has added 10.0.0.1 to their tun0 device?
 
 Regards,
 
 Andrey
 |  
	|  |  |  
	| 
		
			| Re: [patch 2/6] [Network namespace] Network device sharing by view [message #4169 is a reply to message #4155] | Fri, 30 June 2006 13:50   |  
			| 
				
				
					|  jamal Messages: 12
 Registered: June 2006
 | Junior Member |  |  |  
	| Hi Andrey, 
 BTW - I was just looking at openvz, very impressive. To the other folks,
 I am not putting down any of your approaches - just havent
 had time to study them. Andrey, this is the same thing you guys have
 been working on for a few years now, you just changed the name, correct?
 
 Ok, since you guys are encouraging me to speak, here goes ;->
 Hopefully this addresses the other email from Herbert et al.
 
 On Fri, 2006-30-06 at 11:45 +0400, Andrey Savochkin wrote:
 > Hi Jamal,
 >
 > On Thu, Jun 29, 2006 at 08:15:52PM -0400, jamal wrote:
 > > On Fri, 2006-30-06 at 09:07 +1200, Sam Vilain wrote:
 > [snip]
 
 >
 > I do not have anything against guest-eth0 - eth0 pairs _if_ they are set up
 > by the host administrators explicitly for some purpose.
 > For example, if these guest-eth0 and eth0 devices stay as pure virtual ones,
 > i.e. they don't have any physical NIC, host administrator may route traffic
 > to guestXX-eth0 interfaces to pass it to the guests.
 >
 
 Well, they will be purely virtual of course.  Something along the lines
 for openvz:
 
 // create the guest
 [host-node]# vzctl create 101 --ostemplate fedora-core-5-minimal
 // create guest101::eth0, seems to only create config to boot up with
 [host-node]# vzctl create 101 --netdev eth0
 // bootup guest101
 [host-node]# vzctl start 101
 
 As soon as bootup of guest101 happens, creating guest101::eth0 should activate
 creation of the host side netdevice. This could be triggered for example by
 the netlink event message seen on host which is a result of creating guest101::eth0
 Which means control sits purely in user space.
 
 at that point if i do ifconfig on host i see g101-eth0
 on guest101 i see just name eth0.
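
 The host-side naming described above, combined with the length limit raised earlier in the thread, could be sketched as follows. This is a minimal illustration, not part of any proposed patch: the helper name host_side_name is hypothetical, and the only outside fact used is that Linux interface names must fit in IFNAMSIZ (16) bytes including the trailing NUL.

```python
IFNAMSIZ = 16  # Linux interface name buffer size, including the trailing NUL


def host_side_name(prefix, guest_ifname):
    """Compose the name the host would see for a guest interface.

    Hypothetical helper: joins a per-namespace prefix and the guest's
    own interface name, and rejects results that would not fit in
    IFNAMSIZ bytes.
    """
    name = "{}-{}".format(prefix, guest_ifname)
    if len(name.encode()) >= IFNAMSIZ:
        raise ValueError("interface name too long: " + name)
    return name


print(host_side_name("g101", "eth0"))
```

 A short prefix such as g101 leaves room for the guest's own name; a long free-form prefix quickly runs into the limit, which is why the prefix would have to be chosen with care.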
 
 My earlier suggestion was that instead of:
 [host-node]# vzctl set 101 --ipadd 10.1.2.3
 
 you do:
 [host-node]# ip addr add 10.1.2.3/32 dev g101-eth0
 you should still use vzctl to save config for next bootup
 
 > However, I oppose the idea of automatic mirroring of _all_ devices appearing
 > inside some namespaces ("guests") to another namespace (the "host").
 > This clearly goes against the concept of namespaces as independent realms,
 > and creates a lot of problems with applications running in the host, hotplug
 > scripts and so on.
 >
 
 I was thinking that the host side is the master i.e you can peek at
 namespaces in the guest from the host.
 Also note that having the pass through device allows for guests to be
 connected via standard linux schemes in the host side (bridge, point
 routes, tc redirect etc); so you dont need a special device to hook
 them together.
 
 > > > Then the pragmatic question becomes how to correlate what you see from
 > > > `ip addr list' to guests.
 > >
 > > on the host ip addr and the one seen on the guest side are the same.
 > > Except one is seen (on the host) on guest0-eth0 and another is seen
 > > on eth0 (on guest).
 >
 > Then what to do if the host system has 10.0.0.1 as a private address on eth3,
 > and then interfaces guest1-tun0 and guest2-tun0 both get address 10.0.0.1
 > when each guest has added 10.0.0.1 to their tun0 device?
 >
 
 Yes, that would be a conflict that needs to be resolved. If you look at
 ip addresses as also belonging to namespaces, then it should work, no?
 i am assuming a tag at the ifa table level.
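
 One way to picture a tag at the address-table level is a table keyed by (namespace, address) rather than by address alone. The sketch below is only an illustration of that idea under stated assumptions: the AddressTable class and its methods are hypothetical and do not correspond to any kernel structure.

```python
class AddressTable:
    """Toy address table in which every entry carries a namespace tag."""

    def __init__(self):
        # (namespace, address) -> device name
        self._owners = {}

    def add(self, namespace, device, address):
        """Register an address; conflicts are checked per namespace only."""
        key = (namespace, address)
        if key in self._owners:
            raise ValueError(
                "{} already assigned in namespace {}".format(address, namespace)
            )
        self._owners[key] = device

    def lookup(self, namespace, address):
        """Return the device owning the address in this namespace, or None."""
        return self._owners.get((namespace, address))


table = AddressTable()
table.add("host", "eth3", "10.0.0.1")
table.add("guest1", "tun0", "10.0.0.1")
table.add("guest2", "tun0", "10.0.0.1")
print(table.lookup("guest1", "10.0.0.1"))
```

 With the tag as part of the key, the three assignments of 10.0.0.1 do not collide in the table itself. The sketch says nothing about local routes, ARP replies, or what host applications see when they list interfaces.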
 
 cheers,
 jamal
 |  
	|  |  |  
	| 
		
			| Re: [patch 2/6] [Network namespace] Network device sharing by view [message #4172 is a reply to message #4169] | Fri, 30 June 2006 15:01   |  
			| 
				
				
					|  Andrey Savochkin Messages: 47
 Registered: December 2005
 | Member |  |  |  
	| Jamal, 
 On Fri, Jun 30, 2006 at 09:50:52AM -0400, jamal wrote:
 >
 > BTW - I was just looking at openvz, very impressive. To the other folks,
 
 Thanks!
 
 > I am not putting down any of your approaches - just havent
 > had time to study them. Andrey, this is the same thing you guys have
 > been working on for a few years now, you just changed the name, correct?
 
 The relations are more complicated than just the change of name,
 but yes, OpenVZ represents the result of our work for a few years.
 
 >
 > Ok, since you guys are encouraging me to speak, here goes ;->
 > Hopefully this addresses the other email from Herbert et al.
 >
 [snip]
 > // create the guest
 > [host-node]# vzctl create 101 --ostemplate fedora-core-5-minimal
 > // create guest101::eth0, seems to only create config to boot up with
 > [host-node]# vzctl create 101 --netdev eth0
 > // bootup guest101
 > [host-node]# vzctl start 101
 >
 > As soon as bootup of guest101 happens, creating guest101::eth0 should activate
 > creation of the host side netdevice. This could be triggered for example by
 > the netlink event message seen on host which is a result of creating guest101::eth0
 > Which means control sits purely in user space.
 
 I'd like to clarify your idea: whether this host-side device is a real
 device capable of receiving and transmitting packets (by moving them between
 namespaces), or it's a fake device creating only a view of other namespace's
 devices?
 
 [snip]
 > > However, I oppose the idea of automatic mirroring of _all_ devices appearing
 > > inside some namespaces ("guests") to another namespace (the "host").
 > > This clearly goes against the concept of namespaces as independent realms,
 > > and creates a lot of problems with applications running in the host, hotplug
 > > scripts and so on.
 > >
 >
 > I was thinking that the host side is the master i.e you can peek at
 > namespaces in the guest from the host.
 
 "Host(master)-guest" relations is a valid and useful scheme.
 However, I'm thinking about broader application of network namespaces,
 when they can form an arbitrary tree and may not be in "host-guest" relations.
 
 > Also note that having the pass through device allows for guests to be
 > connected via standard linux schemes in the host side (bridge, point
 > routes, tc redirect etc); so you dont need a special device to hook
 > them together.
 
 What do you mean by pass through device?
 Do you mean using guest1-tun0 as a backdoor to talk to the guest?
 
 >
 > > > > Then the pragmatic question becomes how to correlate what you see from
 > > > > `ip addr list' to guests.
 > > >
 > > > on the host ip addr and the one seen on the guest side are the same.
 > > > Except one is seen (on the host) on guest0-eth0 and another is seen
 > > > on eth0 (on guest).
 > >
 > > Then what to do if the host system has 10.0.0.1 as a private address on eth3,
 > > and then interfaces guest1-tun0 and guest2-tun0 both get address 10.0.0.1
 > > when each guest has added 10.0.0.1 to their tun0 device?
 > >
 >
 > Yes, that would be a conflict that needs to be resolved. If you look at
 > ip addresses as also belonging to namespaces, then it should work, no?
 > i am assuming a tag at the ifa table level.
 
 I'm not sure, it's complicated.
 You wouldn't want automatic local routes to be added for IP addresses on
 the host-side interfaces, right?
 Do you expect these IP addresses to act as local addresses in other places,
 like answering to arp requests about these IP on all physical devices?
 
 But anyway, you'll have conflicts on the application level.
 Many programs like ntpd, bind, and others fetch the device list using the
 same ioctls as ifconfig, and make (un)intelligent decisions based on what
 they see.
 Mirroring may have some advantages if I am both host and guest administrator.
 But if I create a namespace for my friend Joe to play with IPv6 and sit
 tunnels, why should I face inconveniences because of what he does there?
 
 Best regards
 
 Andrey
 |  
	|  |  |  
	| 
		
			| Re: strict isolation of net interfaces [message #4173 is a reply to message #4171] | Fri, 30 June 2006 15:22   |  
			| 
				
				
					|  Daniel Lezcano Messages: 417
 Registered: June 2006
 | Senior Member |  |  |  
	| Eric W. Biederman wrote:
 > Daniel Lezcano <dlezcano@fr.ibm.com> writes:
 >
 >
 >>Serge E. Hallyn wrote:
 >>
 >>>Quoting Cedric Le Goater (clg@fr.ibm.com):
 >>>
 >>>
 >>>>we could work on virtualizing the net interfaces in the host, map them to
 >>>>eth0 or something in the guest and let the guest handle upper network layers ?
 >>>>
 >>>>lo0 would just be exposed relying on skbuff tagging to discriminate traffic
 >>>>between guests.
 >>>
 >>>This seems to me the preferable way.  We create a full virtual net
 >>>device for each new container, and fully virtualize the device
 >>>namespace.
 >>
 >>I have a few questions about all the network isolation stuff:
 >
 
 It seems these questions are not important.
 
 >
 > So far I have seen two viable possibilities on the table,
 > neither of them involve multiple names for a network device.
 >
 > layer 3 (filtering the allowed ip addresses at bind time roughly the current vserver).
 >   - implementable as a security hook.
 >   - Benefit no measurable performance impact.
 >   - Downside not many things we can do.
 
 What things ? Can you develop please ? Can you give some examples ?
 
 >
 > layer 2 (What appears to applications a separate instance of the network stack).
 >   - Implementable as a namespace.
 
 what about accessing a NFS mounted outside the container ?
 
 >   - Each network namespace would have dedicated network devices.
 >   - Benefit extremely flexible.
 
 For what ? For who ? Do you have examples ?
 
 >   - Downside since at least the slow path must examine the packet
 >     it has the possibility of slowing down the networking stack.
 
 What is/are the slow path(s) you identified ?
 
 > For me the important characteristics.
 > - Allows for application migration, when we take our ip address with us.
 >   In particular it allows for importation of addresses assignments
 >   made on other machines.
 
 Ok for the two methods no ?
 
 > - No measurable impact on the existing networking when the code
 >   is compiled in.
 
 You contradict ...
 
 > - Clean predictable semantics.
 
 What that means ? Can you explain, please ?
 
 > This whole debate on network devices show up in multiple network namespaces
 > is just silly.
 
 The debate is not on the network device show up. The debate is can we
 have a network isolation ___usable for everybody___ not only for the
 beauty of having namespaces and for a system container like.
 
 I am not against the network device virtualization or against the
 namespaces. I am just asking if the namespace is the solution for all
 the network isolation. Should we nest layer 2 and layer 3 virtualization
 into namespaces or separate them in order to have the flexibility to
 choose isolation/performance.
 
 > The only reason for wanting that appears to be better management.
 > We have deeper issues like can we do a reasonable implementation without a
 > network device showing up in multiple namespaces.
 
 Again, I am not against having the network device virtualization. It is
 a good idea.
 
 > I think the reason the debate exists at all is that it is a very approachable
 > topic, as opposed to the fundamentals here.
 >
 > If we can get layer 2 level isolation working without measurable overhead
 > with one namespace per device it may be worth revisiting things.  Until
 > then it is a side issue at best.
 
 I agree, so where are the answers of the questions I asked in my
 previous email ? You said you did some implementation of network
 isolation with and without namespaces, so you should be able to answer...
 
 
 -- Daniel
 |  
	|  |  |  
	| 
		
			| Re: strict isolation of net interfaces [message #4179 is a reply to message #4173] | Fri, 30 June 2006 17:58   |  
			| 
				
				
					|  ebiederm Messages: 1354
 Registered: February 2006
 | Senior Member |  |  |  
	| Daniel Lezcano <dlezcano@fr.ibm.com> writes: 
 > Eric W. Biederman wrote:
 >> Daniel Lezcano <dlezcano@fr.ibm.com> writes:
 >>
 >>>Serge E. Hallyn wrote:
 >>>
 >>>>Quoting Cedric Le Goater (clg@fr.ibm.com):
 >>>>
 >>>>
 >>>>>we could work on virtualizing the net interfaces in the host, map them to
 >>>>>eth0 or something in the guest and let the guest handle upper network layers
 > ?
 >>>>>
 >>>>>lo0 would just be exposed relying on skbuff tagging to discriminate traffic
 >>>>>between guests.
 >>>>
 >>>>This seems to me the preferable way.  We create a full virtual net
 >>>>device for each new container, and fully virtualize the device
 >>>>namespace.
 >>>
 >>>I have a few questions about all the network isolation stuff:
 >>
 >
 > It seems these questions are not important.
 
 I'm just trying to get us back to a productive topic.
 
 >> So far I have seen two viable possibilities on the table,
 >> neither of them involve multiple names for a network device.
 >> layer 3 (filtering the allowed ip addresses at bind time roughly the current
 >> vserver).
 >>   - implementable as a security hook.
 >>   - Benefit no measurable performance impact.
 >>   - Downside not many things we can do.
 >
 > What things ? Can you develop please ? Can you give some examples ?
 
 DHCP, tcpdump,..  Probably a bad way of phrasing it.  But there
 is a lot more that we can do using a pure layer 2 approach.
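
 The layer 3 option in the quoted list (filtering the allowed IP addresses at bind time) can be pictured with a small sketch. It is a toy model under stated assumptions, not the vserver code or a real security hook: the Layer3Context class is hypothetical, and the handling of wildcard binds here is one possible choice.

```python
import ipaddress


class Layer3Context:
    """Toy model of a context limited to a fixed set of IP addresses."""

    def __init__(self, allowed):
        self.allowed = {ipaddress.ip_address(a) for a in allowed}

    def check_bind(self, address):
        """Return the addresses a bind() to `address` may cover.

        A specific address must belong to the context. A wildcard bind
        (0.0.0.0 or ::) is narrowed to the context's own addresses;
        this narrowing is an assumption made for the sketch.
        """
        addr = ipaddress.ip_address(address)
        if addr.is_unspecified:
            return sorted(str(a) for a in self.allowed)
        if addr in self.allowed:
            return [str(addr)]
        raise PermissionError("address not allowed in this context: " + address)


ctx = Layer3Context(["10.1.2.3"])
print(ctx.check_bind("10.1.2.3"))
print(ctx.check_bind("0.0.0.0"))
```

 Because the check only runs when a socket is bound, the packet path is untouched, which matches the "no measurable performance impact" point. Tools that work below the socket layer, such as DHCP clients or tcpdump, gain nothing from such a filter.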
 
 >> layer 2 (What appears to applications a separate instance of the network
 >> stack).
 >>   - Implementable as a namespace.
 >
 > what about accessing a NFS mounted outside the container ?
 
 As I replied earlier it isn't a problem.  If you get to it through the
 filesystem namespace it uses the network namespace it was mounted with
 for its connection.
 
 >>   - Each network namespace would have dedicated network devices.
 >>   - Benefit extremely flexible.
 >
 > For what ? For who ? Do you have examples ?
 
 See above.
 
 >>   - Downside since at least the slow path must examine the packet
 >>     it has the possibility of slowing down the networking stack.
 >
 > What is/are the slow path(s) you identified ?
 
 Grr.  I put that badly.  Basically at least on the slow path you need to
 look at a per network namespace data structure.  The extra pointer
 indirection could slow things down.  The point is that we may be
 able to have a fast path that is exactly the same as the rest
 of the network stack.
 
 If the obvious approach does not work, my gut feeling is that the
 network stack fast path will give us an implementation without overhead.
 
 >> For me the important characteristics.
 >> - Allows for application migration, when we take our ip address with us.
 >>   In particular it allows for importation of addresses assignments
 >>   made on other machines.
 >
 > Ok for the two methods no ?
 
 So far.
 
 >> - No measurable impact on the existing networking when the code
 >>   is compiled in.
 >
 > You contradict ...
 
 How so?  As far as I can tell this is a basic requirement to get
 merged.
 
 >> - Clean predictable semantics.
 >
 > What that means ? Can you explain, please ?
 
 >> This whole debate on network devices show up in multiple network namespaces
 >> is just silly.
 >
 > The debate is not on the network device show up. The debate is can we have a
 > network isolation ___usable for everybody___ not only for the beauty of having
 > namespaces and for a system container like.
 
 This subthread talking about devices showing up in multiple namespaces seemed
 very much exactly on how network devices show up.
 
 > I am not against the network device virtualization or against the namespaces. I
 > am just asking if the namespace is the solution for all the network
 > isolation. Should we nest layer 2 and layer 3 virtualization into namespaces or
 > separate them in order to have the flexibility to choose isolation/performance.
 
 I believe I addressed Herbert Poetzl's concerns earlier.  To me the question
 is can we implement an acceptable layer 2 solution, that distributions and
 other people who do not need isolation would have no problem compiling in
 by default.
 
 The joy of namespaces is that if you don't want it you don't have to use it.
 Layer 2 can do everything and is likely usable by everyone iff the performance
 is acceptable.
 
 >> The only reason for wanting that appears to be better management.
 >> We have deeper issues like can we do a reasonable implementation without a
 >> network device showing up in multiple namespaces.
 >
 > Again, I am not against having the network device virtualization. It is a good
 > idea.
 >
 >> I think the reason the debate exists at all is that it is a very approachable
 >> topic, as opposed to the fundamentals here.
 >> If we can get layer 2 level isolation working without measurable overhead
 >> with one namespace per device it may be worth revisiting things.  Until
 >> then it is a side issue at best.
 >
 > I agree, so where are the answers of the questions I asked in my previous email
 > ? You said you did some implementation of network isolation with and without
 > namespaces, so you should be able to answer...
 
 Sorry.  More than anything those questions looked rhetorical and aimed
 at disarming some of the silliness.  I will go back and try and
 answer those.  Fundamentally when we have one namespace that includes
 network devices, network sockets, and all of the data structures necessary
 to use them (routing tables and the like) and we have a tunnel device
 that can connect namespaces the answers are trivial and I thought obvious.
 
 Eric
 |  
	|  |  |  
	| 
		
			| Re: strict isolation of net interfaces [message #4234 is a reply to message #4157] | Mon, 03 July 2006 13:36   |  
			| 
				
				
					|  Herbert Poetzl Messages: 239
 Registered: February 2006
 | Senior Member |  |  |  
	| On Fri, Jun 30, 2006 at 10:56:13AM +0200, Cedric Le Goater wrote:
 > Serge E. Hallyn wrote:
 > >
 > > The last one in your diagram confuses me - why foo0:1?  I would
 > > have thought it'd be
 >
 > just thinking aloud. I thought that any kind/type of interface could be
 > mapped from host to guest.
 >
 > > host                  |  guest 0  |  guest 1  |  guest2
 > >  ----------------------+-----------+-----------+------------- -
 > >   |                   |           |           |
 > >   |-> l0      <-------+-> lo0 ... | lo0       | lo0
 > >   |                   |           |           |
 > >   |-> eth0            |           |           |
 > >   |                   |           |           |
 > >   |-> veth0  <--------+-> eth0    |           |
 > >   |                   |           |           |
 > >   |-> veth1  <--------+-----------+-----------+-> eth0
 > >   |                   |           |           |
 > >   |-> veth2   <-------+-----------+-> eth0    |
 > >
 > > I think we should avoid using device aliases, as trying to do
 > > something like giving eth0:1 to guest1 and eth0:2 to guest2
 > > while hiding eth0:1 from guest2 requires some uglier code (as
 > > I recall) than working with full devices.  In other words,
 > > if a namespace can see eth0, and eth0:2 exists, it should always
 > > see eth0:2.
 > >
 > > So conceptually using a full virtual net device per container
 > > certainly seems cleaner to me, and it seems like it should be
 > > simpler by way of statistics gathering etc, but are there actually
 > > any real gains?  Or is the support for multiple IPs per device
 > > actually enough?
 > >
 > > Herbert, is this basically how ngnet is supposed to work?
 
 hard to tell, we have at least three ngnet prototypes
 and basically all variants are covered there, from
 separate interfaces which map to real ones to perfect
 isolation of addresses assigned to global interfaces
 
 IMHO the 'virtual' interface per guest is fine, as
 the overhead and consumed resources are non critical
 and it will definitely simplify handling for the
 guest side
 
 I'd really appreciate if we could find a solution which
 allows both, isolation and virtualization, and if the
 bridge scenario is as fast as a direct mapping, I'm
 perfectly fine with a big bridge + ebtables to handle
 security issues
 
 best,
 Herbert
 |  
	|  |  |  
	| 
		
			| Re: strict isolation of net interfaces [message #4235 is a reply to message #4151] | Mon, 03 July 2006 14:53   |  
			| 
				
				
					|  Andrey Savochkin Messages: 47
 Registered: December 2005
 | Member |  |  |  
	| Sam, Serge, Cedric, 
 On Fri, Jun 30, 2006 at 02:49:05PM +1200, Sam Vilain wrote:
 > Serge E. Hallyn wrote:
 > > The last one in your diagram confuses me - why foo0:1?  I would
 > > have thought it'd be
 > >
 > > host                  |  guest 0  |  guest 1  |  guest2
 > >  ----------------------+-----------+-----------+------------- -
 > >   |                   |           |           |
 > >   |-> l0      <-------+-> lo0 ... | lo0       | lo0
 > >   |                   |           |           |
 > >   |-> eth0            |           |           |
 > >   |                   |           |           |
 > >   |-> veth0  <--------+-> eth0    |           |
 > >   |                   |           |           |
 > >   |-> veth1  <--------+-----------+-----------+-> eth0
 > >   |                   |           |           |
 > >   |-> veth2   <-------+-----------+-> eth0    |
 > >
 > > [...]
 > >
 > > So conceptually using a full virtual net device per container
 > > certainly seems cleaner to me, and it seems like it should be
 > > simpler by way of statistics gathering etc, but are there actually
 > > any real gains?  Or is the support for multiple IPs per device
 > > actually enough?
 > >
 >
 > Why special case loopback?
 >
 > Why not:
 >
 > host                  |  guest 0  |  guest 1  |  guest2
 >  ----------------------+-----------+-----------+------------- -
 >   |                   |           |           |
 >   |-> lo              |           |           |
 >   |                   |           |           |
 >   |-> vlo0  <---------+-> lo      |           |
 >   |                   |           |           |
 >   |-> vlo1  <---------+-----------+-----------+-> lo
 >   |                   |           |           |
 >   |-> vlo2   <--------+-----------+-> lo      |
 >   |                   |           |           |
 >   |-> eth0            |           |           |
 >   |                   |           |           |
 >   |-> veth0  <--------+-> eth0    |           |
 >   |                   |           |           |
 >   |-> veth1  <--------+-----------+-----------+-> eth0
 >   |                   |           |           |
 >   |-> veth2   <-------+-----------+-> eth0    |
 
 I still can't completely understand your direction of thoughts.
 Could you elaborate on IP address assignment in your diagram, please?  For
 example, guest0 wants 127.0.0.1 and 192.168.0.1 addresses on its lo
 interface, and 10.1.1.1 on its eth0 interface.
 Does this diagram assume any local IP addresses on v* interfaces in the
 "host"?
 
 And the second question.
 Are vlo0, veth0, etc. devices supposed to have hard_xmit routines?
 
 Best regards
 
 Andrey
 |  
	|  |  |  