OpenVZ Forum

Re: Re: [RFC] network namespaces [message #6142 is a reply to message #6130] Sat, 09 September 2006 07:57
Mishin Dmitry
On Friday 08 September 2006 22:11, Herbert Poetzl wrote:
> actually the light-weight ip isolation runs perfectly
> fine _without_ CAP_NET_ADMIN, as you do not want the
> guest to be able to mess with the 'configured' ips at
> all (not to speak of interfaces here)
It was only an example. I'm thinking about how to implement a flexible solution
that permits light-weight IP isolation as well as full-fledged network
virtualization. Another solution is to split CONFIG_NET_NAMESPACE. Would that
be good for you?

--
Thanks,
Dmitry.
Re: Re: [RFC] network namespaces [message #6147 is a reply to message #6142] Sun, 10 September 2006 02:47
Herbert Poetzl
On Sat, Sep 09, 2006 at 11:57:24AM +0400, Dmitry Mishin wrote:
> On Friday 08 September 2006 22:11, Herbert Poetzl wrote:
> > actually the light-weight ip isolation runs perfectly
> > fine _without_ CAP_NET_ADMIN, as you do not want the
> > guest to be able to mess with the 'configured' ips at
> > all (not to speak of interfaces here)

> It was only an example. I'm thinking about how to implement a flexible
> solution, which permits light-weight ip isolation as well as
> full-fledged network virtualization. Another solution is to split
> CONFIG_NET_NAMESPACE. Is it good for you?

well, I think it would be best to have both, as
they are complementary to some degree, and IMHO
both, the full virtualization _and_ the isolation
will require a separate namespace to work, I also
think that limiting the isolation to something
very simple (like one IP + network or so) would
be acceptable for a start, because especially
multi IP or network range checks require a little
more effort to get them right ...

I do not think that folks would want to recompile
their kernel just to get a light-weight guest or
a fully virtualized one

best,
Herbert

> --
> Thanks,
> Dmitry.
Re: Re: [RFC] network namespaces [message #6148 is a reply to message #6147] Sun, 10 September 2006 07:45
Mishin Dmitry
On Sunday 10 September 2006 06:47, Herbert Poetzl wrote:
> well, I think it would be best to have both, as
> they are complementary to some degree, and IMHO
> both, the full virtualization _and_ the isolation
> will require a separate namespace to work,
[snip]
> I do not think that folks would want to recompile
> their kernel just to get a light-weight guest or
> a fully virtualized one
In this case a light-weight guest will have unnecessary overhead.
For example, instead of using a static pointer, we have to find the required
common namespace first. And such a guest will have no advantages over a
full-featured one.

>
> best,
> Herbert
>
> > --
> > Thanks,
> > Dmitry.

--
Thanks,
Dmitry.
Re: Re: [RFC] network namespaces [message #6156 is a reply to message #6148] Sun, 10 September 2006 19:22
Herbert Poetzl
On Sun, Sep 10, 2006 at 11:45:35AM +0400, Dmitry Mishin wrote:
> On Sunday 10 September 2006 06:47, Herbert Poetzl wrote:
> > well, I think it would be best to have both, as
> > they are complementary to some degree, and IMHO
> > both, the full virtualization _and_ the isolation
> > will require a separate namespace to work,
> [snip]
> > I do not think that folks would want to recompile
> > their kernel just to get a light-weight guest or
> > a fully virtualized one

> In this case light-weight guest will have unnecessary overhead. For
> example, instead of using static pointer, we have to find the required
> common namespace before.

this is only required at 'bind' time, which makes up
a non-measurable fraction of the actual connection
usage (unless you keep binding ports over and over
without ever using them)

> And there will be no advantages for such guest over full-featured.

the advantage is in the flexibility, simplicity of
setup and the basically non-existent overhead on
the hot (connection/transfer) part ...
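
A bind-time check of the kind discussed here can be sketched in user-space C. This is only an illustration under stated assumptions: `struct guest_ctx`, `guest_bind_allowed()` and the rewriting of the wildcard address to the guest's first IP are hypothetical and not taken from Linux-VServer or any patch in this thread. The point is that the address set is consulted once, when a socket is bound, and never on the packet path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Build an IPv4 address in host byte order (illustration only). */
#define IPV4(a, b, c, d) \
    (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
     ((uint32_t)(c) << 8) | (uint32_t)(d))

#define ADDR_ANY 0u

/* Hypothetical per-guest context: the IPs the guest may bind to. */
struct guest_ctx {
    const uint32_t *allowed_ips;
    size_t          nr_ips;
};

/*
 * Called once per bind(). Returns true if the guest may bind to *addr.
 * Assumption: a wildcard bind is narrowed to the guest's first IP so the
 * guest never listens on addresses it does not own.
 */
bool guest_bind_allowed(const struct guest_ctx *ctx, uint32_t *addr)
{
    size_t i;

    if (ctx->nr_ips == 0)
        return false;

    if (*addr == ADDR_ANY) {
        *addr = ctx->allowed_ips[0];
        return true;
    }

    for (i = 0; i < ctx->nr_ips; i++)
        if (ctx->allowed_ips[i] == *addr)
            return true;

    return false;
}
```

Once the bind succeeds, established connections need no further per-packet checks in this model, which is where the claim of negligible hot-path overhead comes from.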

> > best,
> > Herbert
> >
> > > --
> > > Thanks,
> > > Dmitry.
>
> --
> Thanks,
> Dmitry.
Re: [PATCH 4/9] network namespaces: socket hashes [message #6476 is a reply to message #5172] Mon, 18 September 2006 15:12
Daniel Lezcano
Andrey Savochkin wrote:
> Socket hash lookups are made within namespace.
> Hash tables are common for all namespaces, with
> additional permutation of indexes.

Hi Andrey,

why is the hash table common and not instantiated multiple times for
each namespace like the routes?
Re: [PATCH 4/9] network namespaces: socket hashes [message #6577 is a reply to message #6476] Wed, 20 September 2006 16:32
Andrey Savochkin
Hi,

On Mon, Sep 18, 2006 at 05:12:49PM +0200, Daniel Lezcano wrote:
> Andrey Savochkin wrote:
> > Socket hash lookups are made within namespace.
> > Hash tables are common for all namespaces, with
> > additional permutation of indexes.
>
> Hi Andrey,
>
> why is the hash table common and not instantiated multiple times for
> each namespace like the routes ?

The main reason is that socket hash tables should be large enough to work
efficiently, but it isn't good to waste a lot of memory on each namespace.
Namespaces should be cheap enough to allow hundreds of them.
This memory-efficiency argument, of course, holds only until
socket hash tables start to resize automatically.

Another point is that routing lookup is much more complicated than socket
lookup, which makes it harder to add another search key to it.
Routing also has additional routines for deleting entries matching some
patterns, and so on.
In short, routing is much more complicated, and it is already quite efficient
for various sizes of routing tables.
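
The shared-table idea can be illustrated with a small user-space sketch. It is not the code from the patch: the structure names, the XOR mixing and the table size are all assumptions chosen for illustration. One table serves every namespace, a per-namespace value permutes the bucket index, and the lookup compares the namespace as an additional key.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HASH_SIZE 256u  /* power of two; illustrative size */

/* Hypothetical namespace: carries a value mixed into every hash index. */
struct net_ns {
    uint32_t hash_mix;
};

/* Hypothetical socket entry keyed by (namespace, local port). */
struct sock_entry {
    const struct net_ns *ns;
    uint16_t             port;
    struct sock_entry   *next;
};

/* One table shared by all namespaces. */
static struct sock_entry *sock_table[HASH_SIZE];

/* Permute the bucket index with the per-namespace value. */
unsigned int ns_port_hash(const struct net_ns *ns, uint16_t port)
{
    return (port ^ ns->hash_mix) & (HASH_SIZE - 1);
}

void sock_insert(struct sock_entry *e)
{
    unsigned int i = ns_port_hash(e->ns, e->port);

    e->next = sock_table[i];
    sock_table[i] = e;
}

/* Lookup stays within the namespace: ns is part of the search key. */
struct sock_entry *sock_lookup(const struct net_ns *ns, uint16_t port)
{
    struct sock_entry *e;

    for (e = sock_table[ns_port_hash(ns, port)]; e; e = e->next)
        if (e->ns == ns && e->port == port)
            return e;

    return NULL;
}
```

With this layout the memory cost of an extra namespace is one small structure rather than a whole table, and sockets bound to the same port in different namespaces tend to land in different buckets.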

Andrey
Re: [PATCH 4/9] network namespaces: socket hashes [message #6696 is a reply to message #6577] Thu, 21 September 2006 12:34
Daniel Lezcano
Andrey Savochkin wrote:

> The main reason is that socket hash tables should be large enough to work
> efficiently, but it isn't good to waste a lot of memory for each namespace.
> Namespaces should be cheap enough, to allow to have hundreds of them.
> This reason of memory efficiency, of course, has a priority unless/until
> socket hash tables start to resize automatically.
>
> Another point is that routing lookup is much more complicated than the
> socket's one to add another search key.
> Routing also have additional routines for deleting entries matching some
> patterns, and so on.
> In short, routing is much more complicated, and it already quite efficient
> for various sizes of routing tables.

That makes sense, thx for the explanation.

Cheers.
-- Daniel.
Re: [PATCH 5/9] network namespaces: async socket operations [message #6801 is a reply to message #5169] Fri, 22 September 2006 15:33
Daniel Lezcano
Andrey Savochkin wrote:
> Non-trivial part of socket namespaces: asynchronous events
> should be run in proper context.
>
> Signed-off-by: Andrey Savochkin <saw@swsoft.com>
> ---
> af_inet.c | 10 ++++++++++
> inet_timewait_sock.c | 8 ++++++++
> tcp_timer.c | 9 +++++++++
> 3 files changed, 27 insertions(+)
>
> --- ./net/ipv4/af_inet.c.venssock-asyn Mon Aug 14 17:04:07 2006
> +++ ./net/ipv4/af_inet.c Tue Aug 15 13:45:44 2006
> @@ -366,10 +366,17 @@ out_rcu_unlock:
> int inet_release(struct socket *sock)
> {
> struct sock *sk = sock->sk;
> + struct net_namespace *ns, *orig_net_ns;
>
> if (sk) {
> long timeout;
>
> + /* Need to change context here since protocol ->close
> + * operation may send packets.
> + */
> + ns = get_net_ns(sk->sk_net_ns);
> + push_net_ns(ns, orig_net_ns);
> +

Is there not a race condition here? What happens if a packet
arrives during the namespace context switch?

IMHO namespace switching is dangerous; you can probably
handle that with locks, but it will be difficult and will decrease overall
network performance.

On the other hand, I don't see how you can handle the
"sk->sk_prot->close" after ...

-- Cheers
Re: [PATCH 5/9] network namespaces: async socket operations [message #6808 is a reply to message #6801] Sat, 23 September 2006 13:16
Andrey Savochkin
On Fri, Sep 22, 2006 at 05:33:56PM +0200, Daniel Lezcano wrote:
> Andrey Savochkin wrote:
> > Non-trivial part of socket namespaces: asynchronous events
> > should be run in proper context.
> >
> > Signed-off-by: Andrey Savochkin <saw@swsoft.com>
> > ---
> > af_inet.c | 10 ++++++++++
> > inet_timewait_sock.c | 8 ++++++++
> > tcp_timer.c | 9 +++++++++
> > 3 files changed, 27 insertions(+)
> >
> > --- ./net/ipv4/af_inet.c.venssock-asyn Mon Aug 14 17:04:07 2006
> > +++ ./net/ipv4/af_inet.c Tue Aug 15 13:45:44 2006
> > @@ -366,10 +366,17 @@ out_rcu_unlock:
> > int inet_release(struct socket *sock)
> > {
> > struct sock *sk = sock->sk;
> > + struct net_namespace *ns, *orig_net_ns;
> >
> > if (sk) {
> > long timeout;
> >
> > + /* Need to change context here since protocol ->close
> > + * operation may send packets.
> > + */
> > + ns = get_net_ns(sk->sk_net_ns);
> > + push_net_ns(ns, orig_net_ns);
> > +
>
> Is it not a race condition here ? What happens if you have a packet
> incoming during the namespace context switching ?

All asynchronous operations (RX softirq, timers) should set their context
explicitly, and can't rely on the current context being the right one
(or a valid pointer at all).
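
A minimal user-space sketch of that rule follows. The definitions of `push_net_ns()` and its counterpart are not shown in this thread, so `ctx_push()`, `ctx_pop()`, `current_ns` and `struct fake_sock` below are hypothetical stand-ins, and a single global pointer replaces whatever per-CPU or per-task state a kernel would use. The handler installs the socket's namespace itself instead of trusting whatever context it was entered with, and restores the previous one on exit.

```c
#include <assert.h>
#include <stddef.h>

struct net_ns {
    int id;
};

static struct net_ns root_ns = { 0 };

/* Assumption: one global "current context" pointer for illustration. */
static struct net_ns *current_ns = &root_ns;

/* Install ns as the current context and return the one it replaced. */
struct net_ns *ctx_push(struct net_ns *ns)
{
    struct net_ns *orig = current_ns;

    current_ns = ns;
    return orig;
}

/* Restore a context previously returned by ctx_push(). */
void ctx_pop(struct net_ns *orig)
{
    current_ns = orig;
}

/* Hypothetical socket that remembers the namespace it belongs to. */
struct fake_sock {
    struct net_ns *ns;
    int            sent_in_ns;  /* records the context used for sending */
};

/* Stand-in for code that sends a packet using the current context. */
void send_packet(struct fake_sock *sk)
{
    sk->sent_in_ns = current_ns->id;
}

/* Asynchronous handler (timer, softirq): sets its context explicitly. */
void timer_handler(struct fake_sock *sk)
{
    struct net_ns *orig = ctx_push(sk->ns);

    send_packet(sk);
    ctx_pop(orig);
}
```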

Andrey
Re: [RFC] network namespaces [message #7120 is a reply to message #5165] Wed, 04 October 2006 09:40
Daniel Lezcano
Andrey Savochkin wrote:
> Hi All,
>
> I'd like to resurrect our discussion about network namespaces.
> In our previous discussions it appeared that we have rather polar concepts
> which seemed hard to reconcile.
> Now I have an idea how to look at all discussed concepts to enable everyone's
> usage scenario.

Hi Andrey,

I have a few questions ... sorry for asking so late ;)

>
> 1. The most straightforward concept is complete separation of namespaces,
> covering device list, routing tables, netfilter tables, socket hashes, and
> everything else.
>
> On input path, each packet is tagged with namespace right from the
> place where it appears from a device, and is processed by each layer
> in the context of this namespace.

If you know the namespace the packet is coming from, why do you tag the
packet instead of switching to the right namespace?
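
For reference, the tagging described in the quoted paragraph can be pictured with the following sketch. All structures and function names are hypothetical and heavily simplified: a device belongs to one namespace, the receive path copies that namespace into the packet, and later layers read the tag from the packet rather than from any global state.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical namespace with a per-namespace receive counter. */
struct net_ns {
    int id;
    int rx_packets;
};

/* Hypothetical device: owned by exactly one namespace. */
struct net_dev {
    const char    *name;
    struct net_ns *ns;
};

/* Hypothetical packet carrying its namespace tag. */
struct packet {
    struct net_dev *dev;
    struct net_ns  *ns;
};

/* Entry point from the device: tag the packet with the device's namespace. */
void dev_receive(struct net_dev *dev, struct packet *pkt)
{
    pkt->dev = dev;
    pkt->ns = dev->ns;
}

/* A later layer works in the context named by the tag. */
void ip_input(struct packet *pkt)
{
    pkt->ns->rx_packets++;
}
```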

> Non-root namespaces communicate with the outside world in two ways: by
> owning hardware devices, or receiving packets forwarded them by their parent
> namespace via pass-through device.

Will you do proxy ARP and IP forwarding in the root namespace in
order to make non-root namespaces visible from the outside world?

Regards.

-- Daniel
Re: [RFC] network namespaces [message #16590 is a reply to message #5941] Tue, 05 September 2006 18:27
ebiederm
Herbert Poetzl <herbert@13thfloor.at> writes:

> On Tue, Sep 05, 2006 at 08:45:39AM -0600, Eric W. Biederman wrote:
>> Daniel Lezcano <dlezcano@fr.ibm.com> writes:
>> 
>> For HPC if you are interested in migration you need a separate IP
>> per container. If you can take your IP address with you, migration of
>> networking state is simple. If you can't take your IP address with you
>> a network container is nearly pointless from a migration perspective.
>>
>> Beyond that from everything I have seen layer 2 is just much cleaner
>> than any layer 3 approach short of Serge's bind filtering.
>
> well, the 'ip subset' approach Linux-VServer and
> other Jail solutions use is very clean, it just does
> not match your expectations of a virtual interface
> (as there is none) and it does not cope well with
> all kinds of per context 'requirements', which IMHO
> do not really exist on the application layer (only
> on the whole system layer)

I probably expressed that wrong.  There are currently three
basic approaches under discussion.
Layer 3 (Basically bind filtering) nothing at the packet level.
   The approach taken by Serge's version of bsdjails and Vserver.

Layer 2.5 What Daniel proposed.

Layer 2.  (Trivially mapping each packet to a different interface)
           And then treating everything as multiple instances of the
           network stack.
        Roughly what OpenVZ and I have implemented.

You can get into some weird complications at layer 3 but because
it doesn't touch each packet the proof it is fast is trivial.

>> Beyond that I have yet to see a clean semantics for anything
>> resembling your layer 2 layer 3 hybrid approach. If we can't have
>> clear semantics it is by definition impossible to implement correctly
>> because no one understands what it is supposed to do.
>
> IMHO that would be quite simple, have a 'namespace'
> for limiting port binds to a subset of the available
> ips and another one which does complete network 
> virtualization with all the whistles and bells, IMHO
> most of them are orthogonal and can easily be combined
>
>  - full network virtualization
>  - lightweight ip subset 
>  - both

Quite possibly.  The LSM will stay for a while so we do have
a clean way to restrict port binds.

>> Note. A true layer 3 approach has no impact on TCP/UDP filtering
>> because it filters at bind time not at packet reception time. Once you
>> start inspecting packets I don't see what the gain is from not going
>> all of the way to layer 2.
>
> IMHO this requirement only arises from the full system
> virtualization approach, just look at the other jail
> solutions (solaris, bsd, ...) some of them do not even 
> allow for more than a single ip but they work quite
> well when used properly ...


Yes they do.  Currently I am strongly opposed to Daniel's layer 2.5 approach
as I see no redeeming value in it.  A good clean layer 3 approach I 
avoid only because I think we can do better.

Eric
_______________________________________________
Containers mailing list
Containers@lists.osdl.org
https://lists.osdl.org/mailman/listinfo/containers
Re: [RFC] network namespaces [message #16592 is a reply to message #6005] Wed, 06 September 2006 17:58
ebiederm
Herbert Poetzl <herbert@13thfloor.at> writes:

> On Wed, Sep 06, 2006 at 11:10:23AM +0200, Daniel Lezcano wrote:
>> 
>> As far as I see, vserver use a layer 3 solution but, when needed, the
>> veth "component", made by Nestor Pena, is used to provide a layer 2
>> virtualization. Right ?
>
> well, no, we do not explicitly use the VETH daemon
> for networking, although some folks probably make use
> of it, mainly because if you realize that this kind 
> of isolation is something different and partially
> complementary to network virtualization, you can do
> live without the layer 2 virtualization in almost
> all cases, nevertheless, for certain purposes layer
> 2/3 virtualization is required and/or makes perfect
> sense
>
>> Having the two solutions, you have certainly a lot if information
>> about use cases. 
>
>> From the point of view of vserver, can you give some
>> examples of when a layer 3 solution is better/worse than
>> a layer 2 solution ? 
>
> my point (until we have an implementation which clearly
> shows that performance is equal/better to isolation)
> is simply this:
>
>  of course, you can 'simulate' or 'construct' all the
>  isolation scenarios with kernel bridging and routing
>  and tricky injection/marking of packets, but, this
>  usually comes with an overhead ...
>
>> Who wants a layer 2/3 virtualization and why ?
>
> there are some reasons for virtualization instead of
> pure isolation (as Linux-VServer does it for now)
>
>  - context migration/snapshot (probably reason #1)
>  - creating network devices inside a guest
>    (can help with vpn and similar)
>  - allowing non IP protocols (like DHCP, ICMP, etc)
>
> the problem which arises with this kind of network
> virtualization is that you need some additional policy
> for example to avoid sending 'evil' packets and/or
> (D)DoSing one guest from another, which again adds
> further overhead, so basically if you 'just' want
> to have network isolation, you have to do this:
>
>  - create a 'copy' of your hosts networking inside
>    the guest (with virtual interfaces)
>  - assign all the same (subset) ips and this to
>    the virtual guest interfaces
>  - activate some smart bridging code which 'knows'
>    what ports can be used and/or mapped 
>  - add policy to block unwanted connections and/or
>    packets to/from the guest
>
> all this sounds very intrusive and for sure (please
> prove me wrong here :) adds a lot of overhead to the
> networking itself, while a 'simple' isolation approach
> for IP (tcp/udp) is (almost) without any cost, certainly
> without overhead once a connection is established.

Thanks for the good summary of the situation.

I think we can prove you wrong but it is going to take
some doing to build a good implementation and take
the necessary measurements.

Hmm.  I wonder if the filtering layer 3 style of isolation can be built with
netfilter rules.  Just skimming, it looks like we may be able to do it with
something like the netfilter owner module, possibly in conjunction with the
connmark or conntrack modules.  If not, and if the infrastructure is close
enough, we can write our own module.

Has anyone looked at network isolation from the netfilter perspective?

Eric

Re: [RFC] network namespaces [message #16595 is a reply to message #16590] Wed, 06 September 2006 14:52
dev

>>On Tue, Sep 05, 2006 at 08:45:39AM -0600, Eric W. Biederman wrote:
>>
>>>Daniel Lezcano <dlezcano@fr.ibm.com> writes:
>>>
>>>For HPC if you are interested in migration you need a separate IP
>>>per container. If you can take your IP address with you, migration of
>>>networking state is simple. If you can't take your IP address with you
>>>a network container is nearly pointless from a migration perspective.
>>>
>>>Beyond that from everything I have seen layer 2 is just much cleaner
>>>than any layer 3 approach short of Serge's bind filtering.
>>
>>well, the 'ip subset' approach Linux-VServer and
>>other Jail solutions use is very clean, it just does
>>not match your expectations of a virtual interface
>>(as there is none) and it does not cope well with
>>all kinds of per context 'requirements', which IMHO
>>do not really exist on the application layer (only
>>on the whole system layer)
> 
> 
> I probably expressed that wrong.  There are currently three
> basic approaches under discussion.
> Layer 3 (Basically bind filtering) nothing at the packet level.
>    The approach taken by Serge's version of bsdjails and Vserver.
> 
> Layer 2.5 What Daniel proposed.
> 
> Layer 2.  (Trivially mapping each packet to a different interface)
>            And then treating everything as multiple instances of the
>            network stack.
>         Roughly what OpenVZ and I have implemented.
I think classifying network virtualization by Layer X is not good enough.
OpenVZ has Layer 3 (venet) and Layer 2 (veth) implementations, but
in both cases the networking stack inside the VE remains fully virtualized.

Thanks,
Kirill

Re: [RFC] network namespaces [message #16597 is a reply to message #6009] Wed, 06 September 2006 20:53
Cedric Le Goater
Kir Kolyshkin wrote:

<snip>

> I am not sure about "network isolation" (used by Linux-VServer), but as 
> it comes for level2 vs. level3 virtualization, I see a need for both. 
> Here is the easy-to-understand comparison which can shed some light: 
> http://wiki.openvz.org/Differences_between_venet_and_veth

thanks kir,

> Here are a couple of examples
> * Do we want to let container's owner (i.e. root) to add/remove IP 
> addresses? Most probably not, but in some cases we want that.
> * Do we want to be able to run DHCP server and/or DHCP client inside a 
> container? Sometimes...but not always.
> * Do we want to let container's owner to create/manage his own set of 
> iptables? In half of the cases we do.
> 
> The problem here is single solution will not cover all those scenarios.

some would argue that there is one single solution: Xen or similar.

IMO, I think containers should try to leverage their difference,
performance, and not try to simulate a real hardware environment.

Restricting the network environment of a container should be considered
acceptable if this is for the sake of performance. The network interface(s)
could be pre-configured and provided to the container. Protocol(s) could be
forbidden.

Now, if you need more network power in a container, you will need a real or
a virtualized interface.

But let's consider both alternatives.

thanks,

C.
Re: Re: [RFC] network namespaces [message #16607 is a reply to message #6079] Thu, 07 September 2006 19:50
ebiederm
Herbert Poetzl <herbert@13thfloor.at> writes:

> On Thu, Sep 07, 2006 at 08:23:53PM +0400, Kirill Korotaev wrote:
>
> well, who said that you need to have things like RAW sockets
> or other protocols except IP, not to speak of iptable and 
> routing entries ...
>
> folks who _want_ full network virtualization can use the
> more complete virtual setup and be happy ...

Exactly.  This was a proposal for isolation for containers
that don't get CAP_NET_ADMIN, with a facility that could
easily be general purpose.

Eric
Re: [RFC] network namespaces [message #16628 is a reply to message #6082] Fri, 08 September 2006 06:02
Herbert Poetzl
On Thu, Sep 07, 2006 at 12:29:21PM -0600, Eric W. Biederman wrote:
> Daniel Lezcano <dlezcano@fr.ibm.com> writes:
> >
> > IMHO, I think there is one reason. The unsharing mechanism is
> > not only for containers, it also aims at other kinds of isolation like a
> > "bsdjail" for example. The unshare syscall is flexible, shall the
> > network unsharing be one-block solution ? For example, we want to
> > launch an application using TCP/IP and we want to have
> > an IP address only used by the application, nothing more.
> > With a layer 2, we must after unsharing:
> >  1) create a virtual device into the application namespace
> >  2) assign an IP address
> >  3) create a virtual device pass-through in the root namespace
> >  4) set the virtual device IP
> >
> > All this stuff, need a lot of administration (check mac addresses
> > conflicts, check interface names collision in root namespace, ...)
> > for a simple network isolation.
> 
> Yes, and even more it is hard to show that it will perform as well.
> Although by dropping CAP_NET_ADMIN the actual runtime administration
> is about the same.
> 
> > With a layer 3:
> >  1) assign an IP address
> >
> > On the other hand, a layer 3 isolation is not sufficient to reach
> > the level of isolation/virtualization needed for the system
> > containers.
> 
> Agreed.
> 
> > Very soon, I will commit more info at:
> >
> > http://wiki.openvz.org/Containers/Networking
> >
> > So the consensus is based on the fact that there is a lot of common
> > code for the layer 2 and layer 3 isolation/virtualization and we can
> > find a way to merge the 2 implementation in order to have a flexible
> > network virtualization/isolation.
> 
> NACK In a real level 3 implementation there is very little common
> code with a layer 2 implementation. You don't need to muck with the
> socket handling code as you are not allowed to dup addresses between
> containers. Look at what Serge did that is layer 3.
>
> A layer 3 isolation implementation should either be a new security
> module or a new form of iptables. The problem with using the lsm is
> that it seems to be an all or nothing mechanism so is a very coarse
> grained tool for this job.

IMHO LSM was never an option for that, because it is
a) very complicated to use it for that purpose
b) missing many hooks you definitely need to make this work
c) is not really efficient and/or performant

with something 'like' iptables, this could be done, but
I'm not sure that is the best approach either ...

best,
Herbert

> A layer 2 implementation (where you have network devices isolated and
> not sockets) should be a namespace.
> 
> Eric
Re: Re: [RFC] network namespaces [message #16648 is a reply to message #6147] Sun, 10 September 2006 03:41
ebiederm
Herbert Poetzl <herbert@13thfloor.at> writes:

> On Sat, Sep 09, 2006 at 11:57:24AM +0400, Dmitry Mishin wrote:
>> On Friday 08 September 2006 22:11, Herbert Poetzl wrote:
>> > actually the light-weight ip isolation runs perfectly
>> > fine _without_ CAP_NET_ADMIN, as you do not want the
>> > guest to be able to mess with the 'configured' ips at
>> > all (not to speak of interfaces here)
>
>> It was only an example. I'm thinking about how to implement a flexible
>> solution, which permits light-weight ip isolation as well as
>> full-fledged network virtualization. Another solution is to split
>> CONFIG_NET_NAMESPACE. Is it good for you?
>
> well, I think it would be best to have both, as
> they are complementary to some degree, and IMHO
> both, the full virtualization _and_ the isolation
> will require a separate namespace to work, I also
> think that limiting the isolation to something
> very simple (like one IP + network or so) would
> be acceptable for a start, because especially
> multi IP or network range checks require a little
> more effort to get them right ...
>
> I do not think that folks would want to recompile
> their kernel just to get a light-weight guest or
> a fully virtualized one

I certainly agree that we are not at a point where a final decision
can be made.  A major piece of that is that a layer 2 approach has
not been shown to be without a performance penalty.

A practical question.  Do the IPs assigned to guests ever get used
by anything besides the guest?

Eric
Re: [RFC] network namespaces [message #16652 is a reply to message #5165] Sun, 10 September 2006 11:48
ebiederm
Dmitry Mishin <dim@openvz.org> writes:

> On Sunday 10 September 2006 07:41, Eric W. Biederman wrote:
>> I certainly agree that we are not at a point where a final decision
>> can be made.  A major piece of that is that a layer 2 approach has
>> not shown to be without a performance penalty.
> But it is required. Why limit possible usages?

Wrong perspective.

The point is that we need to dig in and show that there is no
measurable penalty for the current cases.  Showing that there
is little penalty for the advanced configurations is a plus.

The practical question is, do we need to implement the grand unified
lookup before we can do this cheaply, or can we implement this without
needing that optimization?

To get a perspective, to get a good implementation of the pid namespace
I am having to refactor significant parts of the kernel so it uses
abstractions that can cleanly express what we are doing.  The
networking stack is in better shape but there is a lot of it. 

>> A practical question.  Do the IPs assigned to guests ever get used
>> by anything besides the guest?
> In case of level2 virtualization - no.

Actually that is one of the benefits of a layer 2 implementation:
you can set up weird things like shared IPs, which various types
of failover scenarios want.

My question was really about the layer 3 bind filtering techniques,
and how people are using them.

The basic attraction with layer 3 is that you can do a simple
implementation, and it will run very fast, and it doesn't need
to conflict with the layer 2 work at all.  If you can make that layer
3 implementation clean and generally mergeable as well, it is worth
pursuing.

Eric
Re: Re: [RFC] network namespaces [message #16655 is a reply to message #16648] Sun, 10 September 2006 08:11
Mishin Dmitry
On Sunday 10 September 2006 07:41, Eric W. Biederman wrote:
> I certainly agree that we are not at a point where a final decision
> can be made.  A major piece of that is that a layer 2 approach has
> not shown to be without a performance penalty.
But it is required. Why limit possible usages?
 
> A practical question.  Do the IPs assigned to guests ever get used
> by anything besides the guest?
In case of level2 virtualization - no.

-- 
Thanks,
Dmitry.
Re: Re: [RFC] network namespaces [message #16663 is a reply to message #16648] Sun, 10 September 2006 19:19
Herbert Poetzl
On Sat, Sep 09, 2006 at 09:41:35PM -0600, Eric W. Biederman wrote:
> Herbert Poetzl <herbert@13thfloor.at> writes:
> 
> > On Sat, Sep 09, 2006 at 11:57:24AM +0400, Dmitry Mishin wrote:
> >> On Friday 08 September 2006 22:11, Herbert Poetzl wrote:
> >> > actually the light-weight ip isolation runs perfectly
> >> > fine _without_ CAP_NET_ADMIN, as you do not want the
> >> > guest to be able to mess with the 'configured' ips at
> >> > all (not to speak of interfaces here)
> >
> >> It was only an example. I'm thinking about how to implement a flexible
> >> solution, which permits light-weight ip isolation as well as
> >> full-fledged network virtualization. Another solution is to split
> >> CONFIG_NET_NAMESPACE. Is it good for you?
> >
> > well, I think it would be best to have both, as
> > they are complementary to some degree, and IMHO
> > both, the full virtualization _and_ the isolation
> > will require a separate namespace to work, I also
> > think that limiting the isolation to something
> > very simple (like one IP + network or so) would
> > be acceptable for a start, because especially
> > multi IP or network range checks require a little
> > more effort to get them right ...
> >
> > I do not think that folks would want to recompile
> > their kernel just to get a light-weight guest or
> > a fully virtualized one
> 
> I certainly agree that we are not at a point where a final decision
> can be made.  A major piece of that is that a layer 2 approach has
> not been shown to be without a performance penalty.
> 
> A practical question.  Do the IPs assigned to guests ever get used
> by anything besides the guest?

only in special setups and for testing routing and
general operation of course, i.e. one typical
failure scenario is this:

 - 'provider' has a bunch of ips assigned
 - 'host' ip works perfectly
 - 'guest' ip is not routed (by the external router)

in this case, for example, I always suggest testing
on the host with a guest ip, simplest example:

 ping -I <guest-ip> google.com

but for 'normal' operation, the guest ip is reserved
for the guests, unless some service like named is
shared between guests ...
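
The same check can be made from a program instead of with ping. What follows is a minimal sketch in Python, assuming only the standard socket module: binding the local end of a connection to a chosen source address is roughly what `ping -I <guest-ip>` does for ICMP. The helper name `connect_from` is made up for this illustration and does not come from the thread.

```python
import socket


def connect_from(source_ip, dest_host, dest_port, timeout=5.0):
    """Open a TCP connection whose local end is bound to source_ip.

    Hypothetical helper for illustration only.
    """
    # source_address=(ip, 0) binds the local end before connecting;
    # port 0 lets the kernel pick an ephemeral port.
    return socket.create_connection(
        (dest_host, dest_port),
        timeout=timeout,
        source_address=(source_ip, 0),
    )
```

Called on the host with a guest ip as `source_ip` and an external server as the destination, a timeout would suggest that the guest ip is not routed externally rather than that the guest itself is misconfigured.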

HTH,
Herbert

> Eric
Re: Re: [RFC] network namespaces [message #16680 is a reply to message #6142] Mon, 11 September 2006 14:40
Daniel Lezcano
Dmitry Mishin wrote:
> On Friday 08 September 2006 22:11, Herbert Poetzl wrote:
> 
>>actually the light-weight ip isolation runs perfectly
>>fine _without_ CAP_NET_ADMIN, as you do not want the
>>guest to be able to mess with the 'configured' ips at
>>all (not to speak of interfaces here)
> 
> It was only an example. I'm thinking about how to implement a flexible solution,
> which permits light-weight ip isolation as well as full-fledged network
> virtualization. Another solution is to split CONFIG_NET_NAMESPACE. Is it good
> for you?

Hi Dmitry,

I am currently working on this and I am finishing a prototype bringing 
isolation at the ip layer. The prototype code is very close to Andrey's 
patches at TCP/UDP level. So the next step is to merge the prototype 
code with the existing network namespace layer 2 isolation.

IMHO, the solution of splitting CONFIG_NET_NS into CONFIG_L2_NET_NS and 
CONFIG_L3_NET_NS is not acceptable to me because you will need to 
recompile the kernel. The proper way is certainly to have a specific 
flag for the unshare, something like CLONE_NEW_L2_NET and 
CLONE_NEW_L3_NET for example.
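
To make the difference from a compile-time split concrete, here is a minimal sketch in Python of selecting the kind of network namespace at run time from a flag word. The flag names are the ones proposed above; their numeric values, the `NetNsFlag` class and the `requested_isolation` helper are assumptions for illustration, not existing kernel constants or APIs.

```python
from enum import IntFlag


class NetNsFlag(IntFlag):
    # Hypothetical values: the names come from the proposal above,
    # the numbers are invented for this sketch.
    CLONE_NEW_L2_NET = 0x1
    CLONE_NEW_L3_NET = 0x2


def requested_isolation(flags):
    """Return the kinds of network namespace a caller asked for."""
    kinds = []
    if flags & NetNsFlag.CLONE_NEW_L2_NET:
        kinds.append("layer 2 virtualization")
    if flags & NetNsFlag.CLONE_NEW_L3_NET:
        kinds.append("layer 3 isolation")
    return kinds
```

With flags like these, one kernel binary could serve both a light-weight guest and a fully virtualized one, and the choice is made per call instead of per build.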

   -- Daniel

Re: Re: [RFC] network namespaces [message #16681 is a reply to message #16680] Mon, 11 September 2006 14:57
Herbert Poetzl
On Mon, Sep 11, 2006 at 04:40:59PM +0200, Daniel Lezcano wrote:
> Dmitry Mishin wrote:
> >On Friday 08 September 2006 22:11, Herbert Poetzl wrote:
> >
> >>actually the light-weight ip isolation runs perfectly
> >>fine _without_ CAP_NET_ADMIN, as you do not want the
> >>guest to be able to mess with the 'configured' ips at
> >>all (not to speak of interfaces here)
> >
> >It was only an example. I'm thinking about how to implement a flexible 
> >solution, which permits light-weight ip isolation as well as full-fledged 
> >network virtualization. Another solution is to split CONFIG_NET_NAMESPACE. 
> >Is it good for you?
> 
> Hi Dmitry,
> 
> I am currently working on this and I am finishing a prototype bringing
> isolation at the ip layer. The prototype code is very close to
> Andrey's patches at TCP/UDP level. So the next step is to merge the
> prototype code with the existing network namespace layer 2 isolation.

you might want to take a look at the current Linux-VServer
implementation for the network isolation too, should be
quite similar to Andrey's approach, but maybe you can
gather some additional information from there

> IMHO, the solution of splitting CONFIG_NET_NS into CONFIG_L2_NET_NS
> and CONFIG_L3_NET_NS is not acceptable to me because you will need
> to recompile the kernel. The proper way is certainly to have a
> specific flag for the unshare, something like CLONE_NEW_L2_NET and
> CLONE_NEW_L3_NET for example.

I completely agree here, we need a separate namespace
for that, so that we can combine isolation and virtualization
as needed, unless the bind restrictions can be completely
expressed with an additional mangle or filter table (as
was suggested)

best,
Herbert

>   -- Daniel
Re: Re: [RFC] network namespaces [message #16682 is a reply to message #16681] Mon, 11 September 2006 15:04
Daniel Lezcano
Herbert Poetzl wrote:
> On Mon, Sep 11, 2006 at 04:40:59PM +0200, Daniel Lezcano wrote:
> 

>>I am currently working on this and I am finishing a prototype bringing
>>isolation at the ip layer. The prototype code is very close to
>>Andrey's patches at TCP/UDP level. So the next step is to merge the
>>prototype code with the existing network namespace layer 2 isolation.
> 
> 
> you might want to take a look at the current Linux-VServer
> implementation for the network isolation too, should be
> quite similar to Andrey's approach, but maybe you can
> gather some additional information from there

ok, thanks. I will.

>>IMHO, the solution of splitting CONFIG_NET_NS into CONFIG_L2_NET_NS
>>and CONFIG_L3_NET_NS is not acceptable to me because you will need
>>to recompile the kernel. The proper way is certainly to have a
>>specific flag for the unshare, something like CLONE_NEW_L2_NET and
>>CLONE_NEW_L3_NET for example.
> 
> 
> I completely agree here, we need a separate namespace
> for that, so that we can combine isolation and virtualization
> as needed, unless the bind restrictions can be completely
> expressed with an additional mangle or filter table (as
> was suggested)

What is the bind restriction? Do you want to force binding to a 
specific source address?

   -- Daniel
Re: Re: [RFC] network namespaces [message #16683 is a reply to message #16681] Mon, 11 September 2006 15:10
Mishin Dmitry
On Monday 11 September 2006 18:57, Herbert Poetzl wrote:
> I completely agree here, we need a separate namespace
> for that, so that we can combine isolation and virtualization
> as needed, unless the bind restrictions can be completely
> expressed with an additional mangle or filter table (as
> was suggested)
iptables is designed for packet flow decisions and filtering; it has nothing in 
common with bind restrictions. So it can only do packet flow 
scheduling/filtering, and it will not help to resolve bind-time IP conflicts.
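
For reference, the simplest form of a bind-time conflict can be reproduced in userspace. A minimal sketch in Python, assuming a host with a working loopback interface; the helper name `second_bind_errno` is made up for this illustration. Two sockets ask for the same address and port, and the second bind() is refused before any packet exists for a packet filter to inspect.

```python
import errno
import socket


def second_bind_errno(address="127.0.0.1"):
    """Bind two UDP sockets to the same address and port.

    Returns the errno raised by the second bind(), or 0 if it succeeded.
    Hypothetical helper for illustration only.
    """
    first = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        first.bind((address, 0))           # kernel picks a free port
        port = first.getsockname()[1]
        try:
            second.bind((address, port))   # same address and port again
        except OSError as exc:
            return exc.errno
        return 0
    finally:
        first.close()
        second.close()
```

On Linux the second bind is expected to fail with `errno.EADDRINUSE`, since neither socket sets SO_REUSEADDR or SO_REUSEPORT.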

-- 
Thanks,
Dmitry.
Re: [RFC] network namespaces [message #16687 is a reply to message #6148] Tue, 12 September 2006 03:26
ebiederm
Dmitry Mishin <dim@openvz.org> writes:

> On Sunday 10 September 2006 06:47, Herbert Poetzl wrote:
>> well, I think it would be best to have both, as
>> they are complementary to some degree, and IMHO
>> both, the full virtualization _and_ the isolation
>> will require a separate namespace to work,   
> [snip]
>> I do not think that folks would want to recompile
>> their kernel just to get a light-weight guest or
>> a fully virtualized one
> In this case a light-weight guest will have unnecessary overhead.
> For example, instead of using a static pointer, we first have to find the 
> required common namespace. And such a guest will have no advantages over a 
> full-featured one.

Dmitry, that just isn't true if implemented properly.

Eric
Re: [RFC] network namespaces [message #16688 is a reply to message #16683] Tue, 12 September 2006 03:28
ebiederm
Dmitry Mishin <dim@openvz.org> writes:

> On Monday 11 September 2006 18:57, Herbert Poetzl wrote:
>> I completely agree here, we need a separate namespace
>> for that, so that we can combine isolation and virtualization
>> as needed, unless the bind restrictions can be completely
>> expressed with an additional mangle or filter table (as
>> was suggested)
>
> iptables is designed for packet flow decisions and filtering; it has nothing in 
> common with bind restrictions. So it can only do packet flow 
> scheduling/filtering, and it will not help to resolve bind-time IP conflicts.

Please read the archive, where the suggestion was made.

What was suggested was a new table, with its own set of chains,
so we could make filtering decisions on where sockets could be bound.

That is not a far stretch from where iptables is today.
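
A minimal sketch in Python of what such a table could look like, modelled loosely on first-match chain evaluation. Everything here is an assumption for illustration: the `BindRule` fields, the integer context id identifying a guest, the `check_bind` helper and the default-reject policy are not taken from any existing iptables table or from the patches under discussion.

```python
import ipaddress
from dataclasses import dataclass

ACCEPT = "ACCEPT"
REJECT = "REJECT"


@dataclass(frozen=True)
class BindRule:
    context: int                     # hypothetical guest/context id
    network: ipaddress.IPv4Network   # addresses this rule covers
    verdict: str                     # ACCEPT or REJECT


def check_bind(chain, context, address, default=REJECT):
    """Return the verdict of the first rule matching (context, address)."""
    addr = ipaddress.ip_address(address)
    for rule in chain:
        if rule.context == context and addr in rule.network:
            return rule.verdict
    return default  # chain policy when no rule matches
```

A chain holding `BindRule(1, ipaddress.ip_network("10.0.1.0/24"), ACCEPT)` would let context 1 bind to 10.0.1.7 while any other context asking for the same address falls through to the default policy.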

Do you have some concrete arguments against the proposal?

Eric
Re: [RFC] network namespaces [message #16694 is a reply to message #16688] Tue, 12 September 2006 07:38
Mishin Dmitry
Sorry, I didn't understand your proposal correctly from the previous discussion. :)
But...

On Tuesday 12 September 2006 07:28, Eric W. Biederman wrote:
> Do you have some concrete arguments against the proposal?
Yes, I do. I think it is an unnecessary complication, and it will 
result in additional bugs, especially if we accept rule creation in 
userspace. Why do we need a complex solution if there are only two approaches to 
socket binding: isolation and virtualization? These approaches could co-exist 
without hooks. Or do you perhaps have other ways in mind?

-- 
Thanks,
Dmitry.