L3 network isolation [message #16831] Wed, 06 December 2006 23:25
Daniel Lezcano
Hi all,

Dmitry and I have been thinking about a possible implementation that
allows l2 and l3 namespaces to coexist.

The idea is to assume that the l3 network namespaces are the leaves of
the l2 namespace hierarchy tree. By default, the init process is in an
l2 namespace. From an l3 namespace, it is impossible to unshare a new
network namespace.

All the configuration is done in the l2 namespace. When an l3 namespace
is created, a new IP address should be created in the l2 namespace and
"pushed" into the l3. When the l3 dies, the IP address is pulled back to
its parent, i.e. the l2. To ensure security inside the l3, the NET_ADMIN
capability is dropped when unsharing for l3.
There is no extra code for socket virtualization; it is a common part.

How to set up an l3 namespace?
------------------------------

  1 - set up a new IP address in the l2 namespace
  2 - create an l3 namespace
  3 - use a specific socket ioctl to "push" the IP address from the l2
namespace to the newly created l3 namespace

The l2 loses visibility of the IP address and the l3 gains visibility of
it. An ifconfig or ip command shows only the IP addresses assigned to
the namespace. The loopback address is always visible.
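
A minimal userspace sketch of the push/pull ownership model described
above may help. It is only an illustration: struct ns_model, MAX_ADDRS
and the ns_* helpers are hypothetical names, not part of the posted
patches, and the real "push" would go through the socket ioctl mentioned
in step 3.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ADDRS 8	/* arbitrary limit for this sketch */

/* Hypothetical model: an l3 namespace is a leaf whose parent is an l2. */
struct ns_model {
	struct ns_model *parent;	/* NULL for an l2 namespace */
	uint32_t addrs[MAX_ADDRS];	/* addresses visible in this namespace */
	size_t naddrs;
};

/* Return the index of addr in ns, or -1 if it is not visible there. */
static int ns_find(const struct ns_model *ns, uint32_t addr)
{
	for (size_t i = 0; i < ns->naddrs; i++)
		if (ns->addrs[i] == addr)
			return (int)i;
	return -1;
}

/* Step 1: configure an address in a namespace. */
static int ns_add(struct ns_model *ns, uint32_t addr)
{
	if (ns->naddrs == MAX_ADDRS || ns_find(ns, addr) >= 0)
		return -1;
	ns->addrs[ns->naddrs++] = addr;
	return 0;
}

/* Remove an address by overwriting it with the last entry. */
static int ns_del(struct ns_model *ns, uint32_t addr)
{
	int i = ns_find(ns, addr);

	if (i < 0)
		return -1;
	ns->addrs[i] = ns->addrs[--ns->naddrs];
	return 0;
}

/* Step 3: the l2 parent loses the address, the l3 child gains it. */
static int ns_push_addr(struct ns_model *l3, uint32_t addr)
{
	assert(l3->parent != NULL);
	if (ns_find(l3->parent, addr) < 0 || l3->naddrs == MAX_ADDRS)
		return -1;
	ns_del(l3->parent, addr);
	return ns_add(l3, addr);
}

/* When the l3 dies, every address is pulled back to the parent.
 * The sketch ignores the case where the parent table is full. */
static void ns_destroy_l3(struct ns_model *l3)
{
	assert(l3->parent != NULL);
	while (l3->naddrs > 0)
		(void)ns_add(l3->parent, l3->addrs[--l3->naddrs]);
}
```

After ns_push_addr() succeeds, ns_find() on the l2 no longer reports the
address, which models what ifconfig or ip would show in each namespace.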

How to handle outgoing traffic?
-------------------------------

The bind must be checked against the IP addresses belonging to the l3
namespace and against all the derived addresses (multicast, broadcast,
zero net, loopback, ...).
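
One way to picture the bind check is the sketch below. It is an
assumption-laden illustration, not the posted code: struct l3_addr and
l3_bind_allowed() are hypothetical names, addresses are kept in host
byte order for readability, and treating all of 0.0.0.0/8 as "zero net"
is my reading of the list above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one address assigned to an l3 namespace.
 * Host byte order is used to keep the arithmetic readable. */
struct l3_addr {
	uint32_t addr;
	uint32_t mask;
};

/* Return true if a socket in the l3 namespace may bind to address a. */
static bool l3_bind_allowed(const struct l3_addr *own, size_t n, uint32_t a)
{
	assert(own != NULL || n == 0);

	if ((a >> 24) == 0)		/* zero net, includes INADDR_ANY */
		return true;
	if ((a >> 24) == 127)		/* loopback, 127.0.0.0/8 */
		return true;
	if ((a >> 28) == 0xE)		/* multicast, 224.0.0.0/4 */
		return true;
	if (a == 0xFFFFFFFFu)		/* limited broadcast */
		return true;

	for (size_t i = 0; i < n; i++) {
		if (a == own[i].addr)	/* address pushed into this l3 */
			return true;
		if (a == (own[i].addr | ~own[i].mask))	/* subnet broadcast */
			return true;
	}
	return false;			/* belongs to someone else */
}
```

With 1.2.3.2/24 assigned to the namespace, a bind to 1.2.3.2 or
1.2.3.255 is accepted, while a bind to 1.2.3.1 (owned by another
namespace) is refused.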

The IP addresses rely on aliased IP addresses. When the source address
is not set, it must be filled in with an IP address belonging to the l3
namespace. This is a trivial operation, because we know which IP
addresses are assigned to the l3 namespace.

When the route is resolved, the l3 namespace switches to its parent,
that is to say the l2 namespace, and the virtualization follows its
normal path.

How to handle incoming traffic?
-------------------------------

Because several sockets can listen on the same INADDR_ANY:port, we must
find the network namespace associated with the destination IP address.
For unicast, this is a trivial operation, because the destination can
again be checked against the assigned IP addresses. For broadcast and
multicast, some extra work is needed to store the namespaces that are
listening on a broadcast address. As soon as the namespace is found, we
switch to it. This can be done with netfilter.
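
The unicast case can be sketched as a two-step lookup: first the owning
namespace, then a listener inside it. All names here (struct addr_owner,
struct listener, demux_unicast) are hypothetical and only illustrate the
idea; the sketch does not model broadcast or multicast, and it simply
reports "no match" for a destination that no l3 namespace owns.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NS_NONE (-1)

/* Hypothetical table entry: which l3 namespace owns a unicast address. */
struct addr_owner {
	uint32_t addr;		/* host byte order */
	int ns_id;
};

/* Hypothetical listening socket; rcv_saddr == 0 stands for INADDR_ANY. */
struct listener {
	int ns_id;
	uint32_t rcv_saddr;
	uint16_t port;
};

/* Find the namespace that owns the destination address. */
static int ns_for_daddr(const struct addr_owner *tbl, size_t n, uint32_t daddr)
{
	assert(tbl != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (tbl[i].addr == daddr)
			return tbl[i].ns_id;
	return NS_NONE;
}

/* Return the index of the listener that should get the packet, or -1.
 * Only sockets of the namespace owning daddr are considered, so two
 * namespaces can both listen on INADDR_ANY:port without clashing. */
static int demux_unicast(const struct addr_owner *tbl, size_t ntbl,
			 const struct listener *socks, size_t nsocks,
			 uint32_t daddr, uint16_t dport)
{
	int ns = ns_for_daddr(tbl, ntbl, daddr);

	if (ns == NS_NONE)
		return -1;
	for (size_t i = 0; i < nsocks; i++) {
		if (socks[i].ns_id != ns || socks[i].port != dport)
			continue;
		if (socks[i].rcv_saddr != 0 && socks[i].rcv_saddr != daddr)
			continue;
		return (int)i;
	}
	return -1;
}
```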

Routes and co.
--------------

  - Routes: they are not isolated; each l3 namespace can see all the
routes from the other namespaces. That allows the routing engine to see
all the routes and choose the loopback when two network namespaces on
the same host try to communicate.

  - Cache: the routing cache must be isolated, otherwise the socket
isolation will not work. The l3 namespace code does not impact the l2
namespace code, and route cache isolation is a common part if the l3
namespace switching is done in the right place.


Dmitry has posted the l2 namespace, which relies on the empty net
namespace framework. I will post the l3 namespace, which relies on the
l2 namespace, today or tomorrow.

   -- Daniel

_______________________________________________
Containers mailing list
Containers@lists.osdl.org
https://lists.osdl.org/mailman/listinfo/containers
Re: L3 network isolation [message #16856 is a reply to message #16831] Thu, 07 December 2006 21:33
Vlad Yasevich
Hi Daniel

> Hi all,
> 
> Dmitry and I, we thought about a possible implementation allowing the 
> l2/l3 to coexists.
> 
> The idea is assuming the l3 network namespaces are the leaf in the l2 
> namespace hierarchy tree. By default, init process is l2 namespace. From 
> a layer 3, it is impossible to do a new network namespace unshare.
> 
> All the configuration is done into the l2 namespace. When a l3 is 
> created a new IP address should be created into the l2 namespace and 
> "pushed" into the l3. When the l3 dies, the IP is pulled to its parent, 
> aka the l2. In order to ensure security into the l3, the NET_ADMIN 
> capability is lost when doing unsharing for l3.
> There is no extra code for socket virtualization. It is a common part.
> 
> How to setup a l3 namespace ?
> -----------------------------
> 
>   1 - setup a new IP address in l2 namespace
>   2 - create a l3 namespace
>   3 - specific socket ioctl to "push" the IP address from the l2 
> namespace to the newly created l3 namespace

This means that there is some kind of identifier for the l3 namespace, right?

> 
> The l2 lose visibility on the IP address and l3 gains visibility on the 
> IP address. A ifconfig or a ip command shows only the IP address 
> assigned to the namespace. Loopback address is always visible.

Hmm....  I've been thinking about this, and I think this is OK from the
sockets point of view, i.e. bind() calls in l2 lose visibility of the new
l3 address.  There is a concern about a potential race here, though.

However, it would be really nice to be able to see l3 namespace addresses in
the parent l2, tagged in some way.

> 
> How to handle outgoing traffic ?
> --------------------------------
> 
> The bind must be checked with the IP addresses belonging to the l3 
> namespace and with all the derivative addresses (multicast, broadcast, 
> zero net, loopback, ...).
> 
> The IP addresses will rely on aliased IP address. The source address 
> must be filled with the IP address belonging the l3 namespace when not 
> set. This is a trivial operation, because we know which IP addresses are 
> assigned to the l3 namespace.

Can you provide a little more info?

> 
> When the route are resolved, the l3 namespace switch the its parent, 
> that is to say the l2 namespace, and the virtualization follows its 
> normal path.
> 
> How to handle incoming traffic ?
> --------------------------------
> 
> Because we can have several sockets listening on the same 
> INADDR_ANY:port, we must find the network namespace associated with the 
> destination IP address.
> For unicast, this is a trivial operation, because that can be checked 
> with the assigned IP address again. For broadcast and multicast, some 
> extra work should be done in order to store the namespaces which are 
> listening on a broadcast address. As soon as the namespace is found, we 
> switch to it. This can be done with netfilters.

The problem is with multicast.  Multicast groups are joined on a
per-interface basis.  Every socket bound to *:multicast_port will receive
multicast traffic once a single app has joined the group.  Since l3
namespaces share the conceptual interface, theoretically, all l3 namespaces
should receive multicast traffic.

> 
> Routes and co.
> --------------
> 
>   - Routes: they are not isolated, each l3 namespace can see all the 
> routes from the other namespaces. That allows the routing engine to see 
> all the routes and choose the loopback when two network namespaces in 
> the same host try to communicate.
> 
>   - Cache: the routing cache must be isolated, otherwise the socket 
> isolation will not work. The l3 namespace code does not impact the l2 
> namespace code and route cache isolation is a common part if the l3 
> namespace switching is done in the right place.
> 
> 
> Dmitry has posted the l2 namespace relying on the net namespace empty 
> framework, I will post the l3 namespace relying on the l2 namespace 
> today or tomorrow.
> 

Looking forward to it.

Thanks
-vlad

Re: L3 network isolation [message #16857 is a reply to message #16831] Thu, 07 December 2006 19:43
Herbert Poetzl
On Thu, Dec 07, 2006 at 12:25:45AM +0100, Daniel Lezcano wrote:
> Hi all,
> 
> Dmitry and I, we thought about a possible implementation allowing the 
> l2/l3 to coexists.
> 
> The idea is assuming the l3 network namespaces are the leaf in the l2 
> namespace hierarchy tree. By default, init process is l2 namespace. From 
> a layer 3, it is impossible to do a new network namespace unshare.
> 
> All the configuration is done into the l2 namespace. When a l3 is 
> created a new IP address should be created into the l2 namespace and 
> "pushed" into the l3. When the l3 dies, the IP is pulled to its parent, 
> aka the l2. In order to ensure security into the l3, the NET_ADMIN 
> capability is lost when doing unsharing for l3.
> There is no extra code for socket virtualization. It is a common part.
> 
> How to setup a l3 namespace ?
> -----------------------------
> 
>   1 - setup a new IP address in l2 namespace
>   2 - create a l3 namespace
>   3 - specific socket ioctl to "push" the IP address from the l2 
> namespace to the newly created l3 namespace
> 
> The l2 lose visibility on the IP address and l3 gains visibility on   
> the IP address.                                                       

why that?
I consider visibility of the IP addresses on the host
(what you call l2 space) a feature ...

> A ifconfig or a ip command shows only the IP address 
> assigned to the namespace. 

that is okay though ...

> Loopback address is always visible.

is it also bindable?

> How to handle outgoing traffic ?
> --------------------------------
> 
> The bind must be checked with the IP addresses belonging to the l3 
> namespace and with all the derivative addresses (multicast, broadcast, 
> zero net, loopback, ...).
> 
> The IP addresses will rely on aliased IP address. 

hmm? please elaborate ...

> The source address must be filled with the IP address belonging the l3
> namespace when not set. This is a trivial operation, because we know
> which IP addresses are assigned to the l3 namespace.
> 
> When the route are resolved, the l3 namespace switch the its parent, 
> that is to say the l2 namespace, and the virtualization follows its 
> normal path.
> 
> How to handle incoming traffic ?
> --------------------------------
> 
> Because we can have several sockets listening on the same 
> INADDR_ANY:port, we must find the network namespace associated 
> with the destination IP address.
> For unicast, this is a trivial operation, because that can be checked 
> with the assigned IP address again. For broadcast and multicast, some 
> extra work should be done in order to store the namespaces which are 
> listening on a broadcast address. As soon as the namespace is found, we 
> switch to it. This can be done with netfilters.

okay ...

> Routes and co.
> --------------
> 
>   - Routes: they are not isolated, each l3 namespace can see all the 
> routes from the other namespaces. That allows the routing engine to see 
> all the routes and choose the loopback when two network namespaces in 
> the same host try to communicate.
> 
>   - Cache: the routing cache must be isolated, otherwise the socket 
> isolation will not work. The l3 namespace code does not impact the l2 
> namespace code and route cache isolation is a common part if the l3 
> namespace switching is done in the right place.
> 
> Dmitry has posted the l2 namespace relying on the net namespace empty 
> framework, I will post the l3 namespace relying on the l2 namespace 
> today or tomorrow.

looking forward to it ...

best,
Herbert

Re: L3 network isolation [message #16859 is a reply to message #16857] Thu, 07 December 2006 22:08
Daniel Lezcano
Herbert Poetzl wrote:
> On Thu, Dec 07, 2006 at 12:25:45AM +0100, Daniel Lezcano wrote:
>> Hi all,
>>
>> Dmitry and I, we thought about a possible implementation allowing the
>> l2/l3 to coexists.
>>
>> The idea is assuming the l3 network namespaces are the leaf in the l2
>> namespace hierarchy tree. By default, init process is l2 namespace. From
>> a layer 3, it is impossible to do a new network namespace unshare.
>>
>> All the configuration is done into the l2 namespace. When a l3 is
>> created a new IP address should be created into the l2 namespace and
>> "pushed" into the l3. When the l3 dies, the IP is pulled to its parent,
>> aka the l2. In order to ensure security into the l3, the NET_ADMIN
>> capability is lost when doing unsharing for l3.
>> There is no extra code for socket virtualization. It is a common part.
>>
>> How to setup a l3 namespace ?
>> -----------------------------
>>
>>   1 - setup a new IP address in l2 namespace
>>   2 - create a l3 namespace
>>   3 - specific socket ioctl to "push" the IP address from the l2
>> namespace to the newly created l3 namespace
>>
>> The l2 lose visibility on the IP address and l3 gains visibility on
>> the IP address.
> 
> why that?
> I consider visibility of the IP addresses on the host
> (what you call l2 space) a feature ...

Perhaps the sentence was badly worded. I mean: you set an IP address in
the l2 namespace, you run ifconfig/ip => you see it. Once the IP is
pushed to the l3, you run ifconfig/ip again in the l2 namespace and you
no longer see it. This is related to the section below.
> 
>> A ifconfig or a ip command shows only the IP address
>> assigned to the namespace.
> 
> that is okay though ...
> 
>> Loopback address is always visible.
> 
> is it also bindable?

Yes, bindable, usable, isolated. I think the loopback isolation should
be enabled/disabled by configuration in order to let applications
communicate with portmap.

> 
>> How to handle outgoing traffic ?
>> --------------------------------
>>
>> The bind must be checked with the IP addresses belonging to the l3
>> namespace and with all the derivative addresses (multicast, broadcast,
>> zero net, loopback, ...).
>>
>> The IP addresses will rely on aliased IP address.
> 
> hmm? please elaborate ...

If you create five IP addresses, 1.2.3.[1-5]/24, the IP 1.2.3.1 will be
the primary address and 1.2.3.[2-5] will be secondary IP addresses. You
create five l3 namespaces and assign one IP to each namespace. So we have:
namespace 1 -> 1.2.3.1/24
namespace 2 -> 1.2.3.2/24
....

If namespace 2 connects to 1.2.3.100, for example, the routing engine
will choose the primary address as the source address if none was
specified by a bind, which is the usual case for a connection. The peer
1.2.3.100 will answer to 1.2.3.1 instead of 1.2.3.2 => RST
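
A small sketch of the fix this implies: when the socket is not bound,
take the source address from the addresses owned by the connecting
namespace rather than from the primary. The names struct ifaddr_model
and pick_saddr() are hypothetical, and falling back to the primary when
the namespace owns no address is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical interface address entry; index 0 is the primary. */
struct ifaddr_model {
	uint32_t addr;		/* host byte order */
	int owner_ns;		/* l3 namespace the address was pushed to */
};

/* Choose the source address for a connection made from ns_id.
 * bound is the address given to bind(), or 0 if the socket is unbound. */
static uint32_t pick_saddr(const struct ifaddr_model *ifa, size_t n,
			   int ns_id, uint32_t bound)
{
	assert(ifa != NULL || n == 0);

	if (bound != 0)			/* explicit bind always wins */
		return bound;
	for (size_t i = 0; i < n; i++)	/* address owned by this l3 */
		if (ifa[i].owner_ns == ns_id)
			return ifa[i].addr;
	return n ? ifa[0].addr : 0;	/* assumed fallback: the primary */
}
```

With the five addresses above, an unbound connect() from namespace 2
gets 1.2.3.2 as its source, so the reply from 1.2.3.100 comes back to
the right namespace instead of hitting 1.2.3.1.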

> 
>> The source address must be filled with the IP address belonging the l3
>> namespace when not set. This is a trivial operation, because we know
>> which IP addresses are assigned to the l3 namespace.
>>
>> When the route are resolved, the l3 namespace switch the its parent,
>> that is to say the l2 namespace, and the virtualization follows its
>> normal path.
>>
>> How to handle incoming traffic ?
>> --------------------------------
>>
>> Because we can have several sockets listening on the same
>> INADDR_ANY:port, we must find the network namespace associated
>> with the destination IP address.
>> For unicast, this is a trivial operation, because that can be checked
>> with the assigned IP address again. For broadcast and multicast, some
>> extra work should be done in order to store the namespaces which are
>> listening on a broadcast address. As soon as the namespace is found, we
>> switch to it. This can be done with netfilters.
> 
> okay ...
> 
>> Routes and co.
>> --------------
>>
>>   - Routes: they are not isolated, each l3 namespace can see all the
>> routes from the other namespaces. That allows the routing engine to see
>> all the routes and choose the loopback when two network namespaces in
>> the same host try to communicate.
>>
>>   - Cache: the routing cache must be isolated, otherwise the socket
>> isolation will not work. The l3 namespace code does not impact the l2
>> namespace code and route cache isolation is a common part if the l3
>> namespace switching is done in the right place.
>>
>> Dmitry has posted the l2 namespace relying on the net namespace empty
>> framework, I will post the l3 namespace relying on the l2 namespace
>> today or tomorrow.
> 
> looking forward to it ...
> 
> best,
> Herbert
> 
Re: L3 network isolation [message #16860 is a reply to message #16856] Thu, 07 December 2006 22:33
Daniel Lezcano
Vlad Yasevich wrote:
> Hi Daniel
> 
>> Hi all,
>>
>> Dmitry and I, we thought about a possible implementation allowing the
>> l2/l3 to coexists.
>>
>> The idea is assuming the l3 network namespaces are the leaf in the l2
>> namespace hierarchy tree. By default, init process is l2 namespace. From
>> a layer 3, it is impossible to do a new network namespace unshare.
>>
>> All the configuration is done into the l2 namespace. When a l3 is
>> created a new IP address should be created into the l2 namespace and
>> "pushed" into the l3. When the l3 dies, the IP is pulled to its parent,
>> aka the l2. In order to ensure security into the l3, the NET_ADMIN
>> capability is lost when doing unsharing for l3.
>> There is no extra code for socket virtualization. It is a common part.
>>
>> How to setup a l3 namespace ?
>> -----------------------------
>>
>>   1 - setup a new IP address in l2 namespace
>>   2 - create a l3 namespace
>>   3 - specific socket ioctl to "push" the IP address from the l2
>> namespace to the newly created l3 namespace
> 
> This means that there is some kind of identifier for the l3 namespace, right?

Not exactly. bind_ns allows assigning an identifier to a namespace.
The namespace is an aggregation of the different namespace resources
(ipc, pid, net, utsname, ...). But the result is the same: we use the
namespace identifier instead of an l3 namespace identifier.

> 
>> The l2 lose visibility on the IP address and l3 gains visibility on the
>> IP address. A ifconfig or a ip command shows only the IP address
>> assigned to the namespace. Loopback address is always visible.
> 
> Hmm....  I've been thinking about this, and I think this OK from the sockets point
> of view, i.e. binds() in l2 lose visibility to the new l3 address.  There is
> a concern for a potential race here though.

Do you mean that someone in the l2 namespace can use the IP address
before it is pushed to the l3 namespace? That is right; perhaps the call
should be done in one shot (set the address + push it to the l3).

> However, it would be really nice to be able to see l3 namespace addresses in
> the parent l2 tagged in some way.


> 
>> How to handle outgoing traffic ?
>> --------------------------------
>>
>> The bind must be checked with the IP addresses belonging to the l3
>> namespace and with all the derivative addresses (multicast, broadcast,
>> zero net, loopback, ...).
>>
>> The IP addresses will rely on aliased IP address. The source address
>> must be filled with the IP address belonging the l3 namespace when not
>> set. This is a trivial operation, because we know which IP addresses are
>> assigned to the l3 namespace.
> 
> Can you provide a little more info?

I think I already answered this question in the previous email. I am 
afraid this paragraph is not very clear ... ;)

> 
>> When the route are resolved, the l3 namespace switch the its parent,
>> that is to say the l2 namespace, and the virtualization follows its
>> normal path.
>>
>> How to handle incoming traffic ?
>> --------------------------------
>>
>> Because we can have several sockets listening on the same
>> INADDR_ANY:port, we must find the network namespace associated with the
>> destination IP address.
>> For unicast, this is a trivial operation, because that can be checked
>> with the assigned IP address again. For broadcast and multicast, some
>> extra work should be done in order to store the namespaces which are
>> listening on a broadcast address. As soon as the namespace is found, we
>> switch to it. This can be done with netfilters.
> 
> The problem is with multicasts.  Multicast groups are joined on the interface
> bases.  Every socket that bound *:multicast_port will receive multicast
> traffic once a single app joined the group.  Since l3 namespaces don't have
> share the conceptual interface, theoretically, all l3 namespaces should receive
> multicast traffic.

Right. You sank my battleship  :)
This needs more thought...

> 
>> Routes and co.
>> --------------
>>
>>   - Routes: they are not isolated, each l3 namespace can see all the
>> routes from the other namespaces. That allows the routing engine to see
>> all the routes and choose the loopback when two network namespaces in
>> the same host try to communicate.
>>
>>   - Cache: the routing cache must be isolated, otherwise the socket
>> isolation will not work. The l3 namespace code does not impact the l2
>> namespace code and route cache isolation is a common part if the l3
>> namespace switching is done in the right place.
>>
>>
>> Dmitry has posted the l2 namespace relying on the net namespace empty
>> framework, I will post the l3 namespace relying on the l2 namespace
>> today or tomorrow.
>>
> 
> Looking forward to it.

Fixing a kref problem...

Thanks for all your comments.

   -- Daniel
[RFC] L3 network isolation : broadcast [message #17038 is a reply to message #16831] Wed, 13 December 2006 20:43
Daniel Lezcano
Hi all,

I am trying to find a solution for handling broadcast traffic in the l3
namespaces.

The broadcast issue comes from the l2 isolation:

In udp.c:

static inline struct sock *udp_v4_mcast_next(struct sock *sk,
					__be16 loc_port,
					__be32 loc_addr,
					__be16 rmt_port,
					__be32 rmt_addr,
					int dif)
{
	struct hlist_node *node;
	struct sock *s = sk;
	struct net_namespace *ns = current_net_ns;
	unsigned short hnum = ntohs(loc_port);

	sk_for_each_from(s, node) {
		struct inet_sock *inet = inet_sk(s);

		if (inet->num != hnum					||
		    (inet->daddr && inet->daddr != rmt_addr)		||
		    (inet->dport != rmt_port && inet->dport)		||
		    (inet->rcv_saddr && inet->rcv_saddr != loc_addr)	||
		    ipv6_only_sock(s)					||
		    !net_ns_match(sk->sk_net_ns, ns)			||
		    (s->sk_bound_dev_if && s->sk_bound_dev_if != dif))
			continue;
		if (!ip_mc_sf_allow(s, loc_addr, rmt_addr, dif))
			continue;
		goto found;
	}
	s = NULL;
found:
	return s;
}

This is absolutely correct for l2 namespaces because they share the
socket hash table. But it is not correct for l3 namespaces, because we
want to deliver the packet to every l3 namespace that has bound to the
broadcast address, so we should avoid checking net_ns_match if we are in
an l3 namespace. Doing that, however, would break the l2 isolation,
because another l2 namespace could have bound to the same broadcast
address.

The solution I see here is:

if namespace is l3 then;
	net_ns match any net_ns registered as listening on this address
else
	net_ns_match
fi

The registered network namespaces are kept in a list shared between
sibling l3 namespaces. This will add more overhead for sure. Does anyone
have comments on that, or perhaps a better solution?
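
To make the proposal concrete, here is a userspace sketch of the
modified match. It is only my reading of the pseudocode above: struct
net_ns_model, struct bcast_reg and bcast_ns_match() are hypothetical
names, and the "same parent" test is how the sketch models a list
shared only between sibling l3 namespaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical namespace model: an l3 points to its l2 parent. */
struct net_ns_model {
	int id;
	bool is_l3;
	const struct net_ns_model *parent;	/* NULL for an l2 */
};

/* Hypothetical registration: ns has a socket bound to broadcast addr. */
struct bcast_reg {
	uint32_t addr;
	const struct net_ns_model *ns;
};

/* Decide whether a socket owned by sock_ns may receive a broadcast
 * packet for loc_addr that is being delivered in cur_ns. */
static bool bcast_ns_match(const struct net_ns_model *sock_ns,
			   const struct net_ns_model *cur_ns,
			   uint32_t loc_addr,
			   const struct bcast_reg *reg, size_t nreg)
{
	assert(sock_ns != NULL && cur_ns != NULL);

	if (!cur_ns->is_l3)		/* l2: plain namespace match */
		return sock_ns == cur_ns;

	/* l3: any sibling registered on this address matches; siblings
	 * of another l2 never do, which keeps the l2 isolation. */
	for (size_t i = 0; i < nreg; i++) {
		if (reg[i].addr == loc_addr &&
		    reg[i].ns == sock_ns &&
		    sock_ns->parent == cur_ns->parent)
			return true;
	}
	return false;
}
```

The linear walk over the registrations is where the extra overhead
mentioned above would show up.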