OpenVZ Forum: Devel » [RFC PATCH 0/31] An introduction and A path for merging network namespace work

Home » Mailing lists » Devel » [RFC PATCH 0/31] An introduction and A path for merging network namespace work

Show: Today's Messages :: Show Polls :: Message Navigator
E-mail to friend

[RFC PATCH 0/31] An introduction and A path for merging network namespace work [message #17338]

Thu, 25 January 2007 18:55

ebiederm
Messages: 1354
Registered: February 2006

Senior Member

The idea of a network namespace is fundamentally quite simple.  We create
a mechanism that from the users perspective allows creation of separate
instances of the network stack.  When combined with mechanism like chroot
this results in a much more complete isolation.  When seen in the context
of application migration this allows for taking your IP address and other
global identifiers with you.

What does this mean in the context of the networking stack?  The basic
idea is to tag processes with a network namespace that is used when
they create new sockets  or otherwise initiate a new fresh communication
with the networking stack.  The idea is to tag all sockets with a
network namespace they will always be in and all operations on them
will be relative to.  The idea is to tag all network devices with
a network namespace they are a member of, but may be changed during
the lifetime of a device.  

Mostly a network namespace at it's most basic level is about names.
It is about creating a view of the networking stack where you can
name the network devices that are members anything you want.  Likewise
for iptables rules and all of the rest of the state.  It is a lot
like creating a new directory in a filesystem.  The underlying data
structures don't really change just the users view of those data
structures, and we continue to have a single network stack.




My goal today is that even if we can't agree on a specific set of
patches that we come to an agreement on roughly what those patches
should accomplish, and what process we should go through to get
them merged.




For implementing a network namespace the core problem is that there is
a lot of networking code, and it is continually evolving.   This means
that the task of implementing a network namespace is not a small one,
a lot of code must be read, touched and updated, while hoping 
someone doesn't change something important before you get your changes
in.  To do this sanely means we need an incremental path to our goal,
that allows small pieces to be reviewed and merged as they are ready.


The path I am recommending today is to first lay down some basic
infrastructure.  Then one layer at a time modify the existing code
to handle multiple simultaneous network namespaces but to modify
each component of that layer to refuse to operate in the context
of anything but the initial network namespace, thus preventing
code that has not yet been updated with situations it does not
know how to deal with.

Eventually this will get down to the real meat of the problem and
practical things like ipv4 sockets will work.

This should allow for a network stack that compiles, builds and works
at each step of the way.  Not too far into the process support
for multiple network namespaces that works should be available with
the limitation that except for the initial network namespace all of
the rest will look like a kernel with most parts of the networking
stack compiled out, but within those parts that are present it
should be fully useable.





To make my thinking clear I have provided a initial patchset, that
makes quite a bit of progress especially in laying the ground work.
My goal is to have the question does this basic path make sense?

To that end I have omitted posting some of the prerequisite cleanup
and infrastructure patches (like my sysctl work), that are just noise
in this context, and I have failed to rebase my patchset against Dave
Miller's latest networking tree.  Those are important details but
they are not important to this conversation.


If my basic path and the basic patches look like they are heading
in the right direction we can start moving towards what needs to
happen to ensure a review of the patches, and what we need to do
to start merging them.  If the basic path does not appear reasonable
well that would be good to know as well.





There are essentially two different approaches to modify networking
code to handle multiple network namesspaces.  Either all of the global
variables can be replicated once for each network namespace and we
build up parallel namespace specific data structures.  Or the data
elements in the data structure are tagged, with what namespace they
belong to and we filter them.  It depends on the context which
is most appropriate and easier.  As a general rule large hash tables
call for filtering and a small global variable set calls for simply
having multiple instances of the data structure.

The biggest intrusion I expect to see in the logic of the networking
stack is initialization and tear down.  As we need to initialize
and clean up all of those per network namespace variables when
we create and destroy and network namespace.



A git tree with all of my patches against 2.6.20-rc5 is available at:
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/linux-2.6-netns.git

In addition to what I have posted here and all of it's prerequisites
the tree includes further patches that get the basics of ipv4 and
iptables working.  So people who are interested actually have
something more or less useful to play with.


At a big practical level what I don't yet see is how exactly the
inifiniband/rdma network subsystem fits into network namespaces yet.
Not at the ipoib layer but at the native layer.  I think I want the
ability to say each pkey of each IB device can potentially be in
a different namespace or possibly each different queue pair.  Suggestions
are welcome.   I don't quite have my head wrapped around that the user
space API there yet.

I suppose on the infiniband/rdma side I should dig up all interactions
with user space and simply fail if that user is not in the initial
network namespace as a start.  At the very least this is necessary
given how many calls the connection manager makes into the IP stack.

Eric
_______________________________________________
Containers mailing list
Containers@lists.osdl.org
https://lists.osdl.org/mailman/listinfo/containers

Report message to a moderator

[PATCH RFC 1/31] net: Add net_namespace_type.h to allow for per network namespace variables. [message #17339 is a reply to message #17338]

Thu, 25 January 2007 19:00

ebiederm
Messages: 1354
Registered: February 2006

Senior Member

The problem:
   To properly implement a ``level 2'' network namespace we need to
move many of the networking stack global variables into the network
namespace.  We want to keep it explicit that the code is accessing a
variable in a network namespace.  We want to be able to completely
compile out the network namespace support so we can do comparitive
performance testing, and so to not penalize users who don't need
network namespace support. Because the network stack is a moving
target we want something simple that  allows for the bulk of the
changes to be merged before we enable network namespace support.

My biggest challenge when looking into this was to find an approach
that would allow the code to compile out, in a way that does not yield
any performance overhead and does not make the code ugly.  While
playing with the different possibilities I discovered that gcc will
not pass 0 byte structures that are arguments to functions and instead
will simply optmize them away.  This appears to be true on i386 all of
the way back to gcc-2.95 and I verified that it also works with gcc
4.1 on x86_64.  Since this is part of the ABI I never expect it to
change.  Hopefully gcc uses this nice optimization on all
architectures, I suspect so as C++ allows passing function arguments
of type void in certain circumstances.

Using this observation I was able to come up with an network namespace
implementation network namespace code that allows the changes to
completely compile out when we don't build the kernel with network
namespace support.

This patch implements my dummy network namespace support that should
completely compiles out.  Further patches will add the real version.
Starting with the dummy gives a quick hint of where I am going and
allows for dependencies to be overcome.

When doing my proof of concept implementation one of the other
problems I had was that as the network stack comes in so many modular
pieces figuring out how to get their global variables into the network
namespace structure was a challenge.  The basic technique used by our
per cpu variables for having the linker build and dynamically change
structures for us appears applicable here and a lot less nuisance then
what I did before so I am implementing a tailored version of that
technique as well, and again this makes it very simple to compile the
code out.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 include/linux/net_namespace_type.h |   52 ++++++++++++++++++++++++++++++++++++
 1 files changed, 52 insertions(+), 0 deletions(-)

diff --git a/include/linux/net_namespace_type.h b/include/linux/net_namespace_type.h
new file mode 100644
index 0000000..8173f59
--- /dev/null
+++ b/include/linux/net_namespace_type.h
@@ -0,0 +1,52 @@
+/* 
+ * Definition of the network namespace reference type
+ * And operations upon it.
+ */
+#ifndef __LINUX_NET_NAMESPACE_TYPE_H
+#define __LINUX_NET_NAMESPACE_TYPE_H
+
+#define __pernetname(name) per_net__##name
+
+typedef struct {} net_t;
+
+#define __data_pernet 
+
+/* Look up a per network namespace variable */
+static inline unsigned long __per_net_offset(net_t net) { return 0; }
+
+/* Like per_net but returns a pseudo variable address that must be moved
+ * __per_net_offset() bytes before it will point to a real variable.
+ * Useful for static initializers.
+ */
+#define __per_net_base(name)   __pernetname(name)
+
+/* Get the network namespace reference from a per_net variable address */
+#define net_of(ptr, name) ({ net_t net; ptr; net; })
+
+/* Look up a per network namespace variable */
+#define per_net(name, net) \
+	(*(__per_net_offset(net), &__per_net_base(name)))
+ 
+/* Are the two network namespaces the same */
+static inline int net_eq(net_t a, net_t b) { return 1; }
+/* Get an unsigned value appropriate for hashing the network namespace */
+static inline unsigned int net_hval(net_t net) { return 0; }
+
+/* Convert to and from to and from void pointers */
+static inline void *net_to_voidp(net_t net) { return NULL; }
+static inline net_t net_from_voidp(void *ptr) { net_t net; return net; }
+
+static inline int null_net(net_t net) { return 0; }
+
+#define DEFINE_PER_NET(type, name) 			\
+	__data_pernet __typeof__(type) __pernetname(name)
+
+#define DECLARE_PER_NET(type, name) \
+	extern __typeof__(type) __pernetname(name)
+
+#define EXPORT_PER_NET_SYMBOL(var)	\
+	EXPORT_SYMBOL(__pernetname(var))
+#define EXPORT_PER_NET_SYMBOL_GPL(var)	\
+	EXPORT_SYMBOL_GPL(__pernetname(var))
+
+#endif /* __LINUX_NET_NAMESPACE_TYPE_H */
-- 
1.4.4.1.g278f

_______________________________________________
Containers mailing list
Containers@lists.osdl.org
https://lists.osdl.org/mailman/listinfo/containers

Report message to a moderator

[PATCH RFC 2/31] net: Implement a place holder network namespace [message #17340 is a reply to message #17338]

Thu, 25 January 2007 19:00

ebiederm
Messages: 1354
Registered: February 2006

Senior Member

From: Eric W. Biederman <ebiederm@xmission.com> - unquoted

Many of the changes to the network stack will simply be adding a
network namespace parameter to function calls or moving variables
from globals to being per network namespace.  When those variables
have initializers that cannot statically compute the proper value,
a function that runs at the creation and destruction of network
namespaces will need to be registered, and the logic will need to
be changed to accomidate that.

Adding unconditional support for these functions ensures that even when
everything else is compiled out the modified network stack logic will
continue to run correctly.

This patch adds struct pernet_operations that has an init (constructor)
and an exit (destructor) method.  When registered the init method
is called for every existing namespace, and when unregistered the
exit method is called for every existing namespace.  When a new
network namespace is created all of the init methods are called
in the order in which they were registered, and when a network namespace
is destroyed the exit methods are called in the reverse order in
which they were registered.

There are two distinct types of pernet_operations recognized: subsys and
device.  At creation all subsys init functions are called before device
init functions, and at destruction all device exit functions are called
before subsys exit function.  For other ordering the preservation
of the order of registration combined with the various kinds of
kernel initcalls should be sufficient.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 include/net/net_namespace.h |   62 ++++++++++++++++++
 net/core/Makefile           |    2 +-
 net/core/net_namespace.c    |  149 +++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 212 insertions(+), 1 deletions(-)

diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
new file mode 100644
index 0000000..06a9ba1
--- /dev/null
+++ b/include/net/net_namespace.h
@@ -0,0 +1,62 @@
+/* 
+ * Operations on the network namespace
+ */
+#ifndef __NET_NET_NAMESPACE_H
+#define __NET_NET_NAMESPACE_H
+
+#include <asm/atomic.h>
+#include <linux/workqueue.h>
+#include <linux/nsproxy.h>
+#include <linux/net_namespace_type.h>
+
+/* How many bytes in each network namespace should we allocate
+ * for use by modules when they are loaded.
+ */
+#ifdef CONFIG_MODULES
+# define PER_NET_MODULE_RESERVE 2048
+#else
+# define PER_NET_MODULE_RESERVE 0
+#endif
+
+struct net_namespace_head {
+	atomic_t count;		/* To decided when the network namespace
+				 * should go 
+				 */
+	atomic_t use_count;	/* For references we destroy on demand */
+	struct list_head list;
+	struct work_struct work;
+};
+
+static inline net_t get_net(net_t net) { return net; }
+static inline void put_net(net_t net) {}
+static inline net_t hold_net(net_t net) { return net; }
+static inline void release_net(net_t net) {} 
+
+#define __per_net_start	((char *)0)
+#define __per_net_end	((char *)0)
+
+static inline int copy_net(int flags, struct task_struct *tsk) { return 0; }
+
+/* Don't let the list of network namespaces change */
+static inline void net_lock(void) {}
+static inline void net_unlock(void) {}
+
+#define for_each_net(VAR) if (1)
+
+extern net_t net_template;
+
+#define NET_CREATE	0x0001	/* A network namespace has been created */
+#define NET_DESTROY	0x0002	/* A network namespace is being destroyed */
+
+struct pernet_operations {
+	struct list_head list;
+	int (*init)(net_t net);
+	void (*exit)(net_t net);
+};
+
+extern int register_pernet_subsys(struct pernet_operations *);
+extern void unregister_pernet_subsys(struct pernet_operations *);
+extern int register_pernet_device(struct pernet_operations *);
+extern void unregister_pernet_device(struct pernet_operations *);
+
+#endif /* __NET_NET_NAMESPACE_H */
diff --git a/net/core/Makefile b/net/core/Makefile
index 73272d5..554dbdc 100644
--- a/net/core/Makefile
+++ b/net/core/Makefile
@@ -3,7 +3,7 @@
 #
 
 obj-y := sock.o request_sock.o skbuff.o iovec.o datagram.o stream.o scm.o \
-	 gen_stats.o gen_estimator.o
+	 gen_stats.o gen_estimator.o net_namespace.o
 
 obj-$(CONFIG_SYSCTL) += sysctl_net_core.o
 
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
new file mode 100644
index 0000000..4ae266d
--- /dev/null
+++ b/net/core/net_namespace.c
@@ -0,0 +1,149 @@
+#include <linux/rtnetlink.h>
+#include <net/net_namespace.h>
+
+/*
+ *	Our network namespace constructor/destructor lists
+ */
+
+static LIST_HEAD(pernet_list);
+static struct list_head *first_device = &pernet_list;
+static DEFINE_MUTEX(net_mutex);
+net_t net_template;
+
+static int register_pernet_operations(struct list_head *list,
+				      struct pernet_operations *ops)
+{
+	net_t net, undo_net;
+	int error;
+
+	error = 0;
+	list_add_tail(&ops->list, list);
+	for_each_net(net) {
+		if (ops->init) {
+			error = ops->init(net);
+			if (error)
+				goto out_undo;
+		}
+	}
+out:
+	return error;
+
+out_undo:
+	/* If I have an error cleanup all namespaces I initialized */
+	list_del(&ops->list);
+	for_each_net(undo_net) {
+		if (net_eq(undo_net, net))
+			goto undone;
+		if (ops->exit)
+			ops->exit(undo_net);
+	}
+undone:
+	goto out;
+}
+
+static void unregister_pernet_operations(struct pernet_operations *ops)
+{
+	net_t net;
+
+	list_del(&ops->list);
+	for_each_net(net) 
+		if (ops->exit)
+			ops->exit(net);
+}
+
+/**
+ *      register_pernet_subsys - register a network namespace subsystem
+ *	@ops:  pernet operations structure for the subsystem
+ *
+ *	Register a subsystem which has init and exit functions
+ *	that are called when network namespaces are created and
+ *	destroyed respectively.
+ *
+ *	When registered all network namespace init functions are
+ *	called for every existing network namespace.  Allowing kernel
+ *	modules to have a race free view of the set of network namespaces.
+ *
+ *	When a new network namespace is created all of the init
+ *	methods are called in the order in which they were registered.
+ *
+ *	When a network namespace is destroyed all of the exit methods
+ *	are called in the reverse of the order with which they were
+ *	registered.
+ */
+int register_pernet_subsys(struct pernet_operations *ops)
+{
+	int error;
+	mutex_lock(&net_mutex);
+	error =  register_pernet_operations(first_device, ops);
+	mutex_unlock(&net_mutex);
+	return error;
+}
+EXPORT_SYMBOL_GPL(register_pernet_subsys);
+
+/**
+ *      unregister_pernet_subsys - unregister a network namespace subsystem
+ *	@ops: pernet operations structure to manipulate
+ *
+ *	Remove the pernet operations structure from the list to be
+ *	used when network namespaces are created or destoryed.  In
+ *	addition run the exit method for all existing network
+ *	namespaces.
+ */
+void unregister_pernet_subsys(struct pernet_operations *module)
+{
+	mutex_lock(&net_mutex);
+	unregister_pernet_operations(module);
+	mutex_unlock(&net_mutex);
+}
+EXPORT_SYMBOL_GPL(unregister_pernet_subsys);
+
+/**
+ *      register_pernet_device - register a network namespace device
+ *	@ops:  pernet operations structure for the subsystem
+ *
+ *	Register a device which has init and exit functions
+ *	that are called when network namespaces are created and
+ *	destroyed respectively.
+ *
+ *	When registered all network namespace init functions are
+ *	called for every existing network namespace.  Allowing kernel
+ *	modules to have a race free view of the set of network namespaces.
+ *
+ *	When a new network namespace is created all of the init
+ *	methods are called in the order in which they were registered.
+ *
+ *	When a network namespace is destroyed all of the exit methods
+ *	are called in the reverse of the order with which they were
+ *	registered.
+ */
+int register_pernet_device(struct pernet_operations *ops)
+{
+	int error;
+	mutex_lock(&net_mutex);
+	error = register_pernet_operations(&pernet_list, ops);
+	if (!error && (first_device == &pernet_list))
+		first_device = &ops->list;
+	mutex_unlock(&net_mutex);
+	return error;
+}
+EXPORT_SYMBOL_GPL(register_pernet_device);
+
+/**
+ *      unregister_pernet_device - unregister a network namespace netdevice
+ *	@ops: pernet operations structure to manipulate
+ *
+ *	Remove the pernet operations structure from the list to be
+ *	used when network namespaces are created or destoryed.  In
+ *	addition run the exit method for all existing network
+ *	namespaces.
+ */
+void unregister_pernet_device(struct pernet_operations *ops)
+{
+	mutex_lock(&net_mutex);
+	if (&ops->list == first_device)
+		first_device = first_device->next;
+	unregister_pernet_operations(ops);
+	mutex_unlock(&net_mutex);
+}
+EXPORT_SYMBOL_GPL(unregister_pernet_device);
+
-- 
1.4.4.1.g278f

_______________________________________________
Containers mailing list
Containers@lists.osdl.org
https://lists.osdl.org/mailman/listinfo/containers

Report message to a moderator

[PATCH RFC 3/31] net: Add a network namespace parameter to tasks [message #17341 is a reply to message #17338]

Thu, 25 January 2007 19:00

ebiederm
Messages: 1354
Registered: February 2006

Senior Member

From: Eric W. Biederman <ebiederm@xmission.com> - unquoted

This is the network namespace from which all which all sockets
and anything else under user control ultimately get their network
namespace parameters.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 include/linux/nsproxy.h |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/include/linux/nsproxy.h b/include/linux/nsproxy.h
index 0b9f0dc..cc76610 100644
--- a/include/linux/nsproxy.h
+++ b/include/linux/nsproxy.h
@@ -3,6 +3,7 @@
 
 #include <linux/spinlock.h>
 #include <linux/sched.h>
+#include <linux/net_namespace_type.h>
 
 struct mnt_namespace;
 struct uts_namespace;
@@ -28,6 +29,7 @@ struct nsproxy {
 	struct ipc_namespace *ipc_ns;
 	struct mnt_namespace *mnt_ns;
 	struct pid_namespace *pid_ns;
+	net_t		      net_ns;
 };
 extern struct nsproxy init_nsproxy;
 
-- 
1.4.4.1.g278f

_______________________________________________
Containers mailing list
Containers@lists.osdl.org
https://lists.osdl.org/mailman/listinfo/containers

Report message to a moderator

[PATCH RFC 4/31] net: Add a network namespace tag to struct net_device [message #17342 is a reply to message #17338]

Thu, 25 January 2007 19:00

ebiederm
Messages: 1354
Registered: February 2006

Senior Member

From: Eric W. Biederman <ebiederm@xmission.com> - unquoted

Please note that network devices do not increase the count
count on the network namespace.  The are inside the network
namespace and so the network namespace tag is in the nature
of a back pointer and so getting and putting the network namespace
is unnecessary.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 include/linux/netdevice.h |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 4cb8b39..6a1579d 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -38,6 +38,7 @@
 #include <linux/device.h>
 #include <linux/percpu.h>
 #include <linux/dmaengine.h>
+#include <linux/net_namespace_type.h>
 
 struct vlan_group;
 struct ethtool_ops;
@@ -525,6 +526,9 @@ struct net_device
 	void                    (*poll_controller)(struct net_device *dev);
 #endif
 
+	/* Network namespace this network device is inside */
+	net_t			nd_net;
+
 	/* bridge stuff */
 	struct net_bridge_port	*br_port;
 
-- 
1.4.4.1.g278f

_______________________________________________
Containers mailing list
Containers@lists.osdl.org
https://lists.osdl.org/mailman/listinfo/containers

Report message to a moderator

[PATCH RFC 5/31] net: Add a network namespace parameter to struct sock [message #17343 is a reply to message #17338]

Thu, 25 January 2007 19:00

ebiederm
Messages: 1354
Registered: February 2006

Senior Member

From: Eric W. Biederman <ebiederm@xmission.com> - unquoted

Sockets need to get a reference to their network namespace,
or possibly a simple hold if someone registers on the network
namespace notifier and will free the sockets when the namespace
is going to be destroyed.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 include/net/inet_timewait_sock.h |    1 +
 include/net/sock.h               |    3 +++
 2 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/include/net/inet_timewait_sock.h b/include/net/inet_timewait_sock.h
index f7be1ac..162c2b9 100644
--- a/include/net/inet_timewait_sock.h
+++ b/include/net/inet_timewait_sock.h
@@ -115,6 +115,7 @@ struct inet_timewait_sock {
 #define tw_refcnt		__tw_common.skc_refcnt
 #define tw_hash			__tw_common.skc_hash
 #define tw_prot			__tw_common.skc_prot
+#define tw_net			__tw_common.skc_net
 	volatile unsigned char	tw_substate;
 	/* 3 bits hole, try to pack */
 	unsigned char		tw_rcv_wscale;
diff --git a/include/net/sock.h b/include/net/sock.h
index 03684e7..5bf6bb5 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -105,6 +105,7 @@ struct proto;
  *	@skc_refcnt: reference count
  *	@skc_hash: hash value used with various protocol lookup tables
  *	@skc_prot: protocol handlers inside a network family
+ *	@skc_net: reference to the network namespace of this socket
  *
  *	This is the minimal network layer representation of sockets, the header
  *	for struct sock and struct inet_timewait_sock.
@@ -119,6 +120,7 @@ struct sock_common {
 	atomic_t		skc_refcnt;
 	unsigned int		skc_hash;
 	struct proto		*skc_prot;
+	net_t	 		skc_net;
 };
 
 /**
@@ -195,6 +197,7 @@ struct sock {
 #define sk_refcnt		__sk_common.skc_refcnt
 #define sk_hash			__sk_common.skc_hash
 #define sk_prot			__sk_common.skc_prot
+#define sk_net			__sk_common.skc_net
 	unsigned char		sk_shutdown : 2,
 				sk_no_check : 2,
 				sk_userlocks : 4;
-- 
1.4.4.1.g278f

_______________________________________________
Containers mailing list
Containers@lists.osdl.org
https://lists.osdl.org/mailman/listinfo/containers

Report message to a moderator

[PATCH RFC 6/31] net: Add a helper to get a reference to the initial network namespace. [message #17344 is a reply to message #17338]

Thu, 25 January 2007 19:00

ebiederm
Messages: 1354
Registered: February 2006

Senior Member

From: Eric W. Biederman <ebiederm@xmission.com> - unquoted

The initial network namespace is special and we need to use it for various
things.  Probably the biggest initial use will be to ensure code that
can't cope with multiple namespaces only sees the initial network namespace.

For that reason and because getting at the initial network namespace is just
a little clumsy add a helper function.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 include/net/net_namespace.h |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index 06a9ba1..9208e2e 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -27,6 +27,12 @@ struct net_namespace_head {
 	struct work_struct work;
 };
 
+/* Get the initial network namespace */
+static inline net_t init_net(void)
+{
+	return init_nsproxy.net_ns;
+}
+
 static inline net_t get_net(net_t net) { return net; }
 static inline void put_net(net_t net) {}
 static inline net_t hold_net(net_t net) { return net; }
-- 
1.4.4.1.g278f

_______________________________________________
Containers mailing list
Containers@lists.osdl.org
https://lists.osdl.org/mailman/listinfo/containers

Report message to a moderator

[PATCH RFC 7/31] net: Make /proc/net per network namespace [message #17345 is a reply to message #17338]

Thu, 25 January 2007 19:00

ebiederm
Messages: 1354
Registered: February 2006

Senior Member

From: Eric W. Biederman <ebiederm@xmission.com> - unquoted

This patch makes /proc/net per network namespace.  It modifies the global
variables proc_net and proc_net_stat to be per network namespace.
The proc_net file helpers are modified to take a network namespace argument,
and all of their callers are fixed to pass init_net() for that argument.
This ensures that all of the /proc/net files are only visible and
usable in the initial network namespace until the code behind them
has bee updated to be handle multiple network namespaces.

Making /proc/net per namespace is necessary as at least some files
in /proc/net depend upon the set of network devices which is per
network namespace, and even more files in /proc/net have contents
that are relevant to a single network namespace.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 drivers/isdn/divert/divert_procfs.c                |    7 +-
 drivers/isdn/hardware/eicon/diva_didd.c            |    5 +-
 drivers/isdn/hysdn/hysdn_procconf.c                |    4 +-
 drivers/net/bonding/bond_main.c                    |    7 +-
 drivers/net/hamradio/bpqether.c                    |    5 +-
 drivers/net/hamradio/scc.c                         |    4 +-
 drivers/net/hamradio/yam.c                         |    5 +-
 drivers/net/ibmveth.c                              |    6 +-
 drivers/net/pppoe.c                                |    5 +-
 drivers/net/tc35815.c                              |    1 -
 drivers/net/tokenring/lanstreamer.c                |    4 +-
 drivers/net/tokenring/olympic.c                    |    9 +-
 drivers/net/wireless/hostap/hostap_main.c          |    7 +-
 drivers/net/wireless/strip.c                       |    5 +-
 fs/proc/Makefile                                   |    1 +
 fs/proc/internal.h                                 |    5 +
 fs/proc/proc_net.c                                 |  126 ++++++++++++++++++++
 fs/proc/root.c                                     |    8 +-
 include/linux/proc_fs.h                            |   28 +++--
 include/net/net_namespace.h                        |   11 ++
 net/802/tr.c                                       |    3 +-
 net/8021q/vlanproc.c                               |    5 +-
 net/appletalk/atalk_proc.c                         |    7 +-
 net/atm/proc.c                                     |    5 +-
 net/ax25/af_ax25.c                                 |   13 +-
 net/core/dev.c                                     |    9 +-
 net/core/dev_mcast.c                               |    3 +-
 net/core/neighbour.c                               |    3 +-
 net/core/pktgen.c                                  |    9 +-
 net/core/sock.c                                    |    3 +-
 net/core/wireless.c                                |    3 +-
 net/dccp/probe.c                                   |    7 +-
 net/decnet/af_decnet.c                             |    5 +-
 net/decnet/dn_dev.c                                |    5 +-
 net/decnet/dn_neigh.c                              |    5 +-
 net/decnet/dn_route.c                              |    5 +-
 net/ieee80211/ieee80211_module.c                   |    6 +-
 net/ipv4/arp.c                                     |    3 +-
 net/ipv4/fib_hash.c                                |    5 +-
 net/ipv4/fib_trie.c                                |   17 ++--
 net/ipv4/igmp.c                                    |    5 +-
 net/ipv4/ipconfig.c                                |    3 +-
 net/ipv4/ipmr.c                                    |    5 +-
 net/ipv4/ipvs/ip_vs_app.c                          |    5 +-
 net/ipv4/ipvs/ip_vs_conn.c                         |    5 +-
 net/ipv4/ipvs/ip_vs_ctl.c                          |    9 +-
 net/ipv4/ipvs/ip_vs_lblcr.c                        |    4 +-
 net/ipv4/netfilter/ip_conntrack_standalone.c       |   16 ++--
 net/ipv4/netfilter/ip_queue.c                      |    7 +-
 net/ipv4/netfilter/ipt_CLUSTERIP.c                 |    3 +-
 net/ipv4/netfilter/ipt_recent.c                    |    5 +-
 .../netfilter/nf_conntrack_l3proto_ipv4_compat.c   |   17 ++--
 net/ipv4/proc.c                                    |   11 +-
 net/ipv4/raw.c                                     |    5 +-
 net/ipv4/route.c                                   |    7 +-
 net/ipv4/tcp_ipv4.c                                |    5 +-
 net/ipv4/tcp_probe.c                               |    6 +-
 net/ipv4/udp.c                                     |    5 +-
 net/ipv6/addrconf.c                                |    7 +-
 net/ipv6/anycast.c                                 |    5 +-
 net/ipv6/ip6_flowlabel.c                           |    5 +-
 net/ipv6/mcast.c                                   |    9 +-
 net/ipv6/netfilter/ip6_queue.c                     |    7 +-
 net/ipv6/proc.c                                    |   17 ++--
 net/ipv6/raw.c                                     |    5 +-
 net/ipv6/route.c                                   |    9 +-
 net/ipx/ipx_proc.c                                 |    7 +-
 net/irda/irproc.c                                  |    5 +-
 net/key/af_key.c                                   |    5 +-
 net/llc/llc_proc.c                                 |    7 +-
 net/netfilter/core.c                               |    3 +-
 net/netfilter/nf_conntrack_standalone.c            |   19 ++--
 net/netfilter/x_tables.c                           |   17 ++--
 net/netfilter/xt_hashlimit.c                       |   11 +-
 net/netlink/af_netlink.c                           |    3 +-
 net/netrom/af_netrom.c                             |   13 +-
 net/packet/af_packet.c                             |    5 +-
 net/rose/af_rose.c                                 |   17 ++--
 net/rxrpc/proc.c                                   |    7 +-
 net/sched/sch_api.c                                |    3 +-
 net/sctp/protocol.c                                |    5 +-
 net/sunrpc/stats.c                                 |    5 +-
 net/unix/af_unix.c                                 |    5 +-
 net/wanrouter/wanproc.c                            |    7 +-
 net/x25/x25_proc.c                                 |    7 +-
 85 files changed, 462 insertions(+), 250 deletions(-)

diff --git a/drivers/isdn/divert/divert_procfs.c b/drivers/isdn/divert/divert_procfs.c
index 06967da..6517dd5 100644
--- a/drivers/isdn/divert/divert_procfs.c
+++ b/drivers/isdn/divert/divert_procfs.c
@@ -18,6 +18,7 @@
 #include <linux/fs.h>
 #endif
 #include <linux/isdnif.h>
+#include <net/net_namespace.h>
 #include "isdn_divert.h"
 
 
@@ -285,12 +286,12 @@ divert_dev_init(void)
 	init_waitqueue_head(&rd_queue);
 
 #ifdef CONFIG_PROC_FS
-	isdn_proc_entry = proc_mkdir("net/isdn", NULL);
+	isdn_proc_entry = proc_mkdir("isdn", per_net(proc_net, init_net()));
 	if (!isdn_proc_entry)
 		return (-1);
 	isdn_divert_entry = create_proc_entry("divert", S_IFREG | S_IRUGO, isdn_proc_entry);
 	if (!isdn_divert_entry) {
-		remove_proc_entry("net/isdn", NULL);
+		remove_proc_entry("isdn", per_net(proc_net, init_net()));
 		return (-1);
 	}
 	isdn_divert_entry->proc_fops = &isdn_fops; 
@@ -310,7 +311,7 @@ divert_dev_deinit(void)
 
 #ifdef CONFIG_PROC_FS
 	remove_proc_entry("divert", isdn_proc_entry);
-	remove_proc_entry("net/isdn", NULL);
+	remove_proc_entry("isdn", per_net(proc_net, init_net()));
 #endif	/* CONFIG_PROC_FS */
 
 	return (0);
diff --git a/drivers/isdn/hardware/eicon/diva_didd.c b/drivers/isdn/hardware/eicon/diva_didd.c
index 14298b8..1b7c0f9 100644
--- a/drivers/isdn/hardware/eicon/diva_didd.c
+++ b/drivers/isdn/hardware/eicon/diva_didd.c
@@ -15,6 +15,7 @@
 #include <linux/init.h>
 #include <linux/kernel.h>
 #include <linux/proc_fs.h>
+#include <net/net_namespace.h>
 
 #include "platform.h"
 #include "di_defs.h"
@@ -86,7 +87,7 @@ proc_read(char *page, char **start, off_t off, int count, int *eof,
 
 static int DIVA_INIT_FUNCTION create_proc(void)
 {
-	proc_net_eicon = proc_mkdir("net/eicon", NULL);
+	proc_net_eicon = proc_mkdir("eicon", per_net(proc_net, init_net()));
 
 	if (proc_net_eicon) {
 		if ((proc_didd =
@@ -102,7 +103,7 @@ static int DIVA_INIT_FUNCTION create_proc(void)
 static void DIVA_EXIT_FUNCTION remove_proc(void)
 {
 	remove_proc_entry(DRIVERLNAME, proc_net_eicon);
-	remove_proc_entry("net/eicon", NULL);
+	remove_proc_entry("eicon", per_net(proc_net, init_net()));
 }
 
 static int DIVA_INIT_FUNCTION divadidd_init(void)
diff --git a/drivers/isdn/hysdn/hysdn_procconf.c b/drivers/isdn/hysdn/hysdn_procconf.c
index 94a9350..b634e67 100644
--- a/drivers/isdn/hysdn/hysdn_procconf.c
+++ b/drivers/isdn/hysdn/hysdn_procconf.c
@@ -392,7 +392,7 @@ hysdn_procconf_init(void)
 	hysdn_card *card;
 	unsigned char conf_name[20];
 
-	hysdn_proc_entry = proc_mkdir(PROC_SUBDIR_NAME, proc_net);
+	hysdn_proc_entry = proc_mkdir(PROC_SUBDIR_NAME, per_net(proc_net, init_net()));
 	if (!hysdn_proc_entry) {
 		printk(KERN_ERR "HYSDN: unable to create hysdn subdir\n");
 		return (-1);
@@ -437,5 +437,5 @@ hysdn_procconf_release(void)
 		card = card->next;	/* point to next card */
 	}
 
-	remove_proc_entry(PROC_SUBDIR_NAME, proc_net);
+	remove_proc_entry(PROC_SUBDIR_NAME, per_net(proc_net, init_net()));
 }
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 6482aed..9b3bf4e 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -75,6 +75,7 @@
 #include <linux/if_vlan.h>
 #include <linux/if_bonding.h>
 #include <net/route.h>
+#include <net/net_namespace.h>
 #include "bonding.h"
 #include "bond_3ad.h"
 #include "bond_alb.h"
@@ -3169,7 +3170,7 @@ static void bond_create_proc_dir(void)
 {
 	int len = strlen(DRV_NAME);
 
-	for (bond_proc_dir = proc_net->subdir; bond_proc_dir;
+	for (bond_proc_dir = per_net(proc_net, init_net())->subdir; bond_proc_dir;
 	     bond_proc_dir = bond_proc_dir->next) {
 		if ((bond_proc_dir->namelen == len) &&
 		    !memcmp(bond_proc_dir->name, DRV_NAME, len)) {
@@ -3178,7 +3179,7 @@ static void bond_create_proc_dir(void)
 	}
 
 	if (!bond_proc_dir) {
-		bond_proc_dir = proc_mkdir(DRV_NAME, proc_net);
+		bond_proc_dir = proc_mkdir(DRV_NAME, per_net(proc_net, init_net()));
 		if (bond_proc_dir) {
 			bond_proc_dir->owner = THIS_MODULE;
 		} else {
@@ -3213,7 +3214,7 @@ static void bond_destroy_proc_dir(void)
 			bond_proc_dir->owner = NULL;
 		}
 	} else {
-		remove_proc_entry(DRV_NAME, proc_net);
+		remove_proc_entry(DRV_NAME, per_net(proc_net, init_net()));
 		bond_proc_dir = NULL;
 	}
 }
diff --git a/drivers/net/hamradio/bpqether.c b/drivers/net/hamradio/bpqether.c
index 5b788d8..9fc92ad 100644
--- a/drivers/net/hamradio/bpqether.c
+++ b/drivers/net/hamradio/bpqether.c
@@ -83,6 +83,7 @@
 
 #include <net/ip.h>
 #include <net/arp.h>
+#include <net/net_namespace.h>
 
 #include <linux/bpqether.h>
 
@@ -594,7 +595,7 @@ static int bpq_device_event(struct notifier_block *this,unsigned long event, voi
 static int __init bpq_init_driver(void)
 {
 #ifdef CONFIG_PROC_FS
-	if (!proc_net_fops_create("bpqether", S_IRUGO, &bpq_info_fops)) {
+	if (!proc_net_fops_create(init_net(), "bpqether", S_IRUGO, &bpq_info_fops)) {
 		printk(KERN_ERR
 			"bpq: cannot create /proc/net/bpqether entry.\n");
 		return -ENOENT;
@@ -618,7 +619,7 @@ static void __exit bpq_cleanup_driver(void)
 
 	unregister_netdevice_notifier(&bpq_dev_notifier);
 
-	proc_net_remove("bpqether");
+	proc_net_remove(init_net(), "bpqether");
 
 	rtnl_lock();
 	while (!list_empty(&bpq_devices)) {
diff --git a/drivers/net/hamradio/scc.c b/drivers/net/hamradio/scc.c
index 2ce047e..2000597 100644
--- a/drivers/net/hamradio/scc.c
+++ b/drivers/net/hamradio/scc.c
@@ -2114,7 +2114,7 @@ static int __init scc_init_driver (void)
 	}
 	rtnl_unlock();
 
-	proc_net_fops_create("z8530drv", 0, &scc_net_seq_fops);
+	proc_net_fops_create(init_net(), "z8530drv", 0, &scc_net_seq_fops);
 
 	return 0;
 }
@@ -2169,7 +2169,7 @@ static void __exit scc_cleanup_driver(void)
 	if (Vector_Latch)
 		release_region(Vector_Latch, 1);
 
-	proc_net_remove("z8530drv");
+	proc_net_remove(init_net(), "z8530drv");
 }
 
 MODULE_AUTHOR("Joerg Reuter <jreuter@yaina.de>");
diff --git a/drivers/net/hamradio/yam.c b/drivers/net/hamradio/yam.c
index 6d74f08..3e92f3b 100644
--- a/drivers/net/hamradio/yam.c
+++ b/drivers/net/hamradio/yam.c
@@ -60,6 +60,7 @@
 #include <linux/etherdevice.h>
 #include <linux/skbuff.h>
 #include <net/ax25.h>
+#include <net/net_namespace.h>
 
 #include <linux/kernel.h>
 #include <linux/proc_fs.h>
@@ -1147,7 +1148,7 @@ static int __init yam_init_driver(void)
 	yam_timer.expires = jiffies + HZ / 100;
 	add_timer(&yam_timer);
 
-	proc_net_fops_create("yam", S_IRUGO, &yam_info_fops);
+	proc_net_fops_create(init_net(), "yam", S_IRUGO, &yam_info_fops);
 	return 0;
  error:
 	while (--i >= 0) {
@@ -1179,7 +1180,7 @@ static void __exit yam_cleanup_driver(void)
 		kfree(p);
 	}
 
-	proc_net_remove("yam");
+	proc_net_remove(init_net(), "yam");
 }
 
 /* --------------------------------------------------------------------- */
diff --git a/drivers/net/ibmveth.c b/drivers/net/ibmveth.c
index 99343b5..d8b0ba8 100644
--- a/drivers/net/ibmveth.c
+++ b/drivers/net/ibmveth.c
@@ -97,7 +97,7 @@ static inline void ibmveth_rxq_harvest_buffer(struct ibmveth_adapter *adapter);
 static struct kobj_type ktype_veth_pool;
 
 #ifdef CONFIG_PROC_FS
-#define IBMVETH_PROC_DIR "net/ibmveth"
+#define IBMVETH_PROC_DIR "ibmveth"
 static struct proc_dir_entry *ibmveth_proc_dir;
 #endif
 
@@ -1073,7 +1073,7 @@ static int __devexit ibmveth_remove(struct vio_dev *dev)
 #ifdef CONFIG_PROC_FS
 static void ibmveth_proc_register_driver(void)
 {
-	ibmveth_proc_dir = proc_mkdir(IBMVETH_PROC_DIR, NULL);
+	ibmveth_proc_dir = proc_mkdir(IBMVETH_PROC_DIR, per_net(proc_net, init_net()));
 	if (ibmveth_proc_dir) {
 		SET_MODULE_OWNER(ibmveth_proc_dir);
 	}
@@ -1081,7 +1081,7 @@ static void ibmveth_proc_register_driver(void)
 
 static void ibmveth_proc_unregister_driver(void)
 {
-	remove_proc_entry(IBMVETH_PROC_DIR, NULL);
+	remove_proc_entry(IBMVETH_PROC_DIR, per_net(proc_net, init_net()));
 }
 
 static void *ibmveth_seq_start(struct seq_file *seq, loff_t *pos)
diff --git a/drivers/net/pppoe.c b/drivers/net/pppoe.c
index 315d5c3..d34fe16 100644
--- a/drivers/net/pppoe.c
+++ b/drivers/net/pppoe.c
@@ -72,6 +72,7 @@
 #include <linux/proc_fs.h>
 #include <linux/seq_file.h>
 
+#include <net/net_namespace.h>
 #include <net/sock.h>
 
 #include <asm/uaccess.h>
@@ -1055,7 +1056,7 @@ static int __init pppoe_proc_init(void)
 {
 	struct proc_dir_entry *p;
 
-	p = create_proc_entry("net/pppoe", S_IRUGO, NULL);
+	p = create_proc_entry("pppoe", S_IRUGO, per_net(proc_net, init_net()));
 	if (!p)
 		return -ENOMEM;
 
@@ -1126,7 +1127,7 @@ static void __exit pppoe_exit(void)
 	dev_remove_pack(&pppoes_ptype);
 	dev_remove_pack(&pppoed_ptype);
 	unregister_netdevice_notifier(&pppoe_notifier);
-	remove_proc_entry("net/pppoe", NULL);
+	remove_proc_entry("pppoe", per_net(proc_net, init_net()));
 	proto_unregister(&pppoe_sk_proto);
 }
 
diff --git a/drivers/net/tc35815.c b/drivers/net/tc35815.c
index 81ed82f..1f26c29 100644
--- a/drivers/net/tc35815.c
+++ b/drivers/net/tc35815.c
@@ -61,7 +61,6 @@ static const char *version =
  * io regions, irqs and dma channels
  */
 static const char* cardname = "TC35815CF";
-#define TC35815_PROC_ENTRY "net/tc35815"
 
 #define TC35815_MODULE_NAME "TC35815CF"
 #define TX_TIMEOUT (4*HZ)
diff --git a/drivers/net/tokenring/lanstreamer.c b/drivers/net/tokenring/lanstreamer.c
index e999feb..b382ef3 100644
--- a/drivers/net/tokenring/lanstreamer.c
+++ b/drivers/net/tokenring/lanstreamer.c
@@ -250,7 +250,7 @@ static int __devinit streamer_init_one(struct pci_dev *pdev,
 #if STREAMER_NETWORK_MONITOR
 #ifdef CONFIG_PROC_FS
 	if (!dev_streamer)
-		create_proc_read_entry("net/streamer_tr", 0, 0,
+		create_proc_read_entry("streamer_tr", 0, per_net(proc_net, init_net()),
 					streamer_proc_info, NULL); 
 	streamer_priv->next = dev_streamer;
 	dev_streamer = streamer_priv;
@@ -423,7 +423,7 @@ static void __devexit streamer_remove_one(struct pci_dev *pdev)
 			}
 		}
 		if (!dev_streamer)
-			remove_proc_entry("net/streamer_tr", NULL);
+			remove_proc_entry("streamer_tr", per_net(proc_net, init_net()));
 	}
 #endif
 #endif
diff --git a/drivers/net/tokenring/olympic.c b/drivers/net/tokenring/olympic.c
index 8f4ecc1..6b74c3b 100644
--- a/drivers/net/tokenring/olympic.c
+++ b/drivers/net/tokenring/olympic.c
@@ -101,6 +101,7 @@
 #include <linux/bitops.h>
 #include <linux/jiffies.h>
 
+#include <net/net_namespace.h>
 #include <net/checksum.h>
 
 #include <asm/io.h>
@@ -268,9 +269,9 @@ static int __devinit olympic_probe(struct pci_dev *pdev, const struct pci_device
 	printk("Olympic: %s registered as: %s\n",olympic_priv->olympic_card_name,dev->name);
 	if (olympic_priv->olympic_network_monitor) { /* Must go after register_netdev as we need the device name */ 
 		char proc_name[20] ; 
-		strcpy(proc_name,"net/olympic_") ; 
+		strcpy(proc_name,"olympic_") ; 
 		strcat(proc_name,dev->name) ; 
-		create_proc_read_entry(proc_name,0,NULL,olympic_proc_info,(void *)dev) ; 
+		create_proc_read_entry(proc_name,0,per_net(proc_net, init_net()),olympic_proc_info,(void *)dev) ; 
 		printk("Olympic: Network Monitor information: /proc/%s\n",proc_name); 
 	}
 	return  0 ;
@@ -1750,9 +1751,9 @@ static void __devexit olympic_remove_one(struct pci_dev *pdev)
 
 	if (olympic_priv->olympic_network_monitor) { 
 		char proc_name[20] ; 
-		strcpy(proc_name,"net/olympic_") ; 
+		strcpy(proc_name,"olympic_") ; 
 		strcat(proc_name,dev->name) ;
-		remove_proc_entry(proc_name,NULL); 
+		remove_proc_entry(proc_name,per_net(proc_net, init_net())); 
 	}
 	unregister_netdev(dev) ; 
 	iounmap(olympic_priv->olympic_mmio) ; 
diff --git a/drivers/net/wireless/hostap/hostap_main.c b/drivers/net/wireless/hostap/hostap_main.c
index 04c19ce..69b56d6 100644
--- a/drivers/net/wireless/hostap/hostap_main.c
+++ b/drivers/net/wireless/hostap/hostap_main.c
@@ -24,6 +24,7 @@
 #include <linux/rtnetlink.h>
 #include <linux/wireless.h>
 #include <linux/etherdevice.h>
+#include <net/net_namespace.h>
 #include <net/iw_handler.h>
 #include <net/ieee80211.h>
 #include <net/ieee80211_crypt.h>
@@ -1093,8 +1094,8 @@ struct proc_dir_entry *hostap_proc;
 
 static int __init hostap_init(void)
 {
-	if (proc_net != NULL) {
-		hostap_proc = proc_mkdir("hostap", proc_net);
+	if (per_net(proc_net, init_net()) != NULL) {
+		hostap_proc = proc_mkdir("hostap", per_net(proc_net, init_net()));
 		if (!hostap_proc)
 			printk(KERN_WARNING "Failed to mkdir "
 			       "/proc/net/hostap\n");
@@ -1109,7 +1110,7 @@ static void __exit hostap_exit(void)
 {
 	if (hostap_proc != NULL) {
 		hostap_proc = NULL;
-		remove_proc_entry("hostap", proc_net);
+		remove_proc_entry("hostap", per_net(proc_net, init_net()));
 	}
 }
 
diff --git a/drivers/net/wireless/strip.c b/drivers/net/wireless/strip.c
index ce3a8ba..6c27ff2 100644
--- a/drivers/net/wireless/strip.c
+++ b/drivers/net/wireless/strip.c
@@ -107,6 +107,7 @@ static const char StripVersion[] = "1.3A-STUART.CHESHIRE";
 #include <linux/serialP.h>
 #include <linux/rcupdate.h>
 #include <net/arp.h>
+#include <net/net_namespace.h>
 
 #include <linux/ip.h>
 #include <linux/tcp.h>
@@ -2789,7 +2790,7 @@ static int __init strip_init_driver(void)
 	/*
 	 * Register the status file with /proc
 	 */
-	proc_net_fops_create("strip", S_IFREG | S_IRUGO, &strip_seq_fops);
+	proc_net_fops_create(init_net(), "strip", S_IFREG | S_IRUGO, &strip_seq_fops);
 
 	return status;
 }
@@ -2811,7 +2812,7 @@ static void __exit strip_exit_driver(void)
 	}
 
 	/* Unregister with the /proc/net file here. */
-	proc_net_remove("strip");
+	proc_net_remove(init_net(), "strip");
 
 	if ((i = tty_unregister_ldisc(N_STRIP)))
 		printk(KERN_ERR "STRIP: can't unregister line discipline (err = %d)\n", i);
diff --git a/fs/proc/Makefile b/fs/proc/Makefile
index a6b3a8f..63cc3ce 100644
--- a/fs/proc/Makefile
+++ b/fs/proc/Makefile
@@ -10,6 +10,7 @@ proc-$(CONFIG_MMU)	:= mmu.o task_mmu.o
 proc-y       += inode.o root.o base.o generic.o array.o \
 		proc_tty.o proc_misc.o proc_sysctl.o
 
+proc-$(CONFIG_NET)		+= proc_net.o
 proc-$(CONFIG_PROC_KCORE)	+= kcore.o
 proc-$(CONFIG_PROC_VMCORE)	+= vmcore.o
 proc-$(CONFIG_PROC_DEVICETREE)	+= proc_devtree.o
diff --git a/fs/proc/internal.h b/fs/proc/internal.h
index 3c9a305..f916252 100644
--- a/fs/proc/internal.h
+++ b/fs/proc/internal.h
@@ -12,6 +12,11 @@
 #include <linux/proc_fs.h>
 
 extern int proc_sys_init(void);
+#ifdef CONFIG_NET
+extern int proc_net_init(void);
+#else
+static inline int proc_net_init(void) { return 0; }
+#endif
 
 struct vmalloc_info {
 	unsigned long	used;
diff --git a/fs/proc/proc_net.c b/fs/proc/proc_net.c
new file mode 100644
index 0000000..022dd9a
--- /dev/null
+++ b/fs/proc/proc_net.c
@@ -0,0 +1,126 @@
+/*
+ *  linux/fs/proc/net.c
+ *
+ *  Copyright (C) 2007
+ *
+ *  Author: Eric Biederman <ebiederm@xmission.com>
+ *
+ *  proc net directory handling functions
+ */
+
+#include <asm/uaccess.h>
+
+#include <linux/errno.h>
+#include <linux/time.h>
+#include <linux/proc_fs.h>
+#include <linux/stat.h>
+#include <linux/init.h>
+#include <linux/sched.h>
+#include <linux/module.h>
+#include <linux/bitops.h>
+#include <linux/smp_lock.h>
+#include <linux/mount.h>
+#include <linux/nsproxy.h>
+#include <net/net_namespace.h>
+
+#include "internal.h"
+
+static struct proc_dir_entry *proc_net_shadow;
+DEFINE_PER_NET(struct proc_dir_entry *, proc_net);
+DEFINE_PER_NET(struct proc_dir_entry *, proc_net_stat);
+EXPORT_PER_NET_SYMBOL(proc_net);
+EXPORT_PER_NET_SYMBOL(proc_net_stat);
+
+static DEFINE_PER_NET(struct proc_dir_entry, proc_net_root);
+
+static struct dentry *proc_net_shadow_dentry(struct dentry *parent,
+						struct proc_dir_entry *de)
+{
+	struct dentry *shadow = NULL;
+	struct inode *inode;
+	if (!de)
+		goto out;
+	inode = proc_get_inode(parent->d_inode->i_sb, de->low_ino, de);
+	if (!inode)
+		goto out;
+	shadow = d_alloc_name(parent, de->name);
+	if (!shadow)
+		goto out_iput;
+	shadow->d_op = parent->d_op; /* proc_dentry_operations */
+	d_instantiate(shadow, inode);
+out:
+	return shadow;
+out_iput:
+	iput(inode);
+	goto out;
+}
+
+static void *proc_net_follow_link(struct dentry *parent, struct nameidata *nd)
+{
+	net_t net = current->nsproxy->net_ns;
+	struct dentry *shadow;
+	shadow = proc_net_shadow_dentry(parent, per_net(proc_net, net));
+	if (!shadow)
+		return ERR_PTR(-ENOENT);
+
+	dput(nd->dentry);
+	/* My dentry count is 1 and that should be enough as the 
+	 * shadow dentry is thrown away immediately.
+	 */
+	nd->dentry = shadow;
+	return NULL;
+}
+
+static const struct file_operations proc_net_dir_operations = {
+	.read			= generic_read_dir,
+};
+
+static struct inode_operations proc_net_dir_inode_operations = {
+	.follow_link	= proc_net_follow_link,
+};
+
+
+static int proc_net_ns_init(net_t net)
+{
+	struct proc_dir_entry *netd, *net_statd;
+
+	netd = proc_mkdir("net", &per_net(proc_net_root, net));
+	if (!netd)
+		return -EEXIST;
+
+	net_statd = proc_mkdir("stat", netd);
+	if (!net_statd) {
+		remove_proc_entry("net", &per_net(proc_net_root, net));
+		return -EEXIST;
+	}
+
+	netd->data = net_to_voidp(net);
+	net_statd->data = net_to_voidp(net);
+	per_net(proc_net_root, net).data = net_to_voidp(net);
+
+	per_net(proc_net, net) = netd;
+	per_net(proc_net_stat, net) = net_statd;
+
+	return 0;
+}
+
+static void proc_net_ns_exit(net_t net)
+{
+	remove_proc_entry("stat", per_net(proc_net, net));
+	remove_proc_entry("net", &per_net(proc_net_root, net));
+
+}
+
+struct pernet_operations proc_net_ns_ops = {
+	.init = proc_net_ns_init,
+	.exit = proc_net_ns_exit,
+};
+
+int proc_net_init(void)
+{
+	proc_net_shadow = proc_mkdir("net", NULL);
+	proc_net_shadow->proc_iops = &proc_net_dir_inode_operations;
+	proc_net_shadow->proc_fops = &proc_net_dir_operations;
+
+	return register_pernet_subsys(&proc_net_ns_ops);
+}
diff --git a/fs/proc/root.c b/fs/proc/root.c
index 4d42406..7c3939c 100644
--- a/fs/proc/root.c
+++ b/fs/proc/root.c
@@ -21,7 +21,7 @@
 
 #include "internal.h"
 
-struct proc_dir_entry *proc_net, *proc_net_stat, *proc_bus, *proc_root_fs, *proc_root_driver;
+struct proc_dir_entry *proc_bus, *proc_root_fs, *proc_root_driver;
 
 static int proc_get_sb(struct file_system_type *fs_type,
 	int flags, const char *dev_name, void *data, struct vfsmount *mnt)
@@ -61,8 +61,8 @@ void __init proc_root_init(void)
 		return;
 	}
 	proc_misc_init();
-	proc_net = proc_mkdir("net", NULL);
-	proc_net_stat = proc_mkdir("net/stat", NULL);
+
+	proc_net_init();
 
 #ifdef CONFIG_SYSVIPC
 	proc_mkdir("sysvipc", NULL);
@@ -161,7 +161,5 @@ EXPORT_SYMBOL(create_proc_entry);
 EXPORT_SYMBOL(remove_proc_entry);
 EXPORT_SYMBOL(proc_root);
 EXPORT_SYMBOL(proc_root_fs);
-EXPORT_SYMBOL(proc_net);
-EXPORT_SYMBOL(proc_net_stat);
 EXPORT_SYMBOL(proc_bus);
 EXPORT_SYMBOL(proc_root_driver);
diff --git a/include/linux/proc_fs.h b/include/linux/proc_fs.h
index 2969913..c1b958d 100644
--- a/include/linux/proc_fs.h
+++ b/include/linux/proc_fs.h
@@ -5,6 +5,7 @@
 #include <linux/fs.h>
 #include <linux/spinlock.h>
 #include <linux/magic.h>
+#include <linux/net_namespace_type.h>
 #include <asm/atomic.h>
 
 /*
@@ -85,8 +86,8 @@ struct vmcore {
 
 extern struct proc_dir_entry proc_root;
 extern struct proc_dir_entry *proc_root_fs;
-extern struct proc_dir_entry *proc_net;
-extern struct proc_dir_entry *proc_net_stat;
+DECLARE_PER_NET(struct proc_dir_entry *, proc_net);
+DECLARE_PER_NET(struct proc_dir_entry *, proc_net_stat);
 extern struct proc_dir_entry *proc_bus;
 extern struct proc_dir_entry *proc_root_driver;
 extern struct proc_dir_entry *proc_root_kcore;
@@ -183,24 +184,25 @@ static inline struct proc_dir_entry *create_proc_info_entry(const char *name,
 	return res;
 }
  
-static inline struct proc_dir_entry *proc_net_create(const char *name,
-	mode_t mode, get_info_t *get_info)
+static inline struct proc_dir_entry *proc_net_create(net_t net,
+	const char *name, mode_t mode, get_info_t *get_info)
 {
-	return create_proc_info_entry(name,mode,proc_net,get_info);
+	return create_proc_info_entry(name,mode,per_net(proc_net, net),get_info);
 }
 
-static inline struct proc_dir_entry *proc_net_fops_create(const char *name,
-	mode_t mode, const struct file_operations *fops)
+static inline struct proc_dir_entry *proc_net_fops_create(net_t net,
+	const char *name, mode_t mode, const struct file_operations *fops)
 {
-	struct proc_dir_entry *res = create_proc_entry(name, mode, proc_net);
+	struct proc_dir_entry *res = 
+		create_proc_entry(name, mode, per_net(proc_net, net));
 	if (res)
 		res->proc_fops = fops;
 	return res;
 }
 
-static inline void proc_net_remove(const char *name)
+static inline void proc_net_remove(net_t net, const char *name)
 {
-	remove_proc_entry(name,proc_net);
+	remove_proc_entry(name, per_net(proc_net, net));
 }
 
 #else
@@ -209,9 +211,9 @@ static inline void proc_net_remove(const char *name)
 #define proc_net NULL
 #define proc_bus NULL
 
-#define proc_net_fops_create(name, mode, fops)  ({ (void)(mode), NULL; })
-#define proc_net_create(name, mode, info)	({ (void)(mode), NULL; })
-static inline void proc_net_remove(const char *name) {}
+#define proc_net_fops_create(net, name, mode, fops)  ({ (void)(mode), NULL; })
+#define proc_net_create(net, name, mode, info)	({ (void)(mode), NULL; })
+static inline void proc_net_remove(net_t net, const char *name) {}
 
 static inline void proc_flush_task(struct task_struct *task) { }
 
diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index 9208e2e..b64568f 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -8,6 +8,7 @@
 #include <linux/workqueue.h>
 #include <linux/nsproxy.h>
 #include <linux/net_namespace_type.h>
+#include <linux/proc_fs.h>
 
 /* How many bytes in each network namespace should we allocate
  * for use by modules when they are loaded.
@@ -65,4 +66,14 @@ extern void unregister_pernet_subsys(struct pernet_operations *);
 extern int register_pernet_device(struct pernet_operations *);
 extern void unregister_pernet_device(struct pernet_operations *);
 
+static inline net_t PDE_NET(struct proc_dir_entry *pde)
+{
+	return net_from_voidp(pde->parent->data);
+}
+
+static inline net_t PROC_NET(const struct inode *inode)
+{
+	return PDE_NET(PDE(inode));
+}
+
 #endif /* __NET_NET_NAMESPACE_H */
diff --git a/net/802/tr.c b/net/802/tr.c
index 829deb4..3324fa6 100644
--- a/net/802/tr.c
+++ b/net/802/tr.c
@@ -36,6 +36,7 @@
 #include <linux/seq_file.h>
 #include <linux/init.h>
 #include <net/arp.h>
+#include <net/net_namespace.h>
 
 static void tr_add_rif_info(struct trh_hdr *trh, struct net_device *dev);
 static void rif_check_expire(unsigned long dummy);
@@ -636,7 +637,7 @@ static int __init rif_init(void)
 	rif_timer.function = rif_check_expire;
 	add_timer(&rif_timer);
 
-	proc_net_fops_create("tr_rif", S_IRUGO, &rif_seq_fops);
+	proc_net_fops_create(init_net(), "tr_rif", S_IRUGO, &rif_seq_fops);
 	return 0;
 }
 
diff --git a/net/8021q/vlanproc.c b/net/8021q/vlanproc.c
index a8fc0de..abcf58c 100644
--- a/net/8021q/vlanproc.c
+++ b/net/8021q/vlanproc.c
@@ -33,6 +33,7 @@
 #include <linux/fs.h>
 #include <linux/netdevice.h>
 #include <linux/if_vlan.h>
+#include <net/net_namespace.h>
 #include "vlanproc.h"
 #include "vlan.h"
 
@@ -143,7 +144,7 @@ void vlan_proc_cleanup(void)
 		remove_proc_entry(name_conf, proc_vlan_dir);
 
 	if (proc_vlan_dir)
-		proc_net_remove(name_root);
+		proc_net_remove(init_net(), name_root);
 
 	/* Dynamically added entries should be cleaned up as their vlan_device
 	 * is removed, so we should not have to take care of it here...
@@ -156,7 +157,7 @@ void vlan_proc_cleanup(void)
 
 int __init vlan_proc_init(void)
 {
-	proc_vlan_dir = proc_mkdir(name_root, proc_net);
+	proc_vlan_dir = proc_mkdir(name_root, per_net(proc_net, init_net()));
 	if (proc_vlan_dir) {
 		proc_vlan_conf = create_proc_entry(name_conf,
 						   S_IFREG|S_IRUSR|S_IWUSR,
diff --git a/net/appletalk/atalk_proc.c b/net/appletalk/atalk_proc.c
index 7ae4916..0e77c68 100644
--- a/net/appletalk/atalk_proc.c
+++ b/net/appletalk/atalk_proc.c
@@ -13,6 +13,7 @@
 #include <linux/seq_file.h>
 #include <net/sock.h>
 #include <linux/atalk.h>
+#include <net/net_namespace.h>
 
 
 static __inline__ struct atalk_iface *atalk_get_interface_idx(loff_t pos)
@@ -271,7 +272,7 @@ int __init atalk_proc_init(void)
 	struct proc_dir_entry *p;
 	int rc = -ENOMEM;
 
-	atalk_proc_dir = proc_mkdir("atalk", proc_net);
+	atalk_proc_dir = proc_mkdir("atalk", per_net(proc_net, init_net()));
 	if (!atalk_proc_dir)
 		goto out;
 	atalk_proc_dir->owner = THIS_MODULE;
@@ -306,7 +307,7 @@ out_socket:
 out_route:
 	remove_proc_entry("interface", atalk_proc_dir);
 out_interface:
-	remove_proc_entry("atalk", proc_net);
+	remove_proc_entry("atalk", per_net(proc_net, init_net()));
 	goto out;
 }
 
@@ -316,5 +317,5 @@ void __exit atalk_proc_exit(void)
 	remove_proc_entry("route", atalk_proc_dir);
 	remove_proc_entry("socket", atalk_proc_dir);
 	remove_proc_entry("arp", atalk_proc_dir);
-	remove_proc_entry("atalk", proc_net);
+	remove_proc_entry("atalk", per_net(proc_net, init_net()));
 }
diff --git a/net/atm/proc.c b/net/atm/proc.c
index 739866b..8b0299d 100644
--- a/net/atm/proc.c
+++ b/net/atm/proc.c
@@ -22,6 +22,7 @@
 #include <linux/netdevice.h>
 #include <linux/atmclip.h>
 #include <linux/init.h> /* for __init */
+#include <net/net_namespace.h>
 #include <net/atmclip.h>
 #include <asm/uaccess.h>
 #include <asm/atomic.h>
@@ -475,7 +476,7 @@ static void atm_proc_dirs_remove(void)
 		if (e->dirent) 
 			remove_proc_entry(e->name, atm_proc_root);
 	}
-	remove_proc_entry("net/atm", NULL);
+	remove_proc_entry("atm", per_net(proc_net, init_net()));
 }
 
 int __init atm_proc_init(void)
@@ -483,7 +484,7 @@ int __init atm_proc_init(void)
 	static struct atm_proc_entry *e;
 	int ret;
 
-	atm_proc_root = proc_mkdir("net/atm",NULL);
+	atm_proc_root = proc_mkdir("atm", per_net(proc_net, init_net()));
 	if (!atm_proc_root)
 		goto err_out;
 	for (e = atm_proc_ents; e->name; e++) {
diff --git a/net/ax25/af_ax25.c b/net/ax25/af_ax25.c
index 42233df..e60af4e 100644
--- a/net/ax25/af_ax25.c
+++ b/net/ax25/af_ax25.c
@@ -48,6 +48,7 @@
 #include <net/tcp_states.h>
 #include <net/ip.h>
 #include <net/arp.h>
+#include <net/net_namespace.h>
 
 
 
@@ -2000,9 +2001,9 @@ static int __init ax25_init(void)
 	register_netdevice_notifier(&ax25_dev_notifier);
 	ax25_register_sysctl();
 
-	proc_net_fops_create("ax25_route", S_IRUGO, &ax25_route_fops);
-	proc_net_fops_create("ax25", S_IRUGO, &ax25_info_fops);
-	proc_net_fops_create("ax25_calls", S_IRUGO, &ax25_uid_fops);
+	proc_net_fops_create(init_net(), "ax25_route", S_IRUGO, &ax25_route_fops);
+	proc_net_fops_create(init_net(), "ax25", S_IRUGO, &ax25_info_fops);
+	proc_net_fops_create(init_net(), "ax25_calls", S_IRUGO, &ax25_uid_fops);
 out:
 	return rc;
 }
@@ -2016,9 +2017,9 @@ MODULE_ALIAS_NETPROTO(PF_AX25);
 
 static void __exit ax25_exit(void)
 {
-	proc_net_remove("ax25_route");
-	proc_net_remove("ax25");
-	proc_net_remove("ax25_calls");
+	proc_net_remove(init_net(), "ax25_route");
+	proc_net_remove(init_net(), "ax25");
+	proc_net_remove(init_net(), "ax25_calls");
 	ax25_rt_free();
 	ax25_uid_free();
 	ax25_dev_free();
diff --git a/net/core/dev.c b/net/core/dev.c
index 17c07f3..90e4c0e 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -116,6 +116,7 @@
 #include <linux/dmaengine.h>
 #include <linux/err.h>
 #include <linux/ctype.h>
+#include <net/net_namespace.h>
 
 /*
  *	The list of packet types we will receive (as opposed to discard)
@@ -2238,9 +2239,9 @@ static int __init dev_proc_init(void)
 {
 	int rc = -ENOMEM;
 
-	if (!proc_net_fops_create("dev", S_IRUGO, &dev_seq_fops))
+	if (!proc_net_fops_create(init_net(), "dev", S_IRUGO, &dev_seq_fops))
 		goto out;
-	if (!proc_net_fops_create("softnet_stat", S_IRUGO, &softnet_seq_fops))
+	if (!proc_net_fops_create(init_net(), "softnet_stat", S_IRUGO, &softnet_seq_fops))
 		goto out_dev;
 	if (wireless_proc_init())
 		goto out_softnet;
@@ -2248,9 +2249,9 @@ static int __init dev_proc_init(void)
 out:
 	return rc;
 out_softnet:
-	proc_net_remove("softnet_stat");
+	proc_net_remove(init_net(), "softnet_stat");
 out_dev:
-	proc_net_remove("dev");
+	proc_net_remove(init_net(), "dev");
 	goto out;
 }
 #else
diff --git a/net/core/dev_mcast.c b/net/core/dev_mcast.c
index b22648d..623e606 100644
--- a/net/core/dev_mcast.c
+++ b/net/core/dev_mcast.c
@@ -47,6 +47,7 @@
 #include <linux/skbuff.h>
 #include <net/sock.h>
 #include <net/arp.h>
+#include <net/net_namespace.h>
 
 
 /*
@@ -289,7 +290,7 @@ static struct file_operations dev_mc_seq_fops = {
 
 void __init dev_mcast_init(void)
 {
-	proc_net_fops_create("dev_mcast", 0, &dev_mc_seq_fops);
+	proc_net_fops_create(init_net(), "dev_mcast", 0, &dev_mc_seq_fops);
 }
 
 EXPORT_SYMBOL(dev_mc_add);
diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 8437678..90e1d2e 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -34,6 +34,7 @@
 #include <linux/rtnetlink.h>
 #include <linux/random.h>
 #include <linux/string.h>
+#include <net/net_namespace.h>
 
 #define NEIGH_DEBUG 1
 
@@ -1348,7 +1349,7 @@ void neigh_table_init_no_netlink(struct neigh_table *tbl)
 		panic("cannot create neighbour cache statistics");
 	
 #ifdef CONFIG_PROC_FS
-	tbl->pde = create_proc_entry(tbl->id, 0, proc_net_stat);
+	tbl->pde = create_proc_entry(tbl->id, 0, per_net(proc_net_stat, init_net()));
 	if (!tbl->pde) 
 		panic("cannot create neighbour proc dir entry");
 	tbl->pde->proc_fops = &neigh_stat_seq_fops;
diff --git a/net/core/pktgen.c b/net/core/pktgen.c
index 04d4b93..ab48533 100644
--- a/net/core/pktgen.c
+++ b/net/core/pktgen.c
@@ -152,6 +152,7 @@
 #include <net/checksum.h>
 #include <net/ipv6.h>
 #include <net/addrconf.h>
+#include <net/net_namespace.h>
 #include <asm/byteorder.h>
 #include <linux/rcupdate.h>
 #include <asm/bitops.h>
@@ -3565,7 +3566,7 @@ static int __init pg_init(void)
 
 	printk(version);
 
-	pg_proc_dir = proc_mkdir(PG_PROC_DIR, proc_net);
+	pg_proc_dir = proc_mkdir(PG_PROC_DIR, per_net(proc_net, init_net()));
 	if (!pg_proc_dir)
 		return -ENODEV;
 	pg_proc_dir->owner = THIS_MODULE;
@@ -3574,7 +3575,7 @@ static int __init pg_init(void)
 	if (pe == NULL) {
 		printk("pktgen: ERROR: cannot create %s procfs entry.\n",
 		       PGCTRL);
-		proc_net_remove(PG_PROC_DIR);
+		proc_net_remove(init_net(), PG_PROC_DIR);
 		return -EINVAL;
 	}
 
@@ -3597,7 +3598,7 @@ static int __init pg_init(void)
 		printk("pktgen: ERROR: Initialization failed for all threads\n");
 		unregister_netdevice_notifier(&pktgen_notifier_block);
 		remove_proc_entry(PGCTRL, pg_proc_dir);
-		proc_net_remove(PG_PROC_DIR);
+		proc_net_remove(init_net(), PG_PROC_DIR);
 		return -ENODEV;
 	}
 
@@ -3624,7 +3625,7 @@ static void __exit pg_cleanup(void)
 
 	/* Clean up proc file system */
 	remove_proc_entry(PGCTRL, pg_proc_dir);
-	proc_net_remove(PG_PROC_DIR);
+	proc_net_remove(init_net(), PG_PROC_DIR);
 }
 
 module_init(pg_init);
diff --git a/net/core/sock.c b/net/core/sock.c
index 0ed5b4f..5555364 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -123,6 +123,7 @@
 #include <net/sock.h>
 #include <net/xfrm.h>
 #include <linux/ipsec.h>
+#include <net/net_namespace.h>
 
 #include <linux/filter.h>
 
@@ -1922,7 +1923,7 @@ static struct file_operations proto_seq_fops = {
 static int __init proto_init(void)
 {
 	/* register /proc/net/protocols */
-	return proc_net_fops_create("protocols", S_IRUGO, &proto_seq_fops) == NULL ? -ENOBUFS : 0;
+	return proc_net_fops_create(init_net(), "protocols", S_IRUGO, &proto_seq_fops) == NULL ? -ENOBUFS : 0;
 }
 
 subsys_initcall(proto_init);
diff --git a/net/core/wireless.c b/net/core/wireless.c
index f69ab7b..faa242f 100644
--- a/net/core/wireless.c
+++ b/net/core/wireless.c
@@ -94,6 +94,7 @@
 #include <linux/wireless.h>		/* Pretty obvious */
 #include <net/iw_handler.h>		/* New driver API */
 #include <net/netlink.h>
+#include <net/net_namespace.h>
 
 #include <asm/uaccess.h>		/* copy_to_user() */
 
@@ -685,7 +686,7 @@ static struct file_operations wireless_seq_fops = {
 int __init wireless_proc_init(void)
 {
 	/* Create /proc/net/wireless entry */
-	if (!proc_net_fops_create("wireless", S_IRUGO, &wireless_seq_fops))
+	if (!proc_net_fops_create(init_net(), "wireless", S_IRUGO, &wireless_seq_fops))
 		return -ENOMEM;
 
 	return 0;
diff --git a/net/dccp/probe.c b/net/dccp/probe.c
index f81e37d..7c1c1ef 100644
--- a/net/dccp/probe.c
+++ b/net/dccp/probe.c
@@ -30,6 +30,7 @@
 #include <linux/module.h>
 #include <linux/kfifo.h>
 #include <linux/vmalloc.h>
+#include <net/net_namespace.h>
 
 #include "dccp.h"
 #include "ccid.h"
@@ -165,7 +166,7 @@ static __init int dccpprobe_init(void)
 	if (IS_ERR(dccpw.fifo))
 		return PTR_ERR(dccpw.fifo);
 
-	if (!proc_net_fops_create(procname, S_IRUSR, &dccpprobe_fops))
+	if (!proc_net_fops_create(init_net(), procname, S_IRUSR, &dccpprobe_fops))
 		goto err0;
 
 	ret = register_jprobe(&dccp_send_probe);
@@ -175,7 +176,7 @@ static __init int dccpprobe_init(void)
 	pr_info("DCCP watch registered (port=%d)\n", port);
 	return 0;
 err1:
-	proc_net_remove(procname);
+	proc_net_remove(init_net(), procname);
 err0:
 	kfifo_free(dccpw.fifo);
 	return ret;
@@ -185,7 +186,7 @@ module_init(dccpprobe_init);
 static __exit void dccpprobe_exit(void)
 {
 	kfifo_free(dccpw.fifo);
-	proc_net_remove(procname);
+	proc_net_remove(init_net(), procname);
 	unregister_jprobe(&dccp_send_probe);
 
 }
diff --git a/net/decnet/af_decnet.c b/net/decnet/af_decnet.c
index 21f20f2..77cd802 100644
--- a/net/decnet/af_decnet.c
+++ b/net/decnet/af_decnet.c
@@ -131,6 +131,7 @@ Version 0.0.6    2.1.110   07-aug-98   Eduardo Marcelo Serrat
 #include <net/neighbour.h>
 #include <net/dst.h>
 #include <net/fib_rules.h>
+#include <net/net_namespace.h>
 #include <net/dn.h>
 #include <net/dn_nsp.h>
 #include <net/dn_dev.h>
@@ -2396,7 +2397,7 @@ static int __init decnet_init(void)
 	dev_add_pack(&dn_dix_packet_type);
 	register_netdevice_notifier(&dn_dev_notifier);
 
-	proc_net_fops_create("decnet", S_IRUGO, &dn_socket_seq_fops);
+	proc_net_fops_create(init_net(), "decnet", S_IRUGO, &dn_socket_seq_fops);
 	dn_register_sysctl();
 out:
 	return rc;
@@ -2424,7 +2425,7 @@ static void __exit decnet_exit(void)
 	dn_neigh_cleanup();
 	dn_fib_cleanup();
 
-	proc_net_remove("decnet");
+	proc_net_remove(init_net(), "decnet");
 
 	proto_unregister(&dn_proto);
 }
diff --git a/net/decnet/dn_dev.c b/net/decnet/dn_dev.c
index 913e25a..19b1469 100644
--- a/net/decnet/dn_dev.c
+++ b/net/decnet/dn_dev.c
@@ -47,6 +47,7 @@
 #include <net/flow.h>
 #include <net/fib_rules.h>
 #include <net/netlink.h>
+#include <net/net_namespace.h>
 #include <net/dn.h>
 #include <net/dn_dev.h>
 #include <net/dn_route.h>
@@ -1483,7 +1484,7 @@ void __init dn_dev_init(void)
 
 	rtnetlink_links[PF_DECnet] = dnet_rtnetlink_table;
 
-	proc_net_fops_create("decnet_dev", S_IRUGO, &dn_dev_seq_fops);
+	proc_net_fops_create(init_net(), "decnet_dev", S_IRUGO, &dn_dev_seq_fops);
 
 #ifdef CONFIG_SYSCTL
 	{
@@ -1506,7 +1507,7 @@ void __exit dn_dev_cleanup(void)
 	}
 #endif /* CONFIG_SYSCTL */
 
-	proc_net_remove("decnet_dev");
+	proc_net_remove(init_net(), "decnet_dev");
 
 	dn_dev_devices_off();
 }
diff --git a/net/decnet/dn_neigh.c b/net/decnet/dn_neigh.c
index 7322bb3..fd99aca 100644
--- a/net/decnet/dn_neigh.c
+++ b/net/decnet/dn_neigh.c
@@ -38,6 +38,7 @@
 #include <linux/rcupdate.h>
 #include <linux/jhash.h>
 #include <asm/atomic.h>
+#include <net/net_namespace.h>
 #include <net/neighbour.h>
 #include <net/dst.h>
 #include <net/flow.h>
@@ -611,11 +612,11 @@ static struct file_operations dn_neigh_seq_fops = {
 void __init dn_neigh_init(void)
 {
 	neigh_table_init(&dn_neigh_table);
-	proc_net_fops_create("decnet_neigh", S_IRUGO, &dn_neigh_seq_fops);
+	proc_net_fops_create(init_net(), "decnet_neigh", S_IRUGO, &dn_neigh_seq_fops);
 }
 
 void __exit dn_neigh_cleanup(void)
 {
-	proc_net_remove("decnet_neigh");
+	proc_net_remove(init_net(), "decnet_neigh");
 	neigh_table_clear(&dn_neigh_table);
 }
diff --git a/net/decnet/dn_route.c b/net/decnet/dn_route.c
index 9881933..0d657eb 100644
--- a/net/decnet/dn_route.c
+++ b/net/decnet/dn_route.c
@@ -81,6 +81,7 @@
 #include <net/dst.h>
 #include <net/flow.h>
 #include <net/fib_rules.h>
+#include <net/net_namespace.h>
 #include <net/dn.h>
 #include <net/dn_dev.h>
 #include <net/dn_nsp.h>
@@ -1811,7 +1812,7 @@ void __init dn_route_init(void)
 
         dn_dst_ops.gc_thresh = (dn_rt_hash_mask + 1);
 
-	proc_net_fops_create("decnet_cache", S_IRUGO, &dn_rt_cache_seq_fops);
+	proc_net_fops_create(init_net(), "decnet_cache", S_IRUGO, &dn_rt_cache_seq_fops);
 }
 
 void __exit dn_route_cleanup(void)
@@ -1819,6 +1820,6 @@ void __exit dn_route_cleanup(void)
 	del_timer(&dn_route_timer);
 	dn_run_flush(0);
 
-	proc_net_remove("decnet_cache");
+	proc_net_remove(init_net(), "decnet_cache");
 }
 
diff --git a/net/ieee80211/ieee80211_module.c b/net/ieee80211/ieee80211_module.c
index b1c6d1f..23539f6 100644
--- a/net/ieee80211/ieee80211_module.c
+++ b/net/ieee80211/ieee80211_module.c
@@ -263,7 +263,7 @@ static int __init ieee80211_init(void)
 	struct proc_dir_entry *e;
 
 	ieee80211_debug_level = debug;
-	ieee80211_proc = proc_mkdir(DRV_NAME, proc_net);
+	ieee80211_proc = proc_mkdir(DRV_NAME, per_net(proc_net, init_net()));
 	if (ieee80211_proc == NULL) {
 		IEEE80211_ERROR("Unable to create " DRV_NAME
 				" proc directory\n");
@@ -272,7 +272,7 @@ static int __init ieee80211_init(void)
 	e = create_proc_entry("debug_level", S_IFREG | S_IRUGO | S_IWUSR,
 			      ieee80211_proc);
 	if (!e) {
-		remove_proc_entry(DRV_NAME, proc_net);
+		remove_proc_entry(DRV_NAME, per_net(proc_net, init_net()));
 		ieee80211_proc = NULL;
 		return -EIO;
 	}
@@ -292,7 +292,7 @@ static void __exit ieee80211_exit(void)
 #ifdef CONFIG_IEEE80211_DEBUG
 	if (ieee80211_proc) {
 		remove_proc_entry("debug_level", ieee80211_proc);
-		remove_proc_entry(DRV_NAME, proc_net);
+		remove_proc_entry(DRV_NAME, per_net(proc_net, init_net()));
 		ieee80211_proc = NULL;
 	}
 #endif				/* CONFIG_IEEE80211_DEBUG */
diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c
index 3981e8b..e3b89a7 100644
--- a/net/ipv4/arp.c
+++ b/net/ipv4/arp.c
@@ -110,6 +110,7 @@
 #include <net/protocol.h>
 #include <net/tcp.h>
 #include <net/sock.h>
+#include <net/net_namespace.h>
 #include <net/arp.h>
 #if defined(CONFIG_AX25) || defined(CONFIG_AX25_MODULE)
 #include <net/ax25.h>
@@ -1400,7 +1401,7 @@ static struct file_operations arp_seq_fops = {
 
 static int __init arp_proc_init(void)
 {
-	if (!proc_net_fops_create("arp", S_IRUGO, &arp_seq_fops))
+	if (!proc_net_fops_create(init_net(), "arp", S_IRUGO, &arp_seq_fops))
 		return -ENOMEM;
 	return 0;
 }
diff --git a/net/ipv4/fib_hash.c b/net/ipv4/fib_hash.c
index 648f47c..42ea992 100644
--- a/net/ipv4/fib_hash.c
+++ b/net/ipv4/fib_hash.c
@@ -41,6 +41,7 @@
 #include <net/route.h>
 #include <net/tcp.h>
 #include <net/sock.h>
+#include <net/net_namespace.h>
 #include <net/ip_fib.h>
 
 #include "fib_lookup.h"
@@ -1067,13 +1068,13 @@ static struct file_operations fib_seq_fops = {
 
 int __init fib_proc_init(void)
 {
-	if (!proc_net_fops_create("route", S_IRUGO, &fib_seq_fops))
+	if (!proc_net_fops_create(init_net(), "route", S_IRUGO, &fib_seq_fops))
 		return -ENOMEM;
 	return 0;
 }
 
 void __init fib_proc_exit(void)
 {
-	proc_net_remove("route");
+	proc_net_remove(init_net(), "route");
 }
 #endif /* CONFIG_PROC_FS */
diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index 13307c0..94598b3 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -79,6 +79,7 @@
 #include <net/route.h>
 #include <net/tcp.h>
 #include <net/sock.h>
+#include <net/net_namespace.h>
 #include <net/ip_fib.h>
 #include "fib_lookup.h"
 
@@ -2494,30 +2495,30 @@ static struct file_operations fib_route_fops = {
 
 int __init fib_proc_init(void)
 {
-	if (!proc_net_fops_create("fib_trie", S_IRUGO, &fib_trie_fops))
+	if (!proc_net_fops_create(init_net(), "fib_trie", S_IRUGO, &fib_trie_fops))
 		goto out1;
 
-	if (!proc_net_fops_create("fib_triestat", S_IRUGO, &fib_triestat_fops))
+	if (!proc_net_fops_create(init_net(), "fib_triestat", S_IRUGO, &fib_triestat_fops))
 		goto out2;
 
-	if (!proc_net_fops_create("route", S_IRUGO, &fib_route_fops))
+	if (!proc_net_fops_create(init_net(), "route", S_IRUGO, &fib_route_fops))
 		goto out3;
 
 	return 0;
 
 out3:
-	proc_net_remove("fib_triestat");
+	proc_net_remove(init_net(), "fib_triestat");
 out2:
-	proc_net_remove("fib_trie");
+	proc_net_remove(init_net(), "fib_trie");
 out1:
 	return -ENOMEM;
 }
 
 void __init fib_proc_exit(void)
 {
-	proc_net_remove("fib_trie");
-	proc_net_remove("fib_triestat");
-	proc_net_remove("route");
+	proc_net_remove(init_net(), "fib_trie");
+	proc_net_remove(init_net(), "fib_triestat");
+	proc_net_remove(init_net(), "route");
 }
 
 #endif /* CONFIG_PROC_FS */
diff --git a/net/ipv4/igmp.c b/net/ipv4/igmp.c
index 0017ccb..92624cc 100644
--- a/net/ipv4/igmp.c
+++ b/net/ipv4/igmp.c
@@ -97,6 +97,7 @@
 #include <net/route.h>
 #include <net/sock.h>
 #include <net/checksum.h>
+#include <net/net_namespace.h>
 #include <linux/netfilter_ipv4.h>
 #ifdef CONFIG_IP_MROUTE
 #include <linux/mroute.h>
@@ -2585,8 +2586,8 @@ static struct file_operations igmp_mcf_seq_fops = {
 
 int __init igmp_mc_proc_init(void)
 {
-	proc_net_fops_create("igmp", S_IRUGO, &igmp_mc_seq_fops);
-	proc_net_fops_create("mcfilter", S_IRUGO, &igmp_mcf_seq_fops);
+	proc_net_fops_create(init_net(), "igmp", S_IRUGO, &igmp_mc_seq_fops);
+	proc_net_fops_create(init_net(), "mcfilter", S_IRUGO, &igmp_mcf_seq_fops);
 	return 0;
 }
 #endif
diff --git a/net/ipv4/ipconfig.c b/net/ipv4/ipconfig.c
index afa60b9..8b649c5 100644
--- a/net/ipv4/ipconfig.c
+++ b/net/ipv4/ipconfig.c
@@ -59,6 +59,7 @@
 #include <net/ip.h>
 #include <net/ipconfig.h>
 #include <net/route.h>
+#include <net/net_namespace.h>
 
 #include <asm/uaccess.h>
 #include <net/checksum.h>
@@ -1252,7 +1253,7 @@ static int __init ip_auto_config(void)
 	__be32 addr;
 
 #ifdef CONFIG_PROC_FS
-	proc_net_fops_create("pnp", S_IRUGO, &pnp_seq_fops);
+	proc_net_fops_create(init_net(), "pnp", S_IRUGO, &pnp_seq_fops);
 #endif /* CONFIG_PROC_FS */
 
 	if (!ic_enable)
diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index ecb5422..af50394 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -63,6 +63,7 @@
 #include <linux/netfilter_ipv4.h>
 #include <net/ipip.h>
 #include <net/checksum.h>
+#include <net/net_namespace.h>
 
 #if defined(CONFIG_IP_PIMSM_V1) || defined(CONFIG_IP_PIMSM_V2)
 #define CONFIG_IP_PIMSM	1
@@ -1906,7 +1907,7 @@ void __init ip_mr_init(void)
 	ipmr_expire_timer.function=ipmr_expire_process;
 	register_netdevice_notifier(&ip_mr_notifier);
 #ifdef CONFIG_PROC_FS	
-	proc_net_fops_create("ip_mr_vif", 0, &ipmr_vif_fops);
-	proc_net_fops_create("ip_mr_cache", 0, &ipmr_mfc_fops);
+	proc_net_fops_create(init_net(),"ip_mr_vif", 0, &ipmr_vif_fops);
+	proc_net_fops_create(init_net(),"ip_mr_cache", 0, &ipmr_mfc_fops);
 #endif	
 }
diff --git a/net/ipv4/ipvs/ip_vs_app.c b/net/ipv4/ipvs/ip_vs_app.c
index 6c40899..4f44452 100644
--- a/net/ipv4/ipvs/ip_vs_app.c
+++ b/net/ipv4/ipvs/ip_vs_app.c
@@ -32,6 +32,7 @@
 #include <linux/proc_fs.h>
 #include <linux/seq_file.h>
 #include <linux/mutex.h>
+#include <net/net_namespace.h>
 
 #include <net/ip_vs.h>
 
@@ -618,12 +619,12 @@ int ip_vs_skb_replace(struct sk_buff *skb, gfp_t pri,
 int ip_vs_app_init(void)
 {
 	/* we will replace it with proc_net_ipvs_create() soon */
-	proc_net_fops_create("ip_vs_app", 0, &ip_vs_app_fops);
+	proc_net_fops_create(init_net(), "ip_vs_app", 0, &ip_vs_app_fops);
 	return 0;
 }
 
 
 void ip_vs_app_cleanup(void)
 {
-	proc_net_remove("ip_vs_app");
+	proc_net_remove(init_net(), "ip_vs_app");
 }
diff --git a/net/ipv4/ipvs/ip_vs_conn.c b/net/ipv4/ipvs/ip_vs_conn.c
index 8086787..0764e0f 100644
--- a/net/ipv4/ipvs/ip_vs_conn.c
+++ b/net/ipv4/ipvs/ip_vs_conn.c
@@ -34,6 +34,7 @@
 #include <linux/seq_file.h>
 #include <linux/jhash.h>
 #include <linux/random.h>
+#include <net/net_namespace.h>
 
 #include <net/ip_vs.h>
 
@@ -923,7 +924,7 @@ int ip_vs_conn_init(void)
 		rwlock_init(&__ip_vs_conntbl_lock_array[idx].l);
 	}
 
-	proc_net_fops_create("ip_vs_conn", 0, &ip_vs_conn_fops);
+	proc_net_fops_create(init_net(), "ip_vs_conn", 0, &ip_vs_conn_fops);
 
 	/* calculate the random value for connection hash */
 	get_random_bytes(&ip_vs_conn_rnd, sizeof(ip_vs_conn_rnd));
@@ -939,6 +940,6 @@ void ip_vs_conn_cleanup(void)
 
 	/* Release the empty cache */
 	kmem_cache_destroy(ip_vs_conn_cachep);
-	proc_net_remove("ip_vs_conn");
+	proc_net_remove(init_net(), "ip_vs_conn");
 	vfree(ip_vs_conn_tab);
 }
diff --git a/net/ipv4/ipvs/ip_vs_ctl.c b/net/ipv4/ipvs/ip_vs_ctl.c
index c4e4237..d4bf160 100644
--- a/net/ipv4/ipvs/ip_vs_ctl.c
+++ b/net/ipv4/ipvs/ip_vs_ctl.c
@@ -39,6 +39,7 @@
 #include <net/ip.h>
 #include <net/route.h>
 #include <net/sock.h>
+#include <net/net_namespace.h>
 
 #include <asm/uaccess.h>
 
@@ -2356,8 +2357,8 @@ int ip_vs_control_init(void)
 		return ret;
 	}
 
-	proc_net_fops_create("ip_vs", 0, &ip_vs_info_fops);
-	proc_net_fops_create("ip_vs_stats",0, &ip_vs_stats_fops);
+	proc_net_fops_create(init_net(), "ip_vs", 0, &ip_vs_info_fops);
+	proc_net_fops_create(init_net(), "ip_vs_stats",0, &ip_vs_stats_fops);
 
 	sysctl_header = register_sysctl_table(vs_root_table);
 
@@ -2389,8 +2390,8 @@ void ip_vs_control_cleanup(void)
 	cancel_rearming_delayed_work(&defense_work);
 	ip_vs_kill_estimator(&ip_vs_stats);
 	unregister_sysctl_table(sysctl_header);
-	proc_net_remove("ip_vs_stats");
-	proc_net_remove("ip_vs");
+	proc_net_remove(init_net(), "ip_vs_stats");
+	proc_net_remove(init_net(), "ip_vs");
 	nf_unregister_sockopt(&ip_vs_sockopts);
 	LeaveFunction(2);
 }
diff --git a/net/ipv4/ipvs/ip_vs_lblcr.c b/net/ipv4/ipvs/ip_vs_lblcr.c
index 22004f8..f8491e7 100644
--- a/net/ipv4/ipvs/ip_vs_lblcr.c
+++ b/net/ipv4/ipvs/ip_vs_lblcr.c
@@ -843,7 +843,7 @@ static int __init ip_vs_lblcr_init(void)
 	INIT_LIST_HEAD(&ip_vs_lblcr_scheduler.n_list);
 	sysctl_header = register_sysctl_table(lblcr_root_table);
 #ifdef CONFIG_IP_VS_LBLCR_DEBUG
-	proc_net_create("ip_vs_lblcr", 0, ip_vs_lblcr_getinfo);
+	proc_net_create(init_net(), "ip_vs_lblcr", 0, ip_vs_lblcr_getinfo);
 #endif
 	return register_ip_vs_scheduler(&ip_vs_lblcr_scheduler);
 }
@@ -852,7 +852,7 @@ static int __init ip_vs_lblcr_init(void)
 static void __exit ip_vs_lblcr_cleanup(void)
 {
 #ifdef CONFIG_IP_VS_LBLCR_DEBUG
-	proc_net_remove("ip_vs_lblcr");
+	proc_net_remove(init_net(), "ip_vs_lblcr");
 #endif
 	unregister_sysctl_table(sysctl_header);
 	unregister_ip_vs_scheduler(&ip_vs_lblcr_scheduler);
diff --git a/net/ipv4/netfilter/ip_conntrack_standalone.c b/net/ipv4/netfilter/ip_conntrack_standalone.c
index 9d89469..d04cbb0 100644
--- a/net/ipv4/netfilter/ip_conntrack_standalone.c
+++ b/net/ipv4/netfilter/ip_conntrack_standalone.c
@@ -828,14 +828,14 @@ static int __init ip_conntrack_standalone_init(void)
 
 #ifdef CONFIG_PROC_FS
 	ret = -ENOMEM;
-	proc = proc_net_fops_create("ip_conntrack", 0440, &ct_file_ops);
+	proc = proc_net_fops_create(init_net(), "ip_conntrack", 0440, &ct_file_ops);
 	if (!proc) goto cleanup_init;
 
-	proc_exp = proc_net_fops_create("ip_conntrack_expect", 0440,
+	proc_exp = proc_net_fops_create(init_net(), "ip_conntrack_expect", 0440,
 					&exp_file_ops);
 	if (!proc_exp) goto cleanup_proc;
 
-	proc_stat = create_proc_entry("ip_conntrack", S_IRUGO, proc_net_stat);
+	proc_stat = create_proc_entry("ip_conntrack", S_IRUGO, per_net(proc_net_stat, init_net()));
 	if (!proc_stat)
 		goto cleanup_proc_exp;
 
@@ -864,11 +864,11 @@ static int __init ip_conntrack_standalone_init(void)
 #endif
  cleanup_proc_stat:
 #ifdef CONFIG_PROC_FS
-	remove_proc_entry("ip_conntrack", proc_net_stat);
+	remove_proc_entry("ip_conntrack", per_net(proc_net_stat, init_net()));
  cleanup_proc_exp:
-	proc_net_remove("ip_conntrack_expect");
+	proc_net_remove(init_net(), "ip_conntrack_expect");
  cleanup_proc:
-	proc_net_remove("ip_conntrack");
+	proc_net_remove(init_net(), "ip_conntrack");
  cleanup_init:
 #endif /* CONFIG_PROC_FS */
 	ip_conntrack_cleanup();
@@ -884,8 +884,8 @@ static void __exit ip_conntrack_standalone_fini(void)
 	nf_unregister_hooks(ip_conntrack_ops, ARRAY_SIZE(ip_conntrack_ops));
 #ifdef CONFIG_PROC_FS
 	remove_proc_entry("ip_conntrack", proc_net_stat);
-	proc_net_remove("ip_conntrack_expect");
-	proc_net_remove("ip_conntrack");
+	proc_net_remove(init_net(), "ip_conntrack_expect");
+	proc_net_remove(init_net(), "ip_conntrack");
 #endif /* CONFIG_PROC_FS */
 	ip_conntrack_cleanup();
 }
diff --git a/net/ipv4/netfilter/ip_queue.c b/net/ipv4/netfilter/ip_queue.c
index 3446d4a..aae660c 100644
--- a/net/ipv4/netfilter/ip_queue.c
+++ b/net/ipv4/netfilter/ip_queue.c
@@ -38,6 +38,7 @@
 #include <linux/mutex.h>
 #include <net/sock.h>
 #include <net/route.h>
+#include <net/net_namespace.h>
 
 #define IPQ_QMAX_DEFAULT 1024
 #define IPQ_PROC_FS_NAME "ip_queue"
@@ -684,7 +685,7 @@ static int __init ip_queue_init(void)
 		goto cleanup_netlink_notifier;
 	}
 
-	proc = proc_net_create(IPQ_PROC_FS_NAME, 0, ipq_get_info);
+	proc = proc_net_create(init_net(), IPQ_PROC_FS_NAME, 0, ipq_get_info);
 	if (proc)
 		proc->owner = THIS_MODULE;
 	else {
@@ -705,7 +706,7 @@ static int __init ip_queue_init(void)
 cleanup_sysctl:
 	unregister_sysctl_table(ipq_sysctl_header);
 	unregister_netdevice_notifier(&ipq_dev_notifier);
-	proc_net_remove(IPQ_PROC_FS_NAME);
+	proc_net_remove(init_net(), IPQ_PROC_FS_NAME);
 	
 cleanup_ipqnl:
 	sock_release(ipqnl->sk_socket);
@@ -725,7 +726,7 @@ static void __exit ip_queue_fini(void)
 
 	unregister_sysctl_table(ipq_sysctl_header);
 	unregister_netdevice_notifier(&ipq_dev_notifier);
-	proc_net_remove(IPQ_PROC_FS_NAME);
+	proc_net_remove(init_net(), IPQ_PROC_FS_NAME);
 
 	sock_release(ipqnl->sk_socket);
 	mutex_lock(&ipqnl_mutex);
diff --git a/net/ipv4/netfilter/ipt_CLUSTERIP.c b/net/ipv4/netfilter/ipt_CLUSTERIP.c
index b1c1116..779e2c6 100644
--- a/net/ipv4/netfilter/ipt_CLUSTERIP.c
+++ b/net/ipv4/netfilter/ipt_CLUSTERIP.c
@@ -22,6 +22,7 @@
 #include <linux/proc_fs.h>
 #include <linux/seq_file.h>
 
+#include <net/net_namespace.h>
 #include <net/checksum.h>
 
 #include <linux/netfilter_arp.h>
@@ -736,7 +737,7 @@ static int __init ipt_clusterip_init(void)
 		goto cleanup_target;
 
 #ifdef CONFIG_PROC_FS
-	clusterip_procdir = proc_mkdir("ipt_CLUSTERIP", proc_net);
+	clusterip_procdir = proc_mkdir("ipt_CLUSTERIP", per_net(proc_net, init_net()));
 	if (!clusterip_procdir) {
 		printk(KERN_ERR "CLUSTERIP: Unable to proc dir entry\n");
 		ret = -ENOMEM;
diff --git a/net/ipv4/netfilter/ipt_recent.c b/net/ipv4/netfilter/ipt_recent.c
index 4db0e73..4bfa2f9 100644
--- a/net/ipv4/netfilter/ipt_recent.c
+++ b/net/ipv4/netfilter/ipt_recent.c
@@ -23,6 +23,7 @@
 #include <linux/bitops.h>
 #include <linux/skbuff.h>
 #include <linux/inet.h>
+#include <net/net_namespace.h>
 
 #include <linux/netfilter_ipv4/ip_tables.h>
 #include <linux/netfilter_ipv4/ipt_recent.h>
@@ -483,7 +484,7 @@ static int __init ipt_recent_init(void)
 #ifdef CONFIG_PROC_FS
 	if (err)
 		return err;
-	proc_dir = proc_mkdir("ipt_recent", proc_net);
+	proc_dir = proc_mkdir("ipt_recent", per_net(proc_net, init_net()));
 	if (proc_dir == NULL) {
 		ipt_unregister_match(&recent_match);
 		err = -ENOMEM;
@@ -497,7 +498,7 @@ static void __exit ipt_recent_exit(void)
 	BUG_ON(!list_empty(&tables));
 	ipt_unregister_match(&recent_match);
 #ifdef CONFIG_PROC_FS
-	remove_proc_entry("ipt_recent", proc_net);
+	remove_proc_entry("ipt_recent", per_net(proc_net, init_net()));
 #endif
 }
 
diff --git a/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4_compat.c b/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4_compat.c
index 3b31bc6..ebdb56e 100644
--- a/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4_compat.c
+++ b/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4_compat.c
@@ -11,6 +11,7 @@
 #include <linux/proc_fs.h>
 #include <linux/seq_file.h>
 #include <linux/percpu.h>
+#include <net/net_namespace.h>
 
 #include <linux/netfilter.h>
 #include <net/netfilter/nf_conntrack_core.h>
@@ -378,16 +379,16 @@ int __init nf_conntrack_ipv4_compat_init(void)
 {
 	struct proc_dir_entry *proc, *proc_exp, *proc_stat;
 
-	proc = proc_net_fops_create("ip_conntrack", 0440, &ct_file_ops);
+	proc = proc_net_fops_create(init_net(), "ip_conntrack", 0440, &ct_file_ops);
 	if (!proc)
 		goto err1;
 
-	proc_exp = proc_net_fops_create("ip_conntrack_expect", 0440,
+	proc_exp = proc_net_fops_create(init_net(), "ip_conntrack_expect", 0440,
 					&ip_exp_file_ops);
 	if (!proc_exp)
 		goto err2;
 
-	proc_stat = create_proc_entry("ip_conntrack", S_IRUGO, proc_net_stat);
+	proc_stat = create_proc_entry("ip_conntrack", S_IRUGO, per_net(proc_net_stat, init_net()));
 	if (!proc_stat)
 		goto err3;
 
@@ -397,16 +398,16 @@ int __init nf_conntrack_ipv4_compat_init(void)
 	return 0;
 
 err3:
-	proc_net_remove("ip_conntrack_expect");
+	proc_net_remove(init_net(), "ip_conntrack_expect");
 err2:
-	proc_net_remove("ip_conntrack");
+	proc_net_remove(init_net(), "ip_conntrack");
 err1:
 	return -ENOMEM;
 }
 
 void __exit nf_conntrack_ipv4_compat_fini(void)
 {
-	remove_proc_entry("ip_conntrack", proc_net_stat);
-	proc_net_remove("ip_conntrack_expect");
-	proc_net_remove("ip_conntrack");
+	remove_proc_entry("ip_conntrack", per_net(proc_net_stat, init_net()));
+	proc_net_remove(init_net(), "ip_conntrack_expect");
+	proc_net_remove(init_net(), "ip_conntrack");
 }
diff --git a/net/ipv4/proc.c b/net/ipv4/proc.c
index cd873da..c9c5601 100644
--- a/net/ipv4/proc.c
+++ b/net/ipv4/proc.c
@@ -44,6 +44,7 @@
 #include <linux/seq_file.h>
 #include <net/sock.h>
 #include <net/raw.h>
+#include <net/net_namespace.h>
 
 static int fold_prot_inuse(struct proto *proto)
 {
@@ -372,20 +373,20 @@ int __init ip_misc_proc_init(void)
 {
 	int rc = 0;
 
-	if (!proc_net_fops_create("netstat", S_IRUGO, &netstat_seq_fops))
+	if (!proc_net_fops_create(init_net(), "netstat", S_IRUGO, &netstat_seq_fops))
 		goto out_netstat;
 
-	if (!proc_net_fops_create("snmp", S_IRUGO, &snmp_seq_fops))
+	if (!proc_net_fops_create(init_net(), "snmp", S_IRUGO, &snmp_seq_fops))
 		goto out_snmp;
 
-	if (!proc_net_fops_create("sockstat", S_IRUGO, &sockstat_seq_fops))
+	if (!proc_net_fops_create(init_net(), "sockstat", S_IRUGO, &sockstat_seq_fops))
 		goto out_sockstat;
 out:
 	return rc;
 out_sockstat:
-	proc_net_remove("snmp");
+	proc_net_remove(init_net(), "snmp");
 out_snmp:
-	proc_net_remove("netstat");
+	proc_net_remove(init_net(), "netstat");
 out_netstat:
 	rc = -ENOMEM;
 	goto out;
diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index a6c63bb..38fe668 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -73,6 +73,7 @@
 #include <net/inet_common.h>
 #include <net/checksum.h>
 #include <net/xfrm.h>
+#include <net/net_namespace.h>
 #include <linux/rtnetlink.h>
 #include <linux/proc_fs.h>
 #include <linux/seq_file.h>
@@ -926,13 +927,13 @@ static struct file_operations raw_seq_fops = {
 
 int __init raw_proc_init(void)
 {
-	if (!proc_net_fops_create("raw", S_IRUGO, &raw_seq_fops))
+	if (!proc_net_fops_create(init_net(), "raw", S_IRUGO, &raw_seq_fops))
 		return -ENOMEM;
 	return 0;
 }
 
 void __init raw_proc_exit(void)
 {
-	proc_net_remove("raw");
+	proc_net_remove(init_net(), "raw");
 }
 #endif /* CONFIG_PROC_FS */
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 2daa0dc..8be7506 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -105,6 +105,7 @@
 #include <net/xfrm.h>
 #include <net/ip_mp_alg.h>
 #include <net/netevent.h>
+#include <net/net_namespace.h>
 #ifdef CONFIG_SYSCTL
 #include <linux/sysctl.h>
 #endif
@@ -3178,15 +3179,15 @@ int __init ip_rt_init(void)
 #ifdef CONFIG_PROC_FS
 	{
 	struct proc_dir_entry *rtstat_pde = NULL; /* keep gcc happy */
-	if (!proc_net_fops_create("rt_cache", S_IRUGO, &rt_cache_seq_fops) ||
+	if (!proc_net_fops_create(init_net(), "rt_cache", S_IRUGO, &rt_cache_seq_fops) ||
 	    !(rtstat_pde = create_proc_entry("rt_cache", S_IRUGO, 
-			    		     proc_net_stat))) {
+			    		     per_net(proc_net_stat, init_net())))) {
 		return -ENOMEM;
 	}
 	rtstat_pde->proc_fops = &rt_cpu_seq_fops;
 	}
 #ifdef CONFIG_NET_CLS_ROUTE
-	create_proc_read_entry("rt_acct", 0, proc_net, ip_rt_acct_read, NULL);
+	create_proc_read_entry("rt_acct", 0, per_net(proc_net, init_net()), ip_rt_acct_read, NULL);
 #endif
 #endif
 #ifdef CONFIG_XFRM
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 12de90a..ee4306f 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -71,6 +71,7 @@
 #include <net/timewait_sock.h>
 #include <net/xfrm.h>
 #include <net/netdma.h>
+#include <net/net_namespace.h>
 
 #include <linux/inet.h>
 #include <linux/ipv6.h>
@@ -2252,7 +2253,7 @@ int tcp_proc_register(struct tcp_seq_afinfo *afinfo)
 	afinfo->seq_fops->llseek	= seq_lseek;
 	afinfo->seq_fops->release	= seq_release_private;
 
-	p = proc_net_fops_create(afinfo->name, S_IRUGO, afinfo->seq_fops);
+	p = proc_net_fops_create(init_net(), afinfo->name, S_IRUGO, afinfo->seq_fops);
 	if (p)
 		p->data = afinfo;
 	else
@@ -2264,7 +2265,7 @@ void tcp_proc_unregister(struct tcp_seq_afinfo *afinfo)
 {
 	if (!afinfo)
 		return;
-	proc_net_remove(afinfo->name);
+	proc_net_remove(init_net(), afinfo->name);
 	memset(afinfo->seq_fops, 0, sizeof(*afinfo->seq_fops));
 }
 
diff --git a/net/ipv4/tcp_probe.c b/net/ipv4/tcp_probe.c
index f230eee..e8a3d96 100644
--- a/net/ipv4/tcp_probe.c
+++ b/net/ipv4/tcp_probe.c
@@ -159,7 +159,7 @@ static __init int tcpprobe_init(void)
 	if (IS_ERR(tcpw.fifo))
 		return PTR_ERR(tcpw.fifo);
 
-	if (!proc_net_fops_create(procname, S_IRUSR, &tcpprobe_fops))
+	if (!proc_net_fops_create(init_net(), procname, S_IRUSR, &tcpprobe_fops))
 		goto err0;
 
 	ret = register_jprobe(&tcp_send_probe);
@@ -169,7 +169,7 @@ static __init int tcpprobe_init(void)
 	pr_info("TCP watch registered (port=%d)\n", port);
 	return 0;
  err1:
-	proc_net_remove(procname);
+	proc_net_remove(init_net(), procname);
  err0:
 	kfifo_free(tcpw.fifo);
 	return ret;
@@ -179,7 +179,7 @@ module_init(tcpprobe_init);
 static __exit void tcpprobe_exit(void)
 {
 	kfifo_free(tcpw.fifo);
-	proc_net_remove(procname);
+	proc_net_remove(init_net(), procname);
 	unregister_jprobe(&tcp_send_probe);
 
 }
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index cfff930..7527183 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -101,6 +101,7 @@
 #include <net/route.h>
 #include <net/checksum.h>
 #include <net/xfrm.h>
+#include <net/net_namespace.h>
 #include "udp_impl.h"
 
 /*
@@ -1643,7 +1644,7 @@ int udp_proc_register(struct udp_seq_afinfo *afinfo)
 	afinfo->seq_fops->llseek	= seq_lseek;
 	afinfo->seq_fops->release	= seq_release_private;
 
-	p = proc_net_fops_create(afinfo->name, S_IRUGO, afinfo->seq_fops);
+	p = proc_net_fops_create(init_net(), afinfo->name, S_IRUGO, afinfo->seq_fops);
 	if (p)
 		p->data = afinfo;
 	else
@@ -1655,7 +1656,7 @@ void udp_proc_unregister(struct udp_seq_afinfo *afinfo)
 {
 	if (!afinfo)
 		return;
-	proc_net_remove(afinfo->name);
+	proc_net_remove(init_net(), afinfo->name);
 	memset(afinfo->seq_fops, 0, sizeof(*afinfo->seq_fops));
 }
 
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 6aded83..52bd4dd 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -74,6 +74,7 @@
 #include <net/tcp.h>
 #include <net/ip.h>
 #include <net/netlink.h>
+#include <net/net_namespace.h>
 #include <linux/if_tunnel.h>
 #include <linux/rtnetlink.h>
 
@@ -2780,14 +2781,14 @@ static struct file_operations if6_fops = {
 
 int __init if6_proc_init(void)
 {
-	if (!proc_net_fops_create("if_inet6", S_IRUGO, &if6_fops))
+	if (!proc_net_fops_create(init_net(), "if_inet6", S_IRUGO, &if6_fops))
 		return -ENOMEM;
 	return 0;
 }
 
 void if6_proc_exit(void)
 {
-	proc_net_remove("if_inet6");
+	proc_net_remove(init_net(), "if_inet6");
 }
 #endif	/* CONFIG_PROC_FS */
 
@@ -4143,6 +4144,6 @@ void __exit addrconf_cleanup(void)
 	rtnl_unlock();
 
 #ifdef CONFIG_PROC_FS
-	proc_net_remove("if_inet6");
+	proc_net_remove(init_net(), "if_inet6");
 #endif
 }
diff --git a/net/ipv6/anycast.c b/net/ipv6/anycast.c
index a960476..c42bad9 100644
--- a/net/ipv6/anycast.c
+++ b/net/ipv6/anycast.c
@@ -33,6 +33,7 @@
 
 #include <net/sock.h>
 #include <net/snmp.h>
+#include <net/net_namespace.h>
 
 #include <net/ipv6.h>
 #include <net/protocol.h>
@@ -575,7 +576,7 @@ static struct file_operations ac6_seq_fops = {
 
 int __init ac6_proc_init(void)
 {
-	if (!proc_net_fops_create("anycast6", S_IRUGO, &ac6_seq_fops))
+	if (!proc_net_fops_create(init_net(), "anycast6", S_IRUGO, &ac6_seq_fops))
 		return -ENOMEM;
 
 	return 0;
@@ -583,7 +584,7 @@ int __init ac6_proc_init(void)
 
 void ac6_proc_exit(void)
 {
-	proc_net_remove("anycast6");
+	proc_net_remove(init_net(), "anycast6");
 }
 #endif
 
diff --git a/net/ipv6/ip6_flowlabel.c b/net/ipv6/ip6_flowlabel.c
index 624fae2..350aedb 100644
--- a/net/ipv6/ip6_flowlabel.c
+++ b/net/ipv6/ip6_flowlabel.c
@@ -22,6 +22,7 @@
 #include <linux/seq_file.h>
 
 #include <net/sock.h>
+#include <net/net_namespace.h>
 
 #include <net/ipv6.h>
 #include <net/ndisc.h>
@@ -690,7 +691,7 @@ static struct file_operations ip6fl_seq_fops = {
 void ip6_flowlabel_init(void)
 {
 #ifdef CONFIG_PROC_FS
-	proc_net_fops_create("ip6_flowlabel", S_IRUGO, &ip6fl_seq_fops);
+	proc_net_fops_create(init_net(), "ip6_flowlabel", S_IRUGO, &ip6fl_seq_fops);
 #endif
 }
 
@@ -698,6 +699,6 @@ void ip6_flowlabel_cleanup(void)
 {
 	del_timer(&ip6_fl_gc_timer);
 #ifdef CONFIG_PROC_FS
-	proc_net_remove("ip6_flowlabel");
+	proc_net_remove(init_net(), "ip6_flowlabel");
 #endif
 }
diff --git a/net/ipv6/mcast.c b/net/ipv6/mcast.c
index a1c231a..2759571 100644
--- a/net/ipv6/mcast.c
+++ b/net/ipv6/mcast.c
@@ -51,6 +51,7 @@
 
 #include <net/sock.h>
 #include <net/snmp.h>
+#include <net/net_namespace.h>
 
 #include <net/ipv6.h>
 #include <net/protocol.h>
@@ -2661,8 +2662,8 @@ int __init igmp6_init(struct net_proto_family *ops)
 	np->hop_limit = 1;
 
 #ifdef CONFIG_PROC_FS
-	proc_net_fops_create("igmp6", S_IRUGO, &igmp6_mc_seq_fops);
-	proc_net_fops_create("mcfilter6", S_IRUGO, &igmp6_mcf_seq_fops);
+	proc_net_fops_create(init_net(), "igmp6", S_IRUGO, &igmp6_mc_seq_fops);
+	proc_net_fops_create(init_net(), "mcfilter6", S_IRUGO, &igmp6_mcf_seq_fops);
 #endif
 
 	return 0;
@@ -2674,7 +2675,7 @@ void igmp6_cleanup(void)
 	igmp6_socket = NULL; /* for safety */
 
 #ifdef CONFIG_PROC_FS
-	proc_net_remove("mcfilter6");
-	proc_net_remove("igmp6");
+	proc_net_remove(init_net(), "mcfilter6");
+	proc_net_remove(init_net(), "igmp6");
 #endif
 }
diff --git a/net/ipv6/netfilter/ip6_queue.c b/net/ipv6/netfilter/ip6_queue.c
index e774be7..45b64a5 100644
--- a/net/ipv6/netfilter/ip6_queue.c
+++ b/net/ipv6/netfilter/ip6_queue.c
@@ -36,6 +36,7 @@
 #include <linux/sysctl.h>
 #include <linux/proc_fs.h>
 #include <linux/mutex.h>
+#include <net/net_namespace.h>
 #include <net/sock.h>
 #include <net/ipv6.h>
 #include <net/ip6_route.h>
@@ -674,7 +675,7 @@ static int __init ip6_queue_init(void)
 		goto cleanup_netlink_notifier;
 	}
 
-	proc = proc_net_create(IPQ_PROC_FS_NAME, 0, ipq_get_info);
+	proc = proc_net_create(init_net(), IPQ_PROC_FS_NAME, 0, ipq_get_info);
 	if (proc)
 		proc->owner = THIS_MODULE;
 	else {
@@ -695,7 +696,7 @@ static int __init ip6_queue_init(void)
 cleanup_sysctl:
 	unregister_sysctl_table(ipq_sysctl_header);
 	unregister_netdevice_notifier(&ipq_dev_notifier);
-	proc_net_remove(IPQ_PROC_FS_NAME);
+	proc_net_remove(init_net(), IPQ_PROC_FS_NAME);
 	
 cleanup_ipqnl:
 	sock_release(ipqnl->sk_socket);
@@ -715,7 +716,7 @@ static void __exit ip6_queue_fini(void)
 
 	unregister_sysctl_table(ipq_sysctl_header);
 	unregister_netdevice_notifier(&ipq_dev_notifier);
-	proc_net_remove(IPQ_PROC_FS_NAME);
+	proc_net_remove(init_net(), IPQ_PROC_FS_NAME);
 
 	sock_release(ipqnl->sk_socket);
 	mutex_lock(&ipqnl_mutex);
diff --git a/net/ipv6/proc.c b/net/ipv6/proc.c
index 35249d8..1827885 100644
--- a/net/ipv6/proc.c
+++ b/net/ipv6/proc.c
@@ -28,6 +28,7 @@
 #include <net/tcp.h>
 #include <net/transp_v6.h>
 #include <net/ipv6.h>
+#include <net/net_namespace.h>
 
 #ifdef CONFIG_PROC_FS
 static struct proc_dir_entry *proc_net_devsnmp6;
@@ -244,22 +245,22 @@ int __init ipv6_misc_proc_init(void)
 {
 	int rc = 0;
 
-	if (!proc_net_fops_create("snmp6", S_IRUGO, &snmp6_seq_fops))
+	if (!proc_net_fops_create(init_net(), "snmp6", S_IRUGO, &snmp6_seq_fops))
 		goto proc_snmp6_fail;
 
-	proc_net_devsnmp6 = proc_mkdir("dev_snmp6", proc_net);
+	proc_net_devsnmp6 = proc_mkdir("dev_snmp6", per_net(proc_net, init_net()));
 	if (!proc_net_devsnmp6)
 		goto proc_dev_snmp6_fail;
 
-	if (!proc_net_fops_create("sockstat6", S_IRUGO, &sockstat6_seq_fops))
+	if (!proc_net_fops_create(init_net(), "sockstat6", S_IRUGO, &sockstat6_seq_fops))
 		goto proc_sockstat6_fail;
 out:
 	return rc;
 
 proc_sockstat6_fail:
-	proc_net_remove("dev_snmp6");
+	proc_net_remove(init_net(), "dev_snmp6");
 proc_dev_snmp6_fail:
-	proc_net_remove("snmp6");
+	proc_net_remove(init_net(), "snmp6");
 proc_snmp6_fail:
 	rc = -ENOMEM;
 	goto out;
@@ -267,9 +268,9 @@ proc_snmp6_fail:
 
 void ipv6_misc_proc_exit(void)
 {
-	proc_net_remove("sockstat6");
-	proc_net_remove("dev_snmp6");
-	proc_net_remove("snmp6");
+	proc_net_remove(init_net(), "sockstat6");
+	proc_net_remove(init_net(), "dev_snmp6");
+	proc_net_remove(init_net(), "snmp6");
 }
 
 #else	/* CONFIG_PROC_FS */
diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
index 4ae1b19..2e1825c 100644
--- a/net/ipv6/raw.c
+++ b/net/ipv6/raw.c
@@ -50,6 +50,7 @@
 #include <net/udp.h>
 #include <net/inet_common.h>
 #include <net/tcp_states.h>
+#include <net/net_namespace.h>
 #ifdef CONFIG_IPV6_MIP6
 #include <net/mip6.h>
 #endif
@@ -1274,13 +1275,13 @@ static struct file_operations raw6_seq_fops = {
 
 int __init raw6_proc_init(void)
 {
-	if (!proc_net_fops_create("raw6", S_IRUGO, &raw6_seq_fops))
+	if (!proc_net_fops_create(init_net(), "raw6", S_IRUGO, &raw6_seq_fops))
 		return -ENOMEM;
 	return 0;
 }
 
 void raw6_proc_exit(void)
 {
-	proc_net_remove("raw6");
+	proc_net_remove(init_net(), "raw6");
 }
 #endif	/* CONFIG_PROC_FS */
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 8c3d568..8c9fef9 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -56,6 +56,7 @@
 #include <net/xfrm.h>
 #include <net/netevent.h>
 #include <net/netlink.h>
+#include <net/net_namespace.h>
 
 #include <asm/uaccess.h>
 
@@ -2458,11 +2459,11 @@ void __init ip6_route_init(void)
 				  SLAB_HWCACHE_ALIGN|SLAB_PANIC, NULL, NULL);
 	fib6_init();
 #ifdef 	CONFIG_PROC_FS
-	p = proc_net_create("ipv6_route", 0, rt6_proc_info);
+	p = proc_net_create(init_net(), "ipv6_route", 0, rt6_proc_info);
 	if (p)
 		p->owner = THIS_MODULE;
 
-	proc_net_fops_create("rt6_stats", S_IRUGO, &rt6_stats_seq_fops);
+	proc_net_fops_create(init_net(), "rt6_stats", S_IRUGO, &rt6_stats_seq_fops);
 #endif
 #ifdef CONFIG_XFRM
 	xfrm6_init();
@@ -2478,8 +2479,8 @@ void ip6_route_cleanup(void)
 	fib6_rules_cleanup();
 #endif
 #ifdef CONFIG_PROC_FS
-	proc_net_remove("ipv6_route");
-	proc_net_remove("rt6_stats");
+	proc_net_remove(init_net(), "ipv6_route");
+	proc_net_remove(init_net(), "rt6_stats");
 #endif
 #ifdef CONFIG_XFRM
 	xfrm6_fini();
diff --git a/net/ipx/ipx_proc.c b/net/ipx/ipx_proc.c
index b7463df..bda8775 100644
--- a/net/ipx/ipx_proc.c
+++ b/net/ipx/ipx_proc.c
@@ -9,6 +9,7 @@
 #include <linux/proc_fs.h>
 #include <linux/spinlock.h>
 #include <linux/seq_file.h>
+#include <net/net_namespace.h>
 #include <net/tcp_states.h>
 #include <net/ipx.h>
 
@@ -353,7 +354,7 @@ int __init ipx_proc_init(void)
 	struct proc_dir_entry *p;
 	int rc = -ENOMEM;
        
-	ipx_proc_dir = proc_mkdir("ipx", proc_net);
+	ipx_proc_dir = proc_mkdir("ipx", per_net(proc_net, init_net()));
 
 	if (!ipx_proc_dir)
 		goto out;
@@ -381,7 +382,7 @@ out_socket:
 out_route:
 	remove_proc_entry("interface", ipx_proc_dir);
 out_interface:
-	remove_proc_entry("ipx", proc_net);
+	remove_proc_entry("ipx", per_net(proc_net, init_net()));
 	goto out;
 }
 
@@ -390,7 +391,7 @@ void __exit ipx_proc_exit(void)
 	remove_proc_entry("interface", ipx_proc_dir);
 	remove_proc_entry("route", ipx_proc_dir);
 	remove_proc_entry("socket", ipx_proc_dir);
-	remove_proc_entry("ipx", proc_net);
+	remove_proc_entry("ipx", per_net(proc_net, init_net()));
 }
 
 #else /* CONFIG_PROC_FS */
diff --git a/net/irda/irproc.c b/net/irda/irproc.c
index 88b9c43..0af0f55 100644
--- a/net/irda/irproc.c
+++ b/net/irda/irproc.c
@@ -28,6 +28,7 @@
 #include <linux/seq_file.h>
 #include <linux/module.h>
 #include <linux/init.h>
+#include <net/net_namespace.h>
 
 #include <net/irda/irda.h>
 #include <net/irda/irlap.h>
@@ -66,7 +67,7 @@ void __init irda_proc_register(void)
 	int i;
 	struct proc_dir_entry *d;
 
-	proc_irda = proc_mkdir("irda", proc_net);
+	proc_irda = proc_mkdir("irda", per_net(proc_net, init_net()));
 	if (proc_irda == NULL)
 		return;
 	proc_irda->owner = THIS_MODULE;
@@ -92,7 +93,7 @@ void __exit irda_proc_unregister(void)
                 for (i=0; i<ARRAY_SIZE(irda_dirs); i++)
                         remove_proc_entry(irda_dirs[i].name, proc_irda);
 
-                remove_proc_entry("irda", proc_net);
+                remove_proc_entry("irda", per_net(proc_net, init_net()));
                 proc_irda = NULL;
         }
 }
diff --git a/net/key/af_key.c b/net/key/af_key.c
index 5dd5094..c79f9c4 100644
--- a/net/key/af_key.c
+++ b/net/key/af_key.c
@@ -28,6 +28,7 @@
 #include <linux/init.h>
 #include <net/xfrm.h>
 #include <linux/audit.h>
+#include <net/net_namespace.h>
 
 #include <net/sock.h>
 
@@ -3292,7 +3293,7 @@ static struct xfrm_mgr pfkeyv2_mgr =
 static void __exit ipsec_pfkey_exit(void)
 {
 	xfrm_unregister_km(&pfkeyv2_mgr);
-	remove_proc_entry("net/pfkey", NULL);
+	remove_proc_entry("pfkey", per_net(proc_net, init_net()));
 	sock_unregister(PF_KEY);
 	proto_unregister(&key_proto);
 }
@@ -3309,7 +3310,7 @@ static int __init ipsec_pfkey_init(void)
 		goto out_unregister_key_proto;
 #ifdef CONFIG_PROC_FS
 	err = -ENOMEM;
-	if (create_proc_read_entry("net/pfkey", 0, NULL, pfkey_read_proc, NULL) == NULL)
+	if (create_proc_read_entry("pfkey", 0, per_net(proc_net, init_net()), pfkey_read_proc, NULL) == NULL)
 		goto out_sock_unregister;
 #endif
 	err = xfrm_register_km(&pfkeyv2_mgr);
diff --git a/net/llc/llc_proc.c b/net/llc/llc_proc.c
index 19308fe..4d0a804 100644
--- a/net/llc/llc_proc.c
+++ b/net/llc/llc_proc.c
@@ -18,6 +18,7 @@
 #include <linux/errno.h>
 #include <linux/seq_file.h>
 #include <net/sock.h>
+#include <net/net_namespace.h>
 #include <net/llc.h>
 #include <net/llc_c_ac.h>
 #include <net/llc_c_ev.h>
@@ -231,7 +232,7 @@ int __init llc_proc_init(void)
 	int rc = -ENOMEM;
 	struct proc_dir_entry *p;
 
-	llc_proc_dir = proc_mkdir("llc", proc_net);
+	llc_proc_dir = proc_mkdir("llc", per_net(proc_net, init_net()));
 	if (!llc_proc_dir)
 		goto out;
 	llc_proc_dir->owner = THIS_MODULE;
@@ -254,7 +255,7 @@ out:
 out_core:
 	remove_proc_entry("socket", llc_proc_dir);
 out_socket:
-	remove_proc_entry("llc", proc_net);
+	remove_proc_entry("llc", per_net(proc_net, init_net()));
 	goto out;
 }
 
@@ -262,5 +263,5 @@ void llc_proc_exit(void)
 {
 	remove_proc_entry("socket", llc_proc_dir);
 	remove_proc_entry("core", llc_proc_dir);
-	remove_proc_entry("llc", proc_net);
+	remove_proc_entry("llc", per_net(proc_net, init_net()));
 }
diff --git a/net/netfilter/core.c b/net/netfilter/core.c
index 291b8c6..cafa00c 100644
--- a/net/netfilter/core.c
+++ b/net/netfilter/core.c
@@ -23,6 +23,7 @@
 #include <linux/inetdevice.h>
 #include <linux/proc_fs.h>
 #include <net/sock.h>
+#include <net/net_namespace.h>
 
 #include "nf_internals.h"
 
@@ -269,7 +270,7 @@ void __init netfilter_init(void)
 	}
 
 #ifdef CONFIG_PROC_FS
-	proc_net_netfilter = proc_mkdir("netfilter", proc_net);
+	proc_net_netfilter = proc_mkdir("netfilter", per_net(proc_net, init_net()));
 	if (!proc_net_netfilter)
 		panic("cannot create netfilter proc entry");
 #endif
diff --git a/net/netfilter/nf_conntrack_standalone.c b/net/netfilter/nf_conntrack_standalone.c
index 2587b49..314dc2c 100644
--- a/net/netfilter/nf_conntrack_standalone.c
+++ b/net/netfilter/nf_conntrack_standalone.c
@@ -25,6 +25,7 @@
 #include <linux/seq_file.h>
 #include <linux/percpu.h>
 #include <linux/netdevice.h>
+#include <net/net_namespace.h>
 #ifdef CONFIG_SYSCTL
 #include <linux/sysctl.h>
 #endif
@@ -430,14 +431,14 @@ static int __init nf_conntrack_standalone_init(void)
 		return ret;
 
 #ifdef CONFIG_PROC_FS
-	proc = proc_net_fops_create("nf_conntrack", 0440, &ct_file_ops);
+	proc = proc_net_fops_create(init_net(), "nf_conntrack", 0440, &ct_file_ops);
 	if (!proc) goto cleanup_init;
 
-	proc_exp = proc_net_fops_create("nf_conntrack_expect", 0440,
+	proc_exp = proc_net_fops_create(init_net(), "nf_conntrack_expect", 0440,
 					&exp_file_ops);
 	if (!proc_exp) goto cleanup_proc;
 
-	proc_stat = create_proc_entry("nf_conntrack", S_IRUGO, proc_net_stat);
+	proc_stat = create_proc_entry("nf_conntrack", S_IRUGO, per_net(proc_net_stat, init_net()));
 	if (!proc_stat)
 		goto cleanup_proc_exp;
 
@@ -458,11 +459,11 @@ static int __init nf_conntrack_standalone_init(void)
  cleanup_proc_stat:
 #endif
 #ifdef CONFIG_PROC_FS
-	remove_proc_entry("nf_conntrack", proc_net_stat);
+	remove_proc_entry("nf_conntrack", per_net(proc_net_stat, init_net()));
  cleanup_proc_exp:
-	proc_net_remove("nf_conntrack_expect");
+	proc_net_remove(init_net(), "nf_conntrack_expect");
  cleanup_proc:
-	proc_net_remove("nf_conntrack");
+	proc_net_remove(init_net(), "nf_conntrack");
  cleanup_init:
 #endif /* CNFIG_PROC_FS */
 	nf_conntrack_cleanup();
@@ -475,9 +476,9 @@ static void __exit nf_conntrack_standalone_fini(void)
  	unregister_sysctl_table(nf_ct_sysctl_header);
 #endif
 #ifdef CONFIG_PROC_FS
-	remove_proc_entry("nf_conntrack", proc_net_stat);
-	proc_net_remove("nf_conntrack_expect");
-	proc_net_remove("nf_conntrack");
+	remove_proc_entry("nf_conntrack", per_net(proc_net_stat, init_net()));
+	proc_net_remove(init_net(), "nf_conntrack_expect");
+	proc_net_remove(init_net(), "nf_conntrack");
 #endif /* CNFIG_PROC_FS */
 	nf_conntrack_cleanup();
 }
diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
index 8996584..9fb3491 100644
--- a/net/netfilter/x_tables.c
+++ b/net/netfilter/x_tables.c
@@ -22,6 +22,7 @@
 #include <linux/vmalloc.h>
 #include <linux/mutex.h>
 #include <linux/mm.h>
+#include <net/net_namespace.h>
 
 #include <linux/netfilter/x_tables.h>
 #include <linux/netfilter_arp.h>
@@ -800,7 +801,7 @@ int xt_proto_init(int af)
 #ifdef CONFIG_PROC_FS
 	strlcpy(buf, xt_proto_prefix[af], sizeof(buf));
 	strlcat(buf, FORMAT_TABLES, sizeof(buf));
-	proc = proc_net_fops_create(buf, 0440, &xt_file_ops);
+	proc = proc_net_fops_create(init_net(), buf, 0440, &xt_file_ops);
 	if (!proc)
 		goto out;
 	proc->data = (void *) ((unsigned long) af | (TABLE << 16));
@@ -808,14 +809,14 @@ int xt_proto_init(int af)
 
 	strlcpy(buf, xt_proto_prefix[af], sizeof(buf));
 	strlcat(buf, FORMAT_MATCHES, sizeof(buf));
-	proc = proc_net_fops_create(buf, 0440, &xt_file_ops);
+	proc = proc_net_fops_create(init_net(), buf, 0440, &xt_file_ops);
 	if (!proc)
 		goto out_remove_tables;
 	proc->data = (void *) ((unsigned long) af | (MATCH << 16));
 
 	strlcpy(buf, xt_proto_prefix[af], sizeof(buf));
 	strlcat(buf, FORMAT_TARGETS, sizeof(buf));
-	proc = proc_net_fops_create(buf, 0440, &xt_file_ops);
+	proc = proc_net_fops_create(init_net(), buf, 0440, &xt_file_ops);
 	if (!proc)
 		goto out_remove_matches;
 	proc->data = (void *) ((unsigned long) af | (TARGET << 16));
@@ -827,12 +828,12 @@ int xt_proto_init(int af)
 out_remove_matches:
 	strlcpy(buf, xt_proto_prefix[af], sizeof(buf));
 	strlcat(buf, FORMAT_MATCHES, sizeof(buf));
-	proc_net_remove(buf);
+	proc_net_remove(init_net(), buf);
 
 out_remove_tables:
 	strlcpy(buf, xt_proto_prefix[af], sizeof(buf));
 	strlcat(buf, FORMAT_TABLES, sizeof(buf));
-	proc_net_remove(buf);
+	proc_net_remove(init_net(), buf);
 out:
 	return -1;
 #endif
@@ -846,15 +847,15 @@ void xt_proto_fini(int af)
 
 	strlcpy(buf, xt_proto_prefix[af], sizeof(buf));
 	strlcat(buf, FORMAT_TABLES, sizeof(buf));
-	proc_net_remove(buf);
+	proc_net_remove(init_net(), buf);
 
 	strlcpy(buf, xt_proto_prefix[af], sizeof(buf));
 	strlcat(buf, FORMAT_TARGETS, sizeof(buf));
-	proc_net_remove(buf);
+	proc_net_remove(init_net(), buf);
 
 	strlcpy(buf, xt_proto_prefix[af], sizeof(buf));
 	strlcat(buf, FORMAT_MATCHES, sizeof(buf));
-	proc_net_remove(buf);
+	proc_net_remove(init_net(), buf);
 #endif /*CONFIG_PROC_FS*/
 }
 EXPORT_SYMBOL_GPL(xt_proto_fini);
diff --git a/net/netfilter/xt_hashlimit.c b/net/netfilter/xt_hashlimit.c
index f28bf69..21c07df 100644
--- a/net/netfilter/xt_hashlimit.c
+++ b/net/netfilter/xt_hashlimit.c
@@ -21,6 +21,7 @@
 #include <linux/in.h>
 #include <linux/ip.h>
 #include <linux/ipv6.h>
+#include <net/net_namespace.h>
 
 #include <linux/netfilter/x_tables.h>
 #include <linux/netfilter_ipv4/ip_tables.h>
@@ -737,13 +738,13 @@ static int __init xt_hashlimit_init(void)
 		printk(KERN_ERR "xt_hashlimit: unable to create slab cache\n");
 		goto err2;
 	}
-	hashlimit_procdir4 = proc_mkdir("ipt_hashlimit", proc_net);
+	hashlimit_procdir4 = proc_mkdir("ipt_hashlimit", per_net(proc_net, init_net()));
 	if (!hashlimit_procdir4) {
 		printk(KERN_ERR "xt_hashlimit: unable to create proc dir "
 				"entry\n");
 		goto err3;
 	}
-	hashlimit_procdir6 = proc_mkdir("ip6t_hashlimit", proc_net);
+	hashlimit_procdir6 = proc_mkdir("ip6t_hashlimit", per_net(proc_net, init_net()));
 	if (!hashlimit_procdir6) {
 		printk(KERN_ERR "xt_hashlimit: unable to create proc dir "
 				"entry\n");
@@ -751,7 +752,7 @@ static int __init xt_hashlimit_init(void)
 	}
 	return 0;
 err4:
-	remove_proc_entry("ipt_hashlimit", proc_net);
+	remove_proc_entry("ipt_hashlimit", per_net(proc_net, init_net()));
 err3:
 	kmem_cache_destroy(hashlimit_cachep);
 err2:
@@ -763,8 +764,8 @@ err1:
 
 static void __exit xt_hashlimit_fini(void)
 {
-	remove_proc_entry("ipt_hashlimit", proc_net);
-	remove_proc_entry("ip6t_hashlimit", proc_net);
+	remove_proc_entry("ipt_hashlimit", per_net(proc_net, init_net()));
+	remove_proc_entry("ip6t_hashlimit", per_net(proc_net, init_net()));
 	kmem_cache_destroy(hashlimit_cachep);
 	xt_unregister_matches(xt_hashlimit, ARRAY_SIZE(xt_hashlimit));
 }
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 383dd4e..3c00f48 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -60,6 +60,7 @@
 #include <net/sock.h>
 #include <net/scm.h>
 #include <net/netlink.h>
+#include <net/net_namespace.h>
 
 #define NLGRPSZ(x)	(ALIGN(x, sizeof(unsigned long) * 8) / 8)
 
@@ -1806,7 +1807,7 @@ static int __init netlink_proto_init(void)
 
 	sock_register(&netlink_family_ops);
 #ifdef CONFIG_PROC_FS
-	proc_net_fops_create("netlink", 0, &netlink_seq_fops);
+	proc_net_fops_create(init_net(), "netlink", 0, &netlink_seq_fops);
 #endif
 	/* The netlink device handler may be needed early. */ 
 	rtnetlink_init();
diff --git a/net/netrom/af_netrom.c b/net/netrom/af_netrom.c
index 43bbe2c..601d58c 100644
--- a/net/netrom/af_netrom.c
+++ b/net/netrom/af_netrom.c
@@ -41,6 +41,7 @@
 #include <net/ip.h>
 #include <net/tcp_states.h>
 #include <net/arp.h>
+#include <net/net_namespace.h>
 #include <linux/init.h>
 
 static int nr_ndevs = 4;
@@ -1442,9 +1443,9 @@ static int __init nr_proto_init(void)
 
 	nr_loopback_init();
 
-	proc_net_fops_create("nr", S_IRUGO, &nr_info_fops);
-	proc_net_fops_create("nr_neigh", S_IRUGO, &nr_neigh_fops);
-	proc_net_fops_create("nr_nodes", S_IRUGO, &nr_nodes_fops);
+	proc_net_fops_create(init_net(), "nr", S_IRUGO, &nr_info_fops);
+	proc_net_fops_create(init_net(), "nr_neigh", S_IRUGO, &nr_neigh_fops);
+	proc_net_fops_create(init_net(), "nr_nodes", S_IRUGO, &nr_nodes_fops);
 out:
 	return rc;
 fail:
@@ -1472,9 +1473,9 @@ static void __exit nr_exit(void)
 {
 	int i;
 
-	proc_net_remove("nr");
-	proc_net_remove("nr_neigh");
-	proc_net_remove("nr_nodes");
+	proc_net_remove(init_net(), "nr");
+	proc_net_remove(init_net(), "nr_neigh");
+	proc_net_remove(init_net(), "nr_nodes");
 	nr_loopback_clear();
 
 	nr_rt_free();
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index da73e8a..04e295a 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -65,6 +65,7 @@
 #include <net/protocol.h>
 #include <linux/skbuff.h>
 #include <net/sock.h>
+#include <net/net_namespace.h>
 #include <linux/errno.h>
 #include <linux/timer.h>
 #include <asm/system.h>
@@ -1911,7 +1912,7 @@ static struct file_operations packet_seq_fops = {
 
 static void __exit packet_exit(void)
 {
-	proc_net_remove("packet");
+	proc_net_remove(init_net(), "packet");
 	unregister_netdevice_notifier(&packet_netdev_notifier);
 	sock_unregister(PF_PACKET);
 	proto_unregister(&packet_proto);
@@ -1926,7 +1927,7 @@ static int __init packet_init(void)
 
 	sock_register(&packet_family_ops);
 	register_netdevice_notifier(&packet_netdev_notifier);
-	proc_net_fops_create("packet", 0, &packet_seq_fops);
+	proc_net_fops_create(init_net(), "packet", 0, &packet_seq_fops);
 out:
 	return rc;
 }
diff --git a/net/rose/af_rose.c b/net/rose/af_rose.c
index 9e27946..5532340 100644
--- a/net/rose/af_rose.c
+++ b/net/rose/af_rose.c
@@ -45,6 +45,7 @@
 #include <net/tcp_states.h>
 #include <net/ip.h>
 #include <net/arp.h>
+#include <net/net_namespace.h>
 
 static int rose_ndevs = 10;
 
@@ -1550,10 +1551,10 @@ static int __init rose_proto_init(void)
 
 	rose_add_loopback_neigh();
 
-	proc_net_fops_create("rose", S_IRUGO, &rose_info_fops);
-	proc_net_fops_create("rose_neigh", S_IRUGO, &rose_neigh_fops);
-	proc_net_fops_create("rose_nodes", S_IRUGO, &rose_nodes_fops);
-	proc_net_fops_create("rose_routes", S_IRUGO, &rose_routes_fops);
+	proc_net_fops_create(init_net(), "rose", S_IRUGO, &rose_info_fops);
+	proc_net_fops_create(init_net(), "rose_neigh", S_IRUGO, &rose_neigh_fops);
+	proc_net_fops_create(init_net(), "rose_nodes", S_IRUGO, &rose_nodes_fops);
+	proc_net_fops_create(init_net(), "rose_routes", S_IRUGO, &rose_routes_fops);
 out:
 	return rc;
 fail:
@@ -1580,10 +1581,10 @@ static void __exit rose_exit(void)
 {
 	int i;
 
-	proc_net_remove("rose");
-	proc_net_remove("rose_neigh");
-	proc_net_remove("rose_nodes");
-	proc_net_remove("rose_routes");
+	proc_net_remove(init_net(), "rose");
+	proc_net_remove(init_net(), "rose_neigh");
+	proc_net_remove(init_net(), "rose_nodes");
+	proc_net_remove(init_net(), "rose_routes");
 	rose_loopback_clear();
 
 	rose_rt_free();
diff --git a/net/rxrpc/proc.c b/net/rxrpc/proc.c
index 29975d9..e7bd87b 100644
--- a/net/rxrpc/proc.c
+++ b/net/rxrpc/proc.c
@@ -14,6 +14,7 @@
 #include <linux/module.h>
 #include <linux/proc_fs.h>
 #include <linux/seq_file.h>
+#include <net/net_namespace.h>
 #include <rxrpc/rxrpc.h>
 #include <rxrpc/transport.h>
 #include <rxrpc/peer.h>
@@ -133,7 +134,7 @@ int rxrpc_proc_init(void)
 {
 	struct proc_dir_entry *p;
 
-	proc_rxrpc = proc_mkdir("rxrpc", proc_net);
+	proc_rxrpc = proc_mkdir("rxrpc", per_net(proc_net, init_net()));
 	if (!proc_rxrpc)
 		goto error;
 	proc_rxrpc->owner = THIS_MODULE;
@@ -169,7 +170,7 @@ int rxrpc_proc_init(void)
  error_calls:
 	remove_proc_entry("calls", proc_rxrpc);
  error_proc:
-	remove_proc_entry("rxrpc", proc_net);
+	remove_proc_entry("rxrpc", per_net(proc_net, init_net()));
  error:
 	return -ENOMEM;
 } /* end rxrpc_proc_init() */
@@ -185,7 +186,7 @@ void rxrpc_proc_cleanup(void)
 	remove_proc_entry("connections", proc_rxrpc);
 	remove_proc_entry("calls", proc_rxrpc);
 
-	remove_proc_entry("rxrpc", proc_net);
+	remove_proc_entry("rxrpc", per_net(proc_net, init_net()));
 
 } /* end rxrpc_proc_cleanup() */
 
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index 65825f4..da7e1eb 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -36,6 +36,7 @@
 #include <linux/list.h>
 #include <linux/bitops.h>
 
+#include <net/net_namespace.h>
 #include <net/sock.h>
 #include <net/pkt_sched.h>
 
@@ -1296,7 +1297,7 @@ static int __init pktsched_init(void)
 
 	register_qdisc(&pfifo_qdisc_ops);
 	register_qdisc(&bfifo_qdisc_ops);
-	proc_net_fops_create("psched", 0, &psched_fops);
+	proc_net_fops_create(init_net(), "psched", 0, &psched_fops);
 
 	return 0;
 }
diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c
index 225f39b..ea94951 100644
--- a/net/sctp/protocol.c
+++ b/net/sctp/protocol.c
@@ -59,6 +59,7 @@
 #include <net/addrconf.h>
 #include <net/inet_common.h>
 #include <net/inet_ecn.h>
+#include <net/net_namespace.h>
 
 /* Global data structures. */
 struct sctp_globals sctp_globals __read_mostly;
@@ -93,7 +94,7 @@ static __init int sctp_proc_init(void)
 {
 	if (!proc_net_sctp) {
 		struct proc_dir_entry *ent;
-		ent = proc_mkdir("net/sctp", NULL);
+		ent = proc_mkdir("sctp", per_net(proc_net, init_net()));
 		if (ent) {
 			ent->owner = THIS_MODULE;
 			proc_net_sctp = ent;
@@ -126,7 +127,7 @@ static void sctp_proc_exit(void)
 
 	if (proc_net_sctp) {
 		proc_net_sctp = NULL;
-		remove_proc_entry("net/sctp", NULL);
+		remove_proc_entry("sctp", per_net(proc_net, init_net()));
 	}
 }
 
diff --git a/net/sunrpc/stats.c b/net/sunrpc/stats.c
index bd98124..996b71c 100644
--- a/net/sunrpc/stats.c
+++ b/net/sunrpc/stats.c
@@ -22,6 +22,7 @@
 #include <linux/sunrpc/clnt.h>
 #include <linux/sunrpc/svcsock.h>
 #include <linux/sunrpc/metrics.h>
+#include <net/net_namespace.h>
 
 #define RPCDBG_FACILITY	RPCDBG_MISC
 
@@ -266,7 +267,7 @@ rpc_proc_init(void)
 	dprintk("RPC: registering /proc/net/rpc\n");
 	if (!proc_net_rpc) {
 		struct proc_dir_entry *ent;
-		ent = proc_mkdir("rpc", proc_net);
+		ent = proc_mkdir("rpc", per_net(proc_net, init_net()));
 		if (ent) {
 			ent->owner = THIS_MODULE;
 			proc_net_rpc = ent;
@@ -280,7 +281,7 @@ rpc_proc_exit(void)
 	dprintk("RPC: unregistering /proc/net/rpc\n");
 	if (proc_net_rpc) {
 		proc_net_rpc = NULL;
-		remove_proc_entry("net/rpc", NULL);
+		remove_proc_entry("rpc", per_net(proc_net, init_net()));
 	}
 }
 
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 2f208c7..30855e1 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -116,6 +116,7 @@
 #include <linux/mount.h>
 #include <net/checksum.h>
 #include <linux/security.h>
+#include <net/net_namespace.h>
 
 int sysctl_unix_max_dgram_qlen __read_mostly = 10;
 
@@ -2072,7 +2073,7 @@ static int __init af_unix_init(void)
 
 	sock_register(&unix_family_ops);
 #ifdef CONFIG_PROC_FS
-	proc_net_fops_create("unix", 0, &unix_seq_fops);
+	proc_net_fops_create(init_net(), "unix", 0, &unix_seq_fops);
 #endif
 	unix_sysctl_register();
 out:
@@ -2083,7 +2084,7 @@ static void __exit af_unix_exit(void)
 {
 	sock_unregister(PF_UNIX);
 	unix_sysctl_unregister();
-	proc_net_remove("unix");
+	proc_net_remove(init_net(), "unix");
 	proto_unregister(&unix_proto);
 }
 
diff --git a/net/wanrouter/wanproc.c b/net/wanrouter/wanproc.c
index 930ea59..1fcb0b8 100644
--- a/net/wanrouter/wanproc.c
+++ b/net/wanrouter/wanproc.c
@@ -28,6 +28,7 @@
 #include <linux/wanrouter.h>	/* WAN router API definitions */
 #include <linux/seq_file.h>
 #include <linux/smp_lock.h>
+#include <net/net_namespace.h>
 
 #include <asm/io.h>
 
@@ -287,7 +288,7 @@ static struct file_operations wandev_fops = {
 int __init wanrouter_proc_init(void)
 {
 	struct proc_dir_entry *p;
-	proc_router = proc_mkdir(ROUTER_NAME, proc_net);
+	proc_router = proc_mkdir(ROUTER_NAME, per_net(proc_net, init_net()));
 	if (!proc_router)
 		goto fail;
 
@@ -303,7 +304,7 @@ int __init wanrouter_proc_init(void)
 fail_stat:
 	remove_proc_entry("config", proc_router);
 fail_config:
-	remove_proc_entry(ROUTER_NAME, proc_net);
+	remove_proc_entry(ROUTER_NAME, per_net(proc_net, init_net()));
 fail:
 	return -ENOMEM;
 }
@@ -316,7 +317,7 @@ void wanrouter_proc_cleanup(void)
 {
 	remove_proc_entry("config", proc_router);
 	remove_proc_entry("status", proc_router);
-	remove_proc_entry(ROUTER_NAME, proc_net);
+	remove_proc_entry(ROUTER_NAME, per_net(proc_net, init_net()));
 }
 
 /*
diff --git a/net/x25/x25_proc.c b/net/x25/x25_proc.c
index a11837d..7bcf98d 100644
--- a/net/x25/x25_proc.c
+++ b/net/x25/x25_proc.c
@@ -20,6 +20,7 @@
 #include <linux/init.h>
 #include <linux/proc_fs.h>
 #include <linux/seq_file.h>
+#include <net/net_namespace.h>
 #include <net/sock.h>
 #include <net/x25.h>
 
@@ -212,7 +213,7 @@ int __init x25_proc_init(void)
 	struct proc_dir_entry *p;
 	int rc = -ENOMEM;
 
-	x25_proc_dir = proc_mkdir("x25", proc_net);
+	x25_proc_dir = proc_mkdir("x25", per_net(proc_net, init_net()));
 	if (!x25_proc_dir)
 		goto out;
 
@@ -231,7 +232,7 @@ out:
 out_socket:
 	remove_proc_entry("route", x25_proc_dir);
 out_route:
-	remove_proc_entry("x25", proc_net);
+	remove_proc_entry("x25", per_net(proc_net, init_net()));
 	goto out;
 }
 
@@ -239,7 +240,7 @@ void __exit x25_proc_exit(void)
 {
 	remove_proc_entry("route", x25_proc_dir);
 	remove_proc_entry("socket", x25_proc_dir);
-	remove_proc_entry("x25", proc_net);
+	remove_proc_entry("x25", per_net(proc_net, init_net()));
 }
 
 #else /* CONFIG_PROC_FS */
-- 
1.4.4.1.g278f

_______________________________________________
Containers mailing list
Containers@lists.osdl.org
https://lists.osdl.org/mailman/listinfo/containers

Report message to a moderator

[PATCH RFC 8/31] net: Make /sys/class/net handle multiple network namespaces [message #17346 is a reply to message #17338]

Thu, 25 January 2007 19:00

ebiederm
Messages: 1354
Registered: February 2006

Senior Member

From: Eric W. Biederman <ebiederm@xmission.com> - unquoted

In combination with the sysfs support I am in the process of merging
with gregkh, creates a separate instance of the /sys/class/net directory
for each network namespace so two devices with the same name do not conflict.
Then a network namespace sensitive follow link method on the /sys/class/net
directory ensures that you see the directory instance for your current network
namespace.

Ensuring all existing applications continue to see what we is currently
present in sysfs.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 net/core/net-sysfs.c |   53 +++++++++++++++++++++++++++++++++++++++++++++++++-
 1 files changed, 52 insertions(+), 1 deletions(-)

diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index 5d08cc9..b08c1be 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -11,12 +11,14 @@
 
 #include <linux/capability.h>
 #include <linux/kernel.h>
+#include <linux/sysfs.h>
 #include <linux/netdevice.h>
 #include <linux/if_arp.h>
 #include <net/sock.h>
 #include <linux/rtnetlink.h>
 #include <linux/wireless.h>
 #include <net/iw_handler.h>
+#include <net/net_namespace.h>
 
 #define to_class_dev(obj) container_of(obj,struct class_device,kobj)
 #define to_net_dev(class) container_of(class, struct net_device, class_dev)
@@ -431,6 +433,24 @@ static void netdev_release(struct class_device *cd)
 	kfree((char *)dev - dev->padded);
 }
 
+static DEFINE_PER_NET(struct dentry *, net_shadow) = NULL;
+
+static struct dentry *net_class_device_dparent(struct class_device *cd)
+{
+	struct net_device *dev
+		= container_of(cd, struct net_device, class_dev);
+	net_t net = dev->nd_net;
+
+	return per_net(net_shadow, net);
+}
+
+static void *class_net_follow_link(struct dentry *dentry, struct nameidata *nd)
+{
+	dput(nd->dentry);
+	nd->dentry = dget(per_net(net_shadow, current->nsproxy->net_ns));
+	return NULL;
+}
+
 static struct class net_class = {
 	.name = "net",
 	.release = netdev_release,
@@ -438,6 +458,8 @@ static struct class net_class = {
 #ifdef CONFIG_HOTPLUG
 	.uevent = netdev_uevent,
 #endif
+	.class_device_dparent	= net_class_device_dparent,
+	.class_follow_link	= class_net_follow_link,
 };
 
 void netdev_unregister_sysfs(struct net_device * dev)
@@ -470,7 +492,36 @@ int netdev_register_sysfs(struct net_device *dev)
 	return class_device_add(class_dev);
 }
 
+static int netdev_sysfs_net_init(net_t net)
+{
+	struct dentry *shadow;
+	int error = 0;
+	shadow = sysfs_create_shadow_dir(&net_class.subsys.kset.kobj);
+	if (IS_ERR(shadow))
+		error = PTR_ERR(shadow);
+	else
+		per_net(net_shadow, net) = shadow;
+	return error;
+}
+
+static void netdev_sysfs_net_exit(net_t net)
+{
+	sysfs_remove_shadow_dir(per_net(net_shadow, net));
+	per_net(net_shadow, net) = NULL;
+}
+
+static struct pernet_operations netdev_sysfs_ops = {
+	.init = netdev_sysfs_net_init,
+	.exit = netdev_sysfs_net_exit,
+};
+ 
 int netdev_sysfs_init(void)
 {
-	return class_register(&net_class);
+	int rc;
+	if ((rc = class_register(&net_class)))
+		goto out;
+	if ((rc = register_pernet_subsys(&netdev_sysfs_ops)))
+		goto out;
+out:
+	return rc;
 }
-- 
1.4.4.1.g278f

_______________________________________________
Containers mailing list
Containers@lists.osdl.org
https://lists.osdl.org/mailman/listinfo/containers

Report message to a moderator

[PATCH RFC 9/31] net: Implement the per network namespace sysctl infrastructure [message #17347 is a reply to message #17338]

Thu, 25 January 2007 19:00

ebiederm
Messages: 1354
Registered: February 2006

Senior Member

From: Eric W. Biederman <ebiederm@xmission.com> - unquoted

The user interface is: register_net_sysctl_table and
unregister_net_sysctl_table.  Very much like the current
interface except there is an network namespace parameter.

This this any sysctl in the net_root_table and it's
subdirectories are registered with register_net_sysctl
shows up only to tasks in the same network namespace.

All other sysctls continue to be globally visible.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 include/linux/sysctl.h     |    7 ++++
 include/net/sock.h         |    1 +
 kernel/sysctl.c            |   71 ++++++++++++++++++++++++++++++++++++++++++-
 net/core/sysctl_net_core.c |    5 +++
 net/sysctl_net.c           |   20 ++++++++++++
 5 files changed, 102 insertions(+), 2 deletions(-)

diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index 8eba2d2..286e723 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -1044,6 +1044,13 @@ struct ctl_table_header * register_sysctl_table(ctl_table * table);
 
 void unregister_sysctl_table(struct ctl_table_header * table);
 
+#ifdef CONFIG_NET
+#include <linux/net_namespace_type.h>
+extern struct ctl_table_header *register_net_sysctl_table(net_t net, struct ctl_table *table);
+extern void unregister_net_sysctl_table(struct ctl_table_header *header);
+DECLARE_PER_NET(struct ctl_table, net_root_table[]);
+#endif
+
 #else /* __KERNEL__ */
 
 #endif /* __KERNEL__ */
diff --git a/include/net/sock.h b/include/net/sock.h
index 5bf6bb5..01a2781 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1414,6 +1414,7 @@ extern void sk_init(void);
 
 #ifdef CONFIG_SYSCTL
 extern struct ctl_table core_table[];
+DECLARE_PER_NET(struct ctl_table, multi_core_table[]);
 #endif
 
 extern int sysctl_optmem_max;
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 7da313e..ae6a424 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -45,6 +45,7 @@
 #include <linux/syscalls.h>
 #include <linux/nfs_fs.h>
 #include <linux/acpi.h>
+#include <net/net_namespace.h>
 
 #include <asm/uaccess.h>
 #include <asm/processor.h>
@@ -135,6 +136,10 @@ static int proc_do_cad_pid(ctl_table *table, int write, struct file *filp,
 		  void __user *buffer, size_t *lenp, loff_t *ppos);
 #endif
 
+#ifdef CONFIG_NET
+static DEFINE_PER_NET(struct ctl_table_header, net_table_header);
+#endif
+
 static ctl_table root_table[];
 static struct ctl_table_header root_table_header =
 	{ root_table, LIST_HEAD_INIT(root_table_header.ctl_entry) };
@@ -1059,6 +1064,7 @@ struct ctl_table_header *sysctl_head_next(struct ctl_table_header *prev)
 {
 	struct ctl_table_header *head;
 	struct list_head *tmp;
+	net_t net = current->nsproxy->net_ns;
 	spin_lock(&sysctl_lock);
 	if (prev) {
 		tmp = &prev->ctl_entry;
@@ -1076,6 +1082,10 @@ struct ctl_table_header *sysctl_head_next(struct ctl_table_header *prev)
 	next:
 		tmp = tmp->next;
 		if (tmp == &root_table_header.ctl_entry)
+#ifdef CONFIG_NET
+			tmp = &per_net(net_table_header, net).ctl_entry;
+		else if (tmp == &per_net(net_table_header, net).ctl_entry)
+#endif
 			break;
 	}
 	spin_unlock(&sysctl_lock);
@@ -1290,7 +1300,8 @@ int do_sysctl_strategy (ctl_table *table,
  * This routine returns %NULL on a failure to register, and a pointer
  * to the table header on success.
  */
-struct ctl_table_header *register_sysctl_table(ctl_table * table)
+static struct ctl_table_header *__register_sysctl_table(
+	struct ctl_table_header *root, ctl_table * table)
 {
 	struct ctl_table_header *tmp;
 	tmp = kmalloc(sizeof(struct ctl_table_header), GFP_KERNEL);
@@ -1301,11 +1312,16 @@ struct ctl_table_header *register_sysctl_table(ctl_table * table)
 	tmp->used = 0;
 	tmp->unregistering = NULL;
 	spin_lock(&sysctl_lock);
-	list_add_tail(&tmp->ctl_entry, &root_table_header.ctl_entry);
+	list_add_tail(&tmp->ctl_entry, &root->ctl_entry);
 	spin_unlock(&sysctl_lock);
 	return tmp;
 }
 
+struct ctl_table_header *register_sysctl_table(ctl_table *table)
+{
+	return __register_sysctl_table(&root_table_header, table);
+}
+
 /**
  * unregister_sysctl_table - unregister a sysctl table hierarchy
  * @header: the header returned from register_sysctl_table
@@ -1322,6 +1338,57 @@ void unregister_sysctl_table(struct ctl_table_header * header)
 	kfree(header);
 }
 
+#ifdef CONFIG_NET
+
+static void *fixup_per_net_addr(net_t net, void *addr)
+{
+	char *ptr = addr;
+	if ((ptr >= __per_net_start) && (ptr < __per_net_end))
+		ptr += __per_net_offset(net);
+	return ptr;
+}
+
+static void sysctl_net_table_fixup(net_t net, struct ctl_table *table)
+{
+	for (; table->ctl_name || table->procname; table++) {
+		table->child  = fixup_per_net_addr(net, table->child);
+		table->data   = fixup_per_net_addr(net, table->data);
+		table->extra1 = fixup_per_net_addr(net, table->extra1);
+		table->extra2 = fixup_per_net_addr(net, table->extra2);
+
+		/* Whee recursive functions on the kernel stack */
+		if (table->child)
+			sysctl_net_table_fixup(net, table->child);
+	}
+}
+
+static void sysctl_net_init(net_t net)
+{
+	struct ctl_table *table = per_net(net_root_table, net);
+
+	sysctl_net_table_fixup(net, table);
+	per_net(net_table_header, net).ctl_table = table;
+
+	INIT_LIST_HEAD(&per_net(net_table_header, net).ctl_entry);
+}
+
+struct ctl_table_header *register_net_sysctl_table(net_t net, ctl_table *table)
+{
+	if (!per_net(net_table_header, net).ctl_table)
+		sysctl_net_init(net);
+	sysctl_net_table_fixup(net, table);
+	return __register_sysctl_table(&per_net(net_table_header, net), table);
+}
+EXPORT_SYMBOL_GPL(register_net_sysctl_table);
+
+void unregister_net_sysctl_table(struct ctl_table_header *header)
+{
+	return unregister_sysctl_table(header);
+}
+EXPORT_SYMBOL_GPL(unregister_net_sysctl_table);
+#endif
+
+
 #else /* !CONFIG_SYSCTL */
 struct ctl_table_header * register_sysctl_table(ctl_table * table,
 						int insert_at_head)
diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
index 176ad08..76f7a29 100644
--- a/net/core/sysctl_net_core.c
+++ b/net/core/sysctl_net_core.c
@@ -125,3 +125,8 @@ ctl_table core_table[] = {
 	},
 	{ .ctl_name = 0 }
 };
+
+DEFINE_PER_NET(struct ctl_table, multi_core_table[]) = {
+	/* Stub for holding per network namespace sysctls */
+	{}
+};
diff --git a/net/sysctl_net.c b/net/sysctl_net.c
index cd4eafb..359c163 100644
--- a/net/sysctl_net.c
+++ b/net/sysctl_net.c
@@ -54,3 +54,23 @@ struct ctl_table net_table[] = {
 #endif
 	{ 0 },
 };
+
+DEFINE_PER_NET(struct ctl_table, multi_net_table[]) = {
+	{
+		.ctl_name	= NET_CORE,
+		.procname	= "core",
+		.mode		= 0555,
+		.child		= __per_net_base(multi_core_table),
+	},
+	{},
+};
+
+DEFINE_PER_NET(struct ctl_table, net_root_table[]) = {
+	{
+		.ctl_name	= CTL_NET,
+		.procname	= "net",
+		.mode		= 0555,
+		.child		= __per_net_base(multi_net_table),
+	},
+	{},
+};
-- 
1.4.4.1.g278f

_______________________________________________
Containers mailing list
Containers@lists.osdl.org
https://lists.osdl.org/mailman/listinfo/containers

Report message to a moderator

[PATCH RFC 10/31] net: Make socket creation namespace safe. [message #17348 is a reply to message #17338]

Thu, 25 January 2007 19:00

ebiederm
Messages: 1354
Registered: February 2006

Senior Member

From: Eric W. Biederman <ebiederm@xmission.com> - unquoted

This patch passes in the namespace a new socket should be created in
and has the socket code do the appropriate reference counting.  By
virtue of this all socket create methods are touched.  In addition
the socket create methods are modified so that they will fail if
you attempt to create a socket in a non-default network namespace.

Failing if we attempt to create a socket outside of the default
socket namespace ensures that as we incrementally make the network stack
network namespace aware we will not export functionality that someone
has not audited and made certain is network namespace safe.
Allowing us to partially enable network namespaces before all of the
exotic protocols are supported.

Any protocol layers I have missed will fail to compile because I now
pass an extra parameter into the socket creation code.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 drivers/net/pppoe.c          |    4 ++--
 drivers/net/pppox.c          |    7 +++++--
 include/linux/if_pppox.h     |    2 +-
 include/linux/net.h          |    3 ++-
 include/net/llc_conn.h       |    2 +-
 include/net/sock.h           |    4 +++-
 net/appletalk/ddp.c          |    7 +++++--
 net/atm/common.c             |    4 ++--
 net/atm/common.h             |    2 +-
 net/atm/pvc.c                |    7 +++++--
 net/atm/svc.c                |   11 +++++++----
 net/ax25/af_ax25.c           |    9 ++++++---
 net/bluetooth/af_bluetooth.c |    7 +++++--
 net/bluetooth/bnep/sock.c    |    4 ++--
 net/bluetooth/cmtp/sock.c    |    4 ++--
 net/bluetooth/hci_sock.c     |    4 ++--
 net/bluetooth/hidp/sock.c    |    4 ++--
 net/bluetooth/l2cap.c        |   10 +++++-----
 net/bluetooth/rfcomm/sock.c  |   10 +++++-----
 net/bluetooth/sco.c          |   10 +++++-----
 net/core/sock.c              |    6 ++++--
 net/decnet/af_decnet.c       |   13 ++++++++-----
 net/econet/af_econet.c       |    7 +++++--
 net/ipv4/af_inet.c           |    7 +++++--
 net/ipv6/af_inet6.c          |    7 +++++--
 net/ipx/af_ipx.c             |    7 +++++--
 net/irda/af_irda.c           |   11 +++++++----
 net/key/af_key.c             |    7 +++++--
 net/llc/af_llc.c             |    7 +++++--
 net/llc/llc_conn.c           |    6 +++---
 net/netlink/af_netlink.c     |   13 ++++++++-----
 net/netrom/af_netrom.c       |    9 ++++++---
 net/packet/af_packet.c       |    7 +++++--
 net/rose/af_rose.c           |    9 ++++++---
 net/sctp/ipv6.c              |    2 +-
 net/sctp/protocol.c          |    2 +-
 net/socket.c                 |    8 ++++----
 net/tipc/socket.c            |    9 ++++++---
 net/unix/af_unix.c           |   13 ++++++++-----
 net/wanrouter/af_wanpipe.c   |   15 +++++++++------
 net/x25/af_x25.c             |   13 ++++++++-----
 41 files changed, 182 insertions(+), 111 deletions(-)

diff --git a/drivers/net/pppoe.c b/drivers/net/pppoe.c
index d34fe16..d09334d 100644
--- a/drivers/net/pppoe.c
+++ b/drivers/net/pppoe.c
@@ -475,12 +475,12 @@ static struct proto pppoe_sk_proto = {
  * Initialize a new struct sock.
  *
  **********************************************************************/
-static int pppoe_create(struct socket *sock)
+static int pppoe_create(net_t net, struct socket *sock)
 {
 	int error = -ENOMEM;
 	struct sock *sk;
 
-	sk = sk_alloc(PF_PPPOX, GFP_KERNEL, &pppoe_sk_proto, 1);
+	sk = sk_alloc(net, PF_PPPOX, GFP_KERNEL, &pppoe_sk_proto, 1);
 	if (!sk)
 		goto out;
 
diff --git a/drivers/net/pppox.c b/drivers/net/pppox.c
index 9315046..0d5c7bc 100644
--- a/drivers/net/pppox.c
+++ b/drivers/net/pppox.c
@@ -106,10 +106,13 @@ int pppox_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg)
 
 EXPORT_SYMBOL(pppox_ioctl);
 
-static int pppox_create(struct socket *sock, int protocol)
+static int pppox_create(net_t net, struct socket *sock, int protocol)
 {
 	int rc = -EPROTOTYPE;
 
+	if (!net_eq(net, init_net()))
+		return -EAFNOSUPPORT;
+
 	if (protocol < 0 || protocol > PX_MAX_PROTO)
 		goto out;
 
@@ -118,7 +121,7 @@ static int pppox_create(struct socket *sock, int protocol)
 	    !try_module_get(pppox_protos[protocol]->owner))
 		goto out;
 
-	rc = pppox_protos[protocol]->create(sock);
+	rc = pppox_protos[protocol]->create(net, sock);
 
 	module_put(pppox_protos[protocol]->owner);
 out:
diff --git a/include/linux/if_pppox.h b/include/linux/if_pppox.h
index 4fab3d0..f6ffd83 100644
--- a/include/linux/if_pppox.h
+++ b/include/linux/if_pppox.h
@@ -148,7 +148,7 @@ static inline struct sock *sk_pppox(struct pppox_sock *po)
 struct module;
 
 struct pppox_proto {
-	int		(*create)(struct socket *sock);
+	int		(*create)(net_t net, struct socket *sock);
 	int		(*ioctl)(struct socket *sock, unsigned int cmd,
 				 unsigned long arg);
 	struct module	*owner;
diff --git a/include/linux/net.h b/include/linux/net.h
index f28d8a2..4136768 100644
--- a/include/linux/net.h
+++ b/include/linux/net.h
@@ -19,6 +19,7 @@
 #define _LINUX_NET_H
 
 #include <linux/wait.h>
+#include <linux/net_namespace_type.h>
 #include <asm/socket.h>
 
 struct poll_table_struct;
@@ -169,7 +170,7 @@ struct proto_ops {
 
 struct net_proto_family {
 	int		family;
-	int		(*create)(struct socket *sock, int protocol);
+	int		(*create)(net_t net, struct socket *sock, int protocol);
 	struct module	*owner;
 };
 
diff --git a/include/net/llc_conn.h b/include/net/llc_conn.h
index 00730d2..e4f7104 100644
--- a/include/net/llc_conn.h
+++ b/include/net/llc_conn.h
@@ -93,7 +93,7 @@ static __inline__ char llc_backlog_type(struct sk_buff *skb)
 	return skb->cb[sizeof(skb->cb) - 1];
 }
 
-extern struct sock *llc_sk_alloc(int family, gfp_t priority,
+extern struct sock *llc_sk_alloc(net_t net, int family, gfp_t priority,
 				 struct proto *prot);
 extern void llc_sk_free(struct sock *sk);
 
diff --git a/include/net/sock.h b/include/net/sock.h
index 01a2781..ebcaa7f 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -55,6 +55,7 @@
 #include <asm/atomic.h>
 #include <net/dst.h>
 #include <net/checksum.h>
+#include <net/net_namespace.h>
 
 /*
  * This structure really needs to be cleaned up.
@@ -784,7 +785,7 @@ extern void FASTCALL(release_sock(struct sock *sk));
 				SINGLE_DEPTH_NESTING)
 #define bh_unlock_sock(__sk)	spin_unlock(&((__sk)->sk_lock.slock))
 
-extern struct sock		*sk_alloc(int family,
+extern struct sock		*sk_alloc(net_t net, int family,
 					  gfp_t priority,
 					  struct proto *prot, int zero_it);
 extern void			sk_free(struct sock *sk);
@@ -1013,6 +1014,7 @@ static inline void sock_copy(struct sock *nsk, const struct sock *osk)
 #endif
 
 	memcpy(nsk, osk, osk->sk_prot->obj_size);
+	get_net(nsk->sk_net);
 #ifdef CONFIG_SECURITY_NETWORK
 	nsk->sk_security = sptr;
 	security_sk_clone(osk, nsk);
diff --git a/net/appletalk/ddp.c b/net/appletalk/ddp.c
index 5b8a8ce..e08367b 100644
--- a/net/appletalk/ddp.c
+++ b/net/appletalk/ddp.c
@@ -1026,11 +1026,14 @@ static struct proto ddp_proto = {
  * Create a socket. Initialise the socket, blank the addresses
  * set the state.
  */
-static int atalk_create(struct socket *sock, int protocol)
+static int atalk_create(net_t net, struct socket *sock, int protocol)
 {
 	struct sock *sk;
 	int rc = -ESOCKTNOSUPPORT;
 
+	if (!net_eq(net, init_net()))
+		return -EAFNOSUPPORT;
+
 	/*
 	 * We permit SOCK_DGRAM and RAW is an extension. It is trivial to do
 	 * and gives you the full ELAP frame. Should be handy for CAP 8) 
@@ -1038,7 +1041,7 @@ static int atalk_create(struct socket *sock, int protocol)
 	if (sock->type != SOCK_RAW && sock->type != SOCK_DGRAM)
 		goto out;
 	rc = -ENOMEM;
-	sk = sk_alloc(PF_APPLETALK, GFP_KERNEL, &ddp_proto, 1);
+	sk = sk_alloc(net, PF_APPLETALK, GFP_KERNEL, &ddp_proto, 1);
 	if (!sk)
 		goto out;
 	rc = 0;
diff --git a/net/atm/common.c b/net/atm/common.c
index fbabff4..c4329f0 100644
--- a/net/atm/common.c
+++ b/net/atm/common.c
@@ -132,7 +132,7 @@ static struct proto vcc_proto = {
 	.obj_size = sizeof(struct atm_vcc),
 };
  
-int vcc_create(struct socket *sock, int protocol, int family)
+int vcc_create(net_t net, struct socket *sock, int protocol, int family)
 {
 	struct sock *sk;
 	struct atm_vcc *vcc;
@@ -140,7 +140,7 @@ int vcc_create(struct socket *sock, int protocol, int family)
 	sock->sk = NULL;
 	if (sock->type == SOCK_STREAM)
 		return -EINVAL;
-	sk = sk_alloc(family, GFP_KERNEL, &vcc_proto, 1);
+	sk = sk_alloc(net, family, GFP_KERNEL, &vcc_proto, 1);
 	if (!sk)
 		return -ENOMEM;
 	sock_init_data(sock, sk);
diff --git a/net/atm/common.h b/net/atm/common.h
index a422da7..c7101c7 100644
--- a/net/atm/common.h
+++ b/net/atm/common.h
@@ -10,7 +10,7 @@
 #include <linux/poll.h> /* for poll_table */
 
 
-int vcc_create(struct socket *sock, int protocol, int family);
+int vcc_create(net_t net, struct socket *sock, int protocol, int family);
 int vcc_release(struct socket *sock);
 int vcc_connect(struct socket *sock, int itf, short vpi, int vci);
 int vcc_recvmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *msg,
diff --git a/net/atm/pvc.c b/net/atm/pvc.c
index b2148b4..13bf58e 100644
--- a/net/atm/pvc.c
+++ b/net/atm/pvc.c
@@ -124,10 +124,13 @@ static const struct proto_ops pvc_proto_ops = {
 };
 
 
-static int pvc_create(struct socket *sock,int protocol)
+static int pvc_create(net_t net, struct socket *sock,int protocol)
 {
+	if (!net_eq(net, init_net()))
+		return -EAFNOSUPPORT;
+
 	sock->ops = &pvc_proto_ops;
-	return vcc_create(sock, protocol, PF_ATMPVC);
+	return vcc_create(net, sock, protocol, PF_ATMPVC);
 }
 
 
diff --git a/net/atm/svc.c b/net/atm/svc.c
index 3a180cf..e78d9f7 100644
--- a/net/atm/svc.c
+++ b/net/atm/svc.c
@@ -33,7 +33,7 @@
 #endif
 
 
-static int svc_create(struct socket *sock,int protocol);
+static int svc_create(net_t net, struct socket *sock,int protocol);
 
 
 /*
@@ -335,7 +335,7 @@ static int svc_accept(struct socket *sock,struct socket *newso

...

[ Show the rest of the message ]

Report message to a moderator

[PATCH RFC 11/31] net: Initialize the network namespace of network devices. [message #17349 is a reply to message #17338]

Thu, 25 January 2007 19:00

ebiederm
Messages: 1354
Registered: February 2006

Senior Member

From: Eric W. Biederman <ebiederm@xmission.com> - unquoted

Except for carefully selected pseudo devices all network
interfaces should start out in the initial network namespace.
Ultimately it will be register_netdev that examines what
dev->nd_net is set to and places a device in a network namespace.

This patch modifies alloc_netdev to initialize the network
namespace a device is in with the initial network namespace.
This gets it right for the vast majority of devices so their
drivers need not be modified and for those few pseudo devices
that need something different they can change this parameter
before calling register_netdevice.

The network namespace parameter on a network device is not
reference counted as the devices are inside of a network namespace
and cannot remain in that namespace past the lifetime of the
network namespace.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 drivers/net/loopback.c |    1 +
 net/core/dev.c         |    1 +
 2 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/drivers/net/loopback.c b/drivers/net/loopback.c
index 2b739fd..22b672d 100644
--- a/drivers/net/loopback.c
+++ b/drivers/net/loopback.c
@@ -231,6 +231,7 @@ struct net_device loopback_dev = {
 /* Setup and register the loopback device. */
 static int __init loopback_init(void)
 {
+	loopback_dev.nd_net = init_net();
 	return register_netdev(&loopback_dev);
 };
 
diff --git a/net/core/dev.c b/net/core/dev.c
index 90e4c0e..a3ee150 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3192,6 +3192,7 @@ struct net_device *alloc_netdev(int sizeof_priv, const char *name,
 	dev = (struct net_device *)
 		(((long)p + NETDEV_ALIGN_CONST) & ~NETDEV_ALIGN_CONST);
 	dev->padded = (char *)dev - (char *)p;
+	dev->nd_net = init_net();
 
 	if (sizeof_priv)
 		dev->priv = netdev_priv(dev);
-- 
1.4.4.1.g278f

_______________________________________________
Containers mailing list
Containers@lists.osdl.org
https://lists.osdl.org/mailman/listinfo/containers

Report message to a moderator

[PATCH RFC 12/31] net: Make packet reception network namespace safe [message #17350 is a reply to message #17338]

Thu, 25 January 2007 19:00

ebiederm
Messages: 1354
Registered: February 2006

Senior Member

From: Eric W. Biederman <ebiederm@xmission.com> - unquoted

This patch modifies every packet receive function
registered with dev_add_pack() to drop packets if they
are not from the initial network namespace, in addition
to ensure consistency of argument passing the unnecessary
device parameter is removed.

This should ensure that the various network stacks do
not receive packets in a anything but the initial network
namespace until the code has been converted and is ready
for them.

Anything I may have missed will generate a compiler error,
as the function protype has changed, preventing us from
overlooking something by accident.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 drivers/block/aoe/aoenet.c      |    7 ++++++-
 drivers/net/bonding/bond_3ad.c  |    7 ++++++-
 drivers/net/bonding/bond_3ad.h  |    2 +-
 drivers/net/bonding/bond_alb.c  |    6 +++++-
 drivers/net/bonding/bond_main.c |    6 +++++-
 drivers/net/hamradio/bpqether.c |    8 ++++++--
 drivers/net/pppoe.c             |    8 ++++++--
 drivers/net/wan/hdlc.c          |   10 +++++++++-
 drivers/net/wan/lapbether.c     |    6 +++++-
 drivers/net/wan/syncppp.c       |   14 ++++++++++----
 include/linux/netdevice.h       |    1 -
 include/net/ax25.h              |    2 +-
 include/net/datalink.h          |    2 +-
 include/net/ip.h                |    2 +-
 include/net/ipv6.h              |    1 -
 include/net/llc.h               |    4 +---
 include/net/p8022.h             |    1 -
 include/net/psnap.h             |    2 +-
 include/net/x25.h               |    2 +-
 net/802/p8022.c                 |    1 -
 net/802/psnap.c                 |    5 ++---
 net/8021q/vlan.h                |    2 +-
 net/8021q/vlan_dev.c            |    8 +++++++-
 net/appletalk/aarp.c            |    6 +++++-
 net/appletalk/ddp.c             |   15 ++++++++++++---
 net/ax25/ax25_in.c              |    8 +++++++-
 net/bridge/br_private.h         |    2 +-
 net/bridge/br_stp_bpdu.c        |    8 ++++++--
 net/core/dev.c                  |    6 +++---
 net/decnet/af_decnet.c          |    2 +-
 net/decnet/dn_route.c           |    6 +++++-
 net/econet/af_econet.c          |    6 +++++-
 net/ipv4/arp.c                  |    6 +++++-
 net/ipv4/ip_input.c             |    7 +++++--
 net/ipv4/ipconfig.c             |   16 ++++++++++++----
 net/ipv6/ip6_input.c            |    8 +++++++-
 net/ipx/af_ipx.c                |    6 +++++-
 net/irda/irlap_frame.c          |    7 +++++--
 net/irda/irmod.c                |    2 +-
 net/llc/llc_core.c              |    1 -
 net/llc/llc_input.c             |   10 +++++++---
 net/packet/af_packet.c          |   18 +++++++++++++++---
 net/tipc/eth_media.c            |    9 ++++++++-
 net/x25/x25_dev.c               |    6 +++++-
 44 files changed, 195 insertions(+), 67 deletions(-)

diff --git a/drivers/block/aoe/aoenet.c b/drivers/block/aoe/aoenet.c
index 9626e0f..9b72a58 100644
--- a/drivers/block/aoe/aoenet.c
+++ b/drivers/block/aoe/aoenet.c
@@ -8,6 +8,7 @@
 #include <linux/blkdev.h>
 #include <linux/netdevice.h>
 #include <linux/moduleparam.h>
+#include <net/net_namespace.h>
 #include "aoe.h"
 
 #define NECODES 5
@@ -108,11 +109,15 @@ aoenet_xmit(struct sk_buff *sl)
  * (1) len doesn't include the header by default.  I want this. 
  */
 static int
-aoenet_rcv(struct sk_buff *skb, struct net_device *ifp, struct packet_type *pt, struct net_device *orig_dev)
+aoenet_rcv(struct sk_buff *skb, struct packet_type *pt, struct net_device *orig_dev)
 {
+	struct net_device *ifp = skb->dev;
 	struct aoe_hdr *h;
 	u32 n;
 
+	if (!net_eq(skb->dev->nd_net, init_net()))
+		goto exit;
+
 	skb = skb_share_check(skb, GFP_ATOMIC);
 	if (skb == NULL)
 		return 0;
diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
index 3fb354d..eea4f11 100644
--- a/drivers/net/bonding/bond_3ad.c
+++ b/drivers/net/bonding/bond_3ad.c
@@ -29,6 +29,7 @@
 #include <linux/ethtool.h>
 #include <linux/if_bonding.h>
 #include <linux/pkt_sched.h>
+#include <net/net_namespace.h>
 #include "bonding.h"
 #include "bond_3ad.h"
 
@@ -2443,12 +2444,16 @@ out:
 	return 0;
 }
 
-int bond_3ad_lacpdu_recv(struct sk_buff *skb, struct net_device *dev, struct packet_type* ptype, struct net_device *orig_dev)
+int bond_3ad_lacpdu_recv(struct sk_buff *skb, struct packet_type* ptype, struct net_device *orig_dev)
 {
+	struct net_device *dev = skb->dev;
 	struct bonding *bond = dev->priv;
 	struct slave *slave = NULL;
 	int ret = NET_RX_DROP;
 
+	if (!net_eq(skb->dev->nd_net, init_net()))
+		goto out;
+
 	if (!(dev->flags & IFF_MASTER))
 		goto out;
 
diff --git a/drivers/net/bonding/bond_3ad.h b/drivers/net/bonding/bond_3ad.h
index 6ad5ad6..1f2d7d2 100644
--- a/drivers/net/bonding/bond_3ad.h
+++ b/drivers/net/bonding/bond_3ad.h
@@ -282,7 +282,7 @@ void bond_3ad_adapter_duplex_changed(struct slave *slave);
 void bond_3ad_handle_link_change(struct slave *slave, char link);
 int  bond_3ad_get_active_agg_info(struct bonding *bond, struct ad_info *ad_info);
 int bond_3ad_xmit_xor(struct sk_buff *skb, struct net_device *dev);
-int bond_3ad_lacpdu_recv(struct sk_buff *skb, struct net_device *dev, struct packet_type* ptype, struct net_device *orig_dev);
+int bond_3ad_lacpdu_recv(struct sk_buff *skb, struct packet_type* ptype, struct net_device *orig_dev);
 int bond_3ad_set_carrier(struct bonding *bond);
 #endif //__BOND_3AD_H__
 
diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c
index 3292316..be780a8 100644
--- a/drivers/net/bonding/bond_alb.c
+++ b/drivers/net/bonding/bond_alb.c
@@ -336,12 +336,16 @@ static void rlb_update_entry_from_arp(struct bonding *bond, struct arp_pkt *arp)
 	_unlock_rx_hashtbl(bond);
 }
 
-static int rlb_arp_recv(struct sk_buff *skb, struct net_device *bond_dev, struct packet_type *ptype, struct net_device *orig_dev)
+static int rlb_arp_recv(struct sk_buff *skb, struct packet_type *ptype, struct net_device *orig_dev)
 {
+	struct net_device *bond_dev = skb->dev;
 	struct bonding *bond = bond_dev->priv;
 	struct arp_pkt *arp = (struct arp_pkt *)skb->data;
 	int res = NET_RX_DROP;
 
+	if (!net_eq(skb->dev->nd_net, init_net()))
+		goto out;
+
 	if (!(bond_dev->flags & IFF_MASTER))
 		goto out;
 
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 9b3bf4e..9c70568 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -2475,14 +2475,18 @@ static void bond_validate_arp(struct bonding *bond, struct slave *slave, u32 sip
 	}
 }
 
-static int bond_arp_rcv(struct sk_buff *skb, struct net_device *dev, struct packet_type *pt, struct net_device *orig_dev)
+static int bond_arp_rcv(struct sk_buff *skb, struct packet_type *pt, struct net_device *orig_dev)
 {
+	struct net_device *dev = skb->dev;
 	struct arphdr *arp;
 	struct slave *slave;
 	struct bonding *bond;
 	unsigned char *arp_ptr;
 	u32 sip, tip;
 
+	if (!net_eq(skb->dev->nd_net, init_net()))
+		goto out;
+
 	if (!(dev->priv_flags & IFF_BONDING) || !(dev->flags & IFF_MASTER))
 		goto out;
 
diff --git a/drivers/net/hamradio/bpqether.c b/drivers/net/hamradio/bpqether.c
index 9fc92ad..c513e90 100644
--- a/drivers/net/hamradio/bpqether.c
+++ b/drivers/net/hamradio/bpqether.c
@@ -93,7 +93,7 @@ static char bcast_addr[6]={0xFF,0xFF,0xFF,0xFF,0xFF,0xFF};
 
 static char bpq_eth_addr[6];
 
-static int bpq_rcv(struct sk_buff *, struct net_device *, struct packet_type *, struct net_device *);
+static int bpq_rcv(struct sk_buff *, struct packet_type *, struct net_device *);
 static int bpq_device_event(struct notifier_block *, unsigned long, void *);
 static const char *bpq_print_ethaddr(const unsigned char *);
 
@@ -166,13 +166,17 @@ static inline int dev_is_ethdev(struct net_device *dev)
 /*
  *	Receive an AX.25 frame via an ethernet interface.
  */
-static int bpq_rcv(struct sk_buff *skb, struct net_device *dev, struct packet_type *ptype, struct net_device *orig_dev)
+static int bpq_rcv(struct sk_buff *skb, struct packet_type *ptype, struct net_device *orig_dev)
 {
+	struct net_device *dev = skb->dev;
 	int len;
 	char * ptr;
 	struct ethhdr *eth;
 	struct bpqdev *bpq;
 
+	if (!net_eq(skb->dev->nd_net, init_net()))
+		goto drop;
+
 	if ((skb = skb_share_check(skb, GFP_ATOMIC)) == NULL)
 		return NET_RX_DROP;
 
diff --git a/drivers/net/pppoe.c b/drivers/net/pppoe.c
index d09334d..caf8ca3 100644
--- a/drivers/net/pppoe.c
+++ b/drivers/net/pppoe.c
@@ -376,7 +376,6 @@ abort_kfree:
  *
  ***********************************************************************/
 static int pppoe_rcv(struct sk_buff *skb,
-		     struct net_device *dev,
 		     struct packet_type *pt,
 		     struct net_device *orig_dev)
 
@@ -384,6 +383,9 @@ static int pppoe_rcv(struct sk_buff *skb,
 	struct pppoe_hdr *ph;
 	struct pppox_sock *po;
 
+	if (!net_eq(skb->dev->nd_net, init_net()))
+		goto drop;
+
 	if (!pskb_may_pull(skb, sizeof(struct pppoe_hdr)))
 		goto drop;
 
@@ -408,7 +410,6 @@ out:
  *
  ***********************************************************************/
 static int pppoe_disc_rcv(struct sk_buff *skb,
-			  struct net_device *dev,
 			  struct packet_type *pt,
 			  struct net_device *orig_dev)
 
@@ -416,6 +417,9 @@ static int pppoe_disc_rcv(struct sk_buff *skb,
 	struct pppoe_hdr *ph;
 	struct pppox_sock *po;
 
+	if (!net_eq(skb->dev->nd_net, init_net()))
+		goto abort;
+
 	if (!pskb_may_pull(skb, sizeof(struct pppoe_hdr)))
 		goto abort;
 
diff --git a/drivers/net/wan/hdlc.c b/drivers/net/wan/hdlc.c
index db354e0..f3bf160 100644
--- a/drivers/net/wan/hdlc.c
+++ b/drivers/net/wan/hdlc.c
@@ -36,6 +36,7 @@
 #include <linux/rtnetlink.h>
 #include <linux/notifier.h>
 #include <linux/hdlc.h>
+#include <net/net_namespace.h>
 
 
 static const char* version = "HDLC support module revision 1.20";
@@ -62,10 +63,17 @@ static struc

...

[ Show the rest of the message ]

Report message to a moderator

[PATCH RFC 13/31] net: Make device event notification network namespace safe [message #17351 is a reply to message #17338]

Thu, 25 January 2007 19:00

ebiederm
Messages: 1354
Registered: February 2006

Senior Member

From: Eric W. Biederman <ebiederm@xmission.com> - unquoted

Every user of the network device notifiers is either a protocol
stack or a pseudo device.  If a protocol stack that does not have
support for multiple network namespaces receives an event for a
device that is not in the initial network namespace it quite possibly
can get confused and do the wrong thing.

To avoid problems until all of the protocol stacks are converted
this patch modifies all netdev event handlers to ignore events on
devices that are not in the initial network namespace.

As the rest of the code is made network namespace aware these
checks can be removed.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 arch/ia64/hp/sim/simeth.c           |    3 +++
 drivers/net/bonding/bond_main.c     |    3 +++
 drivers/net/hamradio/bpqether.c     |    3 +++
 drivers/net/pppoe.c                 |    3 +++
 drivers/net/wan/dlci.c              |    3 +++
 drivers/net/wan/hdlc.c              |    3 +++
 drivers/net/wan/lapbether.c         |    3 +++
 net/8021q/vlan.c                    |    4 ++++
 net/appletalk/aarp.c                |    3 +++
 net/appletalk/ddp.c                 |    3 +++
 net/atm/clip.c                      |    3 +++
 net/atm/mpc.c                       |    4 ++++
 net/ax25/af_ax25.c                  |    3 +++
 net/bridge/br_notify.c              |    4 ++++
 net/core/dst.c                      |    4 ++++
 net/core/fib_rules.c                |    4 ++++
 net/core/pktgen.c                   |    3 +++
 net/core/rtnetlink.c                |    4 ++++
 net/decnet/af_decnet.c              |    3 +++
 net/econet/af_econet.c              |    3 +++
 net/ipv4/arp.c                      |    3 +++
 net/ipv4/devinet.c                  |    3 +++
 net/ipv4/fib_frontend.c             |    3 +++
 net/ipv4/ipmr.c                     |    7 ++++++-
 net/ipv4/multipath_drr.c            |    3 +++
 net/ipv4/netfilter/ip_queue.c       |    3 +++
 net/ipv4/netfilter/ipt_MASQUERADE.c |    3 +++
 net/ipv6/addrconf.c                 |    3 +++
 net/ipv6/ndisc.c                    |    3 +++
 net/ipv6/netfilter/ip6_queue.c      |    3 +++
 net/ipx/af_ipx.c                    |    3 +++
 net/netfilter/nfnetlink_queue.c     |    3 +++
 net/netrom/af_netrom.c              |    3 +++
 net/packet/af_packet.c              |    3 +++
 net/rose/af_rose.c                  |    3 +++
 net/tipc/eth_media.c                |    3 +++
 net/wanrouter/af_wanpipe.c          |    3 +++
 net/x25/af_x25.c                    |    3 +++
 net/xfrm/xfrm_policy.c              |    5 +++++
 security/selinux/netif.c            |    3 +++
 40 files changed, 131 insertions(+), 1 deletions(-)

diff --git a/arch/ia64/hp/sim/simeth.c b/arch/ia64/hp/sim/simeth.c
index 424e925..1cbaa9e 100644
--- a/arch/ia64/hp/sim/simeth.c
+++ b/arch/ia64/hp/sim/simeth.c
@@ -300,6 +300,9 @@ simeth_device_event(struct notifier_block *this,unsigned long event, void *ptr)
 		return NOTIFY_DONE;
 	}
 
+	if (!net_eq(dev->nd_net, init_net()))
+		return NOTIFY_DONE;
+
 	if ( event != NETDEV_UP && event != NETDEV_DOWN ) return NOTIFY_DONE;
 
 	/*
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 9c70568..3e04f58 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -3325,6 +3325,9 @@ static int bond_netdev_event(struct notifier_block *this, unsigned long event, v
 {
 	struct net_device *event_dev = (struct net_device *)ptr;
 
+	if (!net_eq(event_dev->nd_net, init_net()))
+		return NOTIFY_DONE;
+
 	dprintk("event_dev: %s, event: %lx\n",
 		(event_dev ? event_dev->name : "None"),
 		event);
diff --git a/drivers/net/hamradio/bpqether.c b/drivers/net/hamradio/bpqether.c
index c513e90..8826a96 100644
--- a/drivers/net/hamradio/bpqether.c
+++ b/drivers/net/hamradio/bpqether.c
@@ -564,6 +564,9 @@ static int bpq_device_event(struct notifier_block *this,unsigned long event, voi
 {
 	struct net_device *dev = (struct net_device *)ptr;
 
+	if (!net_eq(dev->nd_net, init_net()))
+		return NOTIFY_DONE;
+
 	if (!dev_is_ethdev(dev))
 		return NOTIFY_DONE;
 
diff --git a/drivers/net/pppoe.c b/drivers/net/pppoe.c
index caf8ca3..3618862 100644
--- a/drivers/net/pppoe.c
+++ b/drivers/net/pppoe.c
@@ -299,6 +299,9 @@ static int pppoe_device_event(struct notifier_block *this,
 {
 	struct net_device *dev = (struct net_device *) ptr;
 
+	if (!net_eq(dev->nd_net, init_net()))
+		return NOTIFY_DONE;
+
 	/* Only look at sockets that are using this specific device. */
 	switch (event) {
 	case NETDEV_CHANGEMTU:
diff --git a/drivers/net/wan/dlci.c b/drivers/net/wan/dlci.c
index 7369875..f826494 100644
--- a/drivers/net/wan/dlci.c
+++ b/drivers/net/wan/dlci.c
@@ -513,6 +513,9 @@ static int dlci_dev_event(struct notifier_block *unused,
 {
 	struct net_device *dev = (struct net_device *) ptr;
 
+	if (!net_eq(dev->nd_net, init_net()))
+		return NOTIFY_DONE;
+
 	if (event == NETDEV_UNREGISTER) {
 		struct dlci_local *dlp;
 
diff --git a/drivers/net/wan/hdlc.c b/drivers/net/wan/hdlc.c
index f3bf160..e56e0a1 100644
--- a/drivers/net/wan/hdlc.c
+++ b/drivers/net/wan/hdlc.c
@@ -110,6 +110,9 @@ static int hdlc_device_event(struct notifier_block *this, unsigned long event,
 	unsigned long flags;
 	int on;
  
+	if (!net_eq(dev->nd_net, init_net()))
+		return NOTIFY_DONE;
+
 	if (dev->get_stats != hdlc_get_stats)
 		return NOTIFY_DONE; /* not an HDLC device */
  
diff --git a/drivers/net/wan/lapbether.c b/drivers/net/wan/lapbether.c
index c1de21e..a3560a9 100644
--- a/drivers/net/wan/lapbether.c
+++ b/drivers/net/wan/lapbether.c
@@ -395,6 +395,9 @@ static int lapbeth_device_event(struct notifier_block *this,
 	struct lapbethdev *lapbeth;
 	struct net_device *dev = ptr;
 
+	if (!net_eq(dev->nd_net, init_net()))
+		return NOTIFY_DONE;
+
 	if (!dev_is_ethdev(dev))
 		return NOTIFY_DONE;
 
diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
index 18fcb9f..f80cfdd 100644
--- a/net/8021q/vlan.c
+++ b/net/8021q/vlan.c
@@ -31,6 +31,7 @@
 #include <net/arp.h>
 #include <linux/rtnetlink.h>
 #include <linux/notifier.h>
+#include <net/net_namespace.h>
 
 #include <linux/if_vlan.h>
 #include "vlan.h"
@@ -595,6 +596,9 @@ static int vlan_device_event(struct notifier_block *unused, unsigned long event,
 	int i, flgs;
 	struct net_device *vlandev;
 
+	if (!net_eq(dev->nd_net, init_net()))
+		return NOTIFY_DONE;
+
 	if (!grp)
 		goto out;
 
diff --git a/net/appletalk/aarp.c b/net/appletalk/aarp.c
index 85c4dbc..6fd58a6 100644
--- a/net/appletalk/aarp.c
+++ b/net/appletalk/aarp.c
@@ -327,6 +327,9 @@ static int aarp_device_event(struct notifier_block *this, unsigned long event,
 	struct net_device *dev = ptr;
 	int ct;
 
+	if (!net_eq(dev->nd_net, init_net()))
+		return NOTIFY_DONE;
+
 	if (event == NETDEV_DOWN) {
 		write_lock_bh(&aarp_lock);
 
diff --git a/net/appletalk/ddp.c b/net/appletalk/ddp.c
index f4ff8aa..61f36b1 100644
--- a/net/appletalk/ddp.c
+++ b/net/appletalk/ddp.c
@@ -649,6 +649,9 @@ static int ddp_device_event(struct notifier_block *this, unsigned long event,
 {
 	struct net_device *dev = ptr;
 
+	if (!net_eq(dev->nd_net, init_net()))
+		return NOTIFY_DONE;
+
 	if (event == NETDEV_DOWN)
 		/* Discard any use of this */
 	        atalk_dev_down(dev);
diff --git a/net/atm/clip.c b/net/atm/clip.c
index 5f8a1d2..7d150c2 100644
--- a/net/atm/clip.c
+++ b/net/atm/clip.c
@@ -629,6 +629,9 @@ static int clip_device_event(struct notifier_block *this, unsigned long event,
 {
 	struct net_device *dev = arg;
 
+	if (!net_eq(dev->nd_net, init_net()))
+		return NOTIFY_DONE;
+
 	if (event == NETDEV_UNREGISTER) {
 		neigh_ifdown(&clip_tbl, dev);
 		return NOTIFY_DONE;
diff --git a/net/atm/mpc.c b/net/atm/mpc.c
index c18f737..4fdb1af 100644
--- a/net/atm/mpc.c
+++ b/net/atm/mpc.c
@@ -953,6 +953,10 @@ static int mpoa_event_listener(struct notifier_block *mpoa_notifier, unsigned lo
 	struct lec_priv *priv;
 
 	dev = (struct net_device *)dev_ptr;
+
+	if (!net_eq(dev->nd_net, init_net()))
+		return NOTIFY_DONE;
+
 	if (dev->name == NULL || strncmp(dev->name, "lec", 3))
 		return NOTIFY_DONE; /* we are only interested in lec:s */
 	
diff --git a/net/ax25/af_ax25.c b/net/ax25/af_ax25.c
index cdbf3f6..8c187a6 100644
--- a/net/ax25/af_ax25.c
+++ b/net/ax25/af_ax25.c
@@ -105,6 +105,9 @@ static int ax25_device_event(struct notifier_block *this, unsigned long event,
 {
 	struct net_device *dev = (struct net_device *)ptr;
 
+	if (!net_eq(dev->nd_net, init_net()))
+		return NOTIFY_DONE;
+
 	/* Reject non AX.25 devices */
 	if (dev->type != ARPHRD_AX25)
 		return NOTIFY_DONE;
diff --git a/net/bridge/br_notify.c b/net/bridge/br_notify.c
index 2027849..0d56bc2 100644
--- a/net/bridge/br_notify.c
+++ b/net/bridge/br_notify.c
@@ -15,6 +15,7 @@
 
 #include <linux/kernel.h>
 #include <linux/rtnetlink.h>
+#include <net/net_namespace.h>
 
 #include "br_private.h"
 
@@ -36,6 +37,9 @@ static int br_device_event(struct notifier_block *unused, unsigned long event, v
 	struct net_bridge_port *p = dev->br_port;
 	struct net_bridge *br;
 
+	if (!net_eq(dev->nd_net, init_net()))
+		return NOTIFY_DONE;
+
 	/* not a port of a bridge */
 	if (p == NULL)
 		return NOTIFY_DONE;
diff --git a/net/core/dst.c b/net/core/dst.c
index 836ec66..8c4a272 100644
--- a/net/core/dst.c
+++ b/net/core/dst.c
@@ -16,6 +16,7 @@
 #include <linux/skbuff.h>
 #include <linux/string.h>
 #include <linux/types.h>
+#include <net/net_namespace.h>
 
 #include <net/dst.h>
 
@@ -256,6 +257,9 @@ static int dst_dev_event(struct notifier_block *this, unsigned long event, void
 	struct net_device *dev = ptr;
 	struct dst_entry *dst;
 
+	if (!net_eq(dev->nd_net, init_net()))
+		return NOTIFY_DONE;
+
 	switch (event) {
 	case NETDEV_UNREGISTER:
 	case NETDEV_DOWN:
diff --git a/net/core/fib_rules.c b/net/core/fib_rules.c
index 1df6cd4..ffc31c1 10064

...

[ Show the rest of the message ]

Report message to a moderator

[PATCH RFC 14/31] net: Support multiple network namespaces with netlink [message #17352 is a reply to message #17338]

Thu, 25 January 2007 19:00

ebiederm
Messages: 1354
Registered: February 2006

Senior Member

From: Eric W. Biederman <ebiederm@xmission.com> - unquoted

Each netlink socket will live in exactly one network namespace,
this includes the controlling kernel sockets.

This patch updates all of the existing netlink protocols
to only support the initial network namespace.  Request
by clients in other namespaces will get -ECONREFUSED.
As they would if the kernel did not have the support for
that netlink protocol compiled in.

As each netlink protocol is updated to be multiple network
namespace safe it can register multiple kernel sockets
to acquire a presence in the rest of the network namespaces.

The implementation in af_netlink is a simple filter implemenation
at hash table insertion and hash table look up time.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 drivers/scsi/scsi_netlink.c         |    2 +-
 drivers/scsi/scsi_transport_iscsi.c |    2 +-
 include/linux/netlink.h             |    3 +-
 kernel/audit.c                      |    4 +-
 lib/kobject_uevent.c                |    4 +-
 net/bridge/netfilter/ebt_ulog.c     |    5 +-
 net/core/rtnetlink.c                |    4 +-
 net/decnet/netfilter/dn_rtmsg.c     |    3 +-
 net/ipv4/fib_frontend.c             |    3 +-
 net/ipv4/inet_diag.c                |    4 +-
 net/ipv4/netfilter/ip_queue.c       |    6 +-
 net/ipv4/netfilter/ipt_ULOG.c       |    4 +-
 net/ipv6/netfilter/ip6_queue.c      |    4 +-
 net/netfilter/nfnetlink.c           |    2 +-
 net/netfilter/nfnetlink_log.c       |    3 +-
 net/netfilter/nfnetlink_queue.c     |    3 +-
 net/netlink/af_netlink.c            |  104 ++++++++++++++++++++++++++---------
 net/netlink/genetlink.c             |    4 +-
 net/xfrm/xfrm_user.c                |    2 +-
 19 files changed, 112 insertions(+), 54 deletions(-)

diff --git a/drivers/scsi/scsi_netlink.c b/drivers/scsi/scsi_netlink.c
index 1b59b27..02c2c1e 100644
--- a/drivers/scsi/scsi_netlink.c
+++ b/drivers/scsi/scsi_netlink.c
@@ -167,7 +167,7 @@ scsi_netlink_init(void)
 		return;
 	}
 
-	scsi_nl_sock = netlink_kernel_create(NETLINK_SCSITRANSPORT,
+	scsi_nl_sock = netlink_kernel_create(init_net(), NETLINK_SCSITRANSPORT,
 				SCSI_NL_GRP_CNT, scsi_nl_rcv, THIS_MODULE);
 	if (!scsi_nl_sock) {
 		printk(KERN_ERR "%s: register of recieve handler failed\n",
diff --git a/drivers/scsi/scsi_transport_iscsi.c b/drivers/scsi/scsi_transport_iscsi.c
index 9c22f13..1ad22c2 100644
--- a/drivers/scsi/scsi_transport_iscsi.c
+++ b/drivers/scsi/scsi_transport_iscsi.c
@@ -1435,7 +1435,7 @@ static __init int iscsi_transport_init(void)
 	if (err)
 		goto unregister_conn_class;
 
-	nls = netlink_kernel_create(NETLINK_ISCSI, 1, iscsi_if_rx,
+	nls = netlink_kernel_create(init_net(), NETLINK_ISCSI, 1, iscsi_if_rx,
 			THIS_MODULE);
 	if (!nls) {
 		err = -ENOBUFS;
diff --git a/include/linux/netlink.h b/include/linux/netlink.h
index b3b9b60..9dacd00 100644
--- a/include/linux/netlink.h
+++ b/include/linux/netlink.h
@@ -151,7 +151,7 @@ struct netlink_skb_parms
 #define NETLINK_CREDS(skb)	(&NETLINK_CB((skb)).creds)
 
 
-extern struct sock *netlink_kernel_create(int unit, unsigned int groups, void (*input)(struct sock *sk, int len), struct module *module);
+extern struct sock *netlink_kernel_create(net_t net, int unit, unsigned int groups, void (*input)(struct sock *sk, int len), struct module *module);
 extern void netlink_ack(struct sk_buff *in_skb, struct nlmsghdr *nlh, int err);
 extern int netlink_has_listeners(struct sock *sk, unsigned int group);
 extern int netlink_unicast(struct sock *ssk, struct sk_buff *skb, __u32 pid, int nonblock);
@@ -188,6 +188,7 @@ struct netlink_callback
 
 struct netlink_notify
 {
+	net_t net;
 	int pid;
 	int protocol;
 };
diff --git a/kernel/audit.c b/kernel/audit.c
index d9b690a..b0c5c61 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -696,8 +696,8 @@ static int __init audit_init(void)
 
 	printk(KERN_INFO "audit: initializing netlink socket (%s)\n",
 	       audit_default ? "enabled" : "disabled");
-	audit_sock = netlink_kernel_create(NETLINK_AUDIT, 0, audit_receive,
-					   THIS_MODULE);
+	audit_sock = netlink_kernel_create(init_net(), NETLINK_AUDIT, 0,
+					   audit_receive, THIS_MODULE);
 	if (!audit_sock)
 		audit_panic("cannot initialize netlink socket");
 	else
diff --git a/lib/kobject_uevent.c b/lib/kobject_uevent.c
index 84272ed..9a5d4ca 100644
--- a/lib/kobject_uevent.c
+++ b/lib/kobject_uevent.c
@@ -292,8 +292,8 @@ EXPORT_SYMBOL_GPL(add_uevent_var);
 #if defined(CONFIG_NET)
 static int __init kobject_uevent_init(void)
 {
-	uevent_sock = netlink_kernel_create(NETLINK_KOBJECT_UEVENT, 1, NULL,
-					    THIS_MODULE);
+	uevent_sock = netlink_kernel_create(init_net(), NETLINK_KOBJECT_UEVENT, 1,
+					    NULL, THIS_MODULE);
 
 	if (!uevent_sock) {
 		printk(KERN_ERR
diff --git a/net/bridge/netfilter/ebt_ulog.c b/net/bridge/netfilter/ebt_ulog.c
index c1af68b..abf2be7 100644
--- a/net/bridge/netfilter/ebt_ulog.c
+++ b/net/bridge/netfilter/ebt_ulog.c
@@ -301,8 +301,9 @@ static int __init ebt_ulog_init(void)
 		spin_lock_init(&ulog_buffers[i].lock);
 	}
 
-	ebtulognl = netlink_kernel_create(NETLINK_NFLOG, EBT_ULOG_MAXNLGROUPS,
-	                                  NULL, THIS_MODULE);
+	ebtulognl = netlink_kernel_create(init_net(), NETLINK_NFLOG,
+					  EBT_ULOG_MAXNLGROUPS, NULL,
+					  THIS_MODULE);
 	if (!ebtulognl)
 		ret = -ENOMEM;
 	else if ((ret = ebt_register_watcher(&ulog)))
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 7841e89..8f3dda8 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -870,8 +870,8 @@ void __init rtnetlink_init(void)
 	if (!rta_buf)
 		panic("rtnetlink_init: cannot allocate rta_buf\n");
 
-	rtnl = netlink_kernel_create(NETLINK_ROUTE, RTNLGRP_MAX, rtnetlink_rcv,
-	                             THIS_MODULE);
+	rtnl = netlink_kernel_create(init_net(), NETLINK_ROUTE, RTNLGRP_MAX,
+				      rtnetlink_rcv, THIS_MODULE);
 	if (rtnl == NULL)
 		panic("rtnetlink_init: cannot initialize rtnetlink\n");
 	netlink_set_nonroot(NETLINK_ROUTE, NL_NONROOT_RECV);
diff --git a/net/decnet/netfilter/dn_rtmsg.c b/net/decnet/netfilter/dn_rtmsg.c
index 8b99bd3..14089ed 100644
--- a/net/decnet/netfilter/dn_rtmsg.c
+++ b/net/decnet/netfilter/dn_rtmsg.c
@@ -137,7 +137,8 @@ static int __init dn_rtmsg_init(void)
 {
 	int rv = 0;
 
-	dnrmg = netlink_kernel_create(NETLINK_DNRTMSG, DNRNG_NLGRP_MAX,
+	dnrmg = netlink_kernel_create(init_net(),
+				      NETLINK_DNRTMSG, DNRNG_NLGRP_MAX,
 	                              dnrmg_receive_user_sk, THIS_MODULE);
 	if (dnrmg == NULL) {
 		printk(KERN_ERR "dn_rtmsg: Cannot create netlink socket");
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 049c370..d1859ff 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -817,7 +817,8 @@ static void nl_fib_input(struct sock *sk, int len)
 
 static void nl_fib_lookup_init(void)
 {
-      netlink_kernel_create(NETLINK_FIB_LOOKUP, 0, nl_fib_input, THIS_MODULE);
+	netlink_kernel_create(init_net(), NETLINK_FIB_LOOKUP, 0, nl_fib_input,
+			      THIS_MODULE);
 }
 
 static void fib_disable_ip(struct net_device *dev, int force)
diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c
index 77761ac..bdf3064 100644
--- a/net/ipv4/inet_diag.c
+++ b/net/ipv4/inet_diag.c
@@ -913,8 +913,8 @@ static int __init inet_diag_init(void)
 	if (!inet_diag_table)
 		goto out;
 
-	idiagnl = netlink_kernel_create(NETLINK_INET_DIAG, 0, inet_diag_rcv,
-					THIS_MODULE);
+	idiagnl = netlink_kernel_create(init_net(), NETLINK_INET_DIAG, 0,
+					inet_diag_rcv, THIS_MODULE);
 	if (idiagnl == NULL)
 		goto out_free_table;
 	err = 0;
diff --git a/net/ipv4/netfilter/ip_queue.c b/net/ipv4/netfilter/ip_queue.c
index 8650a57..d1c42b5 100644
--- a/net/ipv4/netfilter/ip_queue.c
+++ b/net/ipv4/netfilter/ip_queue.c
@@ -589,7 +589,7 @@ ipq_rcv_nl_event(struct notifier_block *this,
 	if (event == NETLINK_URELEASE &&
 	    n->protocol == NETLINK_FIREWALL && n->pid) {
 		write_lock_bh(&queue_lock);
-		if (n->pid == peer_pid)
+		if (net_eq(n->net, init_net()) && (n->pid == peer_pid))
 			__ipq_reset();
 		write_unlock_bh(&queue_lock);
 	}
@@ -681,8 +681,8 @@ static int __init ip_queue_init(void)
 	struct proc_dir_entry *proc;
 	
 	netlink_register_notifier(&ipq_nl_notifier);
-	ipqnl = netlink_kernel_create(NETLINK_FIREWALL, 0, ipq_rcv_sk,
-				      THIS_MODULE);
+	ipqnl = netlink_kernel_create(init_net(), NETLINK_FIREWALL, 0,
+				      ipq_rcv_sk, THIS_MODULE);
 	if (ipqnl == NULL) {
 		printk(KERN_ERR "ip_queue: failed to create netlink socket\n");
 		goto cleanup_netlink_notifier;
diff --git a/net/ipv4/netfilter/ipt_ULOG.c b/net/ipv4/netfilter/ipt_ULOG.c
index dbd3478..8071d15 100644
--- a/net/ipv4/netfilter/ipt_ULOG.c
+++ b/net/ipv4/netfilter/ipt_ULOG.c
@@ -395,8 +395,8 @@ static int __init ipt_ulog_init(void)
 		ulog_buffers[i].timer.data = i;
 	}
 
-	nflognl = netlink_kernel_create(NETLINK_NFLOG, ULOG_MAXNLGROUPS, NULL,
-	                                THIS_MODULE);
+	nflognl = netlink_kernel_create(init_net(), NETLINK_NFLOG,
+					ULOG_MAXNLGROUPS, NULL, THIS_MODULE);
 	if (!nflognl)
 		return -ENOMEM;
 
diff --git a/net/ipv6/netfilter/ip6_queue.c b/net/ipv6/netfilter/ip6_queue.c
index f6e108c..02589b2 100644
--- a/net/ipv6/netfilter/ip6_queue.c
+++ b/net/ipv6/netfilter/ip6_queue.c
@@ -579,7 +579,7 @@ ipq_rcv_nl_event(struct notifier_block *this,
 	if (event == NETLINK_URELEASE &&
 	    n->protocol == NETLINK_IP6_FW && n->pid) {
 		write_lock_bh(&queue_lock);
-		if (n->pid == peer_pid)
+		if (net_eq(n->net, init_net()) && (n->pid == peer_pid))
 			__ipq_reset();
 		write_unlock_bh(&queue_lock);
 	}
@@ -671,7 +671,7 @@ static int __init ip6_queue_init(void)
 	struct proc_dir_entry *proc;
 	
 	netlink_register_notifier(&ipq_nl_notifier);
-	ipqnl = netlink_kernel_create(NETLINK_IP6_FW, 0, ipq_rcv_sk

...

[ Show the rest of the message ]

Report message to a moderator

[PATCH RFC 15/31] net: Make the loopback device per network namespace [message #17353 is a reply to message #17338]

Thu, 25 January 2007 19:00

ebiederm
Messages: 1354
Registered: February 2006

Senior Member

From: Eric W. Biederman <ebiederm@xmission.com> - unquoted

This patch makes the loopback_dev per network namespace.
The loopback device registers itself as a pernet_device so
we can register the new loopback_dev instance when we add
a new network namespace and so we can unregister the
loopback device when we destory the network namespace.

Currently the loopback device statitics are kept accross
all loopback devices, a minor glitch that will not affect
correct operation but something we may want to fix.

This patch modifies all users the loopback_dev so they
access it as per_net(loopback_dev, init_net()), keeping all of the
code compiling and working.  A later pass will be needed to
update the users to use something other than the initial network
namespace.

The only non-trivial modification was the ipv6 code in route.c as the
loopback_dev can no longer be used in static initializers, and
even that change was very simple.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 drivers/net/loopback.c           |   24 ++++++++++++++++++++----
 include/linux/netdevice.h        |    2 +-
 net/core/dst.c                   |    8 ++++----
 net/decnet/dn_dev.c              |    4 ++--
 net/decnet/dn_route.c            |   14 +++++++-------
 net/ipv4/devinet.c               |    4 ++--
 net/ipv4/ipconfig.c              |    8 +++++---
 net/ipv4/ipvs/ip_vs_core.c       |    2 +-
 net/ipv4/route.c                 |   18 +++++++++---------
 net/ipv4/xfrm4_policy.c          |    2 +-
 net/ipv6/addrconf.c              |    8 ++++----
 net/ipv6/netfilter/ip6t_REJECT.c |    2 +-
 net/ipv6/route.c                 |   24 +++++++++++++++---------
 net/ipv6/xfrm6_policy.c          |    2 +-
 net/xfrm/xfrm_policy.c           |    4 ++--
 15 files changed, 75 insertions(+), 51 deletions(-)

diff --git a/drivers/net/loopback.c b/drivers/net/loopback.c
index 22b672d..e9abf3f 100644
--- a/drivers/net/loopback.c
+++ b/drivers/net/loopback.c
@@ -57,6 +57,7 @@
 #include <linux/ip.h>
 #include <linux/tcp.h>
 #include <linux/percpu.h>
+#include <net/net_namespace.h>
 
 struct pcpu_lstats {
 	unsigned long packets;
@@ -204,7 +205,7 @@ static const struct ethtool_ops loopback_ethtool_ops = {
  * The loopback device is special. There is only one instance and
  * it is statically allocated. Don't do this for other devices.
  */
-struct net_device loopback_dev = {
+DEFINE_PER_NET(struct net_device, loopback_dev) = {
 	.name	 		= "lo",
 	.get_stats		= &get_stats,
 	.priv			= &loopback_stats,
@@ -228,13 +229,28 @@ struct net_device loopback_dev = {
 	.ethtool_ops		= &loopback_ethtool_ops,
 };
 
+static int loopback_net_init(net_t net)
+{
+	per_net(loopback_dev, net).nd_net = net;
+	return register_netdev(&per_net(loopback_dev, net));
+}
+
+static void loopback_net_exit(net_t net)
+{
+	unregister_netdev(&per_net(loopback_dev, net));
+}
+
+static struct pernet_operations loopback_net_ops = {
+	.init = loopback_net_init,
+	.exit = loopback_net_exit,
+};
+
 /* Setup and register the loopback device. */
 static int __init loopback_init(void)
 {
-	loopback_dev.nd_net = init_net();
-	return register_netdev(&loopback_dev);
+	return register_pernet_device(&loopback_net_ops);
 };
 
 module_init(loopback_init);
 
-EXPORT_SYMBOL(loopback_dev);
+EXPORT_PER_NET_SYMBOL(loopback_dev);
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 9e28671..73931a0 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -570,7 +570,7 @@ struct packet_type {
 #include <linux/interrupt.h>
 #include <linux/notifier.h>
 
-extern struct net_device		loopback_dev;		/* The loopback */
+DECLARE_PER_NET(struct net_device,	loopback_dev);		/* The loopback */
 extern struct net_device		*dev_base;		/* All devices */
 extern rwlock_t				dev_base_lock;		/* Device list lock */
 
diff --git a/net/core/dst.c b/net/core/dst.c
index 8c4a272..3435771 100644
--- a/net/core/dst.c
+++ b/net/core/dst.c
@@ -241,13 +241,13 @@ static inline void dst_ifdown(struct dst_entry *dst, struct net_device *dev,
 		dst->input = dst_discard_in;
 		dst->output = dst_discard_out;
 	} else {
-		dst->dev = &loopback_dev;
-		dev_hold(&loopback_dev);
+		dst->dev = &per_net(loopback_dev, init_net());
+		dev_hold(dst->dev);
 		dev_put(dev);
 		if (dst->neighbour && dst->neighbour->dev == dev) {
-			dst->neighbour->dev = &loopback_dev;
+			dst->neighbour->dev = &per_net(loopback_dev, init_net());
 			dev_put(dev);
-			dev_hold(&loopback_dev);
+			dev_hold(dst->neighbour->dev);
 		}
 	}
 }
diff --git a/net/decnet/dn_dev.c b/net/decnet/dn_dev.c
index 19b1469..dbaf001 100644
--- a/net/decnet/dn_dev.c
+++ b/net/decnet/dn_dev.c
@@ -866,10 +866,10 @@ last_chance:
 		rv = dn_dev_get_first(dev, addr);
 		read_unlock(&dev_base_lock);
 		dev_put(dev);
-		if (rv == 0 || dev == &loopback_dev)
+		if (rv == 0 || dev == &per_net(loopback_dev, init_net()))
 			return rv;
 	}
-	dev = &loopback_dev;
+	dev = &per_net(loopback_dev, init_net());
 	dev_hold(dev);
 	goto last_chance;
 }
diff --git a/net/decnet/dn_route.c b/net/decnet/dn_route.c
index 4263cd9..b553cd4 100644
--- a/net/decnet/dn_route.c
+++ b/net/decnet/dn_route.c
@@ -887,7 +887,7 @@ static int dn_route_output_slow(struct dst_entry **pprt, const struct flowi *old
 					.scope = RT_SCOPE_UNIVERSE,
 				     } },
 			    .mark = oldflp->mark,
-			    .iif = loopback_dev.ifindex,
+			    .iif = per_net(loopback_dev, init_net()).ifindex,
 			    .oif = oldflp->oif };
 	struct dn_route *rt = NULL;
 	struct net_device *dev_out = NULL;
@@ -904,7 +904,7 @@ static int dn_route_output_slow(struct dst_entry **pprt, const struct flowi *old
 		       "dn_route_output_slow: dst=%04x src=%04x mark=%d"
 		       " iif=%d oif=%d\n", dn_ntohs(oldflp->fld_dst),
 		       dn_ntohs(oldflp->fld_src),
-                       oldflp->mark, loopback_dev.ifindex, oldflp->oif);
+                       oldflp->mark, per_net(loopback_dev, init_net()).ifindex, oldflp->oif);
 
 	/* If we have an output interface, verify its a DECnet device */
 	if (oldflp->oif) {
@@ -955,7 +955,7 @@ source_ok:
 		err = -EADDRNOTAVAIL;
 		if (dev_out)
 			dev_put(dev_out);
-		dev_out = &loopback_dev;
+		dev_out = &per_net(loopback_dev, init_net());
 		dev_hold(dev_out);
 		if (!fl.fld_dst) {
 			fl.fld_dst =
@@ -964,7 +964,7 @@ source_ok:
 			if (!fl.fld_dst)
 				goto out;
 		}
-		fl.oif = loopback_dev.ifindex;
+		fl.oif = per_net(loopback_dev, init_net()).ifindex;
 		res.type = RTN_LOCAL;
 		goto make_route;
 	}
@@ -1010,7 +1010,7 @@ source_ok:
 					if (dev_out)
 						dev_put(dev_out);
 					if (dn_dev_islocal(neigh->dev, fl.fld_dst)) {
-						dev_out = &loopback_dev;
+						dev_out = &per_net(loopback_dev, init_net());
 						res.type = RTN_LOCAL;
 					} else {
 						dev_out = neigh->dev;
@@ -1031,7 +1031,7 @@ source_ok:
 		/* Possible improvement - check all devices for local addr */
 		if (dn_dev_islocal(dev_out, fl.fld_dst)) {
 			dev_put(dev_out);
-			dev_out = &loopback_dev;
+			dev_out = &per_net(loopback_dev, init_net());
 			dev_hold(dev_out);
 			res.type = RTN_LOCAL;
 			goto select_source;
@@ -1067,7 +1067,7 @@ select_source:
 			fl.fld_src = fl.fld_dst;
 		if (dev_out)
 			dev_put(dev_out);
-		dev_out = &loopback_dev;
+		dev_out = &per_net(loopback_dev, init_net());
 		dev_hold(dev_out);
 		fl.oif = dev_out->ifindex;
 		if (res.fi)
diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index a7d991d..201442c 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -1056,7 +1056,7 @@ static int inetdev_event(struct notifier_block *this, unsigned long event,
 	ASSERT_RTNL();
 
 	if (!in_dev) {
-		if (event == NETDEV_REGISTER && dev == &loopback_dev) {
+		if (event == NETDEV_REGISTER && dev == &per_net(loopback_dev, init_net())) {
 			in_dev = inetdev_init(dev);
 			if (!in_dev)
 				panic("devinet: Failed to create loopback\n");
@@ -1074,7 +1074,7 @@ static int inetdev_event(struct notifier_block *this, unsigned long event,
 	case NETDEV_UP:
 		if (dev->mtu < 68)
 			break;
-		if (dev == &loopback_dev) {
+		if (dev == &per_net(loopback_dev, init_net())) {
 			struct in_ifaddr *ifa;
 			if ((ifa = inet_alloc_ifa()) != NULL) {
 				ifa->ifa_local =
diff --git a/net/ipv4/ipconfig.c b/net/ipv4/ipconfig.c
index 91b5729..ee77938 100644
--- a/net/ipv4/ipconfig.c
+++ b/net/ipv4/ipconfig.c
@@ -185,16 +185,18 @@ static int __init ic_open_devs(void)
 	struct ic_device *d, **last;
 	struct net_device *dev;
 	unsigned short oflags;
+	struct net_device *lo;
 
 	last = &ic_first_dev;
 	rtnl_lock();
 
 	/* bring loopback device up first */
-	if (dev_change_flags(&loopback_dev, loopback_dev.flags | IFF_UP) < 0)
-		printk(KERN_ERR "IP-Config: Failed to open %s\n", loopback_dev.name);
+	lo = &per_net(loopback_dev, init_net());
+	if (dev_change_flags(lo, lo->flags | IFF_UP) < 0)
+		printk(KERN_ERR "IP-Config: Failed to open %s\n", lo->name);
 
 	for (dev = dev_base; dev; dev = dev->next) {
-		if (dev == &loopback_dev)
+		if (dev == lo)
 			continue;
 		if (user_dev_name[0] ? !strcmp(dev->name, user_dev_name) :
 		    (!(dev->flags & IFF_LOOPBACK) &&
diff --git a/net/ipv4/ipvs/ip_vs_core.c b/net/ipv4/ipvs/ip_vs_core.c
index 3425752..2e1e41f 100644
--- a/net/ipv4/ipvs/ip_vs_core.c
+++ b/net/ipv4/ipvs/ip_vs_core.c
@@ -963,7 +963,7 @@ ip_vs_in(unsigned int hooknum, struct sk_buff **pskb,
 	 *	... don't know why 1st test DOES NOT include 2nd (?)
 	 */
 	if (unlikely(skb->pkt_type != PACKET_HOST
-		     || skb->dev == &loopback_dev || skb->sk)) {
+		     || skb->dev == &per_net(loopback_dev, init_net()) || skb->sk)) {
 		IP_VS_DBG(12, "packet type=%d proto=%d daddr=%d.%d.%d.%d ign

...

[ Show the rest of the message ]

Report message to a moderator

[PATCH RFC 16/31] net: Make the device list and device lookups per namespace. [message #17354 is a reply to message #17338]

Thu, 25 January 2007 19:00

ebiederm
Messages: 1354
Registered: February 2006

Senior Member

From: Eric W. Biederman <ebiederm@xmission.com> - unquoted

This patch makes most of the generic device layer network
namespace safe.  This patch makes dev_base, dev_base_lock
per network namespace variables, and then it picks up
a few associated variables.  The funnctions:
dev_getbyhwaddr
dev_getfirsthwbytype
dev_get_by_flags
dev_get_by_name
__dev_get_by_name
dev_get_by_index
__dev_get_by_index
dev_ioctl
dev_ethtool
dev_load
wireless_process_ioctl

were modified to take a network namespace argument, and
deal with it.

vlan_ioctl_set and brioctl_set were modified so their
hooks will receive a network namespace argument.

So basically anthing in the core of the network stack that was
affected to by the change of dev_base and dev_base_lock was
modified to handle multiple network namespaces.  The rest of the
network stack was simply modified to explicitly use init_net()
the initial network namespace.  This can be fixed when those
components of the network stack are modified to handle multiple
network namespaces.

For now the ifindex generator is left global.

Fundametally ifindex numbers are per namespace, or else
we will have corner case problems with migration when
we get that far.

At the same time there are assumptions in the network stack
that the ifindex of a network device won't change.  Making
the ifindex number global seems a good compromise until
the network stack can cope with ifindex changes when
you change namespaces, and the like.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 arch/s390/appldata/appldata_net_sum.c |    6 +-
 arch/sparc64/solaris/ioctl.c          |    6 +-
 drivers/atm/idt77252.c                |    2 +-
 drivers/block/aoe/aoecmd.c            |    7 +-
 drivers/net/bonding/bond_main.c       |    6 +-
 drivers/net/bonding/bond_sysfs.c      |    3 +-
 drivers/net/eql.c                     |    9 +-
 drivers/net/pppoe.c                   |    2 +-
 drivers/net/shaper.c                  |    3 +-
 drivers/net/tun.c                     |    3 +-
 drivers/net/wan/dlci.c                |    4 +-
 drivers/net/wan/sbni.c                |    2 +-
 drivers/net/wireless/strip.c          |    8 +-
 drivers/parisc/led.c                  |    6 +-
 include/linux/if_bridge.h             |    2 +-
 include/linux/if_vlan.h               |    2 +-
 include/linux/netdevice.h             |   24 ++--
 include/net/iw_handler.h              |    2 +-
 net/802/tr.c                          |    2 +-
 net/8021q/vlan.c                      |   10 +-
 net/8021q/vlan_dev.c                  |   12 +-
 net/8021q/vlanproc.c                  |    8 +-
 net/appletalk/ddp.c                   |    6 +-
 net/atm/mpc.c                         |    2 +-
 net/ax25/af_ax25.c                    |    2 +-
 net/bridge/br_if.c                    |    6 +-
 net/bridge/br_ioctl.c                 |    7 +-
 net/bridge/br_netlink.c               |    9 +-
 net/bridge/br_private.h               |    2 +-
 net/core/dev.c                        |  282 +++++++++++++++++++++------------
 net/core/dev_mcast.c                  |   46 +++++-
 net/core/ethtool.c                    |    4 +-
 net/core/fib_rules.c                  |    4 +-
 net/core/link_watch.c                 |    5 +-
 net/core/neighbour.c                  |    6 +-
 net/core/net-sysfs.c                  |   27 ++--
 net/core/netpoll.c                    |    2 +-
 net/core/pktgen.c                     |    2 +-
 net/core/rtnetlink.c                  |   24 ++--
 net/core/sock.c                       |    3 +-
 net/core/wireless.c                   |   43 +++++-
 net/decnet/af_decnet.c                |    6 +-
 net/decnet/dn_dev.c                   |   32 ++--
 net/decnet/dn_fib.c                   |   12 +-
 net/decnet/dn_route.c                 |   14 +-
 net/decnet/sysctl_net_decnet.c        |    4 +-
 net/econet/af_econet.c                |    2 +-
 net/ipv4/arp.c                        |    4 +-
 net/ipv4/devinet.c                    |   36 ++--
 net/ipv4/fib_frontend.c               |    2 +-
 net/ipv4/fib_semantics.c              |    4 +-
 net/ipv4/igmp.c                       |   12 +-
 net/ipv4/ip_fragment.c                |    2 +-
 net/ipv4/ip_gre.c                     |    4 +-
 net/ipv4/ip_sockglue.c                |    2 +-
 net/ipv4/ipconfig.c                   |    2 +-
 net/ipv4/ipip.c                       |    4 +-
 net/ipv4/ipmr.c                       |    4 +-
 net/ipv4/ipvs/ip_vs_sync.c            |   10 +-
 net/ipv4/netfilter/ipt_CLUSTERIP.c    |    2 +-
 net/ipv4/route.c                      |    4 +-
 net/ipv6/addrconf.c                   |   44 +++---
 net/ipv6/af_inet6.c                   |    2 +-
 net/ipv6/anycast.c                    |   20 ++--
 net/ipv6/datagram.c                   |    2 +-
 net/ipv6/ip6_tunnel.c                 |    6 +-
 net/ipv6/ipv6_sockglue.c              |    2 +-
 net/ipv6/mcast.c                      |   20 ++--
 net/ipv6/raw.c                        |    2 +-
 net/ipv6/reassembly.c                 |    2 +-
 net/ipv6/route.c                      |    4 +-
 net/ipv6/sit.c                        |    4 +-
 net/ipx/af_ipx.c                      |    6 +-
 net/llc/af_llc.c                      |    4 +-
 net/llc/llc_core.c                    |    5 +-
 net/netrom/nr_route.c                 |   14 +-
 net/packet/af_packet.c                |   18 +-
 net/rose/rose_route.c                 |   20 ++--
 net/sched/act_mirred.c                |    2 +-
 net/sched/cls_api.c                   |    4 +-
 net/sched/em_meta.c                   |    2 +-
 net/sched/sch_api.c                   |   14 +-
 net/sctp/ipv6.c                       |    4 +-
 net/sctp/protocol.c                   |    6 +-
 net/socket.c                          |   22 ++-
 net/tipc/eth_media.c                  |    2 +-
 net/wanrouter/af_wanpipe.c            |   24 ++--
 net/x25/x25_route.c                   |    2 +-
 88 files changed, 597 insertions(+), 433 deletions(-)

diff --git a/arch/s390/appldata/appldata_net_sum.c b/arch/s390/appldata/appldata_net_sum.c
index 075e619..4a32370 100644
--- a/arch/s390/appldata/appldata_net_sum.c
+++ b/arch/s390/appldata/appldata_net_sum.c
@@ -106,8 +106,8 @@ static void appldata_get_net_sum_data(void *data)
 	rx_dropped = 0;
 	tx_dropped = 0;
 	collisions = 0;
-	read_lock(&dev_base_lock);
-	for (dev = dev_base; dev != NULL; dev = dev->next) {
+	read_lock(&per_net(dev_base_lock, init_net()));
+	for (dev = per_net(dev_base, init_net()); dev != NULL; dev = dev->next) {
 		if (dev->get_stats == NULL) {
 			continue;
 		}
@@ -123,7 +123,7 @@ static void appldata_get_net_sum_data(void *data)
 		collisions += stats->collisions;
 		i++;
 	}
-	read_unlock(&dev_base_lock);
+	read_unlock(&per_net(dev_base_lock, init_net()));
 	net_data->nr_interfaces = i;
 	net_data->rx_packets = rx_packets;
 	net_data->tx_packets = tx_packets;
diff --git a/arch/sparc64/solaris/ioctl.c b/arch/sparc64/solaris/ioctl.c
index 330743c..1ecf4ab 100644
--- a/arch/sparc64/solaris/ioctl.c
+++ b/arch/sparc64/solaris/ioctl.c
@@ -685,9 +685,9 @@ static inline int solaris_i(unsigned int fd, unsigned int cmd, u32 arg)
 			struct net_device *d;
 			int i = 0;
 			
-			read_lock_bh(&dev_base_lock);
-			for (d = dev_base; d; d = d->next) i++;
-			read_unlock_bh(&dev_base_lock);
+			read_lock_bh(&per_net(dev_base_lock, init_net()));
+			for (d = per_net(dev_base, init_net()); d; d = d->next) i++;
+			read_unlock_bh(&per_net(dev_base_lock, init_net()));
 
 			if (put_user (i, (int __user *)A(arg)))
 				return -EFAULT;
diff --git a/drivers/atm/idt77252.c b/drivers/atm/idt77252.c
index f407861..3e75e0e 100644
--- a/drivers/atm/idt77252.c
+++ b/drivers/atm/idt77252.c
@@ -3569,7 +3569,7 @@ init_card(struct atm_dev *dev)
 	 * XXX: <hack>
 	 */
 	sprintf(tname, "eth%d", card->index);
-	tmp = dev_get_by_name(tname);	/* jhs: was "tmp = dev_get(tname);" */
+	tmp = dev_get_by_name(init_net(), tname);	/* jhs: was "tmp = dev_get(tname);" */
 	if (tmp) {
 		memcpy(card->atmdev->esi, tmp->dev_addr, 6);
 
diff --git a/drivers/block/aoe/aoecmd.c b/drivers/block/aoe/aoecmd.c
index bb022ed..9678169 100644
--- a/drivers/block/aoe/aoecmd.c
+++ b/drivers/block/aoe/aoecmd.c
@@ -9,6 +9,7 @@
 #include <linux/skbuff.h>
 #include <linux/netdevice.h>
 #include <linux/genhd.h>
+#include <net/net_namespace.h>
 #include <asm/unaligned.h>
 #include "aoe.h"
 
@@ -192,8 +193,8 @@ aoecmd_cfg_pkts(ushort aoemajor, unsigned char aoeminor, struct sk_buff **tail)
 
 	sl = sl_tail = NULL;
 
-	read_lock(&dev_base_lock);
-	for (ifp = dev_base; ifp; dev_put(ifp), ifp = ifp->next) {
+	read_lock(&per_net(dev_base_lock, init_net()));
+	for (ifp = per_net(dev_base, init_net()); ifp; dev_put(ifp), ifp = ifp->next) {
 		dev_hold(ifp);
 		if (!is_aoe_netif(ifp))
 			continue;
@@ -221,7 +222,7 @@ aoecmd_cfg_pkts(ushort aoemajor, unsigned char aoeminor, struct sk_buff **tail)
 		skb->next = sl;
 		sl = skb;
 	}
-	read_unlock(&dev_base_lock);
+	read_unlock(&per_net(dev_base_lock, init_net()));
 
 	if (tail != NULL)
 		*tail = sl_tail;
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 3e04f58..2963004 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -2932,7 +2932,7 @@ static void *bond_info_seq_start(struct seq_file *seq, loff_t *pos)
 	int i;
 
 	/* make sure the bond won't be taken away */
-	read_lock(&dev_base_lock);
+	read_lock(&per_net(dev_base_lock, init_net()));
 	read_lock_bh(&bond->lock);
 
 	if (*pos == 0) {
@@ -2968,7 +2968,7 @@ static void bond_info_seq_stop(struct seq_file *seq, void *v)
 	struct bonding *bond = seq->private;
 
 	read_unlock_bh(&bond->lock);
-	read_unlock(&dev_base_lock);
+	read_unlock(&per_net(dev_base_lock, init_net()));
 }
 
 static void bond_info_show_master(struct seq_file *se

...

[ Show the rest of the message ]

Report message to a moderator

[PATCH RFC 17/31] net: Factor out __dev_alloc_name from dev_alloc_name [message #17355 is a reply to message #17338]

Thu, 25 January 2007 19:00

ebiederm
Messages: 1354
Registered: February 2006

Senior Member

From: Eric W. Biederman <ebiederm@xmission.com> - unquoted

When forcibly changing the network namespace of a device
I need something that can generate a name for the device
in the new namespace without overwriting the old name.

__dev_alloc_name provides me that functionality.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 net/core/dev.c |   44 +++++++++++++++++++++++++++++++++-----------
 1 files changed, 33 insertions(+), 11 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 32fe905..fc0d2af 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -655,9 +655,10 @@ int dev_valid_name(const char *name)
 }
 
 /**
- *	dev_alloc_name - allocate a name for a device
- *	@dev: device
+ *	__dev_alloc_name - allocate a name for a device
+ *	@net: network namespace to allocate the device name in
  *	@name: name format string
+ *	@buf:  scratch buffer and result name string
  *
  *	Passed a format string - eg "lt%d" it will try and find a suitable
  *	id. It scans list of devices to build up a free map, then chooses
@@ -668,18 +669,13 @@ int dev_valid_name(const char *name)
  *	Returns the number of the unit assigned or a negative errno code.
  */
 
-int dev_alloc_name(struct net_device *dev, const char *name)
+static int __dev_alloc_name(net_t net, const char *name, char buf[IFNAMSIZ])
 {
 	int i = 0;
-	char buf[IFNAMSIZ];
 	const char *p;
 	const int max_netdevices = 8*PAGE_SIZE;
 	long *inuse;
 	struct net_device *d;
-	net_t net;
-
-	BUG_ON(null_net(dev->nd_net));
-	net = dev->nd_net;
 
 	p = strnchr(name, IFNAMSIZ-1, '%');
 	if (p) {
@@ -713,10 +709,8 @@ int dev_alloc_name(struct net_device *dev, const char *name)
 	}
 
 	snprintf(buf, sizeof(buf), name, i);
-	if (!__dev_get_by_name(net, buf)) {
-		strlcpy(dev->name, buf, IFNAMSIZ);
+	if (!__dev_get_by_name(net, buf))
 		return i;
-	}
 
 	/* It is possible to run out of possible slots
 	 * when the name is long and there isn't enough space left
@@ -725,6 +719,34 @@ int dev_alloc_name(struct net_device *dev, const char *name)
 	return -ENFILE;
 }
 
+/**
+ *	dev_alloc_name - allocate a name for a device
+ *	@dev: device
+ *	@name: name format string
+ *
+ *	Passed a format string - eg "lt%d" it will try and find a suitable
+ *	id. It scans list of devices to build up a free map, then chooses
+ *	the first empty slot. The caller must hold the dev_base or rtnl lock
+ *	while allocating the name and adding the device in order to avoid
+ *	duplicates.
+ *	Limited to bits_per_byte * page size devices (ie 32K on most platforms).
+ *	Returns the number of the unit assigned or a negative errno code.
+ */
+
+int dev_alloc_name(struct net_device *dev, const char *name)
+{
+	char buf[IFNAMSIZ];
+	net_t net;
+	int ret;
+
+	BUG_ON(null_net(dev->nd_net));
+	net = dev->nd_net;
+	ret = __dev_alloc_name(net, name, buf);
+	if (ret >= 0)
+		strlcpy(dev->name, buf, IFNAMSIZ);
+	return ret;
+}
+
 
 /**
  *	dev_change_name - change name of a device
-- 
1.4.4.1.g278f

_______________________________________________
Containers mailing list
Containers@lists.osdl.org
https://lists.osdl.org/mailman/listinfo/containers

Report message to a moderator

[PATCH RFC 18/31] net: Implment network device movement between namespaces [message #17356 is a reply to message #17338]

Thu, 25 January 2007 19:00

ebiederm
Messages: 1354
Registered: February 2006

Senior Member

From: Eric W. Biederman <ebiederm@xmission.com> - unquoted

This patch introduces NETIF_F_NETNS_LOCAL a flag to indicate
a network device is local to a single network namespace and
should never be moved.  Useful for pseudo devices that we
need an instance in each network namespace (like the loopback
device) and for any device we find that cannot handle multiple
network namespaces so we may trap them in the initial network
namespace.

This patch introduces the function dev_change_net_namespace
a function used to move a network device from one network
namespace to another.  To the network device nothing
special appears to happen, to the components of the network
stack it appears as if the network device was unregistered
in the network namespace it is in, and a new device
was registered in the network namespace the device
was moved to.

This patch sets up a namespace device destructor that
upon the exit of a network namespace moves all of the
movable network devices  to the initial network namespace
so they are not lost.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 drivers/net/loopback.c    |    3 +-
 include/linux/netdevice.h |    3 +
 net/core/dev.c            |  222 +++++++++++++++++++++++++++++++++++++++-----
 3 files changed, 201 insertions(+), 27 deletions(-)

diff --git a/drivers/net/loopback.c b/drivers/net/loopback.c
index e9abf3f..7d15de0 100644
--- a/drivers/net/loopback.c
+++ b/drivers/net/loopback.c
@@ -225,7 +225,8 @@ DEFINE_PER_NET(struct net_device, loopback_dev) = {
 				  | NETIF_F_TSO
 #endif
 				  | NETIF_F_NO_CSUM | NETIF_F_HIGHDMA
-				  | NETIF_F_LLTX,
+				  | NETIF_F_LLTX
+				  | NETIF_F_NETNS_LOCAL,
 	.ethtool_ops		= &loopback_ethtool_ops,
 };
 
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 0b4a4dc..3fcaf60 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -324,6 +324,7 @@ struct net_device
 #define NETIF_F_VLAN_CHALLENGED	1024	/* Device cannot handle VLAN packets */
 #define NETIF_F_GSO		2048	/* Enable software GSO. */
 #define NETIF_F_LLTX		4096	/* LockLess TX */
+#define NETIF_F_NETNS_LOCAL	8192	/* Does not change network namespaces */
 
 	/* Segmentation offload features */
 #define NETIF_F_GSO_SHIFT	16
@@ -710,6 +711,8 @@ extern int		dev_ethtool(net_t net, struct ifreq *);
 extern unsigned		dev_get_flags(const struct net_device *);
 extern int		dev_change_flags(struct net_device *, unsigned);
 extern int		dev_change_name(struct net_device *, char *);
+extern int		dev_change_net_namespace(struct net_device *, net_t,
+						 const char *);
 extern int		dev_set_mtu(struct net_device *, int);
 extern int		dev_set_mac_address(struct net_device *,
 					    struct sockaddr *);
diff --git a/net/core/dev.c b/net/core/dev.c
index fc0d2af..52994e4 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -198,6 +198,52 @@ static inline struct hlist_head *dev_index_hash(net_t net, int ifindex)
 	return &per_net(dev_index_head, net)[ifindex & ((1<<NETDEV_HASHBITS)-1)];
 }
 
+/* Device list insertion */
+static int list_netdevice(struct net_device *dev)
+{
+	net_t net = dev->nd_net;
+
+	ASSERT_RTNL();
+
+	dev->next = NULL;
+	write_lock_bh(&per_net(dev_base_lock, net));
+	*per_net(dev_tail, net) = dev;
+	per_net(dev_tail, net) = &dev->next;
+	hlist_add_head(&dev->name_hlist, dev_name_hash(net, dev->name));
+	hlist_add_head(&dev->index_hlist, dev_index_hash(net, dev->ifindex));
+	write_unlock_bh(&per_net(dev_base_lock, net));
+	return 0;
+}
+
+/* Device list removal */
+static int unlist_netdevice(struct net_device *dev)
+{
+	struct net_device *d, **dp;
+	net_t net = dev->nd_net;
+
+	ASSERT_RTNL();
+
+	/* Unlink dev from the device chain */
+	for (dp = &per_net(dev_base, net); (d = *dp) != NULL; dp = &d->next) {
+		if (d == dev) {
+			write_lock_bh(&per_net(dev_base_lock, net));
+			hlist_del(&dev->name_hlist);
+			hlist_del(&dev->index_hlist);
+			if (per_net(dev_tail, net) == &dev->next)
+				per_net(dev_tail, net) = dp;
+			*dp = d->next;
+			write_unlock_bh(&per_net(dev_base_lock, net));
+			break;
+		}
+	}
+	if (!d) {
+		printk(KERN_ERR "unlist net_device: '%s' not found\n",
+		       dev->name);
+		return -ENODEV;
+	}
+	return 0;
+}
+
 /*
  *	Our notifier list
  */
@@ -3054,15 +3100,9 @@ int register_netdevice(struct net_device *dev)
 
 	set_bit(__LINK_STATE_PRESENT, &dev->state);
 
-	dev->next = NULL;
 	dev_init_scheduler(dev);
-	write_lock_bh(&per_net(dev_base_lock, net));
-	*per_net(dev_tail, net) = dev;
-	per_net(dev_tail, net) = &dev->next;
-	hlist_add_head(&dev->name_hlist, head);
-	hlist_add_head(&dev->index_hlist, dev_index_hash(net, dev->ifindex));
 	dev_hold(dev);
-	write_unlock_bh(&per_net(dev_base_lock, net));
+	list_netdevice(dev);
 
 	/* Notify protocols, that a new device appeared. */
 	raw_notifier_call_chain(&netdev_chain, NETDEV_REGISTER, dev);
@@ -3327,9 +3367,6 @@ void synchronize_net(void)
 
 int unregister_netdevice(struct net_device *dev)
 {
-	struct net_device *d, **dp;
-	net_t net = dev->nd_net;
-
 	BUG_ON(dev_boot_phase);
 	ASSERT_RTNL();
 
@@ -3347,23 +3384,8 @@ int unregister_netdevice(struct net_device *dev)
 		dev_close(dev);
 
 	/* And unlink it from device chain. */
-	for (dp = &per_net(dev_base, net); (d = *dp) != NULL; dp = &d->next) {
-		if (d == dev) {
-			write_lock_bh(&per_net(dev_base_lock, net));
-			hlist_del(&dev->name_hlist);
-			hlist_del(&dev->index_hlist);
-			if (per_net(dev_tail, net) == &dev->next)
-				per_net(dev_tail, net) = dp;
-			*dp = d->next;
-			write_unlock_bh(&per_net(dev_base_lock, net));
-			break;
-		}
-	}
-	if (!d) {
-		printk(KERN_ERR "unregister net_device: '%s' not found\n",
-		       dev->name);
+	if (unlist_netdevice(dev))
 		return -ENODEV;
-	}
 
 	dev->reg_state = NETREG_UNREGISTERING;
 
@@ -3419,6 +3441,120 @@ void unregister_netdev(struct net_device *dev)
 
 EXPORT_SYMBOL(unregister_netdev);
 
+/**
+ *	dev_change_net_namespace - move device to different nethost namespace
+ *	@dev: device
+ *	@net: network namespace
+ *	@pat: If not NULL name pattern to try if the current device name
+ *	      is already taken in the destination network namespace.
+ *
+ *	This function shuts down a device interface and moves it
+ *	to a new network namespace. On success 0 is returned, on
+ *	a failure a netagive errno code is returned.
+ *
+ *	Callers must hold the rtnl semaphore.
+ */
+
+int dev_change_net_namespace(struct net_device *dev, net_t net, const char *pat)
+{
+	char buf[IFNAMSIZ];
+	const char *destname;
+	int err;
+
+	ASSERT_RTNL();
+
+	/* Don't allow namespace local devices to be moved. */
+	err = -EINVAL;
+	if (dev->features & NETIF_F_NETNS_LOCAL)
+		goto out;
+
+	/* Ensure the device has been registrered */
+	err = -EINVAL;
+	if (dev->reg_state != NETREG_REGISTERED)
+		goto out;
+	
+	/* Get out if there is nothing todo */
+	err = 0;
+	if (net_eq(dev->nd_net, net))
+		goto out;
+
+	/* Pick the destination device name, and ensure
+	 * we can use it in the destination network namespace.
+	 */
+	err = -EEXIST;
+	destname = dev->name;
+	if (__dev_get_by_name(net, destname) && pat) {
+		/* We get here if we can't use the current device name */
+		if (!dev_valid_name(pat))
+			goto out;
+		if (strchr(pat, '%')) {
+			if (__dev_alloc_name(net, pat, buf) < 0)
+				goto out;
+			destname = buf;
+		} else
+			destname = pat;
+		if (__dev_get_by_name(net, destname))
+			goto out;
+	}
+
+	/*
+	 * And now a mini version of register_netdevice unregister_netdevice. 
+	 */
+
+	/* If device is running close it first. */
+	if (dev->flags & IFF_UP)
+		dev_close(dev);
+
+	/* And unlink it from device chain */
+	err = -ENODEV;
+	if (unlist_netdevice(dev))
+		goto out;
+	
+	synchronize_net();
+	
+	/* Shutdown queueing discipline. */
+	dev_shutdown(dev);
+
+	/* Notify protocols, that we are about to destroy
+	   this device. They should clean all the things.
+	*/
+	call_netdevice_notifiers(NETDEV_UNREGISTER, dev);
+	
+	/*
+	 *	Flush the multicast chain
+	 */
+	dev_mc_discard(dev);
+
+	/* Actually switch the network namespace */
+	dev->nd_net = net;
+	
+	/* Assign the new device name */
+	if (destname != dev->name)
+		strcpy(dev->name, destname);
+
+	/* If there is an ifindex conflict assign a new one */
+	if (__dev_get_by_index(net, dev->ifindex)) {
+		int iflink = (dev->iflink == dev->ifindex);
+		dev->ifindex = dev_new_index(net);
+		if (iflink)
+			dev->iflink = dev->ifindex;
+	}
+
+	/* Fixup sysfs */
+	class_device_rename(&dev->class_dev, dev->name);
+
+	/* Add the device back in the hashes */
+	list_netdevice(dev);
+
+	/* Notify protocols, that a new device appeared. */
+	call_netdevice_notifiers(NETDEV_REGISTER, dev);
+
+	synchronize_net();
+	err = 0;
+out:
+	return err;
+}
+
 static int dev_cpu_callback(struct notifier_block *nfb,
 			    unsigned long action,
 			    void *ocpu)
@@ -3561,6 +3697,37 @@ static struct pernet_operations netdev_net_ops = {
 	.init = netdev_init,
 };
 
+static void default_device_exit(net_t net)
+{
+	struct net_device *dev, *next;
+	/*
+	 * Push all migratable of the network devices back to the
+	 * initial network namespace 
+	 */
+	rtnl_lock();
+	for (dev = per_net(dev_base, net); dev; dev = next) {
+		int err;
+		next = dev->next;
+
+		/* Ignore unmoveable devices (i.e. loopback) */
+		if (dev->features & NETIF_F_NETNS_LOCAL)
+			continue;
+
+		/* Push remaing network devices to init_net */
+		err = dev_change_net_namespace(dev, init_net(), "dev%d");
+		if (err) {
+			printk(KERN_WARNING "%s: failed to move %s to init_net: %d\n",
+				__func__, dev->name, err);
+			unregister_netdevice(dev);
+		}
+	}
+	rtnl_unlock();
+}
+
+static struct pernet_operations default_device_ops = {
+	.exit = default_device_exit,

...

[ Show the rest of the message ]

Report message to a moderator

[PATCH RFC 19/31] net: sysfs interface support for moving devices between network namespaces. [message #17357 is a reply to message #17338]

Thu, 25 January 2007 19:00

ebiederm
Messages: 1354
Registered: February 2006

Senior Member

From: Eric W. Biederman <ebiederm@xmission.com> - unquoted

I haven't a clue if this interface will meet with widespread approval but
at this point it is simple, and very useful.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 net/core/net-sysfs.c |   35 +++++++++++++++++++++++++++++++++++
 1 files changed, 35 insertions(+), 0 deletions(-)

diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index 1be6f94..f8a5c6b 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -188,6 +188,40 @@ static ssize_t store_mtu(struct class_device *cd, const char *buf, size_t len)
 	return netdev_store(cd, buf, len, change_mtu);
 }
 
+static ssize_t show_new_ns_pid(struct class_device *cd, char *buf)
+{
+	return -EPERM;
+}
+static int change_new_ns_pid(struct net_device *dev, unsigned long new_ns_pid)
+{
+	struct task_struct *tsk;
+	int err;
+	net_t net;
+	/* Look up the network namespace */
+	err = -ESRCH;
+	rcu_read_lock();
+	tsk = find_task_by_pid(new_ns_pid);
+	if (tsk) {
+		task_lock(tsk);
+		if (tsk->nsproxy) {
+			err = 0;
+			net = get_net(tsk->nsproxy->net_ns);
+		}
+		task_unlock(tsk);
+	}
+	rcu_read_unlock();
+	/* If I found a network namespace move the device */
+	if (!err) {
+		err = dev_change_net_namespace(dev, net, NULL);
+		put_net(net);
+	}
+	return err;
+}
+static ssize_t store_new_ns_pid(struct class_device *cd, const char *buf, size_t len)
+{
+	return netdev_store(cd, buf, len, change_new_ns_pid);
+}
+
 NETDEVICE_SHOW(flags, fmt_hex);
 
 static int change_flags(struct net_device *dev, unsigned long new_flags)
@@ -243,6 +277,7 @@ static struct class_device_attribute net_class_attributes[] = {
 	__ATTR(tx_queue_len, S_IRUGO | S_IWUSR, show_tx_queue_len,
 	       store_tx_queue_len),
 	__ATTR(weight, S_IRUGO | S_IWUSR, show_weight, store_weight),
+	__ATTR(new_ns_pid, S_IWUSR, show_new_ns_pid, store_new_ns_pid),
 	{}
 };
 
-- 
1.4.4.1.g278f

_______________________________________________
Containers mailing list
Containers@lists.osdl.org
https://lists.osdl.org/mailman/listinfo/containers

Report message to a moderator

[PATCH RFC 20/31] net: Implement CONFIG_NET_NS [message #17358 is a reply to message #17338]

Thu, 25 January 2007 19:00

ebiederm
Messages: 1354
Registered: February 2006

Senior Member

From: Eric W. Biederman <ebiederm@xmission.com> - unquoted

Add the config option to enable multiple network namespaces.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 net/Kconfig |    7 +++++++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/net/Kconfig b/net/Kconfig
index 7dfc949..4671398 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -27,6 +27,13 @@ if NET
 
 menu "Networking options"
 
+config NET_NS
+	bool "Network namespace support"
+	depends on EXPERIMENTAL
+	help
+	  Support what appear to user space as multiple instances of the 
+ 	  network stack.
+
 config NETDEBUG
 	bool "Network packet debugging"
 	help
-- 
1.4.4.1.g278f

_______________________________________________
Containers mailing list
Containers@lists.osdl.org
https://lists.osdl.org/mailman/listinfo/containers

Report message to a moderator

[PATCH RFC 21/31] net: Implement the guts of the network namespace infrastructure [message #17359 is a reply to message #17338]

Thu, 25 January 2007 19:00

ebiederm
Messages: 1354
Registered: February 2006

Senior Member

From: Eric W. Biederman <ebiederm@xmission.com> - unquoted

Support is added for the .data.pernet section where all of
the variables who have a single instance in each network
namespace will live.  Every architectures linker script
is modified so is should work.

Summarizing the functions:
net_ns_init creates a slab and allocates the template and
the initial network namespace.

pernet_modcopy keeps the network namespaces in sync with
the loaded modules.  Initializing new data variables as
they are added.

The network namespace destruction because the last reference
can come from interrupt context queues itself for later with
schedule_work.   Then we alert everyone the network namespace
is disappearing.  If a buggy user is still holding a reference
to the network namespace we print a nasty message and leak
the network namespace.

The wrest are just light-weight wrapper functions to make things
more convinient.

A little should probably be said about net_head the variable
at the start of my network namespace structure.  It is the only
variable with a location decided by the C code instead of the linker
and I string them together in a linked list so I can iterate.

Probably more interesting is that it looks like it is saner not
to directly use a pointer to my network namespace but instead to
use an offset.  All of the references to data in my network namespace
are coming from per_net(...) which takes the address of the variable
in the .data.pernet section and then adds my magic offset.  If I
used a pointer I would have to subract an additional value and
export an extra symbol.  Not good for performance or maintenance :)

The expected usage of network namespace variables is to replace
sequences like:  &loopback_dev with &per_net(loopback_dev, net)
where net is some network namespace reference.  In my preliminary
tests the only a single additional addition is inserted so it
appears to be an efficient idiom.  Hopefully it is also easy to
comprehend and use.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 arch/alpha/kernel/vmlinux.lds.S            |    2 +
 arch/arm/kernel/vmlinux.lds.S              |    3 +
 arch/arm26/kernel/vmlinux-arm26-xip.lds.in |    3 +
 arch/arm26/kernel/vmlinux-arm26.lds.in     |    3 +
 arch/avr32/kernel/vmlinux.lds.c            |    3 +
 arch/cris/arch-v10/vmlinux.lds.S           |    2 +
 arch/cris/arch-v32/vmlinux.lds.S           |    2 +
 arch/frv/kernel/vmlinux.lds.S              |    2 +
 arch/h8300/kernel/vmlinux.lds.S            |    3 +
 arch/i386/kernel/vmlinux.lds.S             |    3 +
 arch/ia64/kernel/vmlinux.lds.S             |    2 +
 arch/m32r/kernel/vmlinux.lds.S             |    3 +
 arch/m68k/kernel/vmlinux-std.lds           |    3 +
 arch/m68k/kernel/vmlinux-sun3.lds          |    3 +
 arch/m68knommu/kernel/vmlinux.lds.S        |    3 +
 arch/mips/kernel/vmlinux.lds.S             |    3 +
 arch/parisc/kernel/vmlinux.lds.S           |    3 +
 arch/powerpc/kernel/vmlinux.lds.S          |    2 +
 arch/ppc/kernel/vmlinux.lds.S              |    2 +
 arch/s390/kernel/vmlinux.lds.S             |    3 +
 arch/sh/kernel/vmlinux.lds.S               |    3 +
 arch/sh64/kernel/vmlinux.lds.S             |    3 +
 arch/sparc/kernel/vmlinux.lds.S            |    3 +
 arch/sparc64/kernel/vmlinux.lds.S          |    3 +
 arch/v850/kernel/vmlinux.lds.S             |    6 +-
 arch/x86_64/kernel/vmlinux.lds.S           |    3 +
 arch/xtensa/kernel/vmlinux.lds.S           |    2 +
 include/asm-generic/vmlinux.lds.h          |    8 +
 include/asm-um/common.lds.S                |    4 +-
 include/linux/module.h                     |    3 +
 include/linux/net_namespace_type.h         |   63 ++++++++-
 include/net/net_namespace.h                |   49 ++++++-
 kernel/module.c                            |  211 ++++++++++++++++++++++++-
 net/core/net_namespace.c                   |  232 ++++++++++++++++++++++++++++
 34 files changed, 631 insertions(+), 15 deletions(-)

diff --git a/arch/alpha/kernel/vmlinux.lds.S b/arch/alpha/kernel/vmlinux.lds.S
index 76bf071..ad20077 100644
--- a/arch/alpha/kernel/vmlinux.lds.S
+++ b/arch/alpha/kernel/vmlinux.lds.S
@@ -72,6 +72,8 @@ SECTIONS
   .data.percpu : { *(.data.percpu) }
   __per_cpu_end = .;
 
+  DATA_PER_NET
+
   . = ALIGN(2*8192);
   __init_end = .;
   /* Freed after init ends here */
diff --git a/arch/arm/kernel/vmlinux.lds.S b/arch/arm/kernel/vmlinux.lds.S
index a8fa75e..5b003f9 100644
--- a/arch/arm/kernel/vmlinux.lds.S
+++ b/arch/arm/kernel/vmlinux.lds.S
@@ -61,6 +61,9 @@ SECTIONS
 		__per_cpu_start = .;
 			*(.data.percpu)
 		__per_cpu_end = .;
+
+		DATA_PER_NET
+
 #ifndef CONFIG_XIP_KERNEL
 		__init_begin = _stext;
 		*(.init.data)
diff --git a/arch/arm26/kernel/vmlinux-arm26-xip.lds.in b/arch/arm26/kernel/vmlinux-arm26-xip.lds.in
index ca61ec8..69d5772 100644
--- a/arch/arm26/kernel/vmlinux-arm26-xip.lds.in
+++ b/arch/arm26/kernel/vmlinux-arm26-xip.lds.in
@@ -50,6 +50,9 @@ SECTIONS
 		__initramfs_start = .;
 			usr/built-in.o(.init.ramfs)
 		__initramfs_end = .;
+
+		DATA_PER_NET
+
 		. = ALIGN(32768);
 		__init_end = .;
 	}
diff --git a/arch/arm26/kernel/vmlinux-arm26.lds.in b/arch/arm26/kernel/vmlinux-arm26.lds.in
index d1d3418..473a5b4 100644
--- a/arch/arm26/kernel/vmlinux-arm26.lds.in
+++ b/arch/arm26/kernel/vmlinux-arm26.lds.in
@@ -51,6 +51,9 @@ SECTIONS
 		__initramfs_start = .;
 			usr/built-in.o(.init.ramfs)
 		__initramfs_end = .;
+
+		DATA_PER_NET
+
 		. = ALIGN(32768);
 		__init_end = .;
 	}
diff --git a/arch/avr32/kernel/vmlinux.lds.c b/arch/avr32/kernel/vmlinux.lds.c
index 5c4424e..dee3715 100644
--- a/arch/avr32/kernel/vmlinux.lds.c
+++ b/arch/avr32/kernel/vmlinux.lds.c
@@ -50,6 +50,9 @@ SECTIONS
 		__initramfs_start = .;
 			*(.init.ramfs)
 		__initramfs_end = .;
+
+		DATA_PER_NET
+
 		. = ALIGN(4096);
 		__init_end = .;
 	}
diff --git a/arch/cris/arch-v10/vmlinux.lds.S b/arch/cris/arch-v10/vmlinux.lds.S
index 689729a..f1c890c 100644
--- a/arch/cris/arch-v10/vmlinux.lds.S
+++ b/arch/cris/arch-v10/vmlinux.lds.S
@@ -83,6 +83,8 @@ SECTIONS
 	}	
 	SECURITY_INIT
 		
+	DATA_PER_NET
+
 	.init.ramfs : {
 		__initramfs_start = .;
 		*(.init.ramfs)
diff --git a/arch/cris/arch-v32/vmlinux.lds.S b/arch/cris/arch-v32/vmlinux.lds.S
index 472d4b3..eb08771 100644
--- a/arch/cris/arch-v32/vmlinux.lds.S
+++ b/arch/cris/arch-v32/vmlinux.lds.S
@@ -95,6 +95,8 @@ SECTIONS
 	.data.percpu  : { *(.data.percpu) }
 	__per_cpu_end = .;
 
+	DATA_PER_NET
+
 	.init.ramfs : {
 		__initramfs_start = .;
 		*(.init.ramfs)
diff --git a/arch/frv/kernel/vmlinux.lds.S b/arch/frv/kernel/vmlinux.lds.S
index 9c1fb12..f383c83 100644
--- a/arch/frv/kernel/vmlinux.lds.S
+++ b/arch/frv/kernel/vmlinux.lds.S
@@ -61,6 +61,8 @@ SECTIONS
   .data.percpu  : { *(.data.percpu) }
   __per_cpu_end = .;
 
+  DATA_PER_NET
+
   . = ALIGN(4096);
   __initramfs_start = .;
   .init.ramfs : { *(.init.ramfs) }
diff --git a/arch/h8300/kernel/vmlinux.lds.S b/arch/h8300/kernel/vmlinux.lds.S
index f05288b..5d5fda5 100644
--- a/arch/h8300/kernel/vmlinux.lds.S
+++ b/arch/h8300/kernel/vmlinux.lds.S
@@ -130,6 +130,9 @@ SECTIONS
 	___initramfs_start = .;
   		*(.init.ramfs)
   	___initramfs_end = .;
+
+	DATA_PER_NET
+
 	. = ALIGN(0x4) ;
 	___init_end = .;
 	__edata = . ;
diff --git a/arch/i386/kernel/vmlinux.lds.S b/arch/i386/kernel/vmlinux.lds.S
index a53c8b1..1aae8b4 100644
--- a/arch/i386/kernel/vmlinux.lds.S
+++ b/arch/i386/kernel/vmlinux.lds.S
@@ -193,6 +193,9 @@ SECTIONS
 	*(.data.percpu)
 	__per_cpu_end = .;
   }
+
+  DATA_PER_NET
+
   . = ALIGN(4096);
   /* freed after init ends here */
 	
diff --git a/arch/ia64/kernel/vmlinux.lds.S b/arch/ia64/kernel/vmlinux.lds.S
index d6083a0..28dd9eb 100644
--- a/arch/ia64/kernel/vmlinux.lds.S
+++ b/arch/ia64/kernel/vmlinux.lds.S
@@ -118,6 +118,8 @@ SECTIONS
 	  __initramfs_end = .;
 	}
 
+  DATA_PER_NET
+
    . = ALIGN(16);
   .init.setup : AT(ADDR(.init.setup) - LOAD_OFFSET)
         {
diff --git a/arch/m32r/kernel/vmlinux.lds.S b/arch/m32r/kernel/vmlinux.lds.S
index 358b9ce..3e8c624 100644
--- a/arch/m32r/kernel/vmlinux.lds.S
+++ b/arch/m32r/kernel/vmlinux.lds.S
@@ -107,6 +107,9 @@ SECTIONS
   __per_cpu_start = .;
   .data.percpu  : { *(.data.percpu) }
   __per_cpu_end = .;
+
+  DATA_PER_NET
+
   . = ALIGN(4096);
   __init_end = .;
   /* freed after init ends here */
diff --git a/arch/m68k/kernel/vmlinux-std.lds b/arch/m68k/kernel/vmlinux-std.lds
index d279445..d60cb7e 100644
--- a/arch/m68k/kernel/vmlinux-std.lds
+++ b/arch/m68k/kernel/vmlinux-std.lds
@@ -65,6 +65,9 @@ SECTIONS
   __initramfs_start = .;
   .init.ramfs : { *(.init.ramfs) }
   __initramfs_end = .;
+
+  DATA_PER_NET
+
   . = ALIGN(8192);
   __init_end = .;
 
diff --git a/arch/m68k/kernel/vmlinux-sun3.lds b/arch/m68k/kernel/vmlinux-sun3.lds
index 8c7eccb..101ec12 100644
--- a/arch/m68k/kernel/vmlinux-sun3.lds
+++ b/arch/m68k/kernel/vmlinux-sun3.lds
@@ -59,6 +59,9 @@ __init_begin = .;
 	__initramfs_start = .;
 	.init.ramfs : { *(.init.ramfs) }
 	__initramfs_end = .;
+
+	DATA_PER_NET
+
 	. = ALIGN(8192);
 	__init_end = .;
 	.data.init.task : { *(.data.init_task) }
diff --git a/arch/m68knommu/kernel/vmlinux.lds.S b/arch/m68knommu/kernel/vmlinux.lds.S
index 2b2a10d..e713614 100644
--- a/arch/m68knommu/kernel/vmlinux.lds.S
+++ b/arch/m68knommu/kernel/vmlinux.lds.S
@@ -153,6 +153,9 @@ SECTIONS {
 		__initramfs_start = .;
 		*(.init.ramfs)
 		__initramfs_end = .;
+
+		DATA_PER_NET
+
 		. = ALIGN(4096);
 		__init_end = .;
 	} > INIT
diff --git a/arch/mips/kernel/vmlinux.lds.S b/arch/mips/kernel/vmlinux.lds.S
index cecff24..a5cfeef 100644
--- a/arch/mips/kernel/vmlinux.lds.S
+++ b/arch/mips/kernel/vmlinux.lds.S
@@ -121,6 +121,9 @@ SECTIONS
   __per_cpu_start = .;
   .data.percpu  : { *(.data.percpu) }
   __per_cpu_end = .;
+
+  DATA_PER_NET
+
   . = ALIGN(_PAGE_SIZE);
   __init_end = .;
   /* freed after init ends here */
diff --git a/arch/parisc/kernel/vmlinux.lds.S b/arch/parisc/kernel/vmlinux.lds.S
index 7b943b4..2cf241b 100644
--- a/arch/parisc/kernel/v

...

[ Show the rest of the message ]

Report message to a moderator

[PATCH RFC 22/31] net: Add network namespace clone support. [message #17360 is a reply to message #17338]

Thu, 25 January 2007 19:00

ebiederm
Messages: 1354
Registered: February 2006

Senior Member

From: Eric W. Biederman <ebiederm@xmission.com> - unquoted

This patch allows you to create a new network namespace
using sys_clone(...).

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 include/linux/sched.h    |    1 +
 kernel/nsproxy.c         |   11 +++++++++++
 net/core/net_namespace.c |   38 ++++++++++++++++++++++++++++++++++++++
 3 files changed, 50 insertions(+), 0 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 4463735..9e0f91a 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -26,6 +26,7 @@
 #define CLONE_STOPPED		0x02000000	/* Start in stopped state */
 #define CLONE_NEWUTS		0x04000000	/* New utsname group? */
 #define CLONE_NEWIPC		0x08000000	/* New ipcs */
+#define CLONE_NEWNET		0x20000000	/* New network namespace */
 
 /*
  * Scheduling policies
diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c
index 4f3c95a..7861c4c 100644
--- a/kernel/nsproxy.c
+++ b/kernel/nsproxy.c
@@ -20,6 +20,7 @@
 #include <linux/mnt_namespace.h>
 #include <linux/utsname.h>
 #include <linux/pid_namespace.h>
+#include <net/net_namespace.h>
 
 struct nsproxy init_nsproxy = INIT_NSPROXY(init_nsproxy);
 EXPORT_SYMBOL_GPL(init_nsproxy);
@@ -70,6 +71,7 @@ struct nsproxy *dup_namespaces(struct nsproxy *orig)
 			get_ipc_ns(ns->ipc_ns);
 		if (ns->pid_ns)
 			get_pid_ns(ns->pid_ns);
+		get_net(ns->net_ns);
 	}
 
 	return ns;
@@ -117,10 +119,18 @@ int copy_namespaces(int flags, struct task_struct *tsk)
 	if (err)
 		goto out_pid;
 
+	err = copy_net(flags, tsk);
+	if (err)
+		goto out_net;
+
 out:
 	put_nsproxy(old_ns);
 	return err;
 
+out_net:
+	if (new_ns->pid_ns)
+		put_pid_ns(new_ns->pid_ns);
+
 out_pid:
 	if (new_ns->ipc_ns)
 		put_ipc_ns(new_ns->ipc_ns);
@@ -146,5 +156,6 @@ void free_nsproxy(struct nsproxy *ns)
 		put_ipc_ns(ns->ipc_ns);
 	if (ns->pid_ns)
 		put_pid_ns(ns->pid_ns);
+	put_net(ns->net_ns);
 	kfree(ns);
 }
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index 93e3879..cc56105 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -175,6 +175,44 @@ out_undo:
 	goto out;
 }
 
+int copy_net(int flags, struct task_struct *tsk)
+{
+	net_t old_net = tsk->nsproxy->net_ns;
+	net_t new_net;
+	int err;
+
+	get_net(old_net);
+
+	if (!(flags & CLONE_NEWNET))
+		return 0;
+
+	err = -EPERM;
+	if (!capable(CAP_SYS_ADMIN))
+		goto out;
+
+	err = -ENOMEM;
+	new_net = net_alloc();
+	if (null_net(new_net))
+		goto out;
+
+	mutex_lock(&net_mutex);
+	err = setup_net(new_net);
+	if (err)
+		goto out_unlock;
+
+	net_lock();
+	net_list_append(new_net);
+	net_unlock();
+
+	tsk->nsproxy->net_ns = new_net;
+
+out_unlock:
+	mutex_unlock(&net_mutex);
+out:
+	put_net(old_net);
+	return err;
+}
+
 void pernet_modcopy(void *pnetdst, const void *src, unsigned long size)
 {
 	net_t net;
-- 
1.4.4.1.g278f

_______________________________________________
Containers mailing list
Containers@lists.osdl.org
https://lists.osdl.org/mailman/listinfo/containers

Report message to a moderator

[PATCH RFC 23/31] net: Modify all rtnetlink methods to only work in the initial namespace [message #17361 is a reply to message #17338]

Thu, 25 January 2007 19:00

ebiederm
Messages: 1354
Registered: February 2006

Senior Member

From: Eric W. Biederman <ebiederm@xmission.com> - unquoted

Before I can enable rtnetlink to work in all network namespaces
I need to be certain that something won't break.  So this
patch deliberately disables all of the methods and when they
are audited this extra check can be disabled.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 net/bridge/br_netlink.c |    9 +++++++++
 net/core/fib_rules.c    |    7 +++++++
 net/core/neighbour.c    |   18 ++++++++++++++++++
 net/core/rtnetlink.c    |   13 +++++++++++++
 net/decnet/dn_dev.c     |   12 ++++++++++++
 net/decnet/dn_fib.c     |    8 ++++++++
 net/decnet/dn_route.c   |    8 ++++++++
 net/decnet/dn_rules.c   |    5 +++++
 net/decnet/dn_table.c   |    4 ++++
 net/ipv4/devinet.c      |   12 ++++++++++++
 net/ipv4/fib_frontend.c |   12 ++++++++++++
 net/ipv4/fib_rules.c    |    5 +++++
 net/ipv6/addrconf.c     |   31 +++++++++++++++++++++++++++++++
 net/ipv6/fib6_rules.c   |    5 +++++
 net/ipv6/ip6_fib.c      |    4 ++++
 net/ipv6/route.c        |   12 ++++++++++++
 net/sched/act_api.c     |    8 ++++++++
 net/sched/cls_api.c     |    8 ++++++++
 net/sched/sch_api.c     |   20 ++++++++++++++++++++
 19 files changed, 201 insertions(+), 0 deletions(-)

diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
index 119b97d..85165a1 100644
--- a/net/bridge/br_netlink.c
+++ b/net/bridge/br_netlink.c
@@ -14,6 +14,7 @@
 #include <linux/rtnetlink.h>
 #include <net/netlink.h>
 #include <net/net_namespace.h>
+#include <net/sock.h>
 #include "br_private.h"
 
 static inline size_t br_nlmsg_size(void)
@@ -104,9 +105,13 @@ errout:
  */
 static int br_dump_ifinfo(struct sk_buff *skb, struct netlink_callback *cb)
 {
+	net_t net = skb->sk->sk_net;
 	struct net_device *dev;
 	int idx;
 
+	if (!net_eq(net, init_net()))
+		return 0;
+
 	read_lock(&per_net(dev_base_lock, init_net()));
 	for (dev = per_net(dev_base, init_net()), idx = 0; dev; dev = dev->next) {
 		/* not a bridge port */
@@ -133,12 +138,16 @@ skip:
  */
 static int br_rtm_setlink(struct sk_buff *skb,  struct nlmsghdr *nlh, void *arg)
 {
+	net_t net = skb->sk->sk_net;
 	struct ifinfomsg *ifm;
 	struct nlattr *protinfo;
 	struct net_device *dev;
 	struct net_bridge_port *p;
 	u8 new_state;
 
+	if (!net_eq(net, init_net()))
+		return -EINVAL;
+
 	if (nlmsg_len(nlh) < sizeof(*ifm))
 		return -EINVAL;
 
diff --git a/net/core/fib_rules.c b/net/core/fib_rules.c
index 2fa2708..00b4148 100644
--- a/net/core/fib_rules.c
+++ b/net/core/fib_rules.c
@@ -163,6 +163,9 @@ int fib_nl_newrule(struct sk_buff *skb, struct nlmsghdr* nlh, void *arg)
 	struct nlattr *tb[FRA_MAX+1];
 	int err = -EINVAL;
 
+	if (!net_eq(net, init_net()))
+		return -EINVAL;
+
 	if (nlh->nlmsg_len < nlmsg_msg_size(sizeof(*frh)))
 		goto errout;
 
@@ -244,12 +247,16 @@ errout:
 
 int fib_nl_delrule(struct sk_buff *skb, struct nlmsghdr* nlh, void *arg)
 {
+	net_t net = skb->sk->sk_net;
 	struct fib_rule_hdr *frh = nlmsg_data(nlh);
 	struct fib_rules_ops *ops = NULL;
 	struct fib_rule *rule;
 	struct nlattr *tb[FRA_MAX+1];
 	int err = -EINVAL;
 
+	if (!net_eq(net, init_net()))
+		return -EINVAL;
+
 	if (nlh->nlmsg_len < nlmsg_msg_size(sizeof(*frh)))
 		goto errout;
 
diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index f5d4f92..d89c6fe 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -1445,6 +1445,9 @@ int neigh_delete(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg)
 	struct net_device *dev = NULL;
 	int err = -EINVAL;
 
+	if (!net_eq(net, init_net()))
+		return -EINVAL;
+
 	if (nlmsg_len(nlh) < sizeof(*ndm))
 		goto out;
 
@@ -1511,6 +1514,9 @@ int neigh_add(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg)
 	struct net_device *dev = NULL;
 	int err;
 
+	if (!net_eq(net, init_net()))
+		return -EINVAL;
+
 	err = nlmsg_parse(nlh, sizeof(*ndm), tb, NDA_MAX, NULL);
 	if (err < 0)
 		goto out;
@@ -1783,11 +1789,15 @@ static struct nla_policy nl_ntbl_parm_policy[NDTPA_MAX+1] __read_mostly = {
 
 int neightbl_set(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg)
 {
+	net_t net = skb->sk->sk_net;
 	struct neigh_table *tbl;
 	struct ndtmsg *ndtmsg;
 	struct nlattr *tb[NDTA_MAX+1];
 	int err;
 
+	if (!net_eq(net, init_net()))
+		return -EINVAL;
+
 	err = nlmsg_parse(nlh, sizeof(*ndtmsg), tb, NDTA_MAX,
 			  nl_neightbl_policy);
 	if (err < 0)
@@ -1907,11 +1917,15 @@ errout:
 
 int neightbl_dump_info(struct sk_buff *skb, struct netlink_callback *cb)
 {
+	net_t net = skb->sk->sk_net;
 	int family, tidx, nidx = 0;
 	int tbl_skip = cb->args[0];
 	int neigh_skip = cb->args[1];
 	struct neigh_table *tbl;
 
+	if (!net_eq(net, init_net()))
+		return 0;
+
 	family = ((struct rtgenmsg *) nlmsg_data(cb->nlh))->rtgen_family;
 
 	read_lock(&neigh_tbl_lock);
@@ -2030,9 +2044,13 @@ out:
 
 int neigh_dump_info(struct sk_buff *skb, struct netlink_callback *cb)
 {
+	net_t net = skb->sk->sk_net;
 	struct neigh_table *tbl;
 	int t, family, s_t;
 
+	if (!net_eq(net, init_net()))
+		return 0;
+
 	read_lock(&neigh_tbl_lock);
 	family = ((struct rtgenmsg *) nlmsg_data(cb->nlh))->rtgen_family;
 	s_t = cb->args[0];
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 5ac07a0..9be586c 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -395,6 +395,9 @@ static int rtnl_dump_ifinfo(struct sk_buff *skb, struct netlink_callback *cb)
 	int s_idx = cb->args[0];
 	struct net_device *dev;
 
+	if (!net_eq(net, init_net()))
+		return 0;
+
 	read_lock(&per_net(dev_base_lock, net));
 	for (dev=per_net(dev_base, net), idx=0; dev; dev = dev->next, idx++) {
 		if (idx < s_idx)
@@ -429,6 +432,9 @@ static int rtnl_setlink(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg)
 	struct nlattr *tb[IFLA_MAX+1];
 	char ifname[IFNAMSIZ];
 
+	if (!net_eq(net, init_net()))
+		return -EINVAL;
+
 	err = nlmsg_parse(nlh, sizeof(*ifm), tb, IFLA_MAX, ifla_policy);
 	if (err < 0)
 		goto errout;
@@ -602,6 +608,9 @@ static int rtnl_getlink(struct sk_buff *skb, struct nlmsghdr* nlh, void *arg)
 	int iw_buf_len = 0;
 	int err;
 
+	if (!net_eq(net, init_net()))
+		return -EINVAL;
+
 	err = nlmsg_parse(nlh, sizeof(*ifm), tb, IFLA_MAX, ifla_policy);
 	if (err < 0)
 		return err;
@@ -650,9 +659,13 @@ errout:
 
 static int rtnl_dump_all(struct sk_buff *skb, struct netlink_callback *cb)
 {
+	net_t net = skb->sk->sk_net;
 	int idx;
 	int s_idx = cb->family;
 
+	if (!net_eq(net, init_net()))
+		return 0;
+
 	if (s_idx == 0)
 		s_idx = 1;
 	for (idx=1; idx<NPROTO; idx++) {
diff --git a/net/decnet/dn_dev.c b/net/decnet/dn_dev.c
index c83c8d1..a09275b 100644
--- a/net/decnet/dn_dev.c
+++ b/net/decnet/dn_dev.c
@@ -648,12 +648,16 @@ static struct nla_policy dn_ifa_policy[IFA_MAX+1] __read_mostly = {
 
 static int dn_nl_deladdr(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg)
 {
+	net_t net = skb->sk->sk_net;
 	struct nlattr *tb[IFA_MAX+1];
 	struct dn_dev *dn_db;
 	struct ifaddrmsg *ifm;
 	struct dn_ifaddr *ifa, **ifap;
 	int err = -EADDRNOTAVAIL;
 
+	if (!net_eq(net, init_net()))
+		goto errout;
+
 	err = nlmsg_parse(nlh, sizeof(*ifm), tb, IFA_MAX, dn_ifa_policy);
 	if (err < 0)
 		goto errout;
@@ -680,6 +684,7 @@ errout:
 
 static int dn_nl_newaddr(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg)
 {
+	net_t net = skb->sk->sk_net;
 	struct nlattr *tb[IFA_MAX+1];
 	struct net_device *dev;
 	struct dn_dev *dn_db;
@@ -687,6 +692,9 @@ static int dn_nl_newaddr(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg)
 	struct dn_ifaddr *ifa;
 	int err;
 
+	if (!net_eq(net, init_net()))
+		return -EINVAL;
+
 	err = nlmsg_parse(nlh, sizeof(*ifm), tb, IFA_MAX, dn_ifa_policy);
 	if (err < 0)
 		return err;
@@ -788,11 +796,15 @@ errout:
 
 static int dn_nl_dump_ifaddr(struct sk_buff *skb, struct netlink_callback *cb)
 {
+	net_t net = skb->sk->sk_net;
 	int idx, dn_idx = 0, skip_ndevs, skip_naddr;
 	struct net_device *dev;
 	struct dn_dev *dn_db;
 	struct dn_ifaddr *ifa;
 
+	if (!net_eq(net, init_net()))
+		return 0;
+
 	skip_ndevs = cb->args[0];
 	skip_naddr = cb->args[1];
 
diff --git a/net/decnet/dn_fib.c b/net/decnet/dn_fib.c
index cc2ab1f..832e1b4 100644
--- a/net/decnet/dn_fib.c
+++ b/net/decnet/dn_fib.c
@@ -503,10 +503,14 @@ static int dn_fib_check_attr(struct rtmsg *r, struct rtattr **rta)
 
 int dn_fib_rtm_delroute(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg)
 {
+	net_t net = skb->sk->sk_net;
 	struct dn_fib_table *tb;
 	struct rtattr **rta = arg;
 	struct rtmsg *r = NLMSG_DATA(nlh);
 
+	if (!net_eq(net, init_net()))
+		return -EINVAL;
+
 	if (dn_fib_check_attr(r, rta))
 		return -EINVAL;
 
@@ -519,10 +523,14 @@ int dn_fib_rtm_delroute(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg)
 
 int dn_fib_rtm_newroute(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg)
 {
+	net_t net = skb->sk->sk_net;
 	struct dn_fib_table *tb;
 	struct rtattr **rta = arg;
 	struct rtmsg *r = NLMSG_DATA(nlh);
 
+	if (!net_eq(net, init_net()))
+		return -EINVAL;
+
 	if (dn_fib_check_attr(r, rta))
 		return -EINVAL;
 
diff --git a/net/decnet/dn_route.c b/net/decnet/dn_route.c
index 9669e50..d942ea0 100644
--- a/net/decnet/dn_route.c
+++ b/net/decnet/dn_route.c
@@ -1528,6 +1528,7 @@ rtattr_failure:
  */
 int dn_cache_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh, void *arg)
 {
+	net_t net = in_skb->sk->sk_net;
 	struct rtattr **rta = arg;
 	struct rtmsg *rtm = NLMSG_DATA(nlh);
 	struct dn_route *rt = NULL;
@@ -1536,6 +1537,9 @@ int dn_cache_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh, void *arg)
 	struct sk_buff *skb;
 	struct flowi fl;
 
+	if (!net_eq(net, init_net()))
+		return -EINVAL;
+
 	memset(&fl, 0, sizeof(fl));
 	fl.proto = DNPROTO_NSP;
 
@@ -1613,10 +1617,14 @@ out_free:
  */
 int dn_cache_dump(struct sk_buff *skb, struct netlink_callback *cb)
 {
+	net_t net = skb->sk->sk_net;
 	struct dn_route *rt;
 	int h, s_h;

...

[ Show the rest of the message ]

Report message to a moderator

[PATCH RFC 24/31] net: Make rtnetlink network namespace aware [message #17362 is a reply to message #17338]

Thu, 25 January 2007 19:00

ebiederm
Messages: 1354
Registered: February 2006

Senior Member

From: Eric W. Biederman <ebiederm@xmission.com> - unquoted

After this patch none of the netlink callback support anything
except the initial network namespace but the rtnetlink infrastructure
now handles multiple network namespaces.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 include/linux/rtnetlink.h |    8 ++--
 net/bridge/br_netlink.c   |    4 +-
 net/core/fib_rules.c      |    4 +-
 net/core/neighbour.c      |    4 +-
 net/core/rtnetlink.c      |   74 +++++++++++++++++++++++++++++++++++---------
 net/core/wireless.c       |    5 ++-
 net/decnet/dn_dev.c       |    4 +-
 net/decnet/dn_route.c     |    2 +-
 net/decnet/dn_table.c     |    4 +-
 net/ipv4/devinet.c        |    4 +-
 net/ipv4/fib_semantics.c  |    4 +-
 net/ipv4/ipmr.c           |    4 +-
 net/ipv4/route.c          |    2 +-
 net/ipv6/addrconf.c       |   14 ++++----
 net/ipv6/route.c          |    6 ++--
 net/sched/cls_api.c       |    2 +-
 net/sched/sch_api.c       |    4 +-
 17 files changed, 98 insertions(+), 51 deletions(-)

diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
index 4a629ea..6c8281d 100644
--- a/include/linux/rtnetlink.h
+++ b/include/linux/rtnetlink.h
@@ -581,11 +581,11 @@ struct rtnetlink_link
 };
 
 extern struct rtnetlink_link * rtnetlink_links[NPROTO];
-extern int rtnetlink_send(struct sk_buff *skb, u32 pid, u32 group, int echo);
-extern int rtnl_unicast(struct sk_buff *skb, u32 pid);
-extern int rtnl_notify(struct sk_buff *skb, u32 pid, u32 group,
+extern int rtnetlink_send(struct sk_buff *skb, net_t net, u32 pid, u32 group, int echo);
+extern int rtnl_unicast(struct sk_buff *skb, net_t net, u32 pid);
+extern int rtnl_notify(struct sk_buff *skb, net_t net, u32 pid, u32 group,
 		       struct nlmsghdr *nlh, gfp_t flags);
-extern void rtnl_set_sk_err(u32 group, int error);
+extern void rtnl_set_sk_err(net_t net, u32 group, int error);
 extern int rtnetlink_put_metrics(struct sk_buff *skb, u32 *metrics);
 extern int rtnl_put_cacheinfo(struct sk_buff *skb, struct dst_entry *dst,
 			      u32 id, u32 ts, u32 tsage, long expires,
diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
index 85165a1..372fb18 100644
--- a/net/bridge/br_netlink.c
+++ b/net/bridge/br_netlink.c
@@ -94,10 +94,10 @@ void br_ifinfo_notify(int event, struct net_bridge_port *port)
 	/* failure implies BUG in br_nlmsg_size() */
 	BUG_ON(err < 0);
 
-	err = rtnl_notify(skb, 0, RTNLGRP_LINK, NULL, GFP_ATOMIC);
+	err = rtnl_notify(skb, init_net(), 0, RTNLGRP_LINK, NULL, GFP_ATOMIC);
 errout:
 	if (err < 0)
-		rtnl_set_sk_err(RTNLGRP_LINK, err);
+		rtnl_set_sk_err(init_net(), RTNLGRP_LINK, err);
 }
 
 /*
diff --git a/net/core/fib_rules.c b/net/core/fib_rules.c
index 00b4148..5f65973 100644
--- a/net/core/fib_rules.c
+++ b/net/core/fib_rules.c
@@ -418,10 +418,10 @@ static void notify_rule_change(int event, struct fib_rule *rule,
 	/* failure implies BUG in fib_rule_nlmsg_size() */
 	BUG_ON(err < 0);
 
-	err = rtnl_notify(skb, pid, ops->nlgroup, nlh, GFP_KERNEL);
+	err = rtnl_notify(skb, init_net(), pid, ops->nlgroup, nlh, GFP_KERNEL);
 errout:
 	if (err < 0)
-		rtnl_set_sk_err(ops->nlgroup, err);
+		rtnl_set_sk_err(init_net(), ops->nlgroup, err);
 }
 
 static void attach_rules(struct list_head *rules, struct net_device *dev)
diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index d89c6fe..6f61207 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -2453,10 +2453,10 @@ static void __neigh_notify(struct neighbour *n, int type, int flags)
 	/* failure implies BUG in neigh_nlmsg_size() */
 	BUG_ON(err < 0);
 
-	err = rtnl_notify(skb, 0, RTNLGRP_NEIGH, NULL, GFP_ATOMIC);
+	err = rtnl_notify(skb, init_net(), 0, RTNLGRP_NEIGH, NULL, GFP_ATOMIC);
 errout:
 	if (err < 0)
-		rtnl_set_sk_err(RTNLGRP_NEIGH, err);
+		rtnl_set_sk_err(init_net(), RTNLGRP_NEIGH, err);
 }
 
 void neigh_app_ns(struct neighbour *n)
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 9be586c..29a81bf 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -58,7 +58,7 @@
 #endif	/* CONFIG_NET_WIRELESS_RTNETLINK */
 
 static DEFINE_MUTEX(rtnl_mutex);
-static struct sock *rtnl;
+static DEFINE_PER_NET(struct sock *, rtnl);
 
 void rtnl_lock(void)
 {
@@ -72,9 +72,17 @@ void __rtnl_unlock(void)
 
 void rtnl_unlock(void)
 {
+	net_t net;
 	mutex_unlock(&rtnl_mutex);
-	if (rtnl && rtnl->sk_receive_queue.qlen)
-		rtnl->sk_data_ready(rtnl, 0);
+	
+	net_lock();
+	for_each_net(net) {
+		struct sock *rtnl = per_net(rtnl, net);
+		if (rtnl && rtnl->sk_receive_queue.qlen)
+			rtnl->sk_data_ready(rtnl, 0);
+	}
+	net_unlock();
+
 	netdev_run_todo();
 }
 
@@ -151,8 +159,9 @@ size_t rtattr_strlcpy(char *dest, const struct rtattr *rta, size_t size)
 	return ret;
 }
 
-int rtnetlink_send(struct sk_buff *skb, u32 pid, unsigned group, int echo)
+int rtnetlink_send(struct sk_buff *skb, net_t net, u32 pid, unsigned group, int echo)
 {
+	struct sock *rtnl = per_net(rtnl, net);
 	int err = 0;
 
 	NETLINK_CB(skb).dst_group = group;
@@ -164,14 +173,17 @@ int rtnetlink_send(struct sk_buff *skb, u32 pid, unsigned group, int echo)
 	return err;
 }
 
-int rtnl_unicast(struct sk_buff *skb, u32 pid)
+int rtnl_unicast(struct sk_buff *skb, net_t net, u32 pid)
 {
+	struct sock *rtnl = per_net(rtnl, net);
+
 	return nlmsg_unicast(rtnl, skb, pid);
 }
 
-int rtnl_notify(struct sk_buff *skb, u32 pid, u32 group,
+int rtnl_notify(struct sk_buff *skb, net_t net, u32 pid, u32 group,
 		struct nlmsghdr *nlh, gfp_t flags)
 {
+	struct sock *rtnl = per_net(rtnl, net);
 	int report = 0;
 
 	if (nlh)
@@ -180,8 +192,10 @@ int rtnl_notify(struct sk_buff *skb, u32 pid, u32 group,
 	return nlmsg_notify(rtnl, skb, pid, group, report, flags);
 }
 
-void rtnl_set_sk_err(u32 group, int error)
+void rtnl_set_sk_err(net_t net, u32 group, int error)
 {
+	struct sock *rtnl = per_net(rtnl, net);
+
 	netlink_set_err(rtnl, 0, group, error);
 }
 
@@ -649,7 +663,7 @@ static int rtnl_getlink(struct sk_buff *skb, struct nlmsghdr* nlh, void *arg)
 	/* failure impilies BUG in if_nlmsg_size or wireless_rtnetlink_get */
 	BUG_ON(err < 0);
 
-	err = rtnl_unicast(nskb, NETLINK_CB(skb).pid);
+	err = rtnl_unicast(nskb, net, NETLINK_CB(skb).pid);
 errout:
 	kfree(iw_buf);
 	dev_put(dev);
@@ -698,10 +712,10 @@ void rtmsg_ifinfo(int type, struct net_device *dev, unsigned change)
 	/* failure implies BUG in if_nlmsg_size() */
 	BUG_ON(err < 0);
 
-	err = rtnl_notify(skb, 0, RTNLGRP_LINK, NULL, GFP_KERNEL);
+	err = rtnl_notify(skb, init_net(), 0, RTNLGRP_LINK, NULL, GFP_KERNEL);
 errout:
 	if (err < 0)
-		rtnl_set_sk_err(RTNLGRP_LINK, err);
+		rtnl_set_sk_err(init_net(), RTNLGRP_LINK, err);
 }
 
 /* Protected by RTNL sempahore.  */
@@ -713,6 +727,7 @@ static int rtattr_max;
 static __inline__ int
 rtnetlink_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh, int *errp)
 {
+	net_t net = skb->sk->sk_net;
 	struct rtnetlink_link *link;
 	struct rtnetlink_link *link_tab;
 	int sz_idx, kind;
@@ -767,7 +782,7 @@ rtnetlink_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh, int *errp)
 		if (link->dumpit == NULL)
 			goto err_inval;
 
-		if ((*errp = netlink_dump_start(rtnl, skb, nlh,
+		if ((*errp = netlink_dump_start(per_net(rtnl, net), skb, nlh,
 						link->dumpit, NULL)) != 0) {
 			return -1;
 		}
@@ -875,6 +890,36 @@ static struct notifier_block rtnetlink_dev_notifier = {
 	.notifier_call	= rtnetlink_event,
 };
 
+
+static int rtnetlink_net_init(net_t net)
+{
+	struct sock *sk;
+	sk = netlink_kernel_create(net, NETLINK_ROUTE, RTNLGRP_MAX,
+				   rtnetlink_rcv, THIS_MODULE);
+	if (!sk)
+		return -ENOMEM;
+
+	/* Don't hold an extra reference on the namespace */
+	put_net(sk->sk_net);
+	per_net(rtnl, net) = sk;
+	return 0;
+}
+
+static void rtnetlink_net_exit(net_t net)
+{
+	/* At the last minute lie and say this is a socket for the
+	 * initial network namespace.  So the socket will be safe to
+	 * free.
+	 */
+	per_net(rtnl, net)->sk_net = get_net(init_net());
+	sock_put(per_net(rtnl, net));
+}
+
+static struct pernet_operations rtnetlink_net_ops = {
+	.init = rtnetlink_net_init,
+	.exit = rtnetlink_net_exit,
+};
+
 void __init rtnetlink_init(void)
 {
 	int i;
@@ -887,10 +932,9 @@ void __init rtnetlink_init(void)
 	if (!rta_buf)
 		panic("rtnetlink_init: cannot allocate rta_buf\n");
 
-	rtnl = netlink_kernel_create(init_net(), NETLINK_ROUTE, RTNLGRP_MAX,
-				      rtnetlink_rcv, THIS_MODULE);
-	if (rtnl == NULL)
-		panic("rtnetlink_init: cannot initialize rtnetlink\n");
+	if (register_pernet_subsys(&rtnetlink_net_ops))
+  		panic("rtnetlink_init: cannot initialize rtnetlink\n");
+
 	netlink_set_nonroot(NETLINK_ROUTE, NL_NONROOT_RECV);
 	register_netdevice_notifier(&rtnetlink_dev_notifier);
 	rtnetlink_links[PF_UNSPEC] = link_rtnetlink_table;
diff --git a/net/core/wireless.c b/net/core/wireless.c
index d1418bf..9036359 100644
--- a/net/core/wireless.c
+++ b/net/core/wireless.c
@@ -1935,7 +1935,7 @@ static void wireless_nlevent_process(unsigned long data)
 	struct sk_buff *skb;
 
 	while ((skb = skb_dequeue(&wireless_nlevent_queue)))
-		rtnl_notify(skb, 0, RTNLGRP_LINK, NULL, GFP_ATOMIC);
+		rtnl_notify(skb, init_net(), 0, RTNLGRP_LINK, NULL, GFP_ATOMIC);
 }
 
 static DECLARE_TASKLET(wireless_nlevent_tasklet, wireless_nlevent_process, 0);
@@ -1992,6 +1992,9 @@ static inline void rtmsg_iwinfo(struct net_device *	dev,
 	struct sk_buff *skb;
 	int size = NLMSG_GOODSIZE;
 
+	if (!net_eq(dev->nd_net, init_net()))
+		return;
+
 	skb = alloc_skb(size, GFP_ATOMIC);
 	if (!skb)
 		return;
diff --git a/net/decnet/dn_dev.c b/net/decnet/dn_dev.c
index a09275b..bad972d 100644
--- a/net/decnet/dn_dev.c
+++ b/net/decnet/dn_dev.c
@@ -788,10 +788,10 @@ static void dn_ifaddr_notify(int event, struct dn_ifaddr *ifa)
 	/* failure implies BUG in dn_ifaddr_nlmsg_size() */
 	BUG_ON(err < 0);
 
-	err = rtnl_notify(skb, 0, RTNLGRP_DECnet_IFADDR, NULL, GFP_KERNEL);
+	err = rt

...

[ Show the rest of the message ]

Report message to a moderator

[PATCH RFC 25/31] net: Make wireless netlink event generation handle multiple network namespaces [message #17363 is a reply to message #17338]

Thu, 25 January 2007 19:00

ebiederm
Messages: 1354
Registered: February 2006

Senior Member

From: Eric W. Biederman <ebiederm@xmission.com> - unquoted

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 net/core/wireless.c |   15 ++++++++++-----
 1 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/net/core/wireless.c b/net/core/wireless.c
index 9036359..d534617 100644
--- a/net/core/wireless.c
+++ b/net/core/wireless.c
@@ -1934,8 +1934,13 @@ static void wireless_nlevent_process(unsigned long data)
 {
 	struct sk_buff *skb;
 
-	while ((skb = skb_dequeue(&wireless_nlevent_queue)))
-		rtnl_notify(skb, init_net(), 0, RTNLGRP_LINK, NULL, GFP_ATOMIC);
+	while ((skb = skb_dequeue(&wireless_nlevent_queue))) {
+		struct net_device *dev = skb->dev;
+		net_t net = dev->nd_net;
+		skb->dev = NULL;
+		rtnl_notify(skb, net, 0, RTNLGRP_LINK, NULL, GFP_ATOMIC);
+		dev_put(dev);
+	}
 }
 
 static DECLARE_TASKLET(wireless_nlevent_tasklet, wireless_nlevent_process, 0);
@@ -1992,9 +1997,6 @@ static inline void rtmsg_iwinfo(struct net_device *	dev,
 	struct sk_buff *skb;
 	int size = NLMSG_GOODSIZE;
 
-	if (!net_eq(dev->nd_net, init_net()))
-		return;
-
 	skb = alloc_skb(size, GFP_ATOMIC);
 	if (!skb)
 		return;
@@ -2004,6 +2006,9 @@ static inline void rtmsg_iwinfo(struct net_device *	dev,
 		kfree_skb(skb);
 		return;
 	}
+	/* Remember the device until we are in process context */
+	dev_hold(dev);
+	skb->dev = dev;
 	NETLINK_CB(skb).dst_group = RTNLGRP_LINK;
 	skb_queue_tail(&wireless_nlevent_queue, skb);
 	tasklet_schedule(&wireless_nlevent_tasklet);
-- 
1.4.4.1.g278f

_______________________________________________
Containers mailing list
Containers@lists.osdl.org
https://lists.osdl.org/mailman/listinfo/containers

Report message to a moderator

[PATCH RFC 26/31] net: Make the netlink methods in rtnetlink handle multiple network namespaces [message #17364 is a reply to message #17338]

Thu, 25 January 2007 19:00

ebiederm
Messages: 1354
Registered: February 2006

Senior Member

From: Eric W. Biederman <ebiederm@xmission.com> - unquoted

It turns out after a quick audit that except for removing the checks
there is really nothing to do here.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 net/core/rtnetlink.c |   21 +++------------------
 1 files changed, 3 insertions(+), 18 deletions(-)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 29a81bf..0a42258 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -409,9 +409,6 @@ static int rtnl_dump_ifinfo(struct sk_buff *skb, struct netlink_callback *cb)
 	int s_idx = cb->args[0];
 	struct net_device *dev;
 
-	if (!net_eq(net, init_net()))
-		return 0;
-
 	read_lock(&per_net(dev_base_lock, net));
 	for (dev=per_net(dev_base, net), idx=0; dev; dev = dev->next, idx++) {
 		if (idx < s_idx)
@@ -446,9 +443,6 @@ static int rtnl_setlink(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg)
 	struct nlattr *tb[IFLA_MAX+1];
 	char ifname[IFNAMSIZ];
 
-	if (!net_eq(net, init_net()))
-		return -EINVAL;
-
 	err = nlmsg_parse(nlh, sizeof(*ifm), tb, IFLA_MAX, ifla_policy);
 	if (err < 0)
 		goto errout;
@@ -622,9 +616,6 @@ static int rtnl_getlink(struct sk_buff *skb, struct nlmsghdr* nlh, void *arg)
 	int iw_buf_len = 0;
 	int err;
 
-	if (!net_eq(net, init_net()))
-		return -EINVAL;
-
 	err = nlmsg_parse(nlh, sizeof(*ifm), tb, IFLA_MAX, ifla_policy);
 	if (err < 0)
 		return err;
@@ -673,13 +664,9 @@ errout:
 
 static int rtnl_dump_all(struct sk_buff *skb, struct netlink_callback *cb)
 {
-	net_t net = skb->sk->sk_net;
 	int idx;
 	int s_idx = cb->family;
 
-	if (!net_eq(net, init_net()))
-		return 0;
-
 	if (s_idx == 0)
 		s_idx = 1;
 	for (idx=1; idx<NPROTO; idx++) {
@@ -701,6 +688,7 @@ static int rtnl_dump_all(struct sk_buff *skb, struct netlink_callback *cb)
 
 void rtmsg_ifinfo(int type, struct net_device *dev, unsigned change)
 {
+	net_t net = dev->nd_net;
 	struct sk_buff *skb;
 	int err = -ENOBUFS;
 
@@ -712,10 +700,10 @@ void rtmsg_ifinfo(int type, struct net_device *dev, unsigned change)
 	/* failure implies BUG in if_nlmsg_size() */
 	BUG_ON(err < 0);
 
-	err = rtnl_notify(skb, init_net(), 0, RTNLGRP_LINK, NULL, GFP_KERNEL);
+	err = rtnl_notify(skb, net, 0, RTNLGRP_LINK, NULL, GFP_KERNEL);
 errout:
 	if (err < 0)
-		rtnl_set_sk_err(init_net(), RTNLGRP_LINK, err);
+		rtnl_set_sk_err(net, RTNLGRP_LINK, err);
 }
 
 /* Protected by RTNL sempahore.  */
@@ -862,9 +850,6 @@ static int rtnetlink_event(struct notifier_block *this, unsigned long event, voi
 {
 	struct net_device *dev = ptr;
 
-	if (!net_eq(dev->nd_net, init_net()))
-		return NOTIFY_DONE;
-
 	switch (event) {
 	case NETDEV_UNREGISTER:
 		rtmsg_ifinfo(RTM_DELLINK, dev, ~0U);
-- 
1.4.4.1.g278f

_______________________________________________
Containers mailing list
Containers@lists.osdl.org
https://lists.osdl.org/mailman/listinfo/containers

Report message to a moderator

[PATCH RFC 27/31] net: Make the xfrm sysctls per network namespace. [message #17365 is a reply to message #17338]

Thu, 25 January 2007 19:00

ebiederm
Messages: 1354
Registered: February 2006

Senior Member

From: Eric W. Biederman <ebiederm@xmission.com> - unquoted

In particalure I moved:
/proc/sys/net/core/xfrm_aevent_etime
/proc/sys/net/core/xfrm_aevent_rseqth

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 include/net/xfrm.h         |    4 ++--
 net/core/sysctl_net_core.c |   37 ++++++++++++++++++-------------------
 net/xfrm/xfrm_state.c      |    8 ++++----
 net/xfrm/xfrm_user.c       |   10 ++++++----
 4 files changed, 30 insertions(+), 29 deletions(-)

diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index e476541..9b2e727 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -24,8 +24,8 @@
 	MODULE_ALIAS("xfrm-mode-" __stringify(family) "-" __stringify(encap))
 
 extern struct sock *xfrm_nl;
-extern u32 sysctl_xfrm_aevent_etime;
-extern u32 sysctl_xfrm_aevent_rseqth;
+DECLARE_PER_NET(u32, sysctl_xfrm_aevent_etime);
+DECLARE_PER_NET(u32, sysctl_xfrm_aevent_rseqth);
 
 extern struct mutex xfrm_cfg_mutex;
 
diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
index 76f7a29..90f2a39 100644
--- a/net/core/sysctl_net_core.c
+++ b/net/core/sysctl_net_core.c
@@ -88,24 +88,6 @@ ctl_table core_table[] = {
 		.mode		= 0644,
 		.proc_handler	= &proc_dointvec
 	},
-#ifdef CONFIG_XFRM
-	{
-		.ctl_name	= NET_CORE_AEVENT_ETIME,
-		.procname	= "xfrm_aevent_etime",
-		.data		= &sysctl_xfrm_aevent_etime,
-		.maxlen		= sizeof(u32),
-		.mode		= 0644,
-		.proc_handler	= &proc_dointvec
-	},
-	{
-		.ctl_name	= NET_CORE_AEVENT_RSEQTH,
-		.procname	= "xfrm_aevent_rseqth",
-		.data		= &sysctl_xfrm_aevent_rseqth,
-		.maxlen		= sizeof(u32),
-		.mode		= 0644,
-		.proc_handler	= &proc_dointvec
-	},
-#endif /* CONFIG_XFRM */
 #endif /* CONFIG_NET */
 	{
 		.ctl_name	= NET_CORE_SOMAXCONN,
@@ -127,6 +109,23 @@ ctl_table core_table[] = {
 };
 
 DEFINE_PER_NET(struct ctl_table, multi_core_table[]) = {
-	/* Stub for holding per network namespace sysctls */
+#ifdef CONFIG_XFRM
+	{
+		.ctl_name	= NET_CORE_AEVENT_ETIME,
+		.procname	= "xfrm_aevent_etime",
+		.data		= &__per_net_base(sysctl_xfrm_aevent_etime),
+		.maxlen		= sizeof(u32),
+		.mode		= 0644,
+		.proc_handler	= &proc_dointvec
+	},
+	{
+		.ctl_name	= NET_CORE_AEVENT_RSEQTH,
+		.procname	= "xfrm_aevent_rseqth",
+		.data		= &__per_net_base(sysctl_xfrm_aevent_rseqth),
+		.maxlen		= sizeof(u32),
+		.mode		= 0644,
+		.proc_handler	= &proc_dointvec
+	},
+#endif /* CONFIG_XFRM */
 	{}
 };
diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index fdb08d9..3304a2d 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -27,11 +27,11 @@
 struct sock *xfrm_nl;
 EXPORT_SYMBOL(xfrm_nl);
 
-u32 sysctl_xfrm_aevent_etime = XFRM_AE_ETIME;
-EXPORT_SYMBOL(sysctl_xfrm_aevent_etime);
+DEFINE_PER_NET(u32, sysctl_xfrm_aevent_etime) = XFRM_AE_ETIME;
+EXPORT_PER_NET_SYMBOL(sysctl_xfrm_aevent_etime);
 
-u32 sysctl_xfrm_aevent_rseqth = XFRM_AE_SEQT_SIZE;
-EXPORT_SYMBOL(sysctl_xfrm_aevent_rseqth);
+DEFINE_PER_NET(u32, sysctl_xfrm_aevent_rseqth) = XFRM_AE_SEQT_SIZE;
+EXPORT_PER_NET_SYMBOL(sysctl_xfrm_aevent_rseqth);
 
 /* Each xfrm_state may be linked to two tables:
 
diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index 55affa7..15e962b 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -375,7 +375,8 @@ error:
 	return err;
 }
 
-static struct xfrm_state *xfrm_state_construct(struct xfrm_usersa_info *p,
+static struct xfrm_state *xfrm_state_construct(net_t net,
+					       struct xfrm_usersa_info *p,
 					       struct rtattr **xfrma,
 					       int *errp)
 {
@@ -411,9 +412,9 @@ static struct xfrm_state *xfrm_state_construct(struct xfrm_usersa_info *p,
 		goto error;
 
 	x->km.seq = p->seq;
-	x->replay_maxdiff = sysctl_xfrm_aevent_rseqth;
+	x->replay_maxdiff = per_net(sysctl_xfrm_aevent_rseqth, net);
 	/* sysctl_xfrm_aevent_etime is in 100ms units */
-	x->replay_maxage = (sysctl_xfrm_aevent_etime*HZ)/XFRM_AE_ETH_M;
+	x->replay_maxage = (per_net(sysctl_xfrm_aevent_etime, net)*HZ)/XFRM_AE_ETH_M;
 	x->preplay.bitmap = 0;
 	x->preplay.seq = x->replay.seq+x->replay_maxdiff;
 	x->preplay.oseq = x->replay.oseq +x->replay_maxdiff;
@@ -437,6 +438,7 @@ error_no_put:
 static int xfrm_add_sa(struct sk_buff *skb, struct nlmsghdr *nlh,
 		struct rtattr **xfrma)
 {
+	net_t net = skb->sk->sk_net;
 	struct xfrm_usersa_info *p = NLMSG_DATA(nlh);
 	struct xfrm_state *x;
 	int err;
@@ -446,7 +448,7 @@ static int xfrm_add_sa(struct sk_buff *skb, struct nlmsghdr *nlh,
 	if (err)
 		return err;
 
-	x = xfrm_state_construct(p, xfrma, &err);
+	x = xfrm_state_construct(net, p, xfrma, &err);
 	if (!x)
 		return err;
 
-- 
1.4.4.1.g278f

_______________________________________________
Containers mailing list
Containers@lists.osdl.org
https://lists.osdl.org/mailman/listinfo/containers

Report message to a moderator

[PATCH RFC 28/31] net: Make the SOMAXCONN sysctl per network namespace [message #17366 is a reply to message #17338]

Thu, 25 January 2007 19:00

ebiederm
Messages: 1354
Registered: February 2006

Senior Member

From: Eric W. Biederman <ebiederm@xmission.com> - unquoted

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 include/linux/socket.h     |    3 ++-
 net/core/sysctl_net_core.c |   16 ++++++++--------
 net/socket.c               |    7 ++++---
 3 files changed, 14 insertions(+), 12 deletions(-)

diff --git a/include/linux/socket.h b/include/linux/socket.h
index 92cd38e..aa159ea 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -23,8 +23,9 @@ struct __kernel_sockaddr_storage {
 #include <linux/uio.h>			/* iovec support		*/
 #include <linux/types.h>		/* pid_t			*/
 #include <linux/compiler.h>		/* __user			*/
+#include <linux/net_namespace_type.h>
 
-extern int sysctl_somaxconn;
+DECLARE_PER_NET(int, sysctl_somaxconn);
 #ifdef CONFIG_PROC_FS
 struct seq_file;
 extern void socket_seq_show(struct seq_file *seq);
diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
index 90f2a39..14eca68 100644
--- a/net/core/sysctl_net_core.c
+++ b/net/core/sysctl_net_core.c
@@ -90,14 +90,6 @@ ctl_table core_table[] = {
 	},
 #endif /* CONFIG_NET */
 	{
-		.ctl_name	= NET_CORE_SOMAXCONN,
-		.procname	= "somaxconn",
-		.data		= &sysctl_somaxconn,
-		.maxlen		= sizeof(int),
-		.mode		= 0644,
-		.proc_handler	= &proc_dointvec
-	},
-	{
 		.ctl_name	= NET_CORE_BUDGET,
 		.procname	= "netdev_budget",
 		.data		= &netdev_budget,
@@ -127,5 +119,13 @@ DEFINE_PER_NET(struct ctl_table, multi_core_table[]) = {
 		.proc_handler	= &proc_dointvec
 	},
 #endif /* CONFIG_XFRM */
+	{
+		.ctl_name	= NET_CORE_SOMAXCONN,
+		.procname	= "somaxconn",
+		.data		= &__per_net_base(sysctl_somaxconn),
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= &proc_dointvec
+	},
 	{}
 };
diff --git a/net/socket.c b/net/socket.c
index 7371654..ab2aeea 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -1305,7 +1305,7 @@ asmlinkage long sys_bind(int fd, struct sockaddr __user *umyaddr, int addrlen)
  *	ready for listening.
  */
 
-int sysctl_somaxconn __read_mostly = SOMAXCONN;
+DEFINE_PER_NET(int, sysctl_somaxconn)= SOMAXCONN;
 
 asmlinkage long sys_listen(int fd, int backlog)
 {
@@ -1314,8 +1314,9 @@ asmlinkage long sys_listen(int fd, int backlog)
 
 	sock = sockfd_lookup_light(fd, &err, &fput_needed);
 	if (sock) {
-		if ((unsigned)backlog > sysctl_somaxconn)
-			backlog = sysctl_somaxconn;
+		net_t net = sock->sk->sk_net;
+		if ((unsigned)backlog > per_net(sysctl_somaxconn, net))
+			backlog = per_net(sysctl_somaxconn, net);
 
 		err = security_socket_listen(sock, backlog);
 		if (!err)
-- 
1.4.4.1.g278f

_______________________________________________
Containers mailing list
Containers@lists.osdl.org
https://lists.osdl.org/mailman/listinfo/containers

Report message to a moderator

[PATCH RFC 29/31] net: Make AF_PACKET handle multiple network namespaces [message #17367 is a reply to message #17338]

Thu, 25 January 2007 19:00

ebiederm
Messages: 1354
Registered: February 2006

Senior Member

From: Eric W. Biederman <ebiederm@xmission.com> - unquoted

This is done by making all of the relevant global variables
per network namespace.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 net/packet/af_packet.c |  125 +++++++++++++++++++++++++++++++-----------------
 1 files changed, 81 insertions(+), 44 deletions(-)

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 4ac9f9f..c772491 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -152,8 +152,8 @@ dev->hard_header == NULL (ll header is added by device, we cannot control it)
  */
 
 /* List of all packet sockets. */
-static HLIST_HEAD(packet_sklist);
-static DEFINE_RWLOCK(packet_sklist_lock);
+static DEFINE_PER_NET(rwlock_t, packet_sklist_lock);
+static DEFINE_PER_NET(struct hlist_head, packet_sklist);
 
 static atomic_t packet_socks_nr;
 
@@ -264,9 +264,6 @@ static int packet_rcv_spkt(struct sk_buff *skb, struct packet_type *pt, struct n
 	struct sock *sk;
 	struct sockaddr_pkt *spkt;
 
-	if (!net_eq(dev->nd_net, init_net()))
-		goto out;
-
 	/*
 	 *	When we registered the protocol we saved the socket in the data
 	 *	field for just this event.
@@ -288,6 +285,9 @@ static int packet_rcv_spkt(struct sk_buff *skb, struct packet_type *pt, struct n
 	if (skb->pkt_type == PACKET_LOOPBACK)
 		goto out;
 
+	if (!net_eq(dev->nd_net, sk->sk_net))
+		goto out;
+
 	if ((skb = skb_share_check(skb, GFP_ATOMIC)) == NULL)
 		goto oom;
 
@@ -359,7 +359,7 @@ static int packet_sendmsg_spkt(struct kiocb *iocb, struct socket *sock,
 	 */
 
 	saddr->spkt_device[13] = 0;
-	dev = dev_get_by_name(init_net(), saddr->spkt_device);
+	dev = dev_get_by_name(sk->sk_net, saddr->spkt_device);
 	err = -ENODEV;
 	if (dev == NULL)
 		goto out_unlock;
@@ -475,15 +475,15 @@ static int packet_rcv(struct sk_buff *skb, struct packet_type *pt, struct net_de
 	int skb_len = skb->len;
 	unsigned snaplen;
 
-	if (!net_eq(dev->nd_net, init_net()))
-		goto drop;
-
 	if (skb->pkt_type == PACKET_LOOPBACK)
 		goto drop;
 
 	sk = pt->af_packet_priv;
 	po = pkt_sk(sk);
 
+	if (!net_eq(dev->nd_net, sk->sk_net))
+		goto drop;
+
 	skb->dev = dev;
 
 	if (dev->hard_header) {
@@ -583,15 +583,15 @@ static int tpacket_rcv(struct sk_buff *skb, struct packet_type *pt, struct net_d
 	unsigned short macoff, netoff;
 	struct sk_buff *copy_skb = NULL;
 
-	if (!net_eq(dev->nd_net, init_net()))
-		goto drop;
-
 	if (skb->pkt_type == PACKET_LOOPBACK)
 		goto drop;
 
 	sk = pt->af_packet_priv;
 	po = pkt_sk(sk);
 
+	if (!net_eq(dev->nd_net, sk->sk_net))
+		goto drop;
+
 	if (dev->hard_header) {
 		if (sk->sk_type != SOCK_DGRAM)
 			skb_push(skb, skb->data - skb->mac.raw);
@@ -744,7 +744,7 @@ static int packet_sendmsg(struct kiocb *iocb, struct socket *sock,
 	}
 
 
-	dev = dev_get_by_index(init_net(), ifindex);
+	dev = dev_get_by_index(sk->sk_net, ifindex);
 	err = -ENXIO;
 	if (dev == NULL)
 		goto out_unlock;
@@ -817,15 +817,17 @@ static int packet_release(struct socket *sock)
 {
 	struct sock *sk = sock->sk;
 	struct packet_sock *po;
+	net_t net;
 
 	if (!sk)
 		return 0;
 
+	net = sk->sk_net;
 	po = pkt_sk(sk);
 
-	write_lock_bh(&packet_sklist_lock);
+	write_lock_bh(&per_net(packet_sklist_lock, net));
 	sk_del_node_init(sk);
-	write_unlock_bh(&packet_sklist_lock);
+	write_unlock_bh(&per_net(packet_sklist_lock, net));
 
 	/*
 	 *	Unhook packet receive handler.
@@ -943,7 +945,7 @@ static int packet_bind_spkt(struct socket *sock, struct sockaddr *uaddr, int add
 		return -EINVAL;
 	strlcpy(name,uaddr->sa_data,sizeof(name));
 
-	dev = dev_get_by_name(init_net(), name);
+	dev = dev_get_by_name(sk->sk_net, name);
 	if (dev) {
 		err = packet_do_bind(sk, dev, pkt_sk(sk)->num);
 		dev_put(dev);
@@ -971,7 +973,7 @@ static int packet_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len
 
 	if (sll->sll_ifindex) {
 		err = -ENODEV;
-		dev = dev_get_by_index(init_net(), sll->sll_ifindex);
+		dev = dev_get_by_index(sk->sk_net, sll->sll_ifindex);
 		if (dev == NULL)
 			goto out;
 	}
@@ -1000,9 +1002,6 @@ static int packet_create(net_t net, struct socket *sock, int protocol)
 	__be16 proto = (__force __be16)protocol; /* weird, but documented */
 	int err;
 
-	if (!net_eq(net, init_net()))
-		return -EAFNOSUPPORT;
-
 	if (!capable(CAP_NET_RAW))
 		return -EPERM;
 	if (sock->type != SOCK_DGRAM && sock->type != SOCK_RAW
@@ -1052,9 +1051,9 @@ static int packet_create(net_t net, struct socket *sock, int protocol)
 		po->running = 1;
 	}
 
-	write_lock_bh(&packet_sklist_lock);
-	sk_add_node(sk, &packet_sklist);
-	write_unlock_bh(&packet_sklist_lock);
+	write_lock_bh(&per_net(packet_sklist_lock, net));
+	sk_add_node(sk, &per_net(packet_sklist, net));
+	write_unlock_bh(&per_net(packet_sklist_lock, net));
 	return(0);
 out:
 	return err;
@@ -1158,7 +1157,7 @@ static int packet_getname_spkt(struct socket *sock, struct sockaddr *uaddr,
 		return -EOPNOTSUPP;
 
 	uaddr->sa_family = AF_PACKET;
-	dev = dev_get_by_index(init_net(), pkt_sk(sk)->ifindex);
+	dev = dev_get_by_index(sk->sk_net, pkt_sk(sk)->ifindex);
 	if (dev) {
 		strlcpy(uaddr->sa_data, dev->name, 15);
 		dev_put(dev);
@@ -1184,7 +1183,7 @@ static int packet_getname(struct socket *sock, struct sockaddr *uaddr,
 	sll->sll_family = AF_PACKET;
 	sll->sll_ifindex = po->ifindex;
 	sll->sll_protocol = po->num;
-	dev = dev_get_by_index(init_net(), po->ifindex);
+	dev = dev_get_by_index(sk->sk_net, po->ifindex);
 	if (dev) {
 		sll->sll_hatype = dev->type;
 		sll->sll_halen = dev->addr_len;
@@ -1237,7 +1236,7 @@ static int packet_mc_add(struct sock *sk, struct packet_mreq_max *mreq)
 	rtnl_lock();
 
 	err = -ENODEV;
-	dev = __dev_get_by_index(init_net(), mreq->mr_ifindex);
+	dev = __dev_get_by_index(sk->sk_net, mreq->mr_ifindex);
 	if (!dev)
 		goto done;
 
@@ -1291,7 +1290,7 @@ static int packet_mc_drop(struct sock *sk, struct packet_mreq_max *mreq)
 			if (--ml->count == 0) {
 				struct net_device *dev;
 				*mlp = ml->next;
-				dev = dev_get_by_index(init_net(), ml->ifindex);
+				dev = dev_get_by_index(sk->sk_net, ml->ifindex);
 				if (dev) {
 					packet_dev_mc(dev, ml, -1);
 					dev_put(dev);
@@ -1319,7 +1318,7 @@ static void packet_flush_mclist(struct sock *sk)
 		struct net_device *dev;
 
 		po->mclist = ml->next;
-		if ((dev = dev_get_by_index(init_net(), ml->ifindex)) != NULL) {
+		if ((dev = dev_get_by_index(sk->sk_net, ml->ifindex)) != NULL) {
 			packet_dev_mc(dev, ml, -1);
 			dev_put(dev);
 		}
@@ -1438,12 +1437,10 @@ static int packet_notifier(struct notifier_block *this, unsigned long msg, void
 	struct sock *sk;
 	struct hlist_node *node;
 	struct net_device *dev = (struct net_device*)data;
+	net_t net = dev->nd_net;
 
-	if (!net_eq(dev->nd_net, init_net()))
-		return NOTIFY_DONE;
-
-	read_lock(&packet_sklist_lock);
-	sk_for_each(sk, node, &packet_sklist) {
+	read_lock(&per_net(packet_sklist_lock, net));
+	sk_for_each(sk, node, &per_net(packet_sklist, net)) {
 		struct packet_sock *po = pkt_sk(sk);
 
 		switch (msg) {
@@ -1483,7 +1480,7 @@ static int packet_notifier(struct notifier_block *this, unsigned long msg, void
 			break;
 		}
 	}
-	read_unlock(&packet_sklist_lock);
+	read_unlock(&per_net(packet_sklist_lock, net));
 	return NOTIFY_DONE;
 }
 
@@ -1851,12 +1848,12 @@ static struct notifier_block packet_netdev_notifier = {
 };
 
 #ifdef CONFIG_PROC_FS
-static inline struct sock *packet_seq_idx(loff_t off)
+static inline struct sock *packet_seq_idx(net_t net, loff_t off)
 {
 	struct sock *s;
 	struct hlist_node *node;
 
-	sk_for_each(s, node, &packet_sklist) {
+	sk_for_each(s, node, &per_net(packet_sklist, net)) {
 		if (!off--)
 			return s;
 	}
@@ -1865,21 +1862,24 @@ static inline struct sock *packet_seq_idx(loff_t off)
 
 static void *packet_seq_start(struct seq_file *seq, loff_t *pos)
 {
-	read_lock(&packet_sklist_lock);
-	return *pos ? packet_seq_idx(*pos - 1) : SEQ_START_TOKEN;
+	net_t net = net_from_voidp(seq->private);
+	read_lock(&per_net(packet_sklist_lock, net));
+	return *pos ? packet_seq_idx(net, *pos - 1) : SEQ_START_TOKEN;
 }
 
 static void *packet_seq_next(struct seq_file *seq, void *v, loff_t *pos)
 {
+	net_t net = net_from_voidp(seq->private);
 	++*pos;
 	return  (v == SEQ_START_TOKEN) 
-		? sk_head(&packet_sklist) 
+		? sk_head(&per_net(packet_sklist, net)) 
 		: sk_next((struct sock*)v) ;
 }
 
 static void packet_seq_stop(struct seq_file *seq, void *v)
 {
-	read_unlock(&packet_sklist_lock);		
+	net_t net = net_from_voidp(seq->private);
+	read_unlock(&per_net(packet_sklist_lock, net));
 }
 
 static int packet_seq_show(struct seq_file *seq, void *v) 
@@ -1915,7 +1915,22 @@ static struct seq_operations packet_seq_ops = {
 
 static int packet_seq_open(struct inode *inode, struct file *file)
 {
-	return seq_open(file, &packet_seq_ops);
+	struct seq_file *seq;
+	int res;
+	res = seq_open(file, &packet_seq_ops);
+	if (!res) {
+		seq = file->private_data;
+		seq->private = net_to_voidp(get_net(PROC_NET(inode)));
+	}
+	return res;
+}
+
+static int packet_seq_release(struct inode *inode, struct file *file)
+{
+	struct seq_file *seq=  file->private_data;
+	net_t net = net_from_voidp(seq->private);
+	put_net(net);
+	return seq_release(inode, file);
 }
 
 static struct file_operations packet_seq_fops = {
@@ -1923,15 +1938,37 @@ static struct file_operations packet_seq_fops = {
 	.open		= packet_seq_open,
 	.read		= seq_read,
 	.llseek		= seq_lseek,
-	.release	= seq_release,
+	.release	= packet_seq_release,
 };
 
 #endif
 
+static int packet_net_init(net_t net)
+{
+	rwlock_init(&per_net(packet_sklist_lock, net));
+	INIT_HLIST_HEAD(&per_net(packet_sklist, net));
+
+	if (!proc_net_fops_create(net, "packet", 0, &packet_seq_fops

...

[ Show the rest of the message ]

Report message to a moderator

[PATCH RFC 30/31] net: Make AF_UNIX per network namespace safe. [message #17368 is a reply to message #17338]

Thu, 25 January 2007 19:00

ebiederm
Messages: 1354
Registered: February 2006

Senior Member

From: Eric W. Biederman <ebiederm@xmission.com> - unquoted

Because of the global nature of garbage collection, and
because of the cost of per namespace hash tables
unix_socket_table has been kept global.  With a filter
added on lookups so we don't see sockets from the wrong
namespace.

Currently I don't fold the namesapce into the hash so
multiple namespaces using the same socket name will be
guaranateed a hash collision.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 include/net/af_unix.h      |   10 ++--
 net/unix/af_unix.c         |  116 ++++++++++++++++++++++++++++++++------------
 net/unix/sysctl_net_unix.c |   24 +++++----
 3 files changed, 103 insertions(+), 47 deletions(-)

diff --git a/include/net/af_unix.h b/include/net/af_unix.h
index c0398f5..1f40dd2 100644
--- a/include/net/af_unix.h
+++ b/include/net/af_unix.h
@@ -89,12 +89,12 @@ struct unix_sock {
 #define unix_sk(__sk) ((struct unix_sock *)__sk)
 
 #ifdef CONFIG_SYSCTL
-extern int sysctl_unix_max_dgram_qlen;
-extern void unix_sysctl_register(void);
-extern void unix_sysctl_unregister(void);
+DECLARE_PER_NET(int, sysctl_unix_max_dgram_qlen);
+extern void unix_sysctl_register(net_t net);
+extern void unix_sysctl_unregister(net_t net);
 #else
-static inline void unix_sysctl_register(void) {}
-static inline void unix_sysctl_unregister(void) {}
+static inline void unix_sysctl_register(net_t net) {}
+static inline void unix_sysctl_unregister(net_t net) {}
 #endif
 #endif
 #endif
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 8015a03..3f57cb2 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -118,7 +118,7 @@
 #include <linux/security.h>
 #include <net/net_namespace.h>
 
-int sysctl_unix_max_dgram_qlen __read_mostly = 10;
+DEFINE_PER_NET(int, sysctl_unix_max_dgram_qlen) = 10;
 
 struct hlist_head unix_socket_table[UNIX_HASH_SIZE + 1];
 DEFINE_SPINLOCK(unix_table_lock);
@@ -245,7 +245,8 @@ static inline void unix_insert_socket(struct hlist_head *list, struct sock *sk)
 	spin_unlock(&unix_table_lock);
 }
 
-static struct sock *__unix_find_socket_byname(struct sockaddr_un *sunname,
+static struct sock *__unix_find_socket_byname(net_t net,
+					      struct sockaddr_un *sunname,
 					      int len, int type, unsigned hash)
 {
 	struct sock *s;
@@ -254,6 +255,9 @@ static struct sock *__unix_find_socket_byname(struct sockaddr_un *sunname,
 	sk_for_each(s, node, &unix_socket_table[hash ^ type]) {
 		struct unix_sock *u = unix_sk(s);
 
+		if (!net_eq(s->sk_net, net))
+			continue;
+
 		if (u->addr->len == len &&
 		    !memcmp(u->addr->name, sunname, len))
 			goto found;
@@ -263,21 +267,22 @@ found:
 	return s;
 }
 
-static inline struct sock *unix_find_socket_byname(struct sockaddr_un *sunname,
+static inline struct sock *unix_find_socket_byname(net_t net,
+						   struct sockaddr_un *sunname,
 						   int len, int type,
 						   unsigned hash)
 {
 	struct sock *s;
 
 	spin_lock(&unix_table_lock);
-	s = __unix_find_socket_byname(sunname, len, type, hash);
+	s = __unix_find_socket_byname(net, sunname, len, type, hash);
 	if (s)
 		sock_hold(s);
 	spin_unlock(&unix_table_lock);
 	return s;
 }
 
-static struct sock *unix_find_socket_byinode(struct inode *i)
+static struct sock *unix_find_socket_byinode(net_t net, struct inode *i)
 {
 	struct sock *s;
 	struct hlist_node *node;
@@ -287,6 +292,9 @@ static struct sock *unix_find_socket_byinode(struct inode *i)
 		    &unix_socket_table[i->i_ino & (UNIX_HASH_SIZE - 1)]) {
 		struct dentry *dentry = unix_sk(s)->dentry;
 
+		if (!net_eq(s->sk_net, net))
+			continue;
+
 		if(dentry && dentry->d_inode == i)
 		{
 			sock_hold(s);
@@ -588,7 +596,7 @@ static struct sock * unix_create1(net_t net, struct socket *sock)
 				&af_unix_sk_receive_queue_lock_key);
 
 	sk->sk_write_space	= unix_write_space;
-	sk->sk_max_ack_backlog	= sysctl_unix_max_dgram_qlen;
+	sk->sk_max_ack_backlog	= per_net(sysctl_unix_max_dgram_qlen, net);
 	sk->sk_destruct		= unix_sock_destructor;
 	u	  = unix_sk(sk);
 	u->dentry = NULL;
@@ -604,9 +612,6 @@ out:
 
 static int unix_create(net_t net, struct socket *sock, int protocol)
 {
-	if (!net_eq(net, init_net()))
-		return -EAFNOSUPPORT;
-
 	if (protocol && protocol != PF_UNIX)
 		return -EPROTONOSUPPORT;
 
@@ -650,6 +655,7 @@ static int unix_release(struct socket *sock)
 static int unix_autobind(struct socket *sock)
 {
 	struct sock *sk = sock->sk;
+	net_t net = sk->sk_net;
 	struct unix_sock *u = unix_sk(sk);
 	static u32 ordernum = 1;
 	struct unix_address * addr;
@@ -676,7 +682,7 @@ retry:
 	spin_lock(&unix_table_lock);
 	ordernum = (ordernum+1)&0xFFFFF;
 
-	if (__unix_find_socket_byname(addr->name, addr->len, sock->type,
+	if (__unix_find_socket_byname(net, addr->name, addr->len, sock->type,
 				      addr->hash)) {
 		spin_unlock(&unix_table_lock);
 		/* Sanity yield. It is unusual case, but yet... */
@@ -696,7 +702,8 @@ out:	mutex_unlock(&u->readlock);
 	return err;
 }
 
-static struct sock *unix_find_other(struct sockaddr_un *sunname, int len,
+static struct sock *unix_find_other(net_t net,
+				    struct sockaddr_un *sunname, int len,
 				    int type, unsigned hash, int *error)
 {
 	struct sock *u;
@@ -714,7 +721,7 @@ static struct sock *unix_find_other(struct sockaddr_un *sunname, int len,
 		err = -ECONNREFUSED;
 		if (!S_ISSOCK(nd.dentry->d_inode->i_mode))
 			goto put_fail;
-		u=unix_find_socket_byinode(nd.dentry->d_inode);
+		u=unix_find_socket_byinode(net, nd.dentry->d_inode);
 		if (!u)
 			goto put_fail;
 
@@ -730,7 +737,7 @@ static struct sock *unix_find_other(struct sockaddr_un *sunname, int len,
 		}
 	} else {
 		err = -ECONNREFUSED;
-		u=unix_find_socket_byname(sunname, len, type, hash);
+		u=unix_find_socket_byname(net, sunname, len, type, hash);
 		if (u) {
 			struct dentry *dentry;
 			dentry = unix_sk(u)->dentry;
@@ -752,6 +759,7 @@ fail:
 static int unix_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
 {
 	struct sock *sk = sock->sk;
+	net_t net = sk->sk_net;
 	struct unix_sock *u = unix_sk(sk);
 	struct sockaddr_un *sunaddr=(struct sockaddr_un *)uaddr;
 	struct dentry * dentry = NULL;
@@ -826,7 +834,7 @@ static int unix_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
 
 	if (!sunaddr->sun_path[0]) {
 		err = -EADDRINUSE;
-		if (__unix_find_socket_byname(sunaddr, addr_len,
+		if (__unix_find_socket_byname(net, sunaddr, addr_len,
 					      sk->sk_type, hash)) {
 			unix_release_addr(addr);
 			goto out_unlock;
@@ -867,6 +875,7 @@ static int unix_dgram_connect(struct socket *sock, struct sockaddr *addr,
 			      int alen, int flags)
 {
 	struct sock *sk = sock->sk;
+	net_t net = sk->sk_net;
 	struct sockaddr_un *sunaddr=(struct sockaddr_un*)addr;
 	struct sock *other;
 	unsigned hash;
@@ -882,7 +891,7 @@ static int unix_dgram_connect(struct socket *sock, struct sockaddr *addr,
 		    !unix_sk(sk)->addr && (err = unix_autobind(sock)) != 0)
 			goto out;
 
-		other=unix_find_other(sunaddr, alen, sock->type, hash, &err);
+		other=unix_find_other(net, sunaddr, alen, sock->type, hash, &err);
 		if (!other)
 			goto out;
 
@@ -955,6 +964,7 @@ static int unix_stream_connect(struct socket *sock, struct sockaddr *uaddr,
 {
 	struct sockaddr_un *sunaddr=(struct sockaddr_un *)uaddr;
 	struct sock *sk = sock->sk;
+	net_t net = sk->sk_net;
 	struct unix_sock *u = unix_sk(sk), *newu, *otheru;
 	struct sock *newsk = NULL;
 	struct sock *other = NULL;
@@ -994,7 +1004,7 @@ static int unix_stream_connect(struct socket *sock, struct sockaddr *uaddr,
 
 restart:
 	/*  Find listening sock. */
-	other = unix_find_other(sunaddr, addr_len, sk->sk_type, hash, &err);
+	other = unix_find_other(net, sunaddr, addr_len, sk->sk_type, hash, &err);
 	if (!other)
 		goto out;
 
@@ -1273,6 +1283,7 @@ static int unix_dgram_sendmsg(struct kiocb *kiocb, struct socket *sock,
 {
 	struct sock_iocb *siocb = kiocb_to_siocb(kiocb);
 	struct sock *sk = sock->sk;
+	net_t net = sk->sk_net;
 	struct unix_sock *u = unix_sk(sk);
 	struct sockaddr_un *sunaddr=msg->msg_name;
 	struct sock *other = NULL;
@@ -1336,7 +1347,7 @@ restart:
 		if (sunaddr == NULL)
 			goto out_free;
 
-		other = unix_find_other(sunaddr, namelen, sk->sk_type,
+		other = unix_find_other(net, sunaddr, namelen, sk->sk_type,
 					hash, &err);
 		if (other==NULL)
 			goto out_free;
@@ -1935,12 +1946,18 @@ static unsigned int unix_poll(struct file * file, struct socket *sock, poll_tabl
 
 
 #ifdef CONFIG_PROC_FS
-static struct sock *unix_seq_idx(int *iter, loff_t pos)
+struct unix_iter_state {
+	net_t net;
+	int i;
+};
+static struct sock *unix_seq_idx(struct unix_iter_state *iter, loff_t pos)
 {
 	loff_t off = 0;
 	struct sock *s;
 
-	for (s = first_unix_socket(iter); s; s = next_unix_socket(iter, s)) {
+	for (s = first_unix_socket(&iter->i); s; s = next_unix_socket(&iter->i, s)) {
+		if (!net_eq(s->sk_net, iter->net))
+			continue;
 		if (off == pos) 
 			return s;
 		++off;
@@ -1951,17 +1968,24 @@ static struct sock *unix_seq_idx(int *iter, loff_t pos)
 
 static void *unix_seq_start(struct seq_file *seq, loff_t *pos)
 {
+	struct unix_iter_state *iter = seq->private;
 	spin_lock(&unix_table_lock);
-	return *pos ? unix_seq_idx(seq->private, *pos - 1) : ((void *) 1);
+	return *pos ? unix_seq_idx(iter, *pos - 1) : ((void *) 1);
 }
 
 static void *unix_seq_next(struct seq_file *seq, void *v, loff_t *pos)
 {
+	struct unix_iter_state *iter = seq->private;
+	struct sock *sk = v;
 	++*pos;
 
 	if (v == (void *)1) 
-		return first_unix_socket(seq->private);
-	return next_unix_socket(seq->private, v);
+		sk = first_unix_socket(&iter->i);
+	else
+		sk = next_unix_socket(&iter->i, sk);
+	while (sk && !net_eq(sk->sk_net, iter->net))
+		sk = next_unix_socket(&am

...

[ Show the rest of the message ]

Report message to a moderator

[PATCH RFC 31/31] net: Add etun driver [message #17369 is a reply to message #17338]

Thu, 25 January 2007 19:00

ebiederm
Messages: 1354
Registered: February 2006

Senior Member

From: Eric W. Biederman <ebiederm@xmission.com> - unquoted

etun is a simple two headed tunnel driver that at the link layer
looks like ethernet.  It's target audience is communicating
between network namespaces but it is general enough it may
have other uses as well.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 drivers/net/Kconfig  |   14 ++
 drivers/net/Makefile |    1 +
 drivers/net/etun.c   |  470 ++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 485 insertions(+), 0 deletions(-)

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index 8aa8dd0..969d3df 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -119,6 +119,20 @@ config TUN
 
 	  If you don't know what to use this for, you don't need it.
 
+config ETUN
+	tristate "Ethernet tunnel device driver support"
+	depends on SYSFS
+	---help---
+	  ETUN provices a pair of network devices that can be used for
+	  configuring interesting topolgies.  What one devices transmits
+	  the other receives and vice versa.  The link level framing
+	  is ethernet for wide compatibility with network stacks.
+
+	  To compile this driver as a module, choose M here: the module
+	  will be called etun.
+
+	  If you don't know what to use this for, you don't need it.
+
 config NET_SB1000
 	tristate "General Instruments Surfboard 1000"
 	depends on PNP
diff --git a/drivers/net/Makefile b/drivers/net/Makefile
index 4c0d4e5..396af4f 100644
--- a/drivers/net/Makefile
+++ b/drivers/net/Makefile
@@ -185,6 +185,7 @@ obj-$(CONFIG_MACSONIC) += macsonic.o
 obj-$(CONFIG_MACMACE) += macmace.o
 obj-$(CONFIG_MAC89x0) += mac89x0.o
 obj-$(CONFIG_TUN) += tun.o
+obj-$(CONFIG_ETUN) += etun.o
 obj-$(CONFIG_NET_NETX) += netx-eth.o
 obj-$(CONFIG_DL2K) += dl2k.o
 obj-$(CONFIG_R8169) += r8169.o
diff --git a/drivers/net/etun.c b/drivers/net/etun.c
new file mode 100644
index 0000000..1dd8cd8
--- /dev/null
+++ b/drivers/net/etun.c
@@ -0,0 +1,470 @@
+/*
+ *  ETUN - Universal ETUN device driver.
+ *  Copyright (C) 2006 Linux Networx
+ *
+ */
+
+#define DRV_NAME	"etun"
+#define DRV_VERSION	"1.0"
+#define DRV_DESCRIPTION	"Ethernet pseudo tunnel device driver"
+#define DRV_COPYRIGHT	"(C) 2007 Linux Networx"
+
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/list.h>
+#include <linux/spinlock.h>
+#include <linux/skbuff.h>
+#include <linux/netdevice.h>
+#include <linux/etherdevice.h>
+#include <linux/ethtool.h>
+#include <linux/rtnetlink.h>
+#include <linux/if.h>
+#include <linux/if_ether.h>
+#include <linux/ctype.h>
+#include <net/net_namespace.h>
+#include <net/dst.h>
+
+
+/* Device cheksum strategy.
+ *
+ * etun is designed to a be a pair of virutal devices
+ * connecting two network stack instances.  
+ * 
+ * Typically it will either be used with ethernet bridging or
+ * it will be used to route packets between the two stacks.
+ * 
+ * The only checksum offloading I can do is to completely
+ * skip the checksumming step all together.
+ *
+ * When used for ethernet bridging I don't believe any
+ * checksum off loading is safe.  
+ * - If my source is an external interface the checksum may be
+ *   invalid so I don't want to report I have already checked it.
+ * - If my destination is an external interface I don't want to put
+ *   a packet on the wire with someone computing the checksum.
+ *
+ * When used for routing between two stacks checksums should
+ * be as unnecessary as they are on the loopback device.
+ *
+ * So by default I am safe and disable checksumming and
+ * other advanced features like SG and TSO.
+ *
+ * However because I think these features could be useful
+ * I provide the ethtool functions to and enable/disable
+ * them at runtime.
+ *
+ * If you think you can correctly enable these go ahead.
+ * For checksums both the transmitter and the receiver must
+ * agree before the are actually disabled.
+ */
+
+#define ETUN_NUM_STATS 1
+static struct {
+	const char string[ETH_GSTRING_LEN];
+} ethtool_stats_keys[ETUN_NUM_STATS] = {
+	{ "partner_ifindex" },
+};
+
+struct etun_info {
+	struct net_device 	*rx_dev;
+	unsigned		ip_summed;
+	struct net_device_stats	stats;
+	struct list_head	list;
+	struct net_device	*dev;
+};
+
+/*
+ * I have to hold the rtnl_lock during device delete.
+ * So I use the rtnl_lock to protect my list manipulations
+ * as well.  Crude but simple.
+ */
+static LIST_HEAD(etun_list);
+
+/*
+ * The higher levels take care of making this non-reentrant (it's
+ * called with bh's disabled).
+ */
+static int etun_xmit(struct sk_buff *skb, struct net_device *tx_dev)
+{
+	struct etun_info *tx_info = tx_dev->priv;
+	struct net_device *rx_dev = tx_info->rx_dev;
+	struct etun_info *rx_info = rx_dev->priv;
+
+	tx_info->stats.tx_packets++;
+	tx_info->stats.tx_bytes += skb->len;
+
+	/* Drop the skb state that was needed to get here */
+	skb_orphan(skb);
+	if (skb->dst)
+		skb->dst = dst_pop(skb->dst);	/* Allow for smart routing */
+	
+	/* Switch to the receiving device */
+	skb->pkt_type = PACKET_HOST;
+	skb->protocol = eth_type_trans(skb, rx_dev);
+	skb->dev = rx_dev;
+	skb->ip_summed = CHECKSUM_NONE;
+
+	/* If both halves agree no checksum is needed */
+	if (tx_dev->features & NETIF_F_NO_CSUM)
+		skb->ip_summed = rx_info->ip_summed;
+
+	rx_dev->last_rx = jiffies; 
+	rx_info->stats.rx_packets++;
+	rx_info->stats.rx_bytes += skb->len;
+	netif_rx(skb);
+
+	return 0;
+}
+
+static struct net_device_stats *etun_get_stats(struct net_device *dev)
+{
+	struct etun_info *info = dev->priv;
+	return &info->stats;
+}
+
+/* ethtool interface */
+static int etun_get_settings(struct net_device *dev, struct ethtool_cmd *cmd)
+{
+	cmd->supported		= 0;
+	cmd->advertising	= 0;
+	cmd->speed		= SPEED_10000; /* Memory is fast! */
+	cmd->duplex		= DUPLEX_FULL;
+	cmd->port		= PORT_TP;
+	cmd->phy_address	= 0;
+	cmd->transceiver	= XCVR_INTERNAL;
+	cmd->autoneg		= AUTONEG_DISABLE;
+	cmd->maxtxpkt		= 0;
+	cmd->maxrxpkt		= 0;
+	return 0;
+}
+
+static void etun_get_drvinfo(struct net_device *dev, struct ethtool_drvinfo *info)
+{
+	strcpy(info->driver, DRV_NAME);
+	strcpy(info->version, DRV_VERSION);
+	strcpy(info->fw_version, "N/A");
+}
+
+static void etun_get_strings(struct net_device *dev, u32 stringset, u8 *buf)
+{
+	switch(stringset) {
+	case ETH_SS_STATS:
+		memcpy(buf, &ethtool_stats_keys, sizeof(ethtool_stats_keys));
+		break;
+	case ETH_SS_TEST:
+	default:
+		break;
+	}
+}
+
+static int etun_get_stats_count(struct net_device *dev)
+{
+	return ETUN_NUM_STATS;
+}
+
+static void etun_get_ethtool_stats(struct net_device *dev,
+	struct ethtool_stats *stats, u64 *data)
+{
+	struct etun_info *info = dev->priv;
+
+	data[0] = info->rx_dev->ifindex;
+}
+
+static u32 etun_get_rx_csum(struct net_device *dev)
+{
+	struct etun_info *info = dev->priv;
+	return info->ip_summed == CHECKSUM_UNNECESSARY;
+}
+
+static int etun_set_rx_csum(struct net_device *dev, u32 data)
+{
+	struct etun_info *info = dev->priv;
+	
+	info->ip_summed = data ? CHECKSUM_UNNECESSARY : CHECKSUM_NONE;
+
+	return 0;
+}
+
+static u32 etun_get_tx_csum(struct net_device *dev)
+{
+	return (dev->features & NETIF_F_NO_CSUM) != 0;
+}
+
+static int etun_set_tx_csum(struct net_device *dev, u32 data)
+{
+	dev->features &= NETIF_F_NO_CSUM;
+	if (data)
+		dev->features |= NETIF_F_NO_CSUM;
+
+	return 0;
+}
+
+static struct ethtool_ops etun_ethtool_ops = {
+	.get_settings		= etun_get_settings,
+	.get_drvinfo		= etun_get_drvinfo,
+	.get_link		= ethtool_op_get_link,
+	.get_rx_csum		= etun_get_rx_csum,
+	.set_rx_csum		= etun_set_rx_csum,
+	.get_tx_csum		= etun_get_tx_csum,
+	.set_tx_csum		= etun_set_tx_csum,
+	.get_sg			= ethtool_op_get_sg,
+	.set_sg			= ethtool_op_set_sg,
+#if 0 /* Does just setting the bit successfuly emulate tso? */
+	.get_tso		= ethtool_op_get_tso,
+	.set_tso		= ethtool_op_set_tso,
+#endif
+	.get_strings		= etun_get_strings,
+	.get_stats_count	= etun_get_stats_count,
+	.get_ethtool_stats	= etun_get_ethtool_stats,
+	.get_perm_addr		= ethtool_op_get_perm_addr,
+};
+
+static int etun_open(struct net_device *tx_dev)
+{
+	struct etun_info *tx_info = tx_dev->priv;
+	struct net_device *rx_dev = tx_info->rx_dev;
+	if (rx_dev->flags & IFF_UP) {
+		netif_carrier_on(tx_dev);
+		netif_carrier_on(rx_dev);
+	}
+	netif_start_queue(tx_dev);
+	return 0;
+}
+
+static int etun_stop(struct net_device *tx_dev)
+{
+	struct etun_info *tx_info = tx_dev->priv;
+	struct net_device *rx_dev = tx_info->rx_dev;
+	netif_stop_queue(tx_dev);
+	if (netif_carrier_ok(tx_dev)) {
+		netif_carrier_off(tx_dev);
+		netif_carrier_off(rx_dev);
+	}
+	return 0;
+}
+
+static void etun_set_multicast_list(struct net_device *dev)
+{
+	/* Nothing sane I can do here */
+	return;
+}
+
+static int etun_ioctl(struct net_device *dev, struct ifreq *rq, int cmd)
+{
+	return -EOPNOTSUPP;
+}
+
+/* Only allow letters and numbers in an etun device name */
+static int is_valid_name(const char *name)
+{
+	const char *ptr;
+	for (ptr = name; *ptr; ptr++) {
+		if (!isalnum(*ptr))
+			return 0;
+	}
+	return 1;
+}
+
+static struct net_device *etun_alloc(net_t net, const char *name)
+{
+	struct net_device *dev;
+	struct etun_info *info;
+	int err;
+
+	if (!name || !is_valid_name(name))
+		return ERR_PTR(-EINVAL);
+
+	dev = alloc_netdev(sizeof(struct etun_info), name, ether_setup);
+	if (!dev)
+		return ERR_PTR(-ENOMEM);
+	
+	info = dev->priv;
+	info->dev = dev;
+	dev->nd_net = net;
+
+	random_ether_addr(dev->dev_addr);
+	dev->tx_queue_len	= 0; /* A queue is silly for a loopback device */
+	dev->hard_start_xmit	= etun_xmit;
+	dev->get_stats		= etun_get_stats;
+	dev->open		= etun_open;
+	dev->stop		= etun_stop;
+	dev->set_multicast_list	= etun_set_multicast_list;
+	dev-

...

[ Show the rest of the message ]

Report message to a moderator

Re: [PATCH RFC 1/31] net: Add net_namespace_type.h to allow for per network namespace variables. [message #17370 is a reply to message #17339]

Thu, 25 January 2007 20:30

Stephen Hemminger
Messages: 37
Registered: August 2006

Member

Can all this be a nop if a CONFIG option is not selected?




>
> diff --git a/include/linux/net_namespace_type.h b/include/linux/net_namespace_type.h
> new file mode 100644
> index 0000000..8173f59
> --- /dev/null
> +++ b/include/linux/net_namespace_type.h
> @@ -0,0 +1,52 @@
> +/* 
> + * Definition of the network namespace reference type
> + * And operations upon it.
> + */
> +#ifndef __LINUX_NET_NAMESPACE_TYPE_H
> +#define __LINUX_NET_NAMESPACE_TYPE_H
> +
> +#define __pernetname(name) per_net__##name

Code obfuscation, please don't do that

> +typedef struct {} net_t;

No typedef for this please.

> +
> +#define __data_pernet 
> +
> +/* Look up a per network namespace variable */
> +static inline unsigned long __per_net_offset(net_t net) { return 0; }
> +
> +/* Like per_net but returns a pseudo variable address that must be moved
> + * __per_net_offset() bytes before it will point to a real variable.
> + * Useful for static initializers.
> + */
> +#define __per_net_base(name)   __pernetname(name)
> +
> +/* Get the network namespace reference from a per_net variable address */
> +#define net_of(ptr, name) ({ net_t net; ptr; net; })
> +
> +/* Look up a per network namespace variable */
> +#define per_net(name, net) \
> +	(*(__per_net_offset(net), &__per_net_base(name)))
> + 
> +/* Are the two network namespaces the same */
> +static inline int net_eq(net_t a, net_t b) { return 1; }
> +/* Get an unsigned value appropriate for hashing the network namespace */
> +static inline unsigned int net_hval(net_t net) { return 0; }
> +
> +/* Convert to and from to and from void pointers */
> +static inline void *net_to_voidp(net_t net) { return NULL; }
> +static inline net_t net_from_voidp(void *ptr) { net_t net; return net; }
> +
> +static inline int null_net(net_t net) { return 0; }
> +
> +#define DEFINE_PER_NET(type, name) 			\
> +	__data_pernet __typeof__(type) __pernetname(name)
> +
> +#define DECLARE_PER_NET(type, name) \
> +	extern __typeof__(type) __pernetname(name)
> +
> +#define EXPORT_PER_NET_SYMBOL(var)	\
> +	EXPORT_SYMBOL(__pernetname(var))
> +#define EXPORT_PER_NET_SYMBOL_GPL(var)	\
> +	EXPORT_SYMBOL_GPL(__pernetname(var))
> +
> +#endif /* __LINUX_NET_NAMESPACE_TYPE_H */


-- 
Stephen Hemminger <shemminger@linux-foundation.org>
_______________________________________________
Containers mailing list
Containers@lists.osdl.org
https://lists.osdl.org/mailman/listinfo/containers

Report message to a moderator

Re: [PATCH RFC 1/31] net: Add net_namespace_type.h to allow for per network namespace variables. [message #17372 is a reply to message #17370]

Thu, 25 January 2007 20:53

ebiederm
Messages: 1354
Registered: February 2006

Senior Member

Stephen Hemminger <shemminger@linux-foundation.org> writes:

> Can all this be a nop if a CONFIG option is not selected?

That is exactly what this infrastructure supports.
What you see is the version that comes into effect when
the CONFIG option is not selected.

>From using an empty structure to replace a pointer to make
that a NOP to most of the rest below.

>> diff --git a/include/linux/net_namespace_type.h
> b/include/linux/net_namespace_type.h
>> new file mode 100644
>> index 0000000..8173f59
>> --- /dev/null
>> +++ b/include/linux/net_namespace_type.h
>> @@ -0,0 +1,52 @@
>> +/* 
>> + * Definition of the network namespace reference type
>> + * And operations upon it.
>> + */
>> +#ifndef __LINUX_NET_NAMESPACE_TYPE_H
>> +#define __LINUX_NET_NAMESPACE_TYPE_H
>> +
>> +#define __pernetname(name) per_net__##name
>
> Code obfuscation, please don't do that

Single point of making the naming rules, better maintenance.
The basic point is that variables that come through this path
you should not access directly.  Tweaking the name enforces that
even in the compiled out state.

>> +typedef struct {} net_t;
>
> No typedef for this please.

Why.  That is conventially how we do opaque types in linux
when someone is doing something sophisticated.

You probably want to look down to patch 21 to see what the compiled
in version of these look like.

Eric
_______________________________________________
Containers mailing list
Containers@lists.osdl.org
https://lists.osdl.org/mailman/listinfo/containers

Report message to a moderator

Re: [PATCH RFC 22/31] net: Add network namespace clone support. [message #17530 is a reply to message #17360]

Wed, 28 February 2007 15:05

ebiederm
Messages: 1354
Registered: February 2006

Senior Member

Daniel Lezcano <dlezcano@fr.ibm.com> writes:


>> +
>> +	mutex_lock(&net_mutex);
>> +	err = setup_net(new_net);
>> +	if (err)
>> +		goto out_unlock;
>>
> Should we "net_free" in case of error ?

Oops.  Yes we should.
Thanks.

>> +	net_lock();
>> +	net_list_append(new_net);
>> +	net_unlock();
>> +
>> +	tsk->nsproxy->net_ns = new_net;
>> +
>> +out_unlock:
>> +	mutex_unlock(&net_mutex);
	net_free(new_net);
>> +out:
>> +	put_net(old_net);
>> +	return err;
>> +}
>> +
>>

Eric
_______________________________________________
Containers mailing list
Containers@lists.osdl.org
https://lists.osdl.org/mailman/listinfo/containers

Report message to a moderator

Re: [PATCH RFC 18/31] net: Implment network device movement between namespaces [message #17531 is a reply to message #17356]

Wed, 28 February 2007 15:12

ebiederm
Messages: 1354
Registered: February 2006

Senior Member

Daniel Lezcano <dlezcano@fr.ibm.com> writes:

> Eric W. Biederman wrote:
>> From: Eric W. Biederman <ebiederm@xmission.com> - unquoted
>>
>> This patch introduces NETIF_F_NETNS_LOCAL a flag to indicate
>> a network device is local to a single network namespace and
>> should never be moved.  Useful for pseudo devices that we
>> need an instance in each network namespace (like the loopback
>> device) and for any device we find that cannot handle multiple
>> network namespaces so we may trap them in the initial network
>> namespace.
>>
>> This patch introduces the function dev_change_net_namespace
>> a function used to move a network device from one network
>> namespace to another.  To the network device nothing
>> special appears to happen, to the components of the network
>> stack it appears as if the network device was unregistered
>> in the network namespace it is in, and a new device
>> was registered in the network namespace the device
>> was moved to.
>>
>> This patch sets up a namespace device destructor that
>> upon the exit of a network namespace moves all of the
>> movable network devices  to the initial network namespace
>> so they are not lost.
>>
> If you:
> * create etun0/etun1
> * create a namespace
> * move etun1 to this namespace
> *  rename the etun1 to eth0
> *  kill the namespace
>
> the former network device etun1 will be lost if you have in your parent
> namespace an interface eth0 because it will conflict.
> Perhaps, the first name should be restored before moving the device back to the
> initial network namespace ?

Restoration of a previous name is no guarantee of anything.  Someone may have
renamed the some other interface etun1 in the original network namespace.

However if you look closely at the code.  You will discover that if it can't
keep the same name it will rename the device as it switches namespaces.
In particular it will become devN where N is replaced by some unused number.

That is what the pat parameter to dev_change_net_namespace is about.

I'm not exactly thrilled about the generic name but the code should work,
and I don't know if there is a name that makes better sense.


>  -- Daniel
>
> ps : nice patchset

Thanks.

Eric

_______________________________________________
Containers mailing list
Containers@lists.osdl.org
https://lists.osdl.org/mailman/listinfo/containers

Report message to a moderator

Re: [RFC PATCH 0/31] An introduction and A path for merging network namespace work [message #17532 is a reply to message #17338]

Wed, 28 February 2007 19:45

ebiederm
Messages: 1354
Registered: February 2006

Senior Member

Daniel Lezcano <dlezcano@fr.ibm.com> writes:

> Hi Eric,
>
> Do you plan to propose to merge into mainline your patchset ?

I'm hung up at the moment in the sysfs support.  Network device
renaming is broken in 2.6.21-rc2 at the moment.

Then I would like to see the best of etun/veth merged.

After that yes I would like to propose getting a network namespace
implementation into mainline.  Which would like be based on the
patchset I posted.

> Shouldn't we ask netdev guys what they think about the explicit network
> namespace parameter into function you did versus the implicit network context
> using the push_net_ns/pop_net_ns function ?

It is an important question.

My impression is that in the larger context it seems to be a minor
detail.

>From what I have seen of Dmitry's patches and from what I have seen of
my own.  When not talking functions parameters we make roughly the
same set of code changes.  For example in my git tree without
referencing Dmitry's work, I made roughly the same set of changes to
the fib code as he did.

I currently prefer my register_pernet_subsys infrastructure as it is
much easier to deal with then gradually accumulating the code into
the namespace initialization.  There is nothing to clever about it,
so we should be fine.

I think my current per_net() function is questionable (it borders on
being too clever) but it is very straight forward to use.  In fact
I am probably over using it a bit and making my network namespace data
structures a little too big.  If you look at the slab or the
initialization messages you will see I have exceeded page size.  Oops.

The big advantage of my pernet work (as opposed to other techniques)
is that it makes compiling out any the effect of my code possible,
and it allows for pernet variables with file scope.  If continuing to
support compiling out the pernet code isn't a requirement we could
get less clever solution, and just pass a pointer around.

If we do start passing a pointer around there becomes the question of
how do we support modules.  Which is particularly important in the
IPv6 case.  

So long as my per_net() function doesn't cause problems I suspect
it is easier to work with than to work without.

As long as we are supporting compiling things out I think using
net_t instead of a raw pointer makes a lot of sense.  I really like
the fact that using an empty type when we compile things out so gcc
can just optimize everything away, instead of having to ifdef
everything.

The big practical difference between the approaches comes down to
push_net_ns/pop_net_ns, or doing something else to get the argument
where it belongs.  The fact that push_net_ns/pop_net_ns cannot be
universally used in the network stack is a pain.  It means that a
function that can used on both the receive and the transmit path has
to have a network namespace argument. 

The verification that we are doing the right thing with
push_net_ns/pop_net_ns is also harder as we have to check that we have 
the proper value for the entire call chain instead of a check that is
simply local.  For doing the conversion it is not a big pain, as we
need to audit the call chain anyway.  For dealing with future changes
it could be a problem if to verify every little patch we had to check
the entire call chain.

The biggest argument against explicit parameters that I could see in
earlier conversations was that you could not compile them out.  Now
that I have gotten clever and can compile my explicit parameters out
that argument goes away.

The big downside of the explicit parameters is that in some cases
you get a lot of noise patches just to get the value where you need 
it.  So it more difficult to merge and a little more difficult to
maintain out of the tree.

However for long term maintenance there is a big advantage of explicit
parameters as you only need to test a patch to see if it is locally
correct.

So unless explicit parameters hurt performance my impression is that
they are the better solution.

Dmitry? Daniel? What do you think.

Eric
_______________________________________________
Containers mailing list
Containers@lists.osdl.org
https://lists.osdl.org/mailman/listinfo/containers

Report message to a moderator

Re: [PATCH RFC 18/31] net: Implment network device movement between namespaces [message #17533 is a reply to message #17356]

Wed, 28 February 2007 14:35

Daniel Lezcano
Messages: 417
Registered: June 2006

Senior Member

Eric W. Biederman wrote:
> From: Eric W. Biederman <ebiederm@xmission.com> - unquoted
>
> This patch introduces NETIF_F_NETNS_LOCAL a flag to indicate
> a network device is local to a single network namespace and
> should never be moved.  Useful for pseudo devices that we
> need an instance in each network namespace (like the loopback
> device) and for any device we find that cannot handle multiple
> network namespaces so we may trap them in the initial network
> namespace.
>
> This patch introduces the function dev_change_net_namespace
> a function used to move a network device from one network
> namespace to another.  To the network device nothing
> special appears to happen, to the components of the network
> stack it appears as if the network device was unregistered
> in the network namespace it is in, and a new device
> was registered in the network namespace the device
> was moved to.
>
> This patch sets up a namespace device destructor that
> upon the exit of a network namespace moves all of the
> movable network devices  to the initial network namespace
> so they are not lost.
>   
If you:
 * create etun0/etun1
 * create a namespace
 * move etun1 to this namespace
 *  rename the etun1 to eth0
 *  kill the namespace

the former network device etun1 will be lost if you have in your parent 
namespace an interface eth0 because it will conflict.
Perhaps, the first name should be restored before moving the device back 
to the initial network namespace ?

  -- Daniel

ps : nice patchset
_______________________________________________
Containers mailing list
Containers@lists.osdl.org
https://lists.osdl.org/mailman/listinfo/containers

Report message to a moderator

Re: [PATCH RFC 22/31] net: Add network namespace clone support. [message #17534 is a reply to message #17360]

Wed, 28 February 2007 14:42

Daniel Lezcano
Messages: 417
Registered: June 2006

Senior Member

Eric W. Biederman wrote:
> From: Eric W. Biederman <ebiederm@xmission.com> - unquoted
>
> This patch allows you to create a new network namespace
> using sys_clone(...).
>
> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
> ---
>  include/linux/sched.h    |    1 +
>  kernel/nsproxy.c         |   11 +++++++++++
>  net/core/net_namespace.c |   38 ++++++++++++++++++++++++++++++++++++++
>  3 files changed, 50 insertions(+), 0 deletions(-)
>
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 4463735..9e0f91a 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -26,6 +26,7 @@
>  #define CLONE_STOPPED		0x02000000	/* Start in stopped state */
>  #define CLONE_NEWUTS		0x04000000	/* New utsname group? */
>  #define CLONE_NEWIPC		0x08000000	/* New ipcs */
> +#define CLONE_NEWNET		0x20000000	/* New network namespace */
>
>  /*
>   * Scheduling policies
> diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c
> index 4f3c95a..7861c4c 100644
> --- a/kernel/nsproxy.c
> +++ b/kernel/nsproxy.c
> @@ -20,6 +20,7 @@
>  #include <linux/mnt_namespace.h>
>  #include <linux/utsname.h>
>  #include <linux/pid_namespace.h>
> +#include <net/net_namespace.h>
>
>  struct nsproxy init_nsproxy = INIT_NSPROXY(init_nsproxy);
>  EXPORT_SYMBOL_GPL(init_nsproxy);
> @@ -70,6 +71,7 @@ struct nsproxy *dup_namespaces(struct nsproxy *orig)
>  			get_ipc_ns(ns->ipc_ns);
>  		if (ns->pid_ns)
>  			get_pid_ns(ns->pid_ns);
> +		get_net(ns->net_ns);
>  	}
>
>  	return ns;
> @@ -117,10 +119,18 @@ int copy_namespaces(int flags, struct task_struct *tsk)
>  	if (err)
>  		goto out_pid;
>
> +	err = copy_net(flags, tsk);
> +	if (err)
> +		goto out_net;
> +
>  out:
>  	put_nsproxy(old_ns);
>  	return err;
>
> +out_net:
> +	if (new_ns->pid_ns)
> +		put_pid_ns(new_ns->pid_ns);
> +
>  out_pid:
>  	if (new_ns->ipc_ns)
>  		put_ipc_ns(new_ns->ipc_ns);
> @@ -146,5 +156,6 @@ void free_nsproxy(struct nsproxy *ns)
>  		put_ipc_ns(ns->ipc_ns);
>  	if (ns->pid_ns)
>  		put_pid_ns(ns->pid_ns);
> +	put_net(ns->net_ns);
>  	kfree(ns);
>  }
> diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
> index 93e3879..cc56105 100644
> --- a/net/core/net_namespace.c
> +++ b/net/core/net_namespace.c
> @@ -175,6 +175,44 @@ out_undo:
>  	goto out;
>  }
>
> +int copy_net(int flags, struct task_struct *tsk)
> +{
> +	net_t old_net = tsk->nsproxy->net_ns;
> +	net_t new_net;
> +	int err;
> +
> +	get_net(old_net);
> +
> +	if (!(flags & CLONE_NEWNET))
> +		return 0;
> +
> +	err = -EPERM;
> +	if (!capable(CAP_SYS_ADMIN))
> +		goto out;
> +
> +	err = -ENOMEM;
> +	new_net = net_alloc();
> +	if (null_net(new_net))
> +		goto out;
> +
> +	mutex_lock(&net_mutex);
> +	err = setup_net(new_net);
> +	if (err)
> +		goto out_unlock;
>   
Should we "net_free" in case of error ?
> +
> +	net_lock();
> +	net_list_append(new_net);
> +	net_unlock();
> +
> +	tsk->nsproxy->net_ns = new_net;
> +
> +out_unlock:
> +	mutex_unlock(&net_mutex);
> +out:
> +	put_net(old_net);
> +	return err;
> +}
> +
>  void pernet_modcopy(void *pnetdst, const void *src, unsigned long size)
>  {
>  	net_t net;
>   

_______________________________________________
Containers mailing list
Containers@lists.osdl.org
https://lists.osdl.org/mailman/listinfo/containers

Report message to a moderator

Re: [RFC PATCH 0/31] An introduction and A path for merging network namespace work [message #17536 is a reply to message #17338]

Wed, 28 February 2007 16:38

Daniel Lezcano
Messages: 417
Registered: June 2006

Senior Member

Hi Eric,

Do you plan to propose to merge into mainline your patchset ?

Shouldn't we ask netdev guys what they think about the explicit network 
namespace parameter into function you did versus the implicit network 
context using the push_net_ns/pop_net_ns function ?

  -- Daniel

_______________________________________________
Containers mailing list
Containers@lists.osdl.org
https://lists.osdl.org/mailman/listinfo/containers

Report message to a moderator

Pages (2): [1 2 › »]

Previous Topic:	Re: [ckrm-tech] [PATCH 0/2] resource control file system - aka containers on top of nsproxy!
Next Topic:	[RFC] Containers infrastructure problems

Goto Forum:

-=] Back to Top [=-

[ Syndicate this forum (XML) ] [

] [

]

Current Time: Sat Jan 10 01:10:31 GMT 2026

Total time taken to generate the page: 0.62227 seconds