OpenVZ Forum: Devel » [PATCH v5 00/18] slab accounting for memcg

[PATCH v5 08/18] memcg: infrastructure to match an allocation to the right cache [message #48533 is a reply to message #48518]

Fri, 19 October 2012 14:20

Glauber Costa

The page allocator is able to bind a page to a memcg when it is
allocated. But for the caches, we'd like to have as many objects as
possible in a page belonging to the same cache.

This is done in this patch by calling memcg_kmem_get_cache at the
beginning of every allocation function. This routine is patched out by
static branches when the kernel memory controller is not being used.
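
As a rough user-space illustration of that gating (not the kernel code itself), the sketch below puts a cheap enabled check in front of the slower lookup. The names struct cache, accounting_enabled, slow_lookup and get_cache are hypothetical, and a plain bool stands in for the static key: a real static key patches the branch out of the instruction stream, while this flag is simply tested on every call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct kmem_cache. */
struct cache {
	const char *name;
	struct cache *per_group;	/* the group's copy, or NULL if none yet */
};

/* Plain flag standing in for the static key (assumption, see lead-in). */
static bool accounting_enabled;

/* Slow path: pick the group's copy when there is one. */
static struct cache *slow_lookup(struct cache *c)
{
	assert(c != NULL);
	return c->per_group ? c->per_group : c;
}

/* Fast path: when accounting is off, return the original cache at once. */
static inline struct cache *get_cache(struct cache *c)
{
	if (!accounting_enabled)
		return c;
	return slow_lookup(c);
}
```

With the flag off, every caller gets the cache it passed in, so the extra cost for users who never enable kernel memory accounting is a single test.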

It assumes that the task allocating, which determines the memcg in the
page allocator, belongs to the same cgroup throughout the whole process.
Misaccounting can happen if the task calls memcg_kmem_get_cache() while
belonging to a cgroup, and later on changes cgroups. This is considered
acceptable, and should only happen upon task migration.

Before the cache is created by the memcg core, there is also a possible
imbalance: the task belongs to a memcg, but the cache being allocated
from is the global cache, since the child cache is not yet guaranteed to
be ready. This case is also fine, since in this case the GFP_KMEMCG will
not be passed and the page allocator will not attempt any cgroup
accounting.
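
The fallback described above can be sketched in user space as follows. This is only an illustration under assumptions: struct cache, MAX_GROUPS, FLAG_ACCOUNT (standing in for GFP_KMEMCG), select_cache and page_flags are all hypothetical names, and the real series keeps this state in memcg_params.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_GROUPS	4
#define FLAG_ACCOUNT	0x1u	/* hypothetical stand-in for GFP_KMEMCG */

/* Hypothetical stand-in for struct kmem_cache. */
struct cache {
	struct cache *root;			/* NULL for a root (global) cache */
	struct cache *children[MAX_GROUPS];	/* per-group copies, NULL until created */
};

/* Readers take no lock: they see either NULL or a valid child cache. */
static struct cache *select_cache(struct cache *root, int group_id)
{
	assert(group_id >= 0 && group_id < MAX_GROUPS);
	struct cache *child = root->children[group_id];

	return child ? child : root;
}

/* Only pages backing a per-group cache carry the accounting flag. */
static unsigned int page_flags(const struct cache *c, unsigned int flags)
{
	return c->root ? (flags | FLAG_ACCOUNT) : flags;
}
```

Until the child cache is published, select_cache() hands back the root cache and page_flags() leaves the accounting flag clear, so an early allocation is neither charged nor later uncharged.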

[ v4: use a standard workqueue mechanism, create right away if
possible, index from cache side ]

Signed-off-by: Glauber Costa <glommer@parallels.com>
CC: Christoph Lameter <cl@linux.com>
CC: Pekka Enberg <penberg@cs.helsinki.fi>
CC: Michal Hocko <mhocko@suse.cz>
CC: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
CC: Johannes Weiner <hannes@cmpxchg.org>
CC: Suleiman Souhlal <suleiman@google.com>
CC: Tejun Heo <tj@kernel.org>
---
include/linux/memcontrol.h | 41 ++++++++++
init/Kconfig | 2 +-
mm/memcontrol.c | 186 +++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 228 insertions(+), 1 deletion(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 491d96c..92fc47a 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -422,6 +422,10 @@ void memcg_cache_list_add(struct mem_cgroup *memcg, struct kmem_cache *cachep);

int memcg_update_cache_size(struct kmem_cache *s, int num_groups);
void memcg_update_array_size(int num_groups);
+
+struct kmem_cache *
+__memcg_kmem_get_cache(struct kmem_cache *cachep, gfp_t gfp);
+
/**
* memcg_kmem_newpage_charge: verify if a new kmem allocation is allowed.
* @gfp: the gfp allocation flags.
@@ -491,6 +495,37 @@ memcg_kmem_commit_charge(struct page *page, struct mem_cgroup *memcg, int order)
__memcg_kmem_commit_charge(page, memcg, order);
}

+/**
+ * memcg_kmem_get_cache: selects the correct per-memcg cache for allocation
+ * @cachep: the original global kmem cache
+ * @gfp: allocation flags.
+ *
+ * This function assumes that the task allocating, which determines the memcg
+ * in the page allocator, belongs to the same cgroup throughout the whole
+ * process. Misaccounting can happen if the task calls memcg_kmem_get_cache()
+ * while belonging to a cgroup, and later on changes cgroups. This is considered
+ * acceptable, and should only happen upon task migration.
+ *
+ * Before the cache is created by the memcg core, there is also a possible
+ * imbalance: the task belongs to a memcg, but the cache being allocated from
+ * is the global cache, since the child cache is not yet guaranteed to be
+ * ready. This case is also fine, since in this case the GFP_KMEMCG will not be
+ * passed and the page allocator will not attempt any cgroup accounting.
+ */
+static __always_inline struct kmem_cache *
+memcg_kmem_get_cache(struct kmem_cache *cachep, gfp_t gfp)
+{
+	if (!memcg_kmem_enabled())
+		return cachep;
+	if (gfp & __GFP_NOFAIL)
+		return cachep;
+	if (in_interrupt() || (!current->mm) || (current->flags & PF_KTHREAD))
+		return cachep;
+	if (unlikely(fatal_signal_pending(current)))
+		return cachep;
+
+	return __memcg_kmem_get_cache(cachep, gfp);
+}
#else
static inline void sock_update_memcg(struct sock *sk)
{
@@ -529,6 +564,12 @@ static inline void memcg_cache_list_add(struct mem_cgroup *memcg,
{
BUG();
}
+
+static inline struct kmem_cache *
+memcg_kmem_get_cache(struct kmem_cache *cachep, gfp_t gfp)
+{
+	return cachep;
+}
#endif /* CONFIG_MEMCG_KMEM */
#endif /* _LINUX_MEMCONTROL_H */

diff --git a/init/Kconfig b/init/Kconfig
index af6c7f8..62b1f28 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -741,7 +741,7 @@ config MEMCG_SWAP_ENABLED
 	  then swapaccount=0 does the trick).
 config MEMCG_KMEM
 	bool "Memory Resource Controller Kernel Memory accounting (EXPERIMENTAL)"
-	depends on MEMCG && EXPERIMENTAL
+	depends on MEMCG && EXPERIMENTAL && !SLOB
 	default n
 	help
 	  The Kernel Memory extension for Memory Resource Controller can limit
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index dd6ac6a..ac2e621 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -562,7 +562,14 @@ static int memcg_limited_groups_array_size;
*/
#define MEMCG_CACHES_MAX_SIZE 65535

+/*
+ * A lot of the calls to the cache allocation functions are expected to be
+ * inlined by the compiler. Since the calls to memcg_kmem_get_cache are
+ * conditional to this static branch, we'll have to allow modules that do
+ * kmem_cache_alloc and the like to see this symbol as well
+ */
struct static_key memcg_kmem_enabled_key;
+EXPORT_SYMBOL(memcg_kmem_enabled_key);

static void disarm_kmem_keys(struct mem_cgroup *memcg)
{
@@ -2930,9 +2937,188 @@ int memcg_register_cache(struct mem_cgroup *memcg, struct kmem_cache *s)

 void memcg_release_cache(struct kmem_cache *s)
 {
+	struct kmem_cache *root;
+	int id = memcg_css_id(s->memcg_params->memcg);
+
+	if (s->memcg_params->is_root_cache)
+		goto out;
+
+	root = s->memcg_params->root_cache;
+	root->memcg_params->memcg_caches[id] = NULL;
+	mem_cgroup_put(s->memcg_params->memcg);
+out:
 	kfree(s->memcg_params);
 }

+static char *memcg_cache_name(struct mem_cgroup *memcg, struct kmem_cache *cachep)
+{
+	char *name;
+	struct dentry *dentry;
+
+	rcu_read_lock();
+	dentry = rcu_dereference(memcg->css.cgroup->dentry);
+	rcu_read_unlock();
+
+	BUG_ON(dentry == NULL);
+
+	name = kasprintf(GFP_KERNEL, "%s(%d:%s)",
+			 cachep->name, css_id(&memcg->css), dentry->d_name.name);
+
+	return name;
+}
+
+static struct kmem_cache *kmem_cache_dup(struct mem_cgroup *memcg,
+					 struct kmem_cache *s)
+{
+	char *name;
+	struct kmem_cache *new;
+
+	name = memcg_cache_name(memcg, s);
+	if (!name)
+		return NULL;
+
+	new = kmem_cache_create_memcg(memcg, name, s->object_size, s->align,
+				      (s->flags & ~SLAB_PANIC), s->ctor);
+
+	kfree(name);
+	return new;
+}
+
+/*
+ * This lock protects updaters, not readers. We want readers to be as fast as
+ * they can, and they will either see NULL or a valid cache value. Our model
+ * allows them to see NULL, in which case the root memcg will be selected.
+ *
+ * We need this lock because multiple allocations to the same cache from a non
+ * GFP_WAIT area will span more than one worker. Only one of them can create
+ * the cache.
+ */
+static DEFINE_MUTEX(memcg_cache_mutex);
+static struct kmem_cache *memcg_create_kmem_cache(struct mem_cgroup *memcg,
+						  struct kmem_cache *cachep)
+{
+	struct kmem_cache *new_cachep;
+	int idx;
+
+	BUG_ON(!memcg_can_account_kmem(memcg));
+
+	idx = memcg_css_id(memcg);
+
+	mutex_lock(&memcg_cache_mutex);
+	new_cachep = cachep->memcg_params->memcg_caches[idx];
+	if (new_cachep)
+		goto out;
+
+	new_cachep = kmem_cache_dup(memcg, cachep);
+
+	if (new_cachep == NULL) {
+		new_cachep = cachep;
+		goto out;
+	}
+
+	mem_cgroup_get(memcg);
+	cachep->memcg_params->memcg_caches[idx] = new_cachep;
+	wmb(); /* the readers won't lock, make sure everybody sees it */
+	new_cachep->memcg_params->memcg = memcg;
+	new_cachep->memcg_params->root_cache = cachep;
+out:
+	mutex_unlock(&memcg_cache_mutex);
+	return new_cachep;
+}
+
+struct create_work {
+	struct mem_cgroup *memcg;
+	struct kmem_cache *cachep;
+	struct work_struct work;
+};
+
+static void memcg_create_cache_work_func(struct work_struct *w)
+{
+	struct create_work *cw;
+
+	cw = container_of(w, struct create_work, work);
+	memcg_create_kmem_cache(cw->memcg, cw->cachep);
+	/* Drop the reference gotten when we enqueued. */
+	css_put(&cw->memcg->css);
+	kfree(cw);
+}
+
+/*
+ * Enqueue the creation of a per-memcg kmem_cache.
+ * Called with rcu_read_lock.
+ */
+static void memcg_create_cache_enqueue(struct mem_cgroup *memcg,
+				       struct kmem_cache *cachep)
+{
+	struct create_work *cw;
+
+	cw = kmalloc(sizeof(struct create_work), GFP_NOWAIT);
+	if (cw == NULL)
+		return;
+
+	/* The corresponding put will be done in the workqueue. */
+	if (!css_tryget(&memcg->css))
+		return;
+
+	cw->memcg = memcg;
+	cw->cachep = cachep;
+
+	INIT_WORK(&cw->work, memcg_create_cache_work_func);
+	schedule_work(&cw->work);
+}
+
+/*
+ * Return the kmem_cache we're supposed to use for a slab allocation.
+ * We try to use the current memcg's version of the cache.
+ *
+ * If the cache does not exist yet, if we are the first user of it,
+ * we either create it immediately, if possible, or create it asynchronously
+ * in a workqueue.
+ * In the latter case, we will let the current allocation go through with
+ * the original cache.
+ *
+ * Can't be called in interrupt context or from kernel threads.
+ * This function needs to be called with rcu_read_lock() held.
+ */
+struct kmem_cache *__memcg_kmem_get_cache(struct kmem_cache *cachep,
+					  gfp_t gfp)
+{
+	struct mem_cgroup *memcg;
+	int idx;
+
+	if (cachep->memcg_params && cachep->memcg_params->memcg)
+		return cachep;
+
+	rcu_read_lock();
+	memcg = mem_cgroup_from_task(rcu_dereference(current->mm->owner));
+	rcu_read_unlock();
+
+	if (!memcg_can_account_kmem(memcg))
+		return cachep;
+
+	idx = memcg_css_id(memcg);
+	VM_BUG_ON(idx == -1);
+
+ if (cachep->memcg_par ...
