OpenVZ Forum


Home » Mailing lists » Devel » [PATCH v3 00/16] slab accounting for memcg
Re: [PATCH v3 09/16] sl[au]b: always get the cache from its page in kfree [message #47983 is a reply to message #47973] Fri, 21 September 2012 20:07 Go to previous messageGo to next message
Tejun Heo is currently offline  Tejun Heo
Messages: 184
Registered: November 2006
Senior Member
Hello,

On Fri, Sep 21, 2012 at 12:41:52PM +0300, Pekka Enberg wrote:
> > I am already using static keys extensively in this patchset, and that is
> > how I intend to handle this particular case.
>
> Cool.
>
> The key point here is that !CONFIG_MEMCG_KMEM should have exactly *zero*
> performance impact and CONFIG_MEMCG_KMEM disabled at runtime should have
> absolute minimal impact.

Not necessarily disagreeing, but I don't think it's helpful to set the
bar impossibly high. Even static_key doesn't have "exactly *zero*"
impact. Let's stick to as minimal as possible when not in use and
reasonable in use.

And, yeah, this one can be easily solved by using static_key.

Thanks.

--
tejun
Re: [PATCH v3 09/16] sl[au]b: always get the cache from its page in kfree [message #47984 is a reply to message #47983] Fri, 21 September 2012 20:14 Go to previous messageGo to next message
Pekka Enberg is currently offline  Pekka Enberg
Messages: 22
Registered: November 2006
Junior Member
On Fri, Sep 21, 2012 at 11:07 PM, Tejun Heo <tj@kernel.org> wrote:
> Not necessarily disagreeing, but I don't think it's helpful to set the
> bar impossibly high. Even static_key doesn't have "exactly *zero*"
> impact. Let's stick to as minimal as possible when not in use and
> reasonable in use.

For !CONFIG_MEMCG_KMEM, it should be exactly zero. No need to play
games with static_key.
Re: [PATCH v3 09/16] sl[au]b: always get the cache from its page in kfree [message #47985 is a reply to message #47984] Fri, 21 September 2012 20:16 Go to previous messageGo to next message
Tejun Heo is currently offline  Tejun Heo
Messages: 184
Registered: November 2006
Senior Member
On Fri, Sep 21, 2012 at 11:14:45PM +0300, Pekka Enberg wrote:
> On Fri, Sep 21, 2012 at 11:07 PM, Tejun Heo <tj@kernel.org> wrote:
> > Not necessarily disagreeing, but I don't think it's helpful to set the
> > bar impossibly high. Even static_key doesn't have "exactly *zero*"
> > impact. Let's stick to as minimal as possible when not in use and
> > reasonable in use.
>
> For !CONFIG_MEMCG_KMEM, it should be exactly zero. No need to play
> games with static_key.

Ah, misread, though you were talking about MEMCG_KMEM && not in use.
Yeap, if turned off in Kconfig, it shouldn't have any performance
overhead.

Thanks.

--
tejun
Re: [PATCH v3 11/16] memcg: destroy memcg caches [message #47987 is a reply to message #47914] Fri, 21 September 2012 20:22 Go to previous messageGo to next message
Tejun Heo is currently offline  Tejun Heo
Messages: 184
Registered: November 2006
Senior Member
Hello, Glauber.

On Tue, Sep 18, 2012 at 06:12:05PM +0400, Glauber Costa wrote:
> +static LIST_HEAD(destroyed_caches);
> +
> +static void kmem_cache_destroy_work_func(struct work_struct *w)
> +{
> + struct kmem_cache *cachep;
> + struct mem_cgroup_cache_params *p, *tmp;
> + unsigned long flags;
> + LIST_HEAD(del_unlocked);
> +
> + spin_lock_irqsave(&cache_queue_lock, flags);
> + list_for_each_entry_safe(p, tmp, &destroyed_caches, destroyed_list) {
> + cachep = container_of(p, struct kmem_cache, memcg_params);
> + list_move(&cachep->memcg_params.destroyed_list, &del_unlocked);
> + }
> + spin_unlock_irqrestore(&cache_queue_lock, flags);
> +
> + list_for_each_entry_safe(p, tmp, &del_unlocked, destroyed_list) {
> + cachep = container_of(p, struct kmem_cache, memcg_params);
> + list_del(&cachep->memcg_params.destroyed_list);
> + if (!atomic_read(&cachep->memcg_params.nr_pages)) {
> + mem_cgroup_put(cachep->memcg_params.memcg);
> + kmem_cache_destroy(cachep);
> + }
> + }
> +}
> +static DECLARE_WORK(kmem_cache_destroy_work, kmem_cache_destroy_work_func);

Again, please don't build your own worklist. Just embed a work item
into mem_cgroup_cache_params and manipulate them. No need to
duplicate what workqueue already implements.

Thanks.

--
tejun
Re: [PATCH v3 12/16] memcg/sl[au]b Track all the memcg children of a kmem_cache. [message #47988 is a reply to message #47906] Fri, 21 September 2012 20:31 Go to previous messageGo to next message
Tejun Heo is currently offline  Tejun Heo
Messages: 184
Registered: November 2006
Senior Member
Hello, Glauber.

On Tue, Sep 18, 2012 at 06:12:06PM +0400, Glauber Costa wrote:
> This enables us to remove all the children of a kmem_cache being
> destroyed, if for example the kernel module it's being used in
> gets unloaded. Otherwise, the children will still point to the
> destroyed parent.

I find the terms parent / child / sibling a bit confusing. It usually
implies proper tree structure. Maybe we can use better terms which
reflect the single layer structure better?

And, again, in general, please add some comments. If someone tries to
understand this for the first time and takes a look at
mem_cgroup_cache_params, there's almost nothing to guide that person.
What's the struct for? What does each field do? What are the
synchronization rules?

> @@ -626,6 +630,9 @@ void memcg_release_cache(struct kmem_cache *cachep)
> {
> if (cachep->memcg_params.id != -1)
> ida_simple_remove(&cache_types, cachep->memcg_params.id);
> + else
> + list_del(&cachep->memcg_params.sibling_list);
> +

list_del_init() please.

Thanks.

--
tejun
Re: [PATCH v3 15/16] memcg/sl[au]b: shrink dead caches [message #47989 is a reply to message #47913] Fri, 21 September 2012 20:40 Go to previous messageGo to next message
Tejun Heo is currently offline  Tejun Heo
Messages: 184
Registered: November 2006
Senior Member
Hello, Glauber.

On Tue, Sep 18, 2012 at 06:12:09PM +0400, Glauber Costa wrote:
> @@ -764,10 +777,21 @@ static struct kmem_cache *memcg_create_kmem_cache(struct mem_cgroup *memcg,
> goto out;
> }
>
> + /*
> + * Because the cache is expected to duplicate the string,
> + * we must make sure it has opportunity to copy its full
> + * name. Only now we can remove the dead part from it
> + */
> + name = (char *)new_cachep->name;
> + if (name)
> + name[strlen(name) - 4] = '\0';

This is kinda nasty. Do we really need to do this? How long would a
dead cache stick around?

> diff --git a/mm/slab.c b/mm/slab.c
> index bd9928f..6cb4abf 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -3785,6 +3785,8 @@ static inline void __cache_free(struct kmem_cache *cachep, void *objp,
> }
>
> ac_put_obj(cachep, ac, objp);
> +
> + kmem_cache_verify_dead(cachep);

Reaping dead caches doesn't exactly sound like a high priority thing
and adding a branch to hot path for that might not be the best way to
do it. Why not schedule an extremely lazy deferrable delayed_work
which polls for emptiness, say, every miniute or whatever?

Thanks.

--
tejun
Re: [PATCH v3 00/16] slab accounting for memcg [message #47990 is a reply to message #47897] Fri, 21 September 2012 20:46 Go to previous messageGo to next message
Tejun Heo is currently offline  Tejun Heo
Messages: 184
Registered: November 2006
Senior Member
Hello,

On Tue, Sep 18, 2012 at 06:11:54PM +0400, Glauber Costa wrote:
> This is a followup to the previous kmem series. I divided them logically
> so it gets easier for reviewers. But I believe they are ready to be merged
> together (although we can do a two-pass merge if people would prefer)
>
> Throwaway git tree found at:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/glommer/memcg. git kmemcg-slab
>
> There are mostly bugfixes since last submission.
>
> For a detailed explanation about this series, please refer to my previous post
> (Subj: [PATCH v3 00/13] kmem controller for memcg.)

In general, things look good to me. I think the basic approach is
manageable and does a decent job of avoiding introducing complications
on the usual code paths.

Pekka seems generally happy with the approach too. Christoph, what do
you think?

Thanks.

--
tejun
Re: [PATCH v3 00/16] slab accounting for memcg [message #47991 is a reply to message #47990] Fri, 21 September 2012 20:47 Go to previous messageGo to next message
Tejun Heo is currently offline  Tejun Heo
Messages: 184
Registered: November 2006
Senior Member
On Fri, Sep 21, 2012 at 01:46:19PM -0700, Tejun Heo wrote:
> In general, things look good to me. I think the basic approach is
> manageable and does a decent job of avoiding introducing complications
> on the usual code paths.
>
> Pekka seems generally happy with the approach too. Christoph, what do
> you think?

Ooh, also, why aren't Andrew and Michal not cc'd?

--
tejun
Re: [PATCH v3 06/16] memcg: infrastructure to match an allocation to the right cache [message #47992 is a reply to message #47903] Fri, 21 September 2012 20:52 Go to previous messageGo to next message
Tejun Heo is currently offline  Tejun Heo
Messages: 184
Registered: November 2006
Senior Member
Missed some stuff.

On Tue, Sep 18, 2012 at 06:12:00PM +0400, Glauber Costa wrote:
> +static struct kmem_cache *memcg_create_kmem_cache(struct mem_cgroup *memcg,
> + struct kmem_cache *cachep)
> +{
...
> + memcg->slabs[idx] = new_cachep;
...
> +struct kmem_cache *__memcg_kmem_get_cache(struct kmem_cache *cachep,
> + gfp_t gfp)
> +{
...
> + return memcg->slabs[idx];

I think you need memory barriers for the above pair.

Thanks.

--
tejun
Re: [PATCH v3 05/16] consider a memcg parameter in kmem_create_cache [message #47999 is a reply to message #47980] Mon, 24 September 2012 08:12 Go to previous messageGo to next message
Glauber Costa is currently offline  Glauber Costa
Messages: 916
Registered: October 2011
Senior Member
On 09/21/2012 10:14 PM, Tejun Heo wrote:
> Hello, Glauber.
>
> On Tue, Sep 18, 2012 at 06:11:59PM +0400, Glauber Costa wrote:
>> +void memcg_register_cache(struct mem_cgroup *memcg, struct kmem_cache *cachep)
>> +{
>> + int id = -1;
>> +
>> + if (!memcg)
>> + id = ida_simple_get(&cache_types, 0, MAX_KMEM_CACHE_TYPES,
>> + GFP_KERNEL);
>> + cachep->memcg_params.id = id;
>> +}
>
> I'm a bit confused. Why is id allocated only when memcg is NULL?
>

I think you figured that out already from your answer in another patch,
right? But I'll add a comment here since it seems to be a a natural
search point for people, explaining the mechanism.

> Also, how would the per-memcg slab/slubs appear in slabinfo? If they
> appear separately it might be better to give them readable cgroup
> names.
>

The new caches will appear under /proc/slabinfo with the rest, with a
string appended that identifies the group.

> Thanks.
>
Re: [PATCH v3 00/16] slab accounting for memcg [message #48000 is a reply to message #47990] Mon, 24 September 2012 08:15 Go to previous messageGo to next message
Glauber Costa is currently offline  Glauber Costa
Messages: 916
Registered: October 2011
Senior Member
On 09/22/2012 12:46 AM, Tejun Heo wrote:
> Hello,
>
> On Tue, Sep 18, 2012 at 06:11:54PM +0400, Glauber Costa wrote:
>> This is a followup to the previous kmem series. I divided them logically
>> so it gets easier for reviewers. But I believe they are ready to be merged
>> together (although we can do a two-pass merge if people would prefer)
>>
>> Throwaway git tree found at:
>>
>> git://git.kernel.org/pub/scm/linux/kernel/git/glommer/memcg. git kmemcg-slab
>>
>> There are mostly bugfixes since last submission.
>>
>> For a detailed explanation about this series, please refer to my previous post
>> (Subj: [PATCH v3 00/13] kmem controller for memcg.)
>
> In general, things look good to me. I think the basic approach is
> manageable and does a decent job of avoiding introducing complications
> on the usual code paths.
>
> Pekka seems generally happy with the approach too. Christoph, what do
> you think?
>
> Thanks.
>
I myself think that while the approach is okay, this would need one or
two more versions for us to sort out some issues that are still remaining.

I'd very much like to have the kmemcg-stack series seeing its way
forward first. Mel Gorman said he would try his best to review it his
week, and I don't plan to resubmit anything before that.

After that, I plan to rebase at least this one again.
Re: [PATCH v3 06/16] memcg: infrastructure to match an allocation to the right cache [message #48001 is a reply to message #47992] Mon, 24 September 2012 08:17 Go to previous messageGo to next message
Glauber Costa is currently offline  Glauber Costa
Messages: 916
Registered: October 2011
Senior Member
On 09/22/2012 12:52 AM, Tejun Heo wrote:
> Missed some stuff.
>
> On Tue, Sep 18, 2012 at 06:12:00PM +0400, Glauber Costa wrote:
>> +static struct kmem_cache *memcg_create_kmem_cache(struct mem_cgroup *memcg,
>> + struct kmem_cache *cachep)
>> +{
> ...
>> + memcg->slabs[idx] = new_cachep;
> ...
>> +struct kmem_cache *__memcg_kmem_get_cache(struct kmem_cache *cachep,
>> + gfp_t gfp)
>> +{
> ...
>> + return memcg->slabs[idx];
>
> I think you need memory barriers for the above pair.
>
> Thanks.
>

Why is that?

We'll either see a value, or NULL. If we see NULL, we assume the cache
is not yet created. Not a big deal.
Re: [PATCH v3 15/16] memcg/sl[au]b: shrink dead caches [message #48002 is a reply to message #47989] Mon, 24 September 2012 08:25 Go to previous messageGo to next message
Glauber Costa is currently offline  Glauber Costa
Messages: 916
Registered: October 2011
Senior Member
On 09/22/2012 12:40 AM, Tejun Heo wrote:
> Hello, Glauber.
>
> On Tue, Sep 18, 2012 at 06:12:09PM +0400, Glauber Costa wrote:
>> @@ -764,10 +777,21 @@ static struct kmem_cache *memcg_create_kmem_cache(struct mem_cgroup *memcg,
>> goto out;
>> }
>>
>> + /*
>> + * Because the cache is expected to duplicate the string,
>> + * we must make sure it has opportunity to copy its full
>> + * name. Only now we can remove the dead part from it
>> + */
>> + name = (char *)new_cachep->name;
>> + if (name)
>> + name[strlen(name) - 4] = '\0';
>
> This is kinda nasty. Do we really need to do this? How long would a
> dead cache stick around?

Without targeted shrinking, until all objects are manually freed, which
may need to wait global reclaim to kick in.

In general, if we agree with duplicating the caches, the problem that
they may stick around for some time will not be avoidable. If you have
any suggestions about alternative ways for it, I'm all ears.

>
>> diff --git a/mm/slab.c b/mm/slab.c
>> index bd9928f..6cb4abf 100644
>> --- a/mm/slab.c
>> +++ b/mm/slab.c
>> @@ -3785,6 +3785,8 @@ static inline void __cache_free(struct kmem_cache *cachep, void *objp,
>> }
>>
>> ac_put_obj(cachep, ac, objp);
>> +
>> + kmem_cache_verify_dead(cachep);
>
> Reaping dead caches doesn't exactly sound like a high priority thing
> and adding a branch to hot path for that might not be the best way to
> do it. Why not schedule an extremely lazy deferrable delayed_work
> which polls for emptiness, say, every miniute or whatever?
>

Because this branch is marked as unlikely, I would expect it not to be a
big problem. It will be not taken most of the time, and becomes a very
cheap branch. I considered this to be simpler than a deferred work
mechanism.

If even then, you guys believe this is still too high, I can resort to that.



> Thanks.
>
Re: [PATCH v3 06/16] memcg: infrastructure to match an allocation to the right cache [message #48003 is a reply to message #47981] Mon, 24 September 2012 08:46 Go to previous messageGo to next message
Glauber Costa is currently offline  Glauber Costa
Messages: 916
Registered: October 2011
Senior Member
On 09/21/2012 10:32 PM, Tejun Heo wrote:
> On Tue, Sep 18, 2012 at 06:12:00PM +0400, Glauber Costa wrote:
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index 04851bb..1cce5c3 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -339,6 +339,11 @@ struct mem_cgroup {
>> #ifdef CONFIG_INET
>> struct tcp_memcontrol tcp_mem;
>> #endif
>> +
>> +#ifdef CONFIG_MEMCG_KMEM
>> + /* Slab accounting */
>> + struct kmem_cache *slabs[MAX_KMEM_CACHE_TYPES];
>> +#endif
>
> Bah, 400 entry array in struct mem_cgroup. Can't we do something a
> bit more flexible?
>

I guess. I still would like it to be an array, so we can easily access
its fields. There are two ways around this:

1) Do like the events mechanism and allocate this in a separate
structure. Add a pointer chase in the access, and I don't think it helps
much because it gets allocated anyway. But we could at least
defer it to the time when we limit the cache.

2) The indexes are only assigned after the slab is FULL. At that time, a
lot of the caches are already initialized. We can, for instance, allow
for "twice the number we have in the system", which already provides
room for a couple of more appearing. Combining with the 1st approach, we
can defer it to limit-time, and then allow for, say, 50 % more caches
than we already have. The pointer chasing may very well be worth it...

>> +static char *memcg_cache_name(struct mem_cgroup *memcg, struct kmem_cache *cachep)
>> +{
>> + char *name;
>> + struct dentry *dentry;
>> +
>> + rcu_read_lock();
>> + dentry = rcu_dereference(memcg->css.cgroup->dentry);
>> + rcu_read_unlock();
>> +
>> + BUG_ON(dentry == NULL);
>> +
>> + name = kasprintf(GFP_KERNEL, "%s(%d:%s)",
>> + cachep->name, css_id(&memcg->css), dentry->d_name.name);
>
> Maybe including full path is better, I don't know.

It can get way too big.

>> +static void memcg_create_cache_enqueue(struct mem_cgroup *memcg,
>> + struct kmem_cache *cachep)
>> +{
>> + struct create_work *cw;
>> + unsigned long flags;
>> +
>> + spin_lock_irqsave(&cache_queue_lock, flags);
>> + list_for_each_entry(cw, &create_queue, list) {
>> + if (cw->memcg == memcg && cw->cachep == cachep) {
>> + spin_unlock_irqrestore(&cache_queue_lock, flags);
>> + return;
>> + }
>> + }
>> + spin_unlock_irqrestore(&cache_queue_lock, flags);
>> +
>> + /* The corresponding put will be done in the workqueue. */
>> + if (!css_tryget(&memcg->css))
>> + return;
>> +
>> + cw = kmalloc(sizeof(struct create_work), GFP_NOWAIT);
>> + if (cw == NULL) {
>> + css_put(&memcg->css);
>> + return;
>> + }
>> +
>> + cw->memcg = memcg;
>> + cw->cachep = cachep;
>> + spin_lock_irqsave(&cache_queue_lock, flags);
>> + list_add_tail(&cw->list, &create_queue);
>> + spin_unlock_irqrestore(&cache_queue_lock, flags);
>> +
>> + schedule_work(&memcg_create_cache_work);
>> +}
>
> Why create your own worklist and flush mechanism? Just embed a work
> item in create_work and use a dedicated workqueue for flushing.

I'll take a look at this.

>
>> +/*
>> + * Return the kmem_cache we're supposed to use for a slab allocation.
>> + * We try to use the current memcg's version of the cache.
>> + *
>> + * If the cache does not exist yet, if we are the first user of it,
>> + * we either create it immediately, if possible, or create it asynchronously
>> + * in a workqueue.
>> + * In the latter case, we will let the current allocation go through with
>> + * the original cache.
>> + *
>> + * Can't be called in interrupt context or from kernel threads.
>> + * This function needs to be called with rcu_read_lock() held.
>> + */
>> +struct kmem_cache *__memcg_kmem_get_cache(struct kmem_cache *cachep,
>> + gfp_t gfp)
>> +{
>> + struct mem_cgroup *memcg;
>> + int idx;
>> + struct task_struct *p;
>> +
>> + if (cachep->memcg_params.memcg)
>> + return cachep;
>> +
>> + idx = cachep->memcg_params.id;
>> + VM_BUG_ON(idx == -1);
>> +
>> + rcu_read_lock();
>> + p = rcu_dereference(current->mm->owner);
>> + memcg = mem_cgroup_from_task(p);
>> + rcu_read_unlock();
>> +
>> + if (!memcg_can_account_kmem(memcg))
>> + return cachep;
>> +
>> + if (memcg->slabs[idx] == NULL) {
>> + memcg_create_cache_enqueue(memcg, cachep);
>
> Do we want to wait for the work item if @gfp allows?
>

I tried this once, and it got complicated enough that I deemed as "not
worth it". I honestly don't remember much of the details now, it was one
of the first things I tried, and a bunch of time has passed. If you
think it is absolutely worth it, I can try it again. But at the very
best, I view this as an optimization.
Re: [PATCH v3 07/16] memcg: skip memcg kmem allocations in specified code regions [message #48005 is a reply to message #47982] Mon, 24 September 2012 09:09 Go to previous messageGo to next message
Glauber Costa is currently offline  Glauber Costa
Messages: 916
Registered: October 2011
Senior Member
On 09/21/2012 11:59 PM, Tejun Heo wrote:
> Hello,
>
> On Tue, Sep 18, 2012 at 06:12:01PM +0400, Glauber Costa wrote:
>> +static void memcg_stop_kmem_account(void)
>> +{
>> + if (!current->mm)
>> + return;
>> +
>> + current->memcg_kmem_skip_account++;
>> +}
>> +
>> +static void memcg_resume_kmem_account(void)
>> +{
>> + if (!current->mm)
>> + return;
>> +
>> + current->memcg_kmem_skip_account--;
>> +}
>
> I can't say I'm a big fan of this approach. If there are enough
> users, maybe but can't we just annotate the affected allocations
> explicitly? Is this gonna have many more users?
>

What exactly do you mean by annotating the affected allocations?

There are currently two users of this. In both places, we are interested
in disallowing recursion, because cache creation will trigger new cache
allocations that will bring us back here.

We can't rely on unsetting the GFP flag we're using for this, because
that affects only the page allocation, not the metadata allocation for
the cache.


> Also, in general, can we please add some comments? I know memcg code
> is dearth of comments but let's please not keep it that way.
>
All right here.
Re: [PATCH v3 05/16] consider a memcg parameter in kmem_create_cache [message #48006 is a reply to message #47912] Mon, 24 September 2012 12:41 Go to previous messageGo to next message
Glauber Costa is currently offline  Glauber Costa
Messages: 916
Registered: October 2011
Senior Member
On 09/24/2012 04:41 PM, Christoph wrote:
>
> On Sep 24, 2012, at 3:12, Glauber Costa <glommer@parallels.com> wrote:
>
>> On 09/21/2012 10:14 PM, Tejun Heo wrote:
>>
>> The new caches will appear under /proc/slabinfo with the rest, with a
>> string appended that identifies the group.
>
> There are f.e. meminfo files in the per node directories in sysfs. It would make sense to have a slabinfo file there (if you want to keep that around). Then the format would be the same.
>

But that is orthogonal, isn't it? People will still expect to see it in
the old slabinfo file.
Re: [PATCH v3 05/16] consider a memcg parameter in kmem_create_cache [message #48007 is a reply to message #47999] Mon, 24 September 2012 12:41 Go to previous messageGo to next message
christoph is currently offline  christoph
Messages: 19
Registered: July 2006
Junior Member
On Sep 24, 2012, at 3:12, Glauber Costa <glommer@parallels.com> wrote:

> On 09/21/2012 10:14 PM, Tejun Heo wrote:
>
> The new caches will appear under /proc/slabinfo with the rest, with a
> string appended that identifies the group.

There are f.e. meminfo files in the per node directories in sysfs. It would make sense to have a slabinfo file there (if you want to keep that around). Then the format would be the same.
Re: [PATCH v3 05/16] consider a memcg parameter in kmem_create_cache [message #48008 is a reply to message #47912] Mon, 24 September 2012 13:44 Go to previous messageGo to next message
Glauber Costa is currently offline  Glauber Costa
Messages: 916
Registered: October 2011
Senior Member
On 09/24/2012 05:42 PM, Christoph Lameter wrote:
> On Mon, 24 Sep 2012, Glauber Costa wrote:
>
>> But that is orthogonal, isn't it? People will still expect to see it in
>> the old slabinfo file.
>
> The current scheme for memory statistics is
>
> /proc/meminfo contains global counters
>
> /sys/devices/system/node/nodeX/meminfo
>
> contains node specific counters.
>
> The cgroups directory already contains various files. Adding a slabinfo
> file would make sense and it could be found easily.
>
> The sysfs hierachy /sys/kernel/slab could also show up there as a "slab"
> directory under which all the details of the various caches would be
> available and tunable. Very much in sync with the way the cgroups
> directories operate.
>

The reason I say it is orthogonal, is that people will still want to see
their caches in /proc/slabinfo, regardless of wherever else they'll be.
It was a requirement from Pekka in one of the first times I posted this,
IIRC.
Re: [PATCH v3 05/16] consider a memcg parameter in kmem_create_cache [message #48009 is a reply to message #47912] Mon, 24 September 2012 13:57 Go to previous messageGo to next message
Glauber Costa is currently offline  Glauber Costa
Messages: 916
Registered: October 2011
Senior Member
On 09/24/2012 05:56 PM, Christoph Lameter wrote:
> On Mon, 24 Sep 2012, Glauber Costa wrote:
>
>> The reason I say it is orthogonal, is that people will still want to see
>> their caches in /proc/slabinfo, regardless of wherever else they'll be.
>> It was a requirement from Pekka in one of the first times I posted this,
>> IIRC.
>
> They want to see total counts there true. But as I said we already have a
> duplication of the statistics otherwise. We have never done the scheme
> that you propose. That is unexpected. I would not expect the numbers to be
> there.
>

I myself personally believe it can potentially clutter slabinfo, and
won't put my energy in defending the current implementation. What I
don't want is to keep switching between implementations.

Pekka, Tejun, what do you guys say here?
Re: [PATCH v3 05/16] consider a memcg parameter in kmem_create_cache [message #48010 is a reply to message #48006] Mon, 24 September 2012 13:42 Go to previous messageGo to next message
Christoph Lameter is currently offline  Christoph Lameter
Messages: 123
Registered: September 2006
Senior Member
On Mon, 24 Sep 2012, Glauber Costa wrote:

> But that is orthogonal, isn't it? People will still expect to see it in
> the old slabinfo file.

The current scheme for memory statistics is

/proc/meminfo contains global counters

/sys/devices/system/node/nodeX/meminfo

contains node specific counters.

The cgroups directory already contains various files. Adding a slabinfo
file would make sense and it could be found easily.

The sysfs hierachy /sys/kernel/slab could also show up there as a "slab"
directory under which all the details of the various caches would be
available and tunable. Very much in sync with the way the cgroups
directories operate.
Re: [PATCH v3 05/16] consider a memcg parameter in kmem_create_cache [message #48011 is a reply to message #48008] Mon, 24 September 2012 13:56 Go to previous messageGo to next message
Christoph Lameter is currently offline  Christoph Lameter
Messages: 123
Registered: September 2006
Senior Member
On Mon, 24 Sep 2012, Glauber Costa wrote:

> The reason I say it is orthogonal, is that people will still want to see
> their caches in /proc/slabinfo, regardless of wherever else they'll be.
> It was a requirement from Pekka in one of the first times I posted this,
> IIRC.

They want to see total counts there true. But as I said we already have a
duplication of the statistics otherwise. We have never done the scheme
that you propose. That is unexpected. I would not expect the numbers to be
there.
Re: [PATCH v3 05/16] consider a memcg parameter in kmem_create_cache [message #48012 is a reply to message #48009] Mon, 24 September 2012 15:38 Go to previous messageGo to next message
Pekka Enberg is currently offline  Pekka Enberg
Messages: 22
Registered: November 2006
Junior Member
On 09/24/2012 05:56 PM, Christoph Lameter wrote:
>> On Mon, 24 Sep 2012, Glauber Costa wrote:
>>
>>> The reason I say it is orthogonal, is that people will still want to see
>>> their caches in /proc/slabinfo, regardless of wherever else they'll be.
>>> It was a requirement from Pekka in one of the first times I posted this,
>>> IIRC.
>>
>> They want to see total counts there true. But as I said we already have a
>> duplication of the statistics otherwise. We have never done the scheme
>> that you propose. That is unexpected. I would not expect the numbers to be
>> there.

On Mon, Sep 24, 2012 at 4:57 PM, Glauber Costa <glommer@parallels.com> wrote:
> I myself personally believe it can potentially clutter slabinfo, and
> won't put my energy in defending the current implementation. What I
> don't want is to keep switching between implementations.
>
> Pekka, Tejun, what do you guys say here?

So Christoph is proposing that the new caches appear somewhere under
the cgroups directory and /proc/slabinfo includes aggregated counts,
right? I'm certainly OK with that.
Re: [PATCH v3 05/16] consider a memcg parameter in kmem_create_cache [message #48013 is a reply to message #48012] Mon, 24 September 2012 15:36 Go to previous messageGo to next message
Glauber Costa is currently offline  Glauber Costa
Messages: 916
Registered: October 2011
Senior Member
On 09/24/2012 07:38 PM, Pekka Enberg wrote:
> On 09/24/2012 05:56 PM, Christoph Lameter wrote:
>>> On Mon, 24 Sep 2012, Glauber Costa wrote:
>>>
>>>> The reason I say it is orthogonal, is that people will still want to see
>>>> their caches in /proc/slabinfo, regardless of wherever else they'll be.
>>>> It was a requirement from Pekka in one of the first times I posted this,
>>>> IIRC.
>>>
>>> They want to see total counts there true. But as I said we already have a
>>> duplication of the statistics otherwise. We have never done the scheme
>>> that you propose. That is unexpected. I would not expect the numbers to be
>>> there.
>
> On Mon, Sep 24, 2012 at 4:57 PM, Glauber Costa <glommer@parallels.com> wrote:
>> I myself personally believe it can potentially clutter slabinfo, and
>> won't put my energy in defending the current implementation. What I
>> don't want is to keep switching between implementations.
>>
>> Pekka, Tejun, what do you guys say here?
>
> So Christoph is proposing that the new caches appear somewhere under
> the cgroups directory and /proc/slabinfo includes aggregated counts,
> right? I'm certainly OK with that.
>
Just for clarification, I am not sure about the aggregate counts -
although it surely makes sense.

Christoph, is that what you're proposing ?
Re: [PATCH v3 15/16] memcg/sl[au]b: shrink dead caches [message #48014 is a reply to message #48002] Mon, 24 September 2012 17:43 Go to previous messageGo to next message
Tejun Heo is currently offline  Tejun Heo
Messages: 184
Registered: November 2006
Senior Member
Hello,

On Mon, Sep 24, 2012 at 12:25:00PM +0400, Glauber Costa wrote:
> > This is kinda nasty. Do we really need to do this? How long would a
> > dead cache stick around?
>
> Without targeted shrinking, until all objects are manually freed, which
> may need to wait global reclaim to kick in.
>
> In general, if we agree with duplicating the caches, the problem that
> they may stick around for some time will not be avoidable. If you have
> any suggestions about alternative ways for it, I'm all ears.

I don't have much problem with caches sticking around waiting to be
reaped. I'm just wondering whether renaming trick is really
necessary.

> > Reaping dead caches doesn't exactly sound like a high priority thing
> > and adding a branch to hot path for that might not be the best way to
> > do it. Why not schedule an extremely lazy deferrable delayed_work
> > which polls for emptiness, say, every miniute or whatever?
> >
>
> Because this branch is marked as unlikely, I would expect it not to be a
> big problem. It will be not taken most of the time, and becomes a very
> cheap branch. I considered this to be simpler than a deferred work
> mechanism.
>
> If even then, you guys believe this is still too high, I can resort to that.

It's still an otherwise unnecessary branch on a very hot path. If you
can remove it, there's no reason not to.

Thanks.

--
tejun
Re: [PATCH v3 07/16] memcg: skip memcg kmem allocations in specified code regions [message #48015 is a reply to message #48005] Mon, 24 September 2012 17:47 Go to previous messageGo to next message
Tejun Heo is currently offline  Tejun Heo
Messages: 184
Registered: November 2006
Senior Member
Hello,

On Mon, Sep 24, 2012 at 01:09:23PM +0400, Glauber Costa wrote:
> > I can't say I'm a big fan of this approach. If there are enough
> > users, maybe but can't we just annotate the affected allocations
> > explicitly? Is this gonna have many more users?
>
> What exactly do you mean by annotating the affected allocations?

IOW, can't you just pass down an extra argument / flag / whatever
instead?

Thanks.

--
tejun
Re: [PATCH v3 06/16] memcg: infrastructure to match an allocation to the right cache [message #48016 is a reply to message #48003] Mon, 24 September 2012 17:56 Go to previous messageGo to next message
Tejun Heo is currently offline  Tejun Heo
Messages: 184
Registered: November 2006
Senior Member
Hello, Glauber.

On Mon, Sep 24, 2012 at 12:46:35PM +0400, Glauber Costa wrote:
> >> +#ifdef CONFIG_MEMCG_KMEM
> >> + /* Slab accounting */
> >> + struct kmem_cache *slabs[MAX_KMEM_CACHE_TYPES];
> >> +#endif
> >
> > Bah, 400 entry array in struct mem_cgroup. Can't we do something a
> > bit more flexible?
> >
>
> I guess. I still would like it to be an array, so we can easily access
> its fields. There are two ways around this:
>
> 1) Do like the events mechanism and allocate this in a separate
> structure. Add a pointer chase in the access, and I don't think it helps
> much because it gets allocated anyway. But we could at least
> defer it to the time when we limit the cache.

Start at some reasonable size and then double it as usage grows? How
many kmem_caches do we typically end up using?

> >> + if (memcg->slabs[idx] == NULL) {
> >> + memcg_create_cache_enqueue(memcg, cachep);
> >
> > Do we want to wait for the work item if @gfp allows?
> >
>
> I tried this once, and it got complicated enough that I deemed as "not
> worth it". I honestly don't remember much of the details now, it was one
> of the first things I tried, and a bunch of time has passed. If you
> think it is absolutely worth it, I can try it again. But at the very
> best, I view this as an optimization.

I don't know. It seems like a logical thing to try and depends on how
complex it gets. I don't think it's a must. The whole thing is
somewhat opportunistic after all.

Thanks.

--
tejun
Re: [PATCH v3 06/16] memcg: infrastructure to match an allocation to the right cache [message #48017 is a reply to message #48001] Mon, 24 September 2012 17:58 Go to previous messageGo to next message
Tejun Heo is currently offline  Tejun Heo
Messages: 184
Registered: November 2006
Senior Member
On Mon, Sep 24, 2012 at 12:17:37PM +0400, Glauber Costa wrote:
> On 09/22/2012 12:52 AM, Tejun Heo wrote:
> > Missed some stuff.
> >
> > On Tue, Sep 18, 2012 at 06:12:00PM +0400, Glauber Costa wrote:
> >> +static struct kmem_cache *memcg_create_kmem_cache(struct mem_cgroup *memcg,
> >> + struct kmem_cache *cachep)
> >> +{
> > ...
> >> + memcg->slabs[idx] = new_cachep;
> > ...
> >> +struct kmem_cache *__memcg_kmem_get_cache(struct kmem_cache *cachep,
> >> + gfp_t gfp)
> >> +{
> > ...
> >> + return memcg->slabs[idx];
> >
> > I think you need memory barriers for the above pair.
> >
> > Thanks.
> >
>
> Why is that?
>
> We'll either see a value, or NULL. If we see NULL, we assume the cache
> is not yet created. Not a big deal.

Because when you see !NULL cache pointer you want to be able to see
the cache fully initialized. You need wmb between cache creation and
pointer assignment and at least read_barrier_depends() between
fetching the cache pointer and dereferencing it.

Thanks.

--
tejun
Re: [PATCH v3 05/16] consider a memcg parameter in kmem_create_cache [message #48018 is a reply to message #48013] Mon, 24 September 2012 17:38 Go to previous messageGo to next message
Christoph Lameter is currently offline  Christoph Lameter
Messages: 123
Registered: September 2006
Senior Member
On Mon, 24 Sep 2012, Glauber Costa wrote:

> > So Christoph is proposing that the new caches appear somewhere under
> > the cgroups directory and /proc/slabinfo includes aggregated counts,
> > right? I'm certainly OK with that.
> >
> Just for clarification, I am not sure about the aggregate counts -
> although it surely makes sense.
>
> Christoph, is that what you're proposing ?

Yes. Make it similar to the way /proc/meminfo is handled.
Re: [PATCH v3 05/16] consider a memcg parameter in kmem_create_cache [message #48019 is a reply to message #48012] Mon, 24 September 2012 17:39 Go to previous messageGo to next message
Christoph Lameter is currently offline  Christoph Lameter
Messages: 123
Registered: September 2006
Senior Member
On Mon, 24 Sep 2012, Pekka Enberg wrote:

> So Christoph is proposing that the new caches appear somewhere under
> the cgroups directory and /proc/slabinfo includes aggregated counts,
> right? I'm certainly OK with that.

Caches would appear either in cgroup/slabinfo (which would have the same
format as /proc/slabinfo) or in cgroup/slab/<slabname>/<fieldname>
(similar to /sys/kernel/slab/...)
Re: [PATCH v3 06/16] memcg: infrastructure to match an allocation to the right cache [message #48030 is a reply to message #48016] Tue, 25 September 2012 13:57 Go to previous messageGo to next message
Glauber Costa is currently offline  Glauber Costa
Messages: 916
Registered: October 2011
Senior Member
On 09/24/2012 09:56 PM, Tejun Heo wrote:
> Hello, Glauber.
>
> On Mon, Sep 24, 2012 at 12:46:35PM +0400, Glauber Costa wrote:
>>>> +#ifdef CONFIG_MEMCG_KMEM
>>>> + /* Slab accounting */
>>>> + struct kmem_cache *slabs[MAX_KMEM_CACHE_TYPES];
>>>> +#endif
>>>
>>> Bah, 400 entry array in struct mem_cgroup. Can't we do something a
>>> bit more flexible?
>>>
>>
>> I guess. I still would like it to be an array, so we can easily access
>> its fields. There are two ways around this:
>>
>> 1) Do like the events mechanism and allocate this in a separate
>> structure. Add a pointer chase in the access, and I don't think it helps
>> much because it gets allocated anyway. But we could at least
>> defer it to the time when we limit the cache.
>
> Start at some reasonable size and then double it as usage grows? How
> many kmem_caches do we typically end up using?
>

So my Fedora box here, recently booted on a Fedora kernel, will have 111
caches. How would 150 sound to you?
Re: [PATCH v3 06/16] memcg: infrastructure to match an allocation to the right cache [message #48031 is a reply to message #48030] Tue, 25 September 2012 16:28 Go to previous messageGo to next message
Christoph Lameter is currently offline  Christoph Lameter
Messages: 123
Registered: September 2006
Senior Member
On Tue, 25 Sep 2012, Glauber Costa wrote:

> >> 1) Do like the events mechanism and allocate this in a separate
> >> structure. Add a pointer chase in the access, and I don't think it helps
> >> much because it gets allocated anyway. But we could at least
> >> defer it to the time when we limit the cache.
> >
> > Start at some reasonable size and then double it as usage grows? How
> > many kmem_caches do we typically end up using?
> >
>
> So my Fedora box here, recently booted on a Fedora kernel, will have 111
> caches. How would 150 sound to you?

Some drivers/subsystems can dynamically create slabs as needed for new
devices or instances of metadata. You cannot use a fixed size
array and cannot establish an upper boundary for the number of slabs on
the system.
Re: [PATCH v3 15/16] memcg/sl[au]b: shrink dead caches [message #48040 is a reply to message #47913] Fri, 21 September 2012 04:48 Go to previous messageGo to next message
JoonSoo Kim is currently offline  JoonSoo Kim
Messages: 9
Registered: September 2012
Junior Member
Hi Glauber.

2012/9/18 Glauber Costa <glommer@parallels.com>:
> diff --git a/mm/slub.c b/mm/slub.c
> index 0b68d15..9d79216 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -2602,6 +2602,7 @@ redo:
> } else
> __slab_free(s, page, x, addr);
>
> + kmem_cache_verify_dead(s);
> }

As far as u know, I am not a expert and don't know anything about memcg.
IMHO, this implementation may hurt system performance in some case.

In case of memcg is destoried, remained kmem_cache is marked "dead".
After it is marked,
every free operation to this "dead" kmem_cache call
kmem_cache_verify_dead() and finally call kmem_cache_shrink().
kmem_cache_shrink() do invoking kmalloc and flush_all() and taking a
lock for online node and invoking kfree.
Especially, flush_all() may hurt performance largely, because it call
has_cpu_slab() against all the cpus.

And I know some other case it can hurt system performance.
But, I don't mention it, because above case is sufficient to worry.

And, I found one case that destroying memcg's kmem_cache don't works properly.
If we destroy memcg after all object is freed, current implementation
doesn't destroy kmem_cache.
kmem_cache_destroy_work_func() check "cachep->memcg_params.nr_pages == 0",
but in this case, it return false, because kmem_cache may have
cpu_slab, and cpu_partials_slabs.
As we already free all objects, kmem_cache_verify_dead() is not invoked forever.
I think that we need another kmem_cache_shrink() in
kmem_cache_destroy_work_func().

I don't convince that I am right, so think carefully my humble opinion.

Thanks.
Re: [PATCH v3 15/16] memcg/sl[au]b: shrink dead caches [message #48042 is a reply to message #47966] Fri, 21 September 2012 09:28 Go to previous messageGo to next message
JoonSoo Kim is currently offline  JoonSoo Kim
Messages: 9
Registered: September 2012
Junior Member
Hi, Glauber.

>> 2012/9/18 Glauber Costa <glommer@parallels.com>:
>>> diff --git a/mm/slub.c b/mm/slub.c
>>> index 0b68d15..9d79216 100644
>>> --- a/mm/slub.c
>>> +++ b/mm/slub.c
>>> @@ -2602,6 +2602,7 @@ redo:
>>> } else
>>> __slab_free(s, page, x, addr);
>>>
>>> + kmem_cache_verify_dead(s);
>>> }
>>
>> As far as u know, I am not a expert and don't know anything about memcg.
>> IMHO, this implementation may hurt system performance in some case.
>>
>> In case of memcg is destoried, remained kmem_cache is marked "dead".
>> After it is marked,
>> every free operation to this "dead" kmem_cache call
>> kmem_cache_verify_dead() and finally call kmem_cache_shrink().
>
> As long as it is restricted to that cache, this is a non issue.
> dead caches are exactly what they name imply: dead.
>
> Means that we actively want them to go away, and just don't kill them
> right away because they have some inflight objects - which we expect not
> to be too much.

Hmm.. I don't think so.
We can destroy memcg whenever we want, is it right?
If it is right, there is many inflight objects when we destory memcg.
If there is so many inflight objects, performance of these processes
can be hurt too much.

>> And, I found one case that destroying memcg's kmem_cache don't works properly.
>> If we destroy memcg after all object is freed, current implementation
>> doesn't destroy kmem_cache.
>> kmem_cache_destroy_work_func() check "cachep->memcg_params.nr_pages == 0",
>> but in this case, it return false, because kmem_cache may have
>> cpu_slab, and cpu_partials_slabs.
>> As we already free all objects, kmem_cache_verify_dead() is not invoked forever.
>> I think that we need another kmem_cache_shrink() in
>> kmem_cache_destroy_work_func().
>
> I'll take a look here. What you describe makes sense, and can
> potentially happen. I tried to handle this case with care in
> destroy_all_caches, but I may have always made a mistake...
>
> Did you see this actively happening, or are you just assuming this can
> happen from your read of the code?

Just read of the code.

Thanks.
Re: [PATCH v3 05/16] consider a memcg parameter in kmem_create_cache [message #48194 is a reply to message #47912] Tue, 02 October 2012 14:46 Go to previous message
Michal Hocko is currently offline  Michal Hocko
Messages: 109
Registered: December 2011
Senior Member
On Tue 18-09-12 18:11:59, Glauber Costa wrote:
> Allow a memcg parameter to be passed during cache creation.
> When the slub allocator is being used, it will only merge
> caches that belong to the same memcg.
>
> Default function is created as a wrapper, passing NULL
> to the memcg version. We only merge caches that belong
> to the same memcg.
>
> From the memcontrol.c side, 3 helper functions are created:
>
> 1) memcg_css_id: because slub needs a unique cache name
> for sysfs. Since this is visible, but not the canonical
> location for slab data, the cache name is not used, the
> css_id should suffice.
>
> 2) mem_cgroup_register_cache: is responsible for assigning
> a unique index to each cache, and other general purpose
> setup. The index is only assigned for the root caches. All
> others are assigned index == -1.

It would be nice to describe what is memcg_params.id intended for. There
is no usage in this patch (except for create_unique_id in slub).
I guess that by root caches you mean all default caches with
memcg==NULL, right?

[...]
--
Michal Hocko
SUSE Labs
Previous Topic: [PATCH 0/5] nfs: few cleanup patches inspired by sparse
Next Topic: [PATCH v3 04/13] kmem accounting basic infrastructure
Goto Forum:
  


Current Time: Fri Oct 24 04:35:13 GMT 2025

Total time taken to generate the page: 0.08236 seconds