OpenVZ Forum


Home » Mailing lists » Devel » Re: [PATCH v2 07/13] memcg: Slab accounting.
Re: [PATCH v2 07/13] memcg: Slab accounting. [message #45826 is a reply to message #45452] Tue, 13 March 2012 22:50 Go to previous message
Suleiman Souhlal is currently offline  Suleiman Souhlal
Messages: 64
Registered: February 2012
Member
On Sun, Mar 11, 2012 at 3:25 AM, Glauber Costa <glommer@parallels.com> wrote:
> On 03/10/2012 12:39 AM, Suleiman Souhlal wrote:
>> +static inline void
>> +mem_cgroup_kmem_cache_prepare_sleep(struct kmem_cache *cachep)
>> +{
>> +       /*
>> +        * Make sure the cache doesn't get freed while we have interrupts
>> +        * enabled.
>> +        */
>> +       kmem_cache_get_ref(cachep);
>> +       rcu_read_unlock();
>> +}
>
>
> Is this really needed ? After this function call in slab.c, the slab code
> itself accesses cachep a thousand times. If it could be freed, it would
> already explode today for other reasons?
> Am I missing something here?

We need this because once we drop the rcu_read_lock and go to sleep,
the memcg could get deleted, which could lead to the cachep from
getting deleted as well.

So, we need to grab a reference to the cache, to make sure that the
cache doesn't disappear from under us.

>> diff --git a/init/Kconfig b/init/Kconfig
>> index 3f42cd6..e7eb652 100644
>> --- a/init/Kconfig
>> +++ b/init/Kconfig
>> @@ -705,7 +705,7 @@ config CGROUP_MEM_RES_CTLR_SWAP_ENABLED
>>          then swapaccount=0 does the trick).
>>  config CGROUP_MEM_RES_CTLR_KMEM
>>        bool "Memory Resource Controller Kernel Memory accounting
>> (EXPERIMENTAL)"
>> -       depends on CGROUP_MEM_RES_CTLR&&  EXPERIMENTAL
>> +       depends on CGROUP_MEM_RES_CTLR&&  EXPERIMENTAL&&  !SLOB
>
> Orthogonal question: Will we ever want this (SLOB) ?

I honestly don't know why someone would want to use this and slob at
the same time.
It really doesn't seem like a required feature, in my opinion.
Especially at first.

>> +static struct kmem_cache *
>> +memcg_create_kmem_cache(struct mem_cgroup *memcg, struct kmem_cache
>> *cachep)
>> +{
>> +       struct kmem_cache *new_cachep;
>> +       struct dentry *dentry;
>> +       char *name;
>> +       int idx;
>> +
>> +       idx = cachep->memcg_params.id;
>> +
>> +       dentry = memcg->css.cgroup->dentry;
>> +       BUG_ON(dentry == NULL);
>> +
>> +       /* Preallocate the space for "dead" at the end */
>> +       name = kasprintf(GFP_KERNEL, "%s(%d:%s)dead",
>> +           cachep->name, css_id(&memcg->css), dentry->d_name.name);
>> +       if (name == NULL)
>> +               return cachep;
>> +       /* Remove "dead" */
>> +       name[strlen(name) - 4] = '\0';
>> +
>> +       new_cachep = kmem_cache_create_memcg(cachep, name);
>> +
>> +       /*
>> +        * Another CPU is creating the same cache?
>> +        * We'll use it next time.
>> +        */
>
> This comment is a bit misleading. Is it really the only reason
> it can fail?
>
> The impression I got is that it can also fail under the normal conditions in
> which kmem_cache_create() fails.

kmem_cache_create() isn't expected to fail often.
I wasn't making an exhaustive lists of why this condition can happen,
just what I think is the most common one is.

>> +/*
>> + * Enqueue the creation of a per-memcg kmem_cache.
>> + * Called with rcu_read_lock.
>> + */
>> +static void
>> +memcg_create_cache_enqueue(struct mem_cgroup *memcg, struct kmem_cache
>> *cachep)
>> +{
>> +       struct create_work *cw;
>> +       unsigned long flags;
>> +
>> +       spin_lock_irqsave(&create_queue_lock, flags);
>
> If we can sleep, why not just create the cache now?
>
> Maybe it would be better to split this in two, and create the cache if
> possible, and a worker if not possible. Then w

That's how I had it in my initial patch, but I was under the
impression that you preferred if we always kicked off the creation to
the workqueue?

Which way do you prefer?

>> @@ -1756,17 +1765,23 @@ static void *kmem_getpages(struct kmem_cache
>> *cachep, gfp_t flags, int nodeid)
>>        if (cachep->flags&  SLAB_RECLAIM_ACCOUNT)
>>
>>                flags |= __GFP_RECLAIMABLE;
>>
>> +       nr_pages = (1<<  cachep->gfporder);
>> +       if (!mem_cgroup_charge_slab(cachep, flags, nr_pages * PAGE_SIZE))
>> +               return NULL;
>> +
>>        page = alloc_pages_exact_node(nodeid, flags | __GFP_NOTRACK,
>> cachep->gfporder);
>> -       if (!page)
>> +       if (!page) {
>> +               mem_cgroup_uncharge_slab(cachep, nr_pages * PAGE_SIZE);
>>                return NULL;
>> +       }
>
>
>
> Can't the following happen:
>
>  *) mem_cgroup_charge_slab() is the first one to touch the slab.
>    Therefore, this first one is billed to root.
>  *) A slab is queued for creation.
>  *) alloc_pages sleep.
>  *) our workers run, and create the cache, therefore filling
>    cachep->memcg_param.memcg
>  *) alloc_pages still can't allocate.
>  *) uncharge tries to uncharge from cachep->memcg_param.memcg,
>    which doesn't have any charges...
>
> Unless you have a strong oposition to this, to avoid this kind of
> corner cases, we could do what I was doing in the slub:
> Allocate the page first, and then account it.
> (freeing the page if it fails).
>
> I know it is not the way it is done for the user pages, but I believe it to
> be better suited for the slab.

I don't think the situation you're describing can happen, because the
memcg caches get created and selected at the beginning of the slab
allocation, in mem_cgroup_get_kmem_cache() and not in
mem_cgroup_charge_slab(), which is much later.

Once we are in mem_cgroup_charge_slab() we know that the allocation
will be charged to the cgroup.

>> @@ -2269,10 +2288,12 @@ kmem_cache_create (const char *name, size_t size,
>> size_t align,
>>                }
>>
>>                if (!strcmp(pc->name, name)) {
>> -                       printk(KERN_ERR
>> -                              "kmem_cache_create: duplicate cache %s\n",
>> name);
>> -                       dump_stack();
>> -                       goto oops;
>> +                       if (!memcg) {
>> +                               printk(KERN_ERR "kmem_cache_create:
>> duplicate"
>> +                                   " cache %s\n", name);
>> +                               dump_stack();
>> +                               goto oops;
>> +                       }
>
> Why? Since we are apending the memcg name at the end anyway, duplicates
> still aren't expected.

Duplicates can happen if you have hierarchies, because we're only
appending the basename of the cgroup.

>> @@ -2703,12 +2787,74 @@ void kmem_cache_destroy(struct kmem_cache *cachep)
>>        if (unlikely(cachep->flags&  SLAB_DESTROY_BY_RCU))
>>
>>                rcu_barrier();
>>
>> +#ifdef CONFIG_CGROUP_MEM_RES_CTLR_KMEM
>> +       /* Not a memcg cache */
>> +       if (cachep->memcg_params.id != -1) {
>> +               __clear_bit(cachep->memcg_params.id, cache_types);
>> +               mem_cgroup_flush_cache_create_queue();
>> +       }
>> +#endif
>
>
> This will clear the id when a leaf cache is destroyed. It seems it is not
> what we want, right? We want this id to be cleared only when
> the parent cache is gone.

id != -1, for parent caches (that's what the comment is trying to point out).
I will improve the comment.

>> +static void
>> +kmem_cache_destroy_work_func(struct work_struct *w)
>> +{
>> +       struct kmem_cache *cachep;
>> +       char *name;
>> +
>> +       spin_lock_irq(&destroy_lock);
>> +       while (!list_empty(&destroyed_caches)) {
>> +               cachep = container_of(list_first_entry(&destroyed_caches,
>> +                   struct mem_cgroup_cache_params, destroyed_list),
>> struct
>> +                   kmem_cache, memcg_params);
>> +               name = (char *)cachep->name;
>> +               list_del(&cachep->memcg_params.destroyed_list);
>> +               spin_unlock_irq(&destroy_lock);
>> +               synchronize_rcu();
>> +               kmem_cache_destroy(cachep);
>
>                ^^^^^^
>                will destroy the id.

See my previous comment.

>> @@ -3866,9 +4030,35 @@ void kmem_cache_free(struct kmem_cache *cachep,
>> void *objp)
>>
>>        local_irq_save(flags);
>>        debug_check_no_locks_freed(objp, obj_size(cachep));
>> +
>> +#ifdef CONFIG_CGROUP_MEM_RES_CTLR_KMEM
>> +       {
>> +               struct kmem_cache *actual_cachep;
>> +
>> +               actual_cachep = virt_to_cache(objp);
>> +               if (actual_cachep != cachep) {
>> +                       VM_BUG_ON(actual_cachep->memcg_params.id != -1);
>> +                       VM_BUG_ON(actual_cachep->memc
...

 
Read Message
Read Message
Previous Topic: Re: [PATCH v2 03/13] memcg: Uncharge all kmem when deleting a cgroup.
Next Topic: Re: [PATCH v2 03/13] memcg: Uncharge all kmem when deleting a cgroup.
Goto Forum:
  


Current Time: Fri Aug 16 20:15:38 GMT 2024

Total time taken to generate the page: 0.02857 seconds