OpenVZ Forum - RDF feed
https://new-forum.openvz.org/index.php
Re: [PATCH v5] slab: Ignore internal flags in cache creation
https://new-forum.openvz.org/index.phpindex.php?t=rview&goto=48686&th=11214#msg_48686
rientjes@google.com> wrote:
> On Wed, 17 Oct 2012, Glauber Costa wrote:
>
>> Some flags are used internally by the allocators for management
>> purposes. One example of that is the CFLGS_OFF_SLAB flag that slab uses
>> to mark that the metadata for that cache is stored outside of the slab.
>>
>> No cache should ever pass those as creation flags. We can just ignore
>> this bit if it happens to be passed (such as when duplicating a cache in
>> the kmem memcg patches).
>>
>> Because such flags can vary from allocator to allocator, we allow them
>> to make their own decisions on that, defining SLAB_AVAILABLE_FLAGS with
>> all flags that are valid at creation time. Allocators that don't have
>> any specific flag requirement should define that to mean all flags.
>>
>> Common code will mask out all flags not belonging to that set.
>>
>> [ v2: leave the mask out decision up to the allocators ]
>> [ v3: define flags for all allocators ]
>> [ v4: move all definitions to slab.h ]
>>
>> Signed-off-by: Glauber Costa <glommer@parallels.com>
>> Acked-by: Christoph Lameter <cl@linux.com>
>> CC: David Rientjes <rientjes@google.com>
>> CC: Pekka Enberg <penberg@cs.helsinki.fi>
>
> Acked-by: David Rientjes <rientjes@google.com>
Applied, thanks!

-- Pekka Enberg, 2012-10-31T07:13:49-00:00

Re: [PATCH v5 13/18] memcg/sl[au]b Track all the memcg children of a kmem_cache.
https://new-forum.openvz.org/index.phpindex.php?t=rview&goto=48679&th=11233#msg_48679
> 2012/10/19 Glauber Costa <glommer@parallels.com>:
>> +void kmem_cache_destroy_memcg_children(struct kmem_cache *s)
>> +{
>> + struct kmem_cache *c;
>> + int i;
>> +
>> + if (!s->memcg_params)
>> + return;
>> + if (!s->memcg_params->is_root_cache)
>> + return;
>> +
>> + /*
>> + * If the cache is being destroyed, we trust that there is no one else
>> + * requesting objects from it. Even if there are, the sanity checks in
>> + * kmem_cache_destroy should catch this ill case.
>> + *
>> + * Still, we don't want anyone else freeing memcg_caches under our
>> + * noses, which can happen if a new memcg comes to life. As usual,
>> + * we'll take the set_limit_mutex to protect ourselves against this.
>> + */
>> + mutex_lock(&set_limit_mutex);
>> + for (i = 0; i < memcg_limited_groups_array_size; i++) {
>> + c = s->memcg_params->memcg_caches[i];
>> + if (c)
>> + kmem_cache_destroy(c);
>> + }
>> + mutex_unlock(&set_limit_mutex);
>> +}
>
> It may cause a NULL deref.
> Look at the following scenario:
>
> 1. Some memcg slab caches still have remaining objects.
> 2. We start to destroy the memcg.
> 3. schedule_delayed_work(kmem_cache_destroy_work_func, @delay 60hz)
> 4. All remaining objects are freed.
> 5. We start to destroy the root cache.
> 6. kmem_cache_destroy() sets s->memcg_params->memcg_caches[i] to NULL!
> 7. The delayed work function starts.
> 8. cachep in kmem_cache_destroy_work_func() may be NULL.
>
> Thanks.
>
Thanks for spotting this. It is the same problem we have in
memcg_cache_destroy(), which I solved by not respawning the worker.
Here, I believe it should be possible to just cancel all remaining
pending work, since we are now in the process of deleting the caches
ourselves.

-- Glauber Costa, 2012-10-30T11:31:37-00:00

Re: [PATCH v5 13/18] memcg/sl[au]b Track all the memcg children of a kmem_cache.
https://new-forum.openvz.org/index.phpindex.php?t=rview&goto=48675&th=11233#msg_48675
glommer@parallels.com>:
> +void kmem_cache_destroy_memcg_children(struct kmem_cache *s)
> +{
> + struct kmem_cache *c;
> + int i;
> +
> + if (!s->memcg_params)
> + return;
> + if (!s->memcg_params->is_root_cache)
> + return;
> +
> + /*
> + * If the cache is being destroyed, we trust that there is no one else
> + * requesting objects from it. Even if there are, the sanity checks in
> + * kmem_cache_destroy should catch this ill case.
> + *
> + * Still, we don't want anyone else freeing memcg_caches under our
> + * noses, which can happen if a new memcg comes to life. As usual,
> + * we'll take the set_limit_mutex to protect ourselves against this.
> + */
> + mutex_lock(&set_limit_mutex);
> + for (i = 0; i < memcg_limited_groups_array_size; i++) {
> + c = s->memcg_params->memcg_caches[i];
> + if (c)
> + kmem_cache_destroy(c);
> + }
> + mutex_unlock(&set_limit_mutex);
> +}
It may cause a NULL deref.
Look at the following scenario:

1. Some memcg slab caches still have remaining objects.
2. We start to destroy the memcg.
3. schedule_delayed_work(kmem_cache_destroy_work_func, @delay 60hz)
4. All remaining objects are freed.
5. We start to destroy the root cache.
6. kmem_cache_destroy() sets s->memcg_params->memcg_caches[i] to NULL!
7. The delayed work function starts.
8. cachep in kmem_cache_destroy_work_func() may be NULL.
Thanks.

-- JoonSoo Kim, 2012-10-29T15:26:43-00:00

Re: [PATCH v5 11/18] sl[au]b: Allocate objects from memcg cache
https://new-forum.openvz.org/index.phpindex.php?t=rview&goto=48674&th=11233#msg_48674
> Hi, Glauber.
>
> 2012/10/19 Glauber Costa <glommer@parallels.com>:
>> We are able to match a cache allocation to a particular memcg. If the
>> task doesn't change groups during the allocation itself (a rare event),
>> this will give us a good picture of which group is the first to touch a
>> cache page.
>>
>> This patch uses the now available infrastructure by calling
>> memcg_kmem_get_cache() before all the cache allocations.
>>
>> Signed-off-by: Glauber Costa <glommer@parallels.com>
>> CC: Christoph Lameter <cl@linux.com>
>> CC: Pekka Enberg <penberg@cs.helsinki.fi>
>> CC: Michal Hocko <mhocko@suse.cz>
>> CC: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>> CC: Johannes Weiner <hannes@cmpxchg.org>
>> CC: Suleiman Souhlal <suleiman@google.com>
>> CC: Tejun Heo <tj@kernel.org>
>> ---
>> include/linux/slub_def.h | 15 ++++++++++-----
>> mm/memcontrol.c | 3 +++
>> mm/slab.c | 6 +++++-
>> mm/slub.c | 5 +++--
>> 4 files changed, 21 insertions(+), 8 deletions(-)
>>
>> diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
>> index 961e72e..ed330df 100644
>> --- a/include/linux/slub_def.h
>> +++ b/include/linux/slub_def.h
>> @@ -13,6 +13,8 @@
>> #include <linux/kobject.h>
>>
>> #include <linux/kmemleak.h>
>> +#include <linux/memcontrol.h>
>> +#include <linux/mm.h>
>>
>> enum stat_item {
>> ALLOC_FASTPATH, /* Allocation from cpu slab */
>> @@ -209,14 +211,14 @@ static __always_inline int kmalloc_index(size_t size)
>> * This ought to end up with a global pointer to the right cache
>> * in kmalloc_caches.
>> */
>> -static __always_inline struct kmem_cache *kmalloc_slab(size_t size)
>> +static __always_inline struct kmem_cache *kmalloc_slab(gfp_t flags, size_t size)
>> {
>> int index = kmalloc_index(size);
>>
>> if (index == 0)
>> return NULL;
>>
>> - return kmalloc_caches[index];
>> + return memcg_kmem_get_cache(kmalloc_caches[index], flags);
>> }
>
> You don't need this,
> because memcg_kmem_get_cache() is invoked in both slab_alloc() and
> __cache_alloc_node().
>
Indeed, I had already noticed this and fixed it myself; the fix will be in the
next version, which I intend to send out tonight or tomorrow.

-- Glauber Costa, 2012-10-29T15:19:53-00:00

Re: [PATCH v5 11/18] sl[au]b: Allocate objects from memcg cache
https://new-forum.openvz.org/index.phpindex.php?t=rview&goto=48673&th=11233#msg_48673
2012/10/19 Glauber Costa <glommer@parallels.com>:
> We are able to match a cache allocation to a particular memcg. If the
> task doesn't change groups during the allocation itself (a rare event),
> this will give us a good picture of which group is the first to touch a
> cache page.
>
> This patch uses the now available infrastructure by calling
> memcg_kmem_get_cache() before all the cache allocations.
>
> Signed-off-by: Glauber Costa <glommer@parallels.com>
> CC: Christoph Lameter <cl@linux.com>
> CC: Pekka Enberg <penberg@cs.helsinki.fi>
> CC: Michal Hocko <mhocko@suse.cz>
> CC: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> CC: Johannes Weiner <hannes@cmpxchg.org>
> CC: Suleiman Souhlal <suleiman@google.com>
> CC: Tejun Heo <tj@kernel.org>
> ---
> include/linux/slub_def.h | 15 ++++++++++-----
> mm/memcontrol.c | 3 +++
> mm/slab.c | 6 +++++-
> mm/slub.c | 5 +++--
> 4 files changed, 21 insertions(+), 8 deletions(-)
>
> diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
> index 961e72e..ed330df 100644
> --- a/include/linux/slub_def.h
> +++ b/include/linux/slub_def.h
> @@ -13,6 +13,8 @@
> #include <linux/kobject.h>
>
> #include <linux/kmemleak.h>
> +#include <linux/memcontrol.h>
> +#include <linux/mm.h>
>
> enum stat_item {
> ALLOC_FASTPATH, /* Allocation from cpu slab */
> @@ -209,14 +211,14 @@ static __always_inline int kmalloc_index(size_t size)
> * This ought to end up with a global pointer to the right cache
> * in kmalloc_caches.
> */
> -static __always_inline struct kmem_cache *kmalloc_slab(size_t size)
> +static __always_inline struct kmem_cache *kmalloc_slab(gfp_t flags, size_t size)
> {
> int index = kmalloc_index(size);
>
> if (index == 0)
> return NULL;
>
> - return kmalloc_caches[index];
> + return memcg_kmem_get_cache(kmalloc_caches[index], flags);
> }
You don't need this,
because memcg_kmem_get_cache() is invoked in both slab_alloc() and
__cache_alloc_node().

-- JoonSoo Kim, 2012-10-29T15:14:34-00:00

Re: [PATCH v3] SUNRPC: set desired file system root before connecting local transports
https://new-forum.openvz.org/index.phpindex.php?t=rview&goto=48655&th=11174#msg_48655
> On 10.10.2012 05:23, J. Bruce Fields wrote:
> >On Tue, Oct 09, 2012 at 03:47:42PM -0700, Eric W. Biederman wrote:
> >>"J. Bruce Fields" <bfields@fieldses.org> writes:
> >>
> >>>On Tue, Oct 09, 2012 at 01:20:48PM -0700, Eric W. Biederman wrote:
> >>>>"Myklebust, Trond" <Trond.Myklebust@netapp.com> writes:
> >>>>
> >>>>>On Tue, 2012-10-09 at 15:35 -0400, J. Bruce Fields wrote:
> >>>>>>Cc'ing Eric since I seem to recall he suggested doing it this way?
> >>>>
> >>>>Yes. On second look setting fs->root won't work. We need to change fs.
> >>>>The problem is that by default all kernel threads share fs so changing
> >>>>fs->root will have non-local consequences.
> >>>
> >>>Oh, huh. And we can't "unshare" it somehow?
> >>
> >>I don't fully understand how nfs uses kernel threads and work queues.
> >>My general understanding is work queues reuse their kernel threads
> >>between different users. So it is mostly a don't pollute your
> >>environment thing. If there was a dedicated kernel thread for each
> >>environment this would be trivial.
> >>
> >>What I was suggesting here is changing task->fs instead of
> >>task->fs.root. That should just require task_lock().
> >
> >Oh, OK, got it--if that works, great.
> >
>
> The main problem with swapping the fs struct is actually the same as with
> root swapping: the routines for copying fs_struct are not exported.
> It could be done in place, but I don't think Al Viro would
> support such an implementation.
> Trond?

It seems like we got stalled here... Could you go ahead and try a
patch, and see what people think?

--b.

-- bfields, 2012-10-26T17:52:54-00:00

[PATCH 12/12] fuse: optimize __fuse_direct_io()
https://new-forum.openvz.org/index.phpindex.php?t=rview&goto=48653&th=11256#msg_48653
patch calculates 'n' based on the iov[] array. This is useful because allocating
FUSE_MAX_PAGES_PER_REQ page pointers and descriptors for each fuse request
would be a waste of memory when the iov-s are smaller.
@@ -1180,7 +1197,7 @@ static ssize_t __fuse_direct_io(struct file *file, const struct iovec *iov,
break;
if (count) {
fuse_put_request(fc, req);
- req = fuse_get_req(fc, FUSE_MAX_PAGES_PER_REQ);
+ req = fuse_get_req(fc, fuse_iter_npages(&ii));
if (IS_ERR(req))
break;
}

-- Maxim Patlasov, 2012-10-26T15:50:36-00:00

[PATCH 11/12] fuse: optimize fuse_get_user_pages()
https://new-forum.openvz.org/index.phpindex.php?t=rview&goto=48652&th=11256#msg_48652
possible. This is very beneficial when iov[] consists of many iov-s of
relatively small size (e.g. PAGE_SIZE).
- for (i = 0; i < req->num_pages; i++)
+ for (i = index; i < index + nr_pages; i++)
req->page_descs[i].length = PAGE_SIZE -
req->page_descs[i].offset;
}
/* okay, let's send it to the client */
req->in.h.opcode = FUSE_IOCTL;

-- Maxim Patlasov, 2012-10-26T15:50:29-00:00

[PATCH 10/12] fuse: pass iov[] to fuse_get_user_pages()
https://new-forum.openvz.org/index.phpindex.php?t=rview&goto=48651&th=11256#msg_48651
direct IO. The idea is to allow fuse_get_user_pages() to pack as many iov-s
into each fuse request as possible. So, here we only rework the related
call paths to carry iov[] from fuse_direct_IO() to fuse_get_user_pages().