Home » Mailing lists » Devel » [PATCH 00/11] kmem controller for memcg: stripped down version
Re: [PATCH 02/11] memcg: Reclaim when more than one page needed. [message #47014 is a reply to message #46951] |
Wed, 27 June 2012 10:03 |
Glauber Costa
Messages: 916 Registered: October 2011
|
Senior Member |
|
|
On 06/26/2012 08:09 AM, David Rientjes wrote:
> @@ -2206,7 +2214,7 @@ static int mem_cgroup_do_charge(struct mem_cgroup *memcg, gfp_t gfp_mask,
>> * unlikely to succeed so close to the limit, and we fall back
>> * to regular pages anyway in case of failure.
>> */
>>- if (nr_pages == 1 && ret)
>>+ if (nr_pages <= NR_PAGES_TO_RETRY && ret)
>> return CHARGE_RETRY;
Changed to costly order.
One more thing. The original version of this patch included
a cond_resched() here, that was also removed. From my re-reading
of the code in page_alloc.c and vmscan.c now, I tend to think
this is indeed not needed, since any cond_resched()s that might
be needed to ensure the safety of the code will be properly
inserted by the reclaim code itself, so there is no need for us
to include any when we signal that a retry is needed.
Do you/others agree?
|
|
|
Re: Fork bomb limitation in memcg WAS: Re: [PATCH 00/11] kmem controller for memcg: stripped down ve [message #47017 is a reply to message #47012] |
Wed, 27 June 2012 12:29 |
Frederic Weisbecker
Messages: 25 Registered: April 2012
|
Junior Member |
|
|
On Wed, Jun 27, 2012 at 01:29:04PM +0400, Glauber Costa wrote:
> On 06/27/2012 01:55 AM, Andrew Morton wrote:
> >>I can't speak for everybody here, but AFAIK, tracking the stack through
> >>the memory it used, therefore using my proposed kmem controller, was an
> >>idea that good quite a bit of traction with the memcg/memory people.
> >>So here you have something that people already asked a lot for, in a
> >>shape and interface that seem to be acceptable.
> >
> >mm, maybe. Kernel developers tend to look at code from the point of
> >view "does it work as designed", "is it clean", "is it efficient", "do
> >I understand it", etc. We often forget to step back and really
> >consider whether or not it should be merged at all.
> >
> >I mean, unless the code is an explicit simplification, we should have
> >a very strong bias towards "don't merge".
>
> Well, simplifications are welcome - this series itself was
> simplified beyond what I thought initially possible through the
> valuable comments
> of other people.
>
> But of course, this adds more complexity to the kernel as a whole.
> And this is true to every single new feature we may add, now or in
> the
> future.
>
> What I can tell you about this particular one, is that the justification
> for it doesn't come out of nowhere, but from a rather real use case that
> we support and maintain in OpenVZ and our line of products for years.
Right and we really need a solution to protect against forkbombs in LXC.
The task counter was more simple but only useful for our usecase and
defining the number of tasks as a resource was considered unnatural.
So limiting kernel stack allocations works for us. This patchset implements
this so I'm happy with it. If this is more broadly useful by limiting
resources others are interested in, that's even better. I doubt we are
interested in a solution that only concerns kernel stack allocation.
|
|
|
Re: Fork bomb limitation in memcg WAS: Re: [PATCH 00/11] kmem controller for memcg: stripped down ve [message #47018 is a reply to message #47017] |
Wed, 27 June 2012 12:28 |
Glauber Costa
Messages: 916 Registered: October 2011
|
Senior Member |
|
|
On 06/27/2012 04:29 PM, Frederic Weisbecker wrote:
> On Wed, Jun 27, 2012 at 01:29:04PM +0400, Glauber Costa wrote:
>> On 06/27/2012 01:55 AM, Andrew Morton wrote:
>>>> I can't speak for everybody here, but AFAIK, tracking the stack through
>>>> the memory it used, therefore using my proposed kmem controller, was an
>>>> idea that good quite a bit of traction with the memcg/memory people.
>>>> So here you have something that people already asked a lot for, in a
>>>> shape and interface that seem to be acceptable.
>>>
>>> mm, maybe. Kernel developers tend to look at code from the point of
>>> view "does it work as designed", "is it clean", "is it efficient", "do
>>> I understand it", etc. We often forget to step back and really
>>> consider whether or not it should be merged at all.
>>>
>>> I mean, unless the code is an explicit simplification, we should have
>>> a very strong bias towards "don't merge".
>>
>> Well, simplifications are welcome - this series itself was
>> simplified beyond what I thought initially possible through the
>> valuable comments
>> of other people.
>>
>> But of course, this adds more complexity to the kernel as a whole.
>> And this is true to every single new feature we may add, now or in
>> the
>> future.
>>
>> What I can tell you about this particular one, is that the justification
>> for it doesn't come out of nowhere, but from a rather real use case that
>> we support and maintain in OpenVZ and our line of products for years.
>
> Right and we really need a solution to protect against forkbombs in LXC.
Small correction: In containers. LXC is not the only one out there =p
|
|
|
Re: Fork bomb limitation in memcg WAS: Re: [PATCH 00/11] kmem controller for memcg: stripped down ve [message #47019 is a reply to message #47018] |
Wed, 27 June 2012 12:35 |
Frederic Weisbecker
Messages: 25 Registered: April 2012
|
Junior Member |
|
|
On Wed, Jun 27, 2012 at 04:28:14PM +0400, Glauber Costa wrote:
> On 06/27/2012 04:29 PM, Frederic Weisbecker wrote:
> > On Wed, Jun 27, 2012 at 01:29:04PM +0400, Glauber Costa wrote:
> >> On 06/27/2012 01:55 AM, Andrew Morton wrote:
> >>>> I can't speak for everybody here, but AFAIK, tracking the stack through
> >>>> the memory it used, therefore using my proposed kmem controller, was an
> >>>> idea that good quite a bit of traction with the memcg/memory people.
> >>>> So here you have something that people already asked a lot for, in a
> >>>> shape and interface that seem to be acceptable.
> >>>
> >>> mm, maybe. Kernel developers tend to look at code from the point of
> >>> view "does it work as designed", "is it clean", "is it efficient", "do
> >>> I understand it", etc. We often forget to step back and really
> >>> consider whether or not it should be merged at all.
> >>>
> >>> I mean, unless the code is an explicit simplification, we should have
> >>> a very strong bias towards "don't merge".
> >>
> >> Well, simplifications are welcome - this series itself was
> >> simplified beyond what I thought initially possible through the
> >> valuable comments
> >> of other people.
> >>
> >> But of course, this adds more complexity to the kernel as a whole.
> >> And this is true to every single new feature we may add, now or in
> >> the
> >> future.
> >>
> >> What I can tell you about this particular one, is that the justification
> >> for it doesn't come out of nowhere, but from a rather real use case that
> >> we support and maintain in OpenVZ and our line of products for years.
> >
> > Right and we really need a solution to protect against forkbombs in LXC.
> Small correction: In containers. LXC is not the only one out there =p
Sure. I was just speaking for the specific project I'm working on :)
But I'm definetly interested in solutions that work for everyone in containers in
general. And if Openvz is also interested in forkbombs protection that's even
better.
|
|
|
Re: [PATCH 11/11] protect architectures where THREAD_SIZE >= PAGE_SIZE against fork bombs [message #47020 is a reply to message #46924] |
Mon, 25 June 2012 16:55 |
Frederic Weisbecker
Messages: 25 Registered: April 2012
|
Junior Member |
|
|
On 06/25/2012 04:15 PM, Glauber Costa wrote:
> Because those architectures will draw their stacks directly from
> the page allocator, rather than the slab cache, we can directly
> pass __GFP_KMEMCG flag, and issue the corresponding free_pages.
>
> This code path is taken when the architecture doesn't define
> CONFIG_ARCH_THREAD_INFO_ALLOCATOR (only ia64 seems to), and has
> THREAD_SIZE >= PAGE_SIZE. Luckily, most - if not all - of the
> remaining architectures fall in this category.
>
> This will guarantee that every stack page is accounted to the memcg
> the process currently lives on, and will have the allocations to fail
> if they go over limit.
>
> For the time being, I am defining a new variant of THREADINFO_GFP, not
> to mess with the other path. Once the slab is also tracked by memcg,
> we can get rid of that flag.
>
> Tested to successfully protect against :(){ :|:& };:
>
> Signed-off-by: Glauber Costa <glommer@parallels.com>
> CC: Christoph Lameter <cl@linux.com>
> CC: Pekka Enberg <penberg@cs.helsinki.fi>
> CC: Michal Hocko <mhocko@suse.cz>
> CC: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> CC: Johannes Weiner <hannes@cmpxchg.org>
> CC: Suleiman Souhlal <suleiman@google.com>
Acked-by: Frederic Weisbecker <fweisbec@redhat.com>
Thanks!
|
|
|
|
Re: Fork bomb limitation in memcg WAS: Re: [PATCH 00/11] kmem controller for memcg: stripped down ve [message #47024 is a reply to message #47012] |
Wed, 27 June 2012 19:38 |
David Rientjes
Messages: 59 Registered: November 2006
|
Member |
|
|
On Wed, 27 Jun 2012, Glauber Costa wrote:
> fork bombs are a way bad behaved processes interfere with the rest of
> the system. In here, I propose fork bomb stopping as a natural
> consequence of the fact that the amount of kernel memory can be limited,
> and each process uses 1 or 2 pages for the stack, that are freed when the
> process goes away.
>
The obvious disadvantage is that if you use the full-featured kmem
controller that builds upon this patchset, then you're limiting the about
of all kmem, not just the stack that this particular set limits. I hope
you're not proposing it to go upstream before full support for the kmem
controller is added so that users who use it only to protect again
forkbombs soon realize that's no longer possible if your applications do
any substantial slab allocations, particularly anything that does a lot of
I/O.
In other words, if I want to run netperf in a memcg with the full-featured
kmem controller enabled, then its kmem limit must be high enough so that
it doesn't degrade performance that any limitation on stack allocations
would be too high to effectively stop forkbombs.
|
|
|
Re: [PATCH 06/11] memcg: kmem controller infrastructure [message #47025 is a reply to message #47013] |
Wed, 27 June 2012 19:46 |
David Rientjes
Messages: 59 Registered: November 2006
|
Member |
|
|
On Wed, 27 Jun 2012, Glauber Costa wrote:
> > > Nothing, but I also don't see how to prevent that.
> >
> > You can test for current->flags & PF_KTHREAD following the check for
> > in_interrupt() and return true, it's what you were trying to do with the
> > check for !current->mm.
> >
>
> am I right to believe that if not in interrupt context - already ruled out -
> and !(current->flags & PF_KTHREAD), I am guaranteed to have a mm context, and
> thus, don't need to test against it ?
>
No, because an mm may have been detached in the exit path by running
exit_mm(). We'd certainly hope that there are no slab allocations
following that point, though, but you'd still need to test current->mm.
|
|
|
Re: [PATCH 02/11] memcg: Reclaim when more than one page needed. [message #47026 is a reply to message #47014] |
Wed, 27 June 2012 19:48 |
David Rientjes
Messages: 59 Registered: November 2006
|
Member |
|
|
On Wed, 27 Jun 2012, Glauber Costa wrote:
> > @@ -2206,7 +2214,7 @@ static int mem_cgroup_do_charge(struct mem_cgroup
> > *memcg, gfp_t gfp_mask,
> > > * unlikely to succeed so close to the limit, and we fall back
> > > * to regular pages anyway in case of failure.
> > > */
> > > - if (nr_pages == 1 && ret)
> > > + if (nr_pages <= NR_PAGES_TO_RETRY && ret)
> > > return CHARGE_RETRY;
>
> Changed to costly order.
>
1 << PAGE_ALLOC_COSTLY_ORDER was the suggestion.
> One more thing. The original version of this patch included
> a cond_resched() here, that was also removed. From my re-reading
> of the code in page_alloc.c and vmscan.c now, I tend to think
> this is indeed not needed, since any cond_resched()s that might
> be needed to ensure the safety of the code will be properly
> inserted by the reclaim code itself, so there is no need for us
> to include any when we signal that a retry is needed.
>
For __GFP_WAIT, that sounds like a safe guarantee.
|
|
|
Re: [PATCH 02/11] memcg: Reclaim when more than one page needed. [message #47028 is a reply to message #47026] |
Wed, 27 June 2012 20:47 |
Glauber Costa
Messages: 916 Registered: October 2011
|
Senior Member |
|
|
On 06/27/2012 11:48 PM, David Rientjes wrote:
> On Wed, 27 Jun 2012, Glauber Costa wrote:
>
>>> @@ -2206,7 +2214,7 @@ static int mem_cgroup_do_charge(struct mem_cgroup
>>> *memcg, gfp_t gfp_mask,
>>>> * unlikely to succeed so close to the limit, and we fall back
>>>> * to regular pages anyway in case of failure.
>>>> */
>>>> - if (nr_pages == 1 && ret)
>>>> + if (nr_pages <= NR_PAGES_TO_RETRY && ret)
>>>> return CHARGE_RETRY;
>>
>> Changed to costly order.
>>
>
> 1 << PAGE_ALLOC_COSTLY_ORDER was the suggestion.
That is what I meant - to the costly order suggestion - , should have
been more explicit.
>> One more thing. The original version of this patch included
>> a cond_resched() here, that was also removed. From my re-reading
>> of the code in page_alloc.c and vmscan.c now, I tend to think
>> this is indeed not needed, since any cond_resched()s that might
>> be needed to ensure the safety of the code will be properly
>> inserted by the reclaim code itself, so there is no need for us
>> to include any when we signal that a retry is needed.
>>
>
> For __GFP_WAIT, that sounds like a safe guarantee.
>
|
|
|
Re: Fork bomb limitation in memcg WAS: Re: [PATCH 00/11] kmem controller for memcg: stripped down ve [message #47032 is a reply to message #47024] |
Thu, 28 June 2012 09:01 |
Glauber Costa
Messages: 916 Registered: October 2011
|
Senior Member |
|
|
On 06/27/2012 11:38 PM, David Rientjes wrote:
> On Wed, 27 Jun 2012, Glauber Costa wrote:
>
>> fork bombs are a way bad behaved processes interfere with the rest of
>> the system. In here, I propose fork bomb stopping as a natural
>> consequence of the fact that the amount of kernel memory can be limited,
>> and each process uses 1 or 2 pages for the stack, that are freed when the
>> process goes away.
>>
>
> The obvious disadvantage is that if you use the full-featured kmem
> controller that builds upon this patchset, then you're limiting the about
> of all kmem, not just the stack that this particular set limits. I hope
> you're not proposing it to go upstream before full support for the kmem
> controller is added so that users who use it only to protect again
> forkbombs soon realize that's no longer possible if your applications do
> any substantial slab allocations, particularly anything that does a lot of
> I/O.
Point by point:
1) This is not a disadvantage. The whole point of implementing it as
kmem, not as a "fork controller" or anything in the like, was our
understanding that people should not be "protecting against a x number
of processes". Because this is unnatural. All you have is total of
kernel memory, because that's what the interface gives you. If you can
overuse memory, you can't fork bomb. But that's a consequence.
I'll grant that maybe it was a mistake of mine to try to sell it this
way here, because it may give the wrong idea. In this case I welcome
your comment because it will allow me to be more careful in future
communications. But this was just me trying to show off the feature.
2) No admin should never, ever tune the system to a particular kmem
limit value. And again, this is the whole point of not limiting "number
of processes". I agree that adding slab tracking will raise kernel
memory consumption by a reasonable amount. But reality is that even if
this controller is totally stable, that not only can, but will happen.
Unlike user memory, where whoever writes the application control its
behavior (forget about the libraries for a moment), users never really
control the internals of the kernel. If you ever rely on the fact that
you are currently using X Gb of kmem, and that should be enough, your
setup will break when the data structure grows - as they do - when the
kernel memory consumption rises naturally by algorithms - as it does,
and so on.
3) Agreeing that of course, preventing disruption if we can help it is
good, this feature is not only marked experimental, but default of. This
was done precisely not to disrupt the amount of *user* memory used,
where I actually think it makes a lot of sense ("My application
allocates 4G, that's what I'll give it!"). Even if support is compiled
in, it won't be until you start limiting it that anything will happen.
Kernel Memory won't even be tracked until then. And I also understand
that our use case here may be quite unique: We want the kernel memory
limit to be way lower than the user limit (like 20 % - but that's still
a percentage, not a tune!). When I discussed this around with other
people, the vast majority of them wanted to set kmem = umem. Which
basically means "user memory is the limit, but we want the kernel memory
to be accounted as well".
So yes, although I understand your point - but not fully agree, I would
like to get it merged as is. I don't believe adding slab memory later
will be disruptive, in the same way I didn't believe adding stack later
- in my original slab tracking patch - would be. As I originally stated,
this would allow me to greatly simplify the slab tracking patches, since
we'll be able to focus on that only, instead of a gigantic patch that
does a lot.
Other people have interest in this, so this would allow them to start
building on it as well. That said, I won't oppose adding more code if
you suggest so to make sure people rely less on the accidental fact that
we're only tracking a part of what we will, even if they shouldn't.
- It's already experimental
- It's already default off.
- I could add a warn message first time it is set.
- I could add a boot option.
Or anything like this.
Last, but not least, note that it is totally within my interests to
merge the slab tracking as fast as we can. it'll be a matter of going
back to it, and agreeing in the final form.
>
> In other words, if I want to run netperf in a memcg with the full-featured
> kmem controller enabled, then its kmem limit must be high enough so that
> it doesn't degrade performance that any limitation on stack allocations
> would be too high to effectively stop forkbombs.
>
That is a user setup problem. As I explained before, That's exactly what
we want to discourage by exposing "kernel memory" instead of a
particular tunable.
As it is, I agree, it will stop fork bombs but will take a lot more time
for it than it should. But this only makes it analogous to the
evolutionary precursors to the human eye: It's not perfect, but will
achieve something that may be already of great value for a class of
users. Improvements are certain to follow.
Thank you David!
|
|
|
|
Re: Fork bomb limitation in memcg WAS: Re: [PATCH 00/11] kmem controller for memcg: stripped down ve [message #47059 is a reply to message #47038] |
Tue, 03 July 2012 11:38 |
Glauber Costa
Messages: 916 Registered: October 2011
|
Senior Member |
|
|
On 06/29/2012 02:25 AM, Andrew Morton wrote:
> On Thu, 28 Jun 2012 13:01:23 +0400
> Glauber Costa <glommer@parallels.com> wrote:
>
>>
>> ...
>>
>
> OK, that all sounds convincing ;) Please summarise and capture this
> discussion in the [patch 0/n] changelog so we (or others) don't have to
> go through this all again. And let's remember this in the next
> patchset!
Thanks, will surely do.
>> Last, but not least, note that it is totally within my interests to
>> merge the slab tracking as fast as we can. it'll be a matter of going
>> back to it, and agreeing in the final form.
>
> Yes, I'd very much like to have the whole slab implementation in a
> reasonably mature state before proceeding too far with this base
> patchset.
Does that means that you want to merge them together? I am more than
happy to post the slab part again ontop of that to have people reviewing it.
But if possible, I believe that merging this part first would help us to
split up testing in a beneficial way, in the sense that if it breaks, we
know at least in which part it is. Not to mention, of course, that
reviewers will have an easier time reviewing it as two pieces.
|
|
|
Re: Fork bomb limitation in memcg WAS: Re: [PATCH 00/11] kmem controller for memcg: stripped down ve [message #47160 is a reply to message #47059] |
Thu, 12 July 2012 15:40 |
Frederic Weisbecker
Messages: 25 Registered: April 2012
|
Junior Member |
|
|
On Tue, Jul 03, 2012 at 03:38:39PM +0400, Glauber Costa wrote:
> On 06/29/2012 02:25 AM, Andrew Morton wrote:
> > On Thu, 28 Jun 2012 13:01:23 +0400
> > Glauber Costa <glommer@parallels.com> wrote:
> >
> >>
> >> ...
> >>
> >
> > OK, that all sounds convincing ;) Please summarise and capture this
> > discussion in the [patch 0/n] changelog so we (or others) don't have to
> > go through this all again. And let's remember this in the next
> > patchset!
>
> Thanks, will surely do.
>
> >> Last, but not least, note that it is totally within my interests to
> >> merge the slab tracking as fast as we can. it'll be a matter of going
> >> back to it, and agreeing in the final form.
> >
> > Yes, I'd very much like to have the whole slab implementation in a
> > reasonably mature state before proceeding too far with this base
> > patchset.
>
> Does that means that you want to merge them together? I am more than
> happy to post the slab part again ontop of that to have people reviewing it.
>
> But if possible, I believe that merging this part first would help us to
> split up testing in a beneficial way, in the sense that if it breaks, we
> know at least in which part it is. Not to mention, of course, that
> reviewers will have an easier time reviewing it as two pieces.
Definetly yeah. This makes the review easier for this tricky chunk.
|
|
|
|
Re: Fork bomb limitation in memcg WAS: Re: [PATCH 00/11] kmem controller for memcg: stripped down ve [message #47381 is a reply to message #47377] |
Wed, 08 August 2012 14:15 |
Glauber Costa
Messages: 916 Registered: October 2011
|
Senior Member |
|
|
On 08/07/2012 05:59 PM, Glauber Costa wrote:
> On 06/29/2012 02:25 AM, Andrew Morton wrote:
>> On Thu, 28 Jun 2012 13:01:23 +0400
>> Glauber Costa <glommer@parallels.com> wrote:
>>
>>>
>>> ...
>>>
>>
>> OK, that all sounds convincing ;) Please summarise and capture this
>> discussion in the [patch 0/n] changelog so we (or others) don't have to
>> go through this all again. And let's remember this in the next
>> patchset!
>>
>>> Last, but not least, note that it is totally within my interests to
>>> merge the slab tracking as fast as we can. it'll be a matter of going
>>> back to it, and agreeing in the final form.
>>
>> Yes, I'd very much like to have the whole slab implementation in a
>> reasonably mature state before proceeding too far with this base
>> patchset.
>>
> So, that was posted separately as well.
>
> Although there is a thing to fix here and there - all of them I am
> working on already - I believe that to be mature enough.
>
> Do you have any comments on that? Would you be willing to take this
> first part (modified with the comments on this thread itself) and let it
> start sitting in the tree?
>
In the mean time, for any interested parties, I've set up a tree at:
git://github.com/glommer/linux.git
branches kmemcg-slab and kmemcg-stack
Intended to be a throw-away tree.
|
|
|
Goto Forum:
Current Time: Tue Nov 19 01:47:19 GMT 2024
Total time taken to generate the page: 0.03070 seconds
|