[RFC][PATCH 0/7] Resource controllers based on process containers
Re: controlling mmap()'d vs read/write() pages [message #18000 is a reply to message #17892]
Fri, 23 March 2007 10:12
ebiederm
Nick Piggin <nickpiggin@yahoo.com.au> writes:
> Eric W. Biederman wrote:
>> Dave Hansen <hansendc@us.ibm.com> writes:
>>
>>
>>>So, I think we have a difference of opinion. I think it's _all_ about
>>>memory pressure, and you think it is _not_ about accounting for memory
>>>pressure. :) Perhaps we mean different things, but we appear to
>>>disagree greatly on the surface.
>>
>>
>> I think it is about preventing a badly behaved container from having a
>> significant effect on the rest of the system, and in particular other
>> containers on the system.
>
> That's Dave's point, I believe. Limiting mapped memory may be
> mostly OK for well behaved applications, but it doesn't do anything
> to stop bad ones from effectively DoSing the system or ruining any
> guarantees you might proclaim (not that hard guarantees are always
> possible without using virtualisation anyway).
>
> This is why I'm surprised at efforts that go to such great lengths
> to get accounting "just right" (but only for mmaped memory). You
> may as well not even bother, IMO.
>
> Give me an RSS limit big enough to run a couple of system calls and
> a loop...

Would any of them work on a system on which every filesystem was on
ramfs, and there was no swap? If not, then they are not memory attacks
but I/O attacks.

I completely concede that you can DOS the system with I/O if that is
not limited as well.

My point is that this is not a memory problem but a disk I/O problem,
which is much easier and cheaper to solve. Disk I/O is fundamentally a
slow path, which makes it hard to modify it in a way that negatively
affects system performance.

I don't think with a memory RSS limit you can DOS the system in a way
that is purely about memory. You have to pick a different kind of DOS
attack.

As for virtualization, that is what a kernel is about: virtualizing its
resources so you can have multiple users accessing them at the same
time. You don't need some hypervisor or virtual machine to give you
that. That is where we start. However, it was found long ago that
global optimizations give better system throughput than the rigid
systems you can get with hypervisors, although things are not
quite as deterministic when you optimize globally. They should be
sufficiently deterministic that you can avoid the worst of the DOS
attacks.

The real practical problem with the current system is that nearly
all of our limits are per process, and applications now span more than
one process, so the limits provided by Linux are generally useless
for limiting real-world applications. This isn't generally a problem
until we start trying to run multiple applications on the same system
because the hardware is so powerful. The namespace work, which
will allow you to run several different instances of user space
simultaneously, is likely to allow exactly that.

At the moment I am very much in a position of doing review, not
implementing this part of it. I'm trying to get the people doing the
implementation to make certain they have actually been paying
attention to how their proposed limits will interact with the rest of
the system. So far the conversation has generally centered on memory
limits because it seems that is where people have decided the
conversation should focus. What I haven't seen is people with the
limitations coming back to me, tearing my arguments apart, and showing
or telling me where I'm confused. In general I can challenge even the
simplest things and not get a good response. All of which tells me
the implementations are not ready.

I do have some practical use cases, I have some clue how these
subsystems work, and I do care. Which puts me in a decent position to
at least do high-level design review.

My biggest disappointment is that none of this is new, and that we
seem to have forgotten a lot of the lessons of the past.

Eric
_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers

Re: controlling mmap()'d vs read/write() pages [message #18003 is a reply to message #18000]
Fri, 23 March 2007 16:41
Dave Hansen
On Fri, 2007-03-23 at 04:12 -0600, Eric W. Biederman wrote:
> Would any of them work on a system on which every filesystem was on
> ramfs, and there was no swap? If not then they are not memory attacks
> but I/O attacks.

I truly understand your point here. But, I don't think this thought
exercise is really helpful. In a pure sense, nothing is keeping an
unmapped page cache file in memory, other than the user's prayers. But,
please don't discount their prayers, it's what they want!

I seem to remember a quote attributed to Alan Cox around OLS time last
year, something about any memory controller being able to be fair, fast,
and accurate. Please pick any two, but only two. Alan, did I get
close?

To me, one of the keys of Linux's "global optimizations" is being able
to use any memory for its most effective purpose, globally
(please ignore highmem :). Let's say I have a 1GB container on a
machine that is at least 100% committed. I mmap() a 1GB file and touch
the entire thing (I never touch it again). I then go open another 1GB
file and r/w to it until the end of time. I'm at or below my RSS limit,
but that 1GB of RAM could surely be better used for the second file.
How do we do this if we only account for a user's RSS? Does this fit
into Alan's unfair bucket? ;)
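
The access pattern described above can be sketched roughly as follows, with small temporary files standing in for the 1GB ones. The helper names (make_file, touch_mapped, stream_read) and the sizes are illustrative assumptions, not anything from the patches under discussion; how each kind of page would be charged depends on the controller being proposed.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE


def make_file(size):
    """Create a temporary file of `size` zero bytes and return its path."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(b"\0" * size)
    return path


def touch_mapped(path):
    """mmap() the file and read one byte per page so every page is faulted in.

    While the mapping exists these pages are mapped into the process. Here the
    mapping is dropped when the `with` block exits; in the scenario above it
    would stay mapped and simply never be touched again.
    """
    touched = 0
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            for offset in range(0, len(m), PAGE):
                m[offset]  # one read per page is enough to fault it in
                touched += 1
    return touched


def stream_read(path, passes=2, chunk=64 * 1024):
    """Repeatedly read() the file; the data is copied into a user buffer
    and the file's pages are never mapped into the process."""
    total = 0
    for _ in range(passes):
        with open(path, "rb") as f:
            while True:
                buf = f.read(chunk)
                if not buf:
                    break
                total += len(buf)
    return total
```

Calling touch_mapped() on one file and then stream_read() on another reproduces the two kinds of access: the first leaves pages that an RSS-only limit would count, the second works entirely through unmapped page cache.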

Also, in a practical sense, it is a *LOT* easier to describe to a
customer that they're getting 1GB of RAM than >=20GB/hr of bandwidth
from the disk.

-- Dave

P.S. Do we have any quotas on ramfs? If we have any ramfs filesystems,
what keeps the containerized users from just filling up RAM?

Re: controlling mmap()'d vs read/write() pages [message #18006 is a reply to message #18003]
Fri, 23 March 2007 18:16
Herbert Poetzl
On Fri, Mar 23, 2007 at 09:41:12AM -0700, Dave Hansen wrote:
> On Fri, 2007-03-23 at 04:12 -0600, Eric W. Biederman wrote:
> > Would any of them work on a system on which every filesystem was on
> > ramfs, and there was no swap? If not then they are not memory attacks
> > but I/O attacks.
>
> I truly understand your point here. But, I don't think this thought
> exercise is really helpful here. In a pure sense, nothing is keeping
> an unmapped page cache file in memory, other than the user's prayers.
> But, please don't discount their prayers, it's what they want!
>
> I seem to remember a quote attributed to Alan Cox around OLS time last
> year, something about any memory controller being able to be fair,
> fast, and accurate. Please pick any two, but only two. Alan, did I get
> close?
so we would pick fair and fast then :)
> To me, one of the keys of Linux's "global optimizations" is being able
> to use any memory globally for its most effective purpose, globally
> (please ignore highmem :). Let's say I have a 1GB container on a
> machine that is at least 100% committed. I mmap() a 1GB file and touch
> the entire thing (I never touch it again). I then go open another 1GB
> file and r/w to it until the end of time. I'm at or below my RSS limit,
> but that 1GB of RAM could surely be better used for the second file.
> How do we do this if we only account for a user's RSS? Does this fit
> into Alan's unfair bucket? ;)
what's the difference to a normal Linux system here?
when low on memory, the system will reclaim pages, and
guess what pages will be reclaimed first ...
> Also, in a practical sense, it is also a *LOT* easier to describe to a
> customer that they're getting 1GB of RAM than >=20GB/hr of bandwidth
> from the disk.
if you want something which is easy to describe for the
'customer', then a VM is what you are looking for, it has
a perfectly well defined amount of resources which will
not be shared or used by other machines ...
> -- Dave
>
> P.S. Do we have an quotas on ramfs? If we have an ramfs filesystems,
> what keeps the containerized users from just filling up RAM?
tmpfs has hard limits, you simply specify it on mount
none /tmp tmpfs size=16m,mode=1777 0 0
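
For illustration, the size=16m option in the fstab line above works out to the byte limit computed below. tmpfs accepts k, m and g suffixes (and a percentage of RAM, which this sketch does not handle); parse_tmpfs_size is a hypothetical helper written for this example, not part of any existing tool.

```python
def parse_tmpfs_size(options):
    """Return the byte limit from a tmpfs option string, or None if no size= is set."""
    units = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}
    for opt in options.split(","):
        key, _, value = opt.partition("=")
        if key != "size" or not value:
            continue
        if value.endswith("%"):
            # A percentage is relative to physical RAM; out of scope here.
            raise ValueError("percentage sizes are not handled in this sketch")
        suffix = value[-1].lower()
        if suffix in units:
            return int(value[:-1]) * units[suffix]
        return int(value)  # plain byte count
    return None


limit = parse_tmpfs_size("size=16m,mode=1777")
print(limit)  # 16 MiB expressed in bytes
```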
best,
Herbert

Re: controlling mmap()'d vs read/write() pages [message #18009 is a reply to message #18000]
Fri, 23 March 2007 10:47
Nick Piggin
Eric W. Biederman wrote:
> Nick Piggin <nickpiggin@yahoo.com.au> writes:
>
>
>>Eric W. Biederman wrote:
>>
>>>Dave Hansen <hansendc@us.ibm.com> writes:
>>>
>>>
>>>
>>>>So, I think we have a difference of opinion. I think it's _all_ about
>>>>memory pressure, and you think it is _not_ about accounting for memory
>>>>pressure. :) Perhaps we mean different things, but we appear to
>>>>disagree greatly on the surface.
>>>
>>>
>>>I think it is about preventing a badly behaved container from having a
>>>significant effect on the rest of the system, and in particular other
>>>containers on the system.
>>
>>That's Dave's point, I believe. Limiting mapped memory may be
>>mostly OK for well behaved applications, but it doesn't do anything
>>to stop bad ones from effectively DoSing the system or ruining any
>>guarantees you might proclaim (not that hard guarantees are always
>>possible without using virtualisation anyway).
>>
>>This is why I'm surprised at efforts that go to such great lengths
>>to get accounting "just right" (but only for mmaped memory). You
>>may as well not even bother, IMO.
>>
>>Give me an RSS limit big enough to run a couple of system calls and
>>a loop...
>
>
> Would any of them work on a system on which every filesystem was on
> ramfs, and there was no swap? If not then they are not memory attacks
> but I/O attacks.
>
> I completely concede that you can DOS the system with I/O if that is
> not limited as well.
>
> My point is that is not a memory problem but a disk I/O problem which is
> much easier to and cheaper to solve. Disk I/O is fundamentally a slow
> path which makes it hard to modify it in a way that negatively affects
> system performance.
>
> I don't think with a memory RSS limit you can DOS the system in a way
> that is purely about memory. You have to pick a different kind of DOS
> attack.
It can be done trivially without performing any IO or swap, yes.
--
SUSE Labs, Novell Inc.

Re: [RFC][PATCH 4/7] RSS accounting hooks over the code [message #18048 is a reply to message #17809]
Wed, 28 March 2007 20:15
Ethan Solomita
Nick Piggin wrote:
> Eric W. Biederman wrote:
>> First touch page ownership does not give me anything useful
>> for knowing if I can run my application or not. Because of page
>> sharing my application might run inside the rss limit only because
>> I got lucky and happened to share a lot of pages with another running
>> application. If the next time I run it that application isn't running,
>> my application will fail. That is ridiculous.
>
> Let's be practical here, what you're asking is basically impossible.
>
> Unless by deterministic you mean that it never enters a non-trivial
> syscall, in which case you just want to know about the maximum
> RSS of the process, which we already account for.

If we used Beancounters as Pavel and Kirill mentioned, that would
keep track of each container that has referenced a page, not just the
first container. It sounds like beancounters can return a usage count
where each page is divided by the number of referencing containers (e.g.
1/3rd if 3 containers share a page). Presumably it could also return a
full count of 1 to each container.

If we look at data in the latter form, i.e. each container must pay
fully for each page used, then Eric could use that to determine real
usage needs of the container. However we could also use the fractional
count in order to do things such as charging the container for its
actual usage. i.e. full count for setting guarantees, fractional for
actual usage.
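
A minimal sketch of the two counting schemes, assuming we already know which containers reference each page. The charge_pages function and the page/container names are hypothetical and are not the beancounters interface; every page is assumed to have at least one referencing container.

```python
from collections import defaultdict
from fractions import Fraction


def charge_pages(page_refs):
    """page_refs maps a page id to the set of containers referencing it.

    Returns (full, fractional): `full` charges one whole page to every
    referencing container, `fractional` splits each page evenly among them.
    """
    full = defaultdict(int)
    fractional = defaultdict(Fraction)
    for containers in page_refs.values():
        share = Fraction(1, len(containers))
        for container in containers:
            full[container] += 1
            fractional[container] += share
    return dict(full), dict(fractional)


pages = {
    "p0": {"A"},            # private to A
    "p1": {"A", "B"},       # shared by two containers
    "p2": {"A", "B", "C"},  # shared by three containers
}
full, fractional = charge_pages(pages)
```

With these inputs the fractional charges always sum to the number of physical pages, while the full counts describe what each container would need if nothing were shared.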
-- Ethan

Re: controlling mmap()'d vs read/write() pages [message #18050 is a reply to message #18006]
Wed, 28 March 2007 09:18
Balbir Singh
Herbert Poetzl wrote:
>> To me, one of the keys of Linux's "global optimizations" is being able
>> to use any memory globally for its most effective purpose, globally
>> (please ignore highmem :). Let's say I have a 1GB container on a
>> machine that is at least 100% committed. I mmap() a 1GB file and touch
>> the entire thing (I never touch it again). I then go open another 1GB
>> file and r/w to it until the end of time. I'm at or below my RSS limit,
>> but that 1GB of RAM could surely be better used for the second file.
>> How do we do this if we only account for a user's RSS? Does this fit
>> into Alan's unfair bucket? ;)
>
> what's the difference to a normal Linux system here?
> when low on memory, the system will reclaim pages, and
> guess what pages will be reclaimed first ...
>

But would it not bias application writers towards using read()/write()
calls over mmap()? They know that their calls are likely to be faster
when the application is run in a container. Without page cache control
we'll end up creating an asymmetrical container, where certain usage is
charged and some usage is not.

Also, please note that when a page is unmapped and moved to the swap
cache, the swap cache uses the page cache. Without page cache control, we
could end up with too many pages moving over to the swap cache and still
occupying memory, while the original intention was to avoid this
scenario.

--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL

Re: controlling mmap()'d vs read/write() pages [message #18051 is a reply to message #18002]
Wed, 28 March 2007 07:33
Nick Piggin
Eric W. Biederman wrote:
> Nick Piggin <nickpiggin@yahoo.com.au> writes:
>>It can be done trivially without performing any IO or swap, yes.
>
>
> Please give me a rough sketch of how to do so.
Reading sparse files is just one I had in mind. But I'm not very
creative compared to university students doing their assignments.
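
As a rough sketch of the sparse-file idea: a file can be given a large logical size without any data being written, and reading it back returns zeros. The helper names and sizes are assumptions for illustration; whether holes consume disk blocks depends on the filesystem, and the effect on page cache is the behaviour being argued about here, not something this code measures.

```python
import os
import tempfile


def make_sparse_file(size):
    """Create a file whose logical size is `size` bytes with no data written.

    On filesystems that support holes, few or no disk blocks are allocated.
    """
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.truncate(size)
    return path


def read_all(path, chunk=1 << 20):
    """Read the whole file and return (bytes_read, nonzero_bytes).

    Holes read back as zeros, so nonzero_bytes stays 0 for a purely sparse file.
    """
    total = 0
    nonzero = 0
    with open(path, "rb") as f:
        while True:
            buf = f.read(chunk)
            if not buf:
                break
            total += len(buf)
            nonzero += len(buf) - buf.count(b"\0")
    return total, nonzero
```

Reading such a file in a loop needs only a couple of system calls, which is the kind of workload an RSS-only limit would not see.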
> Or is this about DOS'ing the system by getting the kernel to allocate
> a large number of data structures (struct file, struct inode, or the like)?
That works too. And I don't believe hand-accounting and limiting
all these things individually as a means to limit RAM usage is sane,
when you have a much more comprehensive and relatively unintrusive
page level scheme.
--
SUSE Labs, Novell Inc.