Containers and /proc/sys/vm/drop_caches [message #41988]
Wed, 05 January 2011 09:40
Mike Hommey
[Copy/pasted from a previous message to lkml, where it was suggested to
try containers@]
Hi,
I noticed that from within an LXC container, writing "3" to
/proc/sys/vm/drop_caches would flush the host page cache. That sounds a
little dangerous for VPS offerings based on LXC, since the root user in
one VPS instance could impact the overall performance of the host.
I don't know about other containers but I've been told openvz isn't
subject to this problem.
I only tested the current Debian Squeeze kernel, which is based on
2.6.32.27.
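For illustration, a minimal reproduction sketch (the commands assume a Linux host and root inside the container; the host/container split follows the description above):

```shell
# On the host: note the current page cache size (read-only, harmless).
grep '^Cached:' /proc/meminfo

# Inside the container, as the container's root user:
sync                                # write back dirty pages first
echo 3 > /proc/sys/vm/drop_caches   # drop page cache plus dentries and inodes

# Back on the host: Cached: shrinks sharply, showing that the write
# from inside the container hit the host-wide cache.
grep '^Cached:' /proc/meminfo
```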
Cheers,
Mike
_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers
Re: Containers and /proc/sys/vm/drop_caches [message #41989 is a reply to message #41988]
Fri, 07 January 2011 13:03
Rob Landley
On 01/06/2011 03:43 PM, Matt Helsley wrote:
> On Wed, Jan 05, 2011 at 07:46:17PM +0530, Balbir Singh wrote:
>> On Wed, Jan 5, 2011 at 7:31 PM, Serge Hallyn <serge.hallyn@canonical.com> wrote:
>>> Quoting Daniel Lezcano (daniel.lezcano@free.fr):
>>>> On 01/05/2011 10:40 AM, Mike Hommey wrote:
>>>>> [Copy/pasted from a previous message to lkml, where it was suggested to
>>>>> try containers@]
>>>>>
>>>>> Hi,
>>>>>
>>>>> I noticed that from within an LXC container, writing "3" to
>>>>> /proc/sys/vm/drop_caches would flush the host page cache. That sounds a
>>>>> little dangerous for VPS offerings based on LXC, since the root user in
>>>>> one VPS instance could impact the overall performance of the host.
>>>>> I don't know about other containers but I've been told openvz isn't
>>>>> subject to this problem.
>>>>> I only tested the current Debian Squeeze kernel, which is based on
>>>>> 2.6.32.27.
>>>>
>>>> There is definitely a lot of work to do with /proc.
>>>>
>>>> Some files should not be accessible (/proc/sys/vm/drop_caches,
>>>> /proc/sys/kernel/sysrq, ...) and some others should be virtualized
>>>> (/proc/meminfo, /proc/cpuinfo, ...).
>>>>
>>>> Serge suggested creating something similar to the cgroup device
>>>> whitelist but for /proc; maybe that is a good approach for denying
>>>> access to specific proc files.
>>>
>>> Long-term, user namespaces should fix this - /proc will be owned
>>> by the user namespace which mounted it, but we can tell proc to
>>> always have some files (like drop_caches) be owned by init_user_ns.
Changing ownership so a script can't open a file it otherwise
could may cause scripts to fail when run in a container, which
makes the containers less transparent.
>>> I'm hoping to push my final targeted capabilities prototype in the
>>> next few weeks, and after that I start seriously attacking VFS
>>> interaction.
>>>
>>> In the meantime, though, you can use SELinux/Smack, or a custom
>>> cgroup file does sound useful. Can cgroups be modules nowadays?
>>> (I can't keep up) If so, an out of tree proc-cgroup module seems
>>> like a good interim solution.
>>>
>>
>> Ideally drop_caches should drop page cache only in that container, but
>> given that containers share a lot of page cache, what is suggested
>> might be a good way to work around the problem.
>
> One gross hack that comes to mind: Instead of a hard permission model
> limit the frequency with which the container could actually drop caches.
> Then the container's ability to interfere with host performance is more
> limited (but still non-zero). Or limit frequency on a per-user basis
> (more like Serge's design) because running more containers by a
> compromised user account shouldn't allow more frequent cache dropping.
Disk access causes at best multi-millisecond latency spikes, which can cause
a heavily loaded server to go into thrashing meltdown. So a container
could screw up another container with this pretty badly.
The easy short-term fix is to make containers silently ignore writes to
drop_caches.
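One interim mitigation sketch (not something proposed in this thread; the container rootfs path is an assumption) is to bind-mount a read-only empty file over the entry in the container's /proc, so writes from inside have no effect:

```shell
# Run on the host, after the container has mounted its /proc.
# /var/lib/lxc/guest/rootfs is a hypothetical container root path.
touch /var/lib/lxc/empty
mount --bind /var/lib/lxc/empty /var/lib/lxc/guest/rootfs/proc/sys/vm/drop_caches
# Remount the bind read-only so even root in the container cannot write it.
mount -o remount,ro,bind /var/lib/lxc/guest/rootfs/proc/sys/vm/drop_caches
```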
> That said, the more important question is why should we provide
> drop_caches inside a container? My understanding is it's largely a
> workload-debugging tool and not something meant to truly solve
> problems.
A heavily loaded system that goes deep into swap without triggering
the OOM killer can become pretty useless. My home laptop with 2 gigs
of ram gets so sluggish whenever I compile something that you can't
use the touchpad anymore because hitting the boundary of a widget
with the mouse pointer causes a 5-second freeze while it bounces
off three or four processes to handle the message, evicting yet more
pages to fault in the pages to handle the X events. By the time
the pointer moves again it's way overshot. (Ok, having firefox,
chrome, and kmail open with several dozen tabs open in each may have
something to do with this.)
When it does this, switching to a console (ctrl-alt-f1) and running
"echo 1 > /proc/sys/vm/drop_caches" is just about the only thing
that will snap it out of it short of
killing processes. The system has ~600 megs of ram tied up in
disk cache while being so short of anonymous pages the mouse is
useless.
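Spelled out, that recovery step is (as root, on a text console):

```shell
sync                                # flush dirty data so more of the cache is clean
echo 1 > /proc/sys/vm/drop_caches   # free clean page cache only
                                    # (1 = pagecache; 2 = dentries/inodes; 3 = both)
```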
That doesn't necessarily apply to containers but that's one use case
of using it as a stick to hit the darn overburdened machine when it's
making stupid memory allocation decisions. (Playing with swappiness
puts the OOM killer on a hair trigger, depending on kernel version
du jour.)
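For reference, "swappiness" here is the vm.swappiness sysctl; the value below is purely illustrative:

```shell
cat /proc/sys/vm/swappiness         # default is 60 on most kernels
# Lower values make the kernel prefer dropping page cache
# over swapping out anonymous pages:
sysctl -w vm.swappiness=10          # same as: echo 10 > /proc/sys/vm/swappiness
```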
However, it's not guaranteed to do anything (the cached data could
be dirty, mmaped by some process, immediately faulted back in
by some other process), so ignoring writes to drop_caches from a
container is probably legal behavior anyway.
Rob
Re: Containers and /proc/sys/vm/drop_caches [message #41990 is a reply to message #41988]
Sat, 08 January 2011 12:39
Rob Landley
On 01/07/2011 09:12 AM, Serge Hallyn wrote:
>> Changing ownership so a script can't open a file that it otherwise
>> could may cause scripts to fail when run in a container. Makes
>> the containers less transparent.
>
> While my goal next week is to make containers more transparent, the
> official stance from kernel summit a few years ago was: transparent
> containers are not a valid goal (as seen from kernel).
Do you have a reference for that? I'm still coming up to speed on all this. Trying to collect documentation...
>> A heavily loaded system that goes deep into swap without triggering
>> the OOM killer can become pretty useless. My home laptop with 2
>> gigs
>
> Isn't a cgroup that controls both memory and swap access the right
> answer to this?
There are other ways to work around it, sure. (It's yet to be proven that they do actually work better in resource constrained desktop environments under real-world load, but they seem very promising.)
I was just pointing out that this has seen some use as a recovery mechanism, slightly less drastic than the OOM killer. (Didn't say it was a _good_ use. Also, error avoidance and error recovery are different issues, and virtual memory is an inherently overcommitted resource domain.)
> (And do we have that now, btw?)
I think it's coming, rather than actually here. (I thought the beancounters stuff was OpenVZ, controlled by syscalls that the kernel developers rejected. Have resource constraints on anything other than scheduler made it into vanilla yet? If so, what's the UI to control them?)
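For what it's worth, the v1 memory controller's file interface looks roughly like this (a sketch; the mount point, group name, limits, and PID are all assumptions, and the memsw file needs swap accounting enabled in the kernel):

```shell
# Mount the memory controller and create a group for the container.
mount -t cgroup -o memory none /sys/fs/cgroup/memory
mkdir /sys/fs/cgroup/memory/guest

# Cap memory, then memory+swap (memsw requires swapaccount=1 on the cmdline).
echo 512M > /sys/fs/cgroup/memory/guest/memory.limit_in_bytes
echo 768M > /sys/fs/cgroup/memory/guest/memory.memsw.limit_in_bytes

# Move the container's init task (PID here is hypothetical) into the group.
echo 12345 > /sys/fs/cgroup/memory/guest/tasks
```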
By the way, from a UI perspective, most of the containers stuff I've seen so far is apparently aimed at big iron deployments (or attempts to make PC clusters look like mainframes, i.e. this "cloud" stuff). I'm glad to see more diverse uses of it, but one of the downsides of cobbling together a mechanism from a dozen different unrelated pieces of infrastructure (clone flags, the cgroup filesystem, extra mount flags on proc and such so they behave differently) is that we need a lot of documentation/example code/libraries to make it easy to use. "You can do X" and "it's easy to reliably do X" have a gap that may take a while to close...
Rob