Re: [ckrm-tech] [PATCH] BC: resource beancounters (v4) (added user memory) [message #6231 is a reply to message #6222] Tue, 12 September 2006 11:06
Pavel Emelianov
Srivatsa Vaddagiri wrote:
> On Tue, Sep 12, 2006 at 02:24:25PM +0400, Pavel Emelianov wrote:
>
>> Srivatsa Vaddagiri wrote:
>>
>>> On Mon, Sep 11, 2006 at 11:02:06AM +0400, Pavel Emelianov wrote:
>>>
>>>
>>>> Sure. At the beginning I have one task with one BC. Then
>>>> 1. A thread is spawned and new BC is created;
>>>>
>>>>
>>> Why do we have to create a BC for every new thread? A new BC is needed
>>> for every new service level instead IMO. And typically there wont be
>>> unlimited service levels.
>>>
>>>
>> That's the scenario we started from - each domain is served in a separate
>> BC with *threaded* Apache.
>>
>
> Sure ..but you can still meet that requirement by creating fixed set of
> BCs (for each domain) and let each new thread be associated with a
> corresponding BC (w/o requiring to create BC for every new thread),
> depending on which domain's request it is serving?
>
Hmmm... Beancounters can provide this after trivial changes.
We may schedule them in current set of "pending" features
(http://wiki.openvz.org/UBC_discussion)

But this can create a kind of DoS within an application:
a thread keeps charging new pages to its BC, and these pages
also get touched by other threads. Sooner or later this BC will
hit its limit, and reclaiming this set of pages would affect
all the other threads.

Also, such accounting tells you NOTHING about real memory usage.
E.g. 100MB charged to one BC can mean "this BC ate 100MB of
memory" as well as "this BC really uses one page, and all the others
are used only by other threads", and anything between these two
corner cases.
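
The two corner cases can be reproduced with a toy model of first-touch accounting. This is only a sketch: the Beancounter class, touch() and the page numbers are hypothetical and are not taken from the BC patch. A page is charged to whichever BC touches it first and is never recharged.

```python
# Toy model of first-touch page accounting (illustration only, not the BC patch).
# Beancounter and touch() are hypothetical names.

class Beancounter:
    def __init__(self, name, limit_pages):
        self.name = name
        self.limit_pages = limit_pages
        self.charged = set()          # pages charged to this BC


def touch(page, bc, owner_of):
    """Charge `page` to `bc` only if no BC has been charged for it yet."""
    if page not in owner_of:
        if len(bc.charged) >= bc.limit_pages:
            raise MemoryError(f"{bc.name} hit its limit")
        owner_of[page] = bc
        bc.charged.add(page)


owner_of = {}                         # page -> BC that was charged for it
a = Beancounter("A", limit_pages=100)
b = Beancounter("B", limit_pages=100)

# A thread in A touches 100 pages first; a thread in B then uses 99 of them.
for page in range(100):
    touch(page, a, owner_of)
used_by_b = set(range(1, 100))
for page in used_by_b:
    touch(page, b, owner_of)

# Pages that only A really uses.
only_a_uses = set(range(100)) - used_by_b
```

In this run A is charged for 100 pages although it is the sole user of just one, B is charged for nothing, and the next new page touched on behalf of A fails at the limit even though most of the charged pages are B's working set.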

Well, we've digressed from our main thread: discussing the (dis)advantages
of the current BC implementation.
>
>>>
>>>
>>>> 2. New thread touches a new page (e.g. maps a new file) which is charged
>>>> to the new BC (and this means that this BC must stay in memory till the
>>>> page is uncharged);
>>>> 3. Thread exits after serving the request, but since its mm is shared
>>>> with the parent, all the touched pages stay resident and, thus, the new
>>>> BC is still pinned in memory.
>>>> Steps 1-3 are done multiple times for new pages (new files).
>>>> Remember that we're discussing the case when pages are not recharged.
>>>>
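
Steps 1-3 can be sketched as a small reference-count model. The class and field names here are hypothetical, and the real patch tracks far more state; the only assumption is that a BC can be freed once no thread is attached to it and no page is charged to it.

```python
# Toy refcount model of steps 1-3; names are hypothetical, not the BC patch API.

class Beancounter:
    def __init__(self):
        self.tasks = 0    # threads currently attached to this BC
        self.pages = 0    # pages still charged to this BC

    def can_be_freed(self):
        return self.tasks == 0 and self.pages == 0


def serve_request(live_bcs):
    bc = Beancounter()    # step 1: a thread is spawned and a new BC is created
    bc.tasks += 1
    bc.pages += 1         # step 2: the thread touches a new page
    bc.tasks -= 1         # step 3: the thread exits, the page stays resident
    live_bcs.append(bc)


live_bcs = []
for _ in range(1000):
    serve_request(live_bcs)

# Every BC still has a charged page, so none of them can be released.
pinned = [bc for bc in live_bcs if not bc.can_be_freed()]
```

With no recharging, the number of pinned BCs grows with the number of requests served, not with the number of domains.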
Re: [ckrm-tech] [PATCH] BC: resource beancounters (v4) (added user memory) [message #6234 is a reply to message #6200] Tue, 12 September 2006 10:44
Srivatsa Vaddagiri
On Mon, Sep 11, 2006 at 12:10:31PM -0700, Rohit Seth wrote:
> It seems that a single notion of limit should suffice, and that limit
> should more be treated as something beyond which that resource
> consumption in the container will be throttled/not_allowed.

The big question is: are containers/RGs allowed to use *up to* their
limit always? In other words, will you typically set up limits such that
sum of all limits = max resource capacity?

If it is set up like that, then what you are considering a limit is
actually a guarantee, no?

If it won't be set up like that, then I don't see how one can provide QoS.
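
The condition being asked about can be written down as a one-line check. This is a sketch with made-up group names and numbers; it assumes nothing beyond "sum of all limits <= capacity".

```python
def limits_act_as_guarantees(capacity, limits):
    """True if every group can sit at its limit at the same time."""
    return sum(limits.values()) <= capacity


CAPACITY_MB = 1000                       # hypothetical node size

exact = {"a": 300, "b": 700}             # limits add up to capacity
overcommitted = {"a": 600, "b": 700}     # limits exceed capacity

exact_ok = limits_act_as_guarantees(CAPACITY_MB, exact)
overcommitted_ok = limits_act_as_guarantees(CAPACITY_MB, overcommitted)
```

In the first setup each limit doubles as a guarantee; in the second, one group reaching its limit can keep the other from reaching its own.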

--
Regards,
vatsa
Re: [ckrm-tech] [PATCH] BC: resource beancounters (v4) (added user memory) [message #6236 is a reply to message #6224] Tue, 12 September 2006 12:01
Balbir Singh
Pavel Emelianov wrote:
<snip>

>>>>> E.g. I have a node with 1Gb of ram and 10 containers with 100Mb
>>>>> guarantee each.
>>>>> I want to start one more. What shall I do not to break guarantees?
>>>> Don't start the new container or change the guarantees of the existing
>>>> ones
>>>> to accommodate this one :) The QoS design (done by the administrator)
>>>> should
>>>> take care of such use-cases. It would be perfectly ok to have a
>>>> container
>>>> that does not care about guarantees to set their guarantee to 0 and set
>>>> their limit to the desired value. As Chandra has been stating we
>>>> need two
>>>> parameters (guarantee, limit), either can be optional, but not both.
>>> If I set up 9 groups to have 100Mb limit then I have 100Mb assured (on
>>> 1Gb node)
>>> for the 10th one exactly. And I do not have to set up any guarantee as
>>> it won't affect
>>> anything. So what a guarantee parameter is needed for?
>> This use case works well for providing guarantee to one container.
>> What if
>> I want guarantees of 100Mb and 200Mb for two containers? How do I setup
>> the system using limits?
> You may set any value from 100 up to 800 Mb for the first one and
> 200-900Mb for
> the second. In case of no other groups first will receive its 100Mb for
> sure and
> so does the second. If there are other groups - their guarantees should
> be concerned.

If I add another group with a guarantee of 100MB, then its limit will
be anywhere between 100MB-800MB ?

I do not understand the guarantees being concerned part.
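
One way to make the quoted limit ranges concrete is to assume that a group's effective guarantee is whatever is left when every other group sits at its limit. That reading is an assumption, and implied_guarantee() is a hypothetical helper, not part of any patch; "1Gb" is treated as 1000 MB, as in the quoted arithmetic.

```python
def implied_guarantee(total, limits, name):
    """Memory left for `name` when every other group is at its limit."""
    others = sum(v for k, v in limits.items() if k != name)
    return max(0, min(limits[name], total - others))


TOTAL_MB = 1000   # "1Gb" treated as 1000 MB

# Two groups that want 100 MB and 200 MB: the upper ends of the quoted ranges.
two = {"first": 800, "second": 900}

# Adding a third group with a small limit and leaving the others untouched.
naive_three = {"first": 800, "second": 900, "third": 100}

# Limits recalculated so that 100 / 200 / 100 MB are all still assured.
three = {"first": 400, "second": 500, "third": 400}

two_result = {n: implied_guarantee(TOTAL_MB, two, n) for n in two}
naive_result = {n: implied_guarantee(TOTAL_MB, naive_three, n) for n in naive_three}
three_result = {n: implied_guarantee(TOTAL_MB, three, n) for n in three}
```

Under this reading, adding the third group without touching the existing limits drops the first group's assured share to zero, so every limit has to be recomputed whenever a group is added.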

>> Even I restrict everyone else to 700Mb. With this I cannot be sure that
>> the remaining 300Mb will be distributed as 100Mb and 200Mb.
> There's no "everyone else" here - we're talking about a "static" case.
> When new group arrives we need to recalculate guarantees as you said.

I was speaking in general, where we have 'n' groups; that's why I said
"everyone else".

> And here's my next question - what to do if the new guarantee would become
> lower than the current amount of unreclaimable memory in BC?
>

I am not quite sure I understand this question. Let me try with an example.
Let's say I have 1GB of memory and I have guarantees of 100 and 200MB for
two groups. Let's say the total unreclaimable memory is 100MB. If I add a new
group with a guarantee of 50MB, which is lower than the total amount of
unreclaimable memory, the addition of the new group should work
fine, since we have 1GB - 100 - 100 - 200MB of memory available.
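
The arithmetic in this example can be checked with a small admission test. It is a sketch that mirrors the subtraction above; can_admit() is a hypothetical helper and 1GB is assumed to mean 1024MB.

```python
def can_admit(total_mb, unreclaimable_mb, guarantees_mb, new_guarantee_mb):
    """Return (fits, available_mb) for a new group's guarantee."""
    available = total_mb - unreclaimable_mb - sum(guarantees_mb)
    return available >= new_guarantee_mb, available


# 1GB node (assumed 1024MB), 100MB unreclaimable, guarantees of 100MB and 200MB.
ok, available = can_admit(1024, 100, [100, 200], 50)
```

With these numbers 624MB remain, so a 50MB guarantee fits.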

--

Balbir Singh,
Linux Technology Center,
IBM Software Labs
Re: [ckrm-tech] [PATCH] BC: resource beancounters (v4) (added user memory) [message #6237 is a reply to message #6231] Tue, 12 September 2006 12:04
Srivatsa Vaddagiri
On Tue, Sep 12, 2006 at 03:06:35PM +0400, Pavel Emelianov wrote:
> Hmmm... Beancounters can provide this after trivial changes.

All that is needed is some interface to set a thread's BC id (which you
seem to have already - sys_set_bcid)
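
A rough model of the fixed-set-of-BCs idea follows. Only the name sys_set_bcid comes from this thread; its real signature is not shown here, so set_bcid() is a stand-in, and the domain names and BC ids are made up.

```python
# Model of "one BC per domain, threads switch BC per request".
# set_bcid() stands in for sys_set_bcid, whose real signature is not shown here.

class Thread:
    def __init__(self):
        self.bc_id = None


def set_bcid(thread, bc_id):
    """Associate a thread with an existing BC (hypothetical stand-in)."""
    thread.bc_id = bc_id


# Fixed mapping set up once by the administrator (made-up values).
DOMAIN_BC = {"a.example": 1, "b.example": 2, "c.example": 3}


def serve(thread, domain):
    """Switch the thread to the BC of the domain it is about to serve."""
    set_bcid(thread, DOMAIN_BC[domain])
    return thread.bc_id


threads = [Thread() for _ in range(100)]
domains = list(DOMAIN_BC)
used = {serve(t, domains[i % len(domains)]) for i, t in enumerate(threads)}
```

However many threads are spawned, the set of BCs in use stays the size of the domain list.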

> We may schedule them in current set of "pending" features
> (http://wiki.openvz.org/UBC_discussion)
>
> But this can create a kind of DoS within an application:
> A thread continuously touches new and new pages to it's BC and
> these pages are get touched by other threads also. Sooner or later

Any good reason why threads will touch each other's working set?
Sure nothing prevents them from touching, but I would expect each thread
(serving a separate domain) to work on its own set of private pages?

> this BC will hit it's limit and reclaiming this set of pages would affect
> all the other threads.

--
Regards,
vatsa
Re: [ckrm-tech] [PATCH] BC: resource beancounters (v4) (added user memory) [message #6252 is a reply to message #6234] Tue, 12 September 2006 17:22
Rohit Seth
On Tue, 2006-09-12 at 16:14 +0530, Srivatsa Vaddagiri wrote:
> On Mon, Sep 11, 2006 at 12:10:31PM -0700, Rohit Seth wrote:
> > It seems that a single notion of limit should suffice, and that limit
> > should more be treated as something beyond which that resource
> > consumption in the container will be throttled/not_allowed.
>
> The big question is : are containers/RG allowed to use *upto* their
> limit always? In other words, will you typically setup limits such that
> sum of all limits = max resource capacity?
>

If a user is really interested in ensuring that all scheduled jobs (or
containers) get what they have asked for (guarantees) then making the
sum of all container limits equal to total system limit is the right
thing to do.

> If it is setup like that, then what you are considering as limit is
> actually guar no?
>
Right. And if we do it like this, then it is up to the sysadmin to configure
things right, without adding additional logic to the kernel.

-rohit
Re: [ckrm-tech] [PATCH] BC: resource beancounters (v4) (added user memory) [message #6254 is a reply to message #6252] Tue, 12 September 2006 17:40
Srivatsa Vaddagiri
On Tue, Sep 12, 2006 at 10:22:32AM -0700, Rohit Seth wrote:
> On Tue, 2006-09-12 at 16:14 +0530, Srivatsa Vaddagiri wrote:
> > On Mon, Sep 11, 2006 at 12:10:31PM -0700, Rohit Seth wrote:
> > > It seems that a single notion of limit should suffice, and that limit
> > > should more be treated as something beyond which that resource
> > > consumption in the container will be throttled/not_allowed.
> >
> > The big question is : are containers/RG allowed to use *upto* their
> > limit always? In other words, will you typically setup limits such that
> > sum of all limits = max resource capacity?
> >
>
> If a user is really interested in ensuring that all scheduled jobs (or
> containers) get what they have asked for (guarantees) then making the
> sum of all container limits equal to total system limit is the right
> thing to do.
>
> > If it is setup like that, then what you are considering as limit is
> > actually guar no?
> >
> Right. And if we do it like this then it is up to sysadmin to configure
> the thing right without adding additional logic in kernel.

Perhaps calling it a "limit" is confusing then (otoh it may go down well
with Linus!). I perhaps agree we need to go with one for now (in the
interest of making some progress), but we probably will come back to
this at a later point. For example, I chanced upon this document:

www.vmware.com/pdf/vmware_drs_wp.pdf

which explains how supporting a hard limit (in contrast to the guarantee we
have been discussing) can be useful sometimes.

--
Regards,
vatsa
Re: [ckrm-tech] [PATCH] BC: resource beancounters (v4) (added user memory) [message #6258 is a reply to message #6204] Tue, 12 September 2006 23:54
Chandra Seetharaman
On Mon, 2006-09-11 at 16:58 -0700, Rohit Seth wrote:
> On Mon, 2006-09-11 at 12:42 -0700, Chandra Seetharaman wrote:
> > On Mon, 2006-09-11 at 12:10 -0700, Rohit Seth wrote:
> > > On Mon, 2006-09-11 at 11:25 -0700, Chandra Seetharaman wrote:
>
> > > > There could be a default container which doesn't have any guarantee or
> > > > limit.
> > >
> > > First, I think it is critical that we allow processes to run outside of
> > > any container (unless we know for sure that the penalty of running a
> > > process inside a container is very very minimal).
> >
> > When I meant a default container I meant a default "resource group". In
> > case of container that would be the default environment. I do not see
> > any additional overhead associated with it, it is only associated with
> > how resource are allocated/accounted.
> >
>
> There should be some cost when you do atomic inc/dec accounting and
> locks for add/remove resources from any container (including default
> resource group). No?

yes, it would be there, but is not heavy, IMO.
>
> > >
> > > And anything running outside a container should be limited by default
> > > Linux settings.
> >
> > note that the resource available to the default RG will be (total system
> > resource - allocated to RGs).
>
> I think it will be preferable to not change the existing behavior for
> applications that are running outside any container (in your case
> default resource group).

hmm, when you provide QoS for a set of apps, you will affect (the
resource availability of) other apps. I don't see any way around it. Any
ideas ?

>
> > >
> > > > When you create containers and assign guarantees to each of them
> > > > make sure that you leave some amount of resource unassigned.
> > > ^^^^^ This will force the "default" container
> > > with limits (indirectly). IMO, the whole guarantee feature gets defeated
> >
> > You _will_ have limits for the default RG even if we don't have
> > guarantees.
> >
> > > the moment you bring in this fuzziness.
> >
> > Not really.
> > - Each RG will have a guarantee and limit of each resource.
> > - default RG will have (system resource - sum of guarantees)
> > - Every RG will be guaranteed some amount of resource to provide QoS
> > - Every RG will be limited at "limit" to prevent DoS attacks.
> > - Whoever doesn't care either of those set them to don't care values.
> >
>
> For the cases that put this don't care, do you depend on existing
> reclaim algorithm (for memory) in kernel?

Yes.
>
> > >
> > > > That
> > > > unassigned resources can be used by the default container or can be used
> > > > by containers that want more than their guarantee (and less than their
> > > > limit). This is how CKRM/RG handles this issue.
> > > >
> > > >
> > >
> > > It seems that a single notion of limit should suffice, and that limit
> > > should more be treated as something beyond which that resource
> > > consumption in the container will be throttled/not_allowed.
> >
> > As I stated in an earlier email "Limit only" approach can prevent a
> > system from DoS attacks (and also fits the container model nicely),
> > whereas to provide QoS one would need guarantee.
> >
> > Without guarantee, a RG that the admin cares about can starve if
> > all/most of the other RGs consume upto their limits.
> >
> > >
>
> If the limits are set appropriately so that containers total memory
> consumption does not exceed the system memory then there shouldn't be
> any QoS issue (to whatever extent it is applicable for specific
> scenario).

Then you will not be work-conserving (IOW over-committing), which is one
of the main advantages of this type of feature.
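
A small example of what "not work-conserving" means here, assuming hard limits that add up to exactly the machine size. The group names and numbers are made up.

```python
def used_with_hard_limits(limits, demand):
    """Each group gets what it asks for, but never more than its limit."""
    return {g: min(demand[g], limits[g]) for g in limits}


TOTAL = 100
limits = {"a": 50, "b": 50}     # limits sum to the whole machine
demand = {"a": 80, "b": 10}     # "a" wants more, "b" is mostly idle

usage = used_with_hard_limits(limits, demand)
idle = TOTAL - sum(usage.values())
```

Group "a" is stopped at 50 while 40 units sit idle, which is the cost of using limits alone as guarantees.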

>
> -rohit
>
>
> -------------------------------------------------------------------------
> Using Tomcat but need to do more? Need to support web services, security?
> Get stuff done quickly with pre-integrated technology to make your job easier
> Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo
> http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
> _______________________________________________
> ckrm-tech mailing list
> https://lists.sourceforge.net/lists/listinfo/ckrm-tech
--

----------------------------------------------------------------------
Chandra Seetharaman | Be careful what you choose....
- sekharan@us.ibm.com | .......you may get it.
----------------------------------------------------------------------
Re: [ckrm-tech] [PATCH] BC: resource beancounters (v4) (added user memory) [message #6259 is a reply to message #6227] Tue, 12 September 2006 23:58
Chandra Seetharaman
On Tue, 2006-09-12 at 14:48 +0400, Pavel Emelianov wrote:
<snip>
> > I do not think it is that simple since
> > - there is typically more than one class I want to set guarantee to
> > - I will not able to use both limit and guarantee
> > - Implementation will not be work-conserving.
> >
> > Also, How would you configure the following in your model ?
> >
> > 5 classes: Class A(10, 40), Class B(20, 100), Class C (30, 100), Class D
> > (5, 100), Class E(15, 50); (class_name(guarantee, limit))
> >
> What's the total memory amount on the node? Without it it's hard to make
> any
> guarantee.

I wrote the example treating them as %, so 100 would be the total amount
of memory.
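
The five-class example can be run through a simple guarantee-then-limit allocator. This is a sketch under stated assumptions, not how CKRM/RG distributes memory: spare capacity is handed out greedily in class order, and the demand figures are made up. Only the (guarantee, limit) pairs come from the example above.

```python
# (guarantee, limit) in percent of total memory, from the example above.
CLASSES = {
    "A": (10, 40),
    "B": (20, 100),
    "C": (30, 100),
    "D": (5, 100),
    "E": (15, 50),
}


def allocate(total, classes, demand):
    """Satisfy guarantees first, then hand out spare capacity up to each limit.

    The greedy, in-order distribution of the spare part is an assumption
    made for this sketch.
    """
    alloc = {n: min(demand[n], g) for n, (g, _) in classes.items()}
    spare = total - sum(alloc.values())
    for n, (_, limit) in classes.items():
        extra = min(spare, min(demand[n], limit) - alloc[n])
        alloc[n] += extra
        spare -= extra
    return alloc, spare


guaranteed = sum(g for g, _ in CLASSES.values())

# Hypothetical demand: A and E want their full limit, D is idle.
demand = {"A": 40, "B": 10, "C": 30, "D": 0, "E": 50}
alloc, spare = allocate(100, CLASSES, demand)
```

The guarantees add up to 80, so 20 is unassigned from the start; in this run the idle parts of B's and D's guarantees are also lent out, A reaches its limit of 40, and E gets 20, which is above its guarantee and below its limit.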

> > "Limit only" approach works for DoS prevention. But for providing QoS
> > you would need guarantee.
> >
> You may not provide guarantee on physical resource for a particular group
> without limiting its usage by other groups. That's my major idea.

I agree with that, but what I am saying is that the other way around
(i.e. providing a guarantee for everyone by imposing limits on everyone)
is not possible.
>
Re: [ckrm-tech] [PATCH] BC: resource beancounters (v4) (added user memory) [message #6260 is a reply to message #6252] Wed, 13 September 2006 00:02
Chandra Seetharaman
On Tue, 2006-09-12 at 10:22 -0700, Rohit Seth wrote:
> On Tue, 2006-09-12 at 16:14 +0530, Srivatsa Vaddagiri wrote:
> > On Mon, Sep 11, 2006 at 12:10:31PM -0700, Rohit Seth wrote:
> > > It seems that a single notion of limit should suffice, and that limit
> > > should more be treated as something beyond which that resource
> > > consumption in the container will be throttled/not_allowed.
> >
> > The big question is : are containers/RG allowed to use *upto* their
> > limit always? In other words, will you typically setup limits such that
> > sum of all limits = max resource capacity?
> >
>
> If a user is really interested in ensuring that all scheduled jobs (or
> containers) get what they have asked for (guarantees) then making the
> sum of all container limits equal to total system limit is the right
> thing to do.
>
> > If it is setup like that, then what you are considering as limit is
> > actually guar no?
> >
> Right. And if we do it like this then it is up to sysadmin to configure
> the thing right without adding additional logic in kernel.

It won't be a complete solution, as the user won't be able to
- set both guarantee and limit for a resource group
- use limit on some and guarantee on some
- optimize the usage of available resources
>
> -rohit
>
>
>
Re: [ckrm-tech] [PATCH] BC: resource beancounters (v4) (added user memory) [message #6261 is a reply to message #6258] Wed, 13 September 2006 00:39
Rohit Seth
On Tue, 2006-09-12 at 16:54 -0700, Chandra Seetharaman wrote:
> On Mon, 2006-09-11 at 16:58 -0700, Rohit Seth wrote:
> > On Mon, 2006-09-11 at 12:42 -0700, Chandra Seetharaman wrote:
> > > On Mon, 2006-09-11 at 12:10 -0700, Rohit Seth wrote:
> > > > On Mon, 2006-09-11 at 11:25 -0700, Chandra Seetharaman wrote:
> >
> > > > > There could be a default container which doesn't have any guarantee or
> > > > > limit.
> > > >
> > > > First, I think it is critical that we allow processes to run outside of
> > > > any container (unless we know for sure that the penalty of running a
> > > > process inside a container is very very minimal).
> > >
> > > When I meant a default container I meant a default "resource group". In
> > > case of container that would be the default environment. I do not see
> > > any additional overhead associated with it, it is only associated with
> > > how resource are allocated/accounted.
> > >
> >
> > There should be some cost when you do atomic inc/dec accounting and
> > locks for add/remove resources from any container (including default
> > resource group). No?
>
> yes, it would be there, but is not heavy, IMO.

I think anything greater than 1% could be a concern for people who are
not very interested in containers but would be forced to live with them.

> >
> > > >
> > > > And anything running outside a container should be limited by default
> > > > Linux settings.
> > >
> > > note that the resource available to the default RG will be (total system
> > > resource - allocated to RGs).
> >
> > I think it will be preferable to not change the existing behavior for
> > applications that are running outside any container (in your case
> > default resource group).
>
> hmm, when you provide QoS for a set of apps, you will affect (the
> resource availability of) other apps. I don't see any way around it. Any
> ideas ?

When I say existing behavior, I mean not being impacted by artificial
limits imposed by the container subsystem. IOW, if a sysadmin is okay
with having certain apps running outside of any container, then
he is basically forgoing any QoS for any container on that system.

>
> >
> > > >
> > > > > When you create containers and assign guarantees to each of them
> > > > > make sure that you leave some amount of resource unassigned.
> > > > ^^^^^ This will force the "default" container
> > > > with limits (indirectly). IMO, the whole guarantee feature gets defeated
> > >
> > > You _will_ have limits for the default RG even if we don't have
> > > guarantees.
> > >
> > > > the moment you bring in this fuzziness.
> > >
> > > Not really.
> > > - Each RG will have a guarantee and limit of each resource.
> > > - default RG will have (system resource - sum of guarantees)
> > > - Every RG will be guaranteed some amount of resource to provide QoS
> > > - Every RG will be limited at "limit" to prevent DoS attacks.
> > > - Whoever doesn't care either of those set them to don't care values.
> > >
> >
> > For the cases that put this don't care, do you depend on existing
> > reclaim algorithm (for memory) in kernel?
>
> Yes.

So one container with these don't-care condition(s) can break the whole
guarantee scheme, because the existing kernel reclaimer does not know
about memory commitments to other containers. Right?

> >
> > > >
> > > > > That
> > > > > unassigned resources can be used by the default container or can be used
> > > > > by containers that want more than their guarantee (and less than their
> > > > > limit). This is how CKRM/RG handles this issue.
> > > > >
> > > > >
> > > >
> > > > It seems that a single notion of limit should suffice, and that limit
> > > > should more be treated as something beyond which that resource
> > > > consumption in the container will be throttled/not_allowed.
> > >
> > > As I stated in an earlier email "Limit only" approach can prevent a
> > > system from DoS attacks (and also fits the container model nicely),
> > > whereas to provide QoS one would need guarantee.
> > >
> > > Without guarantee, a RG that the admin cares about can starve if
> > > all/most of the other RGs consume upto their limits.
> > >
> > > >
> >
> > If the limits are set appropriately so that containers total memory
> > consumption does not exceed the system memory then there shouldn't be
> > any QoS issue (to whatever extent it is applicable for specific
> > scenario).
>
> Then you will not be work-conserving (IOW over-committing), which is one
> of the main advantage of this type of feature.
>

For the systems where QoS is important, not over-committing will be
fine (at least to start with).

-rohit
Re: [ckrm-tech] [PATCH] BC: resource beancounters (v4) (added user memory) [message #6262 is a reply to message #6260] Wed, 13 September 2006 00:43
Rohit Seth
On Tue, 2006-09-12 at 17:02 -0700, Chandra Seetharaman wrote:
> On Tue, 2006-09-12 at 10:22 -0700, Rohit Seth wrote:
> > On Tue, 2006-09-12 at 16:14 +0530, Srivatsa Vaddagiri wrote:
> > > On Mon, Sep 11, 2006 at 12:10:31PM -0700, Rohit Seth wrote:
> > > > It seems that a single notion of limit should suffice, and that limit
> > > > should more be treated as something beyond which that resource
> > > > consumption in the container will be throttled/not_allowed.
> > >
> > > The big question is : are containers/RG allowed to use *upto* their
> > > limit always? In other words, will you typically setup limits such that
> > > sum of all limits = max resource capacity?
> > >
> >
> > If a user is really interested in ensuring that all scheduled jobs (or
> > containers) get what they have asked for (guarantees) then making the
> > sum of all container limits equal to total system limit is the right
> > thing to do.
> >
> > > If it is setup like that, then what you are considering as limit is
> > > actually guar no?
> > >
> > Right. And if we do it like this then it is up to sysadmin to configure
> > the thing right without adding additional logic in kernel.
>
> It won't be a complete solution, as the user won't be able to
> - set both guarantee and limit for a resource group
> - use limit on some and guarantee on some
> - optimize the usage of available resources

I think that if some dynamic resource limit adjustments are
possible, then some of the above functionality could be achieved. And I
think that could be a good starting point.

-rohit
Re: [ckrm-tech] [PATCH] BC: resource beancounters (v4) (added user memory) [message #6263 is a reply to message #6261] Wed, 13 September 2006 01:10
Chandra Seetharaman
On Tue, 2006-09-12 at 17:39 -0700, Rohit Seth wrote:
<snip>
> > yes, it would be there, but is not heavy, IMO.
>
> I think anything greater than 1% could be a concern for people who are
> not very interested in containers but would be forced to live with them.

If they are not interested in resource management and/or containers, I
do not think they need to pay.
>
> > >
> > > > >
> > > > > And anything running outside a container should be limited by default
> > > > > Linux settings.
> > > >
> > > > note that the resource available to the default RG will be (total system
> > > > resource - allocated to RGs).
> > >
> > > I think it will be preferable to not change the existing behavior for
> > > applications that are running outside any container (in your case
> > > default resource group).
> >
> > hmm, when you provide QoS for a set of apps, you will affect (the
> > resource availability of) other apps. I don't see any way around it. Any
> > ideas ?
>
> When I say, existing behavior, I mean not getting impacted by some
> artificial limits that are imposed by container subsystem. IOW, if a

That is what I understood and replied above.
> sysadmin is okay to have certain apps running outside of container then
> he is basically forgoing any QoS for any container on that system.

Not at all. If the container they are interested in is guaranteed, I do
not see how apps running outside a container would affect them.

<snip>
> > > > Not really.
> > > > - Each RG will have a guarantee and limit of each resource.
> > > > - default RG will have (system resource - sum of guarantees)
> > > > - Every RG will be guaranteed some amount of resource to provide QoS
> > > > - Every RG will be limited at "limit" to prevent DoS attacks.
> > > > - Whoever doesn't care either of those set them to don't care values.
> > > >
> > >
> > > For the cases that put this don't care, do you depend on existing
> > > reclaim algorithm (for memory) in kernel?
> >
> > Yes.
>
> So one container with these don't care condition(s) can turn the whole
> guarantee thing bad. Because existing kernel reclaimer does not know
> about memory commitments to other containers. Right?

No, the reclaimer would free up pages associated with the don't-care RGs
(as the user doesn't care about the resources made available to them).
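
A guarantee-aware victim selection could look roughly like the sketch below. It is an illustration, not the CKRM/RG reclaimer: pick_reclaim() and the group figures are hypothetical, and a don't-care RG is modelled as one whose guarantee is 0. Note that the function is handed each group's guarantee, which is exactly the information the stock kernel reclaimer does not have.

```python
def pick_reclaim(groups, needed):
    """Plan how many pages to take from each group.

    groups: name -> (usage, guarantee). A group is never pushed below its
    guarantee; groups furthest above their guarantee give up pages first.
    Returns (plan, shortfall).
    """
    plan = {}
    order = sorted(groups, key=lambda n: groups[n][0] - groups[n][1], reverse=True)
    for name in order:
        usage, guarantee = groups[name]
        take = min(needed, max(0, usage - guarantee))
        if take:
            plan[name] = take
            needed -= take
    return plan, needed


# "batch" is a don't-care RG (guarantee 0); the numbers are made up.
groups = {"web": (300, 300), "db": (450, 400), "batch": (200, 0)}
plan, shortfall = pick_reclaim(groups, 220)
```

Here the don't-care group is drained first, "db" gives back only what it holds above its guarantee, and "web", which sits exactly at its guarantee, is left alone.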

<snip>
> > > If the limits are set appropriately so that containers total memory
> > > consumption does not exceed the system memory then there shouldn't be
> > > any QoS issue (to whatever extent it is applicable for specific
> > > scenario).
> >
> > Then you will not be work-conserving (IOW over-committing), which is one
> > of the main advantage of this type of feature.
> >
>
> If for the systems where QoS is important, not over-committing will be
> fine (at least to start with).

The problem is that you can't do it with just limit.

Re: [ckrm-tech] [PATCH] BC: resource beancounters (v4) (added user memory) [message #6264 is a reply to message #6263] Wed, 13 September 2006 01:25
Rohit Seth
On Tue, 2006-09-12 at 18:10 -0700, Chandra Seetharaman wrote:
> On Tue, 2006-09-12 at 17:39 -0700, Rohit Seth wrote:
> <snip>
> > > yes, it would be there, but is not heavy, IMO.
> >
> > I think anything greater than 1% could be a concern for people who are
> > not very interested in containers but would be forced to live with them.
>
> If they are not interested in resource management and/or containers, i
> do not think they need to pay.
> >

Think of a single kernel from a vendor that has container support built
in.

> > > >
> > > > > >
> > > > > > And anything running outside a container should be limited by default
> > > > > > Linux settings.
> > > > >
> > > > > note that the resource available to the default RG will be (total system
> > > > > resource - allocated to RGs).
> > > >
> > > > I think it will be preferable to not change the existing behavior for
> > > > applications that are running outside any container (in your case
> > > > default resource group).
> > >
> > > hmm, when you provide QoS for a set of apps, you will affect (the
> > > resource availability of) other apps. I don't see any way around it. Any
> > > ideas ?
> >
> > When I say, existing behavior, I mean not getting impacted by some
> > artificial limits that are imposed by container subsystem. IOW, if a
>
> That is what I understood and replied above.
> > sysadmin is okay to have certain apps running outside of container then
> > he is basically forgoing any QoS for any container on that system.
>
> Not at all. If the container they are interested in is guaranteed, I do
> not see how apps running outside a container would affect them.
>

Because the kernel (outside the container subsystem) doesn't know of
these guarantees...unless you modify the page allocator to have another
variant of overcommit memory.

> <snip>
> > > > > Not really.
> > > > > - Each RG will have a guarantee and limit of each resource.
> > > > > - default RG will have (system resource - sum of guarantees)
> > > > > - Every RG will be guaranteed some amount of resource to provide QoS
> > > > > - Every RG will be limited at "limit" to prevent DoS attacks.
> > > > > - Whoever doesn't care either of those set them to don't care values.
> > > > >
> > > >
> > > > For the cases that put this don't care, do you depend on existing
> > > > reclaim algorithm (for memory) in kernel?
> > >
> > > Yes.
> >
> > So one container with these don't care condition(s) can turn the whole
> > guarantee thing bad. Because existing kernel reclaimer does not know
> > about memory commitments to other containers. Right?
>
> No, the reclaimer would free up pages associated with the don't care RGs
> ( as the user don't care about the resource made available to them).
>

And how will the kernel reclaimer know which RGs are don't care?

-rohit
Re: [ckrm-tech] [PATCH] BC: resource beancounters (v4) (added user memory) [message #6266 is a reply to message #6262] Wed, 13 September 2006 01:13
Chandra Seetharaman
On Tue, 2006-09-12 at 17:43 -0700, Rohit Seth wrote:
<snip>

> > It won't be a complete solution, as the user won't be able to
> > - set both guarantee and limit for a resource group
> > - use limit on some and guarantee on some
> > - optimize the usage of available resources
>
> I think, if we have some of the dynamic resource limit adjustments
> possible then some of the above functionality could be achieved. And I
> think that could be a good start point.


Yes, dynamic resource adjustments should be available. But you can't
expect the sysadmin to sit around and keep tweaking the limits so as to
achieve the QoS he wants. (Even if you have an application sitting and
doing it, as I pointed out in another email, it may not be possible in
some scenarios.)
>
> -rohit
>
>
--

------------------------------------------------------------ ----------
Chandra Seetharaman | Be careful what you choose....
- sekharan@us.ibm.com | .......you may get it.
------------------------------------------------------------ ----------
Re: [ckrm-tech] [PATCH] BC: resource beancounters (v4) (added user memory) [message #6268 is a reply to message #6264] Wed, 13 September 2006 04:41
Srivatsa Vaddagiri
On Tue, Sep 12, 2006 at 06:25:51PM -0700, Rohit Seth wrote:
> And how will the kernel reclaimer know which RGs are don't care?

The UBC code can certainly provide this information. For example, in my CPU
controller patch, I do keep track of such don't care groups in the
controller itself:

http://lkml.org/lkml/2006/8/20/120


--
Regards,
vatsa
Re: [ckrm-tech] [PATCH] BC: resource beancounters (v4) (added user memory) [message #6274 is a reply to message #6259] Wed, 13 September 2006 08:06
Pavel Emelianov
Chandra Seetharaman wrote:
> On Tue, 2006-09-12 at 14:48 +0400, Pavel Emelianov wrote:
> <snip>
>
>>> I do not think it is that simple since
>>> - there is typically more than one class I want to set guarantee to
>>> - I will not able to use both limit and guarantee
>>> - Implementation will not be work-conserving.
>>>
>>> Also, How would you configure the following in your model ?
>>>
>>> 5 classes: Class A(10, 40), Class B(20, 100), Class C (30, 100), Class D
>>> (5, 100), Class E(15, 50); (class_name(guarantee, limit))
>>>
>>>
>> What's the total memory amount on the node? Without it it's hard to make
>> any
>> guarantee.
>>
>
> I wrote the example treating them as %, so 100 would be the total amount
> of memory.
>
OK. Then limiting must be done this way (unreclaimable limit/total limit)
A (15/40)
B (25/100)
C (35/100)
D (10/100)
E (20/50)
In this case each group will receive its guarantee for sure.

E.g. even if A, B, D and E eat all of their unreclaimable memory, then
we'll have
100 - 15 - 25 - 20 - 10 = 30% of memory left (maybe after reclaiming) which
is exactly enough for C's guarantee.
>
>>> "Limit only" approach works for DoS prevention. But for providing QoS
>>> you would need guarantee.
>>>
>>>
>> You may not provide guarantee on physycal resource for a particular group
>> without limiting its usage by other groups. That's my major idea.
>>
>
> I agree with that, but the other way around (i.e provide guarantee for
> everyone by imposing limits on everyone) is what I am saying is not
> possible.
Then how do you make sure that memory WILL be available when the group needs
it without limiting the others in a proper way?
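
The arithmetic in this message can be checked mechanically. The following is only a minimal sketch, assuming that the worst case for a group is every other group filling its unreclaimable limit while everything else stays reclaimable. The struct and function names are hypothetical and are not part of the BC patches.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one group; values are percent of total memory. */
struct group_cfg {
    const char *name;
    int guarantee;            /* guarantee the admin wants to give */
    int unreclaimable_limit;  /* first number in "A (15/40)" */
};

/*
 * Memory left for group `idx` when every other group has filled its
 * unreclaimable limit (all remaining memory is assumed reclaimable).
 */
static int worst_case_left(const struct group_cfg *g, size_t n,
                           size_t idx, int total)
{
    int left = total;

    assert(idx < n);
    for (size_t i = 0; i < n; i++)
        if (i != idx)
            left -= g[i].unreclaimable_limit;
    return left;
}

/* Returns 1 if every group keeps its guarantee in that worst case, else 0. */
static int guarantees_hold(const struct group_cfg *g, size_t n, int total)
{
    for (size_t i = 0; i < n; i++)
        if (worst_case_left(g, n, i, total) < g[i].guarantee)
            return 0;
    return 1;
}
```

With the five classes from the example (guarantees 10, 20, 30, 5, 15 and unreclaimable limits 15, 25, 35, 10, 20), `worst_case_left()` for C is 30 and `guarantees_hold()` reports success.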
Re: [ckrm-tech] [PATCH] BC: resource beancounters (v4) (added user memory) [message #6285 is a reply to message #6274] Wed, 13 September 2006 12:15
Srivatsa Vaddagiri
On Wed, Sep 13, 2006 at 12:06:41PM +0400, Pavel Emelianov wrote:
> OK. Then limiting must be done this way (unreclaimable limit/total limit)
> A (15/40)
> B (25/100)
> C (35/100)

s/35/30?

Also, the difference between the total and unreclaimable limits goes towards
limiting reclaimable memory, I suppose? And the 1st limit seems to be a
hard limit while the 2nd one is soft?

> D (10/100)
> E (20/50)
> In this case each group will receive it's guarantee for sure.
>
> E.g. even if A, B, E and D will eat all it's unreclaimable memory then
> we'll have
> 100 - 15 - 25 - 20 - 10 = 30% of memory left (maybe after reclaiming) which
> is perfectly enough for C's guarantee.

I agree that by carefully choosing these limits, we can provide some sort of
QoS, which is a good step to begin with.


--
Regards,
vatsa
Re: [ckrm-tech] [PATCH] BC: resource beancounters (v4) (added user memory) [message #6287 is a reply to message #6285] Wed, 13 September 2006 13:35
Pavel Emelianov
Srivatsa Vaddagiri wrote:
> On Wed, Sep 13, 2006 at 12:06:41PM +0400, Pavel Emelianov wrote:
>> OK. Then limiting must be done this way (unreclaimable limit/total limit)
>> A (15/40)
>> B (25/100)
>> C (35/100)
>
> s/35/30?
Hmmm... No, it must be 35. It IS higher than the guarantee you proposed,
but it's OK to have a limit higher than the guarantee, isn't it?
>
> Also the different b/n total and unreclaimable limits goes towards
> limiting reclaimable memory i suppose? And 1st limit seems to be a
> hard-limit while the 2nd one is soft?
The first limit (let's call it the soft one) is the limit for unreclaimable
memory; the second (the hard limit) is for both reclaimable and
unreclaimable memory.

The policy is:
1. if a BC tries to *mmap()* an unreclaimable region (e.g. one without a
backing file, as moving a page to swap is not a pure "reclamation") then
check the soft limit and prohibit the mapping in case it is hit;
2. if a BC tries to *touch* a page, then check the hard limit
and start reclaiming this BC's pages if the limit is hit.

That's how guarantees can be met. The current BC code does perform the
first check and gives you all the levers for the second one - only
the patch(es) with the reclamation mechanism are required.
>
>> D (10/100)
>> E (20/50)
>> In this case each group will receive it's guarantee for sure.
>>
>> E.g. even if A, B, E and D will eat all it's unreclaimable memory then
>> we'll have
>> 100 - 15 - 25 - 20 - 10 = 30% of memory left (maybe after reclaiming) which
>> is perfectly enough for C's guarantee.
>
> I agree by carefully choosing these limits, we can provide some sort of
> QoS, which is a good step to begin with.
Sure. As I've said, soft limiting is already done by the BC patches; the
hard one is not prohibited by BC (BCs even prepare good ground for it).
Once reclaiming is done we'll have the hard limit described above.
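
The two-step policy from this message can be sketched roughly as follows. This is not code from the BC patches: `struct bc_mem`, `bc_charge_mmap()`, `bc_charge_touch()` and the trivial `bc_reclaim()` stand-in are hypothetical names, and the accounting is simplified to page counters.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical per-beancounter memory accounting, in pages. */
struct bc_mem {
    unsigned long soft_limit;    /* limit on unreclaimable memory */
    unsigned long hard_limit;    /* limit on reclaimable + unreclaimable */
    unsigned long unreclaimable; /* charged in advance, at mmap() time */
    unsigned long reclaimable;   /* charged on demand, when touched */
};

/* Step 1: an unreclaimable mapping is refused if it would exceed the soft limit. */
static int bc_charge_mmap(struct bc_mem *bc, unsigned long pages)
{
    assert(bc->soft_limit <= bc->hard_limit);
    if (bc->unreclaimable + pages > bc->soft_limit)
        return -ENOMEM;
    bc->unreclaimable += pages;
    return 0;
}

/*
 * Stand-in for the reclamation mechanism, which the message says is
 * still to be written: drops up to `want` reclaimable pages of this BC
 * and returns how many were dropped.
 */
static unsigned long bc_reclaim(struct bc_mem *bc, unsigned long want)
{
    unsigned long freed = want < bc->reclaimable ? want : bc->reclaimable;

    bc->reclaimable -= freed;
    return freed;
}

/* Step 2: touching a reclaimable page reclaims from this BC at the hard limit. */
static int bc_charge_touch(struct bc_mem *bc)
{
    if (bc->unreclaimable + bc->reclaimable + 1 > bc->hard_limit) {
        if (bc_reclaim(bc, 1) == 0)
            return -ENOMEM; /* nothing of this BC's own left to reclaim */
    }
    bc->reclaimable += 1;
    return 0;
}
```

For group A from the example (soft limit 15, hard limit 40), a mapping that would take unreclaimable memory past 15 fails up front, while touching pages beyond 40 in total only pushes out A's own reclaimable pages.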
Re: [ckrm-tech] [PATCH] BC: resource beancounters (v4) (added user memory) [message #6288 is a reply to message #6236] Wed, 13 September 2006 13:39
Pavel Emelianov
Balbir Singh wrote:
> Pavel Emelianov wrote:
> <snip>
>
>>>>>> E.g. I have a node with 1Gb of ram and 10 containers with 100Mb
>>>>>> guarantee each.
>>>>>> I want to start one more. What shall I do not to break guarantees?
>>>>> Don't start the new container or change the guarantees of the
>>>>> existing
>>>>> ones
>>>>> to accommodate this one :) The QoS design (done by the administrator)
>>>>> should
>>>>> take care of such use-cases. It would be perfectly ok to have a
>>>>> container
>>>>> that does not care about guarantees to set their guarantee to 0
>>>>> and set
>>>>> their limit to the desired value. As Chandra has been stating we
>>>>> need two
>>>>> parameters (guarantee, limit), either can be optional, but not both.
>>>> If I set up 9 groups to have 100Mb limit then I have 100Mb assured (on
>>>> 1Gb node)
>>>> for the 10th one exactly. And I do not have to set up any guarantee as
>>>> it won't affect
>>>> anything. So what a guarantee parameter is needed for?
>>> This use case works well for providing guarantee to one container.
>>> What if
>>> I want guarantees of 100Mb and 200Mb for two containers? How do I setup
>>> the system using limits?
>> You may set any value from 100 up to 800 Mb for the first one and
>> 200-900Mb for
>> the second. In case of no other groups first will receive its 100Mb for
>> sure and
>> so does the second. If there are other groups - their guarantees should
>> be concerned.
>
> If I add another group with a guarantee of 100MB, then its limit will
> be anywhere between 100MB-800MB ?
I've described this in detail in my letter to sekharan@.
>
> I do not understand the guarantees being concerned part.
>
>>> Even I restrict everyone else to 700Mb. With this I cannot be sure that
>>> the remaining 300Mb will be distributed as 100Mb and 200Mb.
>> There's no "everyone else" here - we're talking about a "static" case.
>> When new group arrives we need to recalculate guarantees as you said.
>
> I was speaking in general where we have 'n' groups, so thats why I had
> "everyone else".
Well, when we talk about a guarantee this implies that the number
of groups doesn't change - when it does, limits/guarantees generally
must be recalculated to satisfy the new group set.
Re: [ckrm-tech] [PATCH] BC: resource beancounters (v4) (added user memory) [message #6308 is a reply to message #6264] Wed, 13 September 2006 22:20
Chandra Seetharaman
On Tue, 2006-09-12 at 18:25 -0700, Rohit Seth wrote:
> On Tue, 2006-09-12 at 18:10 -0700, Chandra Seetharaman wrote:
> > On Tue, 2006-09-12 at 17:39 -0700, Rohit Seth wrote:
> > <snip>
> > > > yes, it would be there, but is not heavy, IMO.
> > >
> > > I think anything greater than 1% could be a concern for people who are
> > > not very interested in containers but would be forced to live with them.
> >
> > If they are not interested in resource management and/or containers, i
> > do not think they need to pay.
> > >
>
> Think of a single kernel from a vendor that has container support built
> in.

Ok. Understood.

Here are results of some of the benchmarks we have run in the past
(April 2005) with CKRM which showed no/negligible performance impact in
that scenario.
http://marc.theaimsgroup.com/?l=ckrm-tech&m=111325064322305&w=2
http://marc.theaimsgroup.com/?l=ckrm-tech&m=111385973226267&w=2
http://marc.theaimsgroup.com/?l=ckrm-tech&m=111291409731929&w=2
>
<snip>

> > Not at all. If the container they are interested in is guaranteed, I do
> > not see how apps running outside a container would affect them.
> >
>
> Because the kernel (outside the container subsystem) doesn't know of

The core resource subsystem (VM subsystem for memory) would know about
the guarantees and don't cares, and it would handle it appropriately.

> these guarantees...unless you modify the page allocator to have another
> variant of overcommit memory.
>

<snip>
>
> > No, the reclaimer would free up pages associated with the don't care RGs
> > ( as the user don't care about the resource made available to them).
> >
>
> And how will the kernel reclaimer know which RGs are don't care?

By looking into the beancounter associated with the container/RG.


--

------------------------------------------------------------ ----------
Chandra Seetharaman | Be careful what you choose....
- sekharan@us.ibm.com | .......you may get it.
------------------------------------------------------------ ----------
Re: [ckrm-tech] [PATCH] BC: resource beancounters (v4) (added user memory) [message #6311 is a reply to message #6274] Wed, 13 September 2006 22:31
Chandra Seetharaman
On Wed, 2006-09-13 at 12:06 +0400, Pavel Emelianov wrote:
> Chandra Seetharaman wrote:
> > On Tue, 2006-09-12 at 14:48 +0400, Pavel Emelianov wrote:
> > <snip>
> >
> >>> I do not think it is that simple since
> >>> - there is typically more than one class I want to set guarantee to
> >>> - I will not able to use both limit and guarantee
> >>> - Implementation will not be work-conserving.
> >>>
> >>> Also, How would you configure the following in your model ?
> >>>
> >>> 5 classes: Class A(10, 40), Class B(20, 100), Class C (30, 100), Class D
> >>> (5, 100), Class E(15, 50); (class_name(guarantee, limit))
> >>>
> >>>
> >> What's the total memory amount on the node? Without it it's hard to make
> >> any
> >> guarantee.
> >>
> >
> > I wrote the example treating them as %, so 100 would be the total amount
> > of memory.
> >
> OK. Then limiting must be done this way (unreclaimable limit/total limit)
> A (15/40)
> B (25/100)
> C (35/100)
> D (10/100)
> E (20/50)
> In this case each group will receive it's guarantee for sure.
>
> E.g. even if A, B, E and D will eat all it's unreclaimable memory then
> we'll have
> 100 - 15 - 25 - 20 - 10 = 30% of memory left (maybe after reclaiming) which
> is perfectly enough for C's guarantee.

How did you arrive at the +5 number ?

What if I have 40 containers each with 2% guarantee ? what do we do
then ? and many other different combinations (what I gave was not the
_only_ scenario).

> >
> >>> "Limit only" approach works for DoS prevention. But for providing QoS
> >>> you would need guarantee.
> >>>
> >>>
> >> You may not provide guarantee on physycal resource for a particular group
> >> without limiting its usage by other groups. That's my major idea.
> >>
> >
> > I agree with that, but the other way around (i.e provide guarantee for
> > everyone by imposing limits on everyone) is what I am saying is not
> > possible.
> Then how do you make sure that memory WILL be available when the group needs
> it without limiting the others in a proper way?

You could limit others only if you _know_ somebody is not getting what
they are supposed to get (based on guarantee).

>
--

------------------------------------------------------------ ----------
Chandra Seetharaman | Be careful what you choose....
- sekharan@us.ibm.com | .......you may get it.
------------------------------------------------------------ ----------
Re: [ckrm-tech] [PATCH] BC: resource beancounters (v4) (added user memory) [message #6316 is a reply to message #6308] Thu, 14 September 2006 01:22
Rohit Seth
On Wed, 2006-09-13 at 15:20 -0700, Chandra Seetharaman wrote:
> On Tue, 2006-09-12 at 18:25 -0700, Rohit Seth wrote:
> > On Tue, 2006-09-12 at 18:10 -0700, Chandra Seetharaman wrote:
> > > On Tue, 2006-09-12 at 17:39 -0700, Rohit Seth wrote:
> > > <snip>
> > > > > yes, it would be there, but is not heavy, IMO.
> > > >
> > > > I think anything greater than 1% could be a concern for people who are
> > > > not very interested in containers but would be forced to live with them.
> > >
> > > If they are not interested in resource management and/or containers, i
> > > do not think they need to pay.
> > > >
> >
> > Think of a single kernel from a vendor that has container support built
> > in.
>
> Ok. Understood.
>
> Here are results of some of the benchmarks we have run in the past
> (April 2005) with CKRM which showed no/negligible performance impact in
> that scenario.
> http://marc.theaimsgroup.com/?l=ckrm-tech&m=111325064322305&w=2
> http://marc.theaimsgroup.com/?l=ckrm-tech&m=111385973226267&w=2
> http://marc.theaimsgroup.com/?l=ckrm-tech&m=111291409731929&w=2
> >


These are good results. But I still think the cost will increase over
time as more logic gets added. Is there any data on microbenchmarks
like lmbench?

> <snip>
>
> > > Not at all. If the container they are interested in is guaranteed, I do
> > > not see how apps running outside a container would affect them.
> > >
> >
> > Because the kernel (outside the container subsystem) doesn't know of
>
> The core resource subsystem (VM subsystem for memory) would know about
> the guarantees and don't cares, and it would handle it appropriately.
>

...meaning hooks in the generic kernel reclaim algorithm. Getting
something like that in mainline will be at best tricky.


-rohit
Re: [ckrm-tech] [PATCH] BC: resource beancounters (v4) (added user memory) [message #6325 is a reply to message #6311] Thu, 14 September 2006 07:53
Pavel Emelianov
Chandra Seetharaman wrote:
> On Wed, 2006-09-13 at 12:06 +0400, Pavel Emelianov wrote:
>
>> Chandra Seetharaman wrote:
>>
>>> On Tue, 2006-09-12 at 14:48 +0400, Pavel Emelianov wrote:
>>> <snip>
>>>
>>>
>>>>> I do not think it is that simple since
>>>>> - there is typically more than one class I want to set guarantee to
>>>>> - I will not able to use both limit and guarantee
>>>>> - Implementation will not be work-conserving.
>>>>>
>>>>> Also, How would you configure the following in your model ?
>>>>>
>>>>> 5 classes: Class A(10, 40), Class B(20, 100), Class C (30, 100), Class D
>>>>> (5, 100), Class E(15, 50); (class_name(guarantee, limit))
>>>>>
>>>>>
>>>>>
>>>> What's the total memory amount on the node? Without it it's hard to make
>>>> any
>>>> guarantee.
>>>>
>>>>
>>> I wrote the example treating them as %, so 100 would be the total amount
>>> of memory.
>>>
>>>
>> OK. Then limiting must be done this way (unreclaimable limit/total limit)
>> A (15/40)
>> B (25/100)
>> C (35/100)
>> D (10/100)
>> E (20/50)
>> In this case each group will receive it's guarantee for sure.
>>
>> E.g. even if A, B, E and D will eat all it's unreclaimable memory then
>> we'll have
>> 100 - 15 - 25 - 20 - 10 = 30% of memory left (maybe after reclaiming) which
>> is perfectly enough for C's guarantee.
>>
>
> How did you arrive at the +5 number ?
>
I solved a set of linear equations :)
> What if I have 40 containers each with 2% guarantee ? what do we do
> then ? and many other different combinations (what I gave was not the
> _only_ scenario).
>
Then you need to solve a set of 40 equations. This sounds weird, but
don't be afraid - sets like these are solved easily.
>
>>>
>>>
>>>>> "Limit only" approach works for DoS prevention. But for providing QoS
>>>>> you would need guarantee.
>>>>>
>>>>>
>>>>>
>>>> You may not provide guarantee on physycal resource for a particular group
>>>> without limiting its usage by other groups. That's my major idea.
>>>>
>>>>
>>> I agree with that, but the other way around (i.e provide guarantee for
>>> everyone by imposing limits on everyone) is what I am saying is not
>>> possible.
>>>
>> Then how do you make sure that memory WILL be available when the group needs
>> it without limiting the others in a proper way?
>>
>
> You could limit others only if you _know_ somebody is not getting what
> they are supposed to get (based on guarantee).
>
I don't understand your idea. Limit does _not_ imply anything - it's
just a limit.
You may limit anything to anyone without bothering about the consequences.
Guarantee implies that the resource you guarantee will be available and
this "will be" is something not that easy.

So I repeat my question - how can you be sure that these X megabytes you
guarantee to some group won't be used by others so that you won't be able
to reclaim them?
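
One way to arrive at the "+5" is sketched below. It is an assumption about which equations are meant, not something taken from the BC code: require that group i is left with exactly its guarantee G_i when every other group has filled its unreclaimable limit L_j, i.e. T - sum(L_j, j != i) = G_i for total memory T. Summing these n equations shows that every group gets the same slack on top of its guarantee: L_i = G_i + (T - sum(G)) / (n - 1). The function name `compute_limits()` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Fill limit[i] with the unreclaimable limit that leaves each group
 * exactly its guarantee in the worst case described above.
 * Returns 0 on success, -1 if there are fewer than two groups or the
 * guarantees add up to more than `total`.
 */
static int compute_limits(const double *guarantee, double *limit,
                          size_t n, double total)
{
    double sum = 0.0;
    double slack;

    assert(guarantee != NULL && limit != NULL);
    if (n < 2)
        return -1;
    for (size_t i = 0; i < n; i++)
        sum += guarantee[i];
    if (sum > total)
        return -1;

    /* Same slack for every group: (T - sum of guarantees) / (n - 1). */
    slack = (total - sum) / (double)(n - 1);
    for (size_t i = 0; i < n; i++)
        limit[i] = guarantee[i] + slack;
    return 0;
}
```

For the five classes above the slack is (100 - 80) / 4 = 5, which reproduces 15, 25, 35, 10 and 20. For 40 containers with a 2% guarantee each, the slack is 20 / 39, so each limit comes out at roughly 2.51%.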
Re: [ckrm-tech] [PATCH] BC: resource beancounters (v4) (added user memory) [message #6326 is a reply to message #6325] Thu, 14 September 2006 08:06
Balbir Singh
Pavel Emelianov wrote:

> I don't understand your idea. Limit does _not_ imply anything - it's
> just a limit.
> You may limit anything to anyone w/o bothering the consequences.
> Guarantee implies that the resource you guarantee will be available and
> this "will be" is something not that easy.
>
> So I repeat my question - how can you be sure that these X megabytes you
> guarantee to some group won't be used by others so that you won't be able
> to reclaim them?
>
>

Maybe we can treat a guarantee as a soft guarantee. A soft
guarantee would imply that when a group needs its guaranteed resources, the
system makes its best effort to make them available.

In soft guarantees, resources not actively used by a group can be shared with
other groups.

Hard guarantees would probably require reserving the resource in advance and
sharing of the resources not used, with other groups, might not be possible.

Comments?

--

Balbir Singh,
Linux Technology Center,
IBM Software Labs
Re: [ckrm-tech] [PATCH] BC: resource beancounters (v4) (added user memory) [message #6349 is a reply to message #6326] Thu, 14 September 2006 13:02
Pavel Emelianov
Balbir Singh wrote:
> Pavel Emelianov wrote:
>
>> I don't understand your idea. Limit does _not_ imply anything - it's
>> just a limit.
>> You may limit anything to anyone w/o bothering the consequences.
>> Guarantee implies that the resource you guarantee will be available and
>> this "will be" is something not that easy.
>>
>> So I repeat my question - how can you be sure that these X megabytes you
>> guarantee to some group won't be used by others so that you won't be
>> able
>> to reclaim them?
>>
>>
>
> May be we can treat a guarantee as a soft guarantee. A soft
> guarantee would imply that when a group needs its guaranteed
> resources, the
> system makes its best effort to make it available.
>
> In soft guarantees, resources not actively used by a group can be
> shared with
> other groups.
>
> Hard guarantees would probably require reserving the resource in
> advance and
> sharing of the resources not used, with other groups, might not be
> possible.
>
> Comments?
>
Reserving in advance means that sometimes you won't be able to start a
new group without taking back some of the reserved pages. This is ... strange.

I think that a satisfactory solution now would be:
- limit unreclaimable memory during mmap() against soft limit to prevent
potential rejects during page faults;
- reclaim memory in case of hitting hard limit;
- guarantees are done via setting soft and hard limits as I've shown
before.

The question still open is whether or not to account fractions.
I propose to skip fractions for a while and try to charge the page to
its first user.

So final BC design is:
1. three resources:
- kernel memory
- user unreclaimable memory
- user reclaimable memory
2. unreclaimable memory is charged "in advance", reclaimable
is charged "on demand" with reclamation if needed
3. each object (kernel one or user page) is charged to the
first user
4. each resource controller declares its own
- meaning of "limit" parameter (percent/size/bandwidth/etc)
- behaviour on changing limit (e.g. reclamation)
- behaviour on hitting the limit (e.g. reclamation)
5. BC can be assigned to any task by pid (not just current)
without recharging currently charged resources.
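
Items 1, 3, 4 and 5 of this design could be laid out along the following lines. This is a sketch under assumptions, not the BC patch itself: every type, enum value and function name here is hypothetical, and locking, reference counting and reclamation are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Item 1: the three accounted resources. */
enum bc_resource {
    BC_RES_KERNEL,
    BC_RES_USER_UNRECLAIMABLE,
    BC_RES_USER_RECLAIMABLE,
    BC_RES_COUNT
};

struct beancounter {
    unsigned long held[BC_RES_COUNT];
    unsigned long limit[BC_RES_COUNT];
};

/* Item 4: what a controller declares (declared only, not exercised here). */
struct bc_controller {
    const char *limit_unit; /* meaning of "limit": size, percent, ... */
    int (*limit_changed)(struct beancounter *bc, unsigned long new_limit);
    int (*limit_hit)(struct beancounter *bc);
};

/* A kernel object or user page; owner stays NULL until the first charge. */
struct bc_object {
    struct beancounter *owner;
};

/*
 * Item 3: charge an object to its first user only.
 * Returns 1 if charged to `bc`, 0 if it already had an owner,
 * -1 if the charge would exceed the limit of `bc`.
 */
static int bc_charge_first_user(struct bc_object *obj, struct beancounter *bc,
                                enum bc_resource res)
{
    assert(res < BC_RES_COUNT);
    if (obj->owner != NULL)
        return 0;
    if (bc->held[res] + 1 > bc->limit[res])
        return -1;
    bc->held[res] += 1;
    obj->owner = bc;
    return 1;
}

struct bc_task {
    struct beancounter *bc;
};

/* Item 5: reassign a task; what it already charged stays where it is. */
static void bc_assign(struct bc_task *task, struct beancounter *new_bc)
{
    task->bc = new_bc;
}
```

A page first touched by a task in one BC stays charged to that BC even if a task from another BC touches it later, or if the first task is moved.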
Re: [ckrm-tech] [PATCH] BC: resource beancounters (v4) (added user memory) [message #6363 is a reply to message #6316] Thu, 14 September 2006 23:13
Chandra Seetharaman
On Wed, 2006-09-13 at 18:22 -0700, Rohit Seth wrote:
<snip>
> >
> > Here are results of some of the benchmarks we have run in the past
> > (April 2005) with CKRM which showed no/negligible performance impact in
> > that scenario.
> > http://marc.theaimsgroup.com/?l=ckrm-tech&m=111325064322305&w=2
> > http://marc.theaimsgroup.com/?l=ckrm-tech&m=111385973226267&w=2
> > http://marc.theaimsgroup.com/?l=ckrm-tech&m=111291409731929&w=2
> > >
>
>
> These are good results. But I still think the cost will increase over a
> period of time as more logic gets added. Any data on microbenchmarks

IMO, overhead may not increase for a _non-user_ of the feature.

> like lmbench.

I think we have run those, but I could not find the results in the
mailing list.
>
> > <snip>
> >
> > > > Not at all. If the container they are interested in is guaranteed, I do
> > > > not see how apps running outside a container would affect them.
> > > >
> > >
> > > Because the kernel (outside the container subsystem) doesn't know of
> >
> > The core resource subsystem (VM subsystem for memory) would know about
> > the guarantees and don't cares, and it would handle it appropriately.
> >
>
> ...meaning hooks in the generic kernel reclaim algorithm. Getting
> something like that in mainline will be at best tricky.

Yes, it does mean doing something in the reclamation path.

>
>
> -rohit
>
--

------------------------------------------------------------ ----------
Chandra Seetharaman | Be careful what you choose....
- sekharan@us.ibm.com | .......you may get it.
------------------------------------------------------------ ----------
Re: [ckrm-tech] [PATCH] BC: resource beancounters (v4) (added user memory) [message #6365 is a reply to message #6325] Thu, 14 September 2006 23:42
Chandra Seetharaman
On Thu, 2006-09-14 at 11:53 +0400, Pavel Emelianov wrote:

<snip>

> > What if I have 40 containers each with 2% guarantee ? what do we do
> > then ? and many other different combinations (what I gave was not the
> > _only_ scenario).
> >
> Then you need to solve a set of 40 equations. This sounds weird, but
> don't afraid - sets like these are solved lightly.

Extrapolate that to a varying # of permutations and real-time changes in
the system workload. Won't it be complex?

Wouldn't it be a lot simpler if we had guarantee support instead?
Why do you not like guarantees? :)

<snip>

> >> Then how do you make sure that memory WILL be available when the group needs
> >> it without limiting the others in a proper way?
> >>
> >
> > You could limit others only if you _know_ somebody is not getting what
> > they are supposed to get (based on guarantee).
> >
> I don't understand your idea. Limit does _not_ imply anything - it's
> just a limit.

I didn't mean "limit" as defined in BC. I meant it in the generic sense.
IOW, if we have to provide guarantees then it would limit other RGs from
getting that (amount of guaranteed) resource.

> You may limit anything to anyone w/o bothering the consequences.
> Guarantee implies that the resource you guarantee will be available and
> this "will be" is something not that easy.
>
> So I repeat my question - how can you be sure that these X megabytes you
> guarantee to some group won't be used by others so that you won't be able
> to reclaim them?

It depends on how the memory controller is implemented. It could be
implemented in different ways:
- reclamation path will _not_ free pages belonging to an RG that is
below its guarantee.
- allocation from an "over guarantee" RG can succeed iff there is
memory left after satisfying all guarantees (or pages will be freed from
the requesting RG before it succeeds).
- ...

BTW, my point is to have guarantees for _all_ resources not just memory.
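
The two options listed above could be expressed as simple predicates. This is only a sketch under assumptions: `struct rg` and the function names are hypothetical, memory is counted in pages, and a caller that gets a refusal would be expected to reclaim from the requesting RG first.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-resource-group state, in pages. */
struct rg {
    unsigned long guarantee;
    unsigned long usage;
};

/* Option 1: the reclaim path only takes pages from groups above their guarantee. */
static int rg_reclaimable(const struct rg *g)
{
    return g->usage > g->guarantee;
}

/* Pages that must stay free so every group except `skip` can reach its guarantee. */
static unsigned long unmet_guarantees(const struct rg *g, size_t n, size_t skip)
{
    unsigned long unmet = 0;

    for (size_t i = 0; i < n; i++)
        if (i != skip && g[i].usage < g[i].guarantee)
            unmet += g[i].guarantee - g[i].usage;
    return unmet;
}

/*
 * Option 2: may group `idx` allocate `pages` when `free_pages` are free?
 * Within its guarantee: yes, as long as the memory exists.
 * Over its guarantee: only if the other groups' unmet guarantees still fit.
 */
static int rg_may_allocate(const struct rg *g, size_t n, size_t idx,
                           unsigned long pages, unsigned long free_pages)
{
    assert(idx < n);
    if (pages > free_pages)
        return 0;
    if (g[idx].usage + pages <= g[idx].guarantee)
        return 1;
    return free_pages - pages >= unmet_guarantees(g, n, idx);
}
```

With one group at 10 of a guaranteed 30 pages and another at 50 against a guarantee of 20, and 40 pages free, the second group may take 20 more pages but not 21, because 20 pages stay set aside for the first group.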

>
>
--

------------------------------------------------------------ ----------
Chandra Seetharaman | Be careful what you choose....
- sekharan@us.ibm.com | .......you may get it.
------------------------------------------------------------ ----------
Re: [ckrm-tech] [PATCH] BC: resource beancounters (v4) (added user memory) [message #6366 is a reply to message #6349] Fri, 15 September 2006 00:02
Chandra Seetharaman
On Thu, 2006-09-14 at 17:02 +0400, Pavel Emelianov wrote:

<snip>
> >
> Reserving in advance means that sometimes you won't be able to start a
> new group without taking back some of reserved pages. This is ... strange.

I do not see it as strange. At the time of creation, the user sees the
failure (that there isn't enough resource to provide the required/requested
guarantee) and can act accordingly.

BTW, VMware does it this way.
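
Failing at creation time amounts to an admission check like the one below. It is a sketch only: the function name is hypothetical, and it assumes hard guarantees that are reserved in full, so the sum of all guarantees may never exceed the total.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Returns 1 if a new group asking for `new_guarantee` can be created
 * next to the existing guarantees, 0 if the request must be refused.
 * All values share one unit (e.g. Mb).
 */
static int rg_create_allowed(const unsigned long *guarantee, size_t n,
                             unsigned long new_guarantee, unsigned long total)
{
    unsigned long committed = 0;

    assert(guarantee != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        committed += guarantee[i];
    if (committed > total)
        return 0; /* already overcommitted, nothing left to promise */
    return new_guarantee <= total - committed;
}
```

With nine existing groups guaranteed 100 units each out of 1000, a tenth group asking for 100 is admitted and one asking for 101 is refused.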

>
> I think that a satisfactory solution now would be:
> - limit unreclaimable memory during mmap() against soft limit to prevent
> potential rejects during page faults;

we can have guarantee and still handle it this way.
> - reclaim memory in case of hitting hard limit;
> - guarantees are done via setting soft and hard limits as I've shown
> before.

complexity is high in doing that.
>
> The question still open is wether or not to account fractions.
> I propose to skip fractions for a while and try to charge the page to
> it's first user.

sounds fine

>
> So final BC design is:
> 1. three resources:
> - kernel memory
> - user unreclaimable memory
> - user reclaimable memory

should be able to get other controllers also under this framework.

> 2. unreclaimable memory is charged "in advance", reclaimable
> is charged "on demand" with reclamation if needed
> 3. each object (kernel one or user page) is charged to the
> first user
> 4. each resource controller declares its own
> - meaning of "limit" parameter (percent/size/bandwidth/etc)
> - behaviour on changing limit (e.g. reclamation)
> - behaviour on hitting the limit (e.g. reclamation)
> 5. BC can be assigned to any task by pid (not just current)
> without recharging currently charged resources.

Please see the emails I sent earlier in this context:
http://marc.theaimsgroup.com/?l=ckrm-tech&m=115593001810616&w=2

We would need at least:
- BC should be created/deleted explicitly by the user
- cleaner interface for controller writers

--

----------------------------------------------------------------------
Chandra Seetharaman | Be careful what you choose....
- sekharan@us.ibm.com | .......you may get it.
----------------------------------------------------------------------
Re: [ckrm-tech] [PATCH] BC: resource beancounters (v4) (added user memory) [message #6378 is a reply to message #6365] Fri, 15 September 2006 07:15
Pavel Emelianov
Chandra Seetharaman wrote:
> On Thu, 2006-09-14 at 11:53 +0400, Pavel Emelianov wrote:
>
> <snip>
>
>
>>> What if I have 40 containers each with 2% guarantee ? what do we do
>>> then ? and many other different combinations (what I gave was not the
>>> _only_ scenario).
>>>
>>>
>> Then you need to solve a set of 40 equations. This sounds weird, but
>> don't be afraid - sets like these are solved easily.
>>
>
> extrapolate that to a varying # of permutations and real time changes in
> the system workload. Won't it be complex ?
>
I have a C program that computes limits to obtain desired guarantees
in a single 'for (i = 0; i < n; i++)' loop for any given set of guarantees.
With all error handling, beautiful output, nice formatting etc. it weighs
in at only 60 lines.
> Wouldn't it be a lot simpler if we have the guarantee support instead ?
> Why you do not like guarantee ? :)
>
I do not 'do not like guarantee'. I'm just sure that there are two ways
of providing a guarantee (for unreclaimable resources):
1. reserving the resource for a group in advance
2. limiting the resource for the others
Reserving is worse as it is essentially limiting (you cut off 100Mb from
1Gb RAM, thus limiting the other groups to 900Mb RAM), but this limiting
is too strict - you _have_ to reserve less than the RAM size. Limiting at
run-time is more flexible (you may create an overcommitted BC if you
want to) and leads to the same result - a guarantee.
> <snip>
>
[snip]
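
The difference between the two approaches can be put into a few lines of C. This is only a sketch under stated assumptions: the names reservations_fit() and implied_guarantee() are made up for illustration and are not taken from the BC patches, and all amounts are plain numbers in one unit (e.g. Mb).

```c
#include <assert.h>
#include <stddef.h>

/*
 * Reserving: a set of reservations is acceptable only while the
 * sum of all reservations fits into the total resource.
 * Returns 1 if it fits, 0 otherwise.
 */
int reservations_fit(const long *reserved, size_t n, long total)
{
    long sum = 0;
    size_t i;

    assert(reserved != NULL || n == 0);
    for (i = 0; i < n; i++)
        sum += reserved[i];
    return sum <= total;
}

/*
 * Limiting: group `i` is implicitly guaranteed whatever the other
 * groups cannot take, i.e. total minus the sum of their limits.
 * If the others' limits already cover the whole resource, nothing
 * is guaranteed and 0 is returned.
 */
long implied_guarantee(const long *lim, size_t n, size_t i, long total)
{
    long others = 0;
    size_t j;

    assert(lim != NULL && i < n);
    for (j = 0; j < n; j++)
        if (j != i)
            others += lim[j];
    return others >= total ? 0 : total - others;
}
```

With a total of 100 and two groups limited to 80 each, implied_guarantee() still gives each group 20, although the limits add up to 160. The same numbers used as reservations would be rejected by reservations_fit().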
Re: [ckrm-tech] [PATCH] BC: resource beancounters (v4) (added user memory) [message #6379 is a reply to message #6366] Fri, 15 September 2006 07:21
Pavel Emelianov
Chandra Seetharaman wrote:
> On Thu, 2006-09-14 at 17:02 +0400, Pavel Emelianov wrote:
>
> <snip>
>
>> Reserving in advance means that sometimes you won't be able to start a
>> new group without taking back some of reserved pages. This is ... strange.
>>
>
> I do not see it strange. At the time of creation, user sees the failure
> (that there isn't enough resource to provide the required/requested
> guarantee) and can act accordingly.
>
> BTW, VMware does it this way.
>
And VPS density in VMware is MUCH lower than in
OpenVZ with beancounters :)
>
>> I think that a satisfactory solution now would be:
>> - limit unreclaimable memory during mmap() against soft limit to prevent
>> potential rejects during page faults;
>>
>
> we can have guarantee and still handle it this way.
>
>> - reclaim memory in case of hitting hard limit;
>> - guarantees are done via setting soft and hard limits as I've shown
>> before.
>>
>
> complexity is high in doing that.
>
Nope. I've already said in another letter that a program of 60 lines
does this in a single loop.
>> The question still open is whether or not to account fractions.
>> I propose to skip fractions for a while and try to charge the page to
>> its first user.
>>
>
> sounds fine
>
>
>> So final BC design is:
>> 1. three resources:
>> - kernel memory
>> - user unreclaimable memory
>> - user reclaimable memory
>>
>
> should be able to get other controllers also under this framework.
>
OK. But note that it's easy to add a new resource to the current BC code.
The most difficult part is placing the 'charge/uncharge' calls throughout
the kernel.
>
>> 2. unreclaimable memory is charged "in advance", reclaimable
>> is charged "on demand" with reclamation if needed
>> 3. each object (kernel one or user page) is charged to the
>> first user
>> 4. each resource controller declares its own
>> - meaning of "limit" parameter (percent/size/bandwidth/etc)
>> - behaviour on changing limit (e.g. reclamation)
>> - behaviour on hitting the limit (e.g. reclamation)
>> 5. BC can be assigned to any task by pid (not just current)
>> without recharging currently charged resources.
>>
>
> Please see the emails I sent earlier in this context:
> http://marc.theaimsgroup.com/?l=ckrm-tech&m=115593001810616&w=2
>
> We would need at least:
> - BC should be created/deleted explicitly by the user
> - cleaner interface for controller writers
>
OK.
Next week we'll try to send a new set of patches.
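
Points 2 and 4 of the design above (each controller declares its own behaviour on hitting the limit, with reclamation for reclaimable memory) could look roughly like the user-space sketch below. Everything here is an assumption made for illustration: struct demo_resource, demo_charge(), demo_uncharge() and reclaim_all() are hypothetical names and do not reproduce the interfaces of the actual patches.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-resource state; not the structures from the BC patches. */
struct demo_resource {
    const char *name;
    unsigned long held;   /* amount currently charged */
    unsigned long limit;  /* meaning is up to the controller */
    /* Behaviour on hitting the limit: try to free `need` units and
     * return how many were freed. NULL means "unreclaimable". */
    unsigned long (*reclaim)(struct demo_resource *res, unsigned long need);
};

/* Charge `val` units. Returns 0 on success, -1 if the charge is rejected. */
int demo_charge(struct demo_resource *res, unsigned long val)
{
    assert(res != NULL);
    if (val > res->limit)
        return -1;                        /* can never fit */
    if (res->held > res->limit - val) {   /* would exceed the limit */
        unsigned long need = res->held - (res->limit - val);
        if (res->reclaim == NULL || res->reclaim(res, need) < need)
            return -1;
    }
    res->held += val;
    return 0;
}

void demo_uncharge(struct demo_resource *res, unsigned long val)
{
    assert(res != NULL && val <= res->held);
    res->held -= val;
}

/* Example hook for reclaimable memory: pretend any held unit can be dropped. */
unsigned long reclaim_all(struct demo_resource *res, unsigned long need)
{
    unsigned long freed = need < res->held ? need : res->held;

    res->held -= freed;
    return freed;
}
```

A resource with reclaim set to NULL rejects a charge that would cross the limit, while one using reclaim_all() frees just enough to let the charge through.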
Re: [ckrm-tech] [PATCH] BC: resource beancounters (v4) (added user memory) [message #6381 is a reply to message #6366] Fri, 15 September 2006 08:46
dev

> <snip>
>
>>Reserving in advance means that sometimes you won't be able to start a
>>new group without taking back some of reserved pages. This is ... strange.
>
>
> I do not see it strange. At the time of creation, user sees the failure
> (that there isn't enough resource to provide the required/requested
> guarantee) and can act accordingly.
>
> BTW, VMware does it this way.
This is not true, at least for ESX Server.
It overcommits memory and then uses dirty tricks like ballooning to free memory.

[...]

> We would need at least:
> - BC should be created/deleted explicitly by the user
> - cleaner interface for controller writers
Why worry so much about the last one?
The number of controllers is quite limited...

Kirill
Re: [ckrm-tech] [PATCH] BC: resource beancounters (v4) (added user memory) [message #6382 is a reply to message #6378] Fri, 15 September 2006 08:51
dev

Chandra,

>>>>What if I have 40 containers each with 2% guarantee ? what do we do
>>>>then ? and many other different combinations (what I gave was not the
>>>>_only_ scenario).
>>>>
>>>>
>>>
>>>Then you need to solve a set of 40 equations. This sounds weird, but
>>>don't be afraid - sets like these are solved easily.
>>>
>>
>>extrapolate that to a varying # of permutations and real time changes in
>>the system workload. Won't it be complex ?
>>
>
> I have a C program that computes limits to obtain desired guarantees
> in a single 'for (i = 0; i < n; i++)' loop for any given set of guarantees.
> With all error handling, beautiful output, nice formatting etc. it weighs
> in at only 60 lines.
>
>>Wouldn't it be a lot simpler if we have the guarantee support instead ?
the calculation above doesn't seem hard :)

>>Why you do not like guarantee ? :)

> I do not 'do not like guarantee'. I'm just sure that there are two ways
> of providing a guarantee (for unreclaimable resources):
> 1. reserving the resource for a group in advance
> 2. limiting the resource for the others
> Reserving is worse as it is essentially limiting (you cut off 100Mb from
> 1Gb RAM, thus limiting the other groups to 900Mb RAM), but this limiting
> is too strict - you _have_ to reserve less than the RAM size. Limiting at
> run-time is more flexible (you may create an overcommitted BC if you
> want to) and leads to the same result - a guarantee.
I think this deserves to be put on the Wiki.
It is a very good, clear point.

Chandra, do you propose some third way (that we are unaware of) of implementing guarantees?

Thanks,
Kirill
Re: [ckrm-tech] [PATCH] BC: resource beancounters (v4) (added user memory) [message #6383 is a reply to message #6197] Fri, 15 September 2006 08:53
dev

> CKRM/RG handles it this way:
>
> Amount of a resource a child RG gets is the ratio of its share value to
> the parent's total # of shares. Children's resource allocation can be
> changed just by changing the parent's total # of shares.
>
> In your case, the initial situation would be:
> Total memory in the system 100MB
> parent's total # of shares: 100 (1 share == 1MB)
> 10 children with # of shares: 10 (i.e. each child has 10MB)
>
> When I want to add another child, just change parent's total # of shares
> to be say 125:
> Total memory in the system 100MB
> parent's total # of shares: 125 (1 share == 0.8MB)
> 10 children with # of shares: 10 (i.e. each child has 8MB)
> Now you are left with 25 shares (or 20MB) that you can assign to new
> child(ren) as you please.

setting memory in "shares" doesn't look user friendly at all...

Kirill
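
For reference, the share arithmetic quoted above boils down to one expression. The following is a minimal sketch, assuming whole megabytes and a hypothetical helper named child_allocation_mb(); it is not CKRM/RG code.

```c
#include <assert.h>

/*
 * A child's allocation is its share value divided by the parent's
 * total number of shares, applied to the parent's resource.
 * Multiplying before dividing keeps the integer arithmetic exact
 * for the figures quoted above.
 */
long child_allocation_mb(long total_mb, long child_shares,
                         long parent_total_shares)
{
    assert(parent_total_shares > 0);
    assert(child_shares >= 0 && child_shares <= parent_total_shares);
    return total_mb * child_shares / parent_total_shares;
}
```

With 100MB in total, a child holding 10 shares gets 10MB while the parent has 100 shares and 8MB once the parent is raised to 125 shares; the 25 unassigned shares then correspond to 20MB.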
Re: [ckrm-tech] [PATCH] BC: resource beancounters (v4) (added user memory) [message #6399 is a reply to message #6382] Fri, 15 September 2006 11:15
Pavel Emelianov
Kirill Korotaev wrote:

[snip]
>> I have a C program that computes limits to obtain desired guarantees
>> in a single 'for (i = 0; i < n; i++)' loop for any given set of guarantees.
>> With all error handling, beautiful output, nice formatting etc. it weighs
>> in at only 60 lines.

Look at http://wiki.openvz.org/Containers/Guarantees_for_resources
I've described there in detail how a guarantee can be obtained by limiting.

[snip]

>> I do not 'do not like guarantee'. I'm just sure that there are two ways
>> of providing a guarantee (for unreclaimable resources):
>> 1. reserving the resource for a group in advance
>> 2. limiting the resource for the others
>> Reserving is worse as it is essentially limiting (you cut off 100Mb from
>> 1Gb RAM, thus limiting the other groups to 900Mb RAM), but this limiting
>> is too strict - you _have_ to reserve less than the RAM size. Limiting at
>> run-time is more flexible (you may create an overcommitted BC if you
>> want to) and leads to the same result - a guarantee.
> I think this deserves to be put on the Wiki.
> It is a very good, clear point.

This is also on the page I linked to.
Re: [ckrm-tech] [PATCH] BC: resource beancounters (v4) (added user memory) [message #6447 is a reply to message #6399] Mon, 18 September 2006 08:25
Balbir Singh
Pavel Emelianov wrote:
> Kirill Korotaev wrote:
>
> [snip]
>>> I have a C program that computes limits to obtain desired guarantees
>>> in a single 'for (i = 0; i < n; i++)' loop for any given set of guarantees.
>>> With all error handling, beautiful output, nice formatting etc. it weighs
>>> in at only 60 lines.
>
> Look at http://wiki.openvz.org/Containers/Guarantees_for_resources
> I've described there in detail how a guarantee can be obtained by limiting.
>
> [snip]
>
>>> I do not 'do not like guarantee'. I'm just sure that there are two ways
>>> of providing a guarantee (for unreclaimable resources):
>>> 1. reserving the resource for a group in advance
>>> 2. limiting the resource for the others
>>> Reserving is worse as it is essentially limiting (you cut off 100Mb from
>>> 1Gb RAM, thus limiting the other groups to 900Mb RAM), but this limiting
>>> is too strict - you _have_ to reserve less than the RAM size. Limiting at
>>> run-time is more flexible (you may create an overcommitted BC if you
>>> want to) and leads to the same result - a guarantee.
>> I think this deserves to be put on the Wiki.
>> It is a very good, clear point.
>
> This is also on the page I linked to.
>

This approach has the following disadvantages:
1. Let's consider initialization - when we create 'n' groups initially, we need
to spend O(n^2) time to assign guarantees.
2. Every time a limit or a guarantee changes, we need to recalculate guarantees
and ensure that the change will not break any guarantees.
3. The same applies when a resource group is created or deleted.

This can lead to some instability; a change in one group propagates to all other groups.


--

Balbir Singh,
Linux Technology Center,
IBM Software Labs
Re: [ckrm-tech] [PATCH] BC: resource beancounters (v4) (added user memory) [message #6450 is a reply to message #6447] Mon, 18 September 2006 08:56
Pavel Emelianov
Balbir Singh wrote:

[snip]

> This approach has the following disadvantages
> 1. Lets consider initialization - When we create 'n' groups
> initially, we need
> to spend O(n^2) time to assign guarantees.

1. Not guarantees - limits. If you do not need guarantees - assign
overcommitted limits. Most OpenVZ users do so and nobody complains.
2. If you start n groups at once then limits are calculated in O(n)
time, not O(n^2).

> 2. Every time a limit or a guarantee changes, we need to recalculate
> guarantees
> and ensure that the change will not break any guarantees

The same.

> 3. The same thing as stated above, when a resource group is created
> or deleted
>
> This can lead to some instability; a change in one group propagates to
> all other groups.

Let me cite a part of your answer to my letter of 11.09.2006:
"...
xemul> I have a node with 1Gb of ram and 10 containers with 100Mb
xemul> guarantee each. I want to start one more.
xemul> What shall I do not to break guarantees?

Don't start the new container or change the guarantees of the
existing ones to accommodate this one ... It would be perfectly
ok to have a container that does not care about guarantees to
set their guarantee to 0 and set their limit to the desired value
..."

The same goes for limiting - either do not start the new container, or
recalculate limits to meet the new requirements. You may also ignore
guarantees altogether and create an overcommitted configuration.

And one more thing. We've asked this many times and I ask it again -
please, show us a way of providing a guarantee other than
limiting or reserving.
Re: [ckrm-tech] [PATCH] BC: resource beancounters (v4) (added user memory) [message #6461 is a reply to message #6450] Mon, 18 September 2006 11:20
Balbir Singh
Pavel Emelianov wrote:
> Balbir Singh wrote:
>
> [snip]
>
>> This approach has the following disadvantages
>> 1. Lets consider initialization - When we create 'n' groups
>> initially, we need
>> to spend O(n^2) time to assign guarantees.
>
> 1. Not guarantees - limits. If you do not need guarantees - assign
> overcommitted limits. Most OpenVZ users do so and nobody complains.
> 2. If you start n groups at once then limits are calculated in O(n)
> time, not O(n^2).

Yes.. if you start them at once, but if they are incrementally
added and started it is O(n^2)

>
>> 2. Every time a limit or a guarantee changes, we need to recalculate
>> guarantees
>> and ensure that the change will not break any guarantees
>
> The same.
>
>> 3. The same thing as stated above, when a resource group is created
>> or deleted
>>
>> This can lead to some instability; a change in one group propagates to
>> all other groups.
>
> Let me cite a part of your answer on my letter from 11.09.2006:
> "...
> xemul> I have a node with 1Gb of ram and 10 containers with 100Mb
> xemul> guarantee each. I want to start one more.
> xemul> What shall I do not to break guarantees?
>
> Don't start the new container or change the guarantees of the
> existing ones to accommodate this one ... It would be perfectly
> ok to have a container that does not care about guarantees to
> set their guarantee to 0 and set their limit to the desired value
> ..."
>
> The same goes for limiting - either do not start the new container, or
> recalculate limits to meet the new requirements. You may also ignore
> guarantees altogether and create an overcommitted configuration.
>
> And one more thing. We've asked this many times and I ask it again -
> please, show us a way of providing a guarantee other than
> limiting or reserving.

There are some other options, I am sure Chandra will probably have
more.

1. Reclaim resources from other containers. This can be done well for
user-pages, if we ensure that each container does not mlock more
than its guaranteed share of memory.
2. Provide best effort guarantees for non-reclaimable memory
3. oom-kill a container or a task within a resource group that has
exceeded its guarantee and some other container is unable to meet its
guarantee
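
Options 1 and 3 both need some way to decide which group gives memory back. A minimal sketch of one possible choice follows; the policy (take from the group that exceeds its guarantee by the largest amount) and the names struct demo_group and pick_victim() are assumptions made for illustration, not something proposed in this thread.

```c
#include <assert.h>
#include <stddef.h>

struct demo_group {
    long usage;      /* memory currently charged to the group */
    long guarantee;  /* memory the group is guaranteed */
};

/*
 * Return the index of the group that exceeds its guarantee by the
 * largest amount, or -1 if every group is within its guarantee.
 */
int pick_victim(const struct demo_group *groups, size_t n)
{
    long worst = 0;
    int victim = -1;
    size_t i;

    assert(groups != NULL || n == 0);
    for (i = 0; i < n; i++) {
        long excess = groups[i].usage - groups[i].guarantee;

        if (excess > worst) {
            worst = excess;
            victim = (int)i;
        }
    }
    return victim;
}
```

Whether the chosen group then has pages reclaimed or a task killed is a separate policy decision that this sketch leaves open.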

--

Balbir Singh,
Linux Technology Center,
IBM Software Labs
Re: [ckrm-tech] [PATCH] BC: resource beancounters (v4) (added user memory) [message #6462 is a reply to message #6461] Mon, 18 September 2006 11:32
Pavel Emelianov
Balbir Singh wrote:
> Pavel Emelianov wrote:
>> Balbir Singh wrote:
>>
>> [snip]
>>
>>> This approach has the following disadvantages
>>> 1. Lets consider initialization - When we create 'n' groups
>>> initially, we need
>>> to spend O(n^2) time to assign guarantees.
>>
>> 1. Not guarantees - limits. If you do not need guarantees - assign
>> overcommitted limits. Most OpenVZ users do so and nobody complains.
>> 2. If you start n groups at once then limits are calculated in O(n)
>> time, not O(n^2).
>
> Yes.. if you start them at once, but if they are incrementally
> added and started it is O(n^2)

See my comment below.

>
>>
>>> 2. Every time a limit or a guarantee changes, we need to recalculate
>>> guarantees
>>> and ensure that the change will not break any guarantees
>>
>> The same.
>>
>>> 3. The same thing as stated above, when a resource group is created
>>> or deleted
>>>
>>> This can lead to some instability; a change in one group propagates to
>>> all other groups.
>>
>> Let me cite a part of your answer on my letter from 11.09.2006:
>> "...
>> xemul> I have a node with 1Gb of ram and 10 containers with 100Mb
>> xemul> guarantee each. I want to start one more.
>> xemul> What shall I do not to break guarantees?
>>
>> Don't start the new container or change the guarantees of the
>> existing ones to accommodate this one ... It would be perfectly
>> ok to have a container that does not care about guarantees to
>> set their guarantee to 0 and set their limit to the desired value
>> ..."
>>
>> The same goes for limiting - either do not start the new container, or
>> recalculate limits to meet the new requirements. You may also ignore
>> guarantees altogether and create an overcommitted configuration.

As I do not see any reply to this, I consider the "O(n^2) disadvantage"
to be irrelevant.

>>
>> And one more thing. We've asked this many times and I ask it again -
>> please, show us a way of providing a guarantee other than
>> limiting or reserving.
>
> There are some other options, I am sure Chandra will probably have
> more.
>
> 1. Reclaim resources from other containers. This can be done well for
> user-pages, if we ensure that each container does not mlock more
> than its guaranteed share of memory.

We've already agreed to consider unreclaimable resources only.
If we provide reclaimable memory *only* then we can provide any
guarantee with a single page available for user-space.
Unreclaimable resource is the most interesting one.

> 2. Provide best effort guarantees for non-reclaimable memory

That's the question - how?

> 3. oom-kill a container or a task within a resource group that has
> exceeded its guarantee and some other container is unable to meet its
> guarantee

The OOM killer must start only when there is no other way to find memory.
It must be a last resort, not the regular solution.
Re: [ckrm-tech] [PATCH] BC: resource beancounters (v4) (added user memory) [message #6468 is a reply to message #6399] Mon, 18 September 2006 11:27
Balbir Singh
Pavel Emelianov wrote:
> Kirill Korotaev wrote:
>
> [snip]
>>> I have a C program that computes limits to obtain desired guarantees
>>> in a single 'for (i = 0; i < n; i++)' loop for any given set of guarantees.
>>> With all error handling, beautiful output, nice formatting etc. it weighs
>>> in at only 60 lines.
>
> Look at http://wiki.openvz.org/Containers/Guarantees_for_resources
> I've described there in detail how a guarantee can be obtained by limiting.
>
> [snip]
>
>>> I do not 'do not like guarantee'. I'm just sure that there are two ways
>>> of providing a guarantee (for unreclaimable resources):
>>> 1. reserving the resource for a group in advance
>>> 2. limiting the resource for the others
>>> Reserving is worse as it is essentially limiting (you cut off 100Mb from
>>> 1Gb RAM, thus limiting the other groups to 900Mb RAM), but this limiting
>>> is too strict - you _have_ to reserve less than the RAM size. Limiting at
>>> run-time is more flexible (you may create an overcommitted BC if you
>>> want to) and leads to the same result - a guarantee.
>> I think this deserves to be put on the Wiki.
>> It is a very good, clear point.
>
> This is also on the page I linked to.


The program (calculate_limits()) listed on the website does not work for
the following case

N=2;
R=100;
g[2] = {30, 30};


The output is -10 and -10 for the limits

For

N=3;
R=100;
g[3] = {30, 30, 10};

I get -70, -70 and -110 as the limits

Am I interpreting the parameters correctly? Or is the program broken?

--

Balbir Singh,
Linux Technology Center,
IBM Software Labs
Re: [ckrm-tech] [PATCH] BC: resource beancounters (v4) (added user memory) [message #6470 is a reply to message #6468] Mon, 18 September 2006 12:37
Pavel Emelianov
Balbir Singh wrote:

[snip]
>
> The program (calculate_limits()) listed on the website does not work for
> the following case
>
> N=2;
> R=100;
> g[2] = {30, 30};
>
>
> The output is -10 and -10 for the limits
>
> For
>
> N=3;
> R=100;
> g[3] = {30, 30, 10};
>
> I get -70, -70 and -110 as the limits
>
> Am I interpreting the parameters correctly? Or the program is broken?
>

The program on the site is broken. Thanks for noticing:

$ gcc guar.c -o guar
$ ./guar 30 30
guar lim
30 70 ( 70/1)
30 70 ( 70/1)
$ ./guar 30 30 10
guar lim
30 45 ( 90/2)
30 45 ( 90/2)
10 25 ( 50/2)


To prevent future "errors", remember that this is a simplified program that
assumes each guarantee is <= 100%, the sum of guarantees is <= 100%, etc.
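
The numbers printed above are consistent with a simple closed form. If the guarantee of group i is taken to be R minus the sum of the limits of all other groups, solving for the limits gives lim[i] = (R - sum(g) + (N - 1) * g[i]) / (N - 1). The sketch below is a reconstruction from that assumption and from the output shown, not the guar.c used in this message; the function name limits_from_guarantees() is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Compute per-group limits from per-group guarantees.
 * Assumes n >= 2, non-negative guarantees and sum(g) <= total,
 * matching the simplifications stated above.
 * Returns 0 on success, -1 if the input violates these assumptions.
 */
int limits_from_guarantees(const long *g, long *lim, size_t n, long total)
{
    long sum = 0;
    size_t i;

    assert(g != NULL && lim != NULL);
    if (n < 2)
        return -1;              /* a single group needs no limit */
    for (i = 0; i < n; i++) {
        if (g[i] < 0)
            return -1;
        sum += g[i];
    }
    if (sum > total)
        return -1;              /* guarantees cannot exceed the resource */

    /* One pass over the groups, e.g. (100 - 70 + 2 * 30) / 2 = 90 / 2. */
    for (i = 0; i < n; i++)
        lim[i] = (total - sum + (long)(n - 1) * g[i]) / (long)(n - 1);
    return 0;
}
```

For R = 100 this yields limits 70 and 70 for guarantees {30, 30}, and 45, 45 and 25 for {30, 30, 10}, the same values as in the output above.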