OpenVZ Forum


Re: [ckrm-tech] [PATCH] BC: resource beancounters (v4) (added user memory) [message #6043 is a reply to message #6024] Thu, 07 September 2006 07:29
Pavel Emelianov
Chandra Seetharaman wrote:

[snip]
>>> Will we need new user/kernel interfaces for cpu, i/o bandwidth, etc...?
>>>
>> no. no new interfaces are required.
>>
>
> Good to know that.
>
> Your CPU controller supports guarantee ?
>
It does, but the CPU controller is not as simple as the memory one.
> Do you have a i/o controller ?
>
>
>> BUT: I remind you the talks at OKS/OLS and in previous UBC discussions.
>> It was noted that having a separate interfaces for CPU, I/O bandwidth
>>
>
> But, it will be lot simpler for the user to configure/use if they are
> together. We should discuss this also.
>
IMHO such unification may only imply that one syscall is used to pass
configuration info into the kernel.
Each controller has specific configuration parameters that differ from the
others. E.g. the CPU controller must assign a "weight" to each group to
share CPU time accordingly, but what is a "weight" for the memory controller?
IO may operate on "bandwidth", and it's not clear what a "bandwidth" in
Kb/sec means for the CPU controller, and so on.

[snip]
>> The question is - whether web server is multithreaded or not...
>> If it is not - then no problem here, you can change current
>> context and new resources will be charged accordingly.
>>
>> And current BC code is _able_ to handle it with _minor_ changes.
>> (One just need to save bc not on mm struct, but rather on vma struct
>> and change mm->bc on set_bc_id()).
>>
>> However, no one (can some one from CKRM team please?) explained so far
>> what to do with threads. Consider the following example.
>>
>> 1. Threaded web server spawns a child to serve a client.
>> 2. child thread touches some pages and they are charged to child BC
>> (which differs from parent's one)
>> 3. child exits, but since its mm is shared with parent, these pages
>> stay mapped and charged to child BC.
>>
>> So the question is: what to do with these pages?
>> - should we recharge them to another BC?
>> - leave them charged?
>>
>
> Leave them charged. It will be charged to the appropriate UBC when they
> touch it again.
>
Do you mean that page must be re-charged each time someone touches it?

Re: [ckrm-tech] [PATCH 11/13] BC: vmrss (preparations) [message #6078 is a reply to message #5933] Thu, 07 September 2006 16:28
Balbir Singh
Kirill Korotaev wrote:
> This patch does simple things:
> - introduces a bc_magic field on the beancounter to make sure the
> union on struct page is correctly used in the next patches
> - adds nr_beancounters
> - adds the unused_privvmpages variable (a counter of privvm pages
> which are not mapped into the VM address space and thus can potentially
> be allocated later)
>
> +static inline void privvm_uncharge(struct beancounter *bc, unsigned long sz)
> +{
> + if (unlikely(bc->unused_privvmpages < sz)) {
> + printk("BC: overuncharging %d unused pages: val %lu held %lu\n",
> + bc->bc_id, sz, bc->unused_privvmpages);

I hit this path when I do not enable CONFIG_BEANCOUNTERS_RSS. I suspect it has
something to do with the code in mod_rss_pages(), and that
CONFIG_BEANCOUNTERS_RSS needs to be enabled to get the accounting right.

In addition, could you please make this a warning with KERN_WARNING?

> + sz = bc->unused_privvmpages;
> + }
> + bc->unused_privvmpages -= sz;
> + bc_update_privvmpages(bc);
> +}
> +
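
A minimal user-space sketch of the clamping behaviour in the quoted privvm_uncharge() hunk, with the message sent to a warning channel as requested. The struct beancounter_model type and the privvm_uncharge_model() name are hypothetical stand-ins, fprintf(stderr, ...) stands in for printk(KERN_WARNING ...), and bc_update_privvmpages() is left out because its body is not shown in this hunk.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical user-space stand-in for the two beancounter fields used here. */
struct beancounter_model {
	int bc_id;
	unsigned long unused_privvmpages;
};

/*
 * Uncharge sz unused privvm pages. If more is requested than is held,
 * warn and clamp to what is held, so the counter never underflows.
 * Returns the number of pages actually uncharged.
 */
static unsigned long privvm_uncharge_model(struct beancounter_model *bc,
					   unsigned long sz)
{
	assert(bc != NULL);

	if (bc->unused_privvmpages < sz) {
		/* In the kernel this would be printk(KERN_WARNING "BC: ..."). */
		fprintf(stderr,
			"BC: overuncharging %d unused pages: val %lu held %lu\n",
			bc->bc_id, sz, bc->unused_privvmpages);
		sz = bc->unused_privvmpages;
	}
	bc->unused_privvmpages -= sz;
	return sz;
}
```

With 10 pages held, uncharging 4 leaves 6; a later request for 100 prints the warning, uncharges the remaining 6 and leaves the counter at 0.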
--

Balbir Singh,
Linux Technology Center,
IBM Software Labs

Re: [ckrm-tech] [PATCH] BC: resource beancounters (v4) (added user memory) [message #6083 is a reply to message #6043] Thu, 07 September 2006 19:16
Chandra Seetharaman
On Thu, 2006-09-07 at 11:29 +0400, Pavel Emelianov wrote:
> Chandra Seetharaman wrote:
>
> [snip]
> >>> Will we need new user/kernel interfaces for cpu, i/o bandwidth, etc...?
> >>>
> >> no. no new interfaces are required.
> >>
> >
> > Good to know that.
> >
> > Your CPU controller supports guarantee ?
> >
> It does, but CPU controller is not so simple as memory one.

Hmm... the reason I asked is that the UBC infrastructure doesn't provide
guarantee support, and Kirill mentioned that no changes to UBC are required
if you move your CPU controller under UBC.

From your reply it does look like you need to make some changes (add
guarantee support) to UBC if you want to move the CPU controller
under UBC.

> > Do you have a i/o controller ?
> >
> >
> >> BUT: I remind you the talks at OKS/OLS and in previous UBC discussions.
> >> It was noted that having a separate interfaces for CPU, I/O bandwidth
> >>
> >
> > But, it will be lot simpler for the user to configure/use if they are
> > together. We should discuss this also.
> >
> IMHO such unification may only imply that one syscall is used to pass
> configuration info into kernel.
> Each controller has specific configurating parameters different from the
> other ones. E.g. CPU controller must assign a "weight" to each group to
> share CPU time accordingly, but what is a "weight" for memory controller?
> IO may operate on "bandwidth" and it's not clear what is a "bandwidth" in
> Kb/sec for CPU controller and so on.
>
> [snip]
> >> The question is - whether web server is multithreaded or not...
> >> If it is not - then no problem here, you can change current
> >> context and new resources will be charged accordingly.
> >>
> >> And current BC code is _able_ to handle it with _minor_ changes.
> >> (One just need to save bc not on mm struct, but rather on vma struct
> >> and change mm->bc on set_bc_id()).
> >>
> >> However, no one (can some one from CKRM team please?) explained so far
> >> what to do with threads. Consider the following example.
> >>
> >> 1. Threaded web server spawns a child to serve a client.
> >> 2. child thread touches some pages and they are charged to child BC
> >> (which differs from parent's one)
> >> 3. child exits, but since its mm is shared with parent, these pages
> >> stay mapped and charged to child BC.
> >>
> >> So the question is: what to do with these pages?
> >> - should we recharge them to another BC?
> >> - leave them charged?
> >>
> >
> > Leave them charged. It will be charged to the appropriate UBC when they
> > touch it again.
> >
> Do you mean that page must be re-charged each time someone touches it?

What I meant is to leave them charged and, if they are later
unmapped and mapped again, charge them to the appropriate BC.

--

----------------------------------------------------------------------
Chandra Seetharaman | Be careful what you choose....
- sekharan@us.ibm.com | .......you may get it.
----------------------------------------------------------------------

Re: [ckrm-tech] [PATCH] BC: resource beancounters (v4) (added user memory) [message #6085 is a reply to message #6043] Thu, 07 September 2006 19:29
Chandra Seetharaman
On Thu, 2006-09-07 at 11:29 +0400, Pavel Emelianov wrote:
<snip>

> >> BUT: I remind you the talks at OKS/OLS and in previous UBC discussions.
> >> It was noted that having a separate interfaces for CPU, I/O bandwidth
> >>
> >
> > But, it will be lot simpler for the user to configure/use if they are
> > together. We should discuss this also.
> >
> IMHO such unification may only imply that one syscall is used to pass
> configuration info into kernel.
> Each controller has specific configurating parameters different from the
> other ones. E.g. CPU controller must assign a "weight" to each group to
> share CPU time accordingly, but what is a "weight" for memory controller?
> IO may operate on "bandwidth" and it's not clear what is a "bandwidth" in
> Kb/sec for CPU controller and so on.

CKRM/RG handles this by eliminating the units from the interface and
abstracting them as "shares". Each resource controller converts the
shares to its own units and handles them accordingly.

The user can specify the quantities simply as a percentage. The CPU controller
would treat it as cycles/ticks (within a time period), the memory controller
would treat it as a number of pages, the I/O controller would treat it as
bandwidth, and so on...
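
A rough sketch of the share-to-unit conversion described above, assuming the share is given as a percentage of a known system total. The names sys_totals, res_kind and share_to_units() are made up for illustration and are not taken from the CKRM/RG or BC code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical system-wide totals, one per resource, in native units. */
struct sys_totals {
	uint64_t cpu_ticks_per_period;	/* ticks in one accounting period */
	uint64_t mem_pages;		/* pages of memory */
	uint64_t io_kb_per_sec;		/* I/O bandwidth in KB/sec */
};

enum res_kind { RES_CPU, RES_MEM, RES_IO };

/*
 * Convert a unitless share (a percentage, 0..100) into the native unit
 * of the given controller, rounding down.
 */
static uint64_t share_to_units(const struct sys_totals *t,
			       enum res_kind kind, unsigned percent)
{
	uint64_t total;

	assert(percent <= 100);

	switch (kind) {
	case RES_CPU:
		total = t->cpu_ticks_per_period;
		break;
	case RES_MEM:
		total = t->mem_pages;
		break;
	case RES_IO:
		total = t->io_kb_per_sec;
		break;
	default:
		return 0;
	}
	/* Split the multiplication so it cannot overflow for large totals. */
	return total / 100 * percent + total % 100 * percent / 100;
}
```

The same "25" then means 250 ticks out of a 1000-tick period for the CPU controller, or a quarter of the pages for the memory controller; the interface itself never mentions a unit.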

<snip>
--

----------------------------------------------------------------------
Chandra Seetharaman | Be careful what you choose....
- sekharan@us.ibm.com | .......you may get it.
----------------------------------------------------------------------

Re: [ckrm-tech] [PATCH] BC: resource beancounters (v4) (added user memory) [message #6095 is a reply to message #6083] Fri, 08 September 2006 07:22
Pavel Emelianov
Chandra Seetharaman wrote:

[snip]
>>>> The question is - whether web server is multithreaded or not...
>>>> If it is not - then no problem here, you can change current
>>>> context and new resources will be charged accordingly.
>>>>
>>>> And current BC code is _able_ to handle it with _minor_ changes.
>>>> (One just need to save bc not on mm struct, but rather on vma struct
>>>> and change mm->bc on set_bc_id()).
>>>>
>>>> However, no one (can some one from CKRM team please?) explained so far
>>>> what to do with threads. Consider the following example.
>>>>
>>>> 1. Threaded web server spawns a child to serve a client.
>>>> 2. child thread touches some pages and they are charged to child BC
>>>> (which differs from parent's one)
>>>> 3. child exits, but since its mm is shared with parent, these pages
>>>> stay mapped and charged to child BC.
>>>>
>>>> So the question is: what to do with these pages?
>>>> - should we recharge them to another BC?
>>>> - leave them charged?
>>>>
>>>>
>>> Leave them charged. It will be charged to the appropriate UBC when they
>>> touch it again.
>>>
>>>
>> Do you mean that page must be re-charged each time someone touches it?
>>
>
> What I meant is that to leave them charged, and if when they are
> ummapped and mapped later, charge it to the appropriate BC.
>
In this case a multithreaded Apache that tries to serve each domain in a
separate BC will fill the memory with BCs that are held by pages allocated
and mapped in threads.
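
To make the concern concrete, here is a small model of a BC being pinned by the pages charged to it. It assumes each charged page holds one reference on its BC, which is an assumption for illustration and not something taken from the patch; bc_model, page_model and the helper names are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: a BC lives as long as anything holds a reference. */
struct bc_model {
	int id;
	unsigned refs;
	unsigned long charged_pages;
};

struct page_model {
	struct bc_model *owner;	/* BC the page is charged to, or NULL */
};

static void bc_get(struct bc_model *bc)
{
	bc->refs++;
}

/* Drop one reference; returns 1 when the BC could now be freed. */
static int bc_put(struct bc_model *bc)
{
	assert(bc->refs > 0);
	return --bc->refs == 0;
}

/* Charge an uncharged page to bc; the page pins the BC (assumption). */
static void page_charge(struct page_model *pg, struct bc_model *bc)
{
	assert(pg->owner == NULL);
	pg->owner = bc;
	bc->charged_pages++;
	bc_get(bc);
}

/* Uncharge on unmap; returns 1 if this dropped the last reference. */
static int page_uncharge(struct page_model *pg)
{
	struct bc_model *bc = pg->owner;

	assert(bc != NULL);
	pg->owner = NULL;
	bc->charged_pages--;
	return bc_put(bc);
}
```

In this model, a thread that takes its own reference, touches three pages and then exits leaves the BC with three references and three charged pages. The BC can only go away once the shared mm unmaps all of them, so one short-lived BC per request accumulates.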

Re: [ckrm-tech] [PATCH] BC: resource beancounters (v4) (added user memory) [message #6096 is a reply to message #6085] Fri, 08 September 2006 07:26
Pavel Emelianov
Chandra Seetharaman wrote:
> On Thu, 2006-09-07 at 11:29 +0400, Pavel Emelianov wrote:
> <snip>
>
>
>>>> BUT: I remind you the talks at OKS/OLS and in previous UBC discussions.
>>>> It was noted that having a separate interfaces for CPU, I/O bandwidth
>>>>
>>>>
>>> But, it will be lot simpler for the user to configure/use if they are
>>> together. We should discuss this also.
>>>
>>>
>> IMHO such unification may only imply that one syscall is used to pass
>> configuration info into kernel.
>> Each controller has specific configurating parameters different from the
>> other ones. E.g. CPU controller must assign a "weight" to each group to
>> share CPU time accordingly, but what is a "weight" for memory controller?
>> IO may operate on "bandwidth" and it's not clear what is a "bandwidth" in
>> Kb/sec for CPU controller and so on.
>>
>
> CKRM/RG handles this by eliminating the units from the interface and
> abstracting them to be "shares". Each resource controller converts the
> shares to its own units and handles properly.
>
That's what I'm talking about: a common syscall/ioctl/etc., with each controller
parsing its input itself. That's OK for us.

[snip]

Re: [ckrm-tech] [PATCH] BC: resource beancounters (v4) (added user memory) [message #6098 is a reply to message #6025] Fri, 08 September 2006 07:33
Pavel Emelianov
Chandra Seetharaman wrote:
> On Thu, 2006-09-07 at 00:47 +0530, Balbir Singh wrote:
>
> <snip>
>> Some not quite so urgent ones - like support for guarantees. I think
>> this can
>
> IMO, guarantee support should be considered to be part of the
> infrastructure. Controller functionalities/implementation will be
> different with/without guarantee support. In other words, adding
> guarantee feature later will cause re-implementations.
I'm afraid we have different understandings of what a "guarantee" is.
Don't we?
Guarantee may be one of

1. container will be able to touch that number of pages
2. container will be able to sys_mmap() that number of pages
3. container will not be killed unless it touches that number of pages
4. anything else

Let's decide what kind of a guarantee we want.
>> be worked out as we make progress.
>>
>>> I agree with these requirements and lets move into this direction.
>>> But moving so far can't be done without accepting:
>>> 1. core functionality
>>> 2. accounting
>>>
>> Some of the core functionality might be a limiting factor for the requirements.
>> Lets agree on the requirements, I think its a great step forward and then
>> build the core functionality with these requirements in mind.
>>
>>> Thanks,
>>> Kirill
>>>

Re: [ckrm-tech] [PATCH] BC: resource beancounters (v4) (added user memory) [message #6118 is a reply to message #5992] Fri, 08 September 2006 15:33
Dave Hansen
On Wed, 2006-09-06 at 17:06 +0400, Kirill Korotaev wrote:
> It was discussed multiple times already.
> The key problem here is the objects which do not _belong_ to tasks.

Heh. The original CKRM patches didn't have a strong binding to tasks.
They took it away to make them more mergeable. ;)

-- Dave

Re: [ckrm-tech] [PATCH] BC: resource beancounters (v4) (added user memory) [message #6120 is a reply to message #6098] Fri, 08 September 2006 15:43
Dave Hansen
On Fri, 2006-09-08 at 11:33 +0400, Pavel Emelianov wrote:
> I'm afraid we have different understandings of what a "guarantee" is.

It appears so.

> Don't we?
> Guarantee may be one of
>
> 1. container will be able to touch that number of pages
> 2. container will be able to sys_mmap() that number of pages
> 3. container will not be killed unless it touches that number of pages

A "death sentence" guarantee? I like it. :)

> 4. anything else
>
> Let's decide what kind of a guarantee we want.

I think of it as: "I will be allowed to use this many total pages, and
they are guaranteed not to fail." (1), I think. The sum of all of the
system's guarantees must be less than or equal to the amount of free
memory on the machine.

If we knew to which NUMA node the memory was going to go, we might as
well take the pages out of the allocator.

-- Dave

Re: [ckrm-tech] [PATCH] BC: resource beancounters (v4) (added user memory) [message #6122 is a reply to message #5954] Fri, 08 September 2006 15:30
Dave Hansen
On Tue, 2006-09-05 at 17:17 -0700, Rohit Seth wrote:
> I'm wondering why not have different processes to serve different
> domains on the same physical server...particularly when they have
> different database to work on.

This is largely because that is, I think, how it is done today, and it has
a lot of disadvantages. They also want to be able to account for
traffic on the same database. Think of a large web hosting environment
where you charged everyone (hundreds or thousands of users) by CPU and
I/O bandwidth used at all levels of a given transaction.

> Is the amount of memory that you save by
> having a single copy that much useful that you are even okay to
> serialize the whole operation (What would happen, while the request for
> foo.com is getting worked on, there is another request for
> foo_bar.com...does it need to wait for foo.com request to get done
> before it can be served).

Let's put it this way. Enterprise databases can be memory pigs. It
isn't feasible to run hundreds or thousands of copies on each machine.

-- Dave

Re: [ckrm-tech] [PATCH] BC: resource beancounters (v4) (added user memory) [message #6125 is a reply to message #6122] Fri, 08 September 2006 17:10
Rohit Seth
On Fri, 2006-09-08 at 08:30 -0700, Dave Hansen wrote:
> On Tue, 2006-09-05 at 17:17 -0700, Rohit Seth wrote:
> > I'm wondering why not have different processes to serve different
> > domains on the same physical server...particularly when they have
> > different database to work on.
>
> This is largely because this is I think how it is done today, and it has
> a lot of disadvantages.

If it has a lot of disadvantages then we should try to avoid that
mechanism. Though I think it is okay to allow processes to be moved
around, with the clear expectation that it is a very heavy operation (as
I think at least all the anon pages should be moved along with the task)
and should not generally be done.

> They also want to be able to account for
> traffic on the same database. Think of a large web hosting environment
> where you charged everyone (hundreds or thousands of users) by CPU and
> I/O bandwidth used at all levels of a given transaction.
>
> > Is the amount of memory that you save by
> > having a single copy that much useful that you are even okay to
> > serialize the whole operation (What would happen, while the request for
> > foo.com is getting worked on, there is another request for
> > foo_bar.com...does it need to wait for foo.com request to get done
> > before it can be served).
>
> Let's put it this way. Enterprise databases can be memory pigs. It
> isn't feasible to run hundreds or thousands of copies on each machine.
>


The extra cost is probably the stack and private data segment... yes,
there could be trade-offs there depending on how big these segments are.
Though if there are big shared segments, they can be charged to a
single container.

Thanks,
-rohit

Re: [ckrm-tech] [PATCH] BC: resource beancounters (v4) (added user memory) [message #6131 is a reply to message #6120] Fri, 08 September 2006 18:26
Balbir Singh
Dave Hansen wrote:
> On Fri, 2006-09-08 at 11:33 +0400, Pavel Emelianov wrote:
>> I'm afraid we have different understandings of what a "guarantee" is.
>
> It appears so.
>
>> Don't we?
>> Guarantee may be one of
>>
>> 1. container will be able to touch that number of pages
>> 2. container will be able to sys_mmap() that number of pages
>> 3. container will not be killed unless it touches that number of pages
>
> A "death sentence" guarantee? I like it. :)
>
>> 4. anything else
>>
>> Let's decide what kind of a guarantee we want.

I think of guarantees w.r.t resources as the lower limit on the resource.
Guarantees and limits can be thought of as the range (guarantee, limit]
for the usage of the resource.
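
One possible reading of that range, as a sketch: usage up to the guarantee is protected, usage between the guarantee and the limit is allowed but not protected, and anything above the limit is refused. The zone names and classify_usage() are hypothetical, and what each zone implies for reclaim is an interpretation, not something settled in this thread.

```c
#include <assert.h>

/* Hypothetical zones for a container's usage of one resource. */
enum usage_zone {
	ZONE_GUARANTEED,	/* usage <= guarantee: must always succeed */
	ZONE_BEST_EFFORT,	/* guarantee < usage <= limit: may be reclaimed */
	ZONE_OVER_LIMIT		/* usage > limit: charge must be refused */
};

/* Classify a usage value against the (guarantee, limit] range. */
static enum usage_zone classify_usage(unsigned long usage,
				      unsigned long guarantee,
				      unsigned long limit)
{
	assert(guarantee <= limit);

	if (usage <= guarantee)
		return ZONE_GUARANTEED;
	if (usage <= limit)
		return ZONE_BEST_EFFORT;
	return ZONE_OVER_LIMIT;
}
```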

>
> I think of it as: "I will be allowed to use this many total pages, and
> they are guaranteed not to fail." (1), I think. The sum of all of the
> system's guarantees must be less than or equal to the amount of free
> memory on the machine.
>

Yes, totally agree.

> If we knew to which NUMA node the memory was going to go, we might as
> well take the pages out of the allocator.
>
> -- Dave
>
--

Balbir Singh,
Linux Technology Center,
IBM Software Labs

Re: [ckrm-tech] [PATCH] BC: resource beancounters (v4) (added user memory) [message #6132 is a reply to message #6095] Fri, 08 September 2006 19:07
Chandra Seetharaman
On Fri, 2006-09-08 at 11:22 +0400, Pavel Emelianov wrote:
> Chandra Seetharaman wrote:
>
> [snip]
> >>>> The question is - whether web server is multithreaded or not...
> >>>> If it is not - then no problem here, you can change current
> >>>> context and new resources will be charged accordingly.
> >>>>
> >>>> And current BC code is _able_ to handle it with _minor_ changes.
> >>>> (One just need to save bc not on mm struct, but rather on vma struct
> >>>> and change mm->bc on set_bc_id()).
> >>>>
> >>>> However, no one (can some one from CKRM team please?) explained so far
> >>>> what to do with threads. Consider the following example.
> >>>>
> >>>> 1. Threaded web server spawns a child to serve a client.
> >>>> 2. child thread touches some pages and they are charged to child BC
> >>>> (which differs from parent's one)
> >>>> 3. child exits, but since its mm is shared with parent, these pages
> >>>> stay mapped and charged to child BC.
> >>>>
> >>>> So the question is: what to do with these pages?
> >>>> - should we recharge them to another BC?
> >>>> - leave them charged?
> >>>>
> >>>>
> >>> Leave them charged. It will be charged to the appropriate UBC when they
> >>> touch it again.
> >>>
> >>>
> >> Do you mean that page must be re-charged each time someone touches it?
> >>
> >
> > What I meant is that to leave them charged, and if when they are
> > ummapped and mapped later, charge it to the appropriate BC.
> >
> In this case multithreaded apache that tries to serve each domain in
> separate BC will fill the memory with BC-s, held by pages allocated
> and mapped in threads.

I do not understand how the memory will be filled with BCs. Can you
explain, please?
--

----------------------------------------------------------------------
Chandra Seetharaman | Be careful what you choose....
- sekharan@us.ibm.com | .......you may get it.
----------------------------------------------------------------------

Re: [ckrm-tech] [PATCH] BC: resource beancounters (v4) (added user memory) [message #6133 is a reply to message #6096] Fri, 08 September 2006 19:10
Chandra Seetharaman
On Fri, 2006-09-08 at 11:26 +0400, Pavel Emelianov wrote:
> Chandra Seetharaman wrote:
> > On Thu, 2006-09-07 at 11:29 +0400, Pavel Emelianov wrote:
> > <snip>
> >
> >
> >>>> BUT: I remind you the talks at OKS/OLS and in previous UBC discussions.
> >>>> It was noted that having a separate interfaces for CPU, I/O bandwidth
> >>>>
> >>>>
> >>> But, it will be lot simpler for the user to configure/use if they are
> >>> together. We should discuss this also.
> >>>
> >>>
> >> IMHO such unification may only imply that one syscall is used to pass
> >> configuration info into kernel.
> >> Each controller has specific configurating parameters different from the
> >> other ones. E.g. CPU controller must assign a "weight" to each group to
> >> share CPU time accordingly, but what is a "weight" for memory controller?
> >> IO may operate on "bandwidth" and it's not clear what is a "bandwidth" in
> >> Kb/sec for CPU controller and so on.
> >>
> >
> > CKRM/RG handles this by eliminating the units from the interface and
> > abstracting them to be "shares". Each resource controller converts the
> > shares to its own units and handles properly.
> >
> That's what I'm talking about - common syscall/ioct/etc and each controller
> parses its input itself. That's OK for us.

Yes, we can eliminate the "units" (KBs, cycles/ticks, pages, etc.) from
the interface and use a (unitless) number to specify the amount of
resource a resource group/container uses.
>
> [snip]
>
> -------------------------------------------------------------------------
> Using Tomcat but need to do more? Need to support web services, security?
> Get stuff done quickly with pre-integrated technology to make your job easier
> Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo
> http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
> _______________________________________________
> ckrm-tech mailing list
> https://lists.sourceforge.net/lists/listinfo/ckrm-tech
--

----------------------------------------------------------------------
Chandra Seetharaman | Be careful what you choose....
- sekharan@us.ibm.com | .......you may get it.
----------------------------------------------------------------------

Re: [ckrm-tech] [PATCH] BC: resource beancounters (v4) (added user memory) [message #6134 is a reply to message #6098] Fri, 08 September 2006 19:23
Chandra Seetharaman
On Fri, 2006-09-08 at 11:33 +0400, Pavel Emelianov wrote:
> Chandra Seetharaman wrote:
> > On Thu, 2006-09-07 at 00:47 +0530, Balbir Singh wrote:
> >
> > <snip>
> >> Some not quite so urgent ones - like support for guarantees. I think
> >> this can
> >
> > IMO, guarantee support should be considered to be part of the
> > infrastructure. Controller functionalities/implementation will be
> > different with/without guarantee support. In other words, adding
> > guarantee feature later will cause re-implementations.
> I'm afraid we have different understandings of what a "guarantee" is.
> Don't we?

Maybe (I am not sure :), let's get it clarified.

> Guarantee may be one of
>
> 1. container will be able to touch that number of pages
> 2. container will be able to sys_mmap() that number of pages
> 3. container will not be killed unless it touches that number of pages
> 4. anything else

I would say (1) with slight modification
"container will be able to touch _at least_ that number of pages"

Note that this does not apply to memory alone; it is generic
across resources.

For CPU it will be, "container will get _at least_ X ticks in Y seconds"

For number of tasks it will be, "container will get _at least_ X active
tasks at any point of time" and so on.

And as Dave pointed out, the sum of the guarantees of all containers _cannot_
exceed the total amount of that resource available at the system level.
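
The sum constraint can be sketched as an admission check run before a new guarantee is accepted. This is only an illustration of the rule stated above; container_model and guarantee_admissible() are hypothetical names, and the quantities are in whatever unit the resource uses (pages, ticks, tasks).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-container settings for one resource. */
struct container_model {
	unsigned long guarantee;
	unsigned long limit;
};

/*
 * Returns 1 if a new container asking for new_guarantee can be admitted
 * without the sum of all guarantees exceeding total, 0 otherwise.
 * The running sum is compared against the remaining headroom so the
 * addition cannot overflow.
 */
static int guarantee_admissible(const struct container_model *c, size_t n,
				unsigned long new_guarantee,
				unsigned long total)
{
	unsigned long sum = 0;
	size_t i;

	assert(c != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (c[i].guarantee > total - sum)
			return 0;	/* existing set is already overcommitted */
		sum += c[i].guarantee;
	}
	return new_guarantee <= total - sum;
}
```

With existing guarantees of 300 and 500 out of a total of 1000, a request for 200 fits exactly and a request for 201 is refused.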

>
> Let's decide what kind of a guarantee we want.
> >> be worked out as we make progress.
> >>
> >>> I agree with these requirements and lets move into this direction.
> >>> But moving so far can't be done without accepting:
> >>> 1. core functionality
> >>> 2. accounting
> >>>
> >> Some of the core functionality might be a limiting factor for the requirements.
> >> Lets agree on the requirements, I think its a great step forward and then
> >> build the core functionality with these requirements in mind.
> >>
> >>> Thanks,
> >>> Kirill
> >>>
--

----------------------------------------------------------------------
Chandra Seetharaman | Be careful what you choose....
- sekharan@us.ibm.com | .......you may get it.
----------------------------------------------------------------------

Re: [ckrm-tech] [PATCH] BC: resource beancounters (v4) (added user memory) [message #6136 is a reply to message #6134] Fri, 08 September 2006 21:43
Rohit Seth
On Fri, 2006-09-08 at 12:23 -0700, Chandra Seetharaman wrote:
> On Fri, 2006-09-08 at 11:33 +0400, Pavel Emelianov wrote:
> > Chandra Seetharaman wrote:
> > > On Thu, 2006-09-07 at 00:47 +0530, Balbir Singh wrote:
> > >
> > > <snip>
> > >> Some not quite so urgent ones - like support for guarantees. I think
> > >> this can
> > >
> > > IMO, guarantee support should be considered to be part of the
> > > infrastructure. Controller functionalities/implementation will be
> > > different with/without guarantee support. In other words, adding
> > > guarantee feature later will cause re-implementations.
> > I'm afraid we have different understandings of what a "guarantee" is.
> > Don't we?
>
> may be (I am not sure :), lets get it clarified.
>
> > Guarantee may be one of
> >
> > 1. container will be able to touch that number of pages
> > 2. container will be able to sys_mmap() that number of pages
> > 3. container will not be killed unless it touches that number of pages
> > 4. anything else
>
> I would say (1) with slight modification
> "container will be able to touch _at least_ that number of pages"
>

Does this scheme support running tasks outside of containers on the
same platform where you have tasks running inside containers? If so,
how will you ensure that processes running outside any container do
not leave less than the total guaranteed memory for the different containers?



-rohit

Re: [ckrm-tech] [PATCH] BC: resource beancounters (v4) (added user memory) [message #6140 is a reply to message #6125] Fri, 08 September 2006 17:26
Shailabh Nagar
Rohit Seth wrote:
> On Fri, 2006-09-08 at 08:30 -0700, Dave Hansen wrote:
>> On Tue, 2006-09-05 at 17:17 -0700, Rohit Seth wrote:
>>> I'm wondering why not have different processes to serve different
>>> domains on the same physical server...particularly when they have
>>> different database to work on.
>> This is largely because this is I think how it is done today, and it has
>> a lot of disadvantages.
>
> If it has lot of disadvantages then we should try to avoid that
> mechanism. Though I think it is okay to allow processes to be moved
> around with the clear expectation that it is a very heavy operation (as
> I think at least all the anon pages should be moved too along with task)
> and should not be generally done.
>
>> They also want to be able to account for
>> traffic on the same database. Think of a large web hosting environment
>> where you charged everyone (hundreds or thousands of users) by CPU and
>> I/O bandwidth used at all levels of a given transaction.
>>
>>> Is the amount of memory that you save by
>>> having a single copy that much useful that you are even okay to
>>> serialize the whole operation (What would happen, while the request for
>>> foo.com is getting worked on, there is another request for
>>> foo_bar.com...does it need to wait for foo.com request to get done
>>> before it can be served).
>> Let's put it this way. Enterprise databases can be memory pigs. It
>> isn't feasible to run hundreds or thousands of copies on each machine.
>>
>
>
> The extra cost is probably the stack and private data segment...

Also maintainability, licensing, blah, blah.
Replicating the software stack for each service level one
wishes to provide, if avoidable as it seems to be, isn't such a good idea.
It is the same sort of reasoning as for why containers make sense compared to
Xen/VMWare instances.

Memory resources, by their very nature, will be tougher to account for when a
single database/app server services multiple clients, and we can essentially
give up on that (taking the approach that only limited recharging can ever
be achieved). But CPU at least is easy to charge correctly, and since that will
also indirectly influence the requests for memory & I/O, it's useful to allow
middleware to change the accounting base for a thread/task.

--Shailabh

> yes
> there could be trade offs there depending on how big these segments are.
> Though if there are big shared segments then that can be charged to a
> single container.

>
> Thanks,
> -rohit
>
>
> ------------------------------------------------------------ -------------
> Using Tomcat but need to do more? Need to support web services, security?
> Get stuff done quickly with pre-integrated technology to make your job easier
> Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo
> http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&b id=263057&dat=121642
> _______________________________________________
> ckrm-tech mailing list
> https://lists.sourceforge.net/lists/listinfo/ckrm-tech
Re: [ckrm-tech] [PATCH] BC: resource beancounters (v4) (added user memory) [message #6163 is a reply to message #6131] Mon, 11 September 2006 06:56
Pavel Emelianov
Messages: 1149
Registered: September 2006
Senior Member
Balbir Singh wrote:
> Dave Hansen wrote:
>> On Fri, 2006-09-08 at 11:33 +0400, Pavel Emelianov wrote:
>>> I'm afraid we have different understandings of what a "guarantee" is.
>>
>> It appears so.
>>
>>> Don't we?
>>> Guarantee may be one of
>>>
>>> 1. container will be able to touch that number of pages
>>> 2. container will be able to sys_mmap() that number of pages
>>> 3. container will not be killed unless it touches that number of
>>> pages
>>
>> A "death sentence" guarantee? I like it. :)
>>
>>> 4. anything else
>>>
>>> Let's decide what kind of a guarantee we want.
>
> I think of guarantees w.r.t resources as the lower limit on the resource.
> Guarantees and limits can be thought of as the range (guarantee, limit]
> for the usage of the resource.
>
>>
>> I think of it as: "I will be allowed to use this many total pages, and
>> they are guaranteed not to fail." (1), I think. The sum of all of the
>> system's guarantees must be less than or equal to the amount of free
>> memory on the machine.
>
> Yes, totally agree.

Such a guarantee is really a limit, and this limit is even harder than
BC's one :)

E.g. I have a node with 1Gb of RAM and 10 containers with a 100Mb
guarantee each.
I want to start one more. What should I do so as not to break the guarantees?

>
>> If we knew to which NUMA node the memory was going to go, we might as
>> well take the pages out of the allocator.
>>
>> -- Dave
>>
Re: [ckrm-tech] [PATCH] BC: resource beancounters (v4) (added user memory) [message #6165 is a reply to message #6132] Mon, 11 September 2006 07:02
Pavel Emelianov
Messages: 1149
Registered: September 2006
Senior Member
Chandra Seetharaman wrote:
> On Fri, 2006-09-08 at 11:22 +0400, Pavel Emelianov wrote:
>
>> Chandra Seetharaman wrote:
>>
>> [snip]
>>
>>>>>> The question is - whether web server is multithreaded or not...
>>>>>> If it is not - then no problem here, you can change current
>>>>>> context and new resources will be charged accordingly.
>>>>>>
>>>>>> And current BC code is _able_ to handle it with _minor_ changes.
>>>>>> (One just need to save bc not on mm struct, but rather on vma struct
>>>>>> and change mm->bc on set_bc_id()).
>>>>>>
>>>>>> However, no one (can some one from CKRM team please?) explained so far
>>>>>> what to do with threads. Consider the following example.
>>>>>>
>>>>>> 1. Threaded web server spawns a child to serve a client.
>>>>>> 2. child thread touches some pages and they are charged to child BC
>>>>>> (which differs from parent's one)
>>>>>> 3. child exits, but since its mm is shared with parent, these pages
>>>>>> stay mapped and charged to child BC.
>>>>>>
>>>>>> So the question is: what to do with these pages?
>>>>>> - should we recharge them to another BC?
>>>>>> - leave them charged?
>>>>>>
>>>>>>
>>>>>>
>>>>> Leave them charged. It will be charged to the appropriate UBC when they
>>>>> touch it again.
>>>>>
>>>>>
>>>>>
>>>> Do you mean that page must be re-charged each time someone touches it?
>>>>
>>>>
>>> What I meant is that to leave them charged, and if when they are
>>> ummapped and mapped later, charge it to the appropriate BC.
>>>
>>>
>> In this case multithreaded apache that tries to serve each domain in
>> separate BC will fill the memory with BC-s, held by pages allocated
>> and mapped in threads.
>>
>
> I do not understand how the memory will be filled with BCs. Can you
> explain, please.
>
Sure. At the beginning I have one task with one BC. Then
1. A thread is spawned and a new BC is created;
2. The new thread touches a new page (e.g. maps a new file), which is
   charged to the new BC (and this means that this BC must stay in memory
   until the page is uncharged);
3. The thread exits after serving the request, but since its mm is shared
   with the parent, all the touched pages stay resident and, thus, the new
   BC is still pinned in memory.
Steps 1-3 are done multiple times for new pages (new files).
Remember that we're discussing the case when pages are not recharged.
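
The scenario in steps 1-3 can be sketched as a toy model. This is not kernel code: the `BC` class, `serve_requests()` and the `recharge_on_exit` switch are hypothetical names made up for illustration, and the model assumes exactly one new BC and one new page per request, as in the steps above.

```python
class BC:
    """Hypothetical beancounter: only tracks how many pages are charged to it."""

    def __init__(self, bc_id):
        self.bc_id = bc_id
        self.charged_pages = 0


def serve_requests(n_requests, recharge_on_exit):
    """Return how many BCs are still pinned in memory after n_requests."""
    parent = BC(0)
    live_bcs = {0: parent}   # BCs that cannot be freed yet
    shared_mm = {}           # page number -> BC the page is charged to

    for req in range(1, n_requests + 1):
        bc = BC(req)                 # step 1: new thread, new BC
        live_bcs[req] = bc
        shared_mm[req] = bc          # step 2: thread touches a new page
        bc.charged_pages += 1

        # step 3: the thread exits, but the mm is shared with the parent,
        # so the page stays mapped
        if recharge_on_exit:
            bc.charged_pages -= 1
            parent.charged_pages += 1
            shared_mm[req] = parent
        if bc.charged_pages == 0:
            del live_bcs[req]        # nothing pins this BC any more

    return len(live_bcs)
```

Without recharging, the number of pinned BCs grows with the number of requests served; with recharging on exit only the parent's BC remains.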
Re: [ckrm-tech] [PATCH] BC: resource beancounters (v4) (added user memory) [message #6169 is a reply to message #6163] Mon, 11 September 2006 07:54
Balbir Singh
Messages: 491
Registered: August 2006
Senior Member
Pavel Emelianov wrote:
> Balbir Singh wrote:
>> Dave Hansen wrote:
>>> On Fri, 2006-09-08 at 11:33 +0400, Pavel Emelianov wrote:
>>>> I'm afraid we have different understandings of what a "guarantee" is.
>>> It appears so.
>>>
>>>> Don't we?
>>>> Guarantee may be one of
>>>>
>>>> 1. container will be able to touch that number of pages
>>>> 2. container will be able to sys_mmap() that number of pages
>>>> 3. container will not be killed unless it touches that number of
>>>> pages
>>> A "death sentence" guarantee? I like it. :)
>>>
>>>> 4. anything else
>>>>
>>>> Let's decide what kind of a guarantee we want.
>> I think of guarantees w.r.t resources as the lower limit on the resource.
>> Guarantees and limits can be thought of as the range (guarantee, limit]
>> for the usage of the resource.
>>
>>> I think of it as: "I will be allowed to use this many total pages, and
>>> they are guaranteed not to fail." (1), I think. The sum of all of the
>>> system's guarantees must be less than or equal to the amount of free
>>> memory on the machine.
>> Yes, totally agree.
>
> Such a guarantee is really a limit and this limit is even harder than
> BC's one :)
>
> E.g. I have a node with 1Gb of ram and 10 containers with 100Mb
> guarantee each.
> I want to start one more. What shall I do not to break guarantees?

Don't start the new container or change the guarantees of the existing ones
to accommodate this one :) The QoS design (done by the administrator) should
take care of such use-cases. It would be perfectly ok to have a container
that does not care about guarantees to set their guarantee to 0 and set
their limit to the desired value. As Chandra has been stating we need two
parameters (guarantee, limit), either can be optional, but not both.
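
A minimal sketch of the two options above, reading "1Gb" as 1000 MB for round numbers. The function names are hypothetical, and scaling the existing guarantees down proportionally is only one assumed way of "changing the guarantees"; the administrator could just as well pick new values by hand.

```python
def can_admit(total_mb, guarantees_mb, new_guarantee_mb):
    """True if the new guarantee fits without breaking the existing ones."""
    return sum(guarantees_mb) + new_guarantee_mb <= total_mb


def shrink_to_fit(total_mb, guarantees_mb, new_guarantee_mb):
    """Scale existing guarantees down proportionally so the new one fits.

    Proportional scaling is an assumption made for this sketch.
    """
    if new_guarantee_mb > total_mb:
        raise ValueError("new guarantee exceeds total memory")
    room = total_mb - new_guarantee_mb
    current = sum(guarantees_mb)
    if current <= room:
        return list(guarantees_mb)   # already fits, nothing to change
    return [g * room / current for g in guarantees_mb]
```

With ten containers guaranteed 100 MB each on a 1000 MB node, an eleventh container asking for a 100 MB guarantee is refused by `can_admit()`, while one with a guarantee of 0 is accepted.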


>
>>> If we knew to which NUMA node the memory was going to go, we might as
>>> well take the pages out of the allocator.
>>>
>>> -- Dave
>>>


--

Balbir Singh,
Linux Technology Center,
IBM Software Labs
Re: [ckrm-tech] [PATCH] BC: resource beancounters (v4) (added user memory) [message #6170 is a reply to message #6169] Mon, 11 September 2006 08:13
Pavel Emelianov
Messages: 1149
Registered: September 2006
Senior Member
Balbir Singh wrote:
> Pavel Emelianov wrote:
>> Balbir Singh wrote:
>>> Dave Hansen wrote:
>>>> On Fri, 2006-09-08 at 11:33 +0400, Pavel Emelianov wrote:
>>>>> I'm afraid we have different understandings of what a "guarantee" is.
>>>> It appears so.
>>>>
>>>>> Don't we?
>>>>> Guarantee may be one of
>>>>>
>>>>> 1. container will be able to touch that number of pages
>>>>> 2. container will be able to sys_mmap() that number of pages
>>>>> 3. container will not be killed unless it touches that number of
>>>>> pages
>>>> A "death sentence" guarantee? I like it. :)
>>>>
>>>>> 4. anything else
>>>>>
>>>>> Let's decide what kind of a guarantee we want.
>>> I think of guarantees w.r.t resources as the lower limit on the
>>> resource.
>>> Guarantees and limits can be thought of as the range (guarantee, limit]
>>> for the usage of the resource.
>>>
>>>> I think of it as: "I will be allowed to use this many total pages, and
>>>> they are guaranteed not to fail." (1), I think. The sum of all of
>>>> the
>>>> system's guarantees must be less than or equal to the amount of free
>>>> memory on the machine.
>>> Yes, totally agree.
>>
>> Such a guarantee is really a limit and this limit is even harder than
>> BC's one :)
>>
>> E.g. I have a node with 1Gb of ram and 10 containers with 100Mb
>> guarantee each.
>> I want to start one more. What shall I do not to break guarantees?
>
> Don't start the new container or change the guarantees of the existing
> ones
> to accommodate this one :) The QoS design (done by the administrator)
> should
> take care of such use-cases. It would be perfectly ok to have a container
> that does not care about guarantees to set their guarantee to 0 and set
> their limit to the desired value. As Chandra has been stating we need two
> parameters (guarantee, limit), either can be optional, but not both.
If I set up 9 groups to have a 100Mb limit each, then I have exactly 100Mb
assured (on a 1Gb node) for the 10th one. And I do not have to set up any
guarantee, as it won't affect anything. So what is a guarantee parameter
needed for?
Re: [ckrm-tech] [PATCH] BC: resource beancounters (v4) (added user memory) [message #6172 is a reply to message #6170] Mon, 11 September 2006 08:19
Balbir Singh
Messages: 491
Registered: August 2006
Senior Member
Pavel Emelianov wrote:
> Balbir Singh wrote:
>> Pavel Emelianov wrote:
>>> Balbir Singh wrote:
>>>> Dave Hansen wrote:
>>>>> On Fri, 2006-09-08 at 11:33 +0400, Pavel Emelianov wrote:
>>>>>> I'm afraid we have different understandings of what a "guarantee" is.
>>>>> It appears so.
>>>>>
>>>>>> Don't we?
>>>>>> Guarantee may be one of
>>>>>>
>>>>>> 1. container will be able to touch that number of pages
>>>>>> 2. container will be able to sys_mmap() that number of pages
>>>>>> 3. container will not be killed unless it touches that number of
>>>>>> pages
>>>>> A "death sentence" guarantee? I like it. :)
>>>>>
>>>>>> 4. anything else
>>>>>>
>>>>>> Let's decide what kind of a guarantee we want.
>>>> I think of guarantees w.r.t resources as the lower limit on the
>>>> resource.
>>>> Guarantees and limits can be thought of as the range (guarantee, limit]
>>>> for the usage of the resource.
>>>>
>>>>> I think of it as: "I will be allowed to use this many total pages, and
>>>>> they are guaranteed not to fail." (1), I think. The sum of all of
>>>>> the
>>>>> system's guarantees must be less than or equal to the amount of free
>>>>> memory on the machine.
>>>> Yes, totally agree.
>>> Such a guarantee is really a limit and this limit is even harder than
>>> BC's one :)
>>>
>>> E.g. I have a node with 1Gb of ram and 10 containers with 100Mb
>>> guarantee each.
>>> I want to start one more. What shall I do not to break guarantees?
>> Don't start the new container or change the guarantees of the existing
>> ones
>> to accommodate this one :) The QoS design (done by the administrator)
>> should
>> take care of such use-cases. It would be perfectly ok to have a container
>> that does not care about guarantees to set their guarantee to 0 and set
>> their limit to the desired value. As Chandra has been stating we need two
>> parameters (guarantee, limit), either can be optional, but not both.
> If I set up 9 groups to have 100Mb limit then I have 100Mb assured (on
> 1Gb node)
> for the 10th one exactly. And I do not have to set up any guarantee as
> it won't affect
> anything. So what a guarantee parameter is needed for?

This use case works well for providing a guarantee to one container. What if
I want guarantees of 100Mb and 200Mb for two containers? How do I set up
the system using limits?

Even if I restrict everyone else to 700Mb, I cannot be sure that
the remaining 300Mb will be distributed as 100Mb and 200Mb.
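
The point can be checked numerically. The sketch below assumes a limit-only model on a 1000 MB node, where the only memory a container can count on is what remains after every other container has filled its limit; `implied_guarantee()` and the container names are hypothetical.

```python
def implied_guarantee(total_mb, limits_mb, name):
    """Memory left for `name` if every other container fills its limit."""
    others = sum(limit for key, limit in limits_mb.items() if key != name)
    return max(0, total_mb - others)


# Everyone else is capped at 700 MB; A and B may each grow to 300 MB.
loose = {"others": 700, "A": 300, "B": 300}

# A and B are themselves hard-capped at the values they should be guaranteed.
tight = {"others": 700, "A": 100, "B": 200}
```

In the `loose` setup neither A nor B is assured anything, since the other one may take the whole remaining 300 MB. The `tight` setup does assure 100 MB and 200 MB, but only because each guarantee is then also a hard limit.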


--

Balbir Singh,
Linux Technology Center,
IBM Software Labs
Re: [ckrm-tech] [PATCH] BC: resource beancounters (v4) (added user memory) [message #6177 is a reply to message #6170] Mon, 11 September 2006 10:21
Srivatsa Vaddagiri
Messages: 241
Registered: August 2006
Senior Member
On Mon, Sep 11, 2006 at 12:13:59PM +0400, Pavel Emelianov wrote:
> If I set up 9 groups to have 100Mb limit then I have 100Mb assured (on
> 1Gb node)
> for the 10th one exactly. And I do not have to set up any guarantee as
> it won't affect
> anything. So what a guarantee parameter is needed for?

I presume you are talking of hard-limiting each group to 100 MB here. In
which case, won't the 100 MB (reserved for the 10th group) be unutilized
until the 10th group is started (it may never be started for that matter!)?

IMO it would be better to go and use that free 100 MB for reclaimable memory
and give that up when the 10th group is started.


--
Regards,
vatsa
Re: [ckrm-tech] [PATCH] BC: resource beancounters (v4) (added user memory) [message #6188 is a reply to message #6165] Mon, 11 September 2006 13:04
Srivatsa Vaddagiri
Messages: 241
Registered: August 2006
Senior Member
On Mon, Sep 11, 2006 at 11:02:06AM +0400, Pavel Emelianov wrote:
> Sure. At the beginning I have one task with one BC. Then
> 1. A thread is spawned and new BC is created;

Why do we have to create a BC for every new thread? A new BC is needed
for every new service level instead, IMO. And typically there won't be
unlimited service levels.

> 2. New thread touches a new page (e.g. maps a new file) which is charged
> to new BC
> (and this means that this BC's must stay in memory till page is
> uncharged);
> 3. Thread exits after serving the request, but since it's mm is shared
> with parent
> all the touched pages stay resident and, thus, the new BC is still
> pinned in memory.
> Steps 1-3 are done multiple times for new pages (new files).
> Remember that we're discussing the case when pages are not recharged.


--
Regards,
vatsa
Re: [ckrm-tech] [PATCH] BC: resource beancounters (v4) (added user memory) [message #6195 is a reply to message #6136] Mon, 11 September 2006 18:25
Chandra Seetharaman
Messages: 88
Registered: August 2006
Member
On Fri, 2006-09-08 at 14:43 -0700, Rohit Seth wrote:
<snip>

> > > Guarantee may be one of
> > >
> > > 1. container will be able to touch that number of pages
> > > 2. container will be able to sys_mmap() that number of pages
> > > 3. container will not be killed unless it touches that number of pages
> > > 4. anything else
> >
> > I would say (1) with slight modification
> > "container will be able to touch _at least_ that number of pages"
> >
>
> Does this scheme support running of tasks outside of containers on the
> same platform where you have tasks running inside containers. If so
> then how will you ensure processes running out side any container will
> not leave less than the total guaranteed memory to different containers.
>

There could be a default container which doesn't have any guarantee or
limit. When you create containers and assign guarantees to each of them,
make sure that you leave some amount of resource unassigned. Those
unassigned resources can be used by the default container or can be used
by containers that want more than their guarantee (and less than their
limit). This is how CKRM/RG handles this issue.


>
>
> -rohit
>
>
--

------------------------------------------------------------ ----------
Chandra Seetharaman | Be careful what you choose....
- sekharan@us.ibm.com | .......you may get it.
------------------------------------------------------------ ----------
Re: [ckrm-tech] [PATCH] BC: resource beancounters (v4) (added user memory) [message #6197 is a reply to message #6163] Mon, 11 September 2006 18:44
Chandra Seetharaman
Messages: 88
Registered: August 2006
Member
On Mon, 2006-09-11 at 10:56 +0400, Pavel Emelianov wrote:

<snip>

> >> I think of it as: "I will be allowed to use this many total pages, and
> >> they are guaranteed not to fail." (1), I think. The sum of all of the
> >> system's guarantees must be less than or equal to the amount of free
> >> memory on the machine.
> >
> > Yes, totally agree.
>
> Such a guarantee is really a limit and this limit is even harder than
> BC's one :)
>
> E.g. I have a node with 1Gb of ram and 10 containers with 100Mb
> guarantee each.

In the first place, the system administrator should not be configuring it
that way. Then they are using it as a strict hard limit rather than a
guarantee (as the resources guaranteed to one container are _not_ available
to others).

Besides, the above configuration is clearly _not_ work-conserving.

They should use both guarantee and limit to associate resources with a
container/RG.

> I want to start one more. What shall I do not to break guarantees?

CKRM/RG handles it this way:

The amount of a resource a child RG gets is the ratio of its share value to
the parent's total # of shares. Children's resource allocation can be
changed just by changing the parent's total # of shares.

In a case like yours, the initial situation would be:
    Total memory in the system: 100MB
    parent's total # of shares: 100 (1 share == 1MB)
    10 children with # of shares: 10 (i.e. each child has 10MB)

When I want to add another child, I just change the parent's total # of
shares to, say, 125:
    Total memory in the system: 100MB
    parent's total # of shares: 125 (1 share == 0.8MB)
    10 children with # of shares: 10 (i.e. each child has 8MB)
Now you are left with 25 shares (or 20MB) that you can assign to new
child(ren) as you please.
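
The share arithmetic above can be written out as a short sketch. The function names are hypothetical and are not the CKRM/RG interface; they only reproduce the ratio described in this message.

```python
def allocation_mb(total_mb, parent_total_shares, child_shares):
    """Resource a child gets: its shares as a fraction of the parent's total."""
    return total_mb * child_shares / parent_total_shares


def unassigned_shares(parent_total_shares, children_shares):
    """Shares of the parent not yet handed out to any child."""
    return parent_total_shares - sum(children_shares)


children = [10] * 10   # ten children holding 10 shares each

before = allocation_mb(100, 100, 10)             # parent total: 100 shares
after = allocation_mb(100, 125, 10)              # parent total raised to 125
spare = unassigned_shares(125, children)         # shares left for new children
spare_mb = allocation_mb(100, 125, spare)        # the same, expressed in MB
```

Raising the parent's total dilutes every child's allocation at once, which frees room for a new child without editing each existing child's share value.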

<snip>
--

------------------------------------------------------------ ----------
Chandra Seetharaman | Be careful what you choose....
- sekharan@us.ibm.com | .......you may get it.
------------------------------------------------------------ ----------
Re: [ckrm-tech] [PATCH] BC: resource beancounters (v4) (added user memory) [message #6198 is a reply to message #6165] Mon, 11 September 2006 18:47
Chandra Seetharaman
Messages: 88
Registered: August 2006
Member
On Mon, 2006-09-11 at 11:02 +0400, Pavel Emelianov wrote:
<snip>

> >> In this case multithreaded apache that tries to serve each domain in
> >> separate BC will fill the memory with BC-s, held by pages allocated
> >> and mapped in threads.
> >>
> >
> > I do not understand how the memory will be filled with BCs. Can you
> > explain, please.
> >
> Sure. At the beginning I have one task with one BC. Then
> 1. A thread is spawned and new BC is created;

You do not have to create a new BC for each new thread; just associate
the thread with an existing appropriate BC.

> 2. New thread touches a new page (e.g. maps a new file) which is charged
> to new BC
> (and this means that this BC's must stay in memory till page is
> uncharged);
> 3. Thread exits after serving the request, but since it's mm is shared
> with parent
> all the touched pages stay resident and, thus, the new BC is still
> pinned in memory.
> Steps 1-3 are done multiple times for new pages (new files).
> Remember that we're discussing the case when pages are not recharged.
>
--

------------------------------------------------------------ ----------
Chandra Seetharaman | Be careful what you choose....
- sekharan@us.ibm.com | .......you may get it.
------------------------------------------------------------ ----------
Re: [ckrm-tech] [PATCH] BC: resource beancounters (v4) (added user memory) [message #6199 is a reply to message #6170] Mon, 11 September 2006 18:49
Chandra Seetharaman
Messages: 88
Registered: August 2006
Member
On Mon, 2006-09-11 at 12:13 +0400, Pavel Emelianov wrote:

<snip>
> >
> > Don't start the new container or change the guarantees of the existing
> > ones
> > to accommodate this one :) The QoS design (done by the administrator)
> > should
> > take care of such use-cases. It would be perfectly ok to have a container
> > that does not care about guarantees to set their guarantee to 0 and set
> > their limit to the desired value. As Chandra has been stating we need two
> > parameters (guarantee, limit), either can be optional, but not both.
> If I set up 9 groups to have 100Mb limit then I have 100Mb assured (on
> 1Gb node)
> for the 10th one exactly. And I do not have to set up any guarantee as
> it won't affect
> anything. So what a guarantee parameter is needed for?

I do not think it is that simple since
- there is typically more than one class I want to set a guarantee for
- I will not be able to use both limit and guarantee
- the implementation will not be work-conserving.

Also, how would you configure the following in your model?

5 classes: Class A(10, 40), Class B(20, 100), Class C(30, 100),
Class D(5, 100), Class E(15, 50); (class_name(guarantee, limit))

A "limit only" approach works for DoS prevention. But for providing QoS
you would need a guarantee.
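
As a sanity check on the five classes above, here is a minimal sketch. It assumes the numbers are percentages of a resource totalling 100 (the message does not state the unit), and `check_config()` is a hypothetical helper, not part of any existing interface.

```python
# class_name: (guarantee, limit), copied from the example above
CLASSES = {
    "A": (10, 40),
    "B": (20, 100),
    "C": (30, 100),
    "D": (5, 100),
    "E": (15, 50),
}


def check_config(classes, total=100):
    """Return (list of problems, resource left unassigned by guarantees)."""
    problems = []
    for name, (guarantee, limit) in classes.items():
        if guarantee > limit:
            problems.append(f"{name}: guarantee exceeds limit")
    reserved = sum(guarantee for guarantee, _ in classes.values())
    if reserved > total:
        problems.append("guarantees are overcommitted")
    return problems, total - reserved
```

Under that assumption the guarantees add up to less than the total, so the configuration is consistent, while the limits add up to far more than the total. Limits may be overcommitted in this way; guarantees may not.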
--

------------------------------------------------------------ ----------
Chandra Seetharaman | Be careful what you choose....
- sekharan@us.ibm.com | .......you may get it.
------------------------------------------------------------ ----------
Re: [ckrm-tech] [PATCH] BC: resource beancounters (v4) (added user memory) [message #6200 is a reply to message #6195] Mon, 11 September 2006 19:10
Rohit Seth
Messages: 101
Registered: August 2006
Senior Member
On Mon, 2006-09-11 at 11:25 -0700, Chandra Seetharaman wrote:
> On Fri, 2006-09-08 at 14:43 -0700, Rohit Seth wrote:
> <snip>
>
> > > > Guarantee may be one of
> > > >
> > > > 1. container will be able to touch that number of pages
> > > > 2. container will be able to sys_mmap() that number of pages
> > > > 3. container will not be killed unless it touches that number of pages
> > > > 4. anything else
> > >
> > > I would say (1) with slight modification
> > > "container will be able to touch _at least_ that number of pages"
> > >
> >
> > Does this scheme support running of tasks outside of containers on the
> > same platform where you have tasks running inside containers. If so
> > then how will you ensure processes running out side any container will
> > not leave less than the total guaranteed memory to different containers.
> >
>
> There could be a default container which doesn't have any guarantee or
> limit.

First, I think it is critical that we allow processes to run outside of
any container (unless we know for sure that the penalty of running a
process inside a container is very very minimal).

And anything running outside a container should be limited by default
Linux settings.

> When you create containers and assign guarantees to each of them
> make sure that you leave some amount of resource unassigned.
^^^^^ This will force the "default" container
with limits (indirectly). IMO, the whole guarantee feature gets defeated
the moment you bring in this fuzziness.

> That
> unassigned resources can be used by the default container or can be used
> by containers that want more than their guarantee (and less than their
> limit). This is how CKRM/RG handles this issue.
>
>

It seems that a single notion of limit should suffice, and that limit
should be treated more as something beyond which resource
consumption in the container will be throttled/not_allowed.

-rohit
Re: [ckrm-tech] [PATCH] BC: resource beancounters (v4) (added user memory) [message #6201 is a reply to message #6200] Mon, 11 September 2006 19:42
Chandra Seetharaman
Messages: 88
Registered: August 2006
Member
On Mon, 2006-09-11 at 12:10 -0700, Rohit Seth wrote:
> On Mon, 2006-09-11 at 11:25 -0700, Chandra Seetharaman wrote:
> > On Fri, 2006-09-08 at 14:43 -0700, Rohit Seth wrote:
> > <snip>
> >
> > > > > Guarantee may be one of
> > > > >
> > > > > 1. container will be able to touch that number of pages
> > > > > 2. container will be able to sys_mmap() that number of pages
> > > > > 3. container will not be killed unless it touches that number of pages
> > > > > 4. anything else
> > > >
> > > > I would say (1) with slight modification
> > > > "container will be able to touch _at least_ that number of pages"
> > > >
> > >
> > > Does this scheme support running of tasks outside of containers on the
> > > same platform where you have tasks running inside containers. If so
> > > then how will you ensure processes running out side any container will
> > > not leave less than the total guaranteed memory to different containers.
> > >
> >
> > There could be a default container which doesn't have any guarantee or
> > limit.
>
> First, I think it is critical that we allow processes to run outside of
> any container (unless we know for sure that the penalty of running a
> process inside a container is very very minimal).

When I said a default container, I meant a default "resource group". In
the case of containers, that would be the default environment. I do not see
any additional overhead associated with it; it is only associated with
how resources are allocated/accounted.

>
> And anything running outside a container should be limited by default
> Linux settings.

Note that the resources available to the default RG will be (total system
resource - resources allocated to RGs).
>
> > When you create containers and assign guarantees to each of them
> > make sure that you leave some amount of resource unassigned.
> ^^^^^ This will force the "default" container
> with limits (indirectly). IMO, the whole guarantee feature gets defeated

You _will_ have limits for the default RG even if we don't have
guarantees.

> the moment you bring in this fuzziness.

Not really.
- Each RG will have a guarantee and a limit for each resource.
- The default RG will have (system resource - sum of guarantees).
- Every RG will be guaranteed some amount of resource to provide QoS.
- Every RG will be limited at "limit" to prevent DoS attacks.
- Whoever doesn't care about either of those sets them to don't-care values.
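
The rules listed above can be sketched as follows. This is an illustration only, not the CKRM/RG implementation: the `RG` class, `grant()` and `default_rg_share()` are hypothetical names, the "don't care" values are assumed to be a guarantee of 0 and an infinite limit, and every guarantee is assumed to be no larger than its limit.

```python
import math
from dataclasses import dataclass


@dataclass
class RG:
    name: str
    guarantee: float = 0.0      # assumed "don't care" value: nothing reserved
    limit: float = math.inf     # assumed "don't care" value: never capped


def grant(rg, demand, spare):
    """Amount rg may use, given its demand and the currently spare resource."""
    floor = min(demand, rg.guarantee)            # always available (QoS)
    extra = min(max(demand - floor, 0), spare)   # borrowed from the spare pool
    return min(floor + extra, rg.limit)          # never beyond the limit (DoS)


def default_rg_share(total, rgs):
    """What is left for the default RG: system resource - sum of guarantees."""
    return total - sum(rg.guarantee for rg in rgs)
```

An RG with a guarantee still receives that amount when nothing is spare, whereas an RG configured with a limit only receives nothing in the same situation.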

>
> > That
> > unassigned resources can be used by the default container or can be used
> > by containers that want more than their guarantee (and less than their
> > limit). This is how CKRM/RG handles this issue.
> >
> >
>
> It seems that a single notion of limit should suffice, and that limit
> should more be treated as something beyond which that resource
> consumption in the container will be throttled/not_allowed.

As I stated in an earlier email, a "limit only" approach can protect a
system from DoS attacks (and also fits the container model nicely),
whereas to provide QoS one would need a guarantee.

Without a guarantee, an RG that the admin cares about can starve if
all/most of the other RGs consume up to their limits.

>
> -rohit
>
>
--

------------------------------------------------------------ ----------
Chandra Seetharaman | Be careful what you choose....
- sekharan@us.ibm.com | .......you may get it.
------------------------------------------------------------ ----------
Re: Re: [ckrm-tech] [PATCH] BC: resource beancounters (v4) (added user memory) [message #6202 is a reply to message #6200] Mon, 11 September 2006 19:47
kir
Messages: 1645
Registered: August 2005
Location: Moscow, Russia
Senior Member

Rohit Seth wrote:
> On Mon, 2006-09-11 at 11:25 -0700, Chandra Seetharaman wrote:
>
>> On Fri, 2006-09-08 at 14:43 -0700, Rohit Seth wrote:
>> <snip>
>>
>>
>>>>> Guarantee may be one of
>>>>>
>>>>> 1. container will be able to touch that number of pages
>>>>> 2. container will be able to sys_mmap() that number of pages
>>>>> 3. container will not be killed unless it touches that number of pages
>>>>> 4. anything else
>>>>>
>>>> I would say (1) with slight modification
>>>> "container will be able to touch _at least_ that number of pages"
>>>>
>>>>
>>> Does this scheme support running of tasks outside of containers on the
>>> same platform where you have tasks running inside containers. If so
>>> then how will you ensure processes running out side any container will
>>> not leave less than the total guaranteed memory to different containers.
>>>
>>>
>> There could be a default container which doesn't have any guarantee or
>> limit.
>>
>
> First, I think it is critical that we allow processes to run outside of
> any container (unless we know for sure that the penalty of running a
> process inside a container is very very minimal).
>
(1) there is a set of processes running outside of any container. In
OpenVZ we call that "VE0" or "host system", probably Chandra meant that
by "default container".
(2) The host system is used to manage the containers (start/stop/set
parameters/create/destroy).
(3) the penalty of running a process inside a container is indeed very low.

> And anything running outside a container should be limited by default
> Linux settings.
>
(4) due to (2), it is not recommended to run anything but the tasks used
to manage the containers -- otherwise you're going to have security problems
(5) "Default Linux settings" do not cover everything (for example --
dentry cache), thus the need for beancounters.
>> When you create containers and assign guarantees to each of them
>> make sure that you leave some amount of resource unassigned.
>>
> ^^^^^ This will force the "default" container
> with limits (indirectly). IMO, the whole guarantee feature gets defeated
> the moment you bring in this fuzziness.
>
>
>> That
>> unassigned resources can be used by the default container or can be used
>> by containers that want more than their guarantee (and less than their
>> limit). This is how CKRM/RG handles this issue.
>>
>>
>>
>
> It seems that a single notion of limit should suffice, and that limit
> should more be treated as something beyond which that resource
> consumption in the container will be throttled/not_allowed.
>
Beancounters have a notion of "barrier" and "limit". For some parameters
they are the same, but for others they differ, leaving a "safety gap"
between the barrier and the limit. The problem is that for some types of
resources you cannot throttle or deny a request -- the only option is
to kill the process. One example (though not the only one) is process
stack expansion. See more at http://wiki.openvz.org/UBC (and follow the
menu at the right side).
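
To illustrate the barrier/limit distinction described above, here is a minimal Python sketch. It assumes a simplified model in which ordinary requests are refused once usage would pass the barrier, while requests that cannot be refused gracefully (stack expansion in the example above) may use the gap up to the limit. The class name, field names and the `high_priority` flag are invented for this illustration and are not the BC patch's interface.

```python
from dataclasses import dataclass


@dataclass
class Beancounter:
    """Hypothetical per-resource counter with a barrier and a limit."""
    barrier: int
    limit: int
    held: int = 0
    failcnt: int = 0

    def charge(self, amount, high_priority=False):
        """Try to account `amount` units; return True if the charge is accepted."""
        # Ordinary requests stop at the barrier. Requests that cannot be
        # denied cleanly are allowed to use the gap between barrier and limit.
        ceiling = self.limit if high_priority else self.barrier
        if self.held + amount > ceiling:
            self.failcnt += 1
            return False
        self.held += amount
        return True

    def uncharge(self, amount):
        """Give back `amount` units, never dropping below zero."""
        self.held = max(0, self.held - amount)


bc = Beancounter(barrier=100, limit=120)
ordinary_ok = bc.charge(100)                          # exactly reaches the barrier
ordinary_denied = bc.charge(1)                        # would pass the barrier
stack_growth_ok = bc.charge(10, high_priority=True)   # fits in the safety gap
```

With these numbers the ordinary request past the barrier is refused, while the high-priority one still succeeds because the 20-unit gap absorbs it.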
Re: [ckrm-tech] [PATCH] BC: resource beancounters (v4) (added user memory) [message #6204 is a reply to message #6201] Mon, 11 September 2006 23:58
Rohit Seth
On Mon, 2006-09-11 at 12:42 -0700, Chandra Seetharaman wrote:
> On Mon, 2006-09-11 at 12:10 -0700, Rohit Seth wrote:
> > On Mon, 2006-09-11 at 11:25 -0700, Chandra Seetharaman wrote:

> > > There could be a default container which doesn't have any guarantee or
> > > limit.
> >
> > First, I think it is critical that we allow processes to run outside of
> > any container (unless we know for sure that the penalty of running a
> > process inside a container is very very minimal).
>
> When I meant a default container I meant a default "resource group". In
> case of container that would be the default environment. I do not see
> any additional overhead associated with it, it is only associated with
> how resource are allocated/accounted.
>

There should be some cost when you do atomic inc/dec accounting and
take locks to add or remove resources for any container (including the
default resource group). No?

> >
> > And anything running outside a container should be limited by default
> > Linux settings.
>
> note that the resource available to the default RG will be (total system
> resource - allocated to RGs).

I think it will be preferable to not change the existing behavior for
applications that are running outside any container (in your case
default resource group).

> >
> > > When you create containers and assign guarantees to each of them
> > > make sure that you leave some amount of resource unassigned.
> > ^^^^^ This will force the "default" container
> > with limits (indirectly). IMO, the whole guarantee feature gets defeated
>
> You _will_ have limits for the default RG even if we don't have
> guarantees.
>
> > the moment you bring in this fuzziness.
>
> Not really.
> - Each RG will have a guarantee and limit of each resource.
> - default RG will have (system resource - sum of guarantees)
> - Every RG will be guaranteed some amount of resource to provide QoS
> - Every RG will be limited at "limit" to prevent DoS attacks.
> - Whoever doesn't care either of those set them to don't care values.
>
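
The arithmetic in the quoted list (default RG gets the system resource minus the sum of guarantees) can be sketched as follows. This is only an illustration: the function name, the group names and the 1024 MB figure are made up, and refusing an over-committed configuration is an assumption of the sketch, not something stated in the thread.

```python
def default_rg_share(total, guarantees):
    """Return what is left for the default resource group.

    `total` and the values in `guarantees` use the same unit (say, MB).
    """
    committed = sum(guarantees.values())
    if committed > total:
        # Assumption of this sketch: guarantees that exceed the machine
        # are rejected rather than silently scaled down.
        raise ValueError("sum of guarantees exceeds the system resource")
    return total - committed


# Hypothetical setup: a 1024 MB machine with two resource groups.
share = default_rg_share(1024, {"rg_web": 100, "rg_db": 200})
```

Here `share` is what the default RG can count on; anything it uses beyond that is borrowed from groups that are not consuming their guarantee at the moment.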

For the groups that set these to don't-care values, do you depend on the
kernel's existing reclaim algorithm (for memory)?

> >
> > > That
> > > unassigned resources can be used by the default container or can be used
> > > by containers that want more than their guarantee (and less than their
> > > limit). This is how CKRM/RG handles this issue.
> > >
> > >
> >
> > It seems that a single notion of limit should suffice, and that limit
> > should more be treated as something beyond which that resource
> > consumption in the container will be throttled/not_allowed.
>
> As I stated in an earlier email "Limit only" approach can prevent a
> system from DoS attacks (and also fits the container model nicely),
> whereas to provide QoS one would need guarantee.
>
> Without guarantee, a RG that the admin cares about can starve if
> all/most of the other RGs consume upto their limits.
>
> >

If the limits are set appropriately so that the containers' total memory
consumption does not exceed the system memory, then there shouldn't be
any QoS issue (to whatever extent it is applicable for a specific
scenario).

-rohit
Re: Re: [ckrm-tech] [PATCH] BC: resource beancounters (v4) (added user memory) [message #6207 is a reply to message #6202] Tue, 12 September 2006 00:28
Rohit Seth
On Mon, 2006-09-11 at 23:48 +0400, Kir Kolyshkin wrote:
> Rohit Seth wrote:
> > On Mon, 2006-09-11 at 11:25 -0700, Chandra Seetharaman wrote:
> >
> >> On Fri, 2006-09-08 at 14:43 -0700, Rohit Seth wrote:
> >> <snip>
> >>
> >>
> >>>>> Guarantee may be one of
> >>>>>
> >>>>> 1. container will be able to touch that number of pages
> >>>>> 2. container will be able to sys_mmap() that number of pages
> >>>>> 3. container will not be killed unless it touches that number of pages
> >>>>> 4. anything else
> >>>>>
> >>>> I would say (1) with slight modification
> >>>> "container will be able to touch _at least_ that number of pages"
> >>>>
> >>>>
> >>> Does this scheme support running of tasks outside of containers on the
> >>> same platform where you have tasks running inside containers. If so
> >>> then how will you ensure processes running out side any container will
> >>> not leave less than the total guaranteed memory to different containers.
> >>>
> >>>
> >> There could be a default container which doesn't have any guarantee or
> >> limit.
> >>
> >
> > First, I think it is critical that we allow processes to run outside of
> > any container (unless we know for sure that the penalty of running a
> > process inside a container is very very minimal).
> >
> (1) there is a set of processes running outside of any container. In
> OpenVZ we call that "VE0" or "host system", probably Chandra meant that
> by "default container".
> (2) The host system is used to manage the containers (start/stop/set
> parameters/create/destroy).
> (3) the penalty of running a process inside a container is indeed very low.
>
> > And anything running outside a container should be limited by default
> > Linux settings.
> >
> (4) due to (2), it is not recommended to run anything but the tasks used
> to manage the containers -- otherwise your gonna have security problems

Just like you want to run those special threads outside of any
container, some sysadmin might be interested in running different
processes that they don't want to bind to any container limits.

I think it is critical that you provide the capability to have tasks
running outside any container. Whether sysadmin wants to do it or not
for a system is a different thing.

-rohit
Re: [ckrm-tech] [PATCH] BC: resource beancounters (v4) (added user memory) [message #6220 is a reply to message #6204] Tue, 12 September 2006 09:53
Balbir Singh
Rohit Seth wrote:

> If the limits are set appropriately so that containers total memory
> consumption does not exceed the system memory then there shouldn't be
> any QoS issue (to whatever extent it is applicable for specific
> scenario).
>
> -rohit
>

What if the guarantee and limits are subject to change? Consider many groups,
with changing limits - how do we provide guarantees then?

Limit is the upper bound on resource utilization and guarantee is the lower
bound. In a dynamic system, how can we provide a lower bound on a resource
for a group by manipulating the upper bounds on the rest of the groups?

Consider a system with 1GB of RAM and two groups that need guarantees
of 100MB and 200MB of memory. How would you set up limits to ensure that
the guarantees are met? The remaining groups will be limited to 700MB, but
how do we ensure that these two groups get 100MB and 200MB of the remaining
300MB respectively?
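
A small Python sketch of the concern in the example above, under stated assumptions: 1GB is rounded to 1000MB, memory is granted first come, first served, and the two groups each get a 300MB limit (a value picked for the sketch, since only the 700MB cap on the others is given). The function and group names are hypothetical.

```python
def allocate(total, limits, requests):
    """Grant requests in arrival order, bounded by each group's limit
    and by whatever memory is still free."""
    usage = {name: 0 for name in limits}
    free = total
    for name, amount in requests:
        grant = min(amount, limits[name] - usage[name], free)
        usage[name] += grant
        free -= grant
    return usage


# g1 wants 100MB assured, g2 wants 200MB assured, everyone else is capped at 700MB.
limits = {"g1": 300, "g2": 300, "rest": 700}
# One possible arrival order: the others fill up, then g2 takes all that is left.
requests = [("rest", 700), ("g2", 300), ("g1", 100)]
usage = allocate(1000, limits, requests)
```

In this ordering g1 ends up with nothing, even though the other groups never exceed 700MB: capping the rest reserves 300MB for the pair, but does not split it 100/200 between them.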

--

Balbir Singh,
Linux Technology Center,
IBM Software Labs
Re: [ckrm-tech] [PATCH] BC: resource beancounters (v4) (added user memory) [message #6221 is a reply to message #6188] Tue, 12 September 2006 10:24
Pavel Emelianov
Srivatsa Vaddagiri wrote:
> On Mon, Sep 11, 2006 at 11:02:06AM +0400, Pavel Emelianov wrote:
>
>> Sure. At the beginning I have one task with one BC. Then
>> 1. A thread is spawned and new BC is created;
>>
>
> Why do we have to create a BC for every new thread? A new BC is needed
> for every new service level instead IMO. And typically there won't be
> unlimited service levels.
>
That's the scenario we started from - each domain is served in a separate
BC with *threaded* Apache.
>
>> 2. New thread touches a new page (e.g. maps a new file) which is charged
>> to the new BC (and this means that this BC must stay in memory till the
>> page is uncharged);
>> 3. Thread exits after serving the request, but since its mm is shared
>> with the parent, all the touched pages stay resident and, thus, the new
>> BC is still pinned in memory.
>> Steps 1-3 are done multiple times for new pages (new files).
>> Remember that we're discussing the case when pages are not recharged.
>>
>
>
>
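
The steps quoted above can be sketched in a few lines of Python. This is a toy model, not kernel code: the `BC` class, the `charged_pages` field and `serve_request` are hypothetical names, and the sketch assumes (as the scenario does) that pages are never recharged to another BC.

```python
class BC:
    """Hypothetical beancounter that can be freed only when nothing is charged to it."""

    def __init__(self):
        self.charged_pages = 0

    def pinned(self):
        # A BC with pages still charged to it has to stay in memory.
        return self.charged_pages > 0


def serve_request(live_bcs, new_pages):
    bc = BC()                       # step 1: new thread, new BC
    bc.charged_pages += new_pages   # step 2: first touch charges the new BC
    live_bcs.append(bc)
    # Step 3: the thread exits here, but the mm is shared with the parent,
    # so the pages stay resident and are never uncharged.
    return bc


live_bcs = []
for _ in range(100):
    serve_request(live_bcs, new_pages=1)
pinned = [bc for bc in live_bcs if bc.pinned()]
```

After 100 requests that each touch one new page, all 100 BCs are still pinned, which is the growth the scenario is worried about.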
Re: [ckrm-tech] [PATCH] BC: resource beancounters (v4) (added user memory) [message #6222 is a reply to message #6221] Tue, 12 September 2006 10:29
Srivatsa Vaddagiri
On Tue, Sep 12, 2006 at 02:24:25PM +0400, Pavel Emelianov wrote:
> Srivatsa Vaddagiri wrote:
> > On Mon, Sep 11, 2006 at 11:02:06AM +0400, Pavel Emelianov wrote:
> >
> >> Sure. At the beginning I have one task with one BC. Then
> >> 1. A thread is spawned and new BC is created;
> >>
> >
> > Why do we have to create a BC for every new thread? A new BC is needed
> > for every new service level instead IMO. And typically there won't be
> > unlimited service levels.
> >
> That's the scenario we started from - each domain is served in a separate
> BC with *threaded* Apache.

Sure, but you can still meet that requirement by creating a fixed set of
BCs (one per domain) and letting each new thread be associated with the
corresponding BC (without creating a BC for every new thread),
depending on which domain's request it is serving.
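
A rough Python sketch of that suggestion, with made-up names throughout (the `BC` class, the `bcs` table, `serve` and the example domains are all hypothetical): the set of BCs is created once, and a thread only looks up the BC of the domain it is serving.

```python
class BC:
    """Hypothetical beancounter tied to one service level (here: one domain)."""

    def __init__(self, name):
        self.name = name
        self.charged_pages = 0


# Fixed set of BCs, created up front, one per served domain.
bcs = {domain: BC(domain) for domain in ("a.example", "b.example")}


def serve(domain, pages_touched):
    """A new thread attaches to the existing BC of its domain instead of
    creating a BC of its own."""
    bc = bcs[domain]
    bc.charged_pages += pages_touched
    return bc


for i in range(1000):
    serve("a.example" if i % 2 == 0 else "b.example", pages_touched=1)
```

However many threads come and go, the number of BCs stays equal to the number of domains; the pages they touched are still charged, but to a bounded set of counters.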

> >
> >> 2. New thread touches a new page (e.g. maps a new file) which is charged
> >> to the new BC (and this means that this BC must stay in memory till the
> >> page is uncharged);
> >> 3. Thread exits after serving the request, but since its mm is shared
> >> with the parent, all the touched pages stay resident and, thus, the new
> >> BC is still pinned in memory.
> >> Steps 1-3 are done multiple times for new pages (new files).
> >> Remember that we're discussing the case when pages are not recharged.
> >>
> >
> >
> >
>
>

--
Regards,
vatsa
Re: [ckrm-tech] [PATCH] BC: resource beancounters (v4) (added user memory) [message #6223 is a reply to message #6177] Tue, 12 September 2006 10:35
Pavel Emelianov
Srivatsa Vaddagiri wrote:
> On Mon, Sep 11, 2006 at 12:13:59PM +0400, Pavel Emelianov wrote:
>
>> If I set up 9 groups to have 100Mb limit then I have 100Mb assured (on
>> 1Gb node) for the 10th one exactly. And I do not have to set up any
>> guarantee as it won't affect anything. So what a guarantee parameter is
>> needed for?
>>
>
> I presume you are talking of hard-limiting each group to 100 MB here. In
> which case, won't the 100MB (reserved for 10th group) be unutilized
> until 10th group is started (it may never be started for that matter!).
>
> IMO it would be better to go and use that free 100 MB for reclaimable memory
> and give that up when 10th group is started.
>
Sure. I was talking about unreclaimable memory.
Sorry for not saying so explicitly.
Re: [ckrm-tech] [PATCH] BC: resource beancounters (v4) (added user memory) [message #6224 is a reply to message #6172] Tue, 12 September 2006 10:40
Pavel Emelianov
Balbir Singh wrote:
> Pavel Emelianov wrote:
>> Balbir Singh wrote:
>>> Pavel Emelianov wrote:
>>>> Balbir Singh wrote:
>>>>> Dave Hansen wrote:
>>>>>> On Fri, 2006-09-08 at 11:33 +0400, Pavel Emelianov wrote:
>>>>>>> I'm afraid we have different understandings of what a "guarantee" is.
>>>>>> It appears so.
>>>>>>
>>>>>>> Don't we?
>>>>>>> Guarantee may be one of
>>>>>>>
>>>>>>> 1. container will be able to touch that number of pages
>>>>>>> 2. container will be able to sys_mmap() that number of pages
>>>>>>> 3. container will not be killed unless it touches that number of pages
>>>>>> A "death sentence" guarantee? I like it. :)
>>>>>>
>>>>>>> 4. anything else
>>>>>>>
>>>>>>> Let's decide what kind of a guarantee we want.
>>>>> I think of guarantees w.r.t resources as the lower limit on the resource.
>>>>> Guarantees and limits can be thought of as the range (guarantee, limit]
>>>>> for the usage of the resource.
>>>>>
>>>>>> I think of it as: "I will be allowed to use this many total pages, and
>>>>>> they are guaranteed not to fail." (1), I think. The sum of all of the
>>>>>> system's guarantees must be less than or equal to the amount of free
>>>>>> memory on the machine.
>>>>> Yes, totally agree.
>>>> Such a guarantee is really a limit and this limit is even harder than
>>>> BC's one :)
>>>>
>>>> E.g. I have a node with 1Gb of ram and 10 containers with 100Mb
>>>> guarantee each.
>>>> I want to start one more. What shall I do not to break guarantees?
>>> Don't start the new container or change the guarantees of the existing
>>> ones to accommodate this one :) The QoS design (done by the administrator)
>>> should take care of such use-cases. It would be perfectly ok to have a
>>> container that does not care about guarantees to set their guarantee to 0
>>> and set their limit to the desired value. As Chandra has been stating we
>>> need two parameters (guarantee, limit), either can be optional, but not both.
>> If I set up 9 groups to have 100Mb limit then I have 100Mb assured (on
>> 1Gb node) for the 10th one exactly. And I do not have to set up any
>> guarantee as it won't affect anything. So what a guarantee parameter is
>> needed for?
>
> This use case works well for providing guarantee to one container.
> What if I want guarantees of 100Mb and 200Mb for two containers? How do
> I set up the system using limits?
You may set any limit from 100Mb up to 800Mb for the first one and from
200Mb up to 900Mb for the second. If there are no other groups, the first
is sure to receive its 100Mb and the second its 200Mb. If there are other
groups, their guarantees have to be taken into account as well.
>
> Even if I restrict everyone else to 700Mb, I cannot be sure that
> the remaining 300Mb will be distributed as 100Mb and 200Mb.
There's no "everyone else" here - we're talking about a "static" case.
When a new group arrives we need to recalculate guarantees as you said.
And here's my next question - what to do if the new guarantee would become
lower than the current amount of unreclaimable memory in the BC?
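
The limit-only reasoning in this reply can be written down as a small Python sketch. The assumptions are mine: 1Gb is rounded to 1000Mb (as the 9 x 100Mb example implicitly does), the memory in question is unreclaimable, and `implied_guarantee` is a name made up for this illustration.

```python
def implied_guarantee(total, limits, group):
    """Memory `group` can always obtain when every other group is capped
    by its limit: whatever the others cannot take, up to its own limit."""
    others = sum(limit for name, limit in limits.items() if name != group)
    return max(0, min(limits[group], total - others))


total = 1000  # 1Gb node, rounded for the sketch

# Two groups, limits picked at the top of the suggested ranges.
two = {"first": 800, "second": 900}
first_assured = implied_guarantee(total, two, "first")
second_assured = implied_guarantee(total, two, "second")

# Nine groups limited to 100Mb each, plus a tenth with no effective cap.
ten = {f"g{i}": 100 for i in range(1, 10)}
ten["g10"] = total
tenth_assured = implied_guarantee(total, ten, "g10")
```

With these limits the first group is assured 100Mb, the second 200Mb, and the tenth group in the nine-group example exactly 100Mb. Raising any other group's limit lowers the assured amount by the same number of megabytes.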
Re: [ckrm-tech] [PATCH] BC: resource beancounters (v4) (added user memory) [message #6227 is a reply to message #6199] Tue, 12 September 2006 10:48
Pavel Emelianov
Chandra Seetharaman wrote:
> On Mon, 2006-09-11 at 12:13 +0400, Pavel Emelianov wrote:
>
> <snip>
>
>>> Don't start the new container or change the guarantees of the existing
>>> ones to accommodate this one :) The QoS design (done by the administrator)
>>> should take care of such use-cases. It would be perfectly ok to have a container
>>> that does not care about guarantees to set their guarantee to 0 and set
>>> their limit to the desired value. As Chandra has been stating we need two
>>> parameters (guarantee, limit), either can be optional, but not both.
>>>
>> If I set up 9 groups to have 100Mb limit then I have 100Mb assured (on
>> 1Gb node) for the 10th one exactly. And I do not have to set up any
>> guarantee as it won't affect anything. So what a guarantee parameter is
>> needed for?
>>
>
> I do not think it is that simple since
> - there is typically more than one class I want to set guarantee to
> - I will not be able to use both limit and guarantee
> - Implementation will not be work-conserving.
>
> Also, How would you configure the following in your model ?
>
> 5 classes: Class A(10, 40), Class B(20, 100), Class C (30, 100), Class D
> (5, 100), Class E(15, 50); (class_name(guarantee, limit))
>
What's the total amount of memory on the node? Without it, it's hard to
make any guarantee.
> "Limit only" approach works for DoS prevention. But for providing QoS
> you would need guarantee.
>
You cannot provide a guarantee on a physical resource to a particular group
without limiting its usage by other groups. That's my main point.
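
To make the dependence on the node's total concrete, here is a hedged Python sketch using the five classes from the quoted question. The units are not given in the thread, so the numbers are treated as abstract units; both function names are invented for the example.

```python
# class_name: (guarantee, limit), as in the quoted configuration
classes = {
    "A": (10, 40),
    "B": (20, 100),
    "C": (30, 100),
    "D": (5, 100),
    "E": (15, 50),
}


def min_total_with_guarantees(classes):
    """Smallest node size on which explicit guarantees can all be honoured."""
    return sum(guarantee for guarantee, _ in classes.values())


def min_total_with_limits_only(classes):
    """Smallest node size on which the limits alone assure every guarantee:
    each class needs its guarantee left over after all the others have
    grown to their limits."""
    total_limits = sum(limit for _, limit in classes.values())
    return max(guarantee + (total_limits - limit)
               for guarantee, limit in classes.values())


explicit = min_total_with_guarantees(classes)
limits_only = min_total_with_limits_only(classes)
```

For this configuration the guarantees add up to 80 units, whereas limits alone would assure the same guarantees only on a node of at least 360 units (class A needs 10 left after the other four take 350). On a smaller node, the listed limits would have to be lowered to get the same effect.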