Thread: [PATCH v2 0/5] per-cgroup /proc/stat statistics
Re: [PATCH v2 2/5] account guest time per-cgroup as well. [message #46551 is a reply to message #46546]
From: Glauber Costa
Date: Mon, 28 May 2012 09:03

On 05/26/2012 08:44 AM, Paul Turner wrote:
> On 04/09/2012 03:25 PM, Glauber Costa wrote:
>> In the interest of providing a per-cgroup figure of common statistics,
>> this patch adds a nr_switches counter to each group runqueue (both cfs
>> and rt).
>>
>> To avoid impact on schedule(), we don't walk the tree at stat gather
>> time. This is because schedule() is called much more frequently than
>> the tick functions, in which we do walk the tree.
>>
>> When this figure needs to be read (different patch), we will
>> aggregate them at read time.
>>
>> Signed-off-by: Glauber Costa <glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
>> ---
>> kernel/sched/core.c | 32 ++++++++++++++++++++++++++++++++
>> kernel/sched/sched.h | 3 +++
>> 2 files changed, 35 insertions(+), 0 deletions(-)
>>
>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> index 1cfb7f0..1ee3772 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -3168,6 +3168,37 @@ pick_next_task(struct rq *rq)
>> }
>>
>> /*
>> + * For all other data, we do a tree walk at the time of
>> + * gathering. We want, however, to minimize the impact over schedule(),
>> + * because... well... it's schedule().
>> + *
>> + * Therefore we only gather for the current cgroup, and walk the tree
>> + * at read time
>> + */
>> +static void update_switches_task_group(struct rq *rq,
>> +                                       struct task_struct *prev,
>> +                                       struct task_struct *next)
>> +{
>> +#ifdef CONFIG_CGROUP_SCHED
>> + int cpu = cpu_of(rq);
>> +
>> + if (rq->curr_tg == &root_task_group)
>> + goto out;
>> +
>> +#ifdef CONFIG_FAIR_GROUP_SCHED
>> + if (prev->sched_class == &fair_sched_class)
>> + rq->curr_tg->cfs_rq[cpu]->nr_switches++;
>> +#endif
>> +#ifdef CONFIG_RT_GROUP_SCHED
>> + if (prev->sched_class == &rt_sched_class)
>> + rq->curr_tg->rt_rq[cpu]->nr_switches++;
>> +#endif
>
> With this approach why differentiate cfs vs rt? These could both just
> be on the task_group.
>
> This could then just be
> 	if (task_group(prev) != &root_task_group)
> 		task_group(prev)->nr_switches++;

Well, no. Then it would need to be an atomic update, or something like
it. The runqueue is per-cpu data, but the task_group is not. That's why
I chose to use the rq (and the runqueues are separated between classes).

It all boils down to the fact that I wanted to avoid an atomic update in
this path.

But if you think that would be okay, I could change it. Alternatively, I
could come up with another per-cpu storage as well, since we're
ultimately just reading it later (and for that we need to iterate over
all CPUs anyway).
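
For reference, the read side I have in mind is just a sum over the
per-cpu counters, along these lines (rough sketch; the helper name is
made up):

static u64 tg_nr_switches(struct task_group *tg)
{
        u64 sum = 0;
        int cpu;

        /* no atomics in schedule(); we pay the price here, at read time */
        for_each_possible_cpu(cpu) {
#ifdef CONFIG_FAIR_GROUP_SCHED
                sum += tg->cfs_rq[cpu]->nr_switches;
#endif
#ifdef CONFIG_RT_GROUP_SCHED
                sum += tg->rt_rq[cpu]->nr_switches;
#endif
        }
        return sum;
}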

> Which you could wrap in a nice static inline that disappears when
> CONFIG_CGROUP_PROC_STAT isn't there

That I can do.
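
Something like the usual stub pattern, I suppose - sketch, assuming the
CONFIG_CGROUP_PROC_STAT name you mentioned:

#ifdef CONFIG_CGROUP_PROC_STAT
static void update_switches_task_group(struct rq *rq,
                                       struct task_struct *prev,
                                       struct task_struct *next)
{
        /* ... the accounting above ... */
}
#else
static inline void update_switches_task_group(struct rq *rq,
                                              struct task_struct *prev,
                                              struct task_struct *next)
{
}
#endif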

>
> Another way to go about this would be to promote (demote?) nr_switches
> to the sched_entity. At which point you know you only need to update
> yours, and conditionally update your parents.

You mean the global one?

Not sure it will work, because that one always refers to the root cgroup...

> But.. that's still gross.. Hmm..
>
>> +out:
>> + rq->curr_tg = task_group(next);
>
> If you're going to task_group every time anyway you might as well just
> take it against prev -- then you don't have to cache rq->curr_tg?
>
> Another way to do this would be:
>
> On cfs_rq, rt_rq add:
> 	int prev_rq_nr_switches, nr_switches
>
> On put_prev_task_fair (against a task):
> 	cfs_rq_of(prev->se)->prev_rq_nr_switches = rq->nr_switches
>
> On pick_next_task_fair:
> 	if (cfs_rq_of(prev->se)->prev_rq_nr_switches != rq->nr_switches)
> 		cfs_rq->nr_switches++;
>
> On aggregating the value for read: +1 if prev_rq_nr_switches != rq->nr_switches
> [And equivalent for sched_rt]
>
> While this is no nicer (and fractionally more expensive but this is
> never something we'd enable by default), it at least gets the goop out
> of schedule().

At first look this sounds a bit weird to me, but OTOH, this is how a lot
of this stuff is done... All the other statistics in the patch set are
collected this exact same way - because they draw from schedstats, which
always touch the specific rqs - so maybe this gains points for consistency.

I'll give it a shot.
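
If I read you right, it would look roughly like this (untested sketch,
fair class only; the rt side would mirror it):

        /* put_prev_task_fair(): snapshot the global switch count */
        cfs_rq_of(&prev->se)->prev_rq_nr_switches = rq->nr_switches;

        /*
         * pick_next_task_fair(): if the global count moved since the
         * snapshot, a switch happened; charge it to this cfs_rq
         */
        if (cfs_rq_of(&prev->se)->prev_rq_nr_switches != rq->nr_switches)
                cfs_rq_of(&prev->se)->nr_switches++;

        /* read side: +1 for the switch still in flight */
        if (cfs_rq->prev_rq_nr_switches != rq->nr_switches)
                sum++;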

BTW, this means that your first comment about merging cfs and rt is
basically to be disregarded should I take this route, right?
 