* YAMAMOTO Takashi <yamamoto@valinux.co.jp> [2008-04-30 07:50:47]:
> > > > - anonymous objects (shmem) are not accounted.
> > > IMHO, shmem should be accounted.
> > > I agree it's difficult in your implementation,
> > > but are you going to support it?
> >
> > it should be trivial to track how much swap an anonymous object is using.
> > i'm not sure how it should be associated with cgroups, tho.
> >
> > YAMAMOTO Takashi
>
> i implemented shmem swap accounting. see below.
>
> YAMAMOTO Takashi
>
>
> the following is another swap controller, which was designed and
> implemented independently from nishimura-san's one.
>
Hi, thanks for the patches and your patience. I've just applied your
patches on top of 2.6.25-mm1 (there was a minor reject, which I've fixed).
I am building and testing the patches along with KAMEZAWA-San's low
overhead patches.
> some random differences from nishimura-san's one:
> - counts and limits the number of ptes with swap entries instead of
> on-disk swap slots. (except swap slots used by anonymous objects,
> which are counted differently. see below.)
> - no swapon-time memory allocation.
> - precise wrt moving tasks between cgroups.
> - [NEW] anonymous objects (shmem) are associated to a cgroup when they are
> created and the number of on-disk swap slots used by them are counted for
> the cgroup. the association is persistent except cgroup removal,
> in which case, its associated objects are moved to init_mm's cgroup.
>
>
> Signed-off-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
> ---
>
> --- linux-2.6.25-rc8-mm2/mm/swapcontrol.c.BACKUP 2008-04-21 12:06:44.000000000 +0900
> +++ linux-2.6.25-rc8-mm2/mm/swapcontrol.c 2008-04-28 14:16:03.000000000 +0900
> @@ -0,0 +1,447 @@
> +
> +/*
> + * swapcontrol.c COPYRIGHT FUJITSU LIMITED 2008
> + *
> + * Author: yamamoto@valinux.co.jp
> + */
> +
> +#include <linux/err.h>
> +#include <linux/cgroup.h>
> +#include <linux/hugetlb.h>
My powerpc build fails here; we need to move hugetlb.h down to the
bottom of the include list.
> +#include <linux/res_counter.h>
> +#include <linux/pagemap.h>
> +#include <linux/slab.h>
> +#include <linux/swap.h>
> +#include <linux/swapcontrol.h>
> +#include <linux/swapops.h>
> +#include <linux/shmem_fs.h>
> +
> +struct swap_cgroup {
> + struct cgroup_subsys_state scg_css;
Can't we call this just css? Since the structure is swap_cgroup, it
already provides the namespace needed to distinguish this from other
css's. Please see page 4 of "The Practice of Programming": be
consistent. The same comment applies to all the members below.
> + struct res_counter scg_counter;
> + struct list_head scg_shmem_list;
> +};
> +
> +#define task_to_css(task) task_subsys_state((task), swap_cgroup_subsys_id)
> +#define css_to_scg(css) container_of((css), struct swap_cgroup, scg_css)
> +#define cg_to_css(cg) cgroup_subsys_state((cg), swap_cgroup_subsys_id)
> +#define cg_to_scg(cg) css_to_scg(cg_to_css(cg))
Aren't static inlines better than macros here? They give type checking
and are easier to debug; I would suggest converting these.
> +
> +static DEFINE_MUTEX(swap_cgroup_shmem_lock);
> +
> +static struct swap_cgroup *
> +swap_cgroup_prepare_ptp(struct page *ptp, struct mm_struct *mm)
> +{
> + struct swap_cgroup *scg = ptp->ptp_swap_cgroup;
> +
Is this routine safe w.r.t. concurrent modifications of
ptp_swap_cgroup?
> + BUG_ON(mm == NULL);
> + BUG_ON(mm->swap_cgroup == NULL);
> + if (scg == NULL) {
> + /*
> + * see swap_cgroup_attach.
> + */
> + smp_rmb();
> + scg = mm->swap_cgroup;
With the mm->owner patches, we need not maintain a separate
mm->swap_cgroup.
> + BUG_ON(scg == NULL);
> + ptp->ptp_swap_cgroup = scg;
> + }
> +
> + return scg;
> +}
> +
> +/*
> + * called with page table locked.
> + */
> +int
> +swap_cgroup_charge(struct page *ptp, struct mm_struct *mm)
> +{
> + struct swap_cgroup * const scg = swap_cgroup_prepare_ptp(ptp, mm);
> +
> + return res_counter_charge(&scg->scg_counter, PAGE_CACHE_SIZE);
> +}
> +
> +/*
> + * called with page table locked.
> + */
> +void
> +swap_cgroup_uncharge(struct page *ptp)
> +{
> + struct swap_cgroup * const scg = ptp->ptp_swap_cgroup;
> +
> + BUG_ON(scg == NULL);
> + res_counter_uncharge(&scg->scg_counter, PAGE_CACHE_SIZE);
> +}
> +
> +/*
> + * called with both page tables locked.
> + */
Some more comments here would help.
> +void
> +swap_cgroup_remap_charge(struct page *oldptp, struct page *newptp,
> + struct mm_struct *mm)
> +{
> + struct swap_cgroup * const oldscg = oldptp->ptp_swap_cgroup;
> + struct swap_cgroup * const newscg = swap_cgroup_prepare_ptp(newptp, mm);
> +
> + BUG_ON(oldscg == NULL);
> + BUG_ON(newscg == NULL);
> + if (oldscg == newscg) {
> + return;
> + }
> +
> + /*
> + * swap_cgroup_attach is in progress.
> + */
> +
> + res_counter_charge_force(&newscg->scg_counter, PAGE_CACHE_SIZE);
So, we force the counter to go over limit?
> + res_counter_uncharge(&oldscg->scg_counter, PAGE_CACHE_SIZE);
> +}
> +
> +void
> +swap_cgroup_init_mm(struct mm_struct *mm, struct task_struct *t)
> +{
> + struct swap_cgroup *scg;
> + struct cgroup *cg;
> +
> + /* mm->swap_cgroup is not NULL in the case of dup_mm */
> + cg = task_cgroup(t, swap_cgroup_subsys_id);
> + BUG_ON(cg == NULL);
> + scg = cg_to_scg(cg);
> + BUG_ON(scg == NULL);
> + css_get(&scg->scg_css);
> + mm->swap_cgroup = scg;
With mm->owner changes, this routine can be simplified.
> +}
> +
> +void
> +swap_cgroup_exit_mm(struct mm_struct *mm)
> +{
> + struct swap_cgroup * const scg = mm->swap_cgroup;
> +
> + /*
> + * note that swap_cgroup_attach does get_task_mm which
> + * prevents us from being called. we can assume mm->swap_cgroup
> + * is stable.
> + */
> +
> + BUG_ON(scg == NULL);
> + mm->swap_cgroup = NULL; /* it isn't necessary. just being tidy. */
If it's not required, I'd say remove it.
> + css_put(&scg->scg_css);
> +}
> +
> +static u64
> +swap_cgroup_read_u64(struct cgroup *cg, struct cftype *cft)
> +{
> +
> + return res_counter_read_u64(&cg_to_scg(cg)->scg_counter, cft->private);
> +}
> +
> +static int
> +swap_cgroup_write_u64(struct cgroup *cg, struct cftype *cft, u64 val)
> +{
> + struct res_counter *counter = &cg_to_scg(cg)->scg_counter;
> + unsigned long flags;
> +
> + /* XXX res_counter_write_u64 */
> + BUG_ON(cft->private != RES_LIMIT);
> + spin_lock_irqsave(&counter->lock, flags);
> + counter->limit = val;
> + spin_unlock_irqrestore(&counter->lock, flags);
> + return 0;
> +}
> +
Do we really want raw byte counts here? Can't we keep the write
interface consistent with the memory controller, which accepts
suffixed values via memparse()?
> +static const struct cftype files[] = {
> + {
> + .name = "usage_in_bytes",
> + .private = RES_USAGE,
> + .read_u64 = swap_cgroup_read_u64,
> + },
> + {
> + .name = "failcnt",
> + .private = RES_FAILCNT,
> + .read_u64 = swap_cgroup_read_u64,
> + },
> + {
> + .name = "limit_in_bytes",
> + .private = RES_LIMIT,
> + .read_u64 = swap_cgroup_read_u64,
> + .write_u64 = swap_cgroup_write_u64,
> + },
> +};
> +
> +static struct cgroup_subsys_state *
> +swap_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cg)
> +{
> + struct swap_cgroup *scg;
> +
> + scg = kzalloc(sizeof(*scg), GFP_KERNEL);
> + if (scg == NULL) {
Extra braces; they are not required when just one statement follows the if.
> + return ERR_PTR(-ENOMEM);
> + }
> + res_counter_init(&scg->scg_counter);
> + if (unlikely(cg->parent == NULL)) {
Ditto
> + init_mm.swap_cgroup = scg;
> + }
> + INIT_LIST_HEAD(&scg->scg_shmem_list);
> + return &scg->scg_css;
> +}
> +
> +static void
> +swap_cgroup_destroy(struct cgroup_subsys *ss, struct cgroup *cg)
> +{
> + struct swap_cgroup *oldscg = cg_to_scg(cg);
> + struct swap_cgroup *newscg;
> + struct list_head *pos;
> + struct list_head *next;
> +
> + /*
> + * move our anonymous objects to init_mm's group.
> + */
Is this good design? Should we allow a swap_cgroup to be destroyed
even though pages are still charged to it? Is moving them to init_mm
(the root cgroup) a good idea? Ideally, with support for hierarchies,
if we ever do move things, it should be to the parent cgroup.
> + newscg = init_mm.swap_cgroup;
> + BUG_ON(newscg == NULL);
> + BUG_ON(newscg == oldscg);
> +
> + mutex_lock(&swap_cgroup_shmem_lock);
> + list_for_each_safe(pos, next, &oldscg->scg_shmem_list) {
> + struct shmem_inode_info * const info =
> + list_entry(pos, struct shmem_inode_info, swap_cgroup_list);
> + long bytes;
> +
> + spin_lock(&info->lock);
> + BUG_ON(info->swap_cgroup != oldscg);
> + bytes = info->swapped << PAGE_SHIFT;
Shouldn't this be PAGE_CACHE_SHIFT, to be consistent with the
PAGE_CACHE_SIZE charges elsewhere in the file?
> + info->swap_cgroup = newscg;
> + res_counter_uncharge(&oldscg->scg_counter, bytes);
> + res_counter_charge_force(&newscg->scg_counter, bytes);
I don't like the excessive use of res_counter_charge_force(); it seems
like we might end up bypassing the controller altogether. I would
rather fail the destroy operation if the charge fails.
> + spin_unlock(&info->lock);
> + list_del_init(&info->s
...