OpenVZ Forum


Home » Mailing lists » Devel » [RFC][PATCH] another swap controller for cgroup
Re: [RFC][PATCH] another swap controller for cgroup [message #29997 is a reply to message #29914] Mon, 05 May 2008 06:11 Go to previous messageGo to previous message
Balbir Singh is currently offline  Balbir Singh
Messages: 491
Registered: August 2006
Senior Member
* YAMAMOTO Takashi <yamamoto@valinux.co.jp> [2008-04-30 07:50:47]:

> > > > - anonymous objects (shmem) are not accounted.
> > > IMHO, shmem should be accounted.
> > > I agree it's difficult in your implementation,
> > > but are you going to support it?
> > 
> > it should be trivial to track how much swap an anonymous object is using.
> > i'm not sure how it should be associated with cgroups, tho.
> > 
> > YAMAMOTO Takashi
> 
> i implemented shmem swap accounting.  see below.
> 
> YAMAMOTO Takashi
> 
> 
> the following is another swap controller, which was designed and
> implemented independently from nishimura-san's one.
>

Hi, Thanks for the patches and your patience. I've just applied your
patches on top of 2.6.25-mm1 (it had a minor reject, that I've fixed).
I am building and testing the patches along with KAMEZAWA-San's low
overhead patches.
 
> some random differences from nishimura-san's one:
> - counts and limits the number of ptes with swap entries instead of
>   on-disk swap slots.  (except swap slots used by anonymous objects,
>   which are counted differently.  see below.)
> - no swapon-time memory allocation.
> - precise wrt moving tasks between cgroups.
> - [NEW] anonymous objects (shmem) are associated to a cgroup when they are
>   created and the number of on-disk swap slots used by them are counted for
>   the cgroup.  the association is persistent except cgroup removal,
>   in which case, its associated objects are moved to init_mm's cgroup.
> 
> 
> Signed-off-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
> ---
> 
> --- linux-2.6.25-rc8-mm2/mm/swapcontrol.c.BACKUP	2008-04-21 12:06:44.000000000 +0900
> +++ linux-2.6.25-rc8-mm2/mm/swapcontrol.c	2008-04-28 14:16:03.000000000 +0900
> @@ -0,0 +1,447 @@
> +
> +/*
> + * swapcontrol.c COPYRIGHT FUJITSU LIMITED 2008
> + *
> + * Author: yamamoto@valinux.co.jp
> + */
> +
> +#include <linux/err.h>
> +#include <linux/cgroup.h>
> +#include <linux/hugetlb.h>

My powerpc build fails, we need to move hugetlb.h down to the bottom

> +#include <linux/res_counter.h>
> +#include <linux/pagemap.h>
> +#include <linux/slab.h>
> +#include <linux/swap.h>
> +#include <linux/swapcontrol.h>
> +#include <linux/swapops.h>
> +#include <linux/shmem_fs.h>
> +
> +struct swap_cgroup {
> +	struct cgroup_subsys_state scg_css;

Can't we call this just css. Since the structure is swap_cgroup it
already has the required namespace required to distinguish it from
other css's. Please see page 4 of "The practice of programming", be
consistent. The same comment applies to all members below.

> +	struct res_counter scg_counter;
> +	struct list_head scg_shmem_list;
> +};
> +
> +#define	task_to_css(task) task_subsys_state((task), swap_cgroup_subsys_id)
> +#define	css_to_scg(css)	container_of((css), struct swap_cgroup, scg_css)
> +#define	cg_to_css(cg)	cgroup_subsys_state((cg), swap_cgroup_subsys_id)
> +#define	cg_to_scg(cg)	css_to_scg(cg_to_css(cg))

Aren't static inline better than macros? I would suggest moving to
them.

> +
> +static DEFINE_MUTEX(swap_cgroup_shmem_lock);
> +
> +static struct swap_cgroup *
> +swap_cgroup_prepare_ptp(struct page *ptp, struct mm_struct *mm)
> +{
> +	struct swap_cgroup *scg = ptp->ptp_swap_cgroup;
> +

Is this routine safe w.r.t. concurrent operations, modifications to
ptp_swap_cgroup?

> +	BUG_ON(mm == NULL);
> +	BUG_ON(mm->swap_cgroup == NULL);
> +	if (scg == NULL) {
> +		/*
> +		 * see swap_cgroup_attach.
> +		 */
> +		smp_rmb();
> +		scg = mm->swap_cgroup;

With the mm->owner patches, we need not maintain a separate
mm->swap_cgroup.

> +		BUG_ON(scg == NULL);
> +		ptp->ptp_swap_cgroup = scg;
> +	}
> +
> +	return scg;
> +}
> +
> +/*
> + * called with page table locked.
> + */
> +int
> +swap_cgroup_charge(struct page *ptp, struct mm_struct *mm)
> +{
> +	struct swap_cgroup * const scg = swap_cgroup_prepare_ptp(ptp, mm);
> +
> +	return res_counter_charge(&scg->scg_counter, PAGE_CACHE_SIZE);
> +}
> +
> +/*
> + * called with page table locked.
> + */
> +void
> +swap_cgroup_uncharge(struct page *ptp)
> +{
> +	struct swap_cgroup * const scg = ptp->ptp_swap_cgroup;
> +
> +	BUG_ON(scg == NULL);
> +	res_counter_uncharge(&scg->scg_counter, PAGE_CACHE_SIZE);
> +}
> +
> +/*
> + * called with both page tables locked.
> + */

Some more comments here would help.

> +void
> +swap_cgroup_remap_charge(struct page *oldptp, struct page *newptp,
> +    struct mm_struct *mm)
> +{
> +	struct swap_cgroup * const oldscg = oldptp->ptp_swap_cgroup;
> +	struct swap_cgroup * const newscg = swap_cgroup_prepare_ptp(newptp, mm);
> +
> +	BUG_ON(oldscg == NULL);
> +	BUG_ON(newscg == NULL);
> +	if (oldscg == newscg) {
> +		return;
> +	}
> +
> +	/*
> +	 * swap_cgroup_attach is in progress.
> +	 */
> +
> +	res_counter_charge_force(&newscg->scg_counter, PAGE_CACHE_SIZE);

So, we force the counter to go over limit?

> +	res_counter_uncharge(&oldscg->scg_counter, PAGE_CACHE_SIZE);
> +}
> +
> +void
> +swap_cgroup_init_mm(struct mm_struct *mm, struct task_struct *t)
> +{
> +	struct swap_cgroup *scg;
> +	struct cgroup *cg;
> +
> +	/* mm->swap_cgroup is not NULL in the case of dup_mm */
> +	cg = task_cgroup(t, swap_cgroup_subsys_id);
> +	BUG_ON(cg == NULL);
> +	scg = cg_to_scg(cg);
> +	BUG_ON(scg == NULL);
> +	css_get(&scg->scg_css);
> +	mm->swap_cgroup = scg;

With mm->owner changes, this routine can be simplified.

> +}
> +
> +void
> +swap_cgroup_exit_mm(struct mm_struct *mm)
> +{
> +	struct swap_cgroup * const scg = mm->swap_cgroup;
> +
> +	/*
> +	 * note that swap_cgroup_attach does get_task_mm which
> +	 * prevents us from being called.  we can assume mm->swap_cgroup
> +	 * is stable.
> +	 */
> +
> +	BUG_ON(scg == NULL);
> +	mm->swap_cgroup = NULL; /* it isn't necessary.  just being tidy. */

If it's not required, I'd say remove it.

> +	css_put(&scg->scg_css);
> +}
> +
> +static u64
> +swap_cgroup_read_u64(struct cgroup *cg, struct cftype *cft)
> +{
> +
> +	return res_counter_read_u64(&cg_to_scg(cg)->scg_counter, cft->private);
> +}
> +
> +static int
> +swap_cgroup_write_u64(struct cgroup *cg, struct cftype *cft, u64 val)
> +{
> +	struct res_counter *counter = &cg_to_scg(cg)->scg_counter;
> +	unsigned long flags;
> +
> +	/* XXX res_counter_write_u64 */
> +	BUG_ON(cft->private != RES_LIMIT);
> +	spin_lock_irqsave(&counter->lock, flags);
> +	counter->limit = val;
> +	spin_unlock_irqrestore(&counter->lock, flags);
> +	return 0;
> +}
> +

We need to write actual numbers here? Can't we keep the write
interface consistent with the memory controller?

> +static const struct cftype files[] = {
> +	{
> +		.name = "usage_in_bytes",
> +		.private = RES_USAGE,
> +		.read_u64 = swap_cgroup_read_u64,
> +	},
> +	{
> +		.name = "failcnt",
> +		.private = RES_FAILCNT,
> +		.read_u64 = swap_cgroup_read_u64,
> +	},
> +	{
> +		.name = "limit_in_bytes",
> +		.private = RES_LIMIT,
> +		.read_u64 = swap_cgroup_read_u64,
> +		.write_u64 = swap_cgroup_write_u64,
> +	},
> +};
> +
> +static struct cgroup_subsys_state *
> +swap_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cg)
> +{
> +	struct swap_cgroup *scg;
> +
> +	scg = kzalloc(sizeof(*scg), GFP_KERNEL);
> +	if (scg == NULL) {

Extra braces, not required if there is just one statement following if

> +		return ERR_PTR(-ENOMEM);
> +	}
> +	res_counter_init(&scg->scg_counter);
> +	if (unlikely(cg->parent == NULL)) {

Ditto

> +		init_mm.swap_cgroup = scg;
> +	}
> +	INIT_LIST_HEAD(&scg->scg_shmem_list);
> +	return &scg->scg_css;
> +}
> +
> +static void
> +swap_cgroup_destroy(struct cgroup_subsys *ss, struct cgroup *cg)
> +{
> +	struct swap_cgroup *oldscg = cg_to_scg(cg);
> +	struct swap_cgroup *newscg;
> +	struct list_head *pos;
> +	struct list_head *next;
> +
> +	/*
> +	 * move our anonymous objects to init_mm's group.
> +	 */

Is this good design, should be allow a swap_cgroup to be destroyed,
even though pages are allocated to it? Is moving to init_mm (root
cgroup) a good idea? Ideally with support for hierarchies, if we ever
do move things, it should be to the parent cgroup.

> +	newscg = init_mm.swap_cgroup;
> +	BUG_ON(newscg == NULL);
> +	BUG_ON(newscg == oldscg);
> +
> +	mutex_lock(&swap_cgroup_shmem_lock);
> +	list_for_each_safe(pos, next, &oldscg->scg_shmem_list) {
> +		struct shmem_inode_info * const info =
> +		    list_entry(pos, struct shmem_inode_info, swap_cgroup_list);
> +		long bytes;
> +
> +		spin_lock(&info->lock);
> +		BUG_ON(info->swap_cgroup != oldscg);
> +		bytes = info->swapped << PAGE_SHIFT;

Shouldn't this be PAGE_CACHE_SHIFT just to be consistent

> +		info->swap_cgroup = newscg;
> +		res_counter_uncharge(&oldscg->scg_counter, bytes);
> +		res_counter_charge_force(&newscg->scg_counter, bytes);

I don't like the excessive use of res_counter_charge_force(), it seems
like we might end up bypassing the controller all together. I would
rather fail the destroy operation if the charge fails.

> +		spin_unlock(&info->lock);
> +		list_del_init(&info->s
...

 
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Previous Topic: extradite oracle into a VE, can we?
Next Topic: [PATCH -mm] remove node_ prefix_from ns subsystem
Goto Forum:
  


Current Time: Wed Jul 17 09:28:48 GMT 2024

Total time taken to generate the page: 0.02995 seconds