Home » Mailing lists » Devel » [PATCH 0/4] swapcgroup(v2)
[PATCH 0/4] swapcgroup(v2) [message #30352] |
Thu, 22 May 2008 06:13  |
Daisuke Nishimura
Messages: 54 Registered: March 2008
|
Member |
|
|
Hi.
I updated my swapcgroup patch.
Major changes from previous version(*1):
- Rebased on 2.6.26-rc2-mm1 + KAMEZAWA-san's performance
improvement patchset v4.
- Implemented as a add-on to memory cgroup.
So, there is no need to add a new member to page_cgroup now.
- (NEW)Modified vm_swap_full() to calculate the rate of
swap usage per cgroup.
Patchs:
- [1/4] add cgroup files
- [2/4] add member to swap_info_struct for cgroup
- [3/4] implement charge/uncharge
- [4/4] modify vm_swap_full for cgroup
ToDo:
- handle force_empty.
- make it possible for users to select if they use
this feature or not, and avoid overhead for users
not using this feature.
- move charges along with task move between cgroups.
*1
https://lists.linux-foundation.org/pipermail/containers/2008-March/010216.html
Thanks,
Daisuke Nishimura.
_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers
|
|
|
[PATCH 1/4] swapcgroup: add cgroup files [message #30353 is a reply to message #30352] |
Thu, 22 May 2008 06:17   |
Daisuke Nishimura
Messages: 54 Registered: March 2008
|
Member |
|
|
This patch add cgroup files(and a member to struct mem_cgroup)
for swapcgroup.
The files to be added are:
- memory.swap_usage_in_bytes
- memory.swap_max_usage_in_bytes
- memory.swap_limit_in_bytes
- memory.swap_failcnt
The meaning of those files are the same as memory cgroup.
Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
---
init/Kconfig | 7 +++++
mm/memcontrol.c | 67 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 74 insertions(+), 0 deletions(-)
diff --git a/init/Kconfig b/init/Kconfig
index 6135d07..aabbb83 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -407,6 +407,13 @@ config CGROUP_MEM_RES_CTLR
This config option also selects MM_OWNER config option, which
could in turn add some fork/exit overhead.
+config CGROUP_SWAP_RES_CTLR
+ bool "Swap Resource Controller for Control Groups"
+ depends on CGROUP_MEM_RES_CTLR && SWAP
+ help
+ Provides a swap resource controller that manages and limits swap usage.
+ Implemented as a add-on to Memory Resource Controller.
+
config SYSFS_DEPRECATED
bool
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index a96577f..a837215 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -141,6 +141,12 @@ struct mem_cgroup {
* statistics.
*/
struct mem_cgroup_stat stat;
+#ifdef CONFIG_CGROUP_SWAP_RES_CTLR
+ /*
+ * the counter to account for swap usage
+ */
+ struct res_counter swap_res;
+#endif
};
static struct mem_cgroup init_mem_cgroup;
@@ -960,6 +966,39 @@ static int mem_control_stat_show(struct cgroup *cont, struct cftype *cft,
return 0;
}
+#ifdef CONFIG_CGROUP_SWAP_RES_CTLR
+static u64 swap_cgroup_read(struct cgroup *cont, struct cftype *cft)
+{
+ return res_counter_read_u64(&mem_cgroup_from_cont(cont)->swap_res,
+ cft->private);
+}
+
+static ssize_t swap_cgroup_write(struct cgroup *cont, struct cftype *cft,
+ struct file *file, const char __user *userbuf,
+ size_t nbytes, loff_t *ppos)
+{
+ return res_counter_write(&mem_cgroup_from_cont(cont)->swap_res,
+ cft->private, userbuf, nbytes, ppos,
+ mem_cgroup_write_strategy);
+}
+
+static int swap_cgroup_reset(struct cgroup *cont, unsigned int event)
+{
+ struct mem_cgroup *mem;
+
+ mem = mem_cgroup_from_cont(cont);
+ switch (event) {
+ case RES_MAX_USAGE:
+ res_counter_reset_max(&mem->swap_res);
+ break;
+ case RES_FAILCNT:
+ res_counter_reset_failcnt(&mem->swap_res);
+ break;
+ }
+ return 0;
+}
+#endif
+
static struct cftype mem_cgroup_files[] = {
{
.name = "usage_in_bytes",
@@ -992,6 +1031,31 @@ static struct cftype mem_cgroup_files[] = {
.name = "stat",
.read_map = mem_control_stat_show,
},
+#ifdef CONFIG_CGROUP_SWAP_RES_CTLR
+ {
+ .name = "swap_usage_in_bytes",
+ .private = RES_USAGE,
+ .read_u64 = swap_cgroup_read,
+ },
+ {
+ .name = "swap_max_usage_in_bytes",
+ .private = RES_MAX_USAGE,
+ .trigger = swap_cgroup_reset,
+ .read_u64 = swap_cgroup_read,
+ },
+ {
+ .name = "swap_limit_in_bytes",
+ .private = RES_LIMIT,
+ .write = swap_cgroup_write,
+ .read_u64 = swap_cgroup_read,
+ },
+ {
+ .name = "swap_failcnt",
+ .private = RES_FAILCNT,
+ .trigger = swap_cgroup_reset,
+ .read_u64 = swap_cgroup_read,
+ },
+#endif
};
static int alloc_mem_cgroup_per_zone_info(struct mem_cgroup *mem, int node)
@@ -1069,6 +1133,9 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
}
res_counter_init(&mem->res);
+#ifdef CONFIG_CGROUP_SWAP_RES_CTLR
+ res_counter_init(&mem->swap_res);
+#endif
for_each_node_state(node, N_POSSIBLE)
if (alloc_mem_cgroup_per_zone_info(mem, node))
_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers
|
|
|
[PATCH 2/4] swapcgroup: add member to swap_info_struct for cgroup [message #30354 is a reply to message #30352] |
Thu, 22 May 2008 06:18   |
Daisuke Nishimura
Messages: 54 Registered: March 2008
|
Member |
|
|
This patch add a member to swap_info_struct for cgroup.
This member, array of pointers to mem_cgroup, is used to
remember to which cgroup each swap entries are charged.
The memory for this array of pointers is allocated on swapon,
and freed on swapoff.
Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
---
include/linux/swap.h | 3 +++
mm/swapfile.c | 32 ++++++++++++++++++++++++++++++++
2 files changed, 35 insertions(+), 0 deletions(-)
diff --git a/include/linux/swap.h b/include/linux/swap.h
index de40f16..67de27b 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -141,6 +141,9 @@ struct swap_info_struct {
struct swap_extent *curr_swap_extent;
unsigned old_block_size;
unsigned short * swap_map;
+#ifdef CONFIG_CGROUP_SWAP_RES_CTLR
+ struct mem_cgroup **memcg;
+#endif
unsigned int lowest_bit;
unsigned int highest_bit;
unsigned int cluster_next;
diff --git a/mm/swapfile.c b/mm/swapfile.c
index d3caf3a..232bf20 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1207,6 +1207,9 @@ asmlinkage long sys_swapoff(const char __user * specialfile)
{
struct swap_info_struct * p = NULL;
unsigned short *swap_map;
+#ifdef CONFIG_CGROUP_SWAP_RES_CTLR
+ struct mem_cgroup **memcg;
+#endif
struct file *swap_file, *victim;
struct address_space *mapping;
struct inode *inode;
@@ -1309,10 +1312,17 @@ asmlinkage long sys_swapoff(const char __user * specialfile)
p->max = 0;
swap_map = p->swap_map;
p->swap_map = NULL;
+#ifdef CONFIG_CGROUP_SWAP_RES_CTLR
+ memcg = p->memcg;
+ p->memcg = NULL;
+#endif
p->flags = 0;
spin_unlock(&swap_lock);
mutex_unlock(&swapon_mutex);
vfree(swap_map);
+#ifdef CONFIG_CGROUP_SWAP_RES_CTLR
+ vfree(memcg);
+#endif
inode = mapping->host;
if (S_ISBLK(inode->i_mode)) {
struct block_device *bdev = I_BDEV(inode);
@@ -1456,6 +1466,9 @@ asmlinkage long sys_swapon(const char __user * specialfile, int swap_flags)
unsigned long maxpages = 1;
int swapfilesize;
unsigned short *swap_map;
+#ifdef CONFIG_CGROUP_SWAP_RES_CTLR
+ struct mem_cgroup **memcg;
+#endif
struct page *page = NULL;
struct inode *inode = NULL;
int did_down = 0;
@@ -1479,6 +1492,9 @@ asmlinkage long sys_swapon(const char __user * specialfile, int swap_flags)
p->swap_file = NULL;
p->old_block_size = 0;
p->swap_map = NULL;
+#ifdef CONFIG_CGROUP_SWAP_RES_CTLR
+ p->memcg = NULL;
+#endif
p->lowest_bit = 0;
p->highest_bit = 0;
p->cluster_nr = 0;
@@ -1651,6 +1667,15 @@ asmlinkage long sys_swapon(const char __user * specialfile, int swap_flags)
1 /* header page */;
if (error)
goto bad_swap;
+
+#ifdef CONFIG_CGROUP_SWAP_RES_CTLR
+ p->memcg = vmalloc(maxpages * sizeof(struct mem_cgroup *));
+ if (!p->memcg) {
+ error = -ENOMEM;
+ goto bad_swap;
+ }
+ memset(p->memcg, 0, maxpages * sizeof(struct mem_cgroup *));
+#endif
}
if (nr_good_pages) {
@@ -1710,11 +1735,18 @@ bad_swap_2:
swap_map = p->swap_map;
p->swap_file = NULL;
p->swap_map = NULL;
+#ifdef CONFIG_CGROUP_SWAP_RES_CTLR
+ memcg = p->memcg;
+ p->memcg = NULL;
+#endif
p->flags = 0;
if (!(swap_flags & SWAP_FLAG_PREFER))
++least_priority;
spin_unlock(&swap_lock);
vfree(swap_map);
+#ifdef CONFIG_CGROUP_SWAP_RES_CTLR
+ vfree(memcg);
+#endif
if (swap_file)
filp_close(swap_file, NULL);
out:
_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers
|
|
|
[PATCH 3/4] swapcgroup: implement charge/uncharge [message #30355 is a reply to message #30352] |
Thu, 22 May 2008 06:20   |
Daisuke Nishimura
Messages: 54 Registered: March 2008
|
Member |
|
|
This patch implements charge and uncharge of swapcgroup.
- what will be charged ?
charge the number of swap entries in bytes.
- when to charge/uncharge ?
charge at get_swap_entry(), and uncharge at swap_entry_free().
- to what group charge the swap entry ?
To determine to what mem_cgroup the swap entry should be charged,
I changed the argument of get_swap_entry() from (void) to
(struct page *). As a result, get_swap_entry() can determine
to what mem_cgroup it should charge the swap entry
by referring to page->page_cgroup->mem_cgroup.
- from what group uncharge the swap entry ?
I added to swap_info_struct a member 'struct swap_cgroup **',
array of pointer to which swap_cgroup the swap entry is
charged.
Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
---
include/linux/memcontrol.h | 21 +++++++++++++++++++
include/linux/swap.h | 4 +-
mm/memcontrol.c | 47 ++++++++++++++++++++++++++++++++++++++++++++
mm/shmem.c | 2 +-
mm/swap_state.c | 2 +-
mm/swapfile.c | 14 ++++++++++++-
6 files changed, 85 insertions(+), 5 deletions(-)
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index fdf3967..a7e6621 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -24,6 +24,7 @@ struct mem_cgroup;
struct page_cgroup;
struct page;
struct mm_struct;
+struct swap_info_struct;
#ifdef CONFIG_CGROUP_MEM_RES_CTLR
@@ -172,5 +173,25 @@ static inline long mem_cgroup_calc_reclaim_inactive(struct mem_cgroup *mem,
}
#endif /* CONFIG_CGROUP_MEM_CONT */
+#ifdef CONFIG_CGROUP_SWAP_RES_CTLR
+extern int swap_cgroup_charge(struct page *page,
+ struct swap_info_struct *si,
+ unsigned long offset);
+extern void swap_cgroup_uncharge(struct swap_info_struct *si,
+ unsigned long offset);
+#else /* CONFIG_CGROUP_SWAP_RES_CTLR */
+static inline int swap_cgroup_charge(struct page *page,
+ struct swap_info_struct *si,
+ unsigned long offset)
+{
+ return 0;
+}
+
+static inline void swap_cgroup_uncharge(struct swap_info_struct *si,
+ unsigned long offset)
+{
+}
+#endif /* CONFIG_CGROUP_SWAP_RES_CTLR */
+
#endif /* _LINUX_MEMCONTROL_H */
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 67de27b..18887f0 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -241,7 +241,7 @@ extern struct page *swapin_readahead(swp_entry_t, gfp_t,
/* linux/mm/swapfile.c */
extern long total_swap_pages;
extern void si_swapinfo(struct sysinfo *);
-extern swp_entry_t get_swap_page(void);
+extern swp_entry_t get_swap_page(struct page *);
extern swp_entry_t get_swap_page_of_type(int);
extern int swap_duplicate(swp_entry_t);
extern int valid_swaphandles(swp_entry_t, unsigned long *);
@@ -342,7 +342,7 @@ static inline int remove_exclusive_swap_page(struct page *p)
return 0;
}
-static inline swp_entry_t get_swap_page(void)
+static inline swp_entry_t get_swap_page(struct page *page)
{
swp_entry_t entry;
entry.val = 0;
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index a837215..84e803d 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1220,3 +1220,50 @@ struct cgroup_subsys mem_cgroup_subsys = {
.attach = mem_cgroup_move_task,
.early_init = 0,
};
+
+#ifdef CONFIG_CGROUP_SWAP_RES_CTLR
+int swap_cgroup_charge(struct page *page,
+ struct swap_info_struct *si,
+ unsigned long offset)
+{
+ int ret;
+ struct page_cgroup *pc;
+ struct mem_cgroup *mem;
+
+ lock_page_cgroup(page);
+ pc = page_get_page_cgroup(page);
+ if (unlikely(!pc))
+ mem = &init_mem_cgroup;
+ else
+ mem = pc->mem_cgroup;
+ unlock_page_cgroup(page);
+
+ css_get(&mem->css);
+ ret = res_counter_charge(&mem->swap_res, PAGE_SIZE);
+ if (!ret)
+ si->memcg[offset] = mem;
+ else
+ css_put(&mem->css);
+
+ return ret;
+}
+
+void swap_cgroup_uncharge(struct swap_info_struct *si,
+ unsigned long offset)
+{
+ struct mem_cgroup *mem = si->memcg[offset];
+
+ /* "mem" would be NULL:
+ * 1. when get_swap_page() failed at charging swap_cgroup,
+ * and called swap_entry_free().
+ * 2. when this swap entry had been assigned by
+ * get_swap_page_of_type() (via SWSUSP?).
+ */
+ if (mem) {
+ res_counter_uncharge(&mem->swap_res, PAGE_SIZE);
+ si->memcg[offset] = NULL;
+ css_put(&mem->css);
+ }
+}
+#endif
+
diff --git a/mm/shmem.c b/mm/shmem.c
index 95b056d..69f8909 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1029,7 +1029,7 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
* want to check if there's a redundant swappage to be discarded.
*/
if (wbc->for_reclaim)
- swap = get_swap_page();
+ swap = get_swap_page(page);
else
swap.val = 0;
diff --git a/mm/swap_state.c b/mm/swap_state.c
index 676e191..a78d617 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -130,7 +130,7 @@ int add_to_swap(struct page * page, gfp_t gfp_mask)
BUG_ON(!PageUptodate(page));
for (;;) {
- entry = get_swap_page();
+ entry = get_swap_page(page);
if (!entry.val)
return 0;
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 232bf20..682b71e 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -172,7 +172,10 @@ no_page:
return 0;
}
-swp_entry_t get_swap_page(void)
+/* get_swap_page() calls this */
+static int swap_entry_free(struct swap_info_struct *, unsigned long);
+
+swp_entry_t get_swap_page(struct page *page)
{
struct swap_info_struct *si;
pgoff_t offset;
@@ -201,6 +204,14 @@ swp_entry_t get_swap_page(void)
swap_list.next = next;
offset = scan_swap_map(si);
if (offset) {
+ /*
+ * This should be the first use of this swap entry.
+ * So, charge this swap entry here.
+ */
+ if (swap_cgroup_charge(page, si, offset)) {
+ swap_entry_free(si, offset);
+ goto noswap;
+ }
spin_unlock(&swap_lock);
return swp_entry(type, offset);
}
@@ -285,6 +296,7 @@ static int swap_entry_free(struct swap_info_struct *p, unsigned long offset)
swap_list.next = p - swap_info;
nr_swap_pages++;
p->inuse_pages--;
+ swap_cgroup_uncharge(p, offset);
}
}
return count;
_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers
|
|
|
[PATCH 4/4] swapcgroup: modify vm_swap_full for cgroup [message #30356 is a reply to message #30352] |
Thu, 22 May 2008 06:22   |
Daisuke Nishimura
Messages: 54 Registered: March 2008
|
Member |
|
|
This patch modifies vm_swap_full() to calculate
the rate of swap usage per cgroup.
The purpose of this change is to free freeable swap caches
(that is, swap entries) per cgroup, so that swap_cgroup_charge()
fails less frequently.
I think I can free freeable swap caches more agressively
with one of Rik's splitlru patchset:
[patch 04/14] free swap space on swap-in/activation
but, I should verify whether this change to vm_swap_full()
is valid.
Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
---
include/linux/memcontrol.h | 3 +++
include/linux/swap.h | 6 +++++-
mm/memcontrol.c | 10 ++++++++++
mm/memory.c | 2 +-
mm/swapfile.c | 35 ++++++++++++++++++++++++++++++++++-
5 files changed, 53 insertions(+), 3 deletions(-)
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index a7e6621..256b298 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -179,6 +179,9 @@ extern int swap_cgroup_charge(struct page *page,
unsigned long offset);
extern void swap_cgroup_uncharge(struct swap_info_struct *si,
unsigned long offset);
+extern int swap_cgroup_vm_swap_full(struct page *page);
+extern u64 swap_cgroup_read_usage(struct mem_cgroup *mem);
+extern u64 swap_cgroup_read_limit(struct mem_cgroup *mem);
#else /* CONFIG_CGROUP_SWAP_RES_CTLR */
static inline int swap_cgroup_charge(struct page *page,
struct swap_info_struct *si,
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 18887f0..ef156c9 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -159,8 +159,12 @@ struct swap_list_t {
int next; /* swapfile to be used next */
};
+#ifndef CONFIG_CGROUP_SWAP_RES_CTLR
/* Swap 50% full? Release swapcache more aggressively.. */
-#define vm_swap_full() (nr_swap_pages*2 < total_swap_pages)
+#define vm_swap_full(page) (nr_swap_pages*2 < total_swap_pages)
+#else
+#define vm_swap_full(page) swap_cgroup_vm_swap_full(page)
+#endif
/* linux/mm/page_alloc.c */
extern unsigned long totalram_pages;
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 84e803d..58d72ca 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1265,5 +1265,15 @@ void swap_cgroup_uncharge(struct swap_info_struct *si,
css_put(&mem->css);
}
}
+
+u64 swap_cgroup_read_usage(struct mem_cgroup *mem)
+{
+ return res_counter_read_u64(&mem->swap_res, RES_USAGE);
+}
+
+u64 swap_cgroup_read_limit(struct mem_cgroup *mem)
+{
+ return res_counter_read_u64(&mem->swap_res, RES_LIMIT);
+}
#endif
diff --git a/mm/memory.c b/mm/memory.c
index df8f0e9..be2ff96 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2175,7 +2175,7 @@ static int do_swap_page(struct mm_struct *mm, struct vm_area_struct *vma,
page_add_anon_rmap(page, vma, address);
swap_free(entry);
- if (vm_swap_full())
+ if (vm_swap_full(page))
remove_exclusive_swap_page(page);
unlock_page(page);
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 682b71e..9256c2d 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -429,7 +429,7 @@ void free_swap_and_cache(swp_entry_t entry)
/* Only cache user (+us), or swap space full? Free it! */
/* Also recheck PageSwapCache after page is locked (above) */
if (PageSwapCache(page) && !PageWriteback(page) &&
- (one_user || vm_swap_full())) {
+ (one_user || vm_swap_full(page))) {
delete_from_swap_cache(page);
SetPageDirty(page);
}
@@ -1892,3 +1892,36 @@ int valid_swaphandles(swp_entry_t entry, unsigned long *offset)
*offset = ++toff;
return nr_pages? ++nr_pages: 0;
}
+
+#ifdef CONFIG_CGROUP_SWAP_RES_CTLR
+int swap_cgroup_vm_swap_full(struct page *page)
+{
+ int ret;
+ struct swap_info_struct *p;
+ struct mem_cgroup *mem;
+ u64 usage;
+ u64 limit;
+ swp_entry_t entry;
+
+ VM_BUG_ON(!PageLocked(page));
+ VM_BUG_ON(!PageSwapCache(page));
+
+ ret = 0;
+ entry.val = page_private(page);
+ p = swap_info_get(entry);
+ if (!p)
+ goto out;
+
+ mem = p->memcg[swp_offset(entry)];
+ usage = swap_cgroup_read_usage(mem) / PAGE_SIZE;
+ limit = swap_cgroup_read_limit(mem) / PAGE_SIZE;
+ limit = (limit < total_swap_pages) ? limit : total_swap_pages;
+
+ ret = usage * 2 > limit;
+
+ spin_unlock(&swap_lock);
+
+out:
+ return ret;
+}
+#endif
_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers
|
|
|
|
Re: [PATCH 2/4] swapcgroup: add member to swap_info_struct for cgroup [message #30359 is a reply to message #30354] |
Thu, 22 May 2008 07:23   |
KAMEZAWA Hiroyuki
Messages: 463 Registered: September 2006
|
Senior Member |
|
|
On Thu, 22 May 2008 15:18:51 +0900
Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:
> This patch add a member to swap_info_struct for cgroup.
>
> This member, array of pointers to mem_cgroup, is used to
> remember to which cgroup each swap entries are charged.
>
> The memory for this array of pointers is allocated on swapon,
> and freed on swapoff.
>
Hi, in general, #ifdefs in the middle of functions are not good style.
I'd like to comment some hints.
>
> Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
>
> ---
> include/linux/swap.h | 3 +++
> mm/swapfile.c | 32 ++++++++++++++++++++++++++++++++
> 2 files changed, 35 insertions(+), 0 deletions(-)
>
> diff --git a/include/linux/swap.h b/include/linux/swap.h
> index de40f16..67de27b 100644
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -141,6 +141,9 @@ struct swap_info_struct {
> struct swap_extent *curr_swap_extent;
> unsigned old_block_size;
> unsigned short * swap_map;
> +#ifdef CONFIG_CGROUP_SWAP_RES_CTLR
> + struct mem_cgroup **memcg;
> +#endif
> unsigned int lowest_bit;
> unsigned int highest_bit;
> unsigned int cluster_next;
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index d3caf3a..232bf20 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -1207,6 +1207,9 @@ asmlinkage long sys_swapoff(const char __user * specialfile)
> {
> struct swap_info_struct * p = NULL;
> unsigned short *swap_map;
> +#ifdef CONFIG_CGROUP_SWAP_RES_CTLR
> + struct mem_cgroup **memcg;
> +#endif
Remove #ifdef.
struct mem_cgroup **memcg = NULL;
> struct file *swap_file, *victim;
> struct address_space *mapping;
> struct inode *inode;
> @@ -1309,10 +1312,17 @@ asmlinkage long sys_swapoff(const char __user * specialfile)
> p->max = 0;
> swap_map = p->swap_map;
> p->swap_map = NULL;
> +#ifdef CONFIG_CGROUP_SWAP_RES_CTLR
> + memcg = p->memcg;
> + p->memcg = NULL;
> +#endif
==
#ifdef CONFIG_CGROUP_SWAP_RES_CTR
void swap_cgroup_init_memcg(p, memcg)
{
do something.
}
#else
void swap_cgroup_init_memcg(p, memcg)
{
}
#endif
==
> p->flags = 0;
> spin_unlock(&swap_lock);
> mutex_unlock(&swapon_mutex);
> vfree(swap_map);
> +#ifdef CONFIG_CGROUP_SWAP_RES_CTLR
> + vfree(memcg);
> +#endif
if (memcg)
vfree(memcg);
> inode = mapping->host;
> if (S_ISBLK(inode->i_mode)) {
> struct block_device *bdev = I_BDEV(inode);
> @@ -1456,6 +1466,9 @@ asmlinkage long sys_swapon(const char __user * specialfile, int swap_flags)
> unsigned long maxpages = 1;
> int swapfilesize;
> unsigned short *swap_map;
> +#ifdef CONFIG_CGROUP_SWAP_RES_CTLR
> + struct mem_cgroup **memcg;
> +#endif
Remove #ifdefs
> struct page *page = NULL;
> struct inode *inode = NULL;
> int did_down = 0;
> @@ -1479,6 +1492,9 @@ asmlinkage long sys_swapon(const char __user * specialfile, int swap_flags)
> p->swap_file = NULL;
> p->old_block_size = 0;
> p->swap_map = NULL;
> +#ifdef CONFIG_CGROUP_SWAP_RES_CTLR
> + p->memcg = NULL;
> +#endif
void init_swap_ctlr_memcg(p);
> p->lowest_bit = 0;
> p->highest_bit = 0;
> p->cluster_nr = 0;
> @@ -1651,6 +1667,15 @@ asmlinkage long sys_swapon(const char __user * specialfile, int swap_flags)
> 1 /* header page */;
> if (error)
> goto bad_swap;
> +
> +#ifdef CONFIG_CGROUP_SWAP_RES_CTLR
> + p->memcg = vmalloc(maxpages * sizeof(struct mem_cgroup *));
> + if (!p->memcg) {
> + error = -ENOMEM;
> + goto bad_swap;
> + }
> + memset(p->memcg, 0, maxpages * sizeof(struct mem_cgroup *));
> +#endif
void alloc_swap_ctlr_memcg(p)
But this implies swapon will fail at memory shortage. Is it good ?
> }
>
> if (nr_good_pages) {
> @@ -1710,11 +1735,18 @@ bad_swap_2:
> swap_map = p->swap_map;
> p->swap_file = NULL;
> p->swap_map = NULL;
> +#ifdef CONFIG_CGROUP_SWAP_RES_CTLR
> + memcg = p->memcg;
> + p->memcg = NULL;
> +#endif
> p->flags = 0;
> if (!(swap_flags & SWAP_FLAG_PREFER))
> ++least_priority;
> spin_unlock(&swap_lock);
> vfree(swap_map);
> +#ifdef CONFIG_CGROUP_SWAP_RES_CTLR
> + vfree(memcg);
> +#endif
> if (swap_file)
> filp_close(swap_file, NULL);
> out:
>
>
Thanks,
-Kame
_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers
|
|
|
Re: [PATCH 3/4] swapcgroup: implement charge/uncharge [message #30361 is a reply to message #30355] |
Thu, 22 May 2008 07:35   |
KAMEZAWA Hiroyuki
Messages: 463 Registered: September 2006
|
Senior Member |
|
|
On Thu, 22 May 2008 15:20:05 +0900
Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:
> +#ifdef CONFIG_CGROUP_SWAP_RES_CTLR
> +int swap_cgroup_charge(struct page *page,
> + struct swap_info_struct *si,
> + unsigned long offset)
> +{
> + int ret;
> + struct page_cgroup *pc;
> + struct mem_cgroup *mem;
> +
> + lock_page_cgroup(page);
> + pc = page_get_page_cgroup(page);
> + if (unlikely(!pc))
> + mem = &init_mem_cgroup;
> + else
> + mem = pc->mem_cgroup;
> + unlock_page_cgroup(page);
If !pc, the page is used before memory controller is available. But is it
good to be charged to init_mem_cgroup() ?
How about returning 'failure' in this case ? I think returning 'failure' here
is not so bad.
> +
> + css_get(&mem->css);
move this css_get() before unlock_page_cgroup() is safer.
> + ret = res_counter_charge(&mem->swap_res, PAGE_SIZE);
> + if (!ret)
> + si->memcg[offset] = mem;
> + else
> + css_put(&mem->css);
> +
> + return ret;
> +}
> +
> +void swap_cgroup_uncharge(struct swap_info_struct *si,
> + unsigned long offset)
> +{
> + struct mem_cgroup *mem = si->memcg[offset];
> +
> + /* "mem" would be NULL:
> + * 1. when get_swap_page() failed at charging swap_cgroup,
> + * and called swap_entry_free().
> + * 2. when this swap entry had been assigned by
> + * get_swap_page_of_type() (via SWSUSP?).
> + */
> + if (mem) {
> + res_counter_uncharge(&mem->swap_res, PAGE_SIZE);
> + si->memcg[offset] = NULL;
> + css_put(&mem->css);
> + }
> +}
> +#endif
> +
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 95b056d..69f8909 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -1029,7 +1029,7 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
> * want to check if there's a redundant swappage to be discarded.
> */
> if (wbc->for_reclaim)
> - swap = get_swap_page();
> + swap = get_swap_page(page);
> else
> swap.val = 0;
>
> diff --git a/mm/swap_state.c b/mm/swap_state.c
> index 676e191..a78d617 100644
> --- a/mm/swap_state.c
> +++ b/mm/swap_state.c
> @@ -130,7 +130,7 @@ int add_to_swap(struct page * page, gfp_t gfp_mask)
> BUG_ON(!PageUptodate(page));
>
> for (;;) {
> - entry = get_swap_page();
> + entry = get_swap_page(page);
> if (!entry.val)
> return 0;
>
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index 232bf20..682b71e 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -172,7 +172,10 @@ no_page:
> return 0;
> }
>
> -swp_entry_t get_swap_page(void)
> +/* get_swap_page() calls this */
> +static int swap_entry_free(struct swap_info_struct *, unsigned long);
> +
> +swp_entry_t get_swap_page(struct page *page)
> {
> struct swap_info_struct *si;
> pgoff_t offset;
> @@ -201,6 +204,14 @@ swp_entry_t get_swap_page(void)
> swap_list.next = next;
> offset = scan_swap_map(si);
> if (offset) {
> + /*
> + * This should be the first use of this swap entry.
> + * So, charge this swap entry here.
> + */
> + if (swap_cgroup_charge(page, si, offset)) {
> + swap_entry_free(si, offset);
> + goto noswap;
> + }
> spin_unlock(&swap_lock);
> return swp_entry(type, offset);
> }
> @@ -285,6 +296,7 @@ static int swap_entry_free(struct swap_info_struct *p, unsigned long offset)
> swap_list.next = p - swap_info;
> nr_swap_pages++;
> p->inuse_pages--;
> + swap_cgroup_uncharge(p, offset);
> }
> }
> return count;
>
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>
_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers
|
|
|
|
|
Re: [PATCH 2/4] swapcgroup: add member to swap_info_struct for cgroup [message #30369 is a reply to message #30359] |
Thu, 22 May 2008 08:46   |
Daisuke Nishimura
Messages: 54 Registered: March 2008
|
Member |
|
|
Hi.
On 2008/05/22 16:23 +0900, KAMEZAWA Hiroyuki wrote:
> On Thu, 22 May 2008 15:18:51 +0900
> Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:
>
>> This patch add a member to swap_info_struct for cgroup.
>>
>> This member, array of pointers to mem_cgroup, is used to
>> remember to which cgroup each swap entries are charged.
>>
>> The memory for this array of pointers is allocated on swapon,
>> and freed on swapoff.
>>
> Hi, in general, #ifdefs in the middle of functions are not good style.
> I'd like to comment some hints.
>
I completely agree that it's not good style.
>> Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
>>
>> ---
>> include/linux/swap.h | 3 +++
>> mm/swapfile.c | 32 ++++++++++++++++++++++++++++++++
>> 2 files changed, 35 insertions(+), 0 deletions(-)
>>
>> diff --git a/include/linux/swap.h b/include/linux/swap.h
>> index de40f16..67de27b 100644
>> --- a/include/linux/swap.h
>> +++ b/include/linux/swap.h
>> @@ -141,6 +141,9 @@ struct swap_info_struct {
>> struct swap_extent *curr_swap_extent;
>> unsigned old_block_size;
>> unsigned short * swap_map;
>> +#ifdef CONFIG_CGROUP_SWAP_RES_CTLR
>> + struct mem_cgroup **memcg;
>> +#endif
>> unsigned int lowest_bit;
>> unsigned int highest_bit;
>> unsigned int cluster_next;
>> diff --git a/mm/swapfile.c b/mm/swapfile.c
>> index d3caf3a..232bf20 100644
>> --- a/mm/swapfile.c
>> +++ b/mm/swapfile.c
>> @@ -1207,6 +1207,9 @@ asmlinkage long sys_swapoff(const char __user * specialfile)
>> {
>> struct swap_info_struct * p = NULL;
>> unsigned short *swap_map;
>> +#ifdef CONFIG_CGROUP_SWAP_RES_CTLR
>> + struct mem_cgroup **memcg;
>> +#endif
> Remove #ifdef.
>
> struct mem_cgroup **memcg = NULL;
>
good idea.
I'll do it.
>> struct file *swap_file, *victim;
>> struct address_space *mapping;
>> struct inode *inode;
>> @@ -1309,10 +1312,17 @@ asmlinkage long sys_swapoff(const char __user * specialfile)
>> p->max = 0;
>> swap_map = p->swap_map;
>> p->swap_map = NULL;
>> +#ifdef CONFIG_CGROUP_SWAP_RES_CTLR
>> + memcg = p->memcg;
>> + p->memcg = NULL;
>> +#endif
>
>
> ==
> #ifdef CONFIG_CGROUP_SWAP_RES_CTR
> void swap_cgroup_init_memcg(p, memcg)
> {
> do something.
> }
> #else
> void swap_cgroup_init_memcg(p, memcg)
> {
> }
> #endif
> ==
>
I think swap_cgroup_init_memcg should return old value
of p->memcg, and I would like to name it swap_cgroup_clear_memcg,
because it is called by sys_swapoff, so "clear" rather than "init"
would be better.
How about something like this?
struct mem_cgroup **swap_cgroup_clear_memcg(p, memcg)
{
struct mem_cgroup **mem;
mem = p->memcg;
p->memcg = NULL;
return mem;
}
and at sys_swapoff():
struct mem_cgroup **memcg;
:
memcg = swap_cgroup_clear_memcg(p, memcg);
:
if (memcg)
vfree(memcg);
>> p->flags = 0;
>> spin_unlock(&swap_lock);
>> mutex_unlock(&swapon_mutex);
>> vfree(swap_map);
>> +#ifdef CONFIG_CGROUP_SWAP_RES_CTLR
>> + vfree(memcg);
>> +#endif
> if (memcg)
> vfree(memcg);
>
>
will do.
>> inode = mapping->host;
>> if (S_ISBLK(inode->i_mode)) {
>> struct block_device *bdev = I_BDEV(inode);
>> @@ -1456,6 +1466,9 @@ asmlinkage long sys_swapon(const char __user * specialfile, int swap_flags)
>> unsigned long maxpages = 1;
>> int swapfilesize;
>> unsigned short *swap_map;
>> +#ifdef CONFIG_CGROUP_SWAP_RES_CTLR
>> + struct mem_cgroup **memcg;
>> +#endif
> Remove #ifdefs
>
will do.
>> struct page *page = NULL;
>> struct inode *inode = NULL;
>> int did_down = 0;
>> @@ -1479,6 +1492,9 @@ asmlinkage long sys_swapon(const char __user * specialfile, int swap_flags)
>> p->swap_file = NULL;
>> p->old_block_size = 0;
>> p->swap_map = NULL;
>> +#ifdef CONFIG_CGROUP_SWAP_RES_CTLR
>> + p->memcg = NULL;
>> +#endif
>
> void init_swap_ctlr_memcg(p);
>
I would like to call this one swap_cgroup_init_memcg.
>> p->lowest_bit = 0;
>> p->highest_bit = 0;
>> p->cluster_nr = 0;
>> @@ -1651,6 +1667,15 @@ asmlinkage long sys_swapon(const char __user * specialfile, int swap_flags)
>> 1 /* header page */;
>> if (error)
>> goto bad_swap;
>> +
>> +#ifdef CONFIG_CGROUP_SWAP_RES_CTLR
>> + p->memcg = vmalloc(maxpages * sizeof(struct mem_cgroup *));
>> + if (!p->memcg) {
>> + error = -ENOMEM;
>> + goto bad_swap;
>> + }
>> + memset(p->memcg, 0, maxpages * sizeof(struct mem_cgroup *));
>> +#endif
> void alloc_swap_ctlr_memcg(p)
>
OK.
I'll implement swap_cgroup_alloc_memcg.
> But this implies swapon will fail at memory shortage. Is it good ?
>
Hum.
Would it be better to just disabling this feature?
>> }
>>
>> if (nr_good_pages) {
>> @@ -1710,11 +1735,18 @@ bad_swap_2:
>> swap_map = p->swap_map;
>> p->swap_file = NULL;
>> p->swap_map = NULL;
>> +#ifdef CONFIG_CGROUP_SWAP_RES_CTLR
>> + memcg = p->memcg;
>> + p->memcg = NULL;
>> +#endif
>> p->flags = 0;
>> if (!(swap_flags & SWAP_FLAG_PREFER))
>> ++least_priority;
>> spin_unlock(&swap_lock);
>> vfree(swap_map);
>> +#ifdef CONFIG_CGROUP_SWAP_RES_CTLR
>> + vfree(memcg);
>> +#endif
>> if (swap_file)
>> filp_close(swap_file, NULL);
>> out:
>>
I'll handle these 2 #ifdefs as sys_swapoff.
>>
>
> Thanks,
> -Kame
>
Thank you for your advice!
Daisuke Nishimura.
_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers
|
|
|
Re: [PATCH 2/4] swapcgroup: add member to swap_info_struct for cgroup [message #30371 is a reply to message #30369] |
Thu, 22 May 2008 09:33   |
KAMEZAWA Hiroyuki
Messages: 463 Registered: September 2006
|
Senior Member |
|
|
On Thu, 22 May 2008 17:46:54 +0900
Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:
> > ==
> > #ifdef CONFIG_CGROUP_SWAP_RES_CTR
> > void swap_cgroup_init_memcg(p, memcg)
> > {
> > do something.
> > }
> > #else
> > void swap_cgroup_init_memcg(p, memcg)
> > {
> > }
> > #endif
> > ==
> >
> I think swap_cgroup_init_memcg should return old value
> of p->memcg, and I would like to name it swap_cgroup_clear_memcg,
> because it is called by sys_swapoff, so "clear" rather than "init"
> would be better.
>
> How about something like this?
>
> struct mem_cgroup **swap_cgroup_clear_memcg(p, memcg)
> {
> struct mem_cgroup **mem;
>
> mem = p->memcg;
> p->memcg = NULL;
>
> return mem;
> }
>
> and at sys_swapoff():
>
> struct mem_cgroup **memcg;
> :
> memcg = swap_cgroup_clear_memcg(p, memcg);
> :
> if (memcg)
> vfree(memcg);
>
seems good.
> >> +#ifdef CONFIG_CGROUP_SWAP_RES_CTLR
> >> + p->memcg = vmalloc(maxpages * sizeof(struct mem_cgroup *));
> >> + if (!p->memcg) {
> >> + error = -ENOMEM;
> >> + goto bad_swap;
> >> + }
> >> + memset(p->memcg, 0, maxpages * sizeof(struct mem_cgroup *));
> >> +#endif
> > void alloc_swap_ctlr_memcg(p)
> >
> OK.
> I'll implement swap_cgroup_alloc_memcg.
>
> > But this implies swapon will fail at memory shortage. Is it good ?
> >
> Hum.
> Would it be better to just disabling this feature?
>
I have no good idea. IMHO, adding printk() to show 'fatal status of
not-enough-memory-for-vmalloc' will be first step.
I believe vmalloc() tend not to fail on 64bit machine, but on i386,
vmalloc area is not enough.
Thanks,
-Kame
_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers
|
|
|
|
|
|
|
Re: [PATCH 0/4] swapcgroup(v2) [message #30384 is a reply to message #30352] |
Fri, 23 May 2008 02:26   |
Rik van Riel
Messages: 11 Registered: February 2006
|
Junior Member |
|
|
On Thu, 22 May 2008 15:13:41 +0900
Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:
> I updated my swapcgroup patch.
I do not understand why this is useful.
With the other cgroup resource controllers, once a process
group reaches its limit, it is limited or punished in some
way. For example, when it goes over its RSS limit, memory
is taken away.
However, once a cgroup reaches its swap limit, it is
rewarded, by allowing more of its pages to stay resident
in RAM, instead of having them swapped out.
This, in turn, will cause the VM to evict pages from other,
better behaving groups. In short, the cgroup that has
"misbehaved" by reaching its limit causes other cgroups to
get punished.
Even worse is that a cgroup has NO CONTROL over how much
of its memory is kept in RAM and how much is swapped out.
This kind of decision is made on a system-wide basis by
the kernel, dependent on what other processes in the system
are doing. There also is no easy way for a cgroup to reduce
its swap use, unlike with other resources.
In what scenario would you use a resource controller that
rewards a group for reaching its limit?
How can the cgroup swap space controller help sysadmins
achieve performance or fairness goals on a system?
--
All rights reversed.
_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers
|
|
|
|
Re: [PATCH 0/4] swapcgroup(v2) [message #30387 is a reply to message #30384] |
Fri, 23 May 2008 03:08   |
KAMEZAWA Hiroyuki
Messages: 463 Registered: September 2006
|
Senior Member |
|
|
On Thu, 22 May 2008 22:26:55 -0400
Rik van Riel <riel@redhat.com> wrote:
> Even worse is that a cgroup has NO CONTROL over how much
> of its memory is kept in RAM and how much is swapped out.
Could you explain "NO CONTROL" ? cgroup has LRU....
'how mucch memory should be swapped out from memory' is well controlled
in the VM besides LRU logic ?
> This kind of decision is made on a system-wide basis by
> the kernel, dependent on what other processes in the system
> are doing. There also is no easy way for a cgroup to reduce
> its swap use, unlike with other resources.
>
> In what scenario would you use a resource controller that
> rewards a group for reaching its limit?
>
> How can the cgroup swap space controller help sysadmins
> achieve performance or fairness goals on a system?
>
Perforamnce is not the first goal of this swap controller, I think.
This is for resouce isolation/overcommiting.
1. Some _crazy_ people considers swap as very-slow-memory resource ;)
I don't think so but I know there are tons of people....
2. Resource Isolation.
When a cgroup has memory limitation, it can create tons of swap.
For example, limit a cgroup's memory to be 128M and malloc 3G bytes.
2.8Gbytes of swap will be used _easily_. A process can use up all swap.
In that case, other process can't use swap.
IIRC, a man shown his motivation to controll swap in OLS2007/BOF as following.
==
Consider following system. (and there is no swap controller.)
Memory 4G. Swap 1G. with 2 cgroups A, B.
state 1) swap is not used.
A....memory limit to be 1G no swap usage memory_usage=0M
B....memory limit to be 1G no swap usage memory_usage=0M
state 2) Run a big program on A.
A....memory limit to be 1G and try to use 1.7G. uses 700MBytes of swap.
memory_usage=1G swap_usage=700M
B....memory_usage=0M
state 3) A some of programs ends in 'A'
A....memory_usage=500M swap_usage=700M
B....memory_usage=0M.
state 4) Run a big program on B.
A...memory_usage=500M swap_usage=700M.
B...memory_usage=1G swap_usage=300M
Group B can only use 1.3G because of unfair swap use of group A.
But users think why A uses 700M of swap with 500M of free memory....
If we don't have limitation to swap, we'll have to innovate a way to move swap
to memory in some reasonable logic.
Thanks,
-Kame
_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers
|
|
|
Re: [PATCH 0/4] swapcgroup(v2) [message #30388 is a reply to message #30387] |
Fri, 23 May 2008 03:32   |
Rik van Riel
Messages: 11 Registered: February 2006
|
Junior Member |
|
|
On Fri, 23 May 2008 12:10:27 +0900
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> On Thu, 22 May 2008 22:26:55 -0400
> Rik van Riel <riel@redhat.com> wrote:
>
> > Even worse is that a cgroup has NO CONTROL over how much
> > of its memory is kept in RAM and how much is swapped out.
> Could you explain "NO CONTROL" ? cgroup has LRU....
> 'how mucch memory should be swapped out from memory' is well controlled
> in the VM besides LRU logic ?
The kernel controls what is swapped out. The userland
processes in the cgroup can do nothing to reduce their
swap usage.
> Consider following system. (and there is no swap controller.)
> Memory 4G. Swap 1G. with 2 cgroups A, B.
>
> state 1) swap is not used.
> A....memory limit to be 1G no swap usage memory_usage=0M
> B....memory limit to be 1G no swap usage memory_usage=0M
>
> state 2) Run a big program on A.
> A....memory limit to be 1G and try to use 1.7G. uses 700MBytes of swap.
> memory_usage=1G swap_usage=700M
> B....memory_usage=0M
>
> state 3) A some of programs ends in 'A'
> A....memory_usage=500M swap_usage=700M
> B....memory_usage=0M.
>
> state 4) Run a big program on B.
> A...memory_usage=500M swap_usage=700M.
> B...memory_usage=1G swap_usage=300M
>
> Group B can only use 1.3G because of unfair swap use of group A.
> But users think why A uses 700M of swap with 500M of free memory....
>
> If we don't have limitation to swap, we'll have to innovate a way to move swap
> to memory in some reasonable logic.
OK, I see the use case.
In the above example, it would be possible for cgroup A
to have only 800MB of anonymous memory total, in addition
to 400MB of page cache. The page cache could push the
anonymous memory into swap, indirectly penalizing how much
memory cgroup B can use.
Of course, it could be argued that the system should just
be run with enough swap space, but that is another story :)
--
All rights reversed.
_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers
|
|
|
Re: [PATCH 0/4] swapcgroup(v2) [message #30389 is a reply to message #30387] |
Fri, 23 May 2008 03:59   |
Balbir Singh
Messages: 491 Registered: August 2006
|
Senior Member |
|
|
KAMEZAWA Hiroyuki wrote:
> On Thu, 22 May 2008 22:26:55 -0400
> Rik van Riel <riel@redhat.com> wrote:
>
>> Even worse is that a cgroup has NO CONTROL over how much
>> of its memory is kept in RAM and how much is swapped out.
We used to have a control on the swap cache pages as well, but their
implementation needed more thought
> Could you explain "NO CONTROL" ? cgroup has LRU....
> 'how mucch memory should be swapped out from memory' is well controlled
> in the VM besides LRU logic ?
>
>> This kind of decision is made on a system-wide basis by
>> the kernel, dependent on what other processes in the system
>> are doing. There also is no easy way for a cgroup to reduce
>> its swap use, unlike with other resources.
>>
One option is to limit the virtual address space usage of the cgroup to ensure
that swap usage of a cgroup will *not* exceed the specified limit. Along with a
good swap controller, it should provide good control over the cgroup's memory usage.
>
>> In what scenario would you use a resource controller that
>> rewards a group for reaching its limit?
>>
>> How can the cgroup swap space controller help sysadmins
>> achieve performance or fairness goals on a system?
>>
> Perforamnce is not the first goal of this swap controller, I think.
> This is for resouce isolation/overcommiting.
>
> 1. Some _crazy_ people considers swap as very-slow-memory resource ;)
> I don't think so but I know there are tons of people....
>
> 2. Resource Isolation.
> When a cgroup has memory limitation, it can create tons of swap.
> For example, limit a cgroup's memory to be 128M and malloc 3G bytes.
> 2.8Gbytes of swap will be used _easily_. A process can use up all swap.
> In that case, other process can't use swap.
>
> IIRC, a man shown his motivation to controll swap in OLS2007/BOF as following.
> ==
> Consider following system. (and there is no swap controller.)
> Memory 4G. Swap 1G. with 2 cgroups A, B.
>
> state 1) swap is not used.
> A....memory limit to be 1G no swap usage memory_usage=0M
> B....memory limit to be 1G no swap usage memory_usage=0M
>
> state 2) Run a big program on A.
> A....memory limit to be 1G and try to use 1.7G. uses 700MBytes of swap.
> memory_usage=1G swap_usage=700M
> B....memory_usage=0M
>
> state 3) A some of programs ends in 'A'
> A....memory_usage=500M swap_usage=700M
> B....memory_usage=0M.
>
> state 4) Run a big program on B.
> A...memory_usage=500M swap_usage=700M.
> B...memory_usage=1G swap_usage=300M
>
> Group B can only use 1.3G because of unfair swap use of group A.
> But users think why A uses 700M of swap with 500M of free memory....
>
> If we don't have limitation to swap, we'll have to innovate a way to move swap
> to memory in some reasonable logic.
>
> Thanks,
> -Kame
--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL
_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers
|
|
|
|
Re: [PATCH 0/4] swapcgroup(v2) [message #30391 is a reply to message #30352] |
Fri, 23 May 2008 04:51   |
Balbir Singh
Messages: 491 Registered: August 2006
|
Senior Member |
|
|
KOSAKI Motohiro wrote:
>> One option is to limit the virtual address space usage of the cgroup to ensure
>> that swap usage of a cgroup will *not* exceed the specified limit. Along with a
>> good swap controller, it should provide good control over the cgroup's memory usage.
>
> unfortunately, it doesn't works in real world.
> IMHO you said as old good age.
>
> because, Some JavaVM consume crazy large virtual address space.
> it often consume >10x than phycal memory consumption.
>
Have you seen any real world example of this? The overcommit feature of Linux.
We usually by default limit the overcommit to 1.5 times total memory (IIRC).
Yes, one can override that value, you get the same flexibility with the virtual
address space controller.
I thought java was particular about it with it's heap management options and policy.
> yes, that behaviour is crazy. but it is used widely.
> thus, We shouldn't assume virtual address space limitation.
It's useful in many cases to limit the virtual address space - to allow
applications to deal with memory failure, rather than
1. OOM the application later
2. Allow uncontrolled swapping (swap controller would help here)
--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL
_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers
|
|
|
|
|
Re: [PATCH 3/4] swapcgroup: implement charge/uncharge [message #30398 is a reply to message #30361] |
Fri, 23 May 2008 11:52   |
Daisuke Nishimura
Messages: 54 Registered: March 2008
|
Member |
|
|
On 2008/05/22 16:37 +0900, KAMEZAWA Hiroyuki wrote:
> On Thu, 22 May 2008 15:20:05 +0900
> Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:
>
>> +#ifdef CONFIG_CGROUP_SWAP_RES_CTLR
>> +int swap_cgroup_charge(struct page *page,
>> + struct swap_info_struct *si,
>> + unsigned long offset)
>> +{
>> + int ret;
>> + struct page_cgroup *pc;
>> + struct mem_cgroup *mem;
>> +
>> + lock_page_cgroup(page);
>> + pc = page_get_page_cgroup(page);
>> + if (unlikely(!pc))
>> + mem = &init_mem_cgroup;
>> + else
>> + mem = pc->mem_cgroup;
>> + unlock_page_cgroup(page);
>
> If !pc, the page is used before memory controller is available. But is it
> good to be charged to init_mem_cgroup() ?
I'm sorry, but I can't understand this situation.
memory controller is initialized at kernel initialization,
so aren't processes created after it is initialized?
> How about returning 'failure' in this case ? I think returning 'failure' here
> is not so bad.
>
>
Which of below do you mean by 'failure'?
A. make it fail to get swap entry, so the page cannot be swapped out.
B. don't charge this swap entry to any cgroup, but the page
would be swapped out.
I don't want to do B, because I don't want to make such
not-charged-to-anywhere entries.
And I've seen several times this condition(!pc) becomes true,
so I charged to init_mem_cgroup.
BTW, I noticed that almost all of functions I added by this patch set
should check "mem_cgroup_subsys.disabled" first because it depend on
memory cgroup.
>> +
>> + css_get(&mem->css);
>
> move this css_get() before unlock_page_cgroup() is safer.
>
OK, thanks.
Daisuke Nishimura.
_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers
|
|
|
|
|
Re: [PATCH 3/4] swapcgroup: implement charge/uncharge [message #30433 is a reply to message #30398] |
Mon, 26 May 2008 00:55   |
KAMEZAWA Hiroyuki
Messages: 463 Registered: September 2006
|
Senior Member |
|
|
On Fri, 23 May 2008 20:52:29 +0900
Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:
> On 2008/05/22 16:37 +0900, KAMEZAWA Hiroyuki wrote:
> > On Thu, 22 May 2008 15:20:05 +0900
> > Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:
> >
> >> +#ifdef CONFIG_CGROUP_SWAP_RES_CTLR
> >> +int swap_cgroup_charge(struct page *page,
> >> + struct swap_info_struct *si,
> >> + unsigned long offset)
> >> +{
> >> + int ret;
> >> + struct page_cgroup *pc;
> >> + struct mem_cgroup *mem;
> >> +
> >> + lock_page_cgroup(page);
> >> + pc = page_get_page_cgroup(page);
> >> + if (unlikely(!pc))
> >> + mem = &init_mem_cgroup;
> >> + else
> >> + mem = pc->mem_cgroup;
> >> + unlock_page_cgroup(page);
> >
> > If !pc, the page is used before memory controller is available. But is it
> > good to be charged to init_mem_cgroup() ?
> I'm sorry, but I can't understand this situation.
> memory controller is initialized at kernel initialization,
> so aren't processes created after it is initialized?
>
I think add_to_page_cache() may be called before late_init..I'll check again.
(Because I saw some panics related to it, but I noticed this is _swap_ controller
...)
> > How about returning 'failure' in this case ? I think returning 'failure' here
> > is not so bad.
> >
> >
> Which of below do you mean by 'failure'?
>
> A. make it fail to get swap entry, so the page cannot be swapped out.
> B. don't charge this swap entry to any cgroup, but the page
> would be swapped out.
means A.
>
> I don't want to do B, because I don't want to make such
> not-charged-to-anywhere entries.
> And I've seen several times this condition(!pc) becomes true,
> so I charged to init_mem_cgroup.
>
>
> BTW, I noticed that almost all of functions I added by this patch set
> should check "mem_cgroup_subsys.disabled" first because it depend on
> memory cgroup.
>
Ah, yes, please.
Thanks,
-Kame
_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers
|
|
|
|
|
|
Re: [PATCH 0/4] swapcgroup(v2) [message #30449 is a reply to message #30391] |
Fri, 23 May 2008 05:29   |
David Singleton
Messages: 1 Registered: May 2008
|
Junior Member |
|
|
Balbir Singh wrote:
> KOSAKI Motohiro wrote:
>>> One option is to limit the virtual address space usage of the cgroup to ensure
>>> that swap usage of a cgroup will *not* exceed the specified limit. Along with a
>>> good swap controller, it should provide good control over the cgroup's memory usage.
>> unfortunately, it doesn't works in real world.
>> IMHO you said as old good age.
>>
>> because, Some JavaVM consume crazy large virtual address space.
>> it often consume >10x than phycal memory consumption.
>>
>
> Have you seen any real world example of this?
At the unsophisticated end, there are lots of (Fortran) HPC applications
with very large static array declarations but only "use" a small fraction
of that. Those users know they only need a small fraction and are happy
to volunteer small physical memory limits that we (admins/queuing
systems) can apply.
At the sophisticated end, the use of numerous large memory maps in
parallel HPC applications to gain visibility into other processes is
growing. We have processes with VSZ > 400GB just because they have
4GB maps into 127 other processes. Their physical page use is of
the order 2GB.
Imposing virtual address space limits on these applications is
meaningless.
The overcommit feature of Linux.
> We usually by default limit the overcommit to 1.5 times total memory (IIRC).
> Yes, one can override that value, you get the same flexibility with the virtual
> address space controller.
>
> I thought java was particular about it with it's heap management options and policy.
>
>> yes, that behaviour is crazy. but it is used widely.
>> thus, We shouldn't assume virtual address space limitation.
>
> It's useful in many cases to limit the virtual address space - to allow
> applications to deal with memory failure, rather than
>
> 1. OOM the application later
> 2. Allow uncontrolled swapping (swap controller would help here)
>
David
_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers
|
|
|
|
|
|
Re: [PATCH 0/4] swapcgroup(v2) [message #30487 is a reply to message #30485] |
Tue, 27 May 2008 08:30   |
Daisuke Nishimura
Messages: 54 Registered: March 2008
|
Member |
|
|
On 2008/05/27 16:42 +0900, Balbir Singh wrote:
> YAMAMOTO Takashi wrote:
>> hi,
>>
>>>> Thanks for looking into this. Yamamoto-San is also looking into a swap
>>>> controller. Is there a consensus on the approach?
>>>>
>>> Not yet, but I think we should have some consensus each other
>>> before going further.
>>>
>>>
>>> Thanks,
>>> Daisuke Nishimura.
>> while nishimura-san's one still seems to have a lot of todo,
>> it seems good enough as a start point to me.
>> so i'd like to withdraw mine.
>>
>> nishimura-san, is it ok for you?
>>
Of cource.
I'll work hard to make it better.
>
> I would suggest that me merge the good parts from both into the swap controller.
> Having said that I'll let the two of you decide on what the good aspects of both
> are. I cannot see any immediate overlap, but there might be some w.r.t.
> infrastructure used.
>
Well, you mean you'll make another patch based on yamamoto-san's
and mine?
Basically, I think it's difficult to merge
because we charge different objects.
Thanks,
Daisuke Nishimura.
_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers
|
|
|
Re: [PATCH 0/4] swapcgroup(v2) [message #30492 is a reply to message #30487] |
Tue, 27 May 2008 13:18   |
Balbir Singh
Messages: 491 Registered: August 2006
|
Senior Member |
|
|
Daisuke Nishimura wrote:
> On 2008/05/27 16:42 +0900, Balbir Singh wrote:
>> YAMAMOTO Takashi wrote:
>>> hi,
>>>
>>>>> Thanks for looking into this. Yamamoto-San is also looking into a swap
>>>>> controller. Is there a consensus on the approach?
>>>>>
>>>> Not yet, but I think we should have some consensus each other
>>>> before going further.
>>>>
>>>>
>>>> Thanks,
>>>> Daisuke Nishimura.
>>> while nishimura-san's one still seems to have a lot of todo,
>>> it seems good enough as a start point to me.
>>> so i'd like to withdraw mine.
>>>
>>> nishimura-san, is it ok for you?
>>>
> Of cource.
> I'll work hard to make it better.
>
>> I would suggest that me merge the good parts from both into the swap controller.
>> Having said that I'll let the two of you decide on what the good aspects of both
>> are. I cannot see any immediate overlap, but there might be some w.r.t.
>> infrastructure used.
>>
> Well, you mean you'll make another patch based on yamamoto-san's
> and mine?
>
> Basically, I think it's difficult to merge
> because we charge different objects.
>
Yes, I know - that's why I said infrastructure, to see the grouping and common
data structure aspects. I'll try and review the code.
--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL
_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers
|
|
|
Re: [PATCH 0/4] swapcgroup(v2) [message #30494 is a reply to message #30492] |
Tue, 27 May 2008 13:42   |
Daisuke Nishimura
Messages: 54 Registered: March 2008
|
Member |
|
|
On 2008/05/27 22:18 +0900, Balbir Singh wrote:
> Daisuke Nishimura wrote:
>> On 2008/05/27 16:42 +0900, Balbir Singh wrote:
>>> YAMAMOTO Takashi wrote:
>>>> hi,
>>>>
>>>>>> Thanks for looking into this. Yamamoto-San is also looking into a swap
>>>>>> controller. Is there a consensus on the approach?
>>>>>>
>>>>> Not yet, but I think we should have some consensus each other
>>>>> before going further.
>>>>>
>>>>>
>>>>> Thanks,
>>>>> Daisuke Nishimura.
>>>> while nishimura-san's one still seems to have a lot of todo,
>>>> it seems good enough as a start point to me.
>>>> so i'd like to withdraw mine.
>>>>
>>>> nishimura-san, is it ok for you?
>>>>
>> Of cource.
>> I'll work hard to make it better.
>>
>>> I would suggest that me merge the good parts from both into the swap controller.
>>> Having said that I'll let the two of you decide on what the good aspects of both
>>> are. I cannot see any immediate overlap, but there might be some w.r.t.
>>> infrastructure used.
>>>
>> Well, you mean you'll make another patch based on yamamoto-san's
>> and mine?
>>
>> Basically, I think it's difficult to merge
>> because we charge different objects.
>>
>
> Yes, I know - that's why I said infrastructure, to see the grouping and common
> data structure aspects. I'll try and review the code.
>
OK.
I'll wait for your patch.
Thanks,
Daisuke Nishimura.
_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers
|
|
|
Re: [PATCH 3/4] swapcgroup: implement charge/uncharge [message #30495 is a reply to message #30433] |
Tue, 27 May 2008 13:42   |
KAMEZAWA Hiroyuki
Messages: 463 Registered: September 2006
|
Senior Member |
|
|
On Mon, 26 May 2008 09:57:06 +0900
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> On Fri, 23 May 2008 20:52:29 +0900
> Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:
>
> > On 2008/05/22 16:37 +0900, KAMEZAWA Hiroyuki wrote:
> > > On Thu, 22 May 2008 15:20:05 +0900
> > > Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:
> > >
> > >> +#ifdef CONFIG_CGROUP_SWAP_RES_CTLR
> > >> +int swap_cgroup_charge(struct page *page,
> > >> + struct swap_info_struct *si,
> > >> + unsigned long offset)
> > >> +{
> > >> + int ret;
> > >> + struct page_cgroup *pc;
> > >> + struct mem_cgroup *mem;
> > >> +
> > >> + lock_page_cgroup(page);
> > >> + pc = page_get_page_cgroup(page);
> > >> + if (unlikely(!pc))
> > >> + mem = &init_mem_cgroup;
> > >> + else
> > >> + mem = pc->mem_cgroup;
> > >> + unlock_page_cgroup(page);
> > >
> > > If !pc, the page is used before memory controller is available. But is it
> > > good to be charged to init_mem_cgroup() ?
> > I'm sorry, but I can't understand this situation.
> > memory controller is initialized at kernel initialization,
> > so aren't processes created after it is initialized?
> >
> I think add_to_page_cache() may be called before late_init..I'll check again.
> (Because I saw some panics related to it, but I noticed this is _swap_ controller
> ...)
_Now_, force_empty() will create a page which is used but
page->page_cgroup is NULL page. I'm now writing a workaround (1/4 in my newest set)
but it's better to check page->page_cgroup is NULL or not.
Thanks,
-Kame
_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers
|
|
|
Goto Forum:
Current Time: Fri Oct 24 07:47:22 GMT 2025
Total time taken to generate the page: 0.09509 seconds
|