			| [RFC][PATCH] UBC: user resource beancounters [message #5192] | Wed, 16 August 2006 15:23  |  
			| 
				
				
					|  dev Messages: 1693
 Registered: September 2005
 Location: Moscow
 | Senior Member |  
 |  |  
	| The following patch set presents the base of User Resource Beancounters (UBC).
 UBC makes it possible to account for and control the consumption
 of kernel resources used by a group of processes.
 
 The full UBC patch set makes it possible to control:
 - kernel memory. All kernel objects that can be allocated
 on user demand should be accounted and limited
 for DoS protection,
 e.g. page tables, task structs, vmas etc.
 
 - virtual memory pages. UBC makes it possible to
 limit a container to some amount of memory and
 introduces a 2-level OOM killer that takes the
 container's consumption into account.
 Pages shared between containers are correctly
 charged as fractions (tunable).
 
 - network buffers. These include TCP/IP rcv/snd
 buffers, dgram snd buffers, unix, netlink and
 other buffers.
 
 - minor resources accounted/limited by number:
 tasks, files, flocks, ptys, siginfo, pinned dcache
 mem, sockets, iptentries (for containers with
 virtualized networking)
 
 As the first step we want to propose for discussion
 the most complicated parts of resource management:
 kernel memory and virtual memory.
 The patch set to be sent provides the core of UBC and
 management of kernel memory only. Virtual memory
 management will be sent in a couple of days.
 
 The patches in this series are:
 diff-ubc-kconfig.patch:
 Adds the kernel/ub/Kconfig file with UBC options and
 includes it in the arch Kconfigs
 
 diff-ubc-core.patch:
 Contains core functionality and interfaces of UBC:
 find/create beancounter, initialization,
 charge/uncharge of resource, core objects' declarations.
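
 The charge/uncharge interface walks from a beancounter up through its parents, and a refused charge undoes what was already charged lower down. Below is a minimal user-space sketch of that idea, assuming a single resource and no locking. The names struct bc, bc_charge() and bc_uncharge() are hypothetical stand-ins, not the patch's API.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Simplified, single-resource stand-in for struct user_beancounter. */
struct bc {
	struct bc *parent;      /* NULL for a top-level beancounter */
	unsigned long held;     /* amount currently charged */
	unsigned long limit;    /* hard limit */
	unsigned long failcnt;  /* number of refused charges */
};

/* Charge one level; refuse if held + val would exceed the limit. */
static int bc_charge_one(struct bc *b, unsigned long val)
{
	if (val > b->limit || b->held > b->limit - val) {
		b->failcnt++;
		return -ENOMEM;
	}
	b->held += val;
	return 0;
}

/* Charge b and every ancestor; undo the partial work on failure. */
int bc_charge(struct bc *b, unsigned long val)
{
	struct bc *p, *q;
	int ret = 0;

	for (p = b; p != NULL; p = p->parent) {
		ret = bc_charge_one(p, val);
		if (ret)
			break;
	}
	if (ret == 0)
		return 0;

	/* Roll back the levels below the one that refused. */
	for (q = b; q != p; q = q->parent)
		q->held -= val;
	return ret;
}

/* Uncharge b and every ancestor, clamping at zero on each level. */
void bc_uncharge(struct bc *b, unsigned long val)
{
	for (; b != NULL; b = b->parent) {
		unsigned long v = val > b->held ? b->held : val;

		b->held -= v;
	}
}
```

 With a sub-beancounter whose parent has the tighter limit, a charge that the parent refuses leaves both levels unchanged except for the parent's fail counter.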
 
 diff-ubc-task.patch:
 Contains the code responsible for setting a UB on a task,
 its inheritance, and setting the host context in interrupts.
 
 A task references three beancounters:
 1. exec_ub  - current context. All resources are charged
 to this beancounter.
 2. task_ub  - beancounter to which the task_struct itself
 is charged.
 3. fork_sub - beancounter which is inherited by
 the task's children on fork
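
 The inheritance rule can be illustrated in user space: on fork, all three beancounters of the child are taken from the parent's fork_sub, and each pointer holds its own reference. This is only a sketch, assuming a plain integer reference count. The names bc_get(), bc_put(), task_bc_fork() and task_bc_exit() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a beancounter: only the reference count matters here. */
struct bc {
	int refcount;
};

/* Stand-in for struct task_beancounter. */
struct task_bc {
	struct bc *exec_ub;   /* current charging context */
	struct bc *task_ub;   /* where the task itself is charged */
	struct bc *fork_sub;  /* what children inherit on fork */
};

static struct bc *bc_get(struct bc *b)
{
	b->refcount++;
	return b;
}

static void bc_put(struct bc *b)
{
	assert(b->refcount > 0);
	b->refcount--;
}

/* The child starts with all three pointers set to the parent's fork_sub. */
void task_bc_fork(const struct task_bc *parent, struct task_bc *child)
{
	struct bc *b = parent->fork_sub;

	child->exec_ub = bc_get(b);
	child->task_ub = bc_get(b);
	child->fork_sub = bc_get(b);
}

/* Drop the three references when the task goes away. */
void task_bc_exit(struct task_bc *t)
{
	bc_put(t->exec_ub);
	bc_put(t->task_ub);
	bc_put(t->fork_sub);
}
```

 Note that the parent's own exec_ub and task_ub play no role in what the child receives; only fork_sub does.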
 
 diff-ubc-syscalls.patch:
 Adds system calls for UB management:
 1. sys_getluid    - gets the current UB id
 2. sys_setluid    - changes the exec_ and fork_ UBs of the current task
 3. sys_setublimit - sets limits on resource consumption
 
 diff-ubc-kmem-core.patch:
 Introduces the UB_KMEMSIZE resource, which accounts kernel
 objects allocated at a task's request.
 
 Objects are accounted via struct page and slab objects.
 For the latter, each slab contains a set of pointers to the
 beancounters the corresponding objects are charged to.
 
 Allocation charge rules:
 1. Pages - if allocation is performed with __GFP_UBC flag - page
 is charged to current's exec_ub.
 2. Slabs - kmem_cache may be created with SLAB_UBC flag - in this
 case each allocation is charged. Caches used by kmalloc are
 created with SLAB_UBC | SLAB_UBC_NOCHARGE flags. In this case
 only __GFP_UBC allocations are charged.
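
 The two rules above boil down to a small decision function. The sketch below is user-space C with made-up bit values. GFP_UBC stands in for __GFP_UBC, and treating a cache without SLAB_UBC as never charged is an assumption read from the rules, not something the patch text states outright.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit values; only the flag names come from the patch set. */
#define SLAB_UBC          0x1u
#define SLAB_UBC_NOCHARGE 0x2u
#define GFP_UBC           0x4u  /* stands in for __GFP_UBC */

/* Rule 1: a page is charged only when the allocation carries the flag. */
bool page_alloc_is_charged(unsigned int gfp_flags)
{
	return (gfp_flags & GFP_UBC) != 0;
}

/* Rule 2: slab objects depend on both the cache and the allocation. */
bool slab_alloc_is_charged(unsigned int cache_flags, unsigned int gfp_flags)
{
	if (!(cache_flags & SLAB_UBC))
		return false;                       /* assumed: cache not accounted */
	if (cache_flags & SLAB_UBC_NOCHARGE)
		return (gfp_flags & GFP_UBC) != 0;  /* kmalloc-style caches */
	return true;                                /* every allocation is charged */
}

/* Walks through the cases named in the rules. */
void charge_rules_selfcheck(void)
{
	assert(page_alloc_is_charged(GFP_UBC));
	assert(!page_alloc_is_charged(0));
	assert(slab_alloc_is_charged(SLAB_UBC, 0));
	assert(!slab_alloc_is_charged(SLAB_UBC | SLAB_UBC_NOCHARGE, 0));
	assert(slab_alloc_is_charged(SLAB_UBC | SLAB_UBC_NOCHARGE, GFP_UBC));
}
```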
 
 diff-ubc-kmem-charge.patch:
 Adds SLAB_UBC and __GFP_UBC flags in appropriate places
 to cause charging/limiting of specified resources.
 
 diff-ubc-proc.patch:
 Adds two proc entries, user_beancounters and user_beancounters_sub,
 which show the current state (usage/limits/fails for each UB).
 Implemented via seq files.
 
 The patch set applies to 2.6.18-rc4-mm1
 
 Thanks,
 Kirill
 |  
	|  |  |  
	| 
		
			| [RFC][PATCH 1/7] UBC: kconfig [message #5195 is a reply to message #5192] | Wed, 16 August 2006 15:34   |  
			| 
				
				
	| Adds the kernel/ub/Kconfig file with UBC options and includes it in the arch Kconfigs
 
 Signed-Off-By: Pavel Emelianov <xemul@sw.ru>
 Signed-Off-By: Kirill Korotaev <dev@sw.ru>
 
 ---
 arch/i386/Kconfig    |    2 ++
 arch/ia64/Kconfig    |    2 ++
 arch/powerpc/Kconfig |    2 ++
 arch/ppc/Kconfig     |    2 ++
 arch/sparc/Kconfig   |    2 ++
 arch/sparc64/Kconfig |    2 ++
 arch/x86_64/Kconfig  |    2 ++
 kernel/ub/Kconfig    |   25 +++++++++++++++++++++++++
 8 files changed, 39 insertions(+)
 
 --- ./arch/i386/Kconfig.ubkm	2006-07-10 12:39:10.000000000 +0400
 +++ ./arch/i386/Kconfig	2006-07-28 14:10:41.000000000 +0400
 @@ -1146,6 +1146,8 @@ source "crypto/Kconfig"
 
 source "lib/Kconfig"
 
 +source "kernel/ub/Kconfig"
 +
 #
 # Use the generic interrupt handling code in kernel/irq/:
 #
 --- ./arch/ia64/Kconfig.ubkm	2006-07-10 12:39:10.000000000 +0400
 +++ ./arch/ia64/Kconfig	2006-07-28 14:10:56.000000000 +0400
 @@ -481,6 +481,8 @@ source "fs/Kconfig"
 
 source "lib/Kconfig"
 
 +source "kernel/ub/Kconfig"
 +
 #
 # Use the generic interrupt handling code in kernel/irq/:
 #
 --- ./arch/powerpc/Kconfig.arkcfg	2006-08-07 14:07:12.000000000 +0400
 +++ ./arch/powerpc/Kconfig	2006-08-10 17:55:58.000000000 +0400
 @@ -1038,6 +1038,8 @@ source "arch/powerpc/platforms/iseries/K
 
 source "lib/Kconfig"
 
 +source "kernel/ub/Kconfig"
 +
 menu "Instrumentation Support"
 depends on EXPERIMENTAL
 
 --- ./arch/ppc/Kconfig.arkcfg	2006-07-10 12:39:10.000000000 +0400
 +++ ./arch/ppc/Kconfig	2006-08-10 17:56:13.000000000 +0400
 @@ -1414,6 +1414,8 @@ endmenu
 
 source "lib/Kconfig"
 
 +source "kernel/ub/Kconfig"
 +
 source "arch/powerpc/oprofile/Kconfig"
 
 source "arch/ppc/Kconfig.debug"
 --- ./arch/sparc/Kconfig.arkcfg	2006-04-21 11:59:32.000000000 +0400
 +++ ./arch/sparc/Kconfig	2006-08-10 17:56:24.000000000 +0400
 @@ -296,3 +296,5 @@ source "security/Kconfig"
 source "crypto/Kconfig"
 
 source "lib/Kconfig"
 +
 +source "kernel/ub/Kconfig"
 --- ./arch/sparc64/Kconfig.arkcfg	2006-07-17 17:01:11.000000000 +0400
 +++ ./arch/sparc64/Kconfig	2006-08-10 17:56:36.000000000 +0400
 @@ -432,3 +432,5 @@ source "security/Kconfig"
 source "crypto/Kconfig"
 
 source "lib/Kconfig"
 +
 +source "kernel/ub/Kconfig"
 --- ./arch/x86_64/Kconfig.ubkm	2006-07-10 12:39:11.000000000 +0400
 +++ ./arch/x86_64/Kconfig	2006-07-28 14:10:49.000000000 +0400
 @@ -655,3 +655,5 @@ source "security/Kconfig"
 source "crypto/Kconfig"
 
 source "lib/Kconfig"
 +
 +source "kernel/ub/Kconfig"
 --- ./kernel/ub/Kconfig.ubkm	2006-07-28 13:07:38.000000000 +0400
 +++ ./kernel/ub/Kconfig	2006-07-28 13:09:51.000000000 +0400
 @@ -0,0 +1,25 @@
 +#
 +# User resources part (UBC)
 +#
 +# Copyright (C) 2006 OpenVZ. SWsoft Inc
 +
 +menu "User resources"
 +
 +config USER_RESOURCE
 +	bool "Enable user resource accounting"
 +	default y
 +	help
 +          This patch provides accounting and allows configuring
 +          limits for user's consumption of exhaustible system resources.
 +          The most important resource controlled by this patch is unswappable
 +          memory (either mlock'ed or used by internal kernel structures and
 +          buffers). The main goal of this patch is to protect processes
 +          from running short of important resources because of an accidental
 +          misbehavior of processes or malicious activity aiming to ``kill''
 +          the system. It is worth mentioning that resource limits configured
 +          by setrlimit(2) do not give an acceptable level of protection
 +          because they cover only a small fraction of resources and work on a
 +          per-process basis.  Per-process accounting doesn't prevent malicious
 +          users from spawning a lot of resource-consuming processes.
 +
 +endmenu
 |  
	|  |  |  
	| 
		
			| [RFC][PATCH 2/7] UBC: core (structures, API) [message #5196 is a reply to message #5192] | Wed, 16 August 2006 15:35   |  
			| 
				
				
	| Core functionality and interfaces of UBC: find/create beancounter, initialization,
 charge/uncharge of resource, core objects' declarations.
 
 Basic structures:
 ubparm           - resource description
 user_beancounter - set of resources, id, lock
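
 How the barrier, the limit and the severity argument interact can be shown on a single ubparm. The following is a user-space sketch, not the kernel code: ubparm_charge() is a hypothetical name, there is no locking, and an unknown severity returns -EINVAL here where the patch calls BUG().

```c
#include <assert.h>
#include <errno.h>

/* Same value as the patch's UB_MAXVALUE: LONG_MAX as an unsigned long. */
#define UB_MAXVALUE ((~0UL) >> 1)

enum severity { UB_BARRIER, UB_LIMIT, UB_FORCE };

struct ubparm {
	unsigned long barrier;  /* soft threshold */
	unsigned long limit;    /* hard limit */
	unsigned long held;     /* consumed resources */
	unsigned long maxheld;  /* high-water mark */
	unsigned long minheld;  /* low-water mark */
	unsigned long failcnt;  /* count of failed charges */
};

int ubparm_charge(struct ubparm *p, unsigned long val, enum severity strict)
{
	unsigned long newheld;
	int ok;

	/* held and val both stay below UB_MAXVALUE, so the sum cannot wrap. */
	assert(val <= UB_MAXVALUE && p->held <= UB_MAXVALUE);
	newheld = p->held + val;

	switch (strict) {
	case UB_BARRIER:  /* must stay under both barrier and limit */
		ok = newheld <= p->barrier && newheld <= p->limit;
		break;
	case UB_LIMIT:    /* may cross the barrier, but not the limit */
		ok = newheld <= p->limit;
		break;
	case UB_FORCE:    /* always succeeds */
		ok = 1;
		break;
	default:
		return -EINVAL;
	}

	if (!ok) {
		p->failcnt++;
		return -ENOMEM;
	}

	p->held = newheld;
	if (p->maxheld < p->held)
		p->maxheld = p->held;
	if (p->minheld > p->held)
		p->minheld = p->held;
	return 0;
}
```

 A caller that can fail gracefully would pass UB_BARRIER, one that must not be refused below the hard limit would pass UB_LIMIT, and UB_FORCE only records the usage.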
 
 Signed-Off-By: Pavel Emelianov <xemul@sw.ru>
 Signed-Off-By: Kirill Korotaev <dev@sw.ru>
 
 ---
 include/ub/beancounter.h |  157 ++++++++++++++++++
 init/main.c              |    4
 kernel/Makefile          |    1
 kernel/ub/Makefile       |    7
 kernel/ub/beancounter.c  |  398 +++++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 567 insertions(+)
 
 --- /dev/null	2006-07-18 14:52:43.075228448 +0400
 +++ ./include/ub/beancounter.h	2006-08-10 14:58:27.000000000 +0400
 @@ -0,0 +1,157 @@
 +/*
 + *  include/ub/beancounter.h
 + *
 + *  Copyright (C) 2006 OpenVZ. SWsoft Inc
 + *
 + */
 +
 +#ifndef _LINUX_BEANCOUNTER_H
 +#define _LINUX_BEANCOUNTER_H
 +
 +/*
 + *	Resource list.
 + */
 +
 +#define UB_RESOURCES	0
 +
 +struct ubparm {
 +	/*
 +	 * A barrier over which resource allocations are failed gracefully.
 +	 * e.g. if the amount of consumed memory is over the barrier further
 +	 * sbrk() or mmap() calls fail, the existing processes are not killed.
 +	 */
 +	unsigned long	barrier;
 +	/* hard resource limit */
 +	unsigned long	limit;
 +	/* consumed resources */
 +	unsigned long	held;
 +	/* maximum amount of consumed resources through the last period */
 +	unsigned long	maxheld;
 +	/* minimum amount of consumed resources through the last period */
 +	unsigned long	minheld;
 +	/* count of failed charges */
 +	unsigned long	failcnt;
 +};
 +
 +/*
 + * Kernel internal part.
 + */
 +
 +#ifdef __KERNEL__
 +
 +#include <linux/config.h>
 +#include <linux/spinlock.h>
 +#include <linux/list.h>
 +#include <asm/atomic.h>
 +
 +/*
 + * UB_MAXVALUE is essentially LONG_MAX declared in a cross-compiling safe form.
 + */
 +#define UB_MAXVALUE	( (1UL << (sizeof(unsigned long)*8-1)) - 1)
 +
 +
 +/*
 + *	Resource management structures
 + * Serialization issues:
 + *   beancounter list management is protected via ub_hash_lock
 + *   task pointers are set only for current task and only once
 + *   refcount is managed atomically
 + *   value and limit comparison and change are protected by per-ub spinlock
 + */
 +
 +struct user_beancounter
 +{
 +	atomic_t		ub_refcount;
 +	spinlock_t		ub_lock;
 +	uid_t			ub_uid;
 +	struct hlist_node	hash;
 +
 +	struct user_beancounter	*parent;
 +	void			*private_data;
 +
 +	/* resources statistics and settings */
 +	struct ubparm		ub_parms[UB_RESOURCES];
 +};
 +
 +enum severity { UB_BARRIER, UB_LIMIT, UB_FORCE };
 +
 +/* Flags passed to beancounter_findcreate() */
 +#define UB_LOOKUP_SUB		0x01 /* Lookup subbeancounter */
 +#define UB_ALLOC		0x02 /* May allocate new one */
 +#define UB_ALLOC_ATOMIC		0x04 /* Allocate with GFP_ATOMIC */
 +
 +#define UB_HASH_SIZE		256
 +
 +#ifdef CONFIG_USER_RESOURCE
 +extern struct hlist_head ub_hash[];
 +extern spinlock_t ub_hash_lock;
 +
 +static inline void ub_adjust_held_minmax(struct user_beancounter *ub,
 +		int resource)
 +{
 +	if (ub->ub_parms[resource].maxheld < ub->ub_parms[resource].held)
 +		ub->ub_parms[resource].maxheld = ub->ub_parms[resource].held;
 +	if (ub->ub_parms[resource].minheld > ub->ub_parms[resource].held)
 +		ub->ub_parms[resource].minheld = ub->ub_parms[resource].held;
 +}
 +
 +void ub_print_resource_warning(struct user_beancounter *ub, int res,
 +		char *str, unsigned long val, unsigned long held);
 +void ub_print_uid(struct user_beancounter *ub, char *str, int size);
 +
 +int  __charge_beancounter_locked(struct user_beancounter *ub,
 +		int resource, unsigned long val, enum severity strict);
 +void charge_beancounter_notop(struct user_beancounter *ub,
 +		int resource, unsigned long val);
 +int  charge_beancounter(struct user_beancounter *ub,
 +		int resource, unsigned long val, enum severity strict);
 +
 +void __uncharge_beancounter_locked(struct user_beancounter *ub,
 +		int resource, unsigned long val);
 +void uncharge_beancounter_notop(struct user_beancounter *ub,
 +		int resource, unsigned long val);
 +void uncharge_beancounter(struct user_beancounter *ub,
 +		int resource, unsigned long val);
 +
 +struct user_beancounter *beancounter_findcreate(uid_t uid,
 +		struct user_beancounter *parent, int flags);
 +
 +static inline struct user_beancounter *get_beancounter(
 +		struct user_beancounter *ub)
 +{
 +	atomic_inc(&ub->ub_refcount);
 +	return ub;
 +}
 +
 +void __put_beancounter(struct user_beancounter *ub);
 +static inline void put_beancounter(struct user_beancounter *ub)
 +{
 +	__put_beancounter(ub);
 +}
 +
 +void ub_init_early(void);
 +void ub_init_late(void);
 +void ub_init_proc(void);
 +
 +extern struct user_beancounter ub0;
 +extern const char *ub_rnames[];
 +
 +#else /* CONFIG_USER_RESOURCE */
 +
 +#define beancounter_findcreate(id, p, f)		(NULL)
 +#define get_beancounter(ub)				(NULL)
 +#define put_beancounter(ub)				do { } while (0)
 +#define __charge_beancounter_locked(ub, r, v, s)	(0)
 +#define charge_beancounter(ub, r, v, s)			(0)
 +#define charge_beancounter_notop(ub, r, v)		do { } while (0)
 +#define __uncharge_beancounter_locked(ub, r, v)		do { } while (0)
 +#define uncharge_beancounter(ub, r, v)			do { } while (0)
 +#define uncharge_beancounter_notop(ub, r, v)		do { } while (0)
 +#define ub_init_early()					do { } while (0)
 +#define ub_init_late()					do { } while (0)
 +#define ub_init_proc()					do { } while (0)
 +
 +#endif /* CONFIG_USER_RESOURCE */
 +#endif /* __KERNEL__ */
 +
 +#endif /* _LINUX_BEANCOUNTER_H */
 --- ./init/main.c.ubcore	2006-08-10 14:55:47.000000000 +0400
 +++ ./init/main.c	2006-08-10 14:57:01.000000000 +0400
 @@ -52,6 +52,8 @@
 #include <linux/debug_locks.h>
 #include <linux/lockdep.h>
 
 +#include <ub/beancounter.h>
 +
 #include <asm/io.h>
 #include <asm/bugs.h>
 #include <asm/setup.h>
 @@ -470,6 +472,7 @@ asmlinkage void __init start_kernel(void
 early_boot_irqs_off();
 early_init_irq_lock_class();
 
 +	ub_init_early();
 /*
 * Interrupts are still disabled. Do necessary setups, then
 * enable them
 @@ -563,6 +566,7 @@ asmlinkage void __init start_kernel(void
 #endif
 fork_init(num_physpages);
 proc_caches_init();
 +	ub_init_late();
 buffer_init();
 unnamed_dev_init();
 key_init();
 --- ./kernel/Makefile.ubcore	2006-08-10 14:55:47.000000000 +0400
 +++ ./kernel/Makefile	2006-08-10 14:57:01.000000000 +0400
 @@ -12,6 +12,7 @@ obj-y     = sched.o fork.o exec_domain.o
 
 obj-$(CONFIG_STACKTRACE) += stacktrace.o
 obj-y += time/
 +obj-y += ub/
 obj-$(CONFIG_DEBUG_MUTEXES) += mutex-debug.o
 obj-$(CONFIG_LOCKDEP) += lockdep.o
 ifeq ($(CONFIG_PROC_FS),y)
 --- /dev/null	2006-07-18 14:52:43.075228448 +0400
 +++ ./kernel/ub/Makefile	2006-08-10 14:57:01.000000000 +0400
 @@ -0,0 +1,7 @@
 +#
 +# User resources part (UBC)
 +#
 +# Copyright (C) 2006 OpenVZ. SWsoft Inc
 +#
 +
 +obj-$(CONFIG_USER_RESOURCE) += beancounter.o
 --- /dev/null	2006-07-18 14:52:43.075228448 +0400
 +++ ./kernel/ub/beancounter.c	2006-08-10 15:09:34.000000000 +0400
 @@ -0,0 +1,398 @@
 +/*
 + *  kernel/ub/beancounter.c
 + *
 + *  Copyright (C) 2006 OpenVZ. SWsoft Inc
 + *  Original code by (C) 1998      Alan Cox
 + *                       1998-2000 Andrey Savochkin <saw@saw.sw.com.sg>
 + */
 +
 +#include <linux/slab.h>
 +#include <linux/module.h>
 +
 +#include <ub/beancounter.h>
 +
 +static kmem_cache_t *ub_cachep;
 +static struct user_beancounter default_beancounter;
 +static struct user_beancounter default_subbeancounter;
 +
 +static void init_beancounter_struct(struct user_beancounter *ub, uid_t id);
 +
 +struct user_beancounter ub0;
 +
 +const char *ub_rnames[] = {
 +};
 +
 +#define ub_hash_fun(x) ((((x) >> 8) ^ (x)) & (UB_HASH_SIZE - 1))
 +#define ub_subhash_fun(p, id) ub_hash_fun((p)->ub_uid + (id) * 17)
 +
 +struct hlist_head ub_hash[UB_HASH_SIZE];
 +spinlock_t ub_hash_lock;
 +
 +EXPORT_SYMBOL(ub_hash);
 +EXPORT_SYMBOL(ub_hash_lock);
 +
 +/*
 + *	Per user resource beancounting. Resources are tied to their luid.
 + *	The resource structure itself is tagged both to the process and
 + *	the charging resources (a socket doesn't want to have to search for
 + *	things at irq time for example). Reference counters keep things in
 + *	hand.
 + *
 + *	The case where a user creates resource, kills all his processes and
 + *	then starts new ones is correctly handled this way. The refcounters
 + *	will mean the old entry is still around with resource tied to it.
 + */
 +
 +struct user_beancounter *beancounter_findcreate(uid_t uid,
 +		struct user_beancounter *p, int mask)
 +{
 +	struct user_beancounter *new_ub, *ub, *tmpl_ub;
 +	unsigned long flags;
 +	struct hlist_head *slot;
 +	struct hlist_node *pos;
 +
 +	if (mask & UB_LOOKUP_SUB) {
 +		WARN_ON(p == NULL);
 +		tmpl_ub = &default_subbeancounter;
 +		slot = &ub_hash[ub_subhash_fun(p, uid)];
 +	} else {
 +		WARN_ON(p != NULL);
 +		tmpl_ub = &default_beancounter;
 +		slot = &ub_hash[ub_hash_fun(uid)];
 +	}
 +	new_ub = NULL;
 +
 +retry:
 +	spin_lock_irqsave(&ub_hash_lock, flags);
 +	hlist_for_each_entry (ub, pos, slot, hash)
 +		if (ub->ub_uid == uid && ub->parent == p)
 +			break;
 +
 +	if (pos != NULL) {
 +		get_beancounter(ub);
 +		spin_unlock_irqrestore(&ub_hash_lock, flags);
 +
 +		if (new_ub != NULL) {
 +			put_beancounter(new_ub->parent);
 +			kmem_cache_free(ub_cachep, new_ub);
 +		}
 +		return ub;
 +	}
 +
 +	if (!(mask & UB_ALLOC))
 +		goto out_unlock;
 +
 +	if (new_ub != NULL)
 +		goto out_install;
 +
 +	if (mask & UB_ALLOC_ATOMIC) {
 +		new_ub = kmem_cache_alloc(ub_cachep, GFP_ATOMIC);
 +		if (new_ub == NULL)
 +			goto out_unlock;
 +
 +		memcpy(new_ub, tmpl_ub, sizeof(*new_ub));
 +		init_beancounter_struct(new_ub, uid);
 +		if (p)
 +			new_ub->parent = get_beancounter(p);
 +		goto out_install;
 +	}
 +
 +	spin_unlock_irqrestore(&ub_hash_lock, flags);
 +
 +	new_ub = kmem_cache_alloc(ub_cachep, GFP_KERNEL);
 +	if (new_ub == NULL)
 +		goto out;
 +
 +	memcpy(new_ub, tmpl_ub, sizeof(*new_ub));
 +	init_beancounter_struct(new_ub, uid);
 +	if (p)
 +		new_ub->parent = get_beancounter(p);
 +	goto retry;
 +
 +out_install:
 +	hlist_add_head(&new_ub->hash, slot);
 +out_unlock:
 +	spin_unlock_irqrestore(&ub_hash_lock, flags);
 +out:
 +	return new_ub;
 +}
 +
 +EXPORT_SYMBOL(beancounter_findcreate);
 +
 +void ub_print_uid(struct user_beancounter *ub, char *str, int size)
 +{
 +	if (ub->parent != NULL)
 +		snprintf(str, size, "%u.%u", ub->parent->ub_uid, ub->ub_uid);
 +	else
 +		snprintf(str, size, "%u", ub->ub_uid);
 +}
 +
 +EXPORT_SYMBOL(ub_print_uid);
 +
 +void ub_print_resource_warning(struct user_beancounter *ub, int res,
 +		char *str, unsigned long val, unsigned long held)
 +{
 +	char uid[64];
 +
 +	ub_print_uid(ub, uid, sizeof(uid));
 +	printk(KERN_WARNING "UB %s %s warning: %s "
 +			"(held %lu, fails %lu, val %lu)\n",
 +			uid, ub_rnames[res], str,
 +			(res < UB_RESOURCES ? ub->ub_parms[res].held : held),
 +			(res < UB_RESOURCES ? ub->ub_parms[res].failcnt : 0),
 +			val);
 +}
 +
 +EXPORT_SYMBOL(ub_print_resource_warning);
 +
 +static inline void verify_held(struct user_beancounter *ub)
 +{
 +	int i;
 +
 +	for (i = 0; i < UB_RESOURCES; i++)
 +		if (ub->ub_parms[i].held != 0)
 +			ub_print_resource_warning(ub, i,
 +					"resource is held on put", 0, 0);
 +}
 +
 +void __put_beancounter(struct user_beancounter *ub)
 +{
 +	unsigned long flags;
 +	struct user_beancounter *parent;
 +
 +again:
 +	parent = ub->parent;
 +	/* equivalent to atomic_dec_and_lock_irqsave() */
 +	local_irq_save(flags);
 +	if (likely(!atomic_dec_and_lock(&ub->ub_refcount, &ub_hash_lock))) {
 +		if (unlikely(atomic_read(&ub->ub_refcount) < 0))
 +			printk(KERN_ERR "UB: Bad ub refcount: ub=%p, "
 +					"luid=%d, ref=%d\n",
 +					ub, ub->ub_uid,
 +					atomic_read(&ub->ub_refcount));
 +		local_irq_restore(flags);
 +		return;
 +	}
 +
 +	if (unlikely(ub == &ub0)) {
 +		printk(KERN_ERR "Trying to put ub0\n");
 +		spin_unlock_irqrestore(&ub_hash_lock, flags);
 +		return;
 +	}
 +
 +	verify_held(ub);
 +	hlist_del(&ub->hash);
 +	spin_unlock_irqrestore(&ub_hash_lock, flags);
 +
 +	kmem_cache_free(ub_cachep, ub);
 +
 +	ub = parent;
 +	if (ub != NULL)
 +		goto again;
 +}
 +
 +EXPORT_SYMBOL(__put_beancounter);
 +
 +/*
 + *	Generic resource charging stuff
 + */
 +
 +int __charge_beancounter_locked(struct user_beancounter *ub,
 +		int resource, unsigned long val, enum severity strict)
 +{
 +	/*
 +	 * ub_value <= UB_MAXVALUE, value <= UB_MAXVALUE, and only one addition
 +	 * at the moment is possible so an overflow is impossible.
 +	 */
 +	ub->ub_parms[resource].held += val;
 +
 +	switch (strict) {
 +		case UB_BARRIER:
 +			if (ub->ub_parms[resource].held >
 +					ub->ub_parms[resource].barrier)
 +				break;
 +			/* fallthrough */
 +		case UB_LIMIT:
 +			if (ub->ub_parms[resource].held >
 +					ub->ub_parms[resource].limit)
 +				break;
 +			/* fallthrough */
 +		case UB_FORCE:
 +			ub_adjust_held_minmax(ub, resource);
 +			return 0;
 +		default:
 +			BUG();
 +	}
 +
 +	ub->ub_parms[resource].failcnt++;
 +	ub->ub_parms[resource].held -= val;
 +	return -ENOMEM;
 +}
 +
 +int charge_beancounter(struct user_beancounter *ub,
 +		int resource, unsigned long val, enum severity strict)
 +{
 +	int retval;
 +	struct user_beancounter *p, *q;
 +	unsigned long flags;
 +
 +	retval = -EINVAL;
 +	BUG_ON(val > UB_MAXVALUE);
 +
 +	local_irq_save(flags);
 +	for (p = ub; p != NULL; p = p->parent) {
 +		spin_lock(&p->ub_lock);
 +		retval = __charge_beancounter_locked(p, resource, val, strict);
 +		spin_unlock(&p->ub_lock);
 +		if (retval)
 +			goto unroll;
 +	}
 +out_restore:
 +	local_irq_restore(flags);
 +	return retval;
 +
 +unroll:
 +	for (q = ub; q != p; q = q->parent) {
 +		spin_lock(&q->ub_lock);
 +		__uncharge_beancounter_locked(q, resource, val);
 +		spin_unlock(&q->ub_lock);
 +	}
 +	goto out_restore;
 +}
 +
 +EXPORT_SYMBOL(charge_beancounter);
 +
 +void charge_beancounter_notop(struct user_beancounter *ub,
 +		int resource, unsigned long val)
 +{
 +	struct user_beancounter *p;
 +	unsigned long flags;
 +
 +	local_irq_save(flags);
 +	for (p = ub; p->parent != NULL; p = p->parent) {
 +		spin_lock(&p->ub_lock);
 +		__charge_beancounter_locked(p, resource, val, UB_FORCE);
 +		spin_unlock(&p->ub_lock);
 +	}
 +	local_irq_restore(flags);
 +}
 +
 +EXPORT_SYMBOL(charge_beancounter_notop);
 +
 +void __uncharge_beancounter_locked(struct user_beancounter *ub,
 +		int resource, unsigned long val)
 +{
 +	if (unlikely(ub->ub_parms[resource].held < val)) {
 +		ub_print_resource_warning(ub, resource,
 +				"uncharging too much", val, 0);
 +		val = ub->ub_parms[resource].held;
 +	}
 +	ub->ub_parms[resource].held -= val;
 +	ub_adjust_held_minmax(ub, resource);
 +}
 +
 +void uncharge_beancounter(struct user_beancounter *ub,
 +		int resource, unsigned long val)
 +{
 +	unsigned long flags;
 +	struct user_beancounter *p;
 +
 +	for (p = ub; p != NULL; p = p->parent) {
 +		spin_lock_irqsave(&p->ub_lock, flags);
 +		__uncharge_beancounter_locked(p, resource, val);
 +		spin_unlock_irqrestore(&p->ub_lock, flags);
 +	}
 +}
 +
 +EXPORT_SYMBOL(uncharge_beancounter);
 +
 +void uncharge_beancounter_notop(struct user_beancounter *ub,
 +		int resource, unsigned long val)
 +{
 +	struct user_beancounter *p;
 +	unsigned long flags;
 +
 +	local_irq_save(flags);
 +	for (p = ub; p->parent != NULL; p = p->parent) {
 +		spin_lock(&p->ub_lock);
 +		__uncharge_beancounter_locked(p, resource, val);
 +		spin_unlock(&p->ub_lock);
 +	}
 +	local_irq_restore(flags);
 +}
 +
 +EXPORT_SYMBOL(uncharge_beancounter_notop);
 +
 +/*
 + *	Initialization
 + *
 + *	struct user_beancounter contains
 + *	 - limits and other configuration settings
 + *	 - structural fields: lists, spinlocks and so on.
 + *
 + *	Before these parts are initialized, the structure should be memset
 + *	to 0 or copied from a known clean structure.  That takes care of a lot
 + *	of fields not initialized explicitly.
 + */
 +
 +static void init_beancounter_struct(struct user_beancounter *ub, uid_t id)
 +{
 +	atomic_set(&ub->ub_refcount, 1);
 +	spin_lock_init(&ub->ub_lock);
 +	ub->ub_uid = id;
 +}
 +
 +static void init_beancounter_nolimits(struct user_beancounter *ub)
 +{
 +	int k;
 +
 +	for (k = 0; k < UB_RESOURCES; k++) {
 +		ub->ub_parms[k].limit = UB_MAXVALUE;
 +		ub->ub_parms[k].barrier = UB_MAXVALUE;
 +	}
 +}
 +
 +static void init_beancounter_syslimits(struct user_beancounter *ub)
 +{
 +	int k;
 +
 +	for (k = 0; k < UB_RESOURCES; k++)
 +		ub->ub_parms[k].barrier = ub->ub_parms[k].limit;
 +}
 +
 +void __init ub_init_early(void)
 +{
 +	struct user_beancounter *ub;
 +	struct hlist_head *slot;
 +
 +	ub = &ub0;
 +
 +	memset(ub, 0, sizeof(*ub));
 +	init_beancounter_nolimits(ub);
 +	init_beancounter_struct(ub, 0);
 +
 +	spin_lock_init(&ub_hash_lock);
 +	slot = &ub_hash[ub_hash_fun(ub->ub_uid)];
 +	hlist_add_head(&ub->hash, slot);
 +}
 +
 +void __init ub_init_late(void)
 +{
 +	struct user_beancounter *ub;
 +
 +	ub_cachep = kmem_cache_create("user_beancounters",
 +			sizeof(struct user_beancounter),
 +			0, SLAB_HWCACHE_ALIGN, NULL, NULL);
 +	if (ub_cachep == NULL)
 +		panic("Can't create ubc caches\n");
 +
 +	ub = &default_beancounter;
 +	memset(ub, 0, sizeof(default_beancounter));
 +	init_beancounter_syslimits(ub);
 +	init_beancounter_struct(ub, 0);
 +
 +	ub = &default_subbeancounter;
 +	memset(ub, 0, sizeof(default_subbeancounter));
 +	init_beancounter_nolimits(ub);
 +	init_beancounter_struct(ub, 0);
 +}
 |  
	|  |  |  
	| 
		
			| [RFC][PATCH 3/7] UBC: ub context and inheritance [message #5198 is a reply to message #5192] | Wed, 16 August 2006 15:36   |  
			| 
				
				
	| Contains the code responsible for setting a UB on a task, its inheritance, and setting the host context in interrupts.
 
 A task references three beancounters:
 1. exec_ub  - current context. All resources are
 charged to this beancounter.
 2. task_ub  - beancounter to which the task_struct itself
 is charged.
 3. fork_sub - beancounter which is inherited by
 the task's children on fork
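
 Setting the host context in interrupts is a save/switch/restore pattern around exec_ub: whatever is charged while the handler runs goes to ub0, not to the interrupted task's beancounter. Below is a user-space sketch of that pattern. The names current_task, charge_current() and fake_irq() are hypothetical; only set_exec_ub() and ub0 mirror names used in the patch.

```c
#include <assert.h>
#include <stddef.h>

struct bc {
	const char *name;
	unsigned long held;
};

struct task {
	struct bc *exec_ub;  /* beancounter charged for current activity */
};

struct bc ub0 = { "ub0", 0 };  /* host beancounter */
struct task *current_task;     /* stand-in for the kernel's `current` */

/* Install a new execution context and return the previous one. */
static struct bc *set_exec_ub(struct bc *newub)
{
	struct bc *old = current_task->exec_ub;

	current_task->exec_ub = newub;
	return old;
}

/* Charge whatever context is active right now. */
static void charge_current(unsigned long val)
{
	current_task->exec_ub->held += val;
}

/* Work done "in interrupt" is charged to the host, then context returns. */
void fake_irq(unsigned long val)
{
	struct bc *saved = set_exec_ub(&ub0);

	charge_current(val);
	(void)set_exec_ub(saved);
}

/* Returns 1 when task work and interrupt work end up on separate counters. */
int demo_irq_context(void)
{
	struct bc container = { "container", 0 };
	struct task t = { &container };

	current_task = &t;
	charge_current(3);  /* charged to the container */
	fake_irq(5);        /* charged to ub0 */
	return t.exec_ub == &container && container.held == 3 && ub0.held == 5;
}
```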
 
 Signed-Off-By: Pavel Emelianov <xemul@sw.ru>
 Signed-Off-By: Kirill Korotaev <dev@sw.ru>
 
 ---
 include/linux/sched.h   |    5 +++++
 include/ub/task.h       |   42 ++++++++++++++++++++++++++++++++++++++++++
 kernel/fork.c           |   21 ++++++++++++++++-----
 kernel/irq/handle.c     |    9 +++++++++
 kernel/softirq.c        |    8 ++++++++
 kernel/ub/Makefile      |    1 +
 kernel/ub/beancounter.c |    4 ++++
 kernel/ub/misc.c        |   34 ++++++++++++++++++++++++++++++++++
 8 files changed, 119 insertions(+), 5 deletions(-)
 
 --- ./include/linux/sched.h.ubfork	2006-07-17 17:01:12.000000000 +0400
 +++ ./include/linux/sched.h	2006-07-31 16:01:54.000000000 +0400
 @@ -81,6 +81,8 @@ struct sched_param {
 #include <linux/timer.h>
 #include <linux/hrtimer.h>
 
 +#include <ub/task.h>
 +
 #include <asm/processor.h>
 
 struct exec_domain;
 @@ -997,6 +999,9 @@ struct task_struct {
 spinlock_t delays_lock;
 struct task_delay_info *delays;
 #endif
 +#ifdef CONFIG_USER_RESOURCE
 +	struct task_beancounter	task_bc;
 +#endif
 };
 
 static inline pid_t process_group(struct task_struct *tsk)
 --- ./include/ub/task.h.ubfork	2006-07-28 18:53:52.000000000 +0400
 +++ ./include/ub/task.h	2006-08-01 15:26:08.000000000 +0400
 @@ -0,0 +1,42 @@
 +/*
 + *  include/ub/task.h
 + *
 + *  Copyright (C) 2006 OpenVZ. SWsoft Inc
 + *
 + */
 +
 +#ifndef __UB_TASK_H_
 +#define __UB_TASK_H_
 +
 +#include <linux/config.h>
 +
 +struct user_beancounter;
 +
 +struct task_beancounter {
 +	struct user_beancounter *exec_ub;
 +	struct user_beancounter *task_ub;
 +	struct user_beancounter *fork_sub;
 +};
 +
 +#ifdef CONFIG_USER_RESOURCE
 +#define get_exec_ub()		(current->task_bc.exec_ub)
 +#define set_exec_ub(newub)			\
 +	({					\
 +		 struct user_beancounter *old;	\
 +		 struct task_beancounter *tbc;	\
 +		 tbc = &current->task_bc;	\
 +		 old = tbc->exec_ub;		\
 +		 tbc->exec_ub = newub;		\
 +		 old;				\
 +	 })
 +
 +int ub_task_charge(struct task_struct *parent, struct task_struct *new);
 +void ub_task_uncharge(struct task_struct *tsk);
 +
 +#else /* CONFIG_USER_RESOURCE */
 +#define get_exec_ub()		(NULL)
 +#define set_exec_ub(__ub)	(NULL)
 +#define ub_task_charge(p, t)	(0)
 +#define ub_task_uncharge(t)	do { } while (0)
 +#endif /* CONFIG_USER_RESOURCE */
 +#endif /* __UB_TASK_H_ */
 --- ./kernel/irq/handle.c.ubirq	2006-07-10 12:39:20.000000000 +0400
 +++ ./kernel/irq/handle.c	2006-08-01 12:39:34.000000000 +0400
 @@ -16,6 +16,9 @@
 #include <linux/interrupt.h>
 #include <linux/kernel_stat.h>
 
 +#include <ub/beancounter.h>
 +#include <ub/task.h>
 +
 #include "internals.h"
 
 /**
 @@ -166,6 +169,9 @@ fastcall unsigned int __do_IRQ(unsigned
 struct irq_desc *desc = irq_desc + irq;
 struct irqaction *action;
 unsigned int status;
 +	struct user_beancounter *ub;
 +
 +	ub = set_exec_ub(&ub0);
 
 kstat_this_cpu.irqs[irq]++;
 if (CHECK_IRQ_PER_CPU(desc->status)) {
 @@ -178,6 +184,8 @@ fastcall unsigned int __do_IRQ(unsigned
 desc->chip->ack(irq);
 action_ret = handle_IRQ_event(irq, regs, desc->action);
 desc->chip->end(irq);
 +
 +		(void) set_exec_ub(ub);
 return 1;
 }
 
 @@ -246,6 +254,7 @@ out:
 desc->chip->end(irq);
 spin_unlock(&desc->lock);
 
 +	(void) set_exec_ub(ub);
 return 1;
 }
 
 --- ./kernel/softirq.c.ubirq	2006-07-17 17:01:12.000000000 +0400
 +++ ./kernel/softirq.c	2006-08-01 12:40:44.000000000 +0400
 @@ -18,6 +18,9 @@
 #include <linux/rcupdate.h>
 #include <linux/smp.h>
 
 +#include <ub/beancounter.h>
 +#include <ub/task.h>
 +
 #include <asm/irq.h>
 /*
 - No shared variables, all the data are CPU local.
 @@ -191,6 +194,9 @@ asmlinkage void __do_softirq(void)
 __u32 pending;
 int max_restart = MAX_SOFTIRQ_RESTART;
 int cpu;
 +	struct user_beancounter *ub;
 +
 +	ub = set_exec_ub(&ub0);
 
 pending = local_softirq_pending();
 account_system_vtime(current);
 @@ -229,6 +235,8 @@ restart:
 
 account_system_vtime(current);
 _local_bh_enable();
 +
 +	(void) set_exec_ub(ub);
 }
 
 #ifndef __ARCH_HAS_DO_SOFTIRQ
 --- ./kernel/fork.c.ubfork	2006-07-17 17:01:12.000000000 +0400
 +++ ./kernel/fork.c	2006-08-01 12:58:36.000000000 +0400
 @@ -46,6 +46,8 @@
 #include <linux/delayacct.h>
 #include <linux/taskstats_kern.h>
 
 +#include <ub/task.h>
 +
 #include <asm/pgtable.h>
 #include <asm/pgalloc.h>
 #include <asm/uaccess.h>
 @@ -102,6 +104,7 @@ static kmem_cache_t *mm_cachep;
 
 void free_task(struct task_struct *tsk)
 {
 +	ub_task_uncharge(tsk);
 free_thread_info(tsk->thread_info);
 rt_mutex_debug_task_free(tsk);
 free_task_struct(tsk);
 @@ -162,18 +165,19 @@ static struct task_struct *dup_task_stru
 
 tsk = alloc_task_struct();
 if (!tsk)
 -		return NULL;
 +		goto out;
 
 ti = alloc_thread_info(tsk);
 -	if (!ti) {
 -		free_task_struct(tsk);
 -		return NULL;
 -	}
 +	if (!ti)
 +		goto out_tsk;
 
 *tsk = *orig;
 tsk->thread_info = ti;
 setup_thread_stack(tsk, orig);
 
 +	if (ub_task_charge(orig, tsk))
 +		goto out_ti;
 +
 /* One for us, one for whoever does the "release_task()" (usually parent) */
 atomic_set(&tsk->usage,2);
 atomic_set(&tsk->fs_excl, 0);
 @@ -180,6 +184,13 @@ static struct task_struct *dup_task_stru
 #endif
 tsk->splice_pipe = NULL;
 return tsk;
 +
 +out_ti:
 +	free_thread_info(ti);
 +out_tsk:
 +	free_task_struct(tsk);
 +out:
 +	return NULL;
 }
 
 #ifdef CONFIG_MMU
 --- ./kernel/ub/Makefile.ubcore	2006-08-03 16:24:56.000000000 +0400
 +++ ./kernel/ub/Makefile	2006-08-01 11:08:39.000000000 +0400
 @@ -5,3 +5,4 @@
 #
 
 obj-$(CONFIG_USER_RESOURCE) += beancounter.o
 +obj-$(CONFIG_USER_RESOURCE) += misc.o
 --- ./kernel/ub/beancounter.c.ubcore	2006-07-28 13:07:44.000000000 +0400
 +++ ./kernel/ub/beancounter.c	2006-08-03 16:14:17.000000000 +0400
 @@ -395,6 +395,10 @@
 spin_lock_init(&ub_hash_lock);
 slot = &ub_hash[ub_hash_fun(ub->ub_uid)];
 hlist_add_head(&ub->hash, slot);
 +
 +	current->task_bc.exec_ub = ub;
 +	current->task_bc.task_ub = get_beancounter(ub);
 +	current->task_bc.fork_sub = get_beancounter(ub);
 }
 
 void __init ub_init_late(void)
 --- ./kernel/ub/misc.c.ubfork	2006-07-31 16:23:44.000000000 +0400
 +++ ./kernel/ub/misc.c	2006-07-31 16:28:47.000000000 +0400
 @@ -0,0 +1,34 @@
 +/*
 + * kernel/ub/misc.c
 + *
 + * Copyright (C) 2006 OpenVZ. SWsoft Inc.
 + *
 + */
 +
 +#include <linux/sched.h>
 +
 +#include <ub/beancounter.h>
 +#include <ub/task.h>
 +
 +int ub_task_charge(struct task_struct *parent, struct task_struct *new)
 +{
 +	struct task_beancounter *old_bc;
 +	struct task_beancounter *new_bc;
 +	struct user_beancounter *ub;
 +
 +	old_bc = &parent->task_bc;
 +	new_bc = &new->task_bc;
 +
 +	ub = old_bc->fork_sub;
 +	new_bc->exec_ub = get_beancounter(ub);
 +	new_bc->task_ub = get_beancounter(ub);
 +	new_bc->fork_sub = get_beancounter(ub);
 +	return 0;
 +}
 +
 +void ub_task_uncharge(struct task_struct *tsk)
 +{
 +	put_beancounter(tsk->task_bc.exec_ub);
 +	put_beancounter(tsk->task_bc.task_ub);
 +	put_beancounter(tsk->task_bc.fork_sub);
 +}
		
			| [RFC][PATCH 4/7] UBC: syscalls (user interface) [message #5199 is a reply to message #5192] | Wed, 16 August 2006 15:37   |  
				
				
 Add the following system calls for UB management:
 1. sys_getluid    - get the current UB id
 2. sys_setluid    - change the exec_ and fork_ UBs of the current task
 3. sys_setublimit - set limits on resource consumption
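
 The reference handling behind sys_setluid can be illustrated with a small
 user-space model. This is only a sketch: the structures below are simplified
 stand-ins (a plain int replaces the real reference counter), and
 model_setluid() is a hypothetical name that mirrors just the "install bc"
 step of the patch, assuming the new UB arrives already holding one reference.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures, not the real definitions. */
struct user_beancounter {
	unsigned int ub_uid;
	int refcount;		/* assumption: a plain counter models the refcount */
};

struct task_beancounter {
	struct user_beancounter *exec_ub;
	struct user_beancounter *task_ub;
	struct user_beancounter *fork_sub;
};

static struct user_beancounter *get_beancounter(struct user_beancounter *ub)
{
	ub->refcount++;
	return ub;
}

static void put_beancounter(struct user_beancounter *ub)
{
	assert(ub->refcount > 0);
	ub->refcount--;
}

/*
 * Hypothetical model of the "install bc" step: ub comes in holding one
 * reference (as the lookup/create call would return it). That reference is
 * handed to exec_ub, and a second one is taken for fork_sub. task_ub is
 * deliberately left alone.
 */
static void model_setluid(struct task_beancounter *bc,
		struct user_beancounter *ub)
{
	put_beancounter(bc->exec_ub);
	bc->exec_ub = ub;
	put_beancounter(bc->fork_sub);
	bc->fork_sub = get_beancounter(ub);
}
```

 In this model a task that starts with all three pointers on one UB ends up
 with exec_ub and fork_sub on the new UB, while task_ub still points to the
 old one and keeps its reference.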
 
 Signed-Off-By: Pavel Emelianov <xemul@sw.ru>
 Signed-Off-By: Kirill Korotaev <dev@sw.ru>
 
 ---
 arch/i386/kernel/syscall_table.S |    3
 arch/ia64/kernel/entry.S         |    3
 arch/sparc/kernel/systbls.S      |    2
 arch/sparc64/kernel/systbls.S    |    2
 include/asm-i386/unistd.h        |    5 +
 include/asm-ia64/unistd.h        |    5 +
 include/asm-powerpc/systbl.h     |    3
 include/asm-powerpc/unistd.h     |    5 +
 include/asm-sparc/unistd.h       |    3
 include/asm-sparc64/unistd.h     |    3
 include/asm-x86_64/unistd.h      |    8 ++
 kernel/ub/Makefile               |    1
 kernel/ub/sys.c                  |  126 +++++++++++++++++++++++++++++++++++++++
 13 files changed, 163 insertions(+), 6 deletions(-)
 
 --- ./arch/i386/kernel/syscall_table.S.ubsys	2006-07-10 12:39:10.000000000 +0400
 +++ ./arch/i386/kernel/syscall_table.S	2006-07-31 14:36:59.000000000 +0400
 @@ -317,3 +317,6 @@ ENTRY(sys_call_table)
 .long sys_vmsplice
 .long sys_move_pages
 .long sys_getcpu
 +	.long sys_getluid
 +	.long sys_setluid
 +	.long sys_setublimit		/* 321 */
 --- ./arch/ia64/kernel/entry.S.ubsys	2006-07-10 12:39:10.000000000 +0400
 +++ ./arch/ia64/kernel/entry.S	2006-07-31 15:25:36.000000000 +0400
 @@ -1610,5 +1610,8 @@ sys_call_table:
 data8 sys_sync_file_range		// 1300
 data8 sys_tee
 data8 sys_vmsplice
 +	data8 sys_getluid
 +	data8 sys_setluid
 +	data8 sys_setublimit			// 1305
 
 .org sys_call_table + 8*NR_syscalls	// guard against failures to increase NR_syscalls
 --- ./arch/sparc/kernel/systbls.S.arsys	2006-07-10 12:39:10.000000000 +0400
 +++ ./arch/sparc/kernel/systbls.S	2006-08-10 17:07:15.000000000 +0400
 @@ -78,7 +78,7 @@ sys_call_table:
 /*285*/	.long sys_mkdirat, sys_mknodat, sys_fchownat, sys_futimesat, sys_fstatat64
 /*290*/	.long sys_unlinkat, sys_renameat, sys_linkat, sys_symlinkat, sys_readlinkat
 /*295*/	.long sys_fchmodat, sys_faccessat, sys_pselect6, sys_ppoll, sys_unshare
 -/*300*/	.long sys_set_robust_list, sys_get_robust_list
 +/*300*/	.long sys_set_robust_list, sys_get_robust_list, sys_getluid, sys_setluid, sys_setublimit
 
 #ifdef CONFIG_SUNOS_EMUL
 /* Now the SunOS syscall table. */
 --- ./arch/sparc64/kernel/systbls.S.arsys	2006-07-10 12:39:11.000000000 +0400
 +++ ./arch/sparc64/kernel/systbls.S	2006-08-10 17:08:52.000000000 +0400
 @@ -79,7 +79,7 @@ sys_call_table32:
 .word sys_mkdirat, sys_mknodat, sys_fchownat, compat_sys_futimesat, compat_sys_fstatat64
 /*290*/	.word sys_unlinkat, sys_renameat, sys_linkat, sys_symlinkat, sys_readlinkat
 .word sys_fchmodat, sys_faccessat, compat_sys_pselect6, compat_sys_ppoll, sys_unshare
 -/*300*/	.word compat_sys_set_robust_list, compat_sys_get_robust_list
 +/*300*/	.word compat_sys_set_robust_list, compat_sys_get_robust_list, sys_getluid, sys_setluid, sys_setublimit
 
 #endif /* CONFIG_COMPAT */
 
 --- ./include/asm-i386/unistd.h.ubsys	2006-07-10 12:39:19.000000000 +0400
 +++ ./include/asm-i386/unistd.h	2006-07-31 15:56:31.000000000 +0400
 @@ -323,10 +323,13 @@
 #define __NR_vmsplice		316
 #define __NR_move_pages		317
 #define __NR_getcpu		318
 +#define __NR_getluid		319
 +#define __NR_setluid		320
 +#define __NR_setublimit		321
 
 #ifdef __KERNEL__
 
 -#define NR_syscalls 318
 +#define NR_syscalls 322
 #include <linux/err.h>
 
 /*
 --- ./include/asm-ia64/unistd.h.ubsys	2006-07-10 12:39:19.000000000 +0400
 +++ ./include/asm-ia64/unistd.h	2006-07-31 15:57:23.000000000 +0400
 @@ -291,11 +291,14 @@
 #define __NR_sync_file_range		1300
 #define __NR_tee			1301
 #define __NR_vmsplice			1302
 +#define __NR_getluid			1303
 +#define __NR_setluid			1304
 +#define __NR_setublimit			1305
 
 #ifdef __KERNEL__
 
 
 -#define NR_syscalls			279 /* length of syscall table */
 +#define NR_syscalls			282 /* length of syscall table */
 
 #define __ARCH_WANT_SYS_RT_SIGACTION
 
 --- ./include/asm-powerpc/systbl.h.arsys	2006-07-10 12:39:19.000000000 +0400
 +++ ./include/asm-powerpc/systbl.h	2006-08-10 17:05:53.000000000 +0400
 @@ -304,3 +304,6 @@ SYSCALL_SPU(fchmodat)
 SYSCALL_SPU(faccessat)
 COMPAT_SYS_SPU(get_robust_list)
 COMPAT_SYS_SPU(set_robust_list)
 +SYSCALL(getluid)
 +SYSCALL(setluid)
 +SYSCALL(setublimit)
 --- ./include/asm-powerpc/unistd.h.arsys	2006-07-10 12:39:19.000000000 +0400
 +++ ./include/asm-powerpc/unistd.h	2006-08-10 17:06:28.000000000 +0400
 @@ -323,10 +323,13 @@
 #define __NR_faccessat		298
 #define __NR_get_robust_list	299
 #define __NR_set_robust_list	300
 +#define __NR_getluid		301
 +#define __NR_setluid		302
 +#define __NR_setublimit		303
 
 #ifdef __KERNEL__
 
 -#define __NR_syscalls		301
 +#define __NR_syscalls		304
 
 #define __NR__exit __NR_exit
 #define NR_syscalls	__NR_syscalls
 --- ./include/asm-sparc/unistd.h.arsys	2006-07-10 12:39:19.000000000 +0400
 +++ ./include/asm-sparc/unistd.h	2006-08-10 17:08:19.000000000 +0400
 @@ -318,6 +318,9 @@
 #define __NR_unshare		299
 #define __NR_set_robust_list	300
 #define __NR_get_robust_list	301
 +#define __NR_getluid		302
 +#define __NR_setluid		303
 +#define __NR_setublimit		304
 
 #ifdef __KERNEL__
 /* WARNING: You MAY NOT add syscall numbers larger than 301, since
 --- ./include/asm-sparc64/unistd.h.arsys	2006-07-10 12:39:19.000000000 +0400
 +++ ./include/asm-sparc64/unistd.h	2006-08-10 17:09:24.000000000 +0400
 @@ -320,6 +320,9 @@
 #define __NR_unshare		299
 #define __NR_set_robust_list	300
 #define __NR_get_robust_list	301
 +#define __NR_getluid		302
 +#define __NR_setluid		303
 +#define __NR_setublimit		304
 
 #ifdef __KERNEL__
 /* WARNING: You MAY NOT add syscall numbers larger than 301, since
 --- ./include/asm-x86_64/unistd.h.ubsys	2006-07-10 12:39:19.000000000 +0400
 +++ ./include/asm-x86_64/unistd.h	2006-07-31 16:00:01.000000000 +0400
 @@ -619,10 +619,16 @@ __SYSCALL(__NR_sync_file_range, sys_sync
 __SYSCALL(__NR_vmsplice, sys_vmsplice)
 #define __NR_move_pages		279
 __SYSCALL(__NR_move_pages, sys_move_pages)
 +#define __NR_getluid		280
 +__SYSCALL(__NR_getluid, sys_getluid)
 +#define __NR_setluid		281
 +__SYSCALL(__NR_setluid, sys_setluid)
 +#define __NR_setublimit		282
 +__SYSCALL(__NR_setublimit, sys_setublimit)
 
 #ifdef __KERNEL__
 
 -#define __NR_syscall_max __NR_move_pages
 +#define __NR_syscall_max __NR_setublimit
 #include <linux/err.h>
 
 #ifndef __NO_STUBS
 --- ./kernel/ub/Makefile.ubsys	2006-07-28 14:08:37.000000000 +0400
 +++ ./kernel/ub/Makefile	2006-08-01 11:08:39.000000000 +0400
 @@ -6,3 +6,4 @@
 
 obj-$(CONFIG_USER_RESOURCE) += beancounter.o
 obj-$(CONFIG_USER_RESOURCE) += misc.o
 +obj-y += sys.o
 --- ./kernel/ub/sys.c.ubsys	2006-07-28 18:52:18.000000000 +0400
 +++ ./kernel/ub/sys.c	2006-08-03 16:14:23.000000000 +0400
 @@ -0,0 +1,126 @@
 +/*
 + *  kernel/ub/sys.c
 + *
 + *  Copyright (C) 2006 OpenVZ. SWsoft Inc
 + *
 + */
 +
 +#include <linux/config.h>
 +#include <linux/sched.h>
 +#include <asm/uaccess.h>
 +
 +#include <ub/beancounter.h>
 +#include <ub/task.h>
 +
 +#ifndef CONFIG_USER_RESOURCE
 +asmlinkage long sys_getluid(void)
 +{
 +	return -ENOSYS;
 +}
 +
 +asmlinkage long sys_setluid(uid_t uid)
 +{
 +	return -ENOSYS;
 +}
 +
 +asmlinkage long sys_setublimit(uid_t uid, unsigned long resource,
 +		unsigned long *limits)
 +{
 +	return -ENOSYS;
 +}
 +#else /* CONFIG_USER_RESOURCE */
 +
 +/*
 + *	The (rather boring) getluid syscall
 + */
 +asmlinkage long sys_getluid(void)
 +{
 +	struct user_beancounter *ub;
 +
 +	ub = get_exec_ub();
 +	if (ub == NULL)
 +		return -EINVAL;
 +
 +	return ub->ub_uid;
 +}
 +
 +/*
 + *	The setluid syscall
 + */
 +asmlinkage long sys_setluid(uid_t uid)
 +{
 +	int error;
 +	struct user_beancounter *ub;
 +	struct task_beancounter *task_bc;
 +
 +	task_bc = &current->task_bc;
 +
 +	/* You may not disown a setluid */
 +	error = -EINVAL;
 +	if (uid == (uid_t)-1)
 +		goto out;
 +
 +	/* You may only set an ub as root */
 +	error = -EPERM;
 +	if (!capable(CAP_SETUID))
 +		goto out;
 +
 +	/* Ok - set up a beancounter entry for this user */
 +	error = -ENOBUFS;
 +	ub = beancounter_findcreate(uid, NULL, UB_ALLOC);
 +	if (ub == NULL)
 +		goto out;
 +
 +	/* install bc */
 +	put_beancounter(task_bc->exec_ub);
 +	task_bc->exec_ub = ub;
 +	put_beancounter(task_bc->fork_sub);
 +	task_bc->fork_sub = get_beancounter(ub);
 +	error = 0;
 +out:
 +	return error;
 +}
 +
 +/*
 + *	The setublimit syscall
 + */
 +asmlinkage long sys_setublimit(uid_t uid, unsigned long resource,
 +		unsigned long *limits)
 +{
 +	int error;
 +	unsigned long flags;
 +	struct user_beancounter *ub;
 +	unsigned long new_limits[2];
 +
 +	error = -EPERM;
 +	if (!capable(CAP_SYS_RESOURCE))
 +		goto out;
 +
 +	error = -EINVAL;
 +	if (resource >= UB_RESOURCES)
 +		goto out;
 +
 +	error = -EFAULT;
 +	if (copy_from_user(&new_limits, limits, sizeof(new_limits)))
 +		goto out;
 +
 +	error = -EINVAL;
 +	if (new_limits[0] > UB_MAXVALUE || new_limits[1] > UB_MAXVALUE)
 +		goto out;
 +
 +	error = -ENOENT;
 +	ub = beancounter_findcreate(uid, NULL, 0);
 +	if (ub == NULL)
 +		goto out;
 +
 +	spin_lock_irqsave(&ub->ub_lock, flags);
 +	ub->ub_parms[resource].barrier = new_limits[0];
 +	ub->ub_parms[resource].limit = new_limits[1];
 +	spin_unlock_irqrestore(&ub->ub_lock, flags);
 +
 +	put_beancounter(ub);
 +	error = 0;
 +out:
 +	return error;
 +}
 +#endif
 
 
		
			| [RFC][PATCH 5/7] UBC: kernel memory accounting (core) [message #5200 is a reply to message #5192] | Wed, 16 August 2006 15:39   |  
				
				
 Introduce the UB_KMEMSIZE resource, which accounts for kernel objects allocated at a task's request.
 
 A reference to the UB is kept on the struct page or on the slab object.
 For slabs, each struct slab contains an array of pointers to the UBs
 that the corresponding objects are charged to.
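
 The per-slab pointer array can be pictured with a small user-space model of
 the size arithmetic. This is a sketch under assumptions: bufctl_t stands in
 for kmem_bufctl_t, the header size is passed in instead of using
 sizeof(struct slab), and offsets are computed relative to the slab header
 (the patch aligns the absolute address, which gives the same result when the
 header itself is pointer-aligned). The helper names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t bufctl_t;		/* stand-in for kmem_bufctl_t */
#define UB_PTR_SIZE sizeof(void *)	/* plays the role of UB_EXTRASIZE */

/* Round x up to a multiple of a; a must be a power of two. */
static size_t align_up(size_t x, size_t a)
{
	return (x + a - 1) & ~(a - 1);
}

/*
 * Offset, from the start of the slab header, of the UB pointer that belongs
 * to object number idx. The pointer array starts right after the bufctl
 * array, rounded up to pointer alignment.
 */
static size_t ub_ptr_offset(size_t hdr_size, size_t nr_objs, size_t idx)
{
	assert(idx <= nr_objs);
	return align_up(hdr_size + nr_objs * sizeof(bufctl_t), UB_PTR_SIZE)
		+ idx * UB_PTR_SIZE;
}

/* Unaligned management size: header, bufctls and, if accounted, UB pointers. */
static size_t mgmt_size_noalign(int ubc, size_t hdr_size, size_t nr_objs)
{
	if (!ubc)
		return hdr_size + nr_objs * sizeof(bufctl_t);
	/* One past the last pointer is the end of the management area. */
	return ub_ptr_offset(hdr_size, nr_objs, nr_objs);
}
```

 For an accounted cache the management area therefore grows by one pointer
 per object plus any padding needed to align the pointer array.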
 
 Allocation charge rules:
 1. Pages - if an allocation is performed with the __GFP_UBC flag, the page
 is charged to current's exec_ub.
 2. Slabs - a kmem_cache may be created with the SLAB_UBC flag, in which
 case every allocation from it is charged. Caches used by kmalloc are
 created with SLAB_UBC | SLAB_UBC_NOCHARGE flags, in which case
 only __GFP_UBC allocations are charged.
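
 The slab rule above boils down to a small decision function. The following
 user-space sketch restates it with the flag values from this patch; GFP_UBC
 is a local name for __GFP_UBC (renamed here only to avoid a reserved
 identifier), and should_charge() is an illustrative stand-in, not the kernel
 function itself.

```c
#include <assert.h>

#define SLAB_UBC		0x00200000UL	/* cache is accounted */
#define SLAB_UBC_NOCHARGE	0x00400000UL	/* charge only on explicit request */
#define GFP_UBC			0x80000u	/* local name for __GFP_UBC */

/* Returns 1 if an allocation from the cache should be charged, 0 otherwise. */
static int should_charge(unsigned long cache_flags, unsigned int gfp_flags)
{
	if (!(cache_flags & SLAB_UBC))
		return 0;	/* cache is not accounted at all */
	if (gfp_flags & GFP_UBC)
		return 1;	/* caller asked for charging explicitly */
	if (!(cache_flags & SLAB_UBC_NOCHARGE))
		return 1;	/* accounted cache: charge every allocation */
	return 0;		/* kmalloc-style cache without GFP_UBC */
}
```

 Note that a cache created without SLAB_UBC is never charged, even when the
 caller passes the charging flag, because such a slab has no room for the
 per-object UB pointers.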
 
 Signed-Off-By: Pavel Emelianov <xemul@sw.ru>
 Signed-Off-By: Kirill Korotaev <dev@sw.ru>
 
 ---
 include/linux/gfp.h      |    8 ++-
 include/linux/mm.h       |    6 ++
 include/linux/slab.h     |    4 +
 include/linux/vmalloc.h  |    1
 include/ub/beancounter.h |    4 +
 include/ub/kmem.h        |   33 ++++++++++++
 kernel/ub/Makefile       |    1
 kernel/ub/beancounter.c  |    3 +
 kernel/ub/kmem.c         |   89 ++++++++++++++++++++++++++++++++++
 mm/mempool.c             |    2
 mm/page_alloc.c          |   11 ++++
 mm/slab.c                |  121 ++++++++++++++++++++++++++++++++++++++---------
 mm/vmalloc.c             |    6 ++
 13 files changed, 264 insertions(+), 25 deletions(-)
 
 --- ./include/linux/gfp.h.kmemcore	2006-08-16 19:10:38.000000000 +0400
 +++ ./include/linux/gfp.h	2006-08-16 19:12:56.000000000 +0400
 @@ -46,15 +46,18 @@ struct vm_area_struct;
 #define __GFP_NOMEMALLOC ((__force gfp_t)0x10000u) /* Don't use emergency reserves */
 #define __GFP_HARDWALL   ((__force gfp_t)0x20000u) /* Enforce hardwall cpuset memory allocs */
 #define __GFP_THISNODE	((__force gfp_t)0x40000u)/* No fallback, no policies */
 +#define __GFP_UBC	((__force gfp_t)0x80000u) /* Charge allocation with UB */
 +#define __GFP_UBC_LIMIT ((__force gfp_t)0x100000u) /* Charge against UB limit */
 
 -#define __GFP_BITS_SHIFT 20	/* Room for 20 __GFP_FOO bits */
 +#define __GFP_BITS_SHIFT 21	/* Room for 21 __GFP_FOO bits */
 #define __GFP_BITS_MASK ((__force gfp_t)((1 << __GFP_BITS_SHIFT) - 1))
 
 /* if you forget to add the bitmask here kernel will crash, period */
 #define GFP_LEVEL_MASK (__GFP_WAIT|__GFP_HIGH|__GFP_IO|__GFP_FS| \
 __GFP_COLD|__GFP_NOWARN|__GFP_REPEAT| \
 __GFP_NOFAIL|__GFP_NORETRY|__GFP_NO_GROW|__GFP_COMP| \
 -			__GFP_NOMEMALLOC|__GFP_HARDWALL|__GFP_THISNODE)
 +			__GFP_NOMEMALLOC|__GFP_HARDWALL|__GFP_THISNODE| \
 +			__GFP_UBC|__GFP_UBC_LIMIT)
 
 /* This equals 0, but use constants in case they ever change */
 #define GFP_NOWAIT	(GFP_ATOMIC & ~__GFP_HIGH)
 @@ -63,6 +66,7 @@ struct vm_area_struct;
 #define GFP_NOIO	(__GFP_WAIT)
 #define GFP_NOFS	(__GFP_WAIT | __GFP_IO)
 #define GFP_KERNEL	(__GFP_WAIT | __GFP_IO | __GFP_FS)
 +#define GFP_KERNEL_UBC	(__GFP_WAIT | __GFP_IO | __GFP_FS | __GFP_UBC)
 #define GFP_USER	(__GFP_WAIT | __GFP_IO | __GFP_FS | __GFP_HARDWALL)
 #define GFP_HIGHUSER	(__GFP_WAIT | __GFP_IO | __GFP_FS | __GFP_HARDWALL | \
 __GFP_HIGHMEM)
 --- ./include/linux/mm.h.kmemcore	2006-08-16 19:10:38.000000000 +0400
 +++ ./include/linux/mm.h	2006-08-16 19:10:51.000000000 +0400
 @@ -274,8 +274,14 @@ struct page {
 unsigned int gfp_mask;
 unsigned long trace[8];
 #endif
 +#ifdef CONFIG_USER_RESOURCE
 +	union {
 +		struct user_beancounter	*page_ub;
 +	} bc;
 +#endif
 };
 
 +#define page_ub(page)			((page)->bc.page_ub)
 #define page_private(page)		((page)->private)
 #define set_page_private(page, v)	((page)->private = (v))
 
 --- ./include/linux/slab.h.kmemcore	2006-08-16 19:10:38.000000000 +0400
 +++ ./include/linux/slab.h	2006-08-16 19:10:51.000000000 +0400
 @@ -46,6 +46,8 @@ typedef struct kmem_cache kmem_cache_t;
 #define SLAB_PANIC		0x00040000UL	/* panic if kmem_cache_create() fails */
 #define SLAB_DESTROY_BY_RCU	0x00080000UL	/* defer freeing pages to RCU */
 #define SLAB_MEM_SPREAD		0x00100000UL	/* Spread some memory over cpuset */
 +#define SLAB_UBC		0x00200000UL	/* Account with UB */
 +#define SLAB_UBC_NOCHARGE	0x00400000UL	/* Explicit accounting */
 
 /* flags passed to a constructor func */
 #define	SLAB_CTOR_CONSTRUCTOR	0x001UL		/* if not set, then deconstructor */
 @@ -293,6 +295,8 @@ extern kmem_cache_t	*bio_cachep;
 
 extern atomic_t slab_reclaim_pages;
 
 +struct user_beancounter;
 +struct user_beancounter **kmem_cache_ubp(kmem_cache_t *cachep, void *obj);
 #endif	/* __KERNEL__ */
 
 #endif	/* _LINUX_SLAB_H */
 --- ./include/linux/vmalloc.h.kmemcore	2006-08-16 19:10:38.000000000 +0400
 +++ ./include/linux/vmalloc.h	2006-08-16 19:10:51.000000000 +0400
 @@ -36,6 +36,7 @@ struct vm_struct {
 *	Highlevel APIs for driver use
 */
 extern void *vmalloc(unsigned long size);
 +extern void *vmalloc_ub(unsigned long size);
 extern void *vmalloc_user(unsigned long size);
 extern void *vmalloc_node(unsigned long size, int node);
 extern void *vmalloc_exec(unsigned long size);
 --- ./include/ub/beancounter.h.kmemcore	2006-08-16 19:10:38.000000000 +0400
 +++ ./include/ub/beancounter.h	2006-08-16 19:10:51.000000000 +0400
 @@ -12,7 +12,9 @@
 *	Resource list.
 */
 
 -#define UB_RESOURCES	0
 +#define UB_KMEMSIZE	0
 +
 +#define UB_RESOURCES	1
 
 struct ubparm {
 /*
 --- ./include/ub/kmem.h.kmemcore	2006-08-16 19:10:38.000000000 +0400
 +++ ./include/ub/kmem.h	2006-08-16 19:10:51.000000000 +0400
 @@ -0,0 +1,33 @@
 +/*
 + *  include/ub/kmem.h
 + *
 + *  Copyright (C) 2006 OpenVZ. SWsoft Inc
 + *
 + */
 +
 +#ifndef __UB_KMEM_H_
 +#define __UB_KMEM_H_
 +
 +#include <linux/config.h>
 +
 +/*
 + * UB_KMEMSIZE accounting
 + */
 +
 +struct mm_struct;
 +struct page;
 +struct user_beancounter;
 +
 +#ifdef CONFIG_USER_RESOURCE
 +int  ub_page_charge(struct page *page, int order, gfp_t flags);
 +void ub_page_uncharge(struct page *page, int order);
 +
 +int ub_slab_charge(kmem_cache_t *cachep, void *obj, gfp_t flags);
 +void ub_slab_uncharge(kmem_cache_t *cachep, void *obj);
 +#else
 +#define ub_page_charge(pg, o, mask)	(0)
 +#define ub_page_uncharge(pg, o)		do { } while (0)
 +#define ub_slab_charge(cachep, o, mask)	(0)
 +#define ub_slab_uncharge(cachep, o)	do { } while (0)
 +#endif
 +#endif /* __UB_KMEM_H_ */
 --- ./kernel/ub/Makefile.kmemcore	2006-08-16 19:10:38.000000000 +0400
 +++ ./kernel/ub/Makefile	2006-08-16 19:10:51.000000000 +0400
 @@ -7,3 +7,4 @@
 obj-$(CONFIG_USER_RESOURCE) += beancounter.o
 obj-$(CONFIG_USER_RESOURCE) += misc.o
 obj-y += sys.o
 +obj-$(CONFIG_USER_RESOURCE) += kmem.o
 --- ./kernel/ub/beancounter.c.kmemcore	2006-08-16 19:10:38.000000000 +0400
 +++ ./kernel/ub/beancounter.c	2006-08-16 19:10:51.000000000 +0400
 @@ -20,6 +20,7 @@ static void init_beancounter_struct(stru
 struct user_beancounter ub0;
 
 const char *ub_rnames[] = {
 +	"kmemsize",	/* 0 */
 };
 
 #define ub_hash_fun(x) ((((x) >> 8) ^ (x)) & (UB_HASH_SIZE - 1))
 @@ -356,6 +357,8 @@ static void init_beancounter_syslimits(s
 {
 int k;
 
 +	ub->ub_parms[UB_KMEMSIZE].limit = 32 * 1024 * 1024;
 +
 for (k = 0; k < UB_RESOURCES; k++)
 ub->ub_parms[k].barrier = ub->ub_parms[k].limit;
 }
 --- ./kernel/ub/kmem.c.kmemcore	2006-08-16 19:10:38.000000000 +0400
 +++ ./kernel/ub/kmem.c	2006-08-16 19:10:51.000000000 +0400
 @@ -0,0 +1,89 @@
 +/*
 + *  kernel/ub/kmem.c
 + *
 + *  Copyright (C) 2006 OpenVZ. SWsoft Inc
 + *
 + */
 +
 +#include <linux/sched.h>
 +#include <linux/gfp.h>
 +#include <linux/slab.h>
 +#include <linux/mm.h>
 +
 +#include <ub/beancounter.h>
 +#include <ub/kmem.h>
 +#include <ub/task.h>
 +
 +/*
 + * Slab accounting
 + */
 +
 +int ub_slab_charge(kmem_cache_t *cachep, void *objp, gfp_t flags)
 +{
 +	unsigned int size;
 +	struct user_beancounter *ub, **slab_ubp;
 +
 +	ub = get_exec_ub();
 +	if (ub == NULL)
 +		return 0;
 +
 +	size = kmem_cache_size(cachep);
 +	if (charge_beancounter(ub, UB_KMEMSIZE, size,
 +			(flags & __GFP_UBC_LIMIT ? UB_LIMIT : UB_BARRIER)))
 +		return -ENOMEM;
 +
 +	slab_ubp = kmem_cache_ubp(cachep, objp);
 +	*slab_ubp = get_beancounter(ub);
 +	return 0;
 +}
 +
 +void ub_slab_uncharge(kmem_cache_t *cachep, void *objp)
 +{
 +	unsigned int size;
 +	struct user_beancounter *ub, **slab_ubp;
 +
 +	slab_ubp = kmem_cache_ubp(cachep, objp);
 +	if (*slab_ubp == NULL)
 +		return;
 +
 +	ub = *slab_ubp;
 +	size = kmem_cache_size(cachep);
 +	uncharge_beancounter(ub, UB_KMEMSIZE, size);
 +	put_beancounter(ub);
 +	*slab_ubp = NULL;
 +}
 +
 +/*
 + * Pages accounting
 + */
 +
 +int ub_page_charge(struct page *page, int order, gfp_t flags)
 +{
 +	struct user_beancounter *ub;
 +
 +	BUG_ON(page_ub(page) != NULL);
 +
 +	ub = get_exec_ub();
 +	if (ub == NULL)
 +		return 0;
 +
 +	if (charge_beancounter(ub, UB_KMEMSIZE, PAGE_SIZE << order,
 +			(flags & __GFP_UBC_LIMIT ? UB_LIMIT : UB_BARRIER)))
 +		return -ENOMEM;
 +
 +	page_ub(page) = get_beancounter(ub);
 +	return 0;
 +}
 +
 +void ub_page_uncharge(struct page *page, int order)
 +{
 +	struct user_beancounter *ub;
 +
 +	ub = page_ub(page);
 +	if (ub == NULL)
 +		return;
 +
 +	uncharge_beancounter(ub, UB_KMEMSIZE, PAGE_SIZE << order);
 +	put_beancounter(ub);
 +	page_ub(page) = NULL;
 +}
 --- ./mm/mempool.c.kmemcore	2006-08-16 19:10:38.000000000 +0400
 +++ ./mm/mempool.c	2006-08-16 19:10:51.000000000 +0400
 @@ -119,6 +119,7 @@ int mempool_resize(mempool_t *pool, int
 unsigned long flags;
 
 BUG_ON(new_min_nr <= 0);
 +	gfp_mask &= ~__GFP_UBC;
 
 spin_lock_irqsave(&pool->lock, flags);
 if (new_min_nr <= pool->min_nr) {
 @@ -212,6 +213,7 @@ void * mempool_alloc(mempool_t *pool, gf
 gfp_mask |= __GFP_NOMEMALLOC;	/* don't allocate emergency reserves */
 gfp_mask |= __GFP_NORETRY;	/* don't loop in __alloc_pages */
 gfp_mask |= __GFP_NOWARN;	/* failures are OK */
 +	gfp_mask &= ~__GFP_UBC;		/* do not charge */
 
 gfp_temp = gfp_mask & ~(__GFP_WAIT|__GFP_IO);
 
 --- ./mm/page_alloc.c.kmemcore	2006-08-16 19:10:38.000000000 +0400
 +++ ./mm/page_alloc.c	2006-08-16 19:10:51.000000000 +0400
 @@ -38,6 +38,8 @@
 #include <linux/mempolicy.h>
 #include <linux/stop_machine.h>
 
 +#include <ub/kmem.h>
 +
 #include <asm/tlbflush.h>
 #include <asm/div64.h>
 #include "internal.h"
 @@ -484,6 +486,8 @@ static void __free_pages_ok(struct page
 if (reserved)
 return;
 
 +	ub_page_uncharge(page, order);
 +
 kernel_map_pages(page, 1 << order, 0);
 local_irq_save(flags);
 __count_vm_events(PGFREE, 1 << order);
 @@ -764,6 +768,8 @@ static void fastcall free_hot_cold_page(
 if (free_pages_check(page))
 return;
 
 +	ub_page_uncharge(page, 0);
 +
 kernel_map_pages(page, 1, 0);
 
 pcp = &zone_pcp(zone, get_cpu())->pcp[cold];
 @@ -1153,6 +1159,11 @@ nopage:
 show_mem();
 }
 got_pg:
 +	if (page && (gfp_mask & __GFP_UBC) &&
 +			ub_page_charge(page, order, gfp_mask)) {
 +		__free_pages(page, order);
 +		page = NULL;
 +	}
 #ifdef CONFIG_PAGE_OWNER
 if (page)
 set_page_owner(page, order, gfp_mask);
 --- ./mm/slab.c.kmemcore	2006-08-16 19:10:38.000000000 +0400
 +++ ./mm/slab.c	2006-08-16 19:10:51.000000000 +0400
 @@ -108,6 +108,8 @@
 #include	<linux/mutex.h>
 #include	<linux/rtmutex.h>
 
 +#include	<ub/kmem.h>
 +
 #include	<asm/uaccess.h>
 #include	<asm/cacheflush.h>
 #include	<asm/tlbflush.h>
 @@ -175,11 +177,13 @@
 SLAB_CACHE_DMA | \
 SLAB_MUST_HWCACHE_ALIGN | SLAB_STORE_USER | \
 SLAB_RECLAIM_ACCOUNT | SLAB_PANIC | \
 +			 SLAB_UBC | SLAB_UBC_NOCHARGE | \
 SLAB_DESTROY_BY_RCU | SLAB_MEM_SPREAD)
 #else
 # define CREATE_MASK	(SLAB_HWCACHE_ALIGN | \
 SLAB_CACHE_DMA | SLAB_MUST_HWCACHE_ALIGN | \
 SLAB_RECLAIM_ACCOUNT | SLAB_PANIC | \
 +			 SLAB_UBC | SLAB_UBC_NOCHARGE | \
 SLAB_DESTROY_BY_RCU | SLAB_MEM_SPREAD)
 #endif
 
 @@ -801,9 +805,33 @@ static struct kmem_cache *kmem_find_gene
 return __find_general_cachep(size, gfpflags);
 }
 
 -static size_t slab_mgmt_size(size_t nr_objs, size_t align)
 +static size_t slab_mgmt_size_raw(size_t nr_objs)
 +{
 +	return sizeof(struct slab) + nr_objs * sizeof(kmem_bufctl_t);
 +}
 +
 +#ifdef CONFIG_USER_RESOURCE
 +#define UB_EXTRASIZE	sizeof(struct user_beancounter *)
 +static inline size_t slab_mgmt_size_noalign(int flags, size_t nr_objs)
 +{
 +	size_t size;
 +
 +	size = slab_mgmt_size_raw(nr_objs);
 +	if (flags & SLAB_UBC)
 +		size = ALIGN(size, UB_EXTRASIZE) + nr_objs * UB_EXTRASIZE;
 +	return size;
 +}
 +#else
 +#define UB_EXTRASIZE	0
 +static inline size_t slab_mgmt_size_noalign(int flags, size_t nr_objs)
 +{
 +	return slab_mgmt_size_raw(nr_objs);
 +}
 +#endif
 +
 +static inline size_t slab_mgmt_size(int flags, size_t nr_objs, size_t align)
 {
 -	return ALIGN(sizeof(struct slab)+nr_objs*sizeof(kmem_bufctl_t), align);
 +	return ALIGN(slab_mgmt_size_noalign(flags, nr_objs), align);
 }
 
 /*
 @@ -848,20 +876,21 @@ static void cache_estimate(unsigned long
 * into account.
 */
 nr_objs = (slab_size - sizeof(struct slab)) /
 -			  (buffer_size + sizeof(kmem_bufctl_t));
 +			  (buffer_size + sizeof(kmem_bufctl_t) +
 +			  (flags & SLAB_UBC ? UB_EXTRASIZE : 0));
 
 /*
 * This calculated number will be either the right
 * amount, or one greater than what we want.
 */
 -		if (slab_mgmt_size(nr_objs, align) + nr_objs*buffer_size
 +		if (slab_mgmt_size(flags, nr_objs, align) + nr_objs*buffer_size
 > slab_size)
 nr_objs--;
 
 if (nr_objs > SLAB_LIMIT)
 nr_objs = SLAB_LIMIT;
 
 -		mgmt_size = slab_mgmt_size(nr_objs, align);
 +		mgmt_size = slab_mgmt_size(flags, nr_objs, align);
 }
 *num = nr_objs;
 *left_over = slab_size - nr_objs*buffer_size - mgmt_size;
 @@ -1420,7 +1449,8 @@ void __init kmem_cache_init(void)
 sizes[INDEX_AC].cs_cachep = kmem_cache_create(names[INDEX_AC].name,
 sizes[INDEX_AC].cs_size,
 ARCH_KMALLOC_MINALIGN,
 -					ARCH_KMALLOC_FLAGS|SLAB_PANIC,
 +					ARCH_KMALLOC_FLAGS | SLAB_UBC |
 +						SLAB_UBC_NOCHARGE | SLAB_PANIC,
 NULL, NULL);
 
 if (INDEX_AC != INDEX_L3) {
 @@ -1428,7 +1458,8 @@ void __init kmem_cache_init(void)
 kmem_cache_create(names[INDEX_L3].name,
 sizes[INDEX_L3].cs_size,
 ARCH_KMALLOC_MINALIGN,
 -				ARCH_KMALLOC_FLAGS|SLAB_PANIC,
 +				ARCH_KMALLOC_FLAGS | SLAB_UBC |
 +					SLAB_UBC_NOCHARGE | SLAB_PANIC,
 NULL, NULL);
 }
 
 @@ -1446,7 +1477,8 @@ void __init kmem_cache_init(void)
 sizes->cs_cachep = kmem_cache_create(names->name,
 sizes->cs_size,
 ARCH_KMALLOC_MINALIGN,
 -					ARCH_KMALLOC_FLAGS|SLAB_PANIC,
 +					ARCH_KMALLOC_FLAGS | SLAB_UBC |
 +						SLAB_UBC_NOCHARGE | SLAB_PANIC,
 NULL, NULL);
 }
 
 @@ -1943,7 +1975,8 @@ static size_t calculate_slab_order(struc
 * looping condition in cache_grow().
 */
 offslab_limit = size - sizeof(struct slab);
 -			offslab_limit /= sizeof(kmem_bufctl_t);
 +			offslab_limit /= (sizeof(kmem_bufctl_t) +
 +					(flags & SLAB_UBC ? UB_EXTRASIZE : 0));
 
 if (num > offslab_limit)
 break;
 @@ -2251,8 +2284,8 @@ kmem_cache_create (const char *name, siz
 cachep = NULL;
 goto oops;
 }
 -	slab_size = ALIGN(cachep->num * sizeof(kmem_bufctl_t)
 -			  + sizeof(struct slab), align);
 +
 +	slab_size = slab_mgmt_size(flags, cachep->num, align);
 
 /*
 * If the slab has been placed off-slab, and we have enough space then
 @@ -2263,11 +2296,9 @@ kmem_cache_create (const char *name, siz
 left_over -= slab_size;
 }
 
 -	if (flags & CFLGS_OFF_SLAB) {
 +	if (flags & CFLGS_OFF_SLAB)
 /* really off slab. No need for manual alignment */
 -		slab_size =
 -		    cachep->num * sizeof(kmem_bufctl_t) + sizeof(struct slab);
 -	}
 +		slab_size = slab_mgmt_size_noalign(flags, cachep->num);
 
 cachep->colour_off = cache_line_size();
 /* Offset must be a multiple of the alignment. */
 @@ -2513,6 +2544,30 @@ int kmem_cache_destroy(struct kmem_cache
 }
 EXPORT_SYMBOL(kmem_cache_destroy);
 
 +static inline kmem_bufctl_t *slab_bufctl(struct slab *slabp)
 +{
 +	return (kmem_bufctl_t *) (slabp + 1);
 +}
 +
 +#ifdef CONFIG_USER_RESOURCE
 +static inline struct user_beancounter **slab_ub_ptrs(kmem_cache_t *cachep,
 +		struct slab *slabp)
 +{
 +	return (struct user_beancounter **) ALIGN((unsigned long)
 +			(slab_bufctl(slabp) + cachep->num), UB_EXTRASIZE);
 +}
 +
 +struct user_beancounter **kmem_cache_ubp(kmem_cache_t *cachep, void *objp)
 +{
 +	struct slab *slabp;
 +	struct user_beancounter **ubs;
 +
 +	slabp = virt_to_slab(objp);
 +	ubs = slab_ub_ptrs(cachep, slabp);
 +	return ubs + obj_to_index(cachep, slabp, objp);
 +}
 +#endif
 +
 /*
 * Get the memory for a slab management obj.
 * For a slab cache when the slab descriptor is off-slab, slab descriptors
 @@ -2533,7 +2588,8 @@ static struct slab *alloc_slabmgmt(struc
 if (OFF_SLAB(cachep)) {
 /* Slab management obj is off-slab. */
 slabp = kmem_cache_alloc_node(cachep->slabp_cache,
 -					      local_flags, nodeid);
 +					      local_flags & (~__GFP_UBC),
 +					      nodeid);
 if (!slabp)
 return NULL;
 } else {
 @@ -2544,14 +2600,14 @@ static struct slab *alloc_slabmgmt(struc
 slabp->colouroff = colour_off;
 slabp->s_mem = objp + colour_off;
 slabp->nodeid = nodeid;
 +#ifdef CONFIG_USER_RESOURCE
 +	if (cachep->flags & SLAB_UBC)
 +		memset(slab_ub_ptrs(cachep, slabp), 0,
 +				cachep->num * UB_EXTRASIZE);
 +#endif
 return slabp;
 }
 
 -static inline kmem_bufctl_t *slab_bufctl(struct slab *slabp)
 -{
 -	return (kmem_bufctl_t *) (slabp + 1);
 -}
 -
 static void cache_init_objs(struct kmem_cache *cachep,
 struct slab *slabp, unsigned long ctor_flags)
 {
 @@ -2729,7 +2785,7 @@ static int cache_grow(struct kmem_cache
 * Get mem for the objs.  Attempt to allocate a physical page from
 * 'nodeid'.
 */
 -	objp = kmem_getpages(cachep, flags, nodeid);
 +	objp = kmem_getpages(cachep, flags & (~__GFP_UBC), nodeid);
 if (!objp)
 goto failed;
 
 @@ -3077,6 +3133,19 @@ static inline void *____cache_alloc(stru
 return objp;
 }
 
 +static inline int ub_should_charge(kmem_cache_t *cachep, gfp_t flags)
 +{
 +#ifdef CONFIG_USER_RESOURCE
 +	if (!(cachep->flags & SLAB_UBC))
 +		return 0;
 +	if (flags & __GFP_UBC)
 +		return 1;
 +	if (!(cachep->flags & SLAB_UBC_NOCHARGE))
 +		return 1;
 +#endif
 +	return 0;
 +}
 +
 static __always_inline void *__cache_alloc(struct kmem_cache *cachep,
 gfp_t flags, void *caller)
 {
 @@ -3090,6 +3159,12 @@ static __always_inline void *__cache_all
 local_irq_restore(save_flags);
 objp = cache_alloc_debugcheck_after(cachep, flags, objp,
 caller);
 +
 +	if (objp && ub_should_charge(cachep, flags))
 +		if (ub_slab_charge(cachep, objp, flags)) {
 +			kmem_cache_free(cachep, objp);
 +			objp = NULL;
 +		}
 prefetchw(objp);
 return objp;
 }
 @@ -3287,6 +3362,8 @@ static inline void __cache_free(struct k
 struct array_cache *ac = cpu_cache_get(cachep);
 
 check_irq_off();
 +	if (cachep->flags & SLAB_UBC)
 +		ub_slab_uncharge(cachep, objp);
 objp = cache_free_debugcheck(cachep, objp, __builtin_return_address(0));
 
 if (cache_free_alien(cachep, objp))
 --- ./mm/vmalloc.c.kmemcore	2006-08-16 19:10:38.000000000 +0400
 +++ ./mm/vmalloc.c	2006-08-16 19:10:51.000000000 +0400
 @@ -520,6 +520,12 @@ void *vmalloc(unsigned long size)
 }
 EXPORT_SYMBOL(vmalloc);
 
 +void *vmalloc_ub(unsigned long size)
 +{
 +	return __vmalloc(size, GFP_KERNEL_UBC | __GFP_HIGHMEM, PAGE_KERNEL);
 +}
 +EXPORT_SYMBOL(vmalloc_ub);
 +
 /**
 *	vmalloc_user  -  allocate virtually contiguous memory which has
 *			   been zeroed so it can be mapped to userspace without
		
			| [RFC][PATCH 6/7] UBC: kernel memory accounting (mark objects) [message #5201 is a reply to message #5192] | Wed, 16 August 2006 15:40   |  
				
				
 Mark some kmem caches with SLAB_UBC and some allocations with __GFP_UBC so that the corresponding kernel resources are charged and limited.
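
 Some of the marked allocations (page tables, for example) also pass
 __GFP_UBC_LIMIT, which makes the charge code select UB_LIMIT instead of
 UB_BARRIER. The sketch below models the presumed difference in user space.
 It rests on an assumption not shown in this message: that a barrier charge
 fails once usage would exceed the barrier, while a limit charge fails only
 once usage would exceed the limit. The structure and function names are
 hypothetical.

```c
#include <assert.h>

enum ub_severity { UB_BARRIER, UB_LIMIT };

/* Simplified stand-in for one resource's parameters. */
struct ubparm_model {
	unsigned long barrier;
	unsigned long limit;
	unsigned long held;
};

/*
 * Try to charge val units. Returns 0 on success and -1 on refusal.
 * Assumed semantics: the ceiling is the barrier for ordinary charges and
 * the (higher) limit for charges made with the limit severity.
 */
static int model_charge(struct ubparm_model *p, unsigned long val,
		enum ub_severity strict)
{
	unsigned long ceiling = (strict == UB_LIMIT) ? p->limit : p->barrier;

	/* Written this way to avoid overflow in held + val. */
	if (val > ceiling || p->held > ceiling - val)
		return -1;
	p->held += val;
	return 0;
}
```

 Under this reading, allocations that are hard to fail gracefully get the
 headroom between barrier and limit, while ordinary accounted allocations
 are refused as soon as the barrier is reached.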
 
 Signed-Off-By: Pavel Emelianov <xemul@sw.ru>
 Signed-Off-By: Kirill Korotaev <dev@sw.ru>
 
 ---
 arch/i386/kernel/ldt.c           |    4 ++--
 arch/i386/mm/init.c              |    4 ++--
 arch/i386/mm/pgtable.c           |    6 ++++--
 drivers/char/tty_io.c            |   10 +++++-----
 fs/file.c                        |    8 ++++----
 fs/locks.c                       |    2 +-
 fs/namespace.c                   |    3 ++-
 fs/select.c                      |    7 ++++---
 include/asm-i386/thread_info.h   |    4 ++--
 include/asm-ia64/pgalloc.h       |   24 +++++++++++++++++-------
 include/asm-x86_64/pgalloc.h     |   12 ++++++++----
 include/asm-x86_64/thread_info.h |    5 +++--
 ipc/msgutil.c                    |    4 ++--
 ipc/sem.c                        |    7 ++++---
 ipc/util.c                       |    8 ++++----
 kernel/fork.c                    |   15 ++++++++-------
 kernel/posix-timers.c            |    3 ++-
 kernel/signal.c                  |    2 +-
 kernel/user.c                    |    2 +-
 mm/rmap.c                        |    3 ++-
 mm/shmem.c                       |    3 ++-
 21 files changed, 80 insertions(+), 56 deletions(-)
 
 --- ./arch/i386/kernel/ldt.c.ubslabs	2006-04-21 11:59:31.000000000 +0400
 +++ ./arch/i386/kernel/ldt.c	2006-08-01 13:22:30.000000000 +0400
 @@ -39,9 +39,9 @@ static int alloc_ldt(mm_context_t *pc, i
 oldsize = pc->size;
 mincount = (mincount+511)&(~511);
 if (mincount*LDT_ENTRY_SIZE > PAGE_SIZE)
 -		newldt = vmalloc(mincount*LDT_ENTRY_SIZE);
 +		newldt = vmalloc_ub(mincount*LDT_ENTRY_SIZE);
 else
 -		newldt = kmalloc(mincount*LDT_ENTRY_SIZE, GFP_KERNEL);
 +		newldt = kmalloc(mincount*LDT_ENTRY_SIZE, GFP_KERNEL_UBC);
 
 if (!newldt)
 return -ENOMEM;
 --- ./arch/i386/mm/init.c.ubslabs	2006-07-10 12:39:10.000000000 +0400
 +++ ./arch/i386/mm/init.c	2006-08-01 13:17:07.000000000 +0400
 @@ -680,7 +680,7 @@ void __init pgtable_cache_init(void)
 pmd_cache = kmem_cache_create("pmd",
 PTRS_PER_PMD*sizeof(pmd_t),
 PTRS_PER_PMD*sizeof(pmd_t),
 -					0,
 +					SLAB_UBC,
 pmd_ctor,
 NULL);
 if (!pmd_cache)
 @@ -689,7 +689,7 @@ void __init pgtable_cache_init(void)
 pgd_cache = kmem_cache_create("pgd",
 PTRS_PER_PGD*sizeof(pgd_t),
 PTRS_PER_PGD*sizeof(pgd_t),
 -				0,
 +				SLAB_UBC,
 pgd_ctor,
 PTRS_PER_PMD == 1 ? pgd_dtor : NULL);
 if (!pgd_cache)
 --- ./arch/i386/mm/pgtable.c.ubslabs	2006-07-10 12:39:10.000000000 +0400
 +++ ./arch/i386/mm/pgtable.c	2006-08-01 13:27:35.000000000 +0400
 @@ -158,9 +158,11 @@ struct page *pte_alloc_one(struct mm_str
 struct page *pte;
 
 #ifdef CONFIG_HIGHPTE
 -	pte =  alloc_pages(GFP_KERNEL|__GFP_HIGHMEM|__GFP_REPEAT|__GFP_ZERO , 0);
 +	pte =  alloc_pages(GFP_KERNEL|__GFP_HIGHMEM|__GFP_REPEAT|__GFP_ZERO |
 +			__GFP_UBC | __GFP_UBC_LIMIT, 0);
 #else
 -	pte = alloc_pages(GFP_KERNEL|__GFP_REPEAT|__GFP_ZERO, 0);
 +	pte = alloc_pages(GFP_KERNEL|__GFP_REPEAT|__GFP_ZERO|
 +			__GFP_UBC | __GFP_UBC_LIMIT, 0);
 #endif
 return pte;
 }
 --- ./drivers/char/tty_io.c.ubslabs	2006-07-10 12:39:11.000000000 +0400
 +++ ./drivers/char/tty_io.c	2006-08-01 15:21:21.000000000 +0400
 @@ -158,7 +158,7 @@ static struct tty_struct *alloc_tty_stru
 {
 struct tty_struct *tty;
 
 -	tty = kmalloc(sizeof(struct tty_struct), GFP_KERNEL);
 +	tty = kmalloc(sizeof(struct tty_struct), GFP_KERNEL_UBC);
 if (tty)
 memset(tty, 0, sizeof(struct tty_struct));
 return tty;
 @@ -1495,7 +1495,7 @@ static int init_dev(struct tty_driver *d
 
 if (!*tp_loc) {
 tp = (struct termios *) kmalloc(sizeof(struct termios),
 -						GFP_KERNEL);
 +						GFP_KERNEL_UBC);
 if (!tp)
 goto free_mem_out;
 *tp = driver->init_termios;
 @@ -1503,7 +1503,7 @@ static int init_dev(struct tty_driver *d
 
 if (!*ltp_loc) {
 ltp = (struct termios *) kmalloc(sizeof(struct termios),
 -						 GFP_KERNEL);
 +						 GFP_KERNEL_UBC);
 if (!ltp)
 goto free_mem_out;
 memset(ltp, 0, sizeof(struct termios));
 @@ -1528,7 +1528,7 @@ static int init_dev(struct tty_driver *d
 
 if (!*o_tp_loc) {
 o_tp = (struct termios *)
 -				kmalloc(sizeof(struct termios), GFP_KERNEL);
 +				kmalloc(sizeof(struct termios), GFP_KERNEL_UBC);
 if (!o_tp)
 goto free_mem_out;
 *o_tp = driver->other->init_termios;
 @@ -1536,7 +1536,7 @@ static int init_dev(struct tty_driver *d
 
 if (!*o_ltp_loc) {
 o_ltp = (struct termios *)
 -				kmalloc(sizeof(struct termios), GFP_KERNEL);
 +				kmalloc(sizeof(struct termios), GFP_KERNEL_UBC);
 if (!o_ltp)
 goto free_mem_out;
 memset(o_ltp, 0, sizeof(struct termios));
 --- ./fs/file.c.ubslabs	2006-07-17 17:01:12.000000000 +0400
 +++ ./fs/file.c	2006-08-01 15:18:03.000000000 +0400
 @@ -44,9 +44,9 @@ struct file ** alloc_fd_array(int num)
 int size = num * sizeof(struct file *);
 
 if (size <= PAGE_SIZE)
 -		new_fds = (struct file **) kmalloc(size, GFP_KERNEL);
 +		new_fds = (struct file **) kmalloc(size, GFP_KERNEL_UBC);
 else
 -		new_fds = (struct file **) vmalloc(size);
 +		new_fds = (struct file **) vmalloc_ub(size);
 return new_fds;
 }
 
 @@ -213,9 +213,9 @@ fd_set * alloc_fdset(int num)
 int size = num / 8;
 
 if (size <= PAGE_SIZE)
 -		new_fdset = (fd_set *) kmalloc(size, GFP_KERNEL);
 +		new_fdset = (fd_set *) kmalloc(size, GFP_KERNEL_UBC);
 else
 -		new_fdset = (fd_set *) vmalloc(size);
 +		new_fdset = (fd_set *) vmalloc_ub(size);
 return new_fdset;
 }
 
 --- ./fs/locks.c.ubslabs	2006-07-10 12:39:16.000000000 +0400
 +++ ./fs/locks.c	2006-08-01 12:46:47.000000000 +0400
 @@ -2226,7 +2226,7 @@ EXPORT_SYMBOL(lock_may_write);
 static int __init filelock_init(void)
 {
 filelock_cache = kmem_cache_create("file_lock_cache",
 -			sizeof(struct file_lock), 0, SLAB_PANIC,
 +			sizeof(struct file_lock), 0, SLAB_PANIC | SLAB_UBC,
 init_once, NULL);
 return 0;
 }
 --- ./fs/namespace.c.ubslabs	2006-07-10 12:39:16.000000000 +0400
 +++ ./fs/namespace.c	2006-08-01 12:47:12.000000000 +0400
 @@ -1825,7 +1825,8 @@ void __init mnt_init(unsigned long mempa
 init_rwsem(&namespace_sem);
 
 mnt_cache = kmem_cache_create("mnt_cache", sizeof(struct vfsmount),
 -			0, SLAB_HWCACHE_ALIGN | SLAB_PANIC, NULL, NULL);
 +			0, SLAB_HWCACHE_ALIGN | SLAB_UBC | SLAB_PANIC,
 +			NULL, NULL);
 
 mount_hashtable = (struct list_head *)__get_free_page(GFP_ATOMIC);
 
 --- ./fs/select.c.ubslabs	2006-07-10 12:39:17.000000000 +0400
 +++ ./fs/select.c	2006-08-01 15:17:01.000000000 +0400
 @@ -103,7 +103,8 @@ static struct poll_table_entry *poll_get
 if (!table || POLL_TABLE_FULL(table)) {
 struct poll_table_page *new_table;
 
 -		new_table = (struct poll_table_page *) __get_free_page(GFP_KERNEL);
 +		new_table = (struct poll_table_page *)
 +			__get_free_page(GFP_KERNEL_UBC);
 if (!new_table) {
 p->error = -ENOMEM;
 __set_current_state(TASK_RUNNING);
 @@ -339,7 +340,7 @@ static int core_sys_select(int n, fd_set
 if (size > sizeof(stack_fds) / 6) {
 /* Not enough space in on-stack array; must use kmalloc */
 ret = -ENOMEM;
 -		bits = kmalloc(6 * size, GFP_KERNEL);
 +		bits = kmalloc(6 * size, GFP_KERNEL_UBC);
 if (!bits)
 goto out_nofds;
 }
 @@ -693,7 +694,7 @@ int do_sys_poll(struct pollfd __user *uf
 if (!stack_pp)
 stack_pp = pp = (struct poll_list *)stack_pps;
 else {
 -			pp = kmalloc(size, GFP_KERNEL);
 +			pp = kmalloc(size, GFP_KERNEL_UBC);
 if (!pp)
 goto out_fds;
 }
 --- ./include/asm-i386/thread_info.h.ubslabs	2006-07-10 12:39:19.000000000 +0400
 +++ ./include/asm-i386/thread_info.h	2006-08-01 15:19:50.000000000 +0400
 @@ -99,13 +99,13 @@ static inline struct thread_info *curren
 ({							\
 struct thread_info *ret;			\
 \
 -		ret = kmalloc(THREAD_SIZE, GFP_KERNEL);		\
 +		ret = kmalloc(THREAD_SIZE, GFP_KERNEL_UBC);	\
 if (ret)					\
 memset(ret, 0, THREAD_SIZE);		\
 ret;						\
 })
 #else
 -#define alloc_thread_info(tsk) kmalloc(THREAD_SIZE, GFP_KERNEL)
 +#define alloc_thread_info(tsk) kmalloc(THREAD_SIZE, GFP_KERNEL_UBC)
 #endif
 
 #define free_thread_info(info)	kfree(info)
 --- ./include/asm-ia64/pgalloc.h.ubslabs	2006-07-10 12:39:19.000000000 +0400
 +++ ./include/asm-ia64/pgalloc.h	2006-08-01 13:35:49.000000000 +0400
 @@ -19,6 +19,8 @@
 #include <linux/page-flags.h>
 #include <linux/threads.h>
 
 +#include <ub/kmem.h>
 +
 #include <asm/mmu_context.h>
 
 DECLARE_PER_CPU(unsigned long *, __pgtable_quicklist);
 @@ -37,7 +39,7 @@ static inline long pgtable_quicklist_tot
 return ql_size;
 }
 
 -static inline void *pgtable_quicklist_alloc(void)
 +static inline void *pgtable_quicklist_alloc(int charge)
 {
 unsigned long *ret = NULL;
 
 @@ -45,13 +47,20 @@ static inline void *pgtable_quicklist_al
 
 ret = pgtable_quicklist;
 if (likely(ret != NULL)) {
 +		if (charge && ub_page_charge(virt_to_page(ret),
 +					0, __GFP_UBC_LIMIT)) {
 +			ret = NULL;
 +			goto out;
 +		}
 pgtable_quicklist = (unsigned long *)(*ret);
 ret[0] = 0;
 --pgtable_quicklist_size;
 +out:
 preempt_enable();
 } else {
 preempt_enable();
 -		ret = (unsigned long *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
 +		ret = (unsigned long *)__get_free_page(GFP_KERNEL |
 +				__GFP_ZERO | __GFP_UBC | __GFP_UBC_LIMIT);
 }
 
 return ret;
 @@ -69,6 +78,7 @@ static inline void pgtable_quicklist_fre
 #endif
 
 preempt_disable();
 +	ub_page_uncharge(virt_to_page(pgtable_entry), 0);
 *(unsigned long *)pgtable_entry = (unsigned long)pgtable_quicklist;
 pgtable_quicklist = (unsigned long *)pgtable_entry;
 ++pgtable_quicklist_size;
 @@ -77,7 +87,7 @@ static inline void pgtable_quicklist_fre
 
 static inline pgd_t *pgd_alloc(struct mm_struct *mm)
 {
 -	return pgtable_quicklist_alloc();
 +	return pgtable_quicklist_alloc(1);
 }
 
 static inline void pgd_free(pgd_t * pgd)
 @@ -94,7 +104,7 @@ pgd_populate(struct mm_struct *mm, pgd_t
 
 static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long addr)
 {
 -	return pgtable_quicklist_alloc();
 +	retur
...
 
 
 |  
	|  |  |  
	| 
		
			| [RFC][PATCH 7/7] UBC: proc interface [message #5202 is a reply to message #5192] | Wed, 16 August 2006 15:42   |  
			| 
				
				
					|  dev Messages: 1693
 Registered: September 2005
 Location: Moscow
 | Senior Member |  
 |  |  
	| Add a proc interface (/proc/user_beancounters) that shows the current state (usage, limits, and failure counts) of each UB. Implemented via seq files.
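
Editorial illustration: the sketch below shows how the 32-bit format strings from the patch lay out the header and one resource row. It is a user-space approximation, not the kernel code: `demo_parm`, `demo_format_header` and `demo_format_row` are hypothetical names, and `snprintf` stands in for `seq_printf`.

```c
#include <assert.h>
#include <stdio.h>

/* Format strings copied from the BITS_PER_LONG == 32 branch of the patch. */
static const char *head_fmt = "%10s %-12s %10s %10s %10s %10s %10s\n";
static const char *res_fmt  = "%10s %-12s %10lu %10lu %10lu %10lu %10lu\n";

/* Hypothetical stand-in for the struct ubparm fields printed by ub_show_res(). */
struct demo_parm {
	unsigned long held, maxheld, barrier, limit, failcnt;
};

/* Format the header line into buf; returns the number of characters written. */
static int demo_format_header(char *buf, size_t len)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, head_fmt, "uid", "resource",
			"held", "maxheld", "barrier", "limit", "failcnt");
}

/*
 * Format one resource row. In the patch the uid column is filled only for
 * the first resource of each beancounter and is an empty string otherwise.
 */
static int demo_format_row(char *buf, size_t len, const char *uid,
			   const char *name, const struct demo_parm *p)
{
	assert(buf != NULL && len > 0 && p != NULL);
	return snprintf(buf, len, res_fmt, uid, name,
			p->held, p->maxheld, p->barrier, p->limit, p->failcnt);
}
```

With values that fit their columns, every line has the same fixed width, so the rows line up under the header. The resource name passed in is up to the caller; the patch takes it from `ub_rnames[]`.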
 
 Signed-Off-By: Pavel Emelianov <xemul@sw.ru>
 Signed-Off-By: Kirill Korotaev <dev@sw.ru>
 
 ---
 init/main.c        |    1
 kernel/ub/Makefile |    1
 kernel/ub/proc.c   |  205 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 207 insertions(+)
 
 --- ./init/main.c.ubproc	2006-07-31 18:40:20.000000000 +0400
 +++ ./init/main.c	2006-08-03 16:02:19.000000000 +0400
 @@ -578,6 +578,7 @@ asmlinkage void __init start_kernel(void
 page_writeback_init();
 #ifdef CONFIG_PROC_FS
 proc_root_init();
 +	ub_init_proc();
 #endif
 cpuset_init();
 taskstats_init_early();
 --- ./kernel/ub/Makefile.ubproc	2006-07-31 17:49:05.000000000 +0400
 +++ ./kernel/ub/Makefile	2006-08-01 11:08:39.000000000 +0400
 @@ -4,3 +4,4 @@ obj-$(CONFIG_USER_RESOURCE) += beancount
 obj-$(CONFIG_USER_RESOURCE) += misc.o
 obj-y += sys.o
 obj-$(CONFIG_USER_RESOURCE) += kmem.o
 +obj-$(CONFIG_USER_RESOURCE) += proc.o
 --- ./kernel/ub/proc.c.ubproc	2006-08-01 10:22:09.000000000 +0400
 +++ ./kernel/ub/proc.c	2006-08-03 15:50:35.000000000 +0400
 @@ -0,0 +1,205 @@
 +/*
 + *  kernel/ub/proc.c
 + *
 + *  Copyright (C) 2006 OpenVZ. SWsoft Inc.
 + *
 + */
 +
 +#include <linux/sched.h>
 +#include <linux/kernel.h>
 +#include <linux/proc_fs.h>
 +#include <linux/seq_file.h>
 +
 +#include <ub/beancounter.h>
 +
 +#ifdef CONFIG_PROC_FS
 +
 +#if BITS_PER_LONG == 32
 +static const char *head_fmt = "%10s %-12s %10s %10s %10s %10s %10s\n";
 +static const char *res_fmt = "%10s %-12s %10lu %10lu %10lu %10lu %10lu\n";
 +#else
 +static const char *head_fmt = "%10s %-12s %20s %20s %20s %20s %20s\n";
 +static const char *res_fmt = "%10s %-12s %20lu %20lu %20lu %20lu %20lu\n";
 +#endif
 +
 +static void ub_show_header(struct seq_file *f)
 +{
 +	seq_printf(f, head_fmt, "uid", "resource",
 +			"held", "maxheld", "barrier", "limit", "failcnt");
 +}
 +
 +static void ub_show_res(struct seq_file *f, struct user_beancounter *ub, int r)
 +{
 +	char ub_uid[64];
 +
 +	if (r == 0)
 +		ub_print_uid(ub, ub_uid, sizeof(ub_uid));
 +	else
 +		strcpy(ub_uid, "");
 +
 +	seq_printf(f, res_fmt, ub_uid, ub_rnames[r],
 +			ub->ub_parms[r].held,
 +			ub->ub_parms[r].maxheld,
 +			ub->ub_parms[r].barrier,
 +			ub->ub_parms[r].limit,
 +			ub->ub_parms[r].failcnt);
 +}
 +
 +static struct ub_seq_struct {
 +	unsigned long flags;
 +	int slot;
 +	struct user_beancounter *ub;
 +} ub_seq_ctx;
 +
 +static int ub_show(struct seq_file *f, void *v)
 +{
 +	int res;
 +
 +	for (res = 0; res < UB_RESOURCES; res++)
 +		ub_show_res(f, ub_seq_ctx.ub, res);
 +	return 0;
 +}
 +
 +static void *ub_start_ctx(struct seq_file *f, unsigned long p, int sub)
 +{
 +	struct user_beancounter *ub;
 +	struct hlist_node *pos;
 +	unsigned long flags;
 +	int slot;
 +
 +	if (p == 0)
 +		ub_show_header(f);
 +
 +	spin_lock_irqsave(&ub_hash_lock, flags);
 +	ub_seq_ctx.flags = flags;
 +
 +	for (slot = 0; slot < UB_HASH_SIZE; slot++)
 +		hlist_for_each_entry (ub, pos, &ub_hash[slot], hash) {
 +			if (!sub && ub->parent != NULL)
 +				continue;
 +
 +			if (p-- == 0) {
 +				ub_seq_ctx.ub = ub;
 +				ub_seq_ctx.slot = slot;
 +				return &ub_seq_ctx;
 +			}
 +		}
 +
 +	return NULL;
 +}
 +
 +static void *ub_next_ctx(struct seq_file *f, loff_t *ppos, int sub)
 +{
 +	struct user_beancounter *ub;
 +	struct hlist_node *pos;
 +	int slot;
 +
 +	ub = ub_seq_ctx.ub;
 +
 +	pos = &ub->hash;
 +	hlist_for_each_entry_continue (ub, pos, hash) {
 +		if (!sub && ub->parent != NULL)
 +			continue;
 +
 +		ub_seq_ctx.ub = ub;
 +		(*ppos)++;
 +		return &ub_seq_ctx;
 +	}
 +
 +	for (slot = ub_seq_ctx.slot + 1; slot < UB_HASH_SIZE; slot++)
 +		hlist_for_each_entry (ub, pos, &ub_hash[slot], hash) {
 +			if (!sub && ub->parent != NULL)
 +				continue;
 +
 +			ub_seq_ctx.ub = ub;
 +			ub_seq_ctx.slot = slot;
 +			(*ppos)++;
 +			return &ub_seq_ctx;
 +		}
 +
 +	return NULL;
 +}
 +
 +static void *ub_start(struct seq_file *f, loff_t *ppos)
 +{
 +	return ub_start_ctx(f, *ppos, 0);
 +}
 +
 +static void *ub_sub_start(struct seq_file *f, loff_t *ppos)
 +{
 +	return ub_start_ctx(f, *ppos, 1);
 +}
 +
 +static void *ub_next(struct seq_file *f, void *v, loff_t *pos)
 +{
 +	return ub_next_ctx(f, pos, 0);
 +}
 +
 +static void *ub_sub_next(struct seq_file *f, void *v, loff_t *pos)
 +{
 +	return ub_next_ctx(f, pos, 1);
 +}
 +
 +static void ub_stop(struct seq_file *f, void *v)
 +{
 +	unsigned long flags;
 +
 +	flags = ub_seq_ctx.flags;
 +	spin_unlock_irqrestore(&ub_hash_lock, flags);
 +}
 +
 +static struct seq_operations ub_seq_ops = {
 +	.start = ub_start,
 +	.next  = ub_next,
 +	.stop  = ub_stop,
 +	.show  = ub_show
 +};
 +
 +static int ub_open(struct inode *inode, struct file *filp)
 +{
 +	return seq_open(filp, &ub_seq_ops);
 +}
 +
 +static struct file_operations ub_file_operations = {
 +	.open		= ub_open,
 +	.read		= seq_read,
 +	.llseek		= seq_lseek,
 +	.release	= seq_release,
 +};
 +
 +static struct seq_operations ub_sub_seq_ops = {
 +	.start = ub_sub_start,
 +	.next  = ub_sub_next,
 +	.stop  = ub_stop,
 +	.show  = ub_show
 +};
 +
 +static int ub_sub_open(struct inode *inode, struct file *filp)
 +{
 +	return seq_open(filp, &ub_sub_seq_ops);
 +}
 +
 +static struct file_operations ub_sub_file_operations = {
 +	.open		= ub_sub_open,
 +	.read		= seq_read,
 +	.llseek		= seq_lseek,
 +	.release	= seq_release,
 +};
 +
 +void __init ub_init_proc(void)
 +{
 +	struct proc_dir_entry *entry;
 +
 +	entry = create_proc_entry("user_beancounters", S_IRUGO, NULL);
 +	if (entry)
 +		entry->proc_fops = &ub_file_operations;
 +	else
 +		panic("Can't create /proc/user_beancounters\n");
 +
 +	entry = create_proc_entry("user_beancounters_sub", S_IRUGO, NULL);
 +	if (entry)
 +		entry->proc_fops = &ub_sub_file_operations;
 +	else
 +		panic("Can't create /proc/user_beancounters_sub\n");
 +}
 +#endif
 |  
	|  |  |  
	| 
		
			| Re: [RFC][PATCH 4/7] UBC: syscalls (user interface) [message #5234 is a reply to message #5199] | Wed, 16 August 2006 17:17   |  
			| 
				
				
					|  Greg KH Messages: 27
 Registered: February 2006
 | Junior Member |  |  |  
	| On Wed, Aug 16, 2006 at 07:39:43PM +0400, Kirill Korotaev wrote:
 > --- ./include/asm-sparc/unistd.h.arsys	2006-07-10 12:39:19.000000000 +0400
 > +++ ./include/asm-sparc/unistd.h	2006-08-10 17:08:19.000000000 +0400
 > @@ -318,6 +318,9 @@
 > #define __NR_unshare		299
 > #define __NR_set_robust_list	300
 > #define __NR_get_robust_list	301
 > +#define __NR_getluid		302
 > +#define __NR_setluid		303
 > +#define __NR_setublimit		304
 
 Hm, you seem to be ignoring this:
 
 >
 > #ifdef __KERNEL__
 > /* WARNING: You MAY NOT add syscall numbers larger than 301, since
 
 Same thing for sparc64:
 
 > --- ./include/asm-sparc64/unistd.h.arsys	2006-07-10 12:39:19.000000000 +0400
 > +++ ./include/asm-sparc64/unistd.h	2006-08-10 17:09:24.000000000 +0400
 > @@ -320,6 +320,9 @@
 > #define __NR_unshare		299
 > #define __NR_set_robust_list	300
 > #define __NR_get_robust_list	301
 > +#define __NR_getluid		302
 > +#define __NR_setluid		303
 > +#define __NR_setublimit		304
 >
 > #ifdef __KERNEL__
 > /* WARNING: You MAY NOT add syscall numbers larger than 301, since
 
 You might want to read those comments...
 
 thanks,
 
 greg k-h
 |  
	|  |  |  
	| 
		
			| Re: [RFC][PATCH 4/7] UBC: syscalls (user interface) [message #5235 is a reply to message #5199] | Wed, 16 August 2006 18:17   |  
			| 
				
				
					|  Rohit Seth Messages: 101
 Registered: August 2006
 | Senior Member |  |  |  
	| On Wed, 2006-08-16 at 19:39 +0400, Kirill Korotaev wrote:
 > Add the following system calls for UB management:
 >   1. sys_getluid    - get current UB id
 >   2. sys_setluid    - changes exec_ and fork_ UBs on current
 >   3. sys_setublimit - set limits for resource consumption
 >
 
 Why not have another system call for getting the current limits?
 
 But as I said in my previous mail, configfs seems like a better choice for
 the user interface.  That way the user goes to one place to read and write
 limits and to see the current usage and other stats.
 
 > Signed-Off-By: Pavel Emelianov <xemul@sw.ru>
 > Signed-Off-By: Kirill Korotaev <dev@sw.ru>
 
 ...<snip>...
 > +
 > +/*
 > + *	The setbeanlimit syscall
 > + */
 > +asmlinkage long sys_setublimit(uid_t uid, unsigned long resource,
 > +		unsigned long *limits)
 > +{
 
 > +	ub = beancounter_findcreate(uid, NULL, 0);
 > +	if (ub == NULL)
 > +		goto out;
 > +
 > +	spin_lock_irqsave(&ub->ub_lock, flags);
 > +	ub->ub_parms[resource].barrier = new_limits[0];
 > +	ub->ub_parms[resource].limit = new_limits[1];
 > +	spin_unlock_irqrestore(&ub->ub_lock, flags);
 > +
 
 I think there should be a check here to see whether the new limits are
 lower than the current usage of a resource.  If so, take appropriate
 action.
 
 -rohit
 |  
	|  |  |  
	| 
		
			| Re: [RFC][PATCH 2/7] UBC: core (structures, API) [message #5236 is a reply to message #5196] | Wed, 16 August 2006 18:11   |  
			| 
				
				
					|  Rohit Seth Messages: 101
 Registered: August 2006
 | Senior Member |  |  |  
	| On Wed, 2006-08-16 at 19:37 +0400, Kirill Korotaev wrote:
 > Core functionality and interfaces of UBC:
 > find/create beancounter, initialization,
 > charge/uncharge of resource, core objects' declarations.
 >
 > Basic structures:
 >   ubparm           - resource description
 >   user_beancounter - set of resources, id, lock
 >
 > Signed-Off-By: Pavel Emelianov <xemul@sw.ru>
 > Signed-Off-By: Kirill Korotaev <dev@sw.ru>
 >
 > ---
 >  include/ub/beancounter.h |  157 ++++++++++++++++++
 >  init/main.c              |    4
 >  kernel/Makefile          |    1
 >  kernel/ub/Makefile       |    7
 >  kernel/ub/beancounter.c  |  398 +++++++++++++++++++++++++++++++++++++++++++++++
 >  5 files changed, 567 insertions(+)
 >
 > --- /dev/null	2006-07-18 14:52:43.075228448 +0400
 > +++ ./include/ub/beancounter.h	2006-08-10 14:58:27.000000000 +0400
 > @@ -0,0 +1,157 @@
 > +/*
 > + *  include/ub/beancounter.h
 > + *
 > + *  Copyright (C) 2006 OpenVZ. SWsoft Inc
 > + *
 > + */
 > +
 > +#ifndef _LINUX_BEANCOUNTER_H
 > +#define _LINUX_BEANCOUNTER_H
 > +
 > +/*
 > + *	Resource list.
 > + */
 > +
 > +#define UB_RESOURCES	0
 > +
 > +struct ubparm {
 > +	/*
 > +	 * A barrier over which resource allocations are failed gracefully.
 > +	 * e.g. if the amount of consumed memory is over the barrier further
 > +	 * sbrk() or mmap() calls fail, the existing processes are not killed.
 > +	 */
 > +	unsigned long	barrier;
 > +	/* hard resource limit */
 > +	unsigned long	limit;
 > +	/* consumed resources */
 > +	unsigned long	held;
 > +	/* maximum amount of consumed resources through the last period */
 > +	unsigned long	maxheld;
 > +	/* minimum amount of consumed resources through the last period */
 > +	unsigned long	minheld;
 > +	/* count of failed charges */
 > +	unsigned long	failcnt;
 > +};
 
 What is the difference between barrier and limit?  They both sound like
 hard limits.  No?
 
 > +
 > +/*
 > + * Kernel internal part.
 > + */
 > +
 > +#ifdef __KERNEL__
 > +
 > +#include <linux/config.h>
 > +#include <linux/spinlock.h>
 > +#include <linux/list.h>
 > +#include <asm/atomic.h>
 > +
 > +/*
 > + * UB_MAXVALUE is essentially LONG_MAX declared in a cross-compiling safe form.
 > + */
 > +#define UB_MAXVALUE	( (1UL << (sizeof(unsigned long)*8-1)) - 1)
 > +
 > +
 > +/*
 > + *	Resource management structures
 > + * Serialization issues:
 > + *   beancounter list management is protected via ub_hash_lock
 > + *   task pointers are set only for current task and only once
 > + *   refcount is managed atomically
 > + *   value and limit comparison and change are protected by per-ub spinlock
 > + */
 > +
 > +struct user_beancounter
 > +{
 > +	atomic_t		ub_refcount;
 > +	spinlock_t		ub_lock;
 > +	uid_t			ub_uid;
 
 Why uid?  Will it be possible to group processes belonging to different
 users under the same beancounter?
 
 > +	struct hlist_node	hash;
 > +
 > +	struct user_beancounter	*parent;
 > +	void			*private_data;
 > +
 
 What are the above two fields used for?
 
 > +	/* resources statistics and settings */
 > +	struct ubparm		ub_parms[UB_RESOURCES];
 > +};
 > +
 
 I presume UB_RESOURCES value is going to change as different resources
 start getting tracked.
 
 I think something like configfs should be used for user interface.  It
 automatically presents the right interfaces to user land (based on
 kernel implementation).  And you wouldn't need any changes in glibc etc.
 
 
 -rohit
 |  
	|  |  |  
	| 
		
			| Re: [RFC][PATCH 5/7] UBC: kernel memory accounting (core) [message #5237 is a reply to message #5200] | Wed, 16 August 2006 18:24   |  
			| 
				
				
					|  Rohit Seth Messages: 101
 Registered: August 2006
 | Senior Member |  |  |  
	| On Wed, 2006-08-16 at 19:40 +0400, Kirill Korotaev wrote:
 > Introduce UB_KMEMSIZE resource which accounts kernel
 > objects allocated by task's request.
 >
 > Reference to UB is kept on struct page or slab object.
 > For slabs each struct slab contains a set of pointers
 > corresponding objects are charged to.
 >
 > Allocation charge rules:
 >  1. Pages - if allocation is performed with __GFP_UBC flag - page
 >     is charged to current's exec_ub.
 >  2. Slabs - kmem_cache may be created with SLAB_UBC flag - in this
 >     case each allocation is charged. Caches used by kmalloc are
 >     created with SLAB_UBC | SLAB_UBC_NOCHARGE flags. In this case
 >     only __GFP_UBC allocations are charged.
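
Editorial illustration: the two charge rules quoted above can be read as the following decision functions. This is a user-space sketch; the `DEMO_*` flag values and function names are hypothetical, and the "cache without SLAB_UBC is never charged" case is an inference from the rules, not something the patch description states.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag values are hypothetical; the real ones live in the patch's headers. */
#define DEMO_GFP_UBC		0x1u
#define DEMO_SLAB_UBC		0x1u
#define DEMO_SLAB_UBC_NOCHARGE	0x2u

/* Rule 1: a page is charged only when the allocation carries __GFP_UBC. */
static bool demo_page_is_charged(unsigned int gfp_flags)
{
	return (gfp_flags & DEMO_GFP_UBC) != 0;
}

/*
 * Rule 2: objects from a SLAB_UBC cache are always charged, unless the
 * cache is also SLAB_UBC_NOCHARGE (the kmalloc caches), in which case
 * only __GFP_UBC allocations are charged.
 */
static bool demo_slab_obj_is_charged(unsigned int cache_flags,
				     unsigned int gfp_flags)
{
	if (!(cache_flags & DEMO_SLAB_UBC))
		return false;	/* inferred: unmarked caches are not charged */
	if (cache_flags & DEMO_SLAB_UBC_NOCHARGE)
		return (gfp_flags & DEMO_GFP_UBC) != 0;
	return true;
}

/* Usage example covering the cases spelled out in the rules. */
static void demo_check_rules(void)
{
	unsigned int kmalloc_cache = DEMO_SLAB_UBC | DEMO_SLAB_UBC_NOCHARGE;

	assert(demo_page_is_charged(DEMO_GFP_UBC));
	assert(!demo_page_is_charged(0));
	assert(demo_slab_obj_is_charged(DEMO_SLAB_UBC, 0));
	assert(demo_slab_obj_is_charged(kmalloc_cache, DEMO_GFP_UBC));
	assert(!demo_slab_obj_is_charged(kmalloc_cache, 0));
}
```

This matches how the kmem-charge patch uses the flags: dedicated caches get `SLAB_UBC` at creation, while `kmalloc` call sites opt in through `GFP_KERNEL_UBC`.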
 
 <snip>
 
 > --- ./mm/page_alloc.c.kmemcore	2006-08-16 19:10:38.000000000 +0400
 > +++ ./mm/page_alloc.c	2006-08-16 19:10:51.000000000 +0400
 > @@ -38,6 +38,8 @@
 >  #include <linux/mempolicy.h>
 >  #include <linux/stop_machine.h>
 >
 > +#include <ub/kmem.h>
 > +
 >  #include <asm/tlbflush.h>
 >  #include <asm/div64.h>
 >  #include "internal.h"
 > @@ -484,6 +486,8 @@ static void __free_pages_ok(struct page
 >  	if (reserved)
 >  		return;
 >
 > +	ub_page_uncharge(page, order);
 > +
 >  	kernel_map_pages(page, 1 << order, 0);
 >  	local_irq_save(flags);
 >  	__count_vm_events(PGFREE, 1 << order);
 > @@ -764,6 +768,8 @@ static void fastcall free_hot_cold_page(
 >  	if (free_pages_check(page))
 >  		return;
 >
 > +	ub_page_uncharge(page, 0);
 > +
 >  	kernel_map_pages(page, 1, 0);
 >
 >  	pcp = &zone_pcp(zone, get_cpu())->pcp[cold];
 > @@ -1153,6 +1159,11 @@ nopage:
 >  		show_mem();
 >  	}
 >  got_pg:
 > +	if ((gfp_mask & __GFP_UBC) &&
 > +			ub_page_charge(page, order, gfp_mask)) {
 > +		__free_pages(page, order);
 > +		page = NULL;
 > +	}
 >  #ifdef CONFIG_PAGE_OWNER
 >  	if (page)
 >  		set_page_owner(page, order, gfp_mask);
 
 If I'm reading this patch right, it seems you are making page
 allocations fail without (for example) first trying to purge some pages
 from the page cache belonging to this container.  Or is that reclaim
 going to come later?
 
 -rohit
 |  
	|  |  |  
	|  |  
	| 
		
			| Re: [RFC][PATCH 5/7] UBC: kernel memory accounting (core) [message #5239 is a reply to message #5217] | Wed, 16 August 2006 19:15   |  
			| 
				
				
					|  Rohit Seth Messages: 101
 Registered: August 2006
 | Senior Member |  |  |  
	| On Wed, 2006-08-16 at 11:47 -0700, Dave Hansen wrote:
 > On Wed, 2006-08-16 at 19:40 +0400, Kirill Korotaev wrote:
 > > --- ./include/linux/mm.h.kmemcore       2006-08-16 19:10:38.000000000 +0400
 > > +++ ./include/linux/mm.h        2006-08-16 19:10:51.000000000 +0400
 > > @@ -274,8 +274,14 @@ struct page {
 > >         unsigned int gfp_mask;
 > >         unsigned long trace[8];
 > >  #endif
 > > +#ifdef CONFIG_USER_RESOURCE
 > > +       union {
 > > +               struct user_beancounter *page_ub;
 > > +       } bc;
 > > +#endif
 > >  };
 >
 > Is everybody OK with adding this accounting to the 'struct page'?
 
 My preference would be to have the container (I keep saying container,
 but I mean resource beancounter) pointer embedded in the task, mm (not
 sure), address_space and anon_vma structures.  This should allow us to
 track user-land pages optimally.  But for tracking kernel usage on behalf
 of a user, we will have to use an additional field (unless we can reuse
 mapping).  Please correct me if I'm wrong, but all the kernel
 resources will be allocated and freed in the context of a user process,
 and at that time we know whether an allocation should succeed or not.
 So we may not actually need to track kernel pages that closely.  We are
 not going to run reclaim on any of them anyway.
 
 -rohit
 |  
	|  |  |  
	|  |  
	|  |  
	| 
		
			| Re: [ckrm-tech] [RFC][PATCH] UBC: user resource beancounters [message #5253 is a reply to message #5192] | Thu, 17 August 2006 00:15   |  
			| 
				
				
					|  Chandra Seetharaman Messages: 88
 Registered: August 2006
 | Member |  |  |  
	| Kirill, 
 Thanks for posting the patches to ckrm-tech. I'll look into them and post
 my comments tomorrow.
 
 Some documentation (or a pointer to the documentation) on how to use this
 feature, and on the high-level design, would really help.
 
 How does the hierarchy work? (Maybe reading the code would clear it
 up :).
 
 A few comments below.
 On Wed, 2006-08-16 at 19:24 +0400, Kirill Korotaev wrote:
 <snip>
 > The patches in these series are:
 > diff-ubc-kconfig.patch:
 >     Adds kernel/ub/Kconfig file with UBC options and
 >     includes it into arch Kconfigs
 
 Since the core functionality is arch-independent, why not put the
 Kconfig stuff in some generic place like init/Kconfig?
 
 >
 > diff-ubc-core.patch:
 >     Contains core functionality and interfaces of UBC:
 >     find/create beancounter, initialization,
 >     charge/uncharge of resource, core objects' declarations.
 >
 > diff-ubc-task.patch:
 >     Contains code responsible for setting UB on task,
 >     it's inheriting and setting host context in interrupts.
 >
 >     Task contains three beancounters:
 >     1. exec_ub  - current context. all resources are charged
 >                   to this beancounter.
 >     2. task_ub  - beancounter to which task_struct is charged
 >                   itself.
 >     3. fork_sub - beancounter which is inherited by
 >                   task's children on fork
 
 Wondering why we need three of these?
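
Editorial illustration: a user-space sketch of how the three per-task pointers described in the quoted summary might behave across fork. The inheritance rule below (the child's `exec_ub`, `task_ub` and `fork_sub` all start from the parent's `fork_sub`) is an assumption made for illustration, since the task patch itself is not shown in this part of the thread. `demo_ub`, `demo_task_bc` and `demo_fork` are hypothetical names, and reference counting is omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct user_beancounter. */
struct demo_ub {
	unsigned int id;
};

/* The three per-task beancounter pointers from the patch description. */
struct demo_task_bc {
	struct demo_ub *exec_ub;	/* current context; resources are charged here */
	struct demo_ub *task_ub;	/* the task_struct itself is charged here */
	struct demo_ub *fork_sub;	/* inherited by children on fork */
};

/*
 * Assumed fork rule: the child starts entirely inside the parent's
 * fork_sub beancounter, whatever context the parent is executing in.
 */
static void demo_fork(const struct demo_task_bc *parent,
		      struct demo_task_bc *child)
{
	assert(parent != NULL && child != NULL && parent->fork_sub != NULL);
	child->exec_ub = parent->fork_sub;
	child->task_ub = parent->fork_sub;
	child->fork_sub = parent->fork_sub;
}
```

Under this reading, keeping `exec_ub` separate from `fork_sub` lets a task temporarily run in another context (for example the host context in interrupts) without changing where its children are accounted.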
 
 >
 > diff-ubc-syscalls.patch:
 >     Patch adds system calls for UB management:
 >     1. sys_getluid    - get current UB id
 >     2. sys_setluid    - changes exec_ and fork_ UBs on current
 >     3. sys_setublimit - set limits for resource consumption
 
 I agree with Rohit that a configfs-based interface would be easier to
 use (you would not run into the system call number issue that Greg has
 pointed out).
 >
 > diff-ubc-kmem-core.patch:
 >     Introduces UB_KMEMSIZE resource which accounts kernel
 >     objects allocated by task's request.
 >
 >     Objects are accounted via struct page and slab objects.
 >     For the latter ones each slab contains a set of pointers
 >     corresponding object is charged to.
 >
 >     Allocation charge rules:
 >     1. Pages - if allocation is performed with __GFP_UBC flag - page
 >        is charged to current's exec_ub.
 >     2. Slabs - kmem_cache may be created with SLAB_UBC flag - in this
 >        case each allocation is charged. Caches used by kmalloc are
 >        created with SLAB_UBC | SLAB_UBC_NOCHARGE flags. In this case
 >        only __GFP_UBC allocations are charged.
 >
 > diff-ubc-kmem-charge.patch:
 >     Adds SLAB_UBC and __GFP_UBC flags in appropriate places
 >     to cause charging/limiting of specified resources.
 >
 > diff-ubc-proc.patch:
 >     Adds two proc entries user_beancounters and user_beancounters_sub
 >     allowing to see current state (usage/limits/fails for each UB).
 >     Implemented via seq files.
 
 Again, configfs would be easier.
 
 >
 > Patch set is applicable to 2.6.18-rc4-mm1
 >
 > Thanks,
 > Kirill
 >
 >
 --
 
 ------------------------------------------------------------ ----------
 Chandra Seetharaman               | Be careful what you choose....
 - sekharan@us.ibm.com   |      .......you may get it.
 ------------------------------------------------------------ ----------
 |  
	|  |  |  
	|  |  
	|  |  
	| 
		
			| Re: [RFC][PATCH 2/7] UBC: core (structures, API) [message #5279 is a reply to message #5236] | Thu, 17 August 2006 11:52   |  
			| 
				
				
					|  dev Messages: 1693
 Registered: September 2005
 Location: Moscow
 | Senior Member |  
 |  |  
	| Rohit Seth wrote:
 > On Wed, 2006-08-16 at 19:37 +0400, Kirill Korotaev wrote:
 >
 >>Core functionality and interfaces of UBC:
 >>find/create beancounter, initialization,
 >>charge/uncharge of resource, core objects' declarations.
 >>
 >>Basic structures:
 >>  ubparm           - resource description
 >>  user_beancounter - set of resources, id, lock
 >>
 >>Signed-Off-By: Pavel Emelianov <xemul@sw.ru>
 >>Signed-Off-By: Kirill Korotaev <dev@sw.ru>
 >>
 >>---
 >> include/ub/beancounter.h |  157 ++++++++++++++++++
 >> init/main.c              |    4
 >> kernel/Makefile          |    1
 >> kernel/ub/Makefile       |    7
 >> kernel/ub/beancounter.c  |  398 +++++++++++++++++++++++++++++++++++++++++++++++
 >> 5 files changed, 567 insertions(+)
 >>
 >>--- /dev/null	2006-07-18 14:52:43.075228448 +0400
 >>+++ ./include/ub/beancounter.h	2006-08-10 14:58:27.000000000 +0400
 >>@@ -0,0 +1,157 @@
 >>+/*
 >>+ *  include/ub/beancounter.h
 >>+ *
 >>+ *  Copyright (C) 2006 OpenVZ. SWsoft Inc
 >>+ *
 >>+ */
 >>+
 >>+#ifndef _LINUX_BEANCOUNTER_H
 >>+#define _LINUX_BEANCOUNTER_H
 >>+
 >>+/*
 >>+ *	Resource list.
 >>+ */
 >>+
 >>+#define UB_RESOURCES	0
 >>+
 >>+struct ubparm {
 >>+	/*
 >>+	 * A barrier over which resource allocations are failed gracefully.
 >>+	 * e.g. if the amount of consumed memory is over the barrier further
 >>+	 * sbrk() or mmap() calls fail, the existing processes are not killed.
 >>+	 */
 >>+	unsigned long	barrier;
 >>+	/* hard resource limit */
 >>+	unsigned long	limit;
 >>+	/* consumed resources */
 >>+	unsigned long	held;
 >>+	/* maximum amount of consumed resources through the last period */
 >>+	unsigned long	maxheld;
 >>+	/* minimum amount of consumed resources through the last period */
 >>+	unsigned long	minheld;
 >>+	/* count of failed charges */
 >>+	unsigned long	failcnt;
 >>+};
 >
 >
 > What is the difference between barrier and limit. They both sound like
 > hard limits.  No?
 Check __charge_beancounter_locked and severity.
 They provide a kind of soft and hard limit.
 
 >>+
 >>+/*
 >>+ * Kernel internal part.
 >>+ */
 >>+
 >>+#ifdef __KERNEL__
 >>+
 >>+#include <linux/config.h>
 >>+#include <linux/spinlock.h>
 >>+#include <linux/list.h>
 >>+#include <asm/atomic.h>
 >>+
 >>+/*
 >>+ * UB_MAXVALUE is essentially LONG_MAX declared in a cross-compiling safe form.
 >>+ */
 >>+#define UB_MAXVALUE	( (1UL << (sizeof(unsigned long)*8-1)) - 1)
 >>+
 >>+
 >>+/*
 >>+ *	Resource management structures
 >>+ * Serialization issues:
 >>+ *   beancounter list management is protected via ub_hash_lock
 >>+ *   task pointers are set only for current task and only once
 >>+ *   refcount is managed atomically
 >>+ *   value and limit comparison and change are protected by per-ub spinlock
 >>+ */
 >>+
 >>+struct user_beancounter
 >>+{
 >>+	atomic_t		ub_refcount;
 >>+	spinlock_t		ub_lock;
 >>+	uid_t			ub_uid;
 >
 >
 > Why uid?  Will it be possible to club processes belonging to different
 > users to same bean counter.
 Oh, it's a misnomer. It should be ub_id: it is the ID of the user_beancounter
 and has nothing to do with the user ID.
 
 >>+	struct hlist_node	hash;
 >>+
 >>+	struct user_beancounter	*parent;
 >>+	void			*private_data;
 >>+
 >
 >
 > What are the above two fields used for?
 The first one is for hierarchical UBs;
 see beancounter_findcreate with UB_LOOKUP_SUB.
 private_data is probably not used yet :)
 
 >>+	/* resources statistics and settings */
 >>+	struct ubparm		ub_parms[UB_RESOURCES];
 >>+};
 >>+
 >
 >
 > I presume UB_RESOURCES value is going to change as different resources
 > start getting tracked.
 what's wrong with it?
 
 > I think something like configfs should be used for user interface.  It
 > automatically presents the right interfaces to user land (based on
 > kernel implementation).  And you wouldn't need any changes in glibc etc.
 1. UBC doesn't require glibc modifications.
 2. If you think a bit more about it, adding UB parameters doesn't
 require user space changes either.
 3. It is possible to add any kind of interface for UBC. But do you like the idea
 of grepping 200 (containers) x 20 (parameters) files to get current usage?
 Do you like the idea of converting numbers to strings and back without
 thinking about data types?
 
 Thanks,
 Kirill
 |  
	|  |  |  
	|  |  
	| 
		
			| Re: [RFC][PATCH 4/7] UBC: syscalls (user interface) [message #5282 is a reply to message #5234] | Thu, 17 August 2006 12:00   |  
			| 
				
				
					|  dev Messages: 1693
 Registered: September 2005
 Location: Moscow
 | Senior Member |  
 |  |  
	| Greg KH wrote:
 > On Wed, Aug 16, 2006 at 07:39:43PM +0400, Kirill Korotaev wrote:
 >
 >>--- ./include/asm-sparc/unistd.h.arsys	2006-07-10 12:39:19.000000000 +0400
 >>+++ ./include/asm-sparc/unistd.h	2006-08-10 17:08:19.000000000 +0400
 >>@@ -318,6 +318,9 @@
 >>#define __NR_unshare		299
 >>#define __NR_set_robust_list	300
 >>#define __NR_get_robust_list	301
 >>+#define __NR_getluid		302
 >>+#define __NR_setluid		303
 >>+#define __NR_setublimit		304
 >
 >
 > Hm, you seem to be ignoring this:
 >
 >
 >>#ifdef __KERNEL__
 >>/* WARNING: You MAY NOT add syscall numbers larger than 301, since
 >
 >
 > Same thing for sparc64:
 [...skipped...]
 
 Oh, will fix NR_SYSCALLS in entry.S and the comment in unistd.h. Thanks for catching this!
 
 Thanks,
 Kirill
 |  
	|  |  |  
	| 
		
			| Re: [RFC][PATCH 4/7] UBC: syscalls (user interface) [message #5284 is a reply to message #5235] | Thu, 17 August 2006 12:03   |  
			| 
				
				
					|  dev Messages: 1693
 Registered: September 2005
 Location: Moscow
 | Senior Member |  
 |  |  
	| >>Add the following system calls for UB management:
 >>  1. sys_getluid    - get current UB id
 >>  2. sys_setluid    - changes exec_ and fork_ UBs on current
 >>  3. sys_setublimit - set limits on resource consumption
 >>
 >
 >
 > Why not have another system call for getting the current limits?
 will add sys_getublimit().
 
 > But as I said in previous mail, configfs seems like a better choice for
 > user interface.  That way user has to go to one place to read/write
 > limits, see the current usage and other stats.
 Check another email about interfaces. I have arguments against it :/
 
 >>Signed-Off-By: Pavel Emelianov <xemul@sw.ru>
 >>Signed-Off-By: Kirill Korotaev <dev@sw.ru>
 >
 >
 > 	...<snip>...
 >
 >>+
 >>+/*
 >>+ *	The setbeanlimit syscall
 >>+ */
 >>+asmlinkage long sys_setublimit(uid_t uid, unsigned long resource,
 >>+		unsigned long *limits)
 >>+{
 >
 >
 >>+	ub = beancounter_findcreate(uid, NULL, 0);
 >>+	if (ub == NULL)
 >>+		goto out;
 >>+
 >>+	spin_lock_irqsave(&ub->ub_lock, flags);
 >>+	ub->ub_parms[resource].barrier = new_limits[0];
 >>+	ub->ub_parms[resource].limit = new_limits[1];
 >>+	spin_unlock_irqrestore(&ub->ub_lock, flags);
 >>+
 >
 >
 > I think there should be a check here for seeing if the new limits are
 > lower than the current usage of a resource.  If so then take appropriate
 > action.
 Any idea what exact action to add here?
 Looks like it can be added when needed, agree?
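
As one possible starting point, the setter could at least reject inconsistent arguments and report that usage already exceeds the new limit, leaving the policy decision to the caller. The sketch below is a userspace model: set_limits, the return convention and the UB_RESOURCES value are assumptions, and whether the elided part of the patch already validates its arguments is not shown in the quoted excerpt.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define UB_RESOURCES 4	/* placeholder value for this model only */

struct ubparm_model {
	unsigned long barrier;
	unsigned long limit;
	unsigned long held;
};

struct ub_model {
	struct ubparm_model ub_parms[UB_RESOURCES];
};

/*
 * Store new_limits[0] as barrier and new_limits[1] as limit.
 * Returns -EINVAL for a bad resource index or barrier > limit,
 * 1 if current usage is already above the new limit (the limits
 * are still stored), and 0 otherwise.
 */
static int set_limits(struct ub_model *ub, unsigned long resource,
		const unsigned long new_limits[2])
{
	struct ubparm_model *p;

	assert(ub != NULL && new_limits != NULL);

	if (resource >= UB_RESOURCES)
		return -EINVAL;
	if (new_limits[0] > new_limits[1])
		return -EINVAL;

	p = &ub->ub_parms[resource];
	p->barrier = new_limits[0];
	p->limit = new_limits[1];

	return p->held > p->limit ? 1 : 0;
}
```

Returning a distinct value for the over-limit case keeps the setter free of policy: reclaiming, refusing further charges, or doing nothing can be decided later without changing the interface.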
 
 Thanks,
 Kirill
 |  
	|  |  |  
	|  |  
	| 
		
			| Re: [ckrm-tech] [RFC][PATCH 4/7] UBC: syscalls (user interface) [message #5288 is a reply to message #5199] | Thu, 17 August 2006 11:09   |  
			| 
				
				
					|  Srivatsa Vaddagiri Messages: 241
 Registered: August 2006
 | Senior Member |  |  |  
	| On Wed, Aug 16, 2006 at 07:39:43PM +0400, Kirill Korotaev wrote: 
 > +/*
 > + *	The setbeanlimit syscall
 > + */
 > +asmlinkage long sys_setublimit(uid_t uid, unsigned long resource,
 > +		unsigned long *limits)
 > +{
 
 [snip]
 
 > +	spin_lock_irqsave(&ub->ub_lock, flags);
 > +	ub->ub_parms[resource].barrier = new_limits[0];
 > +	ub->ub_parms[resource].limit = new_limits[1];
 
 Would it be useful to notify the "resource" controller about this
 change in limits? For example, in the case of the CPU controller I wrote
 (http://lkml.org/lkml/2006/8/4/9), I found it useful to receive
 notification of changes to these limits, so that internal structures
 (which are kept per-task-group) can be updated.
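
A rough sketch of what such a notification could look like, as a userspace model: each resource has an optional callback that is invoked after its limits are stored. The callback type ub_limit_cb, the on_change table and the cpu_group_state structure are all hypothetical; nothing like them appears in the posted patch.

```c
#include <assert.h>
#include <stddef.h>

#define UB_RESOURCES 4	/* placeholder value for this model only */

/* Hypothetical callback a resource controller could register. */
typedef void (*ub_limit_cb)(unsigned long barrier, unsigned long limit,
		void *data);

struct ub_model {
	unsigned long barrier[UB_RESOURCES];
	unsigned long limit[UB_RESOURCES];
	ub_limit_cb on_change[UB_RESOURCES];	/* NULL if nobody listens */
	void *cb_data[UB_RESOURCES];
};

/* Store the new limits, then tell the controller for that resource. */
static void set_limits_notify(struct ub_model *ub, unsigned int resource,
		unsigned long barrier, unsigned long limit)
{
	assert(ub != NULL && resource < UB_RESOURCES);

	ub->barrier[resource] = barrier;
	ub->limit[resource] = limit;

	if (ub->on_change[resource] != NULL)
		ub->on_change[resource](barrier, limit, ub->cb_data[resource]);
}

/* Example controller-side state kept per task group. */
struct cpu_group_state {
	unsigned long cached_limit;
};

/* Example callback: refresh the controller's cached copy of the limit. */
static void cpu_limits_changed(unsigned long barrier, unsigned long limit,
		void *data)
{
	struct cpu_group_state *st = data;

	(void)barrier;	/* unused in this example */
	st->cached_limit = limit;
}
```

In a kernel implementation the callback would probably need to run after ub_lock is released, so that a controller can take its own locks without nesting under the beancounter spinlock.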
 
 
 --
 Regards,
 vatsa
 |  