| Home » Mailing lists » Devel » [PATCH] BC: resource beancounters (v2) Goto Forum:
	| 
		
			| [PATCH] BC: resource beancounters (v2) [message #5542] | Wed, 23 August 2006 10:44  |  
			| 
				
				
					|  dev Messages: 1693
 Registered: September 2005
 Location: Moscow
 | Senior Member |  
 |  |  
	| The following patch set presents base of Resource Beancounters (BC).
 BC allows to account and control consumption
 of kernel resources used by group of processes.
 
 Draft UBC description on OpenVZ wiki can be found at
 http://wiki.openvz.org/UBC_parameters
 
 The full BC patch set allows to control:
 - kernel memory. All the kernel objects allocatable
 on user demand should be accounted and limited
 for DoS protection.
 E.g. page tables, task structs, vmas etc.
 
 - virtual memory pages. BCs allow to
 limit a container to some amount of memory and
 introduces 2-level OOM killer taking into account
 container's consumption.
 pages shared between containers are correctly
 charged as fractions (tunable).
 
 - network buffers. These includes TCP/IP rcv/snd
 buffers, dgram snd buffers, unix, netlinks and
 other buffers.
 
 - minor resources accounted/limited by number:
 tasks, files, flocks, ptys, siginfo, pinned dcache
 mem, sockets, iptentries (for containers with
 virtualized networking)
 
 As the first step we want to propose for discussion
 the most complicated parts of resource management:
 kernel memory and virtual memory.
 The patch set to be sent provides core for BC and
 management of kernel memory only. Virtual memory
 management will be sent in a couple of days.
 
 The patches in these series are:
 diff-bc-kconfig.patch:
 Adds kernel/bc/Kconfig file with UBC options and
 includes it into arch Kconfigs
 
 diff-bc-core.patch:
 Contains core functionality and interfaces of BC:
 find/create beancounter, initialization,
 charge/uncharge of resource, core objects' declarations.
 
 diff-bc-task.patch:
 Contains code responsible for setting BC on task,
 it's inheriting and setting host context in interrupts.
 
 Task contains three beancounters:
 1. exec_bc  - current context. all resources are charged
 to this beancounter.
 2. fork_bc  - beancounter which is inherited by
 task's children on fork
 
 diff-bc-syscalls.patch:
 Patch adds system calls for BC management:
 1. sys_get_bcid    - get current BC id
 2. sys_set_bcid    - changes exec_ and fork_ BCs on current
 3. sys_set_bclimit - set limits for resources consumtions
 4. sys_get_bcstat  - returns limits/usages/fails for BC
 
 diff-bc-kmem-core.patch:
 Introduces BC_KMEMSIZE resource which accounts kernel
 objects allocated by task's request.
 
 Objects are accounted via struct page and slab objects.
 For the latter ones each slab contains a set of pointers
 corresponding object is charged to.
 
 Allocation charge rules:
 1. Pages - if allocation is performed with __GFP_BC flag - page
 is charged to current's exec_bc.
 2. Slabs - kmem_cache may be created with SLAB_BC flag - in this
 case each allocation is charged. Caches used by kmalloc are
 created with SLAB_BC | SLAB_BC_NOCHARGE flags. In this case
 only __GFP_BC allocations are charged.
 
 diff-bc-kmem-charge.patch:
 Adds SLAB_BC and __GFP_BC flags in appropriate places
 to cause charging/limiting of specified resources.
 
 
 Summary of changes from v1 patch set:
 
 * CONFIG_BEANCOUNTERS is 'n' by default
 * fixed Kconfig includes in arches
 * removed hierarchical beancounters to simplify first patchset
 * removed unused 'private' pointer
 * removed unused EXPORTS
 * MAXVALUE redeclared as LONG_MAX
 * beancounter_findcreate clarification
 * renamed UBC -> BC, ub -> bc etc.
 * moved BC inheritance into copy_process
 * introduced reset_exec_bc() with proposed BUG_ON
 * removed task_bc beancounter (not used yet, for numproc)
 * fixed syscalls for sparc
 * added sys_get_bcstat(): return info that was in /proc
 * cond_syscall instead of #ifdefs
 
 Thanks to Oleg Nesterov, Alan Cox, Matt Helsley and
 others for patch review and comments.
 
 Patch set is applicable to 2.6.18-rc4-mm2
 
 Thanks,
 Kirill
 |  
	|  |  |  
	| 
		
			| [PATCH 1/6] BC: kconfig [message #5543 is a reply to message #5542] | Wed, 23 August 2006 10:59   |  
			| 
				
				
					|  dev Messages: 1693
 Registered: September 2005
 Location: Moscow
 | Senior Member |  
 |  |  
	| Add kernel/bc/Kconfig file with BC options and include it into arch Kconfigs
 
 Signed-off-by: Pavel Emelianov <xemul@sw.ru>
 Signed-off-by: Kirill Korotaev <dev@sw.ru>
 
 ---
 
 arch/i386/Kconfig    |    2 ++
 arch/ia64/Kconfig    |    2 ++
 arch/powerpc/Kconfig |    2 ++
 arch/ppc/Kconfig     |    2 ++
 arch/sparc/Kconfig   |    2 ++
 arch/sparc64/Kconfig |    2 ++
 arch/x86_64/Kconfig  |    2 ++
 kernel/bc/Kconfig    |   25 +++++++++++++++++++++++++
 8 files changed, 39 insertions(+)
 
 --- ./arch/i386/Kconfig.bckm	2006-07-10 12:39:10.000000000 +0400
 +++ ./arch/i386/Kconfig	2006-07-28 14:10:41.000000000 +0400
 @@ -1146,6 +1146,8 @@ source "crypto/Kconfig"
 
 source "lib/Kconfig"
 
 +source "kernel/bc/Kconfig"
 +
 #
 # Use the generic interrupt handling code in kernel/irq/:
 #
 --- ./arch/ia64/Kconfig.bckm	2006-07-10 12:39:10.000000000 +0400
 +++ ./arch/ia64/Kconfig	2006-07-28 14:10:56.000000000 +0400
 @@ -481,6 +481,8 @@ source "fs/Kconfig"
 
 source "lib/Kconfig"
 
 +source "kernel/bc/Kconfig"
 +
 #
 # Use the generic interrupt handling code in kernel/irq/:
 #Add kernel/bc/Kconfig file with BC options and
 include it into arch Kconfigs
 
 Signed-off-by: Pavel Emelianov <xemul@sw.ru>
 Signed-off-by: Kirill Korotaev <dev@sw.ru>
 
 ---
 
 arch/i386/Kconfig    |    2 ++
 arch/ia64/Kconfig    |    2 ++
 arch/powerpc/Kconfig |    2 ++
 arch/ppc/Kconfig     |    2 ++
 arch/sparc/Kconfig   |    2 ++
 arch/sparc64/Kconfig |    2 ++
 arch/x86_64/Kconfig  |    2 ++
 kernel/bc/Kconfig    |   25 +++++++++++++++++++++++++
 8 files changed, 39 insertions(+)
 
 --- ./arch/i386/Kconfig.bckm	2006-07-10 12:39:10.000000000 +0400
 +++ ./arch/i386/Kconfig	2006-07-28 14:10:41.000000000 +0400
 @@ -1146,6 +1146,8 @@ source "crypto/Kconfig"
 
 source "lib/Kconfig"
 
 +source "kernel/bc/Kconfig"
 +
 #
 # Use the generic interrupt handling code in kernel/irq/:
 #
 --- ./arch/ia64/Kconfig.bckm	2006-07-10 12:39:10.000000000 +0400
 +++ ./arch/ia64/Kconfig	2006-07-28 14:10:56.000000000 +0400
 @@ -481,6 +481,8 @@ source "fs/Kconfig"
 
 source "lib/Kconfig"
 
 +source "kernel/bc/Kconfig"
 +
 #
 # Use the generic interrupt handling code in kernel/irq/:
 #
 --- ./arch/powerpc/Kconfig.arkcfg	2006-08-07 14:07:12.000000000 +0400
 +++ ./arch/powerpc/Kconfig	2006-08-10 17:55:58.000000000 +0400
 @@ -1038,6 +1038,8 @@ source "arch/powerpc/platforms/iseries/K
 
 source "lib/Kconfig"
 
 +source "kernel/bc/Kconfig"
 +
 menu "Instrumentation Support"
 depends on EXPERIMENTAL
 
 --- ./arch/ppc/Kconfig.arkcfg	2006-07-10 12:39:10.000000000 +0400
 +++ ./arch/ppc/Kconfig	2006-08-10 17:56:13.000000000 +0400
 @@ -1414,6 +1414,8 @@ endmenu
 
 source "lib/Kconfig"
 
 +source "kernel/bc/Kconfig"
 +
 source "arch/powerpc/oprofile/Kconfig"
 
 source "arch/ppc/Kconfig.debug"
 --- ./arch/sparc/Kconfig.arkcfg	2006-04-21 11:59:32.000000000 +0400
 +++ ./arch/sparc/Kconfig	2006-08-10 17:56:24.000000000 +0400
 @@ -296,3 +296,5 @@ source "security/Kconfig"
 source "crypto/Kconfig"
 
 source "lib/Kconfig"
 +
 +source "kernel/bc/Kconfig"
 --- ./arch/sparc64/Kconfig.arkcfg	2006-07-17 17:01:11.000000000 +0400
 +++ ./arch/sparc64/Kconfig	2006-08-10 17:56:36.000000000 +0400
 @@ -432,3 +432,5 @@ source "security/Kconfig"
 source "crypto/Kconfig"
 
 source "lib/Kconfig"
 +
 +source "kernel/bc/Kconfig"
 --- ./arch/x86_64/Kconfig.bckm	2006-07-10 12:39:11.000000000 +0400
 +++ ./arch/x86_64/Kconfig	2006-07-28 14:10:49.000000000 +0400
 @@ -655,3 +655,5 @@ source "security/Kconfig"
 source "crypto/Kconfig"
 
 source "lib/Kconfig"
 +
 +source "kernel/bc/Kconfig"
 --- ./kernel/bc/Kconfig.bckm	2006-07-28 13:07:38.000000000 +0400
 +++ ./kernel/bc/Kconfig	2006-07-28 13:09:51.000000000 +0400
 @@ -0,0 +1,25 @@
 +#
 +# Resource beancounters (BC)
 +#
 +# Copyright (C) 2006 OpenVZ. SWsoft Inc
 +
 +menu "User resources"
 +
 +config BEANCOUNTERS
 +	bool "Enable resource accounting/control"
 +	default n
 +	help
 +          This patch provides accounting and allows to configure
 +          limits for user's consumption of exhaustible system resources.
 +          The most important resource controlled by this patch is unswappable
 +          memory (either mlock'ed or used by internal kernel structures and
 +          buffers). The main goal of this patch is to protect processes
 +          from running short of important resources because of an accidental
 +          misbehavior of processes or malicious activity aiming to ``kill''
 +          the system. It's worth to mention that resource limits configured
 +          by setrlimit(2) do not give an acceptable level of protection
 +          because they cover only small fraction of resources and work on a
 +          per-process basis.  Per-process accounting doesn't prevent malicious
 +          users from spawning a lot of resource-consuming processes.
 +
 +endmenu
 
 --- ./arch/powerpc/Kconfig.arkcfg	2006-08-07 14:07:12.000000000 +0400
 +++ ./arch/powerpc/Kconfig	2006-08-10 17:55:58.000000000 +0400
 @@ -1038,6 +1038,8 @@ source "arch/powerpc/platforms/iseries/K
 
 source "lib/Kconfig"
 
 +source "kernel/bc/Kconfig"
 +
 menu "Instrumentation Support"
 depends on EXPERIMENTAL
 
 --- ./arch/ppc/Kconfig.arkcfg	2006-07-10 12:39:10.000000000 +0400
 +++ ./arch/ppc/Kconfig	2006-08-10 17:56:13.000000000 +0400
 @@ -1414,6 +1414,8 @@ endmenu
 
 source "lib/Kconfig"
 
 +source "kernel/bc/Kconfig"
 +
 source "arch/powerpc/oprofile/Kconfig"
 
 source "arch/ppc/Kconfig.debug"
 --- ./arch/sparc/Kconfig.arkcfg	2006-04-21 11:59:32.000000000 +0400
 +++ ./arch/sparc/Kconfig	2006-08-10 17:56:24.000000000 +0400
 @@ -296,3 +296,5 @@ source "security/Kconfig"
 source "crypto/Kconfig"
 
 source "lib/Kconfig"
 +
 +source "kernel/bc/Kconfig"
 --- ./arch/sparc64/Kconfig.arkcfg	2006-07-17 17:01:11.000000000 +0400
 +++ ./arch/sparc64/Kconfig	2006-08-10 17:56:36.000000000 +0400
 @@ -432,3 +432,5 @@ source "security/Kconfig"
 source "crypto/Kconfig"
 
 source "lib/Kconfig"
 +
 +source "kernel/bc/Kconfig"
 --- ./arch/x86_64/Kconfig.bckm	2006-07-10 12:39:11.000000000 +0400
 +++ ./arch/x86_64/Kconfig	2006-07-28 14:10:49.000000000 +0400
 @@ -655,3 +655,5 @@ source "security/Kconfig"
 source "crypto/Kconfig"
 
 source "lib/Kconfig"
 +
 +source "kernel/bc/Kconfig"
 --- ./kernel/bc/Kconfig.bckm	2006-07-28 13:07:38.000000000 +0400
 +++ ./kernel/bc/Kconfig	2006-07-28 13:09:51.000000000 +0400
 @@ -0,0 +1,25 @@
 +#
 +# Resource beancounters (BC)
 +#
 +# Copyright (C) 2006 OpenVZ. SWsoft Inc
 +
 +menu "User resources"
 +
 +config BEANCOUNTERS
 +	bool "Enable resource accounting/control"
 +	default n
 +	help
 +          This patch provides accounting and allows to configure
 +          limits for user's consumption of exhaustible system resources.
 +          The most important resource controlled by this patch is unswappable
 +          memory (either mlock'ed or used by internal kernel structures and
 +          buffers). The main goal of this patch is to protect processes
 +          from running short of important resources because of an accidental
 +          misbehavior of processes or malicious activity aiming to ``kill''
 +          the system. It's worth to mention that resource limits configured
 +          by setrlimit(2) do not give an acceptable level of protection
 +          because they cover only small fraction of resources and work on a
 +          per-process basis.  Per-process accounting doesn't prevent malicious
 +          users from spawning a lot of resource-consuming processes.
 +
 +endmenu
 |  
	|  |  |  
	| 
		
			| [PATCH 2/6] BC: beancounters core (API) [message #5544 is a reply to message #5542] | Wed, 23 August 2006 11:00   |  
			| 
				
				
					|  dev Messages: 1693
 Registered: September 2005
 Location: Moscow
 | Senior Member |  
 |  |  
	| Core functionality and interfaces of BC: find/create beancounter, initialization,
 charge/uncharge of resource, core objects' declarations.
 
 Basic structures:
 bc_resource_parm - resource description
 beancounter      - set of resources, id, lock
 
 
 Signed-off-by: Pavel Emelianov <xemul@sw.ru>
 Signed-off-by: Kirill Korotaev <dev@sw.ru>
 
 ---
 
 include/bc/beancounter.h |  140 ++++++++++++++++++++++
 init/main.c              |    4
 kernel/Makefile          |    1
 kernel/bc/Makefile       |    7 +
 kernel/bc/beancounter.c  |  292 +++++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 444 insertions(+)
 
 --- /dev/null	2006-07-18 14:52:43.075228448 +0400
 +++ ./include/bc/beancounter.h	2006-08-21 13:14:01.000000000 +0400
 @@ -0,0 +1,140 @@
 +/*
 + *  include/bc/beancounter.h
 + *
 + *  Copyright (C) 2006 OpenVZ. SWsoft Inc
 + *
 + */
 +
 +#ifndef _LINUX_BEANCOUNTER_H
 +#define _LINUX_BEANCOUNTER_H
 +
 +/*
 + *	Resource list.
 + */
 +
 +#define BC_RESOURCES	0
 +
 +struct bc_resource_parm {
 +	unsigned long barrier;	/* A barrier over which resource allocations
 +				 * are failed gracefully. e.g. if the amount
 +				 * of consumed memory is over the barrier
 +				 * further sbrk() or mmap() calls fail, the
 +				 * existing processes are not killed.
 +				 */
 +	unsigned long limit;	/* hard resource limit */
 +	unsigned long held;	/* consumed resources */
 +	unsigned long maxheld;	/* maximum amount of consumed resources */
 +	unsigned long minheld;	/* minumum amount of consumed resources */
 +	unsigned long failcnt;	/* count of failed charges */
 +};
 +
 +/*
 + * Kernel internal part.
 + */
 +
 +#ifdef __KERNEL__
 +
 +#include <linux/config.h>
 +#include <linux/spinlock.h>
 +#include <linux/list.h>
 +#include <asm/atomic.h>
 +
 +#define BC_MAXVALUE	LONG_MAX
 +
 +/*
 + *	Resource management structures
 + * Serialization issues:
 + *   beancounter list management is protected via bc_hash_lock
 + *   task pointers are set only for current task and only once
 + *   refcount is managed atomically
 + *   value and limit comparison and change are protected by per-bc spinlock
 + */
 +
 +struct beancounter
 +{
 +	atomic_t		bc_refcount;
 +	spinlock_t		bc_lock;
 +	uid_t			bc_id;
 +	struct hlist_node	hash;
 +
 +	/* resources statistics and settings */
 +	struct bc_resource_parm	bc_parms[BC_RESOURCES];
 +};
 +
 +enum severity { BC_BARRIER, BC_LIMIT, BC_FORCE };
 +
 +/* Flags passed to beancounter_findcreate() */
 +#define BC_ALLOC		0x01 /* May allocate new one */
 +#define BC_ALLOC_ATOMIC		0x02 /* Allocate with GFP_ATOMIC */
 +
 +#define BC_HASH_SIZE		256
 +
 +#ifdef CONFIG_BEANCOUNTERS
 +extern struct hlist_head bc_hash[];
 +extern spinlock_t bc_hash_lock;
 +
 +/*
 + * This function tunes minheld and maxheld values for a given
 + * resource when held value changes
 + */
 +static inline void bc_adjust_held_minmax(struct beancounter *bc,
 +		int resource)
 +{
 +	if (bc->bc_parms[resource].maxheld < bc->bc_parms[resource].held)
 +		bc->bc_parms[resource].maxheld = bc->bc_parms[resource].held;
 +	if (bc->bc_parms[resource].minheld > bc->bc_parms[resource].held)
 +		bc->bc_parms[resource].minheld = bc->bc_parms[resource].held;
 +}
 +
 +void bc_print_resource_warning(struct beancounter *bc, int res,
 +		char *str, unsigned long val, unsigned long held);
 +void bc_print_id(struct beancounter *bc, char *str, int size);
 +
 +int bc_charge_locked(struct beancounter *bc,
 +		int res, unsigned long val, enum severity strict);
 +int bc_charge(struct beancounter *bc,
 +		int res, unsigned long val, enum severity strict);
 +
 +void bc_uncharge_locked(struct beancounter *bc,
 +		int res, unsigned long val);
 +void bc_uncharge(struct beancounter *bc,
 +		int res, unsigned long val);
 +
 +struct beancounter *beancounter_findcreate(uid_t id, int mask);
 +
 +static inline struct beancounter *get_beancounter(struct beancounter *bc)
 +{
 +	atomic_inc(&bc->bc_refcount);
 +	return bc;
 +}
 +
 +void __put_beancounter(struct beancounter *bc);
 +static inline void put_beancounter(struct beancounter *bc)
 +{
 +	__put_beancounter(bc);
 +}
 +
 +void bc_init_early(void);
 +void bc_init_late(void);
 +void bc_init_proc(void);
 +
 +extern struct beancounter init_bc;
 +extern const char *bc_rnames[];
 +
 +#else /* CONFIG_BEANCOUNTERS */
 +
 +#define beancounter_findcreate(id, f)			(NULL)
 +#define get_beancounter(bc)				(NULL)
 +#define put_beancounter(bc)				do { } while (0)
 +#define bc_charge_locked(bc, r, v, s)			(0)
 +#define bc_charge(bc, r, v)				(0)
 +#define bc_uncharge_locked(bc, r, v)			do { } while (0)
 +#define bc_uncharge(bc, r, v)				do { } while (0)
 +#define bc_init_early()					do { } while (0)
 +#define bc_init_late()					do { } while (0)
 +#define bc_init_proc()					do { } while (0)
 +
 +#endif /* CONFIG_BEANCOUNTERS */
 +#endif /* __KERNEL__ */
 +
 +#endif /* _LINUX_BEANCOUNTER_H */
 --- ./init/main.c.ve1	2006-08-21 12:25:13.000000000 +0400
 +++ ./init/main.c	2006-08-21 12:45:32.000000000 +0400
 @@ -52,6 +52,8 @@
 #include <linux/debug_locks.h>
 #include <linux/lockdep.h>
 
 +#include <bc/beancounter.h>
 +
 #include <asm/io.h>
 #include <asm/bugs.h>
 #include <asm/setup.h>
 @@ -494,6 +496,7 @@ asmlinkage void __init start_kernel(void
 early_boot_irqs_off();
 early_init_irq_lock_class();
 
 +	bc_init_early();
 /*
 * Interrupts are still disabled. Do necessary setups, then
 * enable them
 @@ -587,6 +590,7 @@ asmlinkage void __init start_kernel(void
 #endif
 fork_init(num_physpages);
 proc_caches_init();
 +	bc_init_late();
 buffer_init();
 unnamed_dev_init();
 key_init();
 --- ./kernel/Makefile.ve1	2006-08-21 12:25:14.000000000 +0400
 +++ ./kernel/Makefile	2006-08-21 12:25:27.000000000 +0400
 @@ -12,6 +12,7 @@ obj-y     = sched.o fork.o exec_domain.o
 
 obj-$(CONFIG_STACKTRACE) += stacktrace.o
 obj-y += time/
 +obj-y += bc/
 obj-$(CONFIG_DEBUG_MUTEXES) += mutex-debug.o
 obj-$(CONFIG_LOCKDEP) += lockdep.o
 ifeq ($(CONFIG_PROC_FS),y)
 --- /dev/null	2006-07-18 14:52:43.075228448 +0400
 +++ ./kernel/bc/Makefile	2006-08-21 12:25:27.000000000 +0400
 @@ -0,0 +1,7 @@
 +#
 +# Beancounters (BC)
 +#
 +# Copyright (C) 2006 OpenVZ. SWsoft Inc
 +#
 +
 +obj-$(CONFIG_BEANCOUNTERS) += beancounter.o
 --- /dev/null	2006-07-18 14:52:43.075228448 +0400
 +++ ./kernel/bc/beancounter.c	2006-08-21 13:13:11.000000000 +0400
 @@ -0,0 +1,292 @@
 +/*
 + *  kernel/bc/beancounter.c
 + *
 + *  Copyright (C) 2006 OpenVZ. SWsoft Inc
 + *  Original code by (C) 1998      Alan Cox
 + *                       1998-2000 Andrey Savochkin <saw@saw.sw.com.sg>
 + */
 +
 +#include <linux/slab.h>
 +#include <linux/module.h>
 +
 +#include <bc/beancounter.h>
 +
 +static kmem_cache_t *bc_cachep;
 +static struct beancounter default_beancounter;
 +
 +static void init_beancounter_struct(struct beancounter *bc, uid_t id);
 +
 +struct beancounter init_bc;
 +
 +const char *bc_rnames[] = {
 +};
 +
 +#define bc_hash_fun(x) ((((x) >> 8) ^ (x)) & (BC_HASH_SIZE - 1))
 +
 +struct hlist_head bc_hash[BC_HASH_SIZE];
 +spinlock_t bc_hash_lock;
 +
 +EXPORT_SYMBOL(bc_hash);
 +EXPORT_SYMBOL(bc_hash_lock);
 +
 +/*
 + *	Per resource beancounting. Resources are tied to their luid.
 + *	The resource structure itself is tagged both to the process and
 + *	the charging resources (a socket doesn't want to have to search for
 + *	things at irq time for example). Reference counters keep things in
 + *	hand.
 + *
 + *	The case where a user creates resource, kills all his processes and
 + *	then starts new ones is correctly handled this way. The refcounters
 + *	will mean the old entry is still around with resource tied to it.
 + */
 +
 +struct beancounter *beancounter_findcreate(uid_t uid, int mask)
 +{
 +	struct beancounter *new_bc, *bc;
 +	unsigned long flags;
 +	struct hlist_head *slot;
 +	struct hlist_node *pos;
 +
 +	slot = &bc_hash[bc_hash_fun(uid)];
 +	new_bc = NULL;
 +
 +retry:
 +	spin_lock_irqsave(&bc_hash_lock, flags);
 +	hlist_for_each_entry (bc, pos, slot, hash)
 +		if (bc->bc_id == uid)
 +			break;
 +
 +	if (pos != NULL) {
 +		get_beancounter(bc);
 +		spin_unlock_irqrestore(&bc_hash_lock, flags);
 +
 +		if (new_bc != NULL)
 +			kmem_cache_free(bc_cachep, new_bc);
 +		return bc;
 +	}
 +
 +	if (!(mask & BC_ALLOC))
 +		goto out_unlock;
 +
 +	if (new_bc != NULL)
 +		goto out_install;
 +
 +	spin_unlock_irqrestore(&bc_hash_lock, flags);
 +
 +	new_bc = kmem_cache_alloc(bc_cachep,
 +			mask & BC_ALLOC_ATOMIC ? GFP_ATOMIC : GFP_KERNEL);
 +	if (new_bc == NULL)
 +		goto out;
 +
 +	memcpy(new_bc, &default_beancounter, sizeof(*new_bc));
 +	init_beancounter_struct(new_bc, uid);
 +	goto retry;
 +
 +out_install:
 +	hlist_add_head(&new_bc->hash, slot);
 +out_unlock:
 +	spin_unlock_irqrestore(&bc_hash_lock, flags);
 +out:
 +	return new_bc;
 +}
 +
 +void bc_print_id(struct beancounter *bc, char *str, int size)
 +{
 +	snprintf(str, size, "%u", bc->bc_id);
 +}
 +
 +void bc_print_resource_warning(struct beancounter *bc, int res,
 +		char *str, unsigned long val, unsigned long held)
 +{
 +	char uid[64];
 +
 +	bc_print_id(bc, uid, sizeof(uid));
 +	printk(KERN_WARNING "BC %s %s warning: %s "
 +			"(held %lu, fails %lu, val %lu)\n",
 +			uid, bc_rnames[res], str,
 +			(res < BC_RESOURCES ? bc->bc_parms[res].held : held),
 +			(res < BC_RESOURCES ? bc->bc_parms[res].failcnt : 0),
 +			val);
 +}
 +
 +static inline void verify_held(struct beancounter *bc)
 +{
 +	int i;
 +
 +	for (i = 0; i < BC_RESOURCES; i++)
 +		if (bc->bc_parms[i].held != 0)
 +			bc_print_resource_warning(bc, i,
 +					"resource is held on put", 0, 0);
 +}
 +
 +void __put_beancounter(struct beancounter *bc)
 +{
 +	unsigned long flags;
 +
 +	/* equivalent to atomic_dec_and_lock_irqsave() */
 +	local_irq_save(flags);
 +	if (likely(!atomic_dec_and_lock(&bc->bc_refcount, &bc_hash_lock))) {
 +		local_irq_restore(flags);
 +		if (unlikely(atomic_read(&bc->bc_refcount) < 0))
 +			printk(KERN_ERR "BC: Bad refcount: bc=%p, "
 +					"luid=%d, ref=%d\n",
 +					bc, bc->bc_id,
 +					atomic_read(&bc->bc_refcount));
 +		return;
 +	}
 +
 +	BUG_ON(bc == &ini
...
 
 
 |  
	|  |  |  
	| 
		
			| [PATCH 3/6] BC: context inheriting and changing [message #5545 is a reply to message #5542] | Wed, 23 August 2006 11:02   |  
			| 
				
				
					|  dev Messages: 1693
 Registered: September 2005
 Location: Moscow
 | Senior Member |  
 |  |  
	| Contains code responsible for setting BC on task, it's inheriting and setting host context in interrupts.
 
 Task references 2 beancounters:
 1. exec_bc: current context. all resources are
 charged to this beancounter.
 3. fork_bc: beancounter which is inherited by
 task's children on fork
 
 Signed-off-by: Pavel Emelianov <xemul@sw.ru>
 Signed-off-by: Kirill Korotaev <dev@sw.ru>
 
 ---
 
 include/linux/sched.h   |    5 ++++
 include/bc/task.h       |   52 ++++++++++++++++++++++++++++++++++++++++++++++++
 kernel/fork.c           |   17 +++++++++++++--
 kernel/irq/handle.c     |    9 ++++++++
 kernel/softirq.c        |    8 +++++++
 kernel/bc/Makefile      |    1
 kernel/bc/beancounter.c |    3 ++
 kernel/bc/misc.c        |   32 +++++++++++++++++++++++++++++
 8 files changed, 125 insertions(+), 2 deletions(-)
 
 --- ./include/linux/sched.h.ve2	2006-08-21 13:15:39.000000000 +0400
 +++ ./include/linux/sched.h	2006-08-21 13:26:03.000000000 +0400
 @@ -83,6 +83,8 @@ struct sched_param {
 #include <linux/timer.h>
 #include <linux/hrtimer.h>
 
 +#include <bc/task.h>
 +
 #include <asm/processor.h>
 
 struct exec_domain;
 @@ -1032,6 +1034,9 @@ struct task_struct {
 spinlock_t delays_lock;
 struct task_delay_info *delays;
 #endif
 +#ifdef CONFIG_BEANCOUNTERS
 +	struct task_beancounter	task_bc;
 +#endif
 };
 
 static inline pid_t process_group(struct task_struct *tsk)
 --- ./include/bc/task.h.ve2	2006-08-21 13:26:03.000000000 +0400
 +++ ./include/bc/task.h	2006-08-21 13:35:04.000000000 +0400
 @@ -0,0 +1,52 @@
 +/*
 + *  include/bc/task.h
 + *
 + *  Copyright (C) 2006 OpenVZ. SWsoft Inc
 + *
 + */
 +
 +#ifndef __BC_TASK_H_
 +#define __BC_TASK_H_
 +
 +#include <linux/config.h>
 +
 +struct beancounter;
 +
 +struct task_beancounter {
 +	struct beancounter *exec_bc;
 +	struct beancounter *fork_bc;
 +};
 +
 +#ifdef CONFIG_BEANCOUNTERS
 +
 +#define get_exec_bc()		(current->task_bc.exec_bc)
 +#define set_exec_bc(new)			\
 +	({					\
 +		 struct beancounter *old;	\
 +		 struct task_beancounter *tbc;	\
 +		 tbc = ¤t->task_bc;	\
 +		 old = tbc->exec_bc;		\
 +		 tbc->exec_bc = new;		\
 +		 old;				\
 +	 })
 +#define reset_exec_bc(old, exp)			\
 +	do {					\
 +		struct task_beancounter *tbc;	\
 +		tbc = ¤t->task_bc;	\
 +		BUG_ON(tbc->exec_bc != exp);	\
 +		tbc->exec_bc = old;		\
 +	} while (0)
 +
 +int bc_task_charge(struct task_struct *parent, struct task_struct *new);
 +void bc_task_uncharge(struct task_struct *tsk);
 +
 +#else
 +
 +#define get_exec_bc()		(NULL)
 +#define set_exec_bc(new)	(NULL)
 +#define reset_exec_bc(new, exp)	do { } while (0)
 +#define bc_task_charge(p, t)	(0)
 +#define bc_task_uncharge(p)	do { } while (0)
 +
 +#endif
 +#endif
 --- ./kernel/fork.c.ve2	2006-08-21 13:15:39.000000000 +0400
 +++ ./kernel/fork.c	2006-08-21 13:40:16.000000000 +0400
 @@ -48,6 +48,8 @@
 #include <linux/taskstats_kern.h>
 #include <linux/random.h>
 
 +#include <bc/task.h>
 +
 #include <asm/pgtable.h>
 #include <asm/pgalloc.h>
 #include <asm/uaccess.h>
 @@ -102,12 +104,18 @@ kmem_cache_t *vm_area_cachep;
 /* SLAB cache for mm_struct structures (tsk->mm) */
 static kmem_cache_t *mm_cachep;
 
 -void free_task(struct task_struct *tsk)
 +static void __free_task(struct task_struct *tsk)
 {
 free_thread_info(tsk->thread_info);
 rt_mutex_debug_task_free(tsk);
 free_task_struct(tsk);
 }
 +
 +void free_task(struct task_struct *tsk)
 +{
 +	bc_task_uncharge(tsk);
 +	__free_task(tsk);
 +}
 EXPORT_SYMBOL(free_task);
 
 void __put_task_struct(struct task_struct *tsk)
 @@ -978,6 +986,9 @@ static struct task_struct *copy_process(
 if (!p)
 goto fork_out;
 
 +	if (bc_task_charge(current, p))
 +		goto bad_charge;
 +
 #ifdef CONFIG_TRACE_IRQFLAGS
 DEBUG_LOCKS_WARN_ON(!p->hardirqs_enabled);
 DEBUG_LOCKS_WARN_ON(!p->softirqs_enabled);
 @@ -1291,7 +1302,9 @@ bad_fork_cleanup_count:
 atomic_dec(&p->user->processes);
 free_uid(p->user);
 bad_fork_free:
 -	free_task(p);
 +	bc_task_uncharge(p);
 +bad_charge:
 +	__free_task(p);
 fork_out:
 return ERR_PTR(retval);
 }
 --- ./kernel/irq/handle.c.ve2	2006-07-10 12:39:20.000000000 +0400
 +++ ./kernel/irq/handle.c	2006-08-21 13:42:13.000000000 +0400
 @@ -16,6 +16,9 @@
 #include <linux/interrupt.h>
 #include <linux/kernel_stat.h>
 
 +#include <bc/beancounter.h>
 +#include <bc/task.h>
 +
 #include "internals.h"
 
 /**
 @@ -166,6 +169,9 @@ fastcall unsigned int __do_IRQ(unsigned
 struct irq_desc *desc = irq_desc + irq;
 struct irqaction *action;
 unsigned int status;
 +	struct beancounter *bc;
 +
 +	bc = set_exec_bc(&init_bc);
 
 kstat_this_cpu.irqs[irq]++;
 if (CHECK_IRQ_PER_CPU(desc->status)) {
 @@ -178,6 +184,8 @@ fastcall unsigned int __do_IRQ(unsigned
 desc->chip->ack(irq);
 action_ret = handle_IRQ_event(irq, regs, desc->action);
 desc->chip->end(irq);
 +
 +		reset_exec_bc(bc, &init_bc);
 return 1;
 }
 
 @@ -246,6 +254,7 @@ out:
 desc->chip->end(irq);
 spin_unlock(&desc->lock);
 
 +	reset_exec_bc(bc, &init_bc);
 return 1;
 }
 
 --- ./kernel/softirq.c.ve2	2006-08-21 13:15:39.000000000 +0400
 +++ ./kernel/softirq.c	2006-08-21 13:41:04.000000000 +0400
 @@ -18,6 +18,9 @@
 #include <linux/rcupdate.h>
 #include <linux/smp.h>
 
 +#include <bc/beancounter.h>
 +#include <bc/task.h>
 +
 #include <asm/irq.h>
 /*
 - No shared variables, all the data are CPU local.
 @@ -209,6 +212,9 @@ asmlinkage void __do_softirq(void)
 __u32 pending;
 int max_restart = MAX_SOFTIRQ_RESTART;
 int cpu;
 +	struct beancounter *bc;
 +
 +	bc = set_exec_bc(&init_bc);
 
 pending = local_softirq_pending();
 account_system_vtime(current);
 @@ -247,6 +253,8 @@ restart:
 
 account_system_vtime(current);
 _local_bh_enable();
 +
 +	reset_exec_bc(bc, &init_bc);
 }
 
 #ifndef __ARCH_HAS_DO_SOFTIRQ
 --- ./kernel/bc/Makefile.ve2	2006-08-21 13:26:03.000000000 +0400
 +++ ./kernel/bc/Makefile	2006-08-21 13:26:03.000000000 +0400
 @@ -5,3 +5,4 @@
 #
 
 obj-$(CONFIG_BEANCOUNTERS) += beancounter.o
 +obj-$(CONFIG_BEANCOUNTERS) += misc.o
 --- ./kernel/bc/beancounter.c.ve2	2006-08-21 13:13:11.000000000 +0400
 +++ ./kernel/bc/beancounter.c	2006-08-21 13:13:11.000000000 +0400
 @@ -280,6 +280,9 @@
 spin_lock_init(&bc_hash_lock);
 slot = &bc_hash[bc_hash_fun(bc->bc_id)];
 hlist_add_head(&bc->hash, slot);
 +
 +	current->task_bc.exec_bc = get_beancounter(bc);
 +	current->task_bc.fork_bc = get_beancounter(bc);
 }
 
 void __init bc_init_late(void)
 --- ./kernel/bc/misc.c.ve2	2006-08-21 13:26:03.000000000 +0400
 +++ ./kernel/bc/misc.c	2006-08-21 13:39:07.000000000 +0400
 @@ -0,0 +1,32 @@
 +/*
 + * kernel/bc/misc.c
 + *
 + * Copyright (C) 2006 OpenVZ. SWsoft Inc.
 + *
 + */
 +
 +#include <linux/sched.h>
 +
 +#include <bc/beancounter.h>
 +#include <bc/task.h>
 +
 +int bc_task_charge(struct task_struct *parent, struct task_struct *new)
 +{
 +	struct task_beancounter *old_bc;
 +	struct task_beancounter *new_bc;
 +	struct beancounter *bc;
 +
 +	old_bc = &parent->task_bc;
 +	new_bc = &new->task_bc;
 +
 +	bc = old_bc->fork_bc;
 +	new_bc->exec_bc = get_beancounter(bc);
 +	new_bc->fork_bc = get_beancounter(bc);
 +	return 0;
 +}
 +
 +void bc_task_uncharge(struct task_struct *tsk)
 +{
 +	put_beancounter(tsk->task_bc.exec_bc);
 +	put_beancounter(tsk->task_bc.fork_bc);
 +}
 |  
	|  |  |  
	| 
		
			| [PATCH 4/6] BC: user interface (syscalls) [message #5546 is a reply to message #5542] | Wed, 23 August 2006 11:03   |  
			| 
				
				
					|  dev Messages: 1693
 Registered: September 2005
 Location: Moscow
 | Senior Member |  
 |  |  
	| Add the following system calls for BC management: 1. sys_get_bcid     - get current BC id
 2. sys_set_bcid     - change exec_ and fork_ BCs on current
 3. sys_set_bclimit  - set limits for resources consumtions
 4. sys_get_bcstat   - return br_resource_parm on resource
 
 Signed-off-by: Pavel Emelianov <xemul@sw.ru>
 Signed-off-by: Kirill Korotaev <xemul@sw.ru>
 ---
 
 arch/i386/kernel/syscall_table.S |    4 +
 arch/ia64/kernel/entry.S         |    4 +
 arch/sparc/kernel/entry.S        |    2
 arch/sparc/kernel/systbls.S      |    6 +
 arch/sparc64/kernel/entry.S      |    2
 arch/sparc64/kernel/systbls.S    |   10 ++-
 include/asm-i386/unistd.h        |    6 +
 include/asm-ia64/unistd.h        |    6 +
 include/asm-powerpc/systbl.h     |    4 +
 include/asm-powerpc/unistd.h     |    6 +
 include/asm-sparc/unistd.h       |    4 +
 include/asm-sparc64/unistd.h     |    4 +
 include/asm-x86_64/unistd.h      |   10 ++-
 kernel/sys_ni.c                  |    6 +
 kernel/bc/Makefile               |    1
 kernel/bc/sys.c                  |  120 +++++++++++++++++++++++++++++++++++++++
 16 files changed, 186 insertions(+), 9 deletions(-)
 --- ./arch/i386/kernel/syscall_table.S.ve3	2006-08-21 13:15:37.000000000 +0400
 +++ ./arch/i386/kernel/syscall_table.S	2006-08-21 14:15:47.000000000 +0400
 @@ -318,3 +318,7 @@ ENTRY(sys_call_table)
 .long sys_vmsplice
 .long sys_move_pages
 .long sys_getcpu
 +	.long sys_get_bcid
 +	.long sys_set_bcid		/* 320 */
 +	.long sys_set_bclimit
 +	.long sys_get_bcstat
 --- ./arch/ia64/kernel/entry.S.ve3	2006-08-21 13:15:37.000000000 +0400
 +++ ./arch/ia64/kernel/entry.S	2006-08-21 14:17:07.000000000 +0400
 @@ -1610,5 +1610,9 @@ sys_call_table:
 data8 sys_sync_file_range		// 1300
 data8 sys_tee
 data8 sys_vmsplice
 +	data8 sys_get_bcid
 +	data8 sys_set_bcid
 +	data8 sys_set_bclimit			// 1305
 +	data8 sys_get_bcstat
 
 .org sys_call_table + 8*NR_syscalls	// guard against failures to increase NR_syscalls
 --- ./arch/sparc/kernel/entry.S.ve3	2006-07-10 12:39:10.000000000 +0400
 +++ ./arch/sparc/kernel/entry.S	2006-08-21 14:29:44.000000000 +0400
 @@ -37,7 +37,7 @@
 
 #define curptr      g6
 
 -#define NR_SYSCALLS 300      /* Each OS is different... */
 +#define NR_SYSCALLS 304      /* Each OS is different... */
 
 /* These are just handy. */
 #define _SV	save	%sp, -STACKFRAME_SZ, %sp
 --- ./arch/sparc/kernel/systbls.S.ve3	2006-07-10 12:39:10.000000000 +0400
 +++ ./arch/sparc/kernel/systbls.S	2006-08-21 14:30:43.000000000 +0400
 @@ -78,7 +78,8 @@ sys_call_table:
 /*285*/	.long sys_mkdirat, sys_mknodat, sys_fchownat, sys_futimesat, sys_fstatat64
 /*290*/	.long sys_unlinkat, sys_renameat, sys_linkat, sys_symlinkat, sys_readlinkat
 /*295*/	.long sys_fchmodat, sys_faccessat, sys_pselect6, sys_ppoll, sys_unshare
 -/*300*/	.long sys_set_robust_list, sys_get_robust_list
 +/*300*/	.long sys_set_robust_list, sys_get_robust_list, sys_get_bcid, sys_set_bcid, sys_set_bclimit
 +/*305*/	.long sys_get_bcstat
 
 #ifdef CONFIG_SUNOS_EMUL
 /* Now the SunOS syscall table. */
 @@ -192,4 +193,7 @@ sunos_sys_table:
 .long sunos_nosys, sunos_nosys, sunos_nosys
 .long sunos_nosys, sunos_nosys, sunos_nosys
 
 +	.long sunos_nosys, sunos_nosys, sunos_nosys,
 +	.long sunos_nosys
 +
 #endif
 --- ./arch/sparc64/kernel/entry.S.ve3	2006-07-10 12:39:10.000000000 +0400
 +++ ./arch/sparc64/kernel/entry.S	2006-08-21 14:29:56.000000000 +0400
 @@ -25,7 +25,7 @@
 
 #define curptr      g6
 
 -#define NR_SYSCALLS 300      /* Each OS is different... */
 +#define NR_SYSCALLS 304      /* Each OS is different... */
 
 .text
 .align		32
 --- ./arch/sparc64/kernel/systbls.S.ve3	2006-07-10 12:39:11.000000000 +0400
 +++ ./arch/sparc64/kernel/systbls.S	2006-08-21 14:32:26.000000000 +0400
 @@ -79,7 +79,8 @@ sys_call_table32:
 .word sys_mkdirat, sys_mknodat, sys_fchownat, compat_sys_futimesat, compat_sys_fstatat64
 /*290*/	.word sys_unlinkat, sys_renameat, sys_linkat, sys_symlinkat, sys_readlinkat
 .word sys_fchmodat, sys_faccessat, compat_sys_pselect6, compat_sys_ppoll, sys_unshare
 -/*300*/	.word compat_sys_set_robust_list, compat_sys_get_robust_list
 +/*300*/	.word compat_sys_set_robust_list, compat_sys_get_robust_list, sys_nis_syscall, sys_nis_syscall, sys_nis_syscall
 +	.word sys_nis_syscall
 
 #endif /* CONFIG_COMPAT */
 
 @@ -149,7 +150,9 @@ sys_call_table:
 .word sys_mkdirat, sys_mknodat, sys_fchownat, sys_futimesat, sys_fstatat64
 /*290*/	.word sys_unlinkat, sys_renameat, sys_linkat, sys_symlinkat, sys_readlinkat
 .word sys_fchmodat, sys_faccessat, sys_pselect6, sys_ppoll, sys_unshare
 -/*300*/	.word sys_set_robust_list, sys_get_robust_list
 +/*300*/	.word sys_set_robust_list, sys_get_robust_list, sys_get_bcid, sys_set_bcid, sys_set_bclimit
 +	.word sys_get_bcstat
 +
 
 #if defined(CONFIG_SUNOS_EMUL) || defined(CONFIG_SOLARIS_EMUL) || \
 defined(CONFIG_SOLARIS_EMUL_MODULE)
 @@ -263,4 +266,7 @@ sunos_sys_table:
 .word sunos_nosys, sunos_nosys, sunos_nosys
 .word sunos_nosys, sunos_nosys, sunos_nosys
 .word sunos_nosys, sunos_nosys, sunos_nosys
 +
 +	.word sunos_nosys, sunos_nosys, sunos_nosys
 +	.word sunos_nosys
 #endif
 --- ./include/asm-i386/unistd.h.ve3	2006-08-21 13:15:39.000000000 +0400
 +++ ./include/asm-i386/unistd.h	2006-08-21 14:22:53.000000000 +0400
 @@ -324,10 +324,14 @@
 #define __NR_vmsplice		316
 #define __NR_move_pages		317
 #define __NR_getcpu		318
 +#define __NR_get_bcid		319
 +#define __NR_set_bcid		320
 +#define __NR_set_bclimit	321
 +#define __NR_get_bcstat		322
 
 #ifdef __KERNEL__
 
 -#define NR_syscalls 318
 +#define NR_syscalls 323
 #include <linux/err.h>
 
 /*
 --- ./include/asm-ia64/unistd.h.ve3	2006-07-10 12:39:19.000000000 +0400
 +++ ./include/asm-ia64/unistd.h	2006-08-21 14:24:29.000000000 +0400
 @@ -291,11 +291,15 @@
 #define __NR_sync_file_range		1300
 #define __NR_tee			1301
 #define __NR_vmsplice			1302
 +#define __NR_get_bcid			1303
 +#define __NR_set_bcid			1304
 +#define __NR_set_bclimit		1305
 +#define __NR_get_bcstat			1306
 
 #ifdef __KERNEL__
 
 
 -#define NR_syscalls			279 /* length of syscall table */
 +#define NR_syscalls			283 /* length of syscall table */
 
 #define __ARCH_WANT_SYS_RT_SIGACTION
 
 --- ./include/asm-powerpc/systbl.h.ve3	2006-07-10 12:39:19.000000000 +0400
 +++ ./include/asm-powerpc/systbl.h	2006-08-21 14:28:46.000000000 +0400
 @@ -304,3 +304,7 @@ SYSCALL_SPU(fchmodat)
 SYSCALL_SPU(faccessat)
 COMPAT_SYS_SPU(get_robust_list)
 COMPAT_SYS_SPU(set_robust_list)
 +SYSCALL(sys_get_bcid)
 +SYSCALL(sys_set_bcid)
 +SYSCALL(sys_set_bclimit)
 +SYSCALL(sys_get_bcstat)
 --- ./include/asm-powerpc/unistd.h.ve3	2006-07-10 12:39:19.000000000 +0400
 +++ ./include/asm-powerpc/unistd.h	2006-08-21 14:28:24.000000000 +0400
 @@ -323,10 +323,14 @@
 #define __NR_faccessat		298
 #define __NR_get_robust_list	299
 #define __NR_set_robust_list	300
 +#define __NR_get_bcid		301
 +#define __NR_set_bcid		302
 +#define __NR_set_bclimit	303
 +#define __NR_get_bcstat		304
 
 #ifdef __KERNEL__
 
 -#define __NR_syscalls		301
 +#define __NR_syscalls		305
 
 #define __NR__exit __NR_exit
 #define NR_syscalls	__NR_syscalls
 --- ./include/asm-sparc/unistd.h.ve3	2006-07-10 12:39:19.000000000 +0400
 +++ ./include/asm-sparc/unistd.h	2006-08-21 14:33:20.000000000 +0400
 @@ -318,6 +318,10 @@
 #define __NR_unshare		299
 #define __NR_set_robust_list	300
 #define __NR_get_robust_list	301
 +#define __NR_get_bcid		302
 +#define __NR_set_bcid		303
 +#define __NR_set_bclimit	304
 +#define __NR_get_bcstat		305
 
 #ifdef __KERNEL__
 /* WARNING: You MAY NOT add syscall numbers larger than 301, since
 --- ./include/asm-sparc64/unistd.h.ve3	2006-07-10 12:39:19.000000000 +0400
 +++ ./include/asm-sparc64/unistd.h	2006-08-21 14:34:10.000000000 +0400
 @@ -320,6 +320,10 @@
 #define __NR_unshare		299
 #define __NR_set_robust_list	300
 #define __NR_get_robust_list	301
 +#define __NR_get_bcid		302
 +#define __NR_set_bcid		303
 +#define __NR_set_bclimit	304
 +#define __NR_get_bcstat		305
 
 #ifdef __KERNEL__
 /* WARNING: You MAY NOT add syscall numbers larger than 301, since
 --- ./include/asm-x86_64/unistd.h.ve3	2006-08-21 13:15:39.000000000 +0400
 +++ ./include/asm-x86_64/unistd.h	2006-08-21 14:35:19.000000000 +0400
 @@ -619,10 +619,18 @@ __SYSCALL(__NR_sync_file_range, sys_sync
 __SYSCALL(__NR_vmsplice, sys_vmsplice)
 #define __NR_move_pages		279
 __SYSCALL(__NR_move_pages, sys_move_pages)
 +#define __NR_get_bcid		280
 +__SYSCALL(__NR_get_bcid, sys_get_bcid)
 +#define __NR_set_bcid		281
 +__SYSCALL(__NR_set_bcid, sys_set_bcid)
 +#define __NR_set_bclimit	282
 +__SYSCALL(__NR_set_bclimit, sys_set_bclimit)
 +#define __NR_get_bcstat		283
 +__SYSCALL(__NR_get_bcstat, sys_get_bcstat)
 
 #ifdef __KERNEL__
 
 -#define __NR_syscall_max __NR_move_pages
 +#define __NR_syscall_max __NR_get_bcstat
 #include <linux/err.h>
 
 #ifndef __NO_STUBS
 --- ./kernel/sys_ni.c.ve3	2006-07-10 12:39:20.000000000 +0400
 +++ ./kernel/sys_ni.c	2006-08-21 14:12:49.000000000 +0400
 @@ -134,3 +134,9 @@ cond_syscall(sys_madvise);
 cond_syscall(sys_mremap);
 cond_syscall(sys_remap_file_pages);
 cond_syscall(compat_sys_move_pages);
 +
 +/* user resources syscalls */
 +cond_syscall(sys_set_bcid);
 +cond_syscall(sys_get_bcid);
 +cond_syscall(sys_set_bclimit);
 +cond_syscall(sys_get_bcstat);
 --- ./kernel/bc/Makefile.ve3	2006-08-21 13:49:49.000000000 +0400
 +++ ./kernel/bc/Makefile	2006-08-21 13:55:39.000000000 +0400
 @@ -6,3 +6,4 @@
 
 obj-$(CONFIG_BEANCOUNTERS) += beancounter.o
 obj-$(CONFIG_BEANCOUNTERS) += misc.o
 +obj-$(CONFIG_BEANCOUNTERS) += sys.o
 --- ./kernel/bc/sys.c.ve3	2006-08-21 13:49:49.000000000 +0400
 +++ ./kernel/bc/sys.c	2006-08-21 14:43:04.000000000 +0400
 @@ -0,0 +1,120 @@
 +/*
 + *  kernel/bc/sys.c
 + *
 + *  Copyright (C) 2006 OpenVZ. SWsoft Inc
 + *
 + */
 +
 +#include <linux/config.h>
 +#include <linux/sched.h>
 +#include <asm/uaccess.h>
 +
 +#include <bc/beancounter.h>
 +#include <bc/task.h>
 +
 +asmlinkage long sys_get_bcid(void)
 +{
 +	struct beancounter *bc;
 +
 +	bc = get_exec_bc();
 +	return bc->bc_id;
 +}
 +
 +asmlinkage long sys_set_bcid(uid_t id)
 +{
 +	int error;
 +	struct beancounter *bc;
 +	struct task_beanc
...
 
 
 |  
	|  |  |  
	| 
		
			| [PATCH 5/6] BC: kernel memory accounting (core) [message #5547 is a reply to message #5542] | Wed, 23 August 2006 11:04   |  
			| 
				
				
					|  dev Messages: 1693
 Registered: September 2005
 Location: Moscow
 | Senior Member |  
 |  |  
	| Introduce BC_KMEMSIZE resource which accounts kernel objects allocated by task's request.
 
 Reference to BC is kept on struct page or slab object.
 For slabs each struct slab contains a set of pointers
 corresponding objects are charged to.
 
 Allocation charge rules:
 1. Pages - if allocation is performed with __GFP_BC flag - page
 is charged to current's exec_bc.
 2. Slabs - kmem_cache may be created with SLAB_BC flag - in this
 case each allocation is charged. Caches used by kmalloc are
 created with SLAB_BC | SLAB_BC_NOCHARGE flags. In this case
 only __GFP_BC allocations are charged.
 
 Signed-off-by: Pavel Emelianov <xemul@sw.ru>
 Signed-off-by: Kirill Korotaev <dev@sw.ru>
 
 ---
 
 include/linux/gfp.h      |    8 ++-
 include/linux/mm.h       |    6 ++
 include/linux/slab.h     |    4 +
 include/linux/vmalloc.h  |    1
 include/bc/beancounter.h |    4 +
 include/bc/kmem.h        |   33 ++++++++++++
 kernel/bc/Makefile       |    1
 kernel/bc/beancounter.c  |    3 +
 kernel/bc/kmem.c         |   89 ++++++++++++++++++++++++++++++++++
 mm/mempool.c             |    2
 mm/page_alloc.c          |   11 ++++
 mm/slab.c                |  121 ++++++++++++++++++++++++++++++++++++++---------
 mm/vmalloc.c             |    6 ++
 13 files changed, 264 insertions(+), 25 deletions(-)
 
 --- ./include/linux/gfp.h.bckmem	2006-07-28 18:43:52.000000000 +0400
 +++ ./include/linux/gfp.h	2006-07-31 16:32:22.000000000 +0400
 @@ -46,8 +46,10 @@ struct vm_area_struct;
 #define __GFP_NOMEMALLOC ((__force gfp_t)0x10000u) /* Don't use emergency reserves */
 #define __GFP_HARDWALL   ((__force gfp_t)0x20000u) /* Enforce hardwall cpuset memory allocs */
 #define __GFP_THISNODE	((__force gfp_t)0x40000u)/* No fallback, no policies */
 +#define __GFP_BC	 ((__force gfp_t)0x80000u) /* Charge allocation with BC */
 +#define __GFP_BC_LIMIT ((__force gfp_t)0x100000u) /* Charge against BC limit */
 
 -#define __GFP_BITS_SHIFT 20	/* Room for 20 __GFP_FOO bits */
 +#define __GFP_BITS_SHIFT 21	/* Room for 21 __GFP_FOO bits */
 #define __GFP_BITS_MASK ((__force gfp_t)((1 << __GFP_BITS_SHIFT) - 1))
 
 /* if you forget to add the bitmask here kernel will crash, period */
 @@ -54,7 +56,8 @@ struct vm_area_struct;
 #define GFP_LEVEL_MASK (__GFP_WAIT|__GFP_HIGH|__GFP_IO|__GFP_FS| \
 __GFP_COLD|__GFP_NOWARN|__GFP_REPEAT| \
 __GFP_NOFAIL|__GFP_NORETRY|__GFP_NO_GROW|__GFP_COMP| \
 -			__GFP_NOMEMALLOC|__GFP_HARDWALL|__GFP_THISNODE)
 +			__GFP_NOMEMALLOC|__GFP_HARDWALL|__GFP_THISNODE| \
 +			__GFP_BC|__GFP_BC_LIMIT)
 
 /* This equals 0, but use constants in case they ever change */
 #define GFP_NOWAIT	(GFP_ATOMIC & ~__GFP_HIGH)
 @@ -63,6 +66,7 @@ struct vm_area_struct;
 #define GFP_NOIO	(__GFP_WAIT)
 #define GFP_NOFS	(__GFP_WAIT | __GFP_IO)
 #define GFP_KERNEL	(__GFP_WAIT | __GFP_IO | __GFP_FS)
 +#define GFP_KERNEL_BC	(__GFP_WAIT | __GFP_IO | __GFP_FS | __GFP_BC)
 #define GFP_USER	(__GFP_WAIT | __GFP_IO | __GFP_FS | __GFP_HARDWALL)
 #define GFP_HIGHUSER	(__GFP_WAIT | __GFP_IO | __GFP_FS | __GFP_HARDWALL | \
 __GFP_HIGHMEM)
 --- ./include/linux/mm.h.bckmem	2006-07-28 18:43:52.000000000 +0400
 +++ ./include/linux/mm.h	2006-07-31 17:50:29.000000000 +0400
 @@ -267,8 +267,14 @@ struct page {
 unsigned int gfp_mask;
 unsigned long trace[8];
 #endif
 +#ifdef CONFIG_BEANCOUNTERS
 +	union {
 +		struct beancounter	*page_bc;
 +	} bc;
 +#endif
 };
 
 +#define page_bc(page)			((page)->bc.page_bc)
 #define page_private(page)		((page)->private)
 #define set_page_private(page, v)	((page)->private = (v))
 
 --- ./include/linux/slab.h.bckmem	2006-07-10 12:39:19.000000000 +0400
 +++ ./include/linux/slab.h	2006-07-31 17:02:08.000000000 +0400
 @@ -46,6 +46,8 @@ typedef struct kmem_cache kmem_cache_t;
 #define SLAB_PANIC		0x00040000UL	/* panic if kmem_cache_create() fails */
 #define SLAB_DESTROY_BY_RCU	0x00080000UL	/* defer freeing pages to RCU */
 #define SLAB_MEM_SPREAD		0x00100000UL	/* Spread some memory over cpuset */
 +#define SLAB_BC		0x00200000UL	/* Account with BC */
 +#define SLAB_BC_NOCHARGE	0x00400000UL	/* Explicit accounting */
 
 /* flags passed to a constructor func */
 #define	SLAB_CTOR_CONSTRUCTOR	0x001UL		/* if not set, then deconstructor */
 @@ -265,6 +267,8 @@ extern kmem_cache_t	*bio_cachep;
 
 extern atomic_t slab_reclaim_pages;
 
 +struct beancounter;
 +struct beancounter **kmem_cache_bcp(kmem_cache_t *cachep, void *obj);
 #endif	/* __KERNEL__ */
 
 #endif	/* _LINUX_SLAB_H */
 --- ./include/linux/vmalloc.h.bckmem	2006-07-17 17:01:12.000000000 +0400
 +++ ./include/linux/vmalloc.h	2006-08-01 13:21:59.000000000 +0400
 @@ -36,6 +36,7 @@ struct vm_struct {
 *	Highlevel APIs for driver use
 */
 extern void *vmalloc(unsigned long size);
 +extern void *vmalloc_bc(unsigned long size);
 extern void *vmalloc_user(unsigned long size);
 extern void *vmalloc_node(unsigned long size, int node);
 extern void *vmalloc_exec(unsigned long size);
 --- ./include/bc/beancounter.h.bckmem	2006-07-28 18:43:52.000000000 +0400
 +++ ./include/bc/beancounter.h	2006-08-03 16:03:01.000000000 +0400
 @@ -14,7 +14,9 @@
 *	Resource list.
 */
 
 -#define BC_RESOURCES	0
 +#define BC_KMEMSIZE	0
 +
 +#define BC_RESOURCES	1
 
 struct resource_parm {
 /*
 --- ./include/bc/kmem.h.bckmem	2006-07-28 18:43:52.000000000 +0400
 +++ ./include/bc/kmem.h	2006-07-31 17:37:05.000000000 +0400
 @@ -0,0 +1,33 @@
 +/*
 + *  include/bc/kmem.h
 + *
 + *  Copyright (C) 2006 OpenVZ. SWsoft Inc
 + *
 + */
 +
 +#ifndef __BC_KMEM_H_
 +#define __BC_KMEM_H_
 +
 +#include <linux/config.h>
 +
 +/*
 + * BC_KMEMSIZE accounting
 + */
 +
 +struct mm_struct;
 +struct page;
 +struct beancounter;
 +
 +#ifdef CONFIG_BEANCOUNTERS
 +int  bc_page_charge(struct page *page, int order, gfp_t flags);
 +void bc_page_uncharge(struct page *page, int order);
 +
 +int bc_slab_charge(kmem_cache_t *cachep, void *obj, gfp_t flags);
 +void bc_slab_uncharge(kmem_cache_t *cachep, void *obj);
 +#else
 +#define bc_page_charge(pg, o, mask)	(0)
 +#define bc_page_uncharge(pg, o)		do { } while (0)
 +#define bc_slab_charge(cachep, o, f)	(0)
 +#define bc_slab_uncharge(cachep, o)	do { } while (0)
 +#endif
 +#endif /* __BC_SLAB_H_ */
 --- ./kernel/bc/Makefile.bcsys	2006-07-28 14:08:37.000000000 +0400
 +++ ./kernel/bc/Makefile	2006-08-01 11:08:39.000000000 +0400
 @@ -7,3 +7,4 @@
 obj-$(CONFIG_BEANCOUNTERS) += beancounter.o
 obj-$(CONFIG_BEANCOUNTERS) += misc.o
 obj-$(CONFIG_BEANCOUNTERS) += sys.o
 +obj-$(CONFIG_BEANCOUNTERS) += kmem.o
 --- ./kernel/bc/beancounter.c.bckmem	2006-07-28 18:43:52.000000000 +0400
 +++ ./kernel/bc/beancounter.c	2006-08-03 16:14:17.000000000 +0400
 @@ -19,6 +19,7 @@ static void init_beancounter_struct(stru
 struct beancounter init_bc;
 
 const char *bc_rnames[] = {
 +	"kmemsize",	/* 0 */
 };
 
 #define bc_hash_fun(x) ((((x) >> 8) ^ (x)) & (BC_HASH_SIZE - 1))
 @@ -348,6 +378,8 @@ static void init_beancounter_syslimits(s
 {
 int k;
 
 +	bc->bc_parms[BC_KMEMSIZE].limit = 32 * 1024 * 1024;
 +
 for (k = 0; k < BC_RESOURCES; k++)
 bc->bc_parms[k].barrier = bc->bc_parms[k].limit;
 }
 --- ./kernel/bc/kmem.c.bckmem	2006-07-31 16:32:22.000000000 +0400
 +++ ./kernel/bc/kmem.c	2006-07-31 17:51:27.000000000 +0400
 @@ -0,0 +1,89 @@
 +/*
 + *  kernel/bc/kmem.c
 + *
 + *  Copyright (C) 2006 OpenVZ. SWsoft Inc
 + *
 + */
 +
 +#include <linux/sched.h>
 +#include <linux/gfp.h>
 +#include <linux/slab.h>
 +#include <linux/mm.h>
 +
 +#include <bc/beancounter.h>
 +#include <bc/kmem.h>
 +#include <bc/task.h>
 +
 +/*
 + * Slab accounting
 + */
 +
 +int bc_slab_charge(kmem_cache_t *cachep, void *objp, gfp_t flags)
 +{
 +	unsigned int size;
 +	struct beancounter *bc, **slab_bcp;
 +
 +	bc = get_exec_bc();
 +	if (bc == NULL)
 +		return 0;
 +
 +	size = kmem_cache_size(cachep);
 +	if (bc_charge(bc, BC_KMEMSIZE, size,
 +			(flags & __GFP_BC_LIMIT ? BC_LIMIT : BC_BARRIER)))
 +		return -ENOMEM;
 +
 +	slab_bcp = kmem_cache_bcp(cachep, objp);
 +	*slab_bcp = get_beancounter(bc);
 +	return 0;
 +}
 +
 +void bc_slab_uncharge(kmem_cache_t *cachep, void *objp)
 +{
 +	unsigned int size;
 +	struct beancounter *bc, **slab_bcp;
 +
 +	slab_bcp = kmem_cache_bcp(cachep, objp);
 +	if (*slab_bcp == NULL)
 +		return;
 +
 +	bc = *slab_bcp;
 +	size = kmem_cache_size(cachep);
 +	bc_uncharge(bc, BC_KMEMSIZE, size);
 +	put_beancounter(bc);
 +	*slab_bcp = NULL;
 +}
 +
 +/*
 + * Pages accounting
 + */
 +
 +int bc_page_charge(struct page *page, int order, gfp_t flags)
 +{
 +	struct beancounter *bc;
 +
 +	BUG_ON(page_bc(page) != NULL);
 +
 +	bc = get_exec_bc();
 +	if (bc == NULL)
 +		return 0;
 +
 +	if (bc_charge(bc, BC_KMEMSIZE, PAGE_SIZE << order,
 +			(flags & __GFP_BC_LIMIT ? BC_LIMIT : BC_BARRIER)))
 +		return -ENOMEM;
 +
 +	page_bc(page) = get_beancounter(bc);
 +	return 0;
 +}
 +
 +void bc_page_uncharge(struct page *page, int order)
 +{
 +	struct beancounter *bc;
 +
 +	bc = page_bc(page);
 +	if (bc == NULL)
 +		return;
 +
 +	bc_uncharge(bc, BC_KMEMSIZE, PAGE_SIZE << order);
 +	put_beancounter(bc);
 +	page_bc(page) = NULL;
 +}
 --- ./mm/mempool.c.bckmem	2006-04-21 11:59:36.000000000 +0400
 +++ ./mm/mempool.c	2006-08-01 13:25:26.000000000 +0400
 @@ -119,6 +119,7 @@ int mempool_resize(mempool_t *pool, int
 unsigned long flags;
 
 BUG_ON(new_min_nr <= 0);
 +	gfp_mask &= ~__GFP_BC;
 
 spin_lock_irqsave(&pool->lock, flags);
 if (new_min_nr <= pool->min_nr) {
 @@ -212,6 +213,7 @@ void * mempool_alloc(mempool_t *pool, gf
 gfp_mask |= __GFP_NOMEMALLOC;	/* don't allocate emergency reserves */
 gfp_mask |= __GFP_NORETRY;	/* don't loop in __alloc_pages */
 gfp_mask |= __GFP_NOWARN;	/* failures are OK */
 +	gfp_mask &= ~__GFP_BC;		/* do not charge */
 
 gfp_temp = gfp_mask & ~(__GFP_WAIT|__GFP_IO);
 
 --- ./mm/page_alloc.c.bckmem	2006-07-28 18:43:52.000000000 +0400
 +++ ./mm/page_alloc.c	2006-07-31 19:52:08.000000000 +0400
 @@ -38,6 +38,8 @@
 #include <linux/mempolicy.h>
 #include <linux/stop_machine.h>
 
 +#include <bc/kmem.h>
 +
 #include <asm/tlbflush.h>
 #include <asm/div64.h>
 #include "internal.h"
 @@ -453,6 +455,8 @@ static void __free_pages_ok(struct page
 if (reserved)
 return;
 
 +	bc_page_uncharge(page, order);
 +
 kernel_map_pages(page, 1 << order, 0);
 local_irq_save(flags);
 __count_vm_events(PGFREE, 1 << order);
 @@ -724,6 +728,8 @@ static void fastcall free_hot_cold_page(
 if (free_pages_check(page))
 return;
 
 +	bc_page_uncharge(page, 0);
 +
 kernel_map_pages(page, 1, 0);
 
 pcp = &zone_pcp(zone, get_cpu())->pcp[cold];
 @@ -1056,6 +1062,11 @@ nopage:
 show_mem();
 }
 got_pg:
 +	if ((gfp_mask & __GFP_BC) &&
 +			bc_page_charge(page, order, gfp_mask)) {
 +		__free_pages(page, order);
 +		page = NULL;
 +	}
 #ifdef CONFIG_PAGE_OWNER
 if (page)
 set_page_owner(page, order, gfp_mask);
 --- ./mm/slab.c.bckmem	2006-07-17 17:01:12.000000000 +0400
 +++ ./mm/slab.c	2006-08-01 13:24:06.000000000 +0400
 @@ -109,6 +109,8 @@
 #include	<linux/mutex.h>
 #include	<linux/rtmutex.h>
 
 +#include	<bc/kmem.h>
 +
 #include	<asm/uaccess.h>
 #include	<asm/cacheflush.h>
 #include	<asm/tlbflush.h>
 @@ -176,11 +178,13 @@
 SLAB_CACHE_DMA | \
 SLAB_MUST_HWCACHE_ALIGN | SLAB_STORE_USER | \
 SLAB_RECLAIM_ACCOUNT | SLAB_PANIC | \
 +			 SLAB_BC | SLAB_BC_NOCHARGE | \
 SLAB_DESTROY_BY_RCU | SLAB_MEM_SPREAD)
 #else
 # define CREATE_MASK	(SLAB_HWCACHE_ALIGN | \
 SLAB_CACHE_DMA | SLAB_MUST_HWCACHE_ALIGN | \
 SLAB_RECLAIM_ACCOUNT | SLAB_PANIC | \
 +			 SLAB_BC | SLAB_BC_NOCHARGE | \
 SLAB_DESTROY_BY_RCU | SLAB_MEM_SPREAD)
 #endif
 
 @@ -774,9 +778,33 @@ struct kmem_cache *kmem_find_general_cac
 return __find_general_cachep(size, gfpflags);
 }
 
 -static size_t slab_mgmt_size(size_t nr_objs, size_t align)
 +static size_t slab_mgmt_size_raw(size_t nr_objs)
 {
 -	return ALIGN(sizeof(struct slab)+nr_objs*sizeof(kmem_bufctl_t), align);
 +	return sizeof(struct slab) + nr_objs * sizeof(kmem_bufctl_t);
 +}
 +
 +#ifdef CONFIG_BEANCOUNTERS
 +#define BC_EXTRASIZE	sizeof(struct beancounter *)
 +static inline size_t slab_mgmt_size_noalign(int flags, size_t nr_objs)
 +{
 +	size_t size;
 +
 +	size = slab_mgmt_size_raw(nr_objs);
 +	if (flags & SLAB_BC)
 +		size = ALIGN(size, BC_EXTRASIZE) + nr_objs * BC_EXTRASIZE;
 +	return size;
 +}
 +#else
 +#define BC_EXTRASIZE	0
 +static inline size_t slab_mgmt_size_noalign(int flags, size_t nr_objs)
 +{
 +	return slab_mgmt_size_raw(nr_objs);
 +}
 +#endif
 +
 +static inline size_t slab_mgmt_size(int flags, size_t nr_objs, size_t align)
 +{
 +	return ALIGN(slab_mgmt_size_noalign(flags, nr_objs), align);
 }
 
 /*
 @@ -821,20 +849,21 @@ static void cache_estimate(unsigned long
 * into account.
 */
 nr_objs = (slab_size - sizeof(struct slab)) /
 -			  (buffer_size + sizeof(kmem_bufctl_t));
 +			  (buffer_size + sizeof(kmem_bufctl_t) +
 +			  (flags & SLAB_BC ? BC_EXTRASIZE : 0));
 
 /*
 * This calculated number will be either the right
 * amount, or one greater than what we want.
 */
 -		if (slab_mgmt_size(nr_objs, align) + nr_objs*buffer_size
 +		if (slab_mgmt_size(flags, nr_objs, align) + nr_objs*buffer_size
 > slab_size)
 nr_objs--;
 
 if (nr_objs > SLAB_LIMIT)
 nr_objs = SLAB_LIMIT;
 
 -		mgmt_size = slab_mgmt_size(nr_objs, align);
 +		mgmt_size = slab_mgmt_size(flags, nr_objs, align);
 }
 *num = nr_objs;
 *left_over = slab_size - nr_objs*buffer_size - mgmt_size;
 @@ -1393,7 +1422,8 @@ void __init kmem_cache_init(void)
 sizes[INDEX_AC].cs_cachep = kmem_cache_create(names[INDEX_AC].name,
 sizes[INDEX_AC].cs_size,
 ARCH_KMALLOC_MINALIGN,
 -					ARCH_KMALLOC_FLAGS|SLAB_PANIC,
 +					ARCH_KMALLOC_FLAGS | SLAB_BC |
 +						SLAB_BC_NOCHARGE | SLAB_PANIC,
 NULL, NULL);
 
 if (INDEX_AC != INDEX_L3) {
 @@ -1401,7 +1431,8 @@ void __init kmem_cache_init(void)
 kmem_cache_create(names[INDEX_L3].name,
 sizes[INDEX_L3].cs_size,
 ARCH_KMALLOC_MINALIGN,
 -				ARCH_KMALLOC_FLAGS|SLAB_PANIC,
 +				ARCH_KMALLOC_FLAGS | SLAB_BC |
 +					SLAB_BC_NOCHARGE | SLAB_PANIC,
 NULL, NULL);
 }
 
 @@ -1419,7 +1450,8 @@ void __init kmem_cache_init(void)
 sizes->cs_cachep = kmem_cache_create(names->name,
 sizes->cs_size,
 ARCH_KMALLOC_MINALIGN,
 -					ARCH_KMALLOC_FLAGS|SLAB_PANIC,
 +					ARCH_KMALLOC_FLAGS | SLAB_BC |
 +						SLAB_BC_NOCHARGE | SLAB_PANIC,
 NULL, NULL);
 }
 
 @@ -1869,7 +1901,8 @@ static size_t calculate_slab_order(struc
 * looping condition in cache_grow().
 */
 offslab_limit = size - sizeof(struct slab);
 -			offslab_limit /= sizeof(kmem_bufctl_t);
 +			offslab_limit /= (sizeof(kmem_bufctl_t) +
 +					(flags & SLAB_BC ? BC_EXTRASIZE : 0));
 
 if (num > offslab_limit)
 break;
 @@ -2170,8 +2203,8 @@ kmem_cache_create (const char *name, siz
 cachep = NULL;
 goto oops;
 }
 -	slab_size = ALIGN(cachep->num * sizeof(kmem_bufctl_t)
 -			  + sizeof(struct slab), align);
 +
 +	slab_size = slab_mgmt_size(flags, cachep->num, align);
 
 /*
 * If the slab has been placed off-slab, and we have enough space then
 @@ -2182,11 +2215,9 @@ kmem_cache_create (const char *name, siz
 left_over -= slab_size;
 }
 
 -	if (flags & CFLGS_OFF_SLAB) {
 +	if (flags & CFLGS_OFF_SLAB)
 /* really off slab. No need for manual alignment */
 -		slab_size =
 -		    cachep->num * sizeof(kmem_bufctl_t) + sizeof(struct slab);
 -	}
 +		slab_size = slab_mgmt_size_noalign(flags, cachep->num);
 
 cachep->colour_off = cache_line_size();
 /* Offset must be a multiple of the alignment. */
 @@ -2435,6 +2466,30 @@ int kmem_cache_destroy(struct kmem_cache
 }
 EXPORT_SYMBOL(kmem_cache_destroy);
 
 +static inline kmem_bufctl_t *slab_bufctl(struct slab *slabp)
 +{
 +	return (kmem_bufctl_t *) (slabp + 1);
 +}
 +
 +#ifdef CONFIG_BEANCOUNTERS
 +static inline struct beancounter **slab_bc_ptrs(kmem_cache_t *cachep,
 +		struct slab *slabp)
 +{
 +	return (struct beancounter **) ALIGN((unsigned long)
 +			(slab_bufctl(slabp) + cachep->num), BC_EXTRASIZE);
 +}
 +
 +struct beancounter **kmem_cache_bcp(kmem_cache_t *cachep, void *objp)
 +{
 +	struct slab *slabp;
 +	struct beancounter **bcs;
 +
 +	slabp = virt_to_slab(objp);
 +	bcs = slab_bc_ptrs(cachep, slabp);
 +	return bcs + obj_to_index(cachep, slabp, objp);
 +}
 +#endif
 +
 /*
 * Get the memory for a slab management obj.
 * For a slab cache when the slab descriptor is off-slab, slab descriptors
 @@ -2445,7 +2500,8 @@ static struct slab *alloc_slabmgmt(struc
 if (OFF_SLAB(cachep)) {
 /* Slab management obj is off-slab. */
 slabp = kmem_cache_alloc_node(cachep->slabp_cache,
 -					      local_flags, nodeid);
 +					      local_flags & (~__GFP_BC),
 +					      nodeid);
 if (!slabp)
 return NULL;
 } else {
 @@ -2456,14 +2512,14 @@ static struct slab *alloc_slabmgmt(struc
 slabp->colouroff = colour_off;
 slabp->s_mem = objp + colour_off;
 slabp->nodeid = nodeid;
 +#ifdef CONFIG_BEANCOUNTERS
 +	if (cachep->flags & SLAB_BC)
 +		memset(slab_bc_ptrs(cachep, slabp), 0,
 +				cachep->num * BC_EXTRASIZE);
 +#endif
 return slabp;
 }
 
 -static inline kmem_bufctl_t *slab_bufctl(struct slab *slabp)
 -{
 -	return (kmem_bufctl_t *) (slabp + 1);
 -}
 -
 static void cache_init_objs(struct kmem_cache *cachep,
 struct slab *slabp, unsigned long ctor_flags)
 {
 @@ -2641,7 +2697,7 @@ static int cache_grow(struct kmem_cache
 * Get mem for the objs.  Attempt to allocate a physical page from
 * 'nodeid'.
 */
 -	objp = kmem_getpages(cachep, flags, nodeid);
 +	objp = kmem_getpages(cachep, flags & (~__GFP_BC), nodeid);
 if (!objp)
 goto failed;
 
 @@ -2989,6 +3045,19 @@ static inline void *____cache_alloc(stru
 return objp;
 }
 
 +static inline int bc_should_charge(kmem_cache_t *cachep, gfp_t flags)
 +{
 +#ifdef CONFIG_BEANCOUNTERS
 +	if (!(cachep->flags & SLAB_BC))
 +		return 0;
 +	if (flags & __GFP_BC)
 +		return 1;
 +	if (!(cachep->flags & SLAB_BC_NOCHARGE))
 +		return 1;
 +#endif
 +	return 0;
 +}
 +
 static __always_inline void *__cache_alloc(struct kmem_cache *cachep,
 gfp_t flags, void *caller)
 {
 @@ -3002,6 +3071,12 @@ static __always_inline void *__cache_all
 local_irq_restore(save_flags);
 objp = cache_alloc_debugcheck_after(cachep, flags, objp,
 caller);
 +
 +	if (objp && bc_should_charge(cachep, flags))
 +		if (bc_slab_charge(cachep, objp, flags)) {
 +			kmem_cache_free(cachep, objp);
 +			objp = NULL;
 +		}
 prefetchw(objp);
 return objp;
 }
 @@ -3193,6 +3266,8 @@ static inline void __cache_free(struct k
 struct array_cache *ac = cpu_cache_get(cachep);
 
 check_irq_off();
 +	if (cachep->flags & SLAB_BC)
 +		bc_slab_uncharge(cachep, objp);
 objp = cache_free_debugcheck(cachep, objp, __builtin_return_address(0));
 
 if (cache_free_alien(cachep, objp))
 --- ./mm/vmalloc.c.bckmem	2006-07-17 17:01:12.000000000 +0400
 +++ ./mm/vmalloc.c	2006-08-01 15:27:24.000000000 +0400
 @@ -518,6 +518,12 @@ void *vmalloc(unsigned long size)
 }
 EXPORT_SYMBOL(vmalloc);
 
 +void *vmalloc_bc(unsigned long size)
 +{
 +	return __vmalloc(size, GFP_KERNEL_BC | __GFP_HIGHMEM, PAGE_KERNEL);
 +}
 +EXPORT_SYMBOL(vmalloc_bc);
 +
 /**
 *	vmalloc_user  -  allocate virtually contiguous memory which has
 *			   been zeroed so it can be mapped to userspace without
 |  
	|  |  |  
	| 
		
			| [PATCH 6/6] BC: kernel memory accounting (marks) [message #5548 is a reply to message #5542] | Wed, 23 August 2006 11:05   |  
			| 
				
				
					|  dev Messages: 1693
 Registered: September 2005
 Location: Moscow
 | Senior Member |  
 |  |  
	| Mark some kmem caches with SLAB_BC and some allocations with __GFP_BC to cause charging/limiting of appropriate
 kernel resources.
 
 Signed-off-by: Pavel Emelianov <xemul@sw.ru>
 Signed-off-by: Kirill Korotaev <dev@sw.ru>
 
 ---
 
 arch/i386/kernel/ldt.c           |    4 ++--
 arch/i386/mm/init.c              |    4 ++--
 arch/i386/mm/pgtable.c           |    6 ++++--
 drivers/char/tty_io.c            |   10 +++++-----
 fs/file.c                        |    8 ++++----
 fs/locks.c                       |    2 +-
 fs/namespace.c                   |    3 ++-
 fs/select.c                      |    7 ++++---
 include/asm-i386/thread_info.h   |    4 ++--
 include/asm-ia64/pgalloc.h       |   24 +++++++++++++++++-------
 include/asm-x86_64/pgalloc.h     |   12 ++++++++----
 include/asm-x86_64/thread_info.h |    5 +++--
 ipc/msgutil.c                    |    4 ++--
 ipc/sem.c                        |    7 ++++---
 ipc/util.c                       |    8 ++++----
 kernel/fork.c                    |   15 ++++++++-------
 kernel/posix-timers.c            |    3 ++-
 kernel/signal.c                  |    2 +-
 kernel/user.c                    |    2 +-
 mm/rmap.c                        |    3 ++-
 mm/shmem.c                       |    3 ++-
 21 files changed, 80 insertions(+), 56 deletions(-)
 
 --- ./arch/i386/kernel/ldt.c.bcslabs	2006-04-21 11:59:31.000000000 +0400
 +++ ./arch/i386/kernel/ldt.c	2006-08-01 13:22:30.000000000 +0400
 @@ -39,9 +39,9 @@ static int alloc_ldt(mm_context_t *pc, i
 oldsize = pc->size;
 mincount = (mincount+511)&(~511);
 if (mincount*LDT_ENTRY_SIZE > PAGE_SIZE)
 -		newldt = vmalloc(mincount*LDT_ENTRY_SIZE);
 +		newldt = vmalloc_bc(mincount*LDT_ENTRY_SIZE);
 else
 -		newldt = kmalloc(mincount*LDT_ENTRY_SIZE, GFP_KERNEL);
 +		newldt = kmalloc(mincount*LDT_ENTRY_SIZE, GFP_KERNEL_BC);
 
 if (!newldt)
 return -ENOMEM;
 --- ./arch/i386/mm/init.c.bcslabs	2006-07-10 12:39:10.000000000 +0400
 +++ ./arch/i386/mm/init.c	2006-08-01 13:17:07.000000000 +0400
 @@ -680,7 +680,7 @@ void __init pgtable_cache_init(void)
 pmd_cache = kmem_cache_create("pmd",
 PTRS_PER_PMD*sizeof(pmd_t),
 PTRS_PER_PMD*sizeof(pmd_t),
 -					0,
 +					SLAB_BC,
 pmd_ctor,
 NULL);
 if (!pmd_cache)
 @@ -689,7 +689,7 @@ void __init pgtable_cache_init(void)
 pgd_cache = kmem_cache_create("pgd",
 PTRS_PER_PGD*sizeof(pgd_t),
 PTRS_PER_PGD*sizeof(pgd_t),
 -				0,
 +				SLAB_BC,
 pgd_ctor,
 PTRS_PER_PMD == 1 ? pgd_dtor : NULL);
 if (!pgd_cache)
 --- ./arch/i386/mm/pgtable.c.bcslabs	2006-07-10 12:39:10.000000000 +0400
 +++ ./arch/i386/mm/pgtable.c	2006-08-01 13:27:35.000000000 +0400
 @@ -158,9 +158,11 @@ struct page *pte_alloc_one(struct mm_str
 struct page *pte;
 
 #ifdef CONFIG_HIGHPTE
 -	pte =  alloc_pages(GFP_KERNEL|__GFP_HIGHMEM|__GFP_REPEAT|__GFP_ZERO , 0);
 +	pte =  alloc_pages(GFP_KERNEL|__GFP_HIGHMEM|__GFP_REPEAT|__GFP_ZERO |
 +			__GFP_BC | __GFP_BC_LIMIT, 0);
 #else
 -	pte = alloc_pages(GFP_KERNEL|__GFP_REPEAT|__GFP_ZERO, 0);
 +	pte = alloc_pages(GFP_KERNEL|__GFP_REPEAT|__GFP_ZERO|
 +			__GFP_BC | __GFP_BC_LIMIT, 0);
 #endif
 return pte;
 }
 --- ./drivers/char/tty_io.c.bcslabs	2006-07-10 12:39:11.000000000 +0400
 +++ ./drivers/char/tty_io.c	2006-08-01 15:21:21.000000000 +0400
 @@ -158,7 +158,7 @@ static struct tty_struct *alloc_tty_stru
 
 static struct tty_struct *alloc_tty_struct(void)
 {
 -	return kzalloc(sizeof(struct tty_struct), GFP_KERNEL);
 +	return kzalloc(sizeof(struct tty_struct), GFP_KERNEL_BC);
 }
 
 static void tty_buffer_free_all(struct tty_struct *);
 @@ -1495,7 +1495,7 @@ static int init_dev(struct tty_driver *d
 
 if (!*tp_loc) {
 tp = (struct termios *) kmalloc(sizeof(struct termios),
 -						GFP_KERNEL);
 +						GFP_KERNEL_BC);
 if (!tp)
 goto free_mem_out;
 *tp = driver->init_termios;
 @@ -1503,7 +1503,7 @@ static int init_dev(struct tty_driver *d
 
 if (!*ltp_loc) {
 ltp = (struct termios *) kmalloc(sizeof(struct termios),
 -						 GFP_KERNEL);
 +						 GFP_KERNEL_BC);
 if (!ltp)
 goto free_mem_out;
 memset(ltp, 0, sizeof(struct termios));
 @@ -1528,7 +1528,7 @@ static int init_dev(struct tty_driver *d
 
 if (!*o_tp_loc) {
 o_tp = (struct termios *)
 -				kmalloc(sizeof(struct termios), GFP_KERNEL);
 +				kmalloc(sizeof(struct termios), GFP_KERNEL_BC);
 if (!o_tp)
 goto free_mem_out;
 *o_tp = driver->other->init_termios;
 @@ -1536,7 +1536,7 @@ static int init_dev(struct tty_driver *d
 
 if (!*o_ltp_loc) {
 o_ltp = (struct termios *)
 -				kmalloc(sizeof(struct termios), GFP_KERNEL);
 +				kmalloc(sizeof(struct termios), GFP_KERNEL_BC);
 if (!o_ltp)
 goto free_mem_out;
 memset(o_ltp, 0, sizeof(struct termios));
 --- ./fs/file.c.bcslabs	2006-07-17 17:01:12.000000000 +0400
 +++ ./fs/file.c	2006-08-01 15:18:03.000000000 +0400
 @@ -44,9 +44,9 @@ struct file ** alloc_fd_array(int num)
 int size = num * sizeof(struct file *);
 
 if (size <= PAGE_SIZE)
 -		new_fds = (struct file **) kmalloc(size, GFP_KERNEL);
 +		new_fds = (struct file **) kmalloc(size, GFP_KERNEL_BC);
 else
 -		new_fds = (struct file **) vmalloc(size);
 +		new_fds = (struct file **) vmalloc_bc(size);
 return new_fds;
 }
 
 @@ -213,9 +213,9 @@ fd_set * alloc_fdset(int num)
 int size = num / 8;
 
 if (size <= PAGE_SIZE)
 -		new_fdset = (fd_set *) kmalloc(size, GFP_KERNEL);
 +		new_fdset = (fd_set *) kmalloc(size, GFP_KERNEL_BC);
 else
 -		new_fdset = (fd_set *) vmalloc(size);
 +		new_fdset = (fd_set *) vmalloc_bc(size);
 return new_fdset;
 }
 
 --- ./fs/locks.c.bcslabs	2006-07-10 12:39:16.000000000 +0400
 +++ ./fs/locks.c	2006-08-01 12:46:47.000000000 +0400
 @@ -2226,7 +2226,7 @@ EXPORT_SYMBOL(lock_may_write);
 static int __init filelock_init(void)
 {
 filelock_cache = kmem_cache_create("file_lock_cache",
 -			sizeof(struct file_lock), 0, SLAB_PANIC,
 +			sizeof(struct file_lock), 0, SLAB_PANIC | SLAB_BC,
 init_once, NULL);
 return 0;
 }
 --- ./fs/namespace.c.bcslabs	2006-07-10 12:39:16.000000000 +0400
 +++ ./fs/namespace.c	2006-08-01 12:47:12.000000000 +0400
 @@ -1825,7 +1825,8 @@ void __init mnt_init(unsigned long mempa
 init_rwsem(&namespace_sem);
 
 mnt_cache = kmem_cache_create("mnt_cache", sizeof(struct vfsmount),
 -			0, SLAB_HWCACHE_ALIGN | SLAB_PANIC, NULL, NULL);
 +			0, SLAB_HWCACHE_ALIGN | SLAB_BC | SLAB_PANIC,
 +			NULL, NULL);
 
 mount_hashtable = (struct list_head *)__get_free_page(GFP_ATOMIC);
 
 --- ./fs/select.c.bcslabs	2006-07-10 12:39:17.000000000 +0400
 +++ ./fs/select.c	2006-08-01 15:17:01.000000000 +0400
 @@ -103,7 +103,8 @@ static struct poll_table_entry *poll_get
 if (!table || POLL_TABLE_FULL(table)) {
 struct poll_table_page *new_table;
 
 -		new_table = (struct poll_table_page *) __get_free_page(GFP_KERNEL);
 +		new_table = (struct poll_table_page *)
 +			__get_free_page(GFP_KERNEL_BC);
 if (!new_table) {
 p->error = -ENOMEM;
 __set_current_state(TASK_RUNNING);
 @@ -339,7 +340,7 @@ static int core_sys_select(int n, fd_set
 if (size > sizeof(stack_fds) / 6) {
 /* Not enough space in on-stack array; must use kmalloc */
 ret = -ENOMEM;
 -		bits = kmalloc(6 * size, GFP_KERNEL);
 +		bits = kmalloc(6 * size, GFP_KERNEL_BC);
 if (!bits)
 goto out_nofds;
 }
 @@ -693,7 +694,7 @@ int do_sys_poll(struct pollfd __user *uf
 if (!stack_pp)
 stack_pp = pp = (struct poll_list *)stack_pps;
 else {
 -			pp = kmalloc(size, GFP_KERNEL);
 +			pp = kmalloc(size, GFP_KERNEL_BC);
 if (!pp)
 goto out_fds;
 }
 --- ./include/asm-i386/thread_info.h.bcslabs	2006-07-10 12:39:19.000000000 +0400
 +++ ./include/asm-i386/thread_info.h	2006-08-01 15:19:50.000000000 +0400
 @@ -99,13 +99,13 @@ static inline struct thread_info *curren
 ({							\
 struct thread_info *ret;			\
 \
 -		ret = kmalloc(THREAD_SIZE, GFP_KERNEL);		\
 +		ret = kmalloc(THREAD_SIZE, GFP_KERNEL_BC);	\
 if (ret)					\
 memset(ret, 0, THREAD_SIZE);		\
 ret;						\
 })
 #else
 -#define alloc_thread_info(tsk) kmalloc(THREAD_SIZE, GFP_KERNEL)
 +#define alloc_thread_info(tsk) kmalloc(THREAD_SIZE, GFP_KERNEL_BC)
 #endif
 
 #define free_thread_info(info)	kfree(info)
 --- ./include/asm-ia64/pgalloc.h.bcslabs	2006-07-10 12:39:19.000000000 +0400
 +++ ./include/asm-ia64/pgalloc.h	2006-08-01 13:35:49.000000000 +0400
 @@ -19,6 +19,8 @@
 #include <linux/page-flags.h>
 #include <linux/threads.h>
 
 +#include <bc/kmem.h>
 +
 #include <asm/mmu_context.h>
 
 DECLARE_PER_CPU(unsigned long *, __pgtable_quicklist);
 @@ -37,7 +39,7 @@ static inline long pgtable_quicklist_tot
 return ql_size;
 }
 
 -static inline void *pgtable_quicklist_alloc(void)
 +static inline void *pgtable_quicklist_alloc(int charge)
 {
 unsigned long *ret = NULL;
 
 @@ -45,13 +47,20 @@ static inline void *pgtable_quicklist_al
 
 ret = pgtable_quicklist;
 if (likely(ret != NULL)) {
 +		if (charge && bc_page_charge(virt_to_page(ret),
 +					0, __GFP_BC_LIMIT)) {
 +			ret = NULL;
 +			goto out;
 +		}
 pgtable_quicklist = (unsigned long *)(*ret);
 ret[0] = 0;
 --pgtable_quicklist_size;
 +out:
 preempt_enable();
 } else {
 preempt_enable();
 -		ret = (unsigned long *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
 +		ret = (unsigned long *)__get_free_page(GFP_KERNEL |
 +				__GFP_ZERO | __GFP_BC | __GFP_BC_LIMIT);
 }
 
 return ret;
 @@ -69,6 +78,7 @@ static inline void pgtable_quicklist_fre
 #endif
 
 preempt_disable();
 +	bc_page_uncharge(virt_to_page(pgtable_entry), 0);
 *(unsigned long *)pgtable_entry = (unsigned long)pgtable_quicklist;
 pgtable_quicklist = (unsigned long *)pgtable_entry;
 ++pgtable_quicklist_size;
 @@ -77,7 +87,7 @@ static inline void pgtable_quicklist_fre
 
 static inline pgd_t *pgd_alloc(struct mm_struct *mm)
 {
 -	return pgtable_quicklist_alloc();
 +	return pgtable_quicklist_alloc(1);
 }
 
 static inline void pgd_free(pgd_t * pgd)
 @@ -94,7 +104,7 @@ pgd_populate(struct mm_struct *mm, pgd_t
 
 static inline pud_t *pud_alloc_one(struct mm_struct *mm, unsigned long addr)
 {
 -	return pgtable_quicklist_alloc();
 +	return pgtable
...
 
 
 |  
	|  |  |  
	|  |  
	| 
		
			| Re: [PATCH 2/6] BC: beancounters core (API) [message #5556 is a reply to message #5550] | Wed, 23 August 2006 13:25   |  
			| 
				
				
					|  dev Messages: 1693
 Registered: September 2005
 Location: Moscow
 | Senior Member |  
 |  |  
	| Andi Kleen wrote: > On Wednesday 23 August 2006 13:03, Kirill Korotaev wrote:
 >
 >
 >>+#ifdef CONFIG_BEANCOUNTERS
 >>+extern struct hlist_head bc_hash[];
 >>+extern spinlock_t bc_hash_lock;
 >
 >
 > I wonder who pokes into that hash from other files? Looks a bit dangerous.
 it was kernel/ub/proc.c with proc interface :)
 however, we removed it from this patchset version, but forgot extern's...
 
 will remove
 
 >>+void __put_beancounter(struct beancounter *bc);
 >>+static inline void put_beancounter(struct beancounter *bc)
 >>+{
 >>+	__put_beancounter(bc);
 >>+}
 >
 >
 > The wrapper seems pointless too.
 yep, almost the same reason :)
 
 > The file could use a overview comment what the various counter
 > types actually are.
 you mean comment about what resource parameters we introduce?
 ok, will add it with each resource patch.
 
 >>+	bc_print_id(bc, uid, sizeof(uid));
 >>+	printk(KERN_WARNING "BC %s %s warning: %s "
 >
 >
 > Doesn't this need some rate limiting? Or can it be only triggered
 > by code bugs?
 only due to code bugs.
 
 >>+	bc = &default_beancounter;
 >>+	memset(bc, 0, sizeof(default_beancounter));
 >
 >
 > You don't trust the BSS to be zero? @)
 :))
 
 Kirill
 |  
	|  |  |  
	| 
		
			| Re: [PATCH 4/6] BC: user interface (syscalls) [message #5557 is a reply to message #5546] | Wed, 23 August 2006 13:40   |  
			| 
				
				
					|  dev Messages: 1693
 Registered: September 2005
 Location: Moscow
 | Senior Member |  
 |  |  
	| Sorry, wrong patch was attached. This one fixes fixed but missed SPARC hunks.
 
 --------------------------------------
 
 Add the following system calls for BC management:
 1. sys_get_bcid     - get current BC id
 2. sys_set_bcid     - change exec_ and fork_ BCs on current
 3. sys_set_bclimit  - set limits for resources consumtions
 4. sys_get_bcstat   - return br_resource_parm on resource
 
 Signed-off-by: Pavel Emelianov <xemul@sw.ru>
 Signed-off-by: Kirill Korotaev <dev@sw.ru>
 
 ---
 
 arch/i386/kernel/syscall_table.S |    4 +
 arch/ia64/kernel/entry.S         |    4 +
 arch/sparc/kernel/entry.S        |    2
 arch/sparc/kernel/systbls.S      |    6 +
 arch/sparc64/kernel/entry.S      |    2
 arch/sparc64/kernel/systbls.S    |   10 ++-
 include/asm-i386/unistd.h        |    6 +
 include/asm-ia64/unistd.h        |    6 +
 include/asm-powerpc/systbl.h     |    4 +
 include/asm-powerpc/unistd.h     |    6 +
 include/asm-sparc/unistd.h       |    4 +
 include/asm-sparc64/unistd.h     |    4 +
 include/asm-x86_64/unistd.h      |   10 ++-
 kernel/sys_ni.c                  |    6 +
 kernel/bc/Makefile               |    1
 kernel/bc/sys.c                  |  120 +++++++++++++++++++++++++++++++++++++++
 16 files changed, 186 insertions(+), 9 deletions(-)
 
 --- ./arch/i386/kernel/syscall_table.S.ve3	2006-08-21 13:15:37.000000000 +0400
 +++ ./arch/i386/kernel/syscall_table.S	2006-08-21 14:15:47.000000000 +0400
 @@ -318,3 +318,7 @@ ENTRY(sys_call_table)
 .long sys_vmsplice
 .long sys_move_pages
 .long sys_getcpu
 +	.long sys_get_bcid
 +	.long sys_set_bcid		/* 320 */
 +	.long sys_set_bclimit
 +	.long sys_get_bcstat
 --- ./arch/ia64/kernel/entry.S.ve3	2006-08-21 13:15:37.000000000 +0400
 +++ ./arch/ia64/kernel/entry.S	2006-08-21 14:17:07.000000000 +0400
 @@ -1610,5 +1610,9 @@ sys_call_table:
 data8 sys_sync_file_range		// 1300
 data8 sys_tee
 data8 sys_vmsplice
 +	data8 sys_get_bcid
 +	data8 sys_set_bcid
 +	data8 sys_set_bclimit			// 1305
 +	data8 sys_get_bcstat
 
 .org sys_call_table + 8*NR_syscalls	// guard against failures to increase NR_syscalls
 --- ./arch/sparc/kernel/entry.S.ve3	2006-07-10 12:39:10.000000000 +0400
 +++ ./arch/sparc/kernel/entry.S	2006-08-21 14:29:44.000000000 +0400
 @@ -37,7 +37,7 @@
 
 #define curptr      g6
 
 -#define NR_SYSCALLS 300      /* Each OS is different... */
 +#define NR_SYSCALLS 304      /* Each OS is different... */
 
 /* These are just handy. */
 #define _SV	save	%sp, -STACKFRAME_SZ, %sp
 --- ./arch/sparc/kernel/systbls.S.ve3	2006-07-10 12:39:10.000000000 +0400
 +++ ./arch/sparc/kernel/systbls.S	2006-08-21 14:30:43.000000000 +0400
 @@ -78,7 +78,8 @@ sys_call_table:
 /*285*/	.long sys_mkdirat, sys_mknodat, sys_fchownat, sys_futimesat, sys_fstatat64
 /*290*/	.long sys_unlinkat, sys_renameat, sys_linkat, sys_symlinkat, sys_readlinkat
 /*295*/	.long sys_fchmodat, sys_faccessat, sys_pselect6, sys_ppoll, sys_unshare
 -/*300*/	.long sys_set_robust_list, sys_get_robust_list
 +/*300*/	.long sys_set_robust_list, sys_get_robust_list, sys_get_bcid, sys_set_bcid, sys_set_bclimit
 +/*305*/	.long sys_get_bcstat
 
 #ifdef CONFIG_SUNOS_EMUL
 /* Now the SunOS syscall table. */
 @@ -192,4 +193,7 @@ sunos_sys_table:
 .long sunos_nosys, sunos_nosys, sunos_nosys
 .long sunos_nosys, sunos_nosys, sunos_nosys
 
 +	.long sunos_nosys, sunos_nosys, sunos_nosys,
 +	.long sunos_nosys
 +
 #endif
 --- ./arch/sparc64/kernel/entry.S.ve3	2006-07-10 12:39:10.000000000 +0400
 +++ ./arch/sparc64/kernel/entry.S	2006-08-21 14:29:56.000000000 +0400
 @@ -25,7 +25,7 @@
 
 #define curptr      g6
 
 -#define NR_SYSCALLS 300      /* Each OS is different... */
 +#define NR_SYSCALLS 304      /* Each OS is different... */
 
 .text
 .align		32
 --- ./arch/sparc64/kernel/systbls.S.ve3	2006-07-10 12:39:11.000000000 +0400
 +++ ./arch/sparc64/kernel/systbls.S	2006-08-21 14:32:26.000000000 +0400
 @@ -79,7 +79,8 @@ sys_call_table32:
 .word sys_mkdirat, sys_mknodat, sys_fchownat, compat_sys_futimesat, compat_sys_fstatat64
 /*290*/	.word sys_unlinkat, sys_renameat, sys_linkat, sys_symlinkat, sys_readlinkat
 .word sys_fchmodat, sys_faccessat, compat_sys_pselect6, compat_sys_ppoll, sys_unshare
 -/*300*/	.word compat_sys_set_robust_list, compat_sys_get_robust_list
 +/*300*/	.word compat_sys_set_robust_list, compat_sys_get_robust_list, sys_nis_syscall, sys_nis_syscall, sys_nis_syscall
 +	.word sys_nis_syscall
 
 #endif /* CONFIG_COMPAT */
 
 @@ -149,7 +150,9 @@ sys_call_table:
 .word sys_mkdirat, sys_mknodat, sys_fchownat, sys_futimesat, sys_fstatat64
 /*290*/	.word sys_unlinkat, sys_renameat, sys_linkat, sys_symlinkat, sys_readlinkat
 .word sys_fchmodat, sys_faccessat, sys_pselect6, sys_ppoll, sys_unshare
 -/*300*/	.word sys_set_robust_list, sys_get_robust_list
 +/*300*/	.word sys_set_robust_list, sys_get_robust_list, sys_get_bcid, sys_set_bcid, sys_set_bclimit
 +	.word sys_get_bcstat
 +
 
 #if defined(CONFIG_SUNOS_EMUL) || defined(CONFIG_SOLARIS_EMUL) || \
 defined(CONFIG_SOLARIS_EMUL_MODULE)
 @@ -263,4 +266,7 @@ sunos_sys_table:
 .word sunos_nosys, sunos_nosys, sunos_nosys
 .word sunos_nosys, sunos_nosys, sunos_nosys
 .word sunos_nosys, sunos_nosys, sunos_nosys
 +
 +	.word sunos_nosys, sunos_nosys, sunos_nosys
 +	.word sunos_nosys
 #endif
 --- ./include/asm-i386/unistd.h.ve3	2006-08-21 13:15:39.000000000 +0400
 +++ ./include/asm-i386/unistd.h	2006-08-21 14:22:53.000000000 +0400
 @@ -324,10 +324,14 @@
 #define __NR_vmsplice		316
 #define __NR_move_pages		317
 #define __NR_getcpu		318
 +#define __NR_get_bcid		319
 +#define __NR_set_bcid		320
 +#define __NR_set_bclimit	321
 +#define __NR_get_bcstat		322
 
 #ifdef __KERNEL__
 
 -#define NR_syscalls 318
 +#define NR_syscalls 323
 #include <linux/err.h>
 
 /*
 --- ./include/asm-ia64/unistd.h.ve3	2006-07-10 12:39:19.000000000 +0400
 +++ ./include/asm-ia64/unistd.h	2006-08-21 14:24:29.000000000 +0400
 @@ -291,11 +291,15 @@
 #define __NR_sync_file_range		1300
 #define __NR_tee			1301
 #define __NR_vmsplice			1302
 +#define __NR_get_bcid			1303
 +#define __NR_set_bcid			1304
 +#define __NR_set_bclimit		1305
 +#define __NR_get_bcstat			1306
 
 #ifdef __KERNEL__
 
 
 -#define NR_syscalls			279 /* length of syscall table */
 +#define NR_syscalls			283 /* length of syscall table */
 
 #define __ARCH_WANT_SYS_RT_SIGACTION
 
 --- ./include/asm-powerpc/systbl.h.ve3	2006-07-10 12:39:19.000000000 +0400
 +++ ./include/asm-powerpc/systbl.h	2006-08-21 14:28:46.000000000 +0400
 @@ -304,3 +304,7 @@ SYSCALL_SPU(fchmodat)
 SYSCALL_SPU(faccessat)
 COMPAT_SYS_SPU(get_robust_list)
 COMPAT_SYS_SPU(set_robust_list)
 +SYSCALL(sys_get_bcid)
 +SYSCALL(sys_set_bcid)
 +SYSCALL(sys_set_bclimit)
 +SYSCALL(sys_get_bcstat)
 --- ./include/asm-powerpc/unistd.h.ve3	2006-07-10 12:39:19.000000000 +0400
 +++ ./include/asm-powerpc/unistd.h	2006-08-21 14:28:24.000000000 +0400
 @@ -323,10 +323,14 @@
 #define __NR_faccessat		298
 #define __NR_get_robust_list	299
 #define __NR_set_robust_list	300
 +#define __NR_get_bcid		301
 +#define __NR_set_bcid		302
 +#define __NR_set_bclimit	303
 +#define __NR_get_bcstat		304
 
 #ifdef __KERNEL__
 
 -#define __NR_syscalls		301
 +#define __NR_syscalls		305
 
 #define __NR__exit __NR_exit
 #define NR_syscalls	__NR_syscalls
 --- ./include/asm-sparc/unistd.h.ve3	2006-07-10 12:39:19.000000000 +0400
 +++ ./include/asm-sparc/unistd.h	2006-08-21 14:33:20.000000000 +0400
 @@ -318,6 +318,10 @@
 #define __NR_unshare		299
 #define __NR_set_robust_list	300
 #define __NR_get_robust_list	301
 +#define __NR_get_bcid		302
 +#define __NR_set_bcid		303
 +#define __NR_set_bclimit	304
 +#define __NR_get_bcstat		305
 
 #ifdef __KERNEL__
 /* WARNING: You MAY NOT add syscall numbers larger than 301, since
 --- ./include/asm-sparc64/unistd.h.ve3	2006-07-10 12:39:19.000000000 +0400
 +++ ./include/asm-sparc64/unistd.h	2006-08-21 14:34:10.000000000 +0400
 @@ -320,6 +320,10 @@
 #define __NR_unshare		299
 #define __NR_set_robust_list	300
 #define __NR_get_robust_list	301
 +#define __NR_get_bcid		302
 +#define __NR_set_bcid		303
 +#define __NR_set_bclimit	304
 +#define __NR_get_bcstat		305
 
 #ifdef __KERNEL__
 /* WARNING: You MAY NOT add syscall numbers larger than 301, since
 --- ./include/asm-x86_64/unistd.h.ve3	2006-08-21 13:15:39.000000000 +0400
 +++ ./include/asm-x86_64/unistd.h	2006-08-21 14:35:19.000000000 +0400
 @@ -619,10 +619,18 @@ __SYSCALL(__NR_sync_file_range, sys_sync
 __SYSCALL(__NR_vmsplice, sys_vmsplice)
 #define __NR_move_pages		279
 __SYSCALL(__NR_move_pages, sys_move_pages)
 +#define __NR_get_bcid		280
 +__SYSCALL(__NR_get_bcid, sys_get_bcid)
 +#define __NR_set_bcid		281
 +__SYSCALL(__NR_set_bcid, sys_set_bcid)
 +#define __NR_set_bclimit	282
 +__SYSCALL(__NR_set_bclimit, sys_set_bclimit)
 +#define __NR_get_bcstat		283
 +__SYSCALL(__NR_get_bcstat, sys_get_bcstat)
 
 #ifdef __KERNEL__
 
 -#define __NR_syscall_max __NR_move_pages
 +#define __NR_syscall_max __NR_get_bcstat
 #include <linux/err.h>
 
 #ifndef __NO_STUBS
 --- ./kernel/sys_ni.c.ve3	2006-07-10 12:39:20.000000000 +0400
 +++ ./kernel/sys_ni.c	2006-08-21 14:12:49.000000000 +0400
 @@ -134,3 +134,9 @@ cond_syscall(sys_madvise);
 cond_syscall(sys_mremap);
 cond_syscall(sys_remap_file_pages);
 cond_syscall(compat_sys_move_pages);
 +
 +/* user resources syscalls */
 +cond_syscall(sys_set_bcid);
 +cond_syscall(sys_get_bcid);
 +cond_syscall(sys_set_bclimit);
 +cond_syscall(sys_get_bcstat);
 --- ./kernel/bc/Makefile.ve3	2006-08-21 13:49:49.000000000 +0400
 +++ ./kernel/bc/Makefile	2006-08-21 13:55:39.000000000 +0400
 @@ -6,3 +6,4 @@
 
 obj-$(CONFIG_BEANCOUNTERS) += beancounter.o
 obj-$(CONFIG_BEANCOUNTERS) += misc.o
 +obj-$(CONFIG_BEANCOUNTERS) += sys.o
 --- ./kernel/bc/sys.c.ve3	2006-08-21 13:49:49.000000000 +0400
 +++ ./kernel/bc/sys.c	2006-08-21 14:43:04.000000000 +0400
 @@ -0,0 +1,120 @@
 +/*
 + *  kernel/bc/sys.c
 + *
 + *  Copyright (C) 2006 OpenVZ. SWsoft Inc
 + *
 + */
 +
 +#include <linux/config.h>
 +#include <linux/sched.h>
 +#include <asm/uaccess.h>
 +
 +#include <bc/beancounter.h>
 +#include <bc/task.h>
 +
 +asmlinkage long sys_get_bcid(void)
 +{
 +	struct beancounter *bc;
 +
 +	bc = get_exec_bc();
 +	return b
...
 
 
 |  
	|  |  |  
	| 
		
			| Re: [PATCH 2/6] BC: beancounters core (API) [message #5558 is a reply to message #5544] | Wed, 23 August 2006 13:46   |  
			| 
				
				
					|  dev Messages: 1693
 Registered: September 2005
 Location: Moscow
 | Senior Member |  
 |  |  
	| Alexey Dobriyan wrote: > On Wed, Aug 23, 2006 at 03:03:07PM +0400, Kirill Korotaev wrote:
 >
 >
 >>--- /dev/null
 >>+++ ./include/bc/beancounter.h
 >
 >
 >>+#define BC_RESOURCES	0
 >
 >
 > Do you want userspace to see it?
 yep.
 
 >>+struct bc_resource_parm {
 >>+	unsigned long barrier;	/* A barrier over which resource allocations
 >>+				 * are failed gracefully. e.g. if the amount
 >>+				 * of consumed memory is over the barrier
 >>+				 * further sbrk() or mmap() calls fail, the
 >>+				 * existing processes are not killed.
 >>+				 */
 >>+	unsigned long limit;	/* hard resource limit */
 >>+	unsigned long held;	/* consumed resources */
 >>+	unsigned long maxheld;	/* maximum amount of consumed resources */
 >>+	unsigned long minheld;	/* minumum amount of consumed resources */
 >
 >
 > Stupid question: when minimum amount is useful?
 to monitor usage statistics (range of used resources).
 this value will be usefull when ubstat will be added.
 this field probably would be more logical to add later,
 but since it is part of user space interface it is left here
 for not changing API later.
 
 >>+/*
 >>+ * Kernel internal part.
 >>+ */
 >
 >
 > Redundant comment, YMMV.
 >
 >
 >>+#ifdef __KERNEL__
 >>+
 >>+#include <linux/config.h>
 >
 >
 > config.h is unneeded. You can drop it from everywhere.
 ok.
 
 >>+struct beancounter
 >>+{
 >
 >
 > nit:
 > 	"struct beancounter {"
 >
 >
 >>+	atomic_t		bc_refcount;
 >>+	spinlock_t		bc_lock;
 >>+	uid_t			bc_id;
 >>+	struct hlist_node	hash;
 >>+
 >>+	/* resources statistics and settings */
 >>+	struct bc_resource_parm	bc_parms[BC_RESOURCES];
 >>+};
 >>+
 >>+enum severity { BC_BARRIER, BC_LIMIT, BC_FORCE };
 >
 >
 > bc_severity?
 ok
 
 >
 >
 >>--- /dev/null	2006-07-18 14:52:43.075228448 +0400
 >>+++ ./kernel/bc/beancounter.c	2006-08-21 13:13:11.000000000 +0400
 >
 >
 >>+#define bc_hash_fun(x) ((((x) >> 8) ^ (x)) & (BC_HASH_SIZE - 1))
 >>+
 >>+struct hlist_head bc_hash[BC_HASH_SIZE];
 >>+spinlock_t bc_hash_lock;
 >>+
 >>+EXPORT_SYMBOL(bc_hash);
 >>+EXPORT_SYMBOL(bc_hash_lock);
 >
 >
 > tasklist_lock was unexported recently and this looks equally low-level.
 > I couldn't find place in patchbomb where you use it in modular fashion.
 > perhaps, i need more sleep.
 I answered to Andi that it is left from prev patch set which used it in proc.c
 will be removed
 thanks for noting this!!!
 
 >>+void __put_beancounter(struct beancounter *bc)
 >>+{
 >>+	unsigned long flags;
 >>+
 >>+	/* equivalent to atomic_dec_and_lock_irqsave() */
 >>+	local_irq_save(flags);
 >>+	if (likely(!atomic_dec_and_lock(&bc->bc_refcount, &bc_hash_lock))) {
 >>+		local_irq_restore(flags);
 >>+		if (unlikely(atomic_read(&bc->bc_refcount) < 0))
 >>+			printk(KERN_ERR "BC: Bad refcount: bc=%p, "
 >>+					"luid=%d, ref=%d\n",
 >>+					bc, bc->bc_id,
 >>+					atomic_read(&bc->bc_refcount));
 >
 >
 > Should this BUG_ON() ?
 BUG_ON doesn't print much information :)
 ok, will replace
 
 >>+		return;
 >>+	}
 >>+
 >>+	BUG_ON(bc == &init_bc);
 >>+	verify_held(bc);
 >>+	hlist_del(&bc->hash);
 >>+	spin_unlock_irqrestore(&bc_hash_lock, flags);
 >>+	kmem_cache_free(bc_cachep, bc);
 >>+}
 >>+
 >>+EXPORT_SYMBOL(__put_beancounter);
 >
 >
 >>+int bc_charge_locked(struct beancounter *bc, int resource, unsigned long
 >>val,
 >>+		enum severity strict)
 >>+{
 >>+	/*
 >>+	 * bc_value <= BC_MAXVALUE, value <= BC_MAXVALUE, and only one
 >>addition
 >>+	 * at the moment is possible so an overflow is impossible.
 >>+	 */
 >>+	bc->bc_parms[resource].held += val;
 >>+
 >>+	switch (strict) {
 >>+		case BC_BARRIER:
 >
 >
 > nit: one tab less
 >
 > -
 > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
 > the body of a message to majordomo@vger.kernel.org
 > More majordomo info at  http://vger.kernel.org/majordomo-info.html
 > Please read the FAQ at  http://www.tux.org/lkml/
 >
 |  
	|  |  |  
	|  |  
	|  |  
	| 
		
			| Re: [PATCH 4/6] BC: user interface (syscalls) [message #5561 is a reply to message #5546] | Wed, 23 August 2006 13:41   |  
			| 
				
				
					|  Alexey Dobriyan Messages: 195
 Registered: August 2006
 | Senior Member |  |  |  
	| On Wed, Aug 23, 2006 at 03:06:05PM +0400, Kirill Korotaev wrote: > --- ./kernel/bc/sys.c.ve3
 > +++ ./kernel/bc/sys.c
 
 > +asmlinkage long sys_set_bclimit(uid_t id, unsigned long resource,
 > +		unsigned long *limits)
 
 unsigned long __user *limits
 
 > +{
 > +	int error;
 > +	unsigned long flags;
 > +	struct beancounter *bc;
 > +	unsigned long new_limits[2];
 > +
 > +	error = -EPERM;
 > +	if(!capable(CAP_SYS_RESOURCE))
 > +		goto out;
 > +
 > +	error = -EINVAL;
 > +	if (resource >= BC_RESOURCES)
 > +		goto out;
 > +
 > +	error = -EFAULT;
 > +	if (copy_from_user(&new_limits, limits, sizeof(new_limits)))
 > +		goto out;
 
 > +int sys_get_bcstat(uid_t id, unsigned long resource,
 > +		struct bc_resource_parm *uparm)
 
 __user *uparm
 > +{
 > +	int error;
 > +	unsigned long flags;
 > +	struct beancounter *bc;
 > +	struct bc_resource_parm parm;
 > +
 > +	error = -EINVAL;
 > +	if (resource > BC_RESOURCES)
 > +		goto out;
 > +
 > +	error = -ENOENT;
 > +	bc = beancounter_findcreate(id, 0);
 > +	if (bc == NULL)
 > +		goto out;
 > +
 > +	spin_lock_irqsave(&bc->bc_lock, flags);
 > +	parm = bc->bc_parms[resource];
 > +	spin_unlock_irqrestore(&bc->bc_lock, flags);
 > +	put_beancounter(bc);
 > +
 > +	error = 0;
 > +	if (copy_to_user(uparm, &parm, sizeof(parm)))
 > +		error = -EFAULT;
 |  
	|  |  |  
	|  |  
	| 
		
			| Re: [PATCH 2/6] BC: beancounters core (API) [message #5566 is a reply to message #5544] | Wed, 23 August 2006 13:30   |  
			| 
				
				
					|  Alexey Dobriyan Messages: 195
 Registered: August 2006
 | Senior Member |  |  |  
	| On Wed, Aug 23, 2006 at 03:03:07PM +0400, Kirill Korotaev wrote: 
 > --- /dev/null
 > +++ ./include/bc/beancounter.h
 
 > +#define BC_RESOURCES	0
 
 Do you want userspace to see it?
 
 > +struct bc_resource_parm {
 > +	unsigned long barrier;	/* A barrier over which resource allocations
 > +				 * are failed gracefully. e.g. if the amount
 > +				 * of consumed memory is over the barrier
 > +				 * further sbrk() or mmap() calls fail, the
 > +				 * existing processes are not killed.
 > +				 */
 > +	unsigned long limit;	/* hard resource limit */
 > +	unsigned long held;	/* consumed resources */
 > +	unsigned long maxheld;	/* maximum amount of consumed resources */
 > +	unsigned long minheld;	/* minumum amount of consumed resources */
 
 Stupid question: when minimum amount is useful?
 
 > +/*
 > + * Kernel internal part.
 > + */
 
 Redundant comment, YMMV.
 
 > +#ifdef __KERNEL__
 > +
 > +#include <linux/config.h>
 
 config.h is unneeded. You can drop it from everywhere.
 
 > +struct beancounter
 > +{
 
 nit:
 "struct beancounter {"
 
 > +	atomic_t		bc_refcount;
 > +	spinlock_t		bc_lock;
 > +	uid_t			bc_id;
 > +	struct hlist_node	hash;
 > +
 > +	/* resources statistics and settings */
 > +	struct bc_resource_parm	bc_parms[BC_RESOURCES];
 > +};
 > +
 > +enum severity { BC_BARRIER, BC_LIMIT, BC_FORCE };
 
 bc_severity?
 
 > --- /dev/null	2006-07-18 14:52:43.075228448 +0400
 > +++ ./kernel/bc/beancounter.c	2006-08-21 13:13:11.000000000 +0400
 
 > +#define bc_hash_fun(x) ((((x) >> 8) ^ (x)) & (BC_HASH_SIZE - 1))
 > +
 > +struct hlist_head bc_hash[BC_HASH_SIZE];
 > +spinlock_t bc_hash_lock;
 > +
 > +EXPORT_SYMBOL(bc_hash);
 > +EXPORT_SYMBOL(bc_hash_lock);
 
 tasklist_lock was unexported recently and this looks equally low-level.
 I couldn't find place in patchbomb where you use it in modular fashion.
 perhaps, i need more sleep.
 
 > +void __put_beancounter(struct beancounter *bc)
 > +{
 > +	unsigned long flags;
 > +
 > +	/* equivalent to atomic_dec_and_lock_irqsave() */
 > +	local_irq_save(flags);
 > +	if (likely(!atomic_dec_and_lock(&bc->bc_refcount, &bc_hash_lock))) {
 > +		local_irq_restore(flags);
 > +		if (unlikely(atomic_read(&bc->bc_refcount) < 0))
 > +			printk(KERN_ERR "BC: Bad refcount: bc=%p, "
 > +					"luid=%d, ref=%d\n",
 > +					bc, bc->bc_id,
 > +					atomic_read(&bc->bc_refcount));
 
 Should this BUG_ON() ?
 
 > +		return;
 > +	}
 > +
 > +	BUG_ON(bc == &init_bc);
 > +	verify_held(bc);
 > +	hlist_del(&bc->hash);
 > +	spin_unlock_irqrestore(&bc_hash_lock, flags);
 > +	kmem_cache_free(bc_cachep, bc);
 > +}
 > +
 > +EXPORT_SYMBOL(__put_beancounter);
 
 > +int bc_charge_locked(struct beancounter *bc, int resource, unsigned long
 > val,
 > +		enum severity strict)
 > +{
 > +	/*
 > +	 * bc_value <= BC_MAXVALUE, value <= BC_MAXVALUE, and only one
 > addition
 > +	 * at the moment is possible so an overflow is impossible.
 > +	 */
 > +	bc->bc_parms[resource].held += val;
 > +
 > +	switch (strict) {
 > +		case BC_BARRIER:
 
 nit: one tab less
 |  
	|  |  |  
	| 
		
			| Re: [PATCH 2/6] BC: beancounters core (API) [message #5568 is a reply to message #5544] | Wed, 23 August 2006 16:42   |  
			| 
				
				
					|  Andrew Morton Messages: 127
 Registered: December 2005
 | Senior Member |  |  |  
	| On Wed, 23 Aug 2006 15:03:07 +0400 Kirill Korotaev <dev@sw.ru> wrote:
 
 > Core functionality and interfaces of BC:
 > find/create beancounter, initialization,
 > charge/uncharge of resource, core objects' declarations.
 >
 > Basic structures:
 >   bc_resource_parm - resource description
 >   beancounter      - set of resources, id, lock
 >
 >
 > ..
 >
 > +enum severity { BC_BARRIER, BC_LIMIT, BC_FORCE };
 
 That's a bit generic-sounding.  Make it bc_severity?
 
 > +static inline void bc_adjust_held_minmax(struct beancounter *bc,
 > +		int resource)
 > +{
 > +	if (bc->bc_parms[resource].maxheld < bc->bc_parms[resource].held)
 > +		bc->bc_parms[resource].maxheld = bc->bc_parms[resource].held;
 > +	if (bc->bc_parms[resource].minheld > bc->bc_parms[resource].held)
 > +		bc->bc_parms[resource].minheld = bc->bc_parms[resource].held;
 > +}
 
 That might be a bit big to inline.
 
 Suggest you check that the compiler successfully CSE's the
 `bc->bc_parms[resource]' evaulation.  If not, create a temporary.
 
 > +#define beancounter_findcreate(id, f)			(NULL)
 > +#define get_beancounter(bc)				(NULL)
 > +#define put_beancounter(bc)				do { } while (0)
 > +#define bc_charge_locked(bc, r, v, s)			(0)
 > +#define bc_charge(bc, r, v)				(0)
 
 akpm:/home/akpm> cat t.c
 void foo(void)
 {
 (0);
 }
 akpm:/home/akpm> gcc -c -Wall t.c
 t.c: In function 'foo':
 t.c:4: warning: statement with no effect
 
 > +#define bc_hash_fun(x) ((((x) >> 8) ^ (x)) & (BC_HASH_SIZE - 1))
 
 Use hash_long()?
 
 > +/*
 > + *	Per resource beancounting. Resources are tied to their luid.
 > + *	The resource structure itself is tagged both to the process and
 > + *	the charging resources (a socket doesn't want to have to search for
 > + *	things at irq time for example). Reference counters keep things in
 > + *	hand.
 > + *
 > + *	The case where a user creates resource, kills all his processes and
 > + *	then starts new ones is correctly handled this way. The refcounters
 > + *	will mean the old entry is still around with resource tied to it.
 > + */
 > +
 > +struct beancounter *beancounter_findcreate(uid_t uid, int mask)
 > +{
 > +	struct beancounter *new_bc, *bc;
 > +	unsigned long flags;
 > +	struct hlist_head *slot;
 > +	struct hlist_node *pos;
 > +
 > +	slot = &bc_hash[bc_hash_fun(uid)];
 > +	new_bc = NULL;
 > +
 > +retry:
 > +	spin_lock_irqsave(&bc_hash_lock, flags);
 > +	hlist_for_each_entry (bc, pos, slot, hash)
 > +		if (bc->bc_id == uid)
 > +			break;
 > +
 > +	if (pos != NULL) {
 > +		get_beancounter(bc);
 > +		spin_unlock_irqrestore(&bc_hash_lock, flags);
 > +
 > +		if (new_bc != NULL)
 > +			kmem_cache_free(bc_cachep, new_bc);
 > +		return bc;
 > +	}
 > +
 > +	if (!(mask & BC_ALLOC))
 > +		goto out_unlock;
 > +
 > +	if (new_bc != NULL)
 > +		goto out_install;
 > +
 > +	spin_unlock_irqrestore(&bc_hash_lock, flags);
 > +
 > +	new_bc = kmem_cache_alloc(bc_cachep,
 > +			mask & BC_ALLOC_ATOMIC ? GFP_ATOMIC : GFP_KERNEL);
 > +	if (new_bc == NULL)
 > +		goto out;
 > +
 > +	memcpy(new_bc, &default_beancounter, sizeof(*new_bc));
 > +	init_beancounter_struct(new_bc, uid);
 > +	goto retry;
 > +
 > +out_install:
 > +	hlist_add_head(&new_bc->hash, slot);
 > +out_unlock:
 > +	spin_unlock_irqrestore(&bc_hash_lock, flags);
 > +out:
 > +	return new_bc;
 > +}
 
 Can remove the global bc_hash_lock and make the locking per-hash-bucket.
 
 > +static inline void verify_held(struct beancounter *bc)
 > +{
 > +	int i;
 > +
 > +	for (i = 0; i < BC_RESOURCES; i++)
 > +		if (bc->bc_parms[i].held != 0)
 > +			bc_print_resource_warning(bc, i,
 > +					"resource is held on put", 0, 0);
 > +}
 > +
 > +void __put_beancounter(struct beancounter *bc)
 > +{
 > +	unsigned long flags;
 > +
 > +	/* equivalent to atomic_dec_and_lock_irqsave() */
 > +	local_irq_save(flags);
 > +	if (likely(!atomic_dec_and_lock(&bc->bc_refcount, &bc_hash_lock))) {
 > +		local_irq_restore(flags);
 > +		if (unlikely(atomic_read(&bc->bc_refcount) < 0))
 > +			printk(KERN_ERR "BC: Bad refcount: bc=%p, "
 > +					"luid=%d, ref=%d\n",
 > +					bc, bc->bc_id,
 > +					atomic_read(&bc->bc_refcount));
 > +		return;
 > +	}
 > +
 > +	BUG_ON(bc == &init_bc);
 > +	verify_held(bc);
 > +	hlist_del(&bc->hash);
 > +	spin_unlock_irqrestore(&bc_hash_lock, flags);
 > +	kmem_cache_free(bc_cachep, bc);
 > +}
 
 I wonder if it's safe and worthwhile to optimise away the local_irq_save():
 
 if (atomic_dec_and_test(&bc->bc_refcount)) {
 spin_lock_irqsave(&bc_hash_lock, flags);
 if (atomic_read(&bc->bc_refcount) == 0) {
 free it
 |  
	|  |  |  
	|  |  
	| 
		
			| Re: [PATCH] BC: resource beancounters (v2) [message #5570 is a reply to message #5542] | Wed, 23 August 2006 17:05   |  
			| 
				
				
					|  Andrew Morton Messages: 127
 Registered: December 2005
 | Senior Member |  |  |  
	| On Wed, 23 Aug 2006 14:46:19 +0400 Kirill Korotaev <dev@sw.ru> wrote:
 
 > The following patch set presents base of
 > Resource Beancounters (BC).
 > BC allows to account and control consumption
 > of kernel resources used by group of processes.
 >
 > Draft UBC description on OpenVZ wiki can be found at
 > http://wiki.openvz.org/UBC_parameters
 >
 > The full BC patch set allows to control:
 > - kernel memory. All the kernel objects allocatable
 >  on user demand should be accounted and limited
 >  for DoS protection.
 >  E.g. page tables, task structs, vmas etc.
 >
 > - virtual memory pages. BCs allow to
 >  limit a container to some amount of memory and
 >  introduces 2-level OOM killer taking into account
 >  container's consumption.
 >  pages shared between containers are correctly
 >  charged as fractions (tunable).
 >
 > - network buffers. These includes TCP/IP rcv/snd
 >  buffers, dgram snd buffers, unix, netlinks and
 >  other buffers.
 >
 > - minor resources accounted/limited by number:
 >  tasks, files, flocks, ptys, siginfo, pinned dcache
 >  mem, sockets, iptentries (for containers with
 >  virtualized networking)
 >
 > As the first step we want to propose for discussion
 > the most complicated parts of resource management:
 > kernel memory and virtual memory.
 
 The patches look reasonable to me - mergeable after updating them for
 today's batch of review commentlets.
 
 I have two high-level problems though.
 
 a) I don't yet have a sense of whether this implementation
 is appropriate/sufficient for the various other
 applications which people are working on.
 
 If the general shape is OK and we think this
 implementation can be grown into one which everyone can
 use then fine.
 
 And...
 
 > The patch set to be sent provides core for BC and
 > management of kernel memory only. Virtual memory
 > management will be sent in a couple of days.
 
 We need to go over this work before we can commit to the BC
 core.  Last time I looked at the VM accounting patch it
 seemed rather unpleasing from a maintainability POV.
 
 And, if I understand it correctly, the only response to a job
 going over its VM limits is to kill it, rather than trimming
 it.  Which sounds like a big problem?
 |  
	|  |  |  
	|  |  
	|  |  
	|  |  
	| 
		
			| Re:  [PATCH 1/6] BC: kconfig [message #5579 is a reply to message #5543] | Wed, 23 August 2006 22:04   |  
			| 
				
				
					|  Dave Hansen Messages: 240
 Registered: October 2005
 | Senior Member |  |  |  
	| On Wed, 2006-08-23 at 15:01 +0400, Kirill Korotaev wrote: > --- ./arch/sparc64/Kconfig.arkcfg	2006-07-17 17:01:11.000000000 +0400
 > +++ ./arch/sparc64/Kconfig	2006-08-10 17:56:36.000000000 +0400
 > @@ -432,3 +432,5 @@ source "security/Kconfig"
 >  source "crypto/Kconfig"
 >
 >  source "lib/Kconfig"
 > +
 > +source "kernel/bc/Kconfig"
 ...
 > --- ./arch/sparc64/Kconfig.arkcfg	2006-07-17 17:01:11.000000000 +0400
 > +++ ./arch/sparc64/Kconfig	2006-08-10 17:56:36.000000000 +0400
 > @@ -432,3 +432,5 @@ source "security/Kconfig"
 >  source "crypto/Kconfig"
 >
 >  source "lib/Kconfig"
 > +
 > +source "kernel/bc/Kconfig"
 
 Is it just me, or do these patches look a little funky?  Looks like it
 is trying to patch the same thing into the same file, twice.  Also, the
 patches look to be -p0 instead of -p1.
 
 I'm having a few problems applying them.
 
 -- Dave
 |  
	|  |  |  
	| 
		
			| Re:  [PATCH 1/6] BC: kconfig [message #5580 is a reply to message #5543] | Wed, 23 August 2006 22:29   |  
			| 
				
				
					|  rdunlap Messages: 11
 Registered: May 2006
 | Junior Member |  |  |  
	| On Wed, 23 Aug 2006 15:13:42 -0700 Matt Helsley wrote: 
 > On Wed, 2006-08-23 at 15:04 -0700, Dave Hansen wrote:
 > > On Wed, 2006-08-23 at 15:01 +0400, Kirill Korotaev wrote:
 > > > --- ./arch/sparc64/Kconfig.arkcfg	2006-07-17 17:01:11.000000000 +0400
 > > > +++ ./arch/sparc64/Kconfig	2006-08-10 17:56:36.000000000 +0400
 > > > @@ -432,3 +432,5 @@ source "security/Kconfig"
 > > >  source "crypto/Kconfig"
 > > >
 > > >  source "lib/Kconfig"
 > > > +
 > > > +source "kernel/bc/Kconfig"
 > > ...
 > > > --- ./arch/sparc64/Kconfig.arkcfg	2006-07-17 17:01:11.000000000 +0400
 > > > +++ ./arch/sparc64/Kconfig	2006-08-10 17:56:36.000000000 +0400
 > > > @@ -432,3 +432,5 @@ source "security/Kconfig"
 > > >  source "crypto/Kconfig"
 > > >
 > > >  source "lib/Kconfig"
 > > > +
 > > > +source "kernel/bc/Kconfig"
 > >
 > > Is it just me, or do these patches look a little funky?  Looks like it
 > > is trying to patch the same thing into the same file, twice.  Also, the
 > > patches look to be -p0 instead of -p1.
 >
 > They do appear to be -p0
 >
 > 	They aren't adding the same thing twice to the same file. This patch
 > makes different arches source the same Kconfig.
 
 Look again.  There are 2 diffstat blocks and 2 of these at least:
 
 --- ./kernel/bc/Kconfig.bckm	2006-07-28 13:07:38.000000000 +0400
 +++ ./kernel/bc/Kconfig	2006-07-28 13:09:51.000000000 +0400
 @@ -0,0 +1,25 @@
 
 
 > 	I seem to recall Chandra suggested that instead of doing it this way it
 > would be more appropriate to add the source line to init/Kconfig because
 > it's more central and arch-independent. I tend to agree.
 
 ---
 ~Randy
 |  
	|  |  |  
	| 
		
			| Re:  [PATCH 1/6] BC: kconfig [message #5581 is a reply to message #5580] | Wed, 23 August 2006 22:27   |  
			| 
				
				
					|  Matt Helsley Messages: 86
 Registered: August 2006
 | Member |  |  |  
	| On Wed, 2006-08-23 at 15:32 -0700, Randy.Dunlap wrote: > On Wed, 23 Aug 2006 15:13:42 -0700 Matt Helsley wrote:
 >
 > > On Wed, 2006-08-23 at 15:04 -0700, Dave Hansen wrote:
 > > > On Wed, 2006-08-23 at 15:01 +0400, Kirill Korotaev wrote:
 > > > > --- ./arch/sparc64/Kconfig.arkcfg	2006-07-17 17:01:11.000000000 +0400
 > > > > +++ ./arch/sparc64/Kconfig	2006-08-10 17:56:36.000000000 +0400
 > > > > @@ -432,3 +432,5 @@ source "security/Kconfig"
 > > > >  source "crypto/Kconfig"
 > > > >
 > > > >  source "lib/Kconfig"
 > > > > +
 > > > > +source "kernel/bc/Kconfig"
 > > > ...
 > > > > --- ./arch/sparc64/Kconfig.arkcfg	2006-07-17 17:01:11.000000000 +0400
 > > > > +++ ./arch/sparc64/Kconfig	2006-08-10 17:56:36.000000000 +0400
 > > > > @@ -432,3 +432,5 @@ source "security/Kconfig"
 > > > >  source "crypto/Kconfig"
 > > > >
 > > > >  source "lib/Kconfig"
 > > > > +
 > > > > +source "kernel/bc/Kconfig"
 > > >
 > > > Is it just me, or do these patches look a little funky?  Looks like it
 > > > is trying to patch the same thing into the same file, twice.  Also, the
 > > > patches look to be -p0 instead of -p1.
 > >
 > > They do appear to be -p0
 > >
 > > 	They aren't adding the same thing twice to the same file. This patch
 > > makes different arches source the same Kconfig.
 >
 > Look again.  There are 2 diffstat blocks and 2 of these at least:
 >
 > --- ./kernel/bc/Kconfig.bckm	2006-07-28 13:07:38.000000000 +0400
 > +++ ./kernel/bc/Kconfig	2006-07-28 13:09:51.000000000 +0400
 > @@ -0,0 +1,25 @@
 >
 >
 > > 	I seem to recall Chandra suggested that instead of doing it this way it
 > > would be more appropriate to add the source line to init/Kconfig because
 > > it's more central and arch-independent. I tend to agree.
 >
 > ---
 > ~Randy
 
 My mistake, Dave is right.
 
 Cheers,
 -Matt Helsley
 |  
	|  |  |  
	| 
		
			| Re:  [PATCH 1/6] BC: kconfig [message #5582 is a reply to message #5579] | Wed, 23 August 2006 22:13   |  
			| 
				
				
					|  Matt Helsley Messages: 86
 Registered: August 2006
 | Member |  |  |  
	| On Wed, 2006-08-23 at 15:04 -0700, Dave Hansen wrote: > On Wed, 2006-08-23 at 15:01 +0400, Kirill Korotaev wrote:
 > > --- ./arch/sparc64/Kconfig.arkcfg	2006-07-17 17:01:11.000000000 +0400
 > > +++ ./arch/sparc64/Kconfig	2006-08-10 17:56:36.000000000 +0400
 > > @@ -432,3 +432,5 @@ source "security/Kconfig"
 > >  source "crypto/Kconfig"
 > >
 > >  source "lib/Kconfig"
 > > +
 > > +source "kernel/bc/Kconfig"
 > ...
 > > --- ./arch/sparc64/Kconfig.arkcfg	2006-07-17 17:01:11.000000000 +0400
 > > +++ ./arch/sparc64/Kconfig	2006-08-10 17:56:36.000000000 +0400
 > > @@ -432,3 +432,5 @@ source "security/Kconfig"
 > >  source "crypto/Kconfig"
 > >
 > >  source "lib/Kconfig"
 > > +
 > > +source "kernel/bc/Kconfig"
 >
 > Is it just me, or do these patches look a little funky?  Looks like it
 > is trying to patch the same thing into the same file, twice.  Also, the
 > patches look to be -p0 instead of -p1.
 
 They do appear to be -p0
 
 They aren't adding the same thing twice to the same file. This patch
 makes different arches source the same Kconfig.
 
 I seem to recall Chandra suggested that instead of doing it this way it
 would be more appropriate to add the source line to init/Kconfig because
 it's more central and arch-independent. I tend to agree.
 
 Cheers,
 -Matt Helsley
 |  
	|  |  |  
	| 
		
			| Re:  [PATCH 6/6] BC: kernel memory accounting (marks) [message #5583 is a reply to message #5548] | Wed, 23 August 2006 23:03   |  
			| 
				
				
					|  Dave Hansen Messages: 240
 Registered: October 2005
 | Senior Member |  |  |  
	| On Wed, 2006-08-23 at 15:08 +0400, Kirill Korotaev wrote: >  include/asm-i386/thread_info.h   |    4 ++--
 >  include/asm-ia64/pgalloc.h       |   24 +++++++++++++++++-------
 >  include/asm-x86_64/pgalloc.h     |   12 ++++++++----
 >  include/asm-x86_64/thread_info.h |    5 +++--
 
 Do you think we need to cover a few more architectures before
 considering merging this, or should we just fix them up as we need them?
 
 I'm working on a patch to unify as many of the alloc_thread_info()
 functions as I can.  That should at least give you one place to modify
 and track the thread_info allocations.  I've only compiled for x86_64
 and i386, but I'm working on more.  A preliminary version is attached.
 
 -- Dave
 
 ---
 
 clean-dave/include/asm-alpha/thread_info.h   |    5 ---
 clean-dave/include/asm-frv/thread_info.h     |   17 ----------
 clean-dave/include/asm-h8300/thread_info.h   |    8 +----
 clean-dave/include/asm-i386/thread_info.h    |   18 -----------
 clean-dave/include/asm-m32r/thread_info.h    |   17 ----------
 clean-dave/include/asm-m68k/thread_info.h    |    9 -----
 clean-dave/include/asm-powerpc/thread_info.h |   27 -----------------
 clean-dave/include/linux/thread_alloc.h      |   42 +++++++++++++++++++++++++++
 clean-dave/kernel/fork.c                     |    1
 9 files changed, 47 insertions(+), 97 deletions(-)
 
 diff -puN arch/arm/kernel/process.c~unify-alloc-thread-info arch/arm/kernel/process.c
 diff -puN include/asm-alpha/thread_info.h~unify-alloc-thread-info include/asm-alpha/thread_info.h
 ---  clean/include/asm-alpha/thread_info.h~unify-alloc-thread-inf o	2006-08-23 15:38:37.000000000 -0700
 +++ clean-dave/include/asm-alpha/thread_info.h	2006-08-23 15:40:23.000000000 -0700
 @@ -50,10 +50,7 @@ register struct thread_info *__current_t
 #define current_thread_info()  __current_thread_info
 
 /* Thread information allocation.  */
 -#define THREAD_SIZE (2*PAGE_SIZE)
 -#define alloc_thread_info(tsk) \
 -  ((struct thread_info *) __get_free_pages(GFP_KERNEL,1))
 -#define free_thread_info(ti) free_pages((unsigned long) (ti), 1)
 +#define THREAD_SHIFT (PAGE_SHIFT+1)
 
 #endif /* __ASSEMBLY__ */
 
 diff -puN include/asm-frv/thread_info.h~unify-alloc-thread-info include/asm-frv/thread_info.h
 --- clean/include/asm-frv/thread_info.h~unify-alloc-thread-info	2006-08-23 15:40:26.000000000 -0700
 +++ clean-dave/include/asm-frv/thread_info.h	2006-08-23 15:40:40.000000000 -0700
 @@ -82,23 +82,6 @@ register struct thread_info *__current_t
 
 #define current_thread_info() ({ __current_thread_info; })
 
 -/* thread information allocation */
 -#ifdef CONFIG_DEBUG_STACK_USAGE
 -#define alloc_thread_info(tsk)					\
 -	({							\
 -		struct thread_info *ret;			\
 -								\
 -		ret = kmalloc(THREAD_SIZE, GFP_KERNEL);		\
 -		if (ret)					\
 -			memset(ret, 0, THREAD_SIZE);		\
 -		ret;						\
 -	})
 -#else
 -#define alloc_thread_info(tsk)	kmalloc(THREAD_SIZE, GFP_KERNEL)
 -#endif
 -
 -#define free_thread_info(info)	kfree(info)
 -
 #endif /* __ASSEMBLY__ */
 
 /*
 diff -puN include/asm-h8300/thread_info.h~unify-alloc-thread-info include/asm-h8300/thread_info.h
 ---  clean/include/asm-h8300/thread_info.h~unify-alloc-thread-inf o	2006-08-23 15:40:43.000000000 -0700
 +++ clean-dave/include/asm-h8300/thread_info.h	2006-08-23 15:41:54.000000000 -0700
 @@ -49,8 +49,8 @@ struct thread_info {
 /*
 * Size of kernel stack for each process. This must be a power of 2...
 */
 -#define THREAD_SIZE		8192	/* 2 pages */
 -
 +#define THREAD_SHIFT	1
 +#define THREAD_SIZE	(1<<THREAD_SHIFT)
 
 /* how to get the thread information struct from C */
 static inline struct thread_info *current_thread_info(void)
 @@ -65,10 +65,6 @@ static inline struct thread_info *curren
 return ti;
 }
 
 -/* thread information allocation */
 -#define alloc_thread_info(tsk) ((struct thread_info *) \
 -				__get_free_pages(GFP_KERNEL, 1))
 -#define free_thread_info(ti)	free_pages((unsigned long) (ti), 1)
 #endif /* __ASSEMBLY__ */
 
 /*
 diff -puN include/asm-i386/thread_info.h~unify-alloc-thread-info include/asm-i386/thread_info.h
 ---  clean/include/asm-i386/thread_info.h~unify-alloc-thread-info	2006-08-23 15:19:27.000000000 -0700
 +++ clean-dave/include/asm-i386/thread_info.h	2006-08-23 15:33:30.000000000 -0700
 @@ -92,24 +92,6 @@ static inline struct thread_info *curren
 {
 return (struct thread_info *)(current_stack_pointer & ~(THREAD_SIZE - 1));
 }
 -
 -/* thread information allocation */
 -#ifdef CONFIG_DEBUG_STACK_USAGE
 -#define alloc_thread_info(tsk)					\
 -	({							\
 -		struct thread_info *ret;			\
 -								\
 -		ret = kmalloc(THREAD_SIZE, GFP_KERNEL);		\
 -		if (ret)					\
 -			memset(ret, 0, THREAD_SIZE);		\
 -		ret;						\
 -	})
 -#else
 -#define alloc_thread_info(tsk) kmalloc(THREAD_SIZE, GFP_KERNEL)
 -#endif
 -
 -#define free_thread_info(info)	kfree(info)
 -
 #else /* !__ASSEMBLY__ */
 
 /* how to get the thread information struct from ASM */
 diff -puN include/asm-ia64/thread_info.h~unify-alloc-thread-info include/asm-ia64/thread_info.h
 diff -puN include/asm-m32r/thread_info.h~unify-alloc-thread-info include/asm-m32r/thread_info.h
 ---  clean/include/asm-m32r/thread_info.h~unify-alloc-thread-info	2006-08-23 15:44:38.000000000 -0700
 +++ clean-dave/include/asm-m32r/thread_info.h	2006-08-23 15:44:51.000000000 -0700
 @@ -94,23 +94,6 @@ static inline struct thread_info *curren
 return ti;
 }
 
 -/* thread information allocation */
 -#ifdef CONFIG_DEBUG_STACK_USAGE
 -#define alloc_thread_info(tsk)					\
 -	({							\
 -		struct thread_info *ret;			\
 -	 							\
 -	 	ret = kmalloc(THREAD_SIZE, GFP_KERNEL);		\
 -	 	if (ret)					\
 -	 		memset(ret, 0, THREAD_SIZE);		\
 -	 	ret;						\
 -	 })
 -#else
 -#define alloc_thread_info(tsk) kmalloc(THREAD_SIZE, GFP_KERNEL)
 -#endif
 -
 -#define free_thread_info(info) kfree(info)
 -
 #define TI_FLAG_FAULT_CODE_SHIFT	28
 
 static inline void set_thread_fault_code(unsigned int val)
 diff -puN include/asm-m68k/thread_info.h~unify-alloc-thread-info include/asm-m68k/thread_info.h
 ---  clean/include/asm-m68k/thread_info.h~unify-alloc-thread-info	2006-08-23 15:44:52.000000000 -0700
 +++ clean-dave/include/asm-m68k/thread_info.h	2006-08-23 15:45:32.000000000 -0700
 @@ -24,14 +24,7 @@ struct thread_info {
 },					\
 }
 
 -/* THREAD_SIZE should be 8k, so handle differently for 4k and 8k machines */
 -#if PAGE_SHIFT == 13 /* 8k machines */
 -#define alloc_thread_info(tsk)   ((struct thread_info *)__get_free_pages(GFP_KERNEL,0))
 -#define free_thread_info(ti)  free_pages((unsigned long)(ti),0)
 -#else /* otherwise assume 4k pages */
 -#define alloc_thread_info(tsk)   ((struct thread_info *)__get_free_pages(GFP_KERNEL,1))
 -#define free_thread_info(ti)  free_pages((unsigned long)(ti),1)
 -#endif /* PAGE_SHIFT == 13 */
 +#define THREAD_SHIFT	1
 
 #define init_thread_info	(init_task.thread.info)
 #define init_stack		(init_thread_union.stack)
 diff -puN include/asm-powerpc/thread_info.h~unify-alloc-thread-info include/asm-powerpc/thread_info.h
 ---  clean/include/asm-powerpc/thread_info.h~unify-alloc-thread-i nfo	2006-08-07 12:21:11.000000000 -0700
 +++ clean-dave/include/asm-powerpc/thread_info.h	2006-08-23 15:48:09.000000000 -0700
 @@ -62,33 +62,6 @@ struct thread_info {
 #define init_thread_info	(init_thread_union.thread_info)
 #define init_stack		(init_thread_union.stack)
 
 -/* thread information allocation */
 -
 -#if THREAD_SHIFT >= PAGE_SHIFT
 -
 -#define THREAD_ORDER	(THREAD_SHIFT - PAGE_SHIFT)
 -
 -#ifdef CONFIG_DEBUG_STACK_USAGE
 -#define alloc_thread_info(tsk)	\
 -	((struct thread_info *)__get_free_pages(GFP_KERNEL | \
 -		__GFP_ZERO, THREAD_ORDER))
 -#else
 -#define alloc_thread_info(tsk)	\
 -	((struct thread_info *)__get_free_pages(GFP_KERNEL, THREAD_ORDER))
 -#endif
 -#define free_thread_info(ti)	free_pages((unsigned long)ti, THREAD_ORDER)
 -
 -#else /* THREAD_SHIFT < PAGE_SHIFT */
 -
 -#ifdef CONFIG_DEBUG_STACK_USAGE
 -#define alloc_thread_info(tsk)	kzalloc(THREAD_SIZE, GFP_KERNEL)
 -#else
 -#define alloc_thread_info(tsk)	kmalloc(THREAD_SIZE, GFP_KERNEL)
 -#endif
 -#define free_thread_info(ti)	kfree(ti)
 -
 -#endif /* THREAD_SHIFT < PAGE_SHIFT */
 -
 /* how to get the thread information struct from C */
 static inline struct thread_info *current_thread_info(void)
 {
 diff -puN /dev/null include/linux/thread_alloc.h
 --- /dev/null	2005-03-30 22:36:15.000000000 -0800
 +++ clean-dave/include/linux/thread_alloc.h	2006-08-23 16:00:41.000000000 -0700
 @@ -0,0 +1,42 @@
 +#ifndef _LINUX_THREAD_ALLOC
 +#define _LINUX_THREAD_ALLOC
 +
 +#ifndef THREAD_SHIFT
 +#define THREAD_SHIFT PAGE_SHIFT
 +#endif
 +#ifndef THREAD_ORDER
 +#define THREAD_ORDER    (THREAD_SHIFT - PAGE_SHIFT)
 +#endif
 +
 +struct thread_info;
 +struct task;
 +
 +#if THREAD_SHIFT >= PAGE_SHIFT
 +static inline struct thread_info *alloc_thread_info(struct task_struct *tsk)
 +{
 +	gfp_t flags = GFP_KERNEL;
 +#ifdef CONFIG_DEBUG_STACK_USAGE
 +	flags |= __GFP_ZERO;
 +#endif
 +	return (struct thread_info *)__get_free_pages(flags, THREAD_ORDER);
 +}
 +static inline void free_thread_info(struct thread_info *ti)
 +{
 +	free_pages((unsigned long)ti, THREAD_ORDER);
 +}
 +#else /* THREAD_SHIFT < PAGE_SHIFT */
 +static inline struct thread_info *alloc_thread_info(struct task_struct *tsk)
 +{
 +#ifdef CONFIG_DEBUG_STACK_USAGE
 +	return kzalloc(THREAD_SIZE, GFP_KERNEL);
 +#else
 +	return kmalloc(THREAD_SIZE, GFP_KERNEL);
 +#endif
 +}
 +static inline void free_thread_info(struct thread_info *ti)
 +{
 +	kfree(ti);
 +}
 +#endif /* THREAD_SHIFT < PAGE_SHIFT */
 +
 +#endif /* _LINUX_THREAD_ALLOC */
 diff -puN include/linux/thread_info.h~unify-alloc-thread-info include/linux/thread_info.h
 diff -puN kernel/fork.c~unify-alloc-thread-info kernel/fork.c
 --- clean/kernel/fork.c~unify-alloc-thread-info	2006-08-23 15:19:28.000000000 -0700
 +++ clean-dave/kernel/fork.c	2006-08-23 15:47:35.000000000 -0700
 @@ -45,6 +45,7 @@
 #include <linux/cn_proc.h>
 #include <linux/delayacct.h>
 #include <linux/taskstats_kern.h>
 +#include <linux/thread_alloc.h>
 
 #include <asm/pgtable.h>
 #include <asm/pgalloc.h>
 _
...
 
 
 |  
	|  |  |  
	| 
		
			| Re: [PATCH 2/6] BC: beancounters core (API) [message #5584 is a reply to message #5558] | Wed, 23 August 2006 22:05   |  
			| 
				
				
					|  Matt Helsley Messages: 86
 Registered: August 2006
 | Member |  |  |  
	| On Wed, 2006-08-23 at 17:49 +0400, Kirill Korotaev wrote: > Alexey Dobriyan wrote:
 > > On Wed, Aug 23, 2006 at 03:03:07PM +0400, Kirill Korotaev wrote:
 > >
 > >
 > >>--- /dev/null
 > >>+++ ./include/bc/beancounter.h
 > >
 > >
 > >>+#define BC_RESOURCES	0
 > >
 > >
 > > Do you want userspace to see it?
 > yep.
 >
 > >>+struct bc_resource_parm {
 > >>+	unsigned long barrier;	/* A barrier over which resource allocations
 > >>+				 * are failed gracefully. e.g. if the amount
 > >>+				 * of consumed memory is over the barrier
 > >>+				 * further sbrk() or mmap() calls fail, the
 > >>+				 * existing processes are not killed.
 > >>+				 */
 > >>+	unsigned long limit;	/* hard resource limit */
 > >>+	unsigned long held;	/* consumed resources */
 > >>+	unsigned long maxheld;	/* maximum amount of consumed resources */
 > >>+	unsigned long minheld;	/* minumum amount of consumed resources */
 > >
 > >
 > > Stupid question: when minimum amount is useful?
 > to monitor usage statistics (range of used resources).
 > this value will be usefull when ubstat will be added.
 > this field probably would be more logical to add later,
 > but since it is part of user space interface it is left here
 > for not changing API later.
 
 Then I think it belongs in a separate patch. Add it and the scattered
 bits and pieces that use it with the same patch. Then folks can clearly
 see what it's for, where it impacts the code, and how it works. Yes,
 factoring it out causes the API to evolve over the course of applying
 the patch series -- IMHO that evolution is useful information to convey
 to reviewers too.
 
 Cheers,
 -Matt Helsley
 |  
	|  |  |  
	| 
		
			| Re: [PATCH] BC: resource beancounters (v2) [message #5586 is a reply to message #5570] | Thu, 24 August 2006 00:17   |  
			| 
				
				
					|  Chandra Seetharaman Messages: 88
 Registered: August 2006
 | Member |  |  |  
	| On Wed, 2006-08-23 at 10:05 -0700, Andrew Morton wrote: > On Wed, 23 Aug 2006 14:46:19 +0400
 > Kirill Korotaev <dev@sw.ru> wrote:
 >
 > > The following patch set presents base of
 > > Resource Beancounters (BC).
 > > BC allows to account and control consumption
 > > of kernel resources used by group of processes.
 > >
 > > Draft UBC description on OpenVZ wiki can be found at
 > > http://wiki.openvz.org/UBC_parameters
 > >
 > > The full BC patch set allows to control:
 > > - kernel memory. All the kernel objects allocatable
 > >  on user demand should be accounted and limited
 > >  for DoS protection.
 > >  E.g. page tables, task structs, vmas etc.
 > >
 > > - virtual memory pages. BCs allow to
 > >  limit a container to some amount of memory and
 > >  introduces 2-level OOM killer taking into account
 > >  container's consumption.
 > >  pages shared between containers are correctly
 > >  charged as fractions (tunable).
 > >
 > > - network buffers. These includes TCP/IP rcv/snd
 > >  buffers, dgram snd buffers, unix, netlinks and
 > >  other buffers.
 > >
 > > - minor resources accounted/limited by number:
 > >  tasks, files, flocks, ptys, siginfo, pinned dcache
 > >  mem, sockets, iptentries (for containers with
 > >  virtualized networking)
 > >
 > > As the first step we want to propose for discussion
 > > the most complicated parts of resource management:
 > > kernel memory and virtual memory.
 >
 > The patches look reasonable to me - mergeable after updating them for
 > today's batch of review commentlets.
 
 If you are considering this infrastructure for generic resource
 management, I have few concerns:
 - There is no CPU controller under this framework
 - There is no I/O controller under this framework
 - Minimum of 3 parameters need to be used to manage memory.
 (in other words, usage is not simple. In order to provide a minimum
 guarantee of a resource, one needs to define a new parameter)
 
 >
 > I have two high-level problems though.
 >
 > a) I don't yet have a sense of whether this implementation
 >    is appropriate/sufficient for the various other
 >    applications which people are working on.
 >
 >    If the general shape is OK and we think this
 >    implementation can be grown into one which everyone can
 >    use then fine.
 
 Here are some of other infrastructure related issues I have raised.
 http://marc.theaimsgroup.com/?l=ckrm-tech&m=115593001810 616&w=2
 
 >
 > And...
 >
 > > The patch set to be sent provides core for BC and
 > > management of kernel memory only. Virtual memory
 > > management will be sent in a couple of days.
 >
 > We need to go over this work before we can commit to the BC
 > core.  Last time I looked at the VM accounting patch it
 > seemed rather unpleasing from a maintainability POV.
 >
 > And, if I understand it correctly, the only response to a job
 > going over its VM limits is to kill it, rather than trimming
 > it.  Which sounds like a big problem?
 
 Yes, it does.
 
 IMHO (as mentioned in a different email), a group with a resource
 constraint should behave no different than a kernel with a specified
 amount of memory. i.e it should do reclamation before it starts failing
 allocation requests. It could even do it preemptively.
 >
 --
 
 ------------------------------------------------------------ ----------
 Chandra Seetharaman               | Be careful what you choose....
 - sekharan@us.ibm.com   |      .......you may get it.
 ------------------------------------------------------------ ----------
 |  
	|  |  |  
	| 
		
			| Re: [PATCH 1/6] BC: kconfig [message #5587 is a reply to message #5543] | Thu, 24 August 2006 00:23   |  
			| 
				
				
					|  Chandra Seetharaman Messages: 88
 Registered: August 2006
 | Member |  |  |  
	| Is there a reason why these can be moved to a arch-neutral place ? 
 PS: Please keep ckrm-tech on Cc: please.
 
 On Wed, 2006-08-23 at 15:01 +0400, Kirill Korotaev wrote:
 > Add kernel/bc/Kconfig file with BC options and
 > include it into arch Kconfigs
 >
 > Signed-off-by: Pavel Emelianov <xemul@sw.ru>
 > Signed-off-by: Kirill Korotaev <dev@sw.ru>
 >
 > ---
 >
 >  arch/i386/Kconfig    |    2 ++
 >  arch/ia64/Kconfig    |    2 ++
 >  arch/powerpc/Kconfig |    2 ++
 >  arch/ppc/Kconfig     |    2 ++
 >  arch/sparc/Kconfig   |    2 ++
 >  arch/sparc64/Kconfig |    2 ++
 >  arch/x86_64/Kconfig  |    2 ++
 >  kernel/bc/Kconfig    |   25 +++++++++++++++++++++++++
 >  8 files changed, 39 insertions(+)
 >
 > --- ./arch/i386/Kconfig.bckm	2006-07-10 12:39:10.000000000 +0400
 > +++ ./arch/i386/Kconfig	2006-07-28 14:10:41.000000000 +0400
 > @@ -1146,6 +1146,8 @@ source "crypto/Kconfig"
 >
 >  source "lib/Kconfig"
 >
 > +source "kernel/bc/Kconfig"
 > +
 >  #
 >  # Use the generic interrupt handling code in kernel/irq/:
 >  #
 > --- ./arch/ia64/Kconfig.bckm	2006-07-10 12:39:10.000000000 +0400
 > +++ ./arch/ia64/Kconfig	2006-07-28 14:10:56.000000000 +0400
 > @@ -481,6 +481,8 @@ source "fs/Kconfig"
 >
 >  source "lib/Kconfig"
 >
 > +source "kernel/bc/Kconfig"
 > +
 >  #
 >  # Use the generic interrupt handling code in kernel/irq/:
 >  #Add kernel/bc/Kconfig file with BC options and
 > include it into arch Kconfigs
 >
 > Signed-off-by: Pavel Emelianov <xemul@sw.ru>
 > Signed-off-by: Kirill Korotaev <dev@sw.ru>
 >
 > ---
 >
 >  arch/i386/Kconfig    |    2 ++
 >  arch/ia64/Kconfig    |    2 ++
 >  arch/powerpc/Kconfig |    2 ++
 >  arch/ppc/Kconfig     |    2 ++
 >  arch/sparc/Kconfig   |    2 ++
 >  arch/sparc64/Kconfig |    2 ++
 >  arch/x86_64/Kconfig  |    2 ++
 >  kernel/bc/Kconfig    |   25 +++++++++++++++++++++++++
 >  8 files changed, 39 insertions(+)
 >
 > --- ./arch/i386/Kconfig.bckm	2006-07-10 12:39:10.000000000 +0400
 > +++ ./arch/i386/Kconfig	2006-07-28 14:10:41.000000000 +0400
 > @@ -1146,6 +1146,8 @@ source "crypto/Kconfig"
 >
 >  source "lib/Kconfig"
 >
 > +source "kernel/bc/Kconfig"
 > +
 >  #
 >  # Use the generic interrupt handling code in kernel/irq/:
 >  #
 > --- ./arch/ia64/Kconfig.bckm	2006-07-10 12:39:10.000000000 +0400
 > +++ ./arch/ia64/Kconfig	2006-07-28 14:10:56.000000000 +0400
 > @@ -481,6 +481,8 @@ source "fs/Kconfig"
 >
 >  source "lib/Kconfig"
 >
 > +source "kernel/bc/Kconfig"
 > +
 >  #
 >  # Use the generic interrupt handling code in kernel/irq/:
 >  #
 > --- ./arch/powerpc/Kconfig.arkcfg	2006-08-07 14:07:12.000000000 +0400
 > +++ ./arch/powerpc/Kconfig	2006-08-10 17:55:58.000000000 +0400
 > @@ -1038,6 +1038,8 @@ source "arch/powerpc/platforms/iseries/K
 >
 >  source "lib/Kconfig"
 >
 > +source "kernel/bc/Kconfig"
 > +
 >  menu "Instrumentation Support"
 >          depends on EXPERIMENTAL
 >
 > --- ./arch/ppc/Kconfig.arkcfg	2006-07-10 12:39:10.000000000 +0400
 > +++ ./arch/ppc/Kconfig	2006-08-10 17:56:13.000000000 +0400
 > @@ -1414,6 +1414,8 @@ endmenu
 >
 >  source "lib/Kconfig"
 >
 > +source "kernel/bc/Kconfig"
 > +
 >  source "arch/powerpc/oprofile/Kconfig"
 >
 >  source "arch/ppc/Kconfig.debug"
 > --- ./arch/sparc/Kconfig.arkcfg	2006-04-21 11:59:32.000000000 +0400
 > +++ ./arch/sparc/Kconfig	2006-08-10 17:56:24.000000000 +0400
 > @@ -296,3 +296,5 @@ source "security/Kconfig"
 >  source "crypto/Kconfig"
 >
 >  source "lib/Kconfig"
 > +
 > +source "kernel/bc/Kconfig"
 > --- ./arch/sparc64/Kconfig.arkcfg	2006-07-17 17:01:11.000000000 +0400
 > +++ ./arch/sparc64/Kconfig	2006-08-10 17:56:36.000000000 +0400
 > @@ -432,3 +432,5 @@ source "security/Kconfig"
 >  source "crypto/Kconfig"
 >
 >  source "lib/Kconfig"
 > +
 > +source "kernel/bc/Kconfig"
 > --- ./arch/x86_64/Kconfig.bckm	2006-07-10 12:39:11.000000000 +0400
 > +++ ./arch/x86_64/Kconfig	2006-07-28 14:10:49.000000000 +0400
 > @@ -655,3 +655,5 @@ source "security/Kconfig"
 >  source "crypto/Kconfig"
 >
 >  source "lib/Kconfig"
 > +
 > +source "kernel/bc/Kconfig"
 > --- ./kernel/bc/Kconfig.bckm	2006-07-28 13:07:38.000000000 +0400
 > +++ ./kernel/bc/Kconfig	2006-07-28 13:09:51.000000000 +0400
 > @@ -0,0 +1,25 @@
 > +#
 > +# Resource beancounters (BC)
 > +#
 > +# Copyright (C) 2006 OpenVZ. SWsoft Inc
 > +
 > +menu "User resources"
 > +
 > +config BEANCOUNTERS
 > +	bool "Enable resource accounting/control"
 > +	default n
 > +	help
 > +          This patch provides accounting and allows to configure
 > +          limits for user's consumption of exhaustible system resources.
 > +          The most important resource controlled by this patch is unswappable
 > +          memory (either mlock'ed or used by internal kernel structures and
 > +          buffers). The main goal of this patch is to protect processes
 > +          from running short of important resources because of an accidental
 > +          misbehavior of processes or malicious activity aiming to ``kill''
 > +          the system. It's worth to mention that resource limits configured
 > +          by setrlimit(2) do not give an acceptable level of protection
 > +          because they cover only small fraction of resources and work on a
 > +          per-process basis.  Per-process accounting doesn't prevent malicious
 > +          users from spawning a lot of resource-consuming processes.
 > +
 > +endmenu
 >
 > --- ./arch/powerpc/Kconfig.arkcfg	2006-08-07 14:07:12.000000000 +0400
 > +++ ./arch/powerpc/Kconfig	2006-08-10 17:55:58.000000000 +0400
 > @@ -1038,6 +1038,8 @@ source "arch/powerpc/platforms/iseries/K
 >
 >  source "lib/Kconfig"
 >
 > +source "kernel/bc/Kconfig"
 > +
 >  menu "Instrumentation Support"
 >          depends on EXPERIMENTAL
 >
 > --- ./arch/ppc/Kconfig.arkcfg	2006-07-10 12:39:10.000000000 +0400
 > +++ ./arch/ppc/Kconfig	2006-08-10 17:56:13.000000000 +0400
 > @@ -1414,6 +1414,8 @@ endmenu
 >
 >  source "lib/Kconfig"
 >
 > +source "kernel/bc/Kconfig"
 > +
 >  source "arch/powerpc/oprofile/Kconfig"
 >
 >  source "arch/ppc/Kconfig.debug"
 > --- ./arch/sparc/Kconfig.arkcfg	2006-04-21 11:59:32.000000000 +0400
 > +++ ./arch/sparc/Kconfig	2006-08-10 17:56:24.000000000 +0400
 > @@ -296,3 +296,5 @@ source "security/Kconfig"
 >  source "crypto/Kconfig"
 >
 >  source "lib/Kconfig"
 > +
 > +source "kernel/bc/Kconfig"
 > --- ./arch/sparc64/Kconfig.arkcfg	2006-07-17 17:01:11.000000000 +0400
 > +++ ./arch/sparc64/Kconfig	2006-08-10 17:56:36.000000000 +0400
 > @@ -432,3 +432,5 @@ source "security/Kconfig"
 >  source "crypto/Kconfig"
 >
 >  source "lib/Kconfig"
 > +
 > +source "kernel/bc/Kconfig"
 > --- ./arch/x86_64/Kconfig.bckm	2006-07-10 12:39:11.000000000 +0400
 > +++ ./arch/x86_64/Kconfig	2006-07-28 14:10:49.000000000 +0400
 > @@ -655,3 +655,5 @@ source "security/Kconfig"
 >  source "crypto/Kconfig"
 >
 >  source "lib/Kconfig"
 > +
 > +source "kernel/bc/Kconfig"
 > --- ./kernel/bc/Kconfig.bckm	2006-07-28 13:07:38.000000000 +0400
 > +++ ./kernel/bc/Kconfig	2006-07-28 13:09:51.000000000 +0400
 > @@ -0,0 +1,25 @@
 > +#
 > +# Resource beancounters (BC)
 > +#
 > +# Copyright (C) 2006 OpenVZ. SWsoft Inc
 > +
 > +menu "User resources"
 > +
 > +config BEANCOUNTERS
 > +	bool "Enable resource accounting/control"
 > +	default n
 > +	help
 > +          This patch provides accounting and allows to configure
 > +          limits for user's consumption of exhaustible system resources.
 > +          The most important resource controlled by this patch is unswappable
 > +          memory (either mlock'ed or used by internal kernel structures and
 > +          buffers). The main goal of this patch is to protect processes
 > +          from running short of important resources because of an accidental
 > +          misbehavior of processes or malicious activity aiming to ``kill''
 > +          the system. It's worth to mention that resource limits configured
 > +          by setrlimit(2) do not give an acceptable level of protection
 > +          because they cover only small fraction of resources and work on a
 > +          per-process basis.  Per-process accounting doesn't prevent malicious
 > +          users from spawning a lot of resource-consuming processes.
 > +
 > +endmenu
 --
 
 ------------------------------------------------------------ ----------
 Chandra Seetharaman               | Be careful what you choose....
 - sekharan@us.ibm.com   |      .......you may get it.
 ------------------------------------------------------------ ----------
...
 
 
 |  
	|  |  |  
	| 
		
			| Re: [PATCH 4/6] BC: user interface (syscalls) [message #5588 is a reply to message #5546] | Thu, 24 August 2006 00:30   |  
			| 
				
				
					|  Chandra Seetharaman Messages: 88
 Registered: August 2006
 | Member |  |  |  
	| On Wed, 2006-08-23 at 15:06 +0400, Kirill Korotaev wrote: <snip>
 
 > +asmlinkage long sys_set_bclimit(uid_t id, unsigned long resource,
 > +		unsigned long *limits)
 > +{
 > +	int error;
 > +	unsigned long flags;
 > +	struct beancounter *bc;
 > +	unsigned long new_limits[2];
 > +
 > +	error = -EPERM;
 > +	if(!capable(CAP_SYS_RESOURCE))
 > +		goto out;
 > +
 > +	error = -EINVAL;
 > +	if (resource >= BC_RESOURCES)
 > +		goto out;
 > +
 > +	error = -EFAULT;
 > +	if (copy_from_user(&new_limits, limits, sizeof(new_limits)))
 > +		goto out;
 > +
 > +	error = -EINVAL;
 > +	if (new_limits[0] > BC_MAXVALUE || new_limits[1] > BC_MAXVALUE)
 > +		goto out;
 > +
 > +	error = -ENOENT;
 > +	bc = beancounter_findcreate(id, 0);
 > +	if (bc == NULL)
 > +		goto out;
 > +
 > +	spin_lock_irqsave(&bc->bc_lock, flags);
 > +	bc->bc_parms[resource].barrier = new_limits[0];
 > +	bc->bc_parms[resource].limit = new_limits[1];
 
 No check for barrier <= limit
 > +	spin_unlock_irqrestore(&bc->bc_lock, flags);
 > +
 > +	put_beancounter(bc);
 > +	error = 0;
 > +out:
 > +	return error;
 > +}
 <snip>
 --
 
 ------------------------------------------------------------ ----------
 Chandra Seetharaman               | Be careful what you choose....
 - sekharan@us.ibm.com   |      .......you may get it.
 ------------------------------------------------------------ ----------
 |  
	|  |  |  
	| 
		
			| Re: [PATCH 5/6] BC: kernel memory accounting (core) [message #5589 is a reply to message #5547] | Thu, 24 August 2006 00:36   |  
			| 
				
				
					|  Chandra Seetharaman Messages: 88
 Registered: August 2006
 | Member |  |  |  
	| On Wed, 2006-08-23 at 15:06 +0400, Kirill Korotaev wrote: 
 <snip>
 
 > --- ./include/bc/beancounter.h.bckmem	2006-07-28 18:43:52.000000000 +0400
 > +++ ./include/bc/beancounter.h	2006-08-03 16:03:01.000000000 +0400
 > @@ -14,7 +14,9 @@
 >   *	Resource list.
 >   */
 >
 > -#define BC_RESOURCES	0
 > +#define BC_KMEMSIZE	0
 > +
 > +#define BC_RESOURCES	1
 
 As suggested before, a clean interface to define these would be better
 >
 >  struct resource_parm {
 >  	/*
 > --- ./include/bc/kmem.h.bckmem	2006-07-28 18:43:52.000000000 +0400
 > +++ ./include/bc/kmem.h	2006-07-31 17:37:05.000000000 +0400
 > @@ -0,0 +1,33 @@
 > +/*
 > + *  include/bc/kmem.h
 > + *
 > + *  Copyright (C) 2006 OpenVZ. SWsoft Inc
 > + *
 > + */
 > +
 > +#ifndef __BC_KMEM_H_
 > +#define __BC_KMEM_H_
 > +
 > +#include <linux/config.h>
 > +
 > +/*
 > + * BC_KMEMSIZE accounting
 > + */
 > +
 > +struct mm_struct;
 > +struct page;
 > +struct beancounter;
 > +
 > +#ifdef CONFIG_BEANCOUNTERS
 > +int  bc_page_charge(struct page *page, int order, gfp_t flags);
 > +void bc_page_uncharge(struct page *page, int order);
 > +
 > +int bc_slab_charge(kmem_cache_t *cachep, void *obj, gfp_t flags);
 > +void bc_slab_uncharge(kmem_cache_t *cachep, void *obj);
 > +#else
 > +#define bc_page_charge(pg, o, mask)	(0)
 > +#define bc_page_uncharge(pg, o)		do { } while (0)
 > +#define bc_slab_charge(cachep, o, f)	(0)
 > +#define bc_slab_uncharge(cachep, o)	do { } while (0)
 > +#endif
 > +#endif /* __BC_SLAB_H_ */
 > --- ./kernel/bc/Makefile.bcsys	2006-07-28 14:08:37.000000000 +0400
 > +++ ./kernel/bc/Makefile	2006-08-01 11:08:39.000000000 +0400
 > @@ -7,3 +7,4 @@
 >  obj-$(CONFIG_BEANCOUNTERS) += beancounter.o
 >  obj-$(CONFIG_BEANCOUNTERS) += misc.o
 >  obj-$(CONFIG_BEANCOUNTERS) += sys.o
 > +obj-$(CONFIG_BEANCOUNTERS) += kmem.o
 > --- ./kernel/bc/beancounter.c.bckmem	2006-07-28 18:43:52.000000000 +0400
 > +++ ./kernel/bc/beancounter.c	2006-08-03 16:14:17.000000000 +0400
 > @@ -19,6 +19,7 @@ static void init_beancounter_struct(stru
 >  struct beancounter init_bc;
 >
 >  const char *bc_rnames[] = {
 > +	"kmemsize",	/* 0 */
 >  };
 
 As suggested before, a clean interface would be a better idea.
 >
 >  #define bc_hash_fun(x) ((((x) >> 8) ^ (x)) & (BC_HASH_SIZE - 1))
 > @@ -348,6 +378,8 @@ static void init_beancounter_syslimits(s
 >  {
 >  	int k;
 >
 > +	bc->bc_parms[BC_KMEMSIZE].limit = 32 * 1024 * 1024;
 > +
 
 This could be either be a configurable value by/for the controller or
 can be set by the controller through a callout.
 
 >  	for (k = 0; k < BC_RESOURCES; k++)
 >  		bc->bc_parms[k].barrier = bc->bc_parms[k].limit;
 >  }
 
 <snip>
 
 --
 
 ------------------------------------------------------------ ----------
 Chandra Seetharaman               | Be careful what you choose....
 - sekharan@us.ibm.com   |      .......you may get it.
 ------------------------------------------------------------ ----------
 |  
	|  |  |  
	| 
		
			| Re: [PATCH 4/6] BC: user interface (syscalls) [message #5595 is a reply to message #5571] | Thu, 24 August 2006 04:35   |  
			| 
				
				
					|  Andrew Morton Messages: 127
 Registered: December 2005
 | Senior Member |  |  |  
	| On Wed, 23 Aug 2006 18:29:42 +0100 Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:
 
 > Ar Mer, 2006-08-23 am 09:50 -0700, ysgrifennodd Andrew Morton:
 > > On Wed, 23 Aug 2006 17:43:16 +0400
 > > Kirill Korotaev <dev@sw.ru> wrote:
 > >
 > > > +asmlinkage long sys_set_bclimit(uid_t id, unsigned long resource,
 > > > +		unsigned long *limits)
 > >
 > > I'm still a bit mystified about the use of uid_t here.  It's not a uid, is
 > > it?
 >
 > Its a uid_t because of setluid() and twenty odd years of existing unix
 > practice.
 >
 
 I don't understand.  This number is an identifier for an accounting
 container, which was somehow dreamed up by userspace.
 
 AFAICT it is wholly unrelated to user ID's.
 
 (How does userspace avoid collisions, btw?)
 |  
	|  |  |  
	|  |  
	| 
		
			| Re:  [PATCH 6/6] BC: kernel memory accounting (marks) [message #5602 is a reply to message #5583] | Thu, 24 August 2006 09:30   |  
			| 
				
				
					|  Geert Uytterhoeven Messages: 1
 Registered: August 2006
 | Junior Member |  |  |  
	| On Wed, 23 Aug 2006, Dave Hansen wrote: > I'm working on a patch to unify as many of the alloc_thread_info()
 > functions as I can.  That should at least give you one place to modify
 > and track the thread_info allocations.  I've only compiled for x86_64
 > and i386, but I'm working on more.  A preliminary version is attached.
 
 > ---  clean/include/asm-m68k/thread_info.h~unify-alloc-thread-info	2006-08-23 15:44:52.000000000 -0700
 > +++ clean-dave/include/asm-m68k/thread_info.h	2006-08-23 15:45:32.000000000 -0700
 > @@ -24,14 +24,7 @@ struct thread_info {
 >  	},					\
 >  }
 >
 > -/* THREAD_SIZE should be 8k, so handle differently for 4k and 8k machines */
 > -#if PAGE_SHIFT == 13 /* 8k machines */
 > -#define alloc_thread_info(tsk)   ((struct thread_info *)__get_free_pages(GFP_KERNEL,0))
 > -#define free_thread_info(ti)  free_pages((unsigned long)(ti),0)
 > -#else /* otherwise assume 4k pages */
 > -#define alloc_thread_info(tsk)   ((struct thread_info *)__get_free_pages(GFP_KERNEL,1))
 > -#define free_thread_info(ti)  free_pages((unsigned long)(ti),1)
 > -#endif /* PAGE_SHIFT == 13 */
 > +#define THREAD_SHIFT	1
 ^
 Shouldn't this be 13?
 
 > --- /dev/null	2005-03-30 22:36:15.000000000 -0800
 > +++ clean-dave/include/linux/thread_alloc.h	2006-08-23 16:00:41.000000000 -0700
 > @@ -0,0 +1,42 @@
 > +#ifndef _LINUX_THREAD_ALLOC
 > +#define _LINUX_THREAD_ALLOC
 > +
 > +#ifndef THREAD_SHIFT
 > +#define THREAD_SHIFT PAGE_SHIFT
 > +#endif
 > +#ifndef THREAD_ORDER
 > +#define THREAD_ORDER    (THREAD_SHIFT - PAGE_SHIFT)
 > +#endif
 > +
 > +struct thread_info;
 > +struct task;
 > +
 > +#if THREAD_SHIFT >= PAGE_SHIFT
 > +static inline struct thread_info *alloc_thread_info(struct task_struct *tsk)
 > +{
 > +	gfp_t flags = GFP_KERNEL;
 > +#ifdef CONFIG_DEBUG_STACK_USAGE
 > +	flags |= __GFP_ZERO;
 > +#endif
 > +	return (struct thread_info *)__get_free_pages(flags, THREAD_ORDER);
 > +}
 > +static inline void free_thread_info(struct thread_info *ti)
 > +{
 > +	free_pages((unsigned long)ti, THREAD_ORDER);
 > +}
 > +#else /* THREAD_SHIFT < PAGE_SHIFT */
 > +static inline struct thread_info *alloc_thread_info(struct task_struct *tsk)
 > +{
 > +#ifdef CONFIG_DEBUG_STACK_USAGE
 > +	return kzalloc(THREAD_SIZE, GFP_KERNEL);
 > +#else
 > +	return kmalloc(THREAD_SIZE, GFP_KERNEL);
 > +#endif
 > +}
 > +static inline void free_thread_info(struct thread_info *ti)
 > +{
 > +	kfree(ti);
 > +}
 > +#endif /* THREAD_SHIFT < PAGE_SHIFT */
 > +
 > +#endif /* _LINUX_THREAD_ALLOC */
 
 Gr{oetje,eeting}s,
 
 Geert
 
 --
 Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
 
 In personal conversations with technical people, I call myself a hacker. But
 when I'm talking to journalists I just say "programmer" or something like that.
 -- Linus Torvalds
 |  
	|  |  |  
	|  |  
	|  |  
	| 
		
			| Re: [PATCH 2/6] BC: beancounters core (API) [message #5606 is a reply to message #5568] | Thu, 24 August 2006 11:09   |  
			| 
				
				
					|  Oleg Nesterov Messages: 143
 Registered: August 2006
 | Senior Member |  |  |  
	| On 08/23, Andrew Morton wrote: >
 > On Wed, 23 Aug 2006 15:03:07 +0400
 > Kirill Korotaev <dev@sw.ru> wrote:
 >
 > > +void __put_beancounter(struct beancounter *bc)
 > > +{
 > > +	unsigned long flags;
 > > +
 > > +	/* equivalent to atomic_dec_and_lock_irqsave() */
 > > +	local_irq_save(flags);
 > > +	if (likely(!atomic_dec_and_lock(&bc->bc_refcount, &bc_hash_lock))) {
 > > +		local_irq_restore(flags);
 > > +		if (unlikely(atomic_read(&bc->bc_refcount) < 0))
 > > +			printk(KERN_ERR "BC: Bad refcount: bc=%p, "
 > > +					"luid=%d, ref=%d\n",
 > > +					bc, bc->bc_id,
 > > +					atomic_read(&bc->bc_refcount));
 > > +		return;
 > > +	}
 > > +
 > > +	BUG_ON(bc == &init_bc);
 > > +	verify_held(bc);
 > > +	hlist_del(&bc->hash);
 > > +	spin_unlock_irqrestore(&bc_hash_lock, flags);
 > > +	kmem_cache_free(bc_cachep, bc);
 > > +}
 >
 > I wonder if it's safe and worthwhile to optimise away the local_irq_save():
 
 Suppose ->bc_refcount == 1
 
 > 	if (atomic_dec_and_test(&bc->bc_refcount)) {
 
 Yes, preempted or blocks on spin_lock() below.
 
 another cpu locks bc_hash_lock, does get_beancounter() (beancounter_findcreate),
 then does put_beancounter(), and frees it.
 
 > 		spin_lock_irqsave(&bc_hash_lock, flags);
 > 		if (atomic_read(&bc->bc_refcount) == 0) {
 
 Yes,
 
 > 			free it
 >
 
 Already freed.
 
 Oleg.
 |  
	|  |  |  
	| 
		
			| Re: [ckrm-tech] [PATCH 1/6] BC: kconfig [message #5610 is a reply to message #5587] | Thu, 24 August 2006 11:47   |  
			| 
				
				
					|  dev Messages: 1693
 Registered: September 2005
 Location: Moscow
 | Senior Member |  
 |  |  
	| > Is there a reason why these can be moved to a arch-neutral place ? I think a good place for BC would be kernel/Kconfig.bc
 but still this should be added into archs.
 ok?
 
 > PS: Please keep ckrm-tech on Cc: please.
 Sorry, it is very hard to track emails coming from authors and 3 mailing lists.
 Better tell me the interested people emails.
 
 Thanks,
 Kirill
 
 >
 > On Wed, 2006-08-23 at 15:01 +0400, Kirill Korotaev wrote:
 >
 >>Add kernel/bc/Kconfig file with BC options and
 >>include it into arch Kconfigs
 >>
 >>Signed-off-by: Pavel Emelianov <xemul@sw.ru>
 >>Signed-off-by: Kirill Korotaev <dev@sw.ru>
 >>
 >>---
 >>
 >> arch/i386/Kconfig    |    2 ++
 >> arch/ia64/Kconfig    |    2 ++
 >> arch/powerpc/Kconfig |    2 ++
 >> arch/ppc/Kconfig     |    2 ++
 >> arch/sparc/Kconfig   |    2 ++
 >> arch/sparc64/Kconfig |    2 ++
 >> arch/x86_64/Kconfig  |    2 ++
 >> kernel/bc/Kconfig    |   25 +++++++++++++++++++++++++
 >> 8 files changed, 39 insertions(+)
 >>
 >>--- ./arch/i386/Kconfig.bckm	2006-07-10 12:39:10.000000000 +0400
 >>+++ ./arch/i386/Kconfig	2006-07-28 14:10:41.000000000 +0400
 >>@@ -1146,6 +1146,8 @@ source "crypto/Kconfig"
 >>
 >> source "lib/Kconfig"
 >>
 >>+source "kernel/bc/Kconfig"
 >>+
 >> #
 >> # Use the generic interrupt handling code in kernel/irq/:
 >> #
 >>--- ./arch/ia64/Kconfig.bckm	2006-07-10 12:39:10.000000000 +0400
 >>+++ ./arch/ia64/Kconfig	2006-07-28 14:10:56.000000000 +0400
 >>@@ -481,6 +481,8 @@ source "fs/Kconfig"
 >>
 >> source "lib/Kconfig"
 >>
 >>+source "kernel/bc/Kconfig"
 >>+
 >> #
 >> # Use the generic interrupt handling code in kernel/irq/:
 >> #Add kernel/bc/Kconfig file with BC options and
 >>include it into arch Kconfigs
 >>
 >>Signed-off-by: Pavel Emelianov <xemul@sw.ru>
 >>Signed-off-by: Kirill Korotaev <dev@sw.ru>
 >>
 >>---
 >>
 >> arch/i386/Kconfig    |    2 ++
 >> arch/ia64/Kconfig    |    2 ++
 >> arch/powerpc/Kconfig |    2 ++
 >> arch/ppc/Kconfig     |    2 ++
 >> arch/sparc/Kconfig   |    2 ++
 >> arch/sparc64/Kconfig |    2 ++
 >> arch/x86_64/Kconfig  |    2 ++
 >> kernel/bc/Kconfig    |   25 +++++++++++++++++++++++++
 >> 8 files changed, 39 insertions(+)
 >>
 >>--- ./arch/i386/Kconfig.bckm	2006-07-10 12:39:10.000000000 +0400
 >>+++ ./arch/i386/Kconfig	2006-07-28 14:10:41.000000000 +0400
 >>@@ -1146,6 +1146,8 @@ source "crypto/Kconfig"
 >>
 >> source "lib/Kconfig"
 >>
 >>+source "kernel/bc/Kconfig"
 >>+
 >> #
 >> # Use the generic interrupt handling code in kernel/irq/:
 >> #
 >>--- ./arch/ia64/Kconfig.bckm	2006-07-10 12:39:10.000000000 +0400
 >>+++ ./arch/ia64/Kconfig	2006-07-28 14:10:56.000000000 +0400
 >>@@ -481,6 +481,8 @@ source "fs/Kconfig"
 >>
 >> source "lib/Kconfig"
 >>
 >>+source "kernel/bc/Kconfig"
 >>+
 >> #
 >> # Use the generic interrupt handling code in kernel/irq/:
 >> #
 >>--- ./arch/powerpc/Kconfig.arkcfg	2006-08-07 14:07:12.000000000 +0400
 >>+++ ./arch/powerpc/Kconfig	2006-08-10 17:55:58.000000000 +0400
 >>@@ -1038,6 +1038,8 @@ source "arch/powerpc/platforms/iseries/K
 >>
 >> source "lib/Kconfig"
 >>
 >>+source "kernel/bc/Kconfig"
 >>+
 >> menu "Instrumentation Support"
 >>         depends on EXPERIMENTAL
 >>
 >>--- ./arch/ppc/Kconfig.arkcfg	2006-07-10 12:39:10.000000000 +0400
 >>+++ ./arch/ppc/Kconfig	2006-08-10 17:56:13.000000000 +0400
 >>@@ -1414,6 +1414,8 @@ endmenu
 >>
 >> source "lib/Kconfig"
 >>
 >>+source "kernel/bc/Kconfig"
 >>+
 >> source "arch/powerpc/oprofile/Kconfig"
 >>
 >> source "arch/ppc/Kconfig.debug"
 >>--- ./arch/sparc/Kconfig.arkcfg	2006-04-21 11:59:32.000000000 +0400
 >>+++ ./arch/sparc/Kconfig	2006-08-10 17:56:24.000000000 +0400
 >>@@ -296,3 +296,5 @@ source "security/Kconfig"
 >> source "crypto/Kconfig"
 >>
 >> source "lib/Kconfig"
 >>+
 >>+source "kernel/bc/Kconfig"
 >>--- ./arch/sparc64/Kconfig.arkcfg	2006-07-17 17:01:11.000000000 +0400
 >>+++ ./arch/sparc64/Kconfig	2006-08-10 17:56:36.000000000 +0400
 >>@@ -432,3 +432,5 @@ source "security/Kconfig"
 >> source "crypto/Kconfig"
 >>
 >> source "lib/Kconfig"
 >>+
 >>+source "kernel/bc/Kconfig"
 >>--- ./arch/x86_64/Kconfig.bckm	2006-07-10 12:39:11.000000000 +0400
 >>+++ ./arch/x86_64/Kconfig	2006-07-28 14:10:49.000000000 +0400
 >>@@ -655,3 +655,5 @@ source "security/Kconfig"
 >> source "crypto/Kconfig"
 >>
 >> source "lib/Kconfig"
 >>+
 >>+source "kernel/bc/Kconfig"
 >>--- ./kernel/bc/Kconfig.bckm	2006-07-28 13:07:38.000000000 +0400
 >>+++ ./kernel/bc/Kconfig	2006-07-28 13:09:51.000000000 +0400
 >>@@ -0,0 +1,25 @@
 >>+#
 >>+# Resource beancounters (BC)
 >>+#
 >>+# Copyright (C) 2006 OpenVZ. SWsoft Inc
 >>+
 >>+menu "User resources"
 >>+
 >>+config BEANCOUNTERS
 >>+	bool "Enable resource accounting/control"
 >>+	default n
 >>+	help
 >>+          This patch provides accounting and allows to configure
 >>+          limits for user's consumption of exhaustible system resources.
 >>+          The most important resource controlled by this patch is unswappable
 >>+          memory (either mlock'ed or used by internal kernel structures and
 >>+          buffers). The main goal of this patch is to protect processes
 >>+          from running short of important resources because of an accidental
 >>+          misbehavior of processes or malicious activity aiming to ``kill''
 >>+          the system. It's worth to mention that resource limits configured
 >>+          by setrlimit(2) do not give an acceptable level of protection
 >>+          because they cover only small fraction of resources and work on a
 >>+          per-process basis.  Per-process accounting doesn't prevent malicious
 >>+          users from spawning a lot of resource-consuming processes.
 >>+
 >>+endmenu
 >>
 >>--- ./arch/powerpc/Kconfig.arkcfg	2006-08-07 14:07:12.000000000 +0400
 >>+++ ./arch/powerpc/Kconfig	2006-08-10 17:55:58.000000000 +0400
 >>@@ -1038,6 +1038,8 @@ source "arch/powerpc/platforms/iseries/K
 >>
 >> source "lib/Kconfig"
 >>
 >>+source "kernel/bc/Kconfig"
 >>+
 >> menu "Instrumentation Support"
 >>         depends on EXPERIMENTAL
 >>
 >>--- ./arch/ppc/Kconfig.arkcfg	2006-07-10 12:39:10.000000000 +0400
 >>+++ ./arch/ppc/Kconfig	2006-08-10 17:56:13.000000000 +0400
 >>@@ -1414,6 +1414,8 @@ endmenu
 >>
 >> source "lib/Kconfig"
 >>
 >>+source "kernel/bc/Kconfig"
 >>+
 >> source "arch/powerpc/oprofile/Kconfig"
 >>
 >> source "arch/ppc/Kconfig.debug"
 >>--- ./arch/sparc/Kconfig.arkcfg	2006-04-21 11:59:32.000000000 +0400
 >>+++ ./arch/sparc/Kconfig	2006-08-10 17:56:24.000000000 +0400
 >>@@ -296,3 +296,5 @@ source "security/Kconfig"
 >> source "crypto/Kconfig"
 >>
 >> source "lib/Kconfig"
 >>+
 >>+source "kernel/bc/Kconfig"
 >>--- ./arch/sparc64/Kconfig.arkcfg	2006-07-17 17:01:11.000000000 +0400
 >>+++ ./arch/sparc64/Kconfig	2006-08-10 17:56:36.000000000 +0400
 >>@@ -432,3 +432,5 @@ source "security/Kconfig"
 >> source "crypto/Kconfig"
 >>
 >> source "lib/Kconfig"
 >>+
 >>+source "kernel/bc/Kconfig"
 >>--- ./arch/x86_64/Kconfig.bckm	2006-07-10 12:39:11.000000000 +0400
 >>+++ ./arch/x86_64/Kconfig	2006-07-28 14:10:49.000000000 +0400
 >>@@ -655,3 +655,5 @@ source "security/Kconfig"
 >> source "crypto/Kconfig"
 >>
 >> source "lib/Kconfig"
 >>+
 >>+source "kernel/bc/Kconfig"
 >>--- ./kernel/bc/Kconfig.bckm	2006-07-28 13:07:38.000000000 +0400
 >>+++ ./kernel/bc/Kconfig	2006-07-28 13:09:51.000000000 +0400
 >>@@ -0,0 +1,25 @@
 >>+#
 >>+# Resource beancounters (BC)
 >>+#
 >>+# Copyright (C) 2006 OpenVZ. SWsoft Inc
 >>+
 >>+menu "User resources"
 >>+
 >>+config BEANCOUNTERS
 >>+	bool "Enable resource accounting/control"
 >>+	default n
 >>+	help
 >>+          This patch provides accounting and allows to configure
 >>+          limits for user's consumption of exhaustible system resources.
 >>+          The most important resource controlled by this patch is unswappable
 >>+          memory (either mlock'ed or used by internal kernel structures and
 >>+          buffers). The main goal of this patch is to protect processes
 >>+          from running short of important resources because of an accidental
 >>+          misbehavior of processes or malicious activity aiming to ``kill''
 >>+          the system. It's worth to mention that resource limits configured
 >>+          by setrlimit(2) do not give an acceptable level of protection
 >>+          because they cover only small fraction
...
 
 
 |  
	|  |  |  
	| 
		
			| Re: [PATCH 2/6] BC: beancounters core (API) [message #5612 is a reply to message #5568] | Thu, 24 August 2006 12:03   |  
			| 
				
				
					|  dev Messages: 1693
 Registered: September 2005
 Location: Moscow
 | Senior Member |  
 |  |  
	| Andrew Morton wrote: [... snip ...]
 
 >>+#define bc_charge_locked(bc, r, v, s)			(0)
 >>> +#define bc_charge(bc, r, v)				(0)
 >
 >akpm:/home/akpm> cat t.c
 >void foo(void)
 >{
 >	(0);
 >}
 >akpm:/home/akpm> gcc -c -Wall t.c
 >t.c: In function 'foo':
 >t.c:4: warning: statement with no effect
 
 these functions return value should always be checked (!).
 i.e. it is never called like:
 ub_charge(bc, r, v);
 
 >>+struct beancounter *beancounter_findcreate(uid_t uid, int mask)
 >>+{
 >>+	struct beancounter *new_bc, *bc;
 >>+	unsigned long flags;
 >>+	struct hlist_head *slot;
 >>+	struct hlist_node *pos;
 >>+
 >>+	slot = &bc_hash[bc_hash_fun(uid)];
 >>+	new_bc = NULL;
 >>+
 >>+retry:
 >>+	spin_lock_irqsave(&bc_hash_lock, flags);
 >>+	hlist_for_each_entry (bc, pos, slot, hash)
 >>+		if (bc->bc_id == uid)
 >>+			break;
 >>+
 >>+	if (pos != NULL) {
 >>+		get_beancounter(bc);
 >>+		spin_unlock_irqrestore(&bc_hash_lock, flags);
 >>+
 >>+		if (new_bc != NULL)
 >>+			kmem_cache_free(bc_cachep, new_bc);
 >>+		return bc;
 >>+	}
 >>+
 >>+	if (!(mask & BC_ALLOC))
 >>+		goto out_unlock;
 >>+
 >>+	if (new_bc != NULL)
 >>+		goto out_install;
 >>+
 >>+	spin_unlock_irqrestore(&bc_hash_lock, flags);
 >>+
 >>+	new_bc = kmem_cache_alloc(bc_cachep,
 >>+			mask & BC_ALLOC_ATOMIC ? GFP_ATOMIC : GFP_KERNEL);
 >>+	if (new_bc == NULL)
 >>+		goto out;
 >>+
 >>+	memcpy(new_bc, &default_beancounter, sizeof(*new_bc));
 >>+	init_beancounter_struct(new_bc, uid);
 >>+	goto retry;
 >>+
 >>+out_install:
 >>+	hlist_add_head(&new_bc->hash, slot);
 >>+out_unlock:
 >>+	spin_unlock_irqrestore(&bc_hash_lock, flags);
 >>+out:
 >>+	return new_bc;
 >>+}
 >
 >
 > Can remove the global bc_hash_lock and make the locking per-hash-bucket.
 it is not performance critical path IMHO.
 this lock is taken on container create/change/destroy/user interfaces only.
 
 >>+static inline void verify_held(struct beancounter *bc)
 >>+{
 >>+	int i;
 >>+
 >>+	for (i = 0; i < BC_RESOURCES; i++)
 >>+		if (bc->bc_parms[i].held != 0)
 >>+			bc_print_resource_warning(bc, i,
 >>+					"resource is held on put", 0, 0);
 >>+}
 >>+
 >>+void __put_beancounter(struct beancounter *bc)
 >>+{
 >>+	unsigned long flags;
 >>+
 >>+	/* equivalent to atomic_dec_and_lock_irqsave() */
 >>+	local_irq_save(flags);
 >>+	if (likely(!atomic_dec_and_lock(&bc->bc_refcount, &bc_hash_lock))) {
 >>+		local_irq_restore(flags);
 >>+		if (unlikely(atomic_read(&bc->bc_refcount) < 0))
 >>+			printk(KERN_ERR "BC: Bad refcount: bc=%p, "
 >>+					"luid=%d, ref=%d\n",
 >>+					bc, bc->bc_id,
 >>+					atomic_read(&bc->bc_refcount));
 >>+		return;
 >>+	}
 >>+
 >>+	BUG_ON(bc == &init_bc);
 >>+	verify_held(bc);
 >>+	hlist_del(&bc->hash);
 >>+	spin_unlock_irqrestore(&bc_hash_lock, flags);
 >>+	kmem_cache_free(bc_cachep, bc);
 >>+}
 >
 > I wonder if it's safe and worthwhile to optimise away the local_irq_save():
 >
 > 	if (atomic_dec_and_test(&bc->bc_refcount)) {
 > 		spin_lock_irqsave(&bc_hash_lock, flags);
 > 		if (atomic_read(&bc->bc_refcount) == 0) {
 > 			free it
 put_beancounter can happen from IRQ context.
 so we need something like atomic_dec_and_lock_irqsave()
 Oleg Nesterov proposed more details.
 
 Thanks,
 Kirill
 |  
	|  |  | 
 
 
 Current Time: Sat Oct 25 11:24:41 GMT 2025 
 Total time taken to generate the page: 0.09139 seconds |