OpenVZ Forum


Home » Mailing lists » Devel » [PATCH RESEND 2/2] Fix some kallsyms_lookup() vs rmmod races
[PATCH RESEND 2/2] Fix some kallsyms_lookup() vs rmmod races [message #11265] Fri, 16 March 2007 11:39 Go to next message
Alexey Dobriyan is currently offline  Alexey Dobriyan
Messages: 195
Registered: August 2006
Senior Member
[cc'ing folks whose proc files are affected]

kallsyms_lookup() can call module_address_lookup() which iterates over
modules list without module_mutex taken. Comment at the top of
module_address_lookup() says it's for oops resolution so races are
irrelevant, but in some cases it's reachable from regular code:

BUG: unable to handle kernel paging request at virtual address f94fb19c
printing eip:
c0138097
*pde = 33576067
Oops: 0000 [#1]
PREEMPT
Modules linked in: ohci_hcd af_packet e1000 ehci_hcd uhci_hcd usbcore
CPU: 0
EIP: 0060:[<c0138097>] Not tainted VLI
EFLAGS: 00010246 (2.6.21-rc3-8b9909ded6922c33c221b105b26917780cfa497d #25)
EIP is at module_address_lookup+0x43/0x24c
eax: 00000000 ebx: 00000000 ecx: f94fb080 edx: f882e704
esi: 00000000 edi: 00000000 ebp: 00000000 esp: d30b8e78
ds: 007b es: 007b fs: 00d8 gs: 0033 ss: 0068
Process cat (pid: 13274, ti=d30b8000 task=d2b50af0 task.ti=d30b8000)
Stack: d30b8f44 d30b8f48 00008001 00000000 c039a23c d30b8ec4 00000000 d30b8f44
d30b8f48 c013876c d30b8f4c 00000000 cf544000 fffffff4 d38ebb7c c01880ff
d30b8f4c d30b8ec4 00000246 c0399600 00000001 00000001 00000000 c0399654
Call Trace:
[<c013876c>] kallsyms_lookup+0x5e/0x7c
[<c01880ff>] proc_pid_wchan+0x38/0x76
[<c01404fe>] __alloc_pages+0x4f/0x2f9
[<c01891c6>] proc_info_read+0x6f/0xa8
[<c015bbfd>] sys_fstat64+0x1e/0x23
[<c01592d4>] vfs_read+0x7d/0xb5
[<c0189157>] proc_info_read+0x0/0xa8
[<c0159621>] sys_read+0x41/0x6a
[<c0103e72>] sysenter_past_esp+0x5f/0x99
=======================
Code: 04 8b 51 04 0f 18 02 90 81 3d 4c 8d 39 c0 4c 8d 39 c0 74 3f 8b b9 18 01 00 00 8b 99 10 01 00 00 39 eb 77 07 8d 04 3b 39 c5 72 32 <8b> b1 1c 01 00 00 8b 81 14 01 00 00 39 c5 72 06 01 f0 39 c5 72
EIP: [<c0138097>] module_address_lookup+0x43/0x24c SS:ESP 0068:d30b8e78

Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
---

fs/proc/base.c | 7 +++++++
kernel/time/timer_list.c | 6 ++++++
kernel/time/timer_stats.c | 6 ++++++
mm/slab.c | 3 +++
4 files changed, 22 insertions(+)

--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -61,6 +61,7 @@ #include <linux/seq_file.h>
#include <linux/namei.h>
#include <linux/mnt_namespace.h>
#include <linux/mm.h>
+#include <linux/module.h>
#include <linux/smp_lock.h>
#include <linux/rcupdate.h>
#include <linux/kallsyms.h>
@@ -282,7 +283,13 @@ static int proc_pid_wchan(struct task_st

wchan = get_wchan(task);

+ module_mutex_lock();
sym_name = kallsyms_lookup(wchan, &size, &offset, &modname, namebuf);
+ /*
+ * size, offset, modname aren't used, sym_name is copied into namebuf,
+ * so drop it now.
+ */
+ module_mutex_unlock();
if (sym_name)
return sprintf(buffer, "%s", sym_name);
return sprintf(buffer, "%lu", wchan);
--- a/kernel/time/timer_list.c
+++ b/kernel/time/timer_list.c
@@ -44,7 +44,13 @@ static void print_name_offset(struct seq
const char *sym_name;
char *modname;

+ module_mutex_lock();
sym_name = kallsyms_lookup(addr, &size, &offset, &modname, namebuf);
+ /*
+ * size, offset, modname aren't used, sym_name is copied into namebuf,
+ * so drop it now.
+ */
+ module_mutex_unlock();
if (sym_name)
SEQ_printf(m, "%s", sym_name);
else
--- a/kernel/time/timer_stats.c
+++ b/kernel/time/timer_stats.c
@@ -262,7 +262,13 @@ static void print_name_offset(struct seq
const char *sym_name;
char *modname;

+ module_mutex_lock();
sym_name = kallsyms_lookup(addr, &size, &offset, &modname, namebuf);
+ /*
+ * size, offset, modname aren't used, sym_name is copied into namebuf,
+ * so drop it now.
+ */
+ module_mutex_unlock();
if (sym_name)
seq_printf(m, "%s", sym_name);
else
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -4385,14 +4385,17 @@ #ifdef CONFIG_KALLSYMS
unsigned long offset, size;
char namebuf[KSYM_NAME_LEN+1];

+ module_mutex_lock();
name = kallsyms_lookup(address, &size, &offset, &modname, namebuf);

if (name) {
seq_printf(m, "%s+%#lx/%#lx", name, offset, size);
if (modname)
seq_printf(m, " [%s]", modname);
+ module_mutex_unlock();
return;
}
+ module_mutex_unlock();
#endif
seq_printf(m, "%p", (void *)address);
}
Re: [PATCH RESEND 2/2] Fix some kallsyms_lookup() vs rmmod races [message #11268 is a reply to message #11265] Fri, 16 March 2007 11:51 Go to previous messageGo to next message
Ingo Molnar is currently offline  Ingo Molnar
Messages: 51
Registered: December 2005
Member
* Alexey Dobriyan <adobriyan@sw.ru> wrote:

> [cc'ing folks whose proc files are affected]
>
> kallsyms_lookup() can call module_address_lookup() which iterates over
> modules list without module_mutex taken. Comment at the top of
> module_address_lookup() says it's for oops resolution so races are
> irrelevant, but in some cases it's reachable from regular code:

looking at the problem from another angle: wouldnt this be something
that would benefit from freeze_processes()/unfreeze_processes(), and
hence no locking would be required?

Ingo
Re: [PATCH RESEND 2/2] Fix some kallsyms_lookup() vs rmmod races [message #11286 is a reply to message #11268] Fri, 16 March 2007 16:16 Go to previous messageGo to next message
Paulo Marques is currently offline  Paulo Marques
Messages: 7
Registered: March 2007
Junior Member
Ingo Molnar wrote:
> * Alexey Dobriyan <adobriyan@sw.ru> wrote:
>
>> [cc'ing folks whose proc files are affected]
>>
>> kallsyms_lookup() can call module_address_lookup() which iterates over
>> modules list without module_mutex taken. Comment at the top of
>> module_address_lookup() says it's for oops resolution so races are
>> irrelevant, but in some cases it's reachable from regular code:
>
> looking at the problem from another angle: wouldnt this be something
> that would benefit from freeze_processes()/unfreeze_processes(), and
> hence no locking would be required?

I also considered this, but it seemed a little too "blunt" to stop
everything (including completely unrelated processes and kernel threads)
just to remove a module.

How about something like this instead? (just for review)

It's a little more intrusive, since the interface for symbol lookup is
changed (and it affects the callers), but it makes the caller aware that
it should set "safe_to_lock" if possible.

This way we avoid exposing module_mutex outside of module.c and avoid
returning any data pointer to some data structure that might disappear
before the caller tries to use it.

Some of these changes are actually cleanups, because the callers where
creating a dummy modname variables that they didn't use afterwards.

The thing I like the less about this patch is the increase of stack
usage on some code paths by about 60 bytes.

Anyway, I don't have very strong feelings about this, so if you think it
would be better to use freeze_processes()/unfreeze_processes(), I can
cook up a patch for that too...

--
Paulo Marques - www.grupopie.com

"Very funny Scotty. Now beam up my clothes."

diffstat:
> arch/parisc/kernel/unwind.c | 3 --
> arch/powerpc/xmon/xmon.c | 11 ++++-----
> arch/ppc/xmon/xmon.c | 8 +++---
> arch/sh64/kernel/unwind.c | 4 +--
> arch/x86_64/kernel/traps.c | 10 ++++----
> fs/proc/base.c | 3 --
> include/linux/kallsyms.h | 6 +++-
> include/linux/module.h | 44 +++++++++++++++++++-----------------
> kernel/kallsyms.c | 53 +++++++++++++++++++++-----------------------
> kernel/kprobes.c | 6 ++--
> kernel/lockdep.c | 3 --
> kernel/module.c | 44 +++++++++++++++++++++++++++---------
> kernel/time/timer_list.c | 3 --
> kernel/time/timer_stats.c | 3 --
> mm/slab.c | 6 ++--
> 15 files changed, 114 insertions(+), 93 deletions(-)


diff --git a/arch/parisc/kernel/unwind.c b/arch/parisc/kernel/unwind.c
--- a/arch/parisc/kernel/unwind.c
+++ b/arch/parisc/kernel/unwind.c
@@ -216,11 +216,10 @@ static void unwind_frame_regs(struct unw
/* Handle some frequent special cases.... */
{
char symname[KSYM_NAME_LEN+1];
- char *modname;
unsigned long symsize, offset;

kallsyms_lookup(info->ip, &symsize, &offset,
- &modname, symname);
+ NULL, symname, 0);

dbg("info->ip = 0x%lx, name = %s\n", info->ip, symname);

diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -1218,7 +1218,6 @@ static void get_function_bounds(unsigned
{
unsigned long size, offset;
const char *name;
- char *modname;

*startp = *endp = 0;
if (pc == 0)
@@ -1226,7 +1225,7 @@ static void get_function_bounds(unsigned
if (setjmp(bus_error_jmp) == 0) {
catch_memory_errors = 1;
sync();
- name = kallsyms_lookup(pc, &size, &offset, &modname, tmpstr);
+ name = kallsyms_lookup(pc, &size, &offset, NULL, tmpstr, 0);
if (name != NULL) {
*startp = pc - offset;
*endp = pc - offset + size;
@@ -2496,7 +2495,7 @@ symbol_lookup(void)
static void xmon_print_symbol(unsigned long address, const char *mid,
const char *after)
{
- char *modname;
+ char modname[MODULE_NAME_LEN + 1];
const char *name = NULL;
unsigned long offset, size;

@@ -2504,8 +2503,8 @@ static void xmon_print_symbol(unsigned l
if (setjmp(bus_error_jmp) == 0) {
catch_memory_errors = 1;
sync();
- name = kallsyms_lookup(address, &size, &offset, &modname,
- tmpstr);
+ name = kallsyms_lookup(address, &size, &offset, modname,
+ tmpstr, 0);
sync();
/* wait a little while to see if we get a machine check */
__delay(200);
@@ -2515,7 +2514,7 @@ static void xmon_print_symbol(unsigned l

if (name) {
printf("%s%s+%#lx/%#lx", mid, name, offset, size);
- if (modname)
+ if (modname[0])
printf(" [%s]", modname);
}
printf("%s", after);
diff --git a/arch/ppc/xmon/xmon.c b/arch/ppc/xmon/xmon.c
--- a/arch/ppc/xmon/xmon.c
+++ b/arch/ppc/xmon/xmon.c
@@ -177,7 +177,7 @@ extern inline void __delay(unsigned int
static void xmon_print_symbol(unsigned long address, const char *mid,
const char *after)
{
- char *modname;
+ char modname[MODULE_NAME_LEN+1];
const char *name = NULL;
unsigned long offset, size;
static char tmpstr[128];
@@ -186,8 +186,8 @@ static void xmon_print_symbol(unsigned l
if (setjmp(bus_error_jmp) == 0) {
debugger_fault_handler = handle_fault;
sync();
- name = kallsyms_lookup(address, &size, &offset, &modname,
- tmpstr);
+ name = kallsyms_lookup(address, &size, &offset, modname,
+ tmpstr, 0);
sync();
/* wait a little while to see if we get a machine check */
__delay(200);
@@ -196,7 +196,7 @@ static void xmon_print_symbol(unsigned l

if (name) {
printf("%s%s+%#lx/%#lx", mid, name, offset, size);
- if (modname)
+ if (modname[0])
printf(" [%s]", modname);
}
printf("%s", after);
diff --git a/arch/sh64/kernel/unwind.c b/arch/sh64/kernel/unwind.c
--- a/arch/sh64/kernel/unwind.c
+++ b/arch/sh64/kernel/unwind.c
@@ -46,7 +46,7 @@ static int lookup_prev_stack_frame(unsig
struct pt_regs *regs)
{
const char *sym;
- char *modname, namebuf[128];
+ char namebuf[128];
unsigned long offset, size;
unsigned long prologue = 0;
unsigned long fp_displacement = 0;
@@ -54,7 +54,7 @@ static int lookup_prev_stack_frame(unsig
unsigned long offset_r14 = 0, offset_r18 = 0;
int i, found_prologue_end = 0;

- sym = kallsyms_lookup(pc, &size, &offset, &modname, namebuf);
+ sym = kallsyms_lookup(pc, &size, &offset, NULL, namebuf, 0);
if (!sym)
return -EINVAL;

diff --git a/arch/x86_64/kernel/traps.c b/arch/x86_64/kernel/traps.c
--- a/arch/x86_64/kernel/traps.c
+++ b/arch/x86_64/kernel/traps.c
@@ -116,18 +116,18 @@ void printk_address(unsigned long addres
{
unsigned long offset = 0, symsize;
const char *symname;
- char *modname;
+ char modname[MODULE_NAME_LEN+1];
char *delim = ":";
- char namebuf[128];
+ char namebuf[KSYM_NAME_LEN+1];

symname = kallsyms_lookup(address, &symsize, &offset,
- &modname, namebuf);
+ modname, namebuf, 0);
if (!symname) {
printk(" [<%016lx>]\n", address);
return;
}
- if (!modname)
- modname = delim = "";
+ if (!modname[0])
+ delim = "";
printk(" [<%016lx>] %s%s%s%s+0x%lx/0x%lx\n",
address, delim, modname, delim, symname, offset, symsize);
}
diff --git a/fs/proc/base.c b/fs/proc/base.c
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -275,14 +275,13 @@ static int proc_pid_auxv(struct task_str
*/
static int proc_pid_wchan(struct task_struct *task, char *buffer)
{
- char *modname;
const char *sym_name;
unsigned long wchan, size, offset;
char namebuf[KSYM_NAME_LEN+1];

wchan = get_wchan(task);

- sym_name = kallsyms_lookup(wchan, &size, &offset, &modname, namebuf);
+ sym_name = kallsyms_lookup(wchan, &size, &offset, NULL, namebuf, 1);
if (sym_name)
return sprintf(buffer, "%s", sym_name);
return sprintf(buffer, "%lu", wchan);
diff --git a/include/linux/kallsyms.h b/include/linux/kallsyms.h
--- a/include/linux/kallsyms.h
+++ b/include/linux/kallsyms.h
@@ -20,7 +20,8 @@ extern int kallsyms_lookup_size_offset(u
const char *kallsyms_lookup(unsigned long addr,
unsigned long *symbolsize,
unsigned long *offset,
- char **modname, char *namebuf);
+ char *modname, char *namebuf,
+ int safe_to_lock);

/* Replace "%s" in format with address, if found */
extern void __print_symbol(const char *fmt, unsigned long address);
@@ -42,7 +43,8 @@ static inline int kallsyms_lookup_size_o
static inline const char *kallsyms_lookup(unsigned long addr,
unsigned long *symbolsize,
unsigned long *offset,
- char **modname, char *namebuf)
+ char *modname, char *namebuf,
+ int safe_to_lock)
{
return NULL;
}
diff --git a/include/linux/module.h b/include/linux/module.h
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -370,10 +370,10 @@ struct module *module_text_address(unsig
struct module *__module_text_address(unsigned long addr);
int is_module_address(unsigned long addr);

-/* Returns module and fills in value, defined and namebuf, or NULL if
- symnum out of range. */
-struct module *module_get_kallsym(unsigned int symnum, unsigned long *value,
- char *type, char *name, size_t namelen);
+/* fills in value, type, name and module_name. returns 0 on success
+ or -ENOENT if symnum out of range. */
+int module_get_kallsym(unsigned int symnum, unsigned long *value,
+ char *type, char *name, char *module_name);

/* Look for this name: can be of form module:name. */
unsigned long module_kallsyms_lookup_name(const char *name);
@@ -451,11 +451,15 @@ do { \
} \
} while(0)

-/* For kallsyms to ask for address resolution. NULL means not found. */
-const char *module_address_lookup(unsigned long addr,
-
...

Re: [PATCH RESEND 2/2] Fix some kallsyms_lookup() vs rmmod races [message #11287 is a reply to message #11286] Fri, 16 March 2007 16:18 Go to previous messageGo to next message
Ingo Molnar is currently offline  Ingo Molnar
Messages: 51
Registered: December 2005
Member
* Paulo Marques <pmarques@grupopie.com> wrote:

> >looking at the problem from another angle: wouldnt this be something
> >that would benefit from freeze_processes()/unfreeze_processes(), and
> >hence no locking would be required?
>
> I also considered this, but it seemed a little too "blunt" to stop
> everything (including completely unrelated processes and kernel
> threads) just to remove a module.

'just to remove a module' is very, very rare, on the timescale of most
kernel ops. Almost no distro does it. Furthermore, because we want to do
CPU-hotplug that way, we really want to make
freeze_processes()/unfreeze_processes() 'instantaneous' to the human -
and it is that already. (if it isnt in some case we can make it so)

Ingo
Re: [PATCH RESEND 2/2] Fix some kallsyms_lookup() vs rmmod races [message #11289 is a reply to message #11287] Fri, 16 March 2007 17:16 Go to previous messageGo to next message
Paulo Marques is currently offline  Paulo Marques
Messages: 7
Registered: March 2007
Junior Member
Ingo Molnar wrote:
> * Paulo Marques <pmarques@grupopie.com> wrote:
>
>>> looking at the problem from another angle: wouldnt this be something
>>> that would benefit from freeze_processes()/unfreeze_processes(), and
>>> hence no locking would be required?
>> I also considered this, but it seemed a little too "blunt" to stop
>> everything (including completely unrelated processes and kernel
>> threads) just to remove a module.
>
> 'just to remove a module' is very, very rare, on the timescale of most
> kernel ops. Almost no distro does it. Furthermore, because we want to do
> CPU-hotplug that way, we really want to make
> freeze_processes()/unfreeze_processes() 'instantaneous' to the human -
> and it is that already. (if it isnt in some case we can make it so)

Ok. I started to look at this approach and realized that module.c
already does this:

> ....
> static int __unlink_module(void *_mod)
> {
> struct module *mod = _mod;
> list_del(&mod->list);
> return 0;
> }
>
> /* Free a module, remove from lists, etc (must hold module mutex). */
> static void free_module(struct module *mod)
> {
> /* Delete from various lists */
> stop_machine_run(__unlink_module, mod, NR_CPUS);
> ....

However stop_machine_run doesn't seem like the right thing to do,
because users of the "modules" list don't seem to do anything to prevent
preemption. Am I missing something?

Does freeze_processes() / unfreeze_processes() solve this by only
freezing processes that have voluntarily scheduled (opposed to just
being preempted)?

--
Paulo Marques - www.grupopie.com

"The Computer made me do it."
Re: [PATCH RESEND 2/2] Fix some kallsyms_lookup() vs rmmod races [message #11290 is a reply to message #11289] Fri, 16 March 2007 18:15 Go to previous messageGo to next message
Andrew Morton is currently offline  Andrew Morton
Messages: 127
Registered: December 2005
Senior Member
On Fri, 16 Mar 2007 17:16:39 +0000 Paulo Marques <pmarques@grupopie.com> wrote:

> Does freeze_processes() / unfreeze_processes() solve this by only
> freezing processes that have voluntarily scheduled (opposed to just
> being preempted)?

It goes much much further than that. Those processes need to actually
perform an explicit call to try_to_freeze().
Re: [PATCH RESEND 2/2] Fix some kallsyms_lookup() vs rmmod races [message #11291 is a reply to message #11290] Fri, 16 March 2007 20:27 Go to previous messageGo to next message
Paulo Marques is currently offline  Paulo Marques
Messages: 7
Registered: March 2007
Junior Member
Andrew Morton wrote:
> On Fri, 16 Mar 2007 17:16:39 +0000 Paulo Marques <pmarques@grupopie.com> wrote:
>
>> Does freeze_processes() / unfreeze_processes() solve this by only
>> freezing processes that have voluntarily scheduled (opposed to just
>> being preempted)?
>
> It goes much much further than that. Those processes need to actually
> perform an explicit call to try_to_freeze().

Ok, I've just done a few tests with the attached patch. It basically
creates a freeze_machine_run function that is equivalent in interface to
stop_machine_run, but uses freeze_processes / thaw_processes to stop the
machine.

This is more of a proof of concept than an actual patch. At the very
least "freeze_machine_run" should be moved to kernel/power/process.c and
declared at include/linux/freezer.h so that it could be treated as a
more general purpose function and not something that is module specific.

Anyway, I then tested it by running a modprobe/rmmod loop while running
a "cat /proc/kallsyms" loop.

On the first run I forgot to remove the mutex_lock(module_mutex) from
the /proc/kallsyms read path and the freezer was unable to freeze the
"cat" process that was waiting for the same mutex that the freezer
process was holding :P

After removing the module_mutex locking from "module_get_kallsym"
everything was going fine (at least I got no oopses) until I got this:

kernel: Stopping user space processes timed out after 20 seconds (1
tasks refusing to freeze):
kernel: kbluetoothd
kernel: Restarting tasks ... <4> Strange, kseriod not stopped
kernel: Strange, pdflush not stopped
kernel: Strange, pdflush not stopped
kernel: Strange, kswapd0 not stopped
kernel: Strange, cifsoplockd not stopped
kernel: Strange, cifsdnotifyd not stopped
kernel: Strange, jfsIO not stopped
kernel: Strange, jfsCommit not stopped
kernel: Strange, jfsCommit not stopped
kernel: Strange, jfsSync not stopped
kernel: Strange, xfslogd/0 not stopped
kernel: Strange, xfslogd/1 not stopped
kernel: Strange, xfsdatad/0 not stopped
kernel: Strange, xfsdatad/1 not stopped
kernel: Strange, kjournald not stopped
kernel: Strange, khubd not stopped
kernel: Strange, khelper not stopped
kernel: Strange, kbluetoothd not stopped
kernel: done.

I repeated the test and did a Alt+SysRq+T to try to find out what
kbluetoothd was doing and got this:

kernel: kbluetoothd D 79A11860 0 19156 1 19142
(NOTLB)
kernel: 9a269e4c 00000082 00000001 79a11860 00000000 79a09860 c7018030
00000003
kernel: 9a269e71 78475100 c7ebe000 c6730e40 00000000 00000001 00000001
00000001
kernel: 00000000 9a2d7570 79a11860 c7018140 00000000 00001832 42430d03
000000ab
kernel: Call Trace:
kernel: [<7845dba3>] wait_for_completion+0x7d/0xb7
kernel: [<781190ba>] default_wake_function+0x0/0xc
kernel: [<781190ba>] default_wake_function+0x0/0xc
kernel: [<7812c759>] call_usermodehelper_keys+0xd1/0xf1
kernel: [<7812c41e>] request_module+0x96/0xd9
kernel: [<783e30fe>] sock_alloc_inode+0x20/0x4e
kernel: [<78172559>] alloc_inode+0x15/0x115
kernel: [<78172d87>] new_inode+0x24/0x81
kernel: [<783e4003>] __sock_create+0x111/0x199
kernel: [<783e40a3>] sock_create+0x18/0x1d
kernel: [<783e40e1>] sys_socket+0x1c/0x43
kernel: [<783e51da>] sys_socketcall+0x247/0x24c
kernel: [<78121b2d>] sys_gettimeofday+0x2c/0x65
kernel: [<78103f10>] sysenter_past_esp+0x5d/0x81

And this was as far as I got...

This actually seems like a better approach than to hold module_mutex
everywhere to account for an operation that should be "rare" (module
loading/unloading). If something like this goes in, there are probably a
few more places inside module.c where we can drop the locking completely.

However, it still has a few gotchas. Apart from the problem above (which
may still be me doing something wrong) it makes module loading /
unloading depend on CONFIG_PM which is somewhat unexpected for the user.

Would it make sense to separate the process freezing / thawing API from
actual power management and create a new config option (CONFIG_FREEZER?)
that was automatically selected by the systems that used it (CONFIG_PM,
CONFIG_MODULES, etc.)? or is that overkill?

--
Paulo Marques - www.grupopie.com

"Nostalgia isn't what it used to be."


--- a/kernel/module.c
+++ b/kernel/module.c
@@ -35,7 +35,7 @@
#include <linux/vermagic.h>
#include <linux/notifier.h>
#include <linux/sched.h>
-#include <linux/stop_machine.h>
+#include <linux/freezer.h>
#include <linux/device.h>
#include <linux/string.h>
#include <linux/mutex.h>
@@ -618,13 +618,23 @@ static int __try_stop_module(void *_sref
return 0;
}

+static int freeze_machine_run(int (*fn)(void *), void *data, unsigned int cpu)
+{
+ int ret;
+ freeze_processes();
+ ret = fn(data);
+ thaw_processes();
+ return ret;
+}
+
static int try_stop_module(struct module *mod, int flags, int *forced)
{
struct stopref sref = { mod, flags, forced };

- return stop_machine_run(__try_stop_module, &sref, NR_CPUS);
+ return freeze_machine_run(__try_stop_module, &sref, NR_CPUS);
}

+
unsigned int module_refcount(struct module *mod)
{
unsigned int i, total = 0;
@@ -1198,7 +1208,7 @@ static int __unlink_module(void *_mod)
static void free_module(struct module *mod)
{
/* Delete from various lists */
- stop_machine_run(__unlink_module, mod, NR_CPUS);
+ freeze_machine_run(__unlink_module, mod, NR_CPUS);
remove_sect_attrs(mod);
mod_kobject_remove(mod);

@@ -1997,7 +2007,7 @@ sys_init_module(void __user *umod,

/* Now sew it into the lists. They won't access us, since
strong_try_module_get() will fail. */
- stop_machine_run(__link_module, mod, NR_CPUS);
+ freeze_machine_run(__link_module, mod, NR_CPUS);

/* Drop lock so they can recurse */
mutex_unlock(&module_mutex);
@@ -2124,19 +2134,16 @@ struct module *module_get_kallsym(unsign
{
struct module *mod;

- mutex_lock(&module_mutex);
list_for_each_entry(mod, &modules, list) {
if (symnum < mod->num_symtab) {
*value = mod->symtab[symnum].st_value;
*type = mod->symtab[symnum].st_info;
strlcpy(name, mod->strtab + mod->symtab[symnum].st_name,
namelen);
- mutex_unlock(&module_mutex);
return mod;
}
symnum -= mod->num_symtab;
}
- mutex_unlock(&module_mutex);
return NULL;
}
Re: [PATCH RESEND 2/2] Fix some kallsyms_lookup() vs rmmod races [message #11292 is a reply to message #11291] Fri, 16 March 2007 20:49 Go to previous messageGo to next message
Andrew Morton is currently offline  Andrew Morton
Messages: 127
Registered: December 2005
Senior Member
On Fri, 16 Mar 2007 20:27:29 +0000 Paulo Marques <pmarques@grupopie.com> wrote:

> Andrew Morton wrote:
> > On Fri, 16 Mar 2007 17:16:39 +0000 Paulo Marques <pmarques@grupopie.com> wrote:
> >
> >> Does freeze_processes() / unfreeze_processes() solve this by only
> >> freezing processes that have voluntarily scheduled (opposed to just
> >> being preempted)?
> >
> > It goes much much further than that. Those processes need to actually
> > perform an explicit call to try_to_freeze().
>
> Ok, I've just done a few tests with the attached patch. It basically
> creates a freeze_machine_run function that is equivalent in interface to
> stop_machine_run, but uses freeze_processes / thaw_processes to stop the
> machine.
>
> This is more of a proof of concept than an actual patch. At the very
> least "freeze_machine_run" should be moved to kernel/power/process.c and
> declared at include/linux/freezer.h so that it could be treated as a
> more general purpose function and not something that is module specific.

OK.

> Anyway, I then tested it by running a modprobe/rmmod loop while running
> a "cat /proc/kallsyms" loop.
>
> On the first run I forgot to remove the mutex_lock(module_mutex) from
> the /proc/kallsyms read path and the freezer was unable to freeze the
> "cat" process that was waiting for the same mutex that the freezer
> process was holding :P
>
> After removing the module_mutex locking from "module_get_kallsym"
> everything was going fine (at least I got no oopses) until I got this:
>
> kernel: Stopping user space processes timed out after 20 seconds (1
> tasks refusing to freeze):
> kernel: kbluetoothd
> kernel: Restarting tasks ... <4> Strange, kseriod not stopped
> kernel: Strange, pdflush not stopped
> kernel: Strange, pdflush not stopped
> kernel: Strange, kswapd0 not stopped
> kernel: Strange, cifsoplockd not stopped
> kernel: Strange, cifsdnotifyd not stopped
> kernel: Strange, jfsIO not stopped
> kernel: Strange, jfsCommit not stopped
> kernel: Strange, jfsCommit not stopped
> kernel: Strange, jfsSync not stopped
> kernel: Strange, xfslogd/0 not stopped
> kernel: Strange, xfslogd/1 not stopped
> kernel: Strange, xfsdatad/0 not stopped
> kernel: Strange, xfsdatad/1 not stopped
> kernel: Strange, kjournald not stopped
> kernel: Strange, khubd not stopped
> kernel: Strange, khelper not stopped
> kernel: Strange, kbluetoothd not stopped
> kernel: done.

There are a bunch of freezer fixes in -mm. But problems might still remain
- I don't think freezer has had a lot of load put on it yet, but it will
soon and it needs to become reliable.


> I repeated the test and did a Alt+SysRq+T to try to find out what
> kbluetoothd was doing and got this:
>
> kernel: kbluetoothd D 79A11860 0 19156 1 19142
> (NOTLB)
> kernel: 9a269e4c 00000082 00000001 79a11860 00000000 79a09860 c7018030
> 00000003
> kernel: 9a269e71 78475100 c7ebe000 c6730e40 00000000 00000001 00000001
> 00000001
> kernel: 00000000 9a2d7570 79a11860 c7018140 00000000 00001832 42430d03
> 000000ab
> kernel: Call Trace:
> kernel: [<7845dba3>] wait_for_completion+0x7d/0xb7
> kernel: [<781190ba>] default_wake_function+0x0/0xc
> kernel: [<781190ba>] default_wake_function+0x0/0xc
> kernel: [<7812c759>] call_usermodehelper_keys+0xd1/0xf1
> kernel: [<7812c41e>] request_module+0x96/0xd9
> kernel: [<783e30fe>] sock_alloc_inode+0x20/0x4e
> kernel: [<78172559>] alloc_inode+0x15/0x115
> kernel: [<78172d87>] new_inode+0x24/0x81
> kernel: [<783e4003>] __sock_create+0x111/0x199
> kernel: [<783e40a3>] sock_create+0x18/0x1d
> kernel: [<783e40e1>] sys_socket+0x1c/0x43
> kernel: [<783e51da>] sys_socketcall+0x247/0x24c
> kernel: [<78121b2d>] sys_gettimeofday+0x2c/0x65
> kernel: [<78103f10>] sysenter_past_esp+0x5d/0x81
>
> And this was as far as I got...
>
> This actually seems like a better approach than to hold module_mutex
> everywhere to account for an operation that should be "rare" (module
> loading/unloading). If something like this goes in, there are probably a
> few more places inside module.c where we can drop the locking completely.

Yes, using the freezer and module load/unload time seems like a good idea.

> However, it still has a few gotchas. Apart from the problem above (which
> may still be me doing something wrong) it makes module loading /
> unloading depend on CONFIG_PM which is somewhat unexpected for the user.

yup.

> Would it make sense to separate the process freezing / thawing API from
> actual power management and create a new config option (CONFIG_FREEZER?)
> that was automatically selected by the systems that used it (CONFIG_PM,
> CONFIG_MODULES, etc.)? or is that overkill?

Yes, freezer needs to be decoupled from swsusp and from power management
and it should become a first-class core kernel component. Whether we would
need a CONFIG_FREEZER isn't clear - I suspect we'd end up just compiling it
unconditionally.

I cc'ed Rafael, who is doing the freezer revamp work.
Re: [PATCH RESEND 2/2] Fix some kallsyms_lookup() vs rmmod races [message #11304 is a reply to message #11265] Sat, 17 March 2007 09:32 Go to previous messageGo to next message
Rusty Russell is currently offline  Rusty Russell
Messages: 10
Registered: March 2007
Junior Member
On Fri, 2007-03-16 at 14:44 +0300, Alexey Dobriyan wrote:
> [cc'ing folks whose proc files are affected]
>
> kallsyms_lookup() can call module_address_lookup() which iterates over
> modules list without module_mutex taken. Comment at the top of
> module_address_lookup() says it's for oops resolution so races are
> irrelevant, but in some cases it's reachable from regular code:

Yes, this changed somewhere along the way.

I prefer keeping the lock internal as much as possible, and have the
crash code use an __ variant of the function.

Note also that it might be an idea to have less-powerful accessors than
kallsyms_lookup...

Thanks!
Rusty.
Re: [PATCH RESEND 2/2] Fix some kallsyms_lookup() vs rmmod races [message #11305 is a reply to message #11292] Sat, 17 March 2007 10:36 Go to previous messageGo to next message
Rusty Russell is currently offline  Rusty Russell
Messages: 10
Registered: March 2007
Junior Member
On Fri, 2007-03-16 at 12:49 -0800, Andrew Morton wrote:
> > Ok, I've just done a few tests with the attached patch. It basically
> > creates a freeze_machine_run function that is equivalent in interface to
> > stop_machine_run, but uses freeze_processes / thaw_processes to stop the
> > machine.

I've never been convinced that the freezer was a good idea.
stop_machine is a damn big hammer, but it works.

Rusty.
Re: [PATCH RESEND 2/2] Fix some kallsyms_lookup() vs rmmod races [message #11306 is a reply to message #11268] Sat, 17 March 2007 09:37 Go to previous messageGo to next message
Rusty Russell is currently offline  Rusty Russell
Messages: 10
Registered: March 2007
Junior Member
On Fri, 2007-03-16 at 12:51 +0100, Ingo Molnar wrote:
> * Alexey Dobriyan <adobriyan@sw.ru> wrote:
>
> > [cc'ing folks whose proc files are affected]
> >
> > kallsyms_lookup() can call module_address_lookup() which iterates over
> > modules list without module_mutex taken. Comment at the top of
> > module_address_lookup() says it's for oops resolution so races are
> > irrelevant, but in some cases it's reachable from regular code:
>
> looking at the problem from another angle: wouldnt this be something
> that would benefit from freeze_processes()/unfreeze_processes(), and
> hence no locking would be required?

Actually, the list manipulation is done with stop_machine for this
reason. Alexey, is preempt enabled in your kernel?

Rusty.
Re: [PATCH RESEND 2/2] Fix some kallsyms_lookup() vs rmmod races [message #11317 is a reply to message #11291] Mon, 19 March 2007 09:50 Go to previous messageGo to next message
Alexey Dobriyan is currently offline  Alexey Dobriyan
Messages: 195
Registered: August 2006
Senior Member
On Fri, Mar 16, 2007 at 08:27:29PM +0000, Paulo Marques wrote:
> Andrew Morton wrote:
> >On Fri, 16 Mar 2007 17:16:39 +0000 Paulo Marques <pmarques@grupopie.com>
> >wrote:
> >
> >>Does freeze_processes() / unfreeze_processes() solve this by only
> >>freezing processes that have voluntarily scheduled (opposed to just
> >>being preempted)?
> >
> >It goes much much further than that. Those processes need to actually
> >perform an explicit call to try_to_freeze().
>
> Ok, I've just done a few tests with the attached patch. It basically
> creates a freeze_machine_run function that is equivalent in interface to
> stop_machine_run, but uses freeze_processes / thaw_processes to stop the
> machine.
>
> This is more of a proof of concept than an actual patch. At the very
> least "freeze_machine_run" should be moved to kernel/power/process.c and
> declared at include/linux/freezer.h so that it could be treated as a
> more general purpose function and not something that is module specific.
>
> Anyway, I then tested it by running a modprobe/rmmod loop while running
> a "cat /proc/kallsyms" loop.
>
> On the first run I forgot to remove the mutex_lock(module_mutex) from
> the /proc/kallsyms read path and the freezer was unable to freeze the
> "cat" process that was waiting for the same mutex that the freezer
> process was holding :P
>
> After removing the module_mutex locking from "module_get_kallsym"
> everything was going fine (at least I got no oopses) until I got this:
>
> kernel: Stopping user space processes timed out after 20 seconds (1
> tasks refusing to freeze):
> kernel: kbluetoothd
> kernel: Restarting tasks ... <4> Strange, kseriod not stopped
> kernel: Strange, pdflush not stopped
> kernel: Strange, pdflush not stopped
> kernel: Strange, kswapd0 not stopped
> kernel: Strange, cifsoplockd not stopped
> kernel: Strange, cifsdnotifyd not stopped
> kernel: Strange, jfsIO not stopped
> kernel: Strange, jfsCommit not stopped
> kernel: Strange, jfsCommit not stopped
> kernel: Strange, jfsSync not stopped
> kernel: Strange, xfslogd/0 not stopped
> kernel: Strange, xfslogd/1 not stopped
> kernel: Strange, xfsdatad/0 not stopped
> kernel: Strange, xfsdatad/1 not stopped
> kernel: Strange, kjournald not stopped
> kernel: Strange, khubd not stopped
> kernel: Strange, khelper not stopped
> kernel: Strange, kbluetoothd not stopped
> kernel: done.
>
> I repeated the test and did a Alt+SysRq+T to try to find out what
> kbluetoothd was doing and got this:
>
> kernel: kbluetoothd D 79A11860 0 19156 1 19142
> (NOTLB)
> kernel: 9a269e4c 00000082 00000001 79a11860 00000000 79a09860 c7018030
> 00000003
> kernel: 9a269e71 78475100 c7ebe000 c6730e40 00000000 00000001 00000001
> 00000001
> kernel: 00000000 9a2d7570 79a11860 c7018140 00000000 00001832 42430d03
> 000000ab
> kernel: Call Trace:
> kernel: [<7845dba3>] wait_for_completion+0x7d/0xb7
> kernel: [<781190ba>] default_wake_function+0x0/0xc
> kernel: [<781190ba>] default_wake_function+0x0/0xc
> kernel: [<7812c759>] call_usermodehelper_keys+0xd1/0xf1
> kernel: [<7812c41e>] request_module+0x96/0xd9
> kernel: [<783e30fe>] sock_alloc_inode+0x20/0x4e
> kernel: [<78172559>] alloc_inode+0x15/0x115
> kernel: [<78172d87>] new_inode+0x24/0x81
> kernel: [<783e4003>] __sock_create+0x111/0x199
> kernel: [<783e40a3>] sock_create+0x18/0x1d
> kernel: [<783e40e1>] sys_socket+0x1c/0x43
> kernel: [<783e51da>] sys_socketcall+0x247/0x24c
> kernel: [<78121b2d>] sys_gettimeofday+0x2c/0x65
> kernel: [<78103f10>] sysenter_past_esp+0x5d/0x81
>
> And this was as far as I got...
>
> This actually seems like a better approach than to hold module_mutex
> everywhere to account for an operation that should be "rare" (module
> loading/unloading). If something like this goes in, there are probably a
> few more places inside module.c where we can drop the locking completely.
>
> However, it still has a few gotchas. Apart from the problem above (which
> may still be me doing something wrong) it makes module loading /
> unloading depend on CONFIG_PM which is somewhat unexpected for the user.

It'd be a bug. cat /proc/kallsyms should work reliably regardless of
CONFIG_PM, CONFIG_MODULES, etc.

> Would it make sense to separate the process freezing / thawing API from
> actual power management and create a new config option (CONFIG_FREEZER?)
> that was automatically selected by the systems that used it (CONFIG_PM,
> CONFIG_MODULES, etc.)? or is that overkill?

I tried you patch on top of 2.6.21-rc4-5851fadce8824d5d4b8fd02c22ae098401f6489e
*shrug*
Let's say that is doesn't work here. :)

On boot I got
--------------------------------
Stopping user space processes timed out after 20 seconds (1 tasks refusing to freeze):
mount
Strange, kseriod not stopped
Strange, pdflush not stopped
Strange, pdflush not stopped
Strange, kswapd0 not stopped
Strange, kjournald not stopped
Strange, khelper not stopped
Strange, mount not stopped
Filesystem "loop0": Disabling barriers, not supported by the underlying device
XFS mounting filesystem loop0
Ending clean XFS mount for filesystem: loop0
-----------------------------------------------
Stopping user space processes timed out after 20 seconds (1 tasks refusing to freeze):
dhcpcd
Strange, kseriod not stopped
Strange, pdflush not stopped
Strange, pdflush not stopped
Strange, kswapd0 not stopped
Strange, kjournald not stopped
Strange, xfslogd/0 not stopped
Strange, xfsdatad/0 not stopped
Strange, xfsbufd not stopped
Strange, xfssyncd not stopped
Strange, khubd not stopped
Strange, khelper not stopped
Strange, dhcpcd not stopped
NET: Registered protocol family 17
ohci_hcd: 2006 August 04 USB 1.1 'Open' Host Controller (OHCI) Driver
=================================================

Now it booted (slowly) but running fine. Time to test cat /proc/kallsyms.
rmmod hangs in D-state (probably immediately)

=================================================
rmmod D 00000046 2724 6868 6861 (NOTLB)
f3640f14 00000086 c03e213c 00000046 00000000 00000000 c0465408 00000005
c1b74070 9091506d 0000001c 0000e340 c1b74190 c04653e8 00000046 c04653ec
c04653e8 c04653ec c04653e8 f3640f28 f3640f3c c02d260c 00000001 c1b74070
Call Trace:
[<c02d260c>] wait_for_completion+0x9e/0xb7
[<c0113f67>] default_wake_function+0x0/0xc
[<c0127f72>] kthread_stop+0x4c/0x6a
[<c0125470>] destroy_workqueue+0x24/0x54
[<f94deada>] xfs_buf_terminate+0x14/0x2d [xfs]
[<f94e6dc9>] exit_xfs_fs+0x19/0x27 [xfs]
[<c0138662>] sys_delete_module+0x11e/0x17e
[<c012b051>] up_read+0x14/0x27
[<c0111ea4>] do_page_fault+0x311/0x5ad
[<c0103e72>] sysenter_past_esp+0x5f/0x99
==================================================
Also following two lines were attached at the end of the Alt+SysRq+T output

Clocksource tsc unstable (delta = 23420074248 ns)
Time: acpi_pm clocksource has been installed.
Re: [PATCH RESEND 2/2] Fix some kallsyms_lookup() vs rmmod races [message #11319 is a reply to message #11306] Mon, 19 March 2007 10:14 Go to previous messageGo to next message
Alexey Dobriyan is currently offline  Alexey Dobriyan
Messages: 195
Registered: August 2006
Senior Member
On Sat, Mar 17, 2007 at 08:37:18PM +1100, Rusty Russell wrote:
> On Fri, 2007-03-16 at 12:51 +0100, Ingo Molnar wrote:
> > * Alexey Dobriyan <adobriyan@sw.ru> wrote:
> >
> > > [cc'ing folks whose proc files are affected]
> > >
> > > kallsyms_lookup() can call module_address_lookup() which iterates over
> > > modules list without module_mutex taken. Comment at the top of
> > > module_address_lookup() says it's for oops resolution so races are
> > > irrelevant, but in some cases it's reachable from regular code:
> >
> > looking at the problem from another angle: wouldnt this be something
> > that would benefit from freeze_processes()/unfreeze_processes(), and
> > hence no locking would be required?
>
> Actually, the list manipulation is done with stop_machine for this
> reason.

mmm, my changelog is slightly narrow than it should be.

Non-emergency code is traversing modules list.
It finds "struct module *".
module is removed.
"struct module *" is now meaningless, but still dereferenced.

How would all this refrigerator stuff would help? It wouldn't,

Non-emergency code is traversing modules list.
It finds "struct module *".
Everything is freezed.
Module is removed.
Everything is unfreezed.
"struct module *" is now meaningless, but still dereferenced.

> Alexey, is preempt enabled in your kernel?

Yes. FWIW,

CONFIG_PREEMPT=y
CONFIG_PREEMPT_BKL=y
CONFIG_DEBUG_PREEMPT=y

I very much agree with proto-patch which _copies_ all relevant
information into caller-supplied structure, keeping module_mutex private.
Time to split it sanely.
Re: [PATCH RESEND 2/2] Fix some kallsyms_lookup() vs rmmod races [message #11331 is a reply to message #11319] Mon, 19 March 2007 15:17 Go to previous messageGo to next message
Paulo Marques is currently offline  Paulo Marques
Messages: 7
Registered: March 2007
Junior Member
Alexey Dobriyan wrote:
> On Sat, Mar 17, 2007 at 08:37:18PM +1100, Rusty Russell wrote:
>> On Fri, 2007-03-16 at 12:51 +0100, Ingo Molnar wrote:
>>> [...]
>>> looking at the problem from another angle: wouldnt this be something
>>> that would benefit from freeze_processes()/unfreeze_processes(), and
>>> hence no locking would be required?
>> Actually, the list manipulation is done with stop_machine for this
>> reason.
>
> mmm, my changelog is slightly narrow than it should be.
>
> Non-emergency code is traversing modules list.
> It finds "struct module *".
> module is removed.
> "struct module *" is now meaningless, but still dereferenced.
>
> How would all this refrigerator stuff would help? It wouldn't,
>
> Non-emergency code is traversing modules list.
> It finds "struct module *".
> Everything is freezed.
> Module is removed.
> Everything is unfreezed.
> "struct module *" is now meaningless, but still dereferenced.

That is why I asked if the refrigerator would preempt processes or not.
AFAICS there is no path where the "struct module *" that is returned is
used after a voluntary preemption point, so it should be safe. I might
be missing something, though.

However, this isn't very robust. Since the functions are still returning
pointers to module data, some changes in the future might keep the
pointer, and use it after a valid freezing point -> oops.

>> Alexey, is preempt enabled in your kernel?
>
> Yes. FWIW,
>
> CONFIG_PREEMPT=y
> CONFIG_PREEMPT_BKL=y
> CONFIG_DEBUG_PREEMPT=y
>
> I very much agree with proto-patch which _copies_ all relevant
> information into caller-supplied structure, keeping module_mutex private.
> Time to split it sanely.

That depends on the roadmap: if we think the freezer approach is the
best in the long run, maybe your less intrusive (in the sense that it
changes less stuff) patch should go in now (as a quick fix to mainline)
so that after we've sorted out the bugs from the freezer in -mm, it will
be easier to revert.

However, if we think the most reliable solution would be to not return
internal module information through pointers and keep all that logic
internal to module.c, then the "proto-patch" with some improvements
might be the way to go...

--
Paulo Marques - www.grupopie.com

"God is love. Love is blind. Ray Charles is blind. Ray Charles is God."
Re: [PATCH RESEND 2/2] Fix some kallsyms_lookup() vs rmmod races [message #11335 is a reply to message #11319] Mon, 19 March 2007 23:23 Go to previous message
Rusty Russell is currently offline  Rusty Russell
Messages: 10
Registered: March 2007
Junior Member
On Mon, 2007-03-19 at 13:21 +0300, Alexey Dobriyan wrote:
> I very much agree with proto-patch which _copies_ all relevant
> information into caller-supplied structure, keeping module_mutex private.
> Time to split it sanely.

Indeed. The current interface needs to be ripped apart and put together
sanely.

Thanks!
Rusty.
Previous Topic: [PATCH] p4-clockmod: switch to rdmsr_on_cpu/wrmsr_on_cpu
Next Topic: [PATCH RESEND 1/2] Race between cat /proc/kallsyms and rmmod
Goto Forum:
  


Current Time: Tue Oct 21 23:28:56 GMT 2025

Total time taken to generate the page: 0.17984 seconds