[SPARC64]: resumable error decoding [message #8497]
Tue, 21 November 2006 13:34
Kirill Korotaev
David,
While running stress tests on an OpenVZ 2.6.18 sparc64 kernel, we hit the following:
------- cut --------
[285401.094964] RESUMABLE ERROR: Reporting on cpu 0
[285401.626736] RESUMABLE ERROR: err_handle[410000000000c6f] err_stick[103921ee2007c] err_type[00000004:warning resumable]
[285402.869015] RESUMABLE ERROR: err_attrs[00000020: ]
[285403.491920] RESUMABLE ERROR: err_raddr[0000000000000000] err_size[0] err_cpu[0]
[285404.347508] TSTATE: 0000004480001602 TPC: 000000000041931c TNPC: 0000000000419320 Y: 00000000 Not tainted
[285405.496613] TPC: <cpu_idle+0x84/0xc0>
[285405.892615] g0: 00000000006e2531 g1: 0000000000000016 g2: 0000000000000014 g3: 00000000006def80
[285406.550536] g4: 00000000006e2f80 g5: fffff8000449bd40 g6: 00000000006def80 g7: 0000038000004000
[285406.884717] o0: 0000000000000000 o1: 00000000006def88 o2: 0000000000004000 o3: 4000000000000000
[285407.214724] o4: 0000000000001290 o5: 0000000000000012 sp: 00000000006e2531 ret_pc: 0000000000419308
[285407.562135] RPC: <cpu_idle+0x70/0xc0>
[285407.701342] l0: 00000000006de800 l1: 0000000000000027 l2: 0000000000000000 l3: 00000001ff000000
[285408.029282] l4: 0000000040004110 l5: 00000000fff74080 l6: 00000000fff4d701 l7: 00000000f0254040
[285408.348195] i0: 0000000100000000 i1: 0000000000000000 i2: 0000000000000000 i3: 0000000100000000
[285408.681920] i4: 0000000000000080 i5: 0000000000000080 i6: 00000000006e25f1 i7: 00000000007a67ec
[285409.010870] I7: <start_kernel+0x294/0x300>
------- cut --------
It looks like the hardware is reporting some problem, and
the most interesting field is err_attrs:
u32 err_attrs;
#define SUN4V_ERR_ATTRS_PROCESSOR 0x00000001
#define SUN4V_ERR_ATTRS_MEMORY 0x00000002
#define SUN4V_ERR_ATTRS_PIO 0x00000004
#define SUN4V_ERR_ATTRS_INT_REGISTERS 0x00000008
#define SUN4V_ERR_ATTRS_FPU_REGISTERS 0x00000010
#define SUN4V_ERR_ATTRS_USER_MODE 0x01000000
#define SUN4V_ERR_ATTRS_PRIV_MODE 0x02000000
#define SUN4V_ERR_ATTRS_RES_QUEUE_FULL 0x80000000
... which should identify the faulty subsystem.
However, the 2.6.18 kernel knows nothing about the value 0x20 :/
I also didn't find anything about it in the available documentation.
Can you shed some light on this, please?
A link to the docs or a hint would be very much appreciated.
Thanks,
Kirill
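
A small user-space sketch of the decoding described in the post, assuming only the SUN4V_ERR_ATTRS_* values quoted there. The table layout and the helper names (attr_names, unknown_attr_bits, print_attrs) are hypothetical and are not the kernel's own code. It illustrates why err_attrs 0x20 shows up with an empty name list: no bit in the quoted table matches it.

```c
#include <stdint.h>
#include <stdio.h>

#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Bit values copied from the SUN4V_ERR_ATTRS_* defines quoted in the post. */
struct attr_name {
    uint32_t bit;
    const char *name;
};

static const struct attr_name attr_names[] = {
    { 0x00000001u, "processor" },
    { 0x00000002u, "memory" },
    { 0x00000004u, "PIO" },
    { 0x00000008u, "int-registers" },
    { 0x00000010u, "fpu-registers" },
    { 0x01000000u, "user-mode" },
    { 0x02000000u, "priv-mode" },
    { 0x80000000u, "queue-full" },
};

/* Return the bits of err_attrs that have no entry in attr_names. */
static uint32_t unknown_attr_bits(uint32_t err_attrs)
{
    uint32_t known = 0;
    size_t i;

    for (i = 0; i < ARRAY_LEN(attr_names); i++)
        known |= attr_names[i].bit;
    return err_attrs & ~known;
}

/* Print the named bits, then any leftover bits the table does not cover. */
static void print_attrs(uint32_t err_attrs)
{
    uint32_t unknown = unknown_attr_bits(err_attrs);
    size_t i;

    printf("err_attrs[%08x:", (unsigned)err_attrs);
    for (i = 0; i < ARRAY_LEN(attr_names); i++)
        if (err_attrs & attr_names[i].bit)
            printf(" %s", attr_names[i].name);
    if (unknown)
        printf(" unknown(0x%x)", (unsigned)unknown);
    printf(" ]\n");
}
```

Calling print_attrs(0x00000020) from a test program reports the whole value as unknown, since 0x20 sits between the FPU_REGISTERS bit (0x10) and the next defined bit (0x01000000).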
Re: [SPARC64]: resumable error decoding [message #8666 is a reply to message #8515]
Thu, 30 November 2006 20:29
davem
From: Kirill Korotaev <dev@sw.ru>
Date: Wed, 22 Nov 2006 13:19:28 +0300
> > I should add proper support for this, this report is a good reminder
> > :-)
> would be nice :@)
I tested the following patch and it worked fine for me on a T2000. Let
me know if it works for you too:
commit 035f09edbbc921b9688a65ec58c0f49b822e605c
Author: David S. Miller <davem@sunset.davemloft.net>
Date: Wed Nov 29 21:16:21 2006 -0800
[SPARC64]: Run ctrl-alt-del action for sun4v powerdown request.
Signed-off-by: David S. Miller <davem@davemloft.net>
diff --git a/arch/sparc64/kernel/traps.c b/arch/sparc64/kernel/traps.c
index ec7a601..ad67784 100644
--- a/arch/sparc64/kernel/traps.c
+++ b/arch/sparc64/kernel/traps.c
@@ -10,7 +10,7 @@
*/
#include <linux/module.h>
-#include <linux/sched.h> /* for jiffies */
+#include <linux/sched.h>
#include <linux/kernel.h>
#include <linux/kallsyms.h>
#include <linux/signal.h>
@@ -1873,6 +1873,16 @@ void sun4v_resum_error(struct pt_regs *r
put_cpu();
+ if (ent->err_type == SUN4V_ERR_TYPE_WARNING_RES) {
+ /* If err_type is 0x4, it's a powerdown request. Do
+ * not do the usual resumable error log because that
+ * makes it look like some abnormal error.
+ */
+ printk(KERN_INFO "Power down request...\n");
+ kill_cad_pid(SIGINT, 1);
+ return;
+ }
+
sun4v_log_error(regs, &local_copy, cpu,
KERN_ERR "RESUMABLE ERROR",
&sun4v_resum_oflow_cnt);
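
The decision the patch adds can be sketched in isolation as follows. This is a minimal user-space illustration, not the kernel code: the enum and the function name classify_resumable are hypothetical, and the value 4 is taken from the log above (err_type[00000004:warning resumable]) and from the patch comment.

```c
#include <stdint.h>

/* Value 4 comes from the log line and the patch comment ("err_type is 0x4"). */
#define ERR_TYPE_WARNING_RES 4u

/* Hypothetical names for the two outcomes of the resumable-error handler. */
enum resum_action {
    RESUM_ACTION_LOG,       /* dump the usual RESUMABLE ERROR report */
    RESUM_ACTION_POWERDOWN  /* treat as a powerdown request, skip the report */
};

/* Decide how a resumable error entry should be handled. */
static enum resum_action classify_resumable(uint32_t err_type)
{
    if (err_type == ERR_TYPE_WARNING_RES)
        return RESUM_ACTION_POWERDOWN;
    return RESUM_ACTION_LOG;
}
```

With this split, the report from the first message (err_type 4) would take the powerdown path and no longer be logged as an error, which matches the intent stated in the patch comment.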