OpenVZ Forum


2.6.18-164.2.1.el5.028stab066.7 VMCORE (3 crashes within 48 hours with newly installed 2.6.18-164.2.1.el5.028stab066.7; how to get data from vmcore?)
2.6.18-164.2.1.el5.028stab066.7 VMCORE [message #38218] Sun, 29 November 2009 17:27
Jean-Marc Pigeon
There is obviously a problem with 2.6.18-164.2.1.el5.028stab066.7: within 48 hours it crashed 3 times
(while 2.6.18-128.2.1.el5.028stab064.8 never gave me a single problem).

I enabled kdump, so the last crash generated a vmcore file.
The problem is that "crash" won't give me any data, as there is no debugging information in the kernel...
I tried to recompile the kernel with builddebug and buildkdump set to 1, but crash is still not happy...
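A quick way to tell whether a given kernel image is usable by crash at all is to check whether it carries DWARF debug sections. The Python sketch below does that by reading the ELF section headers with only the standard library. It assumes you point it at the uncompressed vmlinux (the compressed vmlinuz boot image is not an ELF file), and the function names are hypothetical helpers, not part of crash or of the kernel RPM tooling.

```python
import struct
from typing import List


def elf_section_names(data: bytes) -> List[str]:
    """Return the section names of an ELF image (32- or 64-bit, either byte order)."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    elf_class, elf_data = data[4], data[5]
    if elf_class not in (1, 2) or elf_data not in (1, 2):
        raise ValueError("unsupported ELF class or byte order")
    end = "<" if elf_data == 1 else ">"
    if elf_class == 2:  # ELFCLASS64
        hdr_fmt, sh_fmt = end + "HHIQQQIHHHHHH", end + "IIQQQQIIQQ"
    else:  # ELFCLASS32
        hdr_fmt, sh_fmt = end + "HHIIIIIHHHHHH", end + "IIIIIIIIII"
    # Header fields after the 16-byte e_ident: e_shoff is index 5,
    # e_shentsize, e_shnum and e_shstrndx are the last three.
    hdr = struct.unpack_from(hdr_fmt, data, 16)
    e_shoff, e_shentsize, e_shnum, e_shstrndx = hdr[5], hdr[10], hdr[11], hdr[12]
    if e_shoff == 0 or e_shnum == 0:
        return []  # no section header table

    def section(i: int):
        sh = struct.unpack_from(sh_fmt, data, e_shoff + i * e_shentsize)
        return sh[0], sh[4], sh[5]  # sh_name, sh_offset, sh_size

    # The section-name string table is itself a section (index e_shstrndx).
    _, str_off, str_size = section(e_shstrndx)
    strtab = data[str_off:str_off + str_size]
    names = []
    for i in range(e_shnum):
        name_off = section(i)[0]
        names.append(strtab[name_off:strtab.index(b"\x00", name_off)].decode())
    return names


def has_debug_info(data: bytes) -> bool:
    """True if the image carries a DWARF .debug_info section."""
    return ".debug_info" in elf_section_names(data)
```

For example, `has_debug_info(open("vmlinux", "rb").read())` returning False would be consistent with crash refusing to work with that image.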


Could somebody give me some hints on how to extract meaningful data from the vmcore and report it to the list?
What is the best way to proceed?
It seems to me the last two crashes occurred at almost the same time
(maybe a cron job triggering the fault?).


Many Thanks for your help.
Re: 2.6.18-164.2.1.el5.028stab066.7 VMCORE [message #38231 is a reply to message #38218] Tue, 01 December 2009 03:42
gombadi
Are you using NFS in the VPSs?

I have just upgraded a machine to the latest kernel - 2.6.18-164.2.1.el5.028stab066.7 - and it crashed each time during container startup. I selected the previous kernel in grub (stab064.7) and it boots fine.

I have limited information available but ask if you want more.

Some screenshots of the crash -
http://www.zagbot.com/panic1.png
http://www.zagbot.com/panic2.png
Re: 2.6.18-164.2.1.el5.028stab066.7 VMCORE [message #38232 is a reply to message #38231] Tue, 01 December 2009 04:02
Jean-Marc Pigeon
My crash is random; everything is fine at boot time...
Yes, I am using NFS (clients) inside the VPSs, but the crash does not seem related.
I wonder whether there is a real kernel problem or whether the new
kernel is exposing some hardware weakness/sensitivity...


My understanding is that to analyze a vmcore file, you need
a kernel built with a debuginfo RPM...
It seems to me the spec file has not provided this for many releases now
in 2.6.18-rhel5.

Could someone knowledgeable about kernel RPMs confirm this?
Is there a way to recompile the kernel and generate a debuginfo RPM?

Many thanks...
Re: 2.6.18-164.2.1.el5.028stab066.7 VMCORE [message #38292 is a reply to message #38232] Fri, 04 December 2009 14:01
maratrus
Hello,

It would be great if you could set up a serial console and capture all logs from the crashing system.
http://wiki.openvz.org/Remote_console_setup#Serial_console
Moreover, when the kernel panics, try using Alt-SysRq-*:
- p (press it twice as many times as there are CPUs)
- w (several times)
- t (call traces for all processes; please note this is a time-consuming operation)
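The key presses above only help if the keyboard handler still runs, and they require magic SysRq to be enabled (the kernel.sysrq sysctl). On a system that is still alive, the same handlers can be invoked by writing the key character to /proc/sysrq-trigger as root, which is a convenient way to check in advance that the output really reaches the serial console. A minimal Python sketch, assuming root access; the helper names and the choice of three 'w' presses are illustrative assumptions:

```python
import os
from typing import List, Optional

SYSRQ_TRIGGER = "/proc/sysrq-trigger"


def sysrq_sequence(ncpus: Optional[int] = None, w_repeats: int = 3) -> List[str]:
    """Keys from the advice above: 'p' twice per CPU, 'w' several times, then 't'."""
    ncpus = ncpus or os.cpu_count() or 1
    # w_repeats=3 is an assumption; the advice only says "several times".
    return ["p"] * (2 * ncpus) + ["w"] * w_repeats + ["t"]


def send_sysrq(keys: List[str], trigger: str = SYSRQ_TRIGGER,
               dry_run: bool = True) -> List[str]:
    """Write each key to the trigger file (needs root unless dry_run); return keys sent."""
    sent = []
    for key in keys:
        if not dry_run:
            # The kernel acts on the first character of each write.
            with open(trigger, "w") as f:
                f.write(key)
        sent.append(key)
    return sent
```

Note that 't' can flood a slow serial console for a long time on a busy host, so it is best tried last.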

Then file a bug. Describe your situation and provide the gathered logs.
http://bugzilla.openvz.org/

A properly filed bug report will help solve the problem.
Re: 2.6.18-164.2.1.el5.028stab066.7 VMCORE [message #38310 is a reply to message #38292] Sat, 05 December 2009 16:00
Jean-Marc Pigeon
I'll try this next time,
but the system does not respond at all when it crashes: nothing
is displayed, console switching does not work, and there is no disk activity.
Powering off is the only way out...

It seems to me 2.6.18-164.2.1.el5.028stab066.7 is a "troublemaker":
I put it on other hardware (a Dell 2800) and it crashed too (while
2.6.18-128.2.1.el5.028stab064.7 has no problem with exactly the same configuration and hardware).

It is a pity that debuginfo capabilities are no longer in the spec file;
I am curious to know what event could be triggering the crash.
Re: 2.6.18-164.2.1.el5.028stab066.7 VMCORE [message #38316 is a reply to message #38310] Sat, 05 December 2009 21:56
maratrus
Hi,

Quote:

I am curious to know what could be the event triggering the crash.


It's impossible to answer without any information.
A serial console can help us get ALL logs from your system, and there is a chance that Alt-SysRq-* will still work.
If it was a crash, something should be printed in the logs; if it was some kind of lockup, Alt-SysRq-* may help find the reason.
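On the receiving side of a serial console, a second machine simply has to record everything that arrives on its serial port, so that the last lines before the hang are preserved. A minimal Python sketch using only the standard library's termios and tty modules; the device path (for example /dev/ttyS0) and the 115200 baud default are assumptions and must match the console=ttyS0,115200-style setting on the crashing machine. The helper names are hypothetical:

```python
import os
import termios
import tty
from typing import Optional


def open_serial(path: str, baud: int = termios.B115200) -> int:
    """Open a serial device read-only, raw (no echo or line editing), at `baud`."""
    fd = os.open(path, os.O_RDONLY | os.O_NOCTTY)
    tty.setraw(fd)
    attrs = termios.tcgetattr(fd)
    attrs[4] = attrs[5] = baud  # input and output speed
    termios.tcsetattr(fd, termios.TCSANOW, attrs)
    return fd


def capture(fd: int, log_path: str, max_bytes: Optional[int] = None) -> int:
    """Append everything read from fd to log_path; return the number of bytes written."""
    total = 0
    # Unbuffered, so nothing already received is lost if this logger is killed.
    with open(log_path, "ab", buffering=0) as log:
        while max_bytes is None or total < max_bytes:
            chunk = os.read(fd, 4096)
            if not chunk:
                break
            log.write(chunk)
            total += len(chunk)
    return total
```

Typical use would be `capture(open_serial("/dev/ttyS0"), "console.log")`, left running until the next crash.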
Re: 2.6.18-164.2.1.el5.028stab066.7 VMCORE [message #38319 is a reply to message #38316] Sun, 06 December 2009 01:58
Jean-Marc Pigeon
I have the vmcore file generated by kdump; I think everything is
in it, but the debuginfo tools are missing...
Re: 2.6.18-164.2.1.el5.028stab066.7 VMCORE [message #38396 is a reply to message #38319] Mon, 14 December 2009 20:49
khorenko
Hi guys,

I have good news for you: the problem mentioned here seems to be resolved.
Please take a look at
http://bugzilla.openvz.org/show_bug.cgi?id=1375

There is a patch intended to fix the issue, so we would appreciate it if you could test it and report back.

Thank you.

--
Konstantin


If your problem is solved - please, report it!
It's even more important than reporting the problem itself...
Re: 2.6.18-164.2.1.el5.028stab066.7 VMCORE [message #38397 is a reply to message #38396] Mon, 14 December 2009 21:21
gombadi
Is this patch the same as the fix for an NFS oops that was included in 028stab066.10, released a few days ago, or are they different things?

Re: 2.6.18-164.2.1.el5.028stab066.7 VMCORE [message #38398 is a reply to message #38396] Mon, 14 December 2009 21:24
Jean-Marc Pigeon
Thanks for the info.
"My" crash was not as obvious as the one described in the bug report
(it happened "randomly"; maybe related to load?).

My main concern was that the kernel was provided without
debuginfo capabilities, so we (the users) were
not able to provide meaningful data (only guesses) to help
fix/upgrade the OpenVZ kernel.

I'll try 028stab066.10 quickly and let you know.

I've been busy this weekend working on "vzgot", an RPM of mine.
So far I have "vzgot boot|shutdown 'container_name'" working, and I am able to run a container within a PLAIN (untouched) 2.6.31.6-162.fc12 kernel.
I am using exactly the same fc12 template I use with OpenVZ; rc.sysinit and networking work fine too
(I was able to access the container from outside via SSH).
I still need to hook it up to cgroups and implement something like "vzgot firstboot|migrate" (not live) to have a nice little toy to play with.
Re: 2.6.18-164.2.1.el5.028stab066.7 VMCORE [message #38402 is a reply to message #38398] Tue, 15 December 2009 09:29
khorenko
Hi,

Quote:
Is this patch the same as the fix for an NFS oops that was included in 028stab066.10, released a few days ago, or are they different things?


No, this bug was found only yesterday and was not included in 66.10, which was released earlier.

Quote:
My main concern was that the kernel was provided without
debuginfo capabilities, so we (the users) were
not able to provide meaningful data (only guesses) to help
fix/upgrade the OpenVZ kernel.


Well, on the one hand you are right: having a kernel built with debuginfo could help you give us information about the problem.
But on the other hand:
1) Most of the time, memory dumps are more than is needed to find the bug that triggered an oops; complete oops messages are often enough.
2) At the same time, configuring a serial or network console is a much simpler task than configuring crashdump and then using "crash" to extract the necessary info, so in most cases we ask for console logs.
3) Next: if someone is advanced enough to configure crashdump and knows how to use the "crash" utility, I believe it won't be a big problem for them to additionally compile a kernel with debuginfo enabled. :)
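Since a complete oops message is usually what is needed, a captured console log can be trimmed down to its oops blocks before attaching it to a bug report. A heuristic Python sketch; the marker strings in OOPS_START are assumptions based on common 2.6-era x86 messages, not an exhaustive list, and the block boundary (a blank line or max_lines) is an arbitrary choice:

```python
import re
from typing import List

# Illustrative, not exhaustive: typical first lines of oops/panic reports.
OOPS_START = re.compile(
    r"Unable to handle kernel|BUG: |Oops: |general protection fault|Kernel panic"
)


def extract_oops(lines: List[str], context: int = 5,
                 max_lines: int = 200) -> List[List[str]]:
    """Return each oops-like block in a console log, plus `context` preceding lines.

    A block starts at a line matching OOPS_START and runs until a blank line,
    the end of the log, or `max_lines` lines, whichever comes first.
    """
    reports = []
    i = 0
    while i < len(lines):
        if not OOPS_START.search(lines[i]):
            i += 1
            continue
        end = i + 1
        while end < len(lines) and end - i < max_lines and lines[end].strip():
            end += 1
        reports.append(lines[max(0, i - context):end])
        i = end
    return reports
```

If nothing matches, the safe fallback is still to attach the whole log.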

Jean-Marc, good luck with "vzgot"! Hope you'll share it with the community once it is ready!

--
Konstantin

