OpenVZ Forum


*SOLVED* Random Crashes [message #10452] Mon, 19 February 2007 12:00
webmotive
Messages: 1
Registered: February 2007
Junior Member
Hi Members of the Support Forum,

I am posting this here and not in the Bugs section because I am not really sure whether we are talking about a bug or some misconfiguration on our side.

We have been using OpenVZ for some weeks and see random crashes of the whole machine. We tried to rule out the hardware as the cause by moving everything to a different server (same hardware configuration, though).

We are now sure that it is not a memory or CPU failure. But could it still be some kind of conflict between some hardware components and the VZ kernel?

There are no useful debug messages in the syslog.

When the machine crashes, it still responds to ping for a while. After that it does not even respond to ping any more.

We had to raise the beancounter limits considerably; otherwise we got errors about open files etc. (see further down).

The crash seems to occur when the load goes up. Any ideas where we should start looking? Monitoring the beancounters did not give us a hint.

At the moment we definitely can't use the system in production.

Very strange altogether ...

Thanks

Christian

Kernel:
Linux version 2.6.18-8+openvz.2-007.1 (root@Nitrox) (gcc version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)) #1 SMP Fri Jan


lspci:
00:00.0 Host bridge: VIA Technologies, Inc. K8T800Pro Host Bridge
00:00.1 Host bridge: VIA Technologies, Inc. K8T800Pro Host Bridge
00:00.2 Host bridge: VIA Technologies, Inc. K8T800Pro Host Bridge
00:00.3 Host bridge: VIA Technologies, Inc. K8T800Pro Host Bridge
00:00.4 Host bridge: VIA Technologies, Inc. K8T800Pro Host Bridge
00:00.7 Host bridge: VIA Technologies, Inc. K8T800Pro Host Bridge
00:01.0 PCI bridge: VIA Technologies, Inc. VT8237 PCI bridge [K8T800/K8T890 South]
00:09.0 RAID bus controller: 3ware Inc 7xxx/8xxx-series PATA/SATA-RAID (rev 01)
00:0b.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8169 Gigabit Ethernet (rev 10)
00:0d.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8169 Gigabit Ethernet (rev 10)
00:0f.0 RAID bus controller: VIA Technologies, Inc. VIA VT6420 SATA RAID Controller (rev 80)
00:0f.1 IDE interface: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 06)
00:10.0 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81)
00:10.1 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81)
00:10.2 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81)
00:10.3 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81)
00:10.4 USB Controller: VIA Technologies, Inc. USB 2.0 (rev 86)
00:11.0 ISA bridge: VIA Technologies, Inc. VT8237 ISA bridge [KT600/K8T800/K8T890 South]
00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
01:00.0 VGA compatible controller: Matrox Graphics, Inc. MGA G400/G450 (rev 85)


cat /proc/user_beancounters
Version: 2.5
       uid  resource           held    maxheld    barrier      limit    failcnt
        0:  kmemsize      204025604  269105689 2147483647 2147483647          0
            lockedpages           0          0 2147483647 2147483647          0
            privvmpages        5998      41589 2147483647 2147483647          0
            shmpages           1289       1625 2147483647 2147483647          0
            dummy                 0          0 2147483647 2147483647          0
            numproc              62        113 2147483647 2147483647          0
            physpages          3201      30419 2147483647 2147483647          0
            vmguarpages           0          0 2147483647 2147483647          0
            oomguarpages       3210      30428 2147483647 2147483647          0
            numtcpsock            5         12 2147483647 2147483647          0
            numflock              2          4 2147483647 2147483647          0
            numpty                2          2 2147483647 2147483647          0
            numsiginfo            1          5 2147483647 2147483647          0
            tcpsndbuf         66720     284672 2147483647 2147483647          0
            tcprcvbuf         81920     185264 2147483647 2147483647          0
            othersockbuf      33360     408016 2147483647 2147483647          0
            dgramrcvbuf           0      11696 2147483647 2147483647          0
            numothersock         32         49 2147483647 2147483647          0
            dcachesize      1614538    2428481 2147483647 2147483647          0
            numfile          975818    1299251 2147483647 2147483647          0
            dummy                 0          0 2147483647 2147483647          0
            dummy                 0          0 2147483647 2147483647          0
            dummy                 0          0 2147483647 2147483647          0
            numiptent            31         31 2147483647 2147483647          0
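
A minimal sketch of how a listing like the one above could be checked automatically, assuming the "Version: 2.5" column layout shown here (uid, resource, held, maxheld, barrier, limit, failcnt). The names `parse_beancounters` and `failed_resources` are made up for this illustration and are not part of any OpenVZ tool, and the line for uid 101 in the sample is invented to show a non-zero failcnt. It lists every resource whose failcnt is non-zero, which is usually the first thing to look at when a limit is suspected.

```python
# Hypothetical helpers, not part of OpenVZ: parse the text of
# /proc/user_beancounters in the "Version: 2.5" layout posted above.

FIELDS = ("held", "maxheld", "barrier", "limit", "failcnt")


def parse_beancounters(text):
    """Return {uid: {resource: {field: int}}} for a beancounters listing."""
    result = {}
    uid = None
    for line in text.splitlines():
        parts = line.split()
        # Skip blank lines, the "Version:" line and the column header.
        if not parts or parts[0] in ("Version:", "uid"):
            continue
        # A line that starts a new uid block looks like "0:  kmemsize ...".
        if parts[0].endswith(":") and parts[0][:-1].isdigit():
            uid = int(parts[0][:-1])
            parts = parts[1:]
        if uid is None or len(parts) != 1 + len(FIELDS):
            continue  # not a resource line in the layout we expect
        name = parts[0]
        values = [int(v) for v in parts[1:]]
        # Repeated names (the "dummy" rows) simply overwrite each other.
        result.setdefault(uid, {})[name] = dict(zip(FIELDS, values))
    return result


def failed_resources(counters):
    """List (uid, resource, failcnt) for every resource with failcnt > 0."""
    return [
        (uid, name, vals["failcnt"])
        for uid, resources in sorted(counters.items())
        for name, vals in resources.items()
        if vals["failcnt"] > 0
    ]


# First rows copied from the post; the uid 101 row is invented for the demo.
SAMPLE = """\
Version: 2.5
       uid  resource           held    maxheld    barrier      limit    failcnt
        0:  kmemsize      204025604  269105689 2147483647 2147483647          0
            numproc              62        113 2147483647 2147483647          0
            numfile          975818    1299251 2147483647 2147483647          0
      101:  numfile             512        512        512        512          7
"""

for uid, name, failcnt in failed_resources(parse_beancounters(SAMPLE)):
    print(f"uid {uid}: {name} failed {failcnt} times")
```

On a real host the same functions could be fed the contents of /proc/user_beancounters instead of `SAMPLE`. In the listing posted above every failcnt is 0, so this check would report nothing, which matches the observation that the beancounters gave no hint.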



cat /proc/cpuinfo
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 15
model           : 43
model name      : AMD Athlon(tm) 64 X2 Dual Core Processor 4600+
stepping        : 1
cpu MHz         : 2399.830
cache size      : 512 KB
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 2
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt lm 3dnowext 3dnow pni lahf_lm cmp_legacy ts fid vid ttp
bogomips        : 4805.43

processor       : 1
vendor_id       : AuthenticAMD
cpu family      : 15
model           : 43
model name      : AMD Athlon(tm) 64 X2 Dual Core Processor 4600+
stepping        : 1
cpu MHz         : 2399.830
cache size      : 512 KB
physical id     : 0
siblings        : 2
core id         : 1
cpu cores       : 2
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt lm 3dnowext 3dnow pni lahf_lm cmp_legacy ts fid vid ttp
bogomips        : 4799.54


Relevant kernel config options for OpenVZ:
CONFIG_VZ_QUOTA=m
# CONFIG_VZ_QUOTA_UNLOAD is not set
CONFIG_VZ_QUOTA_UGID=y
# OpenVZ
CONFIG_VZ_GENCALLS=y
CONFIG_VZ_DEV=m
CONFIG_VZ_WDOG=m
CONFIG_VZ_CHECKPOINT=m
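
Since the lines above are only an excerpt, here is a hedged sketch for pulling the OpenVZ-related options out of a full kernel .config so that two builds can be compared. It assumes the usual .config syntax (`CONFIG_X=value` and `# CONFIG_X is not set`); the function name `vz_options` and the `CONFIG_VZ` prefix filter are choices made for this illustration only.

```python
import re

# Matches "CONFIG_FOO=value" and "# CONFIG_FOO is not set".
SET_RE = re.compile(r"^(CONFIG_\w+)=(.*)$")
UNSET_RE = re.compile(r"^# (CONFIG_\w+) is not set$")


def vz_options(config_text, prefix="CONFIG_VZ"):
    """Return {option: value} for options whose name starts with prefix.

    Options that are explicitly "not set" map to None.
    """
    options = {}
    for line in config_text.splitlines():
        line = line.strip()
        m = SET_RE.match(line)
        if m:
            name, value = m.group(1), m.group(2)
        else:
            m = UNSET_RE.match(line)
            if not m:
                continue  # plain comment such as "# OpenVZ", or a blank line
            name, value = m.group(1), None
        if name.startswith(prefix):
            options[name] = value
    return options


# The excerpt exactly as posted above.
EXCERPT = """\
CONFIG_VZ_QUOTA=m
# CONFIG_VZ_QUOTA_UNLOAD is not set
CONFIG_VZ_QUOTA_UGID=y
# OpenVZ
CONFIG_VZ_GENCALLS=y
CONFIG_VZ_DEV=m
CONFIG_VZ_WDOG=m
CONFIG_VZ_CHECKPOINT=m
"""

for name, value in sorted(vz_options(EXCERPT).items()):
    print(name, "=", value if value is not None else "(not set)")
```

Running the same function over a self-built config and an official one and comparing the two dictionaries would show which OpenVZ options differ between the builds.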

[Updated on: Thu, 22 March 2007 13:44] by Moderator


Re: Random Crashes [message #10454 is a reply to message #10452] Mon, 19 February 2007 12:21
Vasily Tarasov
Messages: 1345
Registered: January 2006
Senior Member
Hello,

You're using a very old kernel! It had a lot of memory leaks and other bugs. Please upgrade to 2.6.18-028test015 and let us know if the problem persists.

Thank you,
Vasily
Re: Random Crashes [message #10592 is a reply to message #10454] Fri, 23 February 2007 08:57
grisu
Messages: 8
Registered: February 2007
Junior Member
Vasily Tarasov wrote on Mon, 19 February 2007 07:21

Hello,

You're using a very old kernel! It had a lot of memory leaks and other bugs. Please upgrade to 2.6.18-028test015 and let us know if the problem persists.



OK, now we have a new kernel, and 1.5 hours later we had the next 'crash'.

It is running:
# cat /proc/version
Linux version 2.6.18-028test015-grisu-ovz028test015.1


After some time (the backup script, which causes some I/O, was running) the machine began to crash.

It was not possible to open a normal interactive ssh session on the machine, but 'ssh root@XXXX uptime' worked, and so did 'ssh root@XXXX bash'.

When I typed 'date', I got a 'permission denied'?!

After this I pressed reset, and now the computer is running again... I am waiting for the next crash.

Any new ideas?

Thanks

Grisu
Re: Random Crashes [message #10593 is a reply to message #10592] Fri, 23 February 2007 10:23
JimL
Messages: 116
Registered: February 2007
Senior Member
Sounds strange enough to be the result of a hacked system. If you have an extra partition, I'd suggest reinstalling the system from scratch on a separate partition. Since you do appear to have extra hardware, perhaps you could use it instead, just to eliminate that possibility. This is a long shot, but something to try.

Jim
Re: Random Crashes [message #10594 is a reply to message #10593] Fri, 23 February 2007 11:00
grisu
Messages: 8
Registered: February 2007
Junior Member
JimL wrote on Fri, 23 February 2007 05:23

Sounds strange enough to be the result of a hacked system. If you have an extra partition, I'd suggest reinstalling the system from scratch on a separate partition. Since you do appear to have extra hardware, perhaps you could use it instead, just to eliminate that possibility. This is a long shot, but something to try.

Jim


Sorry, the ssh problem was on the host system, and the host system was freshly installed on both machines (we only copied the OpenVZ VEs, not the whole disk).

Also, the backup script (faubackup) was running on the host system.

At the time of the crash the VEs were running, but not in real use...

Thanks
Grisu
Re: Random Crashes [message #10666 is a reply to message #10594] Mon, 26 February 2007 10:41
Vasily Tarasov
Messages: 1345
Registered: January 2006
Senior Member
Hello,

It would be perfect if you could post here:
1) the full .config file you used to compile the OpenVZ kernel in use
2) your backup script. With it in hand we will be able to run the same stress test locally and hopefully reproduce the crash.


It would be great if you could set up a remote console and press the magic SysRq keys during the next crash (http://wiki.openvz.org/Remote_console_setup, http://wiki.openvz.org/Magic_SysRq_Key).
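
As a small aid for the Magic SysRq step: before relying on the keys during a crash, it is worth checking that SysRq is enabled at all. The sketch below decodes the value of /proc/sys/kernel/sysrq using the bitmask described in the Linux kernel's SysRq documentation. The function names `decode_sysrq` and `read_sysrq` are made up here, and the bit meanings should be checked against the documentation of the kernel actually in use, since the bitmask form may not apply to every kernel version.

```python
# Bit meanings as listed in the Linux kernel SysRq documentation.
# Treat them as an assumption for older kernels and verify locally.
SYSRQ_BITS = {
    2: "control of console logging level",
    4: "control of keyboard (SAK, unraw)",
    8: "debugging dumps of processes etc.",
    16: "sync command",
    32: "remount read-only",
    64: "signalling of processes (term, kill, oom-kill)",
    128: "reboot/poweroff",
    256: "nicing of all RT tasks",
}


def decode_sysrq(value):
    """Return the list of enabled SysRq function groups for a sysrq value."""
    if value == 0:
        return []  # SysRq completely disabled
    if value == 1:
        return ["all functions"]
    return [desc for bit, desc in sorted(SYSRQ_BITS.items()) if value & bit]


def read_sysrq(path="/proc/sys/kernel/sysrq"):
    """Read the current sysrq setting from procfs (Linux only)."""
    with open(path) as f:
        return int(f.read().strip())


# Example with a fixed value (16 + 32 + 128), so no procfs access is needed.
print(decode_sysrq(176))
```

On the hardware node, `decode_sysrq(read_sysrq())` would show what is currently allowed. For crash debugging the process and memory dumps matter most, so the value should either be 1 or include bit 8.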

You can also compile the OpenVZ kernel with the official configs from openvz.org and check whether the problem is reproducible with such kernels.

Thank you,
We appreciate your help very much,
Vasily
Re: Random Crashes [message #10714 is a reply to message #10666] Mon, 26 February 2007 23:01
grisu
Messages: 8
Registered: February 2007
Junior Member
Vasily Tarasov wrote on Mon, 26 February 2007 05:41

Hello,

It would be perfect if you could post here:
1) the full .config file you used to compile the OpenVZ kernel in use



See attachment.

Quote:


2) your backup script. With it in hand we will be able to run the same stress test locally and hopefully reproduce the crash.



It is only:
/usr/sbin/faubackup --one-file-system 10.0.0.136:/data1

see http://faubackup.sourceforge.net/

Quote:


It would be great if you could set up a remote console and press the magic SysRq keys during the next crash (http://wiki.openvz.org/Remote_console_setup, http://wiki.openvz.org/Magic_SysRq_Key).



OK, I am starting the remote console setup...

Quote:


You can also compile the OpenVZ kernel with the official configs from openvz.org and check whether the problem is reproducible with such kernels.



I have downloaded kernel-smp-2.6.18-ovz028test015.1.i686.rpm and will try this kernel after the next reboot...

Thanks for your help

Grisu
Re: Random Crashes [message #10724 is a reply to message #10714] Tue, 27 February 2007 08:57
Vasily Tarasov
Messages: 1345
Registered: January 2006
Senior Member
Hello,


Thanks for the cooperation. I look forward to hearing from you. Meanwhile, this evening I'll run your I/O test on a kernel compiled with your config.

Thanks.
Re: Random Crashes [message #10757 is a reply to message #10724] Wed, 28 February 2007 08:52
grisu
Messages: 8
Registered: February 2007
Junior Member
Vasily Tarasov wrote on Tue, 27 February 2007 03:57

Thanks for the cooperation. I look forward to hearing from you. Meanwhile, this evening I'll run your I/O test on a kernel compiled with your config.



First information:
  • we have 2.6.18-ovz028test015.1-sm running
  • the remote console is running

and we are now waiting for the next crash. :)

Thanks

Grisu
Re: Random Crashes [message #10964 is a reply to message #10757] Sat, 10 March 2007 10:56
grisu
Messages: 8
Registered: February 2007
Junior Member
Quote:

First information:
  • we have 2.6.18-ovz028test015.1-sm running
  • the remote console is running

and we are now waiting for the next crash. :)



OK, I have new information:

First we ran OpenVZ on Server1 with the 2.6.18-ovz028test015.1-sm kernel, without any problem. Uptime > 10 days. Nice. Your kernel is running.

Yesterday I switched OpenVZ back to Server2 with the 2.6.18-ovz028test015.1-sm kernel. The software is the same, and so is the hardware, with one exception: Server2 has a RAID controller.

Server2 crashed after X hours... Hm.
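
When two machines are supposed to be identical except for one component, comparing their `lspci` output is a quick way to confirm it. Below is a rough sketch under stated assumptions: the function names `lspci_devices` and `hardware_diff` are invented for this illustration, the first sample is a few lines taken from the listing in the first post, and the second sample is a hypothetical listing of a machine without the 3ware controller (no such listing was posted in this thread).

```python
def lspci_devices(text):
    """Return the set of device descriptions from plain `lspci` output.

    Each line looks like "00:09.0 RAID bus controller: 3ware Inc ...".
    The leading bus address is dropped so that slot numbers do not matter.
    Using a set means identical devices are counted only once.
    """
    devices = set()
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        _address, _, description = line.partition(" ")
        devices.add(description)
    return devices


def hardware_diff(lspci_a, lspci_b):
    """Return (only_in_a, only_in_b) as sorted lists of descriptions."""
    a, b = lspci_devices(lspci_a), lspci_devices(lspci_b)
    return sorted(a - b), sorted(b - a)


# Lines taken from the lspci listing in the first post.
WITH_RAID = """\
00:09.0 RAID bus controller: 3ware Inc 7xxx/8xxx-series PATA/SATA-RAID (rev 01)
00:0b.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8169 Gigabit Ethernet (rev 10)
01:00.0 VGA compatible controller: Matrox Graphics, Inc. MGA G400/G450 (rev 85)
"""

# Hypothetical listing of the other machine, for illustration only.
WITHOUT_RAID = """\
00:0b.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8169 Gigabit Ethernet (rev 10)
01:00.0 VGA compatible controller: Matrox Graphics, Inc. MGA G400/G450 (rev 85)
"""

only_a, only_b = hardware_diff(WITH_RAID, WITHOUT_RAID)
print("only in first listing: ", only_a)
print("only in second listing:", only_b)
```

A difference like this only narrows down where to look; it does not show that the extra component is faulty.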

I have attached the remote console output from both servers, one of them with the kernel oops.

Maybe you can see the problem now...

Thanks for your help....

Grisu
  • Attachment: crash.txt
    (Size: 15.89KB, Downloaded 361 times)
  • Attachment: work.txt
    (Size: 14.03KB, Downloaded 299 times)

[Updated on: Tue, 13 March 2007 07:50] by Moderator


Re: Random Crashes [message #11070 is a reply to message #10964] Tue, 13 March 2007 08:00
Vasily Tarasov
Messages: 1345
Registered: January 2006
Senior Member
Hello,

Sorry for the delayed answer! Your crash seems to be a hardware-related problem. Could you please run memtest and cpuburn on Server2, where you had the oops (http://wiki.openvz.org/Hardware_testing)? Please run cpuburn under a non-OpenVZ kernel.

Thanks,
Vasily.
Re: Random Crashes [message #11384 is a reply to message #11070] Thu, 22 March 2007 13:26
grisu
Messages: 8
Registered: February 2007
Junior Member
Hello

Thanks

We have now had the server checked, and the company found a hardware error.

With the new server, OpenVZ is working...

Thanks

Grisu
Re: Random Crashes [message #11386 is a reply to message #11384] Thu, 22 March 2007 13:43
Vasily Tarasov
Messages: 1345
Registered: January 2006
Senior Member
Thank you for the info.

Good luck,
Vasily.