*SOLVED* Random Crashes [message #10452]
Mon, 19 February 2007 12:00
webmotive
Messages: 1 Registered: February 2007
Junior Member
Hi members of the support forum,
I am posting this here rather than in the Bugs section because I am not really sure whether we are dealing with a bug or with a misconfiguration on our side.
We have been using OpenVZ for a few weeks and are seeing random crashes of the whole machine. To rule out the hardware, we moved everything to a different server (with the same hardware configuration, though).
So we are now sure it is not a memory or CPU failure. But it could still be some kind of conflict between certain hardware components and the VZ kernel...?
There are no useful debug messages in the syslog.
When the machine crashes, it still answers pings for a while; after that, it stops responding to pings as well.
We had to raise the beancounter limits considerably; otherwise we got errors about open files etc. (see the output further down).
The crash seems to occur when the load goes up. Any ideas where we should start looking? Monitoring the beancounters did not give us any hints.
At the moment we definitely can't use the system in production.
Very strange altogether...
Thanks
Christian
Kernel:
Linux version 2.6.18-8+openvz.2-007.1 (root@Nitrox) (gcc version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)) #1 SMP Fri Jan
lspci:
00:00.0 Host bridge: VIA Technologies, Inc. K8T800Pro Host Bridge
00:00.1 Host bridge: VIA Technologies, Inc. K8T800Pro Host Bridge
00:00.2 Host bridge: VIA Technologies, Inc. K8T800Pro Host Bridge
00:00.3 Host bridge: VIA Technologies, Inc. K8T800Pro Host Bridge
00:00.4 Host bridge: VIA Technologies, Inc. K8T800Pro Host Bridge
00:00.7 Host bridge: VIA Technologies, Inc. K8T800Pro Host Bridge
00:01.0 PCI bridge: VIA Technologies, Inc. VT8237 PCI bridge [K8T800/K8T890 South]
00:09.0 RAID bus controller: 3ware Inc 7xxx/8xxx-series PATA/SATA-RAID (rev 01)
00:0b.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8169 Gigabit Ethernet (rev 10)
00:0d.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8169 Gigabit Ethernet (rev 10)
00:0f.0 RAID bus controller: VIA Technologies, Inc. VIA VT6420 SATA RAID Controller (rev 80)
00:0f.1 IDE interface: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 06)
00:10.0 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81)
00:10.1 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81)
00:10.2 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81)
00:10.3 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (rev 81)
00:10.4 USB Controller: VIA Technologies, Inc. USB 2.0 (rev 86)
00:11.0 ISA bridge: VIA Technologies, Inc. VT8237 ISA bridge [KT600/K8T800/K8T890 South]
00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
01:00.0 VGA compatible controller: Matrox Graphics, Inc. MGA G400/G450 (rev 85)
cat /proc/user_beancounters
Version: 2.5
uid resource held maxheld barrier limit failcnt
0: kmemsize 204025604 269105689 2147483647 2147483647 0
lockedpages 0 0 2147483647 2147483647 0
privvmpages 5998 41589 2147483647 2147483647 0
shmpages 1289 1625 2147483647 2147483647 0
dummy 0 0 2147483647 2147483647 0
numproc 62 113 2147483647 2147483647 0
physpages 3201 30419 2147483647 2147483647 0
vmguarpages 0 0 2147483647 2147483647 0
oomguarpages 3210 30428 2147483647 2147483647 0
numtcpsock 5 12 2147483647 2147483647 0
numflock 2 4 2147483647 2147483647 0
numpty 2 2 2147483647 2147483647 0
numsiginfo 1 5 2147483647 2147483647 0
tcpsndbuf 66720 284672 2147483647 2147483647 0
tcprcvbuf 81920 185264 2147483647 2147483647 0
othersockbuf 33360 408016 2147483647 2147483647 0
dgramrcvbuf 0 11696 2147483647 2147483647 0
numothersock 32 49 2147483647 2147483647 0
dcachesize 1614538 2428481 2147483647 2147483647 0
numfile 975818 1299251 2147483647 2147483647 0
dummy 0 0 2147483647 2147483647 0
dummy 0 0 2147483647 2147483647 0
dummy 0 0 2147483647 2147483647 0
numiptent 31 31 2147483647 2147483647 0
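Since monitoring the beancounters is mentioned above, here is a minimal Python sketch (not from the original thread) that parses output in the /proc/user_beancounters format shown above and flags any resource with a nonzero failcnt or a maxheld close to its barrier. The function names and the 90% threshold are hypothetical choices for illustration. Note that every barrier in the output above is 2147483647 and every failcnt is 0, so a check like this would flag nothing for it.

```python
from typing import Dict, List, Tuple

# One parsed row: (resource, held, maxheld, barrier, limit, failcnt)
Row = Tuple[str, int, int, int, int, int]


def parse_beancounters(text: str) -> Dict[str, List[Row]]:
    """Parse the text of /proc/user_beancounters into {uid: [rows]}."""
    counters: Dict[str, List[Row]] = {}
    uid = None
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, the "Version:" line and the column header.
        if not fields or fields[0] in ("Version:", "uid"):
            continue
        # A leading "NNN:" starts the block of a new uid (uid 0 = host).
        if fields[0].endswith(":"):
            uid = fields[0][:-1]
            counters.setdefault(uid, [])
            fields = fields[1:]
        # Ignore malformed lines and the unused "dummy" placeholder rows.
        if uid is None or len(fields) != 6 or fields[0] == "dummy":
            continue
        name = fields[0]
        held, maxheld, barrier, limit, failcnt = (int(v) for v in fields[1:])
        counters[uid].append((name, held, maxheld, barrier, limit, failcnt))
    return counters


def flag_resources(counters: Dict[str, List[Row]], ratio: float = 0.9):
    """Return (uid, resource, maxheld, barrier, failcnt) for suspicious rows.

    A row is flagged if failcnt > 0 or if maxheld reached `ratio` of the
    barrier. The 0.9 default is an arbitrary, hypothetical threshold.
    """
    flagged = []
    for uid, rows in counters.items():
        for name, _held, maxheld, barrier, _limit, failcnt in rows:
            if failcnt > 0 or (barrier > 0 and maxheld >= ratio * barrier):
                flagged.append((uid, name, maxheld, barrier, failcnt))
    return flagged


if __name__ == "__main__":
    path = "/proc/user_beancounters"
    try:
        with open(path) as f:
            suspicious = flag_resources(parse_beancounters(f.read()))
    except OSError as exc:  # e.g. not an OpenVZ kernel, or insufficient permissions
        print(f"cannot read {path}: {exc}")
    else:
        for uid, name, maxheld, barrier, failcnt in suspicious:
            print(f"uid {uid}: {name} maxheld={maxheld} "
                  f"barrier={barrier} failcnt={failcnt}")
        if not suspicious:
            print("no resource reached its barrier or recorded failures")
```

Running it periodically (for example from cron) and keeping the output would show whether any counter climbs toward its barrier shortly before a crash.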
cat /proc/cpuinfo
processor : 0
vendor_id : AuthenticAMD
cpu family : 15
model : 43
model name : AMD Athlon(tm) 64 X2 Dual Core Processor 4600+
stepping : 1
cpu MHz : 2399.830
cache size : 512 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt lm 3dnowext 3dnow pni lahf_lm cmp_legacy ts fid vid ttp
bogomips : 4805.43
processor : 1
vendor_id : AuthenticAMD
cpu family : 15
model : 43
model name : AMD Athlon(tm) 64 X2 Dual Core Processor 4600+
stepping : 1
cpu MHz : 2399.830
cache size : 512 KB
physical id : 0
siblings : 2
core id : 1
cpu cores : 2
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt lm 3dnowext 3dnow pni lahf_lm cmp_legacy ts fid vid ttp
bogomips : 4799.54
Relevant kernel config for OpenVZ:
CONFIG_VZ_QUOTA=m
# CONFIG_VZ_QUOTA_UNLOAD is not set
CONFIG_VZ_QUOTA_UGID=y
# OpenVZ
CONFIG_VZ_GENCALLS=y
CONFIG_VZ_DEV=m
CONFIG_VZ_WDOG=m
CONFIG_VZ_CHECKPOINT=m
[Updated on: Thu, 22 March 2007 13:44] by Moderator
Re: Random Crashes [message #10592 is a reply to message #10454]
Fri, 23 February 2007 08:57
grisu
Messages: 8 Registered: February 2007
Junior Member
Vasily Tarasov wrote on Mon, 19 February 2007 07:21:
> Hello,
> You're using a very old kernel! It had a lot of memory leaks and other bugs. Please upgrade to 2.6.18-028test015 and let us know if the problem persists.
OK, we are now running the new kernel, and 1.5 hours later we had the next 'crash'.
It is running:
# cat /proc/version
Linux version 2.6.18-028test015-grisu-ovz028test015.1
After some time (while the backup script was generating some I/O), the machine began to crash.
It was not possible to open an interactive SSH session on the machine, but 'ssh root@XXXX uptime' worked, and so did 'ssh root@XXXX bash'.
When I typed 'date', I got 'permission denied'?!
After that I pressed reset, and now the machine is running again... I'm waiting for the next crash.
Any new ideas?
Thanks
Grisu