Funny thing: I wasn't actually right about "the failed VE's processes popping up in top".
top - 22:32:30 up 7 days, 8:04, 2 users, load average: 49.88, 39.72, 23.21
Tasks: 390 total, 2 running, 387 sleeping, 0 stopped, 1 zombie
Cpu0 : 1.0%us, 7.0%sy, 0.0%ni, 91.7%id, 0.3%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu1 : 1.0%us, 6.0%sy, 0.0%ni, 93.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu2 : 0.0%us, 0.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu3 : 2.6%us, 13.8%sy, 0.0%ni, 83.6%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu4 : 2.0%us, 12.8%sy, 0.0%ni, 85.2%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu5 : 1.7%us, 12.2%sy, 0.0%ni, 86.1%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu6 : 1.3%us, 10.6%sy, 0.0%ni, 87.8%id, 0.3%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu7 : 1.3%us, 8.2%sy, 0.0%ni, 90.5%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 8164444k total, 8119288k used, 45156k free, 149272k buffers
Swap: 4192956k total, 156k used, 4192800k free, 576340k cached
This has just happened. I've killed the applications running on the failed VE. NOTHING is using the CPU at all, yet the load average keeps growing. I'm floored. top only shows its own "top" process, which is natural, since nothing else is launched. The load average is already near 50. I can't restart the failed VE, and I can't raise kmemsize or any other limit. The table overflow error is there in dmesg. I am now nearly sure it's a kernel bug; I can't imagine any other reason.
This may be the last time I'm facing this: I have plenty of RAM and have been raising the memory limits for every failing VE. It's been over a week of stable operation, so the next occurrence will probably come in August, if it happens at all. The whole situation just sucks.
"It's the power cord", I say
[Updated on: Fri, 17 July 2009 20:33]