OpenVZ Forum


System hangs [message #10596] Fri, 23 February 2007 11:10
unilynx
I have two OpenVZ servers, both of which tend to hang 'in the
morning'. I saw the problem with the SUSE kernel
vmlinux-2.6.16.21-2.2-smp yesterday, and with the stable
vmlinux-2.6.9-023stab040.1 today.

This time, I had some 'top's open, which report a load above 80. I can
open an SSH connection to the system, but both local and remote logins
hang. Interestingly, the VZs running on the machine still work: I can run
commands in them and they report no uptime.

I run the VZs on reiserfs/ext3 partitions, mounted over AoE. I have the
feeling the kernel might actually be hanging on NFS (I use NFS to
share configuration and administrative files for OpenVZ, but not for the
VZs themselves: running VZs on NFS mounts didn't work), but restarting
the NFS server doesn't help. I rebooted one of the hanging
servers, and it could access the NFS share just fine afterwards, so NFS
itself seems to be up.
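
One way to test the NFS suspicion without locking up yet another shell is to stat the mount from a throwaway child process and give up after a timeout. The following is only a minimal sketch: `probe_path` and its return values are hypothetical names made up for illustration, and the timeout is an arbitrary choice.

```python
import subprocess
import sys
import time


def probe_path(path, timeout=5.0):
    """Return 'ok', 'error' or 'timeout' for a stat() of `path`.

    The stat runs in a child process, so a hung mount blocks the
    child rather than this script. (Hypothetical helper.)
    """
    child = subprocess.Popen(
        [sys.executable, "-c", "import os, sys; os.stat(sys.argv[1])", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = child.poll()
        if status is not None:
            # Exit code 0 means stat() succeeded; anything else means
            # it failed quickly (for example, the path does not exist).
            return "ok" if status == 0 else "error"
        time.sleep(0.1)
    # Deliberately no wait() here: a child stuck in uninterruptible
    # sleep (state D) cannot be reaped until the I/O completes.
    child.kill()
    return "timeout"
```

Calling `probe_path("/etc/vz")` on a node where that directory lives on a hung NFS mount would be expected to return `'timeout'`, while a healthy mount should return `'ok'` almost immediately.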

syslog still worked, and I grabbed the following call stacks using sysrq
- I noticed a lot of cron processes hanging with this trace:

Feb 23 11:31:36 web2 kernel: cron S 0000807940a0 000001011ae0c050 0 3018 6456 3022 3019 3014 (NOTLB)
Feb 23 11:31:36 web2 kernel: 0000010119c23df8 0000000000000006 000001013f674f00 ffffffffa012706b
Feb 23 11:31:36 web2 kernel: 0000000000000000 ffffffff8017c62b ffffffff8054bc80 0000000000000000
Feb 23 11:31:36 web2 kernel: 000001011ae0c050 0000807940a0edd0
Feb 23 11:31:36 web2 kernel: Call Trace: [<ffffffffa012706b>] :simfs:sim_systemcall+0x6b/0x280
Feb 23 11:31:36 web2 kernel: [<ffffffff8017c62b>] do_wp_page+0x44b/0x4c0
Feb 23 11:31:36 web2 kernel: [<ffffffff8019ceb0>] pipe_wait+0xa0/0xf0
Feb 23 11:31:36 web2 kernel: [<ffffffff8013b8a0>] autoremove_wake_function+0x0/0x30

....

None of the VZs should be running crontab as far as I know, so this
should be the crontab of the underlying system. I'm not sure whether it
should even be in a simfs function.

I think these are the crons that invoke vpsnetclean and vpsreboot (which
also occur a lot in the process list), so this probably explains the >80
load.
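
On Linux the load average counts tasks in uninterruptible sleep (state D) as well as runnable ones, so a pile-up of blocked processes can be confirmed by tallying process states from /proc. This is only a rough sketch; the function names `state_from_stat` and `count_states` are hypothetical.

```python
import glob
from collections import Counter


def state_from_stat(stat_line):
    """Extract the state letter from the contents of /proc/<pid>/stat.

    The command name is wrapped in parentheses and may itself contain
    spaces or ')', so split after the last ')' rather than on spaces.
    """
    return stat_line[stat_line.rindex(")") + 1:].split()[0]


def count_states(proc_root="/proc"):
    """Count processes per state letter (R, S, D, ...)."""
    counts = Counter()
    for path in glob.glob(f"{proc_root}/[0-9]*/stat"):
        try:
            with open(path) as fh:
                counts[state_from_stat(fh.read())] += 1
        except (OSError, ValueError, IndexError):
            continue  # process exited, or the line was not parseable
    return counts
```

If the number of D-state entries returned by `count_states()` is close to the reported load, the load is coming from blocked tasks rather than from CPU work.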

The stack trace of vpsreboot:
Feb 23 11:31:44 web2 kernel: vpsreboot D 00008ad76e6a 0000010117e6e3d0 0 4316 4315 (NOTLB)
Feb 23 11:31:44 web2 kernel: 0000010117db9928 0000000000000006 0000000000000003 ffffffff8016f624
Feb 23 11:31:44 web2 kernel: 000001000000f380 0000000000000202 ffffffff8054bc80 0000000000000000
Feb 23 11:31:44 web2 kernel: 0000010117e6e3d0 00008ad76e6abd1c
Feb 23 11:31:44 web2 kernel: Call Trace: [<ffffffff8016f624>] __alloc_collect_stats+0x54/0xc0
Feb 23 11:31:44 web2 kernel: [<ffffffffa00b2ec1>] :sunrpc:rpc_sleep_on+0x41/0x70
Feb 23 11:31:44 web2 kernel: [<ffffffffa00b3bd0>] :sunrpc:__rpc_execute+0x1f0/0x3c0
Feb 23 11:31:44 web2 kernel: [<ffffffff8013b8a0>] autoremove_wake_function+0x0/0x30
Feb 23 11:31:44 web2 kernel: [<ffffffffa00b36c7>] :sunrpc:rpc_init_task+0x157/0x1f0
Feb 23 11:31:44 web2 kernel: [<ffffffff8013b8a0>] autoremove_wake_function+0x0/0x30
Feb 23 11:31:44 web2 kernel: [<ffffffffa00ae8d2>] :sunrpc:rpc_call_sync+0x82/0xc0
Feb 23 11:31:44 web2 kernel: [<ffffffffa00fa41e>] :nfs:nfs3_rpc_wrapper+0x2e/0x90
Feb 23 11:31:44 web2 kernel: [<ffffffffa00fabe9>] :nfs:nfs3_proc_access+0x109/0x180

and vpsnetclean:
Feb 23 11:31:44 web2 kernel: vpsnetclean D 00008ad76e6a 0000010117e5ccf0 0 4318 4317 (NOTLB)
Feb 23 11:31:44 web2 kernel: 0000010117ed5928 0000000000000006 00000101312e67a8 ffffffff8016f624
Feb 23 11:31:44 web2 kernel: 000002000000f380 0000000000000001 ffffffff8054bc80 0000000000000000
Feb 23 11:31:44 web2 kernel: 0000010117e5ccf0 00008ad76e6a901c
Feb 23 11:31:44 web2 kernel: Call Trace: [<ffffffff8016f624>] __alloc_collect_stats+0x54/0xc0
Feb 23 11:31:44 web2 kernel: [<ffffffffa00b2ec1>] :sunrpc:rpc_sleep_on+0x41/0x70
Feb 23 11:31:44 web2 kernel: [<ffffffffa00b3bd0>] :sunrpc:__rpc_execute+0x1f0/0x3c0
Feb 23 11:31:44 web2 kernel: [<ffffffff8013b8a0>] autoremove_wake_function+0x0/0x30
Feb 23 11:31:44 web2 kernel: [<ffffffffa00b36c7>] :sunrpc:rpc_init_task+0x157/0x1f0
Feb 23 11:31:44 web2 kernel: [<ffffffff8013b8a0>] autoremove_wake_function+0x0/0x30
Feb 23 11:31:44 web2 kernel: [<ffffffffa00ae8d2>] :sunrpc:rpc_call_sync+0x82/0xc0
Feb 23 11:31:44 web2 kernel: [<ffffffffa00fa41e>] :nfs:nfs3_rpc_wrapper+0x2e/0x90
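
With many tasks in the sysrq dump, it may help to summarize the traces mechanically instead of reading them one by one. The sketch below assumes the syslog layout seen in the traces above (a `kernel:` prefix, a task line with a one-letter state, and `symbol+0xOFF/0xLEN` frames); the regular expressions and the names `parse_tasks` and `blocked_on_nfs` are assumptions made for illustration, not an established tool. The sample is abbreviated from the traces in this post.

```python
import re

# Syslog prefix such as "Feb 23 11:31:44 web2 kernel: "
PREFIX = re.compile(r"^\w{3}\s+\d+\s+[\d:]+\s+\S+\s+kernel:\s*")
# Task header: name, one-letter state, then a hex field
TASK = re.compile(r"^(\S+)\s+([RSDTZ])\s+[0-9a-f]{8,}")
# Stack frame: optional ":module:" prefix, symbol, "+0xOFF/0xLEN"
FRAME = re.compile(r"((?::\w+:)?\w+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def parse_tasks(log_text):
    """Return a list of {'name', 'state', 'frames'} dicts, one per task."""
    tasks = []
    current = None
    for raw in log_text.splitlines():
        line = PREFIX.sub("", raw.strip())
        header = TASK.match(line)
        if header:
            current = {"name": header.group(1),
                       "state": header.group(2),
                       "frames": []}
            tasks.append(current)
        elif current is not None:
            # Frames may sit on a wrapped continuation line, so scan
            # every line that follows a task header.
            current["frames"].extend(FRAME.findall(line))
    return tasks


def blocked_on_nfs(tasks):
    """Names of D-state tasks with an :nfs: or :sunrpc: frame."""
    return [t["name"] for t in tasks
            if t["state"] == "D"
            and any(f.startswith((":nfs:", ":sunrpc:")) for f in t["frames"])]


SAMPLE = """\
Feb 23 11:31:36 web2 kernel: cron S 0000807940a0
000001011ae0c050 0 3018 6456 3022 3019 3014 (NOTLB)
Feb 23 11:31:36 web2 kernel: Call Trace: [<ffffffffa012706b>]
:simfs:sim_systemcall+0x6b/0x280
Feb 23 11:31:36 web2 kernel: [<ffffffff8019ceb0>] pipe_wait+0xa0/0xf0
Feb 23 11:31:44 web2 kernel: vpsreboot D 00008ad76e6a
0000010117e6e3d0 0 4316 4315 (NOTLB)
Feb 23 11:31:44 web2 kernel: [<ffffffffa00ae8d2>]
:sunrpc:rpc_call_sync+0x82/0xc0
Feb 23 11:31:44 web2 kernel: [<ffffffffa00fabe9>]
:nfs:nfs3_proc_access+0x109/0x180
"""

if __name__ == "__main__":
    print(blocked_on_nfs(parse_tasks(SAMPLE)))
```

Run against the full syslog, this would list every task that is in uninterruptible sleep inside the NFS or RPC client code, which is the pattern visible in the vpsreboot and vpsnetclean traces.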

Any idea what I can do to investigate this further? Could putting
/etc/vz and /etc/sysconfig/vz-scripts on NFS be the source of the problems?
 