IBM RAID controller re-mount lock [message #36976]
Wed, 05 August 2009 14:35
maximilian.csuk
Dear OpenVZ community,
We have a system from IBM (System x3400) with an IBM ServeRAID 8K-I controller, on which we are running Debian Lenny (4.1.2-25). The kernel is openvz-kernel 2.6.26-2-openvz-686.
The containers are located on a separate ext3 partition, which is mounted like this:
/dev/drbd0 on /mnt/hq-drbd-stor01 type ext3 (rw)
In particular, we experience a lockup when we try to unmount and remount the container's partition.
The following steps work:
vzctl chkpnt 77
vzctl stop 77
/etc/init.d/openvz stop
/etc/init.d/openvz start
vzctl restore 77
But these don't work:
vzctl chkpnt 77
vzctl stop 77
/etc/init.d/openvz stop
umount /dev/drbd0
mount /dev/drbd0
/etc/init.d/openvz start
vzctl restore 77
It is the restore that produces the lockup. This is the output up to the point where it hangs:
Restoring VE ...
Starting VE ...
VE is mounted
undump...
Adding IP address(es): 10.20.60.20
Setting CPU units: 1000
Configure meminfo: 65536
Nothing more is displayed after this.
The host shows 100% CPU (system) usage, and the load climbs to about 10. From our understanding, a container's processes run inside a kernel process on the host; the container's currently active process (as far as we can tell, a different one on each try) takes all the CPU from the host.
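In case it helps, this is roughly how we identified the spinning task (top is standard; vzpid ships with vzctl and, as far as we know, maps a host pid back to its container):
top -b -n 1 | head -n 15   # the stuck task shows up at the top of the list
vzpid <pid>                # prints the container (VEID) the pid belongs to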
The OpenVZ version is 036test001.
We already tried switching the RAID controller driver from aacraid-1.1.5.2449 to aacraid-1.1.5.2467, but without any luck.
We then tried USB disks instead of /dev/drbd0; there, the restore worked even after remounting, so the issue seems to lie with the RAID controller.
We also tried running a DRBD device on top of two USB disks instead of the ServeRAID-attached disks. This constellation works as well, so we can assume that DRBD is working correctly.
Is this a known issue? If so, (how) can it be solved?
Is there a way to debug this and provide you and us with more information?
Thank you for your time.
Maximilian Csuk
ipcenter.at
Re: IBM RAID controller re-mount lock [message #36990 is a reply to message #36984]
Fri, 07 August 2009 09:07
maximilian.csuk
Quote: "Let's increase the verbose level of the vzctl utility. In /etc/vz/vz.conf, please set the LOG_LEVEL and VERBOSE variables both to 10."
Done.
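The relevant part of our /etc/vz/vz.conf now reads (excerpt; everything else is unchanged):
LOG_LEVEL=10
VERBOSE=10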
Quote: "Do you have a chance to get the pid of the process that consumes 100% CPU? What is the output of /proc/<pid>/wchan?"
Yes. The output is "0".
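(For the record, we read it like this; the pid below is just the one from this particular run, as reported by top:)
cat /proc/13227/wchan   # 13227 was the spinning task on this run
As far as we understand, a wchan of 0 means the task is not sleeping in the kernel, i.e. it is busy-looping rather than blocked.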
Quote: "Are there any suspicious messages in dmesg or in the logs?"
I am not sure, but I guess not. Please have a look:
/var/log/vzctl:
2009-08-07T09:40:01+0200 vzctl : VE 77 : VE start in progress...
2009-08-07T09:40:01+0200 vzctl : VE 77 : Restoring completed succesfully
2009-08-07T09:41:28+0200 vzctl : VE 77 : Setting up checkpoint...
2009-08-07T11:41:28+0400 vzctl : VE 77 : suspend...
2009-08-07T11:41:28+0400 vzctl : VE 77 : dump...
2009-08-07T11:41:29+0400 vzctl : VE 77 : kill...
2009-08-07T09:41:29+0200 vzctl : VE 77 : Running: /usr/lib/vzctl/scripts/vps-net_del
2009-08-07T09:41:29+0200 vzctl : VE 77 : Running: /usr/sbin/vzquota stat 77 -f
2009-08-07T09:41:29+0200 vzctl : VE 77 : Running: /usr/sbin/vzquota off 77
2009-08-07T09:41:29+0200 vzctl : VE 77 : VE is unmounted
2009-08-07T09:41:29+0200 vzctl : VE 77 : Checkpointing completed succesfully
2009-08-07T09:41:36+0200 vzctl : VE 77 : Restoring VE ...
2009-08-07T09:41:36+0200 vzctl : VE 77 : Starting VE ...
2009-08-07T09:41:36+0200 vzctl : VE 77 : Running: /usr/sbin/vzquota show 77
2009-08-07T09:41:36+0200 vzctl : VE 77 : Running: /usr/sbin/vzquota on 77 -r 0 -b 1048676 -B 1153124 -i 200100 -I 220100 -e 0 -n 0 -s 0
2009-08-07T09:41:36+0200 vzctl : VE 77 : Mounting root: /var/lib/vz/root/77 /var/lib/vz/private/77
2009-08-07T09:41:36+0200 vzctl : VE 77 : VE is mounted
2009-08-07T09:41:36+0200 vzctl : VE 77 : Adding IP address(es): 10.20.60.20
2009-08-07T09:41:36+0200 vzctl : VE 77 : Running: /usr/lib/vzctl/scripts/vps-net_add
2009-08-07T09:41:38+0200 vzctl : VE 77 : Setting CPU units: 1000
2009-08-07T09:41:38+0200 vzctl : VE 77 : Configure meminfo: 65536
2009-08-07T09:41:38+0200 vzctl : VE 77 : Running: /usr/sbin/vzquota stat 77 -f
2009-08-07T09:41:38+0200 vzctl : VE 77 : Running: vzquota setlimit 77 -b 1048576 -B 1153024 -i 200000 -I 220000 -e 0 -n 0
2009-08-07T09:41:38+0200 vzctl : VE 77 : VE start in progress...
2009-08-07T09:41:38+0200 vzctl : VE 77 : Restoring completed succesfully
2009-08-07T09:42:11+0200 vzctl : VE 77 : Setting up checkpoint...
2009-08-07T11:42:11+0400 vzctl : VE 77 : suspend...
2009-08-07T11:42:11+0400 vzctl : VE 77 : dump...
2009-08-07T11:42:11+0400 vzctl : VE 77 : kill...
2009-08-07T09:42:11+0200 vzctl : VE 77 : Running: /usr/lib/vzctl/scripts/vps-net_del
2009-08-07T09:42:11+0200 vzctl : VE 77 : Running: /usr/sbin/vzquota stat 77 -f
2009-08-07T09:42:11+0200 vzctl : VE 77 : Running: /usr/sbin/vzquota off 77
2009-08-07T09:42:11+0200 vzctl : VE 77 : VE is unmounted
2009-08-07T09:42:11+0200 vzctl : VE 77 : Checkpointing completed succesfully
2009-08-07T09:42:12+0200 vzctl : VE 77 : Restoring VE ...
2009-08-07T09:42:12+0200 vzctl : VE 77 : Starting VE ...
2009-08-07T09:42:12+0200 vzctl : VE 77 : Running: /usr/sbin/vzquota show 77
2009-08-07T09:42:12+0200 vzctl : VE 77 : Running: /usr/sbin/vzquota on 77 -r 0 -b 1048676 -B 1153124 -i 200100 -I 220100 -e 0 -n 0 -s 0
2009-08-07T09:42:12+0200 vzctl : VE 77 : Mounting root: /var/lib/vz/root/77 /var/lib/vz/private/77
2009-08-07T09:42:12+0200 vzctl : VE 77 : VE is mounted
2009-08-07T09:42:12+0200 vzctl : VE 77 : Adding IP address(es): 10.20.60.20
2009-08-07T09:42:12+0200 vzctl : VE 77 : Running: /usr/lib/vzctl/scripts/vps-net_add
2009-08-07T09:42:14+0200 vzctl : VE 77 : Setting CPU units: 1000
2009-08-07T09:42:14+0200 vzctl : VE 77 : Configure meminfo: 65536
2009-08-07T09:42:14+0200 vzctl : VE 77 : Running: /usr/sbin/vzquota stat 77 -f
2009-08-07T09:42:14+0200 vzctl : VE 77 : Running: vzquota setlimit 77 -b 1048576 -B 1153024 -i 200000 -I 220000 -e 0 -n 0
Nothing more after this...
In dmesg, shortly before the lockup, it says:
[74677.049342] CT: 77: started
But then, more and more messages appear showing that a particular pid is stuck. On this try it was sendmail-mta, but other processes (like apache) have been the culprit too:
[75007.309371] BUG: soft lockup - CPU#3 stuck for 61s! [sendmail-mta:13227]
[75007.309371] Modules linked in: vzethdev vznetdev simfs vzrst vzcpt tun vzmon xt_tcpudp xt_length ipt_ttl xt_tcpmss xt_TCPMSS iptable_mangle iptable_filter xt_multiport xt_limit xt_dscp ipt_REJECT ip_tables x_tables vzdquota vzdev drbd cn bridge ipv6 bonding loop iTCO_wdt i2c_i801 rng_core pcspkr shpchp parport_pc parport i2c_core i5000_edac edac_core pci_hotplug container button joydev evdev ext3 jbd mbcache usbhid hid ff_memless usb_storage sg sr_mod cdrom 8139too ata_piix ata_generic libata dock ses enclosure sd_mod 8139cp mii ide_pci_generic ide_core ehci_hcd uhci_hcd usbcore tg3 e1000e aacraid scsi_mod thermal processor fan thermal_sys [last unloaded: simfs]
[75007.309371]
[75007.309371] Pid: 13227, comm: sendmail-mta Not tainted (2.6.26-2-openvz-686 #1 036test001)
[75007.309371] EIP: 0060:[<c013d31f>] EFLAGS: 00000257 CPU: 3
[75007.309371] EIP is at up_read+0x7/0xf
[75007.309371] EAX: f7996bf8 EBX: f156121c ECX: 00000000 EDX: 00000001
[75007.309371] ESI: f7996bf8 EDI: f7996bc0 EBP: 00000090 ESP: f148fc2c
[75007.309371] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[75007.309371] CR0: 8005003b CR2: b7856000 CR3: 37c3e000 CR4: 000006f0
[75007.309371] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[75007.309371] DR6: ffff0ff0 DR7: 00000400
[75007.309371] [<c011b3f4>] ? do_page_fault+0x50e/0x8c6
[75007.309371] [<c0131599>] ? run_timer_softirq+0x30/0x1d1
[75007.309371] [<c012dff2>] ? __do_softirq+0x9b/0x14e
[75007.309371] [<c011aee6>] ? do_page_fault+0x0/0x8c6
[75007.309371] [<c02c5672>] ? error_code+0x72/0x78
[75007.309371] [<c02c00d8>] ? init_cyrix+0x168/0x46b
[75007.309371] [<c0161b69>] ? file_read_actor+0x2b/0xcc
[75007.309371] [<c016398d>] ? generic_file_aio_read+0x2e2/0x4da
[75007.309371] [<c0184346>] ? do_sync_read+0xbf/0xfe
[75007.309371] [<c013a2c4>] ? autoremove_wake_function+0x0/0x2d
[75007.309371] [<f9191224>] ? file_pread+0x3c/0x66 [vzrst]
[75007.309371] [<f9190f5d>] ? _rst_get_object+0x85/0x98 [vzrst]
[75007.309371] [<f9195d1d>] ? do_rst_mm+0x1642/0x1b40 [vzrst]
[75007.309371] [<f919634b>] ? rst_mm_complete+0x130/0x1f5 [vzrst]
[75007.309371] [<f918fe58>] ? hook+0xa13/0x19db [vzrst]
[75007.309371] [<c01db7ed>] ? cic_free_func+0x4b/0x53
[75007.309371] [<c012cd42>] ? do_exit+0x642/0x646
[75007.309371] [<c0197941>] ? mntput_no_expire+0x13/0xd2
[75007.309371] [<c01079c7>] ? __switch_to+0xa5/0x125
[75007.309371] [<c0125314>] ? schedule_tail+0x17/0x44
[75007.309371] [<c01087fa>] ? ret_from_fork+0x6/0xe
[75007.309371] [<f918f445>] ? hook+0x0/0x19db [vzrst]
[75007.309371] [<f918f445>] ? hook+0x0/0x19db [vzrst]
[75007.309371] [<c0109597>] ? kernel_thread_helper+0x7/0x10
[75007.309371] =======================
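(We pulled the trace above straight from the kernel ring buffer; on Debian it should also end up in kern.log, path assumed:)
dmesg | tail -n 60           # most recent kernel messages
tail -f /var/log/kern.log    # watch live while reproducing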
Sorry for the long output, but this is what the sysrq calls gave me.
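(For reference, I triggered the dumps through the standard procfs interface; this assumes the magic sysrq key is enabled on the host:)
echo 1 > /proc/sys/kernel/sysrq   # make sure magic sysrq is enabled
echo m > /proc/sysrq-trigger      # dump memory info, shown below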
Output of sysrq-m:
[ 739.006053] Show Memory
[ 739.006053] Mem-info:
[ 739.006053] DMA per-cpu:
[ 739.006053] CPU 0: hi: 0, btch: 1 usd: 0
[ 739.006053] CPU 1: hi: 0, btch: 1 usd: 0
[ 739.006053] CPU 2: hi: 0, btch: 1 usd: 0
[ 739.006053] CPU 3: hi: 0, btch: 1 usd: 0
[ 739.006053] Normal per-cpu:
[ 739.006053] CPU 0: hi: 186, btch: 31 usd: 132
[ 739.006053] CPU 1: hi: 186, btch: 31 usd: 55
[ 739.006053] CPU 2: hi: 186, btch: 31 usd: 156
[ 739.006053] CPU 3: hi: 186, btch: 31 usd: 88
[ 739.006053] HighMem per-cpu:
[ 739.006053] CPU 0: hi: 186, btch: 31 usd: 156
[ 739.006053] CPU 1: hi: 186, btch: 31 usd: 64
[ 739.006053] CPU 2: hi: 186, btch: 31 usd: 112
[ 739.006053] CPU 3: hi: 186, btch: 31 usd: 41
[ 739.006053] Active:7699 inactive:4989 dirty:53 writeback:0 unstable:0
[ 739.006053] free:1535325 slab:2934 mapped:1615 pagetables:136 bounce:0
[ 739.006053] DMA free:12852kB min:68kB low:84kB high:100kB active:0kB inactive:0kB present:16240kB pages_scanned:0 all_unreclaimable? no
[ 739.006053] lowmem_reserve[]: 0 872 7089 7089
[ 739.006053] Normal free:798948kB min:3740kB low:4672kB high:5608kB active:12220kB inactive:5760kB present:893200kB pages_scanned:0 all_unreclaimable? no
[ 739.006053] lowmem_reserve[]: 0 0 49735 49735
[ 739.006053] HighMem free:5329500kB min:512kB low:7180kB high:13852kB active:18576kB inactive:14196kB present:6366080kB pages_scanned:0 all_unreclaimable? no
[ 739.006053] lowmem_reserve[]: 0 0 0 0
[ 739.006053] DMA: 3*4kB 3*8kB 5*16kB 2*32kB 4*64kB 3*128kB 1*256kB 1*512kB 1*1024kB 1*2048kB 2*4096kB = 12852kB
[ 739.006053] Normal: 107*4kB 315*8kB 236*16kB 170*32kB 91*64kB 47*128kB 25*256kB 7*512kB 5*1024kB 7*2048kB 182*4096kB = 798916kB
[ 739.006053] HighMem: 3*4kB 19*8kB 41*16kB 112*32kB 83*64kB 44*128kB 12*256kB 5*512kB 2*1024kB 3*2048kB 1294*4096kB = 5329396kB
[ 739.006053] 10767 total pagecache pages
[ 739.006053] Swap cache: add 0, delete 0, find 0/0
[ 739.006053] Free swap = 979956kB
[ 739.006053] Total swap = 979956kB
[ 739.024690] 1835008 pages of RAM
[ 739.024690] 1605632 pages of HIGHMEM
[ 739.024690] 279573 reserved pages
[ 739.024690] 10533 pages shared
[ 739.024690] 0 pages swap cached
[ 739.024690] 53 pages dirty
[ 739.024690] 0 pages writeback
[ 739.024690] 1615 pages mapped
[ 739.024690] 2934 pages slab
[ 739.024690] 136 pages pagetables
[ 739.024690] Top 10 caches:
[ 739.024690] ext3_inode_cache : size 461760 objsize 512
[ 739.024690] sysfs_dir_cache : size 434176 objsize 44
[ 739.024690] radix_tree_node : size 413696 objsize 288
...