OpenVZ Forum


Checkpoint bug [message #8818] Thu, 07 December 2006 10:17
dagr
Messages: 83
Registered: February 2006
Member
Again I get strange results with checkpointing:

2.6.9-023stab032 SMP - RHEL4 x86 - vzctl version 3.0.13

//////////////////////////////////////////////////
[dagr@ws-ca dagr]$ sudo vzctl start 555
Starting VE ...
VE is mounted
Adding IP address(es): 10.0.0.55
Setting CPU units: 1000
Set hostname: 10.0.0.55
VE start in progress...
[dagr@ws-ca dagr]$ sudo vzctl chkpnt 555 --suspend
Setting up checkpoint...
suspend...
get context...
Checkpointing completed succesfully
[dagr@ws-ca dagr]$ sudo vzctl chkpnt 555 --dump --dumpfile ./img
Setting up checkpoint...
join context..
dump...
Checkpointing completed succesfully
[dagr@ws-ca dagr]$ sudo vzctl status 555
VEID 555 exist mounted running
[dagr@ws-ca dagr]$ sudo vzctl chkpnt 555 --kill
Killing...
[dagr@ws-ca dagr]$ sudo vzctl status 555
VEID 555 exist mounted down
[dagr@ws-ca dagr]$ sudo vzctl restore 555 --undump --dumpfile ./img
Restoring VE ...
Starting VE ...
VE is mounted
undump...
Adding IP address(es): 10.0.0.55
Setting CPU units: 1000
get context...
VE start in progress...
Restoring completed succesfully
[dagr@ws-ca dagr]$ sudo vzctl enter 555
enter into VE 555 failed
[dagr@ws-ca dagr]$ sudo vzctl stop 555
Stopping VE ...
Unable to stop VE: operation timed out
////////////////////////////////////////////////////////////
After this there is no way to stop the VPS without rebooting the HN. Even /etc/init.d/vz stop cannot stop it: it just keeps retrying and hangs the whole server, so only a hard reset helps!
After a reboot the VPS runs normally (start, stop, enter) until the next attempt to dump and restore. I checked: it is the "restore" step that makes it hang.
////////////////////////////////////////////////////////////
Before that I tried to restore the dump on another HN and got this:

Adding IP address(es): 10.0.0.55
Setting CPU units: 1000
Error: undump failed: No such file or directory
Restoring failed:
rst_file: -2 20496
rst_files: -2
make_baby: -2
rst_clone_children
VE start failed
Stopping VE ...
VE was stopped
VE is unmounted

///////////////////////////////////////////
I get the same error if I do everything on the same HN but, instead of killing the VE after making the dump, resume it and then stop it. I also noticed that killing does not unmount the VPS; unmounting it manually after the kill did not help either.

///////////////////////////////////////////////////////

At least tell me: is there any way to stop it without restarting the HN?




Re: Checkpoint bug [message #8823 is a reply to message #8818] Thu, 07 December 2006 11:25
dagr
Messages: 83
Registered: February 2006
Member
I also tried vzctl 3.0.13 with kernel 2.6.18-ovz028test005.1-smp.

Restore on the same HN:

[dagr@ssw2-ca utils]$ sudo vzctl restore 10121 --undump --dumpfile ./img
Restoring VE ...
Starting VE ...
VE is unmounted
VE is mounted
undump...
Adding IP address(es): 10.1.2.1
Setting CPU units: 1000

////////////////////
Then it just hangs, and the server hangs as well (on kernel 2.6.9 it hung only during reboot), so again a hard reset. On the console I see this:

[<c02889cf>] unblank_screen+0xf/0x20
[<c023cb16>] bust_spinlocks+0x36/0x50
[<c010f43a>] smp_nmi_callback+0x6a/0x90
[<c010546c>] do_nmi+0x7c/0xa0
[<c0103d12>] nmi_stack_correct+0x1d/0x22
[<c011007b>] do_boot_cpu+0x16b/0x2e0
[<c0494c8d>] __read_lock_failed+0x5/0x18
[<c0496e9b>] _read_lock+0xb/0x10
[<c0132f7d>] send_group_sig_info+0xd/0x40
[<c012f894>] lock_timer_base+0x24/0x50
[<c0129d0e>] it_real_fn+0x2e/0x80
[<c0129ce0>] it_real_fn+0x0/0x80
[<c0140d1a>] hrtimer_run_queues+0x8a/0x110
[<c0130cd7>] run_timer_softirq+0x27/0x1e0
[<c012b322>] __do_softirq+0xa2/0x150
[<c0105ce5>] do_softirq+0x55/0xb0
=======================
[<c0130b8a>] update_process_times+0x4a/0xc0
[<c011122e>] smp_apic_timer_interrupt+0x8e/0xb0
[<c0100e6c>] default_idle+0x2c/0x60
[<c0103bbb>] apic_timer_interrupt+0x1f/0x24
[<c0100e6c>] default_idle+0x2c/0x60
[<c0100f32>] cpu_idle+0x72/0x90

Re: Checkpoint bug [message #8827 is a reply to message #8823] Thu, 07 December 2006 12:06
dagr
Messages: 83
Registered: February 2006
Member
OK, I figured it out with the first kernel. It wasn't a bug: I just forgot to resume the VE after restoring it.

But the restore with kernel 2.6.18 is a REAL BUG.
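The fix described above amounts to one extra step at the end of the cycle. As a minimal sketch, assuming the staged vzctl options used in this thread (`chkpnt --suspend`, `--dump`, `--kill` and `restore --undump`) plus `restore --resume` for the missing step, the full local checkpoint/restore cycle could look like this; the function name `checkpoint_and_restore` and the `VZCTL` override variable are hypothetical. With the 2.6.9 kernel above, skipping the final `--resume` left the restored VE frozen, which matches `vzctl enter` failing and `vzctl stop` timing out. It does not address the 2.6.18 hang, which happened during `--undump` itself.

```shell
#!/bin/sh
# Hypothetical helper: staged checkpoint/restore of one VE with vzctl.
# VZCTL defaults to "sudo vzctl"; set VZCTL=echo for a dry run that
# only prints the arguments.
VZCTL=${VZCTL:-"sudo vzctl"}

checkpoint_and_restore() {
    veid=$1
    dumpfile=$2
    # Checkpoint: freeze the VE, write its image, then kill the frozen VE.
    $VZCTL chkpnt "$veid" --suspend || return 1
    $VZCTL chkpnt "$veid" --dump --dumpfile "$dumpfile" || return 1
    $VZCTL chkpnt "$veid" --kill || return 1
    # Restore: --undump recreates the processes but leaves them frozen...
    $VZCTL restore "$veid" --undump --dumpfile "$dumpfile" || return 1
    # ...so --resume is needed before the VE can be entered or stopped.
    $VZCTL restore "$veid" --resume
}
```

For example, `checkpoint_and_restore 555 ./img` runs the same sequence as the first transcript, with the resume step added at the end.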
Re: Checkpoint bug [message #8832 is a reply to message #8827] Thu, 07 December 2006 12:23
dim
Messages: 344
Registered: August 2005
Senior Member
Please try to reproduce it with the 028test007 kernel and, if it still happens, file a bug. Note that checkpointing is indeed still buggy in the 2.6.18 branch.

Re: Checkpoint bug [message #8840 is a reply to message #8818] Thu, 07 December 2006 14:23
dagr
Messages: 83
Registered: February 2006
Member
File it? What do you mean? I gave all the output.
Re: Checkpoint bug [message #8841 is a reply to message #8818] Thu, 07 December 2006 14:25
dagr
Messages: 83
Registered: February 2006
Member
Also, is there any way to use CPT with mounts? When I start a VPS with any kind of mount (bind, or loop with ro), it is not even possible to suspend it:
[dagr@ws-ca mnt]$ sudo vzctl chkpnt php4_base --suspend
Setting up checkpoint...
suspend...
Can not suspend VE: Invalid argument
unsupported fs type ext3
Re: Checkpoint bug [message #8843 is a reply to message #8840] Thu, 07 December 2006 14:27
dim
Messages: 344
Registered: August 2005
Senior Member
I mean create a new bug at http://bugzilla.openvz.org, specify the kernel version, and post the full oops messages from the dmesg output in it.
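A minimal sketch of gathering the two items requested above, the kernel version and the kernel log, into one file to attach to the bug. The function name `collect_bug_info` and the default file name `bugreport.txt` are hypothetical; `uname -r` and `dmesg` are standard commands, and `dmesg` only shows an oops while it is still in the kernel ring buffer (it may also need root).

```shell
#!/bin/sh
# Hypothetical helper: write the running kernel version and the kernel
# log (where oops messages are recorded) into a single report file.
collect_bug_info() {
    report=${1:-bugreport.txt}   # default file name is an assumption
    {
        echo "Kernel: $(uname -r)"
        echo "--- dmesg ---"
        # dmesg may be restricted; record the failure instead of aborting.
        dmesg 2>&1 || echo "(dmesg unavailable)"
    } > "$report"
}
```

Running `collect_bug_info bugreport.txt` as root right after the oops gives a file that can be attached to the bug report as-is.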


Re: Checkpoint bug [message #8844 is a reply to message #8818] Thu, 07 December 2006 14:36
dagr
Messages: 83
Registered: February 2006
Member
Did it.
So what about checkpointing a VPS with mounts?
Re: Checkpoint bug [message #8846 is a reply to message #8841] Thu, 07 December 2006 15:40
Andrey Mirkin
Messages: 193
Registered: May 2006
Senior Member
Support for bind mounts in CPT is already implemented.
Right now you can use the latest development kernel (2.6.18-007).
A new stable version (2.6.9 kernel) with bind mount support in CPT will be released soon.


Andrey Mirkin
Re: Checkpoint bug [message #8849 is a reply to message #8846] Thu, 07 December 2006 16:18
dagr
Messages: 83
Registered: February 2006
Member
Cool, at first look it works OK with the test7 kernel. I will test more.