*SOLVED* Error: undump failed: Resource temporarily unavailable [message #36644]
Tue, 07 July 2009 14:12
duswil
Messages: 77 Registered: January 2006
(My post was lost in the big database failure... I'll try to recreate as much of the original as I can from memory.)
I am using DRBD + GFS on two HNs (named kotoko and tomoka), both running Ubuntu Hardy (no backports or custom packages, fully updated, everything installed from the official repositories). Both HNs run i686 kernels on similar hardware (nothing 64-bit), with the same kernel version and the same versions of everything else.
Linux kotoko 2.6.24-24-openvz #1 SMP Tue Jun 30 23:37:21 UTC 2009 i686 GNU/Linux
Linux tomoka 2.6.24-24-openvz #1 SMP Tue Jun 30 23:37:21 UTC 2009 i686 GNU/Linux
DRBD VERSION: version: 8.0.11 (api:86/proto:86)
DRBD STATUS: 0: cs:Connected st:Primary/Primary ds:UpToDate/UpToDate C r---
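For anyone reproducing this setup, a stricter health check than eyeballing that status line could look like the sketch below. It assumes the DRBD 8.0 /proc/drbd field names shown above (cs, st, ds); the function name drbd_dual_primary_ok is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if a DRBD 8.0 status line shows a
# connected, dual-primary, fully synced device.
# $1: one device line in the /proc/drbd format quoted in this post.
drbd_dual_primary_ok() {
    case $1 in
        *cs:Connected*st:Primary/Primary*ds:UpToDate/UpToDate*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example with the status line from this post.
status='0: cs:Connected st:Primary/Primary ds:UpToDate/UpToDate C r---'
if drbd_dual_primary_ok "$status"; then
    echo "DRBD is connected, dual-primary and in sync"
fi
```

Checking all three fields matters for GFS on DRBD, since a node that is UpToDate but disconnected or not primary cannot safely take over a VE.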
The GFS filesystem is mounted on /var/lib/vz on both machines, and both can read and write the mount with no trouble. I have some VEs running on each box (never the same VE on both; each individual VE runs on only one HN at a time, of course).
Apart from the problem below, everything is working perfectly and the system is finely tuned. DRBD/GFS traffic runs over a dedicated crossover cable between the nodes, separate from the internal office networks that the boxes are also connected to.
THE PROBLEM:
When I try to do a live migration between the two HNs, it fails with the error shown below, which boils down to "Error: undump failed: Resource temporarily unavailable". Offline migration (stopping the VE on one HN and starting it on the other) works perfectly. The problem only appears during live migration.
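To narrow down which stage fails, the live migration can be split into the individual vzctl checkpoint and restore stages. The sketch below only prints the commands instead of running them. The function name and the peer argument are placeholders, and the VE ID and dump path in the example call are the ones from this post.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) the staged commands for a live
# migration of one VE between two HNs that share /var/lib/vz.
# $1: VE ID, $2: dump file on the shared filesystem, $3: destination host
print_live_migration_plan() {
    echo "vzctl chkpnt $1 --suspend"
    echo "vzctl chkpnt $1 --dump --dumpfile $2"
    echo "vzctl chkpnt $1 --kill"
    echo "vzctl umount $1"
    echo "ssh root@$3 vzctl restore $1 --undump --dumpfile $2"
    echo "ssh root@$3 vzctl restore $1 --resume"
}

print_live_migration_plan 3015 /var/lib/vz/temp/DUMP.3015 tomoka
```

Running the stages by hand in this order shows whether the failure is in the dump on the source HN or in the undump on the destination HN.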
THE ERROR:
root@tomoka:~# vzctl restore 3015 --undump --dumpfile /var/lib/vz/temp/DUMP.3015
Restoring VE ...
Starting VE ...
Initializing quota ...
VE is mounted
undump...
Setting CPU units: 2167
Configure meminfo: 1000000
Configure veth devices: veth3015.0 veth3015.1
Error: undump failed: Resource temporarily unavailable
Restoring failed:
Error: rst_file: -11 27808
Error: rst_files: -11
Error: make_baby: -11
Error: rst_clone_children
VE start failed
Stopping VE ...
VE was stopped
VE is unmounted
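The -11 in the rst_file, rst_files and make_baby lines is a negated Linux errno: 11 is EAGAIN, whose message is exactly the "Resource temporarily unavailable" in the summary line. A small sketch for pulling those codes out of the restore output follows. The function names are made up, the table covers only a few common errno values, and the meaning of the trailing number (27808) is not something I know, so it is ignored.

```shell
#!/bin/sh
# Hypothetical helper: describe a few Linux errno values.
# Accepts the number with or without the leading minus sign.
errno_describe() {
    case ${1#-} in
        11) echo "EAGAIN: Resource temporarily unavailable" ;;
        12) echo "ENOMEM: Cannot allocate memory" ;;
        16) echo "EBUSY: Device or resource busy" ;;
        *)  echo "unknown errno ${1#-}" ;;
    esac
}

# Hypothetical helper: extract the negative code from a line such as
# "Error: rst_file: -11 27808". Prints nothing if the line has no code.
restore_error_code() {
    printf '%s\n' "$1" | sed -n 's/^Error: [a-z_]*: \(-[0-9][0-9]*\).*/\1/p'
}

code=$(restore_error_code 'Error: rst_file: -11 27808')
errno_describe "$code"
```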
Any ideas?
[Updated on: Wed, 08 July 2009 11:28]
Re: Error: undump failed: Resource temporarily unavailable [message #36659 is a reply to message #36644]
Wed, 08 July 2009 11:28
duswil
Heh.. that worked nicely.
For those interested, here's my script. The error handling isn't broken out into nice functions, so there is some duplicate code, but it works great:
#!/bin/bash
VEID=$1
# refuse to run without a VE ID
if [ -z "$VEID" ]; then
    echo "Usage: $0 VEID"
    exit 1
fi
# test DRBD
if grep -q 'UpToDate/UpToDate' /proc/drbd; then
    echo
    echo "DRBD is OK. Continuing..."
    echo
else
    echo
    cat /proc/drbd
    echo
    echo "*** DRBD is NOT OK. Quitting..."
    echo
    exit 1
fi
# test VEID
RUNNING=$(vzctl status "$VEID" | awk '{print $5}')
if [ "$RUNNING" != "running" ]; then
    echo "*** $VEID is not running or does not exist. Online migration is pointless in this situation."
    echo
    exit 1
fi
# freeze and dump
echo "REMOVING OLD DUMPS (if any)..."
rm -f "/var/lib/vz/dump/$VEID"
echo "LOCALLY FREEZING $VEID..."
if ! vzctl chkpnt "$VEID" --suspend; then
    echo "*** FAILED to suspend $VEID."
    exit 1
fi
echo "LOCALLY DUMPING $VEID..."
if ! vzctl chkpnt "$VEID" --dump --dumpfile "/var/lib/vz/dump/$VEID"; then
    echo "*** FAILED to dump $VEID. Resuming $VEID on current HN."
    # The VE is only suspended here, so chkpnt --resume can thaw it.
    if ! vzctl chkpnt "$VEID" --resume; then
        echo "*** FAILED to resume $VEID on current HN. I'll leave it to you to figure it out."
        exit 1
    fi
    exit 1
fi
# stop and umount
echo "LOCALLY STOPPING $VEID..."
if ! vzctl chkpnt "$VEID" --kill; then
    echo "*** FAILED to kill $VEID on current HN. Resuming $VEID on current HN."
    if ! vzctl chkpnt "$VEID" --resume; then
        echo "*** FAILED to resume $VEID on current HN. I'll leave it to you to figure it out."
        exit 1
    fi
    exit 1
fi
echo "LOCALLY UMOUNTING $VEID..."
if ! vzctl umount "$VEID"; then
    echo "*** FAILED to umount $VEID on current HN. Resuming $VEID on current HN."
    # The VE is killed at this point, so rolling back means restoring it
    # from the dump: --undump and --resume belong to "vzctl restore".
    if ! vzctl restore "$VEID" --undump --dumpfile "/var/lib/vz/dump/$VEID"; then
        echo "*** FAILED to undump $VEID on current HN. I'll leave it to you to figure it out."
        exit 1
    fi
    if ! vzctl restore "$VEID" --resume; then
        echo "*** FAILED to resume $VEID on current HN. I'll leave it to you to figure it out."
        exit 1
    fi
    exit 1
fi
# undump and thaw
echo "REMOTELY ADDING CONF FOR $VEID..."
# The glob is expanded locally before ssh runs; that works because
# /var/lib/vz is the shared GFS mount, so the paths are identical on both HNs.
if ! ssh root@vzpeer ln -s "/var/lib/vz/conf/$VEID".* /etc/vz/conf/; then
    echo "*** FAILED adding $VEID conf files to vzpeer. Resuming $VEID on current HN."
    if ! vzctl restore "$VEID" --undump --dumpfile "/var/lib/vz/dump/$VEID"; then
        echo "*** FAILED to undump $VEID on current HN. I'll leave it to you to figure it out."
        exit 1
    fi
    if ! vzctl restore "$VEID" --resume; then
        echo "*** FAILED to resume $VEID on current HN. I'll leave it to you to figure it out."
        exit 1
    fi
    exit 1
fi
echo "REMOTELY UNDUMPING $VEID..."
if ! ssh root@vzpeer vzctl restore "$VEID" --undump --dumpfile "/var/lib/vz/dump/$VEID"; then
    echo "*** FAILED undumping $VEID on vzpeer. Removing $VEID conf on vzpeer and resuming $VEID on current HN."
    ssh root@vzpeer rm -f "/etc/vz/conf/$VEID".*
    if ! vzctl restore "$VEID" --undump --dumpfile "/var/lib/vz/dump/$VEID"; then
        echo "*** FAILED to undump $VEID on current HN. I'll leave it to you to figure it out."
        exit 1
    fi
    if ! vzctl restore "$VEID" --resume; then
        echo "*** FAILED to resume $VEID on current HN. I'll leave it to you to figure it out."
        exit 1
    fi
    exit 1
fi
echo "REMOTELY THAWING $VEID..."
if ! ssh root@vzpeer vzctl restore "$VEID" --resume; then
    echo "*** FAILED to resume $VEID on vzpeer. Removing $VEID conf on vzpeer and resuming $VEID on current HN."
    # Careful: the VE may still be undumped (frozen) on vzpeer at this point.
    ssh root@vzpeer rm -f "/etc/vz/conf/$VEID".*
    if ! vzctl restore "$VEID" --undump --dumpfile "/var/lib/vz/dump/$VEID"; then
        echo "*** FAILED to undump $VEID on current HN. I'll leave it to you to figure it out."
        exit 1
    fi
    if ! vzctl restore "$VEID" --resume; then
        echo "*** FAILED to resume $VEID on current HN. I'll leave it to you to figure it out."
        exit 1
    fi
    exit 1
fi
# cleanup
echo "REMOVING DUMP..."
if ! rm -f "/var/lib/vz/dump/$VEID"; then
    echo "*** FAILED to remove $VEID dump file. Maybe there isn't a problem. I'll let you decide."
fi
echo "REMOVING LOCAL CONF FOR $VEID..."
if ! rm -f "/etc/vz/conf/$VEID".*; then
    echo "*** FAILED to remove $VEID conf symlinks on current HN. If they still exist, you might have the VE running on two machines at once on reboot. I'll let you figure it out."
fi
# done
echo "All done!"
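The duplicated rollback code could be pulled into a function along these lines. This is only a sketch, not a tested replacement for the script: the names restore_local and fail_and_rollback are made up, and it assumes the local rollback goes through vzctl restore (undump, then resume), the same subcommand the script uses for the remote undump.

```shell
#!/bin/sh
# Hypothetical helper: bring a killed VE back up on the current HN from its dump.
# $1: VE ID, $2: dump file. Returns non-zero if either stage fails.
restore_local() {
    if ! vzctl restore "$1" --undump --dumpfile "$2"; then
        echo "*** FAILED to undump $1 on current HN." >&2
        return 1
    fi
    if ! vzctl restore "$1" --resume; then
        echo "*** FAILED to resume $1 on current HN." >&2
        return 1
    fi
}

# Hypothetical helper: report why the migration failed, attempt the rollback,
# and exit non-zero either way. Expects VEID to be set by the caller.
fail_and_rollback() {
    echo "*** $1 Resuming $VEID on current HN." >&2
    restore_local "$VEID" "/var/lib/vz/dump/$VEID"
    exit 1
}
```

With that in place, each failure branch after the kill step shrinks to one line, for example: ssh root@vzpeer vzctl restore "$VEID" --resume || fail_and_rollback "FAILED to resume $VEID on vzpeer."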
[Updated on: Wed, 08 July 2009 11:29]