Re: swsusp done by migration (was Re: [RFC][PATCH 1/5] Virtualization/containers: startup) [message #1490 is a reply to message #1484]
Fri, 10 February 2006 00:21
Kyle Moffett
On Feb 09, 2006, at 13:20, Eric W. Biederman wrote:
> Pavel Machek <pavel@ucw.cz> writes:
>> Well, for now software suspend is done at very different level (it
>> snapshots complete kernel state), but being able to use migration
>> for this is certainly nice option.
>>
>> BTW you could do whole-machine-migration now with uswsusp; but
>> you'd need identical hardware and it would take a bit long...
>
> Right, part of the goal of doing it the way we are doing it is that
> we can define what the interesting state is.
>
> Replacing software suspend is not an immediate goal, but I think it
> is a worthy thing to target. In part because if we really can rip
> things out of the kernel, store them in a portable format, and
> restore them, we will also have the ability to upgrade the kernel
> without stopping user space applications...
>
> But being able to avoid the uninteresting parts, and having the
> policy completely controlled outside the kernel, are the big wins we
> are shooting for.
<wishful thinking>
I can see another extension to this functionality. With appropriate
changes it might also be possible to have a container exist across
multiple computers using some cluster code for synchronization and
fencing. The outermost container would be the system boot container,
and multiple inner containers would use some sort of network-
container-aware cluster filesystem to spread multiple vservers across
multiple servers, distributing CPU and network load appropriately.
</wishful thinking>
Cheers,
Kyle Moffett
--
I have yet to see any problem, however complicated, which, when you
looked at it in the right way, did not become still more complicated.
-- Poul Anderson
Re: swsusp done by migration (was Re: [RFC][PATCH 1/5] Virtualization/containers: startup) [message #1504 is a reply to message #1497]
Fri, 10 February 2006 08:29
Kyle Moffett
On Feb 09, 2006, at 23:31, Sam Vilain wrote:
> Kyle Moffett wrote:
>> <wishful thinking>
>> I can see another extension to this functionality. With
>> appropriate changes it might also be possible to have a container
>> exist across multiple computers using some cluster code for
>> synchronization and fencing. The outermost container would be
>> the system boot container, and multiple inner containers would
>> use some sort of network- container-aware cluster filesystem to
>> spread multiple vservers across multiple servers, distributing
>> CPU and network load appropriately.
>> </wishful thinking>
>
> Yeah. If you fudged/virtualised /dev/random, the system clock, etc
> you could even have Tandem-style transparent High Availability.
> </more wishful thinking>
>
> Actually there is relatively little difference between a NUMA
> system and a cluster...
Yeah, a cluster is just a multi-tiered multi-address-space RNUMA
(*Really* Non-Uniform Memory Architecture) :-D. With some kind of
RDMA InfiniBand card and the right kernel and userspace tools, that
kind of cluster could be practical.
I _suspect_ (never really considered the issue before) that a
properly virtualized container could even achieve extremely high
fault tolerance by allowing systems to "vote" on correct output. If
you synchronize /dev/random and network IO across the system
correctly such that each instance of each userspace process on each
system sees _exactly_ the same virtual inputs and virtual clock in
the exact same order, then you could binary-compare the output of 3
different servers. If one didn't agree, it could be discarded and
marked as failing.
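A minimal sketch of that 2-of-3 vote, assuming each replica's output
for a given step arrives as a byte buffer (the names here are purely
illustrative, not from any real tool):

#include <string.h>

#define NREPLICAS 3

struct replica {
	const unsigned char *out;  /* output produced for this step */
	size_t len;
	int failed;
};

static int outputs_equal(const struct replica *a, const struct replica *b)
{
	return a->len == b->len && memcmp(a->out, b->out, a->len) == 0;
}

/* Returns the index of a replica agreeing with a majority, or -1 if
 * all three outputs diverge.  Dissenters are marked as failing. */
static int vote(struct replica r[NREPLICAS])
{
	for (int i = 0; i < NREPLICAS; i++) {
		int agree = 1;  /* i agrees with itself */
		for (int j = 0; j < NREPLICAS; j++)
			if (j != i && outputs_equal(&r[i], &r[j]))
				agree++;
		if (agree * 2 > NREPLICAS) {  /* strict majority */
			for (int j = 0; j < NREPLICAS; j++)
				if (!outputs_equal(&r[i], &r[j]))
					r[j].failed = 1;  /* discard */
			return i;
		}
	}
	return -1;
}

The comparison itself is trivial; the hard part is the lockstep
delivery of inputs that makes the outputs comparable at all.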
Cheers,
Kyle Moffett
--
There are two ways of constructing a software design. One way is to
make it so simple that there are obviously no deficiencies. And the
other way is to make it so complicated that there are no obvious
deficiencies. The first method is far more difficult.
-- C.A.R. Hoare
Re: [RFC][PATCH 1/5] Virtualization/containers: startup [message #1507 is a reply to message #1314]
Fri, 10 February 2006 05:40
Nigel Cunningham
Hi.
On Tuesday 07 February 2006 04:37, Eric W. Biederman wrote:
> Dave Hansen <haveblue@us.ibm.com> writes:
> > On Mon, 2006-02-06 at 02:19 -0700, Eric W. Biederman wrote:
> >> That you placed the namespaces in a separate structure from
> >> task_struct. That part seems completely unnecessary, that and the
> >> addition of a global id in a completely new namespace that will
> >> be a pain to virtualize when its time comes.
> >
> > Could you explain a bit why the container ID would need to be
> > virtualized?
>
> As someone said to me a little bit ago, for migration or checkpointing
> ultimately you have to capture the entire user/kernel interface if
> > things are going to work properly. Now if we add this facility to
> > the kernel and it is a general-purpose facility, it is only a
> > matter of time before we need to deal with nested containers.
>
> Not considering the case of having nested containers now is just foolish.
> Maybe we don't have to implement it yet but not considering it is silly.
>
> As far as I can tell there is a very reasonable chance that when we
> are complete, software suspend will just be a special case of
> migration, done completely in user space.
> That is one of the more practical examples I can think of where this
> kind of functionality would be used.
Am I missing something? I thought migration referred only to userspace
processes. Software suspend, on the other hand, deals with the whole
system, of which process data/context is only a part.
Regards,
Nigel
Re: Re: swsusp done by migration (was Re: [RFC][PATCH 1/5] Virtualization/containers: startup) [message #1519 is a reply to message #1499]
Sat, 11 February 2006 02:38
Sam Vilain
On Fri, 2006-02-10 at 09:23 +0300, Vasily Averin wrote:
> >> <wishful thinking>
> >> I can see another extension to this functionality. With appropriate
> >> changes it might also be possible to have a container exist across
> >> multiple computers using some cluster code for synchronization and
> >> fencing. The outermost container would be the system boot container,
> >> and multiple inner containers would use some sort of network-
> >> container-aware cluster filesystem to spread multiple vservers across
> >> multiple servers, distributing CPU and network load appropriately.
> >> </wishful thinking>
> > Yeah. If you fudged/virtualised /dev/random, the system clock, etc you
> > could even have Tandem-style transparent High Availability.
> > </more wishful thinking>
> Could you please explain why you want to virtualize /dev/random?
When checkpointing, it is important to preserve all state. If you are
doing transparent highly available computing, you need to make sure all
system calls get the same answers in the clones. So you would need to
virtualise the entropy pool.
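For illustration only, the userspace-visible idea might look like this
(the struct, names, and generator are invented for the sketch; a real
version would have to hook the /dev/random driver itself):

#include <stdint.h>
#include <stddef.h>

/* Generator state travels with the container checkpoint, so a clone
 * or restored instance continues the stream exactly where it was. */
struct virt_entropy {
	uint64_t state;  /* must be nonzero; saved with the image */
};

/* xorshift64 -- any deterministic generator works; the point is only
 * that identical state yields identical bytes on every clone. */
static uint64_t next64(struct virt_entropy *e)
{
	uint64_t x = e->state;
	x ^= x << 13;
	x ^= x >> 7;
	x ^= x << 17;
	return e->state = x;
}

/* What a virtualised read(2) from /dev/random would hand back. */
static void virt_random_read(struct virt_entropy *e, void *buf, size_t n)
{
	unsigned char *p = buf;
	while (n--)
		*p++ = (unsigned char)(next64(e) & 0xff);
}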
There are likely to be dozens of other quite hard problems in the way
first. Like I said, wishful thinking :-).
Sam.