OpenVZ Forum


Re: [RFC][PATCH 1/5] Virtualization/containers: startup [message #1369 is a reply to message #1335] Tue, 07 February 2006 12:25
dev

>>>How do we want to create the container?
>>>In our patch we did it through a /proc/container filesystem.
>>>Which created the container object and then on fork/exec switched over.
>>
>>this doesn't look good for a full virtualization solution, since proc
>>should be virtualized as well :)
>
>
> Well, /proc should be "virtualized" or "isolated", how do you expect a
> container to work correctly ? plenty of user space tools depend on it.
Sorry, I don't quite understand your question... :(
There are not many problems with virtualization of proc. It can be
virtualized correctly, so that tools keep working. For example, in
OpenVZ /proc has 2 trees - global and local.
The global tree contains the entries which are visible in all containers,
and the local tree - only those which are visible to containers.
Likewise, only the PIDs of processes present in the container are shown.

Kirill
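
To make the two-tree idea above concrete, here is a minimal toy model in Python. It is only a sketch, not OpenVZ code: the names `ProcEntry`, `Task`, `visible_entries` and `visible_pids` are hypothetical, and it assumes that a local entry belongs to exactly one container, which the post does not spell out.

```python
from dataclasses import dataclass

HOST = 0  # hypothetical id for the outermost (host) context


@dataclass(frozen=True)
class ProcEntry:
    name: str
    is_global: bool    # True: the entry lives in the global tree
    owner: int = HOST  # assumption: a local entry belongs to one container


@dataclass(frozen=True)
class Task:
    pid: int
    container: int


def visible_entries(entries, viewer):
    """Names seen from container `viewer`: every global entry plus the
    local entries owned by that container."""
    return [e.name for e in entries if e.is_global or e.owner == viewer]


def visible_pids(tasks, viewer):
    """PIDs of the tasks that live in the viewer's container."""
    return [t.pid for t in tasks if t.container == viewer]


entries = [
    ProcEntry("cpuinfo", is_global=True),
    ProcEntry("net", is_global=False, owner=1),
    ProcEntry("net", is_global=False, owner=2),
]
tasks = [Task(1, 1), Task(2, 2), Task(3, 1)]

print(visible_entries(entries, 1))  # ['cpuinfo', 'net']
print(visible_pids(tasks, 1))       # [1, 3]
```

Tools inside a container still find the entries they expect, while entries and PIDs of other containers never show up in the listing.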
Re: [RFC][PATCH 1/5] Virtualization/containers: startup [message #1398 is a reply to message #1367] Tue, 07 February 2006 22:21
Sam Vilain
Kirill Korotaev wrote:
>>> I'd suggest
>>>
>>> current->container - the current EFFECTIVE container
>>> current->master_container - the "long term" container.
>>>
>>> (replace "master" with some other non-S&M term if you want)
>> Hmm. You actually need a linked list, otherwise you have replaced a one
>> level flat structure with a two level one, and you miss out on some of
>> the applications. VServer uses a special structure for this.
>
> Nope! :) This is a pointer to the current/effective container, which can be
> anywhere in the hierarchy. The list should be inside the container struct.

So why store anything other than the effective container in the task?

Sam.
swsusp done by migration (was Re: [RFC][PATCH 1/5] Virtualization/containers: startup) [message #1481 is a reply to message #1314] Wed, 08 February 2006 21:54
Pavel Machek
Hi!

> > Could you explain a bit why the container ID would need to be
> > virtualized?
>
> As someone said to me a little bit ago, for migration or checkpointing
> ultimately you have to capture the entire user/kernel interface if
> things are going to work properly. Now if we add this facility to
> the kernel and it is a general purpose facility, it is only a matter
> of time before we need to deal with nested containers.
>
> Not considering the case of having nested containers now is just foolish.
> Maybe we don't have to implement it yet but not considering it is silly.
>
> As far as I can tell there is a very reasonable chance that when we
> are complete, software suspend will just be a special case of
> migration, done completely in user space.
> That is one of the more practical examples I can think of where this
> kind of functionality would be used.

Well, for now software suspend is done at a very different level
(it snapshots the complete kernel state), but being able to use
migration for this is certainly a nice option.

BTW you could do whole-machine-migration now with uswsusp; but you'd
need identical hardware and it would take quite a while...

Pavel
--
Thanks, Sharp!
Re: swsusp done by migration (was Re: [RFC][PATCH 1/5] Virtualization/containers: startup) [message #1484 is a reply to message #1481] Thu, 09 February 2006 18:20
ebiederm
Pavel Machek <pavel@ucw.cz> writes:

> Well, for now software suspend is done at a very different level
> (it snapshots the complete kernel state), but being able to use
> migration for this is certainly a nice option.
>
> BTW you could do whole-machine-migration now with uswsusp; but you'd
> need identical hardware and it would take quite a while...

Right. Part of the goal of doing it the way we are doing it is that we can
define what the interesting state is.

Replacing software suspend is not an immediate goal but I think it is
a worthy thing to target. In part because if we really can rip things
out of the kernel, store them in a portable format and restore them,
we will also have the ability to upgrade the kernel without stopping
user space applications...

But being able to avoid the uninteresting parts, and having the policy
completely controlled outside the kernel, are the big wins we are shooting for.

Eric
Re: swsusp done by migration (was Re: [RFC][PATCH 1/5] Virtualization/containers: startup) [message #1490 is a reply to message #1484] Fri, 10 February 2006 00:21
Kyle Moffett
On Feb 09, 2006, at 13:20, Eric W. Biederman wrote:
> Pavel Machek <pavel@ucw.cz> writes:
>> Well, for now software suspend is done at a very different level (it
>> snapshots the complete kernel state), but being able to use migration
>> for this is certainly a nice option.
>>
>> BTW you could do whole-machine-migration now with uswsusp; but
>> you'd need identical hardware and it would take quite a while...
>
> Right. Part of the goal of doing it the way we are doing it is that
> we can define what the interesting state is.
>
> Replacing software suspend is not an immediate goal but I think it
> is a worthy thing to target. In part because if we really can rip
> things out of the kernel, store them in a portable format and
> restore them, we will also have the ability to upgrade the kernel
> without stopping user space applications...
>
> But being able to avoid the uninteresting parts, and having the
> policy completely controlled outside the kernel, are the big wins we
> are shooting for.

<wishful thinking>
I can see another extension to this functionality. With appropriate
changes it might also be possible to have a container exist across
multiple computers using some cluster code for synchronization and
fencing. The outermost container would be the system boot container,
and multiple inner containers would use some sort of network-
container-aware cluster filesystem to spread multiple vservers across
multiple servers, distributing CPU and network load appropriately.
</wishful thinking>

Cheers,
Kyle Moffett

--
I have yet to see any problem, however complicated, which, when you
looked at it in the right way, did not become still more complicated.
-- Poul Anderson
Re: swsusp done by migration (was Re: [RFC][PATCH 1/5] Virtualization/containers: startup) [message #1497 is a reply to message #1490] Fri, 10 February 2006 04:31
Sam Vilain
Kyle Moffett wrote:
> <wishful thinking>
> I can see another extension to this functionality. With appropriate
> changes it might also be possible to have a container exist across
> multiple computers using some cluster code for synchronization and
> fencing. The outermost container would be the system boot container,
> and multiple inner containers would use some sort of network-
> container-aware cluster filesystem to spread multiple vservers across
> multiple servers, distributing CPU and network load appropriately.
> </wishful thinking>

Yeah. If you fudged/virtualised /dev/random, the system clock, etc you
could even have Tandem-style transparent High Availability.
</more wishful thinking>

Actually there is relatively little difference between a NUMA system and
a cluster...

Sam.
Re: [RFC][PATCH 1/5] Virtualization/containers: startup [message #1498 is a reply to message #1239] Fri, 10 February 2006 06:01
ebiederm
Nigel Cunningham <ncunningham@cyclades.com> writes:

> Am I missing something? I thought migration referred only to userspace
> processes. Software suspend, on the other hand, deals with the whole system,
> of which process data/context is only a part.

The problem domain is user processes and the kernel state they depend on.
Implementation-wise we are looking at two totally different problems.

However the effects should be similar if the set of processes to
migrate are all of the processes in the system.

For most of the interesting cases migration does not need to be that
ambitious.

Eric
Re: Re: swsusp done by migration (was Re: [RFC][PATCH 1/5] Virtualization/containers: startup) [message #1499 is a reply to message #1497] Fri, 10 February 2006 06:23
vaverin
Sam Vilain wrote:
> Kyle Moffett wrote:
>> <wishful thinking>
>> I can see another extension to this functionality. With appropriate
>> changes it might also be possible to have a container exist across
>> multiple computers using some cluster code for synchronization and
>> fencing. The outermost container would be the system boot container,
>> and multiple inner containers would use some sort of network-
>> container-aware cluster filesystem to spread multiple vservers across
>> multiple servers, distributing CPU and network load appropriately.
>> </wishful thinking>
>
> Yeah. If you fudged/virtualised /dev/random, the system clock, etc you
> could even have Tandem-style transparent High Availability.
> </more wishful thinking>

Could you please explain why you want to virtualize /dev/random?

Thank you,
Vasily Averin

Virtuozzo Linux Kernel Team
Re: swsusp done by migration (was Re: [RFC][PATCH 1/5] Virtualization/containers: startup) [message #1504 is a reply to message #1497] Fri, 10 February 2006 08:29
Kyle Moffett
On Feb 09, 2006, at 23:31, Sam Vilain wrote:
> Kyle Moffett wrote:
>> <wishful thinking>
>> I can see another extension to this functionality. With
>> appropriate changes it might also be possible to have a container
>> exist across multiple computers using some cluster code for
>> synchronization and fencing. The outermost container would be
>> the system boot container, and multiple inner containers would
>> use some sort of network-container-aware cluster filesystem to
>> spread multiple vservers across multiple servers, distributing
>> CPU and network load appropriately.
>> </wishful thinking>
>
> Yeah. If you fudged/virtualised /dev/random, the system clock, etc
> you could even have Tandem-style transparent High Availability.
> </more wishful thinking>
>
> Actually there is relatively little difference between a NUMA
> system and a cluster...

Yeah, a cluster is just a multi-tiered multi-address-space RNUMA
(*Really* Non-Uniform Memory Architecture) :-D. With some kind of
RDMA infiniband card and the right kernel and userspace tools, that
kind of cluster could be practical.

I _suspect_ (never really considered the issue before) that a
properly virtualized container could even achieve extremely high
fault tolerance by allowing systems to "vote" on correct output. If
you synchronize /dev/random and network IO across the system
correctly such that each instance of each userspace process on each
system sees _exactly_ the same virtual inputs and virtual clock in
the exact same order, then you could binary-compare the output of 3
different servers. If one didn't agree, it could be discarded and
marked as failing.

Cheers,
Kyle Moffett

--
There are two ways of constructing a software design. One way is to
make it so simple that there are obviously no deficiencies. And the
other way is to make it so complicated that there are no obvious
deficiencies. The first method is far more difficult.
-- C.A.R. Hoare
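
The voting idea in the post above can be sketched in a few lines of Python. This is only a minimal illustration under assumptions: `vote` is a hypothetical helper, each replica's output is taken to be a byte string, and "agree" means byte-for-byte equality with a strict majority.

```python
from collections import Counter


def vote(outputs):
    """Return (agreed_output, failing_indices) using a strict majority
    over byte-for-byte identical outputs. Raise if there is no majority."""
    winner, count = Counter(outputs).most_common(1)[0]
    if count * 2 <= len(outputs):
        raise ValueError("no majority among replicas")
    failing = [i for i, out in enumerate(outputs) if out != winner]
    return winner, failing


# Three servers ran the same process on identical virtual inputs.
result, failing = vote([b"42\n", b"42\n", b"41\n"])
print(result, failing)  # b'42\n' [2]
```

The comparison only makes sense if every replica really saw the same inputs in the same order; otherwise healthy replicas would disagree for legitimate reasons.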
Re: [RFC][PATCH 1/5] Virtualization/containers: startup [message #1507 is a reply to message #1314] Fri, 10 February 2006 05:40
Nigel Cunningham
Hi.

On Tuesday 07 February 2006 04:37, Eric W. Biederman wrote:
> Dave Hansen <haveblue@us.ibm.com> writes:
> > On Mon, 2006-02-06 at 02:19 -0700, Eric W. Biederman wrote:
> >> That you placed the namespaces in a separate structure from
> >> task_struct.
> >> That part seems completely unnecessary, that and the addition of a
> >> global id in a completely new namespace that will be a pain to
> >> virtualize
> >> when its time comes.
> >
> > Could you explain a bit why the container ID would need to be
> > virtualized?
>
> As someone said to me a little bit ago, for migration or checkpointing
> ultimately you have to capture the entire user/kernel interface if
> things are going to work properly. Now if we add this facility to
> the kernel and it is a general purpose facility, it is only a matter
> of time before we need to deal with nested containers.
>
> Not considering the case of having nested containers now is just foolish.
> Maybe we don't have to implement it yet but not considering it is silly.
>
> As far as I can tell there is a very reasonable chance that when we
> are complete, software suspend will just be a special case of
> migration, done completely in user space.
> That is one of the more practical examples I can think of where this
> kind of functionality would be used.

Am I missing something? I thought migration referred only to userspace
processes. Software suspend, on the other hand, deals with the whole system,
of which process data/context is only a part.

Regards,

Nigel
Re: Re: swsusp done by migration (was Re: [RFC][PATCH 1/5] Virtualization/containers: startup) [message #1519 is a reply to message #1499] Sat, 11 February 2006 02:38
Sam Vilain
On Fri, 2006-02-10 at 09:23 +0300, Vasily Averin wrote:
> >> <wishful thinking>
> >> I can see another extension to this functionality. With appropriate
> >> changes it might also be possible to have a container exist across
> >> multiple computers using some cluster code for synchronization and
> >> fencing. The outermost container would be the system boot container,
> >> and multiple inner containers would use some sort of network-
> >> container-aware cluster filesystem to spread multiple vservers across
> >> multiple servers, distributing CPU and network load appropriately.
> >> </wishful thinking>
> > Yeah. If you fudged/virtualised /dev/random, the system clock, etc you
> > could even have Tandem-style transparent High Availability.
> > </more wishful thinking>
> Could you please explain, why you want to virtualize /dev/random?

When checkpointing it is important to preserve all state. If you are
doing transparent highly available computing, you need to make sure all
system calls get the same answers in the clones. So you would need to
virtualise the entropy pool.

There are likely to be dozens of other quite hard problems in the way
first. Like I said, wishful thinking :-).

Sam.
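
As a rough illustration of why the clones need a virtualised entropy source, here is a toy Python model. It is a sketch under stated assumptions: `VirtualEntropy` is a hypothetical class, a seeded PRNG stands in for the pool, and nothing here is cryptographically secure or related to the real /dev/random implementation.

```python
import random


class VirtualEntropy:
    """Toy stand-in for a virtualised /dev/random: a seeded PRNG whose
    stream is reproducible. Not cryptographically secure."""

    def __init__(self, seed):
        self._rng = random.Random(seed)

    def read(self, n):
        """Return n pseudo-random bytes from the reproducible stream."""
        return bytes(self._rng.getrandbits(8) for _ in range(n))

    def checkpoint(self):
        """Capture the generator state so it can be restored later."""
        return self._rng.getstate()

    def restore(self, state):
        self._rng.setstate(state)


# Two clones fed from the same virtual source get the same answers.
a, b = VirtualEntropy(seed=1234), VirtualEntropy(seed=1234)
assert a.read(16) == b.read(16)

# A checkpoint taken before a read lets the read be replayed exactly.
state = a.checkpoint()
first = a.read(8)
a.restore(state)
assert a.read(8) == first
```

With the real, unvirtualised pool each clone would read different bytes, so their system call results would diverge at the first read.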
Re: Re: swsusp done by migration (was Re: [RFC][PATCH 1/5] Virtualization/containers: startup) [message #1521 is a reply to message #1519] Sat, 11 February 2006 17:29
vaverin
Sam Vilain wrote:
> On Fri, 2006-02-10 at 09:23 +0300, Vasily Averin wrote:
>>>Yeah. If you fudged/virtualised /dev/random, the system clock, etc you
>>>could even have Tandem-style transparent High Availability.
>>></more wishful thinking>
>>Could you please explain, why you want to virtualize /dev/random?
>
> When checkpointing it is important to preserve all state. If you are
> doing transparent highly available computing, you need to make sure all
> system calls get the same answers in the clones. So you would need to
> virtualise the entropy pool.

From my point of view it is important to preserve only the deterministic state.

OK, let's say we've checkpointed and saved the current entropy pool. But we have no
guarantee that the pool will be in the same state at the moment of the first access to
it after wakeup, because new entropy can change it unpredictably.

Am I right?

Thank you,
Vasily Averin

Virtuozzo Linux kernel Team
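
The objection above can be shown with a small Python toy. It is only a sketch: `Pool` is a hypothetical hash-based pool, not the kernel's entropy pool, and the event string is made up. Restoring a saved state reproduces the next read only if no new entropy is mixed in between restore and read.

```python
import hashlib


class Pool:
    """Toy entropy pool: the state is a hash that absorbs events."""

    def __init__(self, state=b"\0" * 32):
        self.state = state

    def mix(self, event):
        """Fold a new entropy event (bytes) into the pool state."""
        self.state = hashlib.sha256(self.state + event).digest()

    def read(self, n=8):
        """Derive n output bytes (n <= 32), then advance the state."""
        out = hashlib.sha256(self.state + b"out").digest()[:n]
        self.state = hashlib.sha256(self.state + b"next").digest()
        return out


saved = Pool().state                 # state captured at checkpoint time

quiet = Pool(saved)                  # restored, nothing mixed in
noisy = Pool(saved)                  # restored, then an interrupt arrives
noisy.mix(b"hypothetical irq timing")

assert Pool(saved).read() == quiet.read()   # replay works in isolation
assert quiet.read() != noisy.read()         # new entropy makes them diverge
```

So saving the pool is not enough on its own; the inputs that feed it after restore would also have to be controlled or replayed.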
Re: [RFC][PATCH 1/5] Virtualization/containers: startup [message #1698 is a reply to message #1398] Mon, 20 February 2006 11:54
dev

> So why store anything other than the effective container in the task?
The effective container is used for a temporary context change, e.g. when
processing interrupts and needing to handle an skb: it is the effective
container for that code, just like get_fs()/set_fs() works.
The original container pointer is used for external process identification,
e.g. to decide whether to show a task in /proc in the context of another task.

Kirill
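
The split between the two pointers can be modelled with a short Python toy. This is a sketch only, not kernel code: `Task`, `effective_container` and `shown_in_proc` are hypothetical names, and the save-and-restore shape merely imitates how a get_fs()/set_fs() pair is typically used.

```python
from contextlib import contextmanager


class Task:
    def __init__(self, pid, container):
        self.pid = pid
        self.container = container  # original, long-term container
        self.effective = container  # context the running code acts in


@contextmanager
def effective_container(task, container):
    """Temporarily switch the effective container and always restore it."""
    saved = task.effective
    task.effective = container
    try:
        yield task
    finally:
        task.effective = saved


def shown_in_proc(task, viewer):
    """Visibility follows the original container, not the effective one."""
    return task.container == viewer.container


worker = Task(pid=10, container=1)
viewer = Task(pid=20, container=1)

with effective_container(worker, 2):
    # e.g. handling a packet that belongs to container 2
    assert worker.effective == 2
    assert shown_in_proc(worker, viewer)  # still identified as container 1

assert worker.effective == 1              # restored on exit
```

Keeping both values means a temporary switch never changes how other tasks identify the process.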