| Home » Mailing lists » Devel » [RFC] Virtualization steps Goto Forum:
	| 
		
			| [RFC] Virtualization steps [message #2197] | Fri, 24 March 2006 17:19  |  
			| 
				
				
					|  dev Messages: 1693
 Registered: September 2005
 Location: Moscow
 | Senior Member |  
 |  |  
	| Eric, Herbert, 
 I think it is quite clear, that without some agreement on all these
 virtualization issues, we won't be able to commit anything good to
 mainstream. My idea is to gather our efforts to get consensus on most
 clean parts of code first and commit them one by one.
 
 The proposal is quite simple. We have 4 parties in this conversation
 (maybe more?): IBM guys, OpenVZ, VServer and Eric Biederman. We discuss
 the areas which should be considered step by step. Send patches for each
 area, discuss, come to some agreement and all 4 parties Sign-Off the
 patch. After that it goes to Andrew/Linus. Worth trying?
 
 So far, (correct me if I'm wrong) we concluded that some people don't
 want containers as a whole, but want some subsystem namespaces. I
 suppose for people who care about containers only it doesn't matter, so
 we can proceed with namespaces, yeah?
 
 So the most easy namespaces to discuss I see:
 - utsname
 - sys IPC
 - network virtualization
 - netfilter virtualization
 
 all these were discussed already somehow and looks like there is no
 fundamental differencies in our approaches (at least OpenVZ and Eric,
 for sure).
 
 Right now, I suggest to concentrate on first 2 namespaces - utsname and
 sysvipc. They are small enough and easy. Lets consider them without
 sysctl/proc issues, as those can be resolved later. I sent the patches
 for these 2 namespaces to all of you. I really hope for some _good_
 critics, so we could work it out quickly.
 
 Thanks,
 Kirill
 |  
	|  |  |  
	| 
		
			| Re: [RFC] Virtualization steps [message #2201 is a reply to message #2197] | Fri, 24 March 2006 17:33   |  
			| 
				
				
					|  Nick Piggin Messages: 35
 Registered: March 2006
 | Member |  |  |  
	| Kirill Korotaev wrote: > Eric, Herbert,
 >
 > I think it is quite clear, that without some agreement on all these
 > virtualization issues, we won't be able to commit anything good to
 > mainstream. My idea is to gather our efforts to get consensus on most
 > clean parts of code first and commit them one by one.
 >
 > The proposal is quite simple. We have 4 parties in this conversation
 > (maybe more?): IBM guys, OpenVZ, VServer and Eric Biederman. We discuss
 > the areas which should be considered step by step. Send patches for each
 > area, discuss, come to some agreement and all 4 parties Sign-Off the
 > patch. After that it goes to Andrew/Linus. Worth trying?
 
 Oh, after you come to an agreement and start posting patches, can you
 also outline why we want this in the kernel (what it does that low
 level virtualization doesn't, etc, etc), and how and why you've agreed
 to implement it. Basically, some background and a summary of your
 discussions for those who can't follow everything. Or is that a faq
 item?
 
 Thanks,
 Nick
 
 --
 SUSE Labs, Novell Inc.
 Send instant messages to your online friends http://au.messenger.yahoo.com
 |  
	|  |  |  
	| 
		
			| Re: [RFC] Virtualization steps [message #2205 is a reply to message #2197] | Fri, 24 March 2006 18:36   |  
			| 
				
				
					|  ebiederm Messages: 1354
 Registered: February 2006
 | Senior Member |  |  |  
	| Kirill Korotaev <dev@sw.ru> writes: 
 > Eric, Herbert,
 >
 > I think it is quite clear, that without some agreement on all these
 > virtualization issues, we won't be able to commit anything good to
 > mainstream. My idea is to gather our efforts to get consensus on most clean
 > parts of code first and commit them one by one.
 >
 > The proposal is quite simple. We have 4 parties in this conversation (maybe
 > more?): IBM guys, OpenVZ, VServer and Eric Biederman. We discuss the areas which
 > should be considered step by step. Send patches for each area, discuss, come to
 > some agreement and all 4 parties Sign-Off the patch. After that it goes to
 > Andrew/Linus. Worth trying?
 
 Yes, this sounds like a path forward that has a reasonable chance of
 making progress.
 
 > So far, (correct me if I'm wrong) we concluded that some people don't want
 > containers as a whole, but want some subsystem namespaces. I suppose for people
 > who care about containers only it doesn't matter, so we can proceed with
 > namespaces, yeah?
 
 Yes, I think at one point I have seen all of the major parties receptive
 to the concept.
 
 > So the most easy namespaces to discuss I see:
 > - utsname
 > - sys IPC
 > - network virtualization
 > - netfilter virtualization
 
 The networking is hard simply because the is so very much of it, and it
 is being active developed :)
 
 > all these were discussed already somehow and looks like there is no fundamental
 > differencies in our approaches (at least OpenVZ and Eric, for sure).
 
 Yes.  I think we agree on what the semantics should be for these parts.
 Which should avoid the problem with have with the pid namespace.
 
 > Right now, I suggest to concentrate on first 2 namespaces - utsname and
 > sysvipc. They are small enough and easy. Lets consider them without sysctl/proc
 > issues, as those can be resolved later. I sent the patches for these 2
 > namespaces to all of you. I really hope for some _good_ critics, so we could
 > work it out quickly.
 
 Sounds like a plan.
 
 Eric
 |  
	|  |  |  
	| 
		
			| Re: [RFC] Virtualization steps [message #2206 is a reply to message #2201] | Fri, 24 March 2006 19:25   |  
			| 
				
				
					|  Dave Hansen Messages: 240
 Registered: October 2005
 | Senior Member |  |  |  
	| On Sat, 2006-03-25 at 04:33 +1100, Nick Piggin wrote: > Oh, after you come to an agreement and start posting patches, can you
 > also outline why we want this in the kernel (what it does that low
 > level virtualization doesn't, etc, etc)
 
 Can you wait for an OLS paper? ;)
 
 I'll summarize it this way: low-level virtualization uses resource
 inefficiently.
 
 With this higher-level stuff, you get to share all of the Linux caching,
 and can do things like sharing libraries pretty naturally.
 
 They are also much lighter-weight to create and destroy than full
 virtual machines.  We were planning on doing some performance
 comparisons versus some hypervisors like Xen and the ppc64 one to show
 scaling with the number of virtualized instances.  Creating 100 of these
 Linux containers is as easy as a couple of shell scripts, but we still
 can't find anybody crazy enough to go create 100 Xen VMs.
 
 Anyway, those are the things that came to my mind first.  I'm sure the
 others involved have their own motivations.
 
 -- Dave
 |  
	|  |  |  
	| 
		
			| Re: [RFC] Virtualization steps [message #2209 is a reply to message #2206] | Fri, 24 March 2006 19:53   |  
			| 
				
				
					|  ebiederm Messages: 1354
 Registered: February 2006
 | Senior Member |  |  |  
	| Dave Hansen <haveblue@us.ibm.com> writes: 
 > On Sat, 2006-03-25 at 04:33 +1100, Nick Piggin wrote:
 >> Oh, after you come to an agreement and start posting patches, can you
 >> also outline why we want this in the kernel (what it does that low
 >> level virtualization doesn't, etc, etc)
 >
 > Can you wait for an OLS paper? ;)
 >
 > I'll summarize it this way: low-level virtualization uses resource
 > inefficiently.
 >
 > With this higher-level stuff, you get to share all of the Linux caching,
 > and can do things like sharing libraries pretty naturally.
 
 Also it is a major enabler for things such as process migration,
 between kernels.
 
 > They are also much lighter-weight to create and destroy than full
 > virtual machines.  We were planning on doing some performance
 > comparisons versus some hypervisors like Xen and the ppc64 one to show
 > scaling with the number of virtualized instances.  Creating 100 of these
 > Linux containers is as easy as a couple of shell scripts, but we still
 > can't find anybody crazy enough to go create 100 Xen VMs.
 
 One of my favorite test cases is to kill about 100 of them
 simultaneously :)
 
 I think on a reasonably beefy dual processor machine I should be able
 to get about 1000 of them running all at once.
 
 > Anyway, those are the things that came to my mind first.  I'm sure the
 > others involved have their own motivations.
 
 The practical aspect is that several groups have found the arguments
 compelling enough that they have already done complete
 implementations.  At which point getting us all to agree on a common
 implementation is important. :)
 
 Eric
 |  
	|  |  |  
	| 
		
			| Re: [RFC] Virtualization steps [message #2212 is a reply to message #2197] | Fri, 24 March 2006 21:19   |  
			| 
				
				
					|  Herbert Poetzl Messages: 239
 Registered: February 2006
 | Senior Member |  |  |  
	| On Fri, Mar 24, 2006 at 08:19:59PM +0300, Kirill Korotaev wrote: > Eric, Herbert,
 >
 > I think it is quite clear, that without some agreement on all these
 > virtualization issues, we won't be able to commit anything good to
 > mainstream. My idea is to gather our efforts to get consensus on most
 > clean parts of code first and commit them one by one.
 >
 > The proposal is quite simple. We have 4 parties in this conversation
 > (maybe more?): IBM guys, OpenVZ, VServer and Eric Biederman. We
 > discuss the areas which should be considered step by step. Send
 > patches for each area, discuss, come to some agreement and all 4
 > parties Sign-Off the patch. After that it goes to Andrew/Linus.
 > Worth trying?
 
 sounds good to me, as long as we do not consider
 the patches 'final' atm .. because I think we should
 try to test them with _all_ currently existing solutions
 first ... we do not need to bother Andrew with stuff
 which doesn't work for the existing and future 'users'.
 
 so IMHO, we should make a kernel branch (Eric or Sam
 are probably willing to maintain that), which we keep
 in-sync with mainline (not necessarily git, but at
 least snapshot wise), where we put all the patches
 we agree on, and each party should then adjust the
 existing solution to this kernel, so we get some deep
 testing in the process, and everybody can see if it
 'works' for him or not ...
 
 things where we agree that it 'just works' for everyone
 can always be handed upstream, and would probably make
 perfect patches for Andrew ...
 
 > So far, (correct me if I'm wrong) we concluded that some people don't
 > want containers as a whole, but want some subsystem namespaces. I
 > suppose for people who care about containers only it doesn't matter, so
 > we can proceed with namespaces, yeah?
 
 yes, the emphasis here should be on lightweight and
 modular, so that those folks interested in full featured
 containers can just 'assemble' the pieces, while those
 desiring service/space isolation pick their subsystems
 one by one ...
 
 > So the most easy namespaces to discuss I see:
 > - utsname
 
 yes, that's definitely one we can start with, as it seems
 that we already have _very_ similar implementations
 
 > - sys IPC
 
 this is something which is also related to limits and
 should get special attention with resource sharing,
 isolation and control in mind
 
 > - network virtualization
 
 here I see many issues, as for example Linux-VServer
 does not necessarily aim for full virtualization, when
 simple and performant isolation is sufficient.
 
 don't get me wrong, we are _not_ against network
 virtualization per se, but we isolation is just so
 much simpler to administrate and often much more
 performant, so that it is very interesting for service
 separation as well as security applications
 
 just consider the 'typical' service isolation aspect
 where you want to have two apaches, separated on two
 IPs, but communicating with a single sql database
 
 > - netfilter virtualization
 
 same as for network virtualization, but not really
 an issue if it can be 'disabled'
 
 of course, the ideal solution would be some kind
 of hybrid, where you can have virtual interfaces as
 well as isolated IPs, side-by-side ...
 
 > all these were discussed already somehow and looks like there is no
 > fundamental differencies in our approaches (at least OpenVZ and Eric,
 > for sure).
 >
 > Right now, I suggest to concentrate on first 2 namespaces - utsname
 > and sysvipc. They are small enough and easy. Lets consider them
 > without sysctl/proc issues, as those can be resolved later. I sent the
 > patches for these 2 namespaces to all of you. I really hope for some
 > _good_ critics, so we could work it out quickly.
 
 will look into them soon ...
 
 best,
 Herbert
 
 > Thanks,
 > Kirill
 |  
	|  |  |  
	| 
		
			| Re: [RFC] Virtualization steps [message #2250 is a reply to message #2212] | Mon, 27 March 2006 18:45   |  
			| 
				
				
					|  ebiederm Messages: 1354
 Registered: February 2006
 | Senior Member |  |  |  
	| Herbert Poetzl <herbert@13thfloor.at> writes: 
 > On Fri, Mar 24, 2006 at 08:19:59PM +0300, Kirill Korotaev wrote:
 >> Eric, Herbert,
 >>
 >> I think it is quite clear, that without some agreement on all these
 >> virtualization issues, we won't be able to commit anything good to
 >> mainstream. My idea is to gather our efforts to get consensus on most
 >> clean parts of code first and commit them one by one.
 >>
 >> The proposal is quite simple. We have 4 parties in this conversation
 >> (maybe more?): IBM guys, OpenVZ, VServer and Eric Biederman. We
 >> discuss the areas which should be considered step by step. Send
 >> patches for each area, discuss, come to some agreement and all 4
 >> parties Sign-Off the patch. After that it goes to Andrew/Linus.
 >> Worth trying?
 >
 > sounds good to me, as long as we do not consider
 > the patches 'final' atm .. because I think we should
 > try to test them with _all_ currently existing solutions
 > first ... we do not need to bother Andrew with stuff
 > which doesn't work for the existing and future 'users'.
 >
 > so IMHO, we should make a kernel branch (Eric or Sam
 > are probably willing to maintain that), which we keep
 > in-sync with mainline (not necessarily git, but at
 > least snapshot wise), where we put all the patches
 > we agree on, and each party should then adjust the
 > existing solution to this kernel, so we get some deep
 > testing in the process, and everybody can see if it
 > 'works' for him or not ...
 
 ACK.  A collection of patches that we can all agree
 on sounds like something worth aiming for.
 
 It looks like Kirill last round of patches can form
 a nucleus for that.  So far I have seem plenty of technical
 objects but no objections to the general direction.
 
 So agreement appears possible.
 
 Eric
 |  
	|  |  |  
	| 
		
			| Re: [RFC] Virtualization steps [message #2275 is a reply to message #2206] | Tue, 28 March 2006 04:28   |  
			| 
				
				
					|  Bill Davidsen Messages: 4
 Registered: March 2006
 | Junior Member |  |  |  
	| Dave Hansen wrote: > On Sat, 2006-03-25 at 04:33 +1100, Nick Piggin wrote:
 >> Oh, after you come to an agreement and start posting patches, can you
 >> also outline why we want this in the kernel (what it does that low
 >> level virtualization doesn't, etc, etc)
 >
 > Can you wait for an OLS paper? ;)
 >
 > I'll summarize it this way: low-level virtualization uses resource
 > inefficiently.
 >
 > With this higher-level stuff, you get to share all of the Linux caching,
 > and can do things like sharing libraries pretty naturally.
 >
 > They are also much lighter-weight to create and destroy than full
 > virtual machines.  We were planning on doing some performance
 > comparisons versus some hypervisors like Xen and the ppc64 one to show
 > scaling with the number of virtualized instances.  Creating 100 of these
 > Linux containers is as easy as a couple of shell scripts, but we still
 > can't find anybody crazy enough to go create 100 Xen VMs.
 
 But these require a modified O/S, do they not? Or do I read that
 incorrectly? Is this going to be real virtualization able to run any O/S?
 
 Frankly I don't see running 100 VMs as a realistic goal, being able to
 run Linux, Windows, Solaris and BEOS unmodified in 4-5 VMs would be far
 more useful.
 >
 > Anyway, those are the things that came to my mind first.  I'm sure the
 > others involved have their own motivations.
 >
 > -- Dave
 >
 |  
	|  |  |  
	| 
		
			| Re:  Re: [RFC] Virtualization steps [message #2277 is a reply to message #2275] | Tue, 28 March 2006 06:45   |  
			|  |  
	| Bill Davidsen wrote: 
 > Dave Hansen wrote:
 >
 >> On Sat, 2006-03-25 at 04:33 +1100, Nick Piggin wrote:
 >>
 >>> Oh, after you come to an agreement and start posting patches, can you
 >>> also outline why we want this in the kernel (what it does that low
 >>> level virtualization doesn't, etc, etc)
 >>
 >>
 >> Can you wait for an OLS paper? ;)
 >>
 >> I'll summarize it this way: low-level virtualization uses resource
 >> inefficiently.
 >>
 >> With this higher-level stuff, you get to share all of the Linux caching,
 >> and can do things like sharing libraries pretty naturally.
 >>
 >> They are also much lighter-weight to create and destroy than full
 >> virtual machines.  We were planning on doing some performance
 >> comparisons versus some hypervisors like Xen and the ppc64 one to show
 >> scaling with the number of virtualized instances.  Creating 100 of these
 >> Linux containers is as easy as a couple of shell scripts, but we still
 >> can't find anybody crazy enough to go create 100 Xen VMs.
 >
 >
 > But these require a modified O/S, do they not? Or do I read that
 > incorrectly? Is this going to be real virtualization able to run any O/S?
 
 This type is called OS-level virtualization, or kernel-level
 virtualization, or partitioning. Basically it allows to create a
 compartments (in OpenVZ we call them VEs -- Virtual Environments) in
 which you can run full *unmodified* Linux system (but the kernel itself
 -- it is one single kernel common for all compartments). That means that
 with this approach you can not run OSs other than Linux, but different
 Linux distributions are working just fine.
 
 > Frankly I don't see running 100 VMs as a realistic goal
 
 It is actually not a future goal, but rather a reality. Since os-level
 virtualization overhead is very low (1-2 per cent or so), one can run
 hundreds of VEs.
 
 Say, on a box with 1GB of RAM OpenVZ [http://openvz.org/] is able to run
 about 150 VEs each one having init, apache (serving static content),
 sendmail, sshd, cron etc. running. Actually you can run more, but with
 the aggressive swapping so performance drops considerably. So it all
 mostly depends on RAM, and I'd say that 500+ VEs on a 4GB box should run
 just fine. Of course it all depends on what you run inside those VEs.
 
 > , being able to run Linux, Windows, Solaris and BEOS unmodified in 4-5
 > VMs would be far more useful.
 
 This is a different story. If you want to run different OSs on the same
 box -- use emulation or paravirtualization.
 
 If you are happy to stick to Linux on this box -- use OS-level
 virtualization. Aside from the best possible scalability and
 performance, the other benefit of this approach is dynamic resource
 management -- since there is a single kernel managing all the resources
 such as RAM, you can easily tune all those resources runtime. More to
 say, you can make one VE use more RAM while nobody else it using it,
 leading to much better resource usage. And since there is one single
 kernel that manages everything, you could do nice tricks like VE
 checkpointing, live migration, etc. etc.
 
 Some more info on topic are available from
 http://openvz.org/documentation/tech/
 
 Kir.
 
 >>
 >> Anyway, those are the things that came to my mind first.  I'm sure the
 >> others involved have their own motivations.
 >>
 >> -- Dave
 >>
 >
 |  
	|  |  |  
	| 
		
			| Re: [RFC] Virtualization steps [message #2283 is a reply to message #2250] | Tue, 28 March 2006 08:51   |  
			| 
				
				
					|  dev Messages: 1693
 Registered: September 2005
 Location: Moscow
 | Senior Member |  
 |  |  
	| >> so IMHO, we should make a kernel branch (Eric or Sam >> are probably willing to maintain that), which we keep
 >> in-sync with mainline (not necessarily git, but at
 >> least snapshot wise), where we put all the patches
 >> we agree on, and each party should then adjust the
 >> existing solution to this kernel, so we get some deep
 >> testing in the process, and everybody can see if it
 >> 'works' for him or not ...
 >
 > ACK.  A collection of patches that we can all agree
 > on sounds like something worth aiming for.
 >
 > It looks like Kirill last round of patches can form
 > a nucleus for that.  So far I have seem plenty of technical
 > objects but no objections to the general direction.
 yup, I will fix everything and will come with a set of patches for IPC,
 so we could select which way is better to do it :)
 
 > So agreement appears possible.
 Nice to hear this!
 
 Eric, we have a GIT repo on openvz.org already:
 http://git.openvz.org
 
 we will create a separate branch also called -acked, where patches
 agreed upon will go.
 
 Thanks,
 Kirill
 |  
	|  |  |  
	|  |  
	| 
		
			| Re: [RFC] Virtualization steps [message #2285 is a reply to message #2201] | Tue, 28 March 2006 09:02   |  
			| 
				
				
					|  dev Messages: 1693
 Registered: September 2005
 Location: Moscow
 | Senior Member |  
 |  |  
	| > Oh, after you come to an agreement and start posting patches, can you > also outline why we want this in the kernel (what it does that low
 > level virtualization doesn't, etc, etc), and how and why you've agreed
 > to implement it. Basically, some background and a summary of your
 > discussions for those who can't follow everything. Or is that a faq
 > item?
 Nick, will be glad to shed some light on it.
 
 First of all, what it does which low level virtualization can't:
 - it allows to run 100 containers on 1GB RAM
 (it is called containers, VE - Virtual Environments,
 VPS - Virtual Private Servers).
 - it has no much overhead (<1-2%), which is unavoidable with hardware
 virtualization. For example, Xen has >20% overhead on disk I/O.
 - it allows to create/deploy VE in less than a minute, VE start/stop
 takes ~1-2 seconds.
 - it allows to dynamically change all resource limits/configurations.
 In OpenVZ it is even possible to add/remove virtual CPUs to/from VE.
 It is possible to increase/descrease memory limits on the fly etc.
 - it has much more efficient memory usage with single template file
 in a cache if COW-like filesystem is used for VE templates.
 - it allows you to access VE files from host easily if needed.
 This helps to make management much more flexible, e.g. you can
 upgrade/repair/fix all you VEs from host, i.e. easy mass management.
 
 
 OS kernel virtualization
 ~~~~~~~~~~~~~~~~~~~~~~~~
 OS virtualization is a kernel solution, which replaces the usage
 of many global variables with context-dependant counterparts. This
 allows to have isolated private resources in different contexts.
 
 So VE means essentially context and a set of it's variables/settings,
 which include but not limited to, own process tree, files, IPC
 resources, IP routing, network devices and such.
 
 Full virtualization solution consists of:
 - virtualization of resources, i.e. private contexts
 - resource controls, for limiting contexts
 - management tools
 
 Such kind of virtualization solution is implemented in OpenVZ
 (http://openvz.org) and Linux-Vserver (http://linux-vserver.org) projects.
 
 Summary of previous discussions on LKML
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 - we agreed upon doing virtualization of each kernel subsystem
 separately, not as a single virtual environment.
 - we almost agreed upon calling virtualization of subsystems
 "namespaces".
 - we were discussing whether we should have global namespace context,
 like 'current' or bypass context as an argument to all functions
 which require it.
 - we didn't agreed on whether we need a config option and ability to
 compile kernel w/o virtual namespaces.
 
 Thansk,
 Kirill
 |  
	|  |  |  
	| 
		
			| Re: [RFC] Virtualization steps [message #2286 is a reply to message #2285] | Tue, 28 March 2006 09:15   |  
			| 
				
				
					|  Nick Piggin Messages: 35
 Registered: March 2006
 | Member |  |  |  
	| Kirill Korotaev wrote: >
 > Nick, will be glad to shed some light on it.
 >
 
 Thanks very much Kirill.
 
 I don't think I'm qualified to make any decisions about this,
 so I don't want to detract from the real discussions, but I
 just had a couple more questions:
 
 > First of all, what it does which low level virtualization can't:
 > - it allows to run 100 containers on 1GB RAM
 >   (it is called containers, VE - Virtual Environments,
 >    VPS - Virtual Private Servers).
 > - it has no much overhead (<1-2%), which is unavoidable with hardware
 >   virtualization. For example, Xen has >20% overhead on disk I/O.
 
 Are any future hardware solutions likely to improve these problems?
 
 >
 > OS kernel virtualization
 > ~~~~~~~~~~~~~~~~~~~~~~~~
 
 Is this considered secure enough that multiple untrusted VEs are run
 on production systems?
 
 What kind of users want this, who can't use alternatives like real
 VMs?
 
 > Summary of previous discussions on LKML
 > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 Have their been any discussions between the groups pushing this
 virtualization, and important kernel developers who are not part of
 a virtualization effort? Ie. is there any consensus about the
 future of these patches?
 
 Thanks,
 Nick
 
 --
 SUSE Labs, Novell Inc.
 Send instant messages to your online friends http://au.messenger.yahoo.com
 |  
	|  |  |  
	| 
		
			| Re: [RFC] Virtualization steps [message #2287 is a reply to message #2283] | Tue, 28 March 2006 12:53   |  
			| 
				
				
					|  serue Messages: 750
 Registered: February 2006
 | Senior Member |  |  |  
	| Quoting Kirill Korotaev (dev@sw.ru): > >>so IMHO, we should make a kernel branch (Eric or Sam
 > >>are probably willing to maintain that), which we keep
 > >>in-sync with mainline (not necessarily git, but at
 > >>least snapshot wise), where we put all the patches
 > >>we agree on, and each party should then adjust the
 > >>existing solution to this kernel, so we get some deep
 > >>testing in the process, and everybody can see if it
 > >>'works' for him or not ...
 > >
 > >ACK.  A collection of patches that we can all agree
 > >on sounds like something worth aiming for.
 > >
 > >It looks like Kirill last round of patches can form
 > >a nucleus for that.  So far I have seem plenty of technical
 > >objects but no objections to the general direction.
 > yup, I will fix everything and will come with a set of patches for IPC,
 > so we could select which way is better to do it :)
 >
 > >So agreement appears possible.
 > Nice to hear this!
 >
 > Eric, we have a GIT repo on openvz.org already:
 > http://git.openvz.org
 >
 > we will create a separate branch also called -acked, where patches
 > agreed upon will go.
 
 That's ok by me.  If a more neutral name/site were preferred, we could
 use the sf.net set we had finally gotten around to setting up -
 www.sf.net/projects/lxc  (LinuX Containers).  Unfortunately that would
 likely be just a quilt patch repository.
 
 A wiki + git repository would be ideal.
 
 -serge
 |  
	|  |  |  
	| 
		
			| Re: [RFC] Virtualization steps [message #2290 is a reply to message #2284] | Tue, 28 March 2006 14:38   |  
			| 
				
				
					|  Bill Davidsen Messages: 4
 Registered: March 2006
 | Junior Member |  |  |  
	| Kirill Korotaev wrote: 
 >> Frankly I don't see running 100 VMs as a realistic goal, being able
 >> to run Linux, Windows, Solaris and BEOS unmodified in 4-5 VMs would
 >> be far more useful.
 >
 > It is more than realistic. Hosting companies run more than 100 VPSs in
 > reality. There are also other usefull scenarios. For example, I know
 > the universities which run VPS for every faculty web site, for every
 > department, mail server and so on. Why do you think they want to run
 > only 5VMs on one machine? Much more!
 
 I made no commont on what "they" might want, I want to make the rack of
 underutilized Windows, BSD and Solaris servers go away. An approach
 which doesn't support unmodified guest installs doesn't solve any of my
 current problems. I didn't say it was in any way not useful, just not of
 interest to me. What needs I have for Linux environments are answered by
 jails and/or UML.
 
 --
 bill davidsen <davidsen@tmr.com>
 CTO TMR Associates, Inc
 Doing interesting things with small computers since 1979
 |  
	|  |  |  
	| 
		
			| Re: [RFC] Virtualization steps [message #2293 is a reply to message #2290] | Tue, 28 March 2006 15:03   |  
			| 
				
				
					|  ebiederm Messages: 1354
 Registered: February 2006
 | Senior Member |  |  |  
	| Bill Davidsen <davidsen@tmr.com> writes: 
 > Kirill Korotaev wrote:
 >
 >>> Frankly I don't see running 100 VMs as a realistic goal, being able to run
 >>> Linux, Windows, Solaris and BEOS unmodified in 4-5 VMs would be far more
 >>> useful.
 >>
 >> It is more than realistic. Hosting companies run more than 100 VPSs in
 >> reality. There are also other usefull scenarios. For example, I know the
 >> universities which run VPS for every faculty web site, for every department,
 >> mail server and so on. Why do you think they want to run only 5VMs on one
 >> machine? Much more!
 >
 > I made no commont on what "they" might want, I want to make the rack of
 > underutilized Windows, BSD and Solaris servers go away. An approach which
 > doesn't support unmodified guest installs doesn't solve any of my current
 > problems. I didn't say it was in any way not useful, just not of interest to
 > me. What needs I have for Linux environments are answered by jails and/or UML.
 
 So from one perspective that is what we are building.  A full featured
 jail capable of running an unmodified linux distro.  The cost is
 simply making a way to use the same names twice for the global
 namespaces.  UML may use these features to accelerate it's own processes.
 
 Virtualization is really the wrong word to describe what we are building.  As
 it allows for all kinds of heavy weight implementations, and has an associate
 with much heavier things.
 
 At the extreme end where you only have one process in each logical instance
 of the kernel, a better name would be a heavy weight process.  Where each
 such process sees an environment as if it owned the entire machine.
 
 Eric
 |  
	|  |  |  
	| 
		
			| Re: [RFC] Virtualization steps [message #2294 is a reply to message #2286] | Tue, 28 March 2006 15:35   |  
			| 
				
				
					|  Herbert Poetzl Messages: 239
 Registered: February 2006
 | Senior Member |  |  |  
	| On Tue, Mar 28, 2006 at 07:15:17PM +1000, Nick Piggin wrote: > Kirill Korotaev wrote:
 > >
 > >Nick, will be glad to shed some light on it.
 > >
 >
 > Thanks very much Kirill.
 >
 > I don't think I'm qualified to make any decisions about this,
 > so I don't want to detract from the real discussions, but I
 > just had a couple more questions:
 >
 > >First of all, what it does which low level virtualization can't:
 > >- it allows to run 100 containers on 1GB RAM
 > >  (it is called containers, VE - Virtual Environments,
 > >   VPS - Virtual Private Servers).
 > >- it has no much overhead (<1-2%), which is unavoidable with hardware
 > >  virtualization. For example, Xen has >20% overhead on disk I/O.
 >
 > Are any future hardware solutions likely to improve these problems?
 
 not really, but as you know, "640K ought to be enough
 for anybody", so maybe future hardware developments will
 make shared resources possible (with different kernels)
 
 > >OS kernel virtualization
 > >~~~~~~~~~~~~~~~~~~~~~~~~
 >
 > Is this considered secure enough that multiple untrusted VEs are run
 > on production systems?
 
 definitely! there are many, many, hosting providers
 using exactly this technology to provide Virutal Private
 Servers for their customers, of course, in production
 
 > What kind of users want this, who can't use alternatives like real
 > VMs?
 
 well, the same users who do not want to use Bochs for
 emulating a PC on a PC, when they can use UML for example,
 because it's much faster and easier to use ...
 
 aside from that, Linux-VServer for example, is not only
 designed to create complete virtual servers, it also
 works for service separation and increasing security for
 many applications, like for example:
 
 - test environments (one guest per distro)
 - service separation (one service per 'container')
 - resource management and accounting
 
 > >Summary of previous discussions on LKML
 > >~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 >
 > Have their been any discussions between the groups pushing this
 > virtualization, and ...
 
 yes, the discussions are ongoing ... maybe to clarify the
 situation for the folks not involved (projects in
 alphabetical order):
 
 FreeVPS (Free Virtual Private Server Solution):
 ===============================================
 [http://www.freevps.com/]
 not pushing for inclusion, early Linux-VServer
 spinoff, partially maintained but they seem to have
 other interrests lately
 
 Alex Lyashkov (FreeVPS kernel maintainer)
 [Positive Software Corporation http://www.freevps.com/]
 
 BSD Jail LSM (Linux-Jails security module):
 ===========================================
 [http://kerneltrap.org/node/3823]
 
 Serge E. Hallyn (Patch/Module maintainer) [IBM]
 interested in some kind of mainline solution
 
 Dave Hansen (IBM Linux Technology Center)
 interested in virtualization for context/container
 migration
 
 Linux-VServer (community project, maintained):
 ==============================================
 [http://linux-vserver.org/]
 
 Jacques Gelinas (previous VServer maintainer)
 not pushing for inclusion
 
 Herbert Poetzl (Linux-VServer kernel maintainer)
 not pushing for inclusion, but I want to make damn
 sure that there does not come bloat into the kernel
 and the mainline effords will be usable for
 Linux-VServer and similar ...
 
 Sam Vilain (Refactoring Linux-VServer patches)
 [Catalyst http://catalyst.net.nz/]
 trying hard to provide a simple/minimalistic version
 of Linux-VServer for mainline
 
 many others, not really pushing anything here :)
 
 OpenVZ (open project, maintained, subset of Virtuozzo(tm)):
 ===========================================================
 [http://openvz.org/]
 
 Kir Kolyshkin (OpenVZ maintainer):
 [SWsoft http://www.swsoft.com I gues?]
 maybe pushing for inclusion ...
 
 Kirill Korotaev (OpenVZ/Virtuozzo kernel developer?)
 [SWsoft http://www.swsoft.com]
 heavily pushing for inclusion ...
 
 Alexey Kuznetsov (Chief Software Engineer)
 [SWsoft http://www.swsoft.com]
 not pushing but supporting company interrests
 
 PID Virtualization (kernel branch for inclusion):
 =================================================
 
 Eric W. Biederman (branch developer/maintainer)
 [XMission http://xmission.com/]
 
 Virtuozzo(tm) (Commercial solution form SWsoft):
 ================================================
 [http://www.virtuozzo.com/]
 
 not involved yet, except via OpenVZ
 
 Stanislav Protassov (Director of Engineering)
 [SWsoft http://www.swsoft.com]
 
 
 A ton of IBM and VZ folks are not listed here, but I
 guess you can figure who is who from the email addresses
 
 there are also a bunch of folks from Columbia and
 Princeton university interested and/or involved in
 kernel level virtualization and context migration.
 
 please extend this list where appropriate, I'm pretty
 sure I forgot at least five important/involved persons
 
 > important kernel developers who are not part of a virtualization
 > effort?
 
 no idea, probably none for now ...
 
 > Ie. is there any consensus about the future of these patches?
 
 what patches? what future?
 
 HTC,
 Herbert
 
 > Thanks,
 > Nick
 >
 > --
 > SUSE Labs, Novell Inc.
 > Send instant messages to your online friends http://au.messenger.yahoo.com
 |  
	|  |  |  
	| 
		
			| Re:  Re: [RFC] Virtualization steps [message #2295 is a reply to message #2285] | Tue, 28 March 2006 15:48   |  
			| 
				
				
					|  TheWiseOne Messages: 66
 Registered: September 2005
 Location: Pennsylvania
 | Member |  |  |  
	| Kirill Korotaev wrote: >> Oh, after you come to an agreement and start posting patches, can you
 >> also outline why we want this in the kernel (what it does that low
 >> level virtualization doesn't, etc, etc), and how and why you've agreed
 >> to implement it. Basically, some background and a summary of your
 >> discussions for those who can't follow everything. Or is that a faq
 >> item?
 > Nick, will be glad to shed some light on it.
 >
 > First of all, what it does which low level virtualization can't:
 > - it allows to run 100 containers on 1GB RAM
 >   (it is called containers, VE - Virtual Environments,
 >    VPS - Virtual Private Servers).
 > - it has no much overhead (<1-2%), which is unavoidable with hardware
 >   virtualization. For example, Xen has >20% overhead on disk I/O.
 
 I think the Xen guys would disagree with you on this.  Xen claims <3%
 overhead on the XenSource site.
 
 Where did you get these figures from?  What Xen version did you test?
 What was your configuration? Did you have kernel debugging enabled? You
 can't just post numbers without the data to back it up, especially when
 it conflicts greatly with the Xen developers statements.  AFAIK Xen is
 well on it's way to inclusion into the mainstream kernel.
 
 Thank you,
 Matt Ayres
 |  
	|  |  |  
	| 
		
			| Re: [RFC] Virtualization steps [message #2296 is a reply to message #2286] | Tue, 28 March 2006 16:15   |  
			| 
				
				
					|  ebiederm Messages: 1354
 Registered: February 2006
 | Senior Member |  |  |  
	| Nick Piggin <nickpiggin@yahoo.com.au> writes: 
 > Kirill Korotaev wrote:
 >> Nick, will be glad to shed some light on it.
 >>
 >
 > Thanks very much Kirill.
 >
 > I don't think I'm qualified to make any decisions about this,
 > so I don't want to detract from the real discussions, but I
 > just had a couple more questions:
 >
 >> First of all, what it does which low level virtualization can't:
 >> - it allows to run 100 containers on 1GB RAM
 >>   (it is called containers, VE - Virtual Environments,
 >>    VPS - Virtual Private Servers).
 >> - it has no much overhead (<1-2%), which is unavoidable with hardware
 >>   virtualization. For example, Xen has >20% overhead on disk I/O.
 >
 > Are any future hardware solutions likely to improve these problems?
 
 This isn't a direct competition, both solutions coincide nicely.
 
 The major efficiency differences are fundamental to the approaches and
 can only be solved in software and not hardware.  The fundamental efficiency
 limits of low level virtualization are not sharing resources between
 instances well (think how hard memory hotplug is to solve), the fact
 that running a kernel takes at least 1MB for just the kernel, the
 fact that no matter how good your hypervisor is there will be some
 hardware interface it doesn't virtualize.
 
 Whereas what we are aiming at are just enough modifications to the kernel
 to allow multiple instances of user space.  We aren't virtualizing anything
 that isn't already virtualized in the kernel.
 
 >> OS kernel virtualization
 >> ~~~~~~~~~~~~~~~~~~~~~~~~
 >
 > Is this considered secure enough that multiple untrusted VEs are run
 > on production systems?
 
 Kirill or Herbert can give a better answer but that is of the major
 points of BSD Jails and their kin is it not?
 
 > What kind of users want this, who can't use alternatives like real
 > VMs?
 
 Well that question assumes a lot.  The answer that assumes a lot
 in the other direction is that adding an additional unnecessary layers
 just complicates the problem and slows things down for no reason
 while making it so you can't assume the solution is always present.
 In addition to doing it in a non-portable way so it is only available
 on a few platforms.
 
 I can't even think of a straight answer to the users question.
 
 My users are in the high performance computing realm, and for that
 subset it is easy.  Xen and it's kin don't virtualize the high
 bandwidth low latency communication hardware that is used, and that
 may not even be possible.  Using a hypervisor in a situation like that
 certainly isn't general or easily maintainable.  (Think about
 what a challenge it has been to get usable infiniband drivers merged).
 
 >> Summary of previous discussions on LKML
 >> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 >
 > Have their been any discussions between the groups pushing this
 > virtualization, and important kernel developers who are not part of
 > a virtualization effort? Ie. is there any consensus about the
 > future of these patches?
 
 Yes, but just enough to give us hope :)
 
 Unless you count the mount namespace as part of this in which case
 pieces are already merged.
 
 The challenging is that writing kernel code that does this is
 easy.  Writing kernel code that is mergeable and that the different
 groups all agree meets their requirements is much harder.  It has
 taken us until now to have a basic approach that we all agree on.
 Now we get to beat each other up over the technical details :)
 
 Eric
 |  
	|  |  |  
	|  |  
	|  |  
	|  |  
	|  |  
	| 
		
			| Re:  Re: [RFC] Virtualization steps [message #2304 is a reply to message #2206] | Tue, 28 March 2006 20:26   |  
			| 
				
				
					|  Jun OKAJIMA Messages: 30
 Registered: March 2006
 | Member |  |  |  
	| > >I'll summarize it this way: low-level virtualization uses resource
 >inefficiently.
 >
 >With this higher-level stuff, you get to share all of the Linux caching,
 >and can do things like sharing libraries pretty naturally.
 >
 >They are also much lighter-weight to create and destroy than full
 >virtual machines.  We were planning on doing some performance
 >comparisons versus some hypervisors like Xen and the ppc64 one to show
 >scaling with the number of virtualized instances.  Creating 100 of these
 >Linux containers is as easy as a couple of shell scripts, but we still
 >can't find anybody crazy enough to go create 100 Xen VMs.
 >
 >Anyway, those are the things that came to my mind first.  I'm sure the
 >others involved have their own motivations.
 >
 
 Some questions.
 
 1. Your point is rignt in some ways, and I agree with you.
 Yes, I currently guess Jail is quite practical than Xen.
 Xen sounds cool, but really practical? I doubt a bit.
 But it would be a narrow thought, maybe.
 How you estimate feature improvement of memory shareing
 on VM ( e.g. Xen/VMware)?
 I have seen there are many papers about this issue.
 If once memory sharing gets much efficient, Xen possibly wins.
 
 2. Folks, how you think about other good points of Xen,
 like live migration, or runs solaris, or has suspend/resume or...
 No Linux jails have such feature for now, although I dont think
 it is impossible with jail.
 
 
 My current suggestion is,
 
 1. Dont use Xen for running multiple VMs.
 2. Use Xen for better admin/operation/deploy... tools.
 3. If you need multiple VMs, use jail on Xen.
 
 --- Okajima, Jun. Tokyo, Japan.
 http://www.digitalinfra.co.jp/
 http://www.colinux.org/
 http://www.machboot.com/
 |  
	|  |  |  
	| 
		
			| Re:  Re: [RFC] Virtualization steps [message #2305 is a reply to message #2304] | Tue, 28 March 2006 20:50   |  
			|  |  
	| Jun OKAJIMA wrote: 
 >>I'll summarize it this way: low-level virtualization uses resource
 >>inefficiently.
 >>
 >>With this higher-level stuff, you get to share all of the Linux caching,
 >>and can do things like sharing libraries pretty naturally.
 >>
 >>They are also much lighter-weight to create and destroy than full
 >>virtual machines.  We were planning on doing some performance
 >>comparisons versus some hypervisors like Xen and the ppc64 one to show
 >>scaling with the number of virtualized instances.  Creating 100 of these
 >>Linux containers is as easy as a couple of shell scripts, but we still
 >>can't find anybody crazy enough to go create 100 Xen VMs.
 >>
 >>Anyway, those are the things that came to my mind first.  I'm sure the
 >>others involved have their own motivations.
 >>
 >>
 >>
 >
 >Some questions.
 >
 >1. Your point is rignt in some ways, and I agree with you.
 >   Yes, I currently guess Jail is quite practical than Xen.
 >   Xen sounds cool, but really practical? I doubt a bit.
 >   But it would be a narrow thought, maybe.
 >   How you estimate feature improvement of memory shareing
 >   on VM ( e.g. Xen/VMware)?
 >   I have seen there are many papers about this issue.
 >   If once memory sharing gets much efficient, Xen possibly wins.
 >
 >
 This is not just about memory sharing. Dynamic resource management is
 hardly possible in a model where you have multiple kernels running; all
 of those kernel were designed to run on a dedicated hardware. As it was
 pointed out, adding/removing memory from a Xen guest during runtime is
 tricky.
 
 Finally, multiple-kernels-on-top-of-hypervisor architecture is just more
 complex and has more overhead then one-kernel-with-many-namespaces.
 
 >2. Folks, how you think about other good points of Xen,
 >   like live migration, or runs solaris, or has suspend/resume or...
 >
 >
 OpenVZ will have live zero downtime migration and suspend/resume some
 time next month.
 
 >   No Linux jails have such feature for now, although I dont think
 >   it is impossible with jail.
 >
 >
 >My current suggestion is,
 >
 >1. Dont use Xen for running multiple VMs.
 >2. Use Xen for better admin/operation/deploy... tools.
 >
 >
 This point is controversial. Tools are tools -- they can be made to
 support Xen, Linux VServer, UML, OpenVZ, VMware -- or even all of them!
 
 But anyway, speaking of tools and better admin operations, what it takes
 to create a Xen domain (I mean create all those files needed to run a
 new Xen domain), and how much time it takes? Say, in OpenVZ creation of
 a VE (Virtual Environment) is a matter of unpacking a ~100MB tarball and
 copying 1K config file -- which essentially means one can create a VE in
 a minute. Linux-VServer should be pretty much the same.
 
 Another concern is, yes, manageability. In OpenVZ model the host system
 can easily access all the VPSs' files, making, say, a mass software
 update a reality. You can easily change some settings in 100+ VEs very
 easy. In systems based on Xen and, say, VMware one should log in into
 each system, one by one, to administer them, which is not unlike the
 'separate physical server' model.
 
 >3. If you need multiple VMs, use jail on Xen.
 >
 >
 Indeed, a mixed approach is very interesting. You can run OpenVZ or
 Linux-VServer in a Xen domain, that makes a lot of sense.
 |  
	|  |  |  
	|  |  
	| 
		
			| Re:  Re: [RFC] Virtualization steps [message #2308 is a reply to message #2305] | Tue, 28 March 2006 21:35   |  
			| 
				
				
					|  Jun OKAJIMA Messages: 30
 Registered: March 2006
 | Member |  |  |  
	| > >>2. Folks, how you think about other good points of Xen,
 >>   like live migration, or runs solaris, or has suspend/resume or...
 >>
 >>
 >OpenVZ will have live zero downtime migration and suspend/resume some
 >time next month.
 >
 
 COOL!!!!
 
 >>
 >>1. Dont use Xen for running multiple VMs.
 >>2. Use Xen for better admin/operation/deploy... tools.
 >>
 >>
 >This point is controversial. Tools are tools -- they can be made to
 >support Xen, Linux VServer, UML, OpenVZ, VMware -- or even all of them!
 >
 >But anyway, speaking of tools and better admin operations, what it takes
 >to create a Xen domain (I mean create all those files needed to run a
 >new Xen domain), and how much time it takes? Say, in OpenVZ creation of
 >a VE (Virtual Environment) is a matter of unpacking a ~100MB tarball and
 >copying 1K config file -- which essentially means one can create a VE in
 >a minute. Linux-VServer should be pretty much the same.
 >
 >Another concern is, yes, manageability. In OpenVZ model the host system
 >can easily access all the VPSs' files, making, say, a mass software
 >update a reality. You can easily change some settings in 100+ VEs very
 >easy. In systems based on Xen and, say, VMware one should log in into
 >each system, one by one, to administer them, which is not unlike the
 >'separate physical server' model.
 >
 >>3. If you need multiple VMs, use jail on Xen.
 >>
 >>
 >Indeed, a mixed approach is very interesting. You can run OpenVZ or
 >Linux-VServer in a Xen domain, that makes a lot of sense.
 >
 >
 
 Sorry for making misunderstanding.
 What I wanted to say with "2" (use Xen as a tool) is, probably same as
 what you are guessing now.
 I mean, you make a server like this.
 1. Install jailed Linux(OpenVZ/Vserver/or..) on Xen
 2. make only one domU. and many VMs on this domU with jail.
 3. runs many (more than 100 or...) VMs with jail, not with Xen.
 4. but, for example, you want to migrate to another PC,
 use Xen live migration.
 The fourth point would help administration tasks easier. This is the point
 where I mentioned about "better tool".
 There is other usage of Xen as admin tool. For example, if you need device
 driver (e.g. new iSCSI H/W driver or gigabit ether or...) of 2.6 kernel, but
 no need to use any other 2.6 funcs, keep guest OS (domU) as 2.4, and make
 dom0 as 2.6 Xen.  This also helps admin tasks.
 Probably, the biggest problem for now is, Xen patch conflicts with
 Vserver/OpenVZ patch.
 
 
 --- Okajima, Jun. Tokyo, Japan.
 |  
	|  |  |  
	|  |  
	|  |  
	|  |  
	| 
		
			| Re:  Re: [RFC] Virtualization steps [message #2312 is a reply to message #2310] | Tue, 28 March 2006 22:24   |  
			| 
				
				
					|  Kir Kolyshkin Messages: 6
 Registered: September 2005
 | Junior Member |  |  |  
	| Sam Vilain wrote: 
 >On Tue, 2006-03-28 at 10:45 +0400, Kir Kolyshkin wrote:
 >
 >
 >>It is actually not a future goal, but rather a reality. Since os-level
 >>virtualization overhead is very low (1-2 per cent or so), one can run
 >>hundreds of VEs.
 >>
 >>
 >
 >Huh?  You managed to measure it!?  Or do you just mean "negligible" by
 >"1-2 per cent" ?  :-)
 >
 >
 We run different tests to measure OpenVZ/Virtuozzo overhead, as we do
 care much for that stuff. I do not remember all the gory details at the
 moment, but I gave the correct numbers: "1-2 per cent or so".
 
 There are things such as networking (OpenVZ's venet device) overhead, a
 fair cpu scheduler overhead, something else.
 
 Why do you think it can not be measured? It either can be, or it is too
 low to be measured reliably (a fraction of a per cent or so).
 
 Regards,
 Kir.
 |  
	|  |  |  
	| 
		
			| Re: [RFC] Virtualization steps [message #2313 is a reply to message #2283] | Tue, 28 March 2006 22:50   |  
			| 
				
				
					|  Sam Vilain Messages: 73
 Registered: February 2006
 | Member |  |  |  
	| On Tue, 2006-03-28 at 12:51 +0400, Kirill Korotaev wrote: > we will create a separate branch also called -acked, where patches
 > agreed upon will go.
 
 No need.  Just use Acked-By: comments.
 
 Also, can I give some more feedback on the way you publish your patches:
 
 1. git's replication uses the notion of a forward-only commit list.
 So, if you change patches or rebase them then you have to rewind
 the base point - which in pure git terms means create a new head.
 So, you should use the convention of putting some identifier - a
 date, or a version number - in each head.
 
 2. Why do you have a seperate repository for your normal openvz and the
 -ms trees?  You can just you different heads.
 
 3. Apache was doing something weird to the HEAD symlink in your
 repository.  (mind you, if you adopt notion 1., this becomes
 irrelevant :-))
 
 Otherwise, it's a great thing to see your patches published via git!
 
 I can't recommend Stacked Git more highly for performing the 'winding'
 of the patch stack necessary for revising patches.  Google for "stgit".
 
 Sam.
 |  
	|  |  |  
	| 
		
			| Re: [RFC] Virtualization steps [message #2314 is a reply to message #2290] | Tue, 28 March 2006 23:07   |  
			| 
				
				
					|  Sam Vilain Messages: 73
 Registered: February 2006
 | Member |  |  |  
	| On Tue, 2006-03-28 at 09:41 -0500, Bill Davidsen wrote: > > It is more than realistic. Hosting companies run more than 100 VPSs in
 > > reality. There are also other usefull scenarios. For example, I know
 > > the universities which run VPS for every faculty web site, for every
 > > department, mail server and so on. Why do you think they want to run
 > > only 5VMs on one machine? Much more!
 >
 > I made no commont on what "they" might want, I want to make the rack of
 > underutilized Windows, BSD and Solaris servers go away. An approach
 > which doesn't support unmodified guest installs doesn't solve any of my
 > current problems. I didn't say it was in any way not useful, just not of
 > interest to me. What needs I have for Linux environments are answered by
 > jails and/or UML.
 
 We are talking about adding jail technology, also known as containers on
 Solaris and vserver/openvz on Linux, to the mainline kernel.
 
 So, you are obviously interested!
 
 Because of course, you can take an unmodified filesystem of the guest
 and assuming the kernels are compatible run them without changes.  I
 find this consolidation approach indispensible.
 
 Sam.
 |  
	|  |  |  
	|  |  
	| 
		
			| Re:  Re: [RFC] Virtualization steps [message #2316 is a reply to message #2312] | Tue, 28 March 2006 23:28   |  
			| 
				
				
					|  Sam Vilain Messages: 73
 Registered: February 2006
 | Member |  |  |  
	| On Wed, 2006-03-29 at 02:24 +0400, Kir Kolyshkin wrote: > >Huh?  You managed to measure it!?  Or do you just mean "negligible" by
 > >"1-2 per cent" ?  :-)
 > We run different tests to measure OpenVZ/Virtuozzo overhead, as we do
 > care much for that stuff. I do not remember all the gory details at the
 > moment, but I gave the correct numbers: "1-2 per cent or so".
 >
 > There are things such as networking (OpenVZ's venet device) overhead, a
 > fair cpu scheduler overhead, something else.
 >
 > Why do you think it can not be measured? It either can be, or it is too
 > low to be measured reliably (a fraction of a per cent or so).
 
 Well, for instance the fair CPU scheduling overhead is so tiny it may as
 well not be there in the VServer patch.  It's just a per-vserver TBF
 that feeds back into the priority (and hence timeslice length) of the
 process.  ie, you get "CPU tokens" which deplete as processes in your
 vserver run and you either get a boost or a penalty depending on the
 level of the tokens in the bucket.  This doesn't provide guarantees, but
 works well for many typical workloads.  And once Herbert fixed the SMP
 cacheline problems in my code ;) it was pretty much full speed.  That
 is, until you want it to sacrifice overall performance for enforcing
 limits.
 
 How does your fair scheduler work?  Do you just keep a runqueue for each
 vps?
 
 To be honest, I've never needed to determine whether its overhead is 1%
 or 0.01%, it would just be a meaningless benchmark anyway :-).  I know
 it's "good enough for me".
 
 Sam.
 |  
	|  |  |  
	| 
		
			| Re:  Re: [RFC] Virtualization steps [message #2321 is a reply to message #2295] | Wed, 29 March 2006 00:55   |  
			| 
				
				
					|  Kirill Korotaev Messages: 137
 Registered: January 2006
 | Senior Member |  |  |  
	| > Kirill Korotaev wrote: >>> Oh, after you come to an agreement and start posting patches, can you
 >>> also outline why we want this in the kernel (what it does that low
 >>> level virtualization doesn't, etc, etc), and how and why you've agreed
 >>> to implement it. Basically, some background and a summary of your
 >>> discussions for those who can't follow everything. Or is that a faq
 >>> item?
 >> Nick, will be glad to shed some light on it.
 >>
 >> First of all, what it does which low level virtualization can't:
 >> - it allows to run 100 containers on 1GB RAM
 >>   (it is called containers, VE - Virtual Environments,
 >>    VPS - Virtual Private Servers).
 >> - it has no much overhead (<1-2%), which is unavoidable with hardware
 >>   virtualization. For example, Xen has >20% overhead on disk I/O.
 >
 > I think the Xen guys would disagree with you on this.  Xen claims <3%
 > overhead on the XenSource site.
 >
 > Where did you get these figures from?  What Xen version did you test?
 > What was your configuration? Did you have kernel debugging enabled? You
 > can't just post numbers without the data to back it up, especially when
 > it conflicts greatly with the Xen developers statements.  AFAIK Xen is
 > well on it's way to inclusion into the mainstream kernel.
 I have no exact numbers in the hands as I'm in another country right now.
 But! We tested Xen not long ago with iozone test suite and it gave
 ~20-30% disk I/O overhead. Recently we were testing CPU scheduler and
 EDF scheduler gave me 33% overhead on some very simple loads with almost
 busy loops inside VMs. It also was not providing any good fairness on
 2CPU SMP system to my suprise. You can object to me, but better simply
 retest it if interested yourself. There were other tests as well, which
 reported very different overheads on Xen 3. I suppose Xen guys do such
 measurements themself, no?
 And I'm sure, they are constantly improving it, they are doing a good
 work on it.
 
 Thanks,
 Kirill
 |  
	|  |  |  
	| 
		
			| Re: [RFC] Virtualization steps [message #2322 is a reply to message #2286] | Wed, 29 March 2006 01:39   |  
			| 
				
				
					|  dev Messages: 1693
 Registered: September 2005
 Location: Moscow
 | Senior Member |  
 |  |  
	| Nick, 
 >> First of all, what it does which low level virtualization can't:
 >> - it allows to run 100 containers on 1GB RAM
 >>   (it is called containers, VE - Virtual Environments,
 >>    VPS - Virtual Private Servers).
 >> - it has no much overhead (<1-2%), which is unavoidable with hardware
 >>   virtualization. For example, Xen has >20% overhead on disk I/O.
 >
 > Are any future hardware solutions likely to improve these problems?
 Probably you are aware of VT-i/VT-x technologies and planned virtualized
 MMU and I/O MMU from Intel and AMD.
 These features should improve the performance somehow, but there is
 still a limit for decreasing the overhead, since at least disk, network,
 video and such devices should be emulated.
 
 >> OS kernel virtualization
 >> ~~~~~~~~~~~~~~~~~~~~~~~~
 >
 > Is this considered secure enough that multiple untrusted VEs are run
 > on production systems?
 it is secure enough. What makes it secure? In general:
 - virtualization, which makes resources private
 - resource control, which makes VE to be limited with its usages
 In more technical details virtualization projects make user access (and
 capabilities) checks stricter. Moreover, OpenVZ is using "denied by
 default" approach to make sure it is secure and VE users are not allowed
 something else.
 
 Also, about 2-3 month ago we had a security review of OpenVZ project
 made by Solar Designer. So, in general such virtualization approach
 should be not less secure than VM-like one. VM core code is bigger and
 there is enough chances for bugs there.
 
 > What kind of users want this, who can't use alternatives like real
 > VMs?
 Many companies, just can't share their names. But in general no
 enterprise and hosting companies need to run different OSes on the same
 machine. For them it is quite natural to use N machines for Linux and M
 for Windows. And since VEs are much more lightweight and easier to work
 with, they like it very much.
 
 Just for example, OpenVZ core is running more than 300,000 VEs worldwide.
 
 Thanks,
 Kirill
 |  
	|  |  |  
	| 
		
			| Re:  Re: [RFC] Virtualization steps [message #2324 is a reply to message #2316] | Wed, 29 March 2006 09:13   |  
			| 
				
				
					|  Kirill Korotaev Messages: 137
 Registered: January 2006
 | Senior Member |  |  |  
	| Sam, 
 >> Why do you think it can not be measured? It either can be, or it is too
 >> low to be measured reliably (a fraction of a per cent or so).
 >
 > Well, for instance the fair CPU scheduling overhead is so tiny it may as
 > well not be there in the VServer patch.  It's just a per-vserver TBF
 > that feeds back into the priority (and hence timeslice length) of the
 > process.  ie, you get "CPU tokens" which deplete as processes in your
 > vserver run and you either get a boost or a penalty depending on the
 > level of the tokens in the bucket.  This doesn't provide guarantees, but
 > works well for many typical workloads.
 I wonder what is the value of it if it doesn't do guarantees or QoS?
 In our experiments with it we failed to observe any fairness. So I
 suppose the only goal of this is too make sure that maliscuios user want
 consume all the CPU power, right?
 
 > How does your fair scheduler work?  Do you just keep a runqueue for each
 > vps?
 we keep num_online_cpus runqueues per VPS.
 Fairs scheduler is some kind of SFQ like algorithm which selects VPS to
 be scheduled, than standart linux scheduler selects a process in a VPS
 runqueues to run.
 
 > To be honest, I've never needed to determine whether its overhead is 1%
 > or 0.01%, it would just be a meaningless benchmark anyway :-).  I know
 > it's "good enough for me".
 Sure! We feel the same, but people like numbers :)
 
 Thanks,
 Kirill
 |  
	|  |  |  
	| 
		
			| Re:  Re: [RFC] Virtualization steps [message #2326 is a reply to message #2324] | Wed, 29 March 2006 11:08   |  
			| 
				
				
					|  Sam Vilain Messages: 73
 Registered: February 2006
 | Member |  |  |  
	| On Wed, 2006-03-29 at 13:13 +0400, Kirill Korotaev wrote: > > Well, for instance the fair CPU scheduling overhead is so tiny it may as
 > > well not be there in the VServer patch.  It's just a per-vserver TBF
 > > that feeds back into the priority (and hence timeslice length) of the
 > > process.  ie, you get "CPU tokens" which deplete as processes in your
 > > vserver run and you either get a boost or a penalty depending on the
 > > level of the tokens in the bucket.  This doesn't provide guarantees, but
 > > works well for many typical workloads.
 > I wonder what is the value of it if it doesn't do guarantees or QoS?
 
 It still does "QoS".  The TBF has a "fill rate", which is basically N
 tokens per M jiffies.  Then you just set the size of the "bucket", and
 the prio bonus given is between -5 (when bucket is full) and +15 (when
 bucket is empty).  The normal -10 to +10 'interactive' prio bonus is
 reduced to -5 to +5 to compensate.
 
 In other words, it's like a global 'nice' across all of the processes in
 the vserver.
 
 So, these characteristics do provide some level of guarantees, but not
 all that people expect.  eg, people want to say "cap usage at 5%", but
 as designed the scheduler does not ever prevent runnable processes from
 running if the CPUs have nothing better to do, so they think the
 scheduler is broken.  It is also possible with a fork bomb (assuming the
 absence of appropriate ulimits) that you start enough processes that you
 don't care that they are all effectively nice +19.
 
 Herbert later made it add some of these guarantees, but I believe there
 is a performance impact of some kind.
 
 > In our experiments with it we failed to observe any fairness.
 
 Well, it does not aim to be 'fair', it aims to be useful for allocating
 CPU to vservers.   ie, if you allocate X% of the CPU in the system to a
 vserver, and it uses more, then try to make it use less via priority
 penalties - and give others shortchanged or not using the CPU very much
 performance bonuses.  That's all.
 
 So, if you under- or over-book CPU allocation, it doesn't work.  The
 idea was that monitoring it could be shipped out to userland.  I just
 wanted something flexible enough to allow virtually any policy to be put
 into place without wasting too many cycles.
 
 > > How does your fair scheduler work?  Do you just keep a runqueue for each
 > > vps?
 > we keep num_online_cpus runqueues per VPS.
 
 Right.  I considered that approach but just couldn't be bothered
 implementing it, so went with the TBF because it worked and was
 lightweight.
 
 > Fairs scheduler is some kind of SFQ like algorithm which selects VPS to
 > be scheduled, than standart linux scheduler selects a process in a VPS
 > runqueues to run.
 
 Right.
 
 > > To be honest, I've never needed to determine whether its overhead is 1%
 > > or 0.01%, it would just be a meaningless benchmark anyway :-).  I know
 > > it's "good enough for me".
 > Sure! We feel the same, but people like numbers :)
 
 Sometimes the answer has to be "mu".
 
 Sam.
 |  
	|  |  |  
	| 
		
			| Re:  Re: [RFC] Virtualization steps [message #2333 is a reply to message #2324] | Wed, 29 March 2006 13:45   |  
			| 
				
				
					|  Herbert Poetzl Messages: 239
 Registered: February 2006
 | Senior Member |  |  |  
	| On Wed, Mar 29, 2006 at 01:13:14PM +0400, Kirill Korotaev wrote: > Sam,
 >
 > >>Why do you think it can not be measured? It either can be, or it is too
 > >>low to be measured reliably (a fraction of a per cent or so).
 > >
 > >Well, for instance the fair CPU scheduling overhead is so tiny it may as
 > >well not be there in the VServer patch.  It's just a per-vserver TBF
 > >that feeds back into the priority (and hence timeslice length) of the
 > >process.  ie, you get "CPU tokens" which deplete as processes in your
 > >vserver run and you either get a boost or a penalty depending on the
 > >level of the tokens in the bucket.  This doesn't provide guarantees, but
 > >works well for many typical workloads.
 
 > I wonder what is the value of it if it doesn't do guarantees or QoS?
 > In our experiments with it we failed to observe any fairness.
 
 probably a misconfiguration on your side ...
 
 > So I suppose the only goal of this is too make sure that maliscuios
 > user want consume all the CPU power, right?
 
 the currently used scheduler extensions do much
 more than that, basically all kinds of scenarios
 can be satisfied with it, at almost no overhead
 
 > >How does your fair scheduler work?
 > >Do you just keep a runqueue for each vps?
 > we keep num_online_cpus runqueues per VPS.
 
 > Fairs scheduler is some kind of SFQ like algorithm which selects VPS
 > to be scheduled, than standart linux scheduler selects a process in a
 > VPS runqueues to run.
 >
 > >To be honest, I've never needed to determine whether its overhead is 1%
 > >or 0.01%, it would just be a meaningless benchmark anyway :-).  I know
 > >it's "good enough for me".
 
 > Sure! We feel the same, but people like numbers :)
 
 well, do you have numbers?
 
 best,
 Herbert
 
 > Thanks,
 > Kirill
 |  
	|  |  | 
 
 
 Current Time: Sat Oct 25 20:01:11 GMT 2025 
 Total time taken to generate the page: 0.09111 seconds |