OpenVZ Forum - RDF feed
https://new-forum.openvz.org/index.php
[PATCH v7 09/10] IPC: message queue copy feature introduced
https://new-forum.openvz.org/index.phpindex.php?t=rview&goto=48474&th=11228#msg_48474
This patch is required for checkpoint/restore in userspace.
IOW, c/r requires some way to get all pending IPC messages without deleting
them from the queue (a checkpoint can fail, in which case the tasks are resumed,
so the queue has to remain valid).
To achieve this, a new operation flag, MSG_COPY, was introduced for the
sys_msgrcv() system call. If this flag is specified, mtype is interpreted as
the number of the message to copy.
If MSG_COPY is set, the kernel allocates a dummy message of the passed size
and then uses the new copy_msg() helper function to copy the desired message
(instead of unlinking it from the queue).
Notes:
1) Return -ENOSYS if MSG_COPY is specified, but CONFIG_CHECKPOINT_RESTORE is
not set.
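A minimal sketch of how user space might peek at a queue with MSG_COPY. It assumes the semantics documented in msgrcv(2) for the merged feature (positions counted from 0, IPC_NOWAIT required alongside MSG_COPY, ENOMSG once the position runs past the end), which may differ in detail from this v7 patch. The names peek_buf, peek_message, peek_all and PEEK_MAX are hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <sys/types.h>

#ifndef MSG_COPY
#define MSG_COPY 040000 /* value used by the Linux headers */
#endif

#define PEEK_MAX 256 /* hypothetical payload limit for this sketch */

struct peek_buf {
    long mtype;
    char mtext[PEEK_MAX];
};

/*
 * Copy the message at position `index` out of queue `qid` without
 * removing it. Returns the payload length, or -errno on failure.
 */
long peek_message(int qid, long index, struct peek_buf *out)
{
    ssize_t n = msgrcv(qid, out, sizeof(out->mtext), index,
                       MSG_COPY | IPC_NOWAIT);
    return n < 0 ? -(long)errno : (long)n;
}

/*
 * Peek successive positions until the kernel reports ENOMSG or `cap`
 * buffers are filled. Returns the number of messages copied, or -errno
 * (for example -ENOSYS without CONFIG_CHECKPOINT_RESTORE).
 */
long peek_all(int qid, struct peek_buf *out, size_t cap)
{
    size_t i;

    for (i = 0; i < cap; i++) {
        long n = peek_message(qid, (long)i, &out[i]);
        if (n == -ENOMSG)
            break;          /* ran past the last message */
        if (n < 0)
            return n;       /* any other error aborts the pass */
    }
    return (long)i;
}
```

Because nothing is unlinked, a failed pass leaves the queue exactly as it was, which is the property the patch description asks for.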
extern void recompute_msgmni(struct ipc_namespace *);
Stanislav Kinsbursky, 2012-10-18T10:23:22-00:00

Re: [PATCH v7 09/10] IPC: message queue copy feature introduced
https://new-forum.openvz.org/index.phpindex.php?t=rview&goto=48480&th=11228#msg_48480
On Thu, Oct 18, 2012 at 12:23 PM, Stanislav Kinsbursky
<skinsbursky@parallels.com> wrote:
> This patch is required for checkpoint/restore in userspace.
> IOW, c/r requires some way to get all pending IPC messages without deleting
> them from the queue (checkpoint can fail and in this case tasks will be resumed,
> so queue have to be valid).
> To achive this, new operation flag MSG_COPY for sys_msgrcv() system call was
> introduced. If this flag was specified, then mtype is interpreted as number of
> the message to copy.
> If MSG_COPY is set, then kernel will allocate dummy message with passed size,
> and then use new copy_msg() helper function to copy desired message (instead of
> unlinking it from the queue).
>
> Notes:
> 1) Return -ENOSYS if MSG_COPY is specified, but CONFIG_CHECKPOINT_RESTORE is
> not set.
Stanislav,
A naive question, because I have not followed C/R closely. How do you
deal with the case that other processes may be reading from the queue?
(Or is that disabled during checkpointing?)
Thanks,
Michael
Michael Kerrisk (man-pages), 2012-10-18T10:39:09-00:00

Re: [PATCH v7 09/10] IPC: message queue copy feature introduced
https://new-forum.openvz.org/index.phpindex.php?t=rview&goto=48482&th=11228#msg_48482
> On Thu, Oct 18, 2012 at 12:23 PM, Stanislav Kinsbursky
> <skinsbursky@parallels.com> wrote:
>> This patch is required for checkpoint/restore in userspace.
>> IOW, c/r requires some way to get all pending IPC messages without deleting
>> them from the queue (checkpoint can fail and in this case tasks will be resumed,
>> so queue have to be valid).
>> To achive this, new operation flag MSG_COPY for sys_msgrcv() system call was
>> introduced. If this flag was specified, then mtype is interpreted as number of
>> the message to copy.
>> If MSG_COPY is set, then kernel will allocate dummy message with passed size,
>> and then use new copy_msg() helper function to copy desired message (instead of
>> unlinking it from the queue).
>>
>> Notes:
>> 1) Return -ENOSYS if MSG_COPY is specified, but CONFIG_CHECKPOINT_RESTORE is
>> not set.
>
> Stanislav,
>
> A naive question, because I have not followed C/R closely. How do you
> deal with the case that other processes may be reading from the queue?
> (Or is that disabled during checkpointing?)
>
To be honest, in this case the behaviour seen from user space is unpredictable.
I.e. if you have, for example, 5 messages in the queue and are going to peek
them all, and another process is reading the queue at the same time, then, most
probably, you won't peek all 5 and will receive ENOMSG.
But this case can easily be handled by the user-space application (the number
of messages in the queue can be discovered before peeking).
Note that in CRIU, IPC resources are collected when all processes to be
migrated are frozen.
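One way the "discover the number of messages before peeking" step could look, as a sketch only: msgctl(IPC_STAT) reports the current depth in msg_qnum, and the caller compares the depth before and after the peek pass with the number of messages it managed to copy. The helper names queue_depth and snapshot_is_consistent are hypothetical, and equal depths are only a weak hint, since one receive plus one send in between would go unnoticed.

```c
#include <errno.h>
#include <sys/ipc.h>
#include <sys/msg.h>

/* Number of messages currently in queue `qid`, or -errno on failure. */
long queue_depth(int qid)
{
    struct msqid_ds ds;

    if (msgctl(qid, IPC_STAT, &ds) < 0)
        return -(long)errno;
    return (long)ds.msg_qnum;
}

/*
 * Heuristic check for a peek pass: the depth must be valid, must match
 * the number of messages peeked, and must be unchanged afterwards.
 * Returns 1 if the snapshot looks consistent, 0 otherwise.
 */
int snapshot_is_consistent(long depth_before, long peeked, long depth_after)
{
    return depth_before >= 0 &&
           depth_before == peeked &&
           depth_after == depth_before;
}
```

A caller that gets 0 back would typically discard the copies and retry, or give up on the checkpoint.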
> Thanks,
>
> Michael
>
--
Best regards,
Stanislav Kinsbursky
Stanislav Kinsbursky, 2012-10-18T11:02:32-00:00

Re: [PATCH v7 09/10] IPC: message queue copy feature introduced
https://new-forum.openvz.org/index.phpindex.php?t=rview&goto=48484&th=11228#msg_48484
On Thu, Oct 18, 2012 at 1:02 PM, Stanislav Kinsbursky
<skinsbursky@parallels.com> wrote:
> 18.10.2012 14:39, Michael Kerrisk (man-pages) wrote:
>
>> On Thu, Oct 18, 2012 at 12:23 PM, Stanislav Kinsbursky
>> <skinsbursky@parallels.com> wrote:
>>>
>>> This patch is required for checkpoint/restore in userspace.
>>> IOW, c/r requires some way to get all pending IPC messages without
>>> deleting
>>> them from the queue (checkpoint can fail and in this case tasks will be
>>> resumed,
>>> so queue have to be valid).
>>> To achive this, new operation flag MSG_COPY for sys_msgrcv() system call
>>> was
>>> introduced. If this flag was specified, then mtype is interpreted as
>>> number of
>>> the message to copy.
>>> If MSG_COPY is set, then kernel will allocate dummy message with passed
>>> size,
>>> and then use new copy_msg() helper function to copy desired message
>>> (instead of
>>> unlinking it from the queue).
>>>
>>> Notes:
>>> 1) Return -ENOSYS if MSG_COPY is specified, but CONFIG_CHECKPOINT_RESTORE
>>> is
>>> not set.
>>
>>
>> Stanislav,
>>
>> A naive question, because I have not followed C/R closely. How do you
>> deal with the case that other processes may be reading from the queue?
>> (Or is that disabled during checkpointing?)
>>
>
> To be honest, in this case behaviour in user-space is unpredictable.
> I.e. if you have, for example, 5 messages in queue and going to peek them
> all, and another process is reading the queue in the same time, then, most
> probably, you won't peek all the 5 and receive ENOMSG.
> But this case can be easily handled by user-space application (number of
> messages in queue can be discovered before peeking).
>
> Note, that in CRIU IPC resources will be collected when all processes to
> migrate are frozen.
Perhaps I am missing something fundamental, but how can C/R sanely do
anything at all here?
For example, suppose a process reads and processes a message after you
read it with MSG_COPY. Then the remaining messages are all shifted by
one position, and you are going to miss reading one of them. IIUC the
idea of MSG_COPY is to allow you to retrieve a copy of all messages in
the list. It sounds like there's no way this can be done reliably. So,
what possible use does the operation have?
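The index-shift problem described above can be reproduced with a small user-space model. This is a simulation only, not kernel code: sim_queue, sim_peek, sim_receive and sim_checkpoint are hypothetical names, and the array stands in for the real message list.

```c
#include <stddef.h>
#include <string.h>

#define QCAP 16

struct sim_queue {
    int ids[QCAP];  /* message identifiers, head at index 0 */
    size_t len;
};

/* MSG_COPY-like read: copy the item at `idx` without removing it.
 * Returns 0 on success, -1 if there is no such position. */
int sim_peek(const struct sim_queue *q, size_t idx, int *out)
{
    if (idx >= q->len)
        return -1;
    *out = q->ids[idx];
    return 0;
}

/* Ordinary receive: remove the head, shifting the rest down by one. */
int sim_receive(struct sim_queue *q, int *out)
{
    if (q->len == 0)
        return -1;
    *out = q->ids[0];
    memmove(&q->ids[0], &q->ids[1], (q->len - 1) * sizeof(q->ids[0]));
    q->len--;
    return 0;
}

/*
 * Peek positions 0, 1, 2, ... in order. Right after position `race_at`
 * has been peeked, a competing reader receives one message. The peeked
 * ids are stored in `seen`; the return value is how many were seen.
 */
size_t sim_checkpoint(struct sim_queue *q, size_t race_at, int *seen)
{
    size_t n = 0, idx = 0;
    int id, dropped;

    while (sim_peek(q, idx, &id) == 0) {
        seen[n++] = id;
        if (idx == race_at)
            sim_receive(q, &dropped);
        idx++;
    }
    return n;
}
```

With a queue of 10, 20, 30, 40, 50 and race_at set to 0, the pass sees 10, 30, 40, 50: message 20 slides into position 0 after that position was already visited and is never copied.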
Thanks,
Michael
--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Author of "The Linux Programming Interface"; http://man7.org/tlpi/
Michael Kerrisk (man-pages), 2012-10-18T11:18:17-00:00

Re: [PATCH v7 09/10] IPC: message queue copy feature introduced
https://new-forum.openvz.org/index.phpindex.php?t=rview&goto=48485&th=11228#msg_48485
> On Thu, Oct 18, 2012 at 1:02 PM, Stanislav Kinsbursky
> <skinsbursky@parallels.com> wrote:
>> 18.10.2012 14:39, Michael Kerrisk (man-pages) wrote:
>>
>>> On Thu, Oct 18, 2012 at 12:23 PM, Stanislav Kinsbursky
>>> <skinsbursky@parallels.com> wrote:
>>>>
>>>> This patch is required for checkpoint/restore in userspace.
>>>> IOW, c/r requires some way to get all pending IPC messages without
>>>> deleting
>>>> them from the queue (checkpoint can fail and in this case tasks will be
>>>> resumed,
>>>> so queue have to be valid).
>>>> To achive this, new operation flag MSG_COPY for sys_msgrcv() system call
>>>> was
>>>> introduced. If this flag was specified, then mtype is interpreted as
>>>> number of
>>>> the message to copy.
>>>> If MSG_COPY is set, then kernel will allocate dummy message with passed
>>>> size,
>>>> and then use new copy_msg() helper function to copy desired message
>>>> (instead of
>>>> unlinking it from the queue).
>>>>
>>>> Notes:
>>>> 1) Return -ENOSYS if MSG_COPY is specified, but CONFIG_CHECKPOINT_RESTORE
>>>> is
>>>> not set.
>>>
>>>
>>> Stanislav,
>>>
>>> A naive question, because I have not followed C/R closely. How do you
>>> deal with the case that other processes may be reading from the queue?
>>> (Or is that disabled during checkpointing?)
>>>
>>
>> To be honest, in this case behaviour in user-space is unpredictable.
>> I.e. if you have, for example, 5 messages in queue and going to peek them
>> all, and another process is reading the queue in the same time, then, most
>> probably, you won't peek all the 5 and receive ENOMSG.
>> But this case can be easily handled by user-space application (number of
>> messages in queue can be discovered before peeking).
>>
>> Note, that in CRIU IPC resources will be collected when all processes to
>> migrate are frozen.
>
> Perhaps I am missing something fundamental, but how can C/R sanely do
> anything at all here?
>
> For example, suppose a process reads and processes a message after you
> read it with MSG_COPY. Then the remaining messages are all shifted by
> one position, and you are going to miss reading one of them. IIUC the
> idea of MSG_COPY is to allow you to retrieve a copy of all messages in
> the list. It sounds like there's no way this can be done reliably. So,
> what possible use does the operation have?
>
First of all, this problem exists regardless of the C/R feature or this patch
set. If you share some resource (like a message queue in this particular case)
system-wide, then any process A can read out a message that was sent by
process B to process C. So, when processes use IPC message queues, they should
be designed to handle such failures.
Second, it's up to user space how to handle such things. It's implied that
a user trying to migrate a process holding one end of a queue will also
migrate the process holding the other end.
Third, there is the IPC namespace, which isolates IPC objects. It can be used
for safe migration of a process tree.
> Thanks,
>
> Michael
>
>
--
Best regards,
Stanislav Kinsbursky
Stanislav Kinsbursky, 2012-10-18T11:34:47-00:00

Re: [PATCH v7 09/10] IPC: message queue copy feature introduced
https://new-forum.openvz.org/index.phpindex.php?t=rview&goto=48494&th=11228#msg_48494
Stanislav Kinsbursky <skinsbursky@parallels.com> writes:
> First of all, this problem exist as is regardless to C/R feature or this patch
> set. If you share some resource (like message queue in this particular case)
> system-wide, then any process A can read out a message, which was send by
> process B to process C. So, when processes uses IPC message queues, they should
> be designed to handle such failures.
>
> Second, it's up to user-space how to handle such things. It's implied, that
> user, trying to migrate some process, holding one end of queue, will also
> migrate another process, holding second end.
>
> Third, there is IPC namespace, which isolates IPC objects. It can be used for
> safe migration of process tree.
This does raise an interesting question.
What is the point of the message copy feature? It appears to be simply
an optimization and not needed to actually perform the
checkpoint/restart. If you are going to restart the processes you can
read all of the messages and then write all of the messages back before
you restart the processes.
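The read-everything-then-write-it-back alternative can be sketched with the existing interfaces only. This is an illustration under assumptions, not what any tool is known to do: saved_msg, drain_queue, refill_queue and the SAVE_MAX payload limit are hypothetical, and a message larger than SAVE_MAX makes the drain stop with -E2BIG.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <sys/types.h>

#define SAVE_MAX 256 /* hypothetical payload limit for this sketch */

struct saved_msg {
    size_t len;                 /* payload length reported by msgrcv() */
    struct {
        long mtype;
        char mtext[SAVE_MAX];
    } body;
};

/*
 * Destructively read up to `cap` messages in queue order.
 * Returns the number of messages saved, or -errno on failure.
 */
long drain_queue(int qid, struct saved_msg *out, size_t cap)
{
    size_t i;

    for (i = 0; i < cap; i++) {
        ssize_t n = msgrcv(qid, &out[i].body, SAVE_MAX, 0, IPC_NOWAIT);
        if (n < 0) {
            if (errno == ENOMSG)
                break;              /* queue is empty */
            return -(long)errno;
        }
        out[i].len = (size_t)n;
    }
    return (long)i;
}

/* Write the saved messages back in their original order.
 * Returns 0 on success, or -errno on the first failure. */
int refill_queue(int qid, const struct saved_msg *in, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (msgsnd(qid, &in[i].body, in[i].len, IPC_NOWAIT) < 0)
            return -errno;
    }
    return 0;
}
```

Between drain_queue() and refill_queue() the messages exist only in the memory of the calling process, so the approach depends on that process surviving until the refill completes.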
Eric
ebiederm, 2012-10-18T11:52:48-00:00

Re: [PATCH v7 09/10] IPC: message queue copy feature introduced
https://new-forum.openvz.org/index.phpindex.php?t=rview&goto=48489&th=11228#msg_48489
>>>> deal with the case that other processes may be reading from the queue?
>>>> (Or is that disabled during checkpointing?)
>>>>
>>>
>>> To be honest, in this case behaviour in user-space is unpredictable.
>>> I.e. if you have, for example, 5 messages in queue and going to peek them
>>> all, and another process is reading the queue in the same time, then,
>>> most
>>> probably, you won't peek all the 5 and receive ENOMSG.
>>> But this case can be easily handled by user-space application (number of
>>> messages in queue can be discovered before peeking).
>>>
>>> Note, that in CRIU IPC resources will be collected when all processes to
>>> migrate are frozen.
>>
>>
>> Perhaps I am missing something fundamental, but how can C/R sanely do
>> anything at all here?
>>
>> For example, suppose a process reads and processes a message after you
>> read it with MSG_COPY. Then the remaining messages are all shifted by
>> one position, and you are going to miss reading one of them. IIUC the
>> idea of MSG_COPY is to allow you to retrieve a copy of all messages in
>> the list. It sounds like there's no way this can be done reliably. So,
>> what possible use does the operation have?
>>
>
> First of all, this problem exist as is regardless to C/R feature or this
> patch set. If you share some resource (like message queue in this particular
> case) system-wide, then any process A can read out a message, which was send
> by process B to process C. So, when processes uses IPC message queues, they
> should be designed to handle such failures.
>
> Second, it's up to user-space how to handle such things. It's implied, that
> user, trying to migrate some process, holding one end of queue, will also
> migrate another process, holding second end.
>
> Third, there is IPC namespace, which isolates IPC objects. It can be used
> for safe migration of process tree.
Is there somewhere a *detailed* description of how this feature would
be used? Lacking that, it's really hard to see how anything sane and
reliable can be done with MSG_COPY.
--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Author of "The Linux Programming Interface"; http://man7.org/tlpi/
Michael Kerrisk (man-pages), 2012-10-18T11:55:07-00:00

Re: [PATCH v7 09/10] IPC: message queue copy feature introduced
https://new-forum.openvz.org/index.phpindex.php?t=rview&goto=48490&th=11228#msg_48490
>>>>> A naive question, because I have not followed C/R closely. How do you
>>>>> deal with the case that other processes may be reading from the queue?
>>>>> (Or is that disabled during checkpointing?)
>>>>>
>>>>
>>>> To be honest, in this case behaviour in user-space is unpredictable.
>>>> I.e. if you have, for example, 5 messages in queue and going to peek them
>>>> all, and another process is reading the queue in the same time, then,
>>>> most
>>>> probably, you won't peek all the 5 and receive ENOMSG.
>>>> But this case can be easily handled by user-space application (number of
>>>> messages in queue can be discovered before peeking).
>>>>
>>>> Note, that in CRIU IPC resources will be collected when all processes to
>>>> migrate are frozen.
>>>
>>>
>>> Perhaps I am missing something fundamental, but how can C/R sanely do
>>> anything at all here?
>>>
>>> For example, suppose a process reads and processes a message after you
>>> read it with MSG_COPY. Then the remaining messages are all shifted by
>>> one position, and you are going to miss reading one of them. IIUC the
>>> idea of MSG_COPY is to allow you to retrieve a copy of all messages in
>>> the list. It sounds like there's no way this can be done reliably. So,
>>> what possible use does the operation have?
>>>
>>
>> First of all, this problem exist as is regardless to C/R feature or this
>> patch set. If you share some resource (like message queue in this particular
>> case) system-wide, then any process A can read out a message, which was send
>> by process B to process C. So, when processes uses IPC message queues, they
>> should be designed to handle such failures.
>>
>> Second, it's up to user-space how to handle such things. It's implied, that
>> user, trying to migrate some process, holding one end of queue, will also
>> migrate another process, holding second end.
>>
>> Third, there is IPC namespace, which isolates IPC objects. It can be used
>> for safe migration of process tree.
>
> Is there somewhere a *detailed* description of how this feature would
> be used? Lacking that, it's really hard to see how anything sane and
> reliable can be done with MSG_COPY.
>
These patches are already used by CRIU.
So, you can have a look at the CRIU source code:
http://git.criu.org/?p=crtools.git;a=blob;f=ipc_ns.c;h=9e259fefcfc04ec0556bb722921545552e1c69f3;hb=HEAD
Sanity and reliability on the level you are talking about can be achieved only
if you freeze all message users before peeking.
--
Best regards,
Stanislav Kinsbursky
Stanislav Kinsbursky, 2012-10-18T12:05:52-00:00

Re: [PATCH v7 09/10] IPC: message queue copy feature introduced
https://new-forum.openvz.org/index.phpindex.php?t=rview&goto=48491&th=11228#msg_48491
>> be used? Lacking that, it's really hard to see how anything sane and
>> reliable can be done with MSG_COPY.
>>
>
> These patches are used by CRIU already.
> So, you can have a look at the CRIU source code:
>
> http://git.criu.org/?p=crtools.git;a=blob;f=ipc_ns.c;h=9e259fefcfc04ec0556bb722921545552e1c69f3;hb=HEAD
>
> Sanity and reliability on the level you are talking about can be achieved,
> only if you'll freeze all message users before peeking.
Okay -- that's the piece I was looking for. Thanks.
In this scenario, how do you find all of the message users? Or do you
simply ensure that everything is frozen beforehand?
Michael Kerrisk (man-pages), 2012-10-18T12:09:09-00:00

Re: [PATCH v7 09/10] IPC: message queue copy feature introduced
https://new-forum.openvz.org/index.phpindex.php?t=rview&goto=48493&th=11228#msg_48493
"Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com> writes:
>>> Is there somewhere a *detailed* description of how this feature would
>>> be used? Lacking that, it's really hard to see how anything sane and
>>> reliable can be done with MSG_COPY.
>>>
>>
>> These patches are used by CRIU already.
>> So, you can have a look at the CRIU source code:
>>
>> http://git.criu.org/?p=crtools.git;a=blob;f=ipc_ns.c;h=9e259fefcfc04ec0556bb722921545552e1c69f3;hb=HEAD
>>
>> Sanity and reliability on the level you are talking about can be achieved,
>> only if you'll freeze all message users before peeking.
>
> Okay -- that's the piece I was looking for. Thanks.
>
> In this scenario, how do you find all of the message users? Or do you
> simply ensure that everything is frozen beforehand?
The general design is that a container is started with a fresh set of
namespaces and then the entire container is frozen using the process freezer
control group. With all of the userspace processes frozen, the checkpoint then
happens.
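For illustration, freezing through the cgroup v1 freezer controller comes down to writing FROZEN (or THAWED) into the group's freezer.state file. The sketch below assumes that interface; the helper names are hypothetical, and the path of the state file depends on where the freezer hierarchy is mounted and how the container's group is named.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Write `state` ("FROZEN" or "THAWED") to a freezer.state file.
 * Returns 0 on success, or a negative errno-style value on failure.
 */
int set_freezer_state(const char *state_file, const char *state)
{
    FILE *f = fopen(state_file, "w");
    int rc = 0;

    if (!f)
        return -errno;
    if (fputs(state, f) == EOF)
        rc = -EIO;
    if (fclose(f) == EOF && rc == 0)
        rc = -errno;
    return rc;
}

/*
 * Returns 1 if the file reports FROZEN, 0 for any other state
 * (including the transitional FREEZING), or -errno on failure.
 */
int is_frozen(const char *state_file)
{
    char buf[32] = "";
    FILE *f = fopen(state_file, "r");

    if (!f)
        return -errno;
    if (!fgets(buf, sizeof(buf), f))
        buf[0] = '\0';
    fclose(f);
    return strncmp(buf, "FROZEN", 6) == 0;
}
```

Freezing is asynchronous, so a checkpoint tool would poll is_frozen() until it returns 1 before touching the queue.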
Eric
ebiederm, 2012-10-18T12:20:54-00:00

Re: [PATCH v7 09/10] IPC: message queue copy feature introduced
https://new-forum.openvz.org/index.phpindex.php?t=rview&goto=48492&th=11228#msg_48492
>>> Is there somewhere a *detailed* description of how this feature would
>>> be used? Lacking that, it's really hard to see how anything sane and
>>> reliable can be done with MSG_COPY.
>>>
>>
>> These patches are used by CRIU already.
>> So, you can have a look at the CRIU source code:
>>
>> http://git.criu.org/?p=crtools.git;a=blob;f=ipc_ns.c;h=9e259fefcfc04ec0556bb722921545552e1c69f3;hb=HEAD
>>
>> Sanity and reliability on the level you are talking about can be achieved,
>> only if you'll freeze all message users before peeking.
>
> Okay -- that's the piece I was looking for. Thanks.
>
> In this scenario, how do you find all of the message users?
It looks like there is no way to find these users.
But I don't really think that this is necessary.
I.e. nothing protects the queue from being read by some alien (unexpected)
process in real life.
> Or do you simply ensure that everything is frozen beforehand?
>
In this particular case - yes.
--
Best regards,
Stanislav Kinsbursky
Stanislav Kinsbursky, 2012-10-18T12:41:34-00:00

Re: [PATCH v7 09/10] IPC: message queue copy feature introduced
https://new-forum.openvz.org/index.phpindex.php?t=rview&goto=48496&th=11228#msg_48496
> Stanislav Kinsbursky <skinsbursky@parallels.com> writes:
>
>> First of all, this problem exist as is regardless to C/R feature or this patch
>> set. If you share some resource (like message queue in this particular case)
>> system-wide, then any process A can read out a message, which was send by
>> process B to process C. So, when processes uses IPC message queues, they should
>> be designed to handle such failures.
>>
>> Second, it's up to user-space how to handle such things. It's implied, that
>> user, trying to migrate some process, holding one end of queue, will also
>> migrate another process, holding second end.
>>
>> Third, there is IPC namespace, which isolates IPC objects. It can be used for
>> safe migration of process tree.
>
> This does raise an interesting question.
>
> What is the point of the message copy feature? It appears to be simply
> an optimization and not needed to actually perform the
> checkpoint/restart. If you are going to restart the processes you can
> read all of the messages and then write all of the messages back before
> you restart the processes.
>
It's not just an optimisation.
If crtools fails (with SIGSEGV, for instance) before writing the messages back,
the queue will be left empty.
> Eric
>
--
Best regards,
Stanislav Kinsbursky
Stanislav Kinsbursky, 2012-10-18T14:40:01-00:00

Re: [PATCH v7 09/10] IPC: message queue copy feature introduced
https://new-forum.openvz.org/index.phpindex.php?t=rview&goto=48505&th=11228#msg_48505
Stanislav Kinsbursky <skinsbursky@parallels.com> writes:
> It's not just an optimisation.
> If crtools will fail (with SIGSEGV, for instance), then queue will be empty.
Regardless of what you call the benefit of this enhancement, this
enhancement is not required to implement checkpoint/restart.
For reliability/restartability I suspect a simple enqueue/dequeue loop
over each message in the queue would be nearly as proof against SIGSEGV
and other failures.
So since all of these changes are enhancements we need to know what we
are getting, over just sticking with the existing interfaces.
Unless there is a real bottleneck for something to work, I suspect the
direction forward is to make checkpoint and restart work with the
existing kernel interfaces and then revisit that decision when you
actually have a real problem.
Eric
ebiederm, 2012-10-19T00:52:51-00:00

Re: [PATCH v7 09/10] IPC: message queue copy feature introduced
https://new-forum.openvz.org/index.phpindex.php?t=rview&goto=48506&th=11228#msg_48506
> Stanislav Kinsbursky <skinsbursky@parallels.com> writes:
>
>> It's not just an optimisation.
>> If crtools will fail (with SIGSEGV, for instance), then queue will be empty.
>
> Regardless of what you call the benefit of this enhancement, this
> enhancement is not required to implement checkpoint/restart.
>
> For reliability/restartability I suspect a simple enqueue/dequeue loop
> over each message in the queue would be nearly as proof against SIGSEGV
> and other failures.
>
"Nearly as proof" is not good enough for CRIU quality of service.
Moreover, if crtools fails in this loop, then not only will one message be
lost, but the order of the messages in the queue will also be invalid.
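The failure mode of a dequeue/enqueue loop can be shown with a small user-space model. This is a simulation under stated assumptions, not kernel or crtools code: sim_queue and sim_rotate are hypothetical, and the "crash" is modelled as the tool dying while it holds a message it has dequeued but not yet re-enqueued.

```c
#include <stddef.h>
#include <string.h>

#define QCAP 16

struct sim_queue {
    int ids[QCAP];  /* message identifiers, head at index 0 */
    size_t len;
};

/* Remove and return the head of a non-empty queue. */
static int sim_pop_head(struct sim_queue *q)
{
    int id = q->ids[0];

    memmove(&q->ids[0], &q->ids[1], (q->len - 1) * sizeof(q->ids[0]));
    q->len--;
    return id;
}

/* Append a message at the tail. */
static void sim_push_tail(struct sim_queue *q, int id)
{
    q->ids[q->len++] = id;
}

/*
 * Rotate the whole queue once: dequeue each original message, (save
 * it,) and enqueue it again at the tail. After `crash_after` complete
 * steps the tool dies while holding the next dequeued message.
 * Pass a value >= the queue length for an uninterrupted run.
 * Returns the number of messages lost (0 or 1).
 */
int sim_rotate(struct sim_queue *q, size_t crash_after)
{
    size_t total = q->len;

    for (size_t i = 0; i < total; i++) {
        int id = sim_pop_head(q);
        if (i == crash_after)
            return 1;       /* died holding `id`: it is never put back */
        sim_push_tail(q, id);
    }
    return 0;
}
```

Starting from 1, 2, 3, 4, 5 and crashing after two complete steps leaves 4, 5, 1, 2 in the queue: message 3 is gone and the survivors are no longer in their original order, while an uninterrupted rotation restores 1, 2, 3, 4, 5.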
> So since all of these changes are enhancements we need to know what we
> are getting, over just sticking with the existing interfaces.
>
> Unless there is a real bottleneck for something to work, I suspect the
> direction forward is to make checkpoint and restart work with the
> existing kernel interfaces and then revisit that decision when you
> actually have a real problem.
>
> Eric
>
--
Best regards,
Stanislav Kinsbursky
Stanislav Kinsbursky, 2012-10-19T07:44:33-00:00