OpenVZ Forum


Home » Mailing lists » Devel » Remaining straight forward kthread API conversions...
Re: [PATCH] kthread: Spontaneous exit support [message #18500 is a reply to message #18489] Mon, 23 April 2007 18:20 Go to previous messageGo to next message
Oleg Nesterov is currently offline  Oleg Nesterov
Messages: 143
Registered: August 2006
Senior Member
On 04/23, Eric W. Biederman wrote:
>
> So I propose we add a kthread_orphan as a basic primitive to decrement the
> count on the task_struct if we want a kthread to simply exit after it
> has done some work.
> 
> And as a helper function we can have a kthread_run_orphan.

Speaking about helpers, could we also add kthread_start(), which should be
used instead of direct wake_up_process() ? Not that it is terribly important,
but still.

Note that "kthread_create() pins the task_struct" allows us to cleanup the code.
Look at this ugly "wait_to_die:" label in migration_thread(). Is is because
migration_thread() can't exit until CPU_DEAD reaps it. Other reasons were already
solved by kthread-enhance-kthread_stop-to-abort-interruptible-sleeps.patch

Oleg.

_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers
Re: [PATCH] powerpc pseries eeh: Convert to kthread API [message #18501 is a reply to message #18295] Tue, 24 April 2007 04:51 Go to previous messageGo to next message
ebiederm is currently offline  ebiederm
Messages: 1354
Registered: February 2006
Senior Member
Paul Mackerras <paulus@samba.org> writes:

> Eric W. Biederman writes:
>
>> Well the basic problem is that for any piece of code that can be modular
>> we need a way to ensure all threads it has running are shutdown when we
>> remove the module.
>
> The EEH code can't be modular, and wouldn't make any sense to be
> modular, since it's part of the infrastructure for accessing PCI
> devices.

Agreed.  However most kthread users are modular and make sense to
be so we need to design to handle modular users.

I don't think the idiom of go fire off a thread to handle something
is specific to non-modular users.

Eric
_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers
Re: [PATCH] powerpc pseries eeh: Convert to kthread API [message #18502 is a reply to message #18295] Tue, 24 April 2007 05:43 Go to previous messageGo to next message
ebiederm is currently offline  ebiederm
Messages: 1354
Registered: February 2006
Senior Member
Benjamin Herrenschmidt <benh@kernel.crashing.org> writes:

>> Further in general it doesn't make sense to grab a module reference
>> and call that sufficient because we would like to request that the
>> module exits.
>
> Which is, btw, I think a total misdesign of our module stuff, but heh, I
> remember that lead to some flamewars back then...
>
> Like anything else, modules should have separated the entrypoints for
>
>  - Initiating a removal request
>  - Releasing the module
>
> The former is use did "rmmod", can unregister things from subsystems,
> etc... (and can file if the driver decides to refuse removal requests
> when it's busy doing things or whatever policy that module wants to
> implement).
>
> The later is called when all references to the modules have been
> dropped, it's a bit like the kref "release" (and could be implemented as
> one).
>
> If we had done that (simple) thing back then, module refcounting would
> have been much less of a problem... I remember some reasons why that was
> veto'ed but I didn't and still don't agree.

The basic point is because a thread can terminate sooner if we have an
explicit request to stop, we need that in the design.

Because we need to find the threads to request that they stop we need to
have some way to track them.

Since we need to have some way to track them having an explicit data
structure that the callers manage seems to make sense.

Eric
_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers
Re: [PATCH] powerpc pseries eeh: Convert to kthread API [message #18503 is a reply to message #18295] Tue, 24 April 2007 08:37 Go to previous messageGo to next message
Christoph Hellwig is currently offline  Christoph Hellwig
Messages: 59
Registered: April 2006
Member
On Tue, Apr 24, 2007 at 03:55:06PM +1000, Paul Mackerras wrote:
> Christoph Hellwig writes:
> 
> > The first question is obviously, is this really something we want?
> > spawning kernel thread on demand without reaping them properly seems
> > quite dangerous.
> 
> What specifically has to be done to reap a kernel thread?  Are you
> concerned about the number of threads, or about having zombies hanging
> around?

I'm mostly concerned about number of threads and possible leakage of
threads.  Linas already explained it's not a problem in this case,
so it's covered.
_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers
Re: [PATCH] powerpc pseries eeh: Convert to kthread API [message #18504 is a reply to message #18295] Tue, 24 April 2007 07:46 Go to previous messageGo to next message
Cornelia Huck is currently offline  Cornelia Huck
Messages: 16
Registered: August 2007
Junior Member
On Tue, 24 Apr 2007 15:00:42 +1000,
Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:

> Like anything else, modules should have separated the entrypoints for
> 
>  - Initiating a removal request
>  - Releasing the module
> 
> The former is use did "rmmod", can unregister things from subsystems,
> etc... (and can file if the driver decides to refuse removal requests
> when it's busy doing things or whatever policy that module wants to
> implement).
> 
> The later is called when all references to the modules have been
> dropped, it's a bit like the kref "release" (and could be implemented as
> one).

That sounds quite similar to the problems we have with kobject
refcounting vs. module unloading. The patchset I posted at
http://marc.info/?l=linux-kernel&m=117679014404994&w=2 exposes the
refcount of the kobject embedded in the module. Maybe the kthread code
could use that reference as well?
_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers
Re: [PATCH] powerpc pseries eeh: Convert to kthread API [message #18505 is a reply to message #18493] Tue, 24 April 2007 01:38 Go to previous messageGo to next message
Benjamin Herrenschmid is currently offline  Benjamin Herrenschmid
Messages: 10
Registered: February 2006
Junior Member
> The only reason for using threads here is to get the error recovery
> out of an interrupt context (where errors may be detected), and then,
> an hour later, decrement a counter (which is how we limit these to 
> 6 per hour). Thread reaping is "trivial", the thread just exits
> after an hour.

In addition, it should be a thread and not done from within keventd
because :

 - It can take a long time (well, relatively but still too long for a
work queue)

 - The driver callbacks might need to use keventd or do flush_workqueue
to synchronize with their own workqueues when doing an internal
recovery.

> Since these are events rare, I've no particular concern about
> performance or resource consumption. The current code seems 
> to work just fine. :-)

I think moving to kthread's is cleaner (just a wrapper around kernel
threads that simplify dealing with reaping them out mostly) and I agree
with Christoph that it would be nice to be able to "fire off" kthreads
from interrupt context.. in many cases, we abuse work queues for things
that should really done from kthreads instead (basically anything that
takes more than a couple hundred microsecs or so).

Ben.


_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers
Re: [PATCH] powerpc pseries eeh: Convert to kthread API [message #18506 is a reply to message #18505] Tue, 24 April 2007 17:24 Go to previous messageGo to next message
linas is currently offline  linas
Messages: 3
Registered: April 2007
Junior Member
On Tue, Apr 24, 2007 at 11:38:53AM +1000, Benjamin Herrenschmidt wrote:
> > The only reason for using threads here is to get the error recovery
> > out of an interrupt context (where errors may be detected), and then,
> > an hour later, decrement a counter (which is how we limit these to 
> > 6 per hour). Thread reaping is "trivial", the thread just exits
> > after an hour.
> 
> In addition, it should be a thread and not done from within keventd
> because :
> 
>  - It can take a long time (well, relatively but still too long for a
> work queue)

Uhh, 15 or 20 seconds even. That's a long time by any kernel standard.

>  - The driver callbacks might need to use keventd or do flush_workqueue
> to synchronize with their own workqueues when doing an internal
> recovery.
> 
> > Since these are events rare, I've no particular concern about
> > performance or resource consumption. The current code seems 
> > to work just fine. :-)
> 
> I think moving to kthread's is cleaner (just a wrapper around kernel
> threads that simplify dealing with reaping them out mostly) and I agree
> with Christoph that it would be nice to be able to "fire off" kthreads
> from interrupt context.. in many cases, we abuse work queues for things
> that should really done from kthreads instead (basically anything that
> takes more than a couple hundred microsecs or so).

It would be nice to have threads that can be "fired off" from an
interrupt context.  That would simplify the EEH code slightly 
(removing a few dozen lines of code that do this bounce).

I presume that various device drivers might find this useful as well.

--linas

_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers
Re: [PATCH] powerpc pseries eeh: Convert to kthread API [message #18507 is a reply to message #18468] Tue, 24 April 2007 17:35 Go to previous messageGo to next message
linas is currently offline  linas
Messages: 3
Registered: April 2007
Junior Member
On Thu, Apr 19, 2007 at 01:58:45AM -0600, Eric W. Biederman wrote:
> From: Eric W. Biederman <ebiederm@xmission.com>
> 
> This patch modifies the startup of eehd to use kthread_run
> not a combination of kernel_thread and daemonize.  Making
> the code slightly simpler and more maintainable.

For the patch that touched arch/powerpc/platforms/pseries/eeh_event.c,
I ran a variety of tests, and couldn't see/find/evoke any adverse
effects, so ..

Acked-by: Linas Vepstas <linas@austin.ibm.com>

> The second question is whether this is the right implementation.
> kthread_create already works by using a workqueue to create the thread
> and then waits for it.  If we really want to support creating threads
> asynchronously on demand we should have a proper API in kthread.c for
> this instead of spreading workqueues.

Yes, exactly; all I really want is to start a thread from an
interrupt context, and pass a structure to it.  This is pretty much
all that arch/powerpc/platforms/pseries/eeh_event.c is trying to do,
and little else.

--linas

_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers
Re: [PATCH] powerpc pseries eeh: Convert to kthread API [message #18508 is a reply to message #18496] Tue, 24 April 2007 02:42 Go to previous messageGo to next message
Benjamin Herrenschmid is currently offline  Benjamin Herrenschmid
Messages: 10
Registered: February 2006
Junior Member
On Mon, 2007-04-23 at 20:08 -0600, Eric W. Biederman wrote:
> Benjamin Herrenschmidt <benh@kernel.crashing.org> writes:
> 
> >> The only reason for using threads here is to get the error recovery
> >> out of an interrupt context (where errors may be detected), and then,
> >> an hour later, decrement a counter (which is how we limit these to 
> >> 6 per hour). Thread reaping is "trivial", the thread just exits
> >> after an hour.
> >
> > In addition, it should be a thread and not done from within keventd
> > because :
> >
> >  - It can take a long time (well, relatively but still too long for a
> > work queue)
> >
> >  - The driver callbacks might need to use keventd or do flush_workqueue
> > to synchronize with their own workqueues when doing an internal
> > recovery.
> >
> >> Since these are events rare, I've no particular concern about
> >> performance or resource consumption. The current code seems 
> >> to work just fine. :-)
> >
> > I think moving to kthread's is cleaner (just a wrapper around kernel
> > threads that simplify dealing with reaping them out mostly) and I agree
> > with Christoph that it would be nice to be able to "fire off" kthreads
> > from interrupt context.. in many cases, we abuse work queues for things
> > that should really done from kthreads instead (basically anything that
> > takes more than a couple hundred microsecs or so).
> 
> On that note does anyone have a problem is we manage the irq spawning
> safe kthreads the same way that we manage the work queue entries.
> 
> i.e. by a structure allocated by the caller?

Not sure... I can see places where I might want to spawn an arbitrary
number of these without having to preallocate structures... and if I
allocate on the fly, then I need a way to free that structure when the
kthread is reaped which I don't think we have currently, do we ? (In
fact, I could use that for other things too now that I'm thinking of
it ... I might have a go at providing optional kthread destructors).

Ben.


_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers
Re: [PATCH] kthread: Spontaneous exit support [message #18509 is a reply to message #18481] Tue, 24 April 2007 13:34 Go to previous messageGo to next message
Christoph Hellwig is currently offline  Christoph Hellwig
Messages: 59
Registered: April 2006
Member
On Tue, Apr 24, 2007 at 03:08:57PM +0200, Jan Engelhardt wrote:
> >I don't think having to parallel APIs is a good idea, people will
> >get utterly confused which one to use.  Better always grab a reference
> >in kthread_create and drop it in kthread_stop.  For normal thread
> >no change in behaviour and only slightly more code in the slowpath.
> 
> I *am* already confused... a driver of mine does:
> 
> static __init int thkd_init(void)
> {
> 	touch_task = kthread_run(touch_thread, Device, "thkd");
> 	...
> }
> 
> and
> 
> static __exit void thkd_exit(void)
> {
> 	kthread_stop(touch_task);
> 	/* I bet something is missing */
> }
> 
> now what good would kthread_run do me?

Please read the kerneldoc documentation for kthread_create
_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers
Re: [PATCH] powerpc pseries eeh: Convert to kthread API [message #18510 is a reply to message #18499] Tue, 24 April 2007 04:34 Go to previous messageGo to next message
Paul Mackerras is currently offline  Paul Mackerras
Messages: 6
Registered: May 2006
Junior Member
Eric W. Biederman writes:

> Well the basic problem is that for any piece of code that can be modular
> we need a way to ensure all threads it has running are shutdown when we
> remove the module.

The EEH code can't be modular, and wouldn't make any sense to be
modular, since it's part of the infrastructure for accessing PCI
devices.

Paul.
_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers
Re: Getting the new RxRPC patches upstream [message #18511 is a reply to message #18498] Tue, 24 April 2007 13:37 Go to previous messageGo to next message
David Howells is currently offline  David Howells
Messages: 44
Registered: October 2006
Member
Oleg Nesterov <oleg@tv-sign.ru> wrote:

> > > We only care when del_timer() returns true. In that case, if the timer
> > > function still runs (possible for single-threaded wqs), it has already
> > > passed __queue_work().
> > 
> > Why do you assume that?

Sorry, I should have been more clear.  I meant the assumption that we only
care about a true return from del_timer().

> If del_timer() returns true, the timer was pending. This means it was
> started by work->func() (note that __run_timers() clears timer_pending()
> before calling timer->function). This in turn means that
> delayed_work_timer_fn() has already called __queue_work(dwork), otherwise
> work->func() has no chance to run.

But if del_timer() returns 0, then there may be a problem.  We can't tell the
difference between the following two cases:

 (1) The timer hadn't been started.

 (2) The timer had been started, has expired and is no longer pending, but
     another CPU is running its handler routine.

try_to_del_timer_sync() _does_, however, distinguish between these cases: the
first is the 0 return, the second is the -1 return, and the case where it
dequeued the timer is the 1 return.

BTW, can a timer handler be preempted?  I assume not...  But it can be delayed
by interrupt processing.

David
_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers
Re: [PATCH] powerpc pseries eeh: Convert to kthread API [message #18512 is a reply to message #18499] Tue, 24 April 2007 05:00 Go to previous messageGo to next message
Benjamin Herrenschmid is currently offline  Benjamin Herrenschmid
Messages: 10
Registered: February 2006
Junior Member
> Further in general it doesn't make sense to grab a module reference
> and call that sufficient because we would like to request that the
> module exits.

Which is, btw, I think a total misdesign of our module stuff, but heh, I
remember that lead to some flamewars back then...

Like anything else, modules should have separated the entrypoints for

 - Initiating a removal request
 - Releasing the module

The former is use did "rmmod", can unregister things from subsystems,
etc... (and can file if the driver decides to refuse removal requests
when it's busy doing things or whatever policy that module wants to
implement).

The later is called when all references to the modules have been
dropped, it's a bit like the kref "release" (and could be implemented as
one).

If we had done that (simple) thing back then, module refcounting would
have been much less of a problem... I remember some reasons why that was
veto'ed but I didn't and still don't agree.

Ben.


_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers
Re: [PATCH] kthread: Spontaneous exit support [message #18513 is a reply to message #18491] Tue, 24 April 2007 13:08 Go to previous messageGo to next message
Jan Engelhardt is currently offline  Jan Engelhardt
Messages: 18
Registered: August 2006
Junior Member
On Apr 23 2007 12:25, Christoph Hellwig wrote:
>On Sun, Apr 22, 2007 at 09:12:55PM -0600, Eric W. Biederman wrote:
>> 
>> This patch implements the kthread helper functions kthread_start
>> and kthread_end which make it simple to support a kernel thread
>> that may decided to exit on it's own before we request it to.
>> It is still assumed that eventually we will get around to requesting
>> that the kernel thread stop.
>
>I don't think having to parallel APIs is a good idea, people will
>get utterly confused which one to use.  Better always grab a reference
>in kthread_create and drop it in kthread_stop.  For normal thread
>no change in behaviour and only slightly more code in the slowpath.

I *am* already confused... a driver of mine does:

static __init int thkd_init(void)
{
	touch_task = kthread_run(touch_thread, Device, "thkd");
	...
}

and

static __exit void thkd_exit(void)
{
	kthread_stop(touch_task);
	/* I bet something is missing */
}

now what good would kthread_run do me?



Jan
-- 
_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers
Re: [PATCH] powerpc pseries eeh: Convert to kthread API [message #18514 is a reply to message #18468] Tue, 24 April 2007 05:55 Go to previous messageGo to next message
Paul Mackerras is currently offline  Paul Mackerras
Messages: 6
Registered: May 2006
Junior Member
Christoph Hellwig writes:

> The first question is obviously, is this really something we want?
> spawning kernel thread on demand without reaping them properly seems
> quite dangerous.

What specifically has to be done to reap a kernel thread?  Are you
concerned about the number of threads, or about having zombies hanging
around?

Paul.
_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers
Re: [PATCH] powerpc pseries eeh: Convert to kthread API [message #18515 is a reply to message #18502] Tue, 24 April 2007 05:58 Go to previous messageGo to next message
Benjamin Herrenschmid is currently offline  Benjamin Herrenschmid
Messages: 10
Registered: February 2006
Junior Member
> Since we need to have some way to track them having an explicit data
> structure that the callers manage seems to make sense.

Oh sure, I wasn't arguing against that at all...

It might be handy to have a release() callback (optional) that gets
called after the kthread stops/exits, once we know the data structure
isn't going to be used anymore (if practical to implement, depends on
your approach).

Ben.


_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers
Re: Getting the new RxRPC patches upstream [message #18516 is a reply to message #18371] Tue, 24 April 2007 15:51 Go to previous messageGo to next message
David Howells is currently offline  David Howells
Messages: 44
Registered: October 2006
Member
Oleg Nesterov <oleg@tv-sign.ru> wrote:

> Let's look at (2). cancel_delayed_work() (on top of del_timer()) returns 0,
> and this is correct, we failed to cancel the timer, and we don't know whether
> work->func() finished, or not.

Yes.

> The current code uses del_timer_sync(). It will also return 0. However, it
> will spin waiting for timer->function() to complete. So we are just wasting
> CPU.

That's my objection to using cancel_delayed_work() as it stands, although in
most cases it's a relatively minor waste of time.  However, if the timer
expiry routine gets interrupted then it may not be so minor...  So, yes, I'm
in full agreement with you there.

> I guess I misunderstood you. Perhaps, you propose a new helper which use
> try_to_del_timer_sync(), yes? Unless I missed something, this doesn't help.
> Because the return value == -1 should be treated as 0. We failed to stop
> the timer, and we can't free dwork.

Consider how I'm using try_to_cancel_delayed_work(): I use this when I want to
queue a delayed work item with a particular timeout (usually for immediate
processing), but the work item may already be pending.

If try_to_cancel_delayed_work() returns 0 or 1 (not pending or pending but
dequeued) then I can go ahead and just schedule the work item (I'll be holding
a lock to prevent anyone else from interfering).

However, if try_to_cancel_delayed_work() returns -1 then there's no usually no
point attempting to schedule the work item because I know the timer expiry
handler is doing that or going to do that.


The code looks like this in pretty much all cases:

	if (try_to_cancel_delayed_work(&afs_server_reaper) >= 0)
		schedule_delayed_work(&afs_server_reaper, 0);

And so could well be packaged into a convenience routine and placed in
workqueue.[ch].  However, this would still concern Dave Miller as my patches
would still be altering non-net stuff or depending on non-net patches he
doesn't have in his GIT tree.

Using cancel_delayer_work() instead would be acceptable, functionally, as that
just waits till the -1 return case no longer holds true, and so always returns
0 or 1.


In RxRPC, this is only used to cancel a pair global delayed work items in the
rmmod path, and so the inefficiency of cancel_delayed_work() is something I
can live with, though it's something I'd want to reduce in the longer term.

In AFS, this is not only used in object destruction paths, but is also used to
cancel the callback timer and initiate synchronisation processing with
immediate effect.

David
_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers
Re: Getting the new RxRPC patches upstream [message #18517 is a reply to message #18371] Tue, 24 April 2007 16:58 Go to previous messageGo to next message
David Howells is currently offline  David Howells
Messages: 44
Registered: October 2006
Member
Oleg Nesterov <oleg@tv-sign.ru> wrote:

> > > The current code uses del_timer_sync(). It will also return 0. However,
> > > it will spin waiting for timer->function() to complete. So we are just
> > > wasting CPU.
> > 
> > That's my objection to using cancel_delayed_work() as it stands, although in
> > most cases it's a relatively minor waste of time.  However, if the timer
> > expiry routine gets interrupted then it may not be so minor...  So, yes, I'm
> > in full agreement with you there.
> 
> Great. I'll send the s/del_timer_sync/del_timer/ patch.

I didn't say I necessarily agreed that this was a good idea.  I just meant that
I agree that it will waste CPU.  You must still audit all uses of
cancel_delayed_work().

> Aha, now I see what you mean. However. Why the code above is better then
> 
> 	cancel_delayed_work(&afs_server_reaper);
> 	schedule_delayed_work(&afs_server_reaper, 0);
> 
> ? (I assume we already changed cancel_delayed_work() to use del_timer).

Because calling schedule_delayed_work() is a waste of CPU if the timer expiry
handler is currently running at this time as *that* is going to also schedule
the delayed work item.

> If delayed_work_timer_fn() is not running - both variants (let's denote them
> as 1 and 2) do the same.

Yes, but that's not the point.

> Now suppose that delayed_work_timer_fn() is running.
> 
> 	1: lock_timer_base(), return -1, skip schedule_delayed_work().
>
> 	2: check timer_pending(), return 0, call schedule_delayed_work(),
> 	   return immediately because test_and_set_bit(WORK_STRUCT_PENDING)
> 	   fails.

I don't see what you're illustrating here.  Are these meant to be two steps in
a single process?  Or are they two alternate steps?

> So I still don't think try_to_del_timer_sync() can help in this particular
> case.

It permits us to avoid the test_and_set_bit() under some circumstances.

> To some extent, try_to_cancel_delayed_work is
> 
> 	int try_to_cancel_delayed_work(dwork)
> 	{
> 		ret = cancel_delayed_work(dwork);
> 		if (!ret && work_pending(&dwork->work))
> 			ret = -1;
> 		return ret;
> 	}
> 
> iow, work_pending() looks like a more "precise" indication that work->func()
> is going to run soon.

Ah, but the timer routine may try to set the work item pending flag *after* the
work_pending() check you have here.  Furthermore, it would be better to avoid
the work_pending() check entirely because that check involves interacting with
atomic ops done on other CPUs.  try_to_del_timer_sync() returning -1 tells us
without a shadow of a doubt that the work item is either scheduled now or will
be scheduled very shortly, thus allowing us to avoid having to do it ourself.

David
_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers
Re: Getting the new RxRPC patches upstream [message #18518 is a reply to message #18511] Tue, 24 April 2007 14:22 Go to previous messageGo to next message
Oleg Nesterov is currently offline  Oleg Nesterov
Messages: 143
Registered: August 2006
Senior Member
On 04/24, David Howells wrote:
>
> Oleg Nesterov <oleg@tv-sign.ru> wrote:
> 
> > > > We only care when del_timer() returns true. In that case, if the timer
> > > > function still runs (possible for single-threaded wqs), it has already
> > > > passed __queue_work().
> > > 
> > > Why do you assume that?
> 
> Sorry, I should have been more clear.  I meant the assumption that we only
> care about a true return from del_timer().
> 
> > If del_timer() returns true, the timer was pending. This means it was
> > started by work->func() (note that __run_timers() clears timer_pending()
> > before calling timer->function). This in turn means that
> > delayed_work_timer_fn() has already called __queue_work(dwork), otherwise
> > work->func() has no chance to run.
> 
> But if del_timer() returns 0, then there may be a problem.  We can't tell the
> difference between the following two cases:
> 
>  (1) The timer hadn't been started.
> 
>  (2) The timer had been started, has expired and is no longer pending, but
>      another CPU is running its handler routine.
> 
> try_to_del_timer_sync() _does_, however, distinguish between these cases: the
> first is the 0 return, the second is the -1 return, and the case where it
> dequeued the timer is the 1 return.

Of course, del_timer() and del_timer_sync() are different. What I meant the
latter buys nothing for cancel_delayed_work() (which in fact could be named
try_to_cancel_delayed_work()).

Let's look at (2). cancel_delayed_work() (on top of del_timer()) returns 0,
and this is correct, we failed to cancel the timer, and we don't know whether
work->func() finished, or not.

The current code uses del_timer_sync(). It will also return 0. However, it will
spin waiting for timer->function() to complete. So we are just wasting CPU.

I guess I misunderstood you. Perhaps, you propose a new helper which use
try_to_del_timer_sync(), yes? Unless I missed something, this doesn't help.
Because the return value == -1 should be treated as 0. We failed to stop
the timer, and we can't free dwork.

IOW, currently we should do:

	if (!cancel_delayed_work(dwork))
		cancel_work_sync(dwork));

The same if we use del_timer(). If we use try_to_del_timer_sync(),

	if (cancel_delayed_work(dwork) <= 0)
		cancel_work_sync(dwork));

(of course, dwork shouldn't re-arm itself).

Could you clarify if I misunderstood you again?

> BTW, can a timer handler be preempted?  I assume not...  But it can be delayed
> by interrupt processing.

No, it can't be preempted, it runs in softirq context.

Oleg.

_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers
Re: Getting the new RxRPC patches upstream [message #18520 is a reply to message #18371] Tue, 24 April 2007 18:22 Go to previous messageGo to next message
David Howells is currently offline  David Howells
Messages: 44
Registered: October 2006
Member
Oleg Nesterov <oleg@tv-sign.ru> wrote:

> Sure, I'll grep for cancel_delayed_work(). But unless I missed something,
> this change should be completely transparent for all users. Otherwise, it
> is buggy.

I guess you will have to make sure that cancel_delayed_work() is always
followed by a flush of the workqueue, otherwise you might get this situation:

	CPU 0				CPU 1
	===============================	=======================
					<timer expires>
	cancel_delayed_work(x) == 0	-->delayed_work_timer_fn(x)
	kfree(x);			-->do_IRQ()
	y = kmalloc(); // reuses x
					<--do_IRQ()
					__queue_work(x)
	--- OOPS ---

That's my main concern.  If you are certain that can't happen, then fair
enough.

Note that although you can call cancel_delayed_work() from within a work item
handler, you can't then follow it up with a flush as it's very likely to
deadlock.

> > Because calling schedule_delayed_work() is a waste of CPU if the timer
> > expiry handler is currently running at this time as *that* is going to
> > also schedule the delayed work item.
> 
> Yes. But otoh, try_to_del_timer_sync() is a waste of CPU compared to
> del_timer(), when the timer is not pending.

I suppose that's true.  As previously stated, my main objection to del_timer()
is the fact that it doesn't tell you if the timer expiry function is still
running.

Can you show me a patch illustrating exactly how you want to change
cancel_delayed_work()?  I can't remember whether you've done so already, but
if you have, I can't find it.  Is it basically this?:

 static inline int cancel_delayed_work(struct delayed_work *work)
 {
 	int ret;

-	ret = del_timer_sync(&work->timer);
+	ret = del_timer(&work->timer);
 	if (ret)
 		work_release(&work->work);
 	return ret;
 }

I was thinking this situation might be a problem:

	CPU 0				CPU 1
	===============================	=======================
					<timer expires>
	cancel_delayed_work(x) == 0	-->delayed_work_timer_fn(x)
	schedule_delayed_work(x,0)	-->do_IRQ()
	<keventd scheduled>
	x->work()
					<--do_IRQ()
					__queue_work(x)

But it won't, will it?

> > Ah, but the timer routine may try to set the work item pending flag
> > *after* the work_pending() check you have here.
> 
> No, delayed_work_timer_fn() doesn't set the _PENDING flag.

Good point.  I don't think that's a problem because cancel_delayed_work()
won't clear the pending flag if it didn't remove a timer.

> First, this is very unlikely event, delayed_work_timer_fn() is very fast
> unless interrupted.

Yeah, I guess so.


Okay, you've convinced me, I think - provided you consider the case I
outlinded at the top of this email.

If you give me a patch to alter cancel_delayed_work(), I'll substitute it for
mine and use that that instead.  Dave Miller will just have to live with that
patch being there:-)

David
_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers
Re: Getting the new RxRPC patches upstream [message #18521 is a reply to message #18516] Tue, 24 April 2007 16:40 Go to previous messageGo to next message
Oleg Nesterov is currently offline  Oleg Nesterov
Messages: 143
Registered: August 2006
Senior Member
On 04/24, David Howells wrote:
>
> Oleg Nesterov <oleg@tv-sign.ru> wrote:
> 
> > The current code uses del_timer_sync(). It will also return 0. However, it
> > will spin waiting for timer->function() to complete. So we are just wasting
> > CPU.
> 
> That's my objection to using cancel_delayed_work() as it stands, although in
> most cases it's a relatively minor waste of time.  However, if the timer
> expiry routine gets interrupted then it may not be so minor...  So, yes, I'm
> in full agreement with you there.

Great. I'll send the s/del_timer_sync/del_timer/ patch.

> > I guess I misunderstood you. Perhaps, you propose a new helper which use
> > try_to_del_timer_sync(), yes? Unless I missed something, this doesn't help.
> > Because the return value == -1 should be treated as 0. We failed to stop
> > the timer, and we can't free dwork.
> 
> Consider how I'm using try_to_cancel_delayed_work(): I use this when I want to
> queue a delayed work item with a particular timeout (usually for immediate
> processing), but the work item may already be pending.
> 
> If try_to_cancel_delayed_work() returns 0 or 1 (not pending or pending but
> dequeued) then I can go ahead and just schedule the work item (I'll be holding
> a lock to prevent anyone else from interfering).
> 
> However, if try_to_cancel_delayed_work() returns -1 then there's no usually no
> point attempting to schedule the work item because I know the timer expiry
> handler is doing that or going to do that.
> 
> 
> The code looks like this in pretty much all cases:
> 
> 	if (try_to_cancel_delayed_work(&afs_server_reaper) >= 0)
> 		schedule_delayed_work(&afs_server_reaper, 0);

Aha, now I see what you mean. However. Why the code above is better then

	cancel_delayed_work(&afs_server_reaper);
	schedule_delayed_work(&afs_server_reaper, 0);

? (I assume we already changed cancel_delayed_work() to use del_timer).

If delayed_work_timer_fn() is not running - both variants (let's denote them
as 1 and 2) do the same.

Now suppose that delayed_work_timer_fn() is running.

	1: lock_timer_base(), return -1, skip schedule_delayed_work().

	2: check timer_pending(), return 0, call schedule_delayed_work(),
	   return immediately because test_and_set_bit(WORK_STRUCT_PENDING)
	   fails.

So I still don't think try_to_del_timer_sync() can help in this particular
case.

To some extent, try_to_cancel_delayed_work is

	int try_to_cancel_delayed_work(dwork)
	{
		ret = cancel_delayed_work(dwork);
		if (!ret && work_pending(&dwork->work))
			ret = -1;
		return ret;
	}

iow, work_pending() looks like a more "precise" indication that work->func()
is going to run soon.

Oleg.

_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers
Re: Getting the new RxRPC patches upstream [message #18522 is a reply to message #18517] Tue, 24 April 2007 17:33 Go to previous messageGo to next message
Oleg Nesterov is currently offline  Oleg Nesterov
Messages: 143
Registered: August 2006
Senior Member
On 04/24, David Howells wrote:
>
> Oleg Nesterov <oleg@tv-sign.ru> wrote:
> 
> > Great. I'll send the s/del_timer_sync/del_timer/ patch.
> 
> I didn't say I necessarily agreed that this was a good idea.  I just meant that
> I agree that it will waste CPU.  You must still audit all uses of
> cancel_delayed_work().

Sure, I'll grep for cancel_delayed_work(). But unless I missed something,
this change should be completely transparent for all users. Otherwise, it
is buggy.

> > Aha, now I see what you mean. However. Why the code above is better then
> > 
> > 	cancel_delayed_work(&afs_server_reaper);
> > 	schedule_delayed_work(&afs_server_reaper, 0);
> > 
> > ? (I assume we already changed cancel_delayed_work() to use del_timer).
> 
> Because calling schedule_delayed_work() is a waste of CPU if the timer expiry
> handler is currently running at this time as *that* is going to also schedule
> the delayed work item.

Yes. But otoh, try_to_del_timer_sync() is a waste of CPU compared to del_timer(),
when the timer is not pending.

> > 	1: lock_timer_base(), return -1, skip schedule_delayed_work().
> >
> > 	2: check timer_pending(), return 0, call schedule_delayed_work(),
> > 	   return immediately because test_and_set_bit(WORK_STRUCT_PENDING)
> > 	   fails.
> 
> I don't see what you're illustrating here.  Are these meant to be two steps in
> a single process?  Or are they two alternate steps?

two alternate steps.

1 means
	if (try_to_cancel_delayed_work())
		schedule_delayed_work();

2 means
	cancel_delayed_work();
	schedule_delayed_work();

> > So I still don't think try_to_del_timer_sync() can help in this particular
> > case.
> 
> It permits us to avoid the test_and_set_bit() under some circumstances.

Yes. But lock_timer_base() is more costly.

> > To some extent, try_to_cancel_delayed_work is
> > 
> > 	int try_to_cancel_delayed_work(dwork)
> > 	{
> > 		ret = cancel_delayed_work(dwork);
> > 		if (!ret && work_pending(&dwork->work))
> > 			ret = -1;
> > 		return ret;
> > 	}
> > 
> > iow, work_pending() looks like a more "precise" indication that work->func()
> > is going to run soon.
> 
> Ah, but the timer routine may try to set the work item pending flag *after* the
> work_pending() check you have here.

No, delayed_work_timer_fn() doesn't set the _PENDING flag.

>                                      Furthermore, it would be better to avoid
> the work_pending() check entirely because that check involves interacting with
> atomic ops done on other CPUs.

Sure, the implementation of try_to_cancel_delayed_work() above is just for
illustration. I don't think we need try_to_cancel_delayed_work() at all.

>                                try_to_del_timer_sync() returning -1 tells us
> without a shadow of a doubt that the work item is either scheduled now or will
> be scheduled very shortly, thus allowing us to avoid having to do it ourself.

First, this is very unlikely event, delayed_work_timer_fn() is very fast unless
interrupted.

_PENDING flag won't be cleared until this work is executed by run_workqueue().
In generak, work_pending() after del_timer() is imho better way to avoid the
unneeded schedule_delayed_work().

But again, I can't undertand the win for that particular case.

Oleg.

_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers
Re: Getting the new RxRPC patches upstream [message #18523 is a reply to message #18520] Tue, 24 April 2007 19:34 Go to previous messageGo to next message
Oleg Nesterov is currently offline  Oleg Nesterov
Messages: 143
Registered: August 2006
Senior Member
On 04/24, David Howells wrote:
>
> Oleg Nesterov <oleg@tv-sign.ru> wrote:
> 
> > Sure, I'll grep for cancel_delayed_work(). But unless I missed something,
> > this change should be completely transparent for all users. Otherwise, it
> > is buggy.
> 
> I guess you will have to make sure that cancel_delayed_work() is always
> followed by a flush of the workqueue, otherwise you might get this situation:
> 
> 	CPU 0				CPU 1
> 	===============================	=======================
> 					<timer expires>
> 	cancel_delayed_work(x) == 0	-->delayed_work_timer_fn(x)
> 	kfree(x);			-->do_IRQ()
> 	y = kmalloc(); // reuses x
> 					<--do_IRQ()
> 					__queue_work(x)
> 	--- OOPS ---
> 
> That's my main concern.  If you are certain that can't happen, then fair
> enough.

Yes sure. Note that this is documented:

	/*
	 * Kill off a pending schedule_delayed_work().  Note that the work callback
	 * function may still be running on return from cancel_delayed_work().  Run
	 * flush_workqueue() or cancel_work_sync() to wait on it.
	 */

This comment is not very precise though. If the work doesn't re-arm itself,
we need cancel_work_sync() only if cancel_delayed_work() returns 0.

So there is no difference with the proposed change. Except, return value == 0
means:

	currently (del_timer_sync): callback may still be running or scheduled

	with del_timer: may still be running, or scheduled, or will be scheduled
	right now.

However, this is the same from the caller POV.

> Can you show me a patch illustrating exactly how you want to change
> cancel_delayed_work()?  I can't remember whether you've done so already, but
> if you have, I can't find it.  Is it basically this?:
> 
>  static inline int cancel_delayed_work(struct delayed_work *work)
>  {
>  	int ret;
> 
> -	ret = del_timer_sync(&work->timer);
> +	ret = del_timer(&work->timer);
>  	if (ret)
>  		work_release(&work->work);
>  	return ret;
>  }

Yes, exactly. The patch is trivial, but I need some time to write the
understandable changelog...

> I was thinking this situation might be a problem:
> 
> 	CPU 0				CPU 1
> 	===============================	=======================
> 					<timer expires>
> 	cancel_delayed_work(x) == 0	-->delayed_work_timer_fn(x)
> 	schedule_delayed_work(x,0)	-->do_IRQ()
> 	<keventd scheduled>
> 	x->work()
> 					<--do_IRQ()
> 					__queue_work(x)
> 
> But it won't, will it?

Yes, I think this should be OK. schedule_delayed_work() will notice
_PENDING and abort, so the last "x->work()" doesn't happen.

What can happen is

					<timer expires>
	cancel_delayed_work(x) == 0
					-->delayed_work_timer_fn(x)
					__queue_work(x)
					<keventd scheduled>
					x->work()
	schedule_delayed_work(x,0)
	<the work is scheduled again>

, so we can have an "unneeded schedule", but this is very unlikely.

Oleg.

_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers
Re: Getting the new RxRPC patches upstream [message #18525 is a reply to message #18523] Wed, 25 April 2007 08:10 Go to previous messageGo to next message
David Howells is currently offline  David Howells
Messages: 44
Registered: October 2006
Member
Oleg Nesterov <oleg@tv-sign.ru> wrote:

> Yes sure. Note that this is documented:
> 
> 	/*
> 	 * Kill off a pending schedule_delayed_work().  Note that the work callback
> 	 * function may still be running on return from cancel_delayed_work().  Run
> 	 * flush_workqueue() or cancel_work_sync() to wait on it.
> 	 */

No, it isn't documented.  It says that the *work* callback may be running, but
does not mention the timer callback.  However, just looking at the
cancellation function source made it clear that this would wait for the timer
handler to return first.


However, is it worth just making cancel_delayed_work() a void function and not
returning anything?  I'm not sure the return value is very useful.

David
_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers
Re: Getting the new RxRPC patches upstream [message #18531 is a reply to message #18371] Wed, 25 April 2007 10:45 Go to previous messageGo to next message
David Howells is currently offline  David Howells
Messages: 44
Registered: October 2006
Member
Oleg Nesterov <oleg@tv-sign.ru> wrote:

> Ah yes, it says nothing about what the returned value means...

Yeah...  If you could amend that as part of your patch, that'd be great.

David
_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers
Re: Getting the new RxRPC patches upstream [message #18539 is a reply to message #18525] Wed, 25 April 2007 10:41 Go to previous messageGo to next message
Oleg Nesterov is currently offline  Oleg Nesterov
Messages: 143
Registered: August 2006
Senior Member
On 04/25, David Howells wrote:
>
> Oleg Nesterov <oleg@tv-sign.ru> wrote:
> 
> > Yes sure. Note that this is documented:
> > 
> > 	/*
> > 	 * Kill off a pending schedule_delayed_work().  Note that the work callback
> > 	 * function may still be running on return from cancel_delayed_work().  Run
> > 	 * flush_workqueue() or cancel_work_sync() to wait on it.
> > 	 */
> 
> No, it isn't documented.  It says that the *work* callback may be running, but
> does not mention the timer callback.  However, just looking at the
> cancellation function source made it clear that this would wait for the timer
> handler to return first.

Ah yes, it says nothing about what the returned value means...

> However, is it worth just making cancel_delayed_work() a void function and not
> returning anything?  I'm not sure the return value is very useful.

cancel_rearming_delayed_work() needs this, tty_io.c, probably somebody else.

Oleg.

_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers
Re: Getting the new RxRPC patches upstream [message #18540 is a reply to message #18394] Wed, 25 April 2007 13:48 Go to previous messageGo to next message
David Howells is currently offline  David Howells
Messages: 44
Registered: October 2006
Member
David Miller <davem@davemloft.net> wrote:

> Is it possible for your changes to be purely networking
> and not need those changes outside of the networking?

See my latest patchset release.  I've reduced the dependencies on
non-networking changes to:

 (1) Oleg Nesterov's patch to change cancel_delayed_work() to use del_timer()
     rather than del_timer_sync() [patch 02/16].

     This patch can be discarded without compilation failure at the expense of
     making AFS slightly less efficient. It also makes AF_RXRPC slightly less
     efficient, but only in the rmmod path.

 (2) A symbol export in the keyring stuff plus a proliferation of the types
     available in the struct key::type_data union [patch 03/16].  This does
     not conflict with any other patches that I know about.

 (3) A symbol export in the timer stuff [patch 04/16].

Everything else that remains after the reduction is confined to the AF_RXRPC
or AFS code, save for a couple of networking patches in my patchset that you
already have and I just need to make the thing compile.

I'm not sure that I can make the AF_RXRPC patches totally independent of the
AFS patches as the two sets need to interleave since the last AF_RXRPC patch
deletes the old RxRPC code - which the old AFS code depends on.

David
_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers
Re: [PATCH] ia64 sn xpc: Convert to use kthread API. [message #18547 is a reply to message #18331] Thu, 26 April 2007 20:00 Go to previous messageGo to next message
Dean Nelson is currently offline  Dean Nelson
Messages: 6
Registered: April 2007
Junior Member
On Thu, Apr 19, 2007 at 01:58:44AM -0600, Eric W. Biederman wrote:
> From: Eric W. Biederman <ebiederm@xmission.com>
> 
> This patch starts the xpc kernel threads using kthread_run
> not a combination of kernel_thread and daemonize.  Resuling
> in slightly simpler and more maintainable code.
> 
> Cc: Jes Sorensen <jes@sgi.com>
> Cc: Tony Luck <tony.luck@intel.com>
> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
> ---
>  arch/ia64/sn/kernel/xpc_main.c |   31 +++++++++++++------------------
>  1 files changed, 13 insertions(+), 18 deletions(-)

Acked-by: Dean Nelson <dcn@sgi.com>

Andrew, I've tested Eric's patch in conjunction with a fix from Christoph
Lameter, which you've already got (it cleaned up a couple of compiler errors),
and the following patch that I'd like added (it cleans up a couple of
compiler warning errors and makes a few cosmetic changes).

Thanks,
Dean


Signed-off-by: Dean Nelson <dcn@sgi.com>

Index: mm-tree/arch/ia64/sn/kernel/xpc_main.c
===================================================================
--- mm-tree.orig/arch/ia64/sn/kernel/xpc_main.c	2007-04-25 14:04:51.701213426 -0500
+++ mm-tree/arch/ia64/sn/kernel/xpc_main.c	2007-04-26 06:29:02.447330438 -0500
@@ -568,7 +568,6 @@
 
 	task = kthread_run(xpc_activating, (void *) ((u64) partid),
 			  "xpc%02d", partid);
-
 	if (unlikely(IS_ERR(task))) {
 		spin_lock_irqsave(&part->act_lock, irq_flags);
 		part->act_state = XPC_P_INACTIVE;
@@ -808,7 +807,6 @@
 			int ignore_disconnecting)
 {
 	unsigned long irq_flags;
-	pid_t pid;
 	u64 args = XPC_PACK_ARGS(ch->partid, ch->number);
 	struct xpc_partition *part = &xpc_partitions[ch->partid];
 	struct task_struct *task;
@@ -840,7 +838,7 @@
 		(void) xpc_part_ref(part);
 		xpc_msgqueue_ref(ch);
 
-		task = kthread_run(xpc_daemonize_kthread, args,
+		task = kthread_run(xpc_daemonize_kthread, (void *) args,
 				   "xpc%02dc%d", ch->partid, ch->number);
 		if (IS_ERR(task)) {
 			/* the fork failed */
@@ -1381,7 +1379,8 @@
 	 * activate based on info provided by SAL. This new thread is short
 	 * lived and will exit once discovery is complete.
 	 */
-	task = kthread_run(xpc_initiate_discovery, NULL, XPC_DISCOVERY_THREAD_NAME);
+	task = kthread_run(xpc_initiate_discovery, NULL,
+			   XPC_DISCOVERY_THREAD_NAME);
 	if (IS_ERR(task)) {
 		dev_err(xpc_part, "failed while forking discovery thread\n");
 
_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers
Re: [PATCH] ia64 sn xpc: Convert to use kthread API. [message #18556 is a reply to message #18294] Fri, 27 April 2007 18:34 Go to previous messageGo to next message
ebiederm is currently offline  ebiederm
Messages: 1354
Registered: February 2006
Senior Member
Dean Nelson <dcn@sgi.com> writes:
>
> Christoph is correct in that XPC has a single thread that exists throughout
> its lifetime, another set of threads that exist for the time that active
> contact with other XPCs running on other SGI system partitions exists, and
> finally there is a pool of threads that exist on an as needed basis once
> a channel connection has been established between two partitions.
>
> In principle I approve of the kthread API and its use as opposed to what
> XPC currently does (calls kernel_thread(), daemonize(), wait_for_completion(),
> and complete()). So Christoph's patch that changes the single long-lived
> thread to use kthread_stop() and kthread_should_stop() is appreciated.
>
> But the fact that another thread, started at the xpc_init() time, that does
> discovery of other SGI system partitions wasn't converted points out a
> weekness in either my thinking or the kthread API. This discovery thread
> does its job and then exits. Should XPC be rmmod'd while the discovery
> thread is still running we would need to do a kthread_stop() against it. 
> But kthread_stop() isn't set up to deal with a task that has already exited.
> And if what once was the task structure of this exited task has been
> reallocated to another new task, we'd end up stopping it should it be
> operating under the kthread API, or possibly waiting a very long time
> for it to exit if it is not.

Patches are currently under development to allow kthreads to exit
before kthread_stop is called.  The big thing is that once we allow
kernel threads that exited by themselves to be reaped by kthread_stop
we have some significant work to do.

> I'm also a little uneasy that kthread_stop() has an "only one thread can
> stop another thread at a time" design. It's a potential bottleneck on
> very large systems where threads are blocked and unable to respond to a
> kthread_should_stop() for some period of time.

There are already patches out there to fix this issue.

> XPC is in need of threads that can block indefinitely, which is why XPC
> is in the business of maintaining a pool of threads. Currently there is
> no such capability (that I know of) that is provided by linux. Workqueues
> can't block indefinitely.

I'm not certain I understand this requirement.  Do you mean block indefinitely
unless requested to stop?

> And for performance reasons these threads need to be able to be created
> quickly. These threads are involved in delivering messages to XPC's users
> (like XPNET) and we had latency issues that led us to use kernel_thread()
> directly instead of the kthread API. Additionally, XPC may need to have
> hundreds of these threads active at any given time.

Ugh.  Can you tell me a little more about the latency issues?

Is having a non-halting kthread_create enough to fix this?
So you don't have to context switch several times to get the
thread running?

Or do you need more severe latency reductions?

The more severe fix would require some significant changes to copy_process
and every architecture would need to be touched to fix up copy_thread.
It is possible, it is a lot of work, and the reward is far from obvious.

> I think it would be great if the kthread API (or underlying implementation)
> could be changed to handle these issues. I'd love for XPC to not have to
> maintain this sort of thing itself.

Currently daemonize is a serious maintenance problem.

Using daemonize and kernel_thread to create kernel threads is a blocker
in implementing the pid namespace because of their use of pid_t.

So I am motivated to get this fixed.

Eric
_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers
Re: [PATCH] ia64 sn xpc: Convert to use kthread API. [message #18557 is a reply to message #18485] Fri, 27 April 2007 17:41 Go to previous messageGo to next message
Dean Nelson is currently offline  Dean Nelson
Messages: 6
Registered: April 2007
Junior Member
On Thu, Apr 19, 2007 at 04:51:03PM -0700, Andrew Morton wrote:
> Another driver which should be fully converted to the kthread API:
> kthread_stop() and kthread_should_stop().
> 
> And according to my logs, this driver was added to the tree more than
> a year _after_ the kthread interface was made available.
> 
> This isn't good.

On Sun, Apr 22, 2007 at 09:36:47PM +0100, Christoph Hellwig wrote:
> On Thu, Apr 19, 2007 at 01:58:44AM -0600, Eric W. Biederman wrote:
> > From: Eric W. Biederman <ebiederm@xmission.com>
> >
> > This patch starts the xpc kernel threads using kthread_run
> > not a combination of kernel_thread and daemonize.  Resuling
> > in slightly simpler and more maintainable code.
>
> This driver is a really twisted maze.  It has a lot of threads,
> some of them running through the whole lifetime of the driver,
> some short-lived and some in a sort of a pool.
>
> The patch below fixes up the long-lived thread as well as fixing
> gazillions of leaks in the init routine by switching to proper
> goto-based unwinding.

I see that the setting of 'xpc_rsvd_page->vars_pa = 0;' in xpc_init() is
considered a leak by Christoph (hch), but it really is not. If you look at
xpc_rsvd_page_init() where it is set up, you might see that the reserved page
is something XPC gets from SAL who created it at system boot time. If XPC is
rmmod'd and insmod'd again, it will be handed the same page of memory by SAL.
So there is no memory leak here.

As for the other suspected leaks mentioned I'm not sure what they could be.
It may be the fact that XPC continues coming up when presented with error
returns from register_reboot_notifier() and register_die_notifier(). There
is no leak in this, just simply a loss of an early notification to other
SGI system partitions that this partition is going down. A fact they will
sooner or later discover on their own. A notice of this degraded
functionality is written to the console. And the likelyhood that these
functions should ever return an error is very, very small (currently,
neither of them has an error return).

>From my experience the goto-based unwinding of error returns is not necessarily
a superior approach. I spent a month tracking down a difficult to reproduce
problem that ended up being an error return jumping to the wrong label in a
goto-based unwind. Problems can arise with either approach. I've also seen
the compiler generate less code for the non-goto approach. I'm not a compiler
person so I can't explain this, nor can I say that it's always the case, but
at one time when I did a bake off between the two approaches the non-goto
approach generated less code.

Having said this I have no problem with switching to a goto-based unwinding
of errors if that is what the community prefers. I personally find it more
readable than the non-goto approach.

> Note that thread pools are something we have in a few places,
> and might be worth handling in the core kthread infrastructure,
> as tearing down pools will get a bit complicated using the
> kthread APIs.

Christoph is correct in that XPC has a single thread that exists throughout
its lifetime, another set of threads that exist for the time that active
contact with other XPCs running on other SGI system partitions exists, and
finally there is a pool of threads that exist on an as needed basis once
a channel connection has been established between two partitions.

In principle I approve of the kthread API and its use as opposed to what
XPC currently does (calls kernel_thread(), daemonize(), wait_for_completion(),
and complete()). So Christoph's patch that changes the single long-lived
thread to use kthread_stop() and kthread_should_stop() is appreciated.

But the fact that another thread, started at the xpc_init() time, that does
discovery of other SGI system partitions wasn't converted points out a
weekness in either my thinking or the kthread API. This discovery thread
does its job and then exits. Should XPC be rmmod'd while the discovery
thread is still running we would need to do a kthread_stop() against it. 
But kthread_stop() isn't set up to deal with a task that has already exited.
And if what once was the task structure of this exited task has been
reallocated to another new task, we'd end up stopping it should it be
operating under the kthread API, or possibly waiting a very long time
for it to exit if it is not.

I'm also a little uneasy that kthread_stop() has an "only one thread can
stop another thread at a time" design. It's a potential bottleneck on
very large systems where threads are blocked and unable to respond to a
kthread_should_stop() for some period of time.

XPC is in need of threads that can block indefinitely, which is why XPC
is in the business of maintaining a pool of threads. Currently there is
no such capability (that I know of) that is provided by linux. Workqueues
can't block indefinitely.

And for performance reasons these threads need to be able to be created
quickly. These threads are involved in delivering messages to XPC's users
(like XPNET) and we had latency issues that led us to use kernel_thread()
directly instead of the kthread API. Additionally, XPC may need to have
hundreds of these threads active at any given time.

I think it would be great if the kthread API (or underlying implementation)
could be changed to handle these issues. I'd love for XPC to not have to
maintain this sort of thing itself.

Dean
_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers
Re: [PATCH] ia64 sn xpc: Convert to use kthread API. [message #18569 is a reply to message #18294] Fri, 27 April 2007 20:33 Go to previous messageGo to next message
ebiederm is currently offline  ebiederm
Messages: 1354
Registered: February 2006
Senior Member
Dean Nelson <dcn@sgi.com> writes:

> On Fri, Apr 27, 2007 at 12:34:02PM -0600, Eric W. Biederman wrote:
>> Dean Nelson <dcn@sgi.com> writes:
>> >
>> > Christoph is correct in that XPC has a single thread that exists throughout
>> > its lifetime, another set of threads that exist for the time that active
>> > contact with other XPCs running on other SGI system partitions exists, and
>> > finally there is a pool of threads that exist on an as needed basis once
>> > a channel connection has been established between two partitions.
>> >
>> > In principle I approve of the kthread API and its use as opposed to what
>> > XPC currently does (calls kernel_thread(), daemonize(),
> wait_for_completion(),
>> > and complete()). So Christoph's patch that changes the single long-lived
>> > thread to use kthread_stop() and kthread_should_stop() is appreciated.
>> >
>> > But the fact that another thread, started at the xpc_init() time, that does
>> > discovery of other SGI system partitions wasn't converted points out a
>> > weekness in either my thinking or the kthread API. This discovery thread
>> > does its job and then exits. Should XPC be rmmod'd while the discovery
>> > thread is still running we would need to do a kthread_stop() against it. 
>> > But kthread_stop() isn't set up to deal with a task that has already exited.
>> > And if what once was the task structure of this exited task has been
>> > reallocated to another new task, we'd end up stopping it should it be
>> > operating under the kthread API, or possibly waiting a very long time
>> > for it to exit if it is not.
>> 
>> Patches are currently under development to allow kthreads to exit
>> before kthread_stop is called.  The big thing is that once we allow
>> kernel threads that exited by themselves to be reaped by kthread_stop
>> we have some significant work to do.
>> 
>> > I'm also a little uneasy that kthread_stop() has an "only one thread can
>> > stop another thread at a time" design. It's a potential bottleneck on
>> > very large systems where threads are blocked and unable to respond to a
>> > kthread_should_stop() for some period of time.
>> 
>> There are already patches out there to fix this issue.
>> 
>> > XPC is in need of threads that can block indefinitely, which is why XPC
>> > is in the business of maintaining a pool of threads. Currently there is
>> > no such capability (that I know of) that is provided by linux. Workqueues
>> > can't block indefinitely.
>> 
>> I'm not certain I understand this requirement.  Do you mean block indefinitely
>> unless requested to stop?
>
> These threads can block waiting for a hardware DMA engine, which has a 28
> second timeout setpoint.

Ok.  So this is an interruptible sleep?
Do you have any problems being woken up out of that interruptible sleep
by kthread_stop?

I am in the process of modifying kthread_stop to wake up thread in an
interruptible sleep and set signal_pending, so they will break out.

>> > And for performance reasons these threads need to be able to be created
>> > quickly. These threads are involved in delivering messages to XPC's users
>> > (like XPNET) and we had latency issues that led us to use kernel_thread()
>> > directly instead of the kthread API. Additionally, XPC may need to have
>> > hundreds of these threads active at any given time.
>> 
>> Ugh.  Can you tell me a little more about the latency issues?
>
> After placing a message in a local message queue, one SGI system partition
> will interrupt another to retrieve the message. We need to minimize the
> time from entering XPC's interrupt handler to the time that the message
> can be DMA transferred and delivered to the consumer (like XPNET) to
> whom it was sent.
>
>> Is having a non-halting kthread_create enough to fix this?
>> So you don't have to context switch several times to get the
>> thread running?
>> 
>> Or do you need more severe latency reductions?
>> 
>> The more severe fix would require some significant changes to copy_process
>> and every architecture would need to be touched to fix up copy_thread.
>> It is possible, it is a lot of work, and the reward is far from obvious.
>
> I think a non-halting kthread_create() should be sufficient. It is in
> effect what XPC has now in calling kernel_thread() directly.

A little different but pretty close.

We call kthread_create() it prepares everything and places it on
a queue and wakes up kthreadd.

kthreadd then wakes up and forks the thread.

After the thread has finishing setting up it will call complete on
a completion so kthread_create can continue on it's merry way
but it should not need to go to sleep waiting for someone to
call kthread_bind.

But if you can live with what I have just described that will
be easy to code up.

It is a little slower then kernel_thread but hopefully not much.

> Taking it one step further, if you added the notion of a thread pool,
> where upon exit, a thread isn't destroyed but rather is queued ready to
> handle the next kthread_create_quick() request.

That might happen.  So far I am avoiding the notion of a thread pool for
as long as I can.  There is some sense in it, especially in generalizing
the svc thread pool code from nfs.  But if I don't have to go there I would
prefer it.

>> > I think it would be great if the kthread API (or underlying implementation)
>> > could be changed to handle these issues. I'd love for XPC to not have to
>> > maintain this sort of thing itself.
>> 
>> Currently daemonize is a serious maintenance problem.
>> 
>> Using daemonize and kernel_thread to create kernel threads is a blocker
>> in implementing the pid namespace because of their use of pid_t.
>> 
>> So I am motivated to get this fixed.
>
> This would also address the problems we see with huge pid spaces for
> kernel threads on our largest machines.  In the example from last week,
> we had 10 threads each on 4096 cpus.  If we reworked work_queues to use
> the kthread_create_nonblocking() thread pool, we could probably collapse
> the need for having all of those per-task, per-cpu work queues.

Patches have already been sent (and I don't think found problems with)
that make kthreadd pid == 2, and they also modify daemonize to reparent
to kthreadd instead of init.

Eric
_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers
Re: [PATCH] ia64 sn xpc: Convert to use kthread API. [message #18570 is a reply to message #18556] Fri, 27 April 2007 20:12 Go to previous messageGo to next message
Dean Nelson is currently offline  Dean Nelson
Messages: 6
Registered: April 2007
Junior Member
On Fri, Apr 27, 2007 at 12:34:02PM -0600, Eric W. Biederman wrote:
> Dean Nelson <dcn@sgi.com> writes:
> >
> > Christoph is correct in that XPC has a single thread that exists throughout
> > its lifetime, another set of threads that exist for the time that active
> > contact with other XPCs running on other SGI system partitions exists, and
> > finally there is a pool of threads that exist on an as needed basis once
> > a channel connection has been established between two partitions.
> >
> > In principle I approve of the kthread API and its use as opposed to what
> > XPC currently does (calls kernel_thread(), daemonize(), wait_for_completion(),
> > and complete()). So Christoph's patch that changes the single long-lived
> > thread to use kthread_stop() and kthread_should_stop() is appreciated.
> >
> > But the fact that another thread, started at the xpc_init() time, that does
> > discovery of other SGI system partitions wasn't converted points out a
> > weekness in either my thinking or the kthread API. This discovery thread
> > does its job and then exits. Should XPC be rmmod'd while the discovery
> > thread is still running we would need to do a kthread_stop() against it. 
> > But kthread_stop() isn't set up to deal with a task that has already exited.
> > And if what once was the task structure of this exited task has been
> > reallocated to another new task, we'd end up stopping it should it be
> > operating under the kthread API, or possibly waiting a very long time
> > for it to exit if it is not.
> 
> Patches are currently under development to allow kthreads to exit
> before kthread_stop is called.  The big thing is that once we allow
> kernel threads that exited by themselves to be reaped by kthread_stop
> we have some significant work to do.
> 
> > I'm also a little uneasy that kthread_stop() has an "only one thread can
> > stop another thread at a time" design. It's a potential bottleneck on
> > very large systems where threads are blocked and unable to respond to a
> > kthread_should_stop() for some period of time.
> 
> There are already patches out there to fix this issue.
> 
> > XPC is in need of threads that can block indefinitely, which is why XPC
> > is in the business of maintaining a pool of threads. Currently there is
> > no such capability (that I know of) that is provided by linux. Workqueues
> > can't block indefinitely.
> 
> I'm not certain I understand this requirement.  Do you mean block indefinitely
> unless requested to stop?

These threads can block waiting for a hardware DMA engine, which has a 28
second timeout setpoint.

> > And for performance reasons these threads need to be able to be created
> > quickly. These threads are involved in delivering messages to XPC's users
> > (like XPNET) and we had latency issues that led us to use kernel_thread()
> > directly instead of the kthread API. Additionally, XPC may need to have
> > hundreds of these threads active at any given time.
> 
> Ugh.  Can you tell me a little more about the latency issues?

After placing a message in a local message queue, one SGI system partition
will interrupt another to retrieve the message. We need to minimize the
time from entering XPC's interrupt handler to the time that the message
can be DMA transferred and delivered to the consumer (like XPNET) to
whom it was sent.

> Is having a non-halting kthread_create enough to fix this?
> So you don't have to context switch several times to get the
> thread running?
> 
> Or do you need more severe latency reductions?
> 
> The more severe fix would require some significant changes to copy_process
> and every architecture would need to be touched to fix up copy_thread.
> It is possible, it is a lot of work, and the reward is far from obvious.

I think a non-halting kthread_create() should be sufficient. It is in
effect what XPC has now in calling kernel_thread() directly.

Taking it one step further, if you added the notion of a thread pool,
where upon exit, a thread isn't destroyed but rather is queued ready to
handle the next kthread_create_quick() request.

> > I think it would be great if the kthread API (or underlying implementation)
> > could be changed to handle these issues. I'd love for XPC to not have to
> > maintain this sort of thing itself.
> 
> Currently daemonize is a serious maintenance problem.
> 
> Using daemonize and kernel_thread to create kernel threads is a blocker
> in implementing the pid namespace because of their use of pid_t.
> 
> So I am motivated to get this fixed.

This would also address the problems we see with huge pid spaces for
kernel threads on our largest machines.  In the example from last week,
we had 10 threads each on 4096 cpus.  If we reworked work_queues to use
the kthread_create_nonblocking() thread pool, we could probably collapse
the need for having all of those per-task, per-cpu work queues.

Dean
_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers
Re: [PATCH] cpci_hotplug: Convert to use the kthread API [message #18571 is a reply to message #18463] Fri, 27 April 2007 22:07 Go to previous messageGo to next message
Scott Murray is currently offline  Scott Murray
Messages: 4
Registered: April 2007
Junior Member
On Sun, 22 Apr 2007, Christoph Hellwig wrote:

> On Thu, Apr 19, 2007 at 12:55:29AM -0600, Eric W. Biederman wrote:
> > From: Eric W. Biederman <ebiederm@xmission.com> - unquoted
> > 
> > kthread_run replaces the kernel_thread and  daemonize calls
> > during thread startup.
> > 
> > Calls to signal_pending were also removed as it is currently
> > impossible for the cpci_hotplug thread to receive signals.
> 
> This drivers thread are a bit of a miss, although a lot better than
> most other pci hotplug drivers :)
> 
> Below is more complete conversion to the kthread infrastructure +
> wake_up_process to wake the thread.  Note that we had to keep
> a thread_finished variable because the existing one had dual use.

Sorry, it took me a few days to get to testing this out.  It looks good,    
but I had to make a couple of tweaks to avoid a hang when rmmod'ing a
board driver.  The board drivers do:

cpci_hp_stop()
cpci_hp_unregister_controller(controller)

to shutdown, and the check in cpci_hp_unregister_controller if the thread
is running wasn't working due to a bit too much code being excised.  The
result was kthread_stop being called twice, which hangs.  I've indicated 
my changes to avoid this inline below.

Scott

> Signed-off-by: Christoph Hellwig <hch@lst.de>

Acked-by: Scott Murray <scottm@somanetworks.com>

> Index: linux-2.6/drivers/pci/hotplug/cpci_hotplug_core.c
> ===================================================================
> --- linux-2.6.orig/drivers/pci/hotplug/cpci_hotplug_core.c	2007-04-22 12:54:17.000000000 +0200
> +++ linux-2.6/drivers/pci/hotplug/cpci_hotplug_core.c	2007-04-22 13:01:42.000000000 +0200
> @@ -35,6 +35,7 @@
>  #include <linux/smp_lock.h>
>  #include <asm/atomic.h>
>  #include <linux/delay.h>
> +#include <linux/kthread.h>
>  #include "cpci_hotplug.h"
>  
>  #define DRIVER_AUTHOR	"Scott Murray <scottm@somanetworks.com>"
> @@ -59,9 +60,8 @@ static int slots;
>  static atomic_t extracting;
>  int cpci_debug;
>  static struct cpci_hp_controller *controller;
> -static struct semaphore event_semaphore;	/* mutex for process loop (up if something to process) */
> -static struct semaphore thread_exit;		/* guard ensure thread has exited before calling it quits */
> -static int thread_finished = 1;
> +static struct task_struct *cpci_thread;
> +static int thread_finished;
>  
>  static int enable_slot(struct hotplug_slot *slot);
>  static int disable_slot(struct hotplug_slot *slot);
> @@ -357,9 +357,7 @@ cpci_hp_intr(int irq, void *data)
>  	controller->ops->disable_irq();
>  
>  	/* Trigger processing by the event thread */
> -	dbg("Signal event_semaphore");
> -	up(&event_semaphore);
> -	dbg("exited cpci_hp_intr");
> +	wake_up_process(cpci_thread);
>  	return IRQ_HANDLED;
>  }
>  
> @@ -521,17 +519,12 @@ event_thread(void *data)
>  {
>  	int rc;
>  
> -	lock_kernel();
> -	daemonize("cpci_hp_eventd");
> -	unlock_kernel();
> -
>  	dbg("%s - event thread started", __FUNCTION__);
>  	while (1) {
>  		dbg("event thread sleeping");
> -		down_interruptible(&event_semaphore);
> -		dbg("event thread woken, thread_finished = %d",
> -		    thread_finished);
> -		if (thread_finished || signal_pending(current))
> +		set_current_state(TASK_INTERRUPTIBLE);
> +		schedule();
> +		if (kthread_should_stop())
>  			break;
>  		do {
>  			rc = check_slots();
> @@ -541,18 +534,17 @@ event_thread(void *data)
>  			} else if (rc < 0) {
>  				dbg("%s - error checking slots", __FUNCTION__);
>  				thread_finished = 1;
> -				break;
> +				goto out;
>  			}
> -		} while (atomic_read(&extracting) && !thread_finished);
> -		if (thread_finished)
> +		} while (atomic_read(&extracting) && !kthread_should_stop());
> +		if (kthread_should_stop())
>  			break;
>  
>  		/* Re-enable ENUM# interrupt */
>  		dbg("%s - re-enabling irq", __FUNCTION__);
>  		controller->ops->enable_irq();
>  	}
> -	dbg("%s - event thread signals exit", __FUNCTION__);
> -	up(&thread_exit);
> + out:
>  	return 0;
>  }
>  
> @@ -562,12 +554,8 @@ poll_thread(void *data)
>  {
>  	int rc;
>  
> -	lock_kernel();
> -	daemonize("cpci_hp_polld");
> -	unlock_kernel();
> -
>  	while (1) {
> -		if (thread_finished || signal_pending(current))
> +		if (kthread_should_stop() || signal_pending(current))
>  			break;
>  		if (controller->ops->query_enum()) {
>  			do {
> @@ -578,48 +566,34 @@ poll_thread(void *data)
>  				} else if (rc < 0) {
>  					dbg("%s - error checking slots", __FUNCTION__);
>  					thread_finished = 1;
> -					break;
> +					goto out;
>  				}
> -			} while (atomic_read(&extracting) && !thread_finished);
> +			} while (atomic_read(&extracting) && !kthread_should_stop());
>  		}
>  		msleep(100);
>  	}
> -	dbg("poll thread signals exit");
> -	up(&thread_exit);
> + out:
>  	return 0;
>  }
>  
>  static int
>  cpci_start_thread(void)
>  {
> -	int pid;
> -
> -	/* initialize our semaphores */
> -	init_MUTEX_LOCKED(&event_semaphore);
> -	init_MUTEX_LOCKED(&thread_exit);
> -	thread_finished = 0;
> -
>  	if (controller->irq)
> -		pid = kernel_thread(event_thread, NULL, 0);
> +		cpci_thread = kthread_run(event_thread, NULL, "cpci_hp_eventd");
>  	else
> -		pid = kernel_thread(poll_thread, NULL, 0);
> -	if (pid < 0) {
> +		cpci_thread = kthread_run(poll_thread, NULL, "cpci_hp_polld");
> +	if (IS_ERR(cpci_thread)) {
>  		err("Can't start up our thread");
> -		return -1;
> +		return PTR_ERR(cpci_thread);
>  	}
> -	dbg("Our thread pid = %d", pid);

There still needs to be a:

        thread_finished = 0;

here, so that things work if a board driver is insmod'ed again after a 
rmmod (i.e. cpci_hp_start after a cpci_hp_stop).

>  	return 0;
>  }
>  
>  static void
>  cpci_stop_thread(void)
>  {
> -	thread_finished = 1;
> -	dbg("thread finish command given");
> -	if (controller->irq)
> -		up(&event_semaphore);
> -	dbg("wait for thread to exit");
> -	down(&thread_exit);
> +	kthread_stop(cpci_thread);

As well, there still needs to be a:

        thread_finished = 1;
                         
here to make the check in cpci_hp_unregister_controller work.

>  }
>  
>  int
 

-- 
Scott Murray
SOMA Networks, Inc.
Toronto, Ontario
e-mail: scottm@somanetworks.com
_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers
Re: [PATCH] ia64 sn xpc: Convert to use kthread API. [message #18572 is a reply to message #18569] Mon, 30 April 2007 15:22 Go to previous messageGo to next message
Dean Nelson is currently offline  Dean Nelson
Messages: 6
Registered: April 2007
Junior Member
On Fri, Apr 27, 2007 at 02:33:32PM -0600, Eric W. Biederman wrote:
> Dean Nelson <dcn@sgi.com> writes:
> 
> > On Fri, Apr 27, 2007 at 12:34:02PM -0600, Eric W. Biederman wrote:
> >> Dean Nelson <dcn@sgi.com> writes:
> >> >
> >> > XPC is in need of threads that can block indefinitely, which is why XPC
> >> > is in the business of maintaining a pool of threads. Currently there is
> >> > no such capability (that I know of) that is provided by linux. Workqueues
> >> > can't block indefinitely.
> >> 
> >> I'm not certain I understand this requirement.  Do you mean block indefinitely
> >> unless requested to stop?
> >
> > These threads can block waiting for a hardware DMA engine, which has a 28
> > second timeout setpoint.
> 
> Ok.  So this is an interruptible sleep?

No, the hardware DMA engine's software interface, doesn't sleep
nor relinquish the CPU. But there are other spots where we do sleep
interruptibly.

> Do you have any problems being woken up out of that interruptible sleep
> by kthread_stop?
> 
> I am in the process of modifying kthread_stop to wake up thread in an
> interruptible sleep and set signal_pending, so they will break out.

No, this is fine, just avoid designing the kthread stop mechanism
to require a thread being requested to stop to actually stop in some
finite amount of time, and that by it not stopping other kthread stop
requests are held off. Allow the thread to take as much time as it needs
to respond to the kthread stop request.

> >> > And for performance reasons these threads need to be able to be created
> >> > quickly. These threads are involved in delivering messages to XPC's users
> >> > (like XPNET) and we had latency issues that led us to use kernel_thread()
> >> > directly instead of the kthread API. Additionally, XPC may need to have
> >> > hundreds of these threads active at any given time.
> >> 
> >> Ugh.  Can you tell me a little more about the latency issues?
> >
> > After placing a message in a local message queue, one SGI system partition
> > will interrupt another to retrieve the message. We need to minimize the
> > time from entering XPC's interrupt handler to the time that the message
> > can be DMA transferred and delivered to the consumer (like XPNET) to
> > whom it was sent.
> >
> >> Is having a non-halting kthread_create enough to fix this?
> >> So you don't have to context switch several times to get the
> >> thread running?
> >> 
> >> Or do you need more severe latency reductions?
> >> 
> >> The more severe fix would require some significant changes to copy_process
> >> and every architecture would need to be touched to fix up copy_thread.
> >> It is possible, it is a lot of work, and the reward is far from obvious.
> >
> > I think a non-halting kthread_create() should be sufficient. It is in
> > effect what XPC has now in calling kernel_thread() directly.
> 
> A little different but pretty close.
> 
> We call kthread_create() it prepares everything and places it on
> a queue and wakes up kthreadd.
> 
> kthreadd then wakes up and forks the thread.
> 
> After the thread has finishing setting up it will call complete on
> a completion so kthread_create can continue on it's merry way
> but it should not need to go to sleep waiting for someone to
> call kthread_bind.
> 
> But if you can live with what I have just described that will
> be easy to code up.
> 
> It is a little slower then kernel_thread but hopefully not much.

I was aware of this behavior of kthread_create(), which I consider
'halting' in that the thread doing the kthread_create() blocks waiting
for kthreadd to get scheduled, call kernel_thread(), and then call
complete(). By your mentioning a 'non-halting' kthread_create() I
thought you were planning to create a new flavor of kthread_create()
that called kernel_thread() directly and reparented the child thread to
kthreadd. My mistake.

So there will be more overhead (time-wise) for XPC in calling
kthread_run() as opposed to it formerly calling kernel_thread() directly.
Thus requiring XPC to utilize a pool of kthread_create()'d threads.

> > Taking it one step further, if you added the notion of a thread pool,
> > where upon exit, a thread isn't destroyed but rather is queued ready to
> > handle the next kthread_create_quick() request.
> 
> That might happen.  So far I am avoiding the notion of a thread pool for
> as long as I can.  There is some sense in it, especially in generalizing
> the svc thread pool code from nfs.  But if I don't have to go there I would
> prefer it.

This means that XPC will have to retain its thread pool, but I can
understand you not wanting to go there.

Thanks,
Dean

_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers
Re: [PATCH] s390/scsi/zfcp_erp: Convert to use the kthread API [message #18573 is a reply to message #18483] Mon, 30 April 2007 10:41 Go to previous messageGo to next message
Swen Schillig is currently offline  Swen Schillig
Messages: 1
Registered: April 2007
Junior Member
On Sunday 22 April 2007 22:17, Christoph Hellwig wrote:
> On Thu, Apr 19, 2007 at 01:58:42AM -0600, Eric W. Biederman wrote:
> > From: Eric W. Biederman <ebiederm@xmission.com>
> > 
> > Modify zfcperp%s to be started with kthread_run not
> > a combination of kernel_thread, daemonize and siginitsetinv
> > making the code slightly simpler and more maintainable.
> 
> This driver would also benefit from a full kthread conversion.
> Unfortunately it has a strange dual-use semaphore (->erp_ready_sem)
> that hinders a straight conversion.  Maybe the maintainer can take
> a look whether there's a nice way to get rid of that one?
> 
I know and we have it on our schedule, 
but it's not as easy as it might look like .
_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers
Re: [PATCH] md: Remove broken SIGKILL support [message #18574 is a reply to message #18315] Tue, 01 May 2007 06:13 Go to previous messageGo to next message
ebiederm is currently offline  ebiederm
Messages: 1354
Registered: February 2006
Senior Member
Neil Brown <neilb@suse.de> writes:

> On Thursday April 19, ebiederm@xmission.com wrote:
>> From: Eric W. Biederman <ebiederm@xmission.com>
>> 
>> Currently md_thread calls allow_signal so it can receive a
>> SIGKILL but then does nothing with it except flush the
>> sigkill so that it not can use an interruptible sleep.
>> 
>> This whole dance is silly so remove the unnecessary
>> and broken signal handling logic.
>
> (sorry of the delay in replying)
>
> You missed some related code which should help you see that it is -
> maybe - not completely 'silly' (though I confess it might be slightly
> broken).
> In md_check_recovery:
>
> 	if (signal_pending(current)) {
> 		if (mddev->pers->sync_request) {
> 			printk(KERN_INFO "md: %s in immediate safe mode\n",
> 			       mdname(mddev));
> 			mddev->safemode = 2;
> 		}
> 		flush_signals(current);
> 	}

Thanks.

> The idea is that alt-sysrq-K will send SIGKILL to all processes
> including the md support threads, which will cause them to enter
> "immediate safe mode" so that the metadata will be marked clean
> immediately at every opportunity.  That way you can use alt-sysrq:
>   sync,unmount,kill,reboot
> and be fairly sure that you md array will be shut down cleanly.
>
> I'd be just as happy to link this into Unmount (aka
> do_emergency_remount), but that doesn't seem at all straight forward,
> and in any case should be done before the current code is ripped out.
>
> While we do have a reboot_notifier which tries to stop all arrays,
> I've never been comfortable with that.  A reboot really should just
> reboot... 
>
> What I would REALLY like is for the block device to know whether it is
> open read-only or read-write.  Then I could mark it clean when it
> becomes read-only as would happen when do_emergency_remount remounts
> it read-only.
>
> I might see how hard that would be...

My goal to get signals to kernel threads out of the user space interface
especially for non-privileged processes, so everything that we do with
kernel threads can just be an unimportant implementation detail to user
space.

Eric
_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers
Re: [PATCH] md: Remove broken SIGKILL support [message #18575 is a reply to message #18352] Tue, 01 May 2007 00:47 Go to previous messageGo to next message
Neil Brown is currently offline  Neil Brown
Messages: 6
Registered: October 2006
Junior Member
On Thursday April 19, ebiederm@xmission.com wrote:
> From: Eric W. Biederman <ebiederm@xmission.com>
> 
> Currently md_thread calls allow_signal so it can receive a
> SIGKILL but then does nothing with it except flush the
> sigkill so that it not can use an interruptible sleep.
> 
> This whole dance is silly so remove the unnecessary
> and broken signal handling logic.

(sorry of the delay in replying)

You missed some related code which should help you see that it is -
maybe - not completely 'silly' (though I confess it might be slightly
broken).
In md_check_recovery:

	if (signal_pending(current)) {
		if (mddev->pers->sync_request) {
			printk(KERN_INFO "md: %s in immediate safe mode\n",
			       mdname(mddev));
			mddev->safemode = 2;
		}
		flush_signals(current);
	}

The idea is that alt-sysrq-K will send SIGKILL to all processes
including the md support threads, which will cause them to enter
"immediate safe mode" so that the metadata will be marked clean
immediately at every opportunity.  That way you can use alt-sysrq:
  sync,unmount,kill,reboot
and be fairly sure that you md array will be shut down cleanly.

I'd be just as happy to link this into Unmount (aka
do_emergency_remount), but that doesn't seem at all straight forward,
and in any case should be done before the current code is ripped out.

While we do have a reboot_notifier which tries to stop all arrays,
I've never been comfortable with that.  A reboot really should just
reboot... 

What I would REALLY like is for the block device to know whether it is
open read-only or read-write.  Then I could mark it clean when it
becomes read-only as would happen when do_emergency_remount remounts
it read-only.

I might see how hard that would be...

NeilBrown
_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers
Re: [PATCH] ia64 sn xpc: Convert to use kthread API. [message #18576 is a reply to message #18294] Wed, 02 May 2007 15:44 Go to previous messageGo to next message
ebiederm is currently offline  ebiederm
Messages: 1354
Registered: February 2006
Senior Member
Dean Nelson <dcn@sgi.com> writes:

> On Mon, Apr 30, 2007 at 10:22:30AM -0500, Dean Nelson wrote:
>> On Fri, Apr 27, 2007 at 02:33:32PM -0600, Eric W. Biederman wrote:
>> > Dean Nelson <dcn@sgi.com> writes:
>> > 
>> > > Taking it one step further, if you added the notion of a thread pool,
>> > > where upon exit, a thread isn't destroyed but rather is queued ready to
>> > > handle the next kthread_create_quick() request.
>> > 
>> > That might happen.  So far I am avoiding the notion of a thread pool for
>> > as long as I can.  There is some sense in it, especially in generalizing
>> > the svc thread pool code from nfs.  But if I don't have to go there I would
>> > prefer it.
>> 
>> This means that XPC will have to retain its thread pool, but I can
>> understand you not wanting to go there.
>
> On Thu, Apr 26, 2007 at 01:11:15PM -0600, Eric W. Biederman wrote:
>>
>> Ok.    Because of the module unloading issue, and because we don't have
>> a lot of these threads running around, the current plan is to fix
>> thread_create and kthread_stop so that they must always be paired,
>> and so that kthread_stop will work correctly if the task has already
>> exited.
>>
>> Basically that just involves calling get_task_struct in kthread_create
>> and put_task_struct in kthread_stop.
>
> Okay, so I need to expand upon Christoph Hellwig's patch so that all
> the kthread_create()'d threads are kthread_stop()'d.
>
> This is easy to do for the XPC thread that exists for the lifetime of XPC,
> as well as for the threads created to manage the SGI system partitions.
>
> XPC has the one discovery thread that is created when XPC is first started
> and exits as soon as it has finished discovering all existing SGI system
> partitions. With your forthcoming change to kthread_stop() that will allow
> it to be called after the thread has exited, doing this one is also easy.
> Note that the kthread_stop() for this discovery thread won't occur until
> XPC is rmmod'd. This means that its task_struct will not be freed for
> possibly a very long time (i.e., weeks). Is that a problem?

As long as there is only one, not really.  It would be good if we could
get rid of it though.

The practical problem is the race with rmmod, in particular if someone
calls rmmod while this thread is still running.

If I get clever I think this is likely solvable with something like.

kthread_maybe_stop(struct task_struct **loc)
{
        struct task_struct *tsk;
        tsk = xchg(loc, NULL);
        if (tsk)
        	kthread_stop(tsk);
}

kthread_stop_self(struct task_struct **loc, int exit_code)
{
        struct task_struct *tsk;
        
        tsk = xchg(loc, NULL);
        if (tsk)
        	put_task_struct(tsk);
	do_exit(tsk);                
}

I'm not quite convinced that is a common enough paradigm to implement
that.

> But then we come to XPC's pool of threads that deliver channel messages
> to the appropriate consumer (like XPNET) and can block indefinitely. As
> mentioned earlier there could be hundreds if not thousands of these
> (our systems keep getting bigger). So now requiring a kthread_stop()
> for each one of these becomes more of a problem, as it is a lot of
> task_struct pointers to maintain.
>
> Currently, XPC maintains these threads via a
> wait_event_interruptible_exclusive() queue so that it can wakeup as many
> or as few as needed at any given moment by calling wake_up_nr(). When XPC
> is rmmod'd, a flag is set which causes them to exit and wake_up_all()
> is called.  Therefore XPC dosen't need to remember their pids or
> task_struct pointers.
>
> So what would you suggest we do for this pool of threads?

Good question.

The whole concept of something that feels like a core part of the
platform code being modular I'm still looking at strange.

> Is there any way to have a version of kthread_create() that doesn't
> require a matching kthread_stop()? Or add a kthread_not_stopping()
> that does the put_task_struct() call, so as to eliminate the need for
> calling kthread_stop()?

Yes.  I was thinking calling it kthread_orphan or something like that.
We can't make anything like that the default, because of the modular
remove problem, but it's not to hard.

> Or should we reconsider the kthread pool approach
> (and get XPC out of the thread management business altogether)? Robin
> Holt is putting together a proposal for how one could do a kthread pool,
> it should provide a bit more justification for going down that road.

Eric
_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers
Re: [PATCH] ia64 sn xpc: Convert to use kthread API. [message #18577 is a reply to message #18572] Wed, 02 May 2007 15:16 Go to previous messageGo to next message
Dean Nelson is currently offline  Dean Nelson
Messages: 6
Registered: April 2007
Junior Member
On Mon, Apr 30, 2007 at 10:22:30AM -0500, Dean Nelson wrote:
> On Fri, Apr 27, 2007 at 02:33:32PM -0600, Eric W. Biederman wrote:
> > Dean Nelson <dcn@sgi.com> writes:
> > 
> > > Taking it one step further, if you added the notion of a thread pool,
> > > where upon exit, a thread isn't destroyed but rather is queued ready to
> > > handle the next kthread_create_quick() request.
> > 
> > That might happen.  So far I am avoiding the notion of a thread pool for
> > as long as I can.  There is some sense in it, especially in generalizing
> > the svc thread pool code from nfs.  But if I don't have to go there I would
> > prefer it.
> 
> This means that XPC will have to retain its thread pool, but I can
> understand you not wanting to go there.

On Thu, Apr 26, 2007 at 01:11:15PM -0600, Eric W. Biederman wrote:
>
> Ok.    Because of the module unloading issue, and because we don't have
> a lot of these threads running around, the current plan is to fix
> thread_create and kthread_stop so that they must always be paired,
> and so that kthread_stop will work correctly if the task has already
> exited.
>
> Basically that just involves calling get_task_struct in kthread_create
> and put_task_struct in kthread_stop.

Okay, so I need to expand upon Christoph Hellwig's patch so that all
the kthread_create()'d threads are kthread_stop()'d.

This is easy to do for the XPC thread that exists for the lifetime of XPC,
as well as for the threads created to manage the SGI system partitions.

XPC has the one discovery thread that is created when XPC is first started
and exits as soon as it has finished discovering all existing SGI system
partitions. With your forthcoming change to kthread_stop() that will allow
it to be called after the thread has exited, doing this one is also easy.
Note that the kthread_stop() for this discovery thread won't occur until
XPC is rmmod'd. This means that its task_struct will not be freed for
possibly a very long time (i.e., weeks). Is that a problem?

But then we come to XPC's pool of threads that deliver channel messages
to the appropriate consumer (like XPNET) and can block indefinitely. As
mentioned earlier there could be hundreds if not thousands of these
(our systems keep getting bigger). So now requiring a kthread_stop()
for each one of these becomes more of a problem, as it is a lot of
task_struct pointers to maintain.

Currently, XPC maintains these threads via a
wait_event_interruptible_exclusive() queue so that it can wakeup as many
or as few as needed at any given moment by calling wake_up_nr(). When XPC
is rmmod'd, a flag is set which causes them to exit and wake_up_all()
is called.  Therefore XPC dosen't need to remember their pids or
task_struct pointers.

So what would you suggest we do for this pool of threads?

Is there any way to have a version of kthread_create() that doesn't
require a matching kthread_stop()? Or add a kthread_not_stopping()
that does the put_task_struct() call, so as to eliminate the need for
calling kthread_stop()?  Or should we reconsider the kthread pool approach
(and get XPC out of the thread management business altogether)? Robin
Holt is putting together a proposal for how one could do a kthread pool,
it should provide a bit more justification for going down that road.

Thanks,
Dean

_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers
Re: [PATCH] cpci_hotplug: Convert to use the kthread API [message #18581 is a reply to message #18571] Fri, 04 May 2007 11:12 Go to previous messageGo to previous message
Christoph Hellwig is currently offline  Christoph Hellwig
Messages: 59
Registered: April 2006
Member
On Fri, Apr 27, 2007 at 06:07:49PM -0400, Scott Murray wrote:
> Sorry, it took me a few days to get to testing this out.  It looks good,    
> but I had to make a couple of tweaks to avoid a hang when rmmod'ing a
> board driver.  The board drivers do:
> 
> cpci_hp_stop()
> cpci_hp_unregister_controller(controller)
> 
> to shutdown, and the check in cpci_hp_unregister_controller if the thread
> is running wasn't working due to a bit too much code being excised.  The
> result was kthread_stop being called twice, which hangs.  I've indicated 
> my changes to avoid this inline below.

Can you forward the patches with your fix to Andrew to make sure he
picks it up?

_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers
Previous Topic: [patch i2o] i2o layer cleanup
Next Topic: [PATCH] Make common helpers for seq_files that work with list_head-s
Goto Forum:
  


Current Time: Sun Oct 26 18:03:43 GMT 2025

Total time taken to generate the page: 0.16838 seconds