[Q] missing unused dentry in prune_dcache()? [message #7784]
Wed, 25 October 2006 12:30
vaverin
Hello folks,
I would like to ask you to clarify one question about the following patch:
http://linux.bkbits.net:8080/linux-2.6/gnupatch@449b144ecSF1rYskg3q-SeR2vf88zg
# ChangeSet
# 2006/06/22 15:05:57-07:00 neilb@suse.de
# [PATCH] Fix dcache race during umount
# If prune_dcache finds a dentry that it cannot free, it leaves it where it
# is (at the tail of the list) and exits, on the assumption that some other
# thread will be removing that dentry soon.
However, as far as I can see, this comment is not correct: when we cannot take
the s_umount rw_semaphore (for example, because it was taken in do_remount), the
dentry has already been removed from the dentry_unused list and we do not add it
back. Therefore the dentry will not be found by prune_dcache() or
shrink_dcache_sb() and will stay in memory for a very long time, until the
partition is finally unmounted.
Am I perhaps mistaken?
The patch adds this dentry to the tail of the dentry_unused list.
Signed-off-by: Vasily Averin <vvs@sw.ru>
--- linux-2.6.19-rc3/fs/dcache.c.prdch 2006-10-25 16:09:19.000000000 +0400
+++ linux-2.6.19-rc3/fs/dcache.c 2006-10-25 16:08:20.000000000 +0400
@@ -477,6 +477,8 @@ static void prune_dcache(int count, stru
}
up_read(s_umount);
}
+ list_add_tail(&dentry->d_lru, &dentry_unused);
+ dentry_stat.nr_unused++;
spin_unlock(&dentry->d_lock);
/* Cannot remove the first dentry, and it isn't appropriate
* to move it to the head of the list, so give up, and try
Re: [Q] missing unused dentry in prune_dcache()? [message #7791 is a reply to message #7786]
Wed, 25 October 2006 14:23
David Howells
Vasily Averin <vvs@sw.ru> wrote:
> >> The patch adds this dentry into tail of the dentry_unused list.
> >
> > I think that's reasonable. I wonder if we can avoid removing it from the
> > list in the first place, but I suspect it's less optimal.
>
> Could you please explain this in more detail? I do not understand why the tail
> of the list is better than the head. I also do not understand why we should bail
> out in this case. Why can't we use the next dentry in the list instead?
I meant adding it back into the list is reasonable; I didn't actually consider
where you were adding it back.
However, you've just raised a good point: moving a dentry to the head of the
list is what happens when we see it's been referenced recently. The most
recently used entry is at the head and the least at the tail.
If you look at the list scan algorithm, you can see it works from the tail
towards the head.
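For reference, the scan looks roughly like this; a simplified sketch of the
2.6.19-era prune_dcache() loop, assuming the dcache.c globals and leaving out the
per-sb skip logic and the per-dentry d_lock handling:

	spin_lock(&dcache_lock);
	for (; count; count--) {
		struct dentry *dentry;
		struct list_head *tmp = dentry_unused.prev;	/* tail = LRU end */

		if (tmp == &dentry_unused)	/* list empty, nothing to prune */
			break;
		list_del_init(tmp);
		dentry_stat.nr_unused--;
		dentry = list_entry(tmp, struct dentry, d_lru);

		/* a recently referenced dentry goes back to the head (MRU end) */
		if (dentry->d_flags & DCACHE_REFERENCED) {
			dentry->d_flags &= ~DCACHE_REFERENCED;
			list_add(&dentry->d_lru, &dentry_unused);
			dentry_stat.nr_unused++;
			continue;
		}
		prune_one_dentry(dentry);
	}
	spin_unlock(&dcache_lock);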
We could re-insert the dentry at the position from which we removed it, but
that would mean retaining a pointer to one of its neighbouring dentries. We
can do that - it's just stack space after all...
I think the main point though, is generally that the superblock that owns the
dentry is very busy in some critical way; either it's being unmounted (in which
case we may as well put the dentry at the MRU end), or it's being remounted (in
which case, it's probably best putting it back where we found it).
You also have to consider how often superblocks are going to be unmounted or
remounted vs the dcache being pruned...
There is also grab_super(), but that's only used by sget() when a superblock is
being mounted again - also unlikely.
So, given that the three ops that affect this are very unlikely to be happening
at any one time, it's probably worth just slapping it at the head and ignoring
it. The main thing is that it's reinserted somewhere.
Hope that helps...
David
Re: [Q] missing unused dentry in prune_dcache()? [message #7809 is a reply to message #7791]
Thu, 26 October 2006 11:36
vaverin
Hello David
David Howells wrote:
> Vasily Averin <vvs@sw.ru> wrote:
>>>> The patch adds this dentry into tail of the dentry_unused list.
>>> I think that's reasonable. I wonder if we can avoid removing it from the
>>> list in the first place, but I suspect it's less optimal.
>> Could you please explain this in more detail? I do not understand why the tail
>> of the list is better than the head. I also do not understand why we should bail
>> out in this case. Why can't we use the next dentry in the list instead?
>
> I meant adding it back into the list is reasonable; I didn't actually consider
> where you were adding it back.
>
> So, given that the three ops that affect this are very unlikely to be happening
> at any one time, it's probably worth just slapping it at the head and ignoring
> it. The main thing is that it's reinserted somewhere.
I don't like inserting this dentry at the tail of the list, because that blocks
shrink_dcache_memory(): it finds the "skipped" dentry at the tail of the list,
cannot free it, and bails out without making any progress.
Therefore I've removed the break out of the loop and I insert this dentry at the
head of the list instead. In theory this can lead to the same dentry being
examined a second time, but I do not think that is a big problem.
Signed-off-by: Vasily Averin <vvs@sw.ru>
--- linux-2.6.19-rc3/fs/dcache.c.prdch 2006-10-25 16:09:19.000000000 +0400
+++ linux-2.6.19-rc3/fs/dcache.c 2006-10-26 15:14:51.000000000 +0400
@@ -478,11 +478,9 @@ static void prune_dcache(int count, stru
up_read(s_umount);
}
spin_unlock(&dentry->d_lock);
- /* Cannot remove the first dentry, and it isn't appropriate
- * to move it to the head of the list, so give up, and try
- * later
- */
- break;
+ /* Inserting dentry to tail of the list leads to cycle */
+ list_add(&dentry->d_lru, &dentry_unused);
+ dentry_stat.nr_unused++;
}
spin_unlock(&dcache_lock);
}
Re: [Q] missing unused dentry in prune_dcache()? [message #7815 is a reply to message #7811]
Thu, 26 October 2006 13:23
David Howells
Vasily Averin <vvs@sw.ru> wrote:
> I've noticed one more minor issue in your patch: in
> shrink_dcache_for_umount_subtree() function you decrement
> dentry_stat.nr_dentry without dcache_lock.
How about the attached patch?
David
---
VFS: Fix an error in unused dentry counting
From: David Howells <dhowells@redhat.com>
Fix an error in unused dentry counting in shrink_dcache_for_umount_subtree() in
which the count is modified without the dcache_lock held.
Signed-Off-By: David Howells <dhowells@redhat.com>
---
fs/dcache.c | 22 +++++++++++++++-------
1 files changed, 15 insertions(+), 7 deletions(-)
diff --git a/fs/dcache.c b/fs/dcache.c
index a1ff91e..eab1bf4 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -553,16 +553,17 @@ repeat:
* - see the comments on shrink_dcache_for_umount() for a description of the
* locking
*/
-static void shrink_dcache_for_umount_subtree(struct dentry *dentry)
+static unsigned shrink_dcache_for_umount_subtree(struct dentry *dentry)
{
struct dentry *parent;
+ unsigned detached = 0;
BUG_ON(!IS_ROOT(dentry));
/* detach this root from the system */
spin_lock(&dcache_lock);
if (!list_empty(&dentry->d_lru)) {
- dentry_stat.nr_unused--;
+ detached++;
list_del_init(&dentry->d_lru);
}
__d_drop(dentry);
@@ -579,7 +580,7 @@ static void shrink_dcache_for_umount_sub
list_for_each_entry(loop, &dentry->d_subdirs,
d_u.d_child) {
if (!list_empty(&loop->d_lru)) {
- dentry_stat.nr_unused--;
+ detached++;
list_del_init(&loop->d_lru);
}
@@ -620,7 +621,7 @@ static void shrink_dcache_for_umount_sub
atomic_dec(&parent->d_count);
list_del(&dentry->d_u.d_child);
- dentry_stat.nr_dentry--; /* For d_free, below */
+ detached++; /* For d_free, below */
inode = dentry->d_inode;
if (inode) {
@@ -638,7 +639,7 @@ static void shrink_dcache_for_umount_sub
* otherwise we ascend to the parent and move to the
* next sibling if there is one */
if (!parent)
- return;
+ return detached;
dentry = parent;
@@ -663,6 +664,7 @@ static void shrink_dcache_for_umount_sub
void shrink_dcache_for_umount(struct super_block *sb)
{
struct dentry *dentry;
+ unsigned detached = 0;
if (down_read_trylock(&sb->s_umount))
BUG();
@@ -670,11 +672,17 @@ void shrink_dcache_for_umount(struct sup
dentry = sb->s_root;
sb->s_root = NULL;
atomic_dec(&dentry->d_count);
- shrink_dcache_for_umount_subtree(dentry);
+ detached = shrink_dcache_for_umount_subtree(dentry);
while (!hlist_empty(&sb->s_anon)) {
dentry = hlist_entry(sb->s_anon.first, struct dentry, d_hash);
- shrink_dcache_for_umount_subtree(dentry);
+ detached += shrink_dcache_for_umount_subtree(dentry);
+ }
+
+ if (detached) {
+ spin_lock(&dcache_lock);
+ dentry_stat.nr_unused -= detached;
+ spin_unlock(&dcache_lock);
}
}
Re: [Q] missing unused dentry in prune_dcache()? [message #7817 is a reply to message #7815]
Thu, 26 October 2006 13:58
vaverin
David Howells wrote:
> Vasily Averin <vvs@sw.ru> wrote:
>
>> I've noticed one more minor issue in your patch: in
>> shrink_dcache_for_umount_subtree() function you decrement
>> dentry_stat.nr_dentry without dcache_lock.
>
> How about the attached patch?
I would note that in the first two hunks you decremented dentry_stat.nr_unused
correctly, under dcache_lock. It is probably better to count the freed dentries
only in the third case and to correct the dentry_stat.nr_dentry value inside
shrink_dcache_for_umount_subtree() before returning.
Thank you,
Vasily Averin
Re: [Q] missing unused dentry in prune_dcache()? [message #7837 is a reply to message #7815]
Fri, 27 October 2006 06:32
vaverin
David Howells wrote:
> Vasily Averin <vvs@sw.ru> wrote:
>
>> I've noticed one more minor issue in your patch: in
>> shrink_dcache_for_umount_subtree() function you decrement
>> dentry_stat.nr_dentry without dcache_lock.
>
> How about the attached patch?
I'm sorry, but your patch is wrong: you have mixed up the accounting of two variables:
dentry_stat.nr_unused -- was correct; it was decremented under dcache_lock.
dentry_stat.nr_dentry -- was incorrect; it was decremented without dcache_lock.
You should have corrected dentry_stat.nr_dentry, but instead you broke the
accounting of dentry_stat.nr_unused.
I've fixed this issue with the following patch.
Thank you,
Vasily Averin
---
VFS: Fix an error in dentry_stat.nr_dentry counting
From: Vasily Averin <vvs@sw.ru>
Fix an error in dentry_stat.nr_dentry counting in
shrink_dcache_for_umount_subtree() in which the count is modified without the
dcache_lock held.
Signed-Off-By: Vasily Averin <vvs@sw.ru>
--- linux-2.6.19-rc3/fs/dcache.c.nrdntr 2006-10-26 15:14:51.000000000 +0400
+++ linux-2.6.19-rc3/fs/dcache.c 2006-10-27 10:24:07.000000000 +0400
@@ -554,6 +554,7 @@ repeat:
static void shrink_dcache_for_umount_subtree(struct dentry *dentry)
{
struct dentry *parent;
+ unsigned detached = 0;
BUG_ON(!IS_ROOT(dentry));
@@ -618,7 +619,7 @@ static void shrink_dcache_for_umount_sub
atomic_dec(&parent->d_count);
list_del(&dentry->d_u.d_child);
- dentry_stat.nr_dentry--; /* For d_free, below */
+ detached++;
inode = dentry->d_inode;
if (inode) {
@@ -635,8 +636,8 @@ static void shrink_dcache_for_umount_sub
/* finished when we fall off the top of the tree,
* otherwise we ascend to the parent and move to the
* next sibling if there is one */
- if (!parent)
- return;
+ if (!parent)
+ goto out;
dentry = parent;
@@ -645,6 +646,11 @@ static void shrink_dcache_for_umount_sub
dentry = list_entry(dentry->d_subdirs.next,
struct dentry, d_u.d_child);
}
+out:
+ /* several dentries were freed, need to correct nr_dentry */
+ spin_lock(&dcache_lock);
+ dentry_stat.nr_dentry -= detached;
+ spin_unlock(&dcache_lock);
}
/*
Re: [Q] missing unused dentry in prune_dcache()? [message #7838 is a reply to message #7837]
Fri, 27 October 2006 06:50
vaverin
Vasily Averin wrote:
> David Howells wrote:
>> Vasily Averin <vvs@sw.ru> wrote:
>>
>>> I've noticed one more minor issue in your patch: in
>>> shrink_dcache_for_umount_subtree() function you decrement
>>> dentry_stat.nr_dentry without dcache_lock.
>> How about the attached patch?
> I'm sorry, but your patch is wrong:
> you have mixed calculation of 2 variables:
> dentry_stat.nr_unused -- were correct, it was decremented under dcache_lock.
> dentry_stat.nr_dentry -- were incorrect, it was decremented without dcache_lock.
>
> You should correct dentry_stat.nr_dentry, but instead you broke calculation of
> dentry_stat.nr_unused.
>
> I've fixed this issue by following patch.
Here is a corrected version; an extra space has been removed.
Thank you,
Vasily Averin
---
VFS: Fix an error in dentry_stat.nr_dentry counting
From: Vasily Averin <vvs@sw.ru>
Fix an error in dentry_stat.nr_dentry counting in
shrink_dcache_for_umount_subtree() in which the count is modified without the
dcache_lock held.
Signed-Off-By: Vasily Averin <vvs@sw.ru>
--- linux-2.6.19-rc3/fs/dcache.c.nrdntr 2006-10-26 15:14:51.000000000 +0400
+++ linux-2.6.19-rc3/fs/dcache.c 2006-10-27 10:45:11.000000000 +0400
@@ -554,6 +554,7 @@ repeat:
static void shrink_dcache_for_umount_subtree(struct dentry *dentry)
{
struct dentry *parent;
+ unsigned detached = 0;
BUG_ON(!IS_ROOT(dentry));
@@ -618,7 +619,7 @@ static void shrink_dcache_for_umount_sub
atomic_dec(&parent->d_count);
list_del(&dentry->d_u.d_child);
- dentry_stat.nr_dentry--; /* For d_free, below */
+ detached++;
inode = dentry->d_inode;
if (inode) {
@@ -636,7 +637,7 @@ static void shrink_dcache_for_umount_sub
* otherwise we ascend to the parent and move to the
* next sibling if there is one */
if (!parent)
- return;
+ goto out;
dentry = parent;
@@ -645,6 +646,11 @@ static void shrink_dcache_for_umount_sub
dentry = list_entry(dentry->d_subdirs.next,
struct dentry, d_u.d_child);
}
+out:
+ /* several dentries were freed, need to correct nr_dentry */
+ spin_lock(&dcache_lock);
+ dentry_stat.nr_dentry -= detached;
+ spin_unlock(&dcache_lock);
}
/*
Re: [Q] missing unused dentry in prune_dcache()? [message #7850 is a reply to message #7844]
Fri, 27 October 2006 10:42
David Howells
Vasily Averin <vvs@sw.ru> wrote:
> Therefore I believe that my patch is optimal solution.
I'm not sure that prune_dcache() is particularly optimal. If we're looking to
prune for a specific superblock, it may scan most of the dentry_unused list
several times, once for each dentry it eliminates.
Imagine the list with a million dentries on it. Imagine further that all the
dentries you're trying to eliminate are up near the head end: you're going to
have to scan most of the list several times unnecessarily; if you're asked to
kill 128 dentries, you might wind up examining on the order of 100,000,000
dentries, 99% of which you scan 128 times.
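(In other words, using the numbers above: each of the 128 prune passes walks most
of the million-entry list before finding its victim, so the total work is on the
order of 128 x 10^6, i.e. roughly 10^8 list entries examined just to free 128 of
them.)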
I wonder if this could be improved by making the assumption that there won't be
any entries inserted tailwards of where we've just looked. The problem is that
if dcache_lock is dropped, we've no way of keeping track of the current
position without inserting a marker into the list.
Now we could do the marker thing quite easily. We'd have to insert a dummy
dcache entry, probably with d_sb pointing to some special location that is
recognised as saying "that dentry is a marker".
We could do something like the attached patch, for example. Note that the
patch compiles, but I haven't tested it. It also uses a big chunk of stack
space for the marker. It ought to be safe enough with respect to the other
functions that touch that list - all of those deal with specific dentries or
look for dentries by superblock.
David
---
diff --git a/fs/dcache.c b/fs/dcache.c
index eab1bf4..a1cae74 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -395,18 +395,27 @@ static void prune_one_dentry(struct dent
* This function may fail to free any resources if
* all the dentries are in use.
*/
-
+
static void prune_dcache(int count, struct super_block *sb)
{
+ struct dentry marker = {
+ .d_sb = (struct super_block *) &prune_dcache,
+ };
+
+ struct list_head *tmp;
+
spin_lock(&dcache_lock);
+ list_add_tail(&marker.d_lru, &dentry_unused);
+
for (; count ; count--) {
struct dentry *dentry;
- struct list_head *tmp;
struct rw_semaphore *s_umount;
cond_resched_lock(&dcache_lock);
- tmp = dentry_unused.prev;
+ tmp = marker.d_lru.prev;
+ list_del_init(&marker.d_lru);
+
if (sb) {
/* Try to find a dentry for this sb, but don't try
* too hard, if they aren't near the tail they will
@@ -418,9 +427,18 @@ static void prune_dcache(int count, stru
skip--;
tmp = tmp->prev;
}
+ } else {
+ /* We may not be the only pruner */
+ while (tmp != &dentry_unused) {
+ dentry = list_entry(tmp, struct dentry, d_lru);
+ if (dentry->d_sb !=
+ (struct super_block *) &prune_dcache)
+ break;
+ }
}
if (tmp == &dentry_unused)
break;
+ list_add(&marker.d_lru, tmp);
list_del_init(tmp);
prefetch(dentry_unused.prev);
dentry_stat.nr_unused--;
@@ -439,7 +457,7 @@ static void prune_dcache(int count, stru
/* If the dentry was recently referenced, don't free it. */
if (dentry->d_flags & DCACHE_REFERENCED) {
dentry->d_flags &= ~DCACHE_REFERENCED;
- list_add(&dentry->d_lru, &dentry_unused);
+ list_add(&dentry->d_lru, &marker.d_lru);
dentry_stat.nr_unused++;
spin_unlock(&dentry->d_lock);
continue;
@@ -478,12 +496,10 @@ static void prune_dcache(int count, stru
up_read(s_umount);
}
spin_unlock(&dentry->d_lock);
- /* Cannot remove the first dentry, and it isn't appropriate
- * to move it to the head of the list, so give up, and try
- * later
- */
- break;
+ list_add(&dentry->d_lru, &marker.d_lru);
+ dentry_stat.nr_unused++;
}
+ list_del(&marker.d_lru);
spin_unlock(&dcache_lock);
}
Re: [Q] missing unused dentry in prune_dcache()? [message #7851 is a reply to message #7850]
Fri, 27 October 2006 11:50
vaverin
David,
David Howells wrote:
> Vasily Averin <vvs@sw.ru> wrote:
>> Therefore I believe that my patch is optimal solution.
> I'm not sure that prune_dcache() is particularly optimal.
I meant that my patch is an optimal solution for the problem in the subject line.
I would like to ask you to approve it, and then we can move on to the next issue.
> If we're looking to
> prune for a specific superblock, it may scan most of the dentry_unused list
> several times, once for each dentry it eliminates.
>
> Imagine the list with a million dentries on it. Imagine further that all the
> dentries you're trying to eliminate are up near the head end: you're going to
> have to scan most of the list several times unnecessarily; if you're asked to
> kill 128 dentries, you might wind up examining on the order of 100,000,000
> dentries, 99% of which you scan 128 times.
I would note that we (the Virtuozzo/OpenVZ team) have seen a similar issue in
practice. We have kernels that handle a few dozen virtual servers, each of which
has several isolated filesystems. We have seen umount (and remount) run very
slowly, cycling inside shrink_dcache_sb() for up to several hours with the
s_umount semaphore held.
We are trying to resolve this issue by using a per-sb LRU list. I'm preparing a
patch for 2.6.19-rc3 right now and am going to send it soon.
thank you,
Vasily Averin
[PATCH 2.6.19-rc3] VFS: missing unused dentry in prune_dcache() [message #7857 is a reply to message #7791]
Fri, 27 October 2006 13:42
vaverin
Andrew,
could you please use this patch instead of
missing-unused-dentry-in-prune_dcache.patch?
As far as I understand, David Howells has no objections to it.
Thank you,
Vasily Averin
---
VFS: missing unused dentry in prune_dcache()
From: Vasily Averin <vvs@sw.ru>
If we cannot delete a dentry, we should insert it back into the dentry_unused list.
To prevent a cycle, and so as not to block the prune_dcache() call from
shrink_dcache_memory(), we add this dentry to the head of the list.
Signed-off-by: Vasily Averin <vvs@sw.ru>
--- linux-2.6.19-rc3/fs/dcache.c.prdch 2006-10-25 16:09:19.000000000 +0400
+++ linux-2.6.19-rc3/fs/dcache.c 2006-10-26 15:14:51.000000000 +0400
@@ -478,11 +478,9 @@ static void prune_dcache(int count, stru
up_read(s_umount);
}
spin_unlock(&dentry->d_lock);
- /* Cannot remove the first dentry, and it isn't appropriate
- * to move it to the head of the list, so give up, and try
- * later
- */
- break;
+ /* Inserting dentry to tail of the list leads to cycle */
+ list_add(&dentry->d_lru, &dentry_unused);
+ dentry_stat.nr_unused++;
}
spin_unlock(&dcache_lock);
}
[PATCH 2.6.19-rc3] VFS: per-sb dentry lru list [message #7860 is a reply to message #7852]
Fri, 27 October 2006 14:05
vaverin
From: Vasily Averin <vvs@sw.ru>
The Virtuozzo/OpenVZ Linux kernel team has discovered that umount/remount can last
for hours, looping in shrink_dcache_sb() without much success. Since the s_umount
semaphore is held during shrinking, lots of other unrelated operations, such as
sync, can stop working until the shrink has finished.
It happens because of a very long unused-dentries list (more than a million
dentries): it takes shrink_dcache_sb() longer than 1/HZ seconds to get to the
required dentry, and after freeing this single dentry it reschedules and restarts.
The proposed fix prevents this issue by using a per-sb dentry LRU list. It makes
the search for the unused dentries of a given super block very quick, thus ensuring
that shrinking always makes good progress.
It was well tested on 2.6.9-based Virtuozzo/OpenVZ kernels, but the port to the
latest mainstream kernel is not tested.
Signed-off-by: Kirill Korotaev <dev@openvz.org>
Signed-off-by: Vasily Averin <vvs@sw.ru>
--- linux-2.6.19-rc3/fs/dcache.c.shrsb 2006-10-27 10:45:11.000000000 +0400
+++ linux-2.6.19-rc3/fs/dcache.c 2006-10-27 15:49:54.000000000 +0400
@@ -173,6 +173,7 @@ repeat:
if (list_empty(&dentry->d_lru)) {
dentry->d_flags |= DCACHE_REFERENCED;
list_add(&dentry->d_lru, &dentry_unused);
+ list_add(&dentry->d_sb_lru, &dentry->d_sb->s_dentry_unused);
dentry_stat.nr_unused++;
}
spin_unlock(&dentry->d_lock);
@@ -190,6 +191,7 @@ kill_it: {
*/
if (!list_empty(&dentry->d_lru)) {
list_del(&dentry->d_lru);
+ list_del(&dentry->d_sb_lru);
dentry_stat.nr_unused--;
}
list_del(&dentry->d_u.d_child);
@@ -270,6 +272,7 @@ static inline struct dentry * __dget_loc
if (!list_empty(&dentry->d_lru)) {
dentry_stat.nr_unused--;
list_del_init(&dentry->d_lru);
+ list_del_init(&dentry->d_sb_lru);
}
return dentry;
}
@@ -398,6 +401,10 @@ static void prune_one_dentry(struct dent
static void prune_dcache(int count, struct super_block *sb)
{
+ struct list_head *lru_head;
+
+ lru_head = sb ? &sb->s_dentry_unused : &dentry_unused;
+
spin_lock(&dcache_lock);
for (; count ; count--) {
struct dentry *dentry;
@@ -406,25 +413,16 @@ static void prune_dcache(int count, stru
cond_resched_lock(&dcache_lock);
- tmp = dentry_unused.prev;
- if (sb) {
- /* Try to find a dentry for this sb, but don't try
- * too hard, if they aren't near the tail they will
- * be moved down again soon
- */
- int skip = count;
- while (skip && tmp != &dentry_unused &&
- list_entry(tmp, struct dentry, d_lru)->d_sb != sb) {
- skip--;
- tmp = tmp->prev;
- }
- }
- if (tmp == &dentry_unused)
+ tmp = lru_head->prev;
+ if (tmp == lru_head)
break;
- list_del_init(tmp);
- prefetch(dentry_unused.prev);
+
+ prefetch(lru_head->prev);
dentry_stat.nr_unused--;
- dentry = list_entry(tmp, struct dentry, d_lru);
+ dentry = sb ? list_entry(tmp, struct dentry, d_sb_lru) :
+ list_entry(tmp, struct dentry, d_lru);
+ list_del_init(&dentry->d_lru);
+ list_del_init(&dentry->d_sb_lru);
spin_lock(&dentry->d_lock);
/*
@@ -440,6 +438,8 @@ static void prune_dcache(int count, stru
if (dentry->d_flags & DCACHE_REFERENCED) {
dentry->d_flags &= ~DCACHE_REFERENCED;
list_add(&dentry->d_lru, &dentry_unused);
+ list_add(&dentry->d_sb_lru,
+ &dentry->d_sb->s_dentry_unused);
dentry_stat.nr_unused++;
spin_unlock(&dentry->d_lock);
continue;
@@ -455,7 +455,8 @@ static void prune_dcache(int count, stru
* If this dentry is for "my" filesystem, then I can prune it
* without taking the s_umount lock (I already hold it).
*/
- if (sb && dentry->d_sb == sb) {
+ if (sb) {
+ BUG_ON(dentry->d_sb != sb);
prune_one_dentry(dentry);
continue;
}
@@ -480,6 +481,8 @@ static void prune_dcache(int count, stru
spin_unlock(&dentry->d_lock);
/* Inserting dentry to tail of the list leads to cycle */
list_add(&dentry->d_lru, &dentry_unused);
+ list_add(&dentry->d_sb_lru,
+ &dentry->d_sb->s_dentry_unused);
dentry_stat.nr_unused++;
}
spin_unlock(&dcache_lock);
@@ -509,31 +512,15 @@ static void prune_dcache(int count, stru
void shrink_dcache_sb(struct super_block * sb)
{
- struct list_head *tmp, *next;
- struct dentry *dentry;
-
- /*
- * Pass one ... move the dentries for the specified
- * superblock to the most recent end of the unused list.
- */
spin_lock(&dcache_lock);
- list_for_each_safe(tmp, next, &dentry_unused) {
- dentry = list_entry(tmp, struct dentry, d_lru);
- if (dentry->d_sb != sb)
- continue;
- list_move(tmp, &dentry_unused);
- }
+ while (!list_empty(&sb->s_dentry_unused)) {
+ struct dentry *dentry;
- /*
- * Pass two ... free the dentries for this superblock.
- */
-repeat:
- list_for_each_safe(tmp, next, &dentry_unused) {
- dentry = list_entry(tmp, struct dentry, d_lru);
- if (dentry->d_sb != sb)
- continue;
+ dentry = list_entry((&sb->s_dentry_unused)->next,
+ struct dentry, d_sb_lru);
dentry_stat.nr_unused--;
- list_del_init(tmp);
+ list_del_init(&dentry->d_lru);
+ list_del_init(&dentry->d_sb_lru);
spin_lock(&dentry->d_lock);
if (atomic_read(&dentry->d_count)) {
spin_unlock(&dentry->d_lock);
@@ -541,7 +528,6 @@ repeat:
}
prune_one_dentry(dentry);
cond_resched_lock(&dcache_lock);
- goto repeat;
}
spin_unlock(&dcache_lock);
}
@@ -563,6 +549,7 @@ static void shrink_dcache_for_umount_sub
if (!list_empty(&dentry->d_lru)) {
dentry_stat.nr_unused--;
list_del_init(&dentry->d_lru);
+ list_del_init(&dentry->d_sb_lru);
}
__d_drop(dentry);
spin_unlock(&dcache_lock);
@@ -580,6 +567,7 @@ static void shrink_dcache_for_umount_sub
if (!list_empty(&loop->d_lru)) {
dentry_stat.nr_unused--;
list_del_init(&loop->d_lru);
+ list_del_init(&loop->d_sb_lru);
}
__d_drop(loop);
@@ -766,6 +754,7 @@ resume:
if (!list_empty(&dentry->d_lru)) {
dentry_stat.nr_unused--;
list_del_init(&dentry->d_lru);
+ list_del_init(&dentry->d_sb_lru);
}
/*
* move only zero ref count dentries to the end
@@ -773,6 +762,8 @@ resume:
*/
if (!atomic_read(&dentry->d_count)) {
list_add_tail(&dentry->d_lru, &dentry_unused);
+ list_add_tail(&dentry->d_sb_lru,
+ &dentry->d_sb->s_dentry_unused);
dentry_stat.nr_unused++;
found++;
}
@@ -892,6 +883,7 @@ struct dentry *d_alloc(struct dentry * p
#endif
INIT_HLIST_NODE(&dentry->d_hash);
INIT_LIST_HEAD(&dentry->d_lru);
+ INIT_LIST_HEAD(&dentry->d_sb_lru);
INIT_LIST_HEAD(&dentry->d_subdirs);
INIT_LIST_HEAD(&dentry->d_alias);
--- linux-2.6.19-rc3/fs/super.c.shrsb 2006-10-24 10:29:13.000000000 +0400
+++ linux-2.6.19-rc3/fs/super.c 2006-10-27 15:36:33.000000000 +0400
@@ -69,6 +69,7 @@ static struct super_block *alloc_super(s
INIT_LIST_HEAD(&s->s_io);
INIT_LIST_HEAD(&s->s_files);
INIT_LIST_HEAD(&s->s_instances);
+ INIT_LIST_HEAD(&s->s_dentry_unused);
INIT_HLIST_HEAD(&s->s_anon);
INIT_LIST_HEAD(&s->s_inodes);
init_rwsem(&s->s_umount);
--- linux-2.6.19-rc3/include/linux/dcache.h.shrsb 2006-10-24 10:29:14.000000000 +0400
+++ linux-2.6.19-rc3/include/linux/dcache.h 2006-10-27 15:36:33.000000000 +0400
@@ -94,6 +94,7 @@ struct dentry {
struct qstr d_name;
struct list_head d_lru; /* LRU list */
+ struct list_head d_sb_lru; /* per-sb LRU list */
/*
* d_child and d_rcu can share memory
*/
--- linux-2.6.19-rc3/include/linux/fs.h.shrsb 2006-10-24 10:29:14.000000000 +0400
+++ linux-2.6.19-rc3/include/linux/fs.h 2006-10-27 15:36:33.000000000 +0400
@@ -939,6 +939,7 @@ struct super_block {
struct list_head s_dirty; /* dirty inodes */
struct list_head s_io; /* parked for writeback */
struct hlist_head s_anon; /* anonymous dentries for (nfs) exporting */
+ struct list_head s_dentry_unused;
struct list_head s_files;
struct block_device *s_bdev;
Re: [PATCH 2.6.19-rc3] VFS: per-sb dentry lru list [message #7916 is a reply to message #7914]
Mon, 30 October 2006 15:27
dev
> Quick search maybe, but your patch adds 2 pointers to each dentry in the
> system... That's pretty expensive, as dentries are already using a *lot* of
> ram.
I don't see much of a problem with it... it is a cache and it can be pruned if
needed. Some time ago, for example, my patch introducing the same kind of list
for inodes was committed.
> Maybe an alternative would be to not have anymore a global dentry_unused, but
> only per-sb unused dentries lists ?
I don't know of a global LRU implementation based on per-sb lists; do you?
If someone suggests an algorithm for a more or less fair global LRU
based on non-global lists, we will implement it. However, so far,
AFAICS there have been problems with it.
Thanks,
Kirill
Re: [PATCH 2.6.19-rc3] VFS: per-sb dentry lru list [message #7920 is a reply to message #7916]
Mon, 30 October 2006 16:09
Eric Dumazet
On Monday 30 October 2006 16:34, Kirill Korotaev wrote:
> > Quick search maybe, but your patch adds 2 pointers to each dentry in the
> > system... That's pretty expensive, as dentries are already using a *lot*
> > of ram.
>
> I don't see much problems with it... it is cache and it can be pruned if
> needed. Some time ago, for example, my patch introducing the same list for
> inodes was commited.
The ratio of dentries per page is higher than the ratio of inodes per page
(on an x86_64 machine, 19 dentries per page versus 5 ext3 inodes per page).
Therefore I suspect that the number of pages locked in dentry_cache because of
one in-use dentry is higher.
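(As a rough check, assuming 4 KiB pages and 8-byte pointers: 4096/19 is about 215
bytes per dentry versus 4096/5, about 819 bytes per ext3 inode, so the extra
list_head added by the per-sb patch (16 bytes) grows each dentry by roughly 7-8%,
a noticeably larger relative cost than the same addition would be for an inode.)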
>
> > Maybe an alternative would be to not have anymore a global dentry_unused,
> > but only per-sb unused dentries lists ?
>
> I don't know global LRU implementation based on per-sb lists, do you?
> If someone suggest the algorithm for more or less fair global LRU
> based on non-global list we will implement it. However, so far,
> AFAICS there were problems with it.
Using 32 bytes per dentry for the unused LRUs sounds like too much; maybe we
should switch LRU handling to timestamps or something like that.
Re: Re: [PATCH 2.6.19-rc3] VFS: per-sb dentry lru list [message #7937 is a reply to message #7887]
Tue, 31 October 2006 04:38
Neil Brown
|
On Monday October 30, dev@sw.ru wrote:
> David,
>
> >>The proposed fix prevents this issue by using per-sb dentry LRU list. It
> >>provides very quickly search for the unused dentries for given super block thus
> >>forcing shrinking always making good progress.
> >
> >
> > We've been down this path before:
> >
> > http://marc.theaimsgroup.com/?l=linux-kernel&m=114861109717260&w=2
> >
> > A lot of comments on per-sb unused dentry lists were made in
> > the threads associated with the above. other solutions were
> > found to the problem that the above patch addressed, but I don't
> > think any of them have made it to mainline yet. You might want
> > to have a bit of a read of these threads first...
> The major difference between our patch and the one discussed in the link
> >>is that we keep both global and per-sb dentry LRU lists.
> Thus, when needed normal LRU is used and prune logic is unchanged,
> while umount/remount use per-sb list and do its job faster.
Yes, we have been down this path before - several times I think.
Below is the patch that I like (not tested recently - just rediffed
and reviewed).
NeilBrown
Subject: Reduce contention in dentry_unused when unmounting.
When we unmount a filesystem we need to release all dentries.
We currently
- move a collection of dentries to the end of the dentry_unused list
- call prune_dcache to prune that number of dentries.
If lots of other dentries are added to the end of the list while
the prune_dcache proceeds (e.g. another filesystem is unmounted),
this can involve a lot of wasted time wandering through the
list looking for dentries that we had previously found.
This patch allows the dentry_unused list to temporarily be multiple
lists.
When unmounting, dentries that are found to require pruning are moved
to a temporary list, but accounted as though they were on dentry_unused.
Then this list is passed to prune_dcache for freeing. Any entries
that are not pruned for whatever reason are added to the end of
dentry_unused.
Also change shrink_dcache_sb to simply call shrink_dcache_parent.
This avoids a long walk of the LRU.
Signed-off-by: Neil Brown <neilb@suse.de>
### Diffstat output
./fs/dcache.c | 104 ++++++++++++++++------------------------------------------
1 file changed, 30 insertions(+), 74 deletions(-)
diff .prev/fs/dcache.c ./fs/dcache.c
--- .prev/fs/dcache.c 2006-10-31 15:22:10.000000000 +1100
+++ ./fs/dcache.c 2006-10-31 15:37:22.000000000 +1100
@@ -384,8 +384,8 @@ static void prune_one_dentry(struct dent
/**
* prune_dcache - shrink the dcache
* @count: number of entries to try and free
- * @sb: if given, ignore dentries for other superblocks
- * which are being unmounted.
+ * @list: If given, remove from this list instead of
+ * from dentry_unused.
*
* Shrink the dcache. This is done when we need
* more memory, or simply when we need to unmount
@@ -394,11 +394,21 @@ static void prune_one_dentry(struct dent
*
* This function may fail to free any resources if
* all the dentries are in use.
+ *
+ * Any dentries that were not removed due to the @count
+ * limit will be splice on to the end of dentry_unused,
+ * so they should already be founded in dentry_stat.nr_unused.
*/
-static void prune_dcache(int count, struct super_block *sb)
+static void prune_dcache(int count, struct list_head *list)
{
+ int have_list = list != NULL;
+
spin_lock(&dcache_lock);
+ if (!have_list)
+ /* use the dentry_unused list */
+ list = &dentry_unused;
+
for (; count ; count--) {
struct dentry *dentry;
struct list_head *tmp;
@@ -406,23 +416,11 @@ static void prune_dcache(int count, stru
cond_resched_lock(&dcache_lock);
- tmp = dentry_unused.prev;
- if (sb) {
- /* Try to find a dentry for this sb, but don't try
- * too hard, if they aren't near the tail they will
- * be moved down again soon
- */
- int skip = count;
- while (skip && tmp != &dentry_unused &&
- list_entry(tmp, struct dentry, d_lru)->d_sb != sb) {
- skip--;
- tmp = tmp->prev;
- }
- }
- if (tmp == &dentry_unused)
+ tmp = list->prev;
+ if (tmp == list)
break;
list_del_init(tmp);
- prefetch(dentry_unused.prev);
+ prefetch(list->prev);
dentry_stat.nr_unused--;
dentry = list_entry(tmp, struct dentry, d_lru);
@@ -455,7 +453,7 @@ static void prune_dcache(int count, stru
* If this dentry is for "my" filesystem, then I can prune it
* without taking the s_umount lock (I already hold it).
*/
- if (sb && dentry->d_sb == sb) {
+ if (have_list) {
prune_one_dentry(dentry);
continue;
}
@@ -485,68 +483,25 @@ static void prune_dcache(int count, stru
list_add(&dentry->d_lru, &dentry_unused);
dentry_stat.nr_unused++;
}
+ /* split any remaining entries back onto dentry_unused */
+ if (have_list)
+ list_splice(list, dentry_unused.prev);
spin_unlock(&dcache_lock);
}
-/*
- * Shrink the dcache for the specified super block.
- * This allows us to unmount a device without disturbing
- * the dcache for the other devices.
- *
- * This implementation makes just two traversals of the
- * unused list. On the first pass we move the selected
- * dentries to the most recent end, and on the second
- * pass we free them. The second pass must restart after
- * each dput(), but since the target dentries are all at
- * the end, it's really just a single traversal.
- */
-
/**
* shrink_dcache_sb - shrink dcache for a superblock
* @sb: superblock
*
* Shrink the dcache for the specified super block. This
- * is used to free the dcache before unmounting a file
- * system
+ * is used to reduce the dcache presence of a file system
+ * before re-mounting, and when invalidating the device
+ * holding a file system.
*/
void shrink_dcache_sb(struct super_block * sb)
{
- struct list_head *tmp, *next;
- struct dentry *dentry;
-
- /*
- * Pass one ... move the dentries for the specified
- * superblock to the most recent end of the unused list.
- */
- spin_lock(&dcache_lock);
- list_for_each_safe(tmp, next, &dentry_unused) {
- dentry = list_entry(tmp, struct dentry, d_lru);
- if (dentry->d_sb != sb)
- continue;
- list_move(tmp, &dentry_unused);
- }
-
- /*
- * Pass two ... free the dentries for this superblock.
- */
-repeat:
- list_for_each_safe(tmp, next, &dentry_unused) {
- dentry = list_entry(tmp, struct dentry, d_lru);
- if (dentry->d_sb != sb)
- continue;
- dentry_stat.nr_unused--;
- list_del_init(tmp);
- spin_lock(&dentry->d_lock);
- if (atomic_read(&dentry->d_count)) {
- spin_unlock(&dentry->d_lock);
- continue;
- }
- prune_one_dentry(dentry);
- cond_resched_lock(&dcache_lock);
- goto repeat;
- }
- spin_unlock(&dcache_lock);
+ shrink_dcache_parent(sb->s_root);
}
/*
@@ -739,7 +694,7 @@ positive:
/*
* Search the dentry child list for the specified parent,
- * and move any unused dentries to the end of the unused
+ * and move any unused dentries to the end of a new unused
* list for prune_dcache(). We descend to the next level
* whenever the d_subdirs list is non-empty and continue
* searching.
@@ -751,7 +706,7 @@ positive:
* drop the lock and return early due to latency
* constraints.
*/
-static int select_parent(struct dentry * parent)
+static int select_parent(struct dentry * parent, struct list_head *new)
{
struct dentry *this_parent = parent;
struct list_head *next;
@@ -775,7 +730,7 @@ resume:
* of the unused list for prune_dcache
*/
if (!atomic_read(&dentry->d_count)) {
- list_add_tail(&dentry->d_lru, &dentry_unused);
+ list_add_tail(&dentry->d_lru, new);
dentry_stat.nr_unused++;
found++;
}
@@ -819,9 +774,10 @@ out:
void shrink_dcache_parent(struct dentry * parent)
{
int found;
+ LIST_HEAD(list);
- while ((found = select_parent(parent)) != 0)
- prune_dcache(found, parent->d_sb);
+ while ((found = select_parent(parent, &list)) != 0)
+ prune_dcache(found, &list);
}
/*
Re: [PATCH 2.6.19-rc3] VFS: per-sb dentry lru list [message #7950 is a reply to message #7937]
Tue, 31 October 2006 13:08
vaverin
Hello Neil,
Neil Brown wrote:
>>> We've been down this path before:
>>>
>>> http://marc.theaimsgroup.com/?l=linux-kernel&m=114861109717260&w=2
>>>
>>> A lot of comments on per-sb unused dentry lists were made in
>>> the threads associated with the above. other solutions were
>>> found to the problem that the above patch addressed, but I don't
>>> think any of them have made it to mainline yet. You might want
>>> to have a bit of a read of these threads first...
>> The major difference between our patch and the one discussed in the link
>> is that we keep both global and per-sb dentry LRU lists.
>> Thus, when needed normal LRU is used and prune logic is unchanged,
>> while umount/remount use per-sb list and do its job faster.
>
> Yes, we have been down this path before - several times I think.
> Below is the patch that I like (not tested recently - just rediffed
> and reviewed).
I've reviewed your patch; in general it looks great, and I very much like the idea
of having only one place that frees the unused dentries.
I have several minor notes and one big question:
> @@ -485,68 +483,25 @@ static void prune_dcache(int count, stru
> list_add(&dentry->d_lru, &dentry_unused);
> dentry_stat.nr_unused++;
> }
> + /* split any remaining entries back onto dentry_unused */
> + if (have_list)
> + list_splice(list, dentry_unused.prev);
> spin_unlock(&dcache_lock);
> }
- It seems to me this hunk is not required here. You know how many entries are
present in the private list, and nobody can add new dentries to it. Therefore
I think your list should be empty on exit.
- I do not like the idea of splitting the dentry_unused list; IMHO it breaks the
LRU ordering slightly (if somebody on another CPU shrinks the dcache). On the
other hand this split is only temporary, so it should not lead to any problems.
Unfortunately I'm not sure that it can help us against long remounts. As far as I
understand, the running time of shrink_dcache_sb() depends on the subtree size,
and you need to walk the whole tree several (at least 3) times:
1) to move all the unused dentries from dentry_unused to the private list;
2) to collect all the non-freed dentries (prune_dcache will insert
DCACHE_REFERENCED dentries back into the global LRU list);
3) to check that no unused dentries are left.
If the tree is large, walking it takes too long and we can be rescheduled. In
those cases steps 1) and 2) may be restarted several times. In the worst case we
will prune only one dentry per pass, and that is without any filesystem activity.
Now let's assume that we do the same on a live filesystem. I'm afraid in this
case shrink_dcache_sb() may run for an unpredictably long time with the s_umount
semaphore held. :( I would note that this is a real issue, and some of our users
have complained about this behaviour.
With a per-sb LRU we would not have this overhead: we would be able to traverse
the list of unused dentries directly. Heavy activity on the filesystem would delay
us too, but we would not depend on the filesystem size.
As far as I understand, nobody is against a per-sb LRU, but nobody wants to add
new fields to the dentry struct.
Let's look at the possible alternatives:
1) per-sb LRU + global LRU (the OpenVZ patch): it is the fastest and (IMHO)
performance-optimal, but it requires 2 additional pointers in the dentry struct.
2) per-sb LRU + per-dentry timestamp (Eric Dumazet's idea):
shrink_dcache_sb() and the per-sb prune_dcache(count, sb) are optimal (as in
case 1), but prune_dcache(count, NULL) called from shrink_dcache_memory() will
run more slowly than in the current version. It requires only one additional
field in the dentry struct for the timestamp, but perhaps we could reuse one of
the existing fields of the dentry struct (d_count? d_time?) for this purpose?
3) we could move both LRU links out of struct dentry into some dynamically
allocated structure. It would be allocated for each new unused dentry and could
contain both LRU fields and probably some other fields. This way we could also
decrease the size of struct dentry. However, the ratio of used to unused dentries
is too low, so this probably would not bring any advantage.
4) think about how we can modify the other proposed patches (Neil Brown, Dave
Chinner, David Howells);
5) any new ideas?
Thank you,
Vasily Averin
Re: Re: [PATCH 2.6.19-rc3] VFS: per-sb dentry lru list [message #7970 is a reply to message #7937]
Wed, 01 November 2006 10:48
dev
Neil Brown wrote:
> On Monday October 30, dev@sw.ru wrote:
>
>>David,
>>
>>
>>>>The proposed fix prevents this issue by using per-sb dentry LRU list. It
>>>>provides very quickly search for the unused dentries for given super block thus
>>>>forcing shrinking always making good progress.
>>>
>>>
>>>We've been down this path before:
>>>
>>> http://marc.theaimsgroup.com/?l=linux-kernel&m=114861109717260&w=2
>>>
>>>A lot of comments on per-sb unused dentry lists were made in
>>>the threads associated with the above. other solutions were
>>>found to the problem that the above patch addressed, but I don't
>>>think any of them have made it to mainline yet. You might want
>>>to have a bit of a read of these threads first...
>>
>>The major difference between our patch and the one discussed in the link
>>is that we keep both global and per-sb dentry LRU lists.
>>Thus, when needed normal LRU is used and prune logic is unchanged,
>>while umount/remount use per-sb list and do its job faster.
>
>
> Yes, we have been down this path before - several times I think.
> Below is the patch that I like (not tested recently - just rediffed
> and reviewed).
>
> NeilBrown
>
> Subject: Reduce contention in dentry_unused when unmounting.
>
> When we unmount a filesystem we need to release all dentries.
> We currently
> - move a collection of dentries to the end of the dentry_unused list
> - call prune_dcache to prune that number of dentries.
>
> If lots of other dentries are added to the end of the list while
> the prune_dcache proceeds (e.g. another filesystem is unmounted),
> this can involve a lot of wasted time wandering through the
> list looking for dentries that we had previously found.
>
> This patch allows the dentry_unused list to temporarily be multiple
> lists.
> When unmounting, dentries that are found to require pruning are moved
> to a temporary list, but accounted as though they were on dentry_unused.
> Then this list is passed to prune_dcache for freeing. Any entries
> that are not pruned for whatever reason are added to the end of
> dentry_unused.
>
> Also change shrink_dcache_sb to simply call shrink_dcache_parent.
> This avoids a long walk of the LRU.
>
> Signed-off-by: Neil Brown <neilb@suse.de>
>
> ### Diffstat output
> ./fs/dcache.c | 104 ++++++++++++++++------------------------------------------
> 1 file changed, 30 insertions(+), 74 deletions(-)
>
> diff .prev/fs/dcache.c ./fs/dcache.c
> --- .prev/fs/dcache.c 2006-10-31 15:22:10.000000000 +1100
> +++ ./fs/dcache.c 2006-10-31 15:37:22.000000000 +1100
> @@ -384,8 +384,8 @@ static void prune_one_dentry(struct dent
> /**
> * prune_dcache - shrink the dcache
> * @count: number of entries to try and free
> - * @sb: if given, ignore dentries for other superblocks
> - * which are being unmounted.
> + * @list: If given, remove from this list instead of
> + * from dentry_unused.
> *
> * Shrink the dcache. This is done when we need
> * more memory, or simply when we need to unmount
> @@ -394,11 +394,21 @@ static void prune_one_dentry(struct dent
> *
> * This function may fail to free any resources if
> * all the dentries are in use.
> + *
> + * Any dentries that were not removed due to the @count
> + * limit will be splice on to the end of dentry_unused,
> + * so they should already be founded in dentry_stat.nr_unused.
> */
>
> -static void prune_dcache(int count, struct super_block *sb)
> +static void prune_dcache(int count, struct list_head *list)
> {
> + int have_list = list != NULL;
> +
> spin_lock(&dcache_lock);
> + if (!have_list)
> + /* use the dentry_unused list */
> + list = &dentry_unused;
> +
> for (; count ; count--) {
> struct dentry *dentry;
> struct list_head *tmp;
> @@ -406,23 +416,11 @@ static void prune_dcache(int count, stru
>
> cond_resched_lock(&dcache_lock);
>
> - tmp = dentry_unused.prev;
> - if (sb) {
> - /* Try to find a dentry for this sb, but don't try
> - * too hard, if they aren't near the tail they will
> - * be moved down again soon
> - */
> - int skip = count;
> - while (skip && tmp != &dentry_unused &&
> - list_entry(tmp, struct dentry, d_lru)->d_sb != sb) {
> - skip--;
> - tmp = tmp->prev;
> - }
> - }
> - if (tmp == &dentry_unused)
> + tmp = list->prev;
> + if (tmp == list)
> break;
> list_del_init(tmp);
> - prefetch(dentry_unused.prev);
> + prefetch(list->prev);
> dentry_stat.nr_unused--;
> dentry = list_entry(tmp, struct dentry, d_lru);
>
> @@ -455,7 +453,7 @@ static void prune_dcache(int count, stru
> * If this dentry is for "my" filesystem, then I can prune it
> * without taking the s_umount lock (I already hold it).
> */
> - if (sb && dentry->d_sb == sb) {
> + if (have_list) {
> prune_one_dentry(dentry);
> continue;
> }
> @@ -485,68 +483,25 @@ static void prune_dcache(int count, stru
> list_add(&dentry->d_lru, &dentry_unused);
> dentry_stat.nr_unused++;
> }
> + /* split any remaining entries back onto dentry_unused */
> + if (have_list)
> + list_splice(list, dentry_unused.prev);
> spin_unlock(&dcache_lock);
> }
>
> -/*
> - * Shrink the dcache for the specified super block.
> - * This allows us to unmount a device without disturbing
> - * the dcache for the other devices.
> - *
> - * This implementation makes just two traversals of the
> - * unused list. On the first pass we move the selected
> - * dentries to the most recent end, and on the second
> - * pass we free them. The second pass must restart after
> - * each dput(), but since the target dentries are all at
> - * the end, it's really just a single traversal.
> - */
> -
> /**
> * shrink_dcache_sb - shrink dcache for a superblock
> * @sb: superblock
> *
> * Shrink the dcache for the specified super block. This
> - * is used to free the dcache before unmounting a file
> - * system
> + * is used to reduce the dcache presence of a file system
> + * before re-mounting, and when invalidating the device
> + * holding a file system.
> */
>
> void shrink_dcache_sb(struct super_block * sb)
> {
> - struct list_head *tmp, *next;
> - struct dentry *dentry;
> -
> - /*
> - * Pass one ... move the dentries for the specified
> - * superblock to the most recent end of the unused list.
> - */
> - spin_lock(&dcache_lock);
> - list_for_each_safe(tmp, next, &dentry_unused) {
> - dentry = list_entry(tmp, struct dentry, d_lru);
> - if (dentry->d_sb != sb)
> - continue;
> - list_move(tmp, &dentry_unused);
> - }
> -
> - /*
> - * Pass two ... free the dentries for this superblock.
> - */
> -repeat:
> - list_for_each_safe(tmp, next, &dentry_unused) {
> - dentry = list_entry(tmp, struct dentry, d_lru);
> - if (dentry->d_sb != sb)
> - continue;
> - dentry_stat.nr_unused--;
> - list_del_init(tmp);
> - spin_lock(&dentry->d_lock);
> - if (atomic_read(&dentry->d_count)) {
> - spin_unlock(&dentry->d_lock);
> - continue;
> - }
> - prune_one_dentry(dentry);
> - cond_resched_lock(&dcache_lock);
> - goto repeat;
> - }
> - spin_unlock(&dcache_lock);
> + shrink_dcache_parent(sb->s_root);
<<<< AFAICS, by doing this you introduce a leak of anonymous dentries.
d_alloc_anon() calls d_alloc() with parent == NULL, i.e. such dentries have no
parent and are not linked to sb->s_root...
BTW, looking at it, I found that the s_anon field of the super block is not used
any more. We could add BUG_ON(!hlist_empty(&sb->s_anon)) in
generic_shutdown_super() to catch issues like this.
Maybe we can fix it by adding something like:
	while (!hlist_empty(&sb->s_anon))
		prune_dcache(INT_MAX, &sb->s_anon);
> }
>
> /*
> @@ -739,7 +694,7 @@ positive:
>
> /*
> * Search the dentry child list for the specified parent,
> - * and move any unused dentries to the end of the unused
> + * and move any unused dentries to the end of a new unused
> * list for prune_dcache(). We descend to the next level
> * whenever the d_subdirs list is non-empty and continue
> * searching.
> @@ -751,7 +706,7 @@ positive:
> * drop the lock and return early due to latency
> * constraints.
> */
> -static int select_parent(struct dentry * parent)
> +static int select_parent(struct dentry * parent, struct list_head *new)
> {
> struct dentry *this_parent = parent;
> struct list_head *next;
> @@ -775,7 +730,7 @@ resume:
> * of the unused list for prune_dcache
> */
> if (!atomic_read(&dentry->d_count)) {
> - list_add_tail(&dentry->d_lru, &dentry_unused);
> + list_add_tail(&dentry->d_lru, new);
> dentry_stat.nr_unused++;
> found++;
> }
> @@ -819,9 +774,10 @@ out:
> void shrink_dcache_parent(struct dentry * parent)
> {
> int found;
> + LIST_HEAD(list);
>
> - while ((found = select_parent(parent)) != 0)
> - prune_dcache(found, parent->d_sb);
> + while ((found = select_parent(parent, &list)) != 0)
> + prune_dcache(found, &list);
> }
>
> /*
...
Re: Re: [PATCH 2.6.19-rc3] VFS: per-sb dentry lru list [message #7973 is a reply to message #7966]
Wed, 01 November 2006 13:32
vaverin
Hello Neil,
Neil Brown wrote:
> On Tuesday October 31, dhowells@redhat.com wrote:
>> Neil Brown <neilb@suse.de> wrote:
>>
>>> When we unmount a filesystem we need to release all dentries.
>>> We currently
>>> - move a collection of dentries to the end of the dentry_unused list
>>> - call prune_dcache to prune that number of dentries.
>> This is not true anymore.
>
> True. That should read:
>
> When we remount a filesystem or invalidate a block device which has a
> mounted filesystem we call shrink dcache_sb which currently:
> - moves a collection of dentries to the end of the dentry_unused list
> - calls prune_dcache to prune that number of dentries.
>
> but the patch is still valid.
>
> Any objections to it going in to -mm and maybe .20 ??
Currently we have 3 types of functions that work with the dentry_unused list:
1) prune_dcache(NULL) -- called from shrink_dcache_memory(); it frees memory and
requires a global LRU. It works well in the current implementation.
2) prune_dcache(sb) -- called from shrink_dcache_parent(); it frees a subtree, and
no LRU ordering is needed here. The current implementation uses the global LRU for
this purpose, which is inefficient, and the patch from Neil Brown fixes this issue.
3) shrink_dcache_sb() -- called when we need to free the unused dentries of a
given super block. The current implementation is inefficient too, and a per-sb LRU
would be the best solution here. On the other hand, the patch from Neil Brown is
still much better than the current implementation.
In general I think that we should approve Neil Brown's patch. We (Kirill Korotaev
and I) are ready to acknowledge it once the following remarks are addressed:
- it seems to me the list_splice() is not required inside prune_dcache();
- DCACHE_REFERENCED dentries should not be moved from the private list to the
dentry_unused list; this flag should be ignored when the private list is used;
- the count argument should be ignored in this case too, since we want to free all
the dentries in the private list;
- when we shrink a whole super block we should free the per-sb anonymous dentries
too (please see Kirill Korotaev's letter).
Then I'm going to prepare a new patch that will improve shrink_dcache_sb()
performance:
- we can add a new list head to struct super_block and use it in
shrink_dcache_sb() instead of a temporary private list. In dput() we will check
whether that list is empty and, if it is not, add newly unused dentries to the
per-sb list instead of the dentry_unused list; a rough sketch follows below.
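A very rough sketch of the dput() side of that idea (the s_dentry_unused field
name is borrowed from the per-sb patch above; the helper name is made up, and
this is not the actual planned code):

	/* Hypothetical helper called from dput() under dcache_lock: if the
	 * superblock's private unused list is in use (non-empty), park the
	 * newly unused dentry there; otherwise fall back to the global LRU. */
	static void dentry_lru_add(struct dentry *dentry)
	{
		struct super_block *sb = dentry->d_sb;

		if (!list_empty(&sb->s_dentry_unused))
			list_add(&dentry->d_lru, &sb->s_dentry_unused);
		else
			list_add(&dentry->d_lru, &dentry_unused);
		dentry_stat.nr_unused++;
	}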
thank you,
Vasily Averin
SWsoft Virtuozzo/OpenVZ Linux kernel team