Home » Mailing lists » Devel » [PATCH 2.6.21-rc6] [netfilter] early_drop imrovement
[PATCH 2.6.21-rc6] [netfilter] early_drop imrovement [message #11780] |
Fri, 06 April 2007 08:00 |
vaverin
Messages: 708 Registered: September 2005
|
Senior Member |
|
|
When the number of conntracks is reached ip_conntrack_max limit, early_drop() is
called and tries to free one of already used conntracks in one of the hash
buckets. If it does not find any conntracks that may be freed, it
leads to transmission errors.
However it is not fair because of current hash bucket may be empty but the
neighbour ones can have the number of conntracks that can be freed. With the
following patch early_drop() will search conntracks in all hash buckets.
Signed-off-by: Vasily Averin <vvs@sw.ru>
--- 2.6.21-rc6/net/ipv4/netfilter/ip_conntrack_core.c.erdrp
+++ 2.6.21-rc6/net/ipv4/netfilter/ip_conntrack_core.c
@@ -517,7 +517,7 @@ ip_conntrack_tuple_taken(const struct ip
/* There's a small race here where we may free a just-assured
connection. Too bad: we're in trouble anyway. */
-static int early_drop(struct list_head *chain)
+static int __early_drop(struct list_head *chain)
{
/* Traverse backwards: gives us oldest, which is roughly LRU */
struct ip_conntrack_tuple_hash *h;
@@ -547,6 +547,20 @@ static int early_drop(struct list_head *
return dropped;
}
+static int early_drop(const struct ip_conntrack_tuple *orig)
+{
+ unsigned int i, hash;
+ int ret = 0;
+
+ hash = hash_conntrack(orig);
+
+ for (i = 0;
+ !ret && i < ip_conntrack_htable_size;
+ ++i, hash = ++hash % ip_conntrack_htable_size)
+ ret = __early_drop(&ip_conntrack_hash[hash]);
+ return ret;
+}
+
static struct ip_conntrack_helper *
__ip_conntrack_helper_find( const struct ip_conntrack_tuple *tuple)
{
@@ -631,9 +645,7 @@ struct ip_conntrack *ip_conntrack_alloc(
if (ip_conntrack_max
&& atomic_read(&ip_conntrack_count) > ip_conntrack_max) {
- unsigned int hash = hash_conntrack(orig);
- /* Try dropping from this hash chain. */
- if (!early_drop(&ip_conntrack_hash[hash])) {
+ if (!early_drop(orig)) {
atomic_dec(&ip_conntrack_count);
if (net_ratelimit())
printk(KERN_WARNING
--- 2.6.21-rc6/net/netfilter/nf_conntrack_core.c.erdrp
+++ 2.6.21-rc6/net/netfilter/nf_conntrack_core.c
@@ -542,7 +542,7 @@ EXPORT_SYMBOL_GPL(nf_conntrack_tuple_tak
/* There's a small race here where we may free a just-assured
connection. Too bad: we're in trouble anyway. */
-static int early_drop(struct list_head *chain)
+static int __early_drop(struct list_head *chain)
{
/* Traverse backwards: gives us oldest, which is roughly LRU */
struct nf_conntrack_tuple_hash *h;
@@ -572,6 +572,20 @@ static int early_drop(struct list_head *
return dropped;
}
+static int early_drop(const struct nf_conntrack_tuple *orig)
+{
+ unsigned int i, hash;
+ int ret = 0;
+
+ hash = hash_conntrack(orig);
+
+ for (i = 0;
+ !ret && i < nf_conntrack_htable_size;
+ ++i, hash = ++hash % nf_conntrack_htable_size)
+ ret = __early_drop(&nf_conntrack_hash[hash]);
+ return ret;
+}
+
static struct nf_conn *
__nf_conntrack_alloc(const struct nf_conntrack_tuple *orig,
const struct nf_conntrack_tuple *repl,
@@ -591,9 +605,7 @@ __nf_conntrack_alloc(const struct nf_con
if (nf_conntrack_max
&& atomic_read(&nf_conntrack_count) > nf_conntrack_max) {
- unsigned int hash = hash_conntrack(orig);
- /* Try dropping from this hash chain. */
- if (!early_drop(&nf_conntrack_hash[hash])) {
+ if (!early_drop(orig)) {
atomic_dec(&nf_conntrack_count);
if (net_ratelimit())
printk(KERN_WARNING
|
|
|
|
|
|
[PATCH nf-2.6.22] [netfilter] early_drop imrovement [message #11813 is a reply to message #11792] |
Sat, 07 April 2007 11:45 |
vaverin
Messages: 708 Registered: September 2005
|
Senior Member |
|
|
When the number of conntracks is reached nf_conntrack_max limit, early_drop() is
called and tries to free one of already used conntracks in one of the hash
buckets. If it does not find any conntracks that may be freed, it
leads to transmission errors.
However it is not fair because of current hash bucket may be empty but the
neighbour ones can have the number of conntracks that can be freed. On the other
hand the number of checked conntracks is not limited and it can cause a long delay.
The following patch limits the number of checked conntracks by average number of
conntracks in one hash bucket and allows to search conntracks in other hash buckets.
Signed-off-by: Vasily Averin <vvs@sw.ru>
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index e132c8a..d0b5794 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -525,7 +525,7 @@ EXPORT_SYMBOL_GPL(nf_conntrack_tuple_taken);
/* There's a small race here where we may free a just-assured
connection. Too bad: we're in trouble anyway. */
-static int early_drop(struct list_head *chain)
+static int __early_drop(struct list_head *chain, unsigned int *cnt)
{
/* Traverse backwards: gives us oldest, which is roughly LRU */
struct nf_conntrack_tuple_hash *h;
@@ -540,6 +540,10 @@ static int early_drop(struct list_head *chain)
atomic_inc(&ct->ct_general.use);
break;
}
+ if (!--(*cnt)) {
+ dropped = 1;
+ break;
+ }
}
read_unlock_bh(&nf_conntrack_lock);
@@ -555,6 +559,21 @@ static int early_drop(struct list_head *chain)
return dropped;
}
+static int early_drop(const struct nf_conntrack_tuple *orig)
+{
+ unsigned int i, hash, cnt;
+ int ret = 0;
+
+ hash = hash_conntrack(orig);
+ cnt = (nf_conntrack_max/nf_conntrack_htable_size) + 1;
+
+ for (i = 0;
+ !ret && i < nf_conntrack_htable_size;
+ ++i, hash = ++hash % nf_conntrack_htable_size)
+ ret = __early_drop(&nf_conntrack_hash[hash], &cnt);
+ return ret;
+}
+
static struct nf_conn *
__nf_conntrack_alloc(const struct nf_conntrack_tuple *orig,
const struct nf_conntrack_tuple *repl,
@@ -574,9 +593,7 @@ __nf_conntrack_alloc(const struct nf_conntrack_tuple *orig,
if (nf_conntrack_max
&& atomic_read(&nf_conntrack_count) > nf_conntrack_max) {
- unsigned int hash = hash_conntrack(orig);
- /* Try dropping from this hash chain. */
- if (!early_drop(&nf_conntrack_hash[hash])) {
+ if (!early_drop(orig)) {
atomic_dec(&nf_conntrack_count);
if (net_ratelimit())
printk(KERN_WARNING
|
|
|
|
Re: [PATCH nf-2.6.22] [netfilter] early_drop imrovement [message #11822 is a reply to message #11814] |
Sun, 08 April 2007 05:02 |
vaverin
Messages: 708 Registered: September 2005
|
Senior Member |
|
|
Eric Dumazet wrote:
> Vasily Averin a e'crit :
>> When the number of conntracks is reached nf_conntrack_max limit,
>> early_drop() is
>> called and tries to free one of already used conntracks in one of the
>> hash
>> buckets. If it does not find any conntracks that may be freed, it
>> leads to transmission errors.
>> However it is not fair because of current hash bucket may be empty but
>> the
>> neighbour ones can have the number of conntracks that can be freed. On
>> the other
>> hand the number of checked conntracks is not limited and it can cause
>> a long delay.
>> The following patch limits the number of checked conntracks by average
>> number of
>> conntracks in one hash bucket and allows to search conntracks in other
>> hash buckets.
>
> Hi Vasily
>
>>
>> atomic_inc(&ct->ct_general.use);
>> break;
>> }
>> + if (!--(*cnt)) {
>> + dropped = 1;
>> + break;
>> + }
>
>
>> + cnt = (nf_conntrack_max/nf_conntrack_htable_size) + 1;
>
> I am sorry but this wont help in the case you mentioned in an earlier
> mail :
>
> If nf_conntrack_max < nf_conntrack_htable_size, cnt will be set to 1.
>
> Then in __early_drop() you endup in breaking the
> list_for_each_entry_reverse() loop after the first element was tested !
> Not what you intended I'm afraid, because you wont event scan the whole
> chain as before your patch :(
I would note that in my experiment I got errors when first checked hash bucket
was empty. With this patch I have guarantee that at least one conntrack will be
checked. I'm agree 1 is not too high, but it is better than nothing. I've
checked, my testcase works now.
> I believe you should not test --cnt in __early_drop() but in the caller.
>
> (That is not counting the number of found cells, but the number of hash
> chains you tried)
I need to count conntracks but not hash buckets. Also it is possible that all
the conntracks will be placed to only one hash bucket, and as you pointed in
your previous letter it may lead to long delays.
However how do you think, is it probably better to set low limit to default
average number of conntracks in hash bucket?
cnt = max(8U, nf_conntrack_max/nf_conntrack_htable_size);
Thank you,
Vasily Averin
|
|
|
[NETFILTER] early_drop() imrovement (v3) [message #12763 is a reply to message #11822] |
Wed, 09 May 2007 06:59 |
vaverin
Messages: 708 Registered: September 2005
|
Senior Member |
|
|
When the number of conntracks is reached nf_conntrack_max limit, early_drop()
tries to free one of already used conntracks. If it does not find any conntracks
that may be freed, it leads to transmission errors.
In current implementation the conntracks are searched in one hash bucket only.
It have some drawbacks: if used hash bucket is empty we have not any chances to
find something. On the other hand the hash bucket can contain a huge number of
conntracks and its check can last a long time.
The proposed patch limits the number of checked conntracks by default number of
conntracks in one hash bucket (NF_CT_PER_BUCKET) and allows to search conntracks
in other hash buckets. As result in any case the search will have the same
chances to free one of the conntracks and the check will not lead to long delays.
Signed-off-by: Vasily Averin <vvs@sw.ru>
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index e132c8a..d984bce 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -76,6 +76,8 @@ static unsigned int nf_conntrack_next_id;
DEFINE_PER_CPU(struct ip_conntrack_stat, nf_conntrack_stat);
EXPORT_PER_CPU_SYMBOL(nf_conntrack_stat);
+#define NF_CT_PER_BUCKET 8U
+
/*
* This scheme offers various size of "struct nf_conn" dependent on
* features(helper, nat, ...)
@@ -525,7 +527,7 @@ EXPORT_SYMBOL_GPL(nf_conntrack_tuple_taken);
/* There's a small race here where we may free a just-assured
connection. Too bad: we're in trouble anyway. */
-static int early_drop(struct list_head *chain)
+static int __early_drop(struct list_head *chain, unsigned int *cnt)
{
/* Traverse backwards: gives us oldest, which is roughly LRU */
struct nf_conntrack_tuple_hash *h;
@@ -540,6 +542,8 @@ static int early_drop(struct list_head *chain)
atomic_inc(&ct->ct_general.use);
break;
}
+ if (!--(*cnt))
+ break;
}
read_unlock_bh(&nf_conntrack_lock);
@@ -555,6 +559,21 @@ static int early_drop(struct list_head *chain)
return dropped;
}
+static int early_drop(const struct nf_conntrack_tuple *orig)
+{
+ unsigned int i, hash, cnt;
+ int ret = 0;
+
+ hash = hash_conntrack(orig);
+ cnt = NF_CT_PER_BUCKET;
+
+ for (i = 0;
+ !ret && cnt && i < nf_conntrack_htable_size;
+ ++i, hash = ++hash % nf_conntrack_htable_size)
+ ret = __early_drop(&nf_conntrack_hash[hash], &cnt);
+ return ret;
+}
+
static struct nf_conn *
__nf_conntrack_alloc(const struct nf_conntrack_tuple *orig,
const struct nf_conntrack_tuple *repl,
@@ -574,9 +593,7 @@ __nf_conntrack_alloc(const struct nf_conntrack_tuple *orig,
if (nf_conntrack_max
&& atomic_read(&nf_conntrack_count) > nf_conntrack_max) {
- unsigned int hash = hash_conntrack(orig);
- /* Try dropping from this hash chain. */
- if (!early_drop(&nf_conntrack_hash[hash])) {
+ if (!early_drop(orig)) {
atomic_dec(&nf_conntrack_count);
if (net_ratelimit())
printk(KERN_WARNING
@@ -1226,7 +1243,7 @@ int __init nf_conntrack_init(void)
if (nf_conntrack_htable_size < 16)
nf_conntrack_htable_size = 16;
}
- nf_conntrack_max = 8 * nf_conntrack_htable_size;
+ nf_conntrack_max = NF_CT_PER_BUCKET * nf_conntrack_htable_size;
printk("nf_conntrack version %s (%u buckets, %d max)\n",
NF_CONNTRACK_VERSION, nf_conntrack_htable_size,
|
|
|
Re: [NETFILTER] early_drop() imrovement (v3) [message #14353 is a reply to message #12763] |
Mon, 25 June 2007 13:53 |
Patrick McHardy
Messages: 107 Registered: March 2006
|
Senior Member |
|
|
Vasily Averin wrote:
> +static int early_drop(const struct nf_conntrack_tuple *orig)
> +{
> + unsigned int i, hash, cnt;
> + int ret = 0;
> +
> + hash = hash_conntrack(orig);
> + cnt = NF_CT_PER_BUCKET;
> +
> + for (i = 0;
> + !ret && cnt && i < nf_conntrack_htable_size;
> + ++i, hash = ++hash % nf_conntrack_htable_size)
> + ret = __early_drop(&nf_conntrack_hash[hash], &cnt);
Formatting is a bit ugly, looks much nicer as:
for (i = 0; i < nf_conntrack_htable_size; i++) {
ret = __early_drop(&nf_conntrack_hash[hash], &cnt);
if (ret || !cnt)
break;
hash = ++hash % nf_conntrack_htable_size;
}
> @@ -1226,7 +1243,7 @@ int __init nf_conntrack_init(void)
> if (nf_conntrack_htable_size < 16)
> nf_conntrack_htable_size = 16;
> }
> - nf_conntrack_max = 8 * nf_conntrack_htable_size;
> + nf_conntrack_max = NF_CT_PER_BUCKET * nf_conntrack_htable_size;
I don't like the NF_CT_PER_BUCKET constant. First of all, each
conntrack is hashed twice, so its really only 1/2 of the average
conntracks per bucket. Secondly, its only a default and many
people use nf_conntrack_max = nf_conntrack_htable_size / 2, so
using this constant for early_drop seems wrong.
Perhaps make it 2 * nf_conntrack_max / nf_conntrack_htable_size
or even add a nf_conntrack_eviction_range sysctl.
|
|
|
|
|
[NETFILTER] early_drop() imrovement (v4) [message #14387 is a reply to message #14370] |
Wed, 27 June 2007 08:46 |
vaverin
Messages: 708 Registered: September 2005
|
Senior Member |
|
|
When the number of conntracks is reached nf_conntrack_max limit, early_drop()
tries to free one of already used conntracks. If it does not find any conntracks
that may be freed, it leads to transmission errors.
In current implementation the conntracks are searched in one hash bucket only.
It have some drawbacks: if used hash bucket is empty we have not any chances to
find something. On the other hand the hash bucket can contain a huge number of
conntracks and its check can last a long time.
The proposed patch limits the number of checked conntracks and allows to search
conntracks in other hash buckets. As result in any case the search will have the
same chances to free one of the conntracks and the check will not lead to long
delays.
Signed-off-by: Vasily Averin <vvs@sw.ru>
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 7a15e30..0540a88 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -526,7 +526,7 @@ EXPORT_SYMBOL_GPL(nf_conntrack_tuple_taken);
/* There's a small race here where we may free a just-assured
connection. Too bad: we're in trouble anyway. */
-static int early_drop(struct list_head *chain)
+static int __early_drop(struct list_head *chain, unsigned int *cnt)
{
/* Traverse backwards: gives us oldest, which is roughly LRU */
struct nf_conntrack_tuple_hash *h;
@@ -541,6 +541,8 @@ static int early_drop(struct list_head *chain)
atomic_inc(&ct->ct_general.use);
break;
}
+ if (!--(*cnt))
+ break;
}
read_unlock_bh(&nf_conntrack_lock);
@@ -556,6 +558,25 @@ static int early_drop(struct list_head *chain)
return dropped;
}
+#define NF_CT_EVICTION_RANGE 8U
+
+static int early_drop(const struct nf_conntrack_tuple *orig)
+{
+ unsigned int i, hash, cnt;
+ int ret = 0;
+
+ hash = hash_conntrack(orig);
+ cnt = NF_CT_EVICTION_RANGE;
+
+ for (i = 0; i < nf_conntrack_htable_size; i++) {
+ ret = __early_drop(&nf_conntrack_hash[hash], &cnt);
+ if (ret || !cnt)
+ break;
+ hash++; hash %= nf_conntrack_htable_size;
+ }
+ return ret;
+}
+
static struct nf_conn *
__nf_conntrack_alloc(const struct nf_conntrack_tuple *orig,
const struct nf_conntrack_tuple *repl,
@@ -575,9 +596,7 @@ __nf_conntrack_alloc(const struct nf_conntrack_tuple *orig,
if (nf_conntrack_max
&& atomic_read(&nf_conntrack_count) > nf_conntrack_max) {
- unsigned int hash = hash_conntrack(orig);
- /* Try dropping from this hash chain. */
- if (!early_drop(&nf_conntrack_hash[hash])) {
+ if (!early_drop(orig)) {
atomic_dec(&nf_conntrack_count);
if (net_ratelimit())
printk(KERN_WARNING
|
|
|
|
|
|
Re: [NETFILTER] early_drop() imrovement (v4) [message #14401 is a reply to message #14400] |
Wed, 27 June 2007 12:51 |
Patrick McHardy
Messages: 107 Registered: March 2006
|
Senior Member |
|
|
Vasily Averin wrote:
> Patrick McHardy wrote:
>
>>+ for (i = 0; i < NF_CT_EVICTION_RANGE; i++) {
>>+ hlist_for_each_entry(h, n, &nf_conntrack_hash[hash], hnode) {
>>+ tmp = nf_ct_tuplehash_to_ctrack(h);
>>+ if (!test_bit(IPS_ASSURED_BIT, &tmp->status))
>>+ ct = tmp;
>>+ }
>>+ if (ct) {
>>+ atomic_inc(&ct->ct_general.use);
>>+ break;
>>+ }
>>+ hash = (hash + 1) % nf_conntrack_htable_size;
>
>
> it is incorrect,
> We should count the number of checked _conntracks_, but you count the number of
> hash buckets. I.e "i" should be incremented/checked inside the nested loop.
I misunderstood your patch then. This one should be better.
[NETFILTER]: nf_conntrack: early_drop improvement
When the maximum number of conntrack entries is reached and a new
one needs to be allocated, conntrack tries to drop an unassured
connection from the same hash bucket the new conntrack would hash
to. Since with a properly sized hash the average number of entries
per bucket is 1, the chances of actually finding one are not very
good. This patch increases those chances by walking over the hash
until 8 entries are checked.
Based on patch by Vasily Averin <vvs@sw.ru>.
Signed-off-by: Patrick McHardy <kaber@trash.net>
---
commit df9f4fc41d7d6a7a51d2fe4b28db2557cb9a0d05
tree 8beb115ce12126b28ce3e5eb3f95b36b71462ea5
parent 665d98d03473cab252830129f414e1b38fb2b038
author Patrick McHardy <kaber@trash.net> Wed, 27 Jun 2007 14:51:38 +0200
committer Patrick McHardy <kaber@trash.net> Wed, 27 Jun 2007 14:51:38 +0200
net/netfilter/nf_conntrack_core.c | 23 +++++++++++++++--------
1 files changed, 15 insertions(+), 8 deletions(-)
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index d7e62ad..bbb52e5 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -377,21 +377,29 @@ nf_conntrack_tuple_taken(const struct nf_conntrack_tuple *tuple,
}
EXPORT_SYMBOL_GPL(nf_conntrack_tuple_taken);
+#define NF_CT_EVICTION_RANGE 8
+
/* There's a small race here where we may free a just-assured
connection. Too bad: we're in trouble anyway. */
-static int early_drop(struct hlist_head *chain)
+static int early_drop(unsigned int hash)
{
/* Use oldest entry, which is roughly LRU */
struct nf_conntrack_tuple_hash *h;
struct nf_conn *ct = NULL, *tmp;
struct hlist_node *n;
- int dropped = 0;
+ unsigned int i;
+ int dropped = 0, cnt = NF_CT_EVICTION_RANGE;
read_lock_bh(&nf_conntrack_lock);
- hlist_for_each_entry(h, n, chain, hnode) {
- tmp = nf_ct_tuplehash_to_ctrack(h);
- if (!test_bit(IPS_ASSURED_BIT, &tmp->status))
- ct = tmp;
+ for (i = 0; i < nf_conntrack_htable_size; i++) {
+ hlist_for_each_entry(h, n, &nf_conntrack_hash[hash], hnode) {
+ tmp = nf_ct_tuplehash_to_ctrack(h);
+ if (!test_bit(IPS_ASSURED_BIT, &tmp->status))
+ ct = tmp;
+ if (--cnt <= 0)
+ break;
+ }
+ hash = (hash + 1) % nf_conntrack_htable_size;
}
if (ct)
atomic_inc(&ct->ct_general.use);
@@ -425,8 +433,7 @@ struct nf_conn *nf_conntrack_alloc(const struct nf_conntrack_tuple *orig,
if (nf_conntrack_max
&& atomic_read(&nf_conntrack_count) > nf_conntrack_max) {
unsigned int hash = hash_conntrack(orig);
- /* Try dropping from this hash chain. */
- if (!early_drop(&nf_conntrack_hash[hash])) {
+ if (!early_drop(hash)) {
atomic_dec(&nf_conntrack_count);
if (net_ratelimit())
printk(KERN_WARNING
|
|
|
Re: [NETFILTER] early_drop() imrovement (v4) [message #14402 is a reply to message #14401] |
Wed, 27 June 2007 13:02 |
vaverin
Messages: 708 Registered: September 2005
|
Senior Member |
|
|
Patrick McHardy wrote:
> Vasily Averin wrote:
>> it is incorrect,
>> We should count the number of checked _conntracks_, but you count the number of
>> hash buckets. I.e "i" should be incremented/checked inside the nested loop.
>
>
> I misunderstood your patch then. This one should be better.
> +static int early_drop(unsigned int hash)
> {
> /* Use oldest entry, which is roughly LRU */
> struct nf_conntrack_tuple_hash *h;
> struct nf_conn *ct = NULL, *tmp;
> struct hlist_node *n;
> - int dropped = 0;
> + unsigned int i;
> + int dropped = 0, cnt = NF_CT_EVICTION_RANGE;
>
> read_lock_bh(&nf_conntrack_lock);
> - hlist_for_each_entry(h, n, chain, hnode) {
> - tmp = nf_ct_tuplehash_to_ctrack(h);
> - if (!test_bit(IPS_ASSURED_BIT, &tmp->status))
> - ct = tmp;
> + for (i = 0; i < nf_conntrack_htable_size; i++) {
> + hlist_for_each_entry(h, n, &nf_conntrack_hash[hash], hnode) {
> + tmp = nf_ct_tuplehash_to_ctrack(h);
> + if (!test_bit(IPS_ASSURED_BIT, &tmp->status))
> + ct = tmp;
> + if (--cnt <= 0)
> + break;
> + }
> + hash = (hash + 1) % nf_conntrack_htable_size;
> }
it is incorrect again: when cnt=0 you should break both cycles.
thank you,
Vasily Averin
|
|
|
Re: [NETFILTER] early_drop() imrovement (v4) [message #14405 is a reply to message #14402] |
Wed, 27 June 2007 13:18 |
Patrick McHardy
Messages: 107 Registered: March 2006
|
Senior Member |
|
|
Vasily Averin wrote:
> Patrick McHardy wrote:
>
> it is incorrect again: when cnt=0 you should break both cycles.
Indeed, thanks. Fixed now. Also changed it to leave the loop
if we found an entry within a chain (we want the last one of
the chain, so we still walk it entirely) and replaced
hash = (hash + 1) % nf_conntrack_htable_size
by
hash = (hash + 1) & (nf_conntrack_htable_size - 1)
since one of my queued patches makes sure that its always
a power of two.
[NETFILTER]: nf_conntrack: early_drop improvement
When the maximum number of conntrack entries is reached and a new
one needs to be allocated, conntrack tries to drop an unassured
connection from the same hash bucket the new conntrack would hash
to. Since with a properly sized hash the average number of entries
per bucket is 1, the chances of actually finding one are not very
good. This patch increases those chances by walking over the hash
until 8 entries are checked.
Based on patch by Vasily Averin <vvs@sw.ru>.
Signed-off-by: Patrick McHardy <kaber@trash.net>
---
commit 047d8f088a71f30d3042cc7615a7a25aa60a668b
tree c83c4a356caf608fe21a646c24678bef74023270
parent 665d98d03473cab252830129f414e1b38fb2b038
author Patrick McHardy <kaber@trash.net> Wed, 27 Jun 2007 15:17:17 +0200
committer Patrick McHardy <kaber@trash.net> Wed, 27 Jun 2007 15:17:17 +0200
net/netfilter/nf_conntrack_core.c | 26 ++++++++++++++++++--------
1 files changed, 18 insertions(+), 8 deletions(-)
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index d7e62ad..2d5c10f 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -377,22 +377,33 @@ nf_conntrack_tuple_taken(const struct nf_conntrack_tuple *tuple,
}
EXPORT_SYMBOL_GPL(nf_conntrack_tuple_taken);
+#define NF_CT_EVICTION_RANGE 8
+
/* There's a small race here where we may free a just-assured
connection. Too bad: we're in trouble anyway. */
-static int early_drop(struct hlist_head *chain)
+static int early_drop(unsigned int hash)
{
/* Use oldest entry, which is roughly LRU */
struct nf_conntrack_tuple_hash *h;
struct nf_conn *ct = NULL, *tmp;
struct hlist_node *n;
- int dropped = 0;
+ unsigned int i;
+ int dropped = 0, cnt = NF_CT_EVICTION_RANGE;
read_lock_bh(&nf_conntrack_lock);
- hlist_for_each_entry(h, n, chain, hnode) {
- tmp = nf_ct_tuplehash_to_ctrack(h);
- if (!test_bit(IPS_ASSURED_BIT, &tmp->status))
- ct = tmp;
+ for (i = 0; i < nf_conntrack_htable_size; i++) {
+ hlist_for_each_entry(h, n, &nf_conntrack_hash[hash], hnode) {
+ tmp = nf_ct_tuplehash_to_ctrack(h);
+ if (!test_bit(IPS_ASSURED_BIT, &tmp->status))
+ ct = tmp;
+ if (--cnt <= 0)
+ goto stop;
+ }
+ if (ct)
+ break;
+ hash = (hash + 1) & (nf_conntrack_htable_size - 1);
}
+stop:
if (ct)
atomic_inc(&ct->ct_general.use);
read_unlock_bh(&nf_conntrack_lock);
@@ -425,8 +436,7 @@ struct nf_conn *nf_conntrack_alloc(const struct nf_conntrack_tuple *orig,
if (nf_conntrack_max
&& atomic_read(&nf_conntrack_count) > nf_conntrack_max) {
unsigned int hash = hash_conntrack(orig);
- /* Try dropping from this hash chain. */
- if (!early_drop(&nf_conntrack_hash[hash])) {
+ if (!early_drop(hash)) {
atomic_dec(&nf_conntrack_count);
if (net_ratelimit())
printk(KERN_WARNING
|
|
|
|
Re: [NETFILTER] early_drop() imrovement (v4) [message #14407 is a reply to message #14405] |
Wed, 27 June 2007 13:25 |
vaverin
Messages: 708 Registered: September 2005
|
Senior Member |
|
|
Patrick McHardy wrote:
> Vasily Averin wrote:
>> Patrick McHardy wrote:
> -static int early_drop(struct hlist_head *chain)
> +static int early_drop(unsigned int hash)
> {
> /* Use oldest entry, which is roughly LRU */
> struct nf_conntrack_tuple_hash *h;
> struct nf_conn *ct = NULL, *tmp;
> struct hlist_node *n;
> - int dropped = 0;
> + unsigned int i;
> + int dropped = 0, cnt = NF_CT_EVICTION_RANGE;
>
> read_lock_bh(&nf_conntrack_lock);
> - hlist_for_each_entry(h, n, chain, hnode) {
> - tmp = nf_ct_tuplehash_to_ctrack(h);
> - if (!test_bit(IPS_ASSURED_BIT, &tmp->status))
> - ct = tmp;
> + for (i = 0; i < nf_conntrack_htable_size; i++) {
> + hlist_for_each_entry(h, n, &nf_conntrack_hash[hash], hnode) {
> + tmp = nf_ct_tuplehash_to_ctrack(h);
> + if (!test_bit(IPS_ASSURED_BIT, &tmp->status))
> + ct = tmp;
It is incorrect: you should break nested loop here too.
> + if (--cnt <= 0)
> + goto stop;
> + }
> + if (ct)
> + break;
> + hash = (hash + 1) & (nf_conntrack_htable_size - 1);
> }
> +stop:
> if (ct)
> atomic_inc(&ct->ct_general.use);
> read_unlock_bh(&nf_conntrack_lock);
|
|
|
|
Re: [NETFILTER] early_drop() imrovement (v4) [message #14409 is a reply to message #14408] |
Wed, 27 June 2007 13:35 |
Patrick McHardy
Messages: 107 Registered: March 2006
|
Senior Member |
|
|
Patrick McHardy wrote:
> Vasily Averin wrote:
>
>>Patrick McHardy wrote:
>>
>>
>>>+ for (i = 0; i < nf_conntrack_htable_size; i++) {
>>>+ hlist_for_each_entry(h, n, &nf_conntrack_hash[hash], hnode) {
>>>+ tmp = nf_ct_tuplehash_to_ctrack(h);
>>>+ if (!test_bit(IPS_ASSURED_BIT, &tmp->status))
>>>+ ct = tmp;
>>
>>
>>It is incorrect: you should break nested loop here too.
>
>
>
> No, as I said, we want the last entry of the chain.
Ideally we should do something like this I think (please let it be
correct :)):
+ for (i = 0; i < nf_conntrack_htable_size; i++) {
+ entries = 0;
+ hlist_for_each_entry(h, n, &nf_conntrack_hash[hash],
hnode) {
+ tmp = nf_ct_tuplehash_to_ctrack(h);
+ if (!test_bit(IPS_ASSURED_BIT, &tmp->status))
+ ct = tmp;
+ entries++;
+ }
+ if (ct)
+ break;
+ if ((cnt -= entries) <= 0)
+ break;
+ hash = (hash + 1) % nf_conntrack_htable_size;
}
So we always walk chains up to the end and NF_CT_EVICTION_RANGE is
just a minimum. This ensures we will always get the last entry *and*
we won't scan less entries than currently if someone has a chain
longer than 8 entries.
What do you think?
|
|
|
Re: [NETFILTER] early_drop() imrovement (v4) [message #14410 is a reply to message #14409] |
Wed, 27 June 2007 13:54 |
Patrick McHardy
Messages: 107 Registered: March 2006
|
Senior Member |
|
|
Patrick McHardy wrote:
> Patrick McHardy wrote:
>
> Ideally we should do something like this I think (please let it be
> correct :)):
>
> [...]
> So we always walk chains up to the end and NF_CT_EVICTION_RANGE is
> just a minimum. This ensures we will always get the last entry *and*
> we won't scan less entries than currently if someone has a chain
> longer than 8 entries.
>
> What do you think?
I've added this patch now.
[NETFILTER]: nf_conntrack: early_drop improvement
When the maximum number of conntrack entries is reached and a new
one needs to be allocated, conntrack tries to drop an unassured
connection from the same hash bucket the new conntrack would hash
to. Since with a properly sized hash the average number of entries
per bucket is 1, the chances of actually finding one are not very
good. This patch makes it walk the hash until a minimum number of
8 entries are checked.
Based on patch by Vasily Averin <vvs@sw.ru>.
Signed-off-by: Patrick McHardy <kaber@trash.net>
---
commit 31889ee2d8f42f84daec97c3b98c47165e358da8
tree 4a1fe840d3056c2e64ab027cc10f92f3843cd710
parent ca0ac66daa3b264702d72282e388f8ba920f9a91
author Patrick McHardy <kaber@trash.net> Wed, 27 Jun 2007 15:54:22 +0200
committer Patrick McHardy <kaber@trash.net> Wed, 27 Jun 2007 15:54:22 +0200
net/netfilter/nf_conntrack_core.c | 24 ++++++++++++++++--------
1 files changed, 16 insertions(+), 8 deletions(-)
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index ed44a09..ef3f747 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -377,21 +377,30 @@ nf_conntrack_tuple_taken(const struct nf_conntrack_tuple *tuple,
}
EXPORT_SYMBOL_GPL(nf_conntrack_tuple_taken);
+#define NF_CT_EVICTION_RANGE 8
+
/* There's a small race here where we may free a just-assured
connection. Too bad: we're in trouble anyway. */
-static int early_drop(struct hlist_head *chain)
+static int early_drop(unsigned int hash)
{
/* Use oldest entry, which is roughly LRU */
struct nf_conntrack_tuple_hash *h;
struct nf_conn *ct = NULL, *tmp;
struct hlist_node *n;
- int dropped = 0;
+ unsigned int i;
+ int dropped = 0, cnt = 0;
read_lock_bh(&nf_conntrack_lock);
- hlist_for_each_entry(h, n, chain, hnode) {
- tmp = nf_ct_tuplehash_to_ctrack(h);
- if (!test_bit(IPS_ASSURED_BIT, &tmp->status))
- ct = tmp;
+ for (i = 0; i < nf_conntrack_htable_size; i++) {
+ hlist_for_each_entry(h, n, &nf_conntrack_hash[hash], hnode) {
+ tmp = nf_ct_tuplehash_to_ctrack(h);
+ if (!test_bit(IPS_ASSURED_BIT, &tmp->status))
+ ct = tmp;
+ cnt++;
+ }
+ if (ct || cnt >= NF_CT_EVICTION_RANGE)
+ break;
+ hash = (hash + 1) % nf_conntrack_htable_size;
}
if (ct)
atomic_inc(&ct->ct_general.use);
@@ -425,8 +434,7 @@ struct nf_conn *nf_conntrack_alloc(const struct nf_conntrack_tuple *orig,
if (nf_conntrack_max
&& atomic_read(&nf_conntrack_count) > nf_conntrack_max) {
unsigned int hash = hash_conntrack(orig);
- /* Try dropping from this hash chain. */
- if (!early_drop(&nf_conntrack_hash[hash])) {
+ if (!early_drop(hash)) {
atomic_dec(&nf_conntrack_count);
if (net_ratelimit())
printk(KERN_WARNING
|
|
|
|
Re: [NETFILTER] early_drop() imrovement (v4) [message #14588 is a reply to message #14410] |
Mon, 02 July 2007 19:56 |
Rusty Russell
Messages: 10 Registered: March 2007
|
Junior Member |
|
|
On Wed, 2007-06-27 at 15:54 +0200, Patrick McHardy wrote:
> [NETFILTER]: nf_conntrack: early_drop improvement
>
> When the maximum number of conntrack entries is reached and a new
> one needs to be allocated, conntrack tries to drop an unassured
> connection from the same hash bucket the new conntrack would hash
> to. Since with a properly sized hash the average number of entries
> per bucket is 1, the chances of actually finding one are not very
> good. This patch makes it walk the hash until a minimum number of
> 8 entries are checked.
>
> Based on patch by Vasily Averin <vvs@sw.ru>.
>
> Signed-off-by: Patrick McHardy <kaber@trash.net>
>
> ---
> commit 31889ee2d8f42f84daec97c3b98c47165e358da8
> tree 4a1fe840d3056c2e64ab027cc10f92f3843cd710
> parent ca0ac66daa3b264702d72282e388f8ba920f9a91
> author Patrick McHardy <kaber@trash.net> Wed, 27 Jun 2007 15:54:22 +0200
> committer Patrick McHardy <kaber@trash.net> Wed, 27 Jun 2007 15:54:22 +0200
>
> net/netfilter/nf_conntrack_core.c | 24 ++++++++++++++++--------
> 1 files changed, 16 insertions(+), 8 deletions(-)
>
> diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
> index ed44a09..ef3f747 100644
> --- a/net/netfilter/nf_conntrack_core.c
> +++ b/net/netfilter/nf_conntrack_core.c
> @@ -377,21 +377,30 @@ nf_conntrack_tuple_taken(const struct nf_conntrack_tuple *tuple,
> }
> EXPORT_SYMBOL_GPL(nf_conntrack_tuple_taken);
>
> +#define NF_CT_EVICTION_RANGE 8
> +
> /* There's a small race here where we may free a just-assured
> connection. Too bad: we're in trouble anyway. */
> -static int early_drop(struct hlist_head *chain)
> +static int early_drop(unsigned int hash)
> {
> /* Use oldest entry, which is roughly LRU */
> struct nf_conntrack_tuple_hash *h;
> struct nf_conn *ct = NULL, *tmp;
> struct hlist_node *n;
> - int dropped = 0;
> + unsigned int i;
> + int dropped = 0, cnt = 0;
>
> read_lock_bh(&nf_conntrack_lock);
> - hlist_for_each_entry(h, n, chain, hnode) {
> - tmp = nf_ct_tuplehash_to_ctrack(h);
> - if (!test_bit(IPS_ASSURED_BIT, &tmp->status))
> - ct = tmp;
> + for (i = 0; i < nf_conntrack_htable_size; i++) {
> + hlist_for_each_entry(h, n, &nf_conntrack_hash[hash], hnode) {
> + tmp = nf_ct_tuplehash_to_ctrack(h);
> + if (!test_bit(IPS_ASSURED_BIT, &tmp->status))
> + ct = tmp;
> + cnt++;
> + }
> + if (ct || cnt >= NF_CT_EVICTION_RANGE)
> + break;
> + hash = (hash + 1) % nf_conntrack_htable_size;
> }
> if (ct)
> atomic_inc(&ct->ct_general.use);
This looks good. The randomness in the hash means we no longer need the
"hit the same hash bucket" heuristic to avoid hashbombing.
I still wonder if we should batch up the drops a little while we're
doing all this work? Should reduce stress under serious flood load.
Thanks,
Rusty.
|
|
|
Re: [NETFILTER] early_drop() imrovement (v4) [message #14605 is a reply to message #14588] |
Tue, 03 July 2007 11:42 |
Patrick McHardy
Messages: 107 Registered: March 2006
|
Senior Member |
|
|
Rusty Russell wrote:
> On Wed, 2007-06-27 at 15:54 +0200, Patrick McHardy wrote:
>
>>[NETFILTER]: nf_conntrack: early_drop improvement
>
> This looks good. The randomness in the hash means we no longer need the
> "hit the same hash bucket" heuristic to avoid hashbombing.
>
> I still wonder if we should batch up the drops a little while we're
> doing all this work? Should reduce stress under serious flood load.
Good point, I didn't think of that. Its a bit tricky though, we can't
destroy them while holding nf_conntrack_lock, so we'd either have to
release and re-grab it for every conntrack we want to kill or write
lock it from the beginning, clean them from the lists immediately and
put them on a temporary destruction queue. Or split death_by_timeout
so it can deal with callers already holding nf_conntrack_lock ..
I'll see if I can come up with a halfway decent looking patch :)
|
|
|
Re: [NETFILTER] early_drop() imrovement (v4) [message #14931 is a reply to message #14588] |
Tue, 03 July 2007 06:39 |
Martin Josefsson
Messages: 1 Registered: July 2007
|
Junior Member |
|
|
On Tue, 3 Jul 2007, Rusty Russell wrote:
> This looks good. The randomness in the hash means we no longer need the
> "hit the same hash bucket" heuristic to avoid hashbombing.
>
> I still wonder if we should batch up the drops a little while we're
> doing all this work? Should reduce stress under serious flood load.
Yes we should really do that, going searching for something to evict
for each new connection attempt is really painful and in this
overload situation we need all the cpu we can get.
/Martin
|
|
|
Goto Forum:
Current Time: Fri Oct 18 19:07:04 GMT 2024
Total time taken to generate the page: 0.05095 seconds
|