Re: Re: [PATCHSET] 2.6.20-lxc8 [message #11380] Thu, 22 March 2007 10:45
den
Kirill Korotaev wrote:
> Eric W. Biederman wrote:
>> Daniel Lezcano <dlezcano@fr.ibm.com> writes:
>>
>>
>>> Benjamin Thery and I were looking at this.
>>> For the moment we are investigating whether there is IP fragmentation between
>>> eth0 and the pair devices.
>>> Profiling shows that the "pskb_expand_head" and "csum_partial_copy_generic"
>>> functions have the highest CPU usage when we are inside a container.
>>
>> Hmm. Interesting hypothesis. I do know I haven't finished the IP fragment
>> support, so there may be an issue there. Although generally, especially with
>> TCP, fragments are not generated. How are you thinking fragmentation would
>> be involved?
>
> I'm not sure I fully understand the test and details of the thread,
> but still...
>
> if the network device inside the container has a higher MTU than eth0 outside
> the container, then packets will get fragmented:
> first to MTU1 inside the container and refragmented to MTU2
> outside the container. At least this is the reason we use the ethernet MTU
> on venet devices in OpenVZ - to avoid fragmentation.
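
(To make that concrete - a minimal userspace sketch of pinning the
container-side device MTU to eth0's value, so nothing on the path has to
refragment; the interface names "eth0" and "etun1" are assumptions, not
taken from the actual test setup:)

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <net/if.h>

/* Copy the MTU of the physical device onto the container-side device. */
int main(void)
{
    struct ifreq ifr;
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    memset(&ifr, 0, sizeof(ifr));
    strncpy(ifr.ifr_name, "eth0", IFNAMSIZ - 1);    /* physical device */
    if (fd < 0 || ioctl(fd, SIOCGIFMTU, &ifr) < 0)
        return 1;

    /* ifr.ifr_mtu now holds eth0's MTU; reuse it for the pair device. */
    strncpy(ifr.ifr_name, "etun1", IFNAMSIZ - 1);   /* pair device, name assumed */
    if (ioctl(fd, SIOCSIFMTU, &ifr) < 0)
        return 1;

    printf("etun1 MTU set to %d\n", ifr.ifr_mtu);
    close(fd);
    return 0;
}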

Frankly speaking, there are other points which can lead to
pskb_expand_head being called. The hard header size on the container network
device should be the same as on the real one.
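
For illustration, a rough sketch of what "same hard header size" means at
the device-setup level (hypothetical code, not the actual etun driver):

#include <linux/netdevice.h>
#include <linux/etherdevice.h>
#include <linux/if_ether.h>

/* Hypothetical setup for the container-side pair device: advertise the
 * same link-layer header size (and MTU) as a real ethernet device, so
 * the headroom the stack reserves for it matches what eth0 expects. */
static void pair_dev_setup(struct net_device *dev)
{
    ether_setup(dev);                   /* hard_header_len = ETH_HLEN, mtu = 1500, ... */
    dev->hard_header_len = ETH_HLEN;    /* same hard header size as the real device */
    dev->mtu = ETH_DATA_LEN;            /* 1500, matching eth0 */
}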

Regards,
Den
Re: Re: [PATCHSET] 2.6.20-lxc8 [message #11383 is a reply to message #11380] Thu, 22 March 2007 12:19
ebiederm
"Denis V. Lunev" <den@sw.ru> writes:

> Kirill Korotaev wrote:
>>
>> if the network device inside the container has a higher MTU than eth0
>> outside the container, then packets will get fragmented:
>> first to MTU1 inside the container and refragmented to MTU2
>> outside the container. At least this is the reason we use the ethernet MTU
>> on venet devices in OpenVZ - to avoid fragmentation.
>
> Frankly speaking, there are other points which can lead to
> pskb_expand_head being called. The hard header size on the container network
> device should be the same as on the real one.

I believe it is the case that I have a 1500-byte MTU with an ethernet header,
although it may be that I didn't reserve extra hard header space or something
like that.

Eric
Re: Re: [PATCHSET] 2.6.20-lxc8 [message #17985 is a reply to message #11383] Thu, 22 March 2007 17:57
Benjamin Thery
Eric W. Biederman wrote:
> "Denis V. Lunev" <den@sw.ru> writes:
> 
>> Kirill Korotaev wrote:
>>> if the network device inside the container has a higher MTU than eth0
>>> outside the container, then packets will get fragmented:
>>> first to MTU1 inside the container and refragmented to MTU2
>>> outside the container. At least this is the reason we use the ethernet MTU
>>> on venet devices in OpenVZ - to avoid fragmentation.
>> Frankly speaking, there are other points which can lead to
>> pskb_expand_head being called. The hard header size on the container network
>> device should be the same as on the real one.
> 
> I believe it is the case that I have a 1500-byte MTU with an ethernet header,
> although it may be that I didn't reserve extra hard header space or something
> like that.

My investigation of the increased CPU load when running netperf
inside a container (i.e. through etun2<->etun1) is progressing slowly.

I'm not sure the cause is fragmentation, as we initially supposed.
In fact, it seems related to forwarding the packets between the devices.

Here is what I've tracked so far:
* When we run netperf from the container, oprofile reports that the
top "consuming" symbol is "pskb_expand_head", followed by
"csum_partial_copy_generic". These symbols represent 13.5% and 9.7%
of the samples, respectively.
* Without a container, these symbols don't show up in the first 20 entries.

Who is calling "pskb_expand_head" in this case?

Using systemtap, I determined that the call to "pskb_expand_head"
comes from the skb_cow() call in ip_forward() (line 90 in 2.6.20-rc5-netns).
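
For reference, the call site looks roughly like this (quoted from memory of
2.6.20-era code, so treat it as a sketch rather than the exact
2.6.20-rc5-netns text):

/* net/ipv4/ip_forward.c, excerpt (roughly) */
int ip_forward(struct sk_buff *skb)
{
    struct rtable *rt = (struct rtable *)skb->dst;  /* route chosen for the packet */

    /* ... validation, TTL check, route lookup ... */

    /* We are about to mangle packet. Copy it!
     * The headroom requested here is derived from the OUTGOING device's
     * hard_header_len (via LL_RESERVED_SPACE), so if the skb that arrived
     * over the etun pair has less headroom than eth0 needs, skb_cow()
     * has to reallocate the head for every forwarded packet. */
    if (skb_cow(skb, LL_RESERVED_SPACE(rt->u.dst.dev) + rt->u.dst.header_len))
        goto drop;

    /* ... decrement TTL, NF_HOOK(..., ip_forward_finish) ... */
}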

The number of calls to "pskb_expand_head" matches the number of
invocations of ip_forward() (268,000 calls for a 20-second netperf
session in my case).
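
And skb_cow() itself (again roughly, from 2.6.20-era include/linux/skbuff.h)
only falls through to pskb_expand_head() when the skb is cloned or its
headroom is smaller than what the caller asks for - which would fit the
hard-header/headroom theory discussed above:

/* include/linux/skbuff.h, excerpt (roughly) */
static inline int skb_cow(struct sk_buff *skb, unsigned int headroom)
{
    int delta = (headroom > NET_SKB_PAD ? headroom : NET_SKB_PAD) -
                    skb_headroom(skb);

    if (delta < 0)
        delta = 0;

    /* Reallocate only if headroom is short or the data is shared. */
    if (delta || skb_cloned(skb))
        return pskb_expand_head(skb, (delta + (NET_SKB_PAD - 1)) &
                        ~(NET_SKB_PAD - 1), 0, GFP_ATOMIC);
    return 0;
}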

Here is an example of the backtrace when we reach pskb_expand_head:

  0xc028bb95 : pskb_expand_head+0x1/0x11b []
  0xc02ab5fe : ip_forward+0x150/0x256 []
  0xc02a9e9b : ip_rcv_finish+0xae/0xcb []
  0xc02aa314 : ip_rcv+0x207/0x231 []
  0xc031a0eb : notifier_call_chain+0x19/0x29 []
  0xc02902eb : netif_receive_skb+0x1e3/0x24e []
  0xc0291c60 : process_backlog+0x76/0xe3 []
  0xc0292048 : net_rx_action+0x94/0x14e []
  0xc0126e4d : __do_softirq+0x5d/0xba []
  0xc0126edc : do_softirq+0x32/0x36 []
  0xc01270a8 : local_bh_enable+0x7b/0x89 []
  0xc0292329 : dev_queue_xmit+0x227/0x246 []
  0xc031a0eb : notifier_call_chain+0x19/0x29 []
  0xc02aef41 : ip_output+0x1e6/0x221 []
  0xc02902eb : netif_receive_skb+0x1e3/0x24e []
  0xc02ae741 : ip_queue_xmit+0x3a8/0x3ea []
  0xc02c10cc : tcp_v4_send_check+0x74/0xaa []
  0xc02bc22a : tcp_transmit_skb+0x606/0x634 []
  0xc02aef41 : ip_output+0x1e6/0x221 []
  0xc02bdcec : tcp_push_one+0xb3/0xd8 []
  0xc02b429a : tcp_sendmsg+0x7c8/0x9f9 []
  0xc01cbe8a : vsnprintf+0x450/0x48c []
  0xc02cc10f : inet_sendmsg+0x3b/0x45 []
  0xc0286e17 : sock_sendmsg+0xd0/0xeb []
  0xc017be75 : seq_printf+0x2e/0x4b []
  0xc0133591 : autoremove_wake_function+0x0/0x35 []
  0xc0287691 : sys_sendto+0x116/0x140 []
  0xc02876f2 : sys_send+0x37/0x3b []
  0xc0288057 : sys_socketcall+0x14a/0x261 []
  0xc0102d6c : syscall_call+0x7/0xb []

Also here is a backtrace when etun_xmit() is called:

tx_dev: etun1
rx_dev: etun2

  0xf8b4a3b4 : etun_xmit+0x1/0xed [etun]
  0xc02906e0 : dev_hard_start_xmit+0x1ab/0x204 []
  0xc029229d : dev_queue_xmit+0x19b/0x246 []
  0xc0296672 : neigh_resolve_output+0x1f5/0x227 []
  0xc02aef41 : ip_output+0x1e6/0x221 []
  0xc02ab4ac : ip_forward_finish+0x2c/0x2e []
  0xc02ab6a5 : ip_forward+0x1f7/0x256 []            <- ip_forward
  0xc0292329 : dev_queue_xmit+0x227/0x246 []
  0xc02a9e9b : ip_rcv_finish+0xae/0xcb []
  0xc02aa314 : ip_rcv+0x207/0x231 []
  0xc028c0cd : __alloc_skb+0x4f/0xf8 []
  0xc02902eb : netif_receive_skb+0x1e3/0x24e []
  0xc0291c60 : process_backlog+0x76/0xe3 []
  0xc0292048 : net_rx_action+0x94/0x14e []
  0xc0126e4d : __do_softirq+0x5d/0xba []
  0xc0126edc : do_softirq+0x32/0x36 []
  0xc0104d2f : do_IRQ+0x87/0x9c []
  0xc0103707 : common_interrupt+0x23/0x28 []
  0xc0101c30 : default_idle+0x0/0x3e []
  0xc031007b : packet_set_ring+0x2d7/0x2ea []
  0xc0101c5c : default_idle+0x2c/0x3e []
  0xc0101390 : cpu_idle+0x9e/0xb7 []

That's it for now. I hope this helps.

Regards,
Benjamin


> 
> Eric


-- 
B e n j a m i n   T h e r y  - BULL/DT/Open Software R&D

    http://www.bull.com