OpenVZ Forum


Re: nptl perf bench and profiling with pidns patchsets [message #18741 is a reply to message #18734] Mon, 04 June 2007 14:12
From: xemul
Serge E. Hallyn wrote:
> Quoting Kirill Korotaev (dev@sw.ru):
>> Cedric,
>>
>> just a small note.
>> imho it is not correct to check performance with debugging enabled in
>> the memory allocator, since it can influence cache efficiency a lot.
>> In your case it looks like you have DEBUG_SLAB enabled.
> 
> Hm, good point.  Cedric, did you ever run any tests with profiling and
> debugging turned off?

I'd like to add that the results used for comparison have to be run
with the profiler turned off. If we need to know where the
bottleneck is, the profiler goes on, but then the numbers we get
are not to be trusted.

Cedric, may I ask you to rerun the tests with both debugging and
the profiler turned off and report the results again?

Thanks,
Pavel

> -serge
> 
>> Pavel will also recheck what influences this particular test.
>> BTW, it is strange... According to Pavel, the unixbench results
>> were very reproducible. What was the problem in your case?
>>
>> Kirill
>>
>> Cedric Le Goater wrote:
>>> Pavel and all,
>>>
>>> I've been profiling the different pidns patchsets to chase the perf 
>>> bottlenecks in the pidns patchset. As I was not getting accurate 
>>> profiling results with unixbench, I changed the benchmark to the 
>>> nptl perf benchmark Ingo used when he introduced the generic pidhash 
>>> back in 2002. 
>>>
>>> 	http://lwn.net/Articles/10368/ 
>>>
>>> Compared to unixbench, this is a micro-benchmark measuring thread 
>>> creation and destruction, which I think is quite relevant to our 
>>> different patchsets. unixbench is fine, but profiling it is not really 
>>> accurate: too much noise. Any other suggestions? 
>>>
>>> On a 2 * Intel(R) Xeon(TM) CPU 2.80GHz with 4 GB of RAM, I ran 8 
>>> simultaneous instances, like Ingo did:
>>>
>>> 	./perf -s 1000000 -t 1 -r 0 -T --sync-join
>>>
>>> I did that a few times and also varied the load of the machine 
>>> to check that the values were not too dispersed.
>>>
>>> kernels used were :
>>>
>>> * 2.6.22-rc1-mm1
>>> * http://lxc.sourceforge.net/patches/2.6.22/2.6.22-rc1-mm1-openvz-pidns1/
>>> * http://lxc.sourceforge.net/patches/2.6.22/2.6.22-rc1-mm1-pidns1/
>>>
>>> findings are : 
>>>
>>> * definitely better results for suka's patchset. suka's patchset 
>>>   also gets better results with unixbench on 2.6.22-rc1-mm1, but 
>>>   the values are really dispersed. can you confirm ?
>>> * suka's patchset would benefit from some optimization in init_upid() 
>>>   and dup_struct_pid()  
>>> * it seems that openvz's patchset has some issue with the struct pid 
>>>   cache. not sure what the reason is. maybe you can help, pavel.
>>>
>>> Cheers,
>>>
>>> C.
>>>
>>>
>>> * results for 2.6.22-rc1-mm1 
>>>
>>> Runtime: 91.635644842 seconds
>>> Runtime: 91.639834248 seconds
>>> Runtime: 93.615069259 seconds
>>> Runtime: 93.664678865 seconds
>>> Runtime: 95.724542035 seconds
>>> Runtime: 95.763572945 seconds
>>> Runtime: 96.444022314 seconds
>>> Runtime: 97.028016189 seconds
>>>
>>> * results for 2.6.22-rc1-mm1-pidns 
>>>
>>> Runtime: 92.054172217 seconds
>>> Runtime: 93.606016039 seconds
>>> Runtime: 93.624093799 seconds
>>> Runtime: 94.992255782 seconds
>>> Runtime: 95.914365693 seconds
>>> Runtime: 98.080396784 seconds
>>> Runtime: 98.674988254 seconds
>>> Runtime: 98.832674972 seconds
>>>
>>> * results for 2.6.22-rc1-mm1-openvz-pidns 
>>>
>>> Runtime: 92.359771573 seconds
>>> Runtime: 96.517435638 seconds
>>> Runtime: 98.328696048 seconds
>>> Runtime: 100.263042244 seconds
>>> Runtime: 101.003111486 seconds
>>> Runtime: 101.371180205 seconds
>>> Runtime: 102.536653818 seconds
>>> Runtime: 102.671519536 seconds
>>>
>>>
>>> * diffprofile 2.6.22-rc1-mm1 vs 2.6.22-rc1-mm1-pidns 
>>>   (columns: delta in profile samples, % change, symbol)
>>>
>>>       2708    11.8% check_poison_obj
>>>       2461     0.0% init_upid
>>>       2445     2.9% total
>>>       2283   183.7% kmem_cache_free
>>>        383    16.9% kmem_cache_alloc
>>>        365    13.6% __memset
>>>        280     0.0% dup_struct_pid
>>>        279    22.9% __show_regs
>>>        278    21.1% cache_alloc_debugcheck_after
>>>        261    11.3% get_page_from_freelist
>>>        223     0.0% kref_put
>>>        203     3.4% copy_process
>>>        197    34.4% do_futex
>>>        176     5.6% do_exit
>>>         86    22.8% cache_alloc_refill
>>>         82    28.2% do_fork
>>>         69    18.3% sched_balance_self
>>>         68   136.0% __free_pages_ok
>>>         59    90.8% bad_range
>>>         52     4.3% __down_read
>>>         51    13.7% account_user_time
>>>         50     7.5% copy_thread
>>>         43    28.7% put_files_struct
>>>         37   264.3% __free_pages
>>>         31    18.9% poison_obj
>>>         28    82.4% gs_change
>>>         26    16.0% plist_check_prev_next
>>>         25   192.3% __put_task_struct
>>>         23    26.7% __get_free_pages
>>>         23    14.6% __put_user_4
>>>         23   230.0% alloc_uid
>>>         22     9.0% exit_mm
>>>         21    12.9% _raw_spin_unlock
>>>         21     7.8% mm_release
>>>         21     8.6% plist_check_list
>>>         20    20.0% drop_futex_key_refs
>>>         20    12.0% __up_read
>>>         19    48.7% unqueue_me
>>>         19    16.4% do_arch_prctl
>>>         18  1800.0% dummy_task_free_security
>>>         18    58.1% wake_futex
>>>         17    47.2% obj_offset
>>>         16    16.7% dbg_userword
>>>         15     0.0% kref_get
>>>         15   150.0% check_irq_off
>>>         15   300.0% __rcu_process_callbacks
>>>         14   466.7% __switch_to
>>>         14    32.6% prepare_to_copy
>>>         14     8.2% get_futex_key
>>>         14    16.1% __wake_up
>>>         13    65.0% rt_mutex_debug_task_free
>>>         12     7.1% obj_size
>>>         11    19.3% add_wait_queue
>>>         11   275.0% put_pid
>>>         11   550.0% profile_task_exit
>>>         10     9.0% task_nice
>>>          9   100.0% __delay
>>>          8    57.1% call_rcu
>>>          8     7.8% find_extend_vma
>>>          8   266.7% ktime_get
>>>          8    23.5% sys_clone
>>>          8    25.0% delayed_put_task_struct
>>>          7    26.9% task_rq_lock
>>>          7    18.9% _spin_lock_irqsave
>>>          6     0.0% quicklist_trim
>>>          6   100.0% __up_write
>>>         -6   -50.0% module_unload_free
>>>         -6  -100.0% nr_running
>>>         -7   -43.8% _raw_spin_trylock
>>>         -7    -2.8% __alloc_pages
>>>         -8   -33.3% sysret_check
>>>         -8   -28.6% sysret_careful
>>>         -8   -50.0% sysret_signal
>>>         -8    -1.9% copy_namespaces
>>>         -9   -16.7% memmove
>>>         -9   -11.5% __phys_addr
>>>         -9    -4.5% copy_semundo
>>>        -10   -28.6% rwlock_bug
>>>        -10   -27.8% wake_up_new_task
>>>        -10   -10.4% sched_clock
>>>        -10    -6.2% copy_user_generic_unrolled
>>>        -11  -100.0% d_validate
>>>        -11   -23.9% monotonic_to_bootbased
>>>        -11   -10.6% dummy_task_create
>>>        -11    -3.7% futex_wake
>>>        -12    -3.9% __might_sleep
>>>        -13  -100.0% vscnprintf
>>>        -14   -13.0% plist_del
>>>        -16   -84.2% sighand_ctor
>>>        -17   -20.7% debug_rt_mutex_free_waiter
>>>        -17   -42.5% release_thread
>>>        -18   -29.5% init_waitqueue_head
>>>        -19  -100.0% scnprintf
>>>        -21   -12.7% copy_files
>>>        -22   -47.8% blocking_notifier_call_chain
>>>        -23   -11.8% hash_futex
>>>        -24   -18.8% call_rcu_bh
>>>        -25   -19.8% mmput
>>>        -27   -16.5% down_read
>>>        -27   -39.7% audit_alloc
>>>        -27   -19.9% stub_clone
>>>        -28   -16.3% set_normalized_timespec
>>>        -32   -74.4% kfree_debugcheck
>>>        -35   -30.2% sys_exit
>>>        -40   -63.5% down_read_trylock
>>>        -43    -8.6% zone_watermark_ok
>>>        -49    -7.7% schedule
>>>        -53    -5.4% system_call
>>>        -54   -47.0% __blocking_notifier_call_chain
>>>        -64   -24.8% getnstimeofday
>>>        -66    -7.0% _raw_spin_lock
>>>        -75   -22.9% ktime_get_ts
>>>        -86  -100.0% snprintf
>>>        -86   -12.8% kernel_thread
>>>        -88   -38.1% plist_add
>>>        -93    -5.4% __memcpy
>>>       -100   -59.9% kmem_flagcheck
>>> ...
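[Editor's note: check_poison_obj and kmem_cache_free dominating the diff above is consistent with Kirill's DEBUG_SLAB remark. With CONFIG_DEBUG_SLAB the allocator poisons every freed object and rescans it on the next allocation, touching every cache line of the object. A simplified sketch of the idea, using the real poison values from include/linux/poison.h but a hypothetical helper name, not the actual mm/slab.c code:]

```c
#include <stddef.h>

#define POISON_FREE 0x6b  /* pattern written into freed objects */
#define POISON_END  0xa5  /* marker for the last byte of the object */

/* Simplified model of what DEBUG_SLAB's check_poison_obj() does on
 * allocation: scan the whole object to verify the poison pattern
 * survived, i.e. nothing wrote to it after it was freed. Scanning
 * every byte pulls the entire object through the cache, which is why
 * enabling this skews cache behaviour in benchmarks. (Sketch only;
 * the real mm/slab.c version also reports the corrupted offset.) */
static int poison_intact(const unsigned char *obj, size_t size)
{
    size_t i;

    for (i = 0; i + 1 < size; i++)
        if (obj[i] != POISON_FREE)
            return 0;           /* use-after-free touched the object */
    return obj[size - 1] == POISON_END;
}
```

This is the extra per-object work Kirill is warning about: it is pure debug overhead, so profiles taken with it enabled overstate allocator costs and should not be compared across patchsets.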
