OpenVZ Forum


Re: nptl perf bench and profiling with pidns patchsets [message #18734]  Mon, 04 June 2007 13:56
From: serue
Quoting Kirill Korotaev (dev@sw.ru):
> Cedric,
> 
> just a small note.
> imho it is not correct to check performance with debugging enabled in the
> memory allocator, since it can influence cache efficiency a lot.
> In your case it looks like you have DEBUG_SLAB enabled.

Hm, good point.  Cedric, did you ever run any tests with profiling and
debugging turned off?
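
(For reference, those are kernel .config knobs. A minimal sketch of a
"clean" benchmarking configuration, assuming a 2.6.22-era tree where the
exact option names may differ slightly:

	# CONFIG_DEBUG_SLAB is not set
	# CONFIG_PROFILING is not set

followed by a rebuild and reboot before re-running the numbers.)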

-serge

> Pavel will also recheck what influences this particular test.
> BTW, it is strange... according to Pavel the unixbench results
> were very reproducible. What was the problem in your case?
> 
> Kirill
> 
> Cedric Le Goater wrote:
> > Pavel and all,
> > 
> > I've been profiling the different pidns patchsets to chase their perf 
> > bottlenecks. As I was not getting accurate profiling results with 
> > unixbench, I changed the benchmark to use the nptl perf benchmark 
> > Ingo used when he introduced the generic pidhash back in 2002. 
> > 
> > 	http://lwn.net/Articles/10368/ 
> > 
> > Compared to unixbench, this is a micro benchmark measuring thread 
> > creation and destruction, which I think is quite relevant to our 
> > different patchsets. unixbench is fine, but the profiling is not 
> > really accurate: too much noise. Any other suggestions? 
> > 
> > On a 2 * Intel(R) Xeon(TM) CPU 2.80GHz with 4 GB of RAM, I ran 8 
> > simultaneous instances, like Ingo did:
> > 
> > 	./perf -s 1000000 -t 1 -r 0 -T --sync-join
> > 
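(For illustration only: the heart of what this tool measures can be
sketched in a few lines of C. This is not Ingo's actual benchmark, just a
minimal create/join loop; the iteration count mirrors the -s flag above.

	#include <pthread.h>
	#include <stdio.h>
	#include <time.h>

	static void *worker(void *arg)
	{
		return NULL;			/* empty thread body */
	}

	int main(void)
	{
		enum { ITERATIONS = 1000000 };	/* cf. -s 1000000 */
		struct timespec t0, t1;
		pthread_t tid;
		int i;

		clock_gettime(CLOCK_MONOTONIC, &t0);
		for (i = 0; i < ITERATIONS; i++) {
			/* each pass creates and destroys one thread, so
			 * the kernel allocates and frees one struct pid
			 * per pass */
			pthread_create(&tid, NULL, worker, NULL);
			pthread_join(tid, NULL);
		}
		clock_gettime(CLOCK_MONOTONIC, &t1);

		printf("Runtime: %.9f seconds\n",
		       (t1.tv_sec - t0.tv_sec) +
		       (t1.tv_nsec - t0.tv_nsec) / 1e9);
		return 0;
	}

Build with "gcc -O2 mini-perf.c -lpthread -lrt" and run one copy per CPU
to approximate the setup above.)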
> > I did that a few times and also varied the load on the machine 
> > to check that the values were not too dispersed.
> > 
> > The kernels used were:
> > 
> > * 2.6.22-rc1-mm1
> > * http://lxc.sourceforge.net/patches/2.6.22/2.6.22-rc1-mm1-openvz-pidns1/
> > * http://lxc.sourceforge.net/patches/2.6.22/2.6.22-rc1-mm1-pidns1/
> > 
> > The findings are: 
> > 
> > * definitely better results for suka's patchset. suka's patchset also 
> >   gets better results with unixbench on 2.6.22-rc1-mm1, but the 
> >   values are really dispersed. Can you confirm?
> > * suka's patchset would benefit from some optimization in init_upid() 
> >   and dup_struct_pid()  
> > * it seems that openvz's patchset has some issue with the struct pid 
> >   cache. Not sure what the reason is. Maybe you can help, Pavel.
> > 
> > Cheers,
> > 
> > C.
> > 
> > 
> > * results for 2.6.22-rc1-mm1 
> > 
> > Runtime: 91.635644842 seconds
> > Runtime: 91.639834248 seconds
> > Runtime: 93.615069259 seconds
> > Runtime: 93.664678865 seconds
> > Runtime: 95.724542035 seconds
> > Runtime: 95.763572945 seconds
> > Runtime: 96.444022314 seconds
> > Runtime: 97.028016189 seconds
> > 
> > * results for 2.6.22-rc1-mm1-pidns 
> > 
> > Runtime: 92.054172217 seconds
> > Runtime: 93.606016039 seconds
> > Runtime: 93.624093799 seconds
> > Runtime: 94.992255782 seconds
> > Runtime: 95.914365693 seconds
> > Runtime: 98.080396784 seconds
> > Runtime: 98.674988254 seconds
> > Runtime: 98.832674972 seconds
> > 
> > * results for 2.6.22-rc1-mm1-openvz-pidns 
> > 
> > Runtime: 92.359771573 seconds
> > Runtime: 96.517435638 seconds
> > Runtime: 98.328696048 seconds
> > Runtime: 100.263042244 seconds
> > Runtime: 101.003111486 seconds
> > Runtime: 101.371180205 seconds
> > Runtime: 102.536653818 seconds
> > Runtime: 102.671519536 seconds
> > 
> > 
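(A note on reading the diffprofile listings that follow: the first column
is the change in profile samples, the second the relative change; 0.0%
marks functions that exist only in the second profile, -100.0% functions
that disappeared. Assuming readprofile(1) was the profiler, the usual
workflow is roughly:

	readprofile -m System.map -r     # reset the profiling counters
	./run-benchmark
	readprofile -m System.map > base.txt
	(repeat on the patched kernel, saving patched.txt)
	diffprofile base.txt patched.txt

where ./run-benchmark stands in for the perf invocation above.)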
> > * diffprofile 2.6.22-rc1-mm1 and 2.6.22-rc1-mm1-pidns 
> > 
> >       2708    11.8% check_poison_obj
> >       2461     0.0% init_upid
> >       2445     2.9% total
> >       2283   183.7% kmem_cache_free
> >        383    16.9% kmem_cache_alloc
> >        365    13.6% __memset
> >        280     0.0% dup_struct_pid
> >        279    22.9% __show_regs
> >        278    21.1% cache_alloc_debugcheck_after
> >        261    11.3% get_page_from_freelist
> >        223     0.0% kref_put
> >        203     3.4% copy_process
> >        197    34.4% do_futex
> >        176     5.6% do_exit
> >         86    22.8% cache_alloc_refill
> >         82    28.2% do_fork
> >         69    18.3% sched_balance_self
> >         68   136.0% __free_pages_ok
> >         59    90.8% bad_range
> >         52     4.3% __down_read
> >         51    13.7% account_user_time
> >         50     7.5% copy_thread
> >         43    28.7% put_files_struct
> >         37   264.3% __free_pages
> >         31    18.9% poison_obj
> >         28    82.4% gs_change
> >         26    16.0% plist_check_prev_next
> >         25   192.3% __put_task_struct
> >         23    26.7% __get_free_pages
> >         23    14.6% __put_user_4
> >         23   230.0% alloc_uid
> >         22     9.0% exit_mm
> >         21    12.9% _raw_spin_unlock
> >         21     7.8% mm_release
> >         21     8.6% plist_check_list
> >         20    20.0% drop_futex_key_refs
> >         20    12.0% __up_read
> >         19    48.7% unqueue_me
> >         19    16.4% do_arch_prctl
> >         18  1800.0% dummy_task_free_security
> >         18    58.1% wake_futex
> >         17    47.2% obj_offset
> >         16    16.7% dbg_userword
> >         15     0.0% kref_get
> >         15   150.0% check_irq_off
> >         15   300.0% __rcu_process_callbacks
> >         14   466.7% __switch_to
> >         14    32.6% prepare_to_copy
> >         14     8.2% get_futex_key
> >         14    16.1% __wake_up
> >         13    65.0% rt_mutex_debug_task_free
> >         12     7.1% obj_size
> >         11    19.3% add_wait_queue
> >         11   275.0% put_pid
> >         11   550.0% profile_task_exit
> >         10     9.0% task_nice
> >          9   100.0% __delay
> >          8    57.1% call_rcu
> >          8     7.8% find_extend_vma
> >          8   266.7% ktime_get
> >          8    23.5% sys_clone
> >          8    25.0% delayed_put_task_struct
> >          7    26.9% task_rq_lock
> >          7    18.9% _spin_lock_irqsave
> >          6     0.0% quicklist_trim
> >          6   100.0% __up_write
> >         -6   -50.0% module_unload_free
> >         -6  -100.0% nr_running
> >         -7   -43.8% _raw_spin_trylock
> >         -7    -2.8% __alloc_pages
> >         -8   -33.3% sysret_check
> >         -8   -28.6% sysret_careful
> >         -8   -50.0% sysret_signal
> >         -8    -1.9% copy_namespaces
> >         -9   -16.7% memmove
> >         -9   -11.5% __phys_addr
> >         -9    -4.5% copy_semundo
> >        -10   -28.6% rwlock_bug
> >        -10   -27.8% wake_up_new_task
> >        -10   -10.4% sched_clock
> >        -10    -6.2% copy_user_generic_unrolled
> >        -11  -100.0% d_validate
> >        -11   -23.9% monotonic_to_bootbased
> >        -11   -10.6% dummy_task_create
> >        -11    -3.7% futex_wake
> >        -12    -3.9% __might_sleep
> >        -13  -100.0% vscnprintf
> >        -14   -13.0% plist_del
> >        -16   -84.2% sighand_ctor
> >        -17   -20.7% debug_rt_mutex_free_waiter
> >        -17   -42.5% release_thread
> >        -18   -29.5% init_waitqueue_head
> >        -19  -100.0% scnprintf
> >        -21   -12.7% copy_files
> >        -22   -47.8% blocking_notifier_call_chain
> >        -23   -11.8% hash_futex
> >        -24   -18.8% call_rcu_bh
> >        -25   -19.8% mmput
> >        -27   -16.5% down_read
> >        -27   -39.7% audit_alloc
> >        -27   -19.9% stub_clone
> >        -28   -16.3% set_normalized_timespec
> >        -32   -74.4% kfree_debugcheck
> >        -35   -30.2% sys_exit
> >        -40   -63.5% down_read_trylock
> >        -43    -8.6% zone_watermark_ok
> >        -49    -7.7% schedule
> >        -53    -5.4% system_call
> >        -54   -47.0% __blocking_notifier_call_chain
> >        -64   -24.8% getnstimeofday
> >        -66    -7.0% _raw_spin_lock
> >        -75   -22.9% ktime_get_ts
> >        -86  -100.0% snprintf
> >        -86   -12.8% kernel_thread
> >        -88   -38.1% plist_add
> >        -93    -5.4% __memcpy
> >       -100   -59.9% kmem_flagcheck
> >       -103   -18.5% acct_collect
> >       -113   -38.3% dbg_redzone1
> >       -138    -3.9% schedule_tail
> >       -162   -12.2% _spin_unlock
> >       -243    -7.3% thread_return
> >       -268   -83.5% proc_flush_task
> >       -289  -100.0% d_lookup
> >       -357  -100.0% d_hash_and_lookup
> >       -368    -6.1% release_task
> >       -642   -99.8% vsnprintf
> >       -816  -100.0% __d_lookup
> >      -1529  -100.0% number
> >      -2431  -100.0% alloc_pid
> > 
> > * diffprofile 2.6.22-rc1-mm1 and 2.6.22-rc1-mm1-openvz-pidns 
> > 
> >      10046    11.8% total
> >       6896   554.8% kmem_cache_free
> >       1580     6.9% check_poison_obj
> >       1222     0.0% alloc_pidmap
> >        883    39.0% kmem_cache_alloc
> >        485   128.6% cache_alloc_refill
> >        263     8.4% do_exit
> >        223    40.0% acct_collect
> >        
...

 

