Quoting Kirill Korotaev (dev@sw.ru):
> Cedric,
>
> just a small note.
> imho it is not correct to check performance with enabled debug in memory allocator
> since it can influence cache efficiency much.
> In you case looks like you have DEBUG_SLAB enabled.
Hm, good point. Cedric, did you ever run any tests with profiling and
debugging turned off?
-serge
> Pavel will recheck as well what influences on this particular test.
> BTW, it is strange... But according to Pavel unixbench results
> were very reproducible. What was the problem in your case?
>
> Kirill
>
> Cedric Le Goater wrote:
> > Pavel and all,
> >
> > I've been profiling the different pidns patchsets to chase the perf
> > bottlenecks in the pidns patchset. As i was not getting accurate
> > profiling results with unixbench, I changed the benchmark to use the
> > nptl perf benchmark ingo used when he introduced the generic pidhash
> > back in 2002.
> >
> > http://lwn.net/Articles/10368/
> >
> > Compared to unixbench, this is a micro benchmark measuring thread
> > creation and destruction which I think is quite relevant of our
> > different patchsets. unixbench is fine but profiling is not really
> > accurate. too much noise. Any other suggestions ?
> >
> > On a 2 * Intel(R) Xeon(TM) CPU 2.80GHz with 4 GB of RAM, I ran 8
> > simultaneous, like ingo did :
> >
> > ./perf -s 1000000 -t 1 -r 0 -T --sync-join
> >
> > I did that a few times and also changed the load of the machine
> > to see if values were not too dispersed.
> >
> > kernels used were :
> >
> > * 2.6.22-rc1-mm1
> > * http://lxc.sourceforge.net/patches/2.6.22/2.6.22-rc1-mm1-openvz-pidns1/
> > * http://lxc.sourceforge.net/patches/2.6.22/2.6.22-rc1-mm1-pidns1/
> >
> > findings are :
> >
> > * definitely better results for suka's patchset. suka's patchset is
> > also getting better results with unixbench on a 2.6.22-rc1-mm1 but
> > the values are really dispersed. can you confirm ?
> > * suka's patchset would benefit from some optimization in init_upid()
> > and dup_struct_pid()
> > * it seems that openvz's pachset has some issue with the struct pid
> > cache. not sure what is the reason. may be you can help pavel.
> >
> > Cheers,
> >
> > C.
> >
> >
> > * results for 2.6.22-rc1-mm1
> >
> > Runtime: 91.635644842 seconds
> > Runtime: 91.639834248 seconds
> > Runtime: 93.615069259 seconds
> > Runtime: 93.664678865 seconds
> > Runtime: 95.724542035 seconds
> > Runtime: 95.763572945 seconds
> > Runtime: 96.444022314 seconds
> > Runtime: 97.028016189 seconds
> >
> > * results for 2.6.22-rc1-mm1-pidns
> >
> > Runtime: 92.054172217 seconds
> > Runtime: 93.606016039 seconds
> > Runtime: 93.624093799 seconds
> > Runtime: 94.992255782 seconds
> > Runtime: 95.914365693 seconds
> > Runtime: 98.080396784 seconds
> > Runtime: 98.674988254 seconds
> > Runtime: 98.832674972 seconds
> >
> > * results for 2.6.22-rc1-mm1-openvz-pidns
> >
> > Runtime: 92.359771573 seconds
> > Runtime: 96.517435638 seconds
> > Runtime: 98.328696048 seconds
> > Runtime: 100.263042244 seconds
> > Runtime: 101.003111486 seconds
> > Runtime: 101.371180205 seconds
> > Runtime: 102.536653818 seconds
> > Runtime: 102.671519536 seconds
> >
> >
> > * diffprofile 2.6.22-rc1-mm1 and 2.6.22-rc1-mm1-pidns
> >
> > 2708 11.8% check_poison_obj
> > 2461 0.0% init_upid
> > 2445 2.9% total
> > 2283 183.7% kmem_cache_free
> > 383 16.9% kmem_cache_alloc
> > 365 13.6% __memset
> > 280 0.0% dup_struct_pid
> > 279 22.9% __show_regs
> > 278 21.1% cache_alloc_debugcheck_after
> > 261 11.3% get_page_from_freelist
> > 223 0.0% kref_put
> > 203 3.4% copy_process
> > 197 34.4% do_futex
> > 176 5.6% do_exit
> > 86 22.8% cache_alloc_refill
> > 82 28.2% do_fork
> > 69 18.3% sched_balance_self
> > 68 136.0% __free_pages_ok
> > 59 90.8% bad_range
> > 52 4.3% __down_read
> > 51 13.7% account_user_time
> > 50 7.5% copy_thread
> > 43 28.7% put_files_struct
> > 37 264.3% __free_pages
> > 31 18.9% poison_obj
> > 28 82.4% gs_change
> > 26 16.0% plist_check_prev_next
> > 25 192.3% __put_task_struct
> > 23 26.7% __get_free_pages
> > 23 14.6% __put_user_4
> > 23 230.0% alloc_uid
> > 22 9.0% exit_mm
> > 21 12.9% _raw_spin_unlock
> > 21 7.8% mm_release
> > 21 8.6% plist_check_list
> > 20 20.0% drop_futex_key_refs
> > 20 12.0% __up_read
> > 19 48.7% unqueue_me
> > 19 16.4% do_arch_prctl
> > 18 1800.0% dummy_task_free_security
> > 18 58.1% wake_futex
> > 17 47.2% obj_offset
> > 16 16.7% dbg_userword
> > 15 0.0% kref_get
> > 15 150.0% check_irq_off
> > 15 300.0% __rcu_process_callbacks
> > 14 466.7% __switch_to
> > 14 32.6% prepare_to_copy
> > 14 8.2% get_futex_key
> > 14 16.1% __wake_up
> > 13 65.0% rt_mutex_debug_task_free
> > 12 7.1% obj_size
> > 11 19.3% add_wait_queue
> > 11 275.0% put_pid
> > 11 550.0% profile_task_exit
> > 10 9.0% task_nice
> > 9 100.0% __delay
> > 8 57.1% call_rcu
> > 8 7.8% find_extend_vma
> > 8 266.7% ktime_get
> > 8 23.5% sys_clone
> > 8 25.0% delayed_put_task_struct
> > 7 26.9% task_rq_lock
> > 7 18.9% _spin_lock_irqsave
> > 6 0.0% quicklist_trim
> > 6 100.0% __up_write
> > -6 -50.0% module_unload_free
> > -6 -100.0% nr_running
> > -7 -43.8% _raw_spin_trylock
> > -7 -2.8% __alloc_pages
> > -8 -33.3% sysret_check
> > -8 -28.6% sysret_careful
> > -8 -50.0% sysret_signal
> > -8 -1.9% copy_namespaces
> > -9 -16.7% memmove
> > -9 -11.5% __phys_addr
> > -9 -4.5% copy_semundo
> > -10 -28.6% rwlock_bug
> > -10 -27.8% wake_up_new_task
> > -10 -10.4% sched_clock
> > -10 -6.2% copy_user_generic_unrolled
> > -11 -100.0% d_validate
> > -11 -23.9% monotonic_to_bootbased
> > -11 -10.6% dummy_task_create
> > -11 -3.7% futex_wake
> > -12 -3.9% __might_sleep
> > -13 -100.0% vscnprintf
> > -14 -13.0% plist_del
> > -16 -84.2% sighand_ctor
> > -17 -20.7% debug_rt_mutex_free_waiter
> > -17 -42.5% release_thread
> > -18 -29.5% init_waitqueue_head
> > -19 -100.0% scnprintf
> > -21 -12.7% copy_files
> > -22 -47.8% blocking_notifier_call_chain
> > -23 -11.8% hash_futex
> > -24 -18.8% call_rcu_bh
> > -25 -19.8% mmput
> > -27 -16.5% down_read
> > -27 -39.7% audit_alloc
> > -27 -19.9% stub_clone
> > -28 -16.3% set_normalized_timespec
> > -32 -74.4% kfree_debugcheck
> > -35 -30.2% sys_exit
> > -40 -63.5% down_read_trylock
> > -43 -8.6% zone_watermark_ok
> > -49 -7.7% schedule
> > -53 -5.4% system_call
> > -54 -47.0% __blocking_notifier_call_chain
> > -64 -24.8% getnstimeofday
> > -66 -7.0% _raw_spin_lock
> > -75 -22.9% ktime_get_ts
> > -86 -100.0% snprintf
> > -86 -12.8% kernel_thread
> > -88 -38.1% plist_add
> > -93 -5.4% __memcpy
> > -100 -59.9% kmem_flagcheck
> > -103 -18.5% acct_collect
> > -113 -38.3% dbg_redzone1
> > -138 -3.9% schedule_tail
> > -162 -12.2% _spin_unlock
> > -243 -7.3% thread_return
> > -268 -83.5% proc_flush_task
> > -289 -100.0% d_lookup
> > -357 -100.0% d_hash_and_lookup
> > -368 -6.1% release_task
> > -642 -99.8% vsnprintf
> > -816 -100.0% __d_lookup
> > -1529 -100.0% number
> > -2431 -100.0% alloc_pid
> >
> > * diffprofile 2.6.22-rc1-mm1 and 2.6.22-rc1-mm1-openvz-pidns
> >
> > 10046 11.8% total
> > 6896 554.8% kmem_cache_free
> > 1580 6.9% check_poison_obj
> > 1222 0.0% alloc_pidmap
> > 883 39.0% kmem_cache_alloc
> > 485 128.6% cache_alloc_refill
> > 263 8.4% do_exit
> > 223 40.0% acct_collect
> >
...