I am the tech trying to fix this issue: the server's load starts rising after it reports a kernel bug. The latest reports are below. The kernel is a custom compile of version 2.6.18. To me it looks like faulty memory, but the server owner says he has replaced all of the hardware.
Mar 19 02:45:01 localhost kernel: BUG: warning at kernel/ub/ub_page_bc.c:322/pb_dup_ref()
..... repeated three times, followed by:
Mar 19 02:45:05 localhost kernel: BUG: unable to handle kernel NULL pointer dereference at virtual address 00000002
Mar 19 02:45:05 localhost kernel: printing eip:
Mar 19 02:45:05 localhost kernel: c0143c41
Mar 19 02:45:05 localhost kernel: *pde = 00000000
Mar 19 02:45:05 localhost kernel: Oops: 0000 [#1]
Mar 19 02:45:05 localhost kernel: SMP
Mar 19 02:45:05 localhost kernel: Modules linked in: simfs vzethdev vzrst ip_nat vzcpt ip_conntrack vzdquota af_packet xt_tcpudp xt_length ipt_ttl xt_tcpmss ipt_TCPMSS iptable_mangle xt_multiport xt_limit ipt_tos ipt_REJECT iptable_filter ip_tables x_tables parport_pc lp parport autofs4 sunrpc vznetdev vzmon vzdev thermal processor fan button battery asus_acpi ac uhci_hcd ehci_hcd usbcore i2c_i801 i2c_core 8139too mii
Mar 19 02:45:05 localhost kernel: CPU: 1, VCPU: 149.0
Mar 19 02:45:05 localhost kernel: EIP: 0060:[<c0143c41>] Not tainted VLI
Mar 19 02:45:05 localhost kernel: EFLAGS: 00010286 (2.6.18-028 #3)
Mar 19 02:45:05 localhost kernel: EIP is at ub_page_uncharge+0x31/0x90
Mar 19 02:45:05 localhost kernel: eax: fffffffe ebx: c203c6b0 ecx: 00000000 edx: 00000001
Mar 19 02:45:05 localhost kernel: esi: f6515600 edi: c203c6b0 ebp: c0539a40 esp: f4a59dcc
Mar 19 02:45:05 localhost kernel: ds: 007b es: 007b ss: 0068
Mar 19 02:45:05 localhost kernel: Process dcpumon (pid: 22650, veid: 149, ti=f4a59000 task=f4969340 task.ti=f4a59000)
Mar 19 02:45:05 localhost kernel: Stack: f4a59f20 00000000 c203c6b0 c0539a20 00000000 c015e8f5 c203c6b0 00000000
Mar 19 02:45:05 localhost kernel: 00000000 000006e6 c05399e0 0000000c f4a59e1c 0000000e 0000000e c015f32c
Mar 19 02:45:05 localhost kernel: c1bce7ac 00000000 c0161fc2 f4a59e1c 0000000e 00000000 c1d7d564 c1d33cec
Mar 19 02:45:05 localhost kernel: Call Trace:
Mar 19 02:45:05 localhost kernel: [<c015e8f5>] free_hot_cold_page+0xf5/0x1a0
Mar 19 02:45:05 localhost kernel: [<c015f32c>] __pagevec_free+0x1c/0x30
Mar 19 02:45:05 localhost kernel: [<c0161fc2>] release_pages+0x102/0x190
Mar 19 02:45:05 localhost kernel: [<c0172287>] free_pages_and_swap_cache+0x77/0xa0
Mar 19 02:45:05 localhost kernel: [<c016838f>] zap_pte_range+0x27f/0x340
Mar 19 02:45:05 localhost kernel: [<c0168514>] unmap_page_range+0xc4/0x160
Mar 19 02:45:05 localhost kernel: [<c0168685>] unmap_vmas+0xd5/0x200
Mar 19 02:45:05 localhost kernel: [<c016de65>] exit_mmap+0x85/0x120
Mar 19 02:45:05 localhost kernel: [<c0121a88>] mmput+0x38/0xc0
Mar 19 02:45:05 localhost kernel: [<c012859e>] do_exit+0xfe/0x480
Mar 19 02:45:05 localhost kernel: [<c0128986>] do_group_exit+0x36/0xa0
Mar 19 02:45:05 localhost kernel: [<c01031c7>] syscall_call+0x7/0xb
Mar 19 02:45:05 localhost kernel: Code: 1c 89 7c 24 10 8b 7c 24 18 89 5c 24 08 89 74 24 0c 8b 77 20 85 f6 74 4f 89 e2 8b 86 30 05 00 00 81 e2 00 f0 ff ff 8b 52 10 f7 d0 <8b> 14 90 b8 01 00 00 00 d3 e0 29 42 20 81 3e 75 62 75 62 75 37
Mar 19 02:45:05 localhost kernel: EIP: [<c0143c41>] ub_page_uncharge+0x31/0x90 SS:ESP 0068:f4a59dcc
Mar 19 02:45:05 localhost kernel: Fixing recursive fault but reboot is needed!
Mar 19 02:45:05 localhost kernel: BUG: scheduling while atomic: dcpumon/0x00000001/22650
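For what it is worth, the fault address 00000002 seems to follow directly from the register dump. My reading of the marked bytes in the Code: line (<8b> 14 90) is a load from eax + edx*4, preceded by "f7 d0" (not eax). That decoding is my own assumption and not something the kernel printed. A minimal sketch of the arithmetic, with the register values copied from the dump:

```python
# Register values copied from the oops above. The decoding of "8b 14 90" as
# mov (%eax,%edx,4),%edx is my own reading of the Code: line (an assumption).
MASK32 = 0xFFFFFFFF  # addresses wrap at 32 bits on this i386 kernel


def effective_address(base, index, scale):
    """Return base + index * scale, wrapped to 32 bits."""
    return (base + index * scale) & MASK32


eax = 0xFFFFFFFE  # base register at the time of the fault
edx = 0x00000001  # index register (matches "CPU: 1" in the oops)

fault = effective_address(eax, edx, 4)
print(f"{fault:08x}")  # prints 00000002

# "not eax" ran just before the load, so the value read from memory was ~eax.
value_before_not = ~eax & MASK32
print(f"{value_before_not:08x}")  # prints 00000001
```

If that reading is right, the value loaded just before the "not" was 00000001, which does not look like a valid pointer, so the structure it came from may have been corrupted or already freed. That alone does not tell me whether the cause is bad RAM or a software bug.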
============================================
The most common error is in rmap.c; it repeats very frequently, as shown below.
Mar 19 02:45:29 localhost kernel: ------------[ cut here ]------------
Mar 19 02:45:29 localhost kernel: kernel BUG at mm/rmap.c:529!
Mar 19 02:45:29 localhost kernel: invalid opcode: 0000 [#2]
Mar 19 02:45:29 localhost kernel: SMP
Mar 19 02:45:29 localhost kernel: Modules linked in: simfs vzethdev vzrst ip_nat vzcpt ip_conntrack vzdquota af_packet xt_tcpudp xt_length ipt_ttl xt_tcpmss ipt_TCPMSS iptable_mangle xt_multiport xt_limit ipt_tos ipt_REJECT iptable_filter ip_tables x_tables parport_pc lp parport autofs4 sunrpc vznetdev vzmon vzdev thermal processor fan button battery asus_acpi ac uhci_hcd ehci_hcd usbcore i2c_i801 i2c_core 8139too mii
Mar 19 02:45:29 localhost kernel: CPU: 1, VCPU: 120.1
Mar 19 02:45:29 localhost kernel: EIP: 0060:[<c016feb7>] Not tainted VLI
Mar 19 02:45:29 localhost kernel: EFLAGS: 00010286 (2.6.18-028 #3)
Mar 19 02:45:29 localhost kernel: EIP is at page_remove_rmap+0x37/0x50
Mar 19 02:45:29 localhost kernel: eax: ffffffff ebx: c1d1d6f0 ecx: c0003ea0 edx: c20b4bb4
Mar 19 02:45:29 localhost kernel: esi: c1d1d6f0 edi: fffa85a4 ebp: c20b4bb4 esp: f3d0ff10
Mar 19 02:45:29 localhost kernel: ds: 007b es: 007b ss: 0068
Mar 19 02:45:29 localhost kernel: Process exim (pid: 22837, veid: 120, ti=f3d0f000 task=f38446e0 task.ti=f3d0f000)
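To back up the claim that the rmap.c error is the most frequent one, the BUG lines can be tallied per signature. This is only a rough sketch: the regular expression is my own guess at the two line shapes seen in these logs ("BUG: ..." and "kernel BUG at ..."), and the three sample lines are copied from this post. On the real server the lines would come from the syslog file, whose path I am not assuming here.

```python
import re
from collections import Counter

# Matches both "kernel: BUG: <text>" and "kernel: kernel BUG at <file:line>!".
# The pattern is a hypothetical helper written for these logs only.
BUG_RE = re.compile(r"kernel: (?:kernel )?BUG(?: at|:)\s*(.+)$")


def count_bugs(lines):
    """Count how often each BUG signature appears in the given log lines."""
    counts = Counter()
    for line in lines:
        match = BUG_RE.search(line)
        if match:
            counts[match.group(1).strip()] += 1
    return counts


# Sample lines copied from the logs in this post.
SAMPLE = [
    "Mar 19 02:45:01 localhost kernel: BUG: warning at kernel/ub/ub_page_bc.c:322/pb_dup_ref()",
    "Mar 19 02:45:29 localhost kernel: kernel BUG at mm/rmap.c:529!",
    "Mar 19 02:45:29 localhost kernel: SMP",
]

counts = count_bugs(SAMPLE)
for signature, hits in counts.most_common():
    print(hits, signature)
```

Run against the full log, the same function should show which signature dominates and whether the ub_page_bc.c warnings always come before the rmap.c BUG.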
=================================
After this kernel bug, the server load started rising and eventually all VPSes stopped responding. I was not able to run "vzctl stop veid" or "vzctl exec 120 kill -9 22837", or even enter the VPS with "vzctl enter 120". Each command simply hangs, and I have to press Ctrl + C to get the shell prompt back. BTW, that exim process raised the load to 100+ before I issued a reboot.
Please shed some light on this issue. I am out of ideas. I don't want to believe it is a kernel bug, but what other options do we have to try?