*NOT SOLVED* 3ware 8506 - CentOS 4.5 or 5.0 - OpenVZ Stock Kernel - Unstable [message #22458] |
Mon, 29 October 2007 14:13 |
tchipman
Messages: 28 Registered: June 2006
Junior Member
Hi,
Just want to report this "open issue" I have got, with some slight debug progress. If I have more notes (later today or tomorrow) I'll update this thread in case it is of interest to others. Not positive this is a "bug report" yet.
I have a 32-bit dual Xeon system (Tyan server-class motherboard) with a 3ware RAID card in a 64-bit PCI slot (model 8506, takes 8 x SATA disks).
There appears to be some problem with this 3ware card with certain kernel versions, i.e.,
(1) The stock RHEL/CentOS 5.0 kernel is BAD to use on this card, flat out. You end up with disk failure and abundant dmesg errors logged after significant disk IO, approximately as follows:
3w-xxxx: scsi0: Character ioctl (0x1f) timed out, resetting card.
sd 0:0:0:0: rejecting I/O to offline device
The RAID array then flags itself as being in a "rebuild from scratch" state and you lose everything in the array.
(2) Stock RHEL/CentOS 4.5 booted without the "noapic" and "acpi=off" kernel flags is also unstable. With these flags added, it becomes stable.
(3) OpenVZ: the stock production OpenVZ kernel booted on either a CentOS 5 or 4.5 install is unstable, regardless of the noapic/acpi=off flags. The kernel used for the test identified itself as "vmlinuz-2.6.9-023stab044.11-smp".
This machine has been reinstalled about 6 times in the last week for testing. It seems to have run solid over the weekend, and I'm doing one last cycle of sustained disk activity (500 runs of bonnie). If it survives this, then I'm willing to believe that scenario (2) above is OK (stock CentOS 4.5 kernel with the noapic and acpi=off flags).
Assuming it survives, I'll then try installing the OpenVZ "RHEL Stock" kernel rather than the standard OVZ production kernel, and see if that runs more stable, or if that is also a lost cause.
I suspect, based on observations so far:
- there is something new in RHEL5 which breaks 3ware controller hardware
- this feature may also be present in the default build of the OVZ kernel for the CentOS 4.X lineage as well?
For anyone who wishes to see, there is a thread open in the CentOS bug-tracker on this 3ware hardware issue. The URL is,
http://bugs.centos.org/view.php?id=2186
(I have been posting to that thread, as have a number of other 3ware customers using CentOS 4.X or 5.X)
Lots of fun,
Tim Chipman
[Updated on: Tue, 04 December 2007 14:27] by Moderator
Re: 3ware 8506 - CentOS 4.5 or 5.0 - OpenVZ Stock Kernel - Unstable [message #23777 is a reply to message #22458] |
Mon, 26 November 2007 16:09 |
tchipman
Hi, just to follow up / clarify,
I can no longer change / reboot this machine for testing. It is in production, and any tests will (1) cause downtime, (2) likely corrupt and destroy all data on the HDDs, resulting in a full reinstall, something I don't relish given the number of reinstalls done to get the machine operational.
Just to give you more concrete reference info from my notes,
- the problem happened for certain with the kernel available in the openvz yum repo ~1 month ago, which was:
ovzkernel-smp.i686 2.6.9-023stab044.11 openvz-kernel-rh
- the type of error message logged in dmesg was,
--paste--
scsi0 (2:0): rejecting I/O to offline device
scsi0 (2:0): rejecting I/O to offline device
scsi0 (2:0): rejecting I/O to offline device
scsi0 (2:0): rejecting I/O to offline device
scsi0 (2:0): rejecting I/O to offline device
scsi0 (2:0): rejecting I/O to offline device
scsi0 (2:0): rejecting I/O to offline device
scsi0 (2:0): rejecting I/O to offline device
scsi0 (2:0): rejecting I/O to offline device
scsi0 (2:0): rejecting I/O to offline device
EXT3-fs error (device sdb1): ext3_find_entry: reading directory #110313495 offset 0
scsi0 (2:0): rejecting I/O to offline device
EXT3-fs error (device sdb1): ext3_find_entry: reading directory #110313495 offset 0
scsi0 (2:0): rejecting I/O to offline device
EXT3-fs error (device sdb1): ext3_find_entry: reading directory #110313492 offset 0
--endpaste--
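For anyone collecting similar logs, here is a minimal sketch of how the failure signatures quoted in this thread could be counted in saved dmesg output. The patterns are copied from the messages above; the label names and the function `count_signatures` are hypothetical, made up for illustration, and this is not a 3ware or OpenVZ tool.

```python
import re
from collections import Counter

# Patterns taken from the dmesg lines quoted in this thread.
# The dictionary keys are illustrative labels, not kernel terminology.
SIGNATURES = {
    "offline_device": re.compile(r"rejecting I/O to offline device"),
    "card_reset": re.compile(r"3w-xxxx: scsi\d+: .*timed out, resetting card"),
    "ext3_error": re.compile(r"EXT3-fs error \(device (\w+)\)"),
}

def count_signatures(log_text):
    """Count how many lines match each known failure signature."""
    counts = Counter()
    for line in log_text.splitlines():
        for label, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[label] += 1
    return counts

sample = """\
scsi0 (2:0): rejecting I/O to offline device
scsi0 (2:0): rejecting I/O to offline device
EXT3-fs error (device sdb1): ext3_find_entry: reading directory #110313495 offset 0
"""
result = count_signatures(sample)
print(dict(result))
```

Feeding it the text of a saved dmesg dump gives a quick per-signature tally, which may make it easier to compare runs across kernels.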
I don't have any further exact notes to provide, unfortunately.
I did get this same general type of problem with both of the following,
* CentOS 5.0 default install 32-bit or OpenVZ yum repo kernel
* CentOS 4.5 default install 32-bit or OpenVZ yum repo kernel
The only current solid working setup I have on this hardware is the CentOS 4.5 32-bit default kernel, booted with "noapic" and "acpi=off" kernel parameters passed via grub at boot time.
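Since the two flags are easy to mistype, a small check of the running kernel's command line can confirm that both were actually passed. This is only a sketch: the helper names `missing_boot_flags` and `read_cmdline` and the example command line are hypothetical, and it assumes a Linux system where `/proc/cmdline` is readable.

```python
def missing_boot_flags(cmdline, required=("noapic", "acpi=off")):
    """Return the required flags that are absent from a kernel command line."""
    present = set(cmdline.split())
    return [flag for flag in required if flag not in present]

def read_cmdline(path="/proc/cmdline"):
    """Read the command line the running kernel was booted with (Linux only)."""
    with open(path) as handle:
        return handle.read().strip()

# Hypothetical command line, for illustration only.
example = "ro root=LABEL=/ noapic acpi=off"
print(missing_boot_flags(example))
```

On the machine itself, `missing_boot_flags(read_cmdline())` would list whichever flag did not make it from the grub entry to the booted kernel.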
I haven't tested CentOS 5.0 with the kernel options being passed, as I had a decently working config for the machine, and I was sick and tired of re-re-re-installing this system... didn't feel like doing another pair of installs just to test "maybe if it works" for CentOS 5.
Sorry I cannot provide more info. Possibly / hopefully this is slightly more useful.
Most likely, if the problem is "common" then other folks with 3ware gear .. will report the problem soon enough .. and they can then provide "live and up to date" debug info. (?)
--Tim
Re: 3ware 8506 - CentOS 4.5 or 5.0 - OpenVZ Stock Kernel - Unstable [message #28518 is a reply to message #24384] |
Fri, 21 March 2008 18:49 |
mcluver
Messages: 12 Registered: February 2007
Junior Member
Hi Tim,
I have not been receiving the exact same error messages as you, but it sounds like I do have nearly the exact same hardware setup and configuration.
Here is what I have been getting quite frequently:
kernel: attempt to access beyond end of device
kernel: sda2: rw=0, want=25781269064, limit=2433349485
kernel: attempt to access beyond end of device
kernel: sda2: rw=0, want=25781269064, limit=2433349485
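To put those numbers in perspective, here is a rough sketch that parses one of these lines and works out how far past the end of the partition the request was. It assumes that `want` and `limit` are counted in 512-byte sectors; the function name `parse_beyond_end` and the returned field names are hypothetical.

```python
import re

SECTOR_BYTES = 512  # assumption: want/limit are counted in 512-byte sectors

# Matches lines such as "kernel: sda2: rw=0, want=..., limit=..."
BEYOND_END = re.compile(r"(\w+): rw=(\d+), want=(\d+), limit=(\d+)")

def parse_beyond_end(line):
    """Return device, want, limit and the overshoot for one log line, or None."""
    match = BEYOND_END.search(line)
    if match is None:
        return None
    want = int(match.group(3))
    limit = int(match.group(4))
    return {
        "device": match.group(1),
        "want": want,
        "limit": limit,
        "overshoot_sectors": want - limit,
        "want_bytes": want * SECTOR_BYTES,
        "limit_bytes": limit * SECTOR_BYTES,
    }

info = parse_beyond_end("kernel: sda2: rw=0, want=25781269064, limit=2433349485")
print(info["device"], info["overshoot_sectors"], round(info["want"] / info["limit"], 1))
```

The requested sector is roughly ten times the size of the partition, so this does not look like a read that just barely ran off the end.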
I am using the 3ware Escalade 8506-8 (8 x 250GB WD SATA Drives), along with a Tyan server grade motherboard (Tyan Tiger i7501 (S2723)), with CentOS 4.4 installed in an OpenVZ HA cluster (using DRBD between two exactly matching servers).
I have had them in production since February 2007 in this configuration. Since then, I have had to remove, completely wipe, and reinstall them several times because of inevitable partition table and data corruption.
I am currently in the process of testing out whether using Debian etch with this hardware and software combination is more stable. I see that your last post was in Dec. 2007, have you made any discoveries or headway since your last post?
Has anyone else been experiencing the effects of this horrible hardware combination?
Best regards,
Matthew Cluver
Re: 3ware 8506 - CentOS 4.5 or 5.0 - OpenVZ Stock Kernel - Unstable [message #28841 is a reply to message #28518] |
Sun, 30 March 2008 08:44 |
tchipman
Hi Matt,
Just to reply to your question: I ended up deploying the system in question in a totally different config due to the stability problems.
It ended up being deployed thus,
-no hardware raid, used the 3ware card as JBOD (ugh) only
-CentOS stock install, no OpenVZ at all
-software raid config for OS (mirrors) and data (raid5) slices
In this setup, the system has been 100% solid and working since deployment, which dates back to my previous post in the thread.
Since the requirements for virtualization on this machine were not strict ones (ie, non-virtualized config was tolerable) and the stability issues were a problem, that is how the machine ended up. Clearly not an optimal use of the resource. However, given the history of the hardware (acquired via ebay, absurd low price) and the (very basic) requirements of the project, investing further time in debugging, reinstalling, etc. was not merited.
Sorry I don't have better news for you on this. From the sound of other postings in this thread since your query, it seems maybe things have improved / that other options are now available, which is nice..
---Tim
Re: 3ware 8506 - CentOS 4.5 or 5.0 - OpenVZ Stock Kernel - Unstable [message #39449 is a reply to message #29239] |
Sun, 25 April 2010 20:38 |
mcluver
The reason I am posting here again about this topic is because I have switched to Debian since my previous posts and have still been running into the very same issues as described here.
As well as here in detail: http://marc.info/?l=linux-kernel&m=105752554030895&w=2
After researching for way, way, way too long, I believe I have finally found the exact problem (from the horse's mouth) that is causing this kernel panic, data loss, and controller failure.
It is a problem between the 3ware Escalade 7506 and 8506 cards and a specific chip on certain Tyan Motherboards, specifically the E7501 Intel chipset, which is present on my Tiger i7501 (S2723) motherboard.
--
FROM 3WARE (http://tyan.com/archive/support/html/f_s2885.html):
Why can't I get my 3ware Raid card to work correctly with this motherboard?
3ware and Tyan have worked together to identify the issue between 3ware 7506 ATA and 8506 SATA RAID controllers and certain chipsets. 3Ware has a technical note available on their knowledge base:
https://www.3ware.com/kbadmin/attachments/TM900-0045-00%20Rev%20A_P.pdf
Tyan will be working with 3Ware to validate the next generation of 8506/9500 RAID controllers on our motherboards.
--
The PDF suggests that the solution is as simple as changing the 1U riser card to Model# PCITX8-3R, manufactured by Adex Electronics, Inc. (http://www.adexelec.com). Too bad I have gone through two years of intermittent hell to finally get this solution. I hope it works...
Best regards,
Matthew Cluver
[Updated on: Sun, 25 April 2010 23:30]