Hello,
We have been running OpenVZ on one of our servers for some time without
any problems. Yesterday I started a virtual machine that is stored on an
iSCSI target; the target runs on OpenSolaris. After a few hours without
problems, the system collapsed completely because of IP conflicts. The
machine running OpenVZ was either claiming the storage server's IP
address or sending IP packets with the same source IP as the storage
server, and because of this the storage system disabled its network
interface now and then. I don't think this is caused by a bug in OpenVZ
itself. But since we have been running open-iscsi in combination with
OpenSolaris for some time, I would like to know whether OpenVZ can
interact with the storage/open-iscsi layer in a way that could cause
such an error.
The details:
iSCSI target machine:
Intel 64-bit OpenSolaris/SunOS 5.10
Target is on a ZFS volume
open-iscsi:
version: 2.0-868-test1
I use this version because I had other problems with previous versions.
OpenVZ machine:
kernel: 2.6.18-1-openvz
patch: 028.18
platform: Intel 64-bit
os: Debian etch
extra sysctl.conf settings on openvz machine:
net.ipv4.tcp_max_tw_kmem_fraction=384
net.ipv4.tcp_max_tw_buckets_ub=16536
The problems:
During the night, pings to our storage server suddenly failed several
times. This is caused by SunOS: it detects that another machine is
using the same IP, then disables the network interface for some time
and, after a timeout, tries to recover the IP/interface. The MAC
addresses in the SunOS log match the hardware addresses of the OpenVZ
machine.
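To make the conflict check concrete, here is a minimal sketch of how an
address conflict could be spotted from a list of (IP, MAC) observations,
for example collected from ARP traffic in a packet capture. The function
name, the addresses and the MACs below are made up for illustration;
they are not taken from our logs.

```python
from collections import defaultdict


def find_ip_conflicts(observations):
    """Return {ip: sorted MACs} for every IP seen with more than one MAC."""
    macs_by_ip = defaultdict(set)
    for ip, mac in observations:
        # Normalise case so the same MAC is not counted twice.
        macs_by_ip[ip].add(mac.lower())
    return {ip: sorted(macs) for ip, macs in macs_by_ip.items() if len(macs) > 1}


# Hypothetical observations: documentation-range IPs and invented MACs.
observations = [
    ("192.0.2.10", "00:16:3E:00:00:01"),  # storage server
    ("192.0.2.10", "00:16:3e:00:00:02"),  # second host claiming the same IP
    ("192.0.2.20", "00:16:3e:00:00:03"),
    ("192.0.2.20", "00:16:3E:00:00:03"),  # same MAC, different case: no conflict
]

conflicts = find_ip_conflicts(observations)
print(conflicts)
```

Any IP that shows up in the result is being answered for by more than
one hardware address, which is the situation SunOS reacts to.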
On the OpenVZ machine:
Around the same time, the errors at the bottom of this email occurred.
If the timing of all logfiles is correct, it looks like these errors
occurred first, and the problems with the IPs occurred afterwards. I
wonder whether this has anything to do with the network stack of OpenVZ
conflicting with the low-level access to the interface by open-iscsi.
Maybe the IP stack mirrored packets from the Solaris machine or
something like that. On a machine without OpenVZ but with nearly the
same kernel and the same iSCSI stack we didn't have any of these
problems.
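For reading the kernel log at the bottom of this email, the following
sketch decodes the two kinds of numbers that keep repeating. It assumes
the usual Linux SCSI layout of the 32-bit return code (status, message,
host and driver byte, from low to high) and lists only a few host byte
values and the two opcodes that appear in the log. The function names
are my own.

```python
import re

# A few host byte values of the Linux SCSI layer; the list is not complete.
HOST_BYTE = {
    0x00: "DID_OK",
    0x01: "DID_NO_CONNECT",
    0x02: "DID_BUS_BUSY",
    0x03: "DID_TIME_OUT",
}

# Standard SCSI opcodes for the two commands seen in the log.
OPCODES = {0x28: "READ(10)", 0x2A: "WRITE(10)"}


def decode_result(result):
    """Split a 32-bit SCSI return code into its four bytes."""
    return {
        "status": result & 0xFF,
        "msg": (result >> 8) & 0xFF,
        "host": (result >> 16) & 0xFF,
        "driver": (result >> 24) & 0xFF,
    }


def describe_cmd(line):
    """Name the opcode in an 'iscsi: cmd 0x.. is not queued' log line."""
    match = re.search(r"cmd (0x[0-9a-fA-F]+) is not queued", line)
    if match is None:
        return None
    return OPCODES.get(int(match.group(1), 16), "unknown opcode")


fields = decode_result(0x00010000)
host_name = HOST_BYTE.get(fields["host"], "unknown")
print(fields, host_name)
print(describe_cmd("Jan 23 08:30:41 krypton kernel: iscsi: cmd 0x28 is not queued (7)"))
```

Read this way, return code 0x00010000 has only the host byte set, to
DID_NO_CONNECT, and the rejected commands are plain reads and writes.
That would fit with the I/O errors being a consequence of the lost
iSCSI session rather than a separate disk problem.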
I hope someone has a good hint. Are other people using iSCSI targets
for storage, and if so, is this done on the same interface as the
external network? Are there any IP/TCP system settings that I have to
take care of?
Kind regards,
Mart van Santen
Jan 23 08:30:41 krypton kernel: session0: iscsi: session recovery timed
out after 120 secs
Jan 23 08:30:41 krypton kernel: iscsi: cmd 0x28 is not queued (7)
Jan 23 08:30:41 krypton kernel: sd 1:0:0:0: SCSI error: return code =
0x00010000
Jan 23 08:30:41 krypton kernel: end_request: I/O error, dev sdb, sector
4969152
Jan 23 08:30:41 krypton kernel: iscsi: cmd 0x28 is not queued (7)
Jan 23 08:30:41 krypton kernel: sd 1:0:0:0: SCSI error: return code =
0x00010000
Jan 23 08:30:41 krypton kernel: end_request: I/O error, dev sdb, sector
4969152
Jan 23 08:30:41 krypton kernel: iscsi: cmd 0x2a is not queued (7)
Jan 23 08:30:41 krypton last message repeated 6 times
Jan 23 08:30:41 krypton kernel: sd 1:0:0:0: SCSI error: return code =
0x00010000
Jan 23 08:30:41 krypton kernel: end_request: I/O error, dev sdb, sector
37901744
Jan 23 08:30:41 krypton kernel: sd 1:0:0:0: SCSI error: return code =
0x00010000
Jan 23 08:30:41 krypton kernel: end_request: I/O error, dev sdb, sector
38917752
Jan 23 08:30:41 krypton kernel: sd 1:0:0:0: SCSI error: return code =
0x00010000
Jan 23 08:30:41 krypton kernel: end_request: I/O error, dev sdb, sector
38918128
Jan 23 08:30:41 krypton kernel: sd 1:0:0:0: SCSI error: return code =
0x00010000
Jan 23 08:30:41 krypton kernel: end_request: I/O error, dev sdb, sector
3543176
Jan 23 08:30:41 krypton kernel: sd 1:0:0:0: SCSI error: return code =
0x00010000
Jan 23 08:30:41 krypton kernel: printk: 18 messages suppressed.
Jan 23 08:30:41 krypton kernel: Buffer I/O error on device sdb1, logical
block 4864715
Jan 23 08:30:41 krypton kernel: lost page write due to I/O error on sdb1
Jan 23 08:30:41 krypton kernel: Buffer I/O error on device sdb1, logical
block 4864762
Jan 23 08:30:41 krypton kernel: lost page write due to I/O error on sdb1
Jan 23 08:30:41 krypton kernel: end_request: I/O error, dev sdb, sector
36712096
Jan 23 08:30:41 krypton kernel: sd 1:0:0:0: SCSI error: return code =
0x00010000
Jan 23 08:30:41 krypton kernel: end_request: I/O error, dev sdb, sector
4676184
Jan 23 08:30:41 krypton kernel: sd 1:0:0:0: SCSI error: return code =
0x00010000
Jan 23 08:30:41 krypton kernel: end_request: I/O error, dev sdb, sector
4676432
Jan 23 08:30:41 krypton kernel: Buffer I/O error on device sdb1, logical
block 584519
Jan 23 08:30:41 krypton kernel: lost page write due to I/O error on sdb1
Jan 23 08:30:41 krypton kernel: Buffer I/O error on device sdb1, logical
block 584550
Jan 23 08:30:41 krypton kernel: lost page write due to I/O error on sdb1
Jan 23 08:30:41 krypton kernel: I/O error in filesystem ("sdb1")
meta-data dev sdb1 block 0x2302e80 ("xfs_trans_read_buf") error 5
buf count 8192
Jan 23 08:30:42 krypton kernel: iscsi: cmd 0x2a is not queued (7)
Jan 23 08:30:42 krypton last message repeated 4 times
Jan 23 08:30:42 krypton kernel: sd 1:0:0:0: SCSI error: return code =
0x00010000
Jan 23 08:30:42 krypton kernel: end_request: I/O error, dev sdb, sector
36712096
Jan 23 08:30:42 krypton kernel: sd 1:0:0:0: SCSI error: return code =
0x00010000
Jan 23 08:30:42 krypton kernel: end_request: I/O error, dev sdb, sector
38917752
Jan 23 08:30:42 krypton kernel: sd 1:0:0:0: SCSI error: return code =
0x00010000
Jan 23 08:30:42 krypton kernel: end_request: I/O error, dev sdb, sector
3543176
Jan 23 08:30:42 krypton kernel: sd 1:0:0:0: SCSI error: return code =
0x00010000
Jan 23 08:30:42 krypton kernel: end_request: I/O error, dev sdb, sector
37901744
Jan 23 08:30:42 krypton kernel: sd 1:0:0:0: SCSI error: return code =
0x00010000
Jan 23 08:30:42 krypton kernel: end_request: I/O error, dev sdb, sector
38918136
Jan 23 08:30:42 krypton kernel: Buffer I/O error on device sdb1, logical
block 4864715
Jan 23 08:30:42 krypton kernel: lost page write due to I/O error on sdb1
Jan 23 08:30:42 krypton kernel: Buffer I/O error on device sdb1, logical
block 4864763
Jan 23 08:30:42 krypton kernel: lost page write due to I/O error on sdb1
Jan 23 08:31:18 krypton kernel: iscsi: cmd 0x2a is not queued (7)
Jan 23 08:31:18 krypton kernel: iscsi: cmd 0x2a is not queued (7)
Jan 23 08:31:18 krypton kernel: sd 1:0:0:0: SCSI error: return code =
0x00010000
Jan 23 08:31:18 krypton kernel: end_request: I/O error, dev sdb, sector
37901744
Jan 23 08:31:18 krypton kernel: sd 1:0:0:0: SCSI error: return code =
0x00010000
Jan 23 08:31:18 krypton kernel: end_request: I/O error, dev sdb, sector
38918136
Jan 23 08:31:18 krypton kernel: Buffer I/O error on device sdb1, logical
block 4864763
Jan 23 08:31:18 krypton kernel: lost page write due to I/O error on sdb1
Jan 23 08:31:20 krypton kernel: iscsi: cmd 0x28 is not queued (7)
Jan 23 08:31:20 krypton kernel: sd 1:0:0:0: SCSI error: return code =
0x00010000
Jan 23 08:31:20 krypton kernel: end_request: I/O error, dev sdb, sector
4701032
Jan 23 08:31:20 krypton kernel: iscsi: cmd 0x28 is not queued (7)
Jan 23 08:31:20 krypton kernel: sd 1:0:0:0: SCSI error: return code =
0x00010000
Jan 23 08:31:20 krypton kernel: end_request: I/O error, dev sdb, sector
4701032
Jan 23 08:31:20 krypton kernel: iscsi: cmd 0x28 is not queued (7)
Jan 23 08:31:20 krypton kernel: sd 1:0:0:0: SCSI error: return code =
0x00010000
Jan 23 08:31:20 krypton kernel: end_request: I/O error, dev sdb, sector
4979496
Jan 23 08:31:20 krypton kernel: iscsi: cmd 0x28 is not queued (7)
Jan 23 08:31:20 krypton kernel: sd 1:0:0:0: SCSI error: return code =
0x00010000
Jan 23 08:31:20 krypton kernel: end_request: I/O error, dev sdb, sector
4979496
Jan 23 08:31:28 krypton kernel: iscsi: cmd 0x2a is not queued (7)
Jan 23 08:31:28 krypton kernel: iscsi: cmd 0x2a is not queued (7)
Jan 23 08:31:28 krypton kernel: sd 1:0:0:0: SCSI error: return code =
0x00010000
Jan 23 08:31:28 krypton kernel: end_request: I/O error, dev sdb, sector
36712096
Jan 23 08:31:28 krypton kernel: sd 1:0:0:0: SCSI error: return code =
0x00010000
Jan 23 08:31:28 krypton kernel: end_request: I/O error, dev sdb, sector
38917752
Jan 23 08:31:28 krypton kernel: Buffer I/O error on device sdb1, logical
block 4864715
Jan 23 08:31:28 krypton kernel: lost page write due to I/O error on sdb1
Jan 23 08:31:38 krypton kernel: iscsi: cmd 0x2a is not queued (7)
Jan 23 08:31:38 krypton kernel: iscsi: cmd 0x2a is not queued (7)
Jan 23 08:31:38 krypton kernel: sd 1:0:0:0: SCSI error: return code =
0x00010000
Jan 23 08:31:38 krypton kernel: end_request: I/O error, dev sdb, sector
5032176
Jan 23 08:31:38 krypton kernel: sd 1:0:0:0: SCSI error: return code =
0x00010000
Jan 23 08:31:38 krypton kernel: Buffer I/O error on device sdb1, logical
block 629018
Jan 23 08:31:38 krypton kernel: lost page write due to I/O error on sdb1
Jan 23 08:31:38 krypton kernel: end_request: I/O error, dev sdb, sector
3543176
Jan 23 08:31:53 krypton kernel: iscsi: cmd 0x2a is not queued (7)
Jan 23 08:31:53 krypton kernel: iscsi: cmd 0x2a is not queued (7)
Jan 23 08:31:53 krypton kernel: sd 1:0:0:0: SCSI error: return code =
0x00010000
Jan 23 08:31:53 krypton kernel: end_request: I/O error, dev sdb, sector
37901744
Jan 23 08:31:53 krypton kernel: sd 1:0:0:0: SCSI error: return code =
0x00010000
Jan 23 08:31:53 krypton kernel: end_request: I/O error, dev sdb, sector
38918136
Jan 23 08:31:53 krypton kernel: Buffer I/O error on device sdb1, logical
block 4864763
Jan 23 08:31:53 krypton kernel: lost page write due to I/O error on sdb1
Jan 23 08:32:08 krypton kernel: iscsi: cmd 0x2a is not queued (7)
Jan 23 08:32:08 krypton kernel: iscsi: cmd 0x2a is not queued (7)
Jan 23 08:32:08 krypton kernel: sd 1:0:0:0: SCSI error: return code =
0x00010000
Jan 23 08:32:08 krypton kernel: end_request: I/O error, dev sdb, sector
38917752
...