OpenVZ Forum


weird filesystem corruption issues [message #48243] Tue, 09 October 2012 12:34
Aleksandar Ivanisevic
Hi,

Please help me debug this weird issue. It has been happening
occasionally in my setup for literally years, on at least 10 different
OVZ kernels.

In the VE:

# md5sum /tmp/application.log.backup
89024ce67704e3cf2aa9e7b2e2584a60 /tmp/application.log.backup

# gzip > application.log.backup.gz < /tmp/application.log.backup
# zcat application.log.backup.gz | md5sum

zcat: application.log.backup.gz: unexpected end of file
986389b791ee94692da36a56be29392a -

but the next attempt 10 seconds later:

# gzip > application.log.backup.gz < /tmp/application.log.backup
# zcat application.log.backup.gz | md5sum
89024ce67704e3cf2aa9e7b2e2584a60 -

The file is truncated at a random place.

I can reliably reproduce this by running this in a loop:

root@ /var/log while true; do gzip > application.log.backup.gz < /tmp/application.log.backup ; zcat application.log.backup.gz | md5sum; done
89024ce67704e3cf2aa9e7b2e2584a60 -

zcat: application.log.backup.gz: unexpected end of file
ad830a43ccf4641afc2c0dfd42b3d5b8 -
89024ce67704e3cf2aa9e7b2e2584a60 -
89024ce67704e3cf2aa9e7b2e2584a60 -

zcat: application.log.backup.gz: unexpected end of file
a35d71d503b3cfc249409075afd9295f -
89024ce67704e3cf2aa9e7b2e2584a60 -
89024ce67704e3cf2aa9e7b2e2584a60 -


But when I run it from the HN (hardware node), there is never any issue:


# while true; do gzip > application.log.backup.gz < /vz/private/1090/tmp/application.log.backup ; zcat application.log.backup.gz | md5sum; done
89024ce67704e3cf2aa9e7b2e2584a60 -
89024ce67704e3cf2aa9e7b2e2584a60 -
89024ce67704e3cf2aa9e7b2e2584a60 -
89024ce67704e3cf2aa9e7b2e2584a60 -
89024ce67704e3cf2aa9e7b2e2584a60 -
89024ce67704e3cf2aa9e7b2e2584a60 -
89024ce67704e3cf2aa9e7b2e2584a60 -
89024ce67704e3cf2aa9e7b2e2584a60 -
89024ce67704e3cf2aa9e7b2e2584a60 -
89024ce67704e3cf2aa9e7b2e2584a60 -
89024ce67704e3cf2aa9e7b2e2584a60 -
89024ce67704e3cf2aa9e7b2e2584a60 -
89024ce67704e3cf2aa9e7b2e2584a60 -
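
For anyone trying to reproduce this, a bounded variant of the loop above may be easier to work with than `while true`. The following is only a sketch: the function name `roundtrip_check`, its arguments, and the `bad.N.gz` file names are assumptions for illustration, not something from the original post. It runs the same gzip/decompress round trip a fixed number of times, counts mismatches, and keeps every bad archive so that its size can be compared with a good one.

```shell
# Hypothetical helper: run the gzip round trip a fixed number of times,
# count checksum mismatches, and keep each bad archive for inspection.
# Usage: roundtrip_check SRC WORKDIR [ITERATIONS]
roundtrip_check() {
    src=$1
    workdir=$2
    iterations=${3:-100}
    # Reference checksum of the uncompressed source file.
    want=$(md5sum < "$src" | cut -d' ' -f1)
    failures=0
    i=1
    while [ "$i" -le "$iterations" ]; do
        gzip -c < "$src" > "$workdir/test.gz"
        # gzip -dc is equivalent to zcat; errors are hidden, only the sum matters.
        got=$(gzip -dc "$workdir/test.gz" 2>/dev/null | md5sum | cut -d' ' -f1)
        if [ "$got" != "$want" ]; then
            failures=$((failures + 1))
            # Keep the bad archive so it can be compared with a good one later.
            cp "$workdir/test.gz" "$workdir/bad.$i.gz"
            echo "iteration $i: mismatch, archive is $(wc -c < "$workdir/bad.$i.gz") bytes" >&2
        fi
        i=$((i + 1))
    done
    # Print the total number of failed iterations on stdout.
    echo "$failures"
}
```

Something like `roundtrip_check /tmp/application.log.backup /var/log 200` would then print the number of failed iterations, with per-failure details on stderr.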
Re: weird filesystem corruption issues [message #48244 is a reply to message #48243] Tue, 09 October 2012 13:53
Kirill Korotaev
Can you provide access and demonstrate this on the real node?
My only guess is that some application is changing your files in /tmp, or that you have memory corruption, so running memtest86 is recommended anyway.

Thanks,
Kirill
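
One rough way to test the first guess (something modifying the file in /tmp) is to sample the source file's checksum repeatedly while nothing else is supposed to touch it. This is a sketch under assumptions: `count_source_changes` and its parameters are hypothetical names. A non-zero result does not by itself prove that another process wrote to the file, since a corrupted read inside the VE would look the same.

```shell
# Hypothetical helper: sample SRC's checksum COUNT times, INTERVAL seconds
# apart, and print how many later samples differ from the first one.
# Usage: count_source_changes SRC [INTERVAL] [COUNT]
count_source_changes() {
    src=$1
    interval=${2:-1}
    count=${3:-10}
    first=$(md5sum < "$src" | cut -d' ' -f1)
    changes=0
    n=1
    while [ "$n" -lt "$count" ]; do
        sleep "$interval"
        now=$(md5sum < "$src" | cut -d' ' -f1)
        if [ "$now" != "$first" ]; then
            changes=$((changes + 1))
        fi
        n=$((n + 1))
    done
    echo "$changes"
}
```

Running it both inside the VE and from the HN against the same file might help separate "the file really changes" from "reads inside the VE are unreliable".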


Re: weird filesystem corruption issues [message #48262 is a reply to message #48244] Wed, 10 October 2012 09:06
Aleksandar Ivanisevic
Kirill Korotaev <dev@parallels.com> writes:

Hi,

I might be able to provide access if you wish. Here are some more
facts:

Once I reboot the VE, the issue goes away for a few days, then it
happens again. The gzip issue is just one manifestation. Other
symptoms include files that appear corrupted but are fine again after
a reboot (something to do with the disk cache, perhaps?), and weird
segfaults and bus errors in various scripts and apps. We also used to
have weird MySQL crashes (datafile checksum corruption) in another VE
running on the same HN; they always stopped once we migrated away,
and they have not happened at all with the last few kernels.

Only this one VE exhibits this behaviour, out of 50 others running on
18 HNs with more or less identical hardware.

The VE is migrated regularly to other nodes (both offline and online)
during maintenance, and it has had this problem on every HN it has
run on.

The nodes are IBM xSeries servers with ECC memory, so I don't think
it's a physical memory issue.

The VE runs a Nagios/cfengine/syslogd server. It is fairly loaded, but
it is not the most loaded VE in our environment.

Most of the limits are set to the maximum, and failcnts are always zero.
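
For reference, the failcnt check can be scripted roughly as follows. This is a sketch that assumes the usual /proc/user_beancounters layout (two header lines, then `uid: resource held maxheld barrier limit failcnt`, with failcnt as the last column); the function name `nonzero_failcnt` is made up for illustration.

```shell
# Hypothetical helper: print every resource whose failcnt (last column)
# is non-zero. FILE defaults to /proc/user_beancounters.
# Usage: nonzero_failcnt [FILE]
nonzero_failcnt() {
    awk 'NR > 2 && $NF + 0 > 0 {
        # Lines that start a new container carry "uid:" as the first field,
        # so the resource name is the second field there and the first elsewhere.
        res = ($1 ~ /:$/) ? $2 : $1
        print res, $NF
    }' "${1:-/proc/user_beancounters}"
}
```

Empty output means no counter has failed. When run on the HN, the output covers all containers, but the uid is not printed, so the file itself still has to be checked to see which container a hit belongs to.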

Any pointers on where I should look are appreciated.

regards,

--
You are arrogant, overbearing, and you are venting your own frustration. -- Ivan
Tišljar, hr.comp.os.linux