zlib's gzseek returns garbage and fails intermittently under wine.

Hin-Tak Leung htl10 at users.sourceforge.net
Mon Apr 15 03:30:07 CDT 2013


--- On Mon, 15/4/13, Nikolay Sivov <bunglehead at gmail.com> wrote:

> On 4/15/2013 02:50, Hin-Tak Leung wrote:
> > --- On Sun, 14/4/13, Vincent Povirk <madewokherd at gmail.com> wrote:
> >
> >> Well, here's a simple thing you can check: Does your zlib dll link to
> >> _lseek or _lseeki64? The first one uses a 32-bit offset. Wine's
> >> implementation (http://source.winehq.org/source/dlls/msvcrt/file.c#L1090)
> >> expands that to 64-bit and later truncates the file offset to 32-bit.
> >> For a file larger than 2 GB, that could account for the large negative
> >> value you're seeing.
> >>
> >> And since this would only matter in cases where zlib uses lseek (the
> >> first time through I guess it wouldn't, as it has to read the whole
> >> everything up to the offset you give at least once) and is at least 2
> >> GB into the file, that might also explain why it doesn't fail
> >> initially.
> >>
> >> But without really digging into the zlib code, all I can do it speculate.
> >>
> >> I should probably also check coapp's build of zlib sometime.
> > It is not a dll - as you suggested and I already wrote, due to past
> > experience of other's packaging of slightly outdated, it is being built
> > against a private *source* copy of the latest zlib.
> >
> > Also the bogus offset is not large negative but large (larger than 2^32)
> > positive.
> >
> > Here is an example of the debug output under wine:
> >
> > ---------------
> > set_filepos failed at 34307 returning 134127533721091
> > Re-opening to re-try
> > Retry successful
> > set_filepos failed at 96919 returning 146686018157207
> > Re-opening to re-try
> > Retry successful
> > set_filepos failed at 128254 returning 12103217968382
> > Re-opening to re-try
> > Retry successful
> > ...
> > ---------------
> >
> > This is generated by this code snipplet which is called inside a loop,
> > all wrapped in the c++ class:
> >
> > ---------------
> >         off_t offset = gzseek(gzvcf_in, filepos, SEEK_SET);
> >         if (offset != filepos) { //implicitly converted to off_t by template streamoff()
> >             LOG.printLOG("set_filepos failed at " + LOG.streampos2str(filepos)
> >               + " returning " + LOG.off_t2str(offset) + "\n");
> >             LOG.printLOG("Re-opening to re-try\n");
> >             close(); open();
> >             off_t offset1 = gzseek(gzvcf_in, filepos, SEEK_SET);
> >             if (offset1 == filepos)
> >               LOG.printLOG("Retry successful\n");
> >             else
> >               LOG.error("Retry failed\n"); // this also aborts
> >         }
> >
> > -------------------
> >
> > This code runs silently on linux i.e. the "if (offset != filepos)"
> > condition is not triggered.

> For windows build you'll need to define _WIN32, so _lseeki64 is used by
> zlib. After this done you could play with native msvcrt to see if it helps,
> and after that +relay will tell you everything.

$ i686-w64-mingw32-cpp -dM |grep WIN32
#define _WIN32 1
#define __WIN32 1
#define __WIN32__ 1
#define WIN32 1
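
Since the snippet stores the gzseek result in an off_t, the width of that type in a given build may be worth checking alongside the macros above. A minimal sketch, assuming only standard headers; the function names are made up for illustration and nothing here touches zlib itself:

```cpp
#include <string>
#include <sys/types.h>  // off_t

// Hypothetical helper: summarise the offset-related build settings.
std::string offset_width_report() {
    std::string r = "sizeof(off_t)=" + std::to_string(sizeof(off_t));
#ifdef _WIN32
    r += " _WIN32=defined";
#else
    r += " _WIN32=undefined";
#endif
#ifdef _FILE_OFFSET_BITS
    r += " _FILE_OFFSET_BITS=" + std::to_string(_FILE_OFFSET_BITS);
#else
    r += " _FILE_OFFSET_BITS=undefined";
#endif
    return r;
}

// Hypothetical helper: true if off_t can represent offsets past 2 GiB.
bool offsets_beyond_2gib_fit() {
    return sizeof(off_t) >= 8;
}
```

Printing offset_width_report() from a binary built with the same compiler flags as the application shows what this particular build ended up with.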

I haven't explained what the application does earlier, so I'll try to add this info now. It gzseeks to a set of previously generated offsets (some - actually all the relevant ones, I think - are beyond 2G in the real/already-compressed data), gzgets a few bytes, applies a user-selected criterion to those bytes, makes a record of which of those offsets are "requested", then goes back over those requested offsets and gzgets a much larger chunk. Conceptually it reads the first few columns of a very large table, uses some user-defined criteria to select on those, and extracts the selected rows.
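
The two passes described above can be sketched roughly as follows. This is only an illustration of the access pattern, not the application's code: a std::istream stands in for the gzFile so that it runs without zlib (seekg plus getline in place of gzseek plus gzgets), and read_at and extract_rows are hypothetical names.

```cpp
#include <cstddef>
#include <functional>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Seek to `offset` and read the line starting there, cut to at most
// `max_chars` characters. Stand-in for gzseek(SEEK_SET) followed by gzgets.
std::string read_at(std::istream& in, std::streamoff offset, std::size_t max_chars) {
    in.clear();  // drop any EOF/fail state left by an earlier read
    in.seekg(offset, std::ios::beg);
    std::string line;
    std::getline(in, line);
    if (line.size() > max_chars) line.resize(max_chars);
    return line;
}

// Pass 1: read a short prefix at every known offset and remember the offsets
// the predicate accepts. Pass 2: seek back and read those rows in full.
std::vector<std::string> extract_rows(
        std::istream& in,
        const std::vector<std::streamoff>& offsets,
        const std::function<bool(const std::string&)>& wanted,
        std::size_t prefix_len) {
    std::vector<std::streamoff> requested;
    for (std::streamoff off : offsets) {
        if (wanted(read_at(in, off, prefix_len))) requested.push_back(off);
    }
    std::vector<std::string> rows;
    for (std::streamoff off : requested) {
        rows.push_back(read_at(in, off, std::string::npos));
    }
    return rows;
}
```

The seek from the end of pass 1 back to the first requested offset in pass 2 corresponds to the large backward gzseek in the real application.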

In the early test (which took most of a full day to run, compared to about 20 minutes on linux), I found that the count of matched requests was correct, so the gzgets in the first pass was correct, but the extracted result from the 2nd pass was complete garbage. So my first idea was that the incremental forward gzseeks in the first pass were okay, but the large backward gzseek between the first pass and the 2nd pass was wrong. Hence my code addition to check the return value of gzseek, as well as to close/re-open and gzseek forward from the beginning if gzseek returns a wrong value. I was only expecting -1 for failure to rewind.

After the code addition, I found that the return values from even the first pass are wrong every other time. So it looks like there are two bugs somewhere, and on second thought, not necessarily with wine. I think I should try it on Windows at some stage.

- I looked at the zlib code itself (it is in the file gzlib.c in 1.2.7, for those who want to have a go). It seems that it always converts a seek request into a relative one, does some actual work, then converts back to return an absolute offset. It is possible that there is a bug somewhere in that, so that a fresh gzseek, having nowhere to be relative to, is correct, while a 2nd gzseek, after some flawed conversion to relative values, is wrong. But this first bug relates just to a flawed return value and not to the content re-positioning. The 'do some actual work' part of repositioning, etc. seems to be correct, as an immediate gzgets obtains correct content even though gzseek returns a wrong value.

- The data from gzgets in the 2nd pass after a big backward gzseek is definitely wrong.
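
For the first of these, the return values in the debug output quoted earlier can be taken apart a bit further. Below is a minimal sketch (the struct and helper names are made up for illustration) that splits each logged return value into its upper and lower 32 bits. For all three samples the lower half is exactly the requested position (34307, 96919, 128254), and only the upper half is junk (31229, 34153, 2818). That would be consistent with a 32-bit result being picked up as a 64-bit off_t somewhere, but that is a guess and not something the log proves.

```cpp
#include <cstdint>
#include <iostream>

// One logged failure: the position requested and the value gzseek returned.
struct SeekSample {
    std::uint64_t requested;
    std::uint64_t returned;
};

// The three samples from the debug output quoted earlier in this message.
static const SeekSample kSamples[] = {
    {34307ULL,  134127533721091ULL},
    {96919ULL,  146686018157207ULL},
    {128254ULL, 12103217968382ULL},
};

// Upper 32 bits of a 64-bit value.
inline std::uint32_t high32(std::uint64_t v) {
    return static_cast<std::uint32_t>(v >> 32);
}

// Lower 32 bits of a 64-bit value.
inline std::uint32_t low32(std::uint64_t v) {
    return static_cast<std::uint32_t>(v & 0xFFFFFFFFULL);
}

// Print both halves of each returned value next to the requested position.
void print_halves() {
    for (const SeekSample& s : kSamples) {
        std::cout << "requested=" << s.requested
                  << " returned=" << s.returned
                  << " high32=" << high32(s.returned)
                  << " low32=" << low32(s.returned) << "\n";
    }
}
```

Calling print_halves() prints one line per sample, which makes it easy to add further values from a longer log to kSamples and see whether the pattern holds.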

So my plan is (1) try to run it on Windows at some stage, (2) trim the code to a small demo (after I get further with work of higher priority...).

An unfortunate side-effect of the work-around (close/re-open) also kicking in during the first pass is that the application now runs even slower than the one day it took before.

While writing this it just clicked that I can get a bit more info about where the big backward gzseek actually goes instead of where it was supposed to: although most of the extracted stuff is garbage, there are small bits of non-mangled stuff in there that I can at least try to match to the input file, to see which part of it the garbage came from. So I am going to do this now.



