[Bug 41437] New: Implement a 64-bit version of __std_type_info_hash

Tue Oct 4 15:43:47 CDT 2016

https://bugs.winehq.org/show_bug.cgi?id=41437

            Bug ID: 41437
           Summary: Implement a 64-bit version of __std_type_info_hash
           Product: Wine
           Version: unspecified
          Hardware: x86-64
                OS: Linux
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: msvcrt
          Assignee: wine-bugs at winehq.org
          Reporter: cerebro.alexiel at gmail.com
      Distribution: ---

Two weeks ago, I saw commit 4931e6f92bc7e0c229a057ebf2e000f8f5aa1edd that
(partially) implements __std_type_info_hash function.

Code was simple but the 64-bit version was not implemented as it was using
"different constants".
I thought: "Findind a coefficient only requires time and patience, let's do
it!".

I had to find a value (noted as <?> here) that is such that (0xcbf29ce44fd0bfc1
^ 0x1) * <?> = 0xaf63bc4c29620a60 or 0xcbf29ce44fd0bfc0 * <?> =
0xaf63bc4c29620a60.

I started with a naive implementation computing all <?> values from 0 to
UINT64_MAX.
Was very long and very slow.
I optimized it using OpenMP (#pragma omp parallel) causing all my CPUs to work
at 100%.
Still very slow and the topmost bit would be tested in ages.

I then had the idea to use a (pseudo-)random non-repeating number generator.
This way, all numbers would be tested in (pseudo-)random order and equally
distributed on the 64-bit range. Plus, with the known seed and index, it would
be posible to stop and continue the computation at the point it stopped.

I chose Blowfish algorithm and let it work for about a week and then realized I
may need a better solution. Finding the 32-bit <?> value was very quick (less
than a minute) but a 64-bit value is way larger and testing every bit takes a
really long time.

Normally, if I have something like 5 * <?> = 15, I can do 15/5 = 3 = <?>.
Here, we abuse overflow so, mathematically it's more (0xcbf29ce44fd0bfc0 * <?>)
mod 2^64 = 0xaf63bc4c29620a60.
I discovered we can actually do "Modular multiplicative inverse" to find an
"inverse" of a number. <¿> such as (<¿> * a) mod x = 1
(a * <?>) mod x = b --> <?> mod x = b * <¿>

I found a C++ implementation that finds the inverse of a number modulo another.
I started with the 32-bit version to test it (no mod 2^32 to make it concise):
(0x811c9dc5 ^ 0x1) * <?> = 0x040c5b8c
0x811c9dc4 * <?> = 0x040c5b8c

0x100000000=2^32
modInverse(0x811c9dc4, 0x100000000) -> no inverse because 0x811c9dc4 is even

Anyway, let's do it with 0x2:
(0x811c9dc5 ^ 0x2) * <?> = 0x070c6045
0x811c9dc7 * <?> = 0x070c6045
0x811c9dc7 inverse is 0xdce213f7

in other words, (0x811c9dc7 * 0xdce213f7) mod 0x100000000 = 1
so 0x070c6045 * 0xdce213f7 = <?> = 0x1000193
we can verify that 0x811c9dc7 * 0x1000193 = 0x70c6045
and that 0x811c9dc4 * 0x1000193 = 0x40c5b8c
-> we have our 32-bit coefficient.

Note that 0x100000000 requires 33 bits so it's ok on my 64-bit computer but as
you may have guessed, 0x10000000000000000 (2^64) is 63 bits.
I tried using uint128_t but didn't work.

I then used GMP library (The GNU Multiple Precision Arithmetic Library) which
lets me work on numbers as big as I have memory.

Let's do it again for the 64-bit version (mox 2^64 omitted):
(0xcbf29ce44fd0bfc1 ^ 0x1) * <?> = 0xaf63bc4c29620a60
0xcbf29ce44fd0bfc0 * <?> = 0xaf63bc4c29620a60
0xcbf29ce44fd0bfc0 has no inverse because 0xcbf29ce44fd0bfc0 is even

(0xcbf29ce44fd0bfc1 ^ 0x2) * <?> = 0xaf63bf4c29620409
0xcbf29ce44fd0bfc3 * <?> = 0xaf63bf4c29620409
0xcbf29ce44fd0bfc3 inverse is 0xdbab92123bd8a8eb
in other words, (0xcbf29ce44fd0bfc3 * 0xdbab92123bd8a8eb) mod
0x10000000000000000 = 1

so 0xaf63bf4c29620409 * 0xdbab92123bd8a8eb = <?> = 0xa01b8255ca379c43
we can verify that 0xcbf29ce44fd0bfc3 * 0xa01b8255ca379c43 = 0xaf63bf4c29620409
BUT 0xcbf29ce44fd0bfc0 * 0xa01b8255ca379c43 != 0xaf63bc4c29620a60

I tested with 0x2, 0x4, 0x42, inverse is ok but <?> was always different and
wasn't working: I was missing something.

I googled 0x1000193 and found out that it's a well-known number used in FNV-1
and FNV-1a hash algorithms
(https://en.wikipedia.org/wiki/Fowler%E2%80%93Noll%E2%80%93Vo_hash_function).

Ideal 64-bit <?> (called FNV_prime) is 0xcbf29ce484222325 which slightly
differs from our 0xcbf29ce44fd0bfc1. Since only the low word is changed, I
guessed it was a XOR operation on the low part. I tested 0xcbf29ce484222325 ^ i
== 0xcbf29ce44fd0bfc1 in a loop and found i was 0xcbf29ce4. I retested with
other values et voilà, I found how it worked!

I just submitted a patch to fix this bug.

I posted a detailed explanation here to prove I didn't disassembled the native
DLL and maybe to tell people it's not that hard to do useful things in wine. By
the way, it's a fun way to learn things!

-- 
Do not reply to this email, post in Bugzilla using the
above URL to reply.
You are receiving this mail because:
You are watching all bug changes.