Re: question on hash for objects such as tables and functions

Le mer. 27 mai 2020 à 12:56, Philippe Verdy <verdyp@gmail.com> a écrit :

Le mer. 27 mai 2020 à 11:09, Francisco Olarte <folarte@peoplecall.com> a écrit :
> But the current implementation still uses it as the factor: (h<<5)+(h>>2) is the same as (h<<7 + h)>>>2 = (h*128+h)>>>2 = (h*129)>>>2; this way of writing it with the sum of shifts in two directions avoids discarding 2 high bits with (h<<7)

So, you affirm it is the same only to prove it is not the same in the
same sentence?

What??? I'm saying that they are arithmetically equivalent (you can check it yourself) if you ignore the limited bitlength. The difference is when integers have limited bitlength, because shifted bits will be discarded.

But as you want a visual demonstration (that anyone using binary arithmetic should know as it is evident in that case):

Take any binary number n represented as digits (..., H, G, F, E, D, C, B, A) where a is the lowest significant bit of just "0b...HGFEDCBA" for short notation:

h = 0b...HGFEDCBA

h<<5 = 0b...HGFEDCBA00000 = h*64

h>>2 = 0b...HGFEDC = floor(h/4)

h<<5 + h>>2 = 0b...HGFEDCBA00000 + 0b...HGFEDC

= floor(0b...HGFEDCBA * 64 + 0b...HGFEDCBA / 4)

= floor((0b...HGFEDCBA * 256 + 0b...HGFEDCBA) / 4)

= floor( 0b...HGFEDCBA * 260 / 4) = floor(h * 260 /4) = floor(h *129 / 4) = floor(h*129)>>2 = (h*129)>>2

but (h*129) as a integer with limited bitlength would discard the bits "...HG", where "HG" that must be kept in the highest bits of the result of >>2.

That's why is decomposed in two shifts of h (to the left and to the right) summed together (ignoring the highest bits discarded by the addition), so that >>2 is effectively a >>>2 rotation (with a limited bitlength for its first operand and in its result).

So (h<<5)+(h>>2) with a bitlength limit is equivalent to (h*129)>>2 = (h*129)>>>2 = rotr(h*129, 2) : the C compiler may generate the rotation itself in its optimizer to avoid duplicate reference to h and avoid allocating an extra register to hold the intermediate first operand of the sum, if it can use a single larger register (e.g. unsigned long or unsigned long long, instead of unsigned int) for computing a single multiplication without loosing the two bits, or could opt to using two registers for a sum if there's no wider register. But arithmerically this is still a multiplication by 129, preserving two "carry" bits in the result before shifting it to the right to reinject these carries (the two lowest bits left by the final shift to the right may be reinjected as high bits of the large register, but it does not matter, as the final store into "h" will discard them.

You can think it as you want with other checks. But this is really basic binary arithmetic rules with integers with limited bitlength (ie. with arithmetic operators working in Z/2^nZ instead of the infinite set of natural integers: the "+", "-", "<<", ">>" operators in C always operate in Z/2^nZ, i.e all operands and results rounded down modulo (2^n), where n is the bitlength of its relevant integer type, they never operate in the infinite set of natural integers with arbitrary bitlengths).

So now you've shown that:

- you don't know basic binary arithmetic

- you don't know the Knuth algorithm

You just stated that you have read TAOP (or affimed you had one copy of it), but you did not read it or did not understand it.

I have then serious doubts that you're qualified to evaluate the issue correctly, and even to correctly use the integer operators in any other C programs you write yourself: you don't care at all with precision limits, and don't care about the semantics and constraints explicitly given in TAOP or other serious papers about how to program in C.

And this remains true even if you did not initially write this piece code (because you just borrowed it from another source that was not properly reviewed and fixed before you reused it without reviewing it). If you still have doubts and you did not write the piece of code you borrowed from another source, then consult the initial author of that source, he should also assert that this piece of code has effectively a bug and is not a correct implementation of Knuth's fast hashing algorithm).