lua-users home
lua-l archive

[Date Prev][Date Next][Thread Prev][Thread Next] [Date Index] [Thread Index]


David Olofson wrote:
> On Monday 17 December 2007, Mike Pall wrote:
> > The new hash algorithm is a bastardized variant of Bob Jenkins'
> > fast rotate-and-mix hash. It hashes up to 12 chars from a string,
> > picked from the start, middle and end. The basic ideas are: less
> > fetches, more bit mixing, no loops and a constant hash time.
> 
> Brilliant! Can I be as rude as to blatantly steal this for EEL? 

No problem. The original algorithm is in the public domain and so
is my minor variation of it. Be sure to credit the original author.

But maybe you should wait a bit until more feedback has been
posted on the list. Especially about the performance effects on
larger, non-synthetic and string-happy applications.

> > CAVEAT: this version relies on UNALIGNED access of 32 bit words
> > for speed. I.e. it probably only works on x86 and will certainly
> > crash on most other architectures. Sorry, I don't have the time
> > right now to add a (fast) replacement with aligned accesses. Feel
> > free to adapt the patch to your needs.
> 
> Would it be acceptable to just round the middle and end indices down 
> to the nearest aligned address?

This is difficult because the string itself may not be aligned.
And in my tests it turned out to be important to always include
the first few and the last few characters in the hash.

The original code covers the alignment issue, too:

  http://www.burtleburtle.net/bob/c/lookup3.c
  http://www.burtleburtle.net/bob/hash/index.html
  http://www.burtleburtle.net/bob/hash/doobs.html

[But note that it always uses all characters of a string for
calculating the hash.]

> BTW, have you considered the penalty of unaligned access in this 
> context? (Don't even know if it's significant on modern x86 in 
> situations like this...)

The only significant penalty for unaligned access on x86 is when
cache lines or page boundaries are crossed. The other cases show
no measurable difference; the longer latency is shadowed. This
means that (on average) 32 bit unaligned accesses are a lot
faster on x86 hardware than doing the same in software with byte
fetching and bit shifting. Note: the penalty for 64 bit or 128
bit (FP or SSE) unaligned accesses is much higher.

[Of course none of this applies to most non-x86 CPUs.]

--Mike