lua-users home
lua-l archive

[Date Prev][Date Next][Thread Prev][Thread Next] [Date Index] [Thread Index]


Backing up to the original question of "how do I do hashes faster than
chars" there's MurmurHash ( http://code.google.com/p/smhasher/ ) and
FNV hashes. GCC's libstdc++ implements both[1] as does Ruby[2]. Note
that Ruby has a whitelist of architectures with UNALIGNED_WORD_ACCESS.
Lua would probably keep something like its current hash, and allow
replacement on architectures where an alternative worked and was
faster. If hashing is a bottleneck.

On Sat, Jan 7, 2012 at 6:47 AM, HyperHacker <hyperhacker@gmail.com> wrote:
> On Sat, Jan 7, 2012 at 04:40, Jay Carlson <nop@nop.com> wrote:
>> On Sat, Jan 7, 2012 at 1:00 AM, HyperHacker <hyperhacker@gmail.com> wrote:

>>> That seems terribly amusing, given that C is supposed to be low-level.
>>> There's no way to convert the address to an integer and slow-copy the
>>> first few bytes until we're at an aligned address?
>>
>> No. Pointers are weird. Doing this depends on implementation-specific
>> details. As it happens, a complete model for memory given as union {
>> int n; unsigned char b[sizeof(int)]; } is true almost everywhere.
>> However, (n & 0xff) may still be b[0] or b[3] or b[7], and I don't
>> believe there is any guarantee (intptr_t)&n takes on any value of
>> (intptr_t)&b[i].

> Hm, and there's no standard function to convert a pointer to integer
> for this sort of operation? Similar to atoi() and itoa(), does there
> exist some ptoi()?

You can cast anything to int and get *some* kind of result. This
doesn't tell you much because pointers are surprisingly opaque in ANSI
C, again to support machines with strange memory models. Here's a
quote from the Sperry--uh I mean Unisys OS 2200 ANSI C compiler
reference:

====
A pointer in UC cannot be treated as an integer. A UC pointer is a
two-word structure with the base virtual address (VA) of a bank in the
first word and a bit-word pointer in the second word. The bit-word
pointer is necessary since the 2200 hardware does not have byte
pointers; the basic pointer in the 2200 hardware is a word (36-bit) VA
pointer that can only point to words. The bit-word portion of the UC
pointer has a bit offset in the first 6 bits of the word and a word
offset in the lower 24 bits of the word. If you convert (cast) a UC
pointer to a 36-bit integer (int, long, or unsigned), the bit offset
is lost. Converting it back to a C pointer results in it pointing to a
word boundary. If you add 1 to the integer before converting it back
to a pointer, the pointer points to the next word, not the next byte.
A 36-bit integer is not capable of holding all the information in a UC
pointer.
====

At a minimum you need 38 bits to address bytes--oh, did I mention
they're 9-bit bytes? Anyway, bigger than a 36-bit int. Function
pointers are eight words long and there simply is no integral type you
can round trip them through.

If somebody has access to an OS 2200 box it might be fun to see if Lua
works. I had an account on the Unix subsystem (!) of one a zillion
years ago.

Jay

[1]: http://gcc.gnu.org/viewcvs/trunk/libstdc%2B%2B-v3/libsupc%2B%2B/hash_bytes.cc?revision=169421&view=markup
[2]: http://svn.ruby-lang.org/cgi-bin/viewvc.cgi/trunk/st.c?revision=34141&view=markup
at around line 971