[Date Prev][Date Next][Thread Prev][Thread Next]
[Date Index]
[Thread Index]
- Subject: Re: equals metamethod and hash functions
- From: Rici Lake <lua@...>
- Date: Thu, 9 Jun 2005 11:54:30 -0500
On 9-Jun-05, at 9:13 AM, Paul Chiusano wrote:
So really,
it is the equals method that provides the standard for uniqueness, and
the hashCode method is just used to make this speedier... there is the
contract that objects which are equal should return the same hashCode,
otherwise the scheme wouldn't work.
I guess that's one way of looking at it. I would have said "must
return", but it's much of a muchness. However, "just used to make this
speedier" more or less skates over the fact that speedier here means is
O(1) or so, rather than O(N). In other words, you could satisfy the
contract by always returning the same hash value for every object, but
that would reduce the hash table to a linear scan, which some might say
is qualitatively different.
If I had a Pair class that just stored pairs of items, and I want
pairs to be equal if corresponding elements are the same, I could make
my __hash method return say, the sum of the hash values of the
elements stored, and I could make the __eq method compare objects at
the same position in the two pairs, returning true if both
corresponding elements were equal.
That's true, although it might not be the best possible hash function
(it would return the same value for <a, b> as for <b, a> for example.)
Writing good hash functions is an art -- and getting them to satisfy
the equality contract can be tricky. (Getting this wrong creates very
hard to find bugs.)
In any event, this algorithm is not significantly different from having
a definition of CanonicalString for each object. Then a Pair's
CanonicalString is just a Concatenation:
function PairMeta:CanonicalString()
return '['..CanonicalString(self.first)..
','..CanonicalString(self.second)..']'
end
That's not much different from the hash function you propose, in terms
of complexity, and has the advantage that the standard string hash
function would probably not produce the same value for <a, b> and <b,
a>.
With Lua 5.1(work)'s new (or reintroduced) facility for attaching
metatables to primitive types, this could actually be implemented
pretty easily:
function CanonicalString(obj)
return ((getmetatable(obj) or {}).CanonicalString or tostring)(obj)
end
Then it is only necessary to define:
getmetatable('string').CanonicalString =
function(s) return string.format('%q', s) end
tostring should be adequate for other primitive types, but that could
certainly be adjusted.
(I haven't actually tested this, so the syntax might not be quite
right.)
Of course, all CanonicalString functions have to guarantee somehow to
generate strings which are different from CanonicalString's returned by
other types. This could be done by, for example, including some
typename in the CanonicalString.
But I think there is an important point here: one of Lua's great
strengths is that you can use mutable objects as table keys, using
object identity as the equality function. This is, as far as I can see,
the only meaningful way to use a mutable object as a table key, since
any object whose equivalence class could change over time is not
suitable as a table key. [1]
Still, your pair class might be immutable. In the case of immutable
objects, it is not necessary to maintain multiple copies of the same
object; the object can be interned at creation, as Lua does with
strings. [2] An implementation of such an object would involve
computing it's CanonicalString exactly once, when the object was
created, and it would be used only to decide whether or not a new
object was needed:
--(pseudocode:)
function createImmutable(obj)
local cs = CanonicalString(obj)
if not _INTERN_[cs] then _INTERN_[cs] = obj end
return _INTERN_[cs]
end
From that point on, object identity is a simple and fast hashing and
equality function.
Notes:
[1] "Mutable" here does not technically mean "no bit can change". It
means that no change can have an observable effect (for some definition
of observable.) For example, it does not prevent an object from having
lazily computed attributes (although technically one could 'observe'
this by watching how long it takes to return the attribute, I'm not
using 'observable' quite in this sense -- specifically, in the case of
table keys, it means that no change can affect the equivalence class of
the object.)
[2] It is not strictly necessary to intern every immutable object at
creation. It could be done lazily when the object was first involved in
an equality computation, either through application of the == operator
or through application of one of the table-lookup operators [] and []=
Lazy interning may be a useful optimization for certain immutable types.
[3] I'm still skating over some details. But I'm happy to continue the
discussion.