Re: equals metamethod and hash functions

lua-l archive

[Date Prev][Date Next][Thread Prev][Thread Next] [Date Index] [Thread Index]

Subject: Re: equals metamethod and hash functions
From: Rici Lake <lua@...>
Date: Thu, 9 Jun 2005 11:54:30 -0500


On 9-Jun-05, at 9:13 AM, Paul Chiusano wrote:

So really,
it is the equals method that provides the standard for uniqueness, and
the hashCode method is just used to make this speedier... there is the
contract that objects which are equal should return the same hashCode,
otherwise the scheme wouldn't work.

I guess that's one way of looking at it. I would have said "mustreturn", but it's much of a muchness. However, "just used to make thisspeedier" more or less skates over the fact that speedier here means isO(1) or so, rather than O(N). In other words, you could satisfy thecontract by always returning the same hash value for every object, butthat would reduce the hash table to a linear scan, which some might sayis qualitatively different.


If I had a Pair class that just stored pairs of items, and I want
pairs to be equal if corresponding elements are the same, I could make
my __hash method return say, the sum of the hash values of the
elements stored, and I could make the __eq method compare objects at
the same position in the two pairs, returning true if both
corresponding elements were equal.

That's true, although it might not be the best possible hash function(it would return the same value for <a, b> as for <b, a> for example.)Writing good hash functions is an art -- and getting them to satisfythe equality contract can be tricky. (Getting this wrong creates veryhard to find bugs.)

In any event, this algorithm is not significantly different from havinga definition of CanonicalString for each object. Then a Pair'sCanonicalString is just a Concatenation:


function PairMeta:CanonicalString()
  return '['..CanonicalString(self.first)..
         ','..CanonicalString(self.second)..']'
end

That's not much different from the hash function you propose, in termsof complexity, and has the advantage that the standard string hashfunction would probably not produce the same value for <a, b> and <b,a>.

With Lua 5.1(work)'s new (or reintroduced) facility for attachingmetatables to primitive types, this could actually be implementedpretty easily:


function CanonicalString(obj)
  return ((getmetatable(obj) or {}).CanonicalString or tostring)(obj)
end

Then it is only necessary to define:
  getmetatable('string').CanonicalString =
    function(s) return string.format('%q', s) end

tostring should be adequate for other primitive types, but that couldcertainly be adjusted.

(I haven't actually tested this, so the syntax might not be quiteright.)

Of course, all CanonicalString functions have to guarantee somehow togenerate strings which are different from CanonicalString's returned byother types. This could be done by, for example, including sometypename in the CanonicalString.

But I think there is an important point here: one of Lua's greatstrengths is that you can use mutable objects as table keys, usingobject identity as the equality function. This is, as far as I can see,the only meaningful way to use a mutable object as a table key, sinceany object whose equivalence class could change over time is notsuitable as a table key. [1]

Still, your pair class might be immutable. In the case of immutableobjects, it is not necessary to maintain multiple copies of the sameobject; the object can be interned at creation, as Lua does withstrings. [2] An implementation of such an object would involvecomputing it's CanonicalString exactly once, when the object wascreated, and it would be used only to decide whether or not a newobject was needed:


--(pseudocode:)
function createImmutable(obj)
  local cs = CanonicalString(obj)
  if not _INTERN_[cs] then _INTERN_[cs] = obj end
  return _INTERN_[cs]
end

From that point on, object identity is a simple and fast hashing andequality function.


Notes:

[1] "Mutable" here does not technically mean "no bit can change". Itmeans that no change can have an observable effect (for some definitionof observable.) For example, it does not prevent an object from havinglazily computed attributes (although technically one could 'observe'this by watching how long it takes to return the attribute, I'm notusing 'observable' quite in this sense -- specifically, in the case oftable keys, it means that no change can affect the equivalence class ofthe object.)

[2] It is not strictly necessary to intern every immutable object atcreation. It could be done lazily when the object was first involved inan equality computation, either through application of the == operatoror through application of one of the table-lookup operators [] and []=

Lazy interning may be a useful optimization for certain immutable types.

[3] I'm still skating over some details. But I'm happy to continue thediscussion.

References:
- Re: equals metamethod and hash functions, Paul Chiusano

Prev by Date: Re: equals metamethod and hash functions
Next by Date: Re: equals metamethod and hash functions
Previous by thread: Re: equals metamethod and hash functions
Next by thread: Warning on floor and ceil on latest OS X devtools
Index(es):
- Date
- Thread