lua-users home
lua-l archive

[Date Prev][Date Next][Thread Prev][Thread Next] [Date Index] [Thread Index]



On 9-Jun-05, at 9:13 AM, Paul Chiusano wrote:

So really,
it is the equals method that provides the standard for uniqueness, and
the hashCode method is just used to make this speedier... there is the
contract that objects which are equal should return the same hashCode,
otherwise the scheme wouldn't work.

I guess that's one way of looking at it. I would have said "must return", but it's much of a muchness. However, "just used to make this speedier" more or less skates over the fact that speedier here means is O(1) or so, rather than O(N). In other words, you could satisfy the contract by always returning the same hash value for every object, but that would reduce the hash table to a linear scan, which some might say is qualitatively different.


If I had a Pair class that just stored pairs of items, and I want
pairs to be equal if corresponding elements are the same, I could make
my __hash method return say, the sum of the hash values of the
elements stored, and I could make the __eq method compare objects at
the same position in the two pairs, returning true if both
corresponding elements were equal.

That's true, although it might not be the best possible hash function (it would return the same value for <a, b> as for <b, a> for example.) Writing good hash functions is an art -- and getting them to satisfy the equality contract can be tricky. (Getting this wrong creates very hard to find bugs.)

In any event, this algorithm is not significantly different from having a definition of CanonicalString for each object. Then a Pair's CanonicalString is just a Concatenation:

function PairMeta:CanonicalString()
  return '['..CanonicalString(self.first)..
         ','..CanonicalString(self.second)..']'
end

That's not much different from the hash function you propose, in terms of complexity, and has the advantage that the standard string hash function would probably not produce the same value for <a, b> and <b, a>.

With Lua 5.1(work)'s new (or reintroduced) facility for attaching metatables to primitive types, this could actually be implemented pretty easily:

function CanonicalString(obj)
  return ((getmetatable(obj) or {}).CanonicalString or tostring)(obj)
end

Then it is only necessary to define:
  getmetatable('string').CanonicalString =
    function(s) return string.format('%q', s) end

tostring should be adequate for other primitive types, but that could certainly be adjusted.

(I haven't actually tested this, so the syntax might not be quite right.)

Of course, all CanonicalString functions have to guarantee somehow to generate strings which are different from CanonicalString's returned by other types. This could be done by, for example, including some typename in the CanonicalString.

But I think there is an important point here: one of Lua's great strengths is that you can use mutable objects as table keys, using object identity as the equality function. This is, as far as I can see, the only meaningful way to use a mutable object as a table key, since any object whose equivalence class could change over time is not suitable as a table key. [1]

Still, your pair class might be immutable. In the case of immutable objects, it is not necessary to maintain multiple copies of the same object; the object can be interned at creation, as Lua does with strings. [2] An implementation of such an object would involve computing it's CanonicalString exactly once, when the object was created, and it would be used only to decide whether or not a new object was needed:

--(pseudocode:)
function createImmutable(obj)
  local cs = CanonicalString(obj)
  if not _INTERN_[cs] then _INTERN_[cs] = obj end
  return _INTERN_[cs]
end

From that point on, object identity is a simple and fast hashing and equality function.

Notes:

[1] "Mutable" here does not technically mean "no bit can change". It means that no change can have an observable effect (for some definition of observable.) For example, it does not prevent an object from having lazily computed attributes (although technically one could 'observe' this by watching how long it takes to return the attribute, I'm not using 'observable' quite in this sense -- specifically, in the case of table keys, it means that no change can affect the equivalence class of the object.)

[2] It is not strictly necessary to intern every immutable object at creation. It could be done lazily when the object was first involved in an equality computation, either through application of the == operator or through application of one of the table-lookup operators [] and []=
Lazy interning may be a useful optimization for certain immutable types.

[3] I'm still skating over some details. But I'm happy to continue the discussion.