[Date Prev][Date Next][Thread Prev][Thread Next]
[Date Index]
[Thread Index]
- Subject: Re: Bug report in length of simple table
- From: Ross Berteig <Ross@...>
- Date: Thu, 15 Sep 2016 12:28:53 -0700
On 9/14/2016 4:42 PM, Miroslav Janíček wrote:
On 15 Sep 2016, at 0:15, Ross Berteig <Ross@CheshireEng.com> wrote:
I'm not sure there are practical versions of Lua where that table
could actually exist. It would have to be built with a small enough
integer type that a table with half of all possible integral keys could
fit within (virtual) memory.
Bah, that’s an implementation detail ;)
Well, I'm an Engineer, not a Mathematician, so implementation details
are what pays my bills ;-)
But you’re of course right — we probably won’t see systems that
could handle such tables for 64-bit integers in our lifetimes. However, I
think it’s perfectly possible to have such a table with 32-bit integers,
even today.
And although it would likely be foolish, I suspect with some effort one
could build a Lua using 16-bit integers, even if the float subtype
remained 32-bit. That might even make a kind of sense in some embedded
system environment, although having enough RAM to comfortably *use* Lua
usually implies also having a 32-bit CPU core.
(As a side note: in my reading of the Lua Reference
Manual I haven’t been able to conclusively discard the
possibility that computing sequence length involves
invoking the __index metamethod. I know it sounds silly,
but I think that a Lua implementation that would do that
would still be “legal” according to the manual. Now, in
such an implementation, the __index metamethod
function (t, k) if k > 0 then return k; else return nil; end end
would do the trick for any integer width.)
The 5.3 reference at https://www.lua.org/manual/5.3/manual.html#3.4.7
does not say that only raw operations will be used when __len() is not
provided, and nothing in section 2.4's description of __len or __index
would seem to promise otherwise.
As I read it, your proposed __index would not affect #t. You would need
to define __len for that, which makes sense because there really is no
sensible meaning to #t in that case.
Dipping into the source code, we find that the stock implementation of
the length operator is here:
https://www.lua.org/source/5.3/lvm.c.html#luaV_objlen
which calls
https://www.lua.org/source/5.3/ltable.c.html#luaH_getn
if __len() is not defined. luaH_getn() is part of the table
implementation and directly peeks at its internals to do the binary
search for a boundary value to return.
Now here's another implementation detail: luaH_getn() is declared to
return `int`, and the part that searches the hash table works with a
range i to j where i and j are declared `unsigned int` with the
identified boundary index implicitly cast back to `int` on return.
The catch is not only that `unsigned int` is not the same range as
`int`, but that it does not use `LUA_INTEGER` so the range of integer
keys implemented for the length of a sequence is restricted to the range
of `int` not whatever LUA_INTEGER is configured to be.
Also, there will be a gotcha waiting should #t exceed INT_MAX, which I
think could cause #t to be negative. If so, I'd argue that is a bug. I
have not attempted to create a test case, however.
But anyway, that wasn’t really the point. The point is that under
this definition, it seems that there exists a sequence that does not
> have a length.
Only if there can be a largest integer key. And in practice, in every
implementation, there does exist a largest integer n such that t[n],
t[n-1], and t[n+1] are all distinct entries in the table t.
Then there are many values of i > n, where i == i+1 (and hence t[i] ==
t[i+1]) and the whole idea of a sequence falls apart. This is a natural
result of the quiet and soft roll over from integers to floats in Lua
5.3, and always happened in earlier versions of Lua where numbers are
always floats.
The practical answer is to simply note that there is an implementation
defined longest possible sequence, and possibly give some guidance about
what it might be. As I read the sources, that longest length is INT_MAX
as currently implemented.
I think the fix could be quite simple, along the following lines:
* if t[1] == nil, then 0 is a border;
* let MAX be the maximum value for an integer, then MAX
is a border if t[MAX] ~= nil;
* for all positive integers i, if t[i] ~= nil and t[i+1] == nil,
then i is a border.
Plus: a table is a sequence if and only if it has exactly one border.
That said, I'm not certain that the definition of sequence cares
about whether the keys are specifically integer type numbers. So the
number math.maxinteger will be 2^63-1, and 2^63 can be exactly
represented in 64-bit float. 2^63+1 cannot, and that is probably where
the fuzzy non-border happens.
I must admit I’m not exactly sure what you mean.....
I'm just pointing out that the implementation detail of there being
special treatment for integral values is not necessarily important for
the definition of a sequence. 2^63+1 cannot be a border because 2^63+2
is, in the fun world of floating point arithmetic, the very same key.
--
Ross Berteig Ross@CheshireEng.com
Cheshire Engineering Corp. http://www.CheshireEng.com/
+1 626 303 1602
- References:
- Bug report in length of simple table, Tomas.Lavicka
- Re: Bug report in length of simple table, Marc Balmer
- Re: Bug report in length of simple table, Malma
- Re: Bug report in length of simple table, Oliver Kroth
- RE: Bug report in length of simple table, Tomas.Lavicka
- Re: Bug report in length of simple table, Dirk Laurie
- Re: Bug report in length of simple table, Roberto Ierusalimschy
- Re: Bug report in length of simple table, Miroslav Janíček
- Re: Bug report in length of simple table, Ross Berteig
- Re: Bug report in length of simple table, Miroslav Janíček