- Subject: Re: Lua 5.4-work1 with first class arrays (implementation & benchmarks)
- From: Philipp Janda <siffiejoe@...>
- Date: Fri, 30 Mar 2018 11:10:37 +0200
Hi!
On 29.03.2018 at 21:12, Petri Häkkinen wrote:
> On 29 Mar 2018, at 9.29, Philipp Janda <siffiejoe@gmx.net> wrote:
>
> > Couldn't you just add the length-updating code to normal tables? I mean, we already pay the extra space for the length field.
>
> I don't think we are understanding each other. Let me try again:
I suspect so too, because I mostly agree with you. I just don't like the
idea of adding a separate array sub-type. And I think it's completely
unnecessary (except as a type hint for some serialization libraries).
To sum up what I meant:
- Add an array size field to all tables like you did in your patch.
- Don't introduce a new array sub-type.
- Add the array size update code to the table key insertion function (I
believe there is already a separate code path for integer inserts).
- Array literal syntax is optional (I wouldn't add it). But table
literals should set the array size field to the largest integer key.
- Add a `table.resize()` function like you did, and modify the other
table functions to update the array size field. (You probably also need
some new C API functions for setting the array size.)
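For illustration, this is how the summarized semantics might look from
Lua code (just a sketch; `table.resize()` is the function from your
patch, everything else is plain Lua):

local t = { 10, 20, 30 }  -- table literal sets the array size field to 3
t[4] = 40                 -- integer insert extends the array size to 4
table.resize( t, 2 )      -- explicit shrink; presumably clears t[3] and t[4]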
> Lua tables are just key-value pairs with some optimizations. In a sparse table, like the one I showed in my example, there is nothing between the key-values, not even nils. This is the fundamental reason for the issue with holes and why # can't be implemented in constant time.
I have a different view. Tables are mappings from arbitrary keys to
corresponding values (a default value most of the time). If you have a
finite, ordered set of keys, a Lua table gives you a finite, ordered set
of values indexable by those keys. In Lua this set of keys is by
convention the set of integers starting from 1 up to some maximum
integer. The difficulty is agreeing on this maximum integer. You have
multiple options for that:
1. You could have a manually managed length for each table (e.g. the
".n" hack or your `table.resize()`).
2. You could observe the user of the table (e.g. use the largest
assigned integer key).
3. You could scan all key-values that are different from the default
value (e.g. the current Lua sequence concept, or a linear scan like
`ipairs()`). Most of the time you are only interested in the
non-default values anyway.
4. Maybe more?
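For illustration, a quick sketch of options 1 and 3 in plain Lua (the
`scan_len` helper is just a made-up name for this example):

-- option 1: manually managed length (the ".n" hack)
local t = { "a", "b", nil, "d", n = 4 }
print( t.n )  --> 4, as long as every writer maintains the field

-- option 3: scan for the first key mapped to the default value (nil),
-- which is effectively what a plain ipairs() loop does
local function scan_len( t )
  local n = 0
  while t[ n+1 ] ~= nil do n = n + 1 end
  return n
end
print( scan_len( t ) )  --> 2, the scan stops at the hole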
All of those approaches have advantages and disadvantages. AFAICS, your
approach is a combination of 1. and 2.: you grow depending on the usage
pattern (assignment out of bounds), and you shrink on explicit request
(`table.resize()` or `table.remove()`). I think it's a nice mix, and
your benchmarks have shown that the performance and space concerns
(still looking for that old post; we had a lot of discussions about
table length over the years) are mostly unfounded.
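Concretely, the mix behaves something like this (a sketch assuming the
patched semantics, not stock Lua):

local a = {}            -- array size 0
a[1] = "x"              -- out-of-bounds assignment grows the size to 1
a[5] = "y"              -- grows the size to 5; a[2] to a[4] read as nil
table.resize( a, 3 )    -- shrinking happens only on explicit request...
table.remove( a )       -- ...or via the table functions; size is now 2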
> When you remove a key-value pair from the table, you would have to do some sort of scan of key-values to know the new length. You can do this when you insert/remove key-values, or you can do it lazily in the length operator. The latter is usually more efficient, and that's what Lua does. When doing insert/remove on a table, the data structure does not know whether you are removing the last element or not, nor in which indices the other values are stored.
If you remove one element, the length decreases by one, as you wrote
yourself below. No scan necessary.
> Arrays are different. They are dense and have all key-values between 1 and array size (even nils).
Arrays don't have to be dense. For sparse arrays it might make sense to
store only non-nil elements. Arrays must "have" all key-values between 1
and array size, but they don't necessarily have to store them.
What do you think about the following "array"?
local a = setmetatable( {}, {
  -- every index "has" a value, but none is actually stored
  __index = function( _, k ) return tostring( k ) end
} )
table.resize( a, 10 ) --> array of "1" to "10"
> When you push a new value, the array size always grows by one. When you insert a new key-value the array size will always be extended up to the new index (so array size becomes the index of the added value). When you remove a key-value, you always decrease the size by one. When you overwrite an existing value, the size does not change. Simple and efficient.
Completely agree.
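Those rules are simple enough to write down directly. A minimal Lua
model of them (a hypothetical `Array` wrapper just to spell out the
invariants; the patch of course does this in C inside the table code):

local Array = {}
Array.__index = Array

function Array.new()
  return setmetatable( { items = {}, size = 0 }, Array )
end

function Array:push( v )      -- push: the size always grows by one
  self.size = self.size + 1
  self.items[ self.size ] = v
end

function Array:set( i, v )    -- insert/overwrite at index i
  self.items[ i ] = v
  if i > self.size then       -- new index: extend the size up to it
    self.size = i
  end                         -- existing index: the size does not change
end

function Array:remove()       -- remove: the size always shrinks by one
  local v = self.items[ self.size ]
  self.items[ self.size ] = nil
  self.size = self.size - 1
  return v
end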
> To get the speed benefits and to fix the hole issue, you have to break the rules of the game, i.e. the semantics need to change. Coming up with new working semantics for tables, as proved by all the discussions over the years on this list, is hard (I would say impossible) and can break existing code in subtle ways. Luckily, arrays have just the right semantics for the job.
I think there will be broken code no matter which solution to the array
problem we choose. In this particular case: nilling the last sequence
element won't shrink the table anymore, and iterating over that table
will then produce an unexpected nil value.
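For example (assuming the patched semantics):

local t = { 1, 2, 3 }  -- array size 3
t[3] = nil             -- today #t becomes 2; with an array size field it stays 3
for i, v in ipairs( t ) do
  print( i, v )        -- now also reaches i == 3 with v == nil
end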
> Petri
Philipp