- Subject: Re: Lua 5.4-work1 with first class arrays (implementation & benchmarks)
- From: Philipp Janda <siffiejoe@...>
- Date: Fri, 30 Mar 2018 11:10:37 +0200
Hi!
On 29.03.2018 at 21:12, Petri Häkkinen wrote:
> On 29 Mar 2018, at 9.29, Philipp Janda <siffiejoe@gmx.net> wrote:
>
> > Couldn't you just add the length-updating code to normal tables? I mean, we already pay the extra space for the length field.
>
> I don't think we are understanding each other. Let me try again:
I suspect so too, because I mostly agree with you. I just don't like the
idea of adding a separate array sub-type. And I think it's completely
unnecessary (except as a type hint for some serialization libraries).
To sum up what I meant:
- Add an array size field to all tables like you did in your patch.
- Don't introduce a new array sub-type.
- Add the array size update code to the table key insertion function (I
believe there is already a separate code path for integer inserts).
- Array literal syntax is optional (I wouldn't add it). But table
literals should set the array size field to the largest integer key.
- Add a `table.resize()` function like you did, and modify the other
table functions to update the array size field. (You probably also need
some new C API functions for setting the array size.)
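For illustration, this is how the summarized semantics might look from
Lua code (just a sketch; `table.resize()` is the function from your
patch, everything else is plain Lua):

local t = { 10, 20, 30 }  -- table literal sets the array size field to 3
t[4] = 40                 -- integer insert extends the array size to 4
table.resize( t, 2 )      -- explicit shrink; presumably clears t[3] and t[4]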
> Lua tables are just key-value pairs with some optimizations. In a sparse table, like the one I showed in my example, there is nothing between the key-values, not even nils. This is the fundamental reason for the issue with holes and why # can't be implemented in constant time.
I have a different view. Tables are mappings from arbitrary keys to
corresponding values (a default value most of the time). If you have a
finite, ordered set of keys, a Lua table gives you a finite, ordered set
of values indexable by those keys. In Lua this set of keys is by
convention the set of integers starting from 1 up to some maximum
integer. The difficulty is agreeing on this maximum integer. You have
multiple options for that:
1. You could have a manually managed length for each table (e.g. the
".n" hack or your `table.resize()`).
2. You could observe the user of the table (e.g. use the largest
assigned integer key).
3. You could scan all key-values that are different from the default
value (e.g. the current Lua sequence concept, or a linear scan like
`ipairs()`). Most of the time you are only interested in the
non-default values anyway.
4. Maybe more?
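For illustration, a quick sketch of options 1 and 3 in plain Lua (the
`scan_len` helper is just a made-up name for this example):

-- option 1: manually managed length (the ".n" hack)
local t = { "a", "b", nil, "d", n = 4 }
print( t.n )  --> 4, as long as every writer maintains the field

-- option 3: scan for the first key mapped to the default value (nil),
-- which is effectively what a plain ipairs() loop does
local function scan_len( t )
  local n = 0
  while t[ n+1 ] ~= nil do n = n + 1 end
  return n
end
print( scan_len( t ) )  --> 2, the scan stops at the hole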
All of those approaches have advantages and disadvantages. AFAICS, your
approach is a combination of 1. and 2.: you grow depending on the usage
pattern (assignment out of bounds), and you shrink on explicit request
(`table.resize()` or `table.remove()`). I think it's a nice mix, and
your benchmarks have shown that the performance and space concerns
(still looking for that old post; we had a lot of discussions about
table length over the years) are mostly unfounded.
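Concretely, the mix behaves something like this (a sketch assuming the
patched semantics, not stock Lua):

local a = {}            -- array size 0
a[1] = "x"              -- out-of-bounds assignment grows the size to 1
a[5] = "y"              -- grows the size to 5; a[2] to a[4] read as nil
table.resize( a, 3 )    -- shrinking happens only on explicit request...
table.remove( a )       -- ...or via the table functions; size is now 2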
> When you remove a key-value pair from the table, you would have to do some sort of scan of key-values to know the new length. You can do this when you insert/remove key-values, or you can do it lazily in the length operator. The latter is usually more efficient, and that's what Lua does. When doing insert/remove on a table, the data structure does not know whether you are removing the last element or not, nor in which indices the other values are stored.
If you remove one element, the length decreases by one, as you wrote
yourself below. No scan necessary.
> Arrays are different. They are dense and have all key-values between 1 and array size (even nils).
Arrays don't have to be dense. For sparse arrays it might make sense to
store only non-nil elements. Arrays must "have" all key-values between 1
and array size, but they don't necessarily have to store them.
What do you think about the following "array"?
local a = setmetatable( {}, {
  -- every index "has" a value, but none is actually stored
  __index = function( _, k ) return tostring( k ) end
} )
table.resize( a, 10 ) --> array of "1" to "10"
> When you push a new value, the array size always grows by one. When you insert a new key-value the array size will always be extended up to the new index (so array size becomes the index of the added value). When you remove a key-value, you always decrease the size by one. When you overwrite an existing value, the size does not change. Simple and efficient.
Completely agree.
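Those rules are simple enough to write down directly. A minimal Lua
model of them (a hypothetical `Array` wrapper just to spell out the
invariants; the patch of course does this in C inside the table code):

local Array = {}
Array.__index = Array

function Array.new()
  return setmetatable( { items = {}, size = 0 }, Array )
end

function Array:push( v )      -- push: the size always grows by one
  self.size = self.size + 1
  self.items[ self.size ] = v
end

function Array:set( i, v )    -- insert/overwrite at index i
  self.items[ i ] = v
  if i > self.size then       -- new index: extend the size up to it
    self.size = i
  end                         -- existing index: the size does not change
end

function Array:remove()       -- remove: the size always shrinks by one
  local v = self.items[ self.size ]
  self.items[ self.size ] = nil
  self.size = self.size - 1
  return v
end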
> To get the speed benefits and to fix the hole issue, you have to break the rules of the game, i.e. the semantics need to change. Coming up with new working semantics for tables, as proved by all the discussions over the years on this list, is hard (I would say impossible) and can break existing code in subtle ways. Luckily, arrays have just the right semantics for the job.
I think there will be broken code no matter which solution to the array
problem we choose. In this particular case: nilling the last sequence
element won't shrink the table anymore, and iterating over that table
will then produce an unexpected nil value.
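For example (assuming the patched semantics):

local t = { 1, 2, 3 }  -- array size 3
t[3] = nil             -- today #t becomes 2; with an array size field it stays 3
for i, v in ipairs( t ) do
  print( i, v )        -- now also reaches i == 3 with v == nil
end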
> Petri
Philipp