See
https://www.lua.org/pil/5.2.html.
[...]
CALL             PARAMETERS
g(3)             a=3, b=nil, arg={n=0}
g(3, 4)          a=3, b=4, arg={n=0}
g(3, 4, 5, 8)    a=3, b=4, arg={5, 8; n=2}
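The table above can be reproduced approximately in current Lua. This is only a sketch: the implicit "arg" table belongs to Lua 5.0, so the code below assumes Lua 5.1 or later and rebuilds an equivalent table from "..." with select. The helper name pack_args is made up for the illustration.

```lua
-- Hypothetical helper: builds a table shaped like the "arg" of Lua 5.0.
-- select("#", ...) yields the number of extra arguments, nils included.
local function pack_args(...)
  return { n = select("#", ...), ... }
end

-- Same signature as the g used in the table: two named parameters,
-- then a variable number of extra arguments.
local function g(a, b, ...)
  local arg = pack_args(...)
  return a, b, arg
end

local a1, b1, arg1 = g(3)
local a2, b2, arg2 = g(3, 4)
local a3, b3, arg3 = g(3, 4, 5, 8)
```

Here arg3 holds 5 and 8 at indices 1 and 2 and the count 2 under the key "n", all inside one ordinary table.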
Using those regular parameters, the definition of select
is straightforward:
This notation uses a semicolon in a table constructor. Lua accepts ";" there as a field separator equivalent to ",", but used only once, between the list part and n=2, it may suggest that the key/value pair n=2 is stored not in the table itself but in its metatable.
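How an interpreter treats this constructor can be checked directly. A minimal sketch, assuming a standard Lua 5.1 or later interpreter (the variable names are arbitrary): the semicolon is parsed as an ordinary field separator, and the "n" field ends up in the table itself, with no metatable involved.

```lua
-- ';' and ',' are interchangeable field separators in a table constructor.
local with_semicolon = { 5, 8; n = 2 }
local with_comma = { 5, 8, n = 2 }

-- rawget bypasses metatables, so it shows where the field is really stored.
local stored_n = rawget(with_semicolon, "n")
local mt = getmetatable(with_semicolon)
```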
No such conclusion can actually be drawn. But if it were true, then any function that explicitly accesses its "arg" parameter would have to create, on function entry, a new table indexing all upvalues (including nil values), and another table to store the key/value pair with the implicit key name 'n' (here ['n'] = 2) holding the effective number of upvalues (including nils).
Creating two tables this way would be very inefficient (it would stress the garbage collector), so I suspect that "arg" is not a true table but only behaves like a table in the user's Lua code.
The Lua parser and compiler would then know that it is not a real table and would use the internal upvalues array directly (i.e. like the integer-indexed array part of a table, except that upvalues have no hashed part for arbitrary keys but can store nils directly at any integer index in the valid range). The "n" notation used here would then refer to the maximum index allocated in this integer-indexed array.
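Whether nil arguments are counted can be checked without knowing anything about the internals. A small sketch, assuming Lua 5.1 or later, where select("#", ...) reports the number of extra arguments; the function names count_extra and pack_args are hypothetical.

```lua
-- Returns how many extra arguments were passed, nils included.
local function count_extra(...)
  return select("#", ...)
end

local none = count_extra()
local two_nils = count_extra(nil, nil)
local with_hole = count_extra(1, nil, 3)

-- Once the arguments are copied into a table, the count survives only
-- if it is stored explicitly, here under the key "n".
local function pack_args(...)
  return { n = select("#", ...), ... }
end

local packed = pack_args(nil, "x", nil)
```

In this sketch packed is an ordinary value of type "table", and only its "n" field remembers that three arguments were passed.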
[==[ Sidenote:
Despite this, I like the notation with the semicolon: if Lua's table constructors gave it that meaning, attaching a metatable after the normal enumerated keys, we could avoid calling "setmetatable" in a separate statement and could create tables with their metatable directly within expressions.
We could likewise attach a metatable to the metatable by adding another semicolon followed by another list of key/value pairs.
]==]
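For comparison, a small sketch of what stock Lua already allows: setmetatable returns its first argument, so a table and its metatable can be built within a single expression, and the call can be nested to give the metatable a metatable of its own. The names Point, p and q are illustrative only.

```lua
-- Illustrative metatable; the name "Point" is hypothetical.
local Point = {
  __index = {
    norm1 = function(self)
      return math.abs(self.x) + math.abs(self.y)
    end,
  },
}

-- setmetatable returns the table it was given, so this is one expression.
local p = setmetatable({ x = 3, y = -4 }, Point)

-- Nesting the call attaches a metatable to the metatable.
local q = setmetatable(
  {},
  setmetatable({ __index = { tag = "inner" } }, { __index = { level = 2 } })
)
```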
But the doc page is still inconsistent in that case, as it shows another notation:
So to conclude, section 5.2 (which is the only one describing the implicit parameter "arg") is very confusing.
It does not refer to the correct feature of Lua, which is using "upvalues" as a real array with a fixed size (accessible as "#arg"). "arg" has a special type:
- It is a "pseudo-table", meaning you can access its keys as arg[1], arg[2], etc.
- You can read arbitrary keys from it.
- It has positive integer keys only: arg[x] will always be nil if x is not a positive integer.
- It allows calling "#arg" to get its effective size (not modifiable within the function body)
- It does not have any metatable: getmetatable(arg) returns nil.
- You cannot write arbitrary keys in it, only keys in the integer range 1 to #arg; for any other key (integers outside the range 1 to #arg, other numbers, or values of other types), you'll get an error.
- You can set valid keys to any valid type, or to nil and back again to a non-nil value.
- It is actually allocated on the Lua call stack but then managed by a hidden "upvalues" structure in the Lua runtime engine, used to perform bounds checking for valid keys.
- It is definitely NOT weak: it is never garbage-collected like regular objects, but is stored in the stack of upvalues. That stack is also used by the "return" statement in function bodies, where the listed return values replace all of the function's arguments; "return" is the only statement that can change the value of "#arg". The call stack used to store upvalues persists as long as Lua is running. The Lua engine may grow the stack as needed and may eventually shrink it if it is much larger than necessary, e.g. when the garbage collector runs and sees that the call stack has had many unused slots for a long time. The garbage collector will normally leave some free slots, so that the runtime engine does not have to reallocate the stack to a larger size too frequently.
- It is only "implicitly destroyed" when the function actually returns (i.e. when it does not return with a tail call to itself): upvalues left on the stack are reused by the caller.
- When the function returns with a tail call to itself (possibly with different arguments), Lua first pushes the new values one by one (as for a regular function call), but then optimizes the recursive tail call by shifting the new values down over the initial arguments in upvalues, discarding the old ones, and adjusts the hidden field for #arg before jumping back to the start of the function body.
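The tail-call behaviour mentioned in the last bullet can be observed from Lua code without relying on any internals: Lua guarantees proper tail calls, so a function that returns with a call to itself does not grow the call stack. A minimal sketch (the function names count_down and sum, and the depth of one million, are arbitrary choices for illustration):

```lua
-- Each recursive step is a tail call ("return f(...)"), so the depth
-- is not limited by the size of the call stack.
local function count_down(n, acc)
  if n == 0 then
    return acc
  end
  return count_down(n - 1, acc + 1)
end

local deep = count_down(1000000, 0)

-- The same pattern with a varying number of arguments: each tail call
-- passes one argument fewer than the call before it.
-- (Assumes no nil among the values being summed.)
local function sum(acc, x, ...)
  if x == nil then
    return acc
  end
  return sum(acc + x, ...)
end

local total = sum(0, 1, 2, 3, 4)
```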
So Lua really has a type for arrays indexed by positive integers only.
My opinion is that this special subtype of table should be exposed as a true type, or as standard properties of the table type (notably that it allows only integer keys, and that its effective size "#t" is immutable). It would still not necessarily be a plain array and could still have a hashed part for sparse arrays (notably if "#t" is large and most keys are set to nil values).
We should have a standard library call to create an array with these constraints or to transform a table into one. When transforming a table, the call would check that the hashed part contains only positive integer keys (for keys outside the integer-indexed array part), and could then either drop the offending keys or return an error (selected with a call flag?). If the table has a metatable, it could be kept, but it is not needed to get the best performance offered by an array.
It could also offer a way to rebalance the array between the integer-indexed part and the hashed part, so that the first part keeps a minimum ratio of non-nil values (e.g. at least 50%) and all other positive keys are hashed. The internal hashing function for positive integers can be much simpler than for arbitrary types.
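The check proposed here can be approximated in plain Lua today. The sketch below is not an existing library function: the names to_array and is_positive_integer and the "drop" flag are assumptions made for illustration, and the code only validates keys; it cannot change how the table is stored internally.

```lua
-- Hypothetical predicate: true for finite numeric keys 1, 2, 3, ...
local function is_positive_integer(k)
  return type(k) == "number" and k >= 1 and k < math.huge and k % 1 == 0
end

-- Hypothetical helper: verifies that t has only positive integer keys.
-- With mode == "drop" the offending keys are removed; otherwise an
-- error is raised. The metatable of t, if any, is left untouched.
local function to_array(t, mode)
  local offending = {}
  for k in pairs(t) do
    if not is_positive_integer(k) then
      offending[#offending + 1] = k
    end
  end
  if #offending > 0 then
    if mode ~= "drop" then
      error("table has " .. #offending .. " non-array key(s)", 2)
    end
    for _, k in ipairs(offending) do
      t[k] = nil
    end
  end
  return t
end

-- Usage: drop the keys "name" and 1.5, keep the sparse key 5.
local cleaned = to_array({ 10, 20, [5] = 50, name = "x", [1.5] = true }, "drop")

-- Usage: without the flag, a table with a non-array key is rejected.
local ok = pcall(to_array, { 1, 2, n = 2 })
```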
Note that arrays currently used by upvalues allocated on the stack for function calls have no minimal rate of non-nil values, and they are small anyway. We currently have a limit somewhere between 50 and 242 due to how Lua generates its bytecode. Lua could extend this limit by allowing upvalues for function calls (or any tables transformed into arrays) to be stored outside of the call stack (using dynamic memory, or paging it onto a virtual memory cache, backed by external storage with optional data compression), leaving only an object reference on the stack.