On Jul 6, 2013 3:33 AM, "Andrew Starks" <andrew.starks@trms.com> wrote:
>
> On Fri, Jul 5, 2013 at 4:48 PM, Tim Hill <drtimhill@gmail.com> wrote:
> > So getting back to my 3 questions…
> >
> > 1. Is there a need for an "empty" element within an array (where "empty" as
> > a concept is tbd)?
> > 2. Assuming #1 is "yes", would this be useful as a standardized technique so
> > that everyone uses the same convention?
> > 3. Assuming #2 is "yes", what form should this standard technique take?
> >
> > You are arguing that the answer to #1 is "no"?
> >
> > One thing that I find interesting is the amount of discussion here .. this
> > is suggestive of *something* to be sure. Perhaps it's a misunderstanding of
> > the problem, or perhaps there is *some* kind of problem here but people
> > differ as to what it is.
> >
> > One thing I like about Lua is the intuitive and clean nature of the
> > language. But the Lua concept of a "sequence" to my mind is a bit odd .. it
> > feels more like something that fell out of an internal optimization of table
> > implementation (compact storage for integer keys and fast indexing) rather
> > than a designed feature. This makes the # operator fragile; any 3rd party
> > code or library can "corrupt" an array and make # return arbitrary invalid
> > values and there is no way to discover this that I am aware of. I find this
> > behavior a bit odd, and clearly some of the other posters do as well, as
> > many suggestions here make # more robust (or provide a similar mechanism
> > that is itself more robust).
> >
> > Why is this relevant? Because my "empty" design (flawed as it might be)
> > makes # more robust by providing a way to have them sparse but still a
> > defined size, and the various other suggestions also do the same by other
> > means.
> >
> > --Tim
> >
> > On Jul 5, 2013, at 1:24 PM, Andrew Starks <andrew.starks@trms.com> wrote:
> >
> > Would never be seen by the garbage collector because nil isn't added to its
> > list of stuff to track and 1 isn't explicitly constructed. [obj] = nil,
> > because nil is now seen by gc as a weak object.
> >
> > If I'm saying this clearly, that behavior would also explain locals well,
> > too.
> >
> > This behavior would, to my understanding, have avoided the need for another
> > value and only require the need to know if something was set to nil or
> > whether it was absent.
> >
> > That seemed cheaper than a new type and simpler to add. Since that's not how
> > things are, I think that the way it works is fine, because everything else
> > seems to make Lua "ever so slightly bigger."
> >
> >
>
>
> I do understand that everyone is more-or-less focused on the idea of
> an `empty` type and that I've been speaking to mechanisms to know if
> something is "there but nil" or simply "never there".
>
> In case it is not obvious, I do this because the "there and nil/not
> there at all" approach can be made to accomplish the goal and it is
> painfully close to already being within Lua, today.
>
> First, I re-looked at tables. I found that when I was playing with
> collectgarbage, if I set an index to something and then to nil, the
> array seemed to still hold the key. If I simply set a previously
> undefined index to nil, Lua seemed to wisely optimize that away.
>
> Step 1: Do not optimize away the setting of a key value to nil,
> provided that the key is of a type that does not have explicit
> construction (strings and numbers). [This must be how lua treats local
> assignment to nil.]
>
> Step 2: Modify `rawget` so that it returns 0 values (`return `) when a
> key is not existent.
>
> rawget(t, bar) -- bar is a nonexistent key
> --> --0 values, not nil. So then therefore (and this is maybe the
> biggest egg to crack using this approach):
>
> type(rawget(t, bar))
> -->  bad argument #1 to 'type' (value expected)
>
> OPTIONAL STEP 3: Make it so that the return value of `__index` can
> also return 0 values. This may also require a change to the way lua
> treats indexed tables as arguments, though I don't know that for sure.
>
> I would posit that these changes are pretty mild, with the possible
> exception of erroring on an index that was absent. I wonder if anyone
> can see any bugs that would come up, due to these changes?
>
> If the above were true, then consider this code:
>
> `````
> printf = function(...) print(string.format(...)) end
> local array_stop  = 10000
> local function array_iter (a, i)
>      i = i + 1
>      local v = a[i]
> --!!!! Today, the next test is always true (select always returns 1),
> --!!!! no matter what. That's because rawget returns 1 value, even if
> --!!!! the slot was never defined (not even nil is there).
> --!!!! Under this proposal, a key set to nil would result in true.
>      if select('#', rawget(a, i)) == 1 then
>           return i, v
>      end
> end
>
> local t = setmetatable({},{
> --!! __index doesn't work, today. It always returns 1 value (nil),
> --!! even with a bare `return` or with no return statement at all.
> -- Either the return is always nil, or `t[i]` in an argument list has
> -- the same effect that non-indexed variable access does, which is to
> -- always be promoted to nil if it is undefined.
> __index = function(a, i)
>      if select('#', rawget(a, i)) == 1 then
>           return nil
>      else
>           -- If this were possible, and select('#', a[i]) resulted in
>           -- `0` when the slot was never set, we'd be in business
>           -- without changing rawget.
>           return
>      end
> end,
> __ipairs = function(a)
>      return array_iter, a, 0
> end,
> --What redefining __len would look like:
> --[[
> __len = function(a)
>      local i = 1
>      -- ick. Right now this pegs the processor, obviously.
>      while select('#', rawget(a, i)) > 0 do
>           i = i + 1
>      end
>      return i - 1
> end
> --]]
> })
> -- here are the results that lead me to believe that lua is storing
> -- nils in tables:
> collectgarbage()
> local k1= collectgarbage("count")
> printf("Start bytes:\t\t%10.2f",k1)
>
>
> --> If we do not do this first...
> for x = 1,  array_stop do
>      t[x] = tostring(x)
> end
>
> --Then this gets optimized away.
> for x = 1, array_stop do
>      t[x] = nil
> end
>
> -- As it stands, according to collectgarbage("count"), there seem to
> -- be nils assigned to number keys, so far as I can tell.
> -- Obviously, the optimization of leaving `t[x] = nil` empty would
> -- need to be removed in order for this approach to be considered
> -- viable. That is, as when you declare `local foo`, lua needs to
> -- actually put nil into the value spot at `t[x]`. Also, if `x` were a
> -- table, then our nil would obviously go away and all would be truly
> -- empty.
>
>
> print( select('#', rawget(t, 100)), select('#', rawget(t, array_stop + 1)))
> --Today, this is always "1, 1"
> --> 1, 0 would open up many opportunities.
>
>
>
> local k2 = collectgarbage("count")
> printf("Before collection:\t%10.2f%10.2f",  k2, k2 -k1 )
> local k2 = collectgarbage("count")
> printf("Before collection:\t%10.2f%10.2f",  k2, k2 -k1 )
> print( select('#', rawget(t, 100)), select('#', rawget(t, array_stop + 1)))
> --> 1, 1 :(
> collectgarbage()
>
> local k2 = collectgarbage("count")
> printf("Before setting table to nil:\t%10.2f%10.2f",  k2, k2 -k1 )
> t= nil
>
> collectgarbage()
> local k3 = collectgarbage("count")
> printf("After collection:\t%10.2f%10.2f", k3, k2 - k3)
>
>
> `````
>
> So, is this completely awesome sauce? No. But, it doesn't add anything
> terribly new and it doesn't change behavior in a big bad way.
>
> But with these changes, we can get at the true source of the "nil"
> that we got at a given table index. And that gives us a pretty solid
> way to make sparse arrays.
>
> I can understand if there is hate for this approach. I just wanted to
> be sure I articulated the reasoning behind my talking about this, as
> opposed to a new type.
>
> - Andrew
>

Honestly, all these ideas just seem to be adding a lot more complication and confusion, trying to solve two problems at once.

When people talk about wanting to store nil in a table, it seems like they're mostly bringing up the same two situations:
1) I want to store nil values at arbitrary keys in a map.
2) I want to store nil values at integer keys in an array.

These are similar, but are prevented by different underlying issues, and have different solutions.

For case 1, the "problem" is that setting a field to nil removes it. This seems to be quite rarely an actual problem, and only crops up when you're reading data from an external source (such as a database) where "nil" doesn't mean "this field does not exist" like it does in Lua, *and* you need to iterate all of the fields without knowing their names in advance.
The commonly given solution to this issue is to use a sentinel value to mark such fields. That does just fine most of the time, but has the issue that different modules each have to define their own such values (which was the point of the original thread).
To me it seems like this isn't a terribly big problem, because it's rare that you're reading information from a source where nil ~= nonexistent *and* you need to iterate the fields without knowing their names in advance *and* you need to pass that information on to another module that also needs a way to represent "nil but existent" *and* this module doesn't provide the ability to supply your own sentinel value *and* you can't afford to iterate through and translate db.null to json.null or whatever change you need to make when passing your data to another module.
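
To make the sentinel approach and the translation pass concrete, here is a minimal sketch. It assumes two hypothetical modules that each define their own marker; `db.null`, `json.null` and `translate` are stand-ins made up for illustration, not any real library's API:

```lua
-- Hypothetical sentinels; real modules each define their own.
local db   = { null = setmetatable({}, { __tostring = function() return "db.null" end }) }
local json = { null = setmetatable({}, { __tostring = function() return "json.null" end }) }

-- A row as a database binding might return it: the field exists but holds NULL.
local row = { id = 3, name = "Sam", gender = db.null }

-- Copy a table, swapping one module's sentinel for another's.
local function translate(t, from, to)
  local out = {}
  for k, v in pairs(t) do
    if v == from then
      out[k] = to
    else
      out[k] = v
    end
  end
  return out
end

local forJson = translate(row, db.null, json.null)
print(tostring(forJson.gender))  --> json.null
```

Because the sentinel is an ordinary non-nil value, pairs() still visits the "gender" field.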
Anyway, in that situation, there's another solution that nobody seems to have brought up: eliminate the "names not known in advance" issue by getting a list of names. e.g. instead of your database query returning a table like:
{{id=1, name="Joe", gender='M'},
{id=2, name="Jean", gender='F'},
{id=3, name="Sam", gender=nil}}
have it structured like:
{columns={'id', 'name', 'gender'},
{1, "Joe", 'M'},
{2, "Jean", 'F'},
{3, "Sam", nil}}
That way you can iterate over the list of names instead of the data, and not miss any fields, even if their values are nil.
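
A rough sketch of that iteration, using the table above; `describe` is a made-up helper name, not part of any database binding:

```lua
local result = {
  columns = { 'id', 'name', 'gender' },
  { 1, "Joe",  'M' },
  { 2, "Jean", 'F' },
  { 3, "Sam",  nil },
}

-- Walk the column list, not the row itself, so nil fields are still visited.
local function describe(res, rowIndex)
  local row, parts = res[rowIndex], {}
  for i, name in ipairs(res.columns) do
    parts[#parts + 1] = name .. "=" .. tostring(row[i])
  end
  return table.concat(parts, ", ")
end

print(describe(result, 3))  --> id=3, name=Sam, gender=nil
```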

The other thing people keep bringing up is case 2 - storing nil values in arrays. There are a lot of "solutions" to this, but all are really only bandaging over the problem. You can store the length separately and use metamethods to make everything work nicely, but that's going to incur a performance penalty - not only because of the metamethods, but because you haven't actually solved the underlying problem.
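
For reference, a minimal sketch of the "store the length separately" bandage. The names `Array`, `newArray`, `append` and `each` are hypothetical, and since `__len` is only consulted for tables in Lua 5.2 and later, the sketch reads the `n` field directly wherever the length matters:

```lua
-- Hypothetical array type that keeps its length in the field n.
local Array = {}
Array.__index = Array
Array.__len = function(self) return self.n end  -- used by # in Lua 5.2+

local function newArray(...)
  -- select('#', ...) counts every argument, nils included.
  local a = { n = select('#', ...), ... }
  return setmetatable(a, Array)
end

function Array:append(v)
  self.n = self.n + 1
  self[self.n] = v  -- v may be nil; n grows regardless
end

-- Iterate over 1..n, holes included.
function Array:each()
  local i = 0
  return function()
    i = i + 1
    if i <= self.n then return i, self[i] end
  end
end

local a = newArray(1, nil, 3)
a:append(nil)
print(a.n)  --> 4

for i, v in a:each() do
  print(i, v)  -- visits indices 1 through 4
end
```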

Lua's tables contain an array portion and a hash map portion, for performance, and the real problem people are running into is that the array portion itself can't store nil values. With metamethod solutions, you're storing some (potentially all) of your values in the hash map portion, defeating the purpose of the array portion in the first place, because the array is still going to end at the first nil value.

It does seem awfully strange to me: Lua's strings can contain any byte, because they store the length separately from the string data. This is generally considered a good thing, especially compared to C strings - it takes length lookup from O(n) to O(1) and allows them to contain arbitrary binary data. Yet Lua's arrays don't have this same feature, and instead use the C-string method of a magic terminating value that can't appear anywhere in the array, which limits what they can store. (Length lookup isn't O(1) here either: the reference implementation finds a border by binary search, so # is O(log n) in general.) And as a "bonus" it makes the # operator's behaviour somewhat strange as well - not always very reliable or useful, and apparently rather confusing to some people.
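
A small illustration of the contrast, as a sketch. The string length is exact. For a table with a hole, the most # can be relied on for is returning some border (an index n where t[n] ~= nil and t[n+1] == nil), so the result is checked against both candidates rather than assumed:

```lua
-- Strings carry their length, so an embedded zero byte is just data.
local s = "a\0b"
print(#s)  --> 3

-- Punch a hole in an array: indices 1 and 3 are now both borders.
local t = { 1, 2, 3 }
t[2] = nil
local n = #t
print(n == 1 or n == 3)  --> true
```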

People keep commenting "oh look table.setn() is back", and I keep wondering if that's really a bad thing. What was so bad about it that it had to be removed? To me, storing the length separately from the data seems like the most sensible idea.