Extending For And Next |
|
The Lua API makes it quite easy to integrate "foreign" objects, including foreign maps, but does not provide a facility to iterate over them. That is, by defining gettable
and settable
tag methods, it is feasible to completely integrate an indexed foreign object into the Lua environment, but this does not extend to the use of for k,v in foreign_map
or, for that matter, k,v = next(foreign_map, k)
. Both of these constructions can be useful.
I was also intrigued by the possibility of extending next
to "generator" functions. A generator function works exactly like the Lua next
library function; given an iteration-state, it produces the next iteration-state and a value. To take a concrete example, I might define a generator as a closure on a character string, so that I could say:
keywords = words "do else elseif end for if in repeat unless while" for i, v in keywords do dict[v] = (dict[v] or 0) + 1 end
which struck me as fairly elegant.
It turned out that the patch was quite simple, so I've posted it in the file area for those who might want to play with it.
The patch provides a new tag method, "next"
, which is invoked by the for k,v in obj do end
construction; in the next
library function; and in the lua_next
API. Consistent with what seems to be Lua style, a new rawnext
library function and lua_rawnext
API are provided which avoid the use of the tag method. These work just as tables do, with the exception that they return and use "iteration states" rather than "keys". If the object also defined gettable
and settable
tag methods, it would probably be wise to arrange for a key and an iteration state to be the same thing; however, for a pure iteration object, the iteration state could be anything provided that nil
indicates both the beginning and the end of the iteration. In fact, there is no need for it to be distinct on each iteration.
In addition, the patched constructions can iterate over a function rather than a table or an object with the "next" tag method; the function is called with a single argument, which is an iteration state, and must return two arguments, the first of which is the next iteration state and the second of which is the associated value. This is really only of cosmetic value in the case of next
and lua_next
but it avoids having to know whether the iteratee is functional or objective.
The words
generator described above is defined as follows:
function words(str) return function(k) local _, k, v = strfind(%str, "([%w]+)", (k or 0) + 1) return k,v end end
As can be seen, it retains its argument as a closure, and uses the index of the last character in each word as an iteration state.
Similarly, I could define
function wordsInFile(file) return function(k) k = read(%file, "*w") return k, k end end
In this case, the iteration state is essentially meaningless during the iteration; it serves only to signal termination. Obviously, more sophisticated implementations are possible.
Filters and transformers take a generator and return another generator. A simple example of a transformer is
function toupper(gen) return function(k) local v k, v = next(%gen, k) if k then return k, strupper(v) end end end
This can be generalised:
function gmap(fn, gen) return function(k) local v k, v = next(%gen, k) return k, k and %fn(v) end end
Now I can write things like:
for i, v in toupper(words(str)) do -- something end
A filter is rather like a transformer but is not one-to-one. A sample filter is similar to the Perl grep
function:
-- given a predicate and a generator, generate only those -- values for which the predicate returns true function gfilter(fn, gen) return function(k) local v k, v = next(%gen, k) while k do if %fn(v) then return k, v end k, v = next(%gen, k) end end end
I hope that gives a taste of the possibilities.
The majority of the patch is in lvm.c
. Here I modify the opcodes LFORPREP
and LFORLOOP
to use the new vm function luaV_next
, which implements the switch on ttype and tag method. next
and lua_next
are also modified to use this same function, producing (I hope) consistent results.
This does slow for loops down slightly, since the test is done every time through the loop. Timing tests on a P3/866 with 128MB RAM running FreeBSD 3.2 (my development machine) showed that a very basic for loop was approximately 10% slower with the patch. (On the other hand, next is about 4% faster.) I have a somewhat more complex patch which caches the tag method on the stack, resulting in a slowdown of only 7%, but I'm not yet happy enough with it to release it. I believe that the 4.1alpha architecture lends itself better to this, and I'm currently working my way through the 4.1alpha VM in an attempt to do it.
No changes are made to the parser or to the stack layout during a for loop, so the patched Lua ought to be binary-compatible; it only provides additional facilities. However, I wouldn't swear to that statement.
I regard all of this as an interesting experiment. I think the sample code provided with the patch shows the expressive power of the new construct, quite apart from the ability to define iteratable userdata.
However, the confusion between iteration state and key continues to bother me. I note that in 4.1alpha, the list for is actually implemented with an iteration state and a key, which I think is more logical (and ought to be slightly faster); I'd be interested in the possibilities of extending the semantics of for loops and the next
library function to allow for this. While it is often the case that a key can be used as an iteration-state, it is not always convenient (as the words
example shows).
While keeping the iteration state hidden during a for loop is tempting, it reduces flexibility. The advantage of using a for loop and a generator rather than a traditional map
function is primarily that it is possible to abandon an iteration or modify it. That is, it should (at least) be possible to use next
inside a for loop in order to skip over elements. In some cases, it may also be possible to restart an iteration by assigning to the iteration-state variable.
Clearly, generators come in many categories and need to be adequately documented. They may be:
At the same time, the key->value model is a little rigid. In the case of a vector, the key may not be very interesting to me; in the case of a database, I might want to use a generator which returns more than one value. In theory, this could be accomodated by extending the syntax of for to something like for name_list in exp1 [, exp1] ...
(the second exp1 would be a starting iteration state with nil
as a default).
I welcome any thoughts; you can reach me at rlake(at)oxfam.org.pe. Or just put them here, or on the mailing list.
I suggest that you make the patch using options "-urN" and include the test programs in the patch. --JohnBelmonte