Generic Input Algorithms |
|
func
library, which was designed to make working with input more intuitive. The source can be found at Files:wiki_insecure/func.lua.
There are two special input iterators, numbers()
and words()
, which work like the very useful io.lines()
iterator. To print all words found in the standard input:
-- words.lua require 'func' for w in words() do print(w) end
$ lua words.lua < test.txt
func
supplies the very convenient function printall()
. It writes out all members of a sequence to standard output. By default, it puts out 7 items per line, separated by spaces, but you can optionally change these. In this case we do want each value on its own line:
printall(words(),'\n')
numbers()
creates a sequence of all numbers found in its input. For example, to sum all input numbers:
require 'func' local s for x in numbers() do s = s + x end print(s)
func
defines a generic sum()
function. It returns the sum and the number of fields, so it's easy to calculate the average.
local s,n = sum(numbers()) print('average value =',s/n)
numbers()
will find everything that looks like a number in a file, and will safely ignore anything else. So it's useful for heavily commented data or output files.
These iterators take an optional extra parameter, which can be a file or a string. For example, to print the sum and number of items in a file passed as a command-line parameter:
f = io.open(arg[1]) print(sum(numbers(f))) f:close()
copy()
:
function copy(iter) local res = {} local k = 1 for v in iter do res[k] = v k = k + 1 end return res end
t = copy(numbers '10 20 30') -- will be {10,20,30} s = sum(list(t)) -- will be 60
list()
, which allows tables to be used as sequences. It's common to operate on arrays with these functions, so if you do pass them a table list()
will automatically be assumed. To print out an array of numbers in a particular format, one can use something like printall(t,' ',5,'%7.3f')
to format them nicely.
Here's an implementation of the system command sort, that uses the printall()
function to output each value of a sequence. I can't simply say table.foreach(t,print)
, because that operation passes both the index and the value, so I would in effect get line numbers as well!
t = copy(io.lines()) table.sort(t) printall(t,'\n') -- try table.foreach(t,print) and see!
sort()
function this becomes a one-liner:
printall(sort(io.lines()),'\n')
slice()
. This is passed an iterator, a start index, and an item count. For example, this is a simple version of the head
command; it shows the first ten lines of its input.
printall(slice(io.lines(),1,10),'\n')
require 'func' print(count(words()))
count()
is not so useful. But it can be given a function to select the items to be counted. For example, this gives me a rough idea of how many public functions there are in a Lua file. (If I didn't tie the match down to the begining, it would pick up local and anonymous functions as well)
require 'func' print(count(io.lines(),matching '^%s*function'))
matching()
is the following simple function. It creates a closure (a function bound to a local context) which is called for each item in the sequence:
function matching(s) local strfind = string.find return function(v) return strfind(v,s) end end
You can of course use any sequence in these operations. If you have the very useful lfs (Lua File System) library loaded, then t = copy_if(lfs.dir(path),matching '%.cpp$')
will fill a list with all files with extension .cpp
from path
.
Another useful way to modify count()
's input is to use unique()
-- number of unique words in a file print(count(unique(words())))
unique()
is not implemented in the usual way, which requires the sequence to be first sorted. Instead, it uses count_map()
to create a map where the keys are the items and the values are the counts. The rest is easy once we have keys()
, which is the complementary brother of list()
:
function unique(iter) local t = count_map(iter) return keys(t) end
table.foreach(count_map(words()),print)
join()
them together. This will print out the differences between two files:
for x,y in join(numbers(f1),numbers(f2)) do print(x-y) end
{ print $1/$4, $3/$4 }
func
library supplies the iterator fields()
for this purpose. Here is the equivalent Lua code:
for x,y,z in fields{1,3,4} do print(x/z,y/z) end
print(count(fields{7},greater_than(44000)))
{ if ($7 > 44000) k++ } END { print(k) }
fields()
can work with any input delimiter. This reads a set of values from a comma-separated file - note that passing n instead of a list of field ids is equivalent to {1,2,...n}:
for x,y in fields({1,2},',',f) do ...
for x,y in fields(2,',',f) do ... --equivalent--
random()
to create a table with random values:
> tt = copy(random(10000)) > = sum(tt) 5039.542771691 10000
local t = {} local random = math.random for i = 1,1e6 do t[i] = random() end
words()
and numbers()
run-times.) The advantage is that there is less code to go wrong; the generic programming people believe that explicit loops are "tedious and error-prone", as Stroustrup says.
The second objection is that it leads to weird and unnatural code. This can certainly be the case in C++, which (let's face it) is not really suited to the functional style; there are no closures, and rigid static typing constantly gets in the way, requiring everything to be templates. The style suits Lua much better - doing this in C++ would not be half as readable, even using the Boost Lambda library:
-- sum of squares of input data using an internal iterator for_each(numbers(),function(v) s = s + v*v end)
-- sum of squares of input data using an external iterator for v in numbers() do s = s + v*v end
f
is not a string, then words(f)
will use the file object f
. In fact, f
can be any object which has a read
method. All that the code assumes is that f:read()
will return the next line of input text. Here is a more involved example, where I've created a class Files
which allows us to read from a list of files. The obvious application is to mimic AWK's behaviour, where every file on the command-line becomes part of standard input.
Files = {} function Files.create(list) local files = {} files.list = {} local n = table.getn(list) for i = 1,n do files.list[i] = list[i] end files.open_next = Files.open_next files.read = Files.read files:open_next() return files end function Files:open_next() if self.f then self.f:close() end local nf = table.remove(self.list,1) if nf then self.f = io.open(nf) return true else self.f = nil return false end end function Files:read() local ret = self.f:read() if not ret then if not self:open_next() then return nil end return self.f:read() else return ret end end
Files.create()
. Lua programs are passed a global table called arg
, which has the command-line arguments, arg[1]
,arg[2]
, etc. But there's also arg[0]
, the script name, and arg[-1]
the actual program name. The explicit loop in question is to make quite sure we don't copy those fields!
files = Files.create(arg) printall(words(files))
func
is a straightforward variation on the same theme; iterators and functions as closures. Section 7.1 of PiL [ref?] explains the issues well, and I've used the allwords
example as the basis for words()
and numbers()
. fields()
was first implemented in a naive way, grabbing each field in turn, but later was implemented as one call to string.find()
, by creating a custom regular expression. For example, if fields 1 and 3 are needed, separated by commas, then the regexp looks like this - fields are defined as anything which is not a comma, and we capture using () the required fields.
'%s*([^,]+),[^,]+,([^,])'
func
operations with any library which provides an iterator. And that often simplifies code tremendously. For instance, here is how it can be with luasql. Consider the canonical way to access all rows of a query result:
cur = con:execute 'SELECT * FROM [Event Summaries]' mag = -9 row = cur:fetch({},'a') while row do if row.Magnitude > mag then mag = row.Magnitude end row = cur:fetch(row,'n') end cur:close()
I can simply this by creating an iterator which keeps track of row
:
function rows(cursor) local row = {} return function() return cursor:fetch(row,'a') end end for row in rows(cur) do if row.Magnitude > mag then mag = row.Magnitude end end
cursor:fetch
twice and look after a local row
. We can also implement an equivalent to fields
:
function column(fieldname,cursor) local row = {} return function() row = cur:fetch(row,'a') if not row then return nil else return row[fieldname] end end end local minm,maxm = minmax(column('Magnitude',cur))
print(count(column('Magnitude',cur),greater_than(2)))
-- SteveDonovan