


On Wed, Oct 5, 2011 at 9:29 PM, Martijn Hoekstra
<martijnhoekstra@gmail.com> wrote:
> Depends on how much sophistication you want. Something like this could
> probably be done fastest with a quick simple parser. Runtime
> performance wouldn't be great, but it would be fairly straightforward.

LPeg has already been suggested, and I would definitely agree with
that. However, if sticking to the standard string functions is
preferable, here is my attempt at a generic iterator-based tokenizer:


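 -- Stateless iterator function: returns the position just past the
 -- matched token (the loop's control value), followed by the token.
 local function itokens_aux(str, startpos)
   -- First try a double-quoted string; () captures the position after it.
   local token, nextpos = string.match(str, '^%s*"(.-)"()', startpos)
   if not token then
     -- Otherwise try a bare alphanumeric word.
     token, nextpos = string.match(str, '^%s*(%w+)()', startpos)
   end
   -- Both values are nil when nothing matches, which ends the loop.
   return nextpos, token
 end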

 function itokens(str)
   return itokens_aux, str, 1
 end


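Note that the first value generated by the iterator is the position
at which the search for the next token starts. It only serves as the
loop's control variable, so it is not useful to the caller and should
be ignored by assigning it to _.
So, an example of usage: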


 t = {}
 for _, token in itokens [[foo "hello world" bar]] do
   t[#t+1] = [["]] .. token .. [["]]
 end
 print(table.concat(t, ' '))  -- prints: "foo" "hello world" "bar"
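
To make the role of that first value more concrete, here is a rough
sketch of what the generic for does with the iterator, written as a
plain while loop. The helper name collect_tokens is my own invention
for illustration and is not part of the code above. It also returns
the position where tokenizing stopped, which is one possible way to
notice input that the two patterns do not cover (punctuation, an
unterminated quote), since the iterator simply ends there without
raising an error.

```lua
-- Same tokenizer as above, repeated so this sketch runs on its own.
local function itokens_aux(str, startpos)
  local token, nextpos = string.match(str, '^%s*"(.-)"()', startpos)
  if not token then
    token, nextpos = string.match(str, '^%s*(%w+)()', startpos)
  end
  return nextpos, token
end

-- Hypothetical helper (an assumption, not from the original code):
-- collects all tokens and returns the position where tokenizing stopped.
local function collect_tokens(str)
  local tokens, pos = {}, 1
  while true do
    -- This is the call the generic "for" makes on every iteration.
    local nextpos, token = itokens_aux(str, pos)
    if not nextpos then break end  -- a nil control value ends the loop
    tokens[#tokens + 1] = token
    pos = nextpos
  end
  return tokens, pos
end

-- Fully tokenizable input: stops one past the end of the string.
local tokens, stop = collect_tokens('foo "hello world" bar')
print(#tokens, stop)

-- The comma matches neither pattern, so tokenizing stops in front of it.
local partial, stopped_at = collect_tokens('foo, bar')
print(#partial, stopped_at)

-- Any non-whitespace character at or after the stop position means
-- part of the input was never consumed.
local leftover = string.find('foo, bar', '%S', stopped_at) ~= nil
print(leftover)
```

Whether leftover input should be treated as an error or silently
dropped depends on the application; the original iterator takes the
second route.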


-Duncan