- Subject: Re: Lpeg and malformed input / Lpeg and subjects that do not fit into memory
- From: Aladdin Lampé <genio570@...>
- Date: Fri, 19 Sep 2008 15:16:10 +0200
Thank you Roberto. That's exactly what I needed. I've implemented my parser using Lpeg and the first beta version seems to work.
I've got another question, now that I've achieved this. I want to create an iterator based on Lpeg captures, in order to be able to write things like this:
  for r in csv_records(subject) do
    print(r)
  end
My Lpeg capture is implemented like this:
local record_Ct = position_C * lpeg.Ct(record_C) * position_C / record_cb
and the record callback (record_cb) looks like this:
  local function record_cb(i, t, j) -- a new record has been found and is ready to be sent
    [...]
    coroutine.yield(t)
    [...]
  end
But I get a Lua error saying that it is not possible to yield across a metamethod/C-call boundary (record_cb is called from inside lpeg.match, which is a C function).
I immediately thought of the excellent LuaCoco by Mike Pall, patched my running version of Lua, and everything now works as expected.
The question is: is there another, "smarter" solution? Is it necessary to patch Lua to achieve this? (I would obviously prefer not to.)
Maybe I could design my "lpeg callback iterator" another way?
Any suggestions or simple examples of other ways to meet this kind of requirement would be highly appreciated.
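One design that might avoid the yield altogether, sketched here under assumptions: instead of yielding from inside the callback, let the iterator call the pattern once per record, starting each match where the previous one stopped. The grammar below is a simplified, hypothetical stand-in (unquoted fields only) for the real record_C; only the csv_records name comes from the example above.

```lua
local lpeg = require "lpeg"
local P, S, C, Cp, Ct = lpeg.P, lpeg.S, lpeg.C, lpeg.Cp, lpeg.Ct

-- Hypothetical, simplified record grammar: comma-separated unquoted fields.
local field  = C((P(1) - S",\n")^0)
-- One record, its terminator (newline or end of subject), then the
-- position where the next record starts.
local record = Ct(field * (P"," * field)^0) * (P"\n" + -P(1)) * Cp()

-- Returns an iterator that matches exactly one record per call, so no
-- coroutine has to yield from inside lpeg.match.
local function csv_records(subject)
  local pos = 1
  return function()
    if pos > #subject then return nil end   -- whole subject consumed
    local t, nextpos = record:match(subject, pos)
    if not t then return nil end
    pos = nextpos
    return t
  end
end
```

With this shape, `for r in csv_records(subject) do ... end` works on an unpatched Lua, at the cost of one lpeg.match call per record.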
Aladdin
PS: BTW, I've seen that LuaCoco was on the Lua 5.2 roadmap, so maybe this won't be an issue in the near future ;-)
>> 1. Lpeg is great when the subject follows strictly a given
>> grammar. But how to parse *malformed* CSV files, for instance? (and
>> maybe generate "warnings" or "errors")
>
>Usually we add other options to the appropriate parts of the grammar to
>handle erroneous input. Something like this:
>
> CSV <- (<record> (%nl <record>)*) -> {}
> record <- ( <field> (',' <field>)* ) -> {} (%nl / !.) / <ErrorCase>
>
>ErrorCase should match the input up to some anchor point (e.g., a newline).
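
The idea quoted above could be sketched with the plain lpeg API roughly as follows. This is an illustration under assumptions, not the grammar from the original message: the field rules, the "malformed" marker and the shape of the error entry are all made up for the example. A record that does not reach the end of its line falls through to the error alternative, which skips to the next newline (the anchor point).

```lua
local lpeg = require "lpeg"
local P, S, C, Cc, Cp, Ct = lpeg.P, lpeg.S, lpeg.C, lpeg.Cc, lpeg.Cp, lpeg.Ct

local nl     = P"\n"
local eol    = nl + -P(1)                      -- newline or end of subject
local quoted = P'"' * C((P(1) - S'"\n')^0) * P'"'
local plain  = C((P(1) - S',\n"')^0)
local field  = quoted + plain
-- A well-formed record must run all the way to the end of its line.
local good   = Ct(field * (P"," * field)^0) * #eol
-- Stand-in for ErrorCase: consume the rest of the line and report
-- where the bad line started and what it contained.
local bad    = Ct(Cc("malformed") * Cp() * C((P(1) - nl)^0))
local record = good + bad
local csv    = Ct(record * (nl * record)^0) * -P(1)

-- Second line has stray text after a quote, so it is reported, not fatal.
local result = csv:match('a,b\n"x",y"z\nc,d')
```

Here `result` holds three entries; the middle one is the error entry for the malformed line, so the caller can emit a warning and keep going. In this sketch a trailing newline produces a final record with a single empty field.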
>
>> What I would like to do with Lpeg is the following:
>> subject = read a chunk of N=4096 bytes of my big CSV file; when |!.| matches in the defined grammar (i.e. 'end of subject'), use an Lpeg callback to see if:
>> - more input is needed (because the record is not matched yet)
>> - or if the match was successful
>> - or if it is the real end of subject (ie. 'end of file')
>> Is that possible with current version of Lpeg or is another way of
>> solving this planned in a near future?
>
>I don't see any problem in doing that, but you have to manage the buffer
>yourself. That is, when matching the initial part of the buffer, use
>a position capture to tell how far the match went. Then read another
>piece, concatenate it with the unhandled part of the previous buffer,
>and repeat.
>
>-- Roberto
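
A minimal sketch of the buffer management described above, under stated assumptions: the record grammar is a simplified stand-in (unquoted fields, newline-terminated), and parse_chunks, read and on_record are hypothetical names. Each pass matches every complete record in the buffer, uses a position capture to find where the unhandled tail starts, and prepends that tail to the next chunk.

```lua
local lpeg = require "lpeg"
local P, S, C, Cp, Ct = lpeg.P, lpeg.S, lpeg.C, lpeg.Cp, lpeg.Ct

local field    = C((P(1) - S",\n")^0)
local line     = Ct(field * (P"," * field)^0)
local record   = line * P"\n"            -- only newline-terminated records
-- All complete records in the buffer, then the position of the tail.
local complete = Ct(record^0) * Cp()
local last     = line * -P(1)            -- final record without a newline

-- read: function returning the next chunk, or nil at end of input.
-- on_record: function called with each record table.
local function parse_chunks(read, on_record)
  local rest = ""
  while true do
    local chunk = read()
    local buffer = rest .. (chunk or "")
    local records, pos = complete:match(buffer)
    for _, r in ipairs(records) do on_record(r) end
    rest = buffer:sub(pos)               -- unhandled part of the buffer
    if not chunk then break end          -- real end of input
  end
  if rest ~= "" then on_record(last:match(rest)) end
end
```

For a real file, read could be something like `function() return f:read(4096) end`, where f is an open file handle; file:read returns nil at end of file, which ends the loop.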