- Subject: Re: Lpeg and malformed input / Lpeg and subjects that do not fit into memory
- From: Roberto Ierusalimschy <roberto@...>
- Date: Thu, 18 Sep 2008 10:07:41 -0300
> 1. Lpeg is great when the subject follows strictly a given
> grammar. But how to parse *malformed* CSV files, for instance? (and
> maybe generate "warnings" or "errors")
Usually we add other options to the appropriate parts of the grammar to
handle erroneous input. Something like this:
  CSV <- (<record> (%nl <record>)*) -> {}
  record <- ( <field> (',' <field>)* ) -> {} (%nl / !.) / <ErrorCase>
ErrorCase should match the input up to some anchor point (e.g., a newline).
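As a rough illustration of that idea, here is a minimal sketch written against the LPeg Lua API instead of the re syntax used above. The names good, bad and csv, the "error" table key, and the simplified notion of a field (no quoted fields at all) are assumptions made for this example. Here a record only checks for the line end with a lookahead and leaves the newline for the outer rule to consume.

```lua
local lpeg = require "lpeg"
local P, S, C, Cg, Ct = lpeg.P, lpeg.S, lpeg.C, lpeg.Cg, lpeg.Ct

local nl  = P"\n"
local eol = nl + -P(1)                 -- a newline, or the end of the subject

-- Assumption: a field is any run of characters except comma, quote, newline.
local field = C((1 - S',"\n')^0)

-- Well-formed record: fields separated by commas, followed by a line end.
-- The line end is only checked with a lookahead (#), not consumed.
local good = Ct(field * ("," * field)^0) * #eol

-- Error case: consume the input up to the anchor point (the next newline)
-- and keep the raw text in a table under the key "error".
local bad = Ct(Cg(C((1 - nl)^0), "error"))

local record = good + bad
local csv = Ct(record * (nl * record)^0) * -P(1)

-- The second line contains a stray quote, so it lands in the error case.
local rows = csv:match('a,b\nc,"d\ne,f')
for i, r in ipairs(rows) do
  if r.error then
    print(("line %d: malformed record: %s"):format(i, r.error))
  else
    print(("line %d: %s"):format(i, table.concat(r, " | ")))
  end
end
```

Because the error alternative always succeeds, the overall match never fails on a bad line; the caller decides whether an entry with an "error" key is a warning or a fatal error.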
> What I would like to do with Lpeg is the following:
> subject = read a chunk of N=4096 bytes of my big CSV file; when |!.|
> matches in the defined grammar (i.e., 'end of subject'), use a Lpeg
> callback to see if:
> - more input is needed (because the record is not matched yet)
> - or if the match was successful
> - or if it is the real end of subject (i.e., 'end of file')
> Is that possible with the current version of Lpeg, or is another way of
> solving this planned in the near future?
I don't see any problem in doing that, but you have to manage the buffer
yourself. That is, when matching the initial part of the buffer, use
a position capture to tell how far the match went. Then read another
piece, concatenate it with the unhandled part of the previous buffer,
and repeat.
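One possible reading of this, as a sketch only: the function parse_chunks, its read_chunk argument (any function that returns the next piece of input, or nil at end of file) and the simplified CSV patterns are all hypothetical names chosen for the example. The position capture lpeg.Cp() reports where the last complete record ended, and everything after that point is carried over to the next round.

```lua
local lpeg = require "lpeg"
local S, C, Ct, Cp = lpeg.S, lpeg.C, lpeg.Ct, lpeg.Cp

-- Assumption: simple CSV without quoted fields.
local field  = C((1 - S",\n")^0)
local fields = Ct(field * ("," * field)^0)

-- A record counts as complete only if its newline is already in the buffer.
local record  = fields * "\n"
-- Match as many complete records as possible, then report how far we got.
local records = Ct(record^0) * Cp()

local function parse_chunks(read_chunk)
  local out, buffer = {}, ""
  for chunk in read_chunk do
    buffer = buffer .. chunk                 -- unhandled tail + new piece
    local recs, pos = records:match(buffer)
    for _, r in ipairs(recs) do out[#out + 1] = r end
    buffer = buffer:sub(pos)                 -- keep only the unhandled part
  end
  -- Real end of file: whatever is left is a last record without a newline.
  if buffer ~= "" then
    out[#out + 1] = (fields * -1):match(buffer)
  end
  return out
end
```

With a real file this could be called as parse_chunks(function() return f:read(4096) end), where f is an open file handle. A partial record at the end of the buffer is re-scanned on the next round, which is wasteful only if a single record is much larger than the chunk size.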
-- Roberto