- Subject: Re: Reading CSV
- From: Coda Highland <chighland@...>
- Date: Tue, 3 Dec 2013 14:01:32 -0800
On Tue, Dec 3, 2013 at 11:10 AM, Sean Conner <sean@conman.org> wrote:
> It was thus said that the Great Geoff Leyland once stated:
>> Hi,
>>
>> What’s the current best option for a CSV (or tab separated, for that
>> matter) file?
>>
>> I’ve had a look at http://lua-users.org/wiki/CsvUtils and
>> http://lua-users.org/wiki/LuaCsv, searched LuaRocks (nothing came up, but
>> perhaps I’m using the wrong search term) and looked at Penlight’s
>> data.read. As far as I can tell, most solutions either:
>> - read the whole file in one go (constructing a table of all the values
>> becomes impractical as files get larger)
>> - read lines with “*l” and so are opinionated about what constitutes a
>> newline
>> - don’t handle embedded newlines in quoted fields
>>
>> There’s also an LPeg example, but as I understand it, LPeg works on whole
>> strings, not file streams?
>
> Yes, but you can read a line at a time and use LPeg to break the line
> down. You mentioned that there are issues with what constitutes a newline,
> but there are ways around that. One method I use is:
You missed the part about handling newlines in quoted fields.
Essentially, it means your example also needs to track whether the
newline at the end of the buffer is a record separator or part of a
field's content. That's not all THAT difficult to accomplish:
personally, I use a state machine that iterates over the characters of
the stream. For efficiency, I pull in a block at a time from the file
and iterate over that buffer.
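
To make that concrete, here is a rough sketch of the kind of state machine I mean. It is an illustration written for this message, not code from any library: the names csv_records and string_blocks are made up, the quoting rules assumed are the usual ones (double quotes around a field, "" for a literal quote), and it accepts \n, \r or \r\n as a record terminator. The state lives outside the per-block loop, so a quoted field or a \r\n pair can straddle two blocks.

```lua
-- Hypothetical helper: iterate over CSV records from a stream of blocks.
-- read_block is any function returning the next chunk of text, or nil at EOF.
local function csv_records(read_block, sep)
  sep = sep or ","
  return coroutine.wrap(function()
    local record, field = {}, {}
    local state = "start"   -- "start", "unquoted", "quoted" or "quote_seen"
    local skip_lf = false   -- true right after a \r that ended a record
    local pending = false   -- true once the current record has any content

    local function end_field()
      record[#record + 1] = table.concat(field)
      field, state = {}, "start"
    end

    local function end_record()
      end_field()
      coroutine.yield(record)
      record, pending = {}, false
    end

    for block in read_block do
      for i = 1, #block do
        local c = block:sub(i, i)
        local was_cr = skip_lf
        skip_lf = false
        if state == "quoted" then
          -- Inside quotes everything is content, newlines included.
          if c == '"' then state = "quote_seen" else field[#field + 1] = c end
        elseif state == "quote_seen" and c == '"' then
          -- A doubled quote inside a quoted field is one literal quote.
          field[#field + 1] = '"'
          state = "quoted"
        elseif c == '"' and state == "start" then
          state, pending = "quoted", true
        elseif c == sep then
          end_field()
          pending = true
        elseif c == "\n" then
          -- Ignore the \n of a \r\n pair; the \r already ended the record.
          if not was_cr then end_record() end
        elseif c == "\r" then
          end_record()
          skip_lf = true
        else
          field[#field + 1] = c
          state, pending = "unquoted", true
        end
      end
    end

    -- Flush a final record that has no trailing newline.
    if pending then end_record() end
  end)
end

-- Hypothetical helper for trying this out: serve a string in n-byte blocks.
local function string_blocks(s, n)
  local pos = 1
  return function()
    if pos > #s then return nil end
    local b = s:sub(pos, pos + n - 1)
    pos = pos + n
    return b
  end
end
```

For a real file you would open it in binary mode and pass something like function() return f:read(4096) end as read_block. Each record comes back as a fresh table of strings, so memory use is bounded by the largest record rather than by the file. This version is lenient about malformed input (a stray quote in an unquoted field is kept as-is), which may or may not be what you want.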
/s/ Adam