- Subject: Re: Reading CSV
- From: Sean Conner <sean@...>
- Date: Tue, 3 Dec 2013 14:10:41 -0500
It was thus said that the Great Geoff Leyland once stated:
> Hi,
>
> What’s the current best option for a CSV (or tab separated, for that
> matter) file?
>
> I’ve had a look at http://lua-users.org/wiki/CsvUtils and
> http://lua-users.org/wiki/LuaCsv, searched LuaRocks (nothing came up, but
> perhaps I’m using the wrong search term) and looked at Penlight’s
> data.read. As far as I can tell, most solutions either:
> - read the whole file in one go (constructing a table of all the values
> becomes impractical as files get larger)
> - read lines with “*l” and so are opinionated about what constitutes a
> newline
> - don’t handle embedded newlines in quoted fields
>
> There’s also an LPeg example, but as I understand it, LPeg works on whole
> strings, not file streams?

Yes, but you can read a line at a time and use LPeg to break the line
down. You mentioned that there are issues with what constitutes a newline,
but there are ways around that. One method I use is:

-- Oh, let's just use the MIT license here.
--
-- MIT LICENSE HERE

local lpeg = require "lpeg"

-- End of Line Marker. This matches an optional CR with a mandatory LF.
-- If your system uses different end of line markers, change this.
local eoln = lpeg.P"\r"^-1 * lpeg.P"\n"

-- Parse data. This will return a "line" (per definition of eoln) and
-- additional data.
local lineparse = lpeg.C((lpeg.P(1) - eoln)^0) * eoln * lpeg.C(lpeg.P(1)^0)

do
  -- start with an empty buffer. The buffer is shared by every call to
  -- read_line(), so only one file can be read at a time.
  local data = ""

  function read_line(file)
    -- -----------------------------------------------------------------
    -- data being nil means we've hit the end of file, so we return nil.
    -- -----------------------------------------------------------------
    if data == nil then
      return nil
    end

    -- ------------------------------------------------------------------
    -- attempt to read a line (per the eoln definition) and any remaining
    -- data.
    -- ------------------------------------------------------------------
    local line,rest = lineparse:match(data)

    -- ---------------------------------------------------------------------
    -- if line is nil, there wasn't a line's worth of data, so we need some
    -- more, in this case, 1024 bytes' worth (adjust to taste). If we receive
    -- nil from our stream, we've hit end of file, so we'll just return what
    -- we have buffered (an empty string if the file ended with an end of
    -- line marker), and mark that we've hit end of stream.
    -- ---------------------------------------------------------------------
    if line == nil then
      local more = file:read(1024)
      if more == nil then
        local d = data
        data = nil
        return d
      end

      -- ------------------------------------------------------------------
      -- we've read the data. Append it to our buffer, then call ourselves
      -- again (tail call).
      -- ------------------------------------------------------------------
      data = data .. more
      return read_line(file)
    end

    -- ---------------------------------------------------------------------
    -- The rest of the data goes into the buffer; we then return the line we
    -- just read.
    -- ---------------------------------------------------------------------
    data = rest
    return line
  end
end

Now, with that out of the way, you can read the file line-by-line and have
LPeg parse the line for you.
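
As a rough sketch of that second step, here is one way a single line could
be split into fields with LPeg. It assumes comma separators and fields that
may be wrapped in double quotes, with "" standing for a literal quote
inside a quoted field. The names quoted, bare, field, record and
parse_csv_line are made up for this illustration and are not part of LPeg
or of the code above.

```lua
local lpeg = require "lpeg"

-- A quoted field: everything between the double quotes, with each
-- doubled quote ("") substituted by a single quote in the result.
local quoted = '"' * lpeg.Cs(((lpeg.P(1) - '"') + lpeg.P'""' / '"')^0) * '"'

-- An unquoted field: any run (possibly empty) of characters that are
-- neither a comma nor a double quote.
local bare = lpeg.C((lpeg.P(1) - lpeg.S',"')^0)

local field = quoted + bare

-- A record: one or more comma-separated fields collected into a table.
-- The trailing -1 requires the match to consume the whole line.
local record = lpeg.Ct(field * (',' * field)^0) * -1

-- Hypothetical helper: returns a table of fields, or nil if the line
-- is not well formed under the assumptions above.
local function parse_csv_line(line)
  return record:match(line)
end

local fields = parse_csv_line('a,"b,c","say ""hi""",')
for i, value in ipairs(fields) do
  print(i, value)
end
```

Each string returned by read_line() could be passed to parse_csv_line() in
a loop. Note that splitting on end of line markers first means a quoted
field containing an embedded newline arrives as two separate lines, so this
sketch does not address that case.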
-spc