Re: io:lines() and \0

On Feb 21, 2014, at 10:00, Sean Conner wrote:

It was thus said that the Great René Rebe once stated:

On Feb 20, 2014, at 23:03, Sean Conner wrote:

It was thus said that the Great René Rebe once stated:

The next time you parse a text file that accidentally has a \0 somewhere,
you will probably want this bug fix too ;-) Especially after you have spent
hours figuring out what is going on, …

You would have had the same trouble with C if you used fgets().

I have no trouble using fgets(), but then, if I were not using Lua, I would actually use C++, ...
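
For comparison, a line reader that never goes through fgets() can be written in plain Lua. The sketch below uses binary_lines(), a hypothetical helper that is not part of Lua's standard library, and assumes the file was opened in binary mode. It pulls fixed-size blocks with f:read(n), which is built on fread(), and splits on "\n" with a plain string.find(), so embedded NULs survive. Lines keep any trailing "\r", and the buffering favors clarity over speed.

```lua
-- binary_lines(f [, blocksize]) is a hypothetical helper, not part of Lua's
-- standard library. It returns an iterator over the lines of an open file
-- without using fgets(): f:read(n) is built on fread(), and string.find()
-- with plain = true compares bytes by length, so embedded NULs are kept.
local function binary_lines(f, blocksize)
  blocksize = blocksize or 4096
  local buffer = ""    -- bytes read but not yet returned
  local eof = false

  return function()
    while true do
      -- a complete line is already buffered: return it without the "\n"
      local nl = string.find(buffer, "\n", 1, true)
      if nl then
        local line = string.sub(buffer, 1, nl - 1)
        buffer = string.sub(buffer, nl + 1)
        return line
      end

      if eof then
        -- return a final line that has no trailing "\n", then stop
        if buffer ~= "" then
          local line = buffer
          buffer = ""
          return line
        end
        return nil
      end

      -- need more data: append the next block (nil means end of file)
      local chunk = f:read(blocksize)
      if chunk then
        buffer = buffer .. chunk
      else
        eof = true
      end
    end
  end
end

-- Usage (the file name is only an example):
--   local f = assert(io.open("upload.dat", "rb"))
--   for line in binary_lines(f) do print(#line) end
--   f:close()
```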

And could you tell me what tool created a text file with embedded NULs in
it? I want to avoid using said tool ...

As mentioned, I hit this while implementing a CGI upload, i.e. while parsing MIME data.

Well, MIME is also used in email, which by definition is *not* 8-bit
clean. That is why MIME was created in the first place: to stuff binary
data into a 7-bit ASCII data stream, in which NUL bytes are most assuredly
not allowed. That's why I asked.
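
Because Lua strings carry an explicit length, checking whether a chunk of data would pass as 7-bit MIME content is straightforward. The two helpers below, has_nul() and is_7bit_clean(), are hypothetical names used only for illustration. is_7bit_clean() checks just for NUL and for bytes above 127; it does not enforce the other RFC 2045 rules for 7bit data, such as the 998-octet line limit or CR and LF appearing only as CRLF.

```lua
-- Hypothetical helpers. Lua strings carry an explicit length, so "\0" is an
-- ordinary byte to string.find() and string.byte().
local function has_nul(s)
  -- plain search (4th argument true): no pattern magic involved
  return string.find(s, "\0", 1, true) ~= nil
end

-- True if s contains no NUL and no byte above 127. This covers only part of
-- RFC 2045's definition of 7bit data (line length and bare CR/LF are not checked).
local function is_7bit_clean(s)
  for i = 1, #s do
    local b = string.byte(s, i)
    if b == 0 or b > 127 then
      return false
    end
  end
  return true
end
```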

I, too, wrote code to parse CGI data and even used it in a personal
project for uploading pictures from my iPhone via a web page. I never had an
issue with reading NUL bytes that way, so I checked the implementation to
see how I had avoided problems with them, because I certainly don't
remember there being any issues in the first place.

Well, I apparently sidestepped the issue entirely:

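local lpeg = require "lpeg"

-- core, mime and contentdisp are modules not shown in this excerpt;
-- core.CRLF and core.parse_headers() are used below.
local function multipart(separator,data)
  -- "--" followed by the boundary string from the Content-Type header
  local boundary = lpeg.P("--" .. separator)
  local hdrs     = core.parse_headers(mime._HEADERS,contentdisp._HEADERS)
  -- everything up to (not including) the next boundary; LPeg works on
  -- length-counted strings, so NUL bytes are captured like any other byte
  local body     = lpeg.C((lpeg.P(1) - boundary)^0)
  local section  = boundary
                 * core.CRLF
                 * lpeg.Ct(lpeg.Cg(hdrs,"headers") * lpeg.Cg(body,"body"))
  -- one or more sections, then the closing "--separator--" line
  local sections = lpeg.Ct(section^1) * boundary * lpeg.P"--" * core.CRLF

  local tmp = sections:match(data)
  -- ... (rest of the function omitted)
end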

by using LPeg.
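
The property that makes this work, Lua strings being 8-bit clean, does not depend on LPeg itself. A minimal sketch, assuming the whole request body is already in a string and no whitespace follows the boundary on a delimiter line: split_multipart() (a hypothetical name, not the code above) locates each "--separator" delimiter with a plain string.find(), which compares bytes by length and is unaffected by NULs. Header parsing is left out. Following RFC 2046, the CRLF before each delimiter is treated as part of the delimiter and is dropped from the returned part, whereas the LPeg grammar above keeps it in the body.

```lua
-- split_multipart() is a hypothetical helper, not part of any library.
-- It assumes the whole body is in `data` and that no whitespace follows the
-- boundary on a delimiter line. Each part is returned raw: headers, the
-- blank line, then the body, without the CRLF that precedes the next
-- delimiter (RFC 2046 treats that CRLF as part of the delimiter).
local function split_multipart(separator, data)
  local delim = "--" .. separator
  local parts = {}

  -- plain find: the first delimiter; anything before it is preamble
  local _, e = string.find(data, delim, 1, true)
  if not e then
    return nil, "no opening boundary"
  end

  while true do
    local after = string.sub(data, e + 1, e + 2)
    if after == "--" then
      return parts                      -- closing "--separator--"
    elseif after ~= "\r\n" then
      return nil, "malformed delimiter line"
    end

    -- the part runs up to the CRLF that starts the next delimiter
    local start = e + 3
    local s, ne = string.find(data, "\r\n" .. delim, start, true)
    if not s then
      return nil, "missing closing boundary"
    end
    parts[#parts + 1] = string.sub(data, start, s - 1)
    e = ne
  end
end
```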

But, LPeg aside, there are other ways of reading in the data. One: if
Content-Length: exists, convert the value to an integer, and pass that to
f:read(), which will read that many bytes of data (using the C function
fread()). If the Content-Length: header doesn't exist, and the
Content-Transfer-Encoding: header indicates 8bit, then yes, you have an
issue, and one that can be solved without patching Lua, by writing a C
module to Do The Right Thing. Because even *if* the Lua team accept your
proposal, at best, it'll be placed on the Lua bugs page and won't become a
part of Lua until the next official release X years from now (that might be
fine if you are not planning on releasing your code and have a locally

And because it may take some time to get into an official release, we should
now stop fixing bugs and making improvements?

Because some core function does not behave as expected, we should all
suffer and apply complex, hard-to-follow workarounds like your
LPeg example?

patched Lua; anyone else that wants to use your code will need to have a
patched Lua). And don't forget about LuaJIT (different team). It probably
suffers from the same issue.
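
The Content-Length route can be sketched as follows. It assumes a CGI environment, where the server passes the request body on standard input and its length in the CONTENT_LENGTH environment variable (CGI/1.1, RFC 3875); read_body() is a hypothetical helper name. Since f:read(n) is implemented with fread(), NUL bytes come through intact, and a short read is reported instead of being silently accepted. On platforms that distinguish text and binary streams (Windows, for example), io.stdin is a text-mode stream that may translate CRLF line endings; that is a separate problem from NULs.

```lua
-- read_body() is a hypothetical helper. It reads exactly `length` bytes
-- from file handle f, where `length` is the Content-Length value as a string
-- (in a CGI script, os.getenv("CONTENT_LENGTH")). f:read(n) uses fread(),
-- so NUL bytes in the body are returned unchanged.
local function read_body(f, length)
  local n = tonumber(length)
  if not n or n < 0 or n % 1 ~= 0 then
    return nil, "invalid Content-Length"
  end
  if n == 0 then
    return ""                  -- nothing to read; avoids f:read(0) EOF quirks
  end

  local body = f:read(n)
  if not body or #body ~= n then
    return nil, "short read"   -- client sent fewer bytes than promised
  end
  return body
end

-- Usage in a CGI script:
--   local body = assert(read_body(io.stdin, os.getenv("CONTENT_LENGTH")))
```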

Which changes LuaJIT picks up from vanilla Lua is a separate issue; AFAIR,
not even all 5.2 features have made it into LuaJIT so far. Since when does
Roberto only make changes to Lua when Mike approves them as well?

If at all possible, find a copy of _The Standard C Library_ by
P.J. Plauger and read chapter 12. It talks about the history of <stdio.h> and
the issues that went into the standards process (started in 1983!) about
handling I/O in C and why text was so problematic back then (for different
reasons than now, and not all related to using different end of line
markers) and the compromises made by the ANSI committee.