[Date Prev][Date Next][Thread Prev][Thread Next]
[Date Index]
[Thread Index]
- Subject: os.execute is scary; Jay rants about strings again (was Re: [ANNOUNCE] lua stdlibs release: now Lua 5 compatible)
- From: Jay Carlson <nop@...>
- Date: Wed, 28 Jan 2004 12:15:45 -0500
Reuben Thomas wrote:
Almost all of the code is written purely in terms of the standard library
(one or two routines use standard Unix command-line utilities via
os.execute).
os.execute and io.popen are scary functions. When using them, you have
to be very certain about the contents of the strings being passed in.
I'm going to pick on your code, since you were brave enough to post it,
but this is an underlying problem many of us have run into. Me, I ran
into it the hard way. It would be nice if we could solve it just once.
For instance, in std.io.env.lua you have:
shell ("ls -aU " .. d .. " 2>/dev/null")))
If d has a space in it, this will not do the right thing. (Of course,
d=".;rm -rf /" is worse.)
This can be fixed up a little:
shell ("ls -aU '" .. d .. "' 2>/dev/null")
but now we can't use directories with single quotes in them. For
correctness we can then reject directories with single quotes. Well,
that's bad, but not as bad as letting the shell interpret parts of d as
shell special characters.
That's as far as I got in my code. You can do fancier quoting to allow
single-quotes as well.
It is rare that you want to use shell special characters outside certain
stereotyped ways, and common that you already have arguments divided up
into separate elements. So in my code I had:
execute{"mkdir", d}
which would quote all its arguments and then call os.execute. The way
I'd write that shell command is
shell{"ls", "-aU", d, stderr=false}
I don't think I dealt with pipelines in this code, but it seems like
something like this would make sense:
execute("find", d, "-type", "f", "-print", PIPE, "xargs", "wc"}
where PIPE is a unique table. In fact, we can use tables to flag
sequences of arguments we know are safe:
execute("find", d, {"-type f -xdev ! -empty"}, PIPE, "xargs", "wc"}
or with some other kind of sugar like argl"-type f etc etc etc". What
this marks is that "-type f" is of a different type than normal
strings---the author is really writing out a *list* of strings in a more
convenient surface syntax.
It would also be nice to set the environment. With a tool like
/usr/bin/env, that's possible too; possible syntax:
shell{env={TZ="GMT"}, "date"}
It's possible to write this all portably on POSIX systems, since we know
what the shell is like. Well, we think we do---ISTR some old security
problems from underescaping in bash. In some sense the right thing is
to do this in C, using fork and execve. That way we don't have to
depend on shell unquoting of things, and we have complete control over
the environment. In C, we can also support doing similar things on
Windows using a different implementation (albeit a subset).
We can steal design from other systems. For example, here's a recent
Python approach to this:
http://www.python.org/peps/pep-0324.html
with a discussion at
http://mail.python.org/pipermail/python-dev/2004-January/041606.html
I'll post the code I have once I find it.
Now, the usual rant: strings are the root of all evil, because they are
untyped. When building aggregate data structures out of them through
concatenation, your language can't help you get the well-formedness and
semantics right. This makes them dangerous. Because they are used as
the (often ill-specified) communication channel with other applications
there can be even more bugs lurking in their use. When possible, strings
should be moved to the edges of your program in such a way that you
don't have to worry about quoting or typing issues in your normal
application code. However, strings are quite convenient, since most
modern languages and their libraries have good support for them. So in
the quest for structural purity, don't make it annoying to do the right
thing or else people will go back to exec("ls "..d) :-)
On that note, specialized interpolation could do some good. Consider
the imaginary
local d="My Library"
my_execute($"ls $d")
where $"ls $d" is a constructor for {"ls $d", ["d"]=d}. my_execute
could split that on spaces into {"ls", "$d", d="My Library"}, then
perform substitution of variables to {"ls", "My Library"}. I think this
has to be syntax, because you can't get a capture of the d local, even
using the debug library, if d is in an outer scope.
The current Lua solution would be
my_execute(interpolate{"ls $d", d=d})
which isn't so bad except that it's annoying and feels like the old lame
Python lambda upvalues workaround.
I don't know whether I think adding more syntax is a good idea or not.
Jay