[Date Prev][Date Next][Thread Prev][Thread Next]
[Date Index]
[Thread Index]
- Subject: Turing-incomplete Lua?
- From: Ben Crowell <luacrowell04@...>
- Date: Wed, 1 Dec 2004 14:07:30 -0600
David Given wrote:
>If your applications just want to deal with static configuration files, i.e.
>the configuration won't change each time it's run, then you could plausibly
>do everything in two stages: you write your configuration in a Lua script,
>which is then executed by a preprocessor that emits the data in a binary,
>easy-to-parse format which is used by your applications.
In the example of a configuration file, I think this might work in theory,
but is probably a bad idea in practice, because the binary output would be
hard to understand, and you'd have no idea whether it was doing the right thing.
In the example of translating a book written TeX, the approach doesn't work
at all, because you don't want to do more editing to the PDF file later on, you
want to do more editing to the source code.
Jeff Koftinoff wrote:
>You'd also want to prevent malicious scripts from eating all available
>memory.
>Incidentally, is this for a particular application? Off the top of my
>head, I can't think of anything that would use an untrusted config file.
An example would be sharing snippets of your emacs initialization file with
other people -- you see a lot of these floating around on the web, and some of
them are hundreds, or even thousands, of lines of Lisp code. It's not just
an issue of trust, it's also an issue of verifiability and reliability. For
instance, I don't want to make a change to my own sendmail config file that
would, without my intending it, cause sendmail to go into an infinite loop
under certain obscure conditions. I think your Quake example is also very
apropos.
Jeff Koftinoff wrote:
>
> But if you made the mini-lua grammar simple enough you could guarantee
> that there would be no possibility of infinite loops. A timeout is
> problematic in many ways! Please don't do things like that.
>
> If you just restrict the grammar so that all loops have a fixed repeat
> count (non-variable) and no recursive function definitions are allowed
> then you don't need a sandbox and you don't need problematic
> halt-sensing and you can be guaranteed that the config file will never
> halt your process. I believe a system like this would be very very useful.
If you want to have a Turing-incomplete language with loops, then forcing
loops to have a fixed repeat count, and disallowing recursion, would certainly
work. However, it would be nice to find ways of relaxing those
requirements. For instance, a program whose function is to filter stdin
and write to stdout might be allowed to loop for as long as there was input,
but might not be allowed to loop an unforseeable number of times while handling
*one* line. In other words, loop counts could be either hard-coded or input, but
not calculated at runtime. You could also allow things like this:
for i=1,(2+2*3) do print(i) end
Also, recursion could be allowed, but the depth of recursion would have
the same status as the loop count. Loops and recursions could also be allowed
to terminate earlier than their hard-coded counts.
Eero Pajarre wrote:
>Would it really be useful?
>
>I would still be possible to do loops which are not infinite,
>but which would take 100000000000000000000000 years to complete.
>
>In computer science sense these are of course completely different,
>but in practice do you really care?
Well, in the abstract computer science sense, Lua is not a Turing-complete
language, because it's implemented on a machine with a finite address space.
It's perfectly possible to write an automated program that will take any
Lua program that fits in the address space, and determine whether it halts
or not. (The checker program would have to run on a machine with a bigger
address space. And of course the checker program would probably be much too
slow in almost all cases to be of any practical use.)
But I think the theoretical distinction between Turing-complete
and Turing-incomplete languages does provide some useful conceptual
guidance.
I agree with Jeff that the idea of a time-out is probably not very useful
in general. It focuses on the way Turing's theory is traditionally developed
mathematically (in terms of the halting problem), but that's just one of the
features of Turing-complete languages that makes them undesirable for certain
purposes. The broader issue is automatic analysis of the code. For instance,
PostScript is a Turing-complete language, but PDF isn't. You can design a
PostScript printer so that it will time out of the code takes a long time
to run, but that doesn't fix the problems that arise from the ill-advised
decision to make it a Turing-complete language. For instance, there are
applications that are supposed to be able to extract a single page from
a multi-page PS file; however, these applications will fail on certain PS
files, because they can't figure out enough about what the code's behavior
really is. (For instance, people have written desktop calculator programs
in PostScript -- I wonder what the fifth page of such a program would be? :-)
It's the same story with not being able to spell-check TeX files without
messy, inaccurate heuristics. The problem is automated analysis, not
halting.