Re: tables as parsers

On Wed, Mar 27, 2019 at 9:06 PM Soni "They/Them" L. <fakedme@gmail.com> wrote:

On 2019-03-27 10:10 p.m., Coda Highland wrote:
> On Wed, Mar 27, 2019 at 6:47 PM Soni "They/Them" L. <fakedme@gmail.com
> <mailto:fakedme@gmail.com>> wrote:
>
> I'm rolling with this weird idea. is it possible to make a simple
> parser
> out of tables? I tried some stuff but ran into a few issues. I don't
> wanna use LPeg or anything more complicated than string.find(foo,
> bar,
> 1, true).
>
> e.g.:
>
> local ltk = {} -- lua_tokens
> ltk.string = {}
> ltk['"'] = "string"
> ltk["'"] = "string" -- how to handle end of string correctly?
> ltk.string['\\'] = "escape"
> ltk.string.escape = {}
> ltk.string.escape['z'] = {}
> ltk.string.escape['z'].next = ltk.string.escape['z']
> for i, v in ipairs({" ", "\t", "\n", etc}) do
> ltk.string.escape['z'][v] = "next"
> end
> ltk.string.escape['z'][''] = ltk.string -- not sure if there'd be a
> better way to do this
> -- etc
>
>
> Not a bad start for someone who's never done this formally before.
>
> There are a zillion different ways to define a parser. The Dragon Book
> recommendation is well-given.
>
> I think the insight you're missing is that the table really represents
> the structure of a state machine. Your top-level table shouldn't
> contain things like ["'"] = "string" directly; rather, you'd want
> something like ltk.base["'"]. Then you wouldn't have
> ltk.string.escape, you'd just have ltk.string and ltk.escape. The
> nesting doesn't belong in the table itself; instead, your parser will
> maintain a stack of states, and you push a state when you start into a
> parsing rule and you pop it off when you've finished that rule.

Yeah but then I wouldn't have self-referential tables. Besides I wasn't
planning on having a stack. FSMs (same class as regex) don't have a stack.

FSMs can't handle parentheses. String escapes are about the limit of what they can manage.

>
> The place where this is suddenly going to get complicated is if your
> grammar contains any prefix ambiguities (e.g. "fork" vs "forward"),
> because then you have to implement backtracking. For most simpler
> grammars it's best to just break it into two passes -- one that makes
> simple lexical tokens out of a string of characters, and one that
> evaluates the syntax out of a list of lexical tokens.

"fork" vs "forward" is unambiguous and doesn't require backtracking.

Derp. That's my mistake. You're right, that one's good.

/s/ Adam