[Date Prev][Date Next][Thread Prev][Thread Next]
[Date Index]
[Thread Index]
- Subject: Re: LPeg indent/whitespace based grammar
- From: Matthias Dörfelt <lists@...>
- Date: Fri, 7 Jul 2017 13:09:16 -0700
Awesome, thanks that is super helpful!
Haven’t looked at the “re” portion of lpeg, yet. I was initially gonna stick with pure LPeg, but let’s see!
Is it pretty straight forward to express your re code in pure lua & LPeg?
Thanks!
> On Jul 7, 2017, at 07:38, Duncan Cross <duncan.cross@gmail.com> wrote:
>
> On Fri, Jul 7, 2017 at 5:23 AM, Matthias Dörfelt <lists@mokafolio.de> wrote:
>> While my implementation works, I feel like I am circumventing a lot of the
>> actual capturing mechanisms of LPeg in order to deal with back-referencing
>> to build the hierarchy. I’d like to get some feedback on weather there are
>> any better ways to approach this kind of grammar using LPeg. I am new to it
>> in general so I might be missing something obvious here.
>
> Here's how I would approach it.
>
> First, some helper patterns:
>
>
> --
> local initial_indent = lpeg.Cg(lpeg.P(''), 'indent')
>
> local match_same_indent = lpeg.Cmt(
> lpeg.Cb('indent'),
> function(s, p, indent)
> if s:sub(p, p + #indent - 1) ~= current_indent then
> return false
> end
> return p + #indent
> end)
>
> local capture_deeper_indent = lpeg.Cg(
> match_same_indent * lpeg.S' \t'^1,
> 'indent')
> --
>
>
> initial_indent captures an empty string and stores it as the named
> group "indent".
>
> match_same_indent checks that the immediate next set of characters is
> identical to the current contextual value of the named group "indent".
>
> capture_deeper_indent requires that the current indent matches at this
> position, and is immediately followed by at least one more whitespace
> character. If it succeeds, the full new indent is stored as a new
> named group capture for "indent".
>
> To create the actual hierarchy document grammar, I would turn to the
> "re" module, passing in the helper patterns via the "defs" table
> parameter:
>
>
> --
> local hierarchy = re.compile([[
>
> hierarchy <- %initial_indent {|
> {:children: {| (<first_root> <next_root>*)? |} :}
> |} !.
>
> first_root <- <entry>
> next_root <- %nl <entry>
>
> entry <- {|
> {:name: <name> :}
> {:children: {| (<first_child> <next_child>*)? |} :}
> |}
>
> first_child <- %nl %capture_deeper_indent <entry>
> next_child <- %nl %match_same_indent <entry>
>
> name <- (!%nl .)+
>
> ]], {
> initial_indent = initial_indent;
> match_same_indent = match_same_indent;
> capture_deeper_indent = capture_deeper_indent;
> })
> --
>
>
> To explain the elements of this grammar in English:
>
> <hierarchy>:
> - capture the empty string as an initial base value for "indent"
> - create a table with a "children" sub-table
> - populate "children" with either no values, or <first_root>
> followed by zero or more <next_root>s
> - make sure there is no more input
>
> <first_root>: same as <entry>
>
> <next_root>: newline character followed by <entry>
>
> <entry>: a table containing
> - a <name> captured as field "name"
> - a sub-table "children" containing either no values, or
> <first_child> followed by zero or more <next_child>ren
>
> <first_child>:
> - newline character
> - capture a new, deeper indent than the current contextual indent
> - <entry>
>
> <next_child>:
> - newline character
> - match the current contextual indent
> - <entry>
>
> <name>: one or more non-newline characters
>
>
> A couple of quirks to note:
>
> 1. This implementation does not enforce indentation in sets of four
> spaces, but instead any arbitrary set of spaces and tabs. Hopefully it
> should not be too difficult to modify capture_deeper_indent to be
> strict about this, if you want.
>
> 2. The "indent" value ends up retained as a field inside "children"
> tables in the final result. For example:
>
> --
> print(hierarchy:match('a\n b').children[1].children.indent)
> --
>
> ....will print four spaces. If this is an undesirable side-effect, a
> function capture could be used to remove this field once the
> "children" table is fully captured.
>
>
> -Duncan
>
>