- Subject: There's a bug in my LPeg code, but I can't find it
- From: Sean Conner <sean@...>
- Date: Fri, 17 Aug 2018 20:33:16 -0400
Usually, I'm the one to answer LPeg questions, but tonight I need some
help with LPeg, and I'm hoping someone might see something I'm missing. It
has to do with my URL parsing module [1]. The following code presents the
bug:
url = require "org.conman.parsers.url.url"
lpeg = require "lpeg"
x = url * lpeg.Cp()
a,b = x:match "/status" print(b) -- prints 8, okay
a,b = x:match "/status/" print(b) -- prints 9, okay
a,b = x:match "/status " print(b) -- prints 8, okay
a,b = x:match "/status/ " print(b) -- prints 8, WAT?
The code in url that matters [2]:
path_absolute <- {| {:root: %istrue :} '/' (segment_nz ('/' segment)* )? |}
segment_nz <- {~ pchar+ ~}
segment <- ! . / {~ pchar+ ~} -- NOTE
pchar <- unreserved / pct_encoded / sub_delims / ':' / '@'
pct_encoded <- %pct_encoded
sub_delims <- '!' / '$' / '&' / "'" / '(' / ')'
/ '*' / '+' / ',' / ';' / '='
unreserved <- %ALPHA / %DIGIT / '-' / '.' / '_' / '~'
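To check whether the quoted rules alone are enough to reproduce this, here is a rough transcription of them into plain LPeg operators, run outside the module. This is a sketch under assumptions: pchar is simplified to unreserved characters only (the real rule also accepts pct-encoded octets, sub-delims, ':' and '@'), the table and substitution captures are dropped, and there is no fragment rule.

```lua
-- Sketch: the quoted path_absolute rules transcribed into plain LPeg.
-- ASSUMPTION: pchar is simplified to unreserved characters only; the real
-- module also accepts pct-encoded octets, sub-delims, ':' and '@'.
-- Captures ({| |}, {~ ~}, named groups) are omitted; only positions matter here.
local lpeg = require "lpeg"
local P, R, S, Cp = lpeg.P, lpeg.R, lpeg.S, lpeg.Cp

local pchar         = R("AZ", "az", "09") + S"-._~"
local segment_nz    = pchar^1                -- segment_nz <- pchar+
local segment       = -P(1) + pchar^1        -- segment <- ! . / pchar+
local path_absolute = P"/" * (segment_nz * (P"/" * segment)^0)^-1

-- Like the original test: append a position capture after the path.
local x = path_absolute * Cp()

for _, s in ipairs{ "/status", "/status/", "/status ", "/status/ ", "/status/#a" } do
  print(string.format("%q", s), x:match(s))
end
```

Run standalone, this prints 8 for both "/status/ " and "/status/#a" (and 9 for "/status/"), so the isolated fragment shows the same behaviour as the full module.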
The 'segment' rule *should* be
segment <- ! . / {~ pchar* ~}
But fixing that issue doesn't resolve my current issue. Why is the
trailing slash, when followed by a space, not parsed as part of the URL? I
can work around the bug (for some use cases; see below for a possibly related
issue), but it's annoying me that I can't seem to track down the cause.
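One general LPeg property that may be relevant (an observation, not a diagnosis of the module): when a single iteration of a repetition like `(p)^0` fails partway through, LPeg restores the subject position to where that iteration started, so a '/' matched inside the failed iteration is given back. The toy patterns below are hypothetical and not taken from the module; they only contrast a loop body that requires at least one character after '/' with one that accepts an empty segment.

```lua
-- Toy sketch (hypothetical patterns): how a failed loop iteration gives back
-- the '/' it had already matched.
local lpeg = require "lpeg"
local P, R, Cp = lpeg.P, lpeg.R, lpeg.Cp

-- Body needs '/' followed by at least one letter.
local loop1 = (P"/" * R("az")^1)^0 * Cp()
-- Body needs '/' followed by zero or more letters (empty segment allowed).
local loop0 = (P"/" * R("az")^0)^0 * Cp()

for _, s in ipairs{ "/a/b", "/a/", "/a/ " } do
  print(string.format("%q", s), loop1:match(s), loop0:match(s))
end
```

With `loop1`, "/a/ " stops at position 3: the second iteration matches '/', then fails on the space, and the whole iteration (including the '/') is discarded. With `loop0` the same input reaches position 4, because an empty segment lets the iteration succeed.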
Possibly related:
a,b = x:match "/status#a" print(b) -- prints 10, okay
a,b = x:match "/status/#a" print(b) -- prints 8, WAT?
-spc (Puzzled by this ... )
[1] Installable as
luarocks install org.conman.parsers.url.url
Also as part of
https://github.com/spc476/LPeg-Parsers
viewable at:
https://github.com/spc476/LPeg-Parsers/blob/9fe3db4c0a52264f9e0e78200cc0f7dda0008f04/url/url.lua
[2] The code is literally transcribed from RFC 3986.