- Subject: Re: Of Unicode in the next Lua version
- From: David Demelier <demelier.david@...>
- Date: Wed, 19 Jun 2013 19:12:52 +0200
On Friday, 14 June 2013 00:29:29, Pierre-Yves Gérardy wrote:
> On Thu, Jun 13, 2013 at 11:36 PM, Jay Carlson <nop@nop.com> wrote:
> > On Jun 12, 2013, at 9:53 AM, Pierre-Yves Gérardy wrote:
> > Don't forget getchar(S, 2) -> error("not defined at position 2"). I
> > really like Julia's idea of strings as partial functions.
> I'd prefer getchar(S, 2) --> false, 3.
>
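To make the difference between the two proposals concrete, here is a minimal sketch of the "false, next position" behaviour in plain Lua. The name getchar comes from the thread. Everything else is an assumption made for illustration: the exact signature, returning the next position on success as well, returning nil past the end of the string, and treating the input as well-formed UTF-8.

```lua
-- Hypothetical getchar(s, i): i is a byte position in the UTF-8 string s.
-- Returns the character starting at i and the position of the next one,
-- or false and the next valid position when i falls inside a sequence.
-- Assumes well-formed UTF-8; no validation is attempted.
local function getchar(s, i)
  local b = s:byte(i)
  if not b then
    return nil                      -- past the end of the string
  end
  if b >= 0x80 and b < 0xC0 then
    -- i points at a continuation byte: skip to the next lead byte.
    local j = i + 1
    local c = s:byte(j)
    while c and c >= 0x80 and c < 0xC0 do
      j = j + 1
      c = s:byte(j)
    end
    return false, j
  end
  -- The lead byte tells us how long the sequence is.
  local len = (b < 0x80 and 1) or (b < 0xE0 and 2) or (b < 0xF0 and 3) or 4
  return s:sub(i, i + len - 1), i + len
end

local S = "\195\169!"               -- "é!" written as explicit bytes
print(getchar(S, 1))
print(getchar(S, 2))
print(getchar(S, 3))
```

The error-raising variant would replace the "return false, j" line with a call to error().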
> >> A similar function could return code points instead of strings.
> >
> > Would you use that much?
>
> Yes, before I broke Unicode support in LuLPeg, that's what I was
> using. It allows checking whether a character is in a given range, and it
> is barely slower than returning a sub-string (doing the conversion in
> Lua). In LuaJIT, computing the code point with standard arithmetic
> (mod, division and floor) is faster than getting the sub-string. It
> should be even faster by using the bit library.
>
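As an illustration of the arithmetic approach described above, here is a minimal sketch that decodes one code point using only modulo and multiplication, so it needs no bit library. This is not LuLPeg's actual code; the function names codepoint and in_range are hypothetical, and the sketch assumes well-formed UTF-8 with no validation.

```lua
-- Decode the UTF-8 sequence that starts at byte position i of s.
-- Returns the code point and the byte position of the next sequence.
-- Assumes i points at a lead byte of a well-formed sequence.
local function codepoint(s, i)
  local b1 = s:byte(i)
  if b1 < 128 then                  -- 0xxxxxxx: ASCII
    return b1, i + 1
  elseif b1 < 224 then              -- 110xxxxx 10xxxxxx
    local b2 = s:byte(i + 1)
    return (b1 % 32) * 64 + b2 % 64, i + 2
  elseif b1 < 240 then              -- 1110xxxx 10xxxxxx 10xxxxxx
    local b2, b3 = s:byte(i + 1, i + 2)
    return (b1 % 16) * 4096 + (b2 % 64) * 64 + b3 % 64, i + 3
  else                              -- 11110xxx followed by three 10xxxxxx
    local b2, b3, b4 = s:byte(i + 1, i + 3)
    return (b1 % 8) * 262144 + (b2 % 64) * 4096 + (b3 % 64) * 64 + b4 % 64,
           i + 4
  end
end

-- Range test on the code point instead of on a sub-string.
local function in_range(s, i, lo, hi)
  local cp = codepoint(s, i)
  return cp >= lo and cp <= hi
end

-- U+03B1 (Greek small alpha) lies in the Greek block U+0370..U+03FF.
print(in_range("\206\177", 1, 0x0370, 0x03FF))
```

With code points in hand, a range check is a pair of numeric comparisons, which is what makes this convenient for an R()-style matcher.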
> > Miles Bader pointed out a lot of string iteration code is phrased in terms
> > of gmatch--or should be. And in that case, there are no string positions
> > at all.
> Well, in my case, it isn't, but an LPeg clone is probably not typical in
> terms of string processing.
>
> > The major problem for UTF-8 then would be convincing the pattern matcher
> > to consume an entire UTF-8 sequence for ".".
> In the 2012 Workshop presentation, Roberto talks about deprecating the
> old patterns, so Unicode in gmatch will probably never see the light
> of day... I don't know if/how he plans to handle Unicode in LPeg.
>
Please do not deprecate them: it would break existing scripts again, and that
is a major breakage. The current patterns work very well, and I would like to
keep them precisely because they are so simple.
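For context, the existing byte-oriented patterns can already walk UTF-8 text one character at a time, even though "." still matches a single byte. A minimal sketch, assuming well-formed UTF-8 input; the names UTF8_CHAR and utf8_chars are hypothetical:

```lua
-- One UTF-8 character: any byte that is not a continuation byte
-- (128..191), followed by zero or more continuation bytes.
-- Assumes well-formed input; malformed sequences are not rejected.
local UTF8_CHAR = "[^\128-\191][\128-\191]*"

-- Collect the characters of s into a table.
local function utf8_chars(s)
  local chars = {}
  for ch in s:gmatch(UTF8_CHAR) do
    chars[#chars + 1] = ch
  end
  return chars
end

local text = "h\195\169llo \226\130\172"   -- "héllo €" as explicit bytes
local chars = utf8_chars(text)
print(#text, #chars)                        -- byte count vs character count
```

This only covers iteration. Character classes, ranges and "." inside a larger pattern still operate on bytes, which is the limitation discussed above.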
> As posted in the other thread, I plan to tackle this in LuLPeg with
> P8(), R8() and S8(), that will live alongside their byte-matching
> cousins.
>
> -- Pierre-Yves