- Subject: Re: Of Unicode in the next Lua version
- From: Pierre-Yves Gérardy <pygy79@...>
- Date: Sun, 16 Jun 2013 01:28:45 +0200
On Sun, Jun 16, 2013 at 12:05 AM, Jay Carlson <nop@nop.com> wrote:
> If you don't decide on ingress, IsValidUTF8() is still decided, but the definition will be a global property of the codebase.[1] Similarly, if you don't decide what to do with pseudo-UTF-8 surrogates ("CESU-8"), the program as a whole gets this knowledge smeared all over it.
> [...snip...]
Reading this, I realize how out of my depth I am with regard to Unicode...
> For most plumbing I can ignore them and treat Unicode as a stream of codepoints, since everybody working above that level[3] is already in a world of pain. I try not to make it worse.
That's basically what I planned to do with LuLPeg: allow users to
define leaf patterns as UTF-8 strings, ranges, and sets of characters,
plus a special value that detects encoding errors.
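A rough sketch of that plan follows. It is written in Python rather than Lua and is not LuLPeg's actual API: `decode_one`, `ENCODING_ERROR`, `match_string`, `match_range`, `match_set` and `match_error` are hypothetical names. Each leaf pattern takes a byte string and an offset and returns the offset just past the match, or `None`. It assumes that overlong forms, values above U+10FFFF and encoded surrogates (the CESU-8 case from the quoted message) are all treated as encoding errors rather than accepted.

```python
# Hypothetical sentinel standing in for the "special value" that flags
# an encoding error; not part of LuLPeg.
ENCODING_ERROR = object()


def decode_one(data: bytes, i: int):
    """Decode one code point at byte offset i.

    Returns (code_point, next_offset), or (ENCODING_ERROR, i + 1) when the
    bytes at i are not valid UTF-8.
    """
    b0 = data[i]
    if b0 < 0x80:
        return b0, i + 1
    if 0xC2 <= b0 <= 0xDF:
        length, cp, minimum = 2, b0 & 0x1F, 0x80
    elif 0xE0 <= b0 <= 0xEF:
        length, cp, minimum = 3, b0 & 0x0F, 0x800
    elif 0xF0 <= b0 <= 0xF4:
        length, cp, minimum = 4, b0 & 0x07, 0x10000
    else:
        return ENCODING_ERROR, i + 1  # stray continuation or invalid lead byte
    if i + length > len(data):
        return ENCODING_ERROR, i + 1  # truncated sequence
    for j in range(i + 1, i + length):
        b = data[j]
        if b & 0xC0 != 0x80:
            return ENCODING_ERROR, i + 1  # expected a continuation byte
        cp = (cp << 6) | (b & 0x3F)
    # Assumed policy: reject overlong forms, surrogates and values > U+10FFFF.
    if cp < minimum or 0xD800 <= cp <= 0xDFFF or cp > 0x10FFFF:
        return ENCODING_ERROR, i + 1
    return cp, i + length


def match_string(literal: str):
    """Leaf pattern from a UTF-8 literal: match its bytes exactly."""
    needle = literal.encode("utf-8")

    def match(data: bytes, i: int):
        return i + len(needle) if data.startswith(needle, i) else None
    return match


def _code_point_pattern(pred):
    """Match one well-formed code point that satisfies pred."""
    def match(data: bytes, i: int):
        if i >= len(data):
            return None
        cp, nxt = decode_one(data, i)
        if cp is ENCODING_ERROR or not pred(cp):
            return None
        return nxt
    return match


def match_range(lo: str, hi: str):
    """Leaf pattern matching one code point in [lo, hi]."""
    lo_cp, hi_cp = ord(lo), ord(hi)
    return _code_point_pattern(lambda cp: lo_cp <= cp <= hi_cp)


def match_set(chars: str):
    """Leaf pattern matching one code point from a set of characters."""
    cps = frozenset(ord(c) for c in chars)
    return _code_point_pattern(lambda cp: cp in cps)


def match_error(data: bytes, i: int):
    """Leaf pattern matching one byte that does not start valid UTF-8."""
    if i >= len(data):
        return None
    cp, nxt = decode_one(data, i)
    return nxt if cp is ENCODING_ERROR else None
```

Because the patterns work on raw bytes, a malformed byte never silently becomes a code point; a grammar has to say explicitly what to do with it by using `match_error`.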
I might also add a constructor that takes any indexable value. It
could receive two-stage tables for character classes, if someone (not
me!) were to implement them in Lua...
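Two-stage tables are a common way to store Unicode character classes compactly: the high bits of a code point select an entry in a first table, which names a block in a second table, and the low bits index within that block; identical blocks are stored only once. The Python sketch below assumes a boolean class and 256-code-point blocks; `build_two_stage` and `in_class` are hypothetical names, not an existing Lua or LuLPeg API.

```python
BLOCK_BITS = 8                 # assumed block size: 256 code points per block
BLOCK_SIZE = 1 << BLOCK_BITS
MAX_CP = 0x10FFFF


def build_two_stage(predicate):
    """Build (stage1, stage2) tables for a boolean character class.

    stage1[cp >> BLOCK_BITS] is the index of a block in stage2, and
    each stage2 block is a tuple of BLOCK_SIZE booleans. Identical
    blocks are shared, so most of the code space costs one entry.
    """
    stage1 = []
    stage2 = []
    seen = {}                  # block contents -> index in stage2
    for base in range(0, MAX_CP + 1, BLOCK_SIZE):
        block = tuple(predicate(cp) for cp in range(base, base + BLOCK_SIZE))
        idx = seen.get(block)
        if idx is None:
            idx = len(stage2)
            stage2.append(block)
            seen[block] = idx
        stage1.append(idx)
    return stage1, stage2


def in_class(tables, cp):
    """Two lookups: high bits pick the block, low bits pick the entry."""
    stage1, stage2 = tables
    return stage2[stage1[cp >> BLOCK_BITS]][cp & (BLOCK_SIZE - 1)]


# Example class: the Greek and Coptic block, U+0370..U+03FF.
greek = build_two_stage(lambda cp: 0x370 <= cp <= 0x3FF)
```

For the example class only two distinct blocks exist (the one containing U+0370..U+03FF and an all-false block shared by everything else), so the table is one 4352-entry index plus two small blocks. A constructor that accepts any indexable value could then take `in_class` lookups like this one as its membership test.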
-- Pierre-Yves