On Sun, Jun 16, 2013 at 12:05 AM, Jay Carlson <nop@nop.com> wrote:
> If you don't decide on ingress, IsValidUTF8() is still decided, but the definition will be a global property of the codebase.[1] Similarly, if you don't decide what to do with pseudo-UTF-8 surrogates ("CESU-8"), the program as a whole gets this knowledge smeared all over it.
> [...snip...]

Reading this, I realize how out of my depth I am with regard to Unicode...

> For most plumbing I can ignore them and treat Unicode as a stream of codepoints, since everybody working above that level[3] is already in a world of pain. I try not to make it worse.

That's basically what I planned to do with LuLPeg: allow leaf
patterns to be defined as UTF-8 strings, ranges, and sets of
characters, plus a special value to detect encoding errors.
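
To make that concrete, here is a rough sketch of the decoding step such
leaf patterns would sit on. The names decode and range are hypothetical
and are not LuLPeg's API. The sketch assumes strict UTF-8: overlong
forms, surrogates (the CESU-8 case) and anything above U+10FFFF count as
encoding errors, and an error is signalled by returning nil.

```lua
-- Hypothetical helper, not LuLPeg's API: decode the code point that
-- starts at byte index i of s. Returns the code point and the index of
-- the next character, or nil on an encoding error.
local function decode(s, i)
  local b1 = s:byte(i)
  if not b1 then return nil end              -- past the end of the string
  if b1 < 0x80 then return b1, i + 1 end     -- ASCII
  local len, cp, min
  if b1 >= 0xC2 and b1 <= 0xDF then len, cp, min = 2, b1 - 0xC0, 0x80
  elseif b1 >= 0xE0 and b1 <= 0xEF then len, cp, min = 3, b1 - 0xE0, 0x800
  elseif b1 >= 0xF0 and b1 <= 0xF4 then len, cp, min = 4, b1 - 0xF0, 0x10000
  else return nil end                        -- stray continuation or bad lead byte
  for k = i + 1, i + len - 1 do
    local b = s:byte(k)
    if not b or b < 0x80 or b > 0xBF then return nil end
    cp = cp * 64 + (b - 0x80)                -- append six payload bits
  end
  -- Reject overlong forms, surrogates and values past U+10FFFF.
  if cp < min or cp > 0x10FFFF or (cp >= 0xD800 and cp <= 0xDFFF) then
    return nil
  end
  return cp, i + len
end

-- Hypothetical leaf pattern: match one code point in [lo, hi] and
-- return the index after it, or nil if there is no match.
local function range(lo, hi)
  return function(s, i)
    local cp, nxt = decode(s, i)
    if cp and cp >= lo and cp <= hi then return nxt end
    return nil
  end
end
```

Because decode never yields a code point for malformed input, a range
can never match it; a dedicated error pattern would simply succeed where
decode returns nil before the end of the subject.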

I might also add a constructor that takes any indexable value. It
could receive two-stage tables for character classes, if someone (not
me!) were to implement them in Lua...
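
For what it's worth, a two-stage table is small enough to sketch. The
version below is only an illustration under my own assumptions: build,
lookup and class are made-up names, the block size of 256 is arbitrary,
and the class is given as a predicate function rather than real Unicode
data. Identical blocks are stored once, and the result is wrapped in a
metatable so that it can be indexed by code point.

```lua
local BLOCK = 256  -- code points per second-stage block (arbitrary choice)

-- Build the two stages from a predicate over code points 0..max.
-- stage1 maps a block number to an index into blocks; each block is a
-- string of "0"/"1" flags, and identical blocks are shared.
local function build(pred, max)
  local stage1, blocks, seen = {}, {}, {}
  for hi = 0, math.floor(max / BLOCK) do
    local bits = {}
    for lo = 0, BLOCK - 1 do
      bits[lo + 1] = pred(hi * BLOCK + lo) and "1" or "0"
    end
    local key = table.concat(bits)
    if not seen[key] then
      blocks[#blocks + 1] = key
      seen[key] = #blocks
    end
    stage1[hi] = seen[key]
  end
  return stage1, blocks
end

-- Membership test: first stage picks the block, second stage the flag.
local function lookup(stage1, blocks, cp)
  local b = stage1[math.floor(cp / BLOCK)]
  if not b then return false end             -- outside the built range
  local lo = cp % BLOCK
  return blocks[b]:sub(lo + 1, lo + 1) == "1"
end

-- Wrap the tables so that the class is a plain indexable value.
local function class(pred, max)
  local stage1, blocks = build(pred, max)
  return setmetatable({}, {
    __index = function(_, cp) return lookup(stage1, blocks, cp) end
  })
end

-- Example: the Greek and Coptic block, built over the BMP only.
local greek = class(function(cp) return cp >= 0x370 and cp <= 0x3FF end, 0xFFFF)
```

With this, greek[0x3B1] is true and greek[0x41] is false, and only two
distinct blocks are stored for the whole BMP.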

-- Pierre-Yves