lua-users home
lua-l archive


Miles Bader wrote:
> [...]
> It seems there needs to be a clear distinction between "raw char" (given
> that lpeg is quite usable for binary data) and "unicode char".

The problem is that Unicode doesn't really have any such concept as a 'character', which means that traditional string handling methods basically don't work with it (even if you ignore UTF-8 encoding). A single displayable thing can actually be made up of several Unicode code points, and may even have several different (but technically equivalent) representations.
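As a rough illustration of that last point, here is a minimal sketch in Python (chosen only for the example; nothing in this thread uses it, and the helper names `describe` and `same_text` are made up for the sketch). It builds "é" two ways, once as a single precomposed code point and once as a plain "e" followed by a combining accent, and shows that the two differ in code point count and UTF-8 byte count, and only compare equal after normalization.

```python
import unicodedata

# Two representations of the same displayed "é".
composed = "\u00e9"      # LATIN SMALL LETTER E WITH ACUTE: one code point
decomposed = "e\u0301"   # "e" + COMBINING ACUTE ACCENT: two code points


def describe(s):
    """Return (code point count, UTF-8 byte count) for a string."""
    return len(s), len(s.encode("utf-8"))


def same_text(a, b):
    """Compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(describe(composed))               # (1, 2)
print(describe(decomposed))             # (2, 3)
print(composed == decomposed)           # False: a naive comparison sees different strings
print(same_text(composed, decomposed))  # True: equivalent once normalized
```

So neither a byte count nor a code point count gives a stable answer to "how many characters is this?", and normalization only helps with the cases that happen to have a precomposed form.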

I'm afraid it's just a fundamentally hard problem, and I haven't seen any decent abstractions over it yet.

> Making P(x) count utf8 chars would certainly be convenient for people
> reading utf8 files, but... it doesn't seem the cleanest thing in
> general....

*Nothing* about Unicode is clean...

--
David Given
dg@cowlark.com