- Subject: Re: Will Lua kernel use Unicode in the future?
- From: Klaus Ripke <paul-lua@...>
- Date: Thu, 29 Dec 2005 15:06:13 +0100
On Thu, Dec 29, 2005 at 11:38:57AM +0000, Lisa Parratt wrote:
>
> On 29 Dec 2005, at 11:06, Klaus Ripke wrote:
> >http://lua-users.org/wiki/LuaUnicode
>
> A few observations, reading this page:
>
> Lack of "\U+1234" style unicode character escapes - it strikes me
> that the code to isolate such an escape, and then convert it to an 8
> bit string would only take a few lines of code. Is there a good
> theological reason why this isn't supported?
That would require the parser to settle on one particular encoding,
such as UTF-8, UCS-2 or UTF-16.
On the other hand, a preprocessing step, either at build time or
as a load hook, could do this and much more.
Personally, I prefer to have my editor produce UTF-8.
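
To make the preprocessing idea concrete, here is a minimal sketch in C,
assuming UTF-8 as the target encoding and assuming the escape is written
as "\U+" followed by exactly four hex digits. Both function names
(utf8_encode, expand_escapes) are made up for this illustration; nothing
like them exists in Lua itself.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Encode one code point as UTF-8 into out (room for 4 bytes required).
   Returns the number of bytes written, or 0 for surrogates and for
   values beyond U+10FFFF. */
size_t utf8_encode(unsigned long cp, unsigned char *out)
{
    assert(out != NULL);
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                       /* surrogates are not characters */
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Copy src to dst, replacing each "\U+XXXX" (exactly four hex digits, an
   assumption of this sketch) with its UTF-8 encoding. dst must hold at
   least strlen(src) + 1 bytes: a 7-byte escape yields at most 3 bytes.
   Escapes that name a surrogate are copied through unchanged. */
void expand_escapes(const char *src, char *dst)
{
    assert(src != NULL && dst != NULL);
    while (*src != '\0') {
        if (src[0] == '\\' && src[1] == 'U' && src[2] == '+'
            && isxdigit((unsigned char)src[3])
            && isxdigit((unsigned char)src[4])
            && isxdigit((unsigned char)src[5])
            && isxdigit((unsigned char)src[6])) {
            char hex[5];
            size_t n;
            memcpy(hex, src + 3, 4);
            hex[4] = '\0';
            n = utf8_encode(strtoul(hex, NULL, 16), (unsigned char *)dst);
            if (n > 0) {
                dst += n;
                src += 7;
                continue;
            }
        }
        *dst++ = *src++;
    }
    *dst = '\0';
}
```

Run over a chunk before it is handed to the loader, this would turn
"caf\U+00E9" into the five bytes of "café" in UTF-8. Choosing a different
target encoding only means swapping utf8_encode, which is the point:
the choice stays outside the parser.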
> Inability to use UTF-8 identifiers due to use of isalpha and isalnum
> - surely it would be better to use hardcoded functions for
> determining if the characters in an identifier are valid? Otherwise
> there will be potential locale issues anyway. Locales should apply to
> human languages, not computer languages!
Agreed (although this is not Unicode-related).
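
As a rough sketch of what such hardcoded checks could look like in C
(this is not Lua's actual lexer code, and the names ident_start,
ident_part and is_identifier are invented here): the tests below ignore
the current locale entirely. Accepting every byte >= 0x80, so that UTF-8
sequences pass through, is an assumption of the sketch rather than a
considered identifier policy, and the range tests assume an
ASCII-compatible execution character set.

```c
#include <assert.h>
#include <stddef.h>

/* Locale-independent replacement for isalpha() plus '_'. */
static int ident_start(unsigned char c)
{
    return (c >= 'a' && c <= 'z')
        || (c >= 'A' && c <= 'Z')
        || c == '_'
        || c >= 0x80;   /* assumption: any byte of a UTF-8 sequence */
}

/* Locale-independent replacement for isalnum() plus '_'. */
static int ident_part(unsigned char c)
{
    return ident_start(c) || (c >= '0' && c <= '9');
}

/* Returns 1 if s is a non-empty identifier under the rules above. */
int is_identifier(const char *s)
{
    assert(s != NULL);
    if (!ident_start((unsigned char)*s))
        return 0;
    for (s++; *s != '\0'; s++) {
        if (!ident_part((unsigned char)*s))
            return 0;
    }
    return 1;
}
```

With this, is_identifier("foo_1") is true and is_identifier("1foo") is
false whatever setlocale() has been called with, so the same source file
parses the same way on every machine.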
> Unicode string comparison and normalisation issues - I might be being
> forgetful, but I was under the impression C99 added Unicode compliant
> wide character comparison functions - perhaps these should be used if
> present?
You might not want to use wide chars at all
(there are pros and cons compared to using UTF-8 internally).
For UTF-8, good old strcoll/strxfrm (and hence Lua, whose string
comparison uses strcoll) does the job, given appropriate locale settings.
That said, many consider the "locale" mechanism broken,
and a full implementation of the Unicode Collation Algorithm
is bound to be quite expensive.
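
For illustration, a minimal C sketch of the strcoll/strxfrm route. The
helper names (use_collation, collate_cmp, collation_key) are invented
for this example, and which locale names are available, for instance
something like "en_US.UTF-8", depends entirely on the platform.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Select the collation locale; returns 1 on success, 0 if the platform
   does not know the requested name. */
int use_collation(const char *name)
{
    assert(name != NULL);
    return setlocale(LC_COLLATE, name) != NULL;
}

/* Three-way comparison under the current LC_COLLATE setting,
   normalised to -1, 0 or 1. */
int collate_cmp(const char *a, const char *b)
{
    int r = strcoll(a, b);
    return (r > 0) - (r < 0);
}

/* Build a sort key with strxfrm: comparing two keys with strcmp orders
   them the way strcoll orders the original strings. The caller frees
   the result; NULL is returned if allocation fails. */
char *collation_key(const char *s)
{
    size_t n = strxfrm(NULL, s, 0) + 1;   /* required size incl. NUL */
    char *key = malloc(n);
    if (key != NULL)
        strxfrm(key, s, n);
    return key;
}
```

Precomputing keys pays off when sorting many strings, since each string
is transformed once instead of on every comparison. Whether the result
matches what users expect still depends on the quality of the locale
data installed on the system.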
regards