On Sat, Apr 29, 2017 at 9:41 AM, Dirk Laurie <dirk.laurie@gmail.com> wrote:
2017-04-29 15:21 GMT+02:00 Roberto Ierusalimschy <roberto@inf.puc-rio.br>:
At present all the entries from 0x80 to 0xFF in the constant array
luai_ctype in lctype.c are zero: no bit set.
There are three unused bits. Couldn't two of them be used to mean
UTF8_FIRST and UTF8_CONT?
This is only the first step, but if the idea is shot down here already,
the others need not be mentioned.
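For illustration, the two proposed flags could be sketched as below. This is a minimal sketch and not the actual contents of lctype.c: the names UTF8FIRSTBIT, UTF8CONTBIT and utf8_flags are hypothetical, and the bit positions 5 and 6 assume that the existing classification bits occupy positions 0 to 4. The byte ranges are those of standard UTF-8 (0xC0, 0xC1 and 0xF5 to 0xFF can never appear in valid UTF-8).

```c
/* Hypothetical bit positions, assumed to be free in the ctype table. */
#define UTF8FIRSTBIT 5
#define UTF8CONTBIT  6
#define MASK(b)      (1 << (b))

/* Flags that a table entry for byte c would carry under this proposal.
   Returns 0 for ASCII bytes and for bytes that never occur in UTF-8. */
static unsigned char utf8_flags(unsigned char c) {
  if (c >= 0x80 && c <= 0xBF)
    return MASK(UTF8CONTBIT);   /* 10xxxxxx: continuation byte */
  if (c >= 0xC2 && c <= 0xF4)
    return MASK(UTF8FIRSTBIT);  /* lead byte of a 2-, 3- or 4-byte sequence */
  return 0;
}
```

In a real table the values would be written out as constants; computing them in a function here only keeps the sketch short.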
This particular idea has very low cost, so I don't see why we would shoot it
down before knowing the rest of the story. What does it mean for Lua
to be "UTF-8 aware"?
-- Roberto
The next step would be a compiler option under which the lexer
accepts a UTF-8 first character followed by the correct number
of UTF-8 continuation characters as being alphabetic for the
purpose of being an identifier or part of one.
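To make the proposed rule concrete, here is a rough sketch of such a check. It is not Lua's lexer: utf8_expected_cont, utf8_seq_len and ident_len are hypothetical helpers written only for this illustration. It tests exactly what is described above (a lead byte followed by the correct number of continuation bytes) and deliberately does not reject overlong encodings or surrogates, which a stricter validator would.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Continuation bytes expected after lead byte c, or -1 if c is not a lead. */
static int utf8_expected_cont(unsigned char c) {
  if (c >= 0xC2 && c <= 0xDF) return 1;
  if (c >= 0xE0 && c <= 0xEF) return 2;
  if (c >= 0xF0 && c <= 0xF4) return 3;
  return -1;
}

/* Length in bytes of the lead + continuation sequence starting at s
   (n bytes available), or 0 if the sequence is malformed or truncated. */
static size_t utf8_seq_len(const unsigned char *s, size_t n) {
  int k, i;
  if (n == 0) return 0;
  k = utf8_expected_cont(s[0]);
  if (k < 0 || (size_t)k >= n) return 0;
  for (i = 1; i <= k; i++)
    if ((s[i] & 0xC0) != 0x80) return 0;  /* not a 10xxxxxx byte */
  return (size_t)k + 1;
}

/* Length of the identifier at the start of str, treating every such
   UTF-8 sequence as alphabetic. Returns 0 if no identifier starts there. */
static size_t ident_len(const char *str) {
  const unsigned char *s = (const unsigned char *)str;
  size_t n = strlen(str), i = 0;
  while (i < n) {
    unsigned char c = s[i];
    if (c < 0x80) {  /* ASCII: letters, '_', and digits after the first char */
      if (c == '_' || isalpha(c) || (i > 0 && isdigit(c))) i++;
      else break;
    } else {         /* non-ASCII: must be a complete sequence */
      size_t len = utf8_seq_len(s + i, n - i);
      if (len == 0) break;
      i += len;
    }
  }
  return i;
}
```

With these helpers, ident_len("caf\xC3\xA9 = 1") counts the two-byte "é" as part of the name, while a stray continuation byte or a truncated sequence ends the identifier.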
I'm strongly against even inching towards this destination. Lua is a
*language*. As soon as we start allowing identifiers outside of ASCII,
we begin to cultivate "dialects". Only with full support would
anyone's Lua be able to load scripts written with identifiers from
another language. And, of course, programmers not fluent in that
language would be at great disadvantage.
Air traffic control standardized on English so that any pilot
can communicate with any air traffic controller. In the same way, I think
it makes a lot of sense for programmers to accept that English is the
lingua franca for programming, including comments, documentation, and
identifiers. There really is no upside to allowing non-English (non-ASCII)
identifiers.
Maybe that's self-serving coming from an American, for which I apologize.