- Subject: Re: Could Lua itself become UTF8-aware?
- From: Marc Balmer <marc@...>
- Date: Sun, 30 Apr 2017 12:11:29 +0200
On 30.04.17 at 05:05, Patrick Donnelly wrote:
> On Sat, Apr 29, 2017 at 9:41 AM, Dirk Laurie <dirk.laurie@gmail.com> wrote:
>> 2017-04-29 15:21 GMT+02:00 Roberto Ierusalimschy <roberto@inf.puc-rio.br>:
>>>> At present all the entries from 0x80 to 0xFF in the constant array
>>>> luai_ctype in lctype.c are zero: no bit set.
>>>>
>>>> There are three unused bits. Couldn't two of them be used to mean
>>>> UTF8_FIRST and UTF8_CONT?
>>>>
>>>> This is only the first step, but if the idea is shot down here already,
>>>> the others need not be mentioned.
>>>
>>> This particular idea has very low cost, so I don't see why to shoot it
>>> down before knowing the rest of the story. What does it mean for Lua
>>> to be "UTF-8 aware"?
>>>
>>> -- Roberto
>>
>> The next step would be a compiler option under which the lexer
>> accepts a UTF-8 first character followed by the correct number
>> of UTF-8 continuation characters as being alphabetic for the
>> purpose of being an identifier or part of one.
>
> I'm very against even inching towards this destination. Lua is a
> *language*. As soon as we start allowing identifiers outside of ASCII,
> we begin to cultivate "dialects". Only with full support would
> anyone's Lua be able to load scripts written with identifiers from
> another language. And, of course, programmers not fluent in that
> language would be at great disadvantage.
>
> Air traffic control for flight standardized on English so any pilot
> can communicate with any flight controller. In the same way, I think
> it makes a lot of sense for programmers to accept that English is the
> lingua franca for programming, including comments, documentation, and
> identifiers. There really is no upside to allowing non-English (ASCII)
> identifiers.
>
> Maybe that's self-serving as an American for which I apologize.
This comparison is flawed. Although ATCs _use_ English for their
communication with pilots, this does not mean that they speak no other
languages. Using English is a convention, and it is not enforced
by making sure a newborn learns only English, learns no other languages,
and 25 years later becomes an ATC.
Using English is not technically enforced; it is a convention.
So even if Lua were to allow emojis as identifiers, you would not be forced
to use them. You can, by convention, restrict yourself to ASCII only.