On 29/06/2011 5.29, Tom N Harris wrote:
> On 06/28/2011 04:24 PM, Lorenzo Donati wrote:
>> Unicode escape sequences are platform independent. They are useful for the same reasons why ASCII codes are useful, at least for people working with Unicode.
>
> Technically, Lua doesn't even require ASCII,
I admit my sentence was too terse, but I didn't mean that Lua supports ASCII (the manual explicitly states that string.byte returns codes that are not necessarily portable). I meant that, in general, if a language supports a specific character set (ASCII was just an example), it is useful to be able to write character codes in a program instead of the characters themselves. And if that is useful for a given pre-Unicode charset, it is useful for Unicode too, for the same reasons.
> as the recent adventures with lctype.c have shown. Unicode is platform specific because not all platforms use the same encoding (UTF-8 vs UTF-16). And when Unicode isn't being used at all this will just be dead-weight in the parser.
Well, I'm not an expert, but apart from the different encodings (UTF-8, UTF-16, UTF-32 and their endianness variants), Unicode is standardized. So if you write a file in UTF-8, the byte sequence for, say, a smiley will be the same on any computer on Earth that claims UTF-8 support. There is no risk of "codepage hell". Of course there are lots of non-conforming or partially conforming applications and systems, but that's another matter.
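To make that concrete, here is a minimal sketch in C (chosen only because Lua's lexer is written in C; utf16_encode and utf16_to_bytes are hypothetical helpers, not Lua API). It encodes a code point as UTF-16 and serializes the code units in either byte order. For U+263A (WHITE SMILING FACE) it yields 26 3A in UTF-16BE and 3A 26 in UTF-16LE, whereas UTF-8 always gives E2 98 BA: the bytes depend on which encoding and byte order you pick, but once that choice is made the standard fixes them exactly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one Unicode code point as UTF-16 code units.
   Returns the number of units written (1 or 2), or 0 if cp is not a
   valid Unicode scalar value (above U+10FFFF or a surrogate). */
size_t utf16_encode(uint32_t cp, uint16_t units[2])
{
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;
    if (cp < 0x10000) {                 /* BMP: a single code unit */
        units[0] = (uint16_t)cp;
        return 1;
    }
    cp -= 0x10000;                      /* 20 bits remain */
    units[0] = (uint16_t)(0xD800 + (cp >> 10));    /* high surrogate */
    units[1] = (uint16_t)(0xDC00 + (cp & 0x3FF));  /* low surrogate */
    return 2;
}

/* Hypothetical helper: serialize code units to bytes, big-endian if
   big_endian is nonzero, little-endian otherwise. out must have room
   for 2 * count bytes. Returns the number of bytes written. */
size_t utf16_to_bytes(const uint16_t *units, size_t count,
                      int big_endian, unsigned char *out)
{
    assert(units != NULL && out != NULL);
    for (size_t i = 0; i < count; i++) {
        unsigned char hi = (unsigned char)(units[i] >> 8);
        unsigned char lo = (unsigned char)(units[i] & 0xFF);
        out[2 * i]     = big_endian ? hi : lo;
        out[2 * i + 1] = big_endian ? lo : hi;
    }
    return 2 * count;
}
```

A code point above U+FFFF, such as U+1F600, becomes a surrogate pair (D83D DE00), so in UTF-16 one character does not always mean one code unit.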
> How about supporting escape sequences greater than 255 when sizeof(char)>1?
I don't understand exactly what you mean. Do you mean writing, for example (assuming a new multibyte escape sequence \GXXXX...), \G10fa1b instead of \x10\xfa\x1b (here I'm assuming the new Lua 5.2 \x escape sequences)?
The power of dedicated Unicode escape sequences is that Lua would do the work for you: it would translate a code point into the corresponding byte sequence for, say, the UTF-8 encoding.
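A minimal sketch, in C, of the translation such an escape would hand off to the lexer: encoding a single code point as UTF-8 (the standard bit layout from RFC 3629). utf8_encode is a hypothetical name, not part of Lua. Under the hypothetical \G syntax above, \G10fa1b would name one character, U+10FA1B (a private-use code point), and this function produces F4 8F A8 9B for it, whereas \x10\xfa\x1b is simply three unrelated raw bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one code point as UTF-8 into out (1..4 bytes).
   Returns the number of bytes written, or 0 for values outside
   U+0000..U+10FFFF or in the surrogate range U+D800..U+DFFF,
   which UTF-8 does not encode. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    assert(out != NULL);
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;
    if (cp < 0x80) {                    /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                   /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                 /* 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    /* 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}
```

For the smiley, U+263A, it writes E2 98 BA. Rejecting surrogates is a choice made in this sketch; a real lexer could reasonably decide otherwise.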
-- Lorenzo