Re: Managing Unicode (UTF-8 and UTF-16) data in Lua

lua-l archive

Subject: Re: Managing Unicode (UTF-8 and UTF-16) data in Lua
From: Coda Highland <chighland@...>
Date: Sun, 7 Aug 2016 13:21:04 -0700

On Sun, Aug 7, 2016 at 7:59 AM, Egor Skriptunoff
<egor.skriptunoff@gmail.com> wrote:
>
>> > Operations on fixed width character strings (such as UTF-16) are
>> > processed faster.
>>
>> UTF-16 isn't fixed char width.
>
>
> Yes, you are absolutely correct.
> UTF-16 uses surrogate pairs to represent codepoints above 0x10000.
> But Windows does not support them.
> When you are writing a surrogate-pair-symbol to Windows console
> (I've tested this on Win7 with a simple program using WriteConsoleW),
> it gets displayed as two question marks,
> that is, Windows considers it as two separate symbols instead of just one.
>
> If Windows does not support surrogate pairs, why should we?
> That's why we can treat UTF-16 on Windows as fixed-char-width encoding.
>
> Of course, this means that 100% correct Unicode "print()" function is
> non-implementable for Windows console applications.
>

Windows DOES "support" surrogates -- it moved from UCS-2 (equivalent
to UTF-16 constrained to the BMP) to UTF-16 a long time ago (Windows
2000, as far as I know, so well before Win7). But it supports them
only in the sense that it can render them correctly and won't corrupt
them if they are present. The support is roughly equivalent to Lua's
UTF-8 support: if you know what you're doing and you explicitly ask
for it, it can deal with them, but if you just use the naive
wide-string functions, each surrogate pair is treated as two separate
characters.
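
To make the "naive versus explicit" distinction concrete, here is a
minimal sketch in Python (chosen only for illustration; it is not
Windows or Lua code). The helper names utf16_units and
count_code_points are hypothetical. The first count is what a naive
16-bit-unit length function would report; the second pairs up
surrogates explicitly.

```python
import struct

def utf16_units(text):
    """Return the UTF-16 code units of text as a list of integers."""
    raw = text.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(raw) // 2), raw))

def count_code_points(units):
    """Count code points, treating a high+low surrogate pair as one."""
    count = 0
    i = 0
    while i < len(units):
        is_high = 0xD800 <= units[i] <= 0xDBFF
        has_low = i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF
        # A well-formed pair consumes two units; anything else consumes one.
        i += 2 if (is_high and has_low) else 1
        count += 1
    return count

sample = "a\U0001F600"   # 'a' followed by U+1F600, which lies outside the BMP
units = utf16_units(sample)
print(len(units))                # naive count of 16-bit units
print(count_code_points(units))  # surrogate-aware count
```

The two counts differ by one for every code point outside the BMP,
which is the same kind of gap you see in Lua 5.3 between #s (bytes)
and utf8.len(s) (code points) for a UTF-8 string.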

/s/ Adam
