Re: Should Lua be more strict about Unicode errors?

lua-l archive

Subject: Re: Should Lua be more strict about Unicode errors?
From: Ricardo Ramos Massaro <ricardo.massaro@...>
Date: Thu, 3 Sep 2015 04:21:56 -0300

On Wed, Sep 2, 2015 at 4:24 PM, Dirk Laurie <dirk.laurie@gmail.com> wrote:
> Actually, I have only just for the first time ever read all of the
> Wikipedia page. At the bottom, it says:
>
> WTF-8 (Wobbly Transformation Format − 8-bit) is UTF-8 where the
> encodings of the surrogate halves (U+D800 through U+DFFF) are allowed.
> This is necessary to store possibly-invalid UTF-16, such as Windows
> filenames. The term seems to have come from the Rust programming
> language.[31] Many systems that deal with UTF-8 work this way without
> considering it a different encoding, as it is simpler. The source code
> samples above work this way, for instance.
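
To make "encodings of the surrogate halves" concrete at the byte level: a lone surrogate such as U+D800 has a three-byte generalized-UTF-8 form (ED A0 80). A strict UTF-8 decoder must reject it, but WTF-8 permits it. The minimal sketch below uses Python 3's built-in `surrogatepass` error handler as a stand-in for a lenient, WTF-8-style codec. This is only an approximation: `surrogatepass` will also happily write a complete surrogate pair as two three-byte sequences, which WTF-8 itself forbids.

```python
# A lone (unpaired) leading surrogate, as found in ill-formed UTF-16.
lone = "\ud800"

# Lenient encoding: the surrogate becomes a three-byte sequence.
wobbly = lone.encode("utf-8", "surrogatepass")
print(wobbly)  # b'\xed\xa0\x80'

# A strict UTF-8 decoder refuses those same bytes.
try:
    wobbly.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False
print(strict_ok)  # False

# The lenient handler round-trips them back to the lone surrogate.
print(wobbly.decode("utf-8", "surrogatepass") == lone)  # True
```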

Note that Wikipedia is misleading when it says "Many systems that deal
with UTF-8 work this way without considering it a different encoding,
as it is simpler."

WTF-8 dictates that you take special care when concatenating strings:
if the first string ends with a leading surrogate half and the second
string starts with a trailing surrogate half, you have to merge the
two surrogate halves into a single code point encoded in valid UTF-8.
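
The concatenation rule above can be sketched in Python, working directly on byte strings. `wtf8_concat` and `_surrogate_at` are hypothetical helpers invented for this illustration, not part of any library. The sketch assumes both inputs are already well-formed WTF-8, so byte ED can only start a three-byte sequence, and a surrogate half at the boundary always appears as `ED A0..BF xx`.

```python
def _surrogate_at(b: bytes, i: int):
    """Return the surrogate code point encoded at b[i:i+3], or None."""
    # U+D800..U+DFFF are encoded in generalized UTF-8 as ED A0..BF 80..BF.
    if i >= 0 and len(b) - i >= 3 and b[i] == 0xED and 0xA0 <= b[i + 1] <= 0xBF:
        return 0xD000 | ((b[i + 1] & 0x3F) << 6) | (b[i + 2] & 0x3F)
    return None


def wtf8_concat(left: bytes, right: bytes) -> bytes:
    """Concatenate two WTF-8 strings, merging a surrogate pair split across them."""
    lead = _surrogate_at(left, len(left) - 3)
    trail = _surrogate_at(right, 0)
    if (lead is not None and 0xD800 <= lead <= 0xDBFF
            and trail is not None and 0xDC00 <= trail <= 0xDFFF):
        # Combine the two halves into one supplementary code point...
        cp = 0x10000 + ((lead - 0xD800) << 10) + (trail - 0xDC00)
        # ...and emit it as an ordinary four-byte UTF-8 sequence.
        return left[:-3] + chr(cp).encode("utf-8") + right[3:]
    # No split pair at the boundary: plain byte concatenation is correct.
    return left + right


# U+1F600 split into its UTF-16 halves, D83D and DE00.
left = "a\ud83d".encode("utf-8", "surrogatepass")
right = "\ude00b".encode("utf-8", "surrogatepass")
joined = wtf8_concat(left, right)
print(joined == "a\U0001F600b".encode("utf-8"))  # True
print(len(left + right), len(joined))  # 8 6
```

The naive result is two bytes longer because each half costs three bytes, while the merged code point needs only four.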

This is a minor point, but it's important to note that Lua can't claim
to support WTF-8 in its current state (nor am I suggesting it should).
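
Lua's `..` operator joins strings byte for byte without inspecting their contents. As a result, a leading surrogate at the end of one string and a trailing surrogate at the start of the next end up as two adjacent three-byte sequences, which is exactly what WTF-8 rules out. The check below is a hypothetical helper (`has_split_pair` is a name invented here) that uses Python `bytes` concatenation to stand in for Lua's `..`. It assumes the input is otherwise well-formed generalized UTF-8 and is not a complete WTF-8 validator.

```python
import re

# An encoded leading surrogate (ED A0..AF xx) immediately followed by an
# encoded trailing surrogate (ED B0..BF xx). WTF-8 requires such a pair to
# be written as one four-byte sequence instead.
_SPLIT_PAIR = re.compile(rb"\xed[\xa0-\xaf][\x80-\xbf]\xed[\xb0-\xbf][\x80-\xbf]")


def has_split_pair(b: bytes) -> bool:
    """True if b contains a surrogate pair encoded as two separate halves."""
    return _SPLIT_PAIR.search(b) is not None


left = "a\ud83d".encode("utf-8", "surrogatepass")
right = "\ude00b".encode("utf-8", "surrogatepass")
naive = left + right  # plain byte concatenation, like Lua's `..`
print(has_split_pair(naive))  # True
print(has_split_pair("a\U0001F600b".encode("utf-8")))  # False
```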

-Ricardo

Follow-Ups:
- Re: Should Lua be more strict about Unicode errors?, Coda Highland

References:
- Re: Should Lua be more strict about Unicode errors?, Jay Carlson
- Re: Should Lua be more strict about Unicode errors?, Soni L.
- Re: Should Lua be more strict about Unicode errors?, Roberto Ierusalimschy
- Re: Should Lua be more strict about Unicode errors?, Coda Highland
- Re: Should Lua be more strict about Unicode errors?, Dirk Laurie