- Subject: Re: Should Lua be more strict about Unicode errors?
- From: Coda Highland <chighland@...>
- Date: Fri, 4 Sep 2015 14:38:54 -0700
On Fri, Sep 4, 2015 at 2:23 PM, Jay Carlson <nop@nop.com> wrote:
> On 2015-09-02, at 2:59 PM, Dirk Laurie <dirk.laurie@gmail.com> wrote:
>
>> I estimate that not more than 1% of people who
>> have read the Lua manual have also read RFC3629. Quite a
>> few more have read the Wikipedia page,
>
> I’ll still take the INTERNET STANDARD over some Wikipedia page as my appeal to authority.
>
> I suppose I could fix the Wikipedia page. This part needs editing, and/or to be moved to the “Derivatives” section:
>
> ===
>> Whether an actual application should do this is debatable, as it makes
>> it impossible to store invalid UTF-16 (that is, UTF-16 with unpaired
>> surrogate halves) in a UTF-8 string.
> ===
>
> It is impossible to represent invalid UTF-16-like sequences as a UTF-8 sequence. UTF-8 and UTF-16 map the same number of codepoints, so where would you put the extra codes in UTF-8?
>
> If you have requirements for UTF-8-like string handling which require non-standard behavior, please call the derived format something else. “UTF-8” really does mean something.
>
> Jay

Standards are only standards when they're actually used. Sometimes,
the de jure standard is ignored and informal de facto standards arise
by practical consensus.

Besides, the standard maxim in these cases is "be liberal in what you
accept; be conservative in what you send." Why should you throw an
error when reading data that diverges from the standard if the result
is still meaningful? Sure, don't GENERATE these UTF-8 codes, but don't
barf on them either.
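
To make the accept/send split concrete, here is a minimal sketch in
Python (not Lua, and not taken from any Lua implementation). It assumes
the "divergent" input is a lone surrogate encoded with the ordinary
3-byte UTF-8 pattern, which RFC 3629 forbids. The helper names
decode_liberal and encode_conservative are made up for illustration;
the only library feature used is Python's built-in "surrogatepass"
error handler.

```python
# U+D800 (an unpaired high surrogate) written with the generic 3-byte
# UTF-8 bit pattern. RFC 3629 forbids this, so strict decoders reject it.
raw = b"a\xed\xa0\x80b"


def decode_liberal(data: bytes) -> str:
    """Hypothetical helper: accept standard UTF-8 plus encoded surrogates."""
    return data.decode("utf-8", errors="surrogatepass")


def encode_conservative(text: str) -> bytes:
    """Hypothetical helper: emit only standard UTF-8.

    Raises UnicodeEncodeError if the text contains a surrogate.
    """
    return text.encode("utf-8")  # default error handling is "strict"


# Liberal in what we accept: the surrogate survives as a code point.
text = decode_liberal(raw)

# A strict reader refuses the same bytes.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# Conservative in what we send: the surrogate is never written back out.
try:
    encode_conservative(text)
    emitted = True
except UnicodeEncodeError:
    emitted = False
```

Whether a reader should actually behave like decode_liberal is exactly
the point under debate in this thread; the sketch only shows that the
two policies can be kept separate.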
/s/ Adam