- Subject: Re: Changes in the validation of UTF-8
- From: Andrew Gierth <andrew@...>
- Date: Sun, 17 Mar 2019 21:01:40 +0000
>>>>> "Dirk" == Dirk Laurie <dirk.laurie@gmail.com> writes:
Dirk> Lua in no way even comes close to validating against the current
Dirk> UTF-8 standard. We've been through this before. Marc Balmer in
Dirk> particular has been quite trenchant on this point.
Other than the fact that it fails to reject encoded surrogates, what
invalid sequence does the code in Lua 5.3.5 accept?
Dirk> All that Lua does is to verify that a string satisfies the basic
Dirk> UTF-8 encoding: ASCII or a starting byte whose introductory
Dirk> string of 1's says how many bytes in total are being encoded,
Dirk> followed by the right number of 10... bytes.
That's ... not what the 5.3.5 utf8_decode does. Did you read it? Test
it?
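
To make the distinction concrete, here is a minimal sketch in C of the
two levels of checking being contrasted. It is not the code from Lua
5.3.5; the names structural_ok, strict_ok and reject_surrogates are
hypothetical, chosen only for illustration. The first function does the
purely structural check described in the quoted text; the second also
rejects overlong forms and values above 10FFFF, and rejects encoded
surrogates only when asked to.

```c
#include <stddef.h>

/* Decode the lead byte at c: store the number of continuation bytes in
   *extra and the lead byte's payload bits in *cp. Returns 0 for a stray
   continuation byte or an invalid lead byte. */
static int lead_info(unsigned char c, size_t *extra, unsigned long *cp) {
    if (c < 0x80)                { *extra = 0; *cp = c; }
    else if ((c & 0xE0) == 0xC0) { *extra = 1; *cp = c & 0x1F; }
    else if ((c & 0xF0) == 0xE0) { *extra = 2; *cp = c & 0x0F; }
    else if ((c & 0xF8) == 0xF0) { *extra = 3; *cp = c & 0x07; }
    else return 0;
    return 1;
}

/* Structural check only: each lead byte announces a length and is
   followed by that many 10xxxxxx bytes. Decoded values are ignored. */
int structural_ok(const unsigned char *s, size_t n) {
    size_t i = 0;
    while (i < n) {
        size_t extra, k;
        unsigned long cp;
        if (!lead_info(s[i], &extra, &cp)) return 0;
        if (extra > n - i - 1) return 0;           /* truncated sequence */
        for (k = 1; k <= extra; k++)
            if ((s[i + k] & 0xC0) != 0x80) return 0;
        i += extra + 1;
    }
    return 1;
}

/* Stricter check: same structure, plus the decoded value must not be
   overlong or above 10FFFF. Surrogates (D800..DFFF) are rejected only
   when reject_surrogates is nonzero. */
int strict_ok(const unsigned char *s, size_t n, int reject_surrogates) {
    /* smallest code point that needs 1, 2, 3, 4 bytes respectively */
    static const unsigned long min_cp[] = {0x0, 0x80, 0x800, 0x10000};
    size_t i = 0;
    while (i < n) {
        size_t extra, k;
        unsigned long cp;
        if (!lead_info(s[i], &extra, &cp)) return 0;
        if (extra > n - i - 1) return 0;           /* truncated sequence */
        for (k = 1; k <= extra; k++) {
            if ((s[i + k] & 0xC0) != 0x80) return 0;
            cp = (cp << 6) | (s[i + k] & 0x3F);    /* append 6 payload bits */
        }
        if (cp < min_cp[extra]) return 0;          /* overlong encoding */
        if (cp > 0x10FFFF) return 0;               /* beyond Unicode range */
        if (reject_surrogates && cp >= 0xD800 && cp <= 0xDFFF) return 0;
        i += extra + 1;
    }
    return 1;
}
```

With this sketch, the overlong pair C0 80 and the out-of-range sequence
F4 90 80 80 pass structural_ok but fail strict_ok, while the encoded
surrogate ED A0 80 passes strict_ok only when reject_surrogates is 0.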
--
Andrew.