- Subject: Re: Small UTF-8 encoder
- From: Jay Carlson <nop@...>
- Date: Tue, 19 Jun 2012 10:49:02 -0400
On Jun 19, 2012, at 10:14 AM, Xavier Wang wrote:
> local function utf8_sep(n, a, ...)
> notice it will produce all value in 32bit, but not yours, yours will
> fall on 2097152
UTF-8 now stops at U+10FFFF, so both produce valid output inside their domain. I've talked about using the no-longer-valid area above this to dynamically encode grapheme clusters in an internal "DUTF-8", but that's just a pipe dream for now.
Also, neither implementation screens out the surrogate area (U+D800 through U+DFFF).
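For illustration, here is a rough sketch of what screening both cases could look like. It is not either of the encoders posted in this thread; the name utf8_encode and the nil-plus-message error convention are my own assumptions, and it sticks to arithmetic that works without a bit library.

```lua
-- Hypothetical encoder (not from the thread): returns the UTF-8 bytes for a
-- code point, or nil plus a message for values UTF-8 no longer allows.
local function utf8_encode(cp)
  -- Reject non-numbers, non-integers, and anything outside 0..0x10FFFF.
  if type(cp) ~= "number" or cp % 1 ~= 0 or cp < 0 or cp > 0x10FFFF then
    return nil, "code point out of range"
  end
  -- Reject the surrogate area, which is reserved for UTF-16.
  if cp >= 0xD800 and cp <= 0xDFFF then
    return nil, "surrogate code point"
  end
  local floor, char = math.floor, string.char
  if cp < 0x80 then
    -- 1 byte: 0xxxxxxx
    return char(cp)
  elseif cp < 0x800 then
    -- 2 bytes: 110xxxxx 10xxxxxx
    return char(0xC0 + floor(cp / 0x40),
                0x80 + cp % 0x40)
  elseif cp < 0x10000 then
    -- 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
    return char(0xE0 + floor(cp / 0x1000),
                0x80 + floor(cp / 0x40) % 0x40,
                0x80 + cp % 0x40)
  else
    -- 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return char(0xF0 + floor(cp / 0x40000),
                0x80 + floor(cp / 0x1000) % 0x40,
                0x80 + floor(cp / 0x40) % 0x40,
                0x80 + cp % 0x40)
  end
end
```

With this, utf8_encode(0x20AC) gives the three bytes E2 82 AC, while utf8_encode(0xD800) and utf8_encode(0x110000) both return nil and a message. Whether to return nil, raise an error, or substitute U+FFFD is a design choice I'm not taking a position on here.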
Jay