- Subject: Re: Of Unicode in the next Lua version
- From: Pierre-Yves Gérardy <pygy79@...>
- Date: Sun, 16 Jun 2013 00:56:55 +0200
On Sat, Jun 15, 2013 at 10:08 PM, Jay Carlson <nop@nop.com> wrote:
> UTF-8 is constructed such that Unicode code points are ordered lexicographically under 8-bit strcmp. So you can replace that with
>
> -- all three arguments are UTF-8 encoded strings holding one code point each
> function utf8.inrange(single_codepoint, lower_codepoint, upper_codepoint)
>   return single_codepoint >= lower_codepoint and single_codepoint <= upper_codepoint;
> end
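A minimal sketch of the quoted idea in plain Lua, for illustration only. It assumes valid UTF-8 input and the default "C" locale, since Lua compares strings with strcoll and only then is the comparison byte-by-byte. The local name inrange and the sample characters are my own choices, not part of any proposed library.

```lua
-- Sketch only: relies on byte-wise string comparison (default "C" locale)
-- and on every argument being exactly one valid UTF-8 encoded code point.
local function inrange(ch, lower, upper)
  return ch >= lower and ch <= upper
end

-- Decimal byte escapes keep this source ASCII-only.
local a_grave     = "\195\160"      -- U+00E0, bytes C3 A0
local e_acute     = "\195\169"      -- U+00E9, bytes C3 A9
local y_diaeresis = "\195\191"      -- U+00FF, bytes C3 BF
local euro        = "\226\130\172"  -- U+20AC, bytes E2 82 AC

print(inrange(e_acute, a_grave, y_diaeresis))  --> true
print(inrange(euro, a_grave, y_diaeresis))     --> false (lead byte E2 sorts after C3)
print(inrange("z", a_grave, y_diaeresis))      --> false (ASCII sorts before any lead byte)
```

No decoding step is needed: the encoded strings sort in the same order as the code points they represent.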
I hadn't realized this. I'm accreting knowledge on the go, I've yet to
rigorously explore Unicode... I find UTF-8 beautiful in lots of
regards. UTF-16 baffles me, though. Do you know why they reserved
code points, which are supposed to correspond to symbols, for the
implementation details of an encoding? I wish there were a UTF-16'
that followed the UTF-8 strategy.
> and you don't need to extract the codepoint from a longer string if you write "< upper_codepoint_plus_one"; this lets you test an arbitrary byte offset for range membership.
I don't understand what you mean here :-/
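One plausible reading of the quoted remark, sketched in Lua; this is my interpretation, not something confirmed in the thread. Compare the whole tail of the string starting at a byte offset against the bounds, with the upper bound being the encoding of the code point just past the range. The helper name inrange_at is hypothetical. The sketch assumes valid UTF-8, an offset that falls on a character boundary, and byte-wise comparison (default "C" locale).

```lua
-- Hypothetical helper: true when the code point starting at byte offset i
-- of s lies in the half-open range [lower, upper_plus_one).
-- s:sub(i) copies the tail; fine for a sketch.
local function inrange_at(s, i, lower, upper_plus_one)
  local tail = s:sub(i)
  return tail >= lower and tail < upper_plus_one
end

local s = "na\195\175ve"             -- "naive" with U+00EF (C3 AF) at byte 3
local a_grave          = "\195\160"  -- U+00E0, bytes C3 A0
local past_y_diaeresis = "\196\128"  -- U+0100, one past U+00FF, bytes C4 80

print(inrange_at(s, 3, a_grave, past_y_diaeresis))  --> true  (U+00EF)
print(inrange_at(s, 1, a_grave, past_y_diaeresis))  --> false ("n")

-- Why the exclusive bound matters: a tail that starts with the top code
-- point of the range is longer than that code point alone, so it sorts after it.
print("\195\191x" <= "\195\191")                                --> false
print(inrange_at("\195\191x", 1, a_grave, past_y_diaeresis))    --> true
```

With an inclusive upper bound the last code point of the range would be rejected whenever anything follows it, which is presumably why the quote says "< upper_codepoint_plus_one".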
-- Pierre-Yves