Re: Of Unicode in the next Lua version

lua-l archive

Subject: Re: Of Unicode in the next Lua version
From: Pierre-Yves Gérardy <pygy79@...>
Date: Sun, 16 Jun 2013 00:56:55 +0200

On Sat, Jun 15, 2013 at 10:08 PM, Jay Carlson <nop@nop.com> wrote:
> UTF-8 is constructed such that Unicode code points are ordered lexicographically under 8-bit strcmp. So you can replace that with
>
> -- all three arguments are strings, each holding one UTF-8 encoded code point
> function utf8.inrange(single_codepoint, lower_codepoint, upper_codepoint)
>   return single_codepoint >= lower_codepoint and single_codepoint <= upper_codepoint
> end
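
To make the quoted property concrete, here is a minimal sketch of the same range test on literal byte sequences. It assumes Lua 5.2 or later (for the "\xXX" escapes) and the "C" collation locale, since Lua compares strings with strcoll and that is only guaranteed to be byte-wise there. The local name inrange and the sample code points are chosen for illustration only.

```lua
-- Lua's <, <=, >, >= on strings follow the current locale's collation;
-- force "C" so the comparison is plain byte order.
os.setlocale("C", "collate")

-- Each argument is a string holding exactly one UTF-8 encoded code point.
local function inrange(cp, lower, upper)
  return cp >= lower and cp <= upper
end

local a_grave = "\xC3\xA0"      -- U+00E0
local e_acute = "\xC3\xA9"      -- U+00E9
local y_diaer = "\xC3\xBF"      -- U+00FF
local euro    = "\xE2\x82\xAC"  -- U+20AC

print(inrange(e_acute, a_grave, y_diaer))  --> true
print(inrange(euro, a_grave, y_diaer))     --> false
print(inrange("z", a_grave, y_diaer))      --> false
```

The byte comparison agrees with code point order even when the encodings differ in length, because longer UTF-8 sequences always start with a larger lead byte.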

I hadn't realized this. I'm accreting knowledge as I go, and I've yet to
rigorously explore Unicode... I find UTF-8 beautiful in many
respects. UTF-16 baffles me, though. Do you know why they reserved
codepoints, which are supposed to correspond to symbols, for the
implementation details of an encoding? I wish there were a UTF-16'
that followed the UTF-8 strategy.

> and you don't need to extract the codepoint from a longer string if you write "< upper_codepoint_plus_one"; this lets you test an arbitrary byte offset for range membership.

I don't understand what you mean here :-/
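
One possible reading of the quoted remark, as a sketch only: if the text starting at a byte offset is compared against the bounds directly, an inclusive upper bound fails whenever more bytes follow the code point, while an exclusive bound of "upper plus one" still works. The helper name inrange_at and the sample strings are hypothetical, and the same assumptions apply as before (Lua 5.2+ escapes, "C" collation, offset at the start of a code point).

```lua
os.setlocale("C", "collate")  -- byte-wise string comparison

-- Hypothetical helper: is the code point that starts at byte offset i of s
-- inside [lower, upper_plus_one)? Both bounds are UTF-8 encoded strings.
local function inrange_at(s, i, lower, upper_plus_one)
  -- A UTF-8 sequence is at most 4 bytes, so 4 bytes decide the comparison;
  -- the code point itself is never isolated or decoded.
  local rest = s:sub(i, i + 3)
  return rest >= lower and rest < upper_plus_one
end

local s = "caf\xC3\xA9!"           -- "café!", the é starts at byte 4
local lower = "\xC3\xA0"           -- U+00E0
local upper = "\xC3\xBF"           -- U+00FF
local upper_plus_one = "\xC4\x80"  -- U+0100

print(inrange_at(s, 4, lower, upper_plus_one))  --> true
print(inrange_at(s, 1, lower, upper_plus_one))  --> false

-- Why the exclusive bound: trailing bytes make the text compare greater
-- than the inclusive bound even though its first code point equals it.
local t = "\xC3\xBF!"
print(t <= upper)          --> false
print(t < upper_plus_one)  --> true
```

In C the same test can be done on a pointer into the buffer without copying; in Lua the s:sub call still creates a short substring, so the gain is mainly in not having to find where the code point ends.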

-- Pierre-Yves
