Re: Of Unicode in the next Lua version

lua-l archive

Subject: Re: Of Unicode in the next Lua version
From: Jay Carlson <nop@...>
Date: Thu, 13 Jun 2013 17:36:06 -0400

On Jun 12, 2013, at 9:53 AM, Pierre-Yves Gérardy wrote:

> I just read Roberto's slides from the 2012 Lua workshop, and I have a
> suggestion for the UTF-8 library.
> 
> It is efficient, and often practical, to deal with byte indices, even
> in Unicode strings. It is the approach taken by Julia, and I use it in
> LuLPeg. The API is simple:
> 
>    char, next_pos = getchar(subject, position)
> 
>    S = "∂ƒ"
>    getchar(S, 1) --> '∂', 4

Don't forget getchar(S, 2) --> error("not defined at position 2"), since byte 2 falls in the middle of the three-byte encoding of '∂'. I really like Julia's idea of strings as partial functions of the byte index.
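
For concreteness, here is a rough sketch of what such a getchar might look like in plain Lua. It is not LuLPeg's or Julia's actual code. The error messages and the exact validation rules are my own assumptions, and it only checks the lead byte and the continuation bytes (it does not reject overlong three- and four-byte forms or surrogates). It assumes a positive byte index.

```lua
-- Hypothetical getchar: returns the character that starts at byte index
-- `pos`, plus the byte index of the following character. Raises an error
-- when `pos` is not the first byte of a UTF-8 sequence.
local function getchar(subject, pos)
  local lead = subject:byte(pos)
  if not lead then
    error("position " .. pos .. " is out of range")
  end

  local len
  if lead < 0x80 then len = 1        -- ASCII
  elseif lead < 0xC2 then len = nil  -- continuation byte or invalid lead
  elseif lead < 0xE0 then len = 2
  elseif lead < 0xF0 then len = 3
  elseif lead < 0xF5 then len = 4
  end                                -- 0xF5..0xFF: len stays nil
  if not len then
    error("not defined at position " .. pos)
  end

  -- Every trailing byte must be a continuation byte (0x80..0xBF).
  for i = pos + 1, pos + len - 1 do
    local b = subject:byte(i)
    if not b or b < 0x80 or b > 0xBF then
      error("malformed sequence at position " .. pos)
    end
  end

  return subject:sub(pos, pos + len - 1), pos + len
end

local S = "∂ƒ"
print(getchar(S, 1))         --> ∂   4
print(getchar(S, 4))         --> ƒ   6
print(pcall(getchar, S, 2))  --> false, followed by the error message
```

The caller only ever passes around byte indices that getchar itself handed back, so the error case is hit only when an index is computed some other way.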

> A similar function could return code points instead of strings.

Would you use that much?
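
For reference, the code point variant is not much more work. The sketch below is hypothetical (the name getcodepoint and its error messages are made up for illustration) and uses plain arithmetic rather than bit operations so that it does not depend on a particular Lua version. As with any quick decoder, it assumes mostly well-formed input: it does not reject overlong three- and four-byte forms or surrogates.

```lua
-- Hypothetical getcodepoint: like getchar, but returns the decoded code
-- point (a number) instead of a substring, plus the next byte index.
local function getcodepoint(subject, pos)
  local lead = subject:byte(pos)
  local len, cp
  if not lead then
    error("position " .. pos .. " is out of range")
  elseif lead < 0x80 then
    len, cp = 1, lead                 -- ASCII
  elseif lead >= 0xC2 and lead < 0xE0 then
    len, cp = 2, lead - 0xC0          -- payload is the low 5 bits
  elseif lead >= 0xE0 and lead < 0xF0 then
    len, cp = 3, lead - 0xE0          -- payload is the low 4 bits
  elseif lead >= 0xF0 and lead < 0xF5 then
    len, cp = 4, lead - 0xF0          -- payload is the low 3 bits
  else
    error("not defined at position " .. pos)
  end

  -- Each continuation byte contributes 6 more bits.
  for i = pos + 1, pos + len - 1 do
    local b = subject:byte(i)
    if not b or b < 0x80 or b > 0xBF then
      error("malformed sequence at position " .. pos)
    end
    cp = cp * 64 + (b - 0x80)
  end

  return cp, pos + len
end

local S = "∂ƒ"
print(getcodepoint(S, 1))  --> 8706   4   (U+2202)
print(getcodepoint(S, 4))  --> 402    6   (U+0192)
```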

Miles Bader pointed out that a lot of string iteration code is phrased in terms of gmatch, or should be. In that case, there are no string positions at all. The major problem for UTF-8 would then be convincing the pattern matcher to consume an entire UTF-8 sequence for ".".
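
To make the "." problem concrete: with the stock pattern matcher, "." consumes one byte, but you can spell out "one UTF-8 sequence" by hand as a non-continuation byte followed by any number of continuation bytes. This is only a sketch. The name utf8char is made up, and the pattern assumes the input is already valid UTF-8 (it does no validation and will happily accept bad lead bytes).

```lua
-- One UTF-8 sequence: any byte that is not a continuation byte
-- (0x80..0xBF, written here as decimal escapes), then its continuation bytes.
local utf8char = "[^\128-\191][\128-\191]*"

local S = "∂ƒ!"

-- Iterating with "." walks the string byte by byte.
local nbytes = 0
for _ in S:gmatch(".") do nbytes = nbytes + 1 end

-- Iterating with utf8char walks it character by character,
-- and no byte positions appear anywhere in the loop.
local chars = {}
for ch in S:gmatch(utf8char) do chars[#chars + 1] = ch end

print(nbytes)                     --> 6
print(#chars)                     --> 3
print(table.concat(chars, " | ")) --> ∂ | ƒ | !
```

This works for plain iteration, but it does not help with character classes or quantifiers applied to ".", which is where the pattern matcher itself would need to know about UTF-8.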

Jay
