Of Unicode in the next Lua version

lua-l archive


Subject: Of Unicode in the next Lua version
From: Pierre-Yves Gérardy <pygy79@...>
Date: Wed, 12 Jun 2013 15:53:07 +0200

I just read Roberto's slides from the 2012 Lua workshop [0], and I have a
suggestion for the UTF-8 library.

It is efficient, and often practical, to deal with byte indices, even
in Unicode strings. It is the approach taken by Julia, and I use it in
LuLPeg. The API is simple:

    char, next_pos = getchar(subject, position)

    S = "∂ƒ"
    getchar(S, 1) --> '∂', 4
    getchar(S, 4) --> 'ƒ', 6
    getchar(S, 6) --> nil, nil
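
For illustration, here is a minimal sketch of how such a getchar could be
written in plain Lua. It is not the code used in LuLPeg or Julia; it assumes
that the subject is valid UTF-8, that the position points at the first byte of
a character, and that the source file itself is saved as UTF-8. It only looks
at the lead byte and does not validate continuation bytes.

```lua
-- Sketch only: return the UTF-8 character that starts at byte index `pos`
-- together with the byte index of the following character, or nil, nil
-- when `pos` is past the end of the string.
local function getchar(subject, pos)
  local b = subject:byte(pos)
  if not b then return nil, nil end          -- nothing left to read

  local len
  if b < 0x80 then len = 1                   -- 0xxxxxxx: ASCII
  elseif b >= 0xC2 and b < 0xE0 then len = 2 -- 110xxxxx
  elseif b >= 0xE0 and b < 0xF0 then len = 3 -- 1110xxxx
  elseif b >= 0xF0 and b < 0xF5 then len = 4 -- 11110xxx
  else
    error("invalid UTF-8 lead byte at position " .. pos)
  end

  if pos + len - 1 > #subject then
    error("truncated UTF-8 sequence at position " .. pos)
  end
  return subject:sub(pos, pos + len - 1), pos + len
end

-- Walk the string one character at a time, by byte index.
local S = "∂ƒ"
local pos = 1
while true do
  local char, next_pos = getchar(S, pos)
  if not char then break end
  print(pos, char)
  pos = next_pos
end
```

Because the second return value is the position to pass to the next call, a
loop like the one above never has to count characters from the start of the
string.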

A similar function could return code points instead of strings.
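
A code point variant might look like the sketch below. The name getcodepoint
is hypothetical, not part of any existing library. It uses plain arithmetic
rather than bit operations so that it does not depend on a particular Lua
version, and it checks continuation bytes but does not reject overlong 3- and
4-byte forms or surrogates.

```lua
-- Sketch only: return the code point of the UTF-8 character that starts at
-- byte index `pos`, plus the byte index of the next character, or nil, nil
-- when `pos` is past the end of the string.
local function getcodepoint(subject, pos)
  local b1 = subject:byte(pos)
  if not b1 then return nil, nil end
  if b1 < 0x80 then return b1, pos + 1 end   -- ASCII fast path

  -- Sequence length and the payload bits carried by the lead byte.
  local len, cp
  if b1 >= 0xC2 and b1 < 0xE0 then len, cp = 2, b1 - 0xC0
  elseif b1 >= 0xE0 and b1 < 0xF0 then len, cp = 3, b1 - 0xE0
  elseif b1 >= 0xF0 and b1 < 0xF5 then len, cp = 4, b1 - 0xF0
  else
    error("invalid UTF-8 lead byte at position " .. pos)
  end

  -- Each continuation byte (10xxxxxx) contributes six more bits.
  for i = pos + 1, pos + len - 1 do
    local b = subject:byte(i)
    if not b or b < 0x80 or b > 0xBF then
      error("invalid UTF-8 continuation byte at position " .. i)
    end
    cp = cp * 64 + (b - 0x80)
  end
  return cp, pos + len
end

-- "∂ƒ" written as explicit bytes, so the file encoding does not matter.
local S = "\226\136\130\198\146"
local cp, next_pos = getcodepoint(S, 1)
print(string.format("U+%04X", cp), next_pos)
```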

What do you think about this?

-- Pierre-Yves

[0] http://www.lua.org/wshop12/Ierusalimschy.pdf

Follow-Ups:
- Re: Of Unicode in the next Lua version, Dirk Laurie
- Re: Of Unicode in the next Lua version, Jay Carlson
- Re: Of Unicode in the next Lua version, Roberto Ierusalimschy
