- Subject: Of Unicode in the next Lua version
- From: Pierre-Yves Gérardy <pygy79@...>
- Date: Wed, 12 Jun 2013 15:53:07 +0200
I just read Roberto's slides from the 2012 Lua workshop [0], and I have a
suggestion for the UTF-8 library.
It is efficient, and often practical, to deal with byte indices, even
in Unicode strings. It is the approach taken by Julia, and I use it in
LuLPeg. The API is simple:

    char, next_pos = getchar(subject, position)

    S = "∂ƒ"            -- '∂' takes 3 bytes in UTF-8, 'ƒ' takes 2
    getchar(S, 1) --> '∂', 4
    getchar(S, 4) --> 'ƒ', 6
    getchar(S, 6) --> nil, nil

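To make the proposal concrete, here is a minimal sketch of how getchar could be written in plain Lua. It is only an illustration under stated assumptions: the input is assumed to be mostly well-formed UTF-8, the sequence length is taken from the lead byte alone, and continuation bytes are not validated. The error messages and the choice to raise errors on bad input are my own assumptions, not part of any existing library.

```lua
-- Hypothetical sketch of getchar; not taken from any released library.
-- Returns the character starting at byte index `position` and the byte
-- index of the next character, or nil, nil past the end of the string.
local function getchar(subject, position)
  local b = subject:byte(position)
  if not b then return nil, nil end             -- past the end

  local len
  if b < 0x80 then len = 1                      -- 0xxxxxxx (ASCII)
  elseif b >= 0xC2 and b < 0xE0 then len = 2    -- 110xxxxx
  elseif b >= 0xE0 and b < 0xF0 then len = 3    -- 1110xxxx
  elseif b >= 0xF0 and b < 0xF5 then len = 4    -- 11110xxx
  else
    -- Assumption: a stray continuation byte or invalid lead byte is an error.
    error("invalid UTF-8 lead byte at position " .. position)
  end

  local next_pos = position + len
  if next_pos - 1 > #subject then
    error("truncated UTF-8 sequence at position " .. position)
  end
  return subject:sub(position, next_pos - 1), next_pos
end

-- Usage: walk a string character by character using byte indices only.
local S = "∂ƒ"
local pos = 1
while true do
  local char, next_pos = getchar(S, pos)
  if not char then break end
  print(pos, char)
  pos = next_pos
end
```

Each call does a constant amount of work, so a full traversal stays linear in the byte length of the string, and the returned index can be passed straight to string.sub or string.find.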
A similar function could return code points instead of strings.
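As a rough sketch of that variant, the function below decodes the code point with plain arithmetic so it does not depend on a bit library. The name getcodepoint, the error messages, and the decision to raise errors on malformed input are assumptions made for illustration; overlong encodings and surrogates are not rejected.

```lua
-- Hypothetical sketch; the name getcodepoint is made up for this example.
-- Returns the code point starting at byte index `position` and the byte
-- index of the next character, or nil, nil past the end of the string.
local function getcodepoint(subject, position)
  local b1 = subject:byte(position)
  if not b1 then return nil, nil end                    -- past the end

  local len, cp
  if b1 < 0x80 then len, cp = 1, b1                     -- 0xxxxxxx
  elseif b1 >= 0xC2 and b1 < 0xE0 then len, cp = 2, b1 - 0xC0  -- 110xxxxx
  elseif b1 >= 0xE0 and b1 < 0xF0 then len, cp = 3, b1 - 0xE0  -- 1110xxxx
  elseif b1 >= 0xF0 and b1 < 0xF5 then len, cp = 4, b1 - 0xF0  -- 11110xxx
  else
    error("invalid UTF-8 lead byte at position " .. position)
  end

  -- Each continuation byte (10xxxxxx) contributes six payload bits.
  for i = position + 1, position + len - 1 do
    local b = subject:byte(i)
    if not b or b < 0x80 or b > 0xBF then
      error("invalid UTF-8 continuation byte at position " .. i)
    end
    cp = cp * 64 + (b - 0x80)
  end
  return cp, position + len
end

-- Usage: the positions match those of the string-returning version.
local S = "∂ƒ"
print(getcodepoint(S, 1))
print(getcodepoint(S, 4))
```

For the string above, the two calls should yield U+2202 and U+0192 respectively, with the same next positions (4 and 6) as in the getchar example.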
What do you think about this?
-- Pierre-Yves
[0] http://www.lua.org/wshop12/Ierusalimschy.pdf