> Getting Lua's core to change its view of strings to being something
> other than a byte-sequence isn't going to happen, it's not the Lua way,

Sure.


> Getting a new library into the lua core is unlikely, but could happen.

Very basic support for UTF-8, along the lines suggested by Miles Bader,
seems like a good start. Something more or less like this:

utf8.len(s, [l]) -> number of code points in s up to the 'l'-th byte
(or nil if s is not properly formed)
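
To make the intended behaviour concrete, here is a rough pure-Lua sketch of the counting function. The names utf8_len and seqlen are hypothetical stand-ins, not an existing API. It assumes that "up to the l-th byte" means "count every code point that starts at or before byte l", and it checks only the lead/continuation byte structure (no surrogate or 3- and 4-byte overlong checks).

```lua
-- Hypothetical sketch of the proposed utf8.len; not an existing library function.

-- Length in bytes of the sequence introduced by lead byte b,
-- or nil if b cannot start a sequence.
local function seqlen(b)
  if b < 0x80 then return 1
  elseif b >= 0xC2 and b <= 0xDF then return 2
  elseif b >= 0xE0 and b <= 0xEF then return 3
  elseif b >= 0xF0 and b <= 0xF4 then return 4
  end
  return nil
end

-- Number of code points that start at or before byte l (default: #s),
-- or nil if a malformed or truncated sequence is met.
local function utf8_len(s, l)
  l = l or #s
  if l > #s then l = #s end
  local i, n = 1, 0
  while i <= l do
    local len = seqlen(s:byte(i))
    if not len or i + len - 1 > #s then return nil end
    -- every byte after the lead byte must be a continuation byte
    for k = i + 1, i + len - 1 do
      local c = s:byte(k)
      if c < 0x80 or c > 0xBF then return nil end
    end
    n = n + 1
    i = i + len
  end
  return n
end

print(utf8_len("h\195\169llo"))  -- "héllo": 6 bytes
```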

utf8.byteoffset(s, l) -> offset (in bytes) where the 'l'-th code point
starts

utf8.frontier(s, l) -> offset (in bytes) where the code point containing
the 'l'-th byte starts (ends?)
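
The two offset functions could look roughly like the sketch below. The names utf8_byteoffset, utf8_frontier and is_cont are hypothetical. It assumes s is already well formed, takes the "starts" reading for the frontier, and returns nil for out-of-range arguments, which the proposal does not specify.

```lua
-- Hypothetical sketches of the proposed byteoffset/frontier; s is assumed well formed.

-- True if b is a UTF-8 continuation byte (10xxxxxx).
local function is_cont(b)
  return b ~= nil and b >= 0x80 and b <= 0xBF
end

-- Byte offset where the code point containing byte l starts.
local function utf8_frontier(s, l)
  if l < 1 or l > #s then return nil end
  while l > 1 and is_cont(s:byte(l)) do
    l = l - 1          -- step back over continuation bytes
  end
  return l
end

-- Byte offset where the l-th code point starts,
-- or nil if s has fewer than l code points.
local function utf8_byteoffset(s, l)
  local n = 0
  for i = 1, #s do
    if not is_cont(s:byte(i)) then   -- each non-continuation byte starts a code point
      n = n + 1
      if n == l then return i end
    end
  end
  return nil
end

local s = "h\195\169llo"             -- "héllo"; é occupies bytes 2 and 3
print(utf8_frontier(s, 3), utf8_byteoffset(s, 3))
```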

utf8.codepoint(s, i, j) -> code points in s from *byte* offset i to j
(default i=1, j=i); i adjusts backward and j adjusts forward until a
proper frontier. (Another function that returns a table with those code
points might be useful; {utf8.codepoint(s, 1, -1)} may be too heavy.)
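
A decoding sketch may help show the offset adjustment. utf8_codepoint and is_cont are hypothetical names. The sketch assumes well-formed input, treats negative offsets as counting from the end (as string.sub does), and clamps out-of-range offsets; none of that is settled by the proposal.

```lua
-- Hypothetical sketch of the proposed utf8.codepoint; s is assumed well formed.
local unpack = table.unpack or unpack   -- Lua 5.2+ or Lua 5.1

-- True if b is a UTF-8 continuation byte (10xxxxxx).
local function is_cont(b)
  return b ~= nil and b >= 0x80 and b <= 0xBF
end

local function utf8_codepoint(s, i, j)
  i = i or 1
  j = j or i
  if i < 0 then i = #s + i + 1 end
  if j < 0 then j = #s + j + 1 end
  if i < 1 then i = 1 end
  if j > #s then j = #s end
  -- i adjusts backward to the first byte of its code point
  while i > 1 and is_cont(s:byte(i)) do i = i - 1 end
  -- j adjusts forward to the last byte of its code point
  while j < #s and is_cont(s:byte(j + 1)) do j = j + 1 end

  local out, n, pos = {}, 0, i
  while pos <= j do
    local b = s:byte(pos)
    local cp, len
    if b < 0x80 then cp, len = b, 1
    elseif b < 0xE0 then cp, len = b - 0xC0, 2
    elseif b < 0xF0 then cp, len = b - 0xE0, 3
    else cp, len = b - 0xF0, 4
    end
    -- each continuation byte contributes its low 6 bits
    for k = pos + 1, pos + len - 1 do
      cp = cp * 64 + (s:byte(k) - 0x80)
    end
    n = n + 1
    out[n] = cp
    pos = pos + len
  end
  return unpack(out, 1, n)
end

print(utf8_codepoint("h\195\169llo", 1, -1))
```

Building the intermediate table here is what makes {utf8.codepoint(s, 1, -1)} comparatively heavy: the values are collected, unpacked onto the stack, and then packed into a table again by the caller.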

utf8.char(cp1, cp2, ...) -> string formed by code points cp1, cp2, ...
(If cp1 is a table, string formed by the code points in it?)
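
For the encoding direction, something like the following sketch would do. utf8_char and encode are hypothetical names. It accepts either a list of code points or a single table, as floated above, and it assumes every code point is an integer in the range 0..0x10FFFF without checking.

```lua
-- Hypothetical sketch of the proposed utf8.char; code points are not validated.
local floor = math.floor

-- Encode one code point as a UTF-8 byte string.
local function encode(cp)
  if cp < 0x80 then
    return string.char(cp)
  elseif cp < 0x800 then
    return string.char(0xC0 + floor(cp / 64),
                       0x80 + cp % 64)
  elseif cp < 0x10000 then
    return string.char(0xE0 + floor(cp / 4096),
                       0x80 + floor(cp / 64) % 64,
                       0x80 + cp % 64)
  else
    return string.char(0xF0 + floor(cp / 262144),
                       0x80 + floor(cp / 4096) % 64,
                       0x80 + floor(cp / 64) % 64,
                       0x80 + cp % 64)
  end
end

-- Accepts utf8_char(cp1, cp2, ...) or utf8_char({cp1, cp2, ...}).
local function utf8_char(...)
  local cps = ...                       -- first argument only
  if type(cps) ~= "table" then cps = {...} end
  local parts = {}
  for k = 1, #cps do
    parts[k] = encode(cps[k])
  end
  return table.concat(parts)
end

print(utf8_char(104, 233, 108, 108, 111))
```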

-- Roberto