Alex Queiroz wrote:
[...]
     This is an interesting point. And I thought I had everything
figured out for my VM's text handling...

There's some good reading here (which I hadn't found before):

http://www.unicode.org/reports/tr29

It turns out to be possible to programmatically split a Unicode string up into its component grapheme clusters (what I was incorrectly referring to as glyphs, and what most people think of as characters). So, it ought to be fairly simple to do a Lua addon where you can say:

for c in s:graphemes() do
  print(c)
end

...where c is a *string* containing a particular grapheme cluster (which might be quite long; the link has an example of a four-code-point cluster). This would actually allow a string to be broken down into an array of grapheme clusters to give true random access, which I'd previously thought of as being impossible. It'd be expensive, though... possibly it'd be worth doing lazily.
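
To make that concrete, here is a rough sketch of what such an addon might look like. It is only an approximation, not the real tr29 algorithm: it assumes a Lua with the utf8 library (5.3 or later) and valid UTF-8 input, and the "extender" test covers just a few ranges (combining diacritics, variation selectors, ZWJ), so Hangul syllables, regional indicators, CR LF pairs and the rest of the tr29 rules are not handled. The names is_extender and grapheme_array are made up for illustration.

```lua
-- Simplified sketch, NOT a full UAX #29 implementation.
-- Assumes Lua 5.3+ (utf8 library) and valid UTF-8 input.

-- Hypothetical, deliberately incomplete test for code points that
-- attach to the preceding cluster instead of starting a new one.
local function is_extender(cp)
  return (cp >= 0x0300 and cp <= 0x036F)  -- combining diacritical marks
      or (cp >= 0xFE00 and cp <= 0xFE0F)  -- variation selectors
      or cp == 0x200D                     -- zero width joiner
end

-- Returns an iterator that yields one (approximate) grapheme cluster
-- per call, as a string. Work is done lazily, one cluster at a time.
local function graphemes(s)
  local pos, len = 1, #s
  return function()
    if pos > len then return nil end
    local start = pos
    -- Byte index of the code point following the one at pos.
    pos = utf8.offset(s, 2, pos)
    -- Absorb any extenders that follow the base code point.
    while pos <= len and is_extender(utf8.codepoint(s, pos)) do
      pos = utf8.offset(s, 2, pos)
    end
    return s:sub(start, pos - 1)
  end
end

-- Collects the clusters into a table for random access.
local function grapheme_array(s)
  local t = {}
  for c in graphemes(s) do
    t[#t + 1] = c
  end
  return t
end

-- Patching the string table makes the s:graphemes() syntax work;
-- done here purely for illustration.
string.graphemes = graphemes
```

With this, grapheme_array("e\u{301}a") gives a two-element table whose first entry is the "e" plus its combining acute accent, so t[2] is random access to the second cluster. Building the table is the expensive eager step; the iterator on its own only walks as far as it is asked to.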

--
┌─── dg@cowlark.com ───── http://www.cowlark.com ─────
│
│ "They laughed at Newton. They laughed at Einstein. Of course, they
│ also laughed at Bozo the Clown." --- Carl Sagan