- Subject: Re: Of Unicode in the next Lua version
- From: Pierre-Yves Gérardy <pygy79@...>
- Date: Wed, 12 Jun 2013 17:12:31 +0200
On Wed, Jun 12, 2013 at 5:02 PM, Dirk Laurie <dirk.laurie@gmail.com> wrote:
> If `pos` comes before `char`, one can write an iterator on the model
> of `ipairs`:
>
> for pos,char in utf8(str) do ...
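Almost... but the generic `for` feeds the first returned value back into
the iterator, so the natural stateless version ends up with the position
of the next character there. So you need some trickery. Assuming a valid
UTF-8 string: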
Usage:
for finish, start, char in utf8_next_char, "˙†ƒ˙©√" do
  print(char)
end
`start` and `finish` being the byte bounds of the character, and `char`
being the UTF-8 encoded character itself (a substring, not a numeric
code point).
It produces:
˙
†
ƒ
˙
©
√
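-- Local aliases (assumed; not part of the snippet as posted).
local s_byte, s_sub = string.byte, string.sub
-- utf8_offset(byte) is defined elsewhere; from its use below it returns
-- the number of continuation bytes that follow a lead byte (0 to 3).
local function utf8_next_char (subject, i)
  -- i is the end position of the previous character (nil on the first call)
  i = i and i+1 or 1
  if i > #subject then return end
  local offset = utf8_offset(s_byte(subject,i))
  -- finish, start, character
  return i + offset, i, s_sub(subject, i, i + offset)
end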
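To run the snippet on its own, the three names it relies on need definitions. Here is a minimal self-contained sketch, assuming `s_byte` and `s_sub` are aliases of `string.byte` and `string.sub`, and using a hypothetical `utf8_offset` that maps a lead byte to its number of continuation bytes. It assumes valid UTF-8 and does no validation; `utf8_chars` is a made-up helper that only collects the results for inspection.

```lua
-- Assumed aliases; the names come from the snippet, the definitions do not.
local s_byte, s_sub = string.byte, string.sub

-- Hypothetical helper: number of continuation bytes after a lead byte.
-- Only meaningful for valid UTF-8 (a stray continuation byte is not detected).
local function utf8_offset(lead)
  if lead < 0x80 then return 0      -- 0xxxxxxx: ASCII
  elseif lead < 0xE0 then return 1  -- 110xxxxx: 2-byte sequence
  elseif lead < 0xF0 then return 2  -- 1110xxxx: 3-byte sequence
  else return 3 end                 -- 11110xxx: 4-byte sequence
end

-- Stateless step function: the control variable is the END position
-- of the previous character (nil on the first call).
local function utf8_next_char(subject, i)
  i = i and i + 1 or 1
  if i > #subject then return end
  local offset = utf8_offset(s_byte(subject, i))
  return i + offset, i, s_sub(subject, i, i + offset)
end

-- Made-up helper: gather every (start, finish, char) triple into a table.
local function utf8_chars(subject)
  local out = {}
  for finish, start, char in utf8_next_char, subject do
    out[#out + 1] = { start = start, finish = finish, char = char }
  end
  return out
end
```

Calling `utf8_chars` on a string gives one entry per character, with `start` and `finish` as inclusive byte positions.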
`utf8_next_char` has the annoying property of returning the end position
before the start position, but it is stateless.
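If the `pos, char` order matters more than avoiding repeated work, one possible stateless alternative is to keep the start position as the control variable and skip over the previous character on each step. This is only a sketch under the same assumptions (valid UTF-8, a hypothetical `utf8_offset` returning the continuation-byte count); `utf8_step` and `utf8` are illustrative names, not anything proposed in this thread.

```lua
local s_byte, s_sub = string.byte, string.sub

-- Hypothetical helper: number of continuation bytes after a lead byte.
local function utf8_offset(lead)
  if lead < 0x80 then return 0
  elseif lead < 0xE0 then return 1
  elseif lead < 0xF0 then return 2
  else return 3 end
end

-- Stateless step function: the control variable is the START position
-- of the previous character (0 before the first character).
local function utf8_step(subject, pos)
  if pos > 0 then
    -- Re-read the previous lead byte to find where the next character starts.
    pos = pos + utf8_offset(s_byte(subject, pos)) + 1
  else
    pos = 1
  end
  if pos > #subject then return end
  local last = pos + utf8_offset(s_byte(subject, pos))
  return pos, s_sub(subject, pos, last)
end

-- ipairs-style factory: step function, invariant state, initial control value.
local function utf8(subject)
  return utf8_step, subject, 0
end

for pos, char in utf8("˙†ƒ˙©√") do
  print(pos, char)
end
```

Compared with `utf8_next_char`, this reads the lead byte of each character twice, which is the price of keeping the start position first.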
-- Pierre-Yves