Re: Of Unicode in the next Lua version

lua-l archive


Subject: Re: Of Unicode in the next Lua version
From: Pierre-Yves Gérardy <pygy79@...>
Date: Wed, 12 Jun 2013 17:12:31 +0200

On Wed, Jun 12, 2013 at 5:02 PM, Dirk Laurie <dirk.laurie@gmail.com> wrote:
> If `pos` comes before `char`, one can write an iterator on the model
> of `ipairs`:
>
>     for pos,char in utf8(str) do ...


Almost... but you end up with the position of the next character... So
you need some trickery. Assuming a valid UTF-8 string:

Usage:
    for finish, start, cpt in utf8_next_char, "˙†ƒ˙©√" do
        print(cpt)
    end
`start` and `finish` being the byte bounds of the character, and `cpt`
being the UTF-8 encoded character itself (a substring, not a numeric
code point).
It produces:
    ˙
    †
    ƒ
    ˙
    ©
    √
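
Implementation:
    -- Assumed aliases for the standard string functions.
    local s_byte, s_sub = string.byte, string.sub

    -- `i` is the last byte of the previous character (nil on the first call).
    -- `utf8_offset` is not shown here; it is assumed to return the number
    -- of continuation bytes that follow the given lead byte.
    local function utf8_next_char (subject, i)
        i = i and i+1 or 1
        if i > #subject then return end
        local offset = utf8_offset(s_byte(subject,i))
        return i + offset, i, s_sub(subject, i, i + offset)
    end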

It has the annoying property of returning the end position before the
start position, but it is stateless.
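
For anyone who wants to run the snippet: `utf8_offset`, `s_byte` and `s_sub` are not defined in this message. Below is a minimal sketch, assuming that `utf8_offset` returns the number of continuation bytes following a lead byte and that `s_byte`/`s_sub` are local aliases for `string.byte`/`string.sub`. The body of `utf8_offset` and the `utf8_chars` wrapper are illustrative guesses, not the original code, and like the iterator they assume a valid UTF-8 string.

```lua
local s_byte, s_sub = string.byte, string.sub

-- Hypothetical helper (not from the original message): number of
-- continuation bytes that follow a UTF-8 lead byte. No validation is done.
local function utf8_offset (byte)
    if byte < 0x80 then return 0      -- 0xxxxxxx: ASCII, single byte
    elseif byte < 0xE0 then return 1  -- 110xxxxx: two-byte sequence
    elseif byte < 0xF0 then return 2  -- 1110xxxx: three-byte sequence
    else return 3 end                 -- 11110xxx: four-byte sequence
end

-- The stateless iterator from the message. The control variable `i` is the
-- last byte of the previous character, or nil on the first call.
local function utf8_next_char (subject, i)
    i = i and i + 1 or 1
    if i > #subject then return end
    local offset = utf8_offset(s_byte(subject, i))
    return i + offset, i, s_sub(subject, i, i + offset)
end

-- Illustrative wrapper: collect the bounds and bytes of every character.
local function utf8_chars (subject)
    local out = {}
    for finish, start, char in utf8_next_char, subject do
        out[#out + 1] = { start = start, finish = finish, char = char }
    end
    return out
end

-- Print start, finish and the character itself, one character per line.
for _, c in ipairs(utf8_chars("a©√")) do
    print(c.start, c.finish, c.char)
end
```

With the three-character string above, the characters occupy bytes 1 to 1, 2 to 3 and 4 to 6. Because the generic `for` feeds the first returned value back in as the control variable, `finish` has to come first for the iterator to resume at the right byte without keeping any state of its own.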

-- Pierre-Yves
