- Subject: Re: byteoffset() in lutf8lib.c from 5.3, work2
- From: Coroutines <coroutines@...>
- Date: Tue, 13 May 2014 15:14:16 -0700
On Tue, May 13, 2014 at 6:31 AM, Roberto Ierusalimschy
<roberto@inf.puc-rio.br> wrote:
> Utf8.offset does not decode anything. All functions in the library
> that decode sequences do protect against decoding invalid sequences.
The manual says we should only be feeding utf8.offset() valid UTF8,
so on that premise alone what I'm talking about shouldn't merit any
changes. I just thought it might be useful to limit the loops in
byteoffset() that skip over continuation bytes, so they don't iterate
beyond what would be considered a valid UTF8 byte sequence. The result
can't be trusted to be correct because we're passing offset() invalid
UTF8, but it can be "less incorrect". (heh)
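A minimal C sketch of the bounded loops, assuming the cap is 3 continuation bytes (the most a valid UTF-8 sequence can have under RFC 3629, and the limit Lua 5.3's own decoder enforces). The helpers seq_start_bounded and next_seq_bounded and the MAX_CONT constant are hypothetical illustrations, not the actual lutf8lib.c code:

```c
#include <assert.h>
#include <stddef.h>

/* Most continuation bytes a valid UTF-8 sequence can contain (RFC 3629). */
#define MAX_CONT 3

/* A byte is a UTF-8 continuation byte if its top two bits are 10. */
static int is_cont(unsigned char c) {
  return (c & 0xC0) == 0x80;
}

/* Hypothetical helper: from 0-based index pos, step back toward the start
   of the enclosing sequence, skipping at most MAX_CONT continuation bytes. */
static size_t seq_start_bounded(const char *s, size_t len, size_t pos) {
  size_t steps = 0;
  assert(pos < len);  /* caller must pass an in-range position */
  while (pos > 0 && steps < MAX_CONT && is_cont((unsigned char)s[pos])) {
    pos--;
    steps++;
  }
  return pos;
}

/* Hypothetical helper: from the lead byte at pos, advance to the start of
   the next sequence, skipping at most MAX_CONT continuation bytes and never
   moving past len. */
static size_t next_seq_bounded(const char *s, size_t len, size_t pos) {
  size_t steps = 0;
  assert(pos < len);
  pos++;  /* move past the current lead byte */
  while (pos < len && steps < MAX_CONT && is_cont((unsigned char)s[pos])) {
    pos++;
    steps++;
  }
  return pos;
}
```

On valid input such as "a\xE2\x82\xAC" ('a' followed by U+20AC), the bounded loops behave exactly like unbounded ones: seq_start_bounded(s, 4, 3) returns 1 and next_seq_bounded(s, 4, 1) returns 4. On junk such as "a" followed by five stray 0x80 bytes, an unbounded backward scan from index 5 would run all the way to 0, while the bounded version stops at 2, still wrong but never more than 3 bytes away.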
I was thinking some code might someday depend on offset() returning an
index within 3 bytes of the position it starts from. People might expect
that because it operates on valid UTF8, it'll return offsets that are
valid under that assumption?
Er, I think I'm beating a dead horse -- I just thought I'd point it
out for more consideration :>