Re: byteoffset() in lutf8lib.c from 5.3, work2

lua-l archive


Subject: Re: byteoffset() in lutf8lib.c from 5.3, work2
From: Coroutines <coroutines@...>
Date: Tue, 13 May 2014 15:14:16 -0700

On Tue, May 13, 2014 at 6:31 AM, Roberto Ierusalimschy
<roberto@inf.puc-rio.br> wrote:

> Utf8.offset does not decode anything. All functions in the library
> that decode sequences do protect against decoding invalid sequences.

The manual says we should only feed utf8.offset() valid UTF-8, so on
that premise alone what I'm describing doesn't merit any changes.  I
just thought it might be useful to limit those loops so they don't
iterate beyond what would be considered a valid UTF-8 byte sequence.
The result still can't be trusted to be correct, because we're passing
offset() invalid UTF-8, but it can be "less incorrect".  (heh)
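
A minimal sketch of the kind of bounded scan being suggested, not the
actual lutf8lib.c code.  The names is_cont and bounded_char_start are
hypothetical, and the limit of 3 continuation bytes assumes the 4-byte
maximum sequence length of RFC 3629:

```c
#include <stddef.h>

/* True for UTF-8 continuation bytes (bit pattern 10xxxxxx). */
static int is_cont(unsigned char c) {
  return (c & 0xC0) == 0x80;
}

/* Hypothetical helper, not part of lutf8lib.c.
   Walks back from the 0-based index i to the first non-continuation
   byte, skipping at most 3 continuation bytes (assumption: sequences
   are at most 4 bytes long).  Returns that index, or -1 if i is out of
   range or no non-continuation byte is found within the limit.
   It does not check that the byte found is a valid lead byte. */
static long bounded_char_start(const char *s, size_t len, size_t i) {
  int steps = 0;
  if (i >= len) return -1;
  while (i > 0 && is_cont((unsigned char)s[i])) {
    if (++steps > 3) return -1;  /* too many continuation bytes */
    i--;
  }
  if (is_cont((unsigned char)s[i])) return -1;  /* hit start of string */
  return (long)i;
}
```

On a run of continuation bytes longer than any valid sequence, this
gives up instead of scanning back arbitrarily far, so any index it does
return stays within 3 bytes of the starting position.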

I was thinking that some code might someday depend on offset()
returning an index within 3 bytes of the position it's called with.
People might expect that, since it's documented to operate on valid
UTF-8, the offsets it returns always fit that assumption?

Er, I think I'm beating a dead horse -- I just thought I'd point it
out for more consideration :>

Follow-Ups:
- Re: byteoffset() in lutf8lib.c from 5.3, work2, Sean Conner

References:
- byteoffset() in lutf8lib.c from 5.3, work2, Coroutines
- Re: byteoffset() in lutf8lib.c from 5.3, work2, Roberto Ierusalimschy
