byteoffset() in lutf8lib.c from 5.3, work2

lua-l archive


Subject: byteoffset() in lutf8lib.c from 5.3, work2
From: Coroutines <coroutines@...>
Date: Tue, 13 May 2014 04:52:17 -0700

The preliminary manual states that utf8.offset() assumes it is called
on valid UTF-8 strings.

In a few places the underlying byteoffset() function has loops that
keep stepping forward or backward for as long as the current byte is
a continuation byte, with no limit on how many such bytes they skip.
It might be a good idea to limit these.  On valid UTF-8 such a loop
runs at most 3 times from a starting continuation byte, since a
4-byte sequence has three continuation bytes.  The current code does
just that on valid input -- but if called on invalid UTF-8 it might
return unpredictable results.
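
To make the suggestion concrete, here is a rough sketch of what a
bounded backward scan could look like.  It is not the code from
lutf8lib.c: the names IS_CONT, MAX_CONT and char_start_bounded are made
up for illustration, and it assumes standard UTF-8 with at most 4
bytes per character (so at most 3 continuation bytes in a row).

```c
#include <stddef.h>

/* A continuation byte has the bit pattern 10xxxxxx. */
#define IS_CONT(c) (((unsigned char)(c) & 0xC0) == 0x80)

/* Assumption: at most 4 bytes per character, so a valid string never
   has more than 3 continuation bytes in a row. */
#define MAX_CONT 3

/* Hypothetical helper: step back from byte index pos (0-based) to the
   first non-continuation byte at or before it.  Returns that index, or
   -1 if pos is out of range, the start of the string is reached while
   still on a continuation byte, or more than MAX_CONT continuation
   bytes would have to be skipped (all signs of invalid input). */
long char_start_bounded(const char *s, size_t len, size_t pos) {
  int steps = 0;
  if (pos >= len) return -1;
  while (IS_CONT(s[pos])) {
    if (pos == 0 || steps == MAX_CONT) return -1;  /* give up early */
    pos--;
    steps++;
  }
  return (long)pos;
}
```

Called on the last byte of a 4-byte character it steps back three
times and returns the index of the lead byte; called inside a run of
five continuation bytes it stops after three steps and reports an
error instead of walking on.  It only bounds the scan, it does not
check that the byte it stops on is a proper lead byte.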

I'm not sure what the most correct behavior would be, and I'm not
sure this needs changing.  I just thought I'd mention this grey area;
I'm pretty sure it falls within 'undefined behavior'.

We're supposed to use utf8.len() to validate the string, yes?

I lost the thread about 'work talk' so I started this one :s

Follow-Ups:
- Re: byteoffset() in lutf8lib.c from 5.3, work2, Roberto Ierusalimschy
