- Subject: byteoffset() in lutf8lib.c from 5.3, work2
- From: Coroutines <coroutines@...>
- Date: Tue, 13 May 2014 04:52:17 -0700
The preliminary manual states that utf8.offset() assumes it is called
on valid UTF-8 strings.
In a few places the underlying byteoffset() function has loops that
keep stepping forward or backward, with no upper bound, for as long as
the byte under the cursor is a continuation byte. It might be a good
idea to limit these. If it's called only on valid UTF-8, it only has to
step forward or backward 2 times (from a starting continuation byte) at
most, and the current code does just that. But if it's called on invalid
UTF-8, it might return unpredictable results.
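
To make the idea of "limiting" those loops concrete, here is a rough sketch of a bounded backward scan. It is not the code from lutf8lib.c: the helper name bounded_char_start, the IS_CONT macro and the MAX_CONT cap are all made up for illustration, and the cap of 3 assumes sequences of at most 4 bytes (the RFC 3629 limit), which may not match what the library actually accepts.

```c
#include <assert.h>
#include <stddef.h>

/* A UTF-8 continuation byte has the bit pattern 10xxxxxx. */
#define IS_CONT(c) (((unsigned char)(c) & 0xC0) == 0x80)

/* Assumed cap: at most 3 continuation bytes per character (RFC 3629).
   Adjust if longer sequences should be tolerated. */
#define MAX_CONT 3

/* Hypothetical helper, not part of lutf8lib.c.
   Steps backward from byte index pos until a non-continuation byte is
   found. Returns that index, or -1 if the scan would pass more than
   MAX_CONT continuation bytes or would run off the start of the string. */
static long bounded_char_start(const char *s, size_t len, size_t pos)
{
    int steps = 0;
    assert(s != NULL && pos < len);
    while (IS_CONT(s[pos])) {
        if (pos == 0 || steps == MAX_CONT)
            return -1;  /* malformed input: give up instead of scanning on */
        pos--;
        steps++;
    }
    return (long)pos;
}
```

With this shape, a run of stray continuation bytes produces a definite "invalid" result after a fixed number of steps instead of walking an arbitrary distance through the string. A forward scan could be capped the same way.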
I'm not sure what the most correct behavior would be, or whether this
needs changing at all. I just thought I'd mention this grey area; I'm
pretty sure it falls within 'undefined behavior'.
We're supposed to use utf8.len() to validate the string, yes?
I lost the thread about 'work talk' so I started this one :s