- Subject: Re: Of Unicode in the next Lua version
- From: Jay Carlson <nop@...>
- Date: Sun, 16 Jun 2013 15:43:04 -0400
On Jun 16, 2013, at 7:30 AM, David Heiko Kolf wrote:
> The BOM in UTF-8 is mainly annoying for plain ASCII applications where
> UTF-8 should be transparent in strings. But as far as I remember it is
> not invalid UTF-8 (though its only use is to show that text is indeed
> UTF-8). A Unicode-aware application can just ignore it.
Yeah, but it should be dropped to keep ZWNBSPs from randomly littering the interior of concatenated texts and complicating things like search. It's harmless for applications that understand semantics above the codepoint level, but I'm proceeding on the assumption that Lua will not have those.
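A minimal sketch of the concatenation problem, written in Python purely for illustration since the thread itself contains no code. The helper names (concat_naive, concat_dropping_feff) and the sample byte strings are hypothetical, and the same byte-level idea would carry over to Lua strings.

```python
# U+FEFF is a byte order mark at the start of a text and a
# ZERO WIDTH NO-BREAK SPACE (ZWNBSP) anywhere else.
FEFF = "\ufeff"


def concat_naive(chunks):
    """Join decoded chunks as-is, keeping any U+FEFF they carry."""
    return "".join(chunks)


def concat_dropping_feff(chunks):
    """Join decoded chunks after dropping every U+FEFF (hypothetical policy)."""
    return "".join(chunk.replace(FEFF, "") for chunk in chunks)


# Two UTF-8 inputs, each saved with a leading BOM (bytes EF BB BF).
# Plain "utf-8" decoding keeps the BOM as U+FEFF in the result.
file_a = b"\xef\xbb\xbfneedle in a "
file_b = b"\xef\xbb\xbfhaystack"
chunks = [file_a.decode("utf-8"), file_b.decode("utf-8")]

naive = concat_naive(chunks)
clean = concat_dropping_feff(chunks)

# The second BOM is now an invisible character inside the text,
# so a plain substring search across the join point fails.
print("a haystack" in naive)  # False
print("a haystack" in clean)  # True
```

Dropping U+FEFF everywhere is the blunt version of the policy; a gentler variant would strip it only at the start of each chunk.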
I think there may be some advantages to declaring U+FEFF not valid, even though it technically is; it has no business being in the interior of any recently generated text. I'll go see if my codebase vomits at the idea...
http://www.unicode.org/faq/utf_bom.html#bom6
Jay